In machine learning, the main objective is to select a model that captures the predictive structure of its training data and can then generalize that forecasting ability to previously unseen data. For time-series forecasting in nonstationary data domains this is difficult, because relevant training cues exist only in the short recent history of the signal. However, training on short sequences can leave many model types sensitive to localized noise and poorly exposed to some of the crucial data patterns needed for complete learning, which impedes forecasting accuracy on new data.
In this study, AlgoTactica demonstrates the exceptional forecasting power of Gaussian Process Regression (GPR) models when trained on short time-series histories. Given a previously unseen data case, GPR uses a kernel function to locate similar examples in its training set, and then interpolates a prediction based on the known outcomes associated with this subset of stored patterns. GPR is well recognized for its interpolation power, and belongs to a class of models generally referred to as kernel machines. Here, its performance is compared with that of another well-known kernel machine, Support Vector Regression (SVR).
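The kernel-weighted interpolation described above can be illustrated with a minimal sketch of the GPR predictive mean. It assumes a zero-mean Gaussian process with a squared-exponential (RBF) kernel and fixed hyperparameters; the study does not state which kernel or settings were used, so `rbf_kernel`, `gpr_predict_mean`, `length_scale`, and `noise` are illustrative names and values only, and the toy data is synthetic.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential similarity between every row of a and every row of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_predict_mean(x_train, y_train, x_new, length_scale=1.0, noise=1e-6):
    """Predictive mean of a zero-mean GP: k(x_new, X) (K + noise * I)^-1 y."""
    k_train = rbf_kernel(x_train, x_train, length_scale)
    k_train[np.diag_indices_from(k_train)] += noise  # assumed observation noise
    weights = np.linalg.solve(k_train, y_train)
    # Each prediction is a similarity-weighted combination of stored outcomes.
    return rbf_kernel(x_new, x_train, length_scale) @ weights

# Synthetic toy data (not the electricity prices from the study).
x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.sin(x_train).ravel()
x_new = np.array([[1.5]])
pred = gpr_predict_mean(x_train, y_train, x_new)
```

Training cases that the kernel rates as most similar to the new case contribute most to the prediction, which is the sense in which the model interpolates from stored patterns.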
The Electricity Market Spot Price figure illustrates a nonstationary time series consisting of hourly spot prices, in dollars per megawatt hour ($/MWh), for electricity supply contracts. Both GPR and SVR models were trained on 168 vector values from the very short red sequence, shown in the plot, and then used to make predictions on cases occurring in the previously unseen blue sequence. Within the blue test sequence, the task consisted of each model operating on price vectors observed at time t, in order to predict the future price expected to occur 24 hours later (t+24).
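The t to t+24 prediction task can be sketched as a simple pairing of an hourly series with itself shifted by 24 hours. This is a simplification under stated assumptions: the study uses price vectors as inputs, whereas the sketch uses a single scalar price; the series is synthetic; and how the 168 training values map onto raw hours is not specified, so the split below (the first 168 hours) is hypothetical.

```python
def make_lagged_pairs(prices, horizon=24):
    """Pair each price at hour t with the price observed `horizon` hours later."""
    if horizon <= 0 or len(prices) <= horizon:
        return []
    inputs = prices[:-horizon]   # prices at time t
    targets = prices[horizon:]   # prices at time t + horizon
    return list(zip(inputs, targets))

# Synthetic hourly series spanning two weeks (not real market data).
prices = [30.0 + (hour % 24) for hour in range(14 * 24)]

train_hours = 168  # assumed: one week of hourly observations
train_pairs = make_lagged_pairs(prices[:train_hours])
test_pairs = make_lagged_pairs(prices[train_hours:])
```

Pairs built this way are also the points that a lag plot of t against t+24 would display.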
The Lag Plot scattergram shows the paired relationship between prices at times t and t+24, for both the red training sequence and the blue testing sequence. The training pairs cover the central region of the test pairs only sparsely, and do not reach the extreme edges of the test domain at all. This shows that the models were trained on data that inadequately represents the range of possibilities in the test set. Accurate predictions on the test data therefore require the models to interpolate between sparse training examples in the central region and to extend sensibly beyond them at the edges, wherever the red dots in the scattergram provide no coverage. The Price PDF diagram further illustrates the distributional dissimilarities between the training and test data sets.
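One rough way to put a number on the coverage gap seen in the lag plot is to count how many test values fall outside the range spanned by the training values. The sketch below is only an illustration of that idea, not a procedure taken from the study; the function name and the sample values are hypothetical.

```python
def fraction_outside_training_range(train_values, test_values):
    """Share of test values lying below the training minimum or above its maximum."""
    low, high = min(train_values), max(train_values)
    outside = sum(1 for value in test_values if value < low or value > high)
    return outside / len(test_values)

# Hypothetical prices in $/MWh.
train_prices = [20.0, 25.0, 30.0, 35.0]
test_prices = [10.0, 22.0, 28.0, 33.0, 50.0]
share = fraction_outside_training_range(train_prices, test_prices)
print(share)  # 0.4
```

A one-dimensional range check like this captures only the edges of the domain; sparse coverage inside the range would need a density-based comparison such as the one in the Price PDF diagram.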
GPR and SVR models were designed using a Bayesian optimization procedure on the training data, and their predictions on the test set were then randomly bootstrapped to produce sampling distributions of the forecasting errors, as shown in the RMS Error Distributions plot. The GPR produces much smaller errors than the SVR: its median RMS error of 6.8 is approximately 44% smaller than the SVR's 12.1. The GPR Fit plot shows a section of the GPR forecast superimposed on the actual test-set values, which are shown in grey; the SVR Fit diagram shows the corresponding plot for the SVR predictions. Here, the interpolation ability of the GPR enables it to produce a forecast sequence that acceptably approximates the known true values, whereas the SVR does not achieve a useful forecast.
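The bootstrapped error distributions and the reported 44% figure can be sketched as follows. The exact resampling scheme is not described in the study, so this assumes plain resampling with replacement of per-case forecast errors; `bootstrap_rmse`, `n_resamples`, and `seed` are illustrative choices, and the sample errors are hypothetical.

```python
import math
import random
import statistics

def rmse(errors):
    """Root-mean-square of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def bootstrap_rmse(errors, n_resamples=1000, seed=0):
    """RMS error of each resample drawn with replacement from the errors."""
    rng = random.Random(seed)
    n = len(errors)
    return [rmse(rng.choices(errors, k=n)) for _ in range(n_resamples)]

def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline

# Hypothetical forecast errors, used only to exercise the functions.
sample_errors = [1.5, -2.0, 0.5, 3.0, -1.0, 2.5]
median_rmse = statistics.median(bootstrap_rmse(sample_errors))

# Median RMS errors reported in the text: SVR 12.1, GPR 6.8.
print(round(100 * relative_reduction(12.1, 6.8)))  # 44
```

Comparing medians of the two bootstrap distributions, rather than single RMS values, makes the comparison less dependent on which particular test cases happened to be sampled.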
The Error by Day of Week plot presents the average daily error for each model's forecasts on the test set. Overall, the two curves have very similar shapes, but the SVR curve sits much higher on the y-axis, reflecting the poorer performance of this model. Similarly, the Error by Hour of Day plot shows the average hourly error produced by each model. For the GPR, this error is similar from hour to hour, suggesting that the model predicts the peaks, troughs, and intermediate values of the waveform with nearly equal capability. The SVR, however, shows significant fluctuations, with much larger errors and correspondingly degraded performance during hours 1-6 and 17-22.
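A breakdown like the Error by Hour of Day plot can be produced by grouping forecast errors by the hour at which each case occurs. The study does not say which error measure is averaged, so the sketch below assumes mean absolute error; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

def mean_abs_error_by_hour(hours, actual, predicted):
    """Average absolute forecast error for each hour of day present in `hours`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, truth, forecast in zip(hours, actual, predicted):
        totals[hour] += abs(truth - forecast)
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}

# Hypothetical test cases: hour of day, actual price, forecast price.
hours = [0, 0, 1]
actual = [10.0, 12.0, 20.0]
predicted = [11.0, 10.0, 20.0]
by_hour = mean_abs_error_by_hour(hours, actual, predicted)
print(by_hour)  # {0: 1.5, 1: 0.0}
```

Passing day-of-week labels instead of hours gives the corresponding breakdown by day of week.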