We operate in a very competitive environment, with over 70 business and domestic energy suppliers on the market. Small suppliers and new entrants pose the main challenge: they offer very low prices that are not always sustainable. For example, 11 energy suppliers ceased trading between February 2018 and January 2019, mainly because they had run out of cash.
Wholesale prices, the prices at which we buy energy, account for up to 40% of an average bill. They are extremely volatile, which makes accurate demand forecasting essential.
In the energy industry it is important to accurately forecast:
For the sake of data confidentiality, all data presented in this document was artificially simulated in R.
In the UK there are 14 distribution zones, i.e. geographic areas into which total national electricity consumption is divided.
Let’s plot the time series. Electric load data is famous for having multiple seasonal cycles, namely intra-day, intra-week and intra-year. The intra-day and intra-week cycles are driven by the regular day-to-day activity of people and businesses, while the intra-year cycle is caused by temperature variation.
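As a sketch of how such multi-seasonal data can be represented (using simulated data, since the real series is confidential), the forecast package provides the msts() class for series with more than one seasonal period:

```r
library(forecast)

# Simulate two weeks of half-hourly demand with intra-day and intra-week
# cycles (illustrative only; the real data is confidential)
set.seed(42)
halfHours <- 48 * 14
t <- seq_len(halfHours)
demand <- 500 +
  100 * sin(2 * pi * t / 48) +        # intra-day cycle (48 half-hours per day)
  50  * sin(2 * pi * t / (48 * 7)) +  # intra-week cycle
  rnorm(halfHours, sd = 10)

# msts() stores a series together with its multiple seasonal periods
demandTs <- msts(demand, seasonal.periods = c(48, 48 * 7))
autoplot(demandTs)
```

Models such as tbats() can then exploit both seasonal periods at once.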
An intra-day cycle for domestic and business customers looks like this.
To inspect the seasonal components we can also perform a time series decomposition.
In this tutorial we use a data set containing half-hourly business power demand in 2017 for one UK city, restricted to a single half-hour period. Variable names are hidden for the sake of confidentiality.
Let’s inspect the relationship between demand and some of the variables.
The relationship between Demand and Temperature is worth looking at, as the latter largely drives the former.
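A quick way to see this relationship is a scatter plot with a fitted line. This is a sketch on simulated data; the negative slope (demand rising as temperature falls, i.e. heating load) is an assumption for illustration:

```r
library(ggplot2)

# Simulated illustration: demand rises as temperature falls (heating load)
set.seed(1)
temperature <- runif(365, min = -5, max = 25)
demand <- 900 - 15 * temperature + rnorm(365, sd = 40)
df <- data.frame(Temperature = temperature, Demand = demand)

# Scatter of Demand against Temperature with a linear trend line
ggplot(df, aes(Temperature, Demand)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm")
```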
There are two statistical model types we can use – univariate or multivariate.
Let’s consider the Australian monthly electricity production data set elec from the fpp2 package.
library(fpp2)
autoplot(elec)
Exponential smoothing assigns exponentially decreasing weights from the newest to the oldest observations. The weights are assigned based on the time series structure, which can be examined using a technique called time series decomposition. Time series decomposition splits a time series into three components: Error, Trend and Seasonality (ETS).
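To see why the weights decay exponentially, here is a minimal sketch for simple exponential smoothing with smoothing parameter alpha: the observation k steps in the past receives weight alpha * (1 - alpha)^k.

```r
# Weight on the observation k steps in the past under simple
# exponential smoothing with smoothing parameter alpha
alpha <- 0.3
k <- 0:9
weights <- alpha * (1 - alpha)^k
round(weights, 4)
# The weights shrink geometrically, so older observations matter less
```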
decompose(elec) %>% autoplot()
It looks like errors and seasonality are multiplicative, while trend is additive in this time series. We can fit the exponential smoothing model.
We can fit an exponential smoothing model using the ets() function from the forecast package. Exponential smoothing models are useful in short-term forecasting, while longer-term forecasts can be unreliable. We can produce a forecast using the model we’ve just fitted.
esModel <- ets(elec, model = "MAM")
esFrc <- esModel %>% forecast(h = 36)
autoplot(esFrc)
AutoRegressive Integrated Moving Average (ARIMA) is a family of univariate statistical models that, like exponential smoothing, rely on historic observations of a time series to yield predictions.
We can fit an ARIMA model to the elec data by calling the auto.arima() or Arima() function from the forecast package, and then produce forecasts using the fitted model.
arimaModel <- auto.arima(elec)
arimaFrc <- arimaModel %>% forecast(h = 36)
autoplot(arimaFrc)
Recall our actual electric load data.
We could use either ARIMA or Exponential Smoothing to produce a week-ahead forecast for this time series.
Both models fail to produce reasonable forecasts when there’s a deviation from the general pattern. The trough on the right is due to Christmas, when business electricity consumption is expected to be lower than at any other time of the year. Had we used these models in the middle of the year, when demand was more repetitive, the results would have been better.
To cope with this issue we might want to use additional variables that enable the model to capture external events. Multivariate models come in handy in this instance.
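One simple way to capture such events is a dummy regressor fed to a dynamic regression model via the xreg argument of auto.arima(). This is a sketch on simulated daily data; the Christmas indicator below is an assumption for illustration, not part of the original data set:

```r
library(forecast)

# Simulated daily demand with a weekly cycle and a Christmas trough
set.seed(7)
days <- 365
demand <- 1000 + 50 * sin(2 * pi * seq_len(days) / 7) + rnorm(days, sd = 20)
christmas <- as.numeric(seq_len(days) %in% 359:365)  # dummy for the holiday week
demand <- demand - 300 * christmas                   # the trough the model must learn

# Dynamic regression: ARIMA errors plus the external dummy variable
fit <- auto.arima(ts(demand, frequency = 7), xreg = christmas)
```

Forecasting then requires supplying future values of the dummy via forecast(fit, xreg = ...).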
Regression is one of the most widespread statistical methods. It uses external variables to describe the relationship between them and a target. We can fit a regression model by calling the lm() function from base R.
As discussed earlier, we can also compute additional variables to achieve a better model fit.
In addition, to produce forecasts with regression we need to split our data into training and test sets. We use an 80/20 split here.
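A sketch of the split and fit, assuming the data lives in a data frame with a Demand column (the data and variable names below are simulated stand-ins for the confidential set):

```r
# Simulated stand-in for the confidential data set
set.seed(123)
n <- 365
df <- data.frame(
  Demand = rnorm(n, mean = 500000, sd = 20000),
  Var1 = rnorm(n),
  Var2 = rnorm(n)
)

# 80/20 chronological split: train on the first 80%, test on the rest
cut <- floor(0.8 * n)
trainSet <- df[1:cut, ]
testSet  <- df[(cut + 1):n, ]

# Regress Demand on all remaining columns
lmModel <- lm(Demand ~ ., data = trainSet)
summary(lmModel)
```

A chronological (rather than random) split matters for time series, so the test set only contains observations later than the training set.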
##
## Call:
## lm(formula = Demand ~ ., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -62311 -4702 1415 6960 20152
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 503279.463 35528.972 14.165 < 2e-16 ***
## Var1 -519.251 2126.214 -0.244 0.8072
## Var2 -201.647 312.318 -0.646 0.5190
## Var3 2.658 2.309 1.151 0.2508
## Var4 216.057 307.817 0.702 0.4833
## Var5 -4559.780 3340.124 -1.365 0.1733
## Var6 -2775.003 1756.988 -1.579 0.1154
## Var7 -20572.251 3397.249 -6.056 4.56e-09 ***
## Var8 -20573.571 3402.575 -6.046 4.79e-09 ***
## Var9 126.387 16.146 7.828 1.07e-13 ***
## Var10 -6189.577 2505.005 -2.471 0.0141 *
## Var11 1624.451 2501.985 0.649 0.5167
## Var12 73.389 2514.980 0.029 0.9767
## Var13 -2853.152 2514.513 -1.135 0.2575
## Var14 -63066.284 2510.967 -25.116 < 2e-16 ***
## Var15 -79850.493 2502.151 -31.913 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11410 on 276 degrees of freedom
## Multiple R-squared: 0.8995, Adjusted R-squared: 0.894
## F-statistic: 164.7 on 15 and 276 DF, p-value: < 2.2e-16
It turns out to be a pretty good model with a high adjusted R². Let’s check the residuals.
##
## Breusch-Godfrey test for serial correlation of order up to 19
##
## data: Residuals
## LM test = 57.188, df = 19, p-value = 1.068e-05
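A test like the one above can be produced with checkresiduals() from the forecast package, which applies the Breusch-Godfrey test to lm objects and plots residual diagnostics. This is a sketch on simulated data with deliberately autocorrelated errors:

```r
library(forecast)

# Fit a toy regression whose errors are autocorrelated by construction
set.seed(11)
x <- rnorm(300)
y <- 2 * x + arima.sim(list(ar = 0.5), n = 300)  # AR(1) errors
fit <- lm(y ~ x)

# Runs the Breusch-Godfrey test and plots residual diagnostics
checkresiduals(fit)
```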
The small p-value indicates serial correlation in the residuals, which is common for time series regressions and means the coefficient standard errors should be treated with caution. Let’s predict demand for the test set using this model and compare it with the actuals.
We can compute a forecast error metric such as the Mean Absolute Percentage Error (MAPE) to check forecast accuracy.
## MAPE
## 1 0.09765663
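MAPE can be computed directly as the mean of |actual − forecast| / |actual|. A minimal sketch (the actual and predicted vectors here are hypothetical):

```r
# Mean Absolute Percentage Error: average of |actual - forecast| / |actual|
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual))
}

actual    <- c(100, 110, 95, 105)
predicted <- c(98, 115, 90, 108)
mape(actual, predicted)  # about 0.0367, i.e. roughly 3.7% average error
```

The accuracy() function from the forecast package reports the same metric, expressed as a percentage.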