Deep Learning for Short-Term Load Forecasting
ABSTRACT Residential demand response is vital for the efficiency of the power system. It has attracted much attention from both academia and industry in recent years. Accurate short-term load forecasting is a fundamental task for demand response. While short-term forecasting for aggregated load data has been extensively studied, load forecasting for individual residential users is still challenging due to the dynamic and stochastic characteristics of single users’ electricity consumption behaviors, i.e., the variability of residential activities. To address this challenge, this paper presents a short-term residential load forecasting framework, which makes use of the spatio-temporal correlation existing in appliances’ load data through deep learning. Multiple time series are constructed in the framework to describe electricity consumption behaviors and their internal spatio-temporal relationship. A method based on a deep neural network and iterative ResBlocks is proposed to learn the correlation among different electricity consumption behaviors for short-term load forecasting. Experiments based on real-world measurements have been conducted to evaluate the performance of the proposed forecasting approach. The results show that both the appliances’ load data and iterative ResBlocks can help to improve the forecasting performance. Compared with existing methods, the Root Mean Squared Error, Mean Absolute Error and Mean Absolute Percentage Error of the proposed approach are reduced by 3.89%-20.00%, 2.18%-22.58% and 0.69%-32.78%, respectively. In addition, further experiments are conducted to evaluate the impact of using appliances’ load data, iterative ResBlocks and other factors on the proposed approach.
INDEX TERMS Smart grid, short-term load forecasting, deep learning, residential load forecasting, iterative
ResBlocks.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see [Link]
VOLUME 8, 2020 55785
Y. Hong et al.: Deep Learning Method for Short-Term Residential Load Forecasting in Smart Grid
previous values. The aggregated load forecasting gives the estimation of the total electricity consumption for a group of users in a specific area, such as a city or a residential community.
Short-term forecasting for aggregated load data has been extensively studied. Time series analysis has been applied to the STLF problem, including Auto Regressive Moving Average (ARMA) based methods [11]–[13], [37], [42] and Support Vector Regression (SVR) based methods [14], [15], [45]. Wei and Zhengang [11] combined the ARMA based method and other statistical methods to improve the performance of aggregated load forecasting. Pappas et al. [12] developed an offline model with an enhanced ARMA based method to predict the power usage provided by a power company in Greece. Huang and Shih [13] proposed a method that could improve the forecasting accuracy through a modified ARMA based method with a non-Gaussian process. Amini et al. [40] applied an ARIMA based model to electric vehicles’ demand forecasting. The method decouples the conventional electrical load and the charging demand of EVs to forecast them independently, which can reduce forecasting errors. Yang et al. [14] presented an SVR based method to forecast power consumption at a city scale. They developed a grid search approach to automatically tune the model parameters, which can reduce the difficulty of the parameter optimization phase. Velasco et al. [15] presented a load forecasting model based on the SVR method for country-wide power usage. Ren et al. [45] employed an ensemble method for STLF, which consists of SVR, random forest and XGBoost. Other methods [16]–[20] take extra conditions into account to improve the forecasting results.
Load forecasting for individual users gives the estimation of the total electricity consumption of an individual user, e.g., a resident. However, this problem is still challenging. A major reason is that the electricity consumption behavior of a single user is stochastic. The stochasticity is introduced by the uncertainty of the time at which electricity consumption activities occur [21]. Another reason for the difficulty of individual users’ load forecasting is that electricity usage is dynamic, even for a specific application of the same user.
In order to provide accurate supporting information for residential demand response, STLF for individual users has attracted increasing interest recently. Some methods [22]–[24] apply clustering techniques to obtain groups of users that have similar consumption behaviors. Teeraratkul et al. [22] proposed a shape based clustering method for STLF. Based on the profile of load curves, the method uses the dynamic time warping technique to cluster the load curves and find a canonical shape for each set of curves. Then, a Markov model is used to conduct the individual users’ load forecasting. Quilumba et al. [23] grouped individual users based on similar consumption behaviors, which are represented by users’ load data. Based on the clustering results, a neural network employing weather and calendar features is used in the prediction phase.
More recently, researchers explored using deep learning techniques to perform STLF for individual users, due to their ability to extract the latent features of users’ electricity consumption behaviors and their lower domain knowledge requirements compared to traditional methods. Classical deep learning models are used for STLF, such as the Deep Neural Network (DNN) [43] and the Extreme Learning Machine [36]. Kong et al. [10] proposed a Long Short-Term Memory RNN (LSTM-RNN) based framework for residential STLF. Experiment results showed that their method outperforms traditional machine learning methods. Shi et al. [21] proposed a pooling based deep RNN. The pooling stage uses the load data of neighbors to generate new input features, which increases the data volume and helps to solve the over-fitting problem.
Although the above methods made progress in some aspects, the historical data they employed are the overall load data of a single resident, which cannot include the spatio-temporal correlation existing among appliances’ load data. The spatio-temporal correlation mentioned here is the spatial correlation among electricity consumption behaviors of different kinds of appliances and the temporal correlation between the historical electricity consumption behaviors and future electricity consumption behaviors. For a single user, the spatial correlation exists in the user’s electricity consumption behaviors of different appliances. For instance, household members may have daily routines such as using the washing machine after taking a shower, or opening the refrigerator before making a meal. The temporal correlation is the similarity of the historical electricity consumption behaviors and the future consumption behaviors. More specifically, the time at which an electricity consumption behavior (e.g., electricity usage of the washing machine) happens in the future is probably close to the time at which the same behavior happened in the past. These correlations exist inside the load profile, and are significant for individual users’ load forecasting.
Consequently, some researchers employed the load data of different appliances for load forecasting. These studies employed different methods to explore the correlation among different appliances. Dinesh et al. [41] used nonintrusive appliance load monitoring techniques [26] to collect the appliances’ load and forecast the household load based on graph spectral clustering. Mohi Ud Din et al. [42] applied the appliance-level load as the input features of a neural network structure to forecast short-term load, and the PCA technique is employed for feature reduction.
In this paper, we explore using the spatio-temporal correlation among different kinds of electricity consumption behaviors to improve the performance of STLF. We present a framework of STLF for individual users based on the correlation information. Multiple time series are constructed in the framework to describe electricity consumption behaviors for different applications and their internal spatio-temporal relationship. A method based on a Deep Neural Network and iterative ResBlocks is proposed to learn the correlation among consumption behaviors for STLF. A ResBlock, which consists of a few stacked layers and one skip connection, is based
FIGURE 2. Disaggregation of household electricity usage.
FIGURE 3. An example of daily energy consumption for a house in the greater Boston area of the U.S.
of the iterative ResBlocks. We denote the number of iterations as $t$. When $t = 0$, the ResBlocks module degenerates into a non-iterative structure, and the IRBDNN model degenerates into a DNN model. When $t = 1$, the input of the first ResBlock (denoted as ResBlock 1) is added to the output of ResBlock 1 by a skip connection. ResBlock 1 consists of three parts: $m$ stacked layers, ResBlock 2 and a skip connection. When $t = t_0$, the iterations are repeated $t_0$ times. ResBlock $t_0$ consists of $m$ stacked layers, ResBlock $(t_0 + 1)$ and a skip connection. If iteration $t_0$ is the last iteration, ResBlock $(t_0 + 1)$ degenerates into $n$ stacked layers ($m > 0$, $n > 0$).
For each ResBlock in the IRBDNN model, the input of the ResBlock is linked to the output directly by a skip connection, which ensures that the learning capability of the current ResBlock with a deeper embedded ResBlock is no worse than the learning capability of the ResBlock without the deeper embedded ResBlock. The structure enables the model to make full use of the spatio-temporal correlation among the different consumption behaviors.
As described above, when the number of iterations is 0, the structure of the IRBDNN model degenerates into that of a DNN. When the number of iterations is no less than 1, the iteration procedure of the IRBDNN model is presented below:
\[ y = F(x_1, \Theta_1) + W_0(x_0) \tag{1} \]
\[ F(x_1, \Theta_1) = F(x_2, \Theta_2) + W_1(x_1) \tag{2} \]
\[ \vdots \]
\[ F(x_t, \Theta_t) = F(x_{t+1}, \Theta_{t+1}) + W_t(x_t) \tag{3} \]
where $F(x_{t+1}, \Theta_{t+1})$ is the output of the $(m + n)$ stacked layers in ResBlock $(t + 1)$ with the input $x_t$, $(t + 1)$ is the number of iterations, $y$ is the output of the IRBDNN model, $\Theta$ denotes the weights and biases associated with the model, and $W_t$ denotes the linear projection to match the possible changes of dimensions.
B. INPUT OF THE IRBDNN MODEL
In this subsection, we construct multiple time series based on appliances’ load data to form the input of the IRBDNN model. The time series consist of a number of load values, and each load value represents the energy consumption for a duration. We use ‘‘time interval’’ to denote a duration in the rest of this paper. The preprocessed load data for appliance $i$ in time interval $t$ is represented as $E_i(t)$, and the overall load in time interval $t$ is denoted as $E_0(t)$. The sequence for forecasting the overall load $E_0(t_0 + 1)$ is shown as below:
C. SEQUENTIAL GRID SEARCH METHOD FOR HYPER-PARAMETER OPTIMIZATION
This subsection introduces the hyper-parameter optimization method for the proposed IRBDNN model. As described in Section 3.1, the IRBDNN architecture consists of the stacked layers and the iterative ResBlocks. The number of hidden neurons in each layer of the IRBDNN model is the same and is denoted as $N$. The weight matrix that connects the hidden neurons in the $(l - 1)$th layer and the neurons in the $l$th layer is $W^l$, which is an $N \times N$ matrix. $b^l$ is a vector of $N$ elements that represents the bias of the hidden neurons in the $l$th layer. The output of the $(l - 1)$th layer is denoted as $a^{l-1}$. Then the output of the $l$th layer is $a^l = \sigma(W^l a^{l-1} + b^l)$, where $\sigma$ denotes the activation function.
The loss function applied in the model is given by (5):
\[ \mathrm{Loss} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \big(t(n) - p(n)\big)^2} \tag{5} \]
where $t(n)$ and $p(n)$ denote the true load data and the predicted load data in the $n$-th time interval, and $N$ is the number of predicted time intervals in the training set.
In deep learning methods, a number of hyper-parameters need to be optimized. An exhaustive grid search over all hyper-parameters is time-consuming. To address this problem, we design a sequential grid search approach to optimize the hyper-parameters of the proposed IRBDNN model, which is inspired by the work of Ismail et al. [31]. In this paper, the following hyper-parameters are optimized. The first hyper-parameter is the number of neurons $N$ in each layer, searched in (100, 150, 200, 300, 400, 450, 500). The second hyper-parameter is the learning rate $LR$, searched in (0.001, 0.0001, 0.00001). The last one is the initializer $I$ used for the IRBDNN parameters, searched in (Normal, Uniform, Glorot Normal, Glorot Uniform). The sequential grid search process is illustrated in Figure 7 and Figure 8. Algorithm 1, described in Figure 7, is the main program and Algorithm 2, described in Figure 8, is its subprogram.
The sequential grid search can be divided into three parts. In the first part, the hyper-parameters are initialized to build the initial forecasting model. $N$ and $LR$ have a significant impact on the learning ability of the forecasting model. Therefore, we optimize $N$ and $LR$ simultaneously to define Model 1 in the second part. $I$ is first set to an initial value, and is then optimized in the third part. After the above process, Model 2 is defined with the optimal $N$, $LR$ and $I$.
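The three-part sequential grid search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1 and Algorithm 2 in Figures 7 and 8: the callback `evaluate`, the function names and the toy error function are hypothetical, and only the candidate values are taken from the text.

```python
from itertools import product

# Candidate values quoted in the text.
NEURONS = (100, 150, 200, 300, 400, 450, 500)
LEARNING_RATES = (0.001, 0.0001, 0.00001)
INITIALIZERS = ("Normal", "Uniform", "Glorot Normal", "Glorot Uniform")


def sequential_grid_search(evaluate, initial_initializer="Normal"):
    """Return (N, LR, I) found by a staged search.

    `evaluate(n, lr, init)` is a hypothetical callback that trains a model
    with the given settings and returns a validation error (lower is better).
    """
    # Part 1: start from an initial setting of the initializer.
    init = initial_initializer

    # Part 2: search N and LR jointly with the initializer fixed ("Model 1").
    n_best, lr_best = min(
        product(NEURONS, LEARNING_RATES),
        key=lambda pair: evaluate(pair[0], pair[1], init),
    )

    # Part 3: search the initializer with N and LR fixed ("Model 2").
    init_best = min(INITIALIZERS, key=lambda i: evaluate(n_best, lr_best, i))
    return n_best, lr_best, init_best


def toy_validation_error(n, lr, init):
    """Invented stand-in for model training so the sketch runs without data."""
    penalty = 0.0 if init == "Glorot Uniform" else 5.0
    return abs(n - 300) / 100 + abs(lr - 0.0001) * 1000 + penalty


best = sequential_grid_search(toy_validation_error)
print(best)  # (300, 0.0001, 'Glorot Uniform')
```

With these candidate sets the staged search calls `evaluate` 7 × 3 + 4 = 25 times, whereas an exhaustive grid over all three hyper-parameters would need 7 × 3 × 4 = 84 evaluations.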
FIGURE 8. The flow diagram for the subprogram of the sequential grid
search approach.
TABLE 2. Descriptive statistics of the three subsets.
TABLE 3. Range of hyper-parameters for grid search.
FIGURE 10. Error distributions of the results of the forecasting methods. (The suffix ‘AO’ indicates that the model employs both appliances’
load and the overall load data, and the suffix ‘O’ indicates that the model only employs the overall load data).
FIGURE 11. Forecasting performance of the proposed method for a week.
iterative ResBlock can effectively learn the spatio-temporal correlation among consumption behaviors. The forecasting results of a residential user using the proposed method as well as the actual load are illustrated in Figure 11.
Figure 10 shows the error distributions of the results of the forecasting methods. We can observe from the figure that the error distributions of IRBDNN-OA, DNN-OA and SRX-OA are better than those of ELM-O and ARMA-O, which only employ the overall load data. More specifically, the number of RSEs between 0 and 140 (the first two bars) in the IRBDNN-OA’s forecasting result is over 255, which is more than for DNN-OA, SRX-OA, ARMA-O and ELM-O. In other words, the number of RSEs larger than 140 in the IRBDNN-OA’s forecasting results is the fewest among the five methods, because the RSE is calculated for each predicted value and the number of RSEs for each method is equal. The number of RSEs larger than 210 in the forecasting result of the IRBDNN-OA is around 25, while the number of RSEs larger than 210 for each of the other methods is larger than 25. The IRBDNN based method’s forecasting results have fewer extremely large RSEs (>210) and more small RSEs (0-140) than the other methods. Considering the overall performance in Table 4, we can draw the conclusion that the IRBDNN based method generally performs better than the other methods.
C. DISCUSSION
To obtain a more comprehensive understanding of the performance of the proposed approach, four additional sets of experiments are conducted. The training data, validation data and testing data in these experiments are the same as the data used in Section IV, B.
1) PERFORMANCE ANALYSIS OF SPATIO-TEMPORAL CORRELATION AND ITERATIVE ResBlocks
In the first set, three groups of experiments are conducted to verify the impact of the spatio-temporal correlation among different appliances and of the iterative ResBlocks. The three groups are the IRBDNN group, the DNN group and the SRX group. Each group includes two cases with different historical data: 1) the overall load data; 2) the overall load data and the appliances’ load data. The experiment results are shown in Figure 12.
The results from Figure 12 show that both the appliances’ load data and iterative ResBlocks can help to improve the forecasting performance. A detailed analysis is presented as follows. First, comparing the two experiments in each group, the case that employs both the appliances’ load and the overall load performs better than the case that only employs the overall load data. We can conclude that the employment of the appliances’
FIGURE 12. Impact of employing the appliances’ load data. (a) RMSE;
(b) MAE.
TABLE 5. Details for the network structure of the IRBDNN model with different number of iterations.
FIGURE 14. MAE of the IRBDNN model with different number of iterations.
FIGURE 16. Influence on RMSE of hidden neuron number.
neurons are similar. Both of them decrease when the hidden neuron number increases from 100 to 300, while both of them increase when the number of hidden neurons is larger than 300.
4) ANALYSIS OF COMPUTATION TIME ON THE IRBDNN BASED METHOD
In order to analyze the computational cost, we compared the training time and the testing time of the IRBDNN based method and the DNN based method, since both of them are implemented as neural networks. The experiments are conducted on a personal computer equipped with a 2.5GHz Intel Core i5 processor and 8GB RAM. We compared the training time and the testing time for both methods with two types of historical data, i.e., 1) only the overall load (denoted as input type ‘O’); 2) both the overall load and the appliances’ load (denoted as input type ‘OA’).
Table 6 shows the computational cost for both methods. The training time is recorded by running 200 epochs over the whole training set. The testing time is recorded over the whole testing set, which has 7 days’ data. We can observe from the table that the employment of the appliances’ load increases the computation cost of the training process and the testing process for both methods, which indicates that the improvement of the forecasting performance by employing the appliances’ load comes at the cost of computation time. Also, we can observe from the table that the employment of iterative ResBlocks does not evidently increase the computation cost of the IRBDNN based method, which adopts skip connections in the network structure. Thus, although there is additional
V. CONCLUSION
In this paper, we explored using the spatio-temporal correlation among different kinds of appliances to predict the short-term electricity demand for individual residential users. An effective STLF framework that includes the data acquisition module, the data preprocessing module, the model training module and the load forecasting module was proposed. Multiple time series were constructed in the framework to describe electricity consumption behaviors and their internal spatio-temporal relationship. In order to fully exploit the correlation of user behaviors and the characteristics of users’ consumption patterns, a method based on DNN and iterative ResBlocks was proposed to learn the correlation. A grid search method was employed in the hyper-parameter optimization phase. The proposed method and several existing forecasting methods were evaluated on a real-world dataset. The results show that the IRBDNN based method outperforms the other compared methods. Moreover, we demonstrated that both the appliances’ load data and iterative ResBlocks can help to improve the forecasting performance. In addition, experiment results indicate that the IRBDNN based method tends to perform better as the iteration number increases. In future work, we intend to employ the correlation defined in communication networks [38], [39] to express the spatio-temporal correlation among different residential users to improve the performance of STLF. Also, we will explore predicting thermophysical properties of matter [44].
REFERENCES
[1] H. J. Monfared, A. Ghasemi, A. Loni, and M. Marzband, “A hybrid price-based demand response program for the residential micro-grid,” Energy, vol. 185, pp. 274–285, Oct. 2019.
[2] J. A. Gomez-Herrera and M. F. Anjos, “Optimal collaborative demand-response planner for smart residential buildings,” Energy, vol. 161, pp. 370–380, Oct. 2018.
[3] M. H. Albadi and E. F. El-Saadany, “A summary of demand response in electricity markets,” Electric Power Syst. Res., vol. 78, no. 11, pp. 1989–1996, Nov. 2008.
[4] W. Yuan, J. Huang, and Y. J. A. Zhang, “Competitive charging station pricing for plug-in electric vehicles,” IEEE Trans. Smart Grid, vol. 8, no. 2, pp. 627–639, Dec. 2017.
[5] M. Muratori and G. Rizzoni, “Residential demand response: Dynamic energy management and time-varying electricity pricing,” IEEE Trans. Power Syst., vol. 31, no. 2, pp. 1108–1117, Mar. 2016.
[6] T. Chen, “A collaborative fuzzy-neural approach for long-term load forecasting in taiwan,” Comput. Ind. Eng., vol. 63, no. 3, pp. 663–670, Nov. 2012.
[7] L. Han, Y. Peng, Y. Li, B. Yong, Q. Zhou, and L. Shu, “Enhanced deep networks for short-term and medium-term load forecasting,” IEEE Access, vol. 7, pp. 4045–4055, 2019.
[8] Z. Yu, Z. Niu, and W. Tang, “Deep learning for daily peak load forecasting—A novel gated recurrent neural network combining dynamic time warping,” IEEE Access, vol. 7, pp. 17184–17194, 2019.
[9] M. R. Haq and Z. Ni, “A new hybrid model for short-term electricity load forecasting,” IEEE Access, vol. 7, pp. 125413–125423, 2019.
[10] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 841–851, Jan. 2019.
[11] L. Wei and Z. Zhen-gang, “Based on time sequence of ARIMA model in the application of short-term electricity load forecasting,” in Proc. Int. Conf. Res. Challenges Comput. Sci., Dec. 2009, pp. 11–14.
[12] S. S. Pappas, L. Ekonomou, D. C. Karamousantas, G. E. Chatzarakis, S. K. Katsikas, and P. Liatsis, “Electricity demand loads modeling using AutoRegressive moving average (ARMA) models,” Energy, vol. 33, no. 9, pp. 1353–1360, Sep. 2008.
[13] S.-J. Huang and K.-R. Shih, “Short-term load forecasting via ARMA model identification including non-Gaussian process considerations,” IEEE Trans. Power Syst., vol. 18, no. 2, pp. 673–679, May 2003.
[14] Y. Yang, J. Che, C. Deng, and L. Li, “Sequential grid approach based support vector regression for short-term electric load forecasting,” Appl. Energy, vol. 238, pp. 1010–1021, Mar. 2019.
[15] L. Clark, D. Lou, D. Michelle, G. T. Alegata, and G. C. Luna, “Day-ahead load forecasting using support vector regression machines,” Int. J. Adv. Comput. Sci. Appl., vol. 9, no. 3, pp. 22–27, 2018.
[16] L. Hernández, C. Baladrón, J. M. Aguiar, B. Carro, A. Sánchez-Esguevillas, and J. Lloret, “Artificial neural networks for short-term load forecasting in microgrids environment,” Energy, vol. 75, pp. 252–264, Oct. 2014.
[17] X. Cao, S. Dong, Z. Wu, and Y. Jing, “A data-driven hybrid optimization model for short-term residential load forecasting,” in Proc. IEEE Int. Conf. Comput. Inf. Technol., Ubiquitous Comput. Commun., Dependable, Autonomic Secure Comput., Pervasive Intell. Comput., Oct. 2015, pp. 283–287.
[18] Y. Liang, D. Niu, and W.-C. Hong, “Short term load forecasting based on feature extraction and improved general regression neural network model,” Energy, vol. 166, pp. 653–663, Jan. 2019.
[19] M. Tucci, E. Crisostomi, G. Giunta, and M. Raugi, “A multi-objective method for short-term load forecasting in European countries,” IEEE Trans. Power Syst., vol. 31, no. 5, pp. 3537–3547, Sep. 2016.
[20] K. M. Powell, A. Sriprasad, W. J. Cole, and T. F. Edgar, “Heating, cooling, and electrical load forecasting for a large-scale district energy system,” Energy, vol. 74, pp. 877–885, Sep. 2014.
[21] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—A novel pooling deep RNN,” IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 5271–5280, Mar. 2017.
[22] T. Teeraratkul, D. O’Neill, and S. Lall, “Shape-based approach to household electric load curve clustering and prediction,” IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 5196–5206, Sep. 2017.
[23] F. L. Quilumba, W.-J. Lee, H. Huang, D. Y. Wang, and R. L. Szabados, “Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities,” IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 911–918, Mar. 2015.
[24] J. D. Rhodes, W. J. Cole, C. R. Upshaw, T. F. Edgar, and M. E. Webber, “Clustering analysis of residential electricity demand profiles,” Appl. Energy, vol. 135, pp. 461–471, Dec. 2014.
[25] K. X. Perez, W. J. Cole, J. D. Rhodes, A. Ondeck, M. Webber, M. Baldea, and T. F. Edgar, “Nonintrusive disaggregation of residential air-conditioning loads from sub-hourly smart meter data,” Energy Buildings, vol. 81, pp. 316–325, Oct. 2014.
[26] G. W. Hart, “Nonintrusive appliance load monitoring,” Proc. IEEE, vol. 80, no. 12, pp. 1870–1891, 1992.
[27] J. Z. Kolter, S. Batra, and A. Y. Ng, “Energy disaggregation via discriminative sparse coding,” in Proc. Adv. Neural Inf. Process. Syst., 2010, pp. 1153–1161.
[28] C.-N. Yu, P. Mirowski, and T. K. Ho, “A sparse coding approach to household electricity demand forecasting in smart grids,” IEEE Trans. Smart Grid, vol. 8, no. 2, pp. 738–748, Jan. 2016.
[29] Y. Wang, Q. Chen, C. Kang, Q. Xia, and M. Luo, “Sparse and redundant representation-based smart meter data compression and pattern extraction,” IEEE Trans. Power Syst., vol. 32, no. 3, pp. 2142–2151, May 2016.
[30] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[31] M. Ismail, M. Shahin, M. F. Shaaban, E. Serpedin, and K. Qaraqe, “Efficient detection of electricity theft cyber attacks in AMI networks,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Apr. 2018, pp. 1–6.
[32] J. Z. Kolter and M. J. Johnson, “REDD: A public data set for energy disaggregation research,” in Proc. Sustkdd, vol. 25, 2011, pp. 59–62.
[33] X. Gao, X. Li, B. Zhao, W. Ji, X. Jing, and Y. He, “Short-term electricity load forecasting model based on EMD-GRU with feature selection,” Energies, vol. 12, no. 6, p. 1140, 2019.
[34] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE),” Geoscientific Model Develop. Discuss., vol. 7, no. 1, pp. 1525–1534, 2014.
[35] S. Kim, G. Lee, G.-Y. Kwon, D.-I. Kim, and Y.-J. Shin, “Deep learning based on multi-decomposition for short-term load forecasting,” Energies, vol. 11, no. 12, p. 3433, 2018.
[36] G. Huang, S. Song, J. N. D. Gupta, and C. Wu, “Semi-supervised and unsupervised extreme learning machines,” IEEE Trans. Cybern., vol. 44, no. 12, pp. 2405–2417, Dec. 2014.
[37] P. J. Brockwell, R. A. Davis, and M. V. Calder, Introduction to Time Series and Forecasting, vol. 2. New York, NY, USA: Springer, 2002. [Online]. Available: [Link] 2526-1#authorsandaffiliationsbook
[38] Z. Qin, Y. Wang, H. Cheng, Y. Zhou, Z. Sheng, and V. C. M. Leung, “Demographic information prediction: A portrait of smartphone application users,” IEEE Trans. Emerg. Topics Comput., vol. 6, no. 3, pp. 432–444, Jul. 2018.
[39] H. Huang, H. Yin, G. Min, H. Jiang, J. Zhang, and Y. Wu, “Data-driven information plane in software-defined networking,” IEEE Commun. Mag., vol. 55, no. 6, pp. 218–224, 2017.
[40] M. H. Amini, A. Kargarian, and O. Karabasoglu, “ARIMA-based decoupled time series forecasting of electric vehicle charging demand for stochastic power system operation,” Electr. Power Syst. Res., vol. 140, pp. 378–390, Nov. 2016.
[41] C. Dinesh, S. Makonin, and I. V. Bajic, “Residential power forecasting using load identification and graph spectral clustering,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 11, pp. 1900–1904, Nov. 2019.
[42] G. Mohi Ud Din, A. U. Mauthe, and A. K. Marnerides, “Appliance-level short-term load forecasting using deep neural networks,” in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Maui, HI, USA, Mar. 2018, pp. 53–57.
[43] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proc. IEEE, vol. 105, no. 12, pp. 2295–2329, Dec. 2017.
[44] Y. Zhou, Q. Li, and Q. Wang, “Energy storage analysis of UIO-66 and water mixed nanofluids: An experimental and theoretical study,” Energies, vol. 12, no. 13, p. 2521, 2019.
[45] L. Ren, L. Zhang, H. Wang, and L. Qi, “An ensemble model based on machine learning methods for short-term power load forecasting,” in Proc. IOP Conf. Ser., Earth Environ. Sci., vol. 186, Oct. 2018, Art. no. 012042.
YE HONG received the [Link]. degree in computer science from Sichuan University, China, in 2017, where she is currently pursuing the master’s degree in computer science. Her research interests include behavioral data analysis and deep learning applications.
YINGJIE ZHOU (Member, IEEE) received the Ph.D. degree from the School of Communication and Information Engineering, University of Electronic Science and Technology of China (UESTC), China, in 2013. He was a Visiting Scholar with the Department of Electrical Engineering, Columbia University, New York. He is currently an Assistant Professor with the College of Computer Science, Sichuan University (SCU), China. His current research interests include network management, behavioral data analysis, and resource allocation.
WENZHENG XU (Member, IEEE) received the [Link]., M.E., and Ph.D. degrees in computer science from Sun Yat-sen University, Guangzhou, China, in 2008, 2010, and 2015, respectively. He was a Visitor with The Australian National University and The Chinese University of Hong Kong. He is currently an Associate Professor with Sichuan University. His research interests include wireless ad hoc and sensor networks, mobile computing, approximation algorithms, combinatorial optimization, online social networks, and graph theory.
The proposed DNN with iterative ResBlocks outperforms traditional methods like ARMA by producing fewer extreme large relative squared errors (RSEs) and more frequent smaller RSEs (0-140). This is evident from the error distribution analysis, which shows that the IRBDNN model has fewer high RSEs (>210) than ARMA, indicating a generally more accurate forecast. This superiority is attributed to the model's ability to capture complex spatio-temporal relationships that traditional models may overlook.
In practice, the performance of deep learning models may degrade due to challenges such as the intrinsic characteristics of the dataset and optimization difficulties. The increased depth of the model, while theoretically enhancing learning capacity, can also lead to issues like overfitting and gradient vanishing or exploding, which complicate effective learning and hinder optimal performance.
Feature reduction helps improve neural network-based forecasting models by simplifying the input data, thus enhancing the model's ability to capture essential patterns without being overwhelmed by noise. A practical technique for feature reduction is the PCA (Principal Component Analysis), which is used to reduce the dimensionality of the appliance-level load data before inputting it into neural networks for short-term load forecasting.
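As a rough illustration of PCA-based feature reduction, the sketch below projects a small table of appliance-level loads onto its leading principal components. It is not taken from any of the cited works: the function names and the toy data are invented, and power iteration with deflation is just one dependency-free way to obtain the components (a production pipeline would more likely rely on a numerical library).

```python
import math


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=200):
    """Power iteration: approximate dominant eigenvector of a symmetric matrix."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v


def pca_project(rows, k):
    """Project rows onto the first k principal components."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v = top_component(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in centered]


# Toy data: two appliance loads that always move together (second = 2 x first).
loads = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
reduced = pca_project(loads, 1)
print(reduced)  # one feature per time interval instead of two
```

Because the two toy columns are perfectly correlated, a single component retains all of the variance, which is the situation in which feature reduction loses nothing.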
The paper claims innovation in short-term load forecasting by being the first to utilize iterative ResBlocks to learn the latent features of electricity consumption behaviors. These blocks allow the neural network to delve deeper into data aspects, capturing complex spatio-temporal correlations that are crucial for improving model accuracy. This approach represents a novel utilization of deep learning architecture to enhance STLF performance beyond traditional methodologies.
The document presents evidence such as performance metrics from experiments showing that models with iterative ResBlocks, particularly the IRBDNN, have superior error distributions when compared to traditional methods like ARMA. The fewer extreme RSEs and more frequent smaller RSEs in IRBDNN results demonstrate its enhanced accuracy. Additionally, comprehensive analyses of spatio-temporal correlations substantiate the model's effective learning, as reflected in better predictive performance across various evaluation metrics.
Optimization and updating of a load forecasting model significantly impact its effectiveness by fine-tuning it to better capture the underlying patterns in the data. These processes involve adjusting the model parameters to minimize prediction error, hence ensuring that the model adapts well to varied and complex consumption patterns over time, improving its predictive accuracy and robustness.
The use of ResBlocks in the proposed architecture is justified by their ability to effectively learn both shallow and deep features, allowing the model to capture non-linear relationships between input features. ResBlocks provide shortcut connections that facilitate training deeper models without degradation in performance, thereby improving the model's capacity to generalize from spatio-temporal consumption data.
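The nested structure of the iterative ResBlocks (each block holding m stacked layers, a deeper ResBlock and a skip connection, with the innermost block replaced by n stacked layers) can be sketched as a short recursion. This is a toy forward pass under stated assumptions, not the IRBDNN implementation: the ReLU activation, the identity shortcut (instead of a linear projection), the equal layer widths and all function names are choices made here for illustration.

```python
def dense(x, weights, bias):
    """Fully connected layer with ReLU activation: max(0, W x + b)."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def stacked(x, layers):
    """Apply a list of (weights, bias) layers one after another."""
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return x


def constant_layers(count, width, value):
    """Toy parameters: `count` square layers with every weight set to `value`."""
    return [
        ([[value] * width for _ in range(width)], [0.0] * width)
        for _ in range(count)
    ]


def res_block(x, blocks, tail):
    """Nested residual block.

    `blocks[i]` holds the m stacked layers of the (i+1)-th iteration and
    `tail` holds the n stacked layers that replace the innermost block.
    With an empty `blocks` list the structure is a plain stack of layers.
    """
    if not blocks:
        return stacked(x, tail)
    hidden = stacked(x, blocks[0])                # m stacked layers
    inner = res_block(hidden, blocks[1:], tail)   # deeper embedded block
    # Skip connection: identity shortcut, valid because all widths match.
    return [i + s for i, s in zip(inner, x)]


width, m, n = 2, 1, 1
x = [1.0, 2.0]
tail = constant_layers(n, width, 0.5)
one_iteration = res_block(x, [constant_layers(m, width, 0.5)], tail)
two_iterations = res_block(x, [constant_layers(m, width, 0.5)] * 2, tail)
print(one_iteration)   # [2.5, 3.5]
print(two_iterations)  # [4.0, 5.0]
```

If every weight in the sketch is set to zero, the stacked layers output zeros and each block simply returns its input, which illustrates why adding a deeper embedded block cannot remove the signal carried by the shortcut.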
Nonintrusive appliance load monitoring offers the advantage of accurately collecting detailed load data from individual appliances without requiring invasive methods. This detailed appliance-level data can enhance load forecasting models by providing finer granularity in understanding consumption patterns, thereby increasing the accuracy of forecasts through more precise input features.
During preprocessing, the recording frequency of electricity consumption data is reduced from one measurement every three seconds to one measurement each hour. This transformation decreases the data volume, which eases computational demands and facilitates the capturing of broader consumption patterns without the noise of highly granular data, benefiting model training.
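A reduction from 3-second readings to hourly values could look like the sketch below. The aggregation rule is an assumption: the text does not say whether hourly values are averages, sums or samples, so the example averages the power readings in each hour, and the function name and synthetic readings are invented for illustration.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def downsample_hourly(samples):
    """Average (timestamp_seconds, watts) readings into hourly bins.

    Returns a list of (hour_index, mean_watts) sorted by hour.
    Averaging is an assumed aggregation rule, not one stated in the text.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, watts in samples:
        hour = int(timestamp // SECONDS_PER_HOUR)
        totals[hour] += watts
        counts[hour] += 1
    return [(hour, totals[hour] / counts[hour]) for hour in sorted(totals)]


# Two hours of synthetic 3-second readings: 100 W, then 300 W.
readings = [(t, 100.0 if t < SECONDS_PER_HOUR else 300.0) for t in range(0, 7200, 3)]
hourly = downsample_hourly(readings)
print(len(readings), "->", len(hourly))  # 2400 -> 2
print(hourly)                            # [(0, 100.0), (1, 300.0)]
```

Each hour collapses 1200 readings into a single value, which is the reduction in data volume the paragraph above refers to.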
The IRBDNN model utilizes the spatio-temporal correlations by integrating both spatial and temporal features of electricity consumption behaviors, including the load of various appliances and historical usage patterns. It employs iterative ResBlocks within a deep neural network framework to learn these correlations, effectively recognizing patterns over time and among different appliances. By capturing these correlations, the model aims to improve short-term load forecasting accuracy.