Detailed Study Notes: Hourly Solar Irradiance Forecasting Using LSTM and CNN

Smart Grids and Sustainable Energy Research: Hourly Solar Irradiance Forecasting

Introduction

The transition to renewable energy is a global priority for energy security. Solar energy production has increased significantly, but its variability impacts photovoltaic (PV) systems' efficiency. Precise solar energy forecasting is crucial for PV systems to maintain high efficiency. Researchers have developed various techniques to forecast Global Horizontal Irradiance (GHI), which directly affects solar PV power production.

Forecasting Methods

Solar irradiance forecasting methods are categorized into:

  1. Physical Models: Based on numerical weather prediction (NWP) using global meteorological and local mesoscale models. These models use physical equations characterizing atmospheric changes, incorporating weather observations from ground sensors and satellites. Examples include the Global Forecast System (GFS) in the USA and the European Centre for Medium-Range Weather Forecasts (ECMWF) models. Drawbacks include coarse temporal resolution and initial condition sensitivity.

  2. Image-Based Methods: Use cloud images from satellites or ground-based cameras to predict cloud changes using digital image processing techniques and machine learning (ML) algorithms. Effective for very short-term prediction horizons.

  3. Data-Driven or Time-Series Models: Include statistical models (ST), machine learning (ML) models, and deep learning (DL) models. A fourth class, hybrid models, combines techniques from the same or different categories to improve forecasting accuracy.

Time Horizon

Based on the time horizon, forecasting types include:

  • Very Short Term: (Few seconds to several minutes ahead) Used for real-time battery storage management. Adapted models include image methods, ST, ML.

  • Short Term: (1 hour to 24 hours or 1 week ahead) Used for decision-making and unit commitment tasks. Adapted models include ML, DL, NWP, hybrid models.

  • Medium Term: (1 month to 1 year ahead) Used for maintenance scheduling and planning the operation of power units. Adapted models include ML, DL, hybrid models.

  • Long Term: (1 year to several years) Used for strategic power grid operations planning. Adapted models include ML, DL, hybrid models.

Statistical Models

Statistical models forecast GHI by analyzing past time series of irradiance with components like long-term trend, periodical components, and the mean. Examples include ARIMA, GARCH, and SARIMA. The main drawback is low accuracy in short-term predictions, and they require testing for stationarity.

Machine Learning and Deep Learning

ML and DL techniques can be applied to any time horizon without temporal or spatial limitations. ML approaches include SVM, Regression, KNN, and ensemble learning methods. DL methods use CNN and LSTM networks. There is a focus on deep models, specifically CNN and LSTM networks, as autonomous or hybrid structures for GHI prediction.

Prior Research
  • LSTM Applications: An LSTM model was used to predict one-hour and one-day ahead GHI, with the clearness index as input data to improve accuracy on cloudy days. LSTM-based models yield better prediction accuracy compared to ARIMA, SVR, BPNN, and CNN models.

  • Comparative Studies: A comparative analysis of LSTM networks, the persistence algorithm, linear least-squares regression, and multilayer feedforward neural networks trained with backpropagation (BPNN) for hourly day-ahead prediction of solar irradiance, using data from Santiago, Cape Verde, found that the LSTM technique yields more precise results than the BPNN technique.

  • Hybrid Models: The LSTM-CNN scheme produced more accurate next-hour GHI predictions than the CNN-LSTM hybrid and several standalone benchmark models (SP, ANN, SVM, LSTM, CNN), with a prediction skill score in the range of 37% to 45%.

  • Wavelet Packet Decomposition (WPD): Hybrid models combining WPD, CNN, LSTM, and MLP forecast solar irradiance one hour in advance; the first model (WPD-CNN-LSTM-MLP) exhibits relatively better performance than the other models, with an RMSE of 32.1 W/m².

  • Complete Ensemble Empirical Mode Decomposition Adaptive Noise (CEEMDAN): A hybrid model that decomposes the original historical GHI data using the CEEMDAN transform; the generated components are then arranged into five distinct configurations, each of which is fed to a CNN-LSTM hybrid model. The study reported better overall performance across various error indices.

Knowledge gap

Although CNN and LSTM have been applied to solar irradiance forecasting, very few studies examine the impact of their hyperparameters on prediction quality.

Research Questions
  1. How can simple deep learning models be built to capture short-term patterns in global horizontal irradiance data?

  2. How can the best LSTM and CNN hyperparameters be identified effectively to improve forecasting accuracy?

Contributions
  1. Develop simple LSTM and CNN-based structures to improve the accuracy and reliability in forecasting global horizontal irradiance.

  2. Identify the optimal LSTM and CNN hyperparameters, input data structure, and parameter values (number of LSTM units, CNN kernels, dataset size, prediction window lag) to achieve the optimal forecasting scheme.

  3. Enhance GHI forecasting accuracy, enabling operation and maintenance (O&M) managers to make more reliable and efficient decisions.

Dataset and Preprocessing

Meteorological data was obtained from the National Renewable Energy Laboratory (NREL), specifically the National Solar Radiation Database (NSRDB). The dataset covers January 1, 1998, to December 31, 2020, at 30-minute intervals, and includes GHI (W/m²), pressure, temperature, wind speed, and relative humidity for Los Angeles, California.

Preprocessing Steps
  1. Outlier Removal: Eliminating observations that deviate significantly from the rest of the data.

  2. Handling Missing Values: Replacing missing values using a benchmark technique.

  3. Normalization: Using the Min-max method to scale features to a common range [0, 1]. The formula for normalization for a GHI value is represented as:
    x_N = (x - x_min) / (x_max - x_min)
    Min-max normalization scales all features of the diverse datasets, such as GHI and weather variables like temperature, humidity, and wind speed, to a common range, typically [0, 1].
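The min-max step can be sketched as follows; the function name and the toy GHI values are illustrative, not from the paper (in practice the scaling range would typically be fit on the training split only):

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1]: x_N = (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# toy GHI values in W/m^2 (illustrative only)
ghi = np.array([0.0, 250.0, 500.0, 1000.0])
ghi_n = min_max_normalize(ghi)  # scales to 0.0, 0.25, 0.5, 1.0
```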

The boxplot reveals a seasonal pattern, with the median irradiance distribution exhibiting a gradual increase from January to June, followed by a gradual decrease from July to December.

Methodology

Deep Learning Neural Networks: LSTM

LSTM accounts for sequential dependencies in time series data, unlike standard neural networks. Each LSTM cell calculates the cell state c_t and the hidden state h_t at each time step, based on the input data x_t, the previous hidden state h_{t-1}, and the previous cell state c_{t-1}. The LSTM update procedure is carried out through three gates: input gate i_t, forget gate f_t, and output gate o_t.

Forget Gate

This gate determines whether information should be retained or discarded. The output f_t is given by:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

where σ(·) is the sigmoid function, W_f is the weight matrix, and b_f is the bias vector.

Input Gate

The sigmoid σ and tanh functions are applied in parallel to decide whether to write to or update the candidate cell state c'_t. The equations describing the input gate are:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)

c'_t = tanh(W_c · [h_{t-1}, x_t] + b_c)

Output Gate

The output gate o_t is computed as:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)

The cell state c_t is then updated recursively using the forget gate, the input gate, and its previous value, and the hidden state h_t follows:

c_t = f_t ∗ c_{t-1} + i_t ∗ c'_t

h_t = o_t ∗ tanh(c_t)

The predicted output ŷ_t is estimated as:

ŷ_t = W h_t

where W is a weight matrix.
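The gate equations above can be collected into a single update step. The NumPy sketch below is an illustration, not the paper's implementation; the weight layout (one matrix per gate acting on the concatenated [h_{t-1}, x_t]) and all names are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update following the gate equations in the notes.

    W maps each gate name ('f', 'i', 'c', 'o') to a weight matrix acting
    on the concatenation [h_{t-1}, x_t]; b holds the bias vectors.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])     # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # input gate
    c_cand = np.tanh(W["c"] @ z + b["c"])  # candidate cell state c'_t
    o_t = sigmoid(W["o"] @ z + b["o"])     # output gate
    c_t = f_t * c_prev + i_t * c_cand      # cell state update
    h_t = o_t * np.tanh(c_t)               # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                         # e.g. 3 input features, 4 LSTM units
W = {g: rng.standard_normal((n_hid, n_hid + n_in)) for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```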

CNN

CNN is traditionally used for image classification, but can be applied for correlations within time series. A CNN consists of convolution, pooling, and activation layers. The convolutional layer applies kernels to extract features, the pooling layer reduces the output size, and the activation layer introduces non-linearity. The fully connected layer makes predictions for the desired output. In our case, the input provided to the network comprises a sequence of time series data representing global horizontal irradiance.

Convolutional Layer

For a GHI input sequence {x} of length N, the convolution operation for the k-th filter is:

h_k = f(Σ_{i=1}^{N-F+1} W_{k,i} x_{i:i+F-1} + b_k)

where F is the length of the filter, W_{k,i} is the weight matrix for the k-th filter at position i, and f is the activation function.

Pooling Layer

This layer performs a max-pooling operation, resulting in an output z_k:

z_k = max_{i=1,…,N-F+1} h_{k,i}

Fully Connected Layer

The output z_k is flattened and passed through the fully connected layer, which applies an activation function g to the sum (Wz + b), with W as the weight matrix and b as the bias term:

y = g(Wz + b)
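A toy illustration of the convolution, ReLU activation, and max-pooling steps for one filter; the filter weights and input values are made up for the example:

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution of sequence x with one filter w, then ReLU."""
    F = len(w)
    out = np.array([np.dot(x[i:i + F], w) + b for i in range(len(x) - F + 1)])
    return np.maximum(out, 0.0)            # activation f = ReLU

x = np.array([0.1, 0.4, 0.9, 0.7, 0.3])   # toy normalized GHI sequence
h_k = conv1d_relu(x, w=np.array([0.5, -0.5, 1.0]), b=0.0)  # feature map
z_k = h_k.max()                            # max pooling over the feature map
```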

Forecasting Approaches

Two approaches are used to predict GHI:

  1. Using only historical GHI records.

  2. Combining historical GHI data with meteorological parameters.

The models forecast the next value of hourly GHI using a fixed timestep. The first model predicts the next hour x_{t+1} from the N_steps past hours:

x_{t+1} = f(x_{t-N_steps}, x_{t-N_steps+1}, …, x_t)

After making a prediction, the sliding window shifts forward by 1 hour. The second model additionally uses six input features: GHI, temperature, humidity, wind speed, pressure, and clearness index:

x_{t+1} = f([x_t], [x1_t], [x2_t], [x3_t], [x4_t], [x5_t])
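The sliding-window scheme of the first approach can be sketched as follows; `make_windows` is a hypothetical helper, not from the paper:

```python
import numpy as np

def make_windows(series, n_steps):
    """Build (past window, next-hour target) pairs with a 1-hour stride."""
    X, y = [], []
    for t in range(len(series) - n_steps):
        X.append(series[t:t + n_steps])    # x_{t-N_steps}, ..., x_t
        y.append(series[t + n_steps])      # target x_{t+1}
    return np.array(X), np.array(y)

ghi = np.arange(10, dtype=float)           # toy hourly GHI series
X, y = make_windows(ghi, n_steps=3)        # X.shape == (7, 3), y[0] == 3.0
```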

Experimental Setup
  • Training Data: Defined over a specific period (e.g., 2010 to 2016).

  • Validation Data: The year following the training period (e.g., 2017).

  • Testing Data: The year following the validation period (e.g., 2018).

Data is generally distributed as 75% for training, 12.5% for validation, and 12.5% for testing.
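A minimal sketch of the chronological 75% / 12.5% / 12.5% split; the function name is illustrative, and a real setup would split on calendar years as described above:

```python
def chronological_split(data, train_frac=0.75, val_frac=0.125):
    """Split a time-ordered dataset into train/validation/test without shuffling."""
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = chronological_split(list(range(16)))
# train: 12 samples, val: [12, 13], test: [14, 15]
```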

Forecasting Model Development Steps
  1. Read data.

  2. Preprocess data (handle missing data, remove outliers, perform normalization).

  3. Select the most relevant features.

  4. Choose the dataset and split it into training, validation, and testing sets.

  5. Build the appropriate LSTM or CNN-based forecasting architecture.

  6. Train and validate the model.

  7. Optimize hyperparameters to improve model performance.

  8. Repeat steps 5 and 6 until the desired model is obtained.

  9. Assess and validate the obtained model using unseen testing data.

  10. Choose the final model based on test metrics.

Performance Metrics
  • MAE (Mean Absolute Error)
    MAE = (1/N) Σ_{i=1}^{N} |x_i - x̂_i|

  • RMSE (Root Mean Square Error)
    RMSE = √[(1/N) Σ_{i=1}^{N} (x_i - x̂_i)²]

  • MAPE (Mean Absolute Percentage Error)
    MAPE = (100%/N) Σ_{i=1}^{N} |(x_i - x̂_i) / x_i|

  • r (Pearson Correlation Coefficient)
    r = Σ_{i=1}^{N} (x_i - x̄)(x̂_i - m̂) / √[Σ_{i=1}^{N} (x_i - x̄)² · Σ_{i=1}^{N} (x̂_i - m̂)²], where x̄ and m̂ are the means of the observed and predicted values.
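The four metrics can be computed directly from their definitions; this NumPy sketch assumes all observed values are nonzero (MAPE is undefined otherwise):

```python
import numpy as np

def forecast_metrics(x, x_hat):
    """MAE, RMSE, MAPE (%), and Pearson r for observed x vs. predicted x_hat."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    err = x - x_hat
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / x))   # assumes no zero observations
    r = np.corrcoef(x, x_hat)[0, 1]
    return mae, rmse, mape, r

mae, rmse, mape, r = forecast_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 380.0])
```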

Experiments and Discussion

The paper conducts multiple simulation experiments to analyze how different parameters impact prediction accuracy. The tests consider the effect of the timestep, the dataset, and the LSTM and CNN parameters on annual performance; the optimal configuration of the two deep networks is then evaluated across the four seasons: spring, summer, autumn, and winter.

Experimental Settings
  • Training data is sampled every 30 minutes.

  • The prediction focuses on solar irradiance one hour in advance.

  • The hyperparameters of both the LSTM and CNN-based deep learning networks (learning rate, dropout rate, batch size, activation function, and momentum) are optimized using the Keras Tuner Random Search approach, together with an early-stopping callback monitoring the validation loss with a patience of 25.

  • For the other hyperparameters, including the number of LSTM units and the CNN number of filters and kernel size, the grid-search technique provided in the scikit-learn library has been adopted with 5-fold cross-validation.

  • The following setting values have been adopted:

    • Epochs: 1000

    • Optimizer: Adam

    • Loss function: Mean Squared Error (MSE)

    • Metrics: Mean Absolute Error (MAE)
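The grid-search step can be sketched generically as below. This is not the scikit-learn or Keras Tuner API; the search space and the dummy `evaluate` function are placeholders (in practice `evaluate` would return the mean 5-fold validation error of a trained model):

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Exhaustively try every combination and keep the lowest score."""
    best_params, best_score = None, float("inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = evaluate(params)           # e.g. mean 5-fold validation RMSE
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# hypothetical search space; a dummy score stands in for cross-validation
grid = {"lstm_units": [32, 64, 128], "n_steps": [12, 24, 48]}
best, score = grid_search(grid, lambda p: abs(p["lstm_units"] - 64) + abs(p["n_steps"] - 24))
# best == {"lstm_units": 64, "n_steps": 24}
```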

Effect of Timestep and Number of LSTM Units

Experimenting with different N_steps values within the range of [7h to 72h] while maintaining a fixed number of LSTM units leads to significant improvements. The hyperparameters