Predicting wind turbine power generation using numerical weather forecast data
In this assignment you will investigate using linear (ADALINE) models and non-linear Multilayer
Perceptron (MLP) models to predict the power output of a wind turbine based on numerical weather
forecast data.
You have a unique turbine dataset assigned to you, which you can download from the assignment
module in Canvas: turbine_N_data.mat, where N is the turbine number assigned to you. The dataset
contains the power output in kW, at a sampling interval of 30 minutes, for a wind turbine at an
unspecified location in Northern Ireland for the period 11/02/2021 to 31/10/2021, together with 16
weather-related candidate predictor variables.
The candidate predictor variables are numerical weather forecast values of eight weather-related
variables for the location of the wind turbine (in WdataWT) and the average values of these variables
for 40 locations across Northern Ireland (in WdataNI). The eight variables are:
v(k)      Wind Speed (m/s)
P(k)      Air Pressure (mbar)
T(k)      Air Temperature (°C)
CD(k)     cos(Wind Direction)
SD(k)     sin(Wind Direction)
v(k-1)    Wind speed 30 minutes earlier
v(k-2)    Wind speed 60 minutes earlier
v(k-3)    Wind speed 90 minutes earlier
The Power data p(k) in kW is contained in Pdata and the corresponding time stamps (date and time)
are recorded in Pdate. Here, k is used to indicate the current sampling instant.
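The delayed wind speeds v(k-1), v(k-2) and v(k-3) are the same series shifted by one, two and three 30-minute samples. The data files already provide these columns, so the Python/NumPy sketch below (the assignment scripts themselves are Matlab) only illustrates the relationship; the function name `make_lags` is hypothetical.

```python
import numpy as np

def make_lags(v, max_lag=3):
    """Return a matrix whose columns are v(k), v(k-1), ..., v(k-max_lag).

    The first max_lag samples have no complete set of delayed values and are
    dropped, so row i of the result corresponds to sampling instant k = i + max_lag.
    """
    v = np.asarray(v, dtype=float)
    n = len(v)
    cols = [v[max_lag - lag : n - lag] for lag in range(max_lag + 1)]
    return np.column_stack(cols)

# 30-minute sampling: lag 1 = 30 min, lag 2 = 60 min, lag 3 = 90 min
v = np.array([5.0, 6.0, 7.0, 8.0, 9.0])
X = make_lags(v)
print(X)
# [[8. 7. 6. 5.]
#  [9. 8. 7. 6.]]
```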
Part 1: Preliminary investigation using linear (ADALINE) models
Part 1(A) – 20%
Using the ‘GenBasicLinear_ModelPredictions’ m-file as a template, investigate and compare the
performance of linear wind power prediction models for the following regression variable combinations:
1. WdataWT variable: v(k)
2. WdataNI variable: v(k)
3. WdataWT variables: v(k), v(k-1), v(k-2), v(k-3)
4. WdataNI variables: v(k), v(k-1), v(k-2), v(k-3)
5. All WdataWT and WdataNI wind speed variables (8 variables)
6. WdataWT variables: v(k), T(k), P(k), CD(k), SD(k)
7. All 8 WdataWT variables.
8. All WdataWT and WdataNI variables together (16 variables)
You may wish to test other variable combinations as well to gain insights into the relative importance
of each variable. Present your results in a table.
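An ADALINE computes p̂(k) = w0 + wᵀx(k); trained to minimise the mean squared error with a suitably small learning rate, it converges towards the ordinary least-squares solution. Each row of the results table can therefore be prototyped as a least-squares fit followed by an RMSE calculation. This is a minimal Python/NumPy sketch, not the Matlab template; `fit_linear`, `predict_linear` and `rmse` are hypothetical names, and the synthetic data stand in for the turbine data.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y approx w0 + X @ w; returns [w0, w1, ..., wn]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    """Apply the fitted linear model to new inputs."""
    return w[0] + X @ w[1:]

def rmse(y, y_hat):
    """Root-mean-square error, in the units of y (kW for turbine power)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Synthetic illustration only (not turbine data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 15, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)
w = fit_linear(X, y)
print(np.round(w, 2), rmse(y, predict_linear(w, X)))
```

In the assignment the RMSE of interest is the one computed on the test data, not on the data used for fitting.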
Discuss your results, commenting on:
• which variables are most useful for predicting turbine wind power output
• which regression variable combination gives the best RMSE prediction performance
• why the cos and sin of Wind Direction are included as candidate variables rather than Wind Direction itself
ELE8066 Intelligent Systems and Control
• the quality of the data available to predict wind power (given that wind power is a deterministic
function of the incident wind speed)
• the potential motivations for including delayed values of wind speed as candidate regressors in the prediction model.
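Wind direction is a circular quantity: 359° and 1° are almost the same direction but are 358 apart as raw numbers, so a model using degrees directly sees a spurious discontinuity at north. Mapping the angle onto the unit circle with cos and sin removes that discontinuity. A minimal Python/NumPy sketch, assuming the raw direction is expressed in degrees; `encode_direction` is a hypothetical helper.

```python
import numpy as np

def encode_direction(theta_deg):
    """Map wind direction in degrees to (cos, sin), so 0 and 360 degrees coincide."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    return np.cos(theta), np.sin(theta)

# 359 and 1 degree: far apart as raw numbers, close together on the unit circle
c1, s1 = encode_direction(359.0)
c2, s2 = encode_direction(1.0)
raw_gap = abs(359.0 - 1.0)                    # 358.0
encoded_gap = float(np.hypot(c1 - c2, s1 - s2))  # chord length, about 0.035
print(raw_gap, encoded_gap)
```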
Part 1(B) – 10%
Considering the v(k)–p(k) relationship that applies to wind turbines (and is observed in the data), what
is the major limitation of employing linear models for this problem?
Identify, implement, and evaluate a simple post-processing adjustment of the model predictions to
reduce the impact of this limitation.
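One possible post-processing adjustment, not necessarily the intended one, exploits the fact that turbine output is bounded: it is essentially non-negative and cannot exceed rated power, whereas a linear model in v(k) can predict values outside that range at low and high wind speeds. The Python/NumPy sketch below clips predictions to [p_min, p_max]; `clip_predictions` is hypothetical, and the 2000 kW rated power in the example is an illustrative assumption rather than a property of any assigned turbine.

```python
import numpy as np

def clip_predictions(p_hat, p_min=0.0, p_max=None):
    """Limit predicted power to the physically possible range [p_min, p_max].

    p_max would normally be the turbine's rated power; if it is not known it
    could, for example, be taken as the largest power seen in the training data.
    If p_max is None, only the lower bound is applied.
    """
    p_hat = np.asarray(p_hat, dtype=float)
    upper = np.inf if p_max is None else p_max
    return np.clip(p_hat, p_min, upper)

p_hat = np.array([-120.0, 350.0, 2100.0])   # illustrative linear-model outputs (kW)
print(clip_predictions(p_hat, p_max=2000.0))
```

The RMSE of the clipped predictions can then be compared directly with that of the unadjusted linear model.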
Part 2: Training an MLP prediction model
The Matlab m-file ‘GenMLP_ModelPredictions’ is an extended version of the basic script from Part 1
with template code for designing, training and generating predictions with an MLP model. The script
also highlights how to normalise input variables and remove outliers in the data. Using this script,
explore the design of MLP models with different structures and input variable combinations as follows.
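A common way to normalise inputs is to standardise each column to zero mean and unit variance using statistics estimated from the training data only; a simple outlier rule then discards rows whose standardised value exceeds a threshold. This Python/NumPy sketch shows those generic techniques. It is an assumption that they resemble what the ‘GenMLP_ModelPredictions’ template does, and the function names and the threshold z_max = 3 are hypothetical.

```python
import numpy as np

def fit_normaliser(X_train):
    """Column means and standard deviations, estimated from training data only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return mu, sigma

def normalise(X, mu, sigma):
    """Apply the training-set statistics to any dataset (training or test)."""
    return (X - mu) / sigma

def drop_outliers(X, y, z_max=3.0):
    """Remove rows where any standardised input lies outside +/- z_max."""
    mu, sigma = fit_normaliser(X)
    keep = np.all(np.abs(normalise(X, mu, sigma)) <= z_max, axis=1)
    return X[keep], y[keep]

# Synthetic illustration: one corrupted row in otherwise well-behaved data
rng = np.random.default_rng(1)
X = rng.normal(10.0, 2.0, size=(500, 3))
X[0] = [100.0, 10.0, 10.0]
y = rng.normal(size=500)
Xc, yc = drop_outliers(X, y)
mu, sigma = fit_normaliser(Xc)
Z = normalise(Xc, mu, sigma)
```

Because this outlier rule estimates its statistics from data that still contain the outliers, extreme points inflate sigma; robust statistics such as the median and median absolute deviation avoid that effect.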
Part 2(A) – 20%
Using only the current wind speed, v(k), as an input variable, estimate
(i) a (1,1,1) MLP model
(ii) an optimally sized 1 hidden layer MLP model, i.e. a (1,Nh,1) MLP
(iii) an optimally sized 2 hidden layer MLP model, i.e. a (1,Nh1,Nh2,1) MLP
to predict p(k).
Part 2(B) – 20%
Using all 16 variables as inputs, develop an optimally sized MLP model to predict p(k). Compare the
prediction performance of this model with the different models developed in Part 2(A) and the
corresponding linear models. Present your results in a table.
Note: When training each MLP you need to repeat the training process several times, because random weight
initialisation can lead to a poor local minimum. You should modify the script to automate this
process, as well as the process of determining the optimum number of hidden layer neurons for each network.
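The automation described in the note amounts to a nested loop: for each candidate hidden layer size, train several networks from different random initialisations, keep the best, then choose the size with the lowest error. The Python sketch below shows only that control flow. `train_and_score` is a hypothetical user-supplied function (in the assignment it would wrap the Matlab training code), and ideally its score comes from validation data rather than the final test set.

```python
import numpy as np

def select_mlp(train_and_score, hidden_sizes, n_restarts=5, seed=0):
    """Grid search over hidden layer sizes with several random restarts each.

    train_and_score(n_hidden, rng) must initialise weights from rng, train the
    network, and return its error (e.g. validation RMSE). Keeping the best
    restart per size guards against an unlucky initialisation.
    """
    rng = np.random.default_rng(seed)
    best_per_size = {}
    for nh in hidden_sizes:
        scores = [train_and_score(nh, rng) for _ in range(n_restarts)]
        best_per_size[nh] = min(scores)
    best_nh = min(best_per_size, key=best_per_size.get)
    return best_nh, best_per_size

def fake_train(nh, rng):
    """Stand-in for real training: U-shaped error in nh plus restart-dependent noise."""
    return (nh - 6) ** 2 + rng.uniform(0.0, 1.0)

best_nh, table = select_mlp(fake_train, range(1, 11))
print(best_nh, round(table[best_nh], 3))
```

For a (1,Nh1,Nh2,1) network the same loop can iterate over pairs (Nh1, Nh2), for example `itertools.product` of two size ranges, at a correspondingly higher cost.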
Part 3: Optimal variable selection
The Matlab m-file ‘GenAutoVarSelection_ModelPredictions’ shows how to implement an automatic
variable selection procedure when selecting the variables for linear prediction models from a
candidate set of variables. The method, a greedy search known as forward selection, chooses
one variable at a time: the one that reduces the RMSE the most when combined with the already
selected variables. Explore the operation of this file. In particular, note how the training and test
datasets are generated. The default arrangement (the same as that used in the scripts in Parts 1
and 2) is to randomly select 20% of the data for training and 80% of the data for testing. An alternative
approach, which might be regarded as more realistic, is to split the data sequentially, with the first few
months of data used to train the model and the remaining months used to test its performance.
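Forward selection can be sketched in a few lines: starting from an empty set, each step fits one linear model per remaining candidate and keeps the candidate giving the lowest RMSE. This Python/NumPy version is an illustration, not a transcription of ‘GenAutoVarSelection_ModelPredictions’; the function names are hypothetical, and scoring on the held-out set passed as `Xte`, `yte` is an assumption about how the template measures RMSE.

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def linear_test_rmse(Xtr, ytr, Xte, yte):
    """Least-squares fit of y ~ w0 + X w on the training set; RMSE on the held-out set."""
    Atr = np.column_stack([np.ones(len(Xtr)), Xtr])
    Ate = np.column_stack([np.ones(len(Xte)), Xte])
    w, *_ = np.linalg.lstsq(Atr, ytr, rcond=None)
    return rmse(yte, Ate @ w)

def forward_selection(Xtr, ytr, Xte, yte, max_vars):
    """Greedy forward selection: at each step add the column that lowers RMSE most."""
    selected, history = [], []
    remaining = list(range(Xtr.shape[1]))
    for _ in range(max_vars):
        scores = {j: linear_test_rmse(Xtr[:, selected + [j]], ytr,
                                      Xte[:, selected + [j]], yte)
                  for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])  # RMSE after adding this variable
    return selected, history

# Synthetic illustration: column 2 matters most, then column 0
rng = np.random.default_rng(2)
X = rng.normal(size=(400, 5))
y = 4.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(0, 0.1, 400)
sel, hist = forward_selection(X[:80], y[:80], X[80:], y[80:], max_vars=3)
print(sel, np.round(hist, 3))
```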
Using this script determine the optimum subset of variables to use as inputs for the linear wind power
prediction model when:
(i) selecting from all 16 variables and using the 20%/80% random training/test data split
(ii) selecting from all 16 variables and using a 20%/80% sequential training/test data split
(iii) selecting from all 16 variables and using a 50%/50% sequential training/test data split.
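The random and sequential partitions differ only in how sample indices are assigned. A minimal Python/NumPy sketch, assuming the half-hourly samples are stored in time order; `random_split` and `sequential_split` are hypothetical names.

```python
import numpy as np

def random_split(n, train_frac, seed=0):
    """Randomly assign train_frac of the n samples to training, the rest to test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return np.sort(idx[:n_train]), np.sort(idx[n_train:])

def sequential_split(n, train_frac):
    """Earliest train_frac of the time-ordered samples for training, remainder for test."""
    n_train = int(round(train_frac * n))
    idx = np.arange(n)
    return idx[:n_train], idx[n_train:]

tr, te = sequential_split(10, 0.2)
print(tr, te)  # [0 1] [2 3 4 5 6 7 8 9]
```

With a random split, training samples are drawn from every month of the record; with a sequential split, the model never sees data from the months on which it is tested.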
Discuss your results, commenting on:
• What you observe regarding the subset of variables selected in each case, and what is causing
the patterns you observe.
• How the RMSE performance of (i) compares with the linear models developed in Part 1(A).
• From a practical standpoint, what a good choice of input variables might be.
Modify the variable selection script to perform optimum variable selection for the MLP model (up to
a maximum of 5 variables) taking care to consider the computational complexity of doing so and how
to keep it to a manageable level.
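The cost of MLP variable selection can be estimated by counting model fits. The Python snippet below compares an exhaustive search over all subsets of 1 to 5 of the 16 candidates with forward selection, which at step j evaluates only the 16 - j variables not yet chosen.

```python
from math import comb

n_candidates, max_vars = 16, 5

# Exhaustive search: every subset of size 1..5 must be trained and evaluated
exhaustive = sum(comb(n_candidates, k) for k in range(1, max_vars + 1))

# Forward selection: step j tries each of the (16 - j) variables not yet chosen
forward = sum(n_candidates - j for j in range(max_vars))

print(exhaustive, forward)  # 6884 70
```

Each count must still be multiplied by the number of random restarts and hidden layer sizes tried per subset; with, say, 5 restarts and 10 sizes (illustrative values), forward selection needs 70 × 50 = 3500 trainings. Fixing a modest hidden layer size during the search and retuning it only for the final subset is one way to keep this manageable.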
Compare the performance of the MLP prediction model built using this choice of variables with the
one built using the top 5 variables chosen for the linear model in Part 3(A). What do you conclude?
Assignment submission requirements
Your assignment report should be a maximum of 8 pages (font size 11), including cover page, all text,
equations, tables and plots. Your report should contain the following:
• Descriptions of the procedures used to obtain your results and explanations and observations
on the results obtained in each exercise.
• Any plots included should be fully annotated (i.e. appropriate title, legend and axis labels, etc.)
and appropriately scaled so important details are clear. (NB: Matlab figures can be copied and
pasted into Word using the Edit -> Copy Figure option in Matlab.)
• References to the m-files used to get the results presented.
Reports exceeding 8 pages will be penalised (10% marks reduction per additional page). Your report
must be in PDF format and must be submitted via Canvas along with all the Matlab m-files you have
created to perform the analysis required for the exercises. The m-files should be clearly labelled and
include comments to explain any code that has been written.
Zip your report document and m-files into a single zip file for on-line submission. The zip file should
be called your_name_A2.zip.