Predicting wind turbine power generation using numerical weather forecast data

In this assignment you will investigate using linear (ADALINE) models and non-linear Multilayer Perceptron (MLP) models to predict the power output of a wind turbine based on numerical weather forecast data.

You have a unique turbine dataset assigned to you, which you can download from the assignment module in Canvas: turbine_N_data.mat, where N is the turbine number assigned to you. The dataset contains the power output in kW at a sampling interval of 30 minutes for a wind turbine at an unspecified location in Northern Ireland for the period 11/02/2021 to 31/10/2021, together with 16 weather related candidate predictor variables.

The candidate predictor variables are numerical weather forecast values of eight weather related variables for the location of the wind turbine (in WdataWT) and the average values of these variables for 40 locations across Northern Ireland (in WdataNI). The eight variables are:

v(k)     Wind Speed (m/s)
P(k)     Air Pressure (mbar)
T(k)     Air Temperature (°C)
CD(k)    cos(Wind Direction)
SD(k)    sin(Wind Direction)
v(k-1)   wind speed 30 minutes earlier
v(k-2)   wind speed 60 minutes earlier
v(k-3)   wind speed 90 minutes earlier

The power data p(k) in kW is contained in Pdata and the corresponding time stamps (date and time) are recorded in Pdate. Here, k is used to indicate the current sampling instant.

Part 1: Preliminary investigation using linear (ADALINE) models

Part 1(A) – 20%

Using the ‘GenBasicLinear_ModelPredictions’ m-file as a template, investigate and compare the performance of linear wind power prediction models for the following regression variable combinations:

1. WdataWT variable: v(k)

2. WdataNI variable: v(k)

3. WdataWT variables: v(k), v(k-1), v(k-2), v(k-3)

4. WdataNI variables: v(k), v(k-1), v(k-2), v(k-3)

5. All WdataWT and WdataNI wind speed variables (8 variables)

6. WdataWT variables: v(k), T(k), P(k), CD(k), SD(k)

7. All 8 WdataWT variables.

8. All WdataWT and WdataNI variables together (16 variables)

You may wish to test other variable combinations as well to gain insights into the relative importance of each variable. Present your results in a table.
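As an illustration only, the kind of comparison described above can be sketched with a linear model fitted by batch least squares on a random 20%/80% training/test split. The data here are synthetic stand-ins, not the turbine dataset, and the variable names (X, y, Phi, rmseTest) are hypothetical. Whether the template m-file fits its ADALINE by iterative LMS or by direct least squares is not stated here, so treat the backslash solve as an assumption.

```matlab
% Synthetic stand-in data (NOT the turbine dataset): 500 samples, 2 regressors.
rng(0);                                  % fix the seed so the run is repeatable
N = 500;
X = [10*rand(N,1), randn(N,1)];          % hypothetical regressors
y = 50*X(:,1) - 3*X(:,2) + 20 + 5*randn(N,1);   % hypothetical target with noise

% Random 20%/80% training/test split.
idx   = randperm(N)';
nTr   = round(0.2*N);
trIdx = idx(1:nTr);
teIdx = idx(nTr+1:end);

% Linear model with a bias term, fitted by least squares (assumed method).
Phi  = @(Z) [ones(size(Z,1),1), Z];      % prepend a column of ones for the bias
w    = Phi(X(trIdx,:)) \ y(trIdx);       % weights: [bias; w1; w2]
yhat = Phi(X(teIdx,:)) * w;              % predictions on the held-out samples

rmseTest = sqrt(mean((y(teIdx) - yhat).^2));
fprintf('Test RMSE: %.2f\n', rmseTest);
```

Repeating the fit for each variable combination and recording rmseTest gives one row of the results table per combination.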

Discuss your results, commenting on:

• which variables are most useful for predicting turbine wind power output

• which regression variable combination gives the best RMSE prediction performance

• why the cos and sin of Wind Direction are included as candidate variables rather than Wind Direction (0-360°) itself.

ELE8066 Intelligent Systems and Control

• the quality of the data available to predict wind power (given that wind power is a deterministic function of the incident wind speed)

• the potential motivations for including delayed values of wind speed as candidate regressors in the dataset
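When thinking about the wind direction question above, it can help to look at how two nearly identical directions on either side of north compare numerically. The short sketch below uses made-up angles (not values from the dataset) and hypothetical names; it is a starting point for the discussion, not a complete answer.

```matlab
% Two directions that are physically only 2 degrees apart (made-up values).
theta1 = 359;
theta2 = 1;

% Difference between the raw angles in degrees.
rawGap = abs(theta1 - theta2);

% Encode each direction as the pair [cos, sin], i.e. [CD, SD].
enc    = @(th) [cosd(th), sind(th)];
encGap = norm(enc(theta1) - enc(theta2));   % Euclidean distance between encodings

% The original angle can be recovered from the pair if needed.
c = cosd(theta1);
s = sind(theta1);
thetaBack = mod(atan2(s, c) * 180/pi, 360);

fprintf('Raw gap: %g deg, encoded gap: %.4f, recovered angle: %.1f deg\n', ...
        rawGap, encGap, thetaBack);
```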

Part 1(B) – 10%

Considering the v(k) – p(k) relationship that applies to wind turbines (and is observed in the data), what is the major limitation of employing linear models for this problem?

Identify, implement, and evaluate a simple post-processing adjustment of the model predictions to reduce the impact of this limitation.
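Mechanically, a post-processing step is just a function applied to the vector of model predictions before the RMSE is computed. The sketch below shows one possible form, limiting predictions to an assumed physical range. The prediction values and the limits pMin and pMax are hypothetical, and this is not necessarily the adjustment the exercise has in mind; you should justify your own choice from the data.

```matlab
% Hypothetical raw predictions from a linear model (kW).
yhat = [-35; 120; 480; 910; 1050];

% Assumed limits (hypothetical): e.g. zero and the largest power seen in training.
pMin = 0;
pMax = 900;

% Limit every prediction to the interval [pMin, pMax].
yhatAdj = min(max(yhat, pMin), pMax);

disp([yhat, yhatAdj]);   % raw and adjusted predictions side by side
```

Comparing the test RMSE before and after the adjustment is one way to evaluate its effect.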

Part 2: Training an MLP prediction model

The Matlab m-file ‘GenMLP_ModelPredictions’ is an extended version of the basic script from Part 1 with template code for designing, training and generating predictions with an MLP model. The script also highlights how to normalise input variables and remove outliers in the data. Using this script, explore the design of MLP models with different structures and input variable combinations as follows.

Part 2(A) – 20%

Using only the current wind speed, v(k), as an input variable, estimate

(i) a (1,1,1) MLP model

(ii) an optimally sized 1 hidden layer MLP model, i.e. a (1,Nh,1) MLP

(iii) an optimally sized 2 hidden layer MLP model, i.e. a (1,Nh1,Nh2,1) MLP

to predict p(k).

Part 2(B) – 20%

Using all 16 variables as inputs, develop an optimally sized MLP model to predict p(k). Compare the prediction performance of this model with the different models developed in Part 2(A) and the corresponding linear models. Present your results in a table.

Note: When training each MLP you need to repeat the training process several times, because random weight initialisation can lead to poor local minima. You should modify the script to automate this process, as well as the process of determining the optimum number of hidden layer neurons for each network.
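The automation described in the note can be structured as two nested loops: an outer loop over candidate hidden layer sizes and an inner loop over random restarts, keeping the network with the lowest validation error. The sketch below assumes the Deep Learning Toolbox functions feedforwardnet and train are available; the template m-file may build its networks differently. The data are a synthetic S-shaped curve, and the ranges hiddenSizes and nRepeats are arbitrary illustrative choices.

```matlab
% Requires the Deep Learning Toolbox (feedforwardnet, train): an assumption.
rng(1);
x = linspace(0, 25, 400);                 % synthetic wind speeds (m/s), 1 x N
t = 900 ./ (1 + exp(-(x - 10)));          % synthetic S-shaped power curve (kW)
t = t + 20*randn(size(t));                % add measurement noise

hiddenSizes = 1:5;                        % candidate hidden layer sizes (assumed)
nRepeats    = 5;                          % random restarts per size (assumed)
bestPerf = inf;
bestNh   = NaN;
bestNet  = [];

for Nh = hiddenSizes
    for r = 1:nRepeats
        net = feedforwardnet(Nh);         % new network, so new random weights
        net.trainParam.showWindow = false;    % suppress the training GUI
        [net, tr] = train(net, x, t);     % default random train/val/test division
        if tr.best_vperf < bestPerf       % compare on validation MSE
            bestPerf = tr.best_vperf;
            bestNh   = Nh;
            bestNet  = net;
        end
    end
end

fprintf('Best Nh = %d, validation RMSE = %.1f kW\n', bestNh, sqrt(bestPerf));
```

Selecting on validation error rather than test error keeps the test set untouched for the final comparison; the same loop structure extends to two hidden layers by iterating over pairs of sizes.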

Part 3: Optimal variable selection

The Matlab m-file ‘GenAutoVarSelection_ModelPredictions’ shows how to implement an automatic variable selection procedure when selecting the variables for linear prediction models from a candidate set of variables. The method, a greedy search method known as forward selection, chooses one variable at a time – the one that reduces the RMSE the most when combined with the already selected variables. Explore the operation of this file. In particular, note how the training and test datasets are generated. The default arrangement (which is the same as used in the scripts in Parts 1 and 2) is to randomly select 20% of the data for training and 80% of the data for testing. An alternative approach, which might be regarded as more realistic, is to split the data sequentially, with the first few months of data used to train the model and the remaining months used to test its performance.
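The forward selection idea can be sketched in a few lines for a linear least-squares model. This is a minimal illustration on synthetic data in which only two of six candidate columns matter; it is not the code in the supplied m-file. The names (rmseOf, selected, remaining) are hypothetical, and scoring each candidate by RMSE on a held-out block is an assumption about how the RMSE is measured.

```matlab
rng(2);
N = 400;
nVar = 6;
X = randn(N, nVar);                           % synthetic candidate regressors
y = 4*X(:,3) - 2*X(:,5) + 0.5*randn(N,1);     % only columns 3 and 5 matter

trIdx = (1:200)';                             % training rows
teIdx = (201:N)';                             % held-out rows used for scoring

% RMSE on held-out rows of a least-squares model using the columns in cols.
design = @(rows, cols) [ones(numel(rows),1), X(rows, cols)];
rmseOf = @(cols) sqrt(mean((y(teIdx) - ...
         design(teIdx, cols) * (design(trIdx, cols) \ y(trIdx))).^2));

selected  = [];                               % variables chosen so far
remaining = 1:nVar;                           % variables still available
history   = [];                               % best RMSE after each step
maxVars   = 3;

for step = 1:maxVars
    trial = zeros(size(remaining));
    for j = 1:numel(remaining)                % try adding each remaining variable
        trial(j) = rmseOf([selected, remaining(j)]);
    end
    [bestRmse, jBest] = min(trial);           % keep the one with the lowest RMSE
    selected = [selected, remaining(jBest)];
    history  = [history, bestRmse];
    remaining(jBest) = [];
end

disp(selected);
disp(history);
```

Because the search is greedy, a variable chosen early is never dropped later, so the result is not guaranteed to be the best subset of a given size.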

Part 3(A) – 15%

Using this script determine the optimum subset of variables to use as inputs for the linear wind power prediction model when:

(i) selecting from all 16 variables and using the 20%/80% random training/test data split

(ii) selecting from all 16 variables and using a 20%/80% sequential training/test data split

(iii) selecting from all 16 variables and using a 50%/50% sequential training/test data split.
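The difference between the random and sequential splits listed above comes down to how the row indices are generated. The sketch below uses a hypothetical sample count N and assumes the rows of the data are stored in chronological order, which should be checked against the time stamps in Pdate.

```matlab
rng(3);
N = 1000;                          % hypothetical number of samples
fracTrain = 0.2;                   % 20% training; use 0.5 for a 50%/50% split
nTr = round(fracTrain * N);

% Random split: shuffle the row indices, then cut.
perm   = randperm(N);
trRand = sort(perm(1:nTr));
teRand = sort(perm(nTr+1:end));

% Sequential split: the first nTr rows train, the rest test.
% Assumes rows are in time order.
trSeq = 1:nTr;
teSeq = nTr+1:N;

fprintf('Random: %d train / %d test, Sequential: %d train / %d test\n', ...
        numel(trRand), numel(teRand), numel(trSeq), numel(teSeq));
```

With the random split, training and test samples are interleaved in time; with the sequential split, every test sample lies after the last training sample.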

Discuss your results, commenting on:

• What you observe regarding the subset of variables selected in each case, and what is causing the patterns you observe.

• How the RMSE performance of (i) compares with the linear models developed in Part 1(A).

• From a practical standpoint, what a good choice of input variables might be.

Part 3(B) – 15%

Modify the variable selection script to perform optimum variable selection for the MLP model (up to a maximum of 5 variables) taking care to consider the computational complexity of doing so and how to keep it to a manageable level.
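A quick count shows why the computational complexity needs attention. The sketch below compares the number of candidate input sets evaluated by forward selection with an exhaustive search over 16 variables. The numbers of restarts and hidden layer sizes per candidate (nRepeats, nSizes) are hypothetical values chosen only to show how the totals multiply.

```matlab
nCand   = 16;                      % candidate variables
maxVars = 5;                       % maximum number of variables to select

% Forward selection: 16 + 15 + 14 + 13 + 12 candidate sets.
nForward = sum(nCand - (0:maxVars-1));

% Exhaustive search: all subsets of exactly 5, and of 1 to 5 variables.
nExact5 = nchoosek(nCand, maxVars);
nUpTo5  = sum(arrayfun(@(k) nchoosek(nCand, k), 1:maxVars));

% Each candidate set may need several MLP training runs (hypothetical values).
nRepeats = 5;                      % random restarts per network
nSizes   = 10;                     % hidden layer sizes tried
nTrainRuns = nForward * nRepeats * nSizes;

fprintf('Forward: %d sets, exhaustive: %d (exactly 5) or %d (up to 5)\n', ...
        nForward, nExact5, nUpTo5);
fprintf('MLP training runs for forward selection: %d\n', nTrainRuns);
```

Even with forward selection, the restart and sizing loops multiply the cost, so reducing either of them during the selection stage is one way to keep the run time manageable.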

Compare the performance of the MLP prediction model built using this choice of variables with the one built using the top 5 variables chosen for the linear model in Part 3(A). What do you conclude?

Assignment submission requirements

Your assignment report should be a maximum of 8 pages (font size 11) including cover page, all text, equations, tables and plots. Your report should contain the following:

• Descriptions of the procedures used to obtain your results, together with explanations of and observations on the results obtained in each exercise.

• Any plots included should be fully annotated (i.e. appropriate title, legend and axis labels, etc.) and appropriately scaled so important details are clear. (NB: Matlab figures can be copied and pasted into Word using the edit->copy figure option in Matlab.)

• References to the m-files used to get the results presented.

Reports exceeding 8 pages will be penalised (10% marks reduction per additional page). Your report must be in PDF form and must be submitted via Canvas along with all the Matlab m-files you have created to perform the analysis required for the exercises. The m-files should be clearly labelled and include comments to explain any code that has been written.

Zip your report document and m-files into a single zip file for on-line submission. The zip file should be called your_name_A2.zip.