## LVQ and Machine Learning for Algorithmic Traders – Part 3

In the last two posts, LVQ and Machine Learning for Algorithmic Traders – Part 1, and LVQ and Machine Learning for Algorithmic Traders – Part 2, we demonstrated how to use:

- **Learning Vector Quantization (LVQ)**
- **Correlation testing**

...to determine the relevance/importance of strategy parameters and the correlation between them, respectively.

Yet another technique we can use to estimate the best features to include in our trading strategies or models is **Recursive Feature Elimination**, an automatic feature selection approach.

## What is Automatic Feature Selection?

Automatic feature selection enables **algorithmic traders** to construct multiple quantitative models using different subsets of a dataset's features, and so identify which combination of features or strategy parameters results in the most accurate model.

One such method of automatic feature selection is **Recursive Feature Elimination (RFE)**.

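To find the feature subset that yields the most accurate model, the technique first fits a Random Forest to all of the input features (strategy parameters) and ranks them by importance. It then repeatedly drops the least important features, refits on the smaller subsets, and scores each subset size by resampling, rather than exhaustively testing every possible combination.

The end result is the list of features that produces the most accurate model.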
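The elimination idea can be sketched by hand on a small example. The snippet below is a simplified illustration only: the synthetic data, the column names `x1` to `x4` and the one-feature-per-round schedule are all assumptions, and caret's `rfe` handles ranking and resampling differently.

```r
library(randomForest)

set.seed(42)

# Synthetic data (assumed for illustration): only x1 and x2 drive the target
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$target <- 3 * toy$x1 - 2 * toy$x2 + rnorm(n, sd = 0.1)

remaining  <- setdiff(names(toy), "target")
eliminated <- character(0)

# Drop the least important feature each round until one remains
while (length(remaining) > 1) {
  fit <- randomForest(x = toy[, remaining, drop = FALSE],
                      y = toy$target, ntree = 200)
  imp   <- importance(fit, type = 2)[, 1]   # node-purity importance per feature
  worst <- names(imp)[which.min(imp)]
  eliminated <- c(eliminated, worst)
  remaining  <- setdiff(remaining, worst)
}

eliminated  # features in the order they were dropped
remaining   # the last feature standing
```

A full RFE run additionally measures model performance at each subset size, which is what allows it to report the best-performing subset rather than just an elimination order.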

Using RFE, algorithmic traders can refine and considerably speed up trading strategy optimization, provided of course that the selected list is smaller than the total number of input parameters.

We’ll make use of the **caret (Classification and Regression Training) package in R** once again.

It contains functions to perform RFE conveniently, allowing us to spend more time on analysis instead of writing the functionality ourselves.

## Recursive Feature Elimination – Step by Step Process

- As before, **run “raw” backtests without any optimization**, employing all features (parameters), save your results in a suitable data structure (e.g. CSV table), and load the caret and randomForest libraries.
- **Specify the algorithm control** using a Random Forest selection method.
- **Execute** the Recursive Feature Elimination algorithm.
- **Output** the algorithm’s chosen features (strategy parameters).

### Step 1: Load the data + “randomForest” and “caret” machine learning libraries in R

`> library(caret)`

`> library(randomForest)`

`> train.blogpost <- read.csv("data.csv", header=TRUE, nrows=1000)`

`> train.blogpost <- train.blogpost[,grep("feature|target",names(train.blogpost))]`
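The `grep` filter above keeps only the columns whose names contain `feature` or `target`. To try the steps without real backtest results, a stand-in file with that layout can be generated first. The sketch below is illustrative only: the column names `feature1` to `feature21`, the extra `id` column and the random values are all hypothetical and do not reflect the original dataset.

```r
set.seed(1)

# Hypothetical stand-in for backtest results: 21 features plus one target
n <- 1000
features <- as.data.frame(matrix(rnorm(n * 21), nrow = n))
names(features) <- paste0("feature", 1:21)

demo <- cbind(id = seq_len(n), features)   # "id" mimics a column we do not want
demo$target <- 2 * demo$feature1 - demo$feature2 + rnorm(n)

# Write to a temporary file so nothing in the working directory is overwritten
path <- file.path(tempdir(), "data.csv")
write.csv(demo, path, row.names = FALSE)

# Same load-and-filter pattern as in Step 1
train.demo <- read.csv(path, header = TRUE, nrows = 1000)
train.demo <- train.demo[, grep("feature|target", names(train.demo))]

dim(train.demo)   # rows, columns after filtering
```

With this layout the features occupy columns 1 to 21 and the target sits in column 22, which is the arrangement the later `rfe` call relies on.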

### Step 2: Specify the control using Random Forest selection function

`> rfe.control <- rfeControl(functions=rfFuncs, method="cv", number=10)`
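In the call above, `rfFuncs` supplies caret's Random Forest helper functions, and `method="cv"` with `number=10` requests 10-fold cross-validation for scoring each subset size. If the estimates look noisy, one possible variation is repeated cross-validation. The sketch below assumes 3 repeats, an arbitrary choice made for illustration:

```r
library(caret)

# Variation on Step 2: 10-fold cross-validation repeated 3 times
# (the number of repeats is an assumption, not a recommendation)
rfe.control.repeated <- rfeControl(functions = rfFuncs,
                                   method = "repeatedcv",
                                   number = 10,
                                   repeats = 3)
```

Repeats multiply the number of models fitted, so runtime grows accordingly. Calling `set.seed()` immediately before `rfe()` should make the fold assignment repeatable between runs.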

### Step 3: Execute the Recursive Feature Elimination algorithm

`> rfe.output <- rfe(train.blogpost[,1:21], train.blogpost[,22], sizes=c(1:21), rfeControl = rfe.control)`

### Step 4: Output chosen features (strategy parameters)

`> print(rfe.output)`

`> predictors(rfe.output)`

`> plot(rfe.output, type=c("o", "g"))`
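For readers without the original dataset, the four steps can be tried end to end on a small synthetic example. Everything about the data below is an assumption made for illustration (6 features, 150 rows, a target driven by `feature1` and `feature2`), and 5 folds are used only to keep the run short:

```r
library(caret)
library(randomForest)

set.seed(123)

# Synthetic data (assumed): only feature1 and feature2 influence the target
n <- 150
toy <- as.data.frame(matrix(rnorm(n * 6), nrow = n))
names(toy) <- paste0("feature", 1:6)
toy$target <- 3 * toy$feature1 - 2 * toy$feature2 + rnorm(n, sd = 0.5)

# Steps 2 and 3: control object, then the RFE run
ctrl    <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
toy.rfe <- rfe(toy[, 1:6], toy$target, sizes = 1:6, rfeControl = ctrl)

# Step 4: inspect the outcome programmatically
toy.rfe$results       # resampled performance for each subset size
toy.rfe$optsize       # subset size that caret selected
predictors(toy.rfe)   # names of the selected features
```

The `results` data frame holds the same numbers that `plot()` draws, which is convenient when the chosen subset size needs to feed into a later optimization script.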

## Conclusion

From these results, it is apparent that:

- A model using the **first two parameters only** is the **least accurate**.
- The algorithm’s **5 selected parameters (out of a total of 21)** produce the **most accurate** model.
- Any number of **parameters greater than 5 produces lower but comparable accuracy**, so including more than 5 parameters adds no value to the model.

Based on this, an algorithmic trader could significantly reduce their optimization overhead by culling the number of strategy parameters employed in backtesting and optimization.
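When several subset sizes give comparable accuracy, one way to settle on a number is a tolerance rule: take the smallest subset whose accuracy is within a chosen percentage of the best. The sketch below applies that rule in base R. The accuracy figures are invented purely for illustration and are not the results discussed above, and the 2% tolerance is an arbitrary assumption.

```r
# Hypothetical resampling results, invented purely for illustration
results <- data.frame(
  Variables = c(1, 2, 3, 5, 10, 21),
  Accuracy  = c(0.60, 0.55, 0.68, 0.735, 0.74, 0.72)
)

tolerance.pct <- 2   # accept up to a 2% relative drop from the best accuracy

best.accuracy <- max(results$Accuracy)
pct.loss      <- (best.accuracy - results$Accuracy) / best.accuracy * 100

# Subset size with the single best accuracy
best.size <- results$Variables[which.max(results$Accuracy)]

# Smallest subset size whose accuracy stays within the tolerance
smallest.size <- min(results$Variables[pct.loss <= tolerance.pct])

best.size
smallest.size
```

caret also ships a `pickSizeTolerance` helper built around the same idea, which may be worth a look before hand-rolling the rule.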
