# LVQ and Machine Learning for Algorithmic Traders – Part 1

Algorithmic traders across all asset classes often face a rather daunting challenge:

### What are the best inputs for an algorithmic trading strategy’s parameter space?

Every trading strategy, whether manual or automated, has its own unique set of parameters that govern its behaviour.

Granted, Genetic and Walk-Forward Optimization help algorithmic traders establish which input values (or ranges thereof) in a chosen parameter space have yielded favourable results historically.

They also help traders identify optimal time periods over which to re-optimize “the currently optimized parameter space”... *yes, that could indeed get pretty messy.*

While this approach may or may not yield robust parameter inputs, **several questions still remain in algorithmic traders’ minds:**

1) Should absolutely all parameters be optimized, or just some? If only some, which ones?

2) What is the relevance and unique importance of each parameter in the trading strategy?

### Why is this important for Algorithmic Traders?

Selecting the right parameters in your trading algorithm can be the difference between:

- Average performance with a large number of parameters -> painfully long optimization times, or

- Fantastic performance with a smaller number of parameters -> much shorter optimization times.
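To get a feel for why the number of parameters matters so much, here is a small sketch of how an exhaustive search space grows. It assumes, purely for illustration, that every parameter is tested at 10 candidate values; `grid_size` is a hypothetical helper, not part of any library. Genetic optimization does not evaluate every combination, but it still has to search a space of this size.

```r
# Hypothetical setup: each parameter is tested at 10 candidate values.
values_per_param <- 10

# Number of combinations an exhaustive grid search would have to evaluate.
grid_size <- function(n_params, n_values = values_per_param) {
  n_values^n_params
}

six_params   <- grid_size(6)  # one million combinations
three_params <- grid_size(3)  # one thousand combinations

# Ratio between the two search spaces.
six_params / three_params
```

Under these assumptions, halving the number of parameters from six to three shrinks the search space by a factor of 1,000.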

### What is the solution?

Selecting the most appropriate parameters is a practice known as **Feature Selection** in the **Machine Learning** world, a vast and complex area of research and development.

Needless to say, it cannot be encapsulated in a single blog post, so there will be more posts on this subject in the very near future 🙂

For now, we will focus on estimating “the most important” parameters in a **trading strategy**, using a bit of **machine learning** in R.

Specifically, we will make use of the **caret** (short for **Classification and Regression Training**) package in R, as it contains excellent modeling functions to assist us with this Feature Selection problem.

Lastly, we will use a small constructed sample of 1,000 id|feature|target records as the dataset to demonstrate Learning Vector Quantization (LVQ), the solution.

### Step 1 – Load the “caret” machine learning library in R

`> library(caret)`

### Step 2 – Prepare the data

Construct a dataset containing 1,000 training data points in CSV form.
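If you do not have such a file at hand, the sketch below builds a stand-in. Everything about it is an assumption made for illustration: the 21 uniformly random features (chosen to match the 21 variables reported in Step 4), the binary target, and the way the target is loosely tied to two of the features. It is not the dataset used to produce the results in this post.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n_rows     <- 1000
n_features <- 21  # assumption: matches the 21 variables reported in Step 4

# Uniform random features in [0, 1]; purely illustrative values.
features <- as.data.frame(matrix(runif(n_rows * n_features), nrow = n_rows))
names(features) <- paste0("feature", seq_len(n_features))

# Hypothetical binary target, loosely driven by two features plus noise.
signal <- features$feature2 + features$feature18 + rnorm(n_rows, sd = 0.5)
target <- as.integer(signal > median(signal))

# Assemble the id | feature | target layout.
synthetic <- data.frame(id = seq_len(n_rows), features, target = target)

# Write it out under the file name used in this post.
write.csv(synthetic, "data.csv", row.names = FALSE)
```

Because the values are random, the importance ranking you obtain from this stand-in will differ from the one printed in Step 4.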

Making sure you’re in the directory where the training data resides, type the following command in your R console:

`> train.blogpost <- read.csv("data.csv", header=TRUE, nrows=1000)`

We need only the “feature” and “target” column values in the dataset. Type the following command in your R console to achieve this:

`> train.blogpost <- train.blogpost[,grep("feature|target",names(train.blogpost))]`
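To see what this column filter does, here is the same pattern applied to a tiny toy data frame (the `toy` data and its values are made up for illustration). The anchored pattern at the end is an optional, stricter alternative that is not part of the original command; it avoids accidentally matching columns that merely contain the word “feature” or “target”.

```r
# Toy data frame with the assumed id | feature | target layout.
toy <- data.frame(id       = 1:3,
                  feature1 = c(0.1, 0.5, 0.9),
                  feature2 = c(0.7, 0.2, 0.4),
                  target   = c(0, 1, 1))

# Same pattern as in the post: keep columns whose name contains
# "feature" or "target". grep() returns the matching column positions.
keep       <- grep("feature|target", names(toy))
toy_subset <- toy[, keep]

# The id column is gone; the features and the target remain.
names(toy_subset)

# Stricter, anchored variant (optional alternative, not from the post).
keep_strict <- grep("^(feature[0-9]+|target)$", names(toy))
```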

### Step 3 – Construct an LVQ Model on the data.

`> model.control <- trainControl(method="repeatedcv", number=10, repeats=3)`

`> model <- train(as.factor(target)~., data=train.blogpost, method="lvq", preProcess="scale", trControl=model.control)`
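For readers curious about what an LVQ model does under the hood, the sketch below implements the basic LVQ1 update rule in base R on toy data. It is a simplified illustration, not the code caret runs: the toy data, the single prototype per class, the learning rate `alpha0` and the iteration count are all assumptions. The idea is that each class is represented by prototype (“codebook”) vectors, which are pulled towards training points of their own class and pushed away from points of other classes.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Toy two-class data: two Gaussian blobs in two dimensions (illustrative only).
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))
y <- rep(c("a", "b"), each = 50)

# One prototype per class, initialised from a random point of that class.
codebook    <- rbind(x[sample(which(y == "a"), 1), ],
                     x[sample(which(y == "b"), 1), ])
codebook_cl <- c("a", "b")

# Index of the prototype closest (squared Euclidean distance) to a point.
nearest <- function(point, protos) {
  diffs <- protos - matrix(point, nrow(protos), length(point), byrow = TRUE)
  which.min(rowSums(diffs^2))
}

n_iter <- 500
alpha0 <- 0.1  # hypothetical initial learning rate
for (i in seq_len(n_iter)) {
  alpha <- alpha0 * (1 - i / n_iter)  # linearly decaying learning rate
  j     <- sample(nrow(x), 1)         # pick a random training point
  win   <- nearest(x[j, ], codebook)  # winning (closest) prototype
  step  <- alpha * (x[j, ] - codebook[win, ])
  # LVQ1 rule: attract the winner if its class matches, repel it otherwise.
  if (codebook_cl[win] == y[j]) {
    codebook[win, ] <- codebook[win, ] + step
  } else {
    codebook[win, ] <- codebook[win, ] - step
  }
}

# Classify every point by the class of its nearest prototype.
pred     <- codebook_cl[apply(x, 1, nearest, protos = codebook)]
accuracy <- mean(pred == y)
```

Because the method relies on distances between points and prototypes, features on very different scales would dominate the result, which is presumably why the `train` call above rescales the inputs with `preProcess="scale"`.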

### Step 4 – Retrieve the “importance” of each “feature” from the computed model.

`> importance <- varImp(model, scale=FALSE)`

`> print(importance)`

`loess r-squared variable importance`

`only 20 most important variables shown (out of 21)`

`Overall`

`feature2 0.011949`

`feature18 0.010770`

`feature7 0.010556`

`feature16 0.010522`

`feature5 0.010400`

`feature11 0.009825`

`feature1 0.009673`

`feature14 0.009672`

`feature3 0.009663`

`feature13 0.008916`

`feature21 0.008846`

`feature15 0.008737`

`feature10 0.008616`

`feature17 0.008180`

`feature19 0.007864`

`feature12 0.005575`

`feature9 0.005268`

`feature8 0.005124`

`feature20 0.005089`

`feature4 0.005052`

`>`
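The header of this output mentions a “loess r-squared” importance, which suggests that each feature is scored on its own by how well a loess smooth of the outcome on that feature explains the outcome. The sketch below illustrates that idea in base R on toy data. It is an approximation under our own assumptions (`loess_r2` is a hypothetical helper, and the data are made up), and caret’s exact computation may differ in detail.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Toy data: x_strong drives y non-linearly, x_noise is unrelated to it.
n        <- 300
x_strong <- runif(n)
x_noise  <- runif(n)
y        <- 4 * (x_strong - 0.5)^2 + rnorm(n, sd = 0.1)

# R-squared of a loess smooth of the outcome on a single predictor.
loess_r2 <- function(x, y) {
  fit    <- loess(y ~ x)
  ss_res <- sum(residuals(fit)^2)   # unexplained variation
  ss_tot <- sum((y - mean(y))^2)    # total variation
  1 - ss_res / ss_tot
}

# Score each predictor independently, then rank them.
scores <- c(x_strong = loess_r2(x_strong, y),
            x_noise  = loess_r2(x_noise, y))
sort(scores, decreasing = TRUE)
```

Note that a score of this kind looks at one feature at a time, so it cannot detect features that only matter in combination with others.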

### Step 5 – Visualize the importance of each feature.

`> plot(importance)`

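The plot of “feature importance” shows that features 12, 9, 8, 20, 4 and 6 have noticeably less impact on the outcome (the “target”) than the rest of the features. Feature 6 does not appear in the listing in Step 4 because that listing shows only the 20 most important of the 21 variables.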

To put it into context – in a trading strategy, these features may well have been parameters called:

Stop Loss 1, Stop Loss 2, Take Profit 1, Take Profit 2, RSI Top, RSI Bottom.. and so on.
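One simple way to act on these numbers is to keep only the features whose importance exceeds a cutoff. The sketch below does this with the scores printed in Step 4. The cutoff of 0.007 is a hypothetical value chosen by eye from the gap between feature19 and feature12, not a recommendation; feature6 is absent because it was not part of the printed listing.

```r
# Importance scores copied from the printed output in Step 4.
importance_scores <- c(
  feature2  = 0.011949, feature18 = 0.010770, feature7  = 0.010556,
  feature16 = 0.010522, feature5  = 0.010400, feature11 = 0.009825,
  feature1  = 0.009673, feature14 = 0.009672, feature3  = 0.009663,
  feature13 = 0.008916, feature21 = 0.008846, feature15 = 0.008737,
  feature10 = 0.008616, feature17 = 0.008180, feature19 = 0.007864,
  feature12 = 0.005575, feature9  = 0.005268, feature8  = 0.005124,
  feature20 = 0.005089, feature4  = 0.005052
)

# Hypothetical cutoff, chosen by eye from the gap in the scores.
cutoff <- 0.007

keep_features <- names(importance_scores)[importance_scores >= cutoff]
drop_features <- names(importance_scores)[importance_scores <  cutoff]

length(keep_features)  # 15 features retained
drop_features          # feature12, feature9, feature8, feature20, feature4
```

In a trading context, the dropped names would be the parameters you could consider fixing at sensible defaults instead of optimizing.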

## Conclusion

By conducting LVQ analysis on optimization results, algorithmic traders can save time and avoid needlessly sacrificing accuracy.

**Machine learning techniques** of this nature can greatly reduce the time a trader needs to spend on an optimization problem, because fewer parameters mean a smaller space to search.

By ascertaining the relative importance of parameters in this manner, traders can not only simplify their algorithms, but potentially also make them more robust than they would be with a larger number of parameters.
