In LVQ and Machine Learning for Algorithmic Traders – Part 1, we discussed and demonstrated a technique (Learning Vector Quantization) for deciphering the relevance and relative importance of each feature variable in the dataset under study.
In doing so, algorithmic traders can isolate which of a dataset's features (read: strategy parameters) have only a minor impact on the final target, thereby speeding up strategy optimization.
Another technique we can use for the same objective (isolating features that have little to no impact on end outcomes) involves studying the correlation between the dataset's feature variables.
When a trading strategy has highly correlated parameters, algorithmic traders not only run the risk of overfitting, but also that of introducing avoidable latency in execution.
While the latter may be more of a concern for short-term / intraday traders, over the longer term it can also mean considerably higher transaction costs even for swing traders, as intended fills diverge from actual fills.
Algorithmic traders can therefore benefit from the removal of such highly correlated parameters, prior to any optimization.
The procedure to follow for removing such redundant features is quite simple (see below).
We will once again make use of the caret (Classification and Regression Training) package in R, which contains a suite of convenient functions for this particular task.
N.B. It is just as simple to replicate this process in C++, Java, MQL or Python.
Step by Step Process
- Run “raw” backtests without any optimization, employing all features (parameters), and save your results in a suitable data structure (e.g. a CSV file) for further analysis.
- Construct a correlation matrix of the data’s features (read: strategy’s parameters).
- Run the correlation matrix through caret’s findCorrelation() function to determine which features (parameters) are highly correlated, and can hence be removed.
- Set the correlation threshold at 60% (it can be higher or lower if you prefer) to remove features.
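The post notes that this process is easy to replicate outside R, so here is a minimal Python sketch of the detection step using pandas and NumPy. Note this is a simplified stand-in for caret's findCorrelation(), not a port of it: it flags any column whose absolute pairwise correlation with an earlier column exceeds the cutoff, whereas caret also uses mean absolute correlations to decide which member of a pair to drop. Function and variable names here are illustrative.

```python
import numpy as np
import pandas as pd

def find_correlated(features: pd.DataFrame, cutoff: float = 0.6) -> list:
    """Names of columns to drop so that no remaining pair has |corr| > cutoff.

    Simplified analogue of caret::findCorrelation(): scan the upper
    triangle of the absolute correlation matrix and flag the second
    member of each offending pair.
    """
    corr = features.corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal of 1s).
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > cutoff).any()]

# Example: feature2 tracks feature1 closely; feature3 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
df = pd.DataFrame({
    "feature1": base,
    "feature2": base + rng.normal(scale=0.1, size=500),
    "feature3": rng.normal(size=500),
})
print(find_correlated(df))  # ['feature2']
```

With the default 0.6 cutoff, only the near-duplicate column is flagged; raising the cutoff towards 1.0 flags progressively fewer columns.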
Step 1: Load the “caret” machine learning library in R.
> library(caret)
Step 2: Load and process the backtest dataset.
For the purposes of this example, we will use the same “feature|target” backtested dataset of 1,000 records employed in LVQ and Machine Learning for Algorithmic Traders – Part 1.
> train.blogpost <- read.csv("data.csv", header=TRUE, nrows=1000)
> train.blogpost <- train.blogpost[, grep("feature", names(train.blogpost))]
Step 3: Calculate and print the correlation matrix.
> correlation.matrix <- cor(train.blogpost)
> print(correlation.matrix)
Step 4: Detect and print highly correlated features (absolute correlation > 60%).
> high.corr <- findCorrelation(correlation.matrix, cutoff=0.6)
> print(names(train.blogpost)[high.corr])
The features (parameters) printed by this process have an absolute pairwise correlation above 0.6 with at least one other feature, and should therefore be removed before any optimization is conducted.
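For completeness, the removal itself is a one-liner in Python (one of the alternative languages mentioned above) with pandas. The `to_drop` list below stands in for whatever feature names your detection step flagged; the synthetic data is purely illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic backtest features: feature2 is constructed to be nearly
# identical to feature1, so a 0.6 cutoff would flag it.
rng = np.random.default_rng(42)
base = rng.normal(size=200)
features = pd.DataFrame({
    "feature1": base,
    "feature2": base + rng.normal(scale=0.05, size=200),
    "feature3": rng.normal(size=200),
})

# Hypothetical output of the detection step: columns flagged as
# highly correlated, to be dropped before any optimization run.
to_drop = ["feature2"]
reduced = features.drop(columns=to_drop)
print(list(reduced.columns))  # ['feature1', 'feature3']
```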
Additional Resource: Measuring Investments’ Risk: Value at Risk (VIDEO)