MetaTrader Backtesting – Best Practices for Algorithmic Traders
MetaTrader backtesting can be tricky business for algorithmic traders. Follow these best practices to engineer robust, reliable trading strategies.
Simulating a strategy’s historical performance correctly increases the probability of it generalizing well to unseen market data in the future.
As such, it’s important that all backtesting be conducted robustly, with careful attention paid to any factors that could cause historical and live performance to differ markedly.
This post organizes proposed best practices into 3 main categories:
- Data Handling
- Parameter Selection
- Variable Factors
1) Data Handling
In this first section, we discuss how to treat data used in MetaTrader 4 backtesting.
Segmenting Historical Data
A common misconception in MetaTrader 4 backtesting is to associate a strategy’s robustness with how well it performs on the largest possible amount of historical data available to the trader.
A strategy that performs remarkably well on the entire dataset is at risk of being overfit (high variance).
Conversely, a strategy that has been backtested on too small a portion of historical data is likely to encounter high bias.
In both cases, the likelihood of the trading strategy generalizing well to unseen market data is poor at best.
Therefore, it is important that, at the very least, any backtesting be conducted using independent training, validation and test sets:
- Training: The data segment on which the strategy parameters are trained (in other words, the segment checked for decent performance in backtesting).
- Validation: The first unseen data segment, used to test how the parameters chosen in training perform on unseen data. Passing this phase adds confidence that the strategy is robust. Discrepancies in this phase also permit revision of parameters before finally testing on the last batch of test data.
- Testing: If a strategy passes both training and validation phases satisfactorily, running a final simulation on this last “held out” segment of data provides a further check on whether the trading strategy is robust or is prone to excess stagnation or failure in live trading.
Care should be taken, however, not to bias the outcome by modifying any parameters during the test sample phase.
If the trading strategy passed in-sample (training), validation, but not testing, it is advisable to go back to the drawing board for any modifications required, and then re-run both in-sample and validation tests again.
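The three segments described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `chronological_split` and the 60/20/20 proportions are assumptions made for the example, not values recommended by this post.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    bars: Sequence, train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[List, List, List]:
    """Split time-ordered bars into training, validation and test segments.

    The default fractions are illustrative assumptions.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n = len(bars)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    # No shuffling: bar order is preserved, so validation and test data
    # always come after the data used to choose the parameters.
    return (
        list(bars[:train_end]),
        list(bars[train_end:val_end]),
        list(bars[val_end:]),
    )


# Example with 100 placeholder bars (integers standing in for price bars).
train, validation, test = chronological_split(list(range(100)))
print(len(train), len(validation), len(test))
```

Keeping the split chronological means the final segment stays genuinely "held out" until the very last step.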
Selecting Length of Historical Data
The amount of historical data required for a backtesting exercise grows with the complexity of the trading strategy being tested.
In other words, the more complex (large number of training parameters) the strategy is, the more data required to ascertain the validity of its underlying hypothesis.
In line with the Occam’s Razor principle, when optimizing parameters to estimate the amount of data required for a backtest, it is generally best to favour simpler over more complex strategies if their output performance is not dramatically different (for better or worse).
If the amount of data available is not as large as is optimal for backtesting, traders may consider a cross-validation approach and/or walk forward optimization to make better use of smaller data samples.
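As a rough sketch of how walk-forward windows can be laid out, the snippet below generates rolling in-sample and out-of-sample index ranges. The function name `walk_forward_windows` and the window sizes are hypothetical; the optimization step itself would be run separately on each in-sample range.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (in-sample, out-of-sample) index ranges that roll forward in time."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Advance by one out-of-sample block so test segments never overlap.
        start += test_size


for in_sample, out_of_sample in walk_forward_windows(10, train_size=4, test_size=2):
    print(list(in_sample), list(out_of_sample))
```

Each out-of-sample block is used exactly once, which is how a smaller dataset can still yield several independent checks.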
Bootstrapping is another possibility, but care should be taken when sampling time series data that exhibit variable momentum over short periods of time, as it becomes considerably more difficult to preserve the dependency structure when bootstrapping such data.
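One common way to retain some short-range dependency when resampling is a moving block bootstrap, which draws contiguous blocks rather than single observations. The sketch below is an assumption about how this could be approached, not a method prescribed by this post; `block_size` is a hypothetical tuning parameter, and dependency across block boundaries is still lost.

```python
import random
from typing import List, Sequence


def moving_block_bootstrap(
    returns: Sequence[float], block_size: int, seed: int = 0
) -> List[float]:
    """Resample a return series in contiguous blocks of `block_size`."""
    n = len(returns)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(returns)")
    rng = random.Random(seed)
    sample: List[float] = []
    while len(sample) < n:
        # Pick a random starting point and copy one whole block,
        # keeping the original ordering inside the block.
        start = rng.randrange(0, n - block_size + 1)
        sample.extend(returns[start:start + block_size])
    return sample[:n]


resampled = moving_block_bootstrap([0.1, -0.2, 0.3, 0.0, 0.4, -0.1], block_size=2)
print(resampled)
```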
Backtesting Timeframes Lower Than H1 (hourly)
Trading strategies that operate on timeframes below hourly (H1) are prone to excess divergence between their simulated and live trading performance.
This behaviour is primarily due to a considerable amount of noisy data in timeframes below hourly.
If a strategy models on data over small timeframes, traders must be mindful of overfitting risk (particularly in complex strategies with a large number of parameters).
Robustness testing, such as Monte Carlo simulations over variations in strategy parameters, can quickly estimate how closely (or not) a strategy’s live trading performance could match its backtest.
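A Monte Carlo test over parameter variations might look something like the following sketch. The `toy_backtest` function is a stand-in invented for this example; in practice it would be replaced by whatever produces a performance figure for a given parameter set. The noise level and run count are likewise assumptions.

```python
import random
from statistics import mean, pstdev
from typing import Callable, Dict, List


def monte_carlo_parameters(
    backtest: Callable[[Dict[str, float]], float],
    base_params: Dict[str, float],
    rel_noise: float = 0.1,
    runs: int = 200,
    seed: int = 0,
) -> List[float]:
    """Re-run `backtest` with every parameter shifted randomly by up to +/- rel_noise."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        perturbed = {
            name: value * (1 + rng.uniform(-rel_noise, rel_noise))
            for name, value in base_params.items()
        }
        results.append(backtest(perturbed))
    return results


def toy_backtest(params: Dict[str, float]) -> float:
    """Hypothetical stand-in for a real backtest: profit peaks at ema_period == 50."""
    return 100.0 - (params["ema_period"] - 50.0) ** 2


profits = monte_carlo_parameters(toy_backtest, {"ema_period": 50.0})
print(f"mean={mean(profits):.1f} stdev={pstdev(profits):.1f} worst={min(profits):.1f}")
```

A wide spread of outcomes, or a worst case far below the mean, would suggest the strategy depends heavily on exact parameter values.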
MetaTrader 4 Historical Data – Effects of Interpolation
Data on timeframes lower than H1 (hourly) in MetaTrader 4 is progressively affected by one-minute interpolation as the timeframe decreases in size, e.g. H1 -> M30 -> M15 -> M5 -> M1.
This has implications for trading strategies that are backtested on fine-grained data: the probability of live trading results differing considerably from backtests increases with every decrease in timeframe.
Traders should therefore be extra cautious when testing strategies on low timeframes, choosing robust parameter ranges and always comparing a strategy’s performance on lower timeframes with higher timeframes and multiple datasets.
If differences between timeframes and/or multiple datasets are excessively large, e.g. a remarkably profitable strategy on M15 turning completely in the opposite direction on higher timeframes or on time series data from different sources, the probability of the strategy being overfit to one dataset and/or finely interpolated data (in the case of lower timeframes) is usually fairly high.
Re-validating MetaTrader 4 History Center Data
Data in MT4 can at times become corrupted during platform usage. This can happen, for example, as a result of discrepancies in tick data transfer or of opening charts for assets where history wasn’t already available.
Therefore, when backtesting in MetaTrader 4’s Strategy Tester whilst connected to your brokerage account, it makes sense to refresh asset data in the History Center again prior to executing new backtests.
This ensures that the data is up to date, and that any erroneous ticks or gaps resulting from market opens, interpolation discrepancies or high impact news events are re-validated prior to backtesting.
2) Parameter Selection
This section discusses some best practices in terms of risk management and strategy optimization.
Using Robust Parameter Ranges
Regardless of the number of parameters in a trading strategy, it’s important that traders choose parameter values that do not fit the underlying data too closely.
For example, it is a more robust practice to use an Exponential Moving Average (EMA) period of 50 instead of 49 or 47, if the resulting backtest performances differ only slightly and are positive in each case.
Similarly, during any genetic or walk-forward optimization, it is good practice to choose robust ranges of parameter values that typically increment in steps of 5 or 10, as opposed to 1 or 2.
Robust range selection in this manner does carry the risk of introducing high bias into the algorithm. Therefore, it is also important for traders to test how using robust parameter values affects the overall performance of the strategy or its ability to generalize well to unseen data.
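To make the idea of coarse parameter ranges concrete, here is a small sketch. Both helper names are hypothetical, and `toy_score` is an invented placeholder for a real backtest result. The second helper compares a value with its neighbours on the grid, which is one simple way to check that a chosen value sits on a plateau rather than an isolated spike.

```python
from typing import Callable, List


def coarse_grid(start: int, stop: int, step: int = 5) -> List[int]:
    """Candidate parameter values in coarse steps, e.g. 20, 25, ..., 100."""
    return list(range(start, stop + 1, step))


def neighbourhood_spread(
    score: Callable[[int], float], value: int, step: int = 5
) -> float:
    """Best minus worst score at value - step, value and value + step.

    A small spread suggests performance is stable around `value`.
    """
    scores = [score(value - step), score(value), score(value + step)]
    return max(scores) - min(scores)


def toy_score(period: int) -> float:
    """Hypothetical backtest score that degrades gently away from 50."""
    return 100.0 - abs(period - 50)


print(coarse_grid(20, 60, step=10))
print(neighbourhood_spread(toy_score, 50))
```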
Estimating Impact of Variable Spread & Slippage
At the time of writing (October 16, 2017) MetaTrader 4’s Strategy Tester does not yet ship with the ability to backtest with variable slippage simulation.
Traders can however run slippage simulations on their backtest results in a few ways, three of which are:
- By importing backtest results into a spreadsheet application (e.g. MS Excel), modifying Open/Close prices of trades with randomly generated values within a sensible range (e.g. between 0.1 and 2.0 or 5.0) and recalculating P/L.
- By importing backtest results into a statistical computing environment such as R, GNU Octave or MATLAB and doing the same as in the first point above.
- Using 3rd-party commercial tools.
The same practice can be employed with variable spread simulation, as MetaTrader 4 currently ships with fixed spread testing available in the Strategy Tester.
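The spreadsheet-style procedure described above could also be scripted. The following Python sketch makes several assumptions that are not stated in this post: the slippage range is interpreted as pips, slippage is always applied against the trader (a conservative choice), a pip is 0.0001, and P/L is measured in pips. The `Trade` structure is hypothetical and would need to be mapped to the columns of an exported backtest report.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trade:
    direction: int      # +1 for a buy, -1 for a sell
    open_price: float
    close_price: float


def pnl_pips(trade: Trade, pip_size: float = 0.0001) -> float:
    """Profit or loss of one trade, in pips."""
    return trade.direction * (trade.close_price - trade.open_price) / pip_size


def apply_random_slippage(
    trades: List[Trade],
    min_pips: float = 0.1,
    max_pips: float = 2.0,
    pip_size: float = 0.0001,
    seed: int = 0,
) -> List[Trade]:
    """Return copies of `trades` with adverse slippage on both entry and exit."""
    rng = random.Random(seed)
    slipped = []
    for t in trades:
        entry_slip = rng.uniform(min_pips, max_pips) * pip_size
        exit_slip = rng.uniform(min_pips, max_pips) * pip_size
        slipped.append(
            Trade(
                direction=t.direction,
                # A buy fills higher on entry and lower on exit; a sell the reverse.
                open_price=t.open_price + t.direction * entry_slip,
                close_price=t.close_price - t.direction * exit_slip,
            )
        )
    return slipped


trades = [Trade(1, 1.1000, 1.1050), Trade(-1, 1.2000, 1.1980)]
baseline = sum(pnl_pips(t) for t in trades)
stressed = sum(pnl_pips(t) for t in apply_random_slippage(trades))
print(f"baseline={baseline:.1f} pips, with slippage={stressed:.1f} pips")
```

Running the simulation with many different seeds gives a distribution of outcomes rather than a single figure. A variable spread could be approximated in the same way by treating any widening beyond the fixed test spread as an additional adverse cost.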
Maintaining Stable Underlying Strategy VaR
Making use of dynamic stop loss and take profit targets, proportionate to available account equity, allows for realistic backtest outcomes, and aids risk stability (especially if the trader intends on creating a DARWIN).
In trading strategies that increase or decrease position sizes by available equity, it is particularly important that minimum/maximum lot sizes be restricted to sensible ranges for the following reasons:
- A lot size dropping below the minimum permissible lot size would result in trade rejection.
- Lot sizes scaling too large in single trades (e.g. to 100 lots or above) would introduce capacity constraints during live trading that could possibly require fractional position sizing to address, depending on factors such as time of day, strategy capacity and available liquidity.
- Such constraints would not be visible in a MetaTrader backtest, and without being factored in during strategy development, possibly create the illusion of an extremely profitable strategy with no liquidity considerations holding it back.
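A simple way to enforce such limits is to clamp the computed position size before an order is sent. The sketch below is illustrative only: the function name, the risk model (a fixed fraction of equity lost if the stop loss is hit) and the default limits of 0.01 and 10 lots are all assumptions. Actual limits depend on the broker and instrument.

```python
import math
from typing import Optional


def size_position(
    equity: float,
    risk_fraction: float,
    loss_per_lot: float,
    min_lot: float = 0.01,
    max_lot: float = 10.0,
    lot_step: float = 0.01,
) -> Optional[float]:
    """Lots that risk `risk_fraction` of equity, kept within assumed limits.

    `loss_per_lot` is the money lost per lot if the stop loss is hit.
    Returns None when the size falls below `min_lot`, since such an order
    would be rejected.
    """
    raw_lots = equity * risk_fraction / loss_per_lot
    # Round down to a whole number of lot steps so the risk is not exceeded.
    steps = math.floor(raw_lots / lot_step + 1e-9)
    lots = steps * lot_step
    if lots < min_lot:
        return None
    # Cap at the maximum and tidy up floating-point noise.
    return round(min(lots, max_lot), 8)


print(size_position(10_000, 0.01, 100))      # mid-sized account
print(size_position(100, 0.01, 1_000))       # too small to trade
print(size_position(10_000_000, 0.01, 100))  # capped at max_lot
```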
3) Variable Factors
In this final section, we discuss some considerations that trading strategies can benefit from during the design and post-backtest phases.
Correlation of Strategy Returns to Market Volatility
For a detailed discussion on this, please watch the webinar recording on Effects of Market Volatility on Trader Performance.
Testing for Market Correlation
Measuring the correlation between a trading strategy’s returns and those of its underlying assets reveals how dependent the strategy is on those assets.
A strategy with a low dependency on the assets it trades is less likely to demonstrate unexpected behaviours during periods of decline or sudden turbulence in underlying assets, as opposed to one that relies heavily on underlying asset returns.
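As a minimal sketch, the dependency can be estimated with the Pearson correlation coefficient between two return series sampled over the same periods. The function name and the sample figures below are made up for illustration, and this is not necessarily how any particular platform computes its own correlation score.

```python
from math import sqrt
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long return series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up daily returns for a strategy and the asset it trades.
strategy_returns = [0.004, -0.002, 0.001, 0.003, -0.001]
asset_returns = [0.010, -0.008, 0.002, -0.004, 0.006]
print(round(pearson_correlation(strategy_returns, asset_returns), 3))
```

Values close to +1 or -1 indicate heavy dependence on the underlying asset, while values near 0 indicate little linear dependence.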
To analyze a strategy for Market Correlation as well as 11 other investment attributes, traders are encouraged to upload backtests to the Darwinex platform, review scores and visualize the evolution of their strategies for each attribute.
Optimizing Position Sizes for Capacity
One of the ways in which a strategy’s overall investment capacity can be improved is by implementing fractional order sizing as opposed to fixed, singular lot sizing.
For example, sending 10 orders of 1 lot each is a more scalable approach than sending 1 order of 10 lots.
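The splitting logic itself is straightforward, as the sketch below suggests. The function name, the 1-lot child size and the 0.01 lot step are assumptions for the example. How the child orders are actually timed and routed is outside the scope of this snippet.

```python
from typing import List


def split_order(
    total_lots: float, max_child_lots: float = 1.0, lot_step: float = 0.01
) -> List[float]:
    """Break one parent order into child orders no larger than `max_child_lots`."""
    if total_lots <= 0 or lot_step <= 0:
        raise ValueError("total_lots and lot_step must be positive")
    # Work in whole lot steps to avoid floating-point drift.
    remaining = round(total_lots / lot_step)
    child = round(max_child_lots / lot_step)
    if child < 1:
        raise ValueError("max_child_lots must be at least one lot step")
    sizes = []
    while remaining > 0:
        chunk = min(child, remaining)
        sizes.append(round(chunk * lot_step, 8))
        remaining -= chunk
    return sizes


print(split_order(10))
print(split_order(2.5))
```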
For a detailed discussion on scalability, please watch our webinar recording on Scalability (now titled Capacity on the DARWIN Exchange).
Trading High Impact News
Strategies that trade during high impact news events are impacted negatively by sudden swings in variable slippage and spreads at the time of the event.
One way to reduce (if not eliminate) such negative impact is by having the strategy disable trading at pre-defined times or intervals corresponding to such events.
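A time-based filter of this kind can be sketched as follows. The function name, the 30-minute windows and the example event time are assumptions. The list of event times would need to come from an economic calendar of the trader’s choosing.

```python
from datetime import datetime, timedelta
from typing import Iterable


def trading_allowed(
    now: datetime,
    news_times: Iterable[datetime],
    before: timedelta = timedelta(minutes=30),
    after: timedelta = timedelta(minutes=30),
) -> bool:
    """Return False when `now` falls inside a blackout window around any event."""
    return not any(
        event - before <= now <= event + after for event in news_times
    )


# Hypothetical high impact release at 12:30 on 6 October 2017.
events = [datetime(2017, 10, 6, 12, 30)]
print(trading_allowed(datetime(2017, 10, 6, 12, 15), events))
print(trading_allowed(datetime(2017, 10, 6, 14, 0), events))
```

All timestamps are assumed to be in the same time zone, so broker server time and calendar time would need to be aligned before using such a check.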
[Webinar Recording] DO’s and DONT’s of MT4 Backtesting