MetaTrader Backtesting - Best Practices for Algorithmic Traders

MetaTrader backtesting can be tricky business for algorithmic traders. Follow these best practices to engineer robust, reliable trading strategies.

Watch the full video tutorial here (36 minutes).

Simulating a strategy’s historical performance correctly increases the probability that it will generalize well to unseen market data in the future.
It is therefore important to conduct all backtesting robustly, paying careful attention to any factor that could cause historical and live performance to differ markedly.
This post organizes the proposed best practices into three main categories:

Data Handling
Parameter Selection
Variable Factors

1) Data Handling

In this first section, we discuss how to treat data used in MetaTrader 4 backtesting.

Segmenting Historical Data

A common misconception in MetaTrader 4 backtesting is that a strategy’s robustness can be judged by how well it performs on the largest possible amount of historical data available to the trader.
A strategy that performs remarkably well on the entire dataset is at risk of being overfit (high variance).
Conversely, a strategy that has been backtested on too small a portion of historical data is likely to encounter high bias.
In both cases, the likelihood of the trading strategy generalizing well to unseen market data is poor at best.
Therefore, at the very least, any backtesting should be conducted using independent training, validation and test sets:

Training: The data segment on which the strategy parameters are fitted (in other words, where the trader checks for decent backtest performance).
Validation: The first unseen data segment, used to check how the parameters chosen in training perform on data they were not fitted to. Passing this phase adds confidence that the strategy is robust. Discrepancies found in this phase also permit revision of parameters before the final run on the test data.
Testing: If a strategy passes both the training and validation phases satisfactorily, a final simulation on this last “held out” segment of data shows whether the strategy remains robust or is prone to excess stagnation or failure in live trading.

Care should be taken, however, not to bias the outcome by modifying any parameters during the test phase.
If the trading strategy passes training (in-sample) and validation but not testing, it is advisable to go back to the drawing board, make any modifications required, and then re-run both the in-sample and validation tests.
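The three segments above can be sketched in a few lines. This is a minimal illustration in Python rather than MQL4, and the 60/20/20 proportions and the function name `chronological_split` are assumptions made for the example, not recommendations from this post. The key point is that the bars stay in time order, so validation and test data always lie after the data the parameters were fitted on.

```python
from typing import List, Sequence, Tuple


def chronological_split(
    bars: Sequence, train_frac: float = 0.6, val_frac: float = 0.2
) -> Tuple[List, List, List]:
    """Split time-ordered bars into training, validation and test segments.

    The default 60/20/20 proportions are illustrative assumptions.
    Bars are never shuffled, so each segment follows the previous one in time.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    n = len(bars)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return list(bars[:train_end]), list(bars[train_end:val_end]), list(bars[val_end:])


# Example with 1,000 placeholder bars (integers stand in for OHLC records).
train, val, test = chronological_split(list(range(1000)))
print(len(train), len(val), len(test))
```

In practice each segment would be exported or selected as a date range in the Strategy Tester; the test segment is only run once the parameters are final.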

Selecting Length of Historical Data

The amount of historical data required for a backtesting exercise grows with the complexity of the trading strategy being tested.
In other words, the more complex the strategy (the larger its number of trainable parameters), the more data is required to ascertain the validity of its underlying hypothesis.

In line with Occam’s Razor, when optimizing parameters and estimating the amount of data required for a backtest, it is generally best to favour simpler strategies over more complex ones if their performance is not dramatically different (for better or worse).

If less data is available than would be optimal for backtesting, traders may consider a cross-validation approach and/or walk-forward optimization to make better use of smaller data samples.
Bootstrapping is another possibility, but care should be taken when sampling time series data that exhibit variable momentum over short periods of time, as it is considerably harder to preserve the dependency structure when bootstrapping such data.
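As a rough sketch of the walk-forward idea, the snippet below generates rolling in-sample and out-of-sample index windows. It is written in Python for illustration; the function name `walk_forward_windows` and the window sizes are assumptions chosen for the example.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (in-sample, out-of-sample) index ranges that roll forward in time."""
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_size + test_size <= n_bars:
        in_sample = range(start, start + train_size)
        out_of_sample = range(start + train_size, start + train_size + test_size)
        yield in_sample, out_of_sample
        start += test_size  # roll forward by one out-of-sample block


windows = list(walk_forward_windows(n_bars=1000, train_size=400, test_size=200))
for in_sample, out_of_sample in windows:
    print(f"optimize on bars {in_sample.start}-{in_sample.stop - 1}, "
          f"verify on bars {out_of_sample.start}-{out_of_sample.stop - 1}")
```

Each out-of-sample block is used exactly once, so a short history still yields several independent checks of the optimized parameters.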

Backtesting Timeframes Lower Than H1 (hourly)

Trading strategies that operate on timeframes below hourly (H1) are prone to excess divergence between their simulated and live trading performance.

This behaviour is primarily due to the considerable amount of noise in data on timeframes below hourly.
If a strategy models data on small timeframes, traders must be mindful of overfitting risk (particularly in complex strategies with a large number of parameters).
Robustness testing, such as Monte Carlo simulation over variations in strategy parameters, can quickly estimate how closely (or not) live trading performance is likely to track the backtest.
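One way to picture such a Monte Carlo test is to re-run the backtest many times with every parameter randomly nudged around its chosen value and then look at the spread of outcomes. The Python sketch below assumes a `backtest` callable that maps a parameter dictionary to a profit figure; `toy_backtest` is a hypothetical stand-in, not a real strategy, and the 20% perturbation range is likewise an assumption.

```python
import random
from typing import Callable, Dict, List


def perturbation_results(
    backtest: Callable[[Dict[str, float]], float],
    base_params: Dict[str, float],
    max_shift: float = 0.2,
    runs: int = 500,
    seed: int = 42,
) -> List[float]:
    """Re-run `backtest` with every parameter randomly shifted by up to +/- max_shift."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        trial = {
            name: value * (1 + rng.uniform(-max_shift, max_shift))
            for name, value in base_params.items()
        }
        results.append(backtest(trial))
    return results


def toy_backtest(params: Dict[str, float]) -> float:
    """Hypothetical stand-in: profit peaks at an EMA period of 50 and decays around it."""
    return 1000.0 - 2.0 * (params["ema_period"] - 50.0) ** 2


outcomes = sorted(perturbation_results(toy_backtest, {"ema_period": 50.0}))
share_profitable = sum(o > 0 for o in outcomes) / len(outcomes)
print(f"worst: {outcomes[0]:.0f}, median: {outcomes[len(outcomes) // 2]:.0f}, "
      f"profitable runs: {share_profitable:.0%}")
```

A strategy whose results collapse under small perturbations is more likely to diverge from its backtest in live trading than one whose outcomes stay within a narrow band.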

MetaTrader 4 Historical Data – Effects of Interpolation

Data on timeframes lower than H1 (hourly) in MetaTrader 4 is progressively affected by one-minute interpolation as the timeframe decreases in size, e.g. H1 -> M30 -> M15 -> M5 -> M1.
This has implications for trading strategies that are backtested on fine-grained data: the probability of live trading results differing considerably from backtests increases with every decrease in timeframe.
Traders should therefore be extra cautious when testing strategies on low timeframes, choosing robust parameter ranges and always comparing a strategy’s performance on lower timeframes with higher timeframes and multiple datasets.
If differences between timeframes and/or datasets are excessively large, e.g. a remarkably profitable strategy on M15 performing in completely the opposite direction on higher timeframes or on time series data from different sources, the probability that the strategy is overfit to one dataset and/or to finely interpolated data (in the case of lower timeframes) is usually fairly high.

Re-validating MetaTrader 4 History Center Data

Data in MT4 can at times become corrupted during platform usage. This can happen as a result of discrepancies in tick data transfer or of opening charts for assets where history wasn’t already available, to name just two causes.
Therefore, when backtesting in MetaTrader 4’s Strategy Tester whilst connected to your brokerage account, it makes sense to refresh asset data in the History Center again prior to executing new backtests.
This ensures that the data is up to date, and that any erroneous ticks or gaps resulting from market opens, interpolation discrepancies or high impact news events are re-validated prior to backtesting.

2) Parameter Selection

This section discusses some best practices in terms of risk management and strategy optimization.

Using Robust Parameter Ranges

Regardless of the number of parameters in a trading strategy, it’s important that traders choose parameter values that do not fit the underlying data too closely.
For example, it is more robust to use an Exponential Moving Average (EMA) period of 50 rather than 49 or 47 if the resulting backtest performances differ only slightly and are positive in each case.
Similarly, during any genetic or walk-forward optimization, it is good practice to choose robust ranges of parameter values that typically increment in steps of 5 or 10, as opposed to 1 or 2.
Robust range selection in this manner does carry the risk of introducing high bias into the algorithm. Therefore, it is also important for traders to test how using robust parameter values impacts the overall performance of the strategy and its ability to generalize well to unseen data.
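To give a feel for the difference that step size makes, the short Python sketch below counts the candidate combinations for a hypothetical two-parameter strategy (a fast and a slow EMA period). The ranges themselves are assumptions picked for the example.

```python
from itertools import product

# Hypothetical two-parameter strategy: a fast and a slow EMA period.
fine_grid = list(product(range(5, 51, 1), range(50, 201, 1)))     # steps of 1
coarse_grid = list(product(range(5, 51, 5), range(50, 201, 10)))  # steps of 5 and 10

print(len(fine_grid), "combinations with steps of 1")
print(len(coarse_grid), "combinations with steps of 5 and 10")
```

The coarse grid leaves far fewer candidates for an optimizer to choose from, which reduces the opportunity to pick a value that merely fits noise in one dataset.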

Estimating Impact of Variable Spread & Slippage

At the time of writing (October 16, 2017), MetaTrader 4’s Strategy Tester does not yet ship with variable slippage simulation.
Traders can, however, run slippage simulations on their backtest results in a few ways, three of which are:

By importing backtest results into a spreadsheet application (e.g. MS Excel), adjusting the Open/Close prices of trades by randomly generated values within a sensible range (e.g. between 0.1 and 2.0 or 5.0) and recalculating P/L.
By importing backtest results into a statistical computing environment such as R, GNU Octave or MATLAB and doing the same as in the first option above.
By using 3rd-party commercial tools.

The same practice can be applied to variable spread simulation, as MetaTrader 4’s Strategy Tester currently ships with fixed spread testing only.
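The recalculation described in the first two options might look like the following Python sketch. Several things here are assumptions for the sake of the example: the slippage range is interpreted as pips, the pip size and pip value are typical of a EURUSD-style quote on a standard lot, slippage is always adverse, and the `Trade` record and function name are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trade:
    direction: int      # +1 for a buy, -1 for a sell
    open_price: float
    close_price: float
    lots: float


def pnl_with_slippage(
    trades: List[Trade],
    min_slip_pips: float = 0.1,
    max_slip_pips: float = 2.0,
    pip_size: float = 0.0001,         # assumed 4-decimal pip
    pip_value_per_lot: float = 10.0,  # assumed account-currency value of 1 pip per lot
    seed: int = 1,
) -> float:
    """Recalculate total P/L with adverse random slippage on every entry and exit."""
    rng = random.Random(seed)
    total = 0.0
    for t in trades:
        entry_slip = rng.uniform(min_slip_pips, max_slip_pips) * pip_size
        exit_slip = rng.uniform(min_slip_pips, max_slip_pips) * pip_size
        # Adverse slippage: buys fill higher and exit lower, sells the reverse.
        open_price = t.open_price + t.direction * entry_slip
        close_price = t.close_price - t.direction * exit_slip
        pips = t.direction * (close_price - open_price) / pip_size
        total += pips * pip_value_per_lot * t.lots
    return total


trades = [Trade(+1, 1.1000, 1.1020, 1.0), Trade(-1, 1.1050, 1.1040, 0.5)]
print(f"P/L without slippage: {pnl_with_slippage(trades, 0.0, 0.0):.2f}")
print(f"P/L with slippage:    {pnl_with_slippage(trades):.2f}")
```

Running the function with many different seeds gives a distribution of outcomes rather than a single figure. The same structure can be reused for variable spread by widening the entry and exit adjustments.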

Maintaining Stable Underlying Strategy VaR

Making use of dynamic stop loss and take profit targets, proportionate to available account equity, allows for realistic backtest outcomes and aids risk stability (especially if the trader intends to create a DARWIN).

In trading strategies that increase or decrease position sizes by available equity, it is particularly important that minimum/maximum lot sizes be restricted to sensible ranges for the following reasons:

A lot size dropping below the minimum permissible lot size would result in trade rejection.
Lot sizes scaling too large in single trades (e.g. to 100 lots or above) would introduce capacity constraints during live trading that could possibly require fractional position sizing to address, depending on factors such as time of day, strategy capacity and available liquidity.
Such constraints would not be visible in a MetaTrader backtest and, unless factored in during strategy development, could create the illusion of an extremely profitable strategy with no liquidity considerations holding it back.
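A minimal sketch of equity-based sizing with lot limits is shown below, in Python for illustration. The pip value, the minimum lot, the lot step and especially the 50-lot cap are assumptions; real limits depend on the broker and on the liquidity available to the strategy.

```python
import math


def position_size(
    equity: float,
    risk_fraction: float,
    stop_loss_pips: float,
    pip_value_per_lot: float = 10.0,  # assumed value of 1 pip per lot
    min_lot: float = 0.01,            # assumed broker minimum
    max_lot: float = 50.0,            # assumed capacity cap chosen by the trader
    lot_step: float = 0.01,           # assumed broker lot step
) -> float:
    """Size a trade so a stop-out loses roughly `risk_fraction` of equity, within lot limits."""
    if stop_loss_pips <= 0:
        raise ValueError("stop_loss_pips must be positive")
    raw_lots = (equity * risk_fraction) / (stop_loss_pips * pip_value_per_lot)
    steps = math.floor(raw_lots / lot_step + 1e-9)  # round down to a whole lot step
    lots = steps * lot_step
    lots = min(max(lots, min_lot), max_lot)         # keep within permissible range
    return round(lots, 8)


print(position_size(10_000, 0.01, 50))      # mid-sized account
print(position_size(100, 0.01, 50))         # raised to the minimum lot
print(position_size(50_000_000, 0.01, 50))  # capped at the maximum lot
```

Note that raising a tiny position to the minimum lot means risking more than the target fraction; skipping the trade instead is an equally valid design choice.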

3) Variable Factors

In this final section, we discuss some considerations that trading strategies can benefit from during the design and post-backtest phases.

Correlation of Strategy Returns to Market Volatility

For a detailed discussion on this, please watch the webinar recording on Effects of Market Volatility on Trader Performance here.

Testing for Market Correlation

Correlating a trading strategy’s returns with those of its underlying assets reveals how dependent the strategy is on them.
A strategy with a low dependency on the assets it trades is less likely to demonstrate unexpected behaviours during periods of decline or sudden turbulence in underlying assets, as opposed to one that relies heavily on underlying asset returns.
To analyze a strategy for Market Correlation as well as 11 other investment attributes, traders are encouraged to upload backtests to the Darwinex platform, review scores and visualize the evolution of their strategies for each attribute.
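For a quick check outside any platform, the Pearson correlation between strategy and asset returns over the same periods can be computed directly. The Python sketch below uses made-up return series purely as placeholders, and it is not a description of how Darwinex calculates its Market Correlation attribute.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Placeholder daily returns in percent (invented for illustration only).
strategy_returns = [0.4, -0.1, 0.3, 0.2, -0.2, 0.1]
asset_returns = [1.0, -0.8, 0.5, -0.3, 0.9, -0.6]
print(f"correlation: {pearson_correlation(strategy_returns, asset_returns):+.2f}")
```

Values close to +1 or -1 suggest the strategy’s results largely mirror the asset itself, while values near 0 suggest a lower dependency.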

Optimizing Position Sizes for Capacity

One way to improve a strategy’s overall investment capacity is to implement fractional order sizing as opposed to sending a single order of fixed size.
For example, sending 10 orders of 1 lot each is a more scalable approach than sending 1 order of 10 lots.
For a detailed discussion on scalability, please watch our webinar recording on Scalability (now titled Capacity on the DARWIN Exchange).
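The 10 x 1 lot example can be generalized with a small helper that breaks any order into child orders. This Python sketch is illustrative only; the function name `split_order`, the 1-lot child size and the 0.01 lot step are assumptions.

```python
from typing import List


def split_order(
    total_lots: float, max_child_lots: float = 1.0, lot_step: float = 0.01
) -> List[float]:
    """Break one large order into child orders no larger than `max_child_lots`."""
    if total_lots <= 0 or max_child_lots <= 0 or lot_step <= 0:
        raise ValueError("lot sizes must be positive")
    # Work in whole lot steps to avoid floating-point drift.
    remaining = round(total_lots / lot_step)
    child_steps = round(max_child_lots / lot_step)
    if child_steps < 1:
        raise ValueError("max_child_lots must be at least one lot step")
    children = []
    while remaining > 0:
        size = min(child_steps, remaining)
        children.append(round(size * lot_step, 8))
        remaining -= size
    return children


print(split_order(10.0))  # ten child orders of 1 lot each
print(split_order(2.5))   # two full child orders plus a remainder
```

Each child order would then be sent separately, which lets a live execution venue fill the position against available liquidity piece by piece.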

Trading High Impact News

Strategies that trade during high impact news events are impacted negatively by sudden swings in variable slippage and spreads at the time of the event.
One way to reduce (if not eliminate) such negative impact is by having the strategy disable trading at pre-defined times or intervals corresponding to such events.
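A blackout rule of this kind can be expressed as a simple time-window check. The Python sketch below assumes a list of scheduled event times is already available (the date used is a hypothetical placeholder) and that 30 minutes either side of an event is an acceptable window; both are assumptions, and a live MT4 strategy would implement the same check in MQL4.

```python
from datetime import datetime, timedelta
from typing import Iterable


def trading_allowed(
    now: datetime,
    news_events: Iterable[datetime],
    before: timedelta = timedelta(minutes=30),
    after: timedelta = timedelta(minutes=30),
) -> bool:
    """Return False while `now` falls inside the blackout window of any news event."""
    return not any(event - before <= now <= event + after for event in news_events)


# Hypothetical high impact release time, used only for illustration.
events = [datetime(2017, 10, 6, 12, 30)]

print(trading_allowed(datetime(2017, 10, 6, 11, 0), events))   # well before the event
print(trading_allowed(datetime(2017, 10, 6, 12, 45), events))  # inside the blackout
```

The same function can be applied to historical trades to filter out entries near past events and see how much of the backtest’s profit depended on them.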

[Webinar Recording] DO’s and DONT’s of MT4 Backtesting


2 Comments

KlondikeFX

Posted October 17, 2017 at 10:31 am

Good stuff. Just a couple of additions off the top of my head though:
Re 2) Parameter Selection:
Also, keep an eye on the distribution of profitable parameters. If the strategy is highly profitable with an EMA Period of 50 but unprofitable with 55, 45, etc. it’s likely just an anomaly and you should be very cautious going forward.
As important as using solid parameter ranges is selecting a reasonable number of parameters (degrees of freedom). The higher the degrees of freedom, the higher the likelihood of overfitting. Keep a variant of Occam’s Razor in mind for the parameter count as well: “Everything should be kept as simple as possible, but no simpler.”
Another (pretty obvious) point is the sample size of trades. If you have a strategy with a small number of trades the relevance of the backtest is significantly lower than of a strategy with a lot of trades – given you are also considering the other factors of 1) Data-Handling posted above.

- Post Author
  
  The Market Bull
  
  Posted October 17, 2017 at 11:07 am
  
  Excellent feedback KlondikeFX!
  Thank you so much for your additions and interest – they will be added to the blog shortly referencing you as the source 🙂
  
