
Market Regimes | Advanced Trading Techniques To Categorize Them

Why is it important to be able to categorize Market Regimes?

And what benefits can we hope to see from doing so?

Price action exhibits different characteristics in different market regimes.

Being able to classify the current market regime allows us to adjust our trading rules to best suit the regime we’re in.

This allows us to keep our trading edge regardless of whether the market is trending or ranging.

It also allows us to filter our trades based on current market volatility, another important consideration when optimizing your trading rules for each market regime.

If we use identical trading rules in each regime, we risk our algorithm losing its valuable edge, with the probabilities moving against us.
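As a rough illustration, here is a minimal sketch in R of one possible regime tagger. It assumes a data frame “prices” with a “close” column; the window lengths and thresholds are purely illustrative, not recommendations:

library(zoo)

# Log returns of the close series
returns <- diff(log(prices$close))

# Rolling 100-bar volatility, split into LOW/HIGH by its median
vol <- rollapply(returns, width = 100, FUN = sd, fill = NA, align = "right")
vol.regime <- ifelse(vol > median(vol, na.rm = TRUE), "HIGH_VOL", "LOW_VOL")

# Crude trend filter: distance of price from a 200-bar moving average
sma <- rollmean(prices$close[-1], k = 200, fill = NA, align = "right")
trend.regime <- ifelse(abs(prices$close[-1] - sma) / sma > 0.01, "TRENDING", "RANGING")

regimes <- data.frame(vol.regime, trend.regime)
tail(regimes)

Trading rules could then be switched, or trades filtered, according to the most recent row’s tags.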

However, this technique does require careful consideration of statistical significance to avoid over-fitting, which is also explained in the video.

 

Brought to you by Darwinex: UK FCA Regulated Broker, Asset Manager & Trader Exchange where Traders can legally attract Investor Capital and charge Performance Fees:

Risk disclosure:
https://www.darwinex.com/legal/risk-disclaimer


Content Disclaimer: The contents of this video (and all other videos by the presenter) are for educational purposes only, and are not to be construed as financial and/or investment advice.


DARWIN API: What’s Been & What’s To Come (2019)

Earlier this year marked a significant milestone in Darwinex’s evolution… the Beta-state launch of the DARWIN API.

This included the following sub APIs:

  1. DARWIN Info API (to access Quote and Attributes data)
  2. DARWIN Quotes API (to stream Quotes from active DARWINs in real-time via REST)
  3. Quote Websocket API (to stream Quotes from active DARWINs in real-time via Web Sockets)
  4. DARWIN Trading API (to trade DARWINs via REST as you would via the platform)
  5. Investor Accounts Info API (to retrieve account and portfolio performance details, e.g. equity, position data etc)

All 5 sub APIs have now been rolled out to everyone.


Get Access To The DARWIN API


The entire suite of APIs was covered in great detail in a dedicated video tutorial series on the Darwinex YouTube channel! If you haven’t watched it yet, here’s the link to bookmark:


What did we achieve?

The API’s launch enabled, for the very first time, programmatic access to the Darwinex Community dataset.

It enabled anyone and everyone to analyse and trade trader talent algorithmically, build custom indicators, automated trading robots, analysis tools and even full-fledged DARWIN Trading Terminals from scratch, to name a few things.

Algorithmic and discretionary/manual traders alike, quants, data scientists and practitioners across the board could now access a trader behaviour-powered, multi-variate financial time series that offers a richer feature-space than OHLCV (Open, High, Low, Close, Volume) price data found in traditional asset classes.


Why does that matter?

..because information is power.

The more informed your investments, the better your odds of survival.

Means of addressing several evergreen trading challenges became a reality.

The API exposed endpoints that enabled anyone to create their own custom DARWIN filters and indexes, both to find new investments and to inform existing ones.

For instance:

  1. Would you work with just session-sensitive over-the-counter tick volume or would the time-weighted order frequency of high performance DARWINs offer better insights into potential mispricing events? Watch this video for more information.
  2. Would your volume-spread strategy be served better by saturated “smart-money” assumptions about volume/price differentials or the direction performance DARWINs took when those differentials took place? Watch this video for insight.
  3. What seems more reliable… the Quote evolution of a DARWIN that’s successfully traded volatility after transaction costs, in continuously changing market conditions, navigating news, black swans, market sweeps and more, for 5+ years with a Darwinex-verified track record? …or the hyperbolic backtest of a volatility strategy with invariant market conditions?
  4. Would it make sense to apply technical analysis to trader behaviour?
  5. How do good traders react to major economic news releases?
  6. Would it make more sense to set a BUY STOP order with a DARWIN that’s consistently traded the Non-Farm Payroll successfully, or a straddle of BUY/SELL STOP orders around the EUR/USD as a more educated gamble than 50/50?
  7. What does a portfolio composed of intraday, swing or night-scalper DARWINs look like?
  8. What is the correlation of your strategy’s returns with those of the Darwinex Community?
  9. …the list goes on and on.

Depth of available data

With historical end-of-day data available for all 12 DARWIN investment attributes, Quote data available in multiple timeframes down to tick level, and another 200+ diagnostic attributes available via FTP to complement data available via the DARWIN API, trading strategy and DARWIN portfolio R&D scopes increased x-fold.

API users are now empowered to build proprietary solutions with the DARWIN asset class… be they filters, portfolios, indicators, platform features.. the possibilities span as far as your imagination can take them.

Here’s an example that demonstrates such development:

To support users in this quest, Darwinex Labs will continue to publish detailed video tutorials and source code on a weekly basis, as well as API wrappers via GitHub, all open source.

And as the API matures further over time, available features will also see an increase!

As always, we’ll publish all beta and release candidate features on the Darwinex Community Forum, where we’ll also rely heavily on your valuable feedback and experience over time.


Related links

More information and access to the APIs

API Walkthrough

Darwinex API Store

API T&C

Darwinex Collective Slack Workspace for Algorithmic R&D


Raw DARWIN Data via FTP

We’ve started offering, completely free of charge, raw DARWIN data via FTP. Our aim is to make these data exploitable algorithmically.

The available data are divided into two groups:

  1. DARWIN quotes with a resolution of seconds.
  2. Time series and statistics which support the data and graphs shown on the DARWIN page.

We offer these data independently from the DARWIN API for the following reasons:

  • It’s a large volume of data.
  • The data are offered raw.
  • The data are purely historical.

Request your FTP access here.
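Once access has been granted, pulling a file down is a one-liner in most languages. Here is a minimal sketch in R with placeholder credentials, host and file path; substitute the details from your own FTP access confirmation:

# Placeholders only -- not real endpoints or filenames
url <- "ftp://USERNAME:PASSWORD@HOST/path/to/quotes.csv"
download.file(url, destfile = "quotes.csv", mode = "wb")

quotes <- read.csv("quotes.csv")
head(quotes)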


 

MetaTrader EAs: The “Set & Forget” Myth [EAS-II]

This is the second post in the MetaTrader Expert Advisor [EAS] series we’ve begun recently.

In case you missed it, here’s a link to the first post:
Commercial Expert Advisors: everything that glitters is not gold.


If you’re wondering what “set & forget” means, it’s a catchphrase used widely by many commercial MetaTrader EA vendors.

It implies that the prospective user of the EA advertised as “set & forget” need do nothing more than purchase, install and run said EA (usually accompanied by a notice to follow all rules, instructions etc as supplied by the vendor).

In this post, we’ll discuss some popular misconceptions centered around this “set & forget” EA-marketing phenomenon.

We’ll present a set of realities that attempt to dispel the notion, realities that are usually not presented by most EA vendors to retail forex traders.

By the end of this blog post, you’ll have developed a keen understanding of what to look for in an MT4 Expert Advisor, why EA vendors are much better off listing DARWINs instead, and why “set & forget” is more “marketing sizzle” than reality.

.. so let’s begin 🙂


“Set & Forget”: Myth vs. Reality

 

Almost all commercial MetaTrader EA websites will promote:

  1. The EA’s performance over historical and live data,
  2. Target asset mix, risk management and execution,
  3. Testimonials of other EA users validating the information above.

 

For the remainder of this post today, we’ll focus on the first point.


Backtests & Live Forward Tests

Profitable backtests showing a MetaTrader EA’s performance over many years do add some credibility to a vendor’s effort.

Backtesting: Historical vs Live Performance

However, as easy to use as the MetaTrader platform is, any backtest conducted in the MetaTrader 4 Strategy Tester cannot take the following into consideration:

 

  • Variable slippage & spread
  • Valid, temporally accurate swap charges (especially for trading strategies employing high leverage and/or those trading a large number of currency pairs over several days)
  • Variable market impact due to out-of-sample political and/or economic developments
  • Event-driven changes in a broker’s margin requirements
  • Execution latency, e.g. due to VPS co-location or infrastructural constraints.
  • Technical issues & human error, e.g. operating system or other malfunction (e.g. failed VPS or computer hardware), failed credit card payments to VPS services, etc.
  • Unforeseen, sudden market movements, e.g. the GBP Flash Crash.
  • COMPLETELY different market conditions to those experienced during the backtest (e.g. what would it have looked like had conditions been entirely different? what is the probability of a particular backtest being “lucky”?)

 

Let’s discuss each of these in turn.


Variable Slippage & Spread

At the time of writing, MetaTrader’s Strategy Tester only allows EA developers to apply a single fixed spread across an entire backtest.

Using additional third-party tools, it is however possible to simulate randomized spread and slippage across an entire backtest.

At bare minimum, unless a vendor has at least made the effort to simulate the effects of these two variables on historical returns, the likelihood of the EA performing similarly to its backtest, had it actually been live during that time, is low.
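To illustrate what such a simulation might look like, here is a minimal sketch in R. It assumes a vector “trade.pips” of per-trade backtest results in pips; the spread and slippage distributions are invented for illustration, and in practice should be estimated from live tick data:

set.seed(42)
n.sims <- 1000

simulate.costs <- function(pips) {
  spread   <- runif(length(pips), min = 0.5, max = 2.5)          # variable spread, in pips
  slippage <- pmax(rnorm(length(pips), mean = 0.2, sd = 0.5), 0) # adverse slippage only
  sum(pips - spread - slippage)                                  # cost-adjusted total
}

sim.totals <- replicate(n.sims, simulate.costs(trade.pips))
quantile(sim.totals, probs = c(0.05, 0.5, 0.95))  # plausible range of outcomes

If the 5th percentile of cost-adjusted outcomes is still attractive, the backtest gains some credibility; if it flips negative, the edge may not survive live frictions.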

Indeed, vendors could boost their product’s credibility considerably by presenting its DARWIN listing, showcasing investment attribute scores & distributions.

The need to sell an EA diminishes very fast if its DARWIN is worthy of attracting AuM and legally earning performance fees on investor profits.

Of particular interest in this case would be the scores attained for Capacity and Divergence, i.e. the effects of different levels of pip divergence on strategy returns, and trader vs. estimated investor returns, respectively.

Furthermore, if the EA vendor is promoting the expert advisor as “only suited to fixed spread environments”, you can rest assured that any live forward tests have been conducted with a non-DMA broker (see here for more on this and associated risks).

On that note, you may also wish to read the section on “when a live track record just isn’t enough” in this recent post.

Valid, temporally accurate swap charges

For trading strategies holding trades longer than intraday timeframes (particularly those employing high leverage), accurately accommodating swap charges in MetaTrader’s Strategy Tester is impossible without external tools and access to a time series of broker swap history.

For more information on the exact treatment of swap charges in MetaTrader’s Strategy Tester, please visit this link.

For loss averse trading systems in particular (that hold trades for long periods of time, add to losing positions in an attempt to recover prior losses, etc), “setting and forgetting” such an EA could prove to be a futile exercise if for example, the EA’s target asset mix contains high swap pairs (e.g. EA’s that scalp the NZD/JPY and AUD/JPY around rollover times).
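To show what accommodating swap accurately would even require, here is a minimal sketch in R. Both inputs are hypothetical: a data frame “swaps” holding a broker’s historical swap rates (date, swap.long, swap.short, in account currency per lot per night) and a trade’s open/close dates, direction and size:

swap.cost <- function(open.date, close.date, direction, lots, swaps) {
  nights <- swaps[swaps$date >= open.date & swaps$date < close.date, ]
  rate   <- if (direction == "long") nights$swap.long else nights$swap.short
  sum(rate * lots)  # note: triple-swap days (a common broker convention) are ignored here
}

swap.cost(as.Date("2019-01-07"), as.Date("2019-01-18"), "long", 1.0, swaps)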


Variable market impact due to out-of-sample developments

Under the definition of “set & forget”, some EA vendors may imply that a MetaTrader expert advisor that survived say, the global financial crisis of 2008 in a backtest, could also survive crises in the future.

Emphasis may be observed in their marketing material on the EA’s robust performance “in all market conditions”, weathering the storms of past crises successfully.

Claims may also be bolstered by references to the length of the backtest as “evidence” to this effect.

When confronted with the above, retail forex traders must tread carefully.

When a live trading track record just isn't enough.

Just as past performance is not indicative of future performance, weathering past storms (let alone in a backtest) is certainly not indicative of weathering completely different storms in future, be they the result of political or economic shifts in the market, to name just two possibilities.


Event-driven changes in a broker’s margin requirements

Sometimes brokers may need to adjust their margin requirements due to regulatory and/or market risk related issues.

This can significantly impact strategies that rely on excess leverage for example, and is impossible to model chronologically inside a MetaTrader backtest.


Execution latency

Backtests in MetaTrader model returns as if execution were always instantly available.

In live trading however, order execution latency (a difference in time and/or price between submitted and filled orders) occurs frequently, depending on factors such as:

  1. Trade frequency,
  2. Available liquidity,
  3. Time of submission (e.g. before, during or after news?)
  4. Size of submission (e.g. 0.1 lots vs 100.0 lots)
  5. ..to name a few.

Modelling this near-stochastic process adequately is difficult in general, and infeasible in MetaTrader backtests.

Technical issues & human error

Simulating the effects of software/hardware malfunction, bounced payments to VPS services, or dropping coffee on your laptop (or server casing?) on the returns of a strategy is certainly not “impossible”.

Measuring Variable Spread / Slippage Is Important

Possibilities include:

  • randomly excluding a variable number of trades..
  • ..during variable periods of time,
  • ….for variable lengths of time,
  • ……over variable samples of underlying asset prices,
  • ………at different sub-samples of trading hours,
  • ………..running 10,000+ runs in Monte Carlo simulations over the same… and,

……. well, hopefully you can see why a commercial EA vendor may need tremendous motivation (not to mention computational power!) to go to such trouble.. 🙂

Yet the risks of technical issues and human error drastically changing live trading returns are very real, rendering “set & forget” a very risky proposition. A toy version of such a simulation is sketched below.
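For what it’s worth, the first two bullets above are straightforward to prototype. A toy sketch in R, assuming a vector “trade.returns” of per-trade backtest returns: each run randomly drops up to 10% of trades (e.g. missed due to outages) and records the resulting total:

set.seed(7)
runs <- 10000
n <- length(trade.returns)

totals <- replicate(runs, {
  drop.n <- sample(0:floor(0.1 * n), 1)                   # how many trades were "missed"
  keep   <- setdiff(seq_len(n), sample(seq_len(n), drop.n))
  sum(trade.returns[keep])
})

quantile(totals, probs = c(0.01, 0.5, 0.99))  # how fragile is the total to missed trades?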


Unforeseen, SUDDEN market movements

Sudden, potentially catastrophic market movements such as the GBP Flash Crash of October, 2016, bring with them a tidal wave of abrupt changes, not only in price but also in execution and liquidity dynamics.

Historical data will permit commercial EA vendors to showcase their product’s performance using just price during such times, ignoring other important factors.

Dynamic Position Sizing / Liquidity Considerations

For example, if an EA survived the GBP Flash Crash in a backtest using fixed spread and assuming perfect execution conditions, the odds of a repeat performance are negligible (if not zero) should the GBP Flash Crash happen again on the first day the EA is deployed live.

More reason to treat “Set & Forget” claims with more than just a pinch of salt.. 😉

COMPLETELY different market conditions

Lastly, there linger the following questions:

What if a backtest was just lucky (walk-forward optimized or not)?

What if the market conditions observed in the backtest never surface again?

What would this backtest look like if ALL the bad trades happened at the beginning (or the end), or some random positional variation thereof?

“The backtest looks great, but..

What is the true probability of me seeing similar performance in future?

Given that this EA has this backtest, what is a realistic range of future returns I am likely to see?”
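One crude but useful way to put numbers on the “lucky ordering” question is a permutation test: shuffle the order of the backtest’s trades many times and look at the spread of maximum drawdowns the very same trades can produce. A minimal sketch in R, again assuming a “trade.returns” vector:

set.seed(21)
runs <- 10000

max.drawdown <- function(rets) {
  equity <- cumsum(rets)
  max(cummax(equity) - equity)  # worst peak-to-trough drop
}

drawdowns <- replicate(runs, max.drawdown(sample(trade.returns)))
quantile(drawdowns, probs = c(0.5, 0.95, 0.99))  # median and tail drawdowns

If the 95th-percentile drawdown across orderings is far worse than the one in the marketed backtest, the marketed equity curve was, to some extent, a fortunate arrangement of the same trades.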

Next time you see an EA marketed with “Set & Forget”, ask yourself these questions.

Then ask the EA vendor these questions, demanding statistically sound evidence.

Or just ask them to list their product as a DARWIN, and let the Darwinex Analytical Toolkit score its performance for you.


Conclusion

We hope this post has helped readers appreciate that:

  1. “Set & Forget” is more marketing sizzle than anything else.
  2. You always need more than just a backtest.
  3. Past performance really and truly isn’t an indicator of future performance.
  4. It is futile to ignore live trading dynamics in backtesting.
  5. If an EA vendor doesn’t have a DARWIN, ask them to list one.
  6. If an EA vendor won’t list a DARWIN, ask them to upload a backtest or track record and confirm investment attribute scores attained.
  7. If an EA is only meant for “fixed spread environments”, the backtest and live track record are both of little value.

 

Lastly, if you find yourself considering an EA purchase somewhere, send the seller an email saying:

“I understand you will make money selling this EA – but why not list a DARWIN instead?”


Do you currently rent or sell your MetaTrader EA?

Consider listing a DARWIN instead and tap into investor capital on the Darwin Exchange! (.. more than $1,000,000 in performance fees paid to date).


Machine Learning on DARWIN Datasets (MLD-I)

Machine learning, in essence, is the research and application of algorithms that help us better understand data.

By leveraging statistical learning techniques from the realm of machine learning, practitioners are able to draw meaningful inferences from data and turn it into actionable intelligence.

Furthermore, the availability of several open source machine learning tools, platforms and libraries today enables absolutely anyone to break into this field, utilizing a plethora of powerful algorithms to discover exploitable patterns in data and predict future outcomes.

This development in particular has given rise to a new wave of DIY retail traders, creating sophisticated trading strategies that compete (and in some cases outperform) in a space previously dominated by institutional participants alone.


In this introductory blog post, we will discuss supportive reasoning for, and different categories of machine learning. In doing so, we will lay the foundation for using machine learning techniques to create DARWIN trading strategies in future blog posts in this series.

For your convenience, this post is structured as follows:

1) The Case for Machine Learning

2) Three Main Categories of Machine Learning

3) Setting up Python/R & C++ for Machine Learning on DARWIN Datasets


1) The Case for Machine Learning

We live in an age where both structured and unstructured data are available in abundance. Not only that, people now also have the tools and resources to gather this data for themselves if they so wish (at little to no cost), a reality that did not exist before.

Over time, machine learning has evolved into a robust means for capturing knowledge from, analyzing and creating predictive models for large volumes of data in a scalable, efficient manner when compared to manual human-driven practices. In doing so, it has also enabled practitioners to iteratively improve upon existing models and incorporate data driven decision-making in their pursuits.

Apart from its widespread use in finance, machine learning has also given rise to things over time that many now take for granted.

For example,

  • Email SPAM filters,
  • Video recommendation engines,
  • Personalized advertising,
  • Internet search engines,
  • Industrial robotics (e.g. in the automobile industry),
  • ..and even self-driving cars!

The DARWIN dataset (a multivariate time series) can therefore benefit from machine learning led research, and that’s exactly what this series of blog posts aims to lay the groundwork for.

In fact, there exists an ever-growing number of DARWIN assets on our Exchange that are powered entirely by machine learning driven trading strategies, three categories of which we discuss next.


2) Three Main Categories of Machine Learning

Main Types of Machine Learning

These are:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

An argument can indeed be made for a fourth category – Deep Reinforcement Learning – which combines deep learning with reinforcement learning practices (more on this in future posts).

We will now discuss the key differences between these three types, and with the help of examples, develop an understanding of their practical applications.


1) Supervised Learning

Supervised Machine Learning

In supervised learning, our aim is to “learn” a predictive model from “labeled” training data. The learned model is assessed for its ability to generalize well to unseen data, after which it can be used to predict outcomes based on future unseen data.

There are two main sub-categories of supervised learning:

  • Regression
  • Classification

1.1) Regression

Sir Francis Galton (coined the term “regression”)

The term regression was coined by the English statistician Sir Francis Galton in 1886, in an article called Regression Towards Mediocrity in Hereditary Stature, in which he described his finding that the heights of children of unusually tall or short parents tended to regress towards the population mean.

In regression tasks, we learn a predictive model from an existing set of:

  • Continuous predictor variables (e.g. historical scores of a DARWIN’s investment attributes and underlying strategy data) and,
  • A continuous response or target variable (i.e. the corresponding DARWIN Quote)

For example, one possible training set using DARWIN data, could have the following structure:

Timestamp | Ex | Mc | Rs | Ra | Os | Cs | R+ | R- | Dc | La | Pf | Cp | uVar | oOrd | dLev | Quote

..where uVar = Underlying Strategy VaR (%), oOrd = Open Orders and dLev = D-Leverage.

In this example, the Quote represents our response or target variable, and the rest our predictor variables. However, there is nothing stopping us from considering any other variable as our target variable.

For example, a study could switch from attempting to predict a DARWIN’s next Quote (for trade entry purposes) to say predicting the next La (for forecasting loss aversion). Several possibilities exist depending on the problem one is trying to solve.

In all cases, supervised machine learning attempts to find relationships between the predictor variables that “explain” the data, and the target variable (the output).

The following image illustrates one of the most basic forms of regression tasks, a linear regression.

In this example, a straight line is “fit” to training data containing predictor values (x) and response values (y), such that the squared distance between the data points and the line is minimized. The resultant gradient and intercept of the line can then be used to predict the outputs (y) of future unseen data (x).

Machine Learning – Linear Regression
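As a preview, here is a minimal sketch of such a regression in R. It assumes a data frame “darwin” whose columns follow the training-set structure shown above (attribute scores plus a “quote” column); the column names are ours, for illustration:

# Fit a linear model of the Quote on a subset of attribute scores
model <- lm(quote ~ Ex + Mc + Rs + La, data = darwin)
summary(model)

# Predict the Quote for a new, unseen row of attribute scores
predict(model, newdata = darwin[nrow(darwin), ])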

Future blog posts in this series will cover the details, mathematical notation and how to perform regression tasks on DARWIN datasets in Python and R, with sample code.


1.2) Classification

In this sub-category of supervised machine learning, our task is to predict what discrete group (or “class”) unseen data belongs to.

As in regression analysis, the predictive model is once again “learned” from a training set where predictor variables and their corresponding target variable have already been provided to us. Only in this instance, the target variable is not a continuous numeric value, but a fixed set of class labels or groups.

Using the example given above in the discussion on regression analysis applied to DARWIN data, a classification approach could modify the problem from predicting a continuous output (DARWIN Quote), to a binary output (UP or DOWN).

The predictive model in this case would then be used to predict the DARWIN’s next movement (UP or DOWN) as opposed to a numeric value for its next forecast Quote (or any other target variable depending on the problem being attempted).

However, binary classification is not a must. A predictive model will classify unseen data based on class labels (groups) observed in the training set, thereby also permitting multi-class classification.

Supervised Machine Learning – Multi-Class Classification

For example,

If the training set of DARWIN data contained rows of attribute scores as predictors (as in the regression example above) and the class labels UP, DOWN, SIDEWAYS, BREAKOUT, STAY-OUT as targets, a robust predictive model could then “classify” future unseen data as one of these classes, with possible use cases including forecasting direction, volatility, risk management, etc.
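As a preview, here is a minimal classification sketch in R using a decision tree. It assumes a data frame “darwin” with attribute-score predictors and a “label” factor column (UP, DOWN, SIDEWAYS, …) constructed from subsequent Quote movement; the labelling scheme is illustrative:

library(rpart)

# Fit a multi-class decision tree on attribute scores
model <- rpart(label ~ Ex + Mc + Rs + La, data = darwin, method = "class")

# Classify and inspect the in-sample confusion matrix
preds <- predict(model, type = "class")
table(predicted = preds, actual = darwin$label)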

Future posts in this series will cover the details, mathematical notation and how to perform classification tasks on DARWIN datasets in Python and R, with sample code.


2) Unsupervised Learning

Unsupervised Machine Learning

Unlike supervised machine learning, where a training set contains predictors and a target variable’s true outcomes, in unsupervised learning the structure of the data is unknown.

Unsupervised learning techniques can be used to study this unknown structure, in an attempt to explore and extract valuable intelligence for a variety of predictive modeling purposes.

There are two main sub-categories of unsupervised learning:

  • Clustering
  • Dimensionality Reduction

2.1) Clustering

Clustering is an unsupervised learning technique that enables practitioners to take data with unknown structure and assemble it into meaningful classes or clusters.

Unlike supervised classification problems where training data will enable the “learning” of underlying relationships from already available ground truths, clustering algorithms will assemble data of unknown structure into classes without any previous knowledge of underlying relationships.

Each class or cluster arrived upon essentially includes a set of observations that are quite similar to each other, but dissimilar to observations found in other clusters. This makes clustering a great approach to extracting meaningful intelligence from input data.

Some of the many motivations for utilizing unsupervised learning in finance include data cleansing, portfolio selection, de-noising and detecting regime change.

The following image illustrates how clustering algorithms can be deployed on data with unknown structure, and yield finite numbers of clusters based on the similarity of predictor data:

Unsupervised Machine Learning – Clustering Algorithm
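As a preview, here is a minimal k-means sketch in R, assuming a hypothetical data frame “scores” with one row of the 12 investment attribute scores per DARWIN (k = 4 is arbitrary here; in practice it would be chosen via e.g. the elbow method):

set.seed(1)
scaled <- scale(scores)  # put attributes on comparable scales

clusters <- kmeans(scaled, centers = 4, nstart = 25)

table(clusters$cluster)  # cluster sizes
clusters$centers         # average attribute profile of each cluster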

Future posts in this series will explore and implement possible use cases of unsupervised clustering to DARWIN data, such as dynamic portfolio selection, custom filter creation, determination of seasonality in DARWIN risk profiles, to name a few ideas.

Working code in Python/R/C++ will also be provided alongside any implementations arrived upon.


2.2) Dimensionality Reduction

Data of large dimensions can present challenges in terms of storage, computational efficiency (especially in real-time – an important consideration for trading algorithms) and performance.

Combining 12 investment attributes for each DARWIN across over 2,500 DARWINs (as of 07 December 2017, 12:30 GMT), together with the multitude of underlying strategy parameters available and any additional feature engineering, can quickly give rise to situations where a dimensionality reduction exercise is warranted.

Dimensionality reduction is useful for:

  • Reducing data from large to smaller dimensions, such that most of the important information in it is retained.
  • Visualization exercises where data of large dimensional space can be projected onto 1D to 3D space for subsequent rendering in standard statistical charts for analysis.

The following image illustrates how dimensionality reduction can project a multi-dimensional (>3) dataset to a 2D surface while retaining most of its important information:

Machine Learning – Dimensionality Reduction
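As a preview, the same hypothetical “scores” data frame can be projected onto its first two principal components in a few lines of R:

pca <- prcomp(scores, center = TRUE, scale. = TRUE)

summary(pca)  # proportion of variance retained per component

# Project all DARWINs onto a 2D surface for visual inspection
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2",
     main = "DARWINs projected onto two principal components")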

Future posts in this series will outline the rationale and implementation of any dimensionality reduction exercises carried out, accompanied by Python/R/C++ source code where appropriate.


3) Reinforcement Learning

Machine Learning – Reinforcement Learning

This third category is related to supervised learning, and involves the development of agents (e.g. systems) that optimize their own performance via interactions with their environment.

Agents respond to the state of their current environment, which also contains a reward signal. With repeated interactions using a trial-and-error driven approach, the agent learns what series or assortment of actions leads to maximal reward.

Possibly one of the most amazing developments in the field of reinforcement learning is DeepMind’s AlphaGo Zero – in a nutshell, a reinforcement learning algorithm that mastered the game of Go by playing against itself repeatedly!

Reinforcement learning has several applications in trading, including its use in trade entry/exit timing, portfolio rebalancing and determining optimal holding periods, to name a few.
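To make the trial-and-error loop concrete, here is a toy epsilon-greedy sketch in R: an agent chooses repeatedly between three hypothetical actions with unknown mean rewards and, guided only by the reward signal, learns which action pays best. (This is a multi-armed bandit, the simplest possible reinforcement learning setting, not a trading system.)

set.seed(3)
true.means <- c(0.1, 0.5, 0.3)  # hidden from the agent
estimates  <- rep(0, 3)         # agent's running reward estimates
counts     <- rep(0, 3)
epsilon    <- 0.1               # fraction of the time the agent explores

for (t in 1:5000) {
  a <- if (runif(1) < epsilon) sample(1:3, 1) else which.max(estimates)
  reward       <- rnorm(1, mean = true.means[a], sd = 1)
  counts[a]    <- counts[a] + 1
  estimates[a] <- estimates[a] + (reward - estimates[a]) / counts[a]  # incremental mean
}

which.max(estimates)  # typically converges on action 2, the truly best action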

Future posts in this series will assess the suitability of reinforcement learning to DARWIN datasets, present any studies carried out and provide Python/R/C++ source code for the same.


3) Setting up Python/R & C++ for Machine Learning on DARWIN Datasets

In order to follow along with our future publications that include implementations and source code, you’ll need a functional DARWIN data science environment set up to support Python, R & C++.

For a detailed set of requirements and configuration instructions, please see our recent blog post Setting up a DARWIN Data Science Environment.

As always, if you have any questions, please feel free to leave them in the comments section at the bottom of this post and we’ll respond as soon as we can!


Additional Resource: How to Interface Python/R Trading Strategies with MetaTrader 4


Mean Reversion Tests on DARWIN $DWC

In a previous post – Quantitative Modeling for Algorithmic Traders – we discussed the importance of Expectation, Variance, Standard Deviation, Covariance and Correlation.

In this post we’ll discuss how those concepts can be applied to DARWIN assets.

As a practical example, we will employ a series of statistical tests to assess whether DARWIN $DWC is a Mean Reverting time series or not.

 

These will include:

1) Hurst Exponent
2) Augmented Dickey-Fuller Test (ADF)
3) Half-life of Mean Reversion

 

In case you missed it, the mean reverting nature of DARWIN $DWC was discussed in our most recent post here.

Tests will be conducted on 1-Minute returns from $DWC, results and interpretation being published along the way. As always, please share your comments, feedback and suggestions in the comments at the end.

Note: Different statistical tests don’t always lead to similar outcomes, therefore it’s considered good practice to use at least two when evaluating mean reversion or any other statistical properties.

Before proceeding further, it’s important that we understand what Autocorrelation and Stationarity are.


Autocorrelation (Serial Correlation)

Autocorrelation:

Also referred to as Serial Correlation.

It is a measure of the similarity or relationship between a time series and a delayed or “lagged” version of the same time series, over successive periods in time.

 

 

 

Stationary Time Series

Stationarity:

A time series is considered stationary if its core statistical attributes remain constant over time.

These include mean, variance, standard deviation, autocorrelation, etc.

Stationary series demonstrate high predictability.

 

 

If a time series (e.g. a DARWIN) can be mathematically transformed to be approximately stationary, future Quotes of the series (or trade entries and their direction) can be reverse engineered from future points in its forecasted stationary series.

More on this in future blog posts.
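As a quick taste of both concepts on $DWC itself, using the same pre-loaded “DWC.M1” data as the tests below, we can first-difference the Quote series and inspect its autocorrelation:

diffed <- diff(DWC.M1$quote)

acf(diffed, lag.max = 50)  # autocorrelation of the differenced series
plot(diffed, type = "l")   # compare with the differenced-series chart below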

Prior Assumptions:

Prior to conducting these tests on $DWC data, we expect to see a reasonable degree of mean reversion for the following reasons:

  1. There is visual confirmation (see below) that a mean reverting tendency may exist.
  2. As $DWC behaves in relation to real time trader sentiment, it is reasonable to assume that it could exhibit cyclical behaviour.
$DWC 1-Minute Data Plot

$DWC 1-Minute Differenced Series


Mean Reversion Test #1: Hurst Exponent

Mean Reversion in a time series can be assessed in terms of its rate of diffusion from inception.

 

For a time series X to be considered mean reverting:

Rate of Diffusion (X) < Rate of Diffusion of a Geometric Brownian Motion (GBM)

 

This rate of diffusion can be measured as the variance of the logarithm of the time series, over a time interval T:

\(Var(T) = \left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle\)

 

If a time series is a GBM, then Var(T) ~ T, as T gets larger:

\(\left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle\) ~ T

 

If a time series is either trending or mean reverting, then:

\(\left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle\) ~ \(T^{2H}\)

.. where H is the Hurst Exponent, a measure of the extent to which the time series trends or mean reverts.

 

Hurst Exponent Interpretation:

If H > 0.5, the time series is TRENDING
If H < 0.5, the time series is Mean Reverting
If H = 0.5, the time series is a Geometric Random Walk

 

The DWC’s Hurst Exponent can be easily calculated in R, using the “pracma” library.

Note: For all code examples in this blog post, we have pre-loaded M1 data as “DWC.M1” to save time.

> library(pracma)
> # Print M1 data Hurst exponent
> hurstexp(log(DWC.M1$quote))
Simple R/S Hurst estimation: 0.8962816
Corrected R over S Hurst exponent: 0.9945418
Empirical Hurst exponent: 1.001317
Corrected empirical Hurst exponent: 0.9938308
Theoretical Hurst exponent: 0.520278

This first test shows that while this sample of DWC data does not demonstrate mean reverting behaviour (Hurst Exponent > 0.5), it is not trending significantly either: with H = 0.520278 it behaves almost like a GBM under this test, so further tests are needed before we can rule DWC in or out as a non-stationary random walk process.


Mean Reversion Test #2: Augmented Dickey-Fuller Test

If the $DWC time series is not a random walk (non-stationary series), then any Quote in the series will have a proportional relationship with the Quote immediately before it.

If $DWC is mean reverting, then any move higher above its mean would likely be followed by a move lower and vice versa.

The ADF Test checks for the presence of unit roots in a time series that’s autoregressive in nature, and for the tendency of a time series to mean revert.

Consider the following autoregressive model of order p:

\(\Delta x_{t} = \alpha + \beta t + \gamma x_{t-1} + \delta _{1}\Delta x_{t-1} + … + \delta _{p-1}\Delta x_{t-p+1} + \epsilon _{t}\)

The ADF test statistically evaluates whether the null hypothesis γ = 0 can be rejected at a given significance level.

If the null hypothesis can be rejected, it implies that the time series is not a random walk, and that there is a linear relationship between the current DWC Quote and the one immediately before it, i.e. that the series is stationary.

 

The ADF Test can be carried out in R quite easily, using the “urca” library.

ADF Test (1-minute DWC data):

> library(urca)
> summary(ur.df(DWC.M1$quote, type="drift", lags=1))
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-1.66860 -0.01990 0.00008 0.02011 1.16945
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0347880 0.0114228 3.045 0.00232 **
z.lag.1 -0.0003287 0.0001075 -3.057 0.00224 **
z.diff.lag -0.0365180 0.0045255 -8.069 7.22e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04493 on 48479 degrees of freedom
Multiple R-squared: 0.001542, Adjusted R-squared: 0.001501
F-statistic: 37.45 on 2 and 48479 DF, p-value: < 2.2e-16
Value of test-statistic is: -3.0566 4.849

Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1 6.43 4.59 3.78

 

Interpretation of ADF Test Results

Referring back to the autoregressive model earlier:

\(\Delta x_{t} = \alpha + \beta t + \gamma x_{t-1} + \delta _{1}\Delta x_{t-1} + … + \delta _{p-1}\Delta x_{t-p+1} + \epsilon _{t}\)

z.lag.1 = the estimated coefficient γ (gamma) in the above equation; its t-statistic (-3.0566) is the ADF test statistic.

tau2 = Critical values corresponding to the null hypothesis (γ = 0)

In order to reject the null hypothesis (γ = 0, i.e. to reject that DWC is a non-stationary random walk), the test statistic must be more negative than the critical values in tau2 (given at the 1%, 5% and 10% significance levels).

As the test statistic is -3.0566 (more negative than the critical values at the 5% and 10% significance levels), the null hypothesis can be rejected at the 90% and 95% confidence levels, i.e. the probability of DWC being stationary (not a random walk) is very high.

The tests above were also conducted on 30-minute, 1-hour, 2-hour, 4-hour and Daily precision $DWC data.

  1. Daily precision led to the null hypothesis for the presence of a unit root being rejected at the 90% confidence level. This test will be repeated periodically as more data is accrued over time.
  2. The 30-minute, 1-hour, 2-hour and 4-hour tests all led to the null hypothesis for the presence of a unit root being rejected at the 95% confidence level.

Mean Reversion Test #3: Half-life of Mean Reversion

An alternative to the autoregressive linear model described above is to consider how long a particular time series takes to mean revert.

By definition, a change in the next periodic value of a mean-reverting time series is proportional to the difference between the historical mean of the series and the current value.

Such time series are referred to as Ornstein-Uhlenbeck processes.

The differential of the earlier model leads us to the expected value of x(t):

\(E(x_{t}) = x_{0}e^{\gamma t} - \frac{\mu}{\gamma}\left(1 - e^{\gamma t}\right)\)

If DWC is a mean reverting series with a negative \(\gamma\), then the equation above tells us that the expected value of DWC decays exponentially towards its mean, with a half-life of \(\frac{-\log(2)}{\gamma}\).

This means we now have two tasks ahead of us:

  1. Find \(\gamma\) and check if it is negative.
  2. Calculate the half-life and assess whether it is a practical length of time for traders to consider a mean reverting strategy on DWC.

Once again, we can easily conduct both steps in R.

Step 1: Calculate \(\gamma\) and check sign.

> M1.data <- as.ts(DWC.M1$quote)
> M1.data.lag <- lag(M1.data, -1)
> M1.data.delta <- diff(M1.data)
> M1.data.frame <- cbind(M1.data, M1.data.lag, M1.data.delta)
> M1.data.frame <- M1.data.frame[-1,]

> M1.regression <- lm(M1.data.delta ~ M1.data.lag, data=as.data.frame(M1.data.frame))

> gamma <- summary(M1.regression)$coefficients[2]
> print(gamma)
[1] -0.0003588994

\(\gamma\) is negative (-0.0003588994), so this $DWC 1-minute data sample can be considered mean reverting.

 

Step 2: Calculate half-life and assess practicality of mean reversion strategy.

> M1.data.half.life <- -log(2) / gamma
> print(paste("Half-life: ", M1.data.half.life, " minutes, or ", M1.data.half.life/60, " Hours", sep=""))
[1] "Half-life: 1931.31306610404 minutes, or 32.1885511017341 Hours"

The half-life calculated for this $DWC 1-minute data sample is approximately 32 hours.

 

Another important feature of the calculated half-life is that it can be used as the period of a moving average employed in a mean reverting trading strategy [1].

If we plot a Simple Moving Average of period 1931 (in minutes, not hours), we get:

 

$DWC 1-Minute Data with SMA(1931)


Summary:

  1. We conducted three statistical tests to ascertain the degree of mean reversion in $DWC 1-minute data, namely Hurst Exponent, Augmented Dickey-Fuller (ADF) and Half-Life of Mean Reversion.
  2. The Hurst Exponent did not indicate mean reverting behaviour in $DWC, but rather an estimate close to GBM behaviour.
  3. The Augmented Dickey-Fuller test results indicated stationary behaviour at the 95% confidence level.
  4. The Half-life of Mean Reversion test indicated $DWC possesses mean reverting properties.
  5. We used the half-life calculated above as the period for a moving average, which when plotted on the chart revealed mean reverting Quote behaviour.

What are your thoughts after reading this research? ..please share in the comments section below!

 

References:

[1] Chan, Ernest, 2013. Algorithmic Trading: Winning Strategies and Their Rationale, John Wiley and Sons.


Additional Resource: Measuring Investments’ Risk: Value at Risk (VIDEO)
* please activate CC mode to view subtitles.

Do you have what it takes? – Join the Darwinex Trader Movement!
