
Our latest improvement in… the Market Correlation Investable Attribute

In this blog post, we are going to explain the most recent improvements to the latest investable attribute (IA) to join Darwinex’ ranks: Mr. Mc, a.k.a. Market Correlation.

However, before we get our hands dirty, let me briefly explain what correlation is for those of you who are not familiar with this concept.

Correlation is a statistic that measures the degree to which two variables move in relation to each other.

What are the two variables we use at Darwinex to calculate the Mc grade?

You are 100% right! We measure the relationship between your DARWIN’s return curve and the return curves of the underlying assets in which you trade.

Let me explain this further with an illustrative example.

Imagine that your DARWIN has yielded nice returns over the last year. As a result, 100% of investors would think that you are a trading superstar and money would be pouring into your DARWIN, right? Well, not so fast.

What if Darwinex’ algos discover that you have always been long EURUSD in 2017?

In this case, the return of your DARWIN would be 100% correlated with the EURUSD, and you should not be awarded any trading medal, since you have basically made one trading decision in 2017: go long EURUSD.

In this extreme example, your Mc grade would be 0 which, in turn, would badly deteriorate your D-Score, making it impossible to get a D-Score over 50, irrespective of the other 11 investment attributes. Therefore: no AuM, no DarwinIA, no fame, no superstar status and an empty pocket 🙁

Darwinex Improves The Mc Investment Attribute

However, after having given this a lot of thought, we have reached the conclusion that this is not the most accurate way to measure the Mc attribute.

Well, to be totally honest with you, we already knew that this calculation was just an approximation. Nevertheless, we decided to implement it anyway for two primary reasons:

  • It would add much more value to our proprietary diagnostic toolkit
  • It would penalize “one-trick pony” strategies, which would likely prevent investors from investing in a “random” strategy 100% dependent on an exogenous factor (the EURUSD’s evolution)

Why was our calculation only an approximation instead of 100% accurate?

This is because, in the old Mc version, Darwinex took into consideration neither the DARWIN’s leverage, which is now considered a trading decision in and of itself, nor the number of D-Periods of experience accrued with such correlation.

  • DARWIN leverage

Going back to our example (remember that your DARWIN has always been long EURUSD), imagine that the leverage applied by our risk manager to your DARWIN, in order to offer an asset with a monthly target risk of 10% VaR, had varied over time based on leverage changes in your underlying trading strategy: modifications driven by your technical or fundamental analysis, market conditions or predictions on the EURUSD.

The DARWIN could have been using 5:1 leverage in some trades, 2:1 in others, and up to 8:1 in yet others. This way, your DARWIN’s return curve and the EURUSD curve could end up looking very different.

It is a fact that you have always been long EURUSD, but your DARWIN could have been using very low leverage when the EURUSD went down, and more leverage when it went your way.

You’d thus be making a much better return, in percentage terms, than the underlying asset in which you were trading.
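To make this concrete, here is a minimal R sketch (with hypothetical numbers, not Darwinex’s actual Mc calculation) showing how varying leverage alone lowers the correlation between a strategy’s return curve and the underlying’s:

# Hypothetical illustration only - not Darwinex's actual Mc algorithm.
# An always-long EURUSD strategy whose leverage varies per period.
set.seed(42)
eurusd.ret <- rnorm(250, mean = 0.0002, sd = 0.005)   # simulated daily EURUSD returns
leverage   <- sample(c(2, 5, 8), 250, replace = TRUE) # varying position size
darwin.ret <- leverage * eurusd.ret                   # the strategy's returns

# With constant leverage, the correlation would be exactly 1;
# varying leverage pushes it below 1.
print(cor(darwin.ret, eurusd.ret))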

  • Experience accrued: nº of D-Periods

The experience factor is a new variable that we have decided to introduce in the final calculation of Mc.

Being highly correlated with the underlying asset for one week is not the same as being highly correlated for a whole year, and we believe the impact on the final grade cannot be the same.

The tolerance level of the Mc algorithm will be inversely proportional to the number of D-Periods of experience during which said correlation is maintained.

Following our example, if the algorithm detected a significant correlation with the EURUSD, but it had only persisted for a short period of time (1 D-Period), the deterioration of your Mc grade would be lower than if you had maintained it for 5 D-Periods.

The greater the number of D-Periods over which such correlation is maintained, the lower the tolerance of the Mc attribute and the greater the penalty imposed on the D-Score.

So, after having thrown your strategy to the wolves, it turns out that you could still be a trading superstar!

In summary, we have tweaked the Mc algorithm so that we calculate its grade based on positions in the same asset, considering both leverage (a trading decision in and of itself) and the experience factor.

Please note that the “leverage factor” change will improve the accuracy of the Mc grade for “medium/long-term” strategies while not affecting scalpers or day traders, and the “experience factor” will improve the Mc grade of almost all DARWINs.

Trade safe!

 


Do you want to say something about our latest improvement in the Mc investable attribute? You’re welcome to share your thoughts with other members of our Community here.


Mean Reversion Tests on DARWIN $DWC

In a previous post – Quantitative Modeling for Algorithmic Traders – we discussed the importance of Expectation, Variance, Standard Deviation, Covariance and Correlation.

In this post we’ll discuss how those concepts can be applied to DARWIN assets.

As a practical example, we will employ a series of statistical tests to assess if DARWIN $DWC is a Mean Reverting time series or otherwise.

 

These will include:

1) Hurst Exponent
2) Augmented Dickey-Fuller Test (ADF)
3) Half-life of Mean Reversion

 

In case you missed it, the mean reverting nature of DARWIN $DWC was discussed in our most recent post here.

Tests will be conducted on 1-Minute returns from $DWC, with results and interpretation presented along the way. As always, please share your comments, feedback and suggestions in the comments section at the end.

Note: Different statistical tests don’t always lead to similar outcomes, therefore it’s considered good practice to use at least two when evaluating mean reversion or any other statistical properties.

Before proceeding further, it’s important that we understand what Autocorrelation and Stationarity are.


Autocorrelation (Serial Correlation)

Autocorrelation, also referred to as serial correlation, is a measure of the similarity or relationship between a time series and a delayed or “lagged” version of itself, over successive periods in time.
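As a quick illustration, R’s built-in acf() function plots a series’ autocorrelation at successive lags. A minimal sketch, assuming the pre-loaded DWC.M1 data used in the tests below:

# Plot the autocorrelation of 1-minute $DWC Quote changes at lags 1-60.
# Assumes DWC.M1$quote holds the pre-loaded Quotes used throughout this post.
acf(diff(DWC.M1$quote), lag.max = 60,
    main = "Autocorrelation of $DWC 1-minute changes")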

 

 

 

Stationary Time Series

A time series is considered stationary if its core statistical attributes remain constant over time.

These include mean, variance, standard deviation, autocorrelation, etc.

Stationary series demonstrate a higher degree of predictability than non-stationary ones.

 

 

If a time series (e.g. a DARWIN) can be mathematically transformed into an approximately stationary one, future Quotes of the original series (or at least trade entry direction / entries) can be reverse-engineered from future points in its forecasted stationary counterpart.
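The simplest such transformation is first-order differencing, which produces the differenced series plotted further below. A minimal sketch, again assuming the pre-loaded DWC.M1 data:

# First-order differencing: the simplest transformation towards stationarity.
# Assumes DWC.M1$quote holds the pre-loaded 1-minute Quotes.
dwc.diff <- diff(DWC.M1$quote)
plot(dwc.diff, type = "l", main = "$DWC 1-Minute Differenced Series")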

More on this in future blog posts.

Prior Assumptions:

Prior to conducting these tests on $DWC data, we are expecting to see a reasonable degree of mean reversion for the following reasons:

  1. There is visual confirmation (see below) that a mean reverting tendency may exist.
  2. As $DWC behaves in relation to real-time trader sentiment, it is reasonable to assume that it could exhibit cyclical behaviour.

$DWC 1-Minute Data Plot

$DWC 1-Minute Differenced Series


Mean Reversion Test #1: Hurst Exponent

Mean Reversion in a time series can be assessed in terms of its rate of diffusion from inception.

 

For a time series X to be considered mean reverting:

Rate of Diffusion (X) < Rate of Diffusion of a Geometric Brownian Motion (GBM)

 

This rate of diffusion can be measured as the variance of the change in the logarithm of the time series x, over an arbitrary time interval T:

\(Var(T) = \left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle\)

 

If a time series is a GBM, then Var(T) ~ T as T gets larger:

\(\left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle \sim T\)

 

If a time series is either trending or mean reverting, then:

\(\left \langle \left | \log x(t + T) - \log x(t) \right |^{2} \right \rangle \sim T^{2H}\)

… where H is the Hurst Exponent, a measure of the extent to which the time series trends or mean reverts.

 

Hurst Exponent Interpretation:

If H > 0.5, the time series is TRENDING
If H < 0.5, the time series is MEAN REVERTING
If H = 0.5, the time series is a GEOMETRIC RANDOM WALK (GBM)
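Before reaching for a library, the relationship above can be checked directly: regress log Var(T) on log T over a range of lags, and the slope approximates 2H. A minimal sketch, assuming the same pre-loaded DWC.M1 data used below:

# Diffusion-based estimate of H: the slope of log Var(T) vs log T is ~2H.
# Assumes DWC.M1$quote holds the pre-loaded 1-minute Quotes.
logp  <- log(DWC.M1$quote)
lags  <- 2:100
varT  <- sapply(lags, function(T) var(diff(logp, lag = T)))
fit   <- lm(log(varT) ~ log(lags))
H.est <- coef(fit)[2] / 2   # slope = 2H
print(H.est)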

 

$DWC’s Hurst Exponent can be easily calculated in R, using the “pracma” library.

Note: For all code examples in this blog post, we have pre-loaded M1 data as “DWC.M1” to save time.

> library(pracma)
> # Print M1 data Hurst Exponent
> hurstexp(log(DWC.M1$quote))
Simple R/S Hurst estimation:         0.8962816
Corrected R over S Hurst exponent:   0.9945418
Empirical Hurst exponent:            1.001317
Corrected empirical Hurst exponent:  0.9938308
Theoretical Hurst exponent:          0.520278

This first test shows that, though this sample of $DWC data is not demonstrating mean reverting behaviour (Theoretical Hurst Exponent > 0.5), it is not trending significantly either; i.e. it is almost behaving like a GBM as per this test’s results (H = 0.520278), reducing the probability of $DWC being a non-stationary random walk process.


Mean Reversion Test #2: Augmented Dickey-Fuller Test

If the $DWC time series is not a random walk (non-stationary series), then any Quote in the series will have a proportional relationship with the Quote immediately before it.

If $DWC is mean reverting, then any move higher above its mean would likely be followed by a move lower and vice versa.

The ADF Test checks for the presence of unit roots in a time series that’s autoregressive in nature, and for the tendency of a time series to mean revert.

Consider the following autoregressive model of order p:

\(\Delta x_{t} = \alpha + \beta t + \gamma x_{t-1} + \delta _{1}\Delta x_{t-1} + … + \delta _{p-1}\Delta x_{t-p+1} + \epsilon _{t}\)

The ADF test will statistically evaluate whether the null hypothesis γ = 0 can be rejected at a given confidence level.

If the null hypothesis can be rejected, it implies that the time series is not a random walk (a non-stationary series, with no linear relationship between consecutive data points), and that there is instead a linear relationship between the current DWC Quote and the one immediately before it (a stationary series).

 

The ADF Test can be carried out in R quite easily, using the “urca” library.

ADF Test (1-minute DWC data):

> library(urca)
> summary(ur.df(DWC.M1$quote, type="drift", lags=1))

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
     Min       1Q   Median       3Q      Max
-1.66860 -0.01990  0.00008  0.02011  1.16945

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0347880  0.0114228   3.045  0.00232 **
z.lag.1     -0.0003287  0.0001075  -3.057  0.00224 **
z.diff.lag  -0.0365180  0.0045255  -8.069 7.22e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04493 on 48479 degrees of freedom
Multiple R-squared: 0.001542, Adjusted R-squared: 0.001501
F-statistic: 37.45 on 2 and 48479 DF, p-value: < 2.2e-16

Value of test-statistic is: -3.0566 4.849

Critical values for test statistics:
      1pct  5pct 10pct
tau2 -3.43 -2.86 -2.57
phi1  6.43  4.59  3.78

 

Interpretation of ADF Test Results

Referring back to the autoregressive model earlier:

\(\Delta x_{t} = \alpha + \beta t + \gamma x_{t-1} + \delta _{1}\Delta x_{t-1} + … + \delta _{p-1}\Delta x_{t-p+1} + \epsilon _{t}\)

z.lag.1 = the test statistic for the coefficient γ (gamma) in the above equation.

tau2 = Critical values corresponding to the null hypothesis (γ = 0)

In order to reject the null hypothesis (γ = 0, i.e. to reject that DWC is a non-stationary random walk), the value of the test statistic must be smaller (more negative) than the critical values in tau2 (the 1%, 5% and 10% significance levels).

As z.lag.1 is -3.0566 (smaller than the critical values for the 5% and 10% levels), the null hypothesis can be rejected at the 90% and 95% confidence levels, i.e. the probability of DWC being stationary (not a random walk) is very high.

The tests above were also conducted on 30-minute, 1-hour, 2-hour, 4-hour and Daily precision $DWC data.

  1. Daily precision led to the null hypothesis for the presence of a unit root being rejected at the 90% confidence level. This test will be repeated periodically as more data is accrued over time.
  2. The 30-minute, 1-hour, 2-hour and 4-hour tests all led to the null hypothesis for the presence of a unit root being rejected at the 95% confidence level.

Mean Reversion Test #3: Half-life of Mean Reversion

An alternative to the autoregressive linear model described above is to consider how long a particular time series takes to mean revert.

By definition, a change in the next periodic value of a mean-reverting time series is proportional to the difference between the historical mean of the series and the current value.

Such time series are referred to as Ornstein-Uhlenbeck processes.

Solving the differential form of the earlier model leads us to the expected value of x(t):

\(E(x_{t}) = x_{0}e^{\gamma t} - \frac{\mu}{\gamma}\left(1 - e^{\gamma t}\right)\)

If DWC is a mean reverting series with a negative \(\gamma\), the equation above tells us that its Quotes decay exponentially towards the mean, with a half-life of \(\frac{-\log(2)}{\gamma}\).

This means we now have two tasks ahead of us:

  1. Find \(\gamma\) and check if it is negative.
  2. Calculate the half-life and assess whether it is a practical length of time for traders to consider a mean reverting strategy on DWC.
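Before applying the procedure to real data, here is a quick synthetic sanity check (a sketch with made-up parameters, mirroring the regression applied to $DWC below):

# Synthetic sanity check - not $DWC data. Simulate a mean reverting series
# with a known gamma, then recover it by regressing changes on lagged levels.
set.seed(1)
n      <- 10000
gamma0 <- -0.01                 # true mean reversion coefficient
x      <- numeric(n)
for (t in 2:n) x[t] <- x[t-1] + gamma0 * x[t-1] + rnorm(1, sd = 0.1)
fit <- lm(diff(x) ~ head(x, -1))
print(coef(fit)[2])             # should be close to gamma0
print(-log(2) / coef(fit)[2])   # implied half-life, roughly -log(2)/gamma0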

Once again, we can easily conduct both steps in R.

Step 1: Calculate \(\gamma\) and check sign.

> # Build the time series, its one-period lag, and its first differences
> M1.data <- as.ts(DWC.M1$quote)
> M1.data.lag <- lag(M1.data, -1)
> M1.data.delta <- diff(M1.data)
> M1.data.frame <- cbind(M1.data, M1.data.lag, M1.data.delta)
> M1.data.frame <- M1.data.frame[-1,]

> # Regress Quote changes on lagged Quotes; the slope coefficient is gamma
> M1.regression <- lm(M1.data.delta ~ M1.data.lag, data=as.data.frame(M1.data.frame))

> gamma <- summary(M1.regression)$coefficients[2]
> print(gamma)
[1] -0.0003588994

\(\gamma\) is negative (-0.0003588994), so this $DWC 1-minute data sample can be considered mean reverting.

 

Step 2: Calculate half-life and assess practicality of mean reversion strategy.

> M1.data.half.life <- -log(2) / gamma
> print(paste("Half-life: ", M1.data.half.life, " minutes, or ", M1.data.half.life/60, " Hours", sep=""))
[1] "Half-life: 1931.31306610404 minutes, or 32.1885511017341 Hours"

The half-life calculated for this $DWC 1-minute data sample is approximately 32 hours.

 

Another important feature of the calculated half-life is that it can be used as the period of a moving average employed in a mean reverting trading strategy [1].

If we plot a Simple Moving Average of period 1931 (in minutes, not hours), we get:

 

$DWC 1-Minute Data with SMA(1931)
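For reference, a sketch of how this chart can be reproduced, assuming the TTR library is installed and the same pre-loaded DWC.M1 data:

# Reproduce the chart above: $DWC 1-minute Quotes with an SMA of period 1931.
# Assumes the TTR library and the pre-loaded DWC.M1 data.
library(TTR)
sma1931 <- SMA(DWC.M1$quote, n = 1931)
plot(DWC.M1$quote, type = "l", main = "$DWC 1-Minute Data with SMA(1931)")
lines(sma1931, col = "red")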


Summary:

  1. We conducted three statistical tests to ascertain the degree of mean reversion in $DWC 1-minute data, namely Hurst Exponent, Augmented Dickey-Fuller (ADF) and Half-Life of Mean Reversion.
  2. The Hurst Exponent did not indicate mean reverting behaviour in $DWC, but rather an estimate close to GBM behaviour.
  3. The Augmented Dickey-Fuller test results indicated stationary behaviour at the 95% confidence level.
  4. The Half-life of Mean Reversion test indicated $DWC possesses mean reverting properties.
  5. We used the half-life calculated above as the period for a moving average, which when plotted on the chart revealed mean reverting Quote behaviour.

What are your thoughts after reading this research? Please share them in the comments section below!

 

References:

[1] Chan, Ernest P. (2013). Algorithmic Trading: Winning Strategies and Their Rationale. John Wiley & Sons.


Additional Resource: Measuring Investments’ Risk: Value at Risk (VIDEO)
* please activate CC mode to view subtitles.

Do you have what it takes? – Join the Darwinex Trader Movement!

Darwinex - The Open Trader Exchange

Quantitative Modeling for Algorithmic Traders – Primer

Quantitative Modeling techniques enable traders to mathematically identify what makes data “tick” – no pun intended 🙂 .

They rely heavily on the following core attributes of any sample data under study:

  1. Expectation – The mean or average value of the sample
  2. Variance – The observed spread of the sample
  3. Standard Deviation – The observed deviation from the sample’s mean
  4. Covariance – The linear association of two data samples
  5. Correlation – Solves the dimensionality problem in Covariance

Why a dedicated primer on Quantitative Modeling?

Understanding how to use the five core attributes listed above in practice will enable you to:

  1. Construct diversified DARWIN portfolios using Darwinex’ proprietary Analytical Toolkit.
  2. Conduct mean-variance analysis for validating your DARWIN portfolio’s composition.
  3. Build a solid foundation for implementing more sophisticated quantitative modeling techniques.
  4. Potentially improve the robustness of trading strategies deployed across multiple assets.

Hence, a post dedicated to defining these core attributes, with practical examples in R (a statistical computing language), should hopefully serve as good reference material to accompany existing and future posts.

Why R?

  1. It facilitates the analysis of large price datasets in short periods of time.
  2. Calculations that would otherwise require multiple lines of code in other languages can be done much faster, as R has a mature base of libraries for many quantitative finance applications.
  3. It’s free to download here.

 

* Sample data (EUR/USD and GBP/USD End-of-Day Adjusted Close Price) used in this post was obtained from Yahoo, where it is freely available to the public.

 

Before progressing any further, we need to download EUR/USD and GBP/USD sample data from Yahoo Finance (time period: January 01 to March 31, 2017).

In R, this can be achieved with the following code:

library(quantmod)

getSymbols("EUR=X",src="yahoo",from="2017-01-01", to="2017-03-31")

getSymbols("GBP=X",src="yahoo",from="2017-01-01", to="2017-03-31")

 

Note: “EUR=X” and “GBP=X” provided by Yahoo are in terms of US Dollars, i.e. the data represents USD/EUR and USD/GBP respectively. Hence, we will need to convert base currencies first.

To achieve this, we will first extract the Adjusted Close Price from each dataset, convert base currency and merge both into a new data frame for use later:

eurAdj = unclass(`EUR=X`$`EUR=X.Adjusted`)

# Convert to EUR/USD
eurAdj = 1/eurAdj  

gbpAdj <- unclass(`GBP=X`$`GBP=X.Adjusted`)

# Convert to GBP/USD
gbpAdj <- 1/gbpAdj

# Extract EUR dates for plotting later.
eurDates = index(`EUR=X`)  

# Create merged data frame.
eurgbp_merged <- data.frame(eurAdj,gbpAdj)

 

EUR/USD and GBP/USD (Jan 01 – Mar 31, 2017)

Finally, we merge the prices and dates to form one single dataframe, for use in the remainder of this post:

eurgbp_merged = data.frame(eurDates, eurgbp_merged)

colnames(eurgbp_merged) = c("Dates", "EURUSD", "GBPUSD")

 

The mean μ of a price series is its average value.

It is calculated by adding all elements of the series, then dividing this sum by the total number of elements in the series.

Mathematically, the mean μ of a price series P, where elements p ∈ P, with n number of elements in P, is expressed as:

\(\mu = E(p) = \frac{1}{n}\sum_{i=1}^{n} p_i = \frac{1}{n}(p_1 + p_2 + p_3 + \ldots + p_n)\)

In R, the mean of a sample can be calculated using the mean() function.

For example, to calculate the mean price observed in our sample of EUR/USD data, ranging from January 01 to March 31, 2017, we execute the following code to arrive at mean 1.065407:

mean(eurgbp_merged$EURUSD)

[1] 1.065407

 

Using the plotly library in R, here’s the mean overlayed graphically on this EUR/USD sample:

library(plotly)

plot_ly(name="EUR/USD Price", x = eurgbp_merged$Dates, y = as.numeric(eurgbp_merged$EURUSD), type="scatter", mode="lines") %>%

add_trace(name="EUR/USD Mean", y=(as.numeric(mean(eurgbp_merged$EURUSD))), mode="lines")

EUR/USD Mean R Plotly Chart (Jan 01 – Mar 31, 2017)

The variance σ² of a price series is simply the mean, or expectation, of the squared deviation of price from its mean.

It characterises the range of movement around the mean, or “spread” of the price series.

Mathematically, the variance σ² of a price series P, with elements p ∈ P, and mean μ, is expressed as:

\(σ²(p) = E[(p – μ)²]\)

Standard Deviation is simply the square root of variance, expressed as σ:

\(σ = \sqrt{σ²(p)} = \sqrt{E[(p – μ)²]}\)

 

In R, the standard deviation of a sample can be calculated using the sd() function.

For example, to calculate the standard deviation observed in our sample of EUR/USD data, ranging from January 01 to March 31, 2017, we execute the following code to arrive at s.d. 0.00996836:

sd(eurgbp_merged$EURUSD)

[1] 0.00996836
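As a quick check, sd() is simply the square root of the sample variance returned by var():

# Standard deviation is the square root of variance.
var(eurgbp_merged$EURUSD)         # sample variance
sqrt(var(eurgbp_merged$EURUSD))   # identical to sd(eurgbp_merged$EURUSD)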

 

Using the plotly library in R again, we can overlay a single (or more) positive and negative standard deviation from the mean, as follows:

plot_ly(name="EUR/USD Price", x = eurgbp_merged$Dates, y = as.numeric(eurgbp_merged$EURUSD), type="scatter", mode="lines") %>%

add_trace(name="+1 S.D.", y=(as.numeric(mean(eurgbp_merged$EURUSD))+sd(eurgbp_merged$EURUSD)), mode="lines", line=list(dash="dot")) %>%

add_trace(name="-1 S.D.", y=(as.numeric(mean(eurgbp_merged$EURUSD))-sd(eurgbp_merged$EURUSD)), mode="lines", line=list(dash="dot")) %>%

add_trace(name="EUR/USD Mean", y=(as.numeric(mean(eurgbp_merged$EURUSD))), mode="lines")

EUR/USD Mean +/- 1 Standard Deviation R Plotly Chart (Jan 01 – Mar 31, 2017)

The sample covariance of two price series, in this case EUR/USD and GBP/USD, each with its respective sample mean, describes their linear association, i.e. how they move together in time.

Let’s denote EUR/USD by variable ‘e’ and GBP/USD by variable ‘g’.

These price series will then have sample means of \(\overline{e}\) and \(\overline{g}\) respectively.

Mathematically, their sample covariance, Cov(e, g), where both have n number of data points \((e_i, g_i)\), can be expressed as:

\(Cov(e,g) = \frac{1}{n-1}\sum_{i=1}^{n}(e_i – \overline{e})(g_i – \overline{g})\)

 

In R, sample covariance can be calculated easily using the cov() function.

Before we calculate covariance, let’s first use the plotly library to draw a scatter plot of EUR/USD and GBP/USD.

 

To visualize linear association, we will also perform a linear regression on the two price series, followed by drawing this as a line of best fit on the scatter plot.

This can be achieved in R using the following code:

# Perform linear regression on EUR/USD and GBP/USD
fit <- lm(EURUSD ~ GBPUSD, data=eurgbp_merged)

# Draw scatter plot with line of best fit
plot_ly(name="Scatter Plot", data=eurgbp_merged, y=~EURUSD, x=~GBPUSD, type="scatter", mode="markers") %>%

add_trace(name="Linear Regression", data=eurgbp_merged, x=~GBPUSD, y=fitted(fit), mode="lines")

EUR/USD and GBP/USD Scatter Plot with Linear Regression

 

Based on this plot, EUR/USD and GBP/USD have a positive linear association.

 

To calculate the sample covariance of EUR/USD and GBP/USD between January 01 and March 31, 2017, we execute the following code to arrive at covariance 7.629787e-05:

cov(eurgbp_merged$EURUSD, eurgbp_merged$GBPUSD)

[1] 7.629787e-05

 

Problem: Covariance is dimensional in nature, which makes it difficult to compare price series with significantly different variances.

Solution: Calculate Correlation, i.e. Covariance normalized by the standard deviations of each price series, which makes it dimensionless and a more interpretable ratio of linear association between two price series.

 

Mathematically, Correlation ρ(e,g) of EUR/USD and GBP/USD, where \(σ_e\) and \(σ_g\) are their respective standard deviations, can be expressed as:

\(ρ(e,g) = \frac{Cov(e,g)}{σ_e σ_g} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(e_i – \overline{e})(g_i – \overline{g})}{σ_e σ_g}\)

  • Correlation = +1 indicates EXACT positive association.
  • Correlation = -1 indicates EXACT negative association.
  • Correlation = 0 indicates NO linear association.

 

In R, correlation can be calculated easily using the cor() function.

For example, to calculate the correlation between EUR/USD and GBP/USD, from January 01 to March 31, 2017, we execute the following code to arrive at 0.5169411:

cor(eurgbp_merged$EURUSD, eurgbp_merged$GBPUSD)

[1] 0.5169411

 

A correlation of 0.5169411 implies reasonable positive correlation between EUR/USD and GBP/USD, which matches what we visualized earlier with our scatter plot and line of best fit.
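As a sanity check, dividing the covariance calculated earlier by the product of the two standard deviations reproduces the same figure:

# Correlation is covariance normalized by the two standard deviations.
cov(eurgbp_merged$EURUSD, eurgbp_merged$GBPUSD) /
  (sd(eurgbp_merged$EURUSD) * sd(eurgbp_merged$GBPUSD))

[1] 0.5169411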

 

In future blog posts, we will examine how to construct diversified DARWIN Portfolios using the information above in practice.

Trade safe,
The Darwinex Team

Additional Resource: Learn more about DARWIN Portfolio Risk (VIDEO)
* please activate CC mode to view subtitles.

Do you have what it takes? – Join the Darwinex Trader Movement!

Darwinex – The Open Trader Exchange