Automated Tick Data Collection & Storage with R, MetaTrader, and a VPS

13 April 2018
The Market Bull

Let’s continue where we left off in our last post on tick data collection.

If you missed it, it’s important that you familiarise yourself with its contents first.

Here’s the link again:
https://blog.darwinex.com/download-tick-data-metatrader/

Therein, we promised a follow-up post that would discuss an approach for retail traders to automate tick data collection with the R programming language, MetaTrader and a VPS instance.

This would ensure that:

  1. The collection process runs uninterrupted (barring any technical issues, VPS downtime and/or MetaTrader login issues), 24×7.
  2. All ticks processed are saved to disk, uninterrupted by events such as restarts, power outages, downtime due to any security-related issues, etc.
  3. A repository of tick data is therefore always available to serve a variety of purposes down the line.
  4. It becomes possible to feed stored tick/spread data to non-MetaTrader environments such as Python, R, Java, Julia and C/C++/C# to enable more sophisticated backtesting, exploratory research and/or machine learning.

Of course, it isn’t mandatory to use a VPS. If you’re able to ensure that the process can run uninterrupted on a local PC or laptop, that’s fine too.


Before we continue, you’ll remember that BID/ASK Spread was one of the variables calculated and subsequently stored in the output CSV last time:

 

Tick Data – Bid, Ask, Spread, Time (in seconds or milliseconds)

 

Algorithmic traders (especially short term) often use maximum spread thresholds in their trading strategies.

This practice most commonly prevents trading signals from being executed while asset spreads are too wide, since any excess spread could eat into returns quite considerably (with particularly grave consequences for traders targeting small gains).
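A maximum spread threshold of this kind can be sketched in a few lines of Python. This is only an illustration: the function names, the pip size of 0.0001 and the threshold values are assumptions made for the example, and are not taken from the indicator or R script discussed in this post.

```python
def spread_in_pips(bid, ask, pip_size=0.0001):
    """Return the bid/ask spread expressed in pips (pip_size is an assumed default)."""
    return (ask - bid) / pip_size


def signal_allowed(bid, ask, max_spread_pips, pip_size=0.0001):
    """Allow a trading signal only while the spread is at or below the threshold."""
    return spread_in_pips(bid, ask, pip_size) <= max_spread_pips


if __name__ == "__main__":
    # Roughly 2 pips of spread against a 3 pip limit: the signal may execute.
    print(signal_allowed(1.2345, 1.2347, max_spread_pips=3.0))
    # Roughly 5 pips of spread against the same limit: the signal is skipped.
    print(signal_allowed(1.2345, 1.2350, max_spread_pips=3.0))
```

Instruments quoted to a different number of decimal places would need a different pip_size.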

However, this isn’t the only purpose spread data can serve.

Organizing and studying spread data can further assist traders in a number of ways:

  1. Monitoring the evolution of time-weighted spread distributions, e.g. discovering notable, potentially exploitable recurring patterns.
  2. Drawing practical inferences from the same, e.g. time ranges wherein spreads are favourable or otherwise.
  3. Studying the correlation of a strategy’s historical returns, volatility and/or other metrics to its underlying assets’ spread profile.
  4. Creating trading strategies that target particular spread profile behaviours.

.. to name a few.
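To make the first item a little more concrete, here is a minimal Python sketch of a time-weighted average spread per hour of day. It assumes ticks arrive as (epoch seconds, spread) pairs sorted by time, weights each spread by the time until the next tick, and attributes each interval to the hour in which it starts. The function name and input layout are hypothetical, and hours are derived directly from the epoch value (i.e. UTC), which may differ from your broker’s server time.

```python
from collections import defaultdict


def time_weighted_spread_by_hour(ticks):
    """Average spread per hour of day, weighting each spread by how long it was in effect.

    ticks: list of (epoch_seconds, spread) pairs sorted by time (assumed layout).
    The final tick has no known duration and therefore carries no weight.
    """
    weighted_sum = defaultdict(float)
    total_time = defaultdict(float)
    for (t0, spread), (t1, _next_spread) in zip(ticks, ticks[1:]):
        hour = int(t0 // 3600) % 24   # hour of day in which the interval starts
        held_for = t1 - t0            # seconds until the next tick
        weighted_sum[hour] += spread * held_for
        total_time[hour] += held_for
    return {h: weighted_sum[h] / total_time[h] for h in total_time if total_time[h] > 0}


if __name__ == "__main__":
    sample = [(0, 1.0), (10, 3.0), (40, 2.0)]
    print(time_weighted_spread_by_hour(sample))
```

A plain average of the first two spreads in the sample would be 2.0, whereas the time-weighted figure is higher because the wider spread was in effect three times as long.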

And yes.. future blog posts will pay attention to these scenarios. But for now, to stay on topic (and prevent this blog post from approaching PhD thesis length), we’ll stick to configuring automation 🙂


Tick Data Collection & Storage: The Approach

Organising data is a very large, fairly complex and heavily researched area in its own right, and hence impossible to encapsulate in a blog post.. or a hundred for that matter 😉

Therefore, for the benefit of all levels of readers here, we’ll demonstrate a simple approach from a typical retail trader’s perspective, describing how to:

  1. Make minor modifications to the previous post’s indicator code (MetaTrader 4/5), such that it becomes possible to store each market day’s bid/ask/spread data separately.
  2. Write an R script that reads in CSV files, compresses them and deletes the raw data to conserve space.
  3. Create an automated BATCH process on a Windows Server 2012 VPS (or PC) to organize and compress data for later use.

As tick data files can become fairly large, depending on the sampling timeframe (seconds in MT4, milliseconds in MT5), this practice structures the data in an easily approachable, convenient way and conserves hard drive space.


Please note that this implementation isn’t the most space/time efficient or production-friendly.

Future posts will delve deeper into more robust techniques for storing time series data, replacing the CSV-based approach in this post with more professional practices involving e.g. HDF5, Feather and RDS file types, and/or MongoDB and MySQL databases.


What do you need?

1) One VPS (with at least 8 GB of hard drive space and 4 GB of RAM).

2) One MetaTrader 4 or 5 installation (dedicated to tick data collection).

3) R v3.3.3 or above. See here for installation instructions.

4) Rtools (the release that matches your R version). For file compression from within R.

5) Privileges on the VPS (or PC) to set up BATCH files to execute at predefined times via the Windows Task Scheduler.

Let’s begin! 🙂


1) Code Modifications

The MT4/MT5 indicators in the previous post were programmed such that all tick data for any length of time, for any given symbol, would be saved to one file (for each symbol).

Here’s what their logic looked like:

Diagram - Tick Data Collection (MQL Indicator) v1.0


Additionally, the OnCalculate() method in the previous version took control of the CSV file being written to, and did not relinquish access until the indicator was removed.

Ultimately, this simple example aims to give external tools (e.g. Python, R, etc.) access to the previous day’s CSV files.

To make this happen, we need to ensure MetaTrader has relinquished access to the file so other programs (or users) can access it.

Therefore, we’ll modify this behaviour such that tick data for each market day is saved into separate files, one file per symbol per day.


Here’s what the modified logic looks like:

Diagram - Tick Data (MetaTrader Indicator) v2.0
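As a rough illustration of the per-day rotation described above, the following Python sketch closes the current file and opens a new one whenever the date of an incoming tick changes. It is not a translation of the MQL indicator: the class name, the SYMBOL_YYYYMMDD.csv naming scheme and the column order are all assumptions made for the example.

```python
import os
import tempfile
from datetime import datetime


class DailyTickWriter:
    """Write ticks to one CSV per symbol per day, releasing the old file on rollover."""

    def __init__(self, directory, symbol):
        self.directory = directory
        self.symbol = symbol
        self.current_day = None
        self.handle = None

    def write_tick(self, when, bid, ask):
        day = when.date()
        if day != self.current_day:
            # New market day: release the previous file so other programs can read it.
            self.close()
            name = f"{self.symbol}_{day:%Y%m%d}.csv"  # hypothetical naming scheme
            path = os.path.join(self.directory, name)
            is_new = not os.path.exists(path)
            self.handle = open(path, "a", newline="")
            if is_new:
                self.handle.write("time,bid,ask,spread\n")
            self.current_day = day
        self.handle.write(f"{when.isoformat()},{bid},{ask},{ask - bid:.5f}\n")

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


def demo():
    """Write two ticks either side of midnight and list the files produced."""
    with tempfile.TemporaryDirectory() as folder:
        writer = DailyTickWriter(folder, "EURUSD")
        writer.write_tick(datetime(2018, 4, 12, 23, 59, 59), 1.2345, 1.2347)
        writer.write_tick(datetime(2018, 4, 13, 0, 0, 1), 1.2346, 1.2348)
        writer.close()
        return sorted(os.listdir(folder))


if __name__ == "__main__":
    print(demo())
```

Once the second tick arrives, the file for 12 April is closed and can be read, compressed or deleted by another process.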

Fully functional code implementing these changes has been uploaded to GitHub for your convenience.

For MetaTrader 4 users:
Click here to download DLabs_TickData_ToCSV_MT4_v2.0.mql4

For MetaTrader 5 users:
Click here to download DLabs_TickData_ToCSV_MT5_v2.0.mql5


Differences between MT4 and MT5 versions:

  1. Month(), Day() and Year() functions from MQL4 have been replaced with an MqlDateTime struct called date_time.
  2. TimeToStruct(TimeCurrent(), date_time) is called to populate the struct for later use in the code.

2) R script to compress tick data

As mentioned earlier, tick data files can grow very large over time.

Working with CSV files as we are in this example, it makes sense to compress the raw data and then delete it to conserve hard drive space.

We’ve written a functional R script to achieve this, heavily commented for your convenience and accessible via the download link below.

Please note: This script has been written in a manner to promote learning, and has not been overly optimized for speed. There are much faster ways of writing this functionality, but not without compromising code readability for novice or even intermediate level readers.

Here’s what the logic looks like:

R script to compress tick data CSV to ZIP
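The compress-then-delete step can be sketched as follows. Note that this is Python rather than R, and it is not the downloadable script: the function name is hypothetical, and the check of the archive before deleting the raw file is a precaution added for this example.

```python
import os
import tempfile
import zipfile


def compress_and_delete(csv_path, archive_dir):
    """Zip one CSV into archive_dir, verify the archive, then delete the raw file."""
    os.makedirs(archive_dir, exist_ok=True)
    base_name = os.path.splitext(os.path.basename(csv_path))[0]
    zip_path = os.path.join(archive_dir, base_name + ".zip")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(csv_path, arcname=os.path.basename(csv_path))

    # testzip() returns None when every member passes its CRC check.
    with zipfile.ZipFile(zip_path) as archive:
        if archive.testzip() is not None:
            raise IOError("Archive failed verification: " + zip_path)

    os.remove(csv_path)  # only reached once the archive has been verified
    return zip_path


def demo():
    """Compress a tiny CSV and report whether the raw file survived."""
    with tempfile.TemporaryDirectory() as folder:
        csv_path = os.path.join(folder, "EURUSD_20180412.csv")
        with open(csv_path, "w") as handle:
            handle.write("time,bid,ask,spread\n")
        zip_path = compress_and_delete(csv_path, os.path.join(folder, "ZIP_ARCHIVES"))
        with zipfile.ZipFile(zip_path) as archive:
            names = archive.namelist()
        return os.path.exists(csv_path), names


if __name__ == "__main__":
    print(demo())
```

Deleting only after verification means a failed or interrupted compression leaves the raw CSV in place for the next run.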


Additionally, this R script creates a time series of summarized spread data, and saves a compressed version of it to disk – one less thing for you to do 🙂

The summarized data for each time series record includes:

  1. Timestamp (in seconds or milliseconds)
  2. Minimum Spread
  3. Maximum Spread
  4. Average Spread
  5. Number of ticks observed.
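A summary with these five fields could be computed along the following lines. This Python sketch is an assumption-laden illustration rather than the R script’s actual logic: the function name, the (timestamp, spread) input layout and the bucket_seconds parameter are hypothetical, and the post does not state which aggregation interval the script uses.

```python
def summarise_spreads(ticks, bucket_seconds=1):
    """Summarise (timestamp, spread) pairs into min/max/average/count per time bucket.

    ticks: iterable of (timestamp_in_seconds, spread) pairs (assumed layout).
    bucket_seconds: width of each bucket; 1 groups ticks sharing the same second.
    """
    buckets = {}
    for timestamp, spread in ticks:
        key = timestamp - timestamp % bucket_seconds  # start of the bucket
        if key not in buckets:
            buckets[key] = {"min": spread, "max": spread, "total": 0.0, "ticks": 0}
        row = buckets[key]
        row["min"] = min(row["min"], spread)
        row["max"] = max(row["max"], spread)
        row["total"] += spread
        row["ticks"] += 1

    summary = []
    for key in sorted(buckets):
        row = buckets[key]
        summary.append({
            "timestamp": key,
            "min_spread": row["min"],
            "max_spread": row["max"],
            "avg_spread": row["total"] / row["ticks"],
            "ticks": row["ticks"],
        })
    return summary


if __name__ == "__main__":
    for record in summarise_spreads([(1, 1.0), (1, 3.0), (2, 2.0)]):
        print(record)
```

Passing bucket_seconds=60 would produce one record per minute instead of one per second.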

Notes:

  1. Summarized spread data is stored in a new directory for each market day, under \\MQLX\\Files, where X is 4 for MetaTrader 4 and 5 for MetaTrader 5.
  2. Compressed tick data is stored in a new directory called “ZIP_ARCHIVES”.

Directory Structure – Summarized Spread and Tick Data

 

Here’s the R script:
Click here to download dlabs-calculate-spread-statistics-from-tick-data.R

Important Notes:

  1. Before running the script, please edit the following line at the beginning of the script, replacing the path between the quotation marks with your own MetaTrader directory (use MQL5 in place of MQL4 for MetaTrader 5):
    working_directory <- "C:\\Users\\INSERT-WINDOWS-USERNAME\\AppData\\Roaming\\MetaQuotes\\Terminal\\LONG-ALPHANUMERIC-STRING\\MQL4\\Files"
  2. This directory location is accessible via MetaTrader 4/5 -> File menu -> Click Open Data Folder -> copy it from the address bar in Explorer.

3) Process Automation on a Windows VPS

There are plenty of tutorials online on how to schedule tasks on Windows Server 2012 instances. However, here is a quick list of steps to get you there 🙂

As the R script above is configured to look for all tick data files from YESTERDAY, we want to create a Scheduled Task to do the following:

  1. Run a Windows BATCH (.bat) file that executes our R script.
  2. Do this every day a little after the close of the previous market day.
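The “files from yesterday” selection that this schedule relies on can be illustrated with a short Python sketch. It assumes file names contain the date as YYYYMMDD (for example EURUSD_20180412.csv); that naming scheme and the function name are hypothetical, so adjust the pattern to whatever your indicator actually writes.

```python
from datetime import date, timedelta


def yesterdays_files(file_names, today):
    """Return the CSV file names stamped with the day before `today`.

    Assumes each name embeds its date as YYYYMMDD (hypothetical naming scheme).
    """
    stamp = (today - timedelta(days=1)).strftime("%Y%m%d")
    return sorted(
        name for name in file_names
        if name.lower().endswith(".csv") and stamp in name
    )


if __name__ == "__main__":
    names = [
        "EURUSD_20180412.csv",
        "EURUSD_20180413.csv",
        "GBPUSD_20180412.csv",
        "notes.txt",
    ]
    print(yesterdays_files(names, today=date(2018, 4, 13)))
```

Because only the previous day’s files are matched, the file MetaTrader is currently writing to is never touched.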

The .bat file needs just two lines of code:

@echo off
"C:\Program Files\R\R-3.3.3\bin\R.exe" CMD BATCH "C:\Users\YOUR-WINDOWS-USERNAME\Desktop\dlabs-calculate-spread-statistics-from-tick-data.R"

Simply open Notepad or your favourite text editor – though you should really be using Notepad++, so much better … 😉 – edit the second line above to reflect the correct file path to the R script, and save the file as script.bat.


Now, assuming that:

  • You’ve downloaded the R script dlabs-calculate-spread-statistics-from-tick-data.R to your Desktop
  • The path to your .bat script is “C:\Users\YOUR-WINDOWS-USERNAME\Desktop\script.bat”
  • The path to R.exe is “C:\Program Files\R\R-3.3.3\bin\R.exe”

..follow these steps on a Windows Server 2012 instance:

  1. Run the Task Scheduler from Control Panel -> Administrative Tools -> Task Scheduler
  2. Under Actions -> Click “Create Task”
  3. Enter “Automate Tick Data CSV to ZIP Script” in the “Name” field
  4. Under Security Options -> Select “Run whether user is logged on or not”.
  5. Click on Triggers -> New.
  6. Set “Begin the task” to “On a schedule”.
  7. Under Settings -> Select “Daily” and set the time for the process to execute every day automatically.
  8. Under Advanced Settings, make sure only “Enabled” is checked -> Click OK.
  9. Click on Actions -> New.
  10. Set “Action:” to “Start a program”.
  11. Enter “C:\Users\YOUR-WINDOWS-USERNAME\Desktop\script.bat” in the “Program/Script” text box -> Click OK.
  12. Click OK once back to the main “Create Task” screen.

The script will now run every day at the time you specified in step (7) above.
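For readers who prefer the command line, Windows also ships the schtasks utility, which can create a comparable daily task. The Python helper below only assembles the argument list and prints it; it does not create anything. The helper name and the 00:30 start time are assumptions for the example, and the sketch does not configure the “Run whether user is logged on or not” option from step (4), which would still need to be set separately.

```python
def build_schtasks_command(task_name, bat_path, start_time):
    """Assemble a `schtasks /Create` argument list for a daily task.

    start_time uses 24-hour HH:MM format. Nothing is executed here.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # task name
        "/TR", bat_path,    # program or script to run
        "/SC", "DAILY",     # schedule type
        "/ST", start_time,  # start time
    ]


if __name__ == "__main__":
    command = build_schtasks_command(
        "Automate Tick Data CSV to ZIP Script",
        r"C:\Users\YOUR-WINDOWS-USERNAME\Desktop\script.bat",
        "00:30",  # arbitrary example time
    )
    print(command)
    # To actually create the task on Windows, the list could be passed to
    # subprocess.run(command, check=True) from an elevated prompt.
```

The .bat path used here contains no spaces; a path with spaces may need additional quoting when passed to /TR.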


Conclusion

Future posts will expand on the functionality presented here, for example by discussing more practical, production-friendly ways of both organizing and storing time series data.

We’ll also demonstrate (with Python, R and C++ code) some of the use-cases for acquiring tick and spread data as discussed in this post.


As always, we hope you’ve enjoyed this tutorial, and look forward to any feedback you may have for us 🙂

Code samples have been kept as simple as possible so that you can extend them freely and exercise your creative freedom as best as you can!

Should you have ideas/feedback on how we can extend this implementation further, please do feel free to leave a comment below – we’ll try our best to either release an update, or resolve your query directly.


You may also wish to read:

  1. How to download tick-data in MetaTrader 4 & 5
  2. Sharpe Ratio – A Reliable Measure of Performance?
  3. Working with DARWIN Time Series Data in R (MLD-II)
  4. Setting up a DARWIN Data Science Environment
  5. Currency Index Indicator for MetaTrader
  6. DO’s and DONT’s of MetaTrader Backtesting
  7. ZeroMQ – How to Interface Python/R with MetaTrader
  8. Quantitative Modeling for Algorithmic Traders