Automated Tick Data Collection & Storage with R, MetaTrader, and a VPS

Let’s continue where we left off in our last post on tick data collection.
If you missed it, it’s important that you familiarise yourself with its contents first.

Here’s the link again:
https://blog.darwinex.com/download-tick-data-metatrader/

Therein, we promised a follow-up post that would discuss an approach for retail traders to automate tick data collection with the R programming language, MetaTrader and a VPS instance.

This would ensure that:

The collection process runs 24×7 without interruption (barring technical issues, VPS downtime and/or MetaTrader login issues).
All ticks processed are saved to disk, so data already collected survives events such as restarts, power outages, downtime due to security-related issues, etc.
A repository of tick data is therefore always available to serve a variety of purposes down the line.
It becomes possible to feed stored tick/spread data to non-MetaTrader environments such as Python, R, Java, Julia and C/C++/C# to enable more sophisticated backtesting, exploratory research and/or machine learning.

Of course, it isn’t mandatory to use a VPS. If you’re able to ensure that the process can run uninterrupted on a local PC or laptop, that’s fine too.

Before we continue, you’ll remember that BID/ASK Spread was one of the variables calculated and subsequently stored in the output CSV last time.

Algorithmic traders (especially short-term ones) often build maximum spread thresholds into their trading strategies.
The most common use is to block trading signals from being executed when asset spreads are too wide, since the extra cost could eat into returns quite considerably (with particularly grave consequences for traders targeting small gains).
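As a rough illustration of such a threshold, the R sketch below derives the spread from a bid/ask quote and flags whether it sits at or below a cap. The function name, the pip size and the 1.5 pip cap are all made up for this example; none of them come from the indicator code.

```r
# Hypothetical helper: TRUE when the quoted spread is within the allowed maximum.
# pip_size = 0.0001 assumes a 4/5-digit FX quote such as EURUSD.
spread_within_limit <- function(bid, ask, max_spread_pips, pip_size = 0.0001) {
  spread_pips <- (ask - bid) / pip_size
  spread_pips <= max_spread_pips
}

# Three made-up quotes with spreads of roughly 1.2, 3.5 and 1.1 pips.
quotes <- data.frame(
  bid = c(1.10000, 1.10010, 1.10020),
  ask = c(1.10012, 1.10045, 1.10031)
)
quotes$tradable <- spread_within_limit(quotes$bid, quotes$ask, max_spread_pips = 1.5)
print(quotes)
```

With a 1.5 pip cap, only the second quote is flagged as not tradable.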

However, this isn’t the only purpose spread data can serve.

Organizing and studying spread data can further assist traders in a number of ways:

Monitoring the evolution of time-weighted spread distributions, e.g. discovering notable, potentially exploitable recurring patterns.
Drawing practical inferences from the same, e.g. time ranges wherein spreads are favourable or otherwise.
Studying the correlation of a strategy’s historical returns, volatility and/or other metrics to its underlying assets’ spread profile.
Creating trading strategies that target particular spread profile behaviours.

.. to name a few.
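To make "time-weighted" a little more concrete, here is a small R sketch on made-up ticks. It weights each spread by the number of seconds it stayed in force (until the next tick arrived) and averages per hour. The column names and values are assumptions for illustration only, and a spread that is held across an hour boundary is simply attributed to the hour in which it started.

```r
# Made-up ticks; in practice these would come from a stored tick CSV.
ticks <- data.frame(
  time = as.POSIXct(c("2018-08-20 09:00:00", "2018-08-20 09:00:10",
                      "2018-08-20 09:00:40", "2018-08-20 10:00:00",
                      "2018-08-20 10:00:30"), tz = "UTC"),
  spread = c(1.0, 3.0, 1.0, 2.0, 4.0)
)

# Seconds each spread was in force. The last tick has no successor, so it is dropped.
held_for <- as.numeric(diff(ticks$time), units = "secs")
weighted <- data.frame(
  hour   = format(head(ticks$time, -1), "%H"),
  spread = head(ticks$spread, -1),
  secs   = held_for
)

# Time-weighted mean spread per hour of day.
tw_mean <- sapply(split(weighted, weighted$hour),
                  function(h) sum(h$spread * h$secs) / sum(h$secs))

# Plain per-tick mean, for comparison.
simple_mean <- sapply(split(weighted$spread, weighted$hour), mean)

print(tw_mean)
print(simple_mean)
```

In the 09:00 hour the brief 3.0 spike lasts only 30 seconds, so it barely moves the time-weighted figure while it pulls the per-tick mean up noticeably.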
And yes.. future blog posts will pay attention to these scenarios. But for now, to stay on topic (and prevent this blog post from approaching PhD thesis length), we’ll stick to configuring automation 🙂

Tick Data Collection & Storage: The Approach

Organising data is a very large, fairly complex and heavily researched area on its own -> hence impossible to encapsulate in a blog post.. or a hundred for that matter 😉
Therefore, for the benefit of all levels of readers here, we’ll demonstrate a simple approach from a typical retail trader’s perspective, describing how to:

Make minor modifications to the previous post’s indicator code (MetaTrader 4/5), such that it becomes possible to store each market day’s bid/ask/spread data separately.
Write an R script that reads in CSV files, compresses them and deletes the raw data to conserve space.
Create an automated BATCH process on a Windows Server 2012 VPS (or PC) to organize and compress data for later use.

As tick data files can become fairly large, depending on the sampling timeframe (seconds in MT4, milliseconds in MT5), this practice structures the data in an easily approachable, convenient way and conserves hard drive space.

Please note that this implementation isn’t the most space/time efficient or production-friendly.
Future posts will delve deeper into more robust techniques for storing time series data, replacing the CSV-based approach in this post with more professional practices involving e.g. HDF5, Feather and RDS file types, and/or MongoDB and MySQL databases.

What do you need?

1) One VPS (with at least 8 GB of hard drive space and 4 GB of RAM).
2) One MetaTrader 4 or 5 installation (dedicated to tick data collection).
3) R v3.3.3 or above. See here for installation instructions.
4) Rtools, in a release compatible with your R version (e.g. Rtools 3.3 or 3.4 for R 3.3.x). It provides the zip utility used for file compression from within R.
5) Privileges on the VPS (or PC) to set up BATCH files to execute at predefined times via the Windows Task Scheduler.
Let’s begin! 🙂

1) Code Modifications

The MT4/MT5 indicators in the previous post were programmed such that all tick data for any length of time, for any given symbol, would be saved to one file (for each symbol).

Additionally, the OnCalculate() function in the previous version takes control of the CSV file it writes to and does not relinquish access until the indicator is removed.

Ultimately, this simple example aims for any external tools (e.g. Python, R, etc.) to have access to the previous day’s CSV files.
To make this happen, we need to ensure MetaTrader has relinquished access to the file so other programs (or users) can access it.
Therefore, we’ll modify this behaviour such that tick data for each market day is saved into separate files, one file per symbol per day.

Fully functional code implementing these changes has been uploaded to GitHub for your convenience.
For MetaTrader 4 users:
Click here to download DLabs_TickData_ToCSV_MT4_v2.0.mql4
For MetaTrader 5 users:
Click here to download DLabs_TickData_ToCSV_MT5_v2.0.mql5

Differences between MT4 and MT5 versions:

Month(), Day() and Year() functions from MQL4 have been replaced with an MqlDateTime struct called date_time.
TimeToStruct(TimeCurrent(), date_time) is called to populate the struct for later use in the code.

2) R script to compress tick data

As mentioned earlier, tick data files can grow very large over time.
Working with CSV files as we are in this example, it makes sense to compress the raw data and then delete it to conserve hard drive space.
We’ve written a functional R script to achieve this, heavily commented for your convenience and accessible via the download link below.

Please note: This script has been written to promote learning and has not been optimized for speed. There are much faster ways of writing this functionality, but not without compromising code readability for novice or even intermediate level readers.

Additionally, this R script creates a time series of summarized spread data and saves a compressed version of it to disk, so that's one less thing for you to do 🙂
The summarized data for each time series record includes:

Timestamp (in seconds or milliseconds)
Minimum Spread
Maximum Spread
Average Spread
Number of ticks observed.
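A minimal sketch of this kind of summary in base R is shown below. It is not the downloadable script itself: the column names (timestamp, spread), the timestamp format and the function name are assumptions for illustration and may differ from what the indicator writes.

```r
# Made-up tick records with assumed column names.
ticks <- data.frame(
  timestamp = c("2018.08.20 09:00:01", "2018.08.20 09:00:01",
                "2018.08.20 09:00:01", "2018.08.20 09:00:02"),
  spread = c(1.2, 1.6, 1.0, 2.0),
  stringsAsFactors = FALSE
)

# One summary row per timestamp: minimum, maximum, average and tick count.
summarise_spread <- function(ticks) {
  groups <- split(ticks$spread, ticks$timestamp)
  data.frame(
    timestamp  = names(groups),
    min_spread = vapply(groups, min, numeric(1)),
    max_spread = vapply(groups, max, numeric(1)),
    avg_spread = vapply(groups, mean, numeric(1)),
    tick_count = vapply(groups, length, integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

spread_summary <- summarise_spread(ticks)
print(spread_summary)
```

Here the three ticks stamped 09:00:01 collapse into one record and the single tick at 09:00:02 into another.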

Notes:

Summarized spread data is stored in a new directory for each market day, under \MQLX\Files, where X is 4 for MetaTrader 4 and 5 for MetaTrader 5.
Compressed tick data is stored in a new directory called “ZIP_ARCHIVES”.
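The compress-then-delete step could be sketched roughly as follows. Please note the substitution: this example uses gzip through base R's gzfile() instead of ZIP archives, purely so that it runs without an external zip executable. The function name is hypothetical, and only the "ZIP_ARCHIVES" directory name is taken from the notes above.

```r
# Compress one CSV with gzip into an archive directory, then delete the original,
# but only after the compressed copy has been read back and matches line for line.
compress_and_remove <- function(csv_path,
                                archive_dir = file.path(dirname(csv_path), "ZIP_ARCHIVES")) {
  dir.create(archive_dir, showWarnings = FALSE)
  gz_path <- file.path(archive_dir, paste0(basename(csv_path), ".gz"))
  original <- readLines(csv_path)

  out <- gzfile(gz_path, open = "w")
  writeLines(original, out)
  close(out)

  check <- gzfile(gz_path, open = "r")
  restored <- readLines(check)
  close(check)

  if (!identical(original, restored)) {
    stop("Compressed copy differs from the original: ", csv_path)
  }
  file.remove(csv_path)
  gz_path
}

# Demonstration on a throwaway file in the session's temp directory.
demo_csv <- file.path(tempdir(), "EURUSD_demo.csv")
writeLines(c("time,bid,ask,spread",
             "2018.08.20 09:00:01,1.10000,1.10012,1.2"), demo_csv)
archive <- compress_and_remove(demo_csv)
print(archive)
```

Verifying the archive before deleting the raw file costs an extra read, but it means a failed write never leaves you with neither copy.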

Here’s the R script:
Click here to download dlabs-calculate-spread-statistics-from-tick-data.R

Important Notes:

Before running the script, please edit the following line at the beginning of the script and replace the path between the quotation marks with your own MetaTrader directory (use MQL5 in place of MQL4 for MetaTrader 5):
working_directory <- "C:\\Users\\INSERT-WINDOWS-USERNAME\\AppData\\Roaming\\MetaQuotes\\Terminal\\LONG-ALPHANUMERIC-STRING\\MQL4\\Files"
This directory location is accessible via MetaTrader 4/5 -> File menu -> Click Open Data Folder -> copy it from the address bar in Explorer.

3) Process Automation on a Windows VPS

There are plenty of tutorials online on how to schedule tasks on Windows Server 2012 instances. However, here is a quick list of steps to get you there 🙂
As the R script above is configured to look for all tick data files from YESTERDAY, we want to create a Scheduled Task to do the following:

Run a Windows BATCH (.bat) file that executes our R script.
Do this every day a little after the close of the previous market day.
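For reference, picking out "yesterday's" files in R might look something like the sketch below. The SYMBOL_YYYY.MM.DD.csv naming scheme and the function name are assumptions made for this example, so check the file names your indicator actually produces. Also bear in mind that Sys.Date() follows the machine's local clock, which may not match your broker's server time.

```r
# Return the files in `directory` whose names end in _YYYY.MM.DD.csv for `day`.
# The naming scheme is hypothetical.
files_for_day <- function(directory, day) {
  suffix <- paste0("_", format(day, "%Y.%m.%d"), ".csv")
  candidates <- list.files(directory, full.names = TRUE)
  candidates[endsWith(basename(candidates), suffix)]
}

# Demonstration with two empty files in a throwaway directory.
demo_dir <- file.path(tempdir(), "tick_demo")
dir.create(demo_dir, showWarnings = FALSE)
today <- Sys.Date()
yesterday <- today - 1
demo_names <- paste0("EURUSD_", format(c(today, yesterday), "%Y.%m.%d"), ".csv")
file.create(file.path(demo_dir, demo_names))

found <- files_for_day(demo_dir, yesterday)
print(basename(found))
```

Matching on a literal suffix with endsWith() avoids treating the dots in the date as regular expression wildcards.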

The .bat file needs just two lines of code:
@echo off
"C:\Program Files\R\R-3.3.3\bin\R.exe" CMD BATCH "C:\Users\YOUR-WINDOWS-USERNAME\Desktop\dlabs-calculate-spread-statistics-from-tick-data.R"
Simply open Notepad or your favourite text editor – though you should really be using Notepad++, so much better … 😉 – edit the second line above to reflect the correct file path to the R script, and save the file as script.bat.

Now, assuming that:

You’ve downloaded the R script dlabs-calculate-spread-statistics-from-tick-data.R to your Desktop
The path to your .bat script is “C:\Users\YOUR-WINDOWS-USERNAME\Desktop\script.bat”
The path to R.exe is “C:\Program Files\R\R-3.3.3\bin\R.exe”

..follow these steps on a Windows Server 2012 instance:

1) Run the Task Scheduler from Control Panel -> Administrative Tools -> Task Scheduler.
2) Under Actions -> Click “Create Task”.
3) Enter “Automate Tick Data CSV to ZIP Script” in the “Name” field.
4) Under Security Options -> Select “Run whether user is logged on or not”.
5) Click on Triggers -> New.
6) Set “Begin the task” to “On a schedule”.
7) Under Settings -> Select “Daily” and set the time for the process to execute every day automatically.
8) Under Advanced Settings, make sure only “Enabled” is checked -> Click OK.
9) Click on Actions -> New.
10) Set “Action:” to “Start a program”.
11) Enter “C:\Users\YOUR-WINDOWS-USERNAME\Desktop\script.bat” in the “Program/Script” text box -> Click OK.
12) Click OK once back to the main “Create Task” screen.

The script will now run every day at the time you specified in step (7) above.

Conclusion

Future posts will expand on the functionality presented in this post, such as discussing more practical, production-friendly ways of both organizing and storing time series data.
We’ll also demonstrate (with Python, R and C++ code) some of the use-cases for acquiring tick and spread data as discussed in this post.

As always, we hope you’ve enjoyed this tutorial, and look forward to any feedback you may have for us 🙂

Code samples have been kept as simple as possible so that you can extend them freely and exercise your creative freedom as best as you can!

Should you have ideas/feedback on how we can extend this implementation further, please do feel free to leave a comment below – we’ll try our best to either release an update, or resolve your query directly.

Also, kindly share this post using the buttons provided, with any colleagues and/or networks you feel would benefit from the content. Or just share it anyway to help us spread the word! 🙂

4 Comments

Rimantas

Posted August 25, 2018 at 10:31 pm

Hi, can the script download missed data if the server or PC has failures? Also, does ZeroMQ do the same, i.e. write data from MT4 to a file or database?

- Post Author
  
  The Market Bull
  
  Posted September 3, 2018 at 6:08 pm
  
  Hi Rimantas,
  Thank you for your question.
  This would depend on the logic of the functionality where you employ ZeroMQ. To poll missing data in the event of connectivity issues or other failures, you’d need to write the relevant code to achieve that outcome.
  ZeroMQ is itself a messaging framework. Depending on your programming language, if it has a supported binding with ZeroMQ, then you can employ ZeroMQ to perform such I/O.
  Hope this helps answer your question.
  
  - Rimantas
    
    Posted September 9, 2018 at 7:08 am
    
    I came across your code, didn’t find any calls to ZeroMQ from indicator code.
    
    - Post Author
      
      The Market Bull
      
      Posted September 13, 2018 at 3:29 pm
      
      Hi Rimantas,
      The code in this post does not reference ZeroMQ – it is for storing data via MQL and post-processing via R.
      The blog post for interfacing Python with MetaTrader via ZeroMQ is here:
      https://blog-test2.darwinex.com/zeromq-interface-python-r-metatrader4/
      