Setting up a DARWIN Data Science Environment in Windows, Linux & macOS

This post describes how to set up a data science environment for DARWIN R&D.

Whether you’re a Data Scientist, Quant, Trader, Investor, Researcher, Developer or just someone keen on putting the DARWIN asset class under a scientific microscope, the contents of this post should hopefully give you a sound start.

The tools, libraries and datasets referenced herein are free to download, and are used by the Darwinex Labs team itself in its day-to-day work.


For your convenience, the rest of this post is structured as follows:

  1. Data Science Environment (Requirements & Setup)
  2. Required Data Science Libraries/Packages
  3. DARWIN Datasets (where & how to get them)

Data Science Environment (Requirements & Setup)

Python and R for Data Science

At the end of the day, all researchers have their own preferred R&D stack. For the purposes of this post, however, we’ve chosen Python, R and C++ as the programming language base for our environment.

Why?

  1. Python -> an easy-to-learn yet powerful programming language with a large base of core libraries for machine learning, AI and statistical research.
  2. R -> a free, robust alternative to commercial statistical research environments like MATLAB.
  3. C++ -> for enhancing performance, particularly for mathematically intensive calculations on large datasets.

Readers are of course most welcome to either extend this or craft a different stack should they so wish.


Requirements

For each language in our data science environment, we need the following:

  1. Python -> Anaconda® Distribution – a free package and environment manager for Python developers.
  2. R -> Base R v3.3.2 or later, and RStudio Desktop – a fantastic IDE for code editing and visualization in R.
  3. C++ -> Rtools, for compiling external code modules in C++ for subsequent use in R when necessary.

Setup Instructions

  1. Python: Download and install the Anaconda Distribution, selecting a Python 3 release (this post originally specified Python v2.7, which reached end of life in January 2020). It ships with the Spyder IDE for code editing and visualization in Python, as well as Jupyter Notebook for documenting and sharing your research with colleagues, academia, etc.
  2. R: First download and install R v3.3.2 or later via the link above. Once installed, download and install RStudio via its link above.
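  3. C++: Lastly, on Windows, download and install the Rtools release that matches your version of R (e.g. the one for R v3.3.x if you installed R v3.3.2), for compiling external C++ code for use within R scripts. Rtools is Windows-only; on macOS the Xcode Command Line Tools provide the compiler, and on Linux a system toolchain such as g++ does.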

Required Data Science Libraries / Packages

For Python:

We will initially require the following libraries:

Pandas: for data analysis, processing, restructuring and cleansing.

NumPy: for numerical and scientific computation using high performance data structures and vectorized mathematics.

SciPy: builds on NumPy with additional algorithms for optimization, integration, interpolation, linear algebra, signal processing and statistics.

Matplotlib: for 2D and 3D graphics.

scikit-learn: an extremely well-documented, robust and well-supported machine learning library.

Fortunately, all five ship with Anaconda and are installed by default when you install the Anaconda Distribution.
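A quick way to confirm that the five libraries are present is to ask the installed package metadata for their versions. This is a minimal sketch that assumes Python 3.8 or later (for `importlib.metadata`). Note that scikit-learn’s distribution name is `scikit-learn`, even though it is imported as `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names (what pip/conda install), not import names:
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
CORE_PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(CORE_PACKAGES).items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be added with `conda install <name>`.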


For R:

The following packages are essential to a lot of the research you’ll end up doing on DARWIN datasets:

R.utils, plotly, data.table, PerformanceAnalytics, TTR, xts, anytime, pracma, urca, forecast, tseries, stats, PortfolioAnalytics, RCurl, jsonlite, zoo, snow, sm, profr, proftools, MonteCarlo, microbenchmark, astsa, Rcpp, RcppArmadillo, RcppParallel, doParallel, inline, rbenchmark, knitr, plyr, corrplot, network, sna, ggplot2, GGally, xlsx.


Note: A convenient way to download, install and load all of these packages is to run the following code in an R console or in the RStudio console:

# Install pacman (a package-management helper) if it is not already available.
if (!require("pacman")) install.packages("pacman")

# Define list of packages required for this project.

package.list <- c("R.utils", "plotly", "data.table", "PerformanceAnalytics",
  "TTR", "xts", "anytime", "pracma", "urca", "forecast", "tseries", "stats", "PortfolioAnalytics",
  "RCurl", "jsonlite", "zoo", "snow", "sm", "profr", "proftools",
  "MonteCarlo", "microbenchmark", "astsa", "Rcpp", "RcppArmadillo", "RcppParallel",
  "doParallel", "inline", "rbenchmark", "knitr", "plyr", "corrplot",
  "network", "sna", "ggplot2", "GGally", "xlsx")

# Summon Pacman! Installs any missing packages (install=TRUE), leaves
# already-installed ones as they are (update=FALSE), then loads them all.
pacman::p_load(char=package.list, install=TRUE, update=FALSE)


DARWIN Datasets (where and how to get them)

Once the steps above are completed successfully, all we need is a DARWIN dataset to begin!

At present, data up to November 29, 2017 for DARWIN $DWC’s quotes is available via the Darwinex Labs GitHub profile, at both Daily (D1) and 1-minute (M1) granularity.

We periodically update this dataset on GitHub, so check back every week or so for updates. And yes, we are working on an API that will make on-demand data access a lot simpler (watch this space!).

You may download this data directly from GitHub in two ways:

1) Right-click & Save As on this link for 1-minute (M1) data and this link for Daily (D1) data, or

2) Execute the following code in an R console or in the RStudio console:

library(data.table)

# colClasses="character" reads every column as text, so nothing is converted on import.
DWC.M1.GitHub <- fread("https://github.com/darwinex/DarwinexLabs/blob/master/datasets/community-darwins/DWC.M1.QUOTES.29.11.2017.csv?raw=true", colClasses="character")

DWC.D1.GitHub <- fread("https://github.com/darwinex/DarwinexLabs/blob/master/datasets/community-darwins/DWC.D1.QUOTES.29.11.2017.csv?raw=true", colClasses="character")

Column #1 contains the Timestamp in POSIXct format, and Column #2 contains the Quote in the deepest available decimal precision.
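For readers working in Python, a minimal standard-library sketch of the same download follows. It reuses the D1 URL from the R example, assuming the file is still published at that path. Like `colClasses="character"`, it keeps both columns as strings. It then converts quotes to `decimal.Decimal` rather than `float`, so the deepest available decimal precision is not rounded away. Whether the CSV has a header row isn’t stated, so `quotes_as_decimal()` simply skips any row whose quote isn’t numeric. The timestamp column is left as text for you to parse once you have inspected it.

```python
import csv
import io
import urllib.request
from decimal import Decimal, InvalidOperation

# Same file as the R example; "?raw=true" asks GitHub for the raw CSV.
DWC_D1_URL = ("https://github.com/darwinex/DarwinexLabs/blob/master/datasets/"
              "community-darwins/DWC.D1.QUOTES.29.11.2017.csv?raw=true")


def parse_quotes(text_stream):
    """Return (timestamp, quote) string pairs, like colClasses="character"."""
    rows = []
    for row in csv.reader(text_stream):
        if len(row) >= 2:            # skip blank or truncated lines
            rows.append((row[0], row[1]))
    return rows


def quotes_as_decimal(rows):
    """Convert quote strings to Decimal, keeping every digit of precision."""
    converted = []
    for timestamp, quote in rows:
        try:
            converted.append((timestamp, Decimal(quote)))
        except InvalidOperation:     # e.g. a header row such as "quote"
            continue
    return converted


def download_quotes(url=DWC_D1_URL):
    """Fetch a quotes CSV over HTTPS and parse it with parse_quotes()."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_quotes(io.StringIO(text))
```

Usage: `d1 = quotes_as_decimal(download_quotes())`. Swap in the M1 URL from the R example for minute data.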


Additional Resource: (Video) Setting up a DARWIN Data Science Environment

ZeroMQ - Distributed Trading Infrastructure

ZeroMQ – How To Interface Python/R with MetaTrader 4

In this post, we present a technique that employs ZeroMQ (an open-source, asynchronous messaging library and concurrency framework) to build a basic, but easily extensible, high-performance bridge between external (non-MQL) programming languages and MetaTrader 4.

Reasons for writing this post:

  1. Lack of comprehensive, publicly available literature about this topic on the web.
  2. Traders have traditionally relied on Winsock/WinAPI-based solutions, which often need revising whenever Microsoft™ or MetaQuotes™ ships an update.
  3. The usual alternatives to ZeroMQ are named pipes and bridges that rely on shared files in the filesystem to connect MetaTrader with external languages.

Click below to watch the video tutorials:

1) How to Interface Python Trading Strategies with MetaTrader

2) Algorithmic Trading via ZeroMQ: Trade Execution, Reporting & Management

3) Algorithmic Trading via ZeroMQ: Subscribing to Market Data

4) Build Algorithmic Trading Strategies with Python & ZeroMQ: Part 1

5) Build Algorithmic Trading Strategies with Python & ZeroMQ: Part 2


In this blog post, we lay the foundation for a distributed trading system that will:

  1. Consist of one or more trading strategies developed outside MetaTrader 4 (non-MQL),
  2. Use MetaTrader 4 for acquiring market data, trade execution and management,
  3. Support multiple non-MQL strategies interfacing with MetaTrader 4 simultaneously,
  4. Consider each trading strategy as an independent “Client”,
  5. Consider MetaTrader 4 as the “Server” and the medium to market,
  6. Permit both Server and Clients to communicate with each other on-demand.

Infographic: ZeroMQ-Enabled Distributed Trading Infrastructure (with MetaTrader 4)

Why ZeroMQ?

  1. Enables programmers to connect any code to any other code, in a number of ways.
  2. Eliminates a MetaTrader user’s dependency on just MetaTrader-supported technology (features, indicators, language constructs, libraries, etc.)
  3. Traders can develop indicators and strategies in C/C#/C++, Python, R and Java (to name a few), and deploy to market via MetaTrader 4.
  4. Traders can leverage machine learning toolkits in Python and R for complex data analysis and strategy development, while interfacing with MetaTrader 4 for trade execution and management.
  5. ZeroMQ can be used as a high-performance transport layer in sophisticated, distributed trading systems otherwise difficult to implement in MQL.
  6. Different strategy components can be built in different languages if required, and seamlessly talk to each other over TCP, in-process, inter-process or multicast protocols.
  7. It supports multiple communication patterns and disconnected operation.

ZeroMQ: Supported Programming Languages

Though we focus on MQL interfaced with Python & R in this post, the basic process described here can be implemented easily in other ZeroMQ-supported languages.

A comprehensive list of ZeroMQ language bindings is available here:

ZeroMQ Language Bindings


Who else is using ZeroMQ?

AT&T, Cisco, EA, Los Alamos Labs, NASA, Weta Digital, Zynga, Spotify, Samsung Electronics, Microsoft, CERN and Darwinex Labs.

ZeroMQ also powers at least 5 DARWINs on the DARWIN Exchange, whose underlying trading strategies were written in C++, Python and R.


Planning Flow Control

This post is not intended to be a detailed tutorial on ZeroMQ.

However, it is still important to understand a few things about ZeroMQ that make it particularly suited to the task of connecting external programming languages such as Python and R to MetaTrader 4.

  • It supports TCP, inter-process (IPC), in-process and multicast (PGM and EPGM) transports. We will use the TCP transport for the implementation in this post.
  • ZeroMQ enables servers and clients to connect “to each other” on demand, which is particularly useful for designing distributed trading infrastructure.
  • In addition to supporting asynchronous communication and disconnected operation, ZeroMQ provides several communication patterns for higher-level data transfer, freeing programmers to focus on the transfer logic rather than on low-level mechanisms.
  • These patterns include: Request (REQ) / Reply (REP), Publish (PUB) / Subscribe (SUB) and Push (PUSH) / Pull (PULL).

For the implementation in this blog post, we will employ ZeroMQ’s REQ/REP and PUSH/PULL communication patterns. MetaTrader 4 will be our “Server”, and trading strategies will be “Clients”.

Please note that this arrangement (MT4 = Server, Strategy = Client) is not a requirement; decide on whichever flow control best suits your particular needs.

For example, you might designate a separate machine, independent of both the trading strategies and MetaTrader 4, as your Server, and have the Strategies and MT4 both act as Clients. There are several ways to achieve the end goal; careful flow-control planning leads to an efficient design.

Request (REQ) / Reply (REP) Pattern

The Server (MetaTrader 4 EA) will employ a TCP socket of type REP to receive requests and send responses. A REP socket must strictly alternate its calls: first a receive, then a send.

The Client (Trading Strategy, e.g. in Python) will employ a TCP socket of type REQ to send requests and receive responses. A REQ socket must also alternate strictly, in the opposite order: first a send, then a receive.

For this implementation, the REQ/REP pattern lets our Clients send commands to the MetaTrader 4 Server and receive acknowledgements of them (e.g. OPEN/MODIFY/CLOSE trades, GET BID/ASK RATES, GET HISTORICAL PRICES, etc.).
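To make the alternation rule concrete, here is a minimal pyzmq sketch of both ends of a REQ/REP pair, with a Python thread standing in for the MT4 EA’s REP socket. It is not the published connector’s code. The `inproc://mt4-rep` endpoint, the `ACK|<command>` reply and the command strings are hypothetical placeholders. Against a real EA, the client would instead `connect()` to the EA’s TCP endpoint (for example `tcp://localhost:5555`, an assumed port).

```python
import threading

import zmq


def run_rep_server(ctx, endpoint, n_requests, ready):
    """Stand-in for the EA's REP socket: always receive first, then send."""
    rep = ctx.socket(zmq.REP)
    rep.bind(endpoint)
    ready.set()  # signal that the endpoint is bound
    try:
        for _ in range(n_requests):
            command = rep.recv_string()        # 1) receive one request
            rep.send_string(f"ACK|{command}")  # 2) send exactly one reply
    finally:
        rep.close()


def send_command(req, command, timeout_ms=1000):
    """Client side: a REQ socket must send first, then receive."""
    req.send_string(command)                   # 1) send one request
    if not req.poll(timeout_ms, zmq.POLLIN):   # wait for the reply
        # After a timeout this REQ socket is still waiting for a reply and
        # will refuse another send; close and recreate it before retrying.
        raise TimeoutError(f"no reply to {command!r} within {timeout_ms} ms")
    return req.recv_string()                   # 2) receive one reply


if __name__ == "__main__":
    ctx = zmq.Context.instance()
    ready = threading.Event()
    server = threading.Thread(
        target=run_rep_server, args=(ctx, "inproc://mt4-rep", 2, ready))
    server.start()
    ready.wait()

    req = ctx.socket(zmq.REQ)
    req.connect("inproc://mt4-rep")
    print(send_command(req, "GET_BID_ASK|EURUSD"))  # ACK|GET_BID_ASK|EURUSD
    print(send_command(req, "CLOSE_TRADE|12345"))   # ACK|CLOSE_TRADE|12345
    req.close()
    server.join()
```

Because `send_command` always pairs one send with one receive, it never trips the REQ state machine. Calling `req.send_string()` twice in a row would instead raise a `zmq.ZMQError`.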

Push (PUSH) / Pull (PULL) Pattern

The Server (MetaTrader 4 EA) will also employ a second, PUSH socket to send additional information to Clients (Trading Strategies). This is a one-way socket: the server can only send data through it and cannot receive anything back on the same socket.

The Client (Trading Strategy) will also employ a second, PULL socket to receive additional information from the Server. This too is a one-way socket: the client can only receive data through it and cannot send anything on the same socket.

The PUSH/PULL pattern lets the Server send data to Clients whenever it needs to, in one direction only and without expecting a response. It could of course be swapped out for another REQ/REP pair, depending on your application’s flow control requirements.
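The one-way PUSH/PULL channel can be sketched the same way in pyzmq. Again, a thread plays the EA’s role, and the endpoint and message payloads are hypothetical. The helper `drain()` uses a poll timeout so that the Client never blocks forever waiting for data that may not come.

```python
import threading

import zmq


def push_updates(ctx, endpoint, messages, ready):
    """Stand-in for the EA's PUSH socket: it can only send, never receive."""
    push = ctx.socket(zmq.PUSH)
    push.bind(endpoint)
    ready.set()
    for msg in messages:
        # With no connected PULL peer, send blocks until one connects.
        push.send_string(msg)
    push.close()


def drain(pull, idle_timeout_ms=500):
    """Collect pushed messages until nothing new arrives for idle_timeout_ms."""
    received = []
    while pull.poll(idle_timeout_ms, zmq.POLLIN):
        received.append(pull.recv_string())
    return received


if __name__ == "__main__":
    ctx = zmq.Context.instance()
    ready = threading.Event()
    updates = ["TICK|EURUSD", "TRADE_OPENED|12345"]  # hypothetical payloads
    server = threading.Thread(
        target=push_updates, args=(ctx, "inproc://mt4-push", updates, ready))
    server.start()
    ready.wait()

    pull = ctx.socket(zmq.PULL)
    pull.connect("inproc://mt4-push")
    print(drain(pull))  # ['TICK|EURUSD', 'TRADE_OPENED|12345']
    pull.close()
    server.join()
```

Because a PUSH socket with no connected peer blocks on send, a real EA may prefer to send with a non-blocking flag or a timeout rather than stall the terminal while no Client is listening.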

In summary, for this post’s basic implementation:

  1. The Server will employ two sockets, one REP and one PUSH.
  2. Each Client will employ two sockets, one REQ and one PULL.
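Putting the two halves together, each Client summarised above holds one REQ and one PULL socket. The sketch below is a hypothetical `StrategyClient` class, not the API of the published DWX connector. The default endpoints `tcp://localhost:5555` (REQ) and `tcp://localhost:5556` (PULL) are assumed port numbers that must match whatever the EA actually binds.

```python
import zmq


class StrategyClient:
    """Hypothetical Client: a REQ socket for commands, a PULL socket for data."""

    def __init__(self, ctx, req_endpoint="tcp://localhost:5555",
                 pull_endpoint="tcp://localhost:5556"):
        self.req = ctx.socket(zmq.REQ)    # commands out, acknowledgements in
        self.req.connect(req_endpoint)
        self.pull = ctx.socket(zmq.PULL)  # one-way data from the Server
        self.pull.connect(pull_endpoint)

    def command(self, message, timeout_ms=1000):
        """Send one command and return the Server's acknowledgement."""
        self.req.send_string(message)
        if not self.req.poll(timeout_ms, zmq.POLLIN):
            raise TimeoutError(f"no acknowledgement for {message!r}")
        return self.req.recv_string()

    def next_update(self, timeout_ms=1000):
        """Return the next pushed message, or None if none arrives in time."""
        if not self.pull.poll(timeout_ms, zmq.POLLIN):
            return None
        return self.pull.recv_string()

    def close(self):
        # linger=0: discard any unsent messages instead of waiting on exit
        self.req.close(linger=0)
        self.pull.close(linger=0)
```

A strategy would typically call `command()` to request an action, then poll `next_update()` for the result the Server pushes back. Several such Clients can connect to the same Server at once.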

Infographic: What this flow control plan looks like in practice.

Infographic: ZeroMQ Process Flow Control

MetaTrader 4 Expert Advisor – Components

As displayed in the infographic above, the MT4 EA will serve as our ZeroMQ-enabled Server, with three main modules:

  1. MESSAGE ROUTER – This allows the EA to receive commands and send acknowledgements back to connecting Clients (trading strategies) through the REP socket, and passes all messages on to the Parser. Note: in this example the Router doesn’t do much, but having this intermediary is good practice when several strategies connect to the Server (MT4) and some pre-parse processing may be needed.
  2. MESSAGE PARSER – Messages received by this module are decomposed into actions for the next module (Interpreter & Executor).
  3. INTERPRETER & EXECUTOR – This module literally “interprets” decomposed messages and performs the requested actions. For example, if the Client requests market data, the module gathers it from the MetaTrader 4 History DB and sends it to the Client via the PUSH socket. If the Client instead requests that a BUY or SELL trade be opened on, e.g., EUR/USD, the module sends the trade to market and then sends a notification (success/failure/ticket info) to the Client via the PUSH socket.
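The split of responsibilities between the three modules can be sketched in Python, although the real EA is written in MQL4. The pipe-delimited format, the action names `OPEN` and `RATES` and the `QUEUED` replies are hypothetical, not the protocol of the published connector. In the EA, messages from the REP socket would feed `handle()`, and the Executor’s results would go out on the PUSH socket.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical wire format, for illustration only: "ACTION|arg1|arg2|..."


def route(raw: bytes) -> str:
    """MESSAGE ROUTER: decode and normalise a raw message before parsing."""
    return raw.decode("utf-8").strip()


def parse(message: str) -> Tuple[str, List[str]]:
    """MESSAGE PARSER: decompose a message into an action and its arguments."""
    action, *args = message.split("|")
    return action.upper(), args


def open_trade(args: List[str]) -> str:
    # In the EA this step would call MQL4's OrderSend(); this stub only
    # validates the request and echoes it back.
    if len(args) != 2 or args[1].upper() not in ("BUY", "SELL"):
        return "ERROR|expected a symbol and BUY or SELL"
    return f"QUEUED|OPEN|{args[0]}|{args[1].upper()}"


def get_rates(args: List[str]) -> str:
    # Stub: the EA would read bid/ask prices, e.g. via MarketInfo().
    if len(args) != 1:
        return "ERROR|expected a symbol"
    return f"QUEUED|RATES|{args[0]}"


HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "OPEN": open_trade,
    "RATES": get_rates,
}


def interpret_and_execute(action: str, args: List[str]) -> str:
    """INTERPRETER & EXECUTOR: dispatch the parsed action to its handler."""
    handler = HANDLERS.get(action)
    if handler is None:
        return f"ERROR|unknown action {action}"
    return handler(args)


def handle(raw: bytes) -> str:
    """Full pipeline: router -> parser -> interpreter & executor."""
    return interpret_and_execute(*parse(route(raw)))
```

For example, `handle(b"open|EURUSD|buy")` returns `"QUEUED|OPEN|EURUSD|BUY"`. Keeping the handlers in a lookup table means that supporting a new command only requires adding one entry.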

Implementation Requirements

  1. ZeroMQ – MQL4 Bindings -> Download and install the required files as instructed here: https://github.com/dingmaotu/mql-zmq
  2. For Python -> “pyzmq” library
  3. For R -> “rzmq” library
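Before wiring anything up, it is worth confirming that pyzmq is importable and checking which libzmq version it runs on. pyzmq is generally already present in an Anaconda installation, because Jupyter depends on it. Otherwise, `pip install pyzmq` or `conda install pyzmq` adds it. This minimal check uses only pyzmq’s own version functions.

```python
import zmq


def zmq_versions():
    """Report the pyzmq binding version and the underlying libzmq version."""
    return {"pyzmq": zmq.pyzmq_version(), "libzmq": zmq.zmq_version()}


if __name__ == "__main__":
    for name, ver in zmq_versions().items():
        print(f"{name}: {ver}")
```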

Sample Code

To give you a head start, we’ve published a functional MetaTrader 4 Expert Advisor with the full implementation discussed in this blog post.

The MQL sample code provided is quite extensible, and can be used as a template in your efforts.

GitHub Links:

  1. DWX ZeroMQ Connector – Python & MQL

Notes:

  1. The Python source code demonstrates how communication patterns are implemented.
  2. It’s fairly simple to integrate this code in your existing Python/R trading strategies.
