

Deep Reinforcement Learning for Algorithmic Trading

Training an Agent to make automated long/short trading decisions in a simulated stochastic market environment using Deep Q-Reinforcement Learning.

Dependencies: Python version 3.7.5

Packages: pip install tensorflow keras pandas matplotlib

Steps to run:

  1. python dl-agent.py # one run of 500 episodes
  2. python plot_reward.py # plots the reward time series

IMPLEMENTATION

Based on the investment thesis of mean reversion of spreads, I simulate 500 episodes of two mean-reverting stochastic processes and train the agent to run a long/short strategy. Think of it as two instruments (stocks or bonds) belonging to the same industry sector that more or less move together; the agent, i.e. the neural network, is the trader who exploits aberrations in their behavior, caused by news, earnings reports, weather or other macroeconomic events, by going long the cheaper instrument and short the more expensive one (and vice versa) until the spread reverts to its mean. In fact, the neural network knows nothing about mean reversion or whether to run a statistical-arbitrage strategy; it discovers this pattern by itself in its pursuit of maximizing the reward in every episode, i.e. it learns the strategy through trial and error.

Once trained in this environment, the agent should be able to trade any two instruments that exhibit a similar co-integration behavior and volatility range. We can safely assume that the trading volume is small enough to have no impact on the market. I would like to re-emphasize the importance of generating unbiased data, as opposed to using historical market data, a concept I defined as 'Smart Data' in my previous post.

ENVIRONMENT

The first and most important part is to design the environment. The environment class should implement the following attributes/methods, based on the OpenAI Gym convention:

Init: Initializes the environment at the beginning of an episode.

State: Holds the prices of A and B at a given time t.

Step: Advances the environment by one time step. Each call to this method returns the four values described below:

a) next_state: The state resulting from the action performed by the agent. In our case, it is always the prices of A and B at time t + 1.

b) reward: The reward associated with the action performed by the agent.

c) done: Whether we have reached the end of the episode.

d) info: Diagnostic information.

Reset: Resets the environment after every training episode. In this case, it restores the prices of both A and B to their respective means and simulates a new price path.

It's good practice to keep the environment code separate from the agent's code; doing so makes it easier to modify the environment's behavior and retrain the agent on the fly. I wrote a Python class called market_env to implement this behavior.
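
A minimal sketch of what such a gym-style environment class could look like is shown below. The attribute names, the Ornstein-Uhlenbeck price update and the spread-based reward are illustrative assumptions drawn from the description above, not necessarily the exact implementation in market_env.

    import numpy as np

    def ou_step(price, mean, vol, theta=0.10):
        # One Euler step of an Ornstein-Uhlenbeck (mean-reverting) process.
        # The noise scale is chosen so that the stationary standard deviation
        # is roughly vol * mean (theta is an illustrative choice).
        sigma = vol * mean * np.sqrt(2.0 * theta)
        return price + theta * (mean - price) + sigma * np.random.randn()

    class market_env:
        """Gym-style environment holding the prices of two mean-reverting assets A and B."""

        def __init__(self, mean_a=100.0, vol_a=0.10, mean_b=100.0, vol_b=0.20, n_steps=500):
            self.mean_a, self.vol_a = mean_a, vol_a
            self.mean_b, self.vol_b = mean_b, vol_b
            self.n_steps = n_steps
            self.reset()

        def reset(self):
            # Restore both prices to their means and start a fresh price path.
            self.t = 0
            self.price_a, self.price_b = self.mean_a, self.mean_b
            return np.array([[self.price_a, self.price_b]])

        def step(self, action):
            # Advance both simulated prices by one time step.
            self.price_a = ou_step(self.price_a, self.mean_a, self.vol_a)
            self.price_b = ou_step(self.price_b, self.mean_b, self.vol_b)
            self.t += 1

            # Spread-based reward: going long the cheap leg and short the
            # expensive one is rewarded, the opposite is penalised.
            spread = self.price_a - self.price_b
            if action == 0:        # long A, short B
                reward = -spread
            elif action == 1:      # short A, long B
                reward = spread
            else:                  # do nothing
                reward = 0.0

            next_state = np.array([[self.price_a, self.price_b]])
            done = self.t >= self.n_steps
            info = {}
            return next_state, reward, done, info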

A sample path of 500 time steps for the two assets generated by the environment, with A (blue): mean = 100.0, vol = 10% and B (green): mean = 100.0, vol = 20%, using the Ornstein–Uhlenbeck process (plotted with python/matplotlib), is shown below. As you can see, the two processes cross each other many times, exhibiting a co-integration property, which makes an ideal setting in which to train the agent for a long/short strategy.

Simulated Pair Prices
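
As a rough illustration, a pair of paths like the one in the figure could be generated and plotted with the sketch environment above (the actual plots in the repo may be produced differently):

    import matplotlib.pyplot as plt

    env = market_env()                       # the sketch class from the previous section
    state = env.reset()
    a_path, b_path = [state[0][0]], [state[0][1]]

    done = False
    while not done:
        state, _, done, _ = env.step(2)      # action 2 = do nothing; we only want the price path
        a_path.append(state[0][0])
        b_path.append(state[0][1])

    plt.plot(a_path, color='blue', label='A: mean 100, vol 10%')
    plt.plot(b_path, color='green', label='B: mean 100, vol 20%')
    plt.xlabel('time step')
    plt.ylabel('price')
    plt.legend()
    plt.show()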

AGENT

The agent is an MLP (multi-layer perceptron) multi-class classifier neural network. It takes two inputs from the environment, the prices of A and B, and chooses one of three actions: (0) long A, short B; (1) short A, long B; (2) do nothing, with the goal of maximizing the overall reward at every step. After every action, it receives the next observation (state) and the reward associated with its previous action. Since the environment is stochastic, the agent operates in an MDP (Markov Decision Process): the next action depends only on the current state, not on the history of prices/states/actions, and future rewards are discounted by a factor gamma. The score is calculated at every step and saved in the agent's memory along with the action, the current state and the next state. The cumulative reward per episode is the sum of the individual scores over the lifetime of an episode and ultimately measures the agent's performance over its training. The complete workflow diagram is shown below:

Agent Environment
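
To make this workflow concrete, one episode of the loop might look roughly like the sketch below, using the environment sketch from the previous section and a random stand-in policy in place of the trained network (the actual loop in dl-agent.py may differ):

    import random

    env = market_env()                       # environment sketch from the previous section
    memory = []                              # (state, action, reward, next_state, info) tuples

    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = random.randrange(3)         # stand-in policy; the real agent picks via its Q-network
        next_state, reward, done, info = env.step(action)
        memory.append((state, action, reward, next_state, info))
        total_reward += reward               # cumulative reward judged at the end of the episode
        state = next_state

    print('cumulative reward for this episode:', total_reward)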

Why should this approach even work? The spread of two co-integrated processes is stationary, i.e. it has a constant mean and variance over time and can be thought of as approximately normally distributed. The agent can exploit this statistical behavior by buying and selling A and B simultaneously based on their price spread (= Price_A − Price_B). For example, a negative spread implies that A is cheap and B is expensive, so the agent will figure out that going long A and short B yields the higher reward. The agent approximates this through the Q(s, a) function, where 's' is the state and 'a' is the optimal action in that state, so as to maximize its return over the lifetime of the episode. The policy for the next action is derived from the Bellman equation shown below:

Q(s, a) = r + γ · max_a′ Q(s′, a′)   (Bellman equation)

Through this mechanism, it also learns to weigh long-term prospects rather than just immediate rewards by assigning different Q-values to each action. This is the crux of reinforcement learning. Since the input space can be massively large, we use a deep neural network to approximate the Q(s, a) function through backpropagation. Over multiple iterations, the Q(s, a) function converges to find the optimal action in every state the agent has explored.

Speaking of the internal details, the agent has two major components:

Memory: A list of past events. The agent stores information gathered through iterations of exploration and exploitation, as tuples of the form (state, action, reward, next_state, message).
Brain: A fully connected, feed-forward neural network that trains on the memory, i.e. past experiences. Given the current state as input, it predicts the next optimal action.
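
A minimal sketch of these two components wired together might look like the code below. The class and method names, the memory size and the epsilon-greedy exploration rate are illustrative assumptions, not necessarily those used in dl-agent.py.

    import random
    from collections import deque
    import numpy as np

    class Agent:
        def __init__(self, brain, memory_size=2000, epsilon=0.1):
            self.brain = brain                        # the neural network (sketched in a later section)
            self.memory = deque(maxlen=memory_size)   # bounded list of past experiences
            self.epsilon = epsilon                    # fraction of random (exploratory) actions

        def remember(self, state, action, reward, next_state, message):
            # Store one experience tuple in the format described above.
            self.memory.append((state, action, reward, next_state, message))

        def act(self, state):
            # Exploration: occasionally take a random action to gather new experience.
            if np.random.rand() < self.epsilon:
                return random.randrange(3)
            # Exploitation: otherwise pick the action with the highest predicted Q-value.
            return int(np.argmax(self.brain.predict(state)[0]))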

To train the agent, we need to build our neural network, which will learn to classify actions based on the inputs it receives. (A simplified image is shown below; of course, the real neural net is more complicated than this.)

In the above image,

Inputs (2): Prices of A and B, in green.

Hidden (2 layers): Denoted by 'H' nodes, in blue.

Output (3): Classes of actions, in red.

For the implementation, I am using Keras and TensorFlow, both of which are free and open-source Python libraries.
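
A minimal Keras sketch of such a network is shown below; the layer sizes, activations and loss are illustrative assumptions (DQN-style mean-squared error on the Q-values), not necessarily the hyperparameters used in dl-agent.py.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    def build_brain(state_size=2, action_size=3):
        # 2 inputs (prices of A and B), two hidden layers, 3 outputs (one Q-value per action).
        model = Sequential([
            Dense(24, input_dim=state_size, activation='relu'),   # hidden layer 1
            Dense(24, activation='relu'),                          # hidden layer 2
            Dense(action_size, activation='linear'),               # Q-value for each action
        ])
        model.compile(loss='mse', optimizer='adam')
        return model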

The neural net is trained on a randomly chosen sample from its memory at the end of every episode, so after each episode the network collects more data and trains further on it. As a result, the Q(s, a) function converges with more iterations, and we see the agent's performance improving over time until it reaches a saturation point. The returns/rewards in the image below are scaled.
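
A sketch of this end-of-episode training step, assuming the memory and brain sketched above (the batch size and discount factor gamma are illustrative; terminal-state handling is omitted for brevity):

    import random
    import numpy as np

    def replay(brain, memory, batch_size=32, gamma=0.95):
        # Train the network on a random sample of stored experiences.
        if len(memory) < batch_size:
            return
        for state, action, reward, next_state, _ in random.sample(memory, batch_size):
            # Bellman target: immediate reward plus discounted best future Q-value.
            target = reward + gamma * np.max(brain.predict(next_state)[0])
            # Move only the Q-value of the action actually taken towards the target.
            q_values = brain.predict(state)
            q_values[0][action] = target
            brain.fit(state, q_values, epochs=1, verbose=0)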

Performance Episodes

In the above graph, you can see three different plots, each representing an entire training run of 500 episodes with 500 steps per episode. In every step, the agent performs an action and receives its reward. In the beginning, since the agent has no preconception of the consequences of its actions, it takes randomized actions and observes the associated rewards, so the cumulative reward per episode fluctuates a lot from roughly episode 0 to 300. Beyond 300 episodes, however, the agent starts learning from its training, and by the 400th episode it has almost converged in each of the training runs, having discovered the long/short pattern and begun to fully exploit it.

There are still many challenges, and engineering both the agent and the environment remains ongoing research. My aim here was not to present a 'backtested profitable trading strategy' but to describe how to apply advanced machine-learning concepts such as deep Q-learning and neural networks to algorithmic trading. It is a complicated process and hard to explain in a single blog post, but I have tried my best to simplify things. Check out the dl-algo-trader link for the code.

Furthermore, this approach can be extended to a large portfolio of stocks and bonds, and the agent can be trained on a diverse range of stochastic environments. Additionally, the agent's behavior can be constrained by risk parameters such as position sizing and hedging. One could also train multiple agents under different suitability criteria for the desired risk/return profiles. These approximations can be made more accurate with larger data sets and distributed computing power.

Ultimately, the question is: can AI do everything? Probably not. Can we effectively train it to do anything? Possibly yes; with real intelligence behind it, artificial intelligence can surely thrive. Thanks for reading. Please feel free to share your ideas in the comments section below or connect with me on LinkedIn.

Hope you enjoyed the post!

DISCLAIMER

  1. Opinions expressed are solely my own and do not express the views or opinions of any of my employers.

  2. The information from the Site is based on financial models, and trading signals are generated mathematically. All of the calculations, signals, timing systems, and forecasts are the result of back testing, and are therefore merely hypothetical. Trading signals or forecasts used to produce our results were derived from equations which were developed through hypothetical reasoning based on a variety of factors. Theoretical buy and sell methods were tested against the past to prove the profitability of those methods in the past. Performance generated through back testing has many and possibly serious limitations. We do not claim that the historical performance, signals or forecasts will be indicative of future results. There will be substantial and possibly extreme differences between historical performance and future performance. Past performance is no guarantee of future performance. There is no guarantee that out-of-sample performance will match that of prior in-sample performance.

Blog Link: https://medium.com/@gaurav1086/machine-learning-for-algorithmic-trading-f79201c8bac6
