deep-reinforcement-stock-trading's Introduction

Deep-Reinforcement-Stock-Trading

This project leverages deep reinforcement learning for portfolio management. The framework structure is inspired by Q-Trader. The reward for the agent is the net unrealized profit (i.e., the stocks are still in the portfolio and not yet cashed out) evaluated at each action step. For inaction at a step, a negative penalty is added to the portfolio, representing the missed opportunity to invest in "risk-free" Treasury bonds. Many new features and improvements have been made in the training and evaluation pipelines, and all evaluation metrics and visualizations are built from scratch.
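
For illustration, a minimal sketch of such a reward under these assumptions (the variable names, the cost-basis treatment, and the daily risk-free rate below are illustrative, not the exact code in train.py):

def compute_reward(num_holding, cost_basis, current_price, balance, action,
                   daily_risk_free_rate=0.02 / 365):
    # net unrealized profit of the shares still held in the portfolio
    reward = num_holding * (current_price - cost_basis)
    if action == 'hold':
        # penalty for inaction: the missed "risk-free" Treasury return on the idle cash balance
        reward -= balance * daily_risk_free_rate
    return reward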

Key assumptions and limitations of the current framework:

  • trading has no impact on the market
  • only a single stock type is supported
  • only 3 basic actions: buy, hold, sell (no short selling or other complex actions)
  • the agent performs only 1 portfolio-reallocation action at the end of each trading day
  • all reallocations can be executed at the closing prices
  • no missing data in the price history
  • no transaction costs

Key challenges of the current framework:

  • implementing algorithms from scratch with a thorough understanding of their pros and cons
  • building a reliable reward mechanism (learning often stagnates or gets stuck in local optima)
  • ensuring the framework is scalable and extensible

Currently, the state is defined as the normalized adjacent daily stock price differences for n days plus [stock_price, balance, num_holding].
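
A minimal sketch of assembling such a state vector (the sigmoid normalization and the helper's name and signature are illustrative assumptions; see the implementation for the exact details):

import numpy as np

def generate_state(prices, t, window_size, balance, num_holding):
    # assumes t >= window_size so that a full observation window is available
    window = prices[t - window_size : t + 1]      # window_size + 1 closing prices
    diffs = np.diff(window)                       # adjacent daily price differences
    normalized = 1.0 / (1.0 + np.exp(-diffs))     # squash each difference into (0, 1)
    return np.concatenate([normalized, [prices[t], balance, num_holding]])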

In the future, we plan to add other state-of-the-art deep reinforcement learning algorithms, such as Proximal Policy Optimization (PPO), to the framework and to increase the complexity of the state in each algorithm, e.g., by constructing richer price tensors with a wider range of deep learning approaches such as convolutional neural networks or attention mechanisms. In addition, we plan to integrate better pipelines for high-quality data sources (e.g., from vendors such as Quandl) and for backtesting (e.g., Zipline).

Getting Started

To install all libraries/dependencies used in this project, run

pip3 install -r requirement.txt

To train a DDPG agent or a DQN agent, e.g. over S&P 500 from 2010 to 2015, run

python3 train.py --model_name=model_name --stock_name=stock_name
  • model_name is the model to use: either DQN or DDPG; default is DQN
  • stock_name is the stock used to train the model; default is ^GSPC_2010-2015, which is S&P 500 from 1/1/2010 to 12/31/2015
  • window_size is the span (days) of observation; default is 10
  • num_episode is the number of episodes used for training; default is 10
  • initial_balance is the initial balance of the portfolio; default is 50000
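
For example, the following command trains a DDPG agent with every documented option written out explicitly (the values shown are the defaults):

python3 train.py --model_name=DDPG --stock_name=^GSPC_2010-2015 --window_size=10 --num_episode=10 --initial_balance=50000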

To evaluate a DDPG or DQN agent, run

python3 evaluate.py --model_to_load=model_to_load --stock_name=stock_name
  • model_to_load is the model to load; default is DQN_ep10; alternative is DDPG_ep10 etc.
  • stock_name is the stock used to evaluate the model; default is ^GSPC_2018, which is S&P 500 from 1/1/2018 to 12/31/2018
  • initial_balance is the initial balance of the portfolio; default is 50000

where stock_name can be found in the data directory and model_to_load can be found in the saved_models directory.
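
For example, to evaluate the saved DDPG model on the 2018 S&P 500 data with the default balance:

python3 evaluate.py --model_to_load=DDPG_ep10 --stock_name=^GSPC_2018 --initial_balance=50000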

To visualize the training loss and portfolio value history, run:

tensorboard --logdir=logs/model_events

where model_events can be found in the logs directory.

Example Results

Note that the following results were obtained with only 10 episodes of training.

(Result plots omitted; see the repository for the portfolio value visualizations.)

Frequently Asked Questions (FAQ)

  • How is this project different from other price prediction approaches, such as logistic regression or LSTM?
    • Price-prediction approaches such as logistic regression produce numerical outputs, which must then be mapped to the action space (e.g., buy, sell, hold) through a separate interpretation of the predicted price. Reinforcement learning approaches, on the other hand, output the agent's action directly (see the sketch below).
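
As a schematic illustration of this difference (all names and numbers below are hypothetical, not the repository's API): a price-prediction model needs an extra hand-written rule to turn its numeric output into a trade, whereas an RL policy maps the state straight to an action.

import numpy as np

# Price-prediction style: a numeric forecast, converted to an action by a separate rule.
predicted_return = 0.008   # e.g., the model forecasts a +0.8% move tomorrow
threshold = 0.005
action = 'buy' if predicted_return > threshold else ('sell' if predicted_return < -threshold else 'hold')

# Reinforcement-learning style: the policy's output already is the action.
q_values = np.array([0.1, 0.7, 0.2])                         # hypothetical Q-values for [hold, buy, sell]
action = ['hold', 'buy', 'sell'][int(np.argmax(q_values))]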

deep-reinforcement-stock-trading's Issues

Why does the DDPG agent learn so slowly?

Hi Albert,

I am trying to train a DDPG model based on the code from your GitHub repository, but I always get a large gap between the reward and the real price.
Could you do me a favor?
I would like to know why the learning process of DDPG is so slow and why the reward balance is so small.

Thanks in advance for your reply!

Best regards,
CCC

Save DDQN

Hi! Thank you for your answer!
I know what you mean, but my problem is that I modified the model file names like this:
if model_name == 'DDQN':
    agent.model.save('saved_models/DDQN_ep' + str(e) + '.h5')
However, evaluate.py needs two files: one is "DDQN_ep10.h5" and the other is "DDQN_ep10_target.h5". I can only save "DDQN_ep10.h5" and don't know how to save "DDQN_ep10_target.h5".
So I want to know how to do it.
Hope you can answer me ~
Sincerely!
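
(For reference, a minimal sketch of saving both networks; the attribute name model_target is an assumption inferred from the "_target.h5" file that evaluate.py expects, so check the DDQN agent class for the actual attribute:)

if model_name == 'DDQN':
    agent.model.save('saved_models/DDQN_ep' + str(e) + '.h5')
    # also save the target network so evaluate.py can find the *_target.h5 file (attribute name is an assumption)
    agent.model_target.save('saved_models/DDQN_ep' + str(e) + '_target.h5')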

Loss is in the millions in experience_replay()

Hello Albert,
Please pardon this note as an issue; I am new to GitHub and not sure where else I can ask a question.

I am trying to run your DQN model, and I find that the reported loss value is very high; I am not sure how to interpret the loss.
I have printed the following in the train.py file:

print(e, loss, action_dict[action], reward, agent.balance, len(agent.inventory),len(agent.memory),agent.buffer_size)
logger.info('Episode {}\tLoss: {:.2f}\tAction: {}\tReward: {:.2f}\tBalance: {:.2f}\tNumber of Stocks: {}'.format(e, loss, action_dict[action], reward, agent.balance, len(agent.inventory)))

(screenshot of the logged output omitted)

Could you please clarify how to interpret the loss?

Thank You
Arjun

Saved models only work on the ^DJI_2016 stock

I get an error or the session ends abruptly in Google Colab, so I am not able to train DQN on other stocks and am therefore using your saved models. I am only able to train for 2 episodes with 1 year of stock data.

Your saved DQN models only produce profits on the stock ^DJI_2016 and not on any other stocks (not even on ^DJI_2010-2015):
DQN_DJI_2016
DQN_GSPC2018
DQN_NVDIA2018

I have tried evaluating DQN on other stocks, such as Alphabet Inc. and the Shanghai Composite, but the model still produces only a straight line of portfolio value:
DDPG_PetroChina_2018
DQN_FB2018

I have also tried changing the initial portfolio value from 50k to 500k and to 5k, still to no avail.

What should I do?

Please help!

Thanks, Raghav

Memory issue

Greetings,
this seems like a robust project. Unfortunately, I had several crashes due to lack of memory and could never finish even a single episode of training.
:(

How to save the DDQN model

Hi! Thank you for your last answer!
Recently I tried to train the DDQN in your project, so I wrote:
if model_name == 'DDQN':
    agent.model.save('saved_models/DDQN_ep' + str(e) + '.h5')

But when I try to run evaluate.py, it reports "SavedModel file does not exist".
I found that in DDQN.py there should be a file named "DDQN_target.h5". So could you tell me how to write the exact code to save the model?
Thank you very much!
