An (unofficial) implementation of the post "Trading Bitcoin with Reinforcement Learning".

Home Page: https://launchpad.ai/blog/trading-bitcoin

License: MIT License

Python 100.00%

trading-bitcoin-with-reinforcement-learning's Introduction

Trading Bitcoin with Reinforcement Learning

The post describes how to apply a reinforcement learning algorithm to trade Bitcoin. This repository provides an implementation that aims to reproduce its results.

BnH

A buy-and-hold strategy that always holds 2 Bitcoins, starting from the beginning of the test period.

RL

A trained RL agent that decides, given the current market condition, to hold between 0 and 4 Bitcoins.

MMT

A momentum strategy that holds 4 Bitcoins when the 30-period SMA crosses over the current closing price and 0 Bitcoins otherwise.
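The two benchmark rules above can be sketched in a few lines of pandas. This is only an illustration, not the repository's code: the function name `benchmark_positions` is hypothetical, and the direction of the cross-over is an assumption (here the strategy is long when the close is above its 30-period SMA, which is the conventional momentum reading).

```python
import numpy as np
import pandas as pd


def benchmark_positions(close: pd.Series, window: int = 30) -> pd.DataFrame:
    """Return BnH and MMT positions (in Bitcoins) for each bar.

    Hypothetical helper: not part of the repository.
    """
    sma = close.rolling(window).mean()
    # Assumption: hold 4 BTC when the close is above its SMA, else 0.
    # During the warm-up bars the SMA is NaN, the comparison is False,
    # and the position is therefore 0.
    mmt = np.where(close > sma, 4, 0)
    # Buy-and-hold: a constant position of 2 BTC.
    bnh = np.full(len(close), 2)
    return pd.DataFrame({"BnH": bnh, "MMT": mmt}, index=close.index)
```

Flipping the comparison to `close < sma` gives the opposite reading of the rule, if that is what the post intends.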

Dependencies

Python 3.6
NumPy 1.17.1
Pandas 0.25.1
Matplotlib 3.1.1
PyTorch 1.2.0 (CPU only)

Data

The minute-by-minute data is downloaded from Kaggle. I resample it into 15-minute intervals and compute all the features needed, then save the two dataframes under bitcoin-historical-data.

Note that:

I delete the row indexed 2017-04-15 23:00:00 after resampling since it contains a clear error. This is done in the remove_outlier() method of the Data class.
By request, I include the 15-minute data in bitcoin-historical-data. Due to the size constraint on GitHub, I cannot upload the 1-minute data or the feature dataframe generated from the 15-minute data.
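The resampling and outlier removal described above might look roughly like the following. It is a minimal sketch under stated assumptions, not the repository's Data class: the function name `resample_to_15min` and the column names `Open`, `High`, `Low`, `Close`, `Volume` are hypothetical, and the input is assumed to be indexed by a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd


def resample_to_15min(df_1min: pd.DataFrame) -> pd.DataFrame:
    """Aggregate 1-minute OHLCV bars into 15-minute bars.

    Hypothetical helper; column names are assumptions.
    """
    agg = {
        "Open": "first",   # first traded price in the interval
        "High": "max",
        "Low": "min",
        "Close": "last",   # last traded price in the interval
        "Volume": "sum",
    }
    df_15 = df_1min.resample("15min").agg(agg)
    # Drop intervals that contain no price data at all.
    df_15 = df_15.dropna(subset=["Close"])
    # Drop the bar the README flags as erroneous, if it is present.
    bad = pd.Timestamp("2017-04-15 23:00:00")
    return df_15.drop(index=bad, errors="ignore")
```

Dropping empty intervals before computing indicators also avoids feeding all-NaN windows to downstream feature code.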

How to run

# E.g. clone to local (say to Downloads)
cd ~/Downloads/trading-bitcoin-with-reinforcement-learning/

# Usage: python main.py <path-to-one-minute-data>
# If the argument is not provided, the default file path
# './bitcoin-historical-data/coinbaseUSD_1-min_data.csv' is used
python main.py ./bitcoin-historical-data/coinbaseUSD_1-min_data.csv

Note: I observed substantial variability in the test results, so the equity curve you get may not match mine exactly.
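If you want runs to be more comparable with each other, one common step is to fix the random seeds before training. The helper below is a hypothetical sketch, not part of the repository, and the README does not say which sources of randomness drive the variability, so seeding may reduce it without removing it.

```python
import random

import numpy as np


def set_seed(seed: int = 0) -> None:
    """Seed the Python, NumPy and (if installed) PyTorch generators.

    Hypothetical helper: not part of the repository.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's default generator
    except ImportError:
        pass  # PyTorch not available; only Python and NumPy are seeded
```

Calling `set_seed(0)` once at the top of the entry script would be the usual place for it.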

trading-bitcoin-with-reinforcement-learning's Issues

Can you give your data for training?

When I download the data from Kaggle, talib raises an error because its inputs are all NaN, and the cumulative return of the RL agent is always zero.

Zero Cum log returns

Thank you for sharing this repo and the PyTorch implementation of a trading example.

I encountered a problem: the cumulative log returns are zero after the model is trained with your original code.
Could you point out what might be wrong?
Thank you for considering my question and request.

Why zero/change init of model?

Hi there, another question: during the actor's initialization, you use a normal init for the hidden layer and a zero-weight/one-bias init for the fc layer. Why is this?

`# He-style normal init for the hidden layer (in-place initializers)
nn.init.normal_(self.hidden[0].weight.data,
                mean=0., std=math.sqrt(2 / self.hidden[0].in_features))
nn.init.constant_(self.hidden[0].bias.data, 0.)

# zeroing output layer
nn.init.constant_(self.out.weight.data, 0.)
nn.init.constant_(self.out.bias.data, 1.)`
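One observable effect of this initialization can be checked numerically. The sketch below assumes the output layer feeds a softmax over 5 actions (holding 0 to 4 Bitcoins), which the issue does not confirm, and uses NumPy rather than the repository's model. With zero output weights and a constant bias, every logit is equal, so the initial policy is uniform whatever the hidden activations are.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 8))  # three arbitrary hidden activation vectors
W_out = np.zeros((8, 5))          # zeroed output weights (assumed 5 actions)
b_out = np.ones(5)                # constant bias of 1 on every logit

# All logits equal 1, so each action gets probability 1/5 for every input.
probs = softmax(hidden @ W_out + b_out)
```

Whether starting from a uniform policy was the author's reason for this choice is not stated in the source.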

Where to tweak num of bitcoins?

As the Bitcoin price has gone up so much, buying and holding up to 4 Bitcoins is difficult for retail investors. Can someone help me find the line of code to tweak that number? This would also help apply the algorithm to other coins.

Thank you.

Test data is trained in the model

`def roll_out(env, model, train_mode):
    model.eval()

    ret = 0      # episode return
    r_lst = []   # store reward
    p_lst = []   # store price
    P_lst = []   # store position
    buffer = []

    s = env.reset()
    done = False
    while not done:

        # sample action
        A_Pr = model.forward(Variable(s))
        #print('model:', A_Pr.data)
        act = torch.multinomial(A_Pr.data, num_samples=1)
        i_act = act[0][0]
        #print('act:', i_act)

        # apply action
        s_, r, done = env.step(i_act)

        # tracker
        ret += r
        r_lst.append(r)
        p_lst.append(env.curr_OHLCV()[3])
        P_lst.append(i_act)

        # Save transitions
        buffer.append((s, act, r))

        if done: break

        # Swap states
        s = s_

    # Learning when episode finishes
    model.train()

    S, A, R = zip(*buffer)
    del buffer[:]

    S = Variable(torch.cat(S))
    A = Variable(torch.cat(A))

    # Compute target (discounted return, gamma = 0.9)
    Q = []
    ret = 0
    for r in reversed(R):
        ret = r + .9 * ret
        Q.append(ret)
    Q.reverse()

    # standardize Q
    Q = np.array(Q).astype(np.float32)
    Q -= Q.mean()
    Q /= Q.std() + 1e-6
    Q = Q.clip(min=-10, max=10)  # clip() returns a new array, so assign it
    Q = np.expand_dims(Q, axis=1)
    Q = Variable(torch.from_numpy(Q))

    # PG update
    if train_mode:  # modified: only update the model in training mode
        A_Pr = model.forward(S).gather(1, A).clamp(min=1e-7, max=1 - 1e-7)

        loss = -(Q * torch.log(A_Pr)).mean()
        model.optim.zero_grad()
        loss.backward()
        model.optim.step()

    model.eval()
    return ret, r_lst, p_lst, P_lst`

The code should check whether it is in training or testing mode, so that the model is not updated on the test data.

self._action in the Env class doesn't change

Why standardize Q values?

In main.py, you standardize the Q vector (I assume this is the discounted cumulative reward) using its mean and std. Why do you do this? Aren't you normalizing out any notion of positive/negative returns?
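To make the question concrete, here is a small NumPy sketch of the two steps involved. The function names are hypothetical and the discount of 0.9 and clip of 10 are taken from the roll_out code quoted in the issue above; this is an illustration of the arithmetic, not the author's explanation. Standardizing does remove the sign of the raw returns: what remains is each step's return relative to the episode average, so above-average steps get positive weight and below-average steps get negative weight. Subtracting the mean in this way is commonly used as a baseline to reduce gradient variance in policy-gradient methods.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.9):
    """Discounted cumulative reward for each step, computed backwards."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return np.array(out[::-1], dtype=np.float32)


def standardize(q, eps=1e-6, clip=10.0):
    """Zero-mean, unit-variance scaling followed by clipping."""
    q = (q - q.mean()) / (q.std() + eps)
    return np.clip(q, -clip, clip)


q = discounted_returns([1.0, 1.0, 1.0])  # all raw returns are positive
z = standardize(q)                       # signs now reflect "above/below average"
```

In this example every raw return is positive, yet the last step ends up with a negative standardized value because it is below the episode mean.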

thirstyscholar / trading-bitcoin-with-reinforcement-learning