minixc / simple-back


59.0 3.0 12.0 12.1 MB

A simple daily python backtester that works out of the box.

License: Mozilla Public License 2.0

Python 100.00%

backtesting matplotlib jupyter-notebooks

simple-back's Introduction

📈📉 simple-back

Documentation | Request A Feature

Installation

pip install simple_back

Quickstart

The following is a simple crossover strategy. For a full tutorial on how to build a strategy using simple-back, visit the quickstart tutorial

from simple_back.backtester import BacktesterBuilder

builder = (
   BacktesterBuilder()
   .name('JNUG 20-Day Crossover')
   .balance(10_000)
   .calendar('NYSE')
   .compare(['JNUG']) # strategies to compare with
   .live_progress() # show a progress bar
   .live_plot(metric='Total Return (%)', min_y=None)  # we assume we are running this in a Jupyter Notebook
)

bt = builder.build() # build the backtest

for day, event, b in bt['2019-1-1':'2020-1-1']:
    if event == 'open':
        jnug_ma = b.prices['JNUG',-20:]['close'].mean()
        b.add_metric('Price', b.price('JNUG'))
        b.add_metric('MA (20 Days)', jnug_ma)

        if b.price('JNUG') > jnug_ma:
            if not b.portfolio['JNUG'].long: # check if we already are long JNUG
                b.portfolio['JNUG'].short.liquidate() # liquidate any/all short JNUG positions
                b.long('JNUG', percent=1) # long JNUG

        if b.price('JNUG') < jnug_ma:
            if not b.portfolio['JNUG'].short: # check if we already are short JNUG
                b.portfolio['JNUG'].long.liquidate() # liquidate any/all long JNUG positions
                b.short('JNUG', percent=1) # short JNUG
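The signal logic of this crossover can be checked without simple-back. The following is a minimal, library-independent sketch: `crossover_positions` is a hypothetical helper that walks a plain list of prices, compares each price with the mean of the `window` prices before it, and only flips the position when the signal changes. This mirrors the `if not b.portfolio['JNUG'].long` checks above.

```python
from statistics import mean


def crossover_positions(prices, window=20):
    """Return the position held after each day: 'long', 'short' or None.

    Hypothetical helper, not part of simple-back. The moving average for day i
    uses the `window` prices before it, so today's price is compared with past data.
    """
    position = None
    positions = []
    for i in range(window, len(prices)):
        moving_average = mean(prices[i - window:i])
        price = prices[i]
        if price > moving_average and position != 'long':
            position = 'long'   # here the strategy would liquidate shorts and go long
        elif price < moving_average and position != 'short':
            position = 'short'  # here the strategy would liquidate longs and go short
        positions.append(position)
    return positions


print(crossover_positions([1, 2, 3, 4, 3, 2, 1], window=3))
```

A price exactly on the average leaves the current position unchanged, just as neither `if` branch fires in the quickstart loop.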

Why another Python backtester?

There are many backtesters out there, but this is the first one built for rapid prototyping in Jupyter Notebooks.

Built for Jupyter Notebooks

Get live feedback on your backtests (live plotting, progress and metrics) in your notebook to immediately notice if something is off about your strategy.

Sensible Defaults and Caching

Many backtesters need a great deal of configuration and setup before they can be used. Not so this one. At its core you only need one loop, as this backtester can be used like any iterator. A default price provider is included, and it caches all its data on your disk to minimize the number of requests needed.
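To illustrate what "used like any iterator" means, here is a toy sketch. `ToyBacktester` is hypothetical and far simpler than simple-back's own backtester: a generator-based `__iter__` yields one `(day, event, backtester)` tuple per market event, which is the same shape the quickstart loop unpacks.

```python
from datetime import date


class ToyBacktester:
    """Hypothetical, heavily simplified stand-in for simple-back's backtester."""

    def __init__(self, days):
        self.days = list(days)

    def __iter__(self):
        # A generator: each market event yields (day, event, backtester),
        # the same shape the quickstart loop unpacks.
        for day in self.days:
            for event in ('open', 'close'):
                yield day, event, self


for day, event, b in ToyBacktester([date(2019, 1, 2), date(2019, 1, 3)]):
    print(day, event)
```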

Extensible

This is intended to be a lean framework where, e.g. adding crypto data is as easy as extending the DailyPriceProvider class.

simple-back's People

Contributors

Stargazers

Watchers

Forkers

defendityourself governorbaloyi natemoser optionaldelta r-k-h fraufelder-frau vaiblast yaniv-private kyle-wendling webclinic017 samithaj

simple-back's Issues

add support for fees

I think it would be interesting to have the ability to add fees to "trades" / "positions"
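A fee model could be as small as a function applied to each trade's value. This is a minimal sketch under assumed parameters: `trade_fee` (a flat fee plus a percentage) and `affordable_shares` are hypothetical names, and the defaults are placeholders rather than any real broker's fees.

```python
import math


def trade_fee(value, fixed=1.0, percent=0.001):
    """Hypothetical fee model: flat fee plus a percentage of the traded value."""
    return fixed + abs(value) * percent


def affordable_shares(cash, price, fee=trade_fee):
    """Largest whole number of shares whose cost plus fee fits into `cash`."""
    shares = math.floor(cash / price)
    while shares > 0 and shares * price + fee(shares * price) > cash:
        shares -= 1
    return shares


print(affordable_shares(1_000, 100))  # 9: ten shares would need 1_002 including fees
```

Passing the fee function as a parameter keeps the model swappable, e.g. for a percentage-only or tiered fee.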

replace .attr

It would be nice to write b.portfolio.value instead of b.portfolio.attr('value'). This should be possible to implement using https://docs.python.org/3/reference/datamodel.html#object.__getattribute__
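A minimal sketch of the idea. It swaps in `__getattr__` for the `__getattribute__` hook linked above: Python only calls `__getattr__` when normal attribute lookup fails, so real attributes and methods such as `attr` keep working without special cases. `Position` and its stored values are hypothetical stand-ins, not simple-back's classes.

```python
class Position:
    """Hypothetical stand-in for a portfolio object with an attr() accessor."""

    def __init__(self, **values):
        self._values = values

    def attr(self, name):
        return self._values[name]

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Private names are rejected up front
        # so a missing self._values (e.g. during unpickling) cannot recurse forever.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self.attr(name)
        except KeyError:
            raise AttributeError(name) from None


position = Position(value=1_000.0, num_shares=10)
print(position.value, position.num_shares)  # same as position.attr('value'), ...
```

Raising `AttributeError` (not `KeyError`) for unknown names keeps `hasattr` and `getattr(obj, name, default)` behaving as usual.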

add tests

Right now test coverage is non-existent

scheduling in strategies

Currently, doing specific things on certain days or events leads to messy nested ifs. I propose we add decorators for scheduling. The big task (as always) is naming things; I was thinking of the following syntax, but there might be better options.

@run(every='monday', on='open')
@run(every='day', on='close')
@run(every='firstdayofweek', on='open') # run on tuesday if market closed on monday
@run(on='open') # sensible default for every could be 'day'
@run(every='tuesday') # while on could default to ['open', 'close'] (open & close)
...
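A minimal sketch of the proposed decorator, assuming a strategy is a plain function taking `(day, event, b)` as in the quickstart loop. `run` is hypothetical, only handles `'day'` and weekday names for `every`, and leaves out `'firstdayofweek'`, which needs the market calendar.

```python
import functools
from datetime import date

WEEKDAYS = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']


def run(every='day', on=('open', 'close')):
    """Hypothetical scheduling decorator for strategy functions f(day, event, b)."""
    events = (on,) if isinstance(on, str) else tuple(on)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(day, event, b):
            if event not in events:
                return None  # wrong event: skip
            if every != 'day' and WEEKDAYS[day.weekday()] != every:
                return None  # wrong weekday: skip
            return func(day, event, b)
        return wrapper
    return decorator


calls = []


@run(every='monday', on='open')
def rebalance(day, event, b):
    calls.append((day, event))


for day in (date(2020, 1, 6), date(2020, 1, 7)):  # a Monday and a Tuesday
    for event in ('open', 'close'):
        rebalance(day, event, None)
print(calls)  # [(datetime.date(2020, 1, 6), 'open')]
```

The defaults follow the suggestions above: `every='day'` and `on` covering both open and close.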

Another option, which would make it easier to check whether certain conditions are met instead of replacing the ifs inside strategies, would be to introduce a ScheduleConditions class, e.g. with the shorthand bt.schedule:

if bt.schedule.monday:
  ...
if bt.schedule.open:
  ...
if bt.schedule.first_day_of.week:
  ...
if bt.schedule.first_day_of.month:
  ...
if bt.schedule.date == '2019-1-1':
  ...
if bt.schedule.event == 'open':
  ...
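A sketch of the ScheduleConditions idea, assuming the backtester can hand it the current date, the current event and a list of trading days. Every name here is hypothetical. Checking the previous trading day (rather than the weekday) is what makes `first_day_of.week` true on a Tuesday after a Monday market holiday.

```python
from datetime import date


class FirstDayOf:
    """Answers 'is this the first trading day of the week/month?' (hypothetical)."""

    def __init__(self, day, trading_days):
        earlier = [d for d in trading_days if d < day]
        self._day = day
        self._previous = max(earlier) if earlier else None

    @property
    def week(self):
        # isocalendar()[:2] is (ISO year, ISO week number)
        return (self._previous is None
                or self._previous.isocalendar()[:2] != self._day.isocalendar()[:2])

    @property
    def month(self):
        return (self._previous is None
                or (self._previous.year, self._previous.month) != (self._day.year, self._day.month))


class ScheduleConditions:
    """Hypothetical sketch of the proposed bt.schedule shorthand."""

    def __init__(self, day, event, trading_days):
        self.date = day
        self.event = event
        self.first_day_of = FirstDayOf(day, trading_days)

    @property
    def open(self):
        return self.event == 'open'

    @property
    def monday(self):
        return self.date.weekday() == 0


# 2019-01-21 (Martin Luther King Jr. Day) was an NYSE holiday, so the week starts on Tuesday.
days = [date(2019, 1, 17), date(2019, 1, 18), date(2019, 1, 22), date(2019, 1, 23)]
schedule = ScheduleConditions(date(2019, 1, 22), 'open', days)
print(schedule.first_day_of.week, schedule.monday, schedule.open)  # True False True
```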

We could also pass schedule to strategies directly and get rid of day and event.
Decorators could still work like this:

@run(Schedule.first_day_of.week)
...

Schedule could be static (it only depends on bt, no internal state) and e.g. bt.schedule.open could just be shorthand for Schedule.open(bt).
Or we could use a scheduling library but they would almost certainly have to be modified for the specific market calendars we are using.

crypto support

Tasks

~~make it possible to disable the current limitation of only buying full shares~~
make fractional shares a BacktesterBuilder option, with default set to False
find free data source for daily crypto data
provide crypto PriceDataProvider
BacktesterBuilder.interval for intra-day backtesting
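The fractional-shares option mainly changes how an order amount is turned into a share count. A minimal sketch, with `shares_for` and its `fractional` flag as hypothetical names (the task above proposes the option default to False):

```python
import math


def shares_for(amount, price, fractional=False):
    """Number of shares `amount` buys at `price`; whole shares unless fractional."""
    if price <= 0:
        raise ValueError("price must be positive")
    shares = amount / price
    return shares if fractional else math.floor(shares)


print(shares_for(10_000, 3_000))                 # 3
print(shares_for(100, 40_000))                   # 0: a high-priced coin cannot be bought at all
print(shares_for(100, 40_000, fractional=True))  # 0.0025
```

The second call shows why crypto support depends on this option: with whole units only, small balances cannot buy high-priced coins.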

add docstrings

tutorials

A good "getting started" tutorial should explain the core mechanics of simple-back without going too much into detail. I have structured it into the following parts:

Getting Started

quickstart (build a simple strategy)
move quickstart to notebook
debugging strategies
strategy object (making strategy objects and running more than one at once)
slippage
fees

Advanced

custom data providers (getting external data into the backtester)
custom metrics
custom price providers

Misc

cheat sheet of portfolio/metric ... usage

current close price replaced by na but not dropped

This example was taken from the quickstart tutorial.

Code

from simple_back.backtester import BacktesterBuilder

builder = (
   BacktesterBuilder()
   .name('JNUG 20-Day Crossover')
   .balance(10_000)
   .calendar('NYSE')
   .compare(['JNUG']) # strategies to compare with
   #.live_progress() # show a progress bar
)


bt = builder.no_live_plot().build()
for day, event, b in bt['2019-1-1':'2020-1-1']:
    if event == 'open':
        jnug_ma = b.prices['JNUG',-5:]['close']
        print("-"*50)
        print("Current Date: " , day.strftime("%Y-%m-%d"),"\n\n")
        print("Current Close: ",b.price('JNUG'),"\n\n")
        print("Previous Close: \n",b.prices['JNUG',-1:]['close'],"\n")

Close prices

>>> b.prices['JNUG']['close']
2013-10-03    37749.738281
2013-10-04    33773.062500
2013-10-07    33648.792969
2013-10-08    29758.150391
2013-10-09    28438.962891
                  ...     
2019-12-24      766.299927
2019-12-26      813.159302
2019-12-27      778.961914
2019-12-30      835.891113
2019-12-31      832.202148

Expected

--------------------------------------------------
Current Date:  2019-12-27 


Current Close:  807.5760402517996 


Previous Close: 
 2019-12-26    813.159302
Name: close, dtype: float64 

--------------------------------------------------
Current Date:  2019-12-30 


Current Close:  791.2251285490009 


Previous Close: 
2019-12-27      778.961914
Name: close, dtype: float64 

--------------------------------------------------
Current Date:  2019-12-31 


Current Close:  867.695590073621 


Previous Close: 
 2019-12-30    835.891113
Name: close, dtype: float64

What I am getting

--------------------------------------------------
Current Date:  2019-12-27 


Current Close:  807.5760402517996 


Previous Close: 
 2019-12-26    813.159302
2019-12-27           NaN
Name: close, dtype: float64 

--------------------------------------------------
Current Date:  2019-12-30 


Current Close:  791.2251285490009 


Previous Close: 
 2019-12-30   NaN
Name: close, dtype: float64 

--------------------------------------------------
Current Date:  2019-12-31 


Current Close:  867.695590073621 


Previous Close: 
 2019-12-30    835.891113
2019-12-31           NaN
Name: close, dtype: float64

Is this supposed to work this way?

How to write simple ML backtesting without writing any oop code?

I have an ML model which takes three inputs and outputs a single value, similar to this:

# ML training
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
import pandas as pd

# Assume the features are three lagged inputs
X, y = make_regression(n_samples=100, n_features=3, noise=1)

reg = LinearRegression().fit(X, y)
reg.score(X, y)

# Backtesting with the trained model
from simple_back.backtester import BacktesterBuilder

builder = (
   BacktesterBuilder()
   .name('JNUG 20-Day Crossover')
   .balance(10_000)
   .calendar('NYSE')
   .compare(['JNUG']) # strategies to compare with
   .live_progress() # show a progress bar
)


bt = builder.no_live_plot().build()
for day, event, b in bt['2019-1-1':'2020-1-1']:
    if event == 'open':
        jnug_ma = b.prices['JNUG',-20:]['close'].mean()
        d = {'data1': b.prices['JNUG']['close'], 'data2': b.prices['JNUG']['open'],'data3': b.prices['JNUG']['high']}
        df = pd.DataFrame(d).dropna()
        pred = reg.predict(df)[-1]  # prediction for the most recent complete row
        if pred > 0 and b.price('JNUG') > jnug_ma:
            pass  # long
        if pred < 0 and b.price('JNUG') < jnug_ma:
            pass  # short

Here instead of iterating through b.prices['JNUG']['close'] alone I want to also use my newly created data df and its values for prediction and trading.

How can I do this in simple-back preferably without writing any oop code?
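One way to do this with plain functions is to build a single feature row from the most recent bar on each iteration and predict only for that row. scikit-learn's `predict` expects a 2D input of shape `(n_samples, n_features)` and returns an array, so comparing a whole array of predictions with `> 0` inside an `if` is ambiguous. The sketch below is an assumption about how the pieces could fit together: `ml_signal` is a hypothetical helper, the price lists stand in for `b.prices['JNUG']['close']` and friends, and any callable shaped like `reg.predict` can be passed in.

```python
def ml_signal(predict, closes, opens, highs, moving_average, price):
    """Return 'long', 'short' or None from a model prediction and a moving average.

    predict: any callable like sklearn's `reg.predict` (2D input -> sequence of outputs).
    closes, opens, highs: past values, oldest first (hypothetical stand-ins for
    b.prices['JNUG'][...]); only the most recent complete bar is used as features.
    """
    row = [[closes[-1], opens[-1], highs[-1]]]  # one sample, three features
    prediction = predict(row)[0]               # a single value, not an array
    if prediction > 0 and price > moving_average:
        return 'long'
    if prediction < 0 and price < moving_average:
        return 'short'
    return None


def fake_predict(rows):
    """Stand-in for a fitted linear model: fixed weights, no intercept (assumption)."""
    weights = [0.5, -0.2, 0.1]
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


print(ml_signal(fake_predict, [9, 10], [8, 9], [10, 11], moving_average=9.5, price=10.2))
```

With a real model, `reg.predict` would simply replace `fake_predict`; the strategy stays a loop plus a few functions.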

move docs to gh-pages

Pandoc + nbsphinx does not seem to work correctly on readthedocs, but builds fine locally.
Moving doc creation to a GitHub action and deploying the docs on GitHub Pages would be a solution.

Segmentation fault while running example code in python3.6

I am getting a segmentation fault while running the example code in python3.6:

from simple_back.backtester import BacktesterBuilder

builder = (
   BacktesterBuilder()
   .name('JNUG 20-Day Crossover')
   .balance(10_000)
   .calendar('NYSE')
   .compare(['JNUG']) # strategies to compare with
   .live_progress() # show a progress bar using tqdm
   .live_plot() # we assume we are running this in a Jupyter Notebook
)

bt = builder.build()
for day, event, b in bt['2019-1-1':'2020-1-1']:
    pass

This returns

RuntimeError: main thread is not in main loop

>>> Segmentation fault

YahooFinanceSource Caching is a bit opaque

I didn't see this anywhere in the docs, but if the cache pulls a miss, it caches the miss somewhere. To anyone else seeing something like "data not available" on subsequent runs:

bt.prices.clear_cache()

Should fix it right up.

Slippage

As this backtester is built for open and close data and not intra-day data, results will be wrong to a certain extent as it is rarely possible to buy exactly at open or close price. Adding a SlippageModel that calculates a price drift would be easy, but coming up with an accurate SlippageModel and determining if it is accurate is not.

`SlippageModel`

1) random value from truncated normal distribution with open/close price as mean scaled by low/high

given current_price
slippage_price = random value from (gaussian: mu: current_price, sigma: x)
// x (sigma) can be set by user and determines how fat-tailed the distribution will be
if absolute value slippage_price > 1:
  set slippage_price so abs value = 1
if slippage_price > 0:
  slippage_price = slippage_price * high
if slippage_price < 0:
  slippage_price = slippage_price * low

disadvantages

does not take close/next day open into account
random (although could be seeded)
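The pseudocode above takes the current price as the mean but then clips the draw to an absolute value of 1, which only makes sense for a draw centred on zero. The sketch below assumes that reading: a zero-centred draw is interpreted as the fraction of the distance from the open/close price to the day's high or low. This is an interpretation, not a settled design. It also clips rather than truly truncating (a truncated normal would redraw values outside [-1, 1] instead of piling them up at the bounds). `gaussian_slippage` is a hypothetical name.

```python
import random


def gaussian_slippage(price, low, high, sigma=0.3, rng=random):
    """Fill price under one reading of option 1 (hypothetical helper).

    Draw z from N(0, sigma), clip it to [-1, 1], then move the fill price that
    fraction of the way towards the day's high (z > 0) or low (z < 0).
    """
    z = max(-1.0, min(1.0, rng.gauss(0.0, sigma)))
    if z > 0:
        return price + z * (high - price)
    return price + z * (price - low)


rng = random.Random(42)  # seeded, addressing the "random" drawback
fills = [gaussian_slippage(100.0, low=95.0, high=110.0, rng=rng) for _ in range(5)]
print(fills)
```

A larger `sigma` makes fills near the day's high or low more frequent; the result never leaves the [low, high] range.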

2) drift towards next price point

given current_price
slippage_price = current_price + (next_price - current_price) * x
// x can be set by user and determines how strong the price drift will be
// alternatively x could be scaled by the difference between open and close

disadvantages

price will always be higher/lower when next price is higher/lower, this discounts risk/volatility

3) combination

To mitigate most of the disadvantages from both algorithms, a combination could be to use 1) with the price from 2) as mean.
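Options 2 and 3 can be sketched the same way. For option 3 this assumes option 1 is read as a zero-centred Gaussian draw, clipped to [-1, 1] and scaled by the distance to the day's high or low. `drift_price` and `combined_slippage` are hypothetical names, and the defaults for `x` and `sigma` are placeholders, not tuned values.

```python
import random


def drift_price(price, next_price, x=0.1):
    """Option 2: move a fraction x of the way towards the next price point."""
    return price + (next_price - price) * x


def combined_slippage(price, next_price, low, high, x=0.1, sigma=0.3, rng=random):
    """Option 3 (hypothetical helper): a clipped Gaussian around option 2's price."""
    # Keep the drifted mean inside the day's range, since the next price point
    # (e.g. the next day's open) may lie outside today's low/high.
    mean = min(high, max(low, drift_price(price, next_price, x)))
    z = max(-1.0, min(1.0, rng.gauss(0.0, sigma)))
    if z > 0:
        return mean + z * (high - mean)
    return mean + z * (mean - low)


print(drift_price(100.0, 110.0, x=0.5))  # 105.0
print(combined_slippage(100.0, 110.0, low=95.0, high=112.0, x=0.5, rng=random.Random(1)))
```

The random component keeps option 2's bias towards the next price while no longer making the fill price deterministic, which is the point of combining the two.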

Evaluation

The only way I can think of to test this is to run an identical strategy in simple-back and Quantopian (which provides intra-day data). Using freely available intra-day data would be possible as well, but introduces new problems (e.g. how much time we assume passes between order and execution).

todo

come up with strategy that works in both simple-back and Quantopian
implement all 3 options
test all options and perform grid search to find sensible defaults for x

drawbacks

Slippage in Quantopian (and real life) will differ based on liquidity, volatility, etc. - all data we don't have access to. Whatever default values/algorithms we come up with will heavily overfit to the "joint strategy" we test it on. I don't know of any papers on determining execution price from open and close prices, but if there are any, please let me know!

change in ordering api

At the moment the ordering is a bit clumsy and counter-intuitive (see example below).

b.order_pct('MSFT', 0.5) # long 50% of value in MSFT (will fail if >50% are invested)
b.order_abs('MSFT', 1_000) # long 1_000$ MSFT
b.order_abs('MSFT', b.balance.current*0.5) # long 50% of available funds in MSFT

b.order_pct('MSFT', -1) # short 100% MSFT
...

Proposed

b.long('MSFT', percent=.5)
b.long('MSFT', absolute=1_000)
b.long('MSFT', percent_available=.5)
b.long('MSFT', nshares=1)
# equivalent for short

Any other suggestions? What (if anything) should it default to when the keyword isn't set?
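One possible answer to the default question is to have no default at all and require exactly one sizing keyword. The sketch below shows how the four proposed keywords could be turned into a dollar amount. `order_value` is a hypothetical helper; `equity` is assumed to be the total portfolio value and `available` the uninvested cash (the role `b.balance.current` plays in the current API).

```python
def order_value(price, equity, available, *, percent=None, absolute=None,
                percent_available=None, nshares=None):
    """Translate exactly one sizing keyword into a dollar amount (hypothetical helper).

    equity: total portfolio value, available: uninvested cash, price: share price.
    """
    given = {
        'percent': percent,
        'absolute': absolute,
        'percent_available': percent_available,
        'nshares': nshares,
    }
    set_keywords = [name for name, value in given.items() if value is not None]
    if len(set_keywords) != 1:
        raise ValueError(f"specify exactly one of {list(given)}, got {set_keywords}")
    if percent is not None:
        return equity * percent
    if absolute is not None:
        return float(absolute)
    if percent_available is not None:
        return available * percent_available
    return nshares * price


print(order_value(100.0, 10_000, 4_000, percent=0.5))  # 5000.0
print(order_value(100.0, 10_000, 4_000, nshares=3))     # 300.0
```

Raising on zero or several keywords avoids silently picking a sizing rule the user did not intend.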

add section in documentation for common accessible attributes

I spent quite a bit of time last evening running through the code to find things that I think others might readily need. Adding some documentation for them would save the next user that time.

For example:

b.portfolio.attr('value')

bt.balance.current

bt.balance.start

b.portfolio['XXX'].short.attr('num_shares')

b.portfolio['XXX'].long.attr('num_shares')

These are all very useful data points, and I know you use them in your performance analysis. I would like to see a place that guides users on how to access things like this from the get-go.

Thanks for the work you put into this; once I got the caching in, I was really impressed by how well thought out a lot of this is.

allow pickling of bt objects

Should be an easy fix, can just remove _last_thread in _graceful_stop.
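Besides removing `_last_thread` in `_graceful_stop`, the standard Python way to make such an object picklable is a `__getstate__` method that leaves the thread out of the pickled state. A minimal sketch with a hypothetical stand-in class (not simple-back's backtester):

```python
import pickle
import threading


class PicklableBacktest:
    """Hypothetical stand-in for an object that keeps a reference to a worker thread."""

    def __init__(self):
        self.balance = 10_000
        self._last_thread = threading.Thread(target=print)  # never started here

    def __getstate__(self):
        # Thread objects hold locks, which pickle refuses to serialize,
        # so drop the reference from the pickled state.
        state = self.__dict__.copy()
        state['_last_thread'] = None
        return state


restored = pickle.loads(pickle.dumps(PicklableBacktest()))
print(restored.balance, restored._last_thread)  # 10000 None
```

Without a `__setstate__`, pickle restores the returned dict into the new instance's `__dict__`, so the restored object simply has no thread attached.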

build action fails when on branch other than master

This is really annoying for pull requests at the moment, because CI always fails on them. git-auto-commit fails with error: pathspec '<branch name>' did not match any file(s) known to git. I did not find an immediate solution for this; one workaround would be to make this its own GitHub action, separate from the rest (we only really need automatic commits for master anyway).

tutorial example not working with datetime error

Hi,
I get a strange error when running the introductory tutorial example in notebook and also in terminal:
"KeyError: datetime.date(2019, 1, 2)"
Any idea?
Thanks
GV
I paste below the full report:

KeyError Traceback (most recent call last)
in
12
13 bt = builder.no_live_plot().build()
---> 14 for day, event, b in bt['2019-1-1':'2020-1-1']:
15 if event == 'open':
16 jnug_ma = b.prices['JNUG',-5:]['close']

/opt/anaconda3/lib/python3.7/site-packages/simple_back/backtester.py in __getitem__(self, date_range)
844 for date in self.dates:
845 self.datetimes += [
--> 846 sched.loc[date]["market_open"],
847 sched.loc[date]["market_close"],
848 ]

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1108 # fall thru to straight lookup
1109 self._validate_key(key, axis)
-> 1110 return self._get_label(key, axis=axis)
1111
1112 def _get_slice_axis(self, slice_obj: slice, axis: int):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
1057 def _get_label(self, label, axis: int):
1058 # GH#5667 this will fail if the label is not present in the axis.
-> 1059 return self.obj.xs(label, axis=axis)
1060
1061 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
3486 loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
3487 else:
-> 3488 loc = self.index.get_loc(key)
3489
3490 if isinstance(loc, np.ndarray):

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
620 else:
621 # unrecognized type
--> 622 raise KeyError(key)
623
624 try:

KeyError: datetime.date(2019, 1, 2)

add pylint badge

https://github.com/apmechev/pylint-badge

do not force pandas_market_calendars

Hello,

I was trying to setup the backtester with crypto and due to this line it's expecting an mcal name, which is not what I would expect if the .calendar() is optional.

https://github.com/MiniXC/simple-back/blob/master/simple_back/backtester.py#L812

Perhaps there could be a check on this optional parameter so that no calendar is loaded when none is set, or, if the feature is required, it could be expanded to allow for additional calendars?

This also could be due to my lack of knowledge with pandas and calendars, however as an optional (https://minixc.github.io/simple-back/api/simple_back.html#simple_back.backtester.BacktesterBuilder.calendar) parameter, I wouldn't expect it to fail if I didn't provide one.

Thank you for this. I was going to prepare an example and submit a pull request for extending this to work with crypto data if you are interested in including such an example.
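A sketch of the fallback being asked for: when no market calendar is configured (as for crypto, which trades every day), iterate over every calendar day instead of requiring a calendar name. `trading_days` and its `calendar_days` parameter are hypothetical; in practice the list of exchange days would come from a market calendar.

```python
from datetime import date, timedelta


def trading_days(start, end, calendar_days=None):
    """Days to iterate over between start and end (inclusive).

    calendar_days: optional sequence of exchange trading days (e.g. taken from a
    market calendar). If it is None, fall back to every calendar day.
    """
    if calendar_days is not None:
        return [d for d in calendar_days if start <= d <= end]
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


print(len(trading_days(date(2020, 1, 1), date(2020, 1, 7))))  # 7: every day of the week
```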

inconsistent use of nshares and num_shares

https://github.com/MiniXC/simple-back/blob/master/simple_back/backtester.py#L1312
https://github.com/MiniXC/simple-back/blob/master/simple_back/backtester.py#L137

So a position has num_shares but when opening a position you have nshares

I fought with this last night for quite a while, simply because I expected it to be consistent across the implementation. I'm not sure whether this is by design or just the result of different periods of development, but it seems they should be one and the same?

Just a small nag, and open for discussion.

add performance comparison with other backtesters

target equity weights

It could be beneficial for many strategies to assign target weights to a number of assets, e.g.
b.long(['AAPL', 'MSFT'], weights=[.5, .5])
I think sensible default behaviour would be to only allow liquidation of tickers in said weight list to achieve the target weights (but liquidation could be forced with an attribute) and that weights must add up to one. Also assets should only be rebalanced when calling long with the same set of weights again (otherwise a rebalance method could be an option as well).
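A sketch of the proposed default: only the listed tickers are rebalanced, their current value plus free cash is split according to the weights, and the weights must add up to one. `rebalance_orders` is a hypothetical helper that returns dollar amounts to buy (positive) or sell (negative); turning these into share counts, and placing sells before buys, is left out.

```python
def rebalance_orders(tickers, weights, holdings, prices, cash=0.0):
    """Dollar amount to buy (+) or sell (-) per ticker to reach target weights.

    Only tickers in `tickers` are touched (the proposed default); their current
    value plus free cash is split according to `weights`, which must sum to one.
    holdings: {ticker: number of shares}, prices: {ticker: current price}.
    """
    if len(tickers) != len(weights):
        raise ValueError("need one weight per ticker")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must add up to one")
    current = {t: holdings.get(t, 0) * prices[t] for t in tickers}
    total = sum(current.values()) + cash
    return {t: total * w - current[t] for t, w in zip(tickers, weights)}


orders = rebalance_orders(
    ['AAPL', 'MSFT'], [0.5, 0.5],
    holdings={'AAPL': 10, 'TSLA': 5},
    prices={'AAPL': 100.0, 'MSFT': 50.0, 'TSLA': 200.0},
    cash=1_000.0,
)
print(orders)  # {'AAPL': 0.0, 'MSFT': 1000.0}; TSLA is left alone
```

Calling it again with the same weights after prices move produces the rebalancing trades, matching the "rebalance when long is called again" behaviour described above.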

Add imports to init, to incorporate autocompletion of many Engines

In order to give autocompleters (e.g. IntelliSense) the possibility to autocomplete the classes defined by the package without problems, the following imports should be added to __init__.py:

from .backtester import Backtester
from .price_providers import DailyPriceProvider, YahooFinanceProvider


__all__ = [
    "Backtester",
    "DailyPriceProvider",
    "YahooFinanceProvider"
]

document matplotlib requirement for live plotting

My first run of a notebook got me

[...]simple_back\backtester.py:1517: UserWarning: matplotlib not installed, setting live plotting to false

A simple

pip install matplotlib

gets you over that bump, but since live plotting is one of the nice features of this backtester, it might be nice to mention that in the README.md or intro docs, especially if there's a minimum version needed. (I'm assuming that since there's a test for being able to successfully import matplotlib.pyplot that you want to keep the ability to deploy this w/o matplotlib rather than just add it to the requirements).
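The behaviour described here, where live plotting is turned off with a warning when matplotlib is missing, is the usual optional-import guard. The following sketches that pattern and is not simple-back's actual code; `live_plot_enabled` is a hypothetical name, and the warning text is copied from the message quoted above.

```python
import warnings


def live_plot_enabled(requested=True):
    """Return whether live plotting can be used, warning if matplotlib is missing."""
    if not requested:
        return False
    try:
        import matplotlib.pyplot  # noqa: F401  (only checking that it imports)
    except ImportError:
        warnings.warn("matplotlib not installed, setting live plotting to false")
        return False
    return True


print(live_plot_enabled())
```

Keeping matplotlib optional like this is what allows installing simple-back without it, at the cost of the surprise reported in this issue.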

making simple_back faster

making simple-back faster

This issue is intended to keep track of efforts to improve simple-back performance, and will probably remain open for a while.

how slow is it?

At the moment, the quickstart example runs ~13 seconds with plotting enabled and ~1 second without (your mileage may vary). The difference is this big because plotting is blocking. async io or plotting in its own process with multiprocessing would help.
But this is not the whole picture. Even without plotting, backtests with many different symbols can take minutes if not hours.

why is it slow?

Retrieving prices will always be the slowest part of any backtester that does not have the whole universe of stocks and their prices in memory. At the moment, we use disk caching and then cache prices again in memory once they are requested at least once. But this is done on a more abstract level than it should be, as illustrated below.

The problem is that prices is often called with different dates, and _get_cached ends up with one cache entry for each date. The place where we should cache in memory is YahooPriceProvider, not DailyPriceProvider, while still making it easy for someone new to the library to write a price provider without having to think too much about caching.
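A sketch of caching per symbol instead of per date range: the full history is fetched once and every later range is answered from memory. `InMemoryPrices` and `fetch_history` are hypothetical names; `fetch_history` stands in for a Yahoo Finance download or a read from the disk cache.

```python
from datetime import date


class InMemoryPrices:
    """Hypothetical provider layer: one cache entry per symbol, not per date range.

    fetch_history: callable mapping a symbol to its full {date: price} history.
    """

    def __init__(self, fetch_history):
        self._fetch_history = fetch_history
        self._cache = {}
        self.fetches = 0

    def get(self, symbol, start, end):
        if symbol not in self._cache:
            self._cache[symbol] = self._fetch_history(symbol)
            self.fetches += 1
        history = self._cache[symbol]
        return {d: p for d, p in history.items() if start <= d <= end}


history = {date(2019, 1, 2): 1.0, date(2019, 1, 3): 2.0, date(2019, 1, 4): 3.0}
prices = InMemoryPrices(lambda symbol: history)
prices.get('JNUG', date(2019, 1, 2), date(2019, 1, 3))
prices.get('JNUG', date(2019, 1, 3), date(2019, 1, 4))
print(prices.fetches)  # 1: two different date ranges, one fetch
```

A new provider author would only write `fetch_history`; the caching stays in one place.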

tasks

spend more time confirming the suspicions listed above (@Oberdiah is on the case)
move in-memory caching to YahooPriceProvider
look into async io for further speed-up
~~rewrite the whole thing in C~~

side-note on async io

The reason I think async io would be best in the long run is that we perform many tasks that have to wait for input (prices) at the moment. When e.g. buying all SP500 securities, a significant amount of time is spent for each order just waiting on the price, while in theory, all the orders could be executed at the same time and wait for their prices at the same time. Async IO could make syntax like this the new norm for buying multiple securities:

b.order_many(['ticker1', 'ticker2', 'ticker3', ...], [weight1, weight2, weight3, ...])

If we use async io under the hood, we can request all prices at once and don't have to wait for each one to arrive before ordering the next ticker.
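A minimal `asyncio` sketch of that idea: all price requests are started together with `asyncio.gather`, so the total wait is roughly one request's latency rather than one per ticker. `fetch_price` and `order_many` here are hypothetical standalone functions (not the proposed `b.order_many` method), and the sleep and prices are fake.

```python
import asyncio


async def fetch_price(ticker):
    """Hypothetical price request; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return 100.0 + len(ticker)  # fake price, for illustration only


async def order_many(tickers, weights, budget):
    # Start all price requests at once and wait for them together,
    # instead of awaiting each ticker before ordering the next one.
    prices = await asyncio.gather(*(fetch_price(t) for t in tickers))
    return {t: budget * w / p for t, w, p in zip(tickers, weights, prices)}


orders = asyncio.run(order_many(['AAPL', 'MSFT', 'JNUG'], [0.5, 0.3, 0.2], 10_000))
print(orders)  # shares per ticker; total wait is about 0.05 s, not 3 x 0.05 s
```

`asyncio.gather` returns results in the order the awaitables were passed, which is what lets the `zip` pair each price with its ticker.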
