quantopian / empyrical


Common financial risk and performance metrics. Used by zipline and pyfolio.

Home Page: https://quantopian.github.io/empyrical

License: Apache License 2.0

Python 100.00%

empyrical's Introduction

empyrical

Common financial risk metrics.

Installation
Usage
Support
Contributing
Testing

Installation

pip install empyrical

Usage

Simple Statistics

import numpy as np
from empyrical import max_drawdown, alpha_beta

returns = np.array([.01, .02, .03, -.4, -.06, -.02])
benchmark_returns = np.array([.02, .02, .03, -.35, -.05, -.01])

# calculate the max drawdown
max_drawdown(returns)

# calculate alpha and beta
alpha, beta = alpha_beta(returns, benchmark_returns)

Rolling Measures

import numpy as np
from empyrical import roll_max_drawdown

returns = np.array([.01, .02, .03, -.4, -.06, -.02])

# calculate the rolling max drawdown
roll_max_drawdown(returns, window=3)

Pandas Support

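import pandas as pd
from empyrical import roll_up_capture, capture

returns = pd.Series([.01, .02, .03, -.4, -.06, -.02])
factor_returns = pd.Series([.02, .02, .03, -.35, -.05, -.01])

# calculate a capture ratio against the benchmark (factor) returns
capture(returns, factor_returns)

# calculate capture for up markets on a rolling 60 day basis
# (needs at least 60 observations to produce any output)
roll_up_capture(returns, factor_returns, window=60)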

Support

Please open an issue for support.

Deprecated: Data Reading via `pandas-datareader`

As of early 2018, Yahoo Finance has suffered major API breaks with no stable replacement, and the Google Finance API has not been stable since late 2017 (source). In recent months it has become a greater and greater strain on the empyrical development team to maintain support for fetching data through pandas-datareader and other third-party libraries, as these APIs are known to be unstable.

As a result, all empyrical support for data reading functionality has been deprecated and will be removed in a future version.

Users should beware that the following functions are now deprecated:

empyrical.utils.cache_dir
empyrical.utils.data_path
empyrical.utils.ensure_directory
empyrical.utils.get_fama_french
empyrical.utils.load_portfolio_risk_factors
empyrical.utils.default_returns_func
empyrical.utils.get_symbol_returns_from_yahoo

Users should expect regular failures from the following functions, pending patches to the Yahoo or Google Finance API:

empyrical.utils.default_returns_func
empyrical.utils.get_symbol_returns_from_yahoo

Contributing

Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.

Testing

Install the test requirements:
- "nose>=1.3.7"
- "parameterized>=0.6.1"

Then run the test suite:

./runtests.py


empyrical's Issues

Stripping of name will have no effect if cached Fama-French exists

Pyfolio calls https://github.com/quantopian/empyrical/blob/master/empyrical/utils.py#L309, which reads the Fama-French factors from a cached file if one exists. As a result, users upgrading pyfolio and empyrical will get an error that column 'Mom' does not exist, because the old name containing the whitespace is still in the cache.

The problem is in this PR: #63

CC @vikram-narayan

Ensure max_drawdown returns a float scalar

max_drawdown promises to return a float scalar in its documentation; in reality it returns an ndarray/Series object with the same dimensionality as the input. This causes a problem in 'calmar_ratio', where the 'if max_dd<0:' statement throws a "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" exception.

Someone is name squatting

https://pypi.org/project/empyrical3/
santosjorge/cufflinks#145

Calmar Ratio w/ EMDD

It would be great if you can add this.
http://alumnus.caltech.edu/~amir/mdd-risk.pdf

function EDD = emaxdrawdown(Mu, Sigma, T)
%EMAXDRAWDOWN Compute expected maximum drawdown for a Brownian motion.
%
%   EDD = emaxdrawdown(Mu, Sigma, T);
%
% Inputs:
%   Mu - The drift term of a Brownian motion with drift.
%   Sigma - The diffusion term of a Brownian motion with drift.
%   T - A time period of interest or a vector of times.
%
% Outputs:
%   EDD - Expected maximum drawdown for each time period in T.
%
% Notes:
%   If a geometric Brownian motion with stochastic differential equation
%       dS(t) = Mu0 * S(t) * dt + Sigma0 * S(t) * dW(t) ,
%   convert to the form here by Ito's Lemma with X(t) = log(S(t)) such that
%       Mu = Mu0 - 0.5 * Sigma0^2
%       Sigma = Sigma0 .
%
%   Maximum drawdown is non-negative since it is the change from a peak to a
%   trough.
%
% References:
%   Malik Magdon-Ismail, Amir F. Atiya, Amrit Pratap, and Yaser S. Abu-Mostafa,
%   "On the Maximum Drawdown of a Brownian Motion", Journal of Applied
%   Probability, Volume 41, Number 1, March 2004, pp. 147-161. 
%
%   See also MAXDRAWDOWN
%   Copyright 2005 The MathWorks, Inc.
%   $Revision: 1.1.6.2 $ $Date: 2005/12/12 23:16:10 $
if nargin < 3 || isempty(Mu) || isempty(Sigma) || isempty(T)
    error('Finance:emaxdrawdown:MissingInputArg', ...
        'Missing required input arguments Mu, Sigma, or T.');
end
[n, i] = size(Mu);
if (n > 1) || (i > 1)
    error('Finance:emaxdrawdown:InvalidInputArg', ...
        'Argument Mu must be a scalar.');
end
[n, i] = size(Sigma);
if (n > 1) || (i > 1)
    error('Finance:emaxdrawdown:InvalidInputArg', ...
        'Argument Sigma must be a scalar.');
end
T = T(:);
[n, i] = size(T);
if i > 1
    error('Finance:emaxdrawdown:InvalidInputArg', ...
        'Argument T must be a vector.');
end
    
if ~isfinite(Mu) || ~isfinite(Sigma) || (Sigma < 0)
    error('Finance:emaxdrawdown:InvalidInputArg', ...
        'Invalid argument values. Requires finite Mu, Sigma >= 0, T >= 0.');
end
for i = 1:n
    if ~isfinite(T(i)) || (T(i) < 0)
        error('Finance:emaxdrawdown:InvalidInputArg', ...
            'Invalid argument values. Requires finite T >= 0.');
    end
end
 
if Sigma < eps
    if Mu >= 0.0
        EDD = zeros(n,1);
    else
        EDD = - Mu .* T;
    end
    return
end
EDD = zeros(n,1);
for i = 1:n
    if T(i) < eps
        EDD(i) = 0;
    else
        if abs(Mu) <= eps
            EDD(i) = (sqrt(pi/2) * Sigma) .* sqrt(T(i));
        elseif Mu > eps
            Alpha = Mu/(2.0 * Sigma * Sigma);
            EDD(i) = emaxddQp(Alpha * Mu * T(i))/Alpha;
        else
            Alpha = Mu/(2.0 * Sigma * Sigma);
            EDD(i) = - emaxddQn(Alpha * Mu * T(i))/Alpha;
        end
    end
end
function Q = emaxddQp(x)
A = [   0.0005;     0.001;      0.0015;     0.002;      0.0025;     0.005;
        0.0075;     0.01;       0.0125;     0.015;      0.0175;     0.02;
        0.0225;     0.025;      0.0275;     0.03;       0.0325;     0.035;
        0.0375;     0.04;       0.0425;     0.045;      0.0475;     0.05;
        0.055;      0.06;       0.065;      0.07;       0.075;      0.08;
        0.085;      0.09;       0.095;      0.1;        0.15;       0.2;
        0.25;       0.3;        0.35;       0.4;        0.45;       0.5;
        1.0;        1.5;        2.0;        2.5;        3.0;        3.5;
        4.0;        4.5;        5.0;        10.0;       15.0;       20.0;
        25.0;       30.0;       35.0;       40.0;       45.0;       50.0;
        100.0;      150.0;      200.0;      250.0;      300.0;      350.0;
        400.0;      450.0;      500.0;      1000.0;     1500.0;     2000.0;
        2500.0;     3000.0;     3500.0;     4000.0;     4500.0;     5000.0 ];
B = [   0.01969;    0.027694;   0.033789;   0.038896;   0.043372;   0.060721;
        0.073808;   0.084693;   0.094171;   0.102651;   0.110375;   0.117503;
        0.124142;   0.130374;   0.136259;   0.141842;   0.147162;   0.152249;
        0.157127;   0.161817;   0.166337;   0.170702;   0.174924;   0.179015;
        0.186842;   0.194248;   0.201287;   0.207999;   0.214421;   0.220581;
        0.226505;   0.232212;   0.237722;   0.24305;    0.288719;   0.325071;
        0.355581;   0.382016;   0.405415;   0.426452;   0.445588;   0.463159;
        0.588642;   0.668992;   0.72854;    0.775976;   0.815456;   0.849298;
        0.878933;   0.905305;   0.92907;    1.088998;   1.184918;   1.253794;
        1.307607;   1.351794;   1.389289;   1.42186;    1.450654;   1.476457;
        1.647113;   1.747485;   1.818873;   1.874323;   1.919671;   1.958037;
        1.991288;   2.02063;    2.046885;   2.219765;   2.320983;   2.392826;
        2.448562;   2.494109;   2.532622;   2.565985;   2.595416;   2.621743 ];
if x > 5000
    Q = 0.25*log(x) + 0.49088;
elseif x < 0.0005
    Q = 0.5*sqrt(pi*x);
else
    Q = interp1(A,B,x);
end
function Q = emaxddQn(x)
A = [   0.0005;     0.001;      0.0015;     0.002;      0.0025;     0.005;
        0.0075;     0.01;       0.0125;     0.015;      0.0175;     0.02;
        0.0225;     0.025;      0.0275;     0.03;       0.0325;     0.035;
        0.0375;     0.04;       0.0425;     0.045;      0.0475;     0.05;
        0.055;      0.06;       0.065;      0.07;       0.075;      0.08;
        0.085;      0.09;       0.095;      0.1;        0.15;       0.2;    
        0.25;       0.3;        0.35;       0.4;        0.45;       0.5;
        0.75;       1.0;        1.25;       1.5;        1.75;       2.0;
        2.25;       2.5;        2.75;       3.0;        3.25;       3.5;
        3.75;       4.0;        4.25;       4.5;        4.75;       5.0 ];
B = [   0.019965;   0.028394;   0.034874;   0.040369;   0.045256;   0.064633;
        0.079746;   0.092708;   0.104259;   0.114814;   0.124608;   0.133772;
        0.142429;   0.150739;   0.158565;   0.166229;   0.173756;   0.180793;
        0.187739;   0.194489;   0.201094;   0.207572;   0.213877;   0.220056;
        0.231797;   0.243374;   0.254585;   0.265472;   0.27607;    0.286406;
        0.296507;   0.306393;   0.316066;   0.325586;   0.413136;   0.491599;
        0.564333;   0.633007;   0.698849;   0.762455;   0.824484;   0.884593;
        1.17202;    1.44552;    1.70936;    1.97074;    2.22742;    2.48396;
        2.73676;    2.99094;    3.24354;    3.49252;    3.74294;    3.99519;
        4.24274;    4.49238;    4.73859;    4.99043;    5.24083;    5.49882 ];
if (x > 5.0)
    Q = x + 0.5;
elseif x < 0.0005
    Q = 0.5*sqrt(pi*x);
else
    Q = interp1(A,B,x);
end

Investigate replacing all NaNs everywhere

Currently, we replace all NaNs in cum_returns. This makes the logic more robust, but might not alert us to certain data errors. We will want to check if there are any cases where there are NaNs in the return stream and how it makes sense to handle that.

We may want to raise an error or warning in case this happens.
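A minimal sketch of the warn-or-raise idea, assuming returns arrive as a 1-D array of simple returns. The function name `cum_returns_checked` and its `on_nan` parameter are hypothetical and are not part of empyrical; the fill-with-zero step only mirrors the behaviour described above.

```python
import warnings

import numpy as np


def cum_returns_checked(returns, on_nan="warn"):
    """Cumulative simple returns that report NaNs instead of hiding them.

    Hypothetical helper: `on_nan` may be "warn", "raise", or "ignore".
    """
    returns = np.asarray(returns, dtype=float)
    nan_count = int(np.isnan(returns).sum())
    if nan_count:
        message = f"{nan_count} NaN value(s) found in returns"
        if on_nan == "raise":
            raise ValueError(message)
        if on_nan == "warn":
            warnings.warn(message, RuntimeWarning)
    # Treat a missing return as a flat (0%) period, then compound.
    filled = np.where(np.isnan(returns), 0.0, returns)
    return np.cumprod(1.0 + filled) - 1.0
```

With `on_nan="warn"` the result is unchanged but the caller is told that data was missing; `on_nan="raise"` turns the same condition into a hard failure.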

API for setting default annualization factor

Setting default annualization factor for all metrics is quite handy sometimes.

Currently this can be achieved by setting

empyrical.stats.ANNUALIZATION_FACTORS['daily'] = 24*252

Please consider adding an API for this feature.
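One possible shape for such an API is a context manager that swaps a factor in and restores it afterwards. This is only a sketch: the dictionary below is a local stand-in for empyrical.stats.ANNUALIZATION_FACTORS with illustrative values, and `annualization_factor_override` is a hypothetical name.

```python
from contextlib import contextmanager

# Local stand-in for empyrical.stats.ANNUALIZATION_FACTORS; the values are
# illustrative assumptions, not copied from the library.
ANNUALIZATION_FACTORS = {"daily": 252, "weekly": 52, "monthly": 12}


@contextmanager
def annualization_factor_override(period, factor, table=ANNUALIZATION_FACTORS):
    """Temporarily replace the factor for `period`, restoring it on exit."""
    missing = object()
    previous = table.get(period, missing)
    table[period] = factor
    try:
        yield table
    finally:
        if previous is missing:
            del table[period]
        else:
            table[period] = previous


with annualization_factor_override("daily", 24 * 252):
    inside = ANNUALIZATION_FACTORS["daily"]  # overridden while the block runs
after = ANNUALIZATION_FACTORS["daily"]       # original value restored
```

Scoping the change to a `with` block avoids leaving a mutated global behind for unrelated code.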

r-value needs to be squared

https://github.com/quantopian/qrisk/blob/master/qrisk/stats.py#L638

linregress returns the r-value or correlation coefficient.

The docstring for stability_of_timeseries says it returns the r-squared as the stability score, but currently it doesn't look like it is being squared.
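The difference between the r-value and r-squared can be checked with a short sketch. It uses NumPy only, with np.corrcoef standing in for the r-value that linregress reports, and synthetic data generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 0.5 * x + rng.normal(scale=5.0, size=x.size)  # synthetic noisy line

# Pearson correlation coefficient, i.e. the r-value of a simple linear fit.
r_value = np.corrcoef(x, y)[0, 1]
r_squared = r_value ** 2

# Cross-check: R^2 computed from the residuals of a least-squares line.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
r_squared_from_fit = 1.0 - residuals.var() / y.var()
```

For a straight-line fit with an intercept, `r_squared` and `r_squared_from_fit` agree, and both are smaller in magnitude than `r_value` whenever the fit is not perfect.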

cannot concatenate a non-NDFrame object

Running:

import numpy as np
from empyrical import max_drawdown, alpha_beta

returns = np.array([.01, .02, .03, -.4, -.06, -.02])
benchmark_returns = np.array([.02, .02, .03, -.35, -.05, -.01])

# calculate alpha and beta

alpha, beta = alpha_beta(returns, benchmark_returns)

Gives "cannot concatenate a non-NDFrame object" error

Link to sortino-related pdf no longer works

empyrical/empyrical/stats.py

Line 773 in b561c5f

See `<https://www.sunrisecapital.com/wp-content/uploads/2014/06/Futures_

link seems to 404 now.

Assertion error

Hi,

This is the only error occurring while I'm running tests: python runtests.py

/usr/lib/python3.8/site-packages/pandas/util/__init__.py:12: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing
.F...............................................................................................................................................................................................................................................................ssssssss.....s.s...s.ss.......................................................s................................................................................................................................................s...............ssssssss.....s.s...s.ss.......................................................s................................................................................................................................................s...............
======================================================================
FAIL: test_perf_attrib_simple (empyrical.tests.test_perf_attrib.PerfAttribTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/empyrical-0.5.5/empyrical/tests/test_perf_attrib.py", line 69, in test_perf_attrib_simple
    pd.util.testing.assert_frame_equal(expected_perf_attrib_output,
  File "/usr/lib/python3.8/site-packages/pandas/_testing.py", line 1611, in assert_frame_equal
    assert_series_equal(
  File "/usr/lib/python3.8/site-packages/pandas/_testing.py", line 1327, in assert_series_equal
    assert lidx.freq == ridx.freq, (lidx.freq, ridx.freq)
AssertionError: (<Day>, None)

----------------------------------------------------------------------
Ran 735 tests in 0.827s

FAILED (failures=1, skipped=30)

Add Additional Calculations & Rolling Periods

@twiecki and friends, I'm working on a project that requires many of the same calculations that are performed in this repo, but also several others. I am starting to work on a PR to add a variety of other statistical measures as well as a framework for getting rolling metrics on them.

Here are a few examples of additional metrics: up-capture, down-capture, tracking-error.

I would also like to add the ability to create rolling metrics, e.g. rolling 60-day max drawdown or rolling 12-month tracking error.

Before I start this work in earnest, I want to make sure that you are all open to expanding the set of calculations and capabilities. Also, I can include a correct calculation of information ratio.

backwards compat: empyrical.cum_returns_final used to support dataframes

quantopian/bayesalpha@c57713f broke a (non-explicit and apparently untested) use-case where a user can pass in a DataFrame with multiple columns and get the cumulative returns for each column. After this change, it always returns a scalar. This quietly breaks user code.

Ideally we revert it to the old way, or at least raise an error that dataframe inputs are not supported.

CC @llllllllll
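For reference, the per-column behaviour described above can be sketched in a few lines of NumPy. `cum_returns_final_by_column` is a hypothetical helper written for illustration, not the library's implementation.

```python
import numpy as np


def cum_returns_final_by_column(returns):
    """Final compounded return for each column of simple returns.

    Hypothetical helper: 2-D input gives one value per column, 1-D input
    gives a single scalar. NaNs are skipped by nanprod.
    """
    returns = np.asarray(returns, dtype=float)
    return np.nanprod(1.0 + returns, axis=0) - 1.0


# Two assets over two periods (made-up numbers).
two_assets = np.array([[0.10, -0.50],
                       [0.10,  1.00]])
per_column = cum_returns_final_by_column(two_assets)
```

Reducing along axis 0 keeps the column dimension, which is what callers passing multi-column input would expect.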

Just installed Pyfolio getting AttributeError: module 'empyrical' has no attribute 'information_ratio'

I'm getting this on a number of the example notebooks. I did update the utils.py file in order to get Google data instead of Yahoo, and I do have some portions of the notebooks working, but I can't get a "full" tear sheet due to this error! Any hints/workarounds appreciated.

Remove Series.strides from codebase (Pandas deprecation)

Summary

Pandas 0.24 (or 0.23) deprecates the strides attribute of Series which is used in this repository. https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.23.0.html?highlight=strides#deprecations

Tasks

update the utils to handle the deprecation of the strides attribute.

AttributeError: 'str' object has no attribute 'year'

I do: empyrical.aggregate_returns(spy, convert_to=empyrical.YEARLY) where spy is just a Series of SPY which works with all the other empyrical functions.

I always get the following error:

    192         grouping = [lambda x: x.year, lambda x: x.month]
    193     elif convert_to == YEARLY:
--> 194         grouping = [lambda x: x.year]
    195     else:
    196         raise ValueError(

AttributeError: 'str' object has no attribute 'year'

This also happens with WEEKLY and MONTHLY.

Am I missing something or is this a bug?
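The traceback shows the grouping key `lambda x: x.year` being applied to the index labels, so one likely cause is an index made of strings rather than timestamps. The sketch below assumes that diagnosis and uses made-up data; it reproduces the grouping with plain pandas instead of calling empyrical.

```python
import pandas as pd

# Dates stored as plain strings, as often happens after reading a CSV.
spy = pd.Series(
    [0.01, -0.02, 0.03, 0.01],
    index=["2016-12-29", "2016-12-30", "2017-01-03", "2017-01-04"],
)

# Convert the string labels to timestamps so each label has .year, .month, ...
spy.index = pd.to_datetime(spy.index)

# Same grouping key as in the traceback, then compound within each year.
yearly = spy.groupby(lambda x: x.year).apply(lambda r: (1 + r).prod() - 1)
```

After the conversion the grouping function receives Timestamp objects, so the attribute lookup succeeds.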

not able to use skew or kurtosis

Looking at your stats.py code (see below), I see that you have skew and kurtosis. However, I am not able to use them through empyrical.

SIMPLE_STAT_FUNCS = [
    cum_returns_final,
    annual_return,
    annual_volatility,
    sharpe_ratio,
    calmar_ratio,
    stability_of_timeseries,
    max_drawdown,
    omega_ratio,
    sortino_ratio,
    stats.skew,
    stats.kurtosis,
    tail_ratio,
    cagr,
    value_at_risk,
    conditional_value_at_risk,
]

Community Contributions

Are open source contributions still welcomed/wanted? If so are there any active issues? I had a look at github issues, but they all seemed pretty outdated.

Documentation http://quantopian.github.io/empyrical links to is outdated

The documentation http://quantopian.github.io/empyrical links to is outdated (it's for 0.1, while the latest release at the time of writing this is 0.3.3).

Standard Deviation: Sample vs. Population

The code for annual_volatility and the related Sharpe ratio and others uses the numpy std function. However, the source code automatically sets ddof=1, which if I understand correctly is appropriate if the data is a sample. However, I am using data that is the entire population, so I need to set ddof to 0. It would be nice if we could add that as an option in the higher level functions.

Source: http://www.statsdirect.com/help/basics/degrees_freedom.htm
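The effect of ddof can be seen directly with NumPy. The sketch below reuses the README's sample returns and assumes 252 periods per year for the annualization step.

```python
import numpy as np

returns = np.array([0.01, 0.02, 0.03, -0.4, -0.06, -0.02])

sample_std = np.std(returns, ddof=1)      # divides by N - 1 (sample)
population_std = np.std(returns, ddof=0)  # divides by N (population)

# The two estimates differ only by a constant factor of sqrt(N / (N - 1)).
n = returns.size
ratio = sample_std / population_std

# Annualized volatility under each convention (252 is an assumption).
annual_vol_sample = sample_std * np.sqrt(252)
annual_vol_population = population_std * np.sqrt(252)
```

The gap shrinks as N grows, so the choice matters most for short return series.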

Wrong annualization of Sortino Ratio?

The Sortino Ratio is annualized in stats.py as follows:
return sortino * ann_factor

where ann_factor is
ann_factor = annualization_factor(period, annualization)

I believe you actually meant to return
return sortino * np.sqrt(ann_factor)

just like you did with the Sharpe Ratio.
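The square-root scaling follows from annualizing the numerator by ann_factor and the downside deviation by sqrt(ann_factor). A minimal sketch, assuming a zero required return and 252 periods per year; `sortino_ratio_sketch` is a hypothetical name and not the library function.

```python
import numpy as np


def sortino_ratio_sketch(returns, required_return=0.0, ann_factor=252):
    """Annualized Sortino ratio; assumes at least one below-target return."""
    returns = np.asarray(returns, dtype=float)
    excess = returns - required_return
    # Only below-target returns contribute to downside deviation.
    downside = np.minimum(excess, 0.0)
    downside_deviation = np.sqrt(np.mean(downside ** 2))
    per_period = np.mean(excess) / downside_deviation
    return per_period * np.sqrt(ann_factor)


sample = np.array([0.01, -0.02, 0.03, -0.01])
ratio = sortino_ratio_sketch(sample)

# Annualizing numerator and denominator separately gives the same number.
numerator = np.mean(sample) * 252
denominator = np.sqrt(np.mean(np.minimum(sample, 0.0) ** 2)) * np.sqrt(252)
ratio_separate = numerator / denominator
```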

Add conda builds

We have a quantopian conda channel. Empyrical should be part of it so that our zipline and pyfolio conda builds can depend on it.

Capture ratio support annualization?

Here is my scenario.
I have returns that are measured quarterly.
I want to calculate the up and down capture ratios using empyrical.
However, I find that the capture function in stats.py does not support annualization.

Thus, I cannot make the calculation.
Simply adding the annualization param would be enough, since the annual function already supports the annualization param.

BUG: Roll() function is incorrect

It appears that the roll() function in the utils package is incorrect. There are two problems:

For the examples below we will assume the pd.series has 10 data points and the window argument is 5.

The function does not calculate for the last period. The source of this problem is the definition of the range in the for loop. As it is now:

for i in range(window, len(args[0])):

The loop will run for i values of 5 to 9 because the len of the series is 10, but the range() function is non-inclusive of the last number. So the next line that extracts the subset of the pd.series or np.array to run the function over:

numpy array: rets = [s[i - window:i] for s in args]
pandas series: rets = [s.iloc[i - window:i] for s in args]

on the last pass of the for loop i=9 and thus the last data point in the series is never included in the calculation.

The next issue is that, for a pandas series, the above causes the results to be mapped to the wrong index. When i in the for loop is 9, the maximum value in the loop, the function is applied to elements [4:9] and the result is assigned the datetime index at position 9. That is incorrect, because position 9 is the last value in the index and does not align with the data used in the function, which is .iloc[4:9].

How to see all of this:

For the raw series below, using roll() with np.nansum(), the columns show what we would expect and what is actually returned.

Date       Input Series   Expected   Actual
1/1/01                1
1/2/01                2
1/3/01                3
1/4/01                4
1/5/01                5         15
1/6/01                6         20       15
1/7/01                7         25       20
1/8/01                8         30       25
1/9/01                9         35       30
1/10/01              10         40       35

What is the fix?
It is simple: in the _roll_ndarray() and _roll_pandas() functions, make the range end at len() + 1:

for i in range(window, len(args[0]) + 1):

And in _roll_pandas(), use i - 1 for the datetime index:

data[args[0].index[i - 1]] = func(*rets, **kwargs)

Why didn't any test catch this?

It appears that the test for the roll() function, test_pandas_roll(), was incorrect. It expects the length of the result series to be the length of the input series minus the window. That is incorrect; the expected number of returned elements is that PLUS one.

What is the solution?

Simple to fix the two offending functions in utils.py:

import numpy as np
import pandas as pd


def _roll_ndarray(func, window, *args, **kwargs):
    data = []
    for i in range(window, len(args[0]) + 1):
        rets = [s[i-window:i] for s in args]
        data.append(func(*rets, **kwargs))
    return np.array(data)

def _roll_pandas(func, window, *args, **kwargs):
    data = {}
    for i in range(window, len(args[0]) + 1):
        rets = [s.iloc[i-window:i] for s in args]
        data[args[0].index[i - 1]] = func(*rets, **kwargs)
    return pd.Series(data)
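The table above can be reproduced with a small self-contained check. `roll_pandas` below is an illustrative single-series version with a hypothetical `inclusive_end` switch, so that both the reported behaviour and the proposed fix can be run side by side; it is not the library function.

```python
import numpy as np
import pandas as pd


def roll_pandas(func, window, series, inclusive_end=True):
    """Apply `func` over trailing windows of `series`.

    inclusive_end=True uses the corrected loop bound and index label;
    inclusive_end=False reproduces the behaviour reported in this issue.
    """
    stop = len(series) + 1 if inclusive_end else len(series)
    label_offset = 1 if inclusive_end else 0
    data = {}
    for i in range(window, stop):
        data[series.index[i - label_offset]] = func(series.iloc[i - window:i])
    return pd.Series(data, dtype=float)


dates = pd.date_range("2001-01-01", periods=10, freq="D")
series = pd.Series(np.arange(1, 11, dtype=float), index=dates)

expected = roll_pandas(np.nansum, 5, series, inclusive_end=True)
actual = roll_pandas(np.nansum, 5, series, inclusive_end=False)
```

Here `expected` starts at 2001-01-05 with 15 and ends with 40, while `actual` starts one day later and never reaches 40, matching the two columns in the table.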

Why not do this?

This change will cause 52 tests that were incorrectly passing to now break. I am happy to submit the PR, but I do not have the time now to correct all the broken tests.

If the package maintainers would like I can submit this PR and then over time if people want to help we can fix the tests. I don't have time now to fix them all myself.

Uncaught Exception: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.

  File "/usr/local/bin/jesse", line 11, in <module>
    load_entry_point('jesse', 'console_scripts', 'jesse')()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/src/jesse/jesse/__init__.py", line 307, in backtest
    from jesse.modes import backtest_mode
  File "/home/src/jesse/jesse/modes/backtest_mode/__init__.py", line 10, in <module>
    import jesse.services.statistics as stats
  File "/home/src/jesse/jesse/services/statistics.py", line 1, in <module>
    import empyrical
  File "/usr/local/lib/python3.8/dist-packages/empyrical/__init__.py", line 21, in <module>
    from .stats import (
  File "/usr/local/lib/python3.8/dist-packages/empyrical/stats.py", line 24, in <module>
    from .utils import nanmean, nanstd, nanmin, up, down, roll, rolling_window
  File "/usr/local/lib/python3.8/dist-packages/empyrical/utils.py", line 27, in <module>
    from pandas_datareader import data as web
  File "/usr/local/lib/python3.8/dist-packages/pandas_datareader/__init__.py", line 2, in <module>
    from .data import (
  File "/usr/local/lib/python3.8/dist-packages/pandas_datareader/data.py", line 11, in <module>
    from pandas_datareader.av.forex import AVForexReader
  File "/usr/local/lib/python3.8/dist-packages/pandas_datareader/av/__init__.py", line 5, in <module>
    from pandas_datareader._utils import RemoteDataError
  File "/usr/local/lib/python3.8/dist-packages/pandas_datareader/_utils.py", line 6, in <module>
    from pandas_datareader.compat import is_number
  File "/usr/local/lib/python3.8/dist-packages/pandas_datareader/compat/__init__.py", line 7, in <module>
    from pandas.util.testing import assert_frame_equal
  File "/usr/local/lib/python3.8/dist-packages/pandas/util/testing.py", line 5, in <module>
    warnings.warn(
=========================================================================
 Uncaught Exception: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.

We have this annoying FutureWarning. Looks like its origin is empyrical.

max_drawdown returns pd.Series, not float

Example:

import numpy as np
import pandas as pd
import empyrical as ep

dr = pd.date_range('2015/01/01', '2018/01/01')

px = pd.DataFrame(data=abs(np.random.randn(len(dr)) + 10),
                  columns=['FOO'],
                  index=dr)

rets = px.pct_change().dropna()
rets.index = rets.index.tz_localize('UTC')

ep.stats.max_drawdown(rets)

Output:

0   -0.450319
dtype: float64

This causes downstream problems for me, e.g. here:

empyrical/empyrical/stats.py

Line 547 in ea9a9b7

if max_dd < 0:

since the truth value of a pd.Series is ambiguous.

np.__version__ = 1.14.2
pd.__version__ = 0.18.0
ep.__version__ = 0.4.3

Remove Information Ratio

Our Information Ratio calculation (https://github.com/quantopian/empyrical/blob/master/empyrical/stats.py#L590) is wrong as noted by @marketneutral:
"The definition is wrong. You need a risk model to calc active risk to benchmark. And the benchmark look through to calc active return."

As this is still a ways off, we should rather delete it instead of having a wrong measure in here.

BUG: beta does not work with pandas Series

It seems that whatever np.intersect1d() returns cannot be used as the index of a pandas.Series(). The result contains only NaNs.

Safety Error on empyrical installed via conda-forge

Issue with empyrical-0.5.5 on conda-forge:

During installation of Zipline v1.4.1, which installs Empyrical as a dependency, Conda reports an apparently corrupted file.

SafetyError: The package for empyrical located at C:\Users\test\Anaconda3\pkgs\empyrical-0.5.5-pyh9f0ad1d_0
appears to be corrupted. The path 'site-packages/empyrical/utils.py'
has an incorrect size.
reported size: 16676 bytes
actual size: 16753 bytes

Despite this SafetyError, utils.py does get installed and appears to be functional.

Environment: Windows 10, Anaconda 64 bit

How to replicate just with empyrical standalone install:

conda create -n empyricaltest python=3.6
conda activate empyricaltest
conda install -c conda-forge empyrical

How to replicate, as part of a Zipline v1.4.1 installation:

conda create -n zip141 python=3.6
conda activate zip141
conda install -c conda-forge zipline

Remove yahoo finance

Yahoo finance has been discontinued for good. Google is a fallback (albeit a bad one) but there's no reason to display a warning and confuse users: quantopian/pyfolio#478 (comment)

_create_unary_vectorized_roll_function(function) how to use parameters "out : array-like, optional"

I'm running the following example and I accept that it works perfectly, with output as expected.

import numpy as np
from empyrical import roll_max_drawdown

returns = np.array([.01, .02, .03, -.4, -.06, -.02])

# calculate the rolling max drawdown
roll_max_drawdown(returns, window=3)

I was wondering whether there is a way to force the output to have the same shape/length as the original array?

e.g. the output would have length 6 rather than 4, with the first two values = nan?

Reason: I'm trying to apply this function across a dataframe with transform, and it comes back with "Length of passed values is 4, index implies 6"

returns = pd.DataFrame({
        'value_date' : ['2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30', 
                        '2018-01-31', '2018-02-28', '2018-03-31','2018-04-30', '2018-05-31', '2018-06-30'],
        'code_id' :  ['AUD','AUD','AUD','AUD','AUD','AUD', 
                      'USD','USD','USD','USD','USD','USD'],
        'gross_return': [.01, .02, .03, -.4, -.06, -.02, 
                         .06, .8, .9, .4, -1.06, .03],
        })
              

returns['rolling_max_drawdown'] = returns.groupby(['code_id'])['gross_return'].transform(lambda x: roll_max_drawdown(x, window=3))

My hack workaround is to modify unary_vectorized_roll so that it ends with the following:

    place_holding_array = np.full(len(arr) - len(out), np.nan)
    result = np.concatenate((place_holding_array, out))

    return result

Is there a better way?
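One alternative that leaves the library untouched is to pad the rolled output in the calling code. The helper below is a hypothetical sketch, and the `rolled` values are arbitrary placeholders rather than real drawdowns.

```python
import numpy as np


def pad_front_with_nan(rolled, target_length):
    """Left-pad a rolled result with NaN so it lines up with the input."""
    rolled = np.asarray(rolled, dtype=float)
    missing = target_length - len(rolled)
    if missing < 0:
        raise ValueError("rolled output is longer than the target length")
    return np.concatenate((np.full(missing, np.nan), rolled))


# Placeholder for a window=3 result computed on 6 observations.
rolled = np.array([-0.1, -0.2, -0.3, -0.4])
padded = pad_front_with_nan(rolled, 6)
```

Inside a groupby transform, the lambda could wrap the rolling call in this helper with `len(x)` as the target length, so that each group returns as many values as it received.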

Support Python >3.5

Installing fails using Python 3.6 and later as provided in current conda distribution. Bottleneck package dependency seems to fail to install.

Add sphinx documentation

Would be great to have all the risk metrics have html docs before we release.

ImportError: cannot import name 'information_ratio' from 'empyrical'

This is still an issue with the latest pyfolio / empyrical install (0.5.3)

Just installed Pyfolio, getting error AttributeError: module 'empyrical.utils' has no attribute 'default_returns_func'

Using pyfolio==0.80 and empyrical==0.2.2. This error pops up when trying to create a full tear sheet.

Should returns be lognormal or normal

In the comments for the stats.cum_returns() function it states:

Notes
-----
For increased numerical accuracy, convert input to log returns
where it is possible to sum instead of multiplying.

But the function itself does not sum the returns; it multiplies them. So the notes state that log returns should be provided, but the math assumes normal returns. This is confusing. Which should it be?
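The two conventions give the same cumulative path as long as each is paired with the matching operation. A short NumPy sketch using the README's sample returns:

```python
import numpy as np

simple_returns = np.array([0.01, 0.02, 0.03, -0.4, -0.06, -0.02])

# Simple returns compound by multiplication.
cum_from_product = np.cumprod(1.0 + simple_returns) - 1.0

# Log returns add, so the same path is recovered with a cumulative sum.
log_returns = np.log1p(simple_returns)
cum_from_log_sum = np.expm1(np.cumsum(log_returns))
```

Summing simple returns, or multiplying log returns, would mix the conventions and give a different answer.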

Use Hypothesis for testing

http://hypothesis.works/

Current testing is limited to cases devised by the author. Including Hypothesis will help find cases where functions do not work correctly.

Literature for levy exponent in annual_volatility?

Hi quantopian Team,

Thanks for the great empyrical library :) !

I am currently researching how to set the Levy stability parameter for the annualization of the per-period volatility; the relevant code is at https://github.com/quantopian/empyrical/blob/master/empyrical/stats.py#L501

I am somewhat aware of the underlying problem (period volatility can over- or underestimate the annual volatility depending on the return distribution), but I lack literature on it. Can you recommend a good source on the topic? Most of the time people just use alpha=2 without an explanation.

Cheers

Max

Proposal: change period/annualization API

One thing I currently think is suboptimal is that we provide period and annualization to do the same thing. That might not be problematic if it was for a broad use-case but most users will be zipline or quantopian users and pass in daily returns. Two alternatives:

Set global vars like DAILY = 252 and have only annualization and allow it to be passed annualization=DAILY or annualization=356 if someone wants custom.
Allow annualization to also be a str like 'daily', essentially move period to annualization.
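The two alternatives can be combined in a single argument. The sketch below is one hypothetical way to do it; the constants, the `PERIOD_NAMES` table, and `resolve_annualization` are illustrative names and values, not existing empyrical API.

```python
# Illustrative constants; the numeric values are assumptions for this sketch.
DAILY, WEEKLY, MONTHLY = 252, 52, 12
PERIOD_NAMES = {"daily": DAILY, "weekly": WEEKLY, "monthly": MONTHLY}


def resolve_annualization(annualization=DAILY):
    """Accept either a period name such as 'daily' or a numeric factor."""
    if isinstance(annualization, str):
        try:
            return PERIOD_NAMES[annualization.lower()]
        except KeyError:
            valid = ", ".join(sorted(PERIOD_NAMES))
            raise ValueError(
                f"unknown period {annualization!r}; expected one of: {valid}"
            ) from None
    if annualization <= 0:
        raise ValueError("annualization factor must be positive")
    return annualization
```

A metric would then call `resolve_annualization(annualization)` once at the top and use the returned number, so callers can pass `DAILY`, `'daily'`, or a custom value interchangeably.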

KeyError: 'date'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/empyrical/utils.py in get_symbol_returns_from_yahoo(symbol, start, end)
435 px = web.get_data_yahoo(symbol, start=start, end=end)
--> 436 px['date'] = pd.to_datetime(px['date'])
437 px.set_index('date', drop=False, inplace=True)

Date is index field
Index(['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], dtype='object')

Pandas unable to import datareader deprecation error

Python 3.5.2
pandas 0.24.2
empyrical 0.5.0

When I call empyrical I always see this error:

/usr/local/lib/python3.5/dist-packages/empyrical/utils.py:32: UserWarning: Unable to import pandas_datareader. Suppressing import error and continuing. All data reading functionality will raise errors; but has been deprecated and will be removed in a later version.
  warnings.warn(msg)

It looks like empyrical's usage of pandas needs to be updated?

'int' object has no attribute 'year'

I get the error " 'int' object has no attribute 'year'" when I try:

noncum_returns = df[['date','monthly_return']]
week_return = aggregate_returns(noncum_returns, convert_to="weekly")

empyrical.stats.value_at_risk output is NaN

empyrical.stats.value_at_risk output is NaN. I cannot figure out what the problem is, as all the other empyrical.stats functions work perfectly with the same data series.
Also, where can I access the source code of the stats functions? This was provided in the earlier version of empyrical.
Thanks,

sharpe_ratio calculated correctly?

from empyrical.stats import sharpe_ratio

ret = [SPX returns from 2017 without dividends]

sharpe_ratio( ret, 0.01 )
array([-34.99321358])

sharpe_ratio( ret, 0.00 )
Out[39]: array([2.69941353])

sharpe_ratio( ret, .01, annualization=252 )
Out[51]: array([-34.99321358])

ret is in DECIMAL form. Ie .01 means 1 percent.
2017-01-03 0.0085
2017-01-04 0.0057
2017-01-05 -0.0008
2017-01-06 0.0035
2017-01-09 -0.0035

The sharpe ratio should not swing from 2.699 to -34.993 for a constant risk free rate change from zero to .01.
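One possible explanation, assuming the risk-free argument is subtracted from every per-period return, is that 0.01 is being read as 1% per day rather than 1% per year. The sketch below uses synthetic daily returns (not SPX data) and a hypothetical `sharpe_ratio_sketch` function to show the size of that effect.

```python
import numpy as np


def sharpe_ratio_sketch(returns, risk_free_per_period=0.0, ann_factor=252):
    """Annualized Sharpe ratio with a per-period risk-free rate."""
    excess = np.asarray(returns, dtype=float) - risk_free_per_period
    return np.mean(excess) / np.std(excess, ddof=1) * np.sqrt(ann_factor)


rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0007, scale=0.004, size=252)  # synthetic

annual_rate = 0.01
# Convert the annual rate to the equivalent daily rate before using it.
daily_rate = (1.0 + annual_rate) ** (1.0 / 252) - 1.0

sharpe_no_rf = sharpe_ratio_sketch(daily_returns, 0.0)
sharpe_daily_rf = sharpe_ratio_sketch(daily_returns, daily_rate)
sharpe_misread = sharpe_ratio_sketch(daily_returns, annual_rate)
```

With the converted daily rate the ratio moves only slightly, while passing the annual figure as if it were a daily rate drives it strongly negative, which is the same pattern as the numbers reported above.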

roll_cagr not working

I am trying to do:

from empyrical import  roll_beta, roll_alpha, roll_sharpe_ratio, roll_annual_volatility, roll_cagr

but I get an error message even though the function seems to exist. I don't know why (the other functions are ok).

ImportError: cannot import name 'roll_cagr' from 'empyrical'

thx for your help.

Release a new empyrical version

github tag needs to be bumped and uploaded to pypi.

Bug in annualization of alpha?

https://github.com/quantopian/empyrical/blob/master/empyrical/stats.py#L821 multiplies daily alpha by the annualization factor (252). However, as daily alphas compound just like returns, the correct way to annualize is (daily_alpha + 1) ** ann_factor - 1.
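The gap between the two annualization conventions is easy to quantify. The daily alpha below is a made-up value chosen only for illustration.

```python
daily_alpha = 0.0004  # illustrative value, 4 basis points per day
ann_factor = 252

# Linear scaling, as described for the current code.
linear = daily_alpha * ann_factor

# Geometric compounding, as proposed in this issue.
compounded = (1.0 + daily_alpha) ** ann_factor - 1.0

difference = compounded - linear
```

For this input the linear figure is 10.08% and the compounded figure is about 10.6%; the gap widens as the daily alpha grows.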

numer = sum(returns_less_thresh[returns_less_thresh > 0.0])

My environment is python 3.6.3

pandas 0.18.1

Add CAGR

http://www.investopedia.com/terms/c/cagr.asp