jamalsenouci / causalimpact
Python port of the CausalImpact R library
License: Apache License 2.0
Hi,
When I run the example provided in the documentation with the custom model, I get the following error:
File "StatisticalMethod.py", line 160, in main_stat
    impact.run()
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 46, in run
    self.params["estimation"])
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 339, in _run_with_ucm
    orig_std_params, estimation)
TypeError: compile_posterior_inferences() missing 1 required positional argument: 'estimation'
I am using:
python 3.6
numpy 1.15.3
pandas 0.23.0
statsmodels 0.9.0
nose 1.3.7
I was wondering if I'm missing something or not using the correct library versions.
Thanks,
Reihane
Hello,
I am trying to use this package, following the steps in your notebook: https://github.com/jamalsenouci/causalimpact/blob/master/GettingStarted.ipynb
I have successfully installed the package in PowerShell:
And this is what my notebook looks like:
However, when I try to run the package, these are the errors I get:
Can you please help me figure out what I am doing wrong?
Thank you in advance and Happy New Year!
AInhoa
The Binder link in the docs fails to build the Dockerfile due to the numpy version pinned in requirements.txt.
Pandas Version: 0.21.1
Code:
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)
np.random.seed(1)
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
data.plot()
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
Error:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      2 post_period = [70,99]
      3 impact = CausalImpact(data, pre_period, post_period)
----> 4 impact.run()
      5 impact.plot()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
     33             self.params["ucm_model"],
     34             self.params["post_period_response"],
---> 35             self.params["alpha"])
     36
     37         # Depending on input, dispatch to the appropriate Run* method()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
    209         # Check <pre_period> and <post_period>
    210         if data is not None:
--> 211             checked = self._format_input_prepost(pre_period, post_period, data)
    212             pre_period = checked["pre_period"]
    213             post_period = checked["post_period"]

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_prepost(self, pre_period, post_period, data)
    104         pre_dtype = np.array(pre_period).dtype
    105         post_dtype = np.array(post_period).dtype
--> 106         if isinstance(data.index, pd.tseries.index.DatetimeIndex):
    107             pre_period = [pd.to_datetime(date) for date in pre_period]
    108             post_period = [pd.to_datetime(date) for date in post_period]

AttributeError: module 'pandas.tseries' has no attribute 'index'
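`pd.tseries.index` is a private module path that later pandas releases removed; the stable public alias has always been `pd.DatetimeIndex`. A minimal, version-robust sketch of the check (my own code, not the package's actual patch):

```python
import pandas as pd

def has_datetime_index(data):
    # pd.tseries.index.DatetimeIndex no longer exists in newer pandas;
    # pd.DatetimeIndex is the stable public name for the same class.
    return isinstance(data.index, pd.DatetimeIndex)

df = pd.DataFrame({"y": [1, 2]},
                  index=pd.to_datetime(["2016-01-01", "2016-01-02"]))
```

With an integer RangeIndex, `has_datetime_index` simply returns False and the period endpoints can stay integers.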
Assuming this will be under the Apache License, since it's derived from Apache-licensed code.
Can you add the Apache License file? Not having it prevents "worry free" re-use and extensibility.
I was able to run the code fine until I upgraded some packages, which broke it. It looks like something related to Python 3.6. Can you help me fix this?
Even the example code given in the documentation is breaking.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)
# Data Prep
np.random.seed(1)
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
# Model
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-a062d48e2479> in <module>()
18 post_period = [70,99]
19 impact = CausalImpact(data, pre_period, post_period)
---> 20 impact.run()
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
33 self.params["ucm_model"],
34 self.params["post_period_response"],
---> 35 self.params["alpha"])
36
37 # Depending on input, dispatch to the appropriate Run* method()
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
205 # representing time points
206 if data is not None:
--> 207 data = self._format_input_data(data)
208
209 # Check <pre_period> and <post_period>
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_data(self, data)
75 # Must not have NA in covariates (if any)
76 if len(data.columns) >= 2:
---> 77 if np.any(pd.isnull(data.iloc[:, 1:])):
78 raise ValueError("covariates must not contain null values")
79
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
951 raise ValueError("The truth value of a {0} is ambiguous. "
952 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953 .format(self.__class__.__name__))
954
955 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
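The ambiguity arises because `pd.isnull(data.iloc[:, 1:])` is a DataFrame of booleans, and newer pandas refuses to coerce it to a single truth value inside `np.any`. A sketch of the kind of fix (my own, not the package's actual patch) is to reduce the mask explicitly:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"y": [1.0, 2.0], "x1": [3.0, np.nan]})

# Reduce the boolean mask to one scalar with .values.any() instead of
# letting the truthiness of a Series/DataFrame be evaluated implicitly.
has_null_covariates = pd.isnull(data.iloc[:, 1:]).values.any()
if has_null_covariates:
    print("covariates contain null values")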
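The ambiguity arises because `pd.isnull(data.iloc[:, 1:])` is a DataFrame of booleans, and newer pandas refuses to coerce it to a single truth value. A sketch of the kind of fix (my own, not the package's actual patch) is to reduce the mask explicitly:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"y": [1.0, 2.0], "x1": [3.0, np.nan]})

# Reduce the boolean mask to one scalar with .values.any() instead of
# letting a Series/DataFrame be evaluated in a boolean context.
has_null_covariates = pd.isnull(data.iloc[:, 1:]).values.any()
```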
Hey Jamal, I played a little bit with your package and I really appreciate your work a lot! I am looking forward to porting some R stuff to Python, and now it seems nearly possible thanks to your effort :)
Unfortunately I encountered some differences between these two worlds which I cannot explain at the moment. See below. If you have any idea, let me know.
library("CausalImpact")
library("zoo")
DATA = "
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"
to_time <- function(s) {
return(as.POSIXct(trimws(paste(s, '')), format="%Y-%m-%d %H:%M:%S", tz="Europe/Berlin"))
}
df <- read.table(textConnection(DATA), sep=",", header=T)
df$t = to_time(df$t)
df <- zoo(cbind(df$y, df$x1, df$x2), df$t)
pre_period <- c(to_time('2016-02-20 22:41:20'), to_time('2016-02-20 22:51:20'))
post_period <-c(to_time('2016-02-20 22:51:30'), to_time('2016-02-20 22:56:20'))
impact <- CausalImpact(df,pre_period, post_period)
impact$summary
> summary(impact)
Posterior inference {CausalImpact}
Average Cumulative
Actual 156 4687
Prediction (s.d.) 129 (4.6) 3875 (139.3)
95% CI [120, 138] [3602, 4139]
Absolute effect (s.d.) 27 (4.6) 812 (139.3)
95% CI [18, 36] [548, 1085]
Relative effect (s.d.) 21% (3.6%) 21% (3.6%)
95% CI [14%, 28%] [14%, 28%]
Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.8998%
For more details, type: summary(impact, "report")
import pandas as pd
import sys
from io import StringIO
from causalimpact import CausalImpact
DATA = """
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"""
df = pd.read_csv(StringIO(DATA))
df["t"] = pd.to_datetime(df["t"])
df.index = df["t"]
del df["t"]
pre_period = [pd.to_datetime('2016-02-20 22:41:20'), pd.to_datetime('2016-02-20 22:51:20')]
post_period = [pd.to_datetime('2016-02-20 22:51:30'), pd.to_datetime('2016-02-20 22:56:20')]
impact = CausalImpact(df, pre_period, post_period)
impact.run()
impact.plot()
> impact.summary()
> Average Cumulative
> Actual 156 4687
> Predicted 129 3883
> 95% CI [93, 165] [2812, 4955]
>
> Absolute Effect 26 803
> 95% CI [62, -8] [1874, -268]
>
> Relative Effect 20.7% 20.7%
> 95% CI [48.3%, -6.9%] [48.3%, -6.9%]
>
So the Python version seems much more conservative: the confidence intervals are far wider than in R.
By the way:
In inferences.py I had to change
cum_effect = point_effect.copy()
cum_effect.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect = np.cumsum(cum_effect)
cum_effect_upper = point_effect_upper.copy()
cum_effect_upper.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_upper = np.cumsum(cum_effect_upper)
cum_effect_lower = point_effect_lower.copy()
cum_effect_lower.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_lower = np.cumsum(cum_effect_lower)
to
cum_effect = point_effect.copy()
cum_effect.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect = np.cumsum(cum_effect)
cum_effect_upper = point_effect_upper.copy()
cum_effect_upper.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_upper = np.cumsum(cum_effect_upper)
cum_effect_lower = point_effect_lower.copy()
cum_effect_lower.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_lower = np.cumsum(cum_effect_lower)
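The change matters because `.iloc` is strictly positional, while `.loc` slices by label; with a DatetimeIndex, `df_pre.index[0]` and `df_pre.index[-1]` are timestamps, not integer positions, so only label-based slicing works. A small illustration with hypothetical data (not the package's code):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.to_datetime(["2016-02-20 22:41:20",
                                    "2016-02-20 22:41:30",
                                    "2016-02-20 22:41:40",
                                    "2016-02-20 22:41:50"]))

# .loc slices by label; timestamp endpoints work and are inclusive.
pre = s.loc[s.index[0]:s.index[1]]

# s.iloc[s.index[0]:s.index[1]] would fail, because .iloc expects
# integer positions, not Timestamp labels.
```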
using python 3.6.3
causalimpact 0.1.1
numpy 1.13.3
pandas 0.21.1
seaborn 0.8.1
statsmodels 0.8.0
zeromq 4.1.3
and the Python plot shows wrong timestamp markers :)
CausalImpact has recently started failing with this error:
module 'pandas.core.dtypes.common' has no attribute 'is_date_time_or_time_delta_type'
Pandas seems to have removed is_date_time_or_time_delta_type
as I can not find it here:
https://github.com/pandas-dev/pandas/blob/main/pandas/core/common.py
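Assuming the intent was only to test for a datetime or timedelta dtype, a version-robust sketch (my own helper name, not the package's) goes through the public `pd.api.types` functions instead of the removed private one:

```python
import pandas as pd

def is_datetime_or_timedelta(series):
    # Public API equivalents of the removed private dtype helper.
    return (pd.api.types.is_datetime64_any_dtype(series)
            or pd.api.types.is_timedelta64_dtype(series))

ts = pd.Series(pd.to_datetime(["2017-09-04", "2017-09-05"]))
```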
I have the following pandas object
print(type(causal))
print(causal.columns)
print(type(causal.index))
print(causal.head())
<class 'pandas.core.frame.DataFrame'>
Index(['y', 'x1', 'x2'], dtype='object')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
y x1 x2
date
2017-09-04 150 150 275
2017-09-05 200 249 125
2017-09-06 225 150 249
2017-09-07 150 125 275
2017-09-08 175 325 250
I set variables pre_period
and post_period
as in your documentation. Then I run
impact = CausalImpact(causal, pre_period, post_period)
impact.run()
and I get
KeyError: 'upper y'
Can you please point me in the right direction?
Thanks!
Using Pandas 0.20.1, when I try to import CausalImpact package, I get the following error
ImportError: cannot import name 'PandasError'
First of all, thanks for this package; moving back and forth between Python and R is frustrating. I have an issue. My index is RangeIndex(start=0, stop=152, step=1), with pre_period = [0, 109] and post_period = [111, 151]. I have 2 columns in the DataFrame, and when I pass it to CausalImpact I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-86-f56cb9d3e7f1> in <module>()
----> 1 impact.run()
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in run(self)
33 self.params["ucm_model"],
34 self.params["post_period_response"],
---> 35 self.params["alpha"])
36
37 # Depending on input, dispatch to the appropriate Run* method()
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
209 # Check <pre_period> and <post_period>
210 if data is not None:
--> 211 checked = self._format_input_prepost(pre_period, post_period, data)
212 pre_period = checked["pre_period"]
213 post_period = checked["post_period"]
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input_prepost(self, pre_period, post_period, data)
104 pre_dtype = np.array(pre_period).dtype
105 post_dtype = np.array(post_period).dtype
--> 106 if isinstance(data.index, pd.tseries.index.DatetimeIndex):
107 pre_period = [pd.to_datetime(date) for date in pre_period]
108 post_period = [pd.to_datetime(date) for date in post_period]
AttributeError: 'module' object has no attribute 'index'
When I import causalimpact, as such
import causalimpact
I get this error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-707f1b214cc6> in <module>()
----> 1 import causalimpact
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py in <module>()
7 """
8
----> 9 from causalimpact.analysis import CausalImpact # noqa
10 from causalimpact.tests.test import run as test # noqa
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/analysis.py in <module>()
6 from causalimpact.misc import standardize_all_variables
7 from causalimpact.model import construct_model
----> 8 from causalimpact.inferences import compile_posterior_inferences
9
10 class CausalImpact(object):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/inferences.py in <module>()
1 from causalimpact.misc import unstandardize
2 def compile_posterior_inferences(model, data_post, alpha=0.05,
----> 3 orig_std_params=identity):
4
5 # Compute point predictions of counterfactual (in standardized space)
NameError: name 'identity' is not defined
I looked at the source code and found that identity is the default for the orig_std_params argument of compile_posterior_inferences, but no identity is defined anywhere. I looked elsewhere in the code to see if identity is referenced so I could fix it myself, and found orig_std_params = np.identity on line 243 of analysis.py. But np.identity is a function that returns an identity matrix; it is assigned to orig_std_params without actually being called, and orig_std_params is then reassigned to some other value. So I couldn't make sense of how to fix the code. Also, compile_na_inferences in inferences.py doesn't do anything, which seems wrong because the function is used elsewhere in the package.
So I don't know how to use the package as it stands....
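For what it's worth, the intended default was presumably an identity *function* (so un-standardizing is a no-op when the data were never standardized), not `np.identity`, which builds an identity matrix. A hypothetical sketch of that reading, with a deliberately reduced signature:

```python
def identity(x):
    # No-op transform: the "undo standardization" step when the data
    # were never standardized in the first place.
    return x

def compile_posterior_inferences_sketch(predictions, orig_std_params=identity):
    # Hypothetical reduced signature; the real function takes more
    # arguments. The point is only that the default should be callable.
    return orig_std_params(predictions)
```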
Hi, I'm happy to see that you have ported this package to Python as I was thinking of doing this myself. I am wondering if someone has gone ahead and done some validation and comparisons with the R package to ensure the results for known inputs are as expected and are also same across the two packages?
Thanks!
I'm not sure whether there is an error in the calculation of the p_value in the code.
As the following figure shows, the p_value is calculated from the mean value of the synthetic control group's prediction rather than from the point effect. I guess you may want to use the latter?
Further evidence: when I use the example from https://colab.research.google.com/drive/1HkJ9zm0LY36Wz-wB_bSHq68w8Cef6qJO?usp=sharing#scrollTo=AqyItZ3Hggoh, even if I don't change the value of y after 3000, the p-value is still significant and inconsistent with the confidence interval given (which includes zero).
Thanks for this package!
#assuming approximately normal distribution
#calculate standard deviation from the 95% conf interval
std_pred = (mean_upper - mean_pred) / 1.96
#calculate z score
z_score = (0 - mean_pred) / std_pred
#convert to probability
I think this takes values of the original series, and these are almost always different from zero. It would be better to look at rel_effect_upper and rel_effect instead.
Also
p_value = st.norm.cdf(z_score)
prob_causal = (100 - p_value)
p_value will be between 0 and 1, so I'm not sure 100 - p_value makes sense; shouldn't it be 1 - p_value?
A major difference between the R version and this is the method of estimation. Currently this only supports maximum likelihood estimation and results in differences between the two packages as noticed in #7
This link isn't working anymore: http://jamalsenouci.github.io/CausalImpact/CausalImpact.html
Any idea of where else folks can find this material?
Thanks.
The codecov action is not finding a coverage.xml file to upload
see #7 for details
Hi,
If I wanted it to read from a CSV file:
data = pd.read_csv('Traffic.csv',header=None,names = ['CountryX','CountryY'])
where would I need to save the file for it read this?
Many Thanks
JD
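One common answer (a sketch; 'Traffic.csv' and the column names are the poster's, everything else is my assumption): pd.read_csv resolves a bare filename relative to the current working directory, so the file can live anywhere as long as the path points at it. Using an in-memory stand-in for the file:

```python
import io
import pandas as pd

# Stand-in for the poster's Traffic.csv contents; with a real file you
# would pass 'Traffic.csv' (relative to os.getcwd()) or an absolute path.
csv_text = "10,12\n11,13\n9,14\n"
data = pd.read_csv(io.StringIO(csv_text), header=None,
                   names=["CountryX", "CountryY"])
```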
The summary of the R package provides "Posterior tail-area probability p" and "Posterior prob. of a causal effect". I cannot find these in the summary of the Python package. Is there any reason for not reporting them?
Hello! Been really enjoying the library as it's saved me having to either re-learn R or get R and pandas talking.
One notable question:
In analysis.py you currently block any attempt to use the model without a predictor time series. In R it's still possible to use it with a single series, and it remains useful (although obviously without a strong control, actually measuring uplift, for example, is very hard).
Was just wondering about the decision behind this, is it a functional thing or just a spare time thing? (with this presumably just being a fun side project).
I can bypass it by looking for the first nonzero value in the first column rather than the second, or by forcing it to take the max from the pre-period, but this just gives me a flat-line prediction, unlike in R, and at this point I'm reaching the end of my skills.
if data.shape[1] == 1: # no exogenous values provided
raise ValueError("data contains no exogenous variables")
non_null = pd.isnull(data.iloc[:, 1]).nonzero()
first_non_null = non_null[0]
Breaking out the wider confidence intervals from the rest of the issues raised in #7. This is potentially due to the R version placing an upper limit on the standard deviation.
I never had this error until recently.
When I do impact.run(), I see the following error:
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 impact.run()
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
107 kwargs["model_args"],
108 kwargs["alpha"],
--> 109 self.params["estimation"],
110 )
111 else:
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in _run_with_data(self, data, pre_period, post_period, model_args, alpha, estimation)
422 if model_args["standardize_data"]:
423 sd_results = standardize_all_variables(
--> 424 data_modeling, pre_period, post_period
425 )
426 df_pre = sd_results["data_pre"]
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/misc.py in standardize_all_variables(data, pre_period, post_period)
17
18 if not (
---> 19 pd.api.types.is_list_like(pre_period, list)
20 and pd.api.types.is_list_like(post_period)
21 ):
TypeError: is_list_like() takes 1 positional argument but 2 were given`
It looks to me like is_list_like in this instance does not need the second argument, namely "list". Not sure how that suddenly became a problem on my end however.
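The diagnosis looks right: `pd.api.types.is_list_like` takes a single positional object, so the stray second argument has to go. A sketch of the corrected check (my own reconstruction of the call in misc.py, not the actual patch):

```python
import pandas as pd

pre_period = [0, 69]
post_period = [70, 99]

# Corrected check: one positional argument per call.
periods_are_list_like = (pd.api.types.is_list_like(pre_period)
                         and pd.api.types.is_list_like(post_period))
```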
It's no problem if I just comment out the import in __init__.py, but I thought you might want to know about the error.
File "/usr/local/lib/python2.7/dist-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py", line 10, in <module>
from causalimpact.tests.test import run as test  # noqa
ImportError: No module named tests.test
Just wondering why the point_pred lower and upper bounds are unusually large for the first data point. This happens even in the example, whereas it is not the case with the R package.
e.g. from the example: Lower: -2804.815502 & Upper: 3048.805211
and the rest have more normal values.
Is there any way to fix this issue?