causalimpact's People

Contributors

causalkayak, cdutr, deepsource-autofix[bot], deepsourcebot, dependabot[bot], jamalsenouci, noahsilbert, renovate-bot, simon19891101

causalimpact's Issues

compile_posterior_inferences() missing 1 required positional argument: 'estimation'

Hi,

When I'm running the example provided in the documentation with the custom model I get the following error:

File "StatisticalMethod.py", line 160, in main_stat
  impact.run()
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 46, in run
  self.params["estimation"])
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 339, in _run_with_ucm
  orig_std_params, estimation)
TypeError: compile_posterior_inferences() missing 1 required positional argument: 'estimation'

I am using:
python 3.6
numpy 1.15.3
pandas 0.23.0
statsmodels 0.9.0
nose 1.3.7
Was wondering if I'm missing something or not using the correct library versions.

Thanks,
Reihane

AttributeError: 'CausalImpact' object has no attribute 'inferences'

Hello,

I am trying to use this package, following the steps in your notebook: https://github.com/jamalsenouci/causalimpact/blob/master/GettingStarted.ipynb

I have successfully installed the package in PowerShell:
[screenshot: git installation output]

And this is what my notebook looks like:
[screenshot: notebook contents]

However, when I try to run the package, these are the errors I get:
[screenshots: error tracebacks]

Can you please help me figure out what I am doing wrong?

Thank you in advance and Happy New Year!

AInhoa

Binder fails to build

The Binder link in the docs fails to build the Dockerfile due to the version of numpy pinned in requirements.txt.

does not work with latest version of pandas


Pandas Version: 0.21.1


Code:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)

np.random.seed(1)

x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)

y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
data.plot()
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()

Error:

AttributeError Traceback (most recent call last)
in ()
2 post_period = [70,99]
3 impact = CausalImpact(data, pre_period, post_period)
----> 4 impact.run()
5 impact.plot()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
33 self.params["ucm_model"],
34 self.params["post_period_response"],
---> 35 self.params["alpha"])
36
37 # Depending on input, dispatch to the appropriate Run* method()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
209 # Check <pre_period> and <post_period>
210 if data is not None:
--> 211 checked = self._format_input_prepost(pre_period, post_period, data)
212 pre_period = checked["pre_period"]
213 post_period = checked["post_period"]

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_prepost(self, pre_period, post_period, data)
104 pre_dtype = np.array(pre_period).dtype
105 post_dtype = np.array(post_period).dtype
--> 106 if isinstance(data.index, pd.tseries.index.DatetimeIndex):
107 pre_period = [pd.to_datetime(date) for date in pre_period]
108 post_period = [pd.to_datetime(date) for date in post_period]

AttributeError: module 'pandas.tseries' has no attribute 'index'
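The root cause is a pandas API change: `pd.tseries.index.DatetimeIndex` was an internal path that later pandas releases removed, while the public name `pd.DatetimeIndex` has stayed stable. A minimal sketch of a version-proof check (the helper name is hypothetical):

```python
import pandas as pd

def has_datetime_index(data):
    # `pd.DatetimeIndex` is the stable public name; the old internal path
    # `pd.tseries.index.DatetimeIndex` was removed in later pandas releases.
    return isinstance(data.index, pd.DatetimeIndex)
```

With a check like this, integer and datetime indices can be told apart without touching pandas internals.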

Apache License?

Assuming this will be under the Apache license since it's derived from an Apache-licensed project.
Can you add the Apache license file? Not having it prevents "worry-free" re-use and extensibility.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I was able to run the code fine until I upgraded some of the packages, which broke it. It looks like something related to Python 3.6. Can you help me fix this?

Even the example code given in the documentation is breaking.

import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)

# Data Prep
np.random.seed(1)
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])

# Model
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-a062d48e2479> in <module>()
     18 post_period = [70,99]
     19 impact = CausalImpact(data, pre_period, post_period)
---> 20 impact.run()

/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
     33                                     self.params["ucm_model"],
     34                                     self.params["post_period_response"],
---> 35                                     self.params["alpha"])
     36 
     37         # Depending on input, dispatch to the appropriate Run* method()

/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
    205         # representing time points
    206         if data is not None:
--> 207             data = self._format_input_data(data)
    208 
    209         # Check <pre_period> and <post_period>

/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_data(self, data)
     75         # Must not have NA in covariates (if any)
     76         if len(data.columns) >= 2:
---> 77             if np.any(pd.isnull(data.iloc[:, 1:])):
     78                 raise ValueError("covariates must not contain null values")
     79 

/Users/shikhardua/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
    951         raise ValueError("The truth value of a {0} is ambiguous. "
    952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
    954 
    955     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
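The ambiguity comes from handing a DataFrame slice to a boolean context: newer pandas reduces it column-wise to a Series, which then cannot be coerced to a single bool. A sketch of a version-safe null check, assuming the intent is "any null anywhere in the covariates":

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"y": [1.0, 2.0], "x1": [3.0, np.nan]})

# Reducing over the underlying ndarray returns a plain bool, so an
# `if` statement never sees an ambiguous Series:
has_null_covariates = pd.isnull(data.iloc[:, 1:]).values.any()
```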

Very different Results R vs Python (and minor Problems)

Hey Jamal, I played around with your package a bit and I really appreciate your work a lot! I am looking forward to porting some R stuff to Python, and now it seems nearly possible thanks to your effort :)

Unfortunately I encountered some differences between these two worlds which I cannot explain at the moment. See below. If you have any ideas, let me know.

R Version

library("CausalImpact")
library("zoo")

DATA = "
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"

to_time <- function(s) {
  return(as.POSIXct(trimws(paste(s, '')), format="%Y-%m-%d %H:%M:%S", tz="Europe/Berlin"))
}

df <- read.table(textConnection(DATA), sep=",", header=T)
df$t = to_time(df$t)

df <- zoo(cbind(df$y, df$x1, df$x2), df$t)


pre_period <- c(to_time('2016-02-20 22:41:20'), to_time('2016-02-20 22:51:20'))
post_period <-c(to_time('2016-02-20 22:51:30'), to_time('2016-02-20 22:56:20'))

impact <- CausalImpact(df,pre_period, post_period)
impact$summary
> summary(impact)
Posterior inference {CausalImpact}
                         Average      Cumulative  
Actual                   156          4687        
Prediction (s.d.)        129 (4.6)    3875 (139.3)
95% CI                   [120, 138]   [3602, 4139]
                                                  
Absolute effect (s.d.)   27 (4.6)     812 (139.3) 
95% CI                   [18, 36]     [548, 1085] 
                                                  
Relative effect (s.d.)   21% (3.6%)   21% (3.6%)  
95% CI                   [14%, 28%]   [14%, 28%]  

Posterior tail-area probability p:   0.001
Posterior prob. of a causal effect:  99.8998%

For more details, type: summary(impact, "report")

[plot of the R CausalImpact results]

Python Version

import pandas as pd
import sys
from io import StringIO
from causalimpact import CausalImpact

DATA = """
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"""

df = pd.read_csv(StringIO(DATA))
df["t"] = pd.to_datetime(df["t"])
df.index = df["t"]
del df["t"] 

pre_period = [pd.to_datetime('2016-02-20 22:41:20'), pd.to_datetime('2016-02-20 22:51:20')]
post_period = [pd.to_datetime('2016-02-20 22:51:30'), pd.to_datetime('2016-02-20 22:56:20')]

impact = CausalImpact(df, pre_period, post_period)
impact.run()
impact.plot()
> impact.summary()
>                         Average      Cumulative
> Actual                      156            4687
> Predicted                   129            3883
> 95% CI                [93, 165]    [2812, 4955]
>                                                
> Absolute Effect              26             803
> 95% CI                 [62, -8]    [1874, -268]
>                                                
> Relative Effect           20.7%           20.7%
> 95% CI           [48.3%, -6.9%]  [48.3%, -6.9%]
> 

[plot of the Python CausalImpact results]

So the Python version's intervals seem much wider and less conclusive.

By the way:
In inferences.py I had to change

        cum_effect = point_effect.copy()
        cum_effect.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect = np.cumsum(cum_effect)
        cum_effect_upper = point_effect_upper.copy()
        cum_effect_upper.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect_upper = np.cumsum(cum_effect_upper)
        cum_effect_lower = point_effect_lower.copy()
        cum_effect_lower.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect_lower = np.cumsum(cum_effect_lower)

to

        cum_effect = point_effect.copy()
        cum_effect.loc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect = np.cumsum(cum_effect)
        cum_effect_upper = point_effect_upper.copy()
        cum_effect_upper.loc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect_upper = np.cumsum(cum_effect_upper)
        cum_effect_lower = point_effect_lower.copy()
        cum_effect_lower.loc[df_pre.index[0]:df_pre.index[-1]] = 0
        cum_effect_lower = np.cumsum(cum_effect_lower)
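The iloc-to-loc change is needed because with a DatetimeIndex, `df_pre.index[0]` is a Timestamp label, not an integer position: `.loc` slices by label, while `.iloc` rejects timestamps. A minimal illustration with synthetic data (not from the package):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-02-20 22:41:20", periods=5, freq="10s")
effect = pd.Series(np.arange(5.0), index=idx)

# .loc slices by label, so timestamp endpoints work (inclusive on both ends):
effect.loc[idx[0]:idx[2]] = 0

# .iloc would need integer positions here; passing timestamps raises a TypeError.
```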

using python 3.6.3
causalimpact 0.1.1
numpy 1.13.3
pandas 0.21.1
seaborn 0.8.1
statsmodels 0.8.0
zeromq 4.1.3 0

and the Python plot shows wrong timestamp markers :)

Key Error: 'upper y'

I have the following pandas object

print(type(causal))
print(causal.columns)
print(type(causal.index))
print(causal.head())
<class 'pandas.core.frame.DataFrame'>
Index(['y', 'x1', 'x2'], dtype='object')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
              y   x1   x2
date                     
2017-09-04  150  150  275
2017-09-05  200  249  125
2017-09-06  225  150  249
2017-09-07  150  125  275
2017-09-08  175  325  250

I set variables pre_period and post_period as in your documentation. Then I run

impact = CausalImpact(causal, pre_period, post_period)
impact.run()

and I get

KeyError: 'upper y'

Can you please point me in the right direction?
Thanks!

Error in the index attribute

First of all, thanks for this package. Moving back and forth from python to R is frustrating. I have an issue. My index is RangeIndex(start=0, stop=152, step=1) and pre_period = [0,109]; post_period = [111,151]. I have 2 columns in pandas and when I pass it to CausalImpact I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-86-f56cb9d3e7f1> in <module>()
----> 1 impact.run()

/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in run(self)
     33                                     self.params["ucm_model"],
     34                                     self.params["post_period_response"],
---> 35                                     self.params["alpha"])
     36 
     37         # Depending on input, dispatch to the appropriate Run* method()

/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
    209         # Check <pre_period> and <post_period>
    210         if data is not None:
--> 211             checked = self._format_input_prepost(pre_period, post_period, data)
    212             pre_period = checked["pre_period"]
    213             post_period = checked["post_period"]

/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input_prepost(self, pre_period, post_period, data)
    104         pre_dtype = np.array(pre_period).dtype
    105         post_dtype = np.array(post_period).dtype
--> 106         if isinstance(data.index, pd.tseries.index.DatetimeIndex):
    107             pre_period = [pd.to_datetime(date) for date in pre_period]
    108             post_period = [pd.to_datetime(date) for date in post_period]

AttributeError: 'module' object has no attribute 'index'

Cannot import CausalImpact because of NameError:

When I import causalimpact, as such

import causalimpact

I get this error

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-707f1b214cc6> in <module>()
----> 1 import causalimpact

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py in <module>()
      7 """
      8 
----> 9 from causalimpact.analysis import CausalImpact  # noqa
     10 from causalimpact.tests.test import run as test  # noqa

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/analysis.py in <module>()
      6 from causalimpact.misc import standardize_all_variables
      7 from causalimpact.model import construct_model
----> 8 from causalimpact.inferences import compile_posterior_inferences
      9 
     10 class CausalImpact(object):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/inferences.py in <module>()
      1 from causalimpact.misc import unstandardize
      2 def compile_posterior_inferences(model, data_post, alpha=0.05,
----> 3                                  orig_std_params=identity):
      4 
      5     # Compute point predictions of counterfactual (in standardized space)

NameError: name 'identity' is not defined

I looked at the source code and found that identity is a default arg for compile_posterior_inferences, but identity doesn't exist. I looked elsewhere in the code to see if identity is referenced anywhere so I could fix it myself, and found this: orig_std_params = np.identity on line 243 of analysis.py. But np.identity is a function that returns the identity matrix, and it is assigned to the variable orig_std_params without actually being called with an argument. The orig_std_params variable is then reassigned to some other value, so I couldn't make sense of how to fix the code. Also, compile_na_inferences in inferences.py doesn't do anything, which seems wrong because the function is used elsewhere in the package.

So I don't know how to use the package as such....
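One conventional fix for a default argument that references an undefined name is a `None` sentinel resolved inside the function. The stub below is purely hypothetical and only illustrates the signature fix, not the package's real posterior logic:

```python
def compile_posterior_inferences(model, data_post, alpha=0.05,
                                 orig_std_params=None):
    # Hypothetical stub: resolve the missing `identity` default at call time.
    if orig_std_params is None:
        orig_std_params = lambda x: x  # identity transform: leave values as-is
    return orig_std_params(data_post)
```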

Validation and comparison against the R package

Hi, I'm happy to see that you have ported this package to Python, as I was thinking of doing this myself. I am wondering if someone has done some validation and comparison against the R package, to ensure the results for known inputs are as expected and are the same across the two packages?
Thanks!

Calculation error of p-value

I'm not sure whether there is an error in the calculation of p_value of the code.
As the following figure shows, the p_value is calculated from the mean value of the synthetic control group's predictor rather than from the point effect value. I guess you may want to use the latter?
[screenshot of the p-value calculation]
Another evidence is that when I use the example from https://colab.research.google.com/drive/1HkJ9zm0LY36Wz-wB_bSHq68w8Cef6qJO?usp=sharing#scrollTo=AqyItZ3Hggoh, even if I don't change the value of y after 3000, the p-value is still significant and inconsistent with the confidence interval given(including zero)

Think p-value was not calculated properly

Thanks for this package!

# assuming approximately normal distribution
# calculate standard deviation from the 95% conf interval
std_pred = (mean_upper - mean_pred) / 1.96
# calculate z score
z_score = (0 - mean_pred) / std_pred
# convert to probability
I think this takes values of the original series, and these are almost always different from zero. It would be better to look at rel_effect_upper and rel_effect instead.

Also

p_value = st.norm.cdf(z_score)
prob_causal = (100 - p_value)

p_value will be from 0 to 1, so I'm not sure 100 - p_value is right; wouldn't 1 - p_value make more sense?
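If the intended output is a posterior probability in percent, the complement has to be taken on the [0, 1] scale before converting. A sketch under that assumption, using a stdlib normal CDF in place of st.norm.cdf and made-up numbers:

```python
from math import erf, sqrt

def norm_cdf(z):
    # standard normal CDF via the error function (mirrors st.norm.cdf)
    return 0.5 * (1 + erf(z / sqrt(2)))

mean_pred, mean_upper = 129.0, 138.0   # illustrative numbers, not real model output
std_pred = (mean_upper - mean_pred) / 1.96
z_score = (0 - mean_pred) / std_pred

p_value = norm_cdf(z_score)            # in [0, 1]
prob_causal = (1 - p_value) * 100      # percent; 100 - p_value mixes the scales
```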

support bayesian estimation methods

A major difference between the R version and this one is the method of estimation. Currently this package only supports maximum likelihood estimation, which results in differences between the two packages, as noticed in #7.

CSV Data?

Hi,

If I wanted it to read from a CSV file:

data = pd.read_csv('Traffic.csv',header=None,names = ['CountryX','CountryY'])

where would I need to save the file for it to read this?

Many Thanks
JD

How to get P-value in Python version?

The summary of the R package provides "Posterior tail-area probability p" and "Posterior prob. of a causal effect". I cannot find them in the summary of the Python package. Is there a reason for not reporting them?

Only works with a predictor time series

Hello! Been really enjoying the library as it's saved me having to either re-learn R or get R and pandas talking.

One notable question:

In analysis.py you currently block any attempt to use the model without a predictor time series. In R it's still possible to use it with a single Series and it remains useful (although obviously without a strong control actually measuring uplift for example is very hard).

Was just wondering about the decision behind this, is it a functional thing or just a spare time thing? (with this presumably just being a fun side project).

I can bypass it by looking for the first nonzero value in the first column rather than the second, or by forcing it to take the max from the pre-period, but this just gives me a flat-line prediction unlike in R, and at this point I'm reaching the end of my skills.

if data.shape[1] == 1:  # no exogenous values provided
    raise ValueError("data contains no exogenous variables")
non_null = pd.isnull(data.iloc[:, 1]).nonzero()
first_non_null = non_null[0]

TypeError: is_list_like() takes 1 positional argument but 2 were given

I never had this error until recently.
When I do impact.run(), I see the following error:

---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 impact.run()

/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
107 kwargs["model_args"],
108 kwargs["alpha"],
--> 109 self.params["estimation"],
110 )
111 else:

/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in _run_with_data(self, data, pre_period, post_period, model_args, alpha, estimation)
422 if model_args["standardize_data"]:
423 sd_results = standardize_all_variables(
--> 424 data_modeling, pre_period, post_period
425 )
426 df_pre = sd_results["data_pre"]

/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/misc.py in standardize_all_variables(data, pre_period, post_period)
17
18 if not (
---> 19 pd.api.types.is_list_like(pre_period, list)
20 and pd.api.types.is_list_like(post_period)
21 ):

TypeError: is_list_like() takes 1 positional argument but 2 were given

It looks to me like is_list_like in this instance does not need the second argument, namely "list". Not sure how that suddenly became a problem on my end however.
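For reference, current pandas documents `is_list_like` as taking a single object (plus a keyword-only `allow_sets` flag in newer releases), so dropping the stray `list` argument matches the documented signature:

```python
import pandas as pd

# Single positional argument only:
assert pd.api.types.is_list_like([0, 109])
assert pd.api.types.is_list_like((70, 99))
assert not pd.api.types.is_list_like(42)
```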

ImportError: No module named tests.test

It's no problem if I just comment out the import in __init__.py, but I thought you might want to know about the error.

File "/usr/local/lib/python2.7/dist-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py", line 10, in <module>
  from causalimpact.tests.test import run as test  # noqa
ImportError: No module named tests.test

Confidence Interval First Data Point

Just wondering why the point_pred lower and upper bounds are unusually large for the first data point. This happens even in the example while this is not the case with the R package.

e.g. from the example: Lower: -2804.815502 & Upper: 3048.805211

and the rest have more normal values.

Is there any way to fix this issue?
