jamalsenouci / causalimpact
Python port of the CausalImpact R library
License: Apache License 2.0
Hi,
When I run the example provided in the documentation with the custom model, I get the following error:
File "StatisticalMethod.py", line 160, in main_stat
    impact.run()
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 46, in run
    self.params["estimation"])
File "/home/reihane/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py", line 339, in _run_with_ucm
    orig_std_params, estimation)
TypeError: compile_posterior_inferences() missing 1 required positional argument: 'estimation'
I am using:
python 3.6
numpy 1.15.3
pandas 0.23.0
statsmodels 0.9.0
nose 1.3.7
I was wondering if I'm missing something or not using the correct library versions.
Thanks,
Reihane
Hello,
I am trying to use this package, following the steps in your notebook: https://github.com/jamalsenouci/causalimpact/blob/master/GettingStarted.ipynb
I have successfully installed the package in PowerShell:
And this is what my notebook looks like:
However, when I try to run the package, these are the errors I get:
Can you please help me figure out what I am doing wrong?
Thank you in advance and Happy New Year!
AInhoa
The Binder link in the docs fails to build the Dockerfile due to the numpy version pinned in requirements.txt.
Pandas Version: 0.21.1
Code:
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)
np.random.seed(1)
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
data.plot()
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
Error:
AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>()
      2 post_period = [70,99]
      3 impact = CausalImpact(data, pre_period, post_period)
----> 4 impact.run()
      5 impact.plot()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
     33             self.params["ucm_model"],
     34             self.params["post_period_response"],
---> 35             self.params["alpha"])
     36
     37         # Depending on input, dispatch to the appropriate Run* method()

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
    209         # Check <pre_period> and <post_period>
    210         if data is not None:
--> 211             checked = self._format_input_prepost(pre_period, post_period, data)
    212             pre_period = checked["pre_period"]
    213             post_period = checked["post_period"]

/anaconda/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_prepost(self, pre_period, post_period, data)
    104         pre_dtype = np.array(pre_period).dtype
    105         post_dtype = np.array(post_period).dtype
--> 106         if isinstance(data.index, pd.tseries.index.DatetimeIndex):
    107             pre_period = [pd.to_datetime(date) for date in pre_period]
    108             post_period = [pd.to_datetime(date) for date in post_period]

AttributeError: module 'pandas.tseries' has no attribute 'index'
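`pd.tseries.index` is a private module path that later pandas releases removed; the stable public alias has always been `pd.DatetimeIndex`. A minimal, version-robust sketch of the check (my own code, not the package's actual patch):

```python
import pandas as pd

def has_datetime_index(data):
    # pd.tseries.index.DatetimeIndex no longer exists in newer pandas;
    # pd.DatetimeIndex is the stable public name for the same class.
    return isinstance(data.index, pd.DatetimeIndex)

df = pd.DataFrame({"y": [1, 2]},
                  index=pd.to_datetime(["2016-01-01", "2016-01-02"]))
```

With an integer RangeIndex, `has_datetime_index` simply returns False and the period endpoints can stay integers.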
Assuming this will be under the Apache License, since it's derived from Apache-licensed code.
Can you add the Apache License file? Not having it prevents "worry free" re-use and extensibility.
I was able to run the code fine until I upgraded some packages, which broke it. It looks like something related to Python 3.6. Can you help me fix this?
Even the example code given in the documentation is breaking.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact
import matplotlib
import seaborn as sns
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (15, 6)
# Data Prep
np.random.seed(1)
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
# Model
pre_period = [0,69]
post_period = [70,99]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-a062d48e2479> in <module>()
18 post_period = [70,99]
19 impact = CausalImpact(data, pre_period, post_period)
---> 20 impact.run()
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
33 self.params["ucm_model"],
34 self.params["post_period_response"],
---> 35 self.params["alpha"])
36
37 # Depending on input, dispatch to the appropriate Run* method()
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
205 # representing time points
206 if data is not None:
--> 207 data = self._format_input_data(data)
208
209 # Check <pre_period> and <post_period>
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/causalimpact/analysis.py in _format_input_data(self, data)
75 # Must not have NA in covariates (if any)
76 if len(data.columns) >= 2:
---> 77 if np.any(pd.isnull(data.iloc[:, 1:])):
78 raise ValueError("covariates must not contain null values")
79
/Users/shikhardua/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
951 raise ValueError("The truth value of a {0} is ambiguous. "
952 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953 .format(self.__class__.__name__))
954
955 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
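The ambiguity arises because `pd.isnull(data.iloc[:, 1:])` is a DataFrame of booleans, and newer pandas refuses to coerce it to a single truth value inside `np.any`. A sketch of the kind of fix (my own, not the package's actual patch) is to reduce the mask explicitly:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"y": [1.0, 2.0], "x1": [3.0, np.nan]})

# Reduce the boolean mask to one scalar with .values.any() instead of
# letting the truthiness of a Series/DataFrame be evaluated implicitly.
has_null_covariates = pd.isnull(data.iloc[:, 1:]).values.any()
if has_null_covariates:
    print("covariates contain null values")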
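The ambiguity arises because `pd.isnull(data.iloc[:, 1:])` is a DataFrame of booleans, and newer pandas refuses to coerce it to a single truth value. A sketch of the kind of fix (my own, not the package's actual patch) is to reduce the mask explicitly:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"y": [1.0, 2.0], "x1": [3.0, np.nan]})

# Reduce the boolean mask to one scalar with .values.any() instead of
# letting a Series/DataFrame be evaluated in a boolean context.
has_null_covariates = pd.isnull(data.iloc[:, 1:]).values.any()
```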
Hey Jamal, I played a little bit with your package and I really appreciate your work a lot! I am looking forward to porting some R stuff to Python, and now it seems nearly possible thanks to your effort :)
Unfortunately I encountered some differences between these two worlds which I cannot explain at the moment. See below. If you have any idea, let me know.
library("CausalImpact")
library("zoo")
DATA = "
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"
to_time <- function(s) {
return(as.POSIXct(trimws(paste(s, '')), format="%Y-%m-%d %H:%M:%S", tz="Europe/Berlin"))
}
df <- read.table(textConnection(DATA), sep=",", header=T)
df$t = to_time(df$t)
df <- zoo(cbind(df$y, df$x1, df$x2), df$t)
pre_period <- c(to_time('2016-02-20 22:41:20'), to_time('2016-02-20 22:51:20'))
post_period <-c(to_time('2016-02-20 22:51:30'), to_time('2016-02-20 22:56:20'))
impact <- CausalImpact(df,pre_period, post_period)
impact$summary
> summary(impact)
Posterior inference {CausalImpact}
Average Cumulative
Actual 156 4687
Prediction (s.d.) 129 (4.6) 3875 (139.3)
95% CI [120, 138] [3602, 4139]
Absolute effect (s.d.) 27 (4.6) 812 (139.3)
95% CI [18, 36] [548, 1085]
Relative effect (s.d.) 21% (3.6%) 21% (3.6%)
95% CI [14%, 28%] [14%, 28%]
Posterior tail-area probability p: 0.001
Posterior prob. of a causal effect: 99.8998%
For more details, type: summary(impact, "report")
import pandas as pd
import sys
from io import StringIO
from causalimpact import CausalImpact
DATA = """
t,y,x1,x2\n
2016-02-20 22:41:20,110.0,134.0,128.0\n
2016-02-20 22:41:30,125.0,134.0,128.0\n
2016-02-20 22:41:40,123.0,134.0,128.0\n
2016-02-20 22:41:50,128.0,134.0,128.0\n
2016-02-20 22:42:00,114.0,134.0,128.0\n
2016-02-20 22:42:10,125.0,133.0,128.0\n
2016-02-20 22:42:20,119.0,133.0,128.0\n
2016-02-20 22:42:30,121.0,133.0,128.0\n
2016-02-20 22:42:40,139.0,133.0,128.0\n
2016-02-20 22:42:50,107.0,133.0,128.0\n
2016-02-20 22:43:00,115.0,132.0,128.0\n
2016-02-20 22:43:10,91.0,132.0,128.0\n
2016-02-20 22:43:20,107.0,132.0,128.0\n
2016-02-20 22:43:30,124.0,132.0,128.0\n
2016-02-20 22:43:40,116.0,131.0,128.0\n
2016-02-20 22:43:50,110.0,131.0,128.0\n
2016-02-20 22:44:00,100.0,131.0,128.0\n
2016-02-20 22:44:10,110.0,131.0,128.0\n
2016-02-20 22:44:20,113.0,129.0,128.0\n
2016-02-20 22:44:30,103.0,129.0,128.0\n
2016-02-20 22:44:40,117.0,129.0,128.0\n
2016-02-20 22:44:50,125.0,129.0,128.0\n
2016-02-20 22:45:00,115.0,129.0,128.0\n
2016-02-20 22:45:10,114.0,128.0,128.0\n
2016-02-20 22:45:20,138.0,128.0,128.0\n
2016-02-20 22:45:30,117.0,128.0,128.0\n
2016-02-20 22:45:40,104.0,128.0,128.0\n
2016-02-20 22:45:50,123.0,128.0,128.0\n
2016-02-20 22:46:00,122.0,128.0,128.0\n
2016-02-20 22:46:10,150.0,128.0,128.0\n
2016-02-20 22:46:20,127.0,128.0,128.0\n
2016-02-20 22:46:30,139.0,128.0,128.0\n
2016-02-20 22:46:40,139.0,127.0,127.0\n
2016-02-20 22:46:50,109.0,127.0,127.0\n
2016-02-20 22:47:00,107.0,127.0,127.0\n
2016-02-20 22:47:10,94.0,127.0,127.0\n
2016-02-20 22:47:20,112.0,127.0,127.0\n
2016-02-20 22:47:30,107.0,127.0,127.0\n
2016-02-20 22:47:40,126.0,127.0,127.0\n
2016-02-20 22:47:50,114.0,127.0,127.0\n
2016-02-20 22:48:00,129.0,127.0,127.0\n
2016-02-20 22:48:10,113.0,126.0,127.0\n
2016-02-20 22:48:20,114.0,126.0,127.0\n
2016-02-20 22:48:30,116.0,126.0,127.0\n
2016-02-20 22:48:40,110.0,125.0,126.0\n
2016-02-20 22:48:50,131.0,125.0,126.0\n
2016-02-20 22:49:00,109.0,125.0,126.0\n
2016-02-20 22:49:10,114.0,125.0,127.0\n
2016-02-20 22:49:20,116.0,125.0,126.0\n
2016-02-20 22:49:30,113.0,124.0,125.0\n
2016-02-20 22:49:40,108.0,124.0,125.0\n
2016-02-20 22:49:50,120.0,124.0,125.0\n
2016-02-20 22:50:00,106.0,123.0,125.0\n
2016-02-20 22:50:10,123.0,123.0,125.0\n
2016-02-20 22:50:20,123.0,123.0,124.0\n
2016-02-20 22:50:30,135.0,123.0,124.0\n
2016-02-20 22:50:40,127.0,123.0,124.0\n
2016-02-20 22:50:50,140.0,123.0,123.0\n
2016-02-20 22:51:00,139.0,123.0,123.0\n
2016-02-20 22:51:10,137.0,123.0,123.0\n
2016-02-20 22:51:20,123.0,123.0,123.0\n
2016-02-20 22:51:30,160.0,122.0,123.0\n
2016-02-20 22:51:40,173.0,122.0,123.0\n
2016-02-20 22:51:50,236.0,122.0,123.0\n
2016-02-20 22:52:00,233.0,122.0,123.0\n
2016-02-20 22:52:10,193.0,122.0,123.0\n
2016-02-20 22:52:20,169.0,122.0,123.0\n
2016-02-20 22:52:30,167.0,122.0,123.0\n
2016-02-20 22:52:40,172.0,121.0,123.0\n
2016-02-20 22:52:50,148.0,121.0,123.0\n
2016-02-20 22:53:00,125.0,121.0,123.0\n
2016-02-20 22:53:10,132.0,121.0,123.0\n
2016-02-20 22:53:20,165.0,121.0,123.0\n
2016-02-20 22:53:30,154.0,120.0,123.0\n
2016-02-20 22:53:40,158.0,120.0,123.0\n
2016-02-20 22:53:50,135.0,120.0,123.0\n
2016-02-20 22:54:00,145.0,120.0,123.0\n
2016-02-20 22:54:10,163.0,119.0,122.0\n
2016-02-20 22:54:20,146.0,119.0,122.0\n
2016-02-20 22:54:30,120.0,119.0,121.0\n
2016-02-20 22:54:40,149.0,118.0,121.0\n
2016-02-20 22:54:50,140.0,118.0,121.0\n
2016-02-20 22:55:00,150.0,117.0,121.0\n
2016-02-20 22:55:10,133.0,117.0,120.0\n
2016-02-20 22:55:20,143.0,117.0,120.0\n
2016-02-20 22:55:30,145.0,117.0,120.0\n
2016-02-20 22:55:40,145.0,117.0,120.0\n
2016-02-20 22:55:50,176.0,117.0,120.0\n
2016-02-20 22:56:00,134.0,117.0,120.0\n
2016-02-20 22:56:10,147.0,117.0,120.0\n
2016-02-20 22:56:20,131.0,117.0,120.0"""
df = pd.read_csv(StringIO(DATA))
df["t"] = pd.to_datetime(df["t"])
df.index = df["t"]
del df["t"]
pre_period = [pd.to_datetime('2016-02-20 22:41:20'), pd.to_datetime('2016-02-20 22:51:20')]
post_period = [pd.to_datetime('2016-02-20 22:51:30'), pd.to_datetime('2016-02-20 22:56:20')]
impact = CausalImpact(df, pre_period, post_period)
impact.run()
impact.plot()
> impact.summary()
> Average Cumulative
> Actual 156 4687
> Predicted 129 3883
> 95% CI [93, 165] [2812, 4955]
>
> Absolute Effect 26 803
> 95% CI [62, -8] [1874, -268]
>
> Relative Effect 20.7% 20.7%
> 95% CI [48.3%, -6.9%] [48.3%, -6.9%]
>
So the Python version seems much more conservative: the confidence intervals are far wider than in R.
By the way:
In inferences.py I had to change
cum_effect = point_effect.copy()
cum_effect.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect = np.cumsum(cum_effect)
cum_effect_upper = point_effect_upper.copy()
cum_effect_upper.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_upper = np.cumsum(cum_effect_upper)
cum_effect_lower = point_effect_lower.copy()
cum_effect_lower.iloc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_lower = np.cumsum(cum_effect_lower)
to
cum_effect = point_effect.copy()
cum_effect.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect = np.cumsum(cum_effect)
cum_effect_upper = point_effect_upper.copy()
cum_effect_upper.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_upper = np.cumsum(cum_effect_upper)
cum_effect_lower = point_effect_lower.copy()
cum_effect_lower.loc[df_pre.index[0]:df_pre.index[-1]] = 0
cum_effect_lower = np.cumsum(cum_effect_lower)
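The change matters because `.iloc` is strictly positional, while `.loc` slices by label; with a DatetimeIndex, `df_pre.index[0]` and `df_pre.index[-1]` are timestamps, not integer positions, so only label-based slicing works. A small illustration with hypothetical data (not the package's code):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.to_datetime(["2016-02-20 22:41:20",
                                    "2016-02-20 22:41:30",
                                    "2016-02-20 22:41:40",
                                    "2016-02-20 22:41:50"]))

# .loc slices by label; timestamp endpoints work and are inclusive.
pre = s.loc[s.index[0]:s.index[1]]

# s.iloc[s.index[0]:s.index[1]] would fail, because .iloc expects
# integer positions, not Timestamp labels.
```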
using python 3.6.3
causalimpact 0.1.1
numpy 1.13.3
pandas 0.21.1
seaborn 0.8.1
statsmodels 0.8.0
zeromq 4.1.3
and the Python plot shows wrong timestamp markers :)
CausalImpact has recently started failing with this error:
module 'pandas.core.dtypes.common' has no attribute 'is_date_time_or_time_delta_type'
Pandas seems to have removed is_date_time_or_time_delta_type
as I can not find it here:
https://github.com/pandas-dev/pandas/blob/main/pandas/core/common.py
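Assuming the intent was only to test for a datetime or timedelta dtype, a version-robust sketch (my own helper name, not the package's) goes through the public `pd.api.types` functions instead of the removed private one:

```python
import pandas as pd

def is_datetime_or_timedelta(series):
    # Public API equivalents of the removed private dtype helper.
    return (pd.api.types.is_datetime64_any_dtype(series)
            or pd.api.types.is_timedelta64_dtype(series))

ts = pd.Series(pd.to_datetime(["2017-09-04", "2017-09-05"]))
```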
I have the following pandas object
print(type(causal))
print(causal.columns)
print(type(causal.index))
print(causal.head())
<class 'pandas.core.frame.DataFrame'>
Index(['y', 'x1', 'x2'], dtype='object')
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
y x1 x2
date
2017-09-04 150 150 275
2017-09-05 200 249 125
2017-09-06 225 150 249
2017-09-07 150 125 275
2017-09-08 175 325 250
I set variables pre_period
and post_period
as in your documentation. Then I run
impact = CausalImpact(causal, pre_period, post_period)
impact.run()
and I get
KeyError: 'upper y'
Can you please point me in the right direction?
Thanks!
Using Pandas 0.20.1, when I try to import CausalImpact package, I get the following error
ImportError: cannot import name 'PandasError'
First of all, thanks for this package; moving back and forth between Python and R is frustrating. I have an issue. My index is RangeIndex(start=0, stop=152, step=1), with pre_period = [0, 109] and post_period = [111, 151]. I have 2 columns in the DataFrame, and when I pass it to CausalImpact I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-86-f56cb9d3e7f1> in <module>()
----> 1 impact.run()
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in run(self)
33 self.params["ucm_model"],
34 self.params["post_period_response"],
---> 35 self.params["alpha"])
36
37 # Depending on input, dispatch to the appropriate Run* method()
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input(self, data, pre_period, post_period, model_args, ucm_model, post_period_response, alpha)
209 # Check <pre_period> and <post_period>
210 if data is not None:
--> 211 checked = self._format_input_prepost(pre_period, post_period, data)
212 pre_period = checked["pre_period"]
213 post_period = checked["post_period"]
/Users/nawafalsabhan/anaconda/lib/python2.7/site-packages/causalimpact/analysis.pyc in _format_input_prepost(self, pre_period, post_period, data)
104 pre_dtype = np.array(pre_period).dtype
105 post_dtype = np.array(post_period).dtype
--> 106 if isinstance(data.index, pd.tseries.index.DatetimeIndex):
107 pre_period = [pd.to_datetime(date) for date in pre_period]
108 post_period = [pd.to_datetime(date) for date in post_period]
AttributeError: 'module' object has no attribute 'index'
When I import causalimpact, as such
import causalimpact
I get this error
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-707f1b214cc6> in <module>()
----> 1 import causalimpact
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py in <module>()
7 """
8
----> 9 from causalimpact.analysis import CausalImpact # noqa
10 from causalimpact.tests.test import run as test # noqa
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/analysis.py in <module>()
6 from causalimpact.misc import standardize_all_variables
7 from causalimpact.model import construct_model
----> 8 from causalimpact.inferences import compile_posterior_inferences
9
10 class CausalImpact(object):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/causalimpact-0.1-py2.7.egg/causalimpact/inferences.py in <module>()
1 from causalimpact.misc import unstandardize
2 def compile_posterior_inferences(model, data_post, alpha=0.05,
----> 3 orig_std_params=identity):
4
5 # Compute point predictions of counterfactual (in standardized space)
NameError: name 'identity' is not defined
I looked at the source code and found that identity is the default for the orig_std_params argument of compile_posterior_inferences, but no identity is defined anywhere. I looked elsewhere in the code to see if identity is referenced so I could fix it myself, and found orig_std_params = np.identity on line 243 of analysis.py. But np.identity is a function that returns an identity matrix; it is assigned to orig_std_params without actually being called, and orig_std_params is then reassigned to some other value. So I couldn't make sense of how to fix the code. Also, compile_na_inferences in inferences.py doesn't do anything, which seems wrong because the function is used elsewhere in the package.
So I don't know how to use the package as it stands....
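For what it's worth, the intended default was presumably an identity *function* (so un-standardizing is a no-op when the data were never standardized), not `np.identity`, which builds an identity matrix. A hypothetical sketch of that reading, with a deliberately reduced signature:

```python
def identity(x):
    # No-op transform: the "undo standardization" step when the data
    # were never standardized in the first place.
    return x

def compile_posterior_inferences_sketch(predictions, orig_std_params=identity):
    # Hypothetical reduced signature; the real function takes more
    # arguments. The point is only that the default should be callable.
    return orig_std_params(predictions)
```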
Hi, I'm happy to see that you have ported this package to Python as I was thinking of doing this myself. I am wondering if someone has gone ahead and done some validation and comparisons with the R package to ensure the results for known inputs are as expected and are also same across the two packages?
Thanks!
I'm not sure whether there is an error in the calculation of the p_value in the code.
As the following figure shows, the p_value is calculated from the mean value of the synthetic control group's prediction rather than from the point effect. I guess you may want to use the latter?
Further evidence: when I use the example from https://colab.research.google.com/drive/1HkJ9zm0LY36Wz-wB_bSHq68w8Cef6qJO?usp=sharing#scrollTo=AqyItZ3Hggoh, even if I don't change the value of y after 3000, the p-value is still significant and inconsistent with the confidence interval given (which includes zero).
Thanks for this package!
#assuming approximately normal distribution
#calculate standard deviation from the 95% conf interval
std_pred = (mean_upper - mean_pred) / 1.96
#calculate z score
z_score = (0 - mean_pred) / std_pred
#convert to probability
I think this takes values of the original series, and these are almost always different from zero. It would be better to look at rel_effect_upper and rel_effect instead.
Also
p_value = st.norm.cdf(z_score)
prob_causal = (100 - p_value)
p_value will be between 0 and 1, so I'm not sure 100 - p_value makes sense; shouldn't it be 1 - p_value?
A major difference between the R version and this is the method of estimation. Currently this only supports maximum likelihood estimation and results in differences between the two packages as noticed in #7
This link isn't working anymore: http://jamalsenouci.github.io/CausalImpact/CausalImpact.html
Any idea of where else folks can find this material?
Thanks.
The codecov action is not finding a coverage.xml file to upload
see #7 for details
Hi,
If I wanted it to read from a CSV file:
data = pd.read_csv('Traffic.csv',header=None,names = ['CountryX','CountryY'])
where would I need to save the file for it read this?
Many Thanks
JD
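One common answer (a sketch; 'Traffic.csv' and the column names are the poster's, everything else is my assumption): pd.read_csv resolves a bare filename relative to the current working directory, so the file can live anywhere as long as the path points at it. Using an in-memory stand-in for the file:

```python
import io
import pandas as pd

# Stand-in for the poster's Traffic.csv contents; with a real file you
# would pass 'Traffic.csv' (relative to os.getcwd()) or an absolute path.
csv_text = "10,12\n11,13\n9,14\n"
data = pd.read_csv(io.StringIO(csv_text), header=None,
                   names=["CountryX", "CountryY"])
```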
The summary of the R package provides "Posterior tail-area probability p" and "Posterior prob. of a causal effect". I cannot find these in the summary of the Python package. Is there any reason for not reporting them?
Hello! Been really enjoying the library as it's saved me having to either re-learn R or get R and pandas talking.
One notable question:
In analysis.py you currently block any attempt to use the model without a predictor time series. In R it's still possible to use it with a single series, and it remains useful (although obviously without a strong control, actually measuring uplift, for example, is very hard).
Was just wondering about the decision behind this, is it a functional thing or just a spare time thing? (with this presumably just being a fun side project).
I can bypass it by looking for the first nonzero value in the first column rather than the second, or by forcing it to take the max from the pre-period, but this just gives me a flat-line prediction, unlike in R, and at this point I'm reaching the end of my skills.
if data.shape[1] == 1: # no exogenous values provided
raise ValueError("data contains no exogenous variables")
non_null = pd.isnull(data.iloc[:, 1]).nonzero()
first_non_null = non_null[0]
Breaking out the wider confidence intervals from the rest of the issues raised in #7. This is potentially due to the R version placing an upper limit on the standard deviation.
I never had this error until recently.
When I do impact.run(), I see the following error:
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 impact.run()
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in run(self)
107 kwargs["model_args"],
108 kwargs["alpha"],
--> 109 self.params["estimation"],
110 )
111 else:
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/analysis.py in _run_with_data(self, data, pre_period, post_period, model_args, alpha, estimation)
422 if model_args["standardize_data"]:
423 sd_results = standardize_all_variables(
--> 424 data_modeling, pre_period, post_period
425 )
426 df_pre = sd_results["data_pre"]
/opt/anaconda3/envs/ishbooks/lib/python3.6/site-packages/causalimpact/misc.py in standardize_all_variables(data, pre_period, post_period)
17
18 if not (
---> 19 pd.api.types.is_list_like(pre_period, list)
20 and pd.api.types.is_list_like(post_period)
21 ):
TypeError: is_list_like() takes 1 positional argument but 2 were given`
It looks to me like is_list_like in this instance does not need the second argument, namely "list". Not sure how that suddenly became a problem on my end however.
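The diagnosis looks right: `pd.api.types.is_list_like` takes a single positional object, so the stray second argument has to go. A sketch of the corrected check (my own reconstruction of the call in misc.py, not the actual patch):

```python
import pandas as pd

pre_period = [0, 69]
post_period = [70, 99]

# Corrected check: one positional argument per call.
periods_are_list_like = (pd.api.types.is_list_like(pre_period)
                         and pd.api.types.is_list_like(post_period))
```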
It's no problem if I just comment out the import in __init__.py, but I thought you might want to know about the error.
File "/usr/local/lib/python2.7/dist-packages/causalimpact-0.1-py2.7.egg/causalimpact/__init__.py", line 10, in <module>
from causalimpact.tests.test import run as test  # noqa
ImportError: No module named tests.test
Just wondering why the point_pred lower and upper bounds are unusually large for the first data point. This happens even in the example, whereas it is not the case with the R package.
e.g. from the example: Lower: -2804.815502 & Upper: 3048.805211
and the rest have more normal values.
Is there any way to fix this issue?