Comments (5)

adavis-85 commented on May 25, 2024

Was this solved? I'm having the same error using dowhy.fit.

from econml.

samanbanafti commented on May 25, 2024

No, I have not heard back yet

adavis-85 commented on May 25, 2024

I realized my instrumental variable wasn't binary, and it had to be. I'm using EconML and DoWhy in tandem, following the sample notebooks available online for EconML.

kbattocchi commented on May 25, 2024

Sorry for the slow response - a couple of thoughts:

  1. It would help if you could provide a simplified repro; are a significant number of units treated and untreated at each time point?
  2. The DynamicDML class is intended for scenarios where treatments may be repeated, so it is not necessary to keep T=1 after the time of first treatment unless the units are actually continuing to receive treatment.
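
The first point can be checked with a quick tabulation. The sketch below is a minimal, library-free illustration; the `records` list and its (unit, period, treated) layout are hypothetical and stand in for whatever panel data is being passed to the estimator.

```python
from collections import Counter

# Hypothetical panel rows: (unit_id, time_period, treatment indicator).
records = [
    ("a", 0, 0), ("a", 1, 1),
    ("b", 0, 0), ("b", 1, 0),
    ("c", 0, 1), ("c", 1, 1),
]

def treatment_counts_by_period(rows):
    """Return {period: Counter({treatment_value: number_of_units})}."""
    counts = {}
    for _unit, period, treated in rows:
        counts.setdefault(period, Counter())[treated] += 1
    return counts

counts = treatment_counts_by_period(records)
for period in sorted(counts):
    print(period, dict(counts[period]))
```

A period in which one of the two treatment values has a count of zero, or close to zero, is the kind of imbalance the question above is asking about.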

benTC74 commented on May 25, 2024

Hi @kbattocchi and @samanbanafti, how can this issue be solved? I ran into the same problem when using CausalForestDML with dowhy.fit and discrete_treatment set to True. My treatment is a categorical variable of category type, with values such as "High Impact", "Medium Impact" and "Low Impact". The same setup worked with a continuous treatment variable; the only differences were that model_t was not a RandomForestClassifier and discrete_treatment was False.

Code:

first_stage_reg = lambda: GridSearchCV(estimator=RandomForestRegressor(n_estimators=1000),
                                              param_grid={
                                                  'max_depth': max_depth,
                                                  'max_features': max_features,
                                                  'min_samples_split': min_samples_split
                                              }, cv=5, n_jobs=-1, scoring='neg_mean_squared_error'
                                             )

first_stage_class = lambda: GridSearchCV(estimator=RandomForestClassifier(n_estimators=1000),
                                              param_grid={
                                                  'max_depth': max_depth,
                                                  'max_features': max_features,
                                                  'min_samples_split': min_samples_split
                                              }, cv=5, n_jobs=-1, scoring='neg_mean_squared_error'
                                             )

model_y = first_stage_reg().fit(X, Y).best_estimator_
model_t = first_stage_class().fit(X, T).best_estimator_

est_nonparam = CausalForestDML(model_y=model_y, model_t=model_t, discrete_treatment=True, n_estimators=1000, cv=5)

est_nonparam_dw = est_nonparam.dowhy.fit(Y, T, X, W=None, groups=groups,
                                        outcome_names=target_feature1, 
                                        treatment_names=['RegulatoryIndex'],
                                        feature_names=Agg_df_imputed_transformed.iloc[:, ~Agg_df_imputed_transformed.columns.isin(['RegulatoryIndex']+
                                                                                      target_features + country_indicator)].columns.tolist(),
                                        inference='blb')

Error:
One or more of the test scores are non-finite: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
econml has not been tested with dowhy versions >= 0.11

AttributeError Traceback (most recent call last)
Cell In[232], line 27
23 model_t = first_stage_class().fit(X, T).best_estimator_
25 est_nonparam = CausalForestDML(model_y=model_y, model_t=model_t, discrete_treatment=True, n_estimators=1000, cv=5)
---> 27 est_nonparam_dw = est_nonparam.dowhy.fit(Y, T, X, W=None, groups=groups,
28 outcome_names=target_feature1,
29 treatment_names=['RegulatoryIndex'],
30 feature_names=Agg_df_imputed_transformed.iloc[:, ~Agg_df_imputed_transformed.columns.isin(['RegulatoryIndex']+
31 target_features+
32 country_indicator)].columns.tolist(),
33 inference='blb')

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dowhy.py:180, in DoWhyWrapper.fit(self, Y, T, X, W, Z, outcome_names, treatment_names, feature_names, confounder_names, instrument_names, graph, estimand_type, proceed_when_unidentifiable, missing_nodes_as_confounders, control_value, treatment_value, target_units, **kwargs)
178 for p in self.get_params():
179 init_params[p] = getattr(self.cate_estimator, p)
--> 180 self.estimate_ = self.dowhy_.estimate_effect(self.identified_estimand_,
181 method_name=method_name,
182 control_value=control_value,
183 treatment_value=treatment_value,
184 target_units=target_units,
185 method_params={
186 "init_params": init_params,
187 "fit_params": kwargs,
188 },
189 )
190 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_model.py:360, in CausalModel.estimate_effect(self, identified_estimand, method_name, control_value, treatment_value, test_significance, evaluate_effect_strength, confidence_intervals, target_units, effect_modifiers, fit_estimator, method_params)
349 causal_estimator = causal_estimator_class(
350 identified_estimand,
351 test_significance=test_significance,
(...)
355 **extra_args,
356 )
358 self._estimator_cache[method_name] = causal_estimator
--> 360 return estimate_effect(
361 self._data,
362 self._treatment,
363 self._outcome,
364 identifier_name,
365 causal_estimator,
366 control_value,
367 treatment_value,
368 target_units,
369 effect_modifiers,
370 fit_estimator,
371 method_params,
372 )

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_estimator.py:719, in estimate_effect(data, treatment, outcome, identifier_name, estimator, control_value, treatment_value, target_units, effect_modifiers, fit_estimator, method_params)
714 return CausalEstimate(
715 None, None, None, None, None, None, control_value=control_value, treatment_value=treatment_value
716 )
718 if fit_estimator:
--> 719 estimator.fit(
720 data=data,
721 effect_modifier_names=effect_modifiers,
722 **method_params["fit_params"] if "fit_params" in method_params else {},
723 )
725 estimate = estimator.estimate_effect(
726 data,
727 treatment_value=treatment_value,
(...)
730 confidence_intervals=estimator._confidence_intervals,
731 )
733 if estimator._significance_test:

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_estimators\econml.py:194, in Econml.fit(self, data, effect_modifier_names, **kwargs)
190 estimator_named_args = estimator_argspec.args + estimator_argspec.kwonlyargs
191 estimator_data_args = {
192 arg: named_data_args[arg] for arg in named_data_args.keys() if arg in estimator_named_args
193 }
--> 194 self.estimator.fit(**estimator_data_args, **kwargs)
196 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\causal_forest.py:854, in CausalForestDML.fit(self, Y, T, X, W, sample_weight, groups, cache_values, inference)
852 if X is None:
853 raise ValueError("This estimator does not support X=None!")
--> 854 return super().fit(Y, T, X=X, W=W,
855 sample_weight=sample_weight, groups=groups,
856 cache_values=cache_values,
857 inference=inference)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\_rlearner.py:422, in _RLearner.fit(self, Y, T, X, W, sample_weight, freq_weight, sample_var, groups, cache_values, inference)
385 """
386 Estimate the counterfactual model from data, i.e. estimates function :math:`\\theta(\\cdot)`.
387
(...)
419 self: _RLearner instance
420 """
421 # Replacing fit from _OrthoLearner, to enforce Z=None and improve the docstring
--> 422 return super().fit(Y, T, X=X, W=W,
423 sample_weight=sample_weight, freq_weight=freq_weight, sample_var=sample_var, groups=groups,
424 cache_values=cache_values,
425 inference=inference)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_cate_estimator.py:131, in BaseCateEstimator._wrap_fit.<locals>.call(self, Y, T, inference, *args, **kwargs)
129 inference.prefit(self, Y, T, *args, **kwargs)
130 # call the wrapped fit method
--> 131 m(self, Y, T, *args, **kwargs)
132 self._postfit(Y, T, *args, **kwargs)
133 if inference is not None:
134 # NOTE: we call inference fit after calling the main fit method

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:832, in _OrthoLearner.fit(self, Y, T, X, W, Z, sample_weight, freq_weight, sample_var, groups, cache_values, inference, only_final, check_input)
830 nuisances, fitted_models, new_inds, scores = ray.get(self.nuisances_ref[idx])
831 else:
--> 832 nuisances, fitted_models, new_inds, scores = self._fit_nuisances(
833 Y, T, X, W, Z, sample_weight=sample_weight_nuisances, groups=groups)
834 all_nuisances.append(nuisances)
835 self._models_nuisance.append(fitted_models)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:982, in _OrthoLearner._fit_nuisances(self, Y, T, X, W, Z, sample_weight, groups)
979 else:
980 folds = splitter.split(to_split, strata)
--> 982 nuisances, fitted_models, fitted_inds, scores = _crossfit(self._ortho_learner_model_nuisance, folds,
983 self.use_ray, self.ray_remote_func_options, Y, T,
984 X=X, W=W, Z=Z, sample_weight=sample_weight,
985 groups=groups)
986 return nuisances, fitted_models, fitted_inds, scores

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:284, in _crossfit(models, folds, use_ray, ray_remote_fun_option, *args, **kwargs)
282 nuisance_temp, model_out, score_temp = ray.get(fold_refs[idx])
283 else:
--> 284 nuisance_temp, model_out, score_temp = _fit_fold(model, train_idxs, test_idxs,
285 calculate_scores, accumulated_args, kwargs)
287 if idx == 0:
288 nuisances = tuple([np.full((n,) + nuis.shape[1:], np.nan)
289 for nuis in nuisance_temp])

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:99, in _fit_fold(model, train_idxs, test_idxs, calculate_scores, args, kwargs)
96 kwargs_train = {key: var[train_idxs] for key, var in kwargs.items()}
97 kwargs_test = {key: var[test_idxs] for key, var in kwargs.items()}
---> 99 model.train(False, None, *args_train, **kwargs_train)
100 nuisance_temp = model.predict(*args_test, **kwargs_test)
102 if not isinstance(nuisance_temp, tuple):

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\_rlearner.py:53, in _ModelNuisance.train(self, is_selecting, folds, Y, T, X, W, Z, sample_weight, groups)
51 def train(self, is_selecting, folds, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None):
52 assert Z is None, "Cannot accept instrument!"
---> 53 self._model_t.train(is_selecting, folds, X, W, T, **
54 filter_none_kwargs(sample_weight=sample_weight, groups=groups))
55 self._model_y.train(is_selecting, folds, X, W, Y, **
56 filter_none_kwargs(sample_weight=sample_weight, groups=groups))
57 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\dml.py:91, in _FirstStageSelector.train(self, is_selecting, folds, X, W, Target, sample_weight, groups)
86 if self._discrete_target:
87 # In this case, the Target is the one-hot-encoding of the treatment variable
88 # We need to go back to the label representation of the one-hot so as to call
89 # the classifier.
90 if np.any(np.all(Target == 0, axis=0)) or (not np.any(np.all(Target == 0, axis=1))):
---> 91 raise AttributeError("Provided crossfit folds contain training splits that " +
92 "don't contain all treatments")
93 Target = inverse_onehot(Target)
95 self._model.train(is_selecting, folds, _combine(X, W, Target.shape[0]), Target,
96 **filter_none_kwargs(groups=groups, sample_weight=sample_weight))

AttributeError: Provided crossfit folds contain training splits that don't contain all treatments

I would really appreciate any help. Thank you very much in advance!
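
For readers hitting the same message: the check at line 90 of dml.py in the traceback rejects a training split in which some treatment category never appears. The sketch below reproduces that idea in plain Python rather than calling EconML. The helper `missing_treatments_per_fold`, the toy labels, and the leave-one-group-out fold construction are all hypothetical; whether passing `groups` leads to this kind of splitting in a given EconML version is an assumption worth verifying against the installed source.

```python
def missing_treatments_per_fold(treatments, groups):
    """For leave-one-group-out folds, list the treatment categories
    that are absent from each training split (hypothetical helper)."""
    all_levels = set(treatments)
    missing = {}
    for held_out in sorted(set(groups)):
        # Training split = every row whose group is not the held-out one.
        train_levels = {t for t, g in zip(treatments, groups) if g != held_out}
        absent = all_levels - train_levels
        if absent:
            missing[held_out] = sorted(absent)
    return missing

# Toy data: "High Impact" occurs only inside group "X".
treatments = ["High Impact", "Low Impact", "Low Impact",
              "Medium Impact", "Low Impact", "Medium Impact"]
groups = ["X", "X", "Y", "Y", "Z", "Z"]

problems = missing_treatments_per_fold(treatments, groups)
print(problems)
```

An empty result would mean every training split sees every category. A non-empty one points at categories that are rare or confined to a single group, which may call for merging categories or assigning folds differently.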
