DynamicDML() issue: AttributeError: Provided crossfit folds contain training splits that don't contain all treatments DynamicDML

samanbanafti commented on May 25, 2024

DynamicDML() issue: AttributeError: Provided crossfit folds contain training splits that don't contain all treatments DynamicDML

Comments (5)

adavis-85 commented on May 25, 2024

Was this solved? I'm having the same error using dowhy.fit.

samanbanafti commented on May 25, 2024

No, I have not heard back yet

adavis-85 commented on May 25, 2024

I realized my instrument variable wasn't binary, and it had to be. I'm using EconML and DoWhy in tandem, following the EconML sample notebooks online.

kbattocchi commented on May 25, 2024

Sorry for the slow response - a couple of thoughts:

It would help if you could provide a simplified repro; are a significant number of units treated and untreated at each time point?
The DynamicDML class is intended for scenarios where treatments may be repeated, so it is not necessary to keep T=1 after the time of first treatment unless the units are actually continuing to receive treatment.
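
The question about treated and untreated units at each time point can be checked with a quick tally. The sketch below is illustrative only: `periods`, `treatments`, and `treatment_counts_by_period` are hypothetical names, not taken from the original post, and the toy panel is made up.

```python
from collections import Counter


def treatment_counts_by_period(periods, treatments):
    """Count observations for every (period, treatment) pair."""
    counts = Counter(zip(periods, treatments))
    levels = sorted(set(treatments))
    return {
        p: {t: counts.get((p, t), 0) for t in levels}
        for p in sorted(set(periods))
    }


# Hypothetical toy panel: 3 periods, binary treatment.
periods = [0, 0, 0, 1, 1, 1, 2, 2, 2]
treatments = [0, 0, 1, 0, 1, 1, 1, 1, 1]

table = treatment_counts_by_period(periods, treatments)
for period, row in table.items():
    print(period, row)
```

A period whose row contains a zero (here the last period has no untreated units) is a likely candidate for producing a training split that lacks one treatment level.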

benTC74 commented on May 25, 2024

Hi @kbattocchi and @samanbanafti, I am wondering how this issue can be solved. I encountered the same problem when using CausalForestDML with dowhy.fit and discrete_treatment set to True. My treatment is a categorical variable (category dtype) with values such as "High Impact", "Medium Impact", and "Low Impact". The same setup worked on a continuous treatment variable, except that model_t was not a RandomForestClassifier and discrete_treatment was False.

Code:

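from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from econml.dml import CausalForestDML

first_stage_reg = lambda: GridSearchCV(estimator=RandomForestRegressor(n_estimators=1000),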
                                              param_grid={
                                                  'max_depth': max_depth,
                                                  'max_features': max_features,
                                                  'min_samples_split': min_samples_split
                                              }, cv=5, n_jobs=-1, scoring='neg_mean_squared_error'
                                             )

first_stage_class = lambda: GridSearchCV(estimator=RandomForestClassifier(n_estimators=1000),
                                              param_grid={
                                                  'max_depth': max_depth,
                                                  'max_features': max_features,
                                                  'min_samples_split': min_samples_split
                                              }, cv=5, n_jobs=-1, scoring='neg_mean_squared_error'
                                             )

model_y = first_stage_reg().fit(X, Y).best_estimator_
model_t = first_stage_class().fit(X, T).best_estimator_

est_nonparam = CausalForestDML(model_y=model_y, model_t=model_t, discrete_treatment=True, n_estimators=1000, cv=5)

est_nonparam_dw = est_nonparam.dowhy.fit(Y, T, X, W=None, groups=groups,
                                        outcome_names=target_feature1, 
                                        treatment_names=['RegulatoryIndex'],
                                        feature_names=Agg_df_imputed_transformed.iloc[:, ~Agg_df_imputed_transformed.columns.isin(['RegulatoryIndex']+
                                                                                      target_features + country_indicator)].columns.tolist(),
                                        inference='blb')

Error:
One or more of the test scores are non-finite: [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
econml has not been tested with dowhy versions >= 0.11
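
The all-NaN test scores in the warning above are plausibly caused by scoring a classifier with `neg_mean_squared_error`, a regression metric that cannot be computed on string class labels, so every grid candidate ends up with a NaN score. The sketch below swaps in a classification scorer on synthetic data. The choice of `accuracy` is an assumption about what is appropriate here, and `X_demo`, `T_demo`, and the small grid are hypothetical stand-ins for the poster's data and settings. It addresses the warning only, not the AttributeError in the traceback.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data: 90 rows, 3 features, 3 string-labelled treatment levels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(90, 3))
T_demo = np.array(["Low Impact", "Medium Impact", "High Impact"] * 30)

# Same search pattern as in the post, but with a classification metric.
search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_demo, T_demo)

mean_scores = search.cv_results_["mean_test_score"]
print(mean_scores)
```

Other classification scorers such as `neg_log_loss` may be a better fit when the quality of the predicted propensities matters more than the predicted label.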

AttributeError Traceback (most recent call last)
Cell In[232], line 27
23 model_t = first_stage_class().fit(X, T).best_estimator_
25 est_nonparam = CausalForestDML(model_y=model_y, model_t=model_t, discrete_treatment=True, n_estimators=1000, cv=5)
---> 27 est_nonparam_dw = est_nonparam.dowhy.fit(Y, T, X, W=None, groups=groups,
28 outcome_names=target_feature1,
29 treatment_names=['RegulatoryIndex'],
30 feature_names=Agg_df_imputed_transformed.iloc[:, ~Agg_df_imputed_transformed.columns.isin(['RegulatoryIndex']+
31 target_features+
32 country_indicator)].columns.tolist(),
33 inference='blb')

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dowhy.py:180, in DoWhyWrapper.fit(self, Y, T, X, W, Z, outcome_names, treatment_names, feature_names, confounder_names, instrument_names, graph, estimand_type, proceed_when_unidentifiable, missing_nodes_as_confounders, control_value, treatment_value, target_units, **kwargs)
178 for p in self.get_params():
179 init_params[p] = getattr(self.cate_estimator, p)
--> 180 self.estimate = self.dowhy.estimate_effect(self.identified_estimand_,
181 method_name=method_name,
182 control_value=control_value,
183 treatment_value=treatment_value,
184 target_units=target_units,
185 method_params={
186 "init_params": init_params,
187 "fit_params": kwargs,
188 },
189 )
190 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_model.py:360, in CausalModel.estimate_effect(self, identified_estimand, method_name, control_value, treatment_value, test_significance, evaluate_effect_strength, confidence_intervals, target_units, effect_modifiers, fit_estimator, method_params)
349 causal_estimator = causal_estimator_class(
350 identified_estimand,
351 test_significance=test_significance,
(...)
355 **extra_args,
356 )
358 self._estimator_cache[method_name] = causal_estimator
--> 360 return estimate_effect(
361 self._data,
362 self._treatment,
363 self._outcome,
364 identifier_name,
365 causal_estimator,
366 control_value,
367 treatment_value,
368 target_units,
369 effect_modifiers,
370 fit_estimator,
371 method_params,
372 )

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_estimator.py:719, in estimate_effect(data, treatment, outcome, identifier_name, estimator, control_value, treatment_value, target_units, effect_modifiers, fit_estimator, method_params)
714 return CausalEstimate(
715 None, None, None, None, None, None, control_value=control_value, treatment_value=treatment_value
716 )
718 if fit_estimator:
--> 719 estimator.fit(
720 data=data,
721 effect_modifier_names=effect_modifiers,
722 **method_params["fit_params"] if "fit_params" in method_params else {},
723 )
725 estimate = estimator.estimate_effect(
726 data,
727 treatment_value=treatment_value,
(...)
730 confidence_intervals=estimator._confidence_intervals,
731 )
733 if estimator._significance_test:

File ~\AppData\Local\anaconda3\Lib\site-packages\dowhy\causal_estimators\econml.py:194, in Econml.fit(self, data, effect_modifier_names, **kwargs)
190 estimator_named_args = estimator_argspec.args + estimator_argspec.kwonlyargs
191 estimator_data_args = {
192 arg: named_data_args[arg] for arg in named_data_args.keys() if arg in estimator_named_args
193 }
--> 194 self.estimator.fit(**estimator_data_args, **kwargs)
196 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\causal_forest.py:854, in CausalForestDML.fit(self, Y, T, X, W, sample_weight, groups, cache_values, inference)
852 if X is None:
853 raise ValueError("This estimator does not support X=None!")
--> 854 return super().fit(Y, T, X=X, W=W,
855 sample_weight=sample_weight, groups=groups,
856 cache_values=cache_values,
857 inference=inference)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\_rlearner.py:422, in _RLearner.fit(self, Y, T, X, W, sample_weight, freq_weight, sample_var, groups, cache_values, inference)
385 """
386 Estimate the counterfactual model from data, i.e. estimates function :math:\\theta(\\cdot).
387
(...)
419 self: _RLearner instance
420 """
421 # Replacing fit from _OrthoLearner, to enforce Z=None and improve the docstring
--> 422 return super().fit(Y, T, X=X, W=W,
423 sample_weight=sample_weight, freq_weight=freq_weight, sample_var=sample_var, groups=groups,
424 cache_values=cache_values,
425 inference=inference)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_cate_estimator.py:131, in BaseCateEstimator._wrap_fit.<locals>.call(self, Y, T, inference, *args, **kwargs)
129 inference.prefit(self, Y, T, *args, **kwargs)
130 # call the wrapped fit method
--> 131 m(self, Y, T, *args, **kwargs)
132 self._postfit(Y, T, *args, **kwargs)
133 if inference is not None:
134 # NOTE: we call inference fit after calling the main fit method

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:832, in _OrthoLearner.fit(self, Y, T, X, W, Z, sample_weight, freq_weight, sample_var, groups, cache_values, inference, only_final, check_input)
830 nuisances, fitted_models, new_inds, scores = ray.get(self.nuisances_ref[idx])
831 else:
--> 832 nuisances, fitted_models, new_inds, scores = self._fit_nuisances(
833 Y, T, X, W, Z, sample_weight=sample_weight_nuisances, groups=groups)
834 all_nuisances.append(nuisances)
835 self._models_nuisance.append(fitted_models)

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:982, in _OrthoLearner._fit_nuisances(self, Y, T, X, W, Z, sample_weight, groups)
979 else:
980 folds = splitter.split(to_split, strata)
--> 982 nuisances, fitted_models, fitted_inds, scores = _crossfit(self._ortho_learner_model_nuisance, folds,
983 self.use_ray, self.ray_remote_func_options, Y, T,
984 X=X, W=W, Z=Z, sample_weight=sample_weight,
985 groups=groups)
986 return nuisances, fitted_models, fitted_inds, scores

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:284, in _crossfit(models, folds, use_ray, ray_remote_fun_option, *args, **kwargs)
282 nuisance_temp, model_out, score_temp = ray.get(fold_refs[idx])
283 else:
--> 284 nuisance_temp, model_out, score_temp = _fit_fold(model, train_idxs, test_idxs,
285 calculate_scores, accumulated_args, kwargs)
287 if idx == 0:
288 nuisances = tuple([np.full((n,) + nuis.shape[1:], np.nan)
289 for nuis in nuisance_temp])

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\_ortho_learner.py:99, in _fit_fold(model, train_idxs, test_idxs, calculate_scores, args, kwargs)
96 kwargs_train = {key: var[train_idxs] for key, var in kwargs.items()}
97 kwargs_test = {key: var[test_idxs] for key, var in kwargs.items()}
---> 99 model.train(False, None, *args_train, **kwargs_train)
100 nuisance_temp = model.predict(*args_test, **kwargs_test)
102 if not isinstance(nuisance_temp, tuple):

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\_rlearner.py:53, in _ModelNuisance.train(self, is_selecting, folds, Y, T, X, W, Z, sample_weight, groups)
51 def train(self, is_selecting, folds, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None):
52 assert Z is None, "Cannot accept instrument!"
---> 53 self._model_t.train(is_selecting, folds, X, W, T, **
54 filter_none_kwargs(sample_weight=sample_weight, groups=groups))
55 self._model_y.train(is_selecting, folds, X, W, Y, **
56 filter_none_kwargs(sample_weight=sample_weight, groups=groups))
57 return self

File ~\AppData\Local\anaconda3\Lib\site-packages\econml\dml\dml.py:91, in _FirstStageSelector.train(self, is_selecting, folds, X, W, Target, sample_weight, groups)
86 if self._discrete_target:
87 # In this case, the Target is the one-hot-encoding of the treatment variable
88 # We need to go back to the label representation of the one-hot so as to call
89 # the classifier.
90 if np.any(np.all(Target == 0, axis=0)) or (not np.any(np.all(Target == 0, axis=1))):
---> 91 raise AttributeError("Provided crossfit folds contain training splits that " +
92 "don't contain all treatments")
93 Target = inverse_onehot(Target)
95 self._model.train(is_selecting, folds, _combine(X, W, Target.shape[0]), Target,
96 **filter_none_kwargs(groups=groups, sample_weight=sample_weight))

AttributeError: Provided crossfit folds contain training splits that don't contain all treatments
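
The condition at line 90 of dml.py in the traceback can be read as two tests on the one-hot encoded treatment of a single training split: an all-zero column means one non-baseline level never appears, and the absence of any all-zero row means the baseline level never appears. The sketch below replays that condition with NumPy. The helper `missing_treatment_in_split` and the arrays are hypothetical, and the drop-first encoding is an assumption inferred from the check itself rather than confirmed against the econml source.

```python
import numpy as np


def missing_treatment_in_split(target):
    """Replay the condition shown in the traceback on a one-hot array."""
    target = np.asarray(target)
    # Some non-baseline level has no rows in this split.
    column_never_set = np.any(np.all(target == 0, axis=0))
    # No row is all zeros, i.e. the baseline level has no rows in this split.
    no_baseline_row = not np.any(np.all(target == 0, axis=1))
    return bool(column_never_set or no_baseline_row)


# Three levels with the first one dropped, so the baseline is encoded as [0, 0].
full_split = np.array([[0, 0], [1, 0], [0, 1], [1, 0]])
no_third_level = np.array([[0, 0], [1, 0], [1, 0], [0, 0]])
no_baseline = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

print(missing_treatment_in_split(full_split))      # False
print(missing_treatment_in_split(no_third_level))  # True
print(missing_treatment_in_split(no_baseline))     # True
```

Under this reading, the error is raised whenever any single training split is missing at least one treatment level, which is more likely with rare levels or with grouped splitting.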

I would really appreciate any help. Thank you very much in advance!
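
One way to narrow this down is to list which treatment levels are absent from each training split before fitting. The sketch below assumes the folds are available as `(train_idx, test_idx)` pairs, the format produced by scikit-learn splitters. The helper `levels_missing_per_fold`, `T_demo`, and `folds_demo` are hypothetical, and the splitter econml uses internally when `groups` is passed may differ from whatever is used to build the folds here.

```python
def levels_missing_per_fold(treatment, folds):
    """For each fold, return the treatment levels absent from its training indices."""
    all_levels = set(treatment)
    missing = []
    for train_idx, _test_idx in folds:
        seen = {treatment[i] for i in train_idx}
        missing.append(sorted(all_levels - seen))
    return missing


# Hypothetical data: "High Impact" occurs only in rows 4 and 5.
T_demo = [
    "Low Impact", "Medium Impact", "Low Impact",
    "Medium Impact", "High Impact", "High Impact",
]
# Hand-written folds; the second one holds out every "High Impact" row.
folds_demo = [
    ([2, 3, 4, 5], [0, 1]),
    ([0, 1, 2, 3], [4, 5]),
    ([0, 1, 4, 5], [2, 3]),
]

report = levels_missing_per_fold(T_demo, folds_demo)
print(report)
```

With real data, the folds could come from a splitter such as `GroupKFold(n_splits=5).split(X, T, groups)`, with the treatment passed as a plain list (for example `list(T)`) so that positional indexing works. A non-empty entry points to a rare level or to a group that holds all rows of one level.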
