doubleml / doubleml-for-py
DoubleML - Double Machine Learning in Python
Home Page: https://docs.doubleml.org
License: BSD 3-Clause "New" or "Revised" License
See doubleml-for-py/doubleml/double_ml.py, line 32 in 87f0140.
This is more of a nitpick :) I think there is an implicit assumption that the types of the outcome_variable and treatment_variable(s) should be float. So if we provide a dataframe to DoubleMLData where those variables are of type Decimal, the partialling-out step fails with the error shown below. This is especially an issue when reading parquet files.
TypeError Traceback (most recent call last)
Cell In[36], line 1
----> 1 dml_plr.fit(n_jobs_cv = -1)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml.py:605, in DoubleML.fit(self, n_jobs_cv, store_predictions, external_predictions, store_models)
602 ext_prediction_dict[learner] = None
604 # ml estimation of nuisance models and computation of score elements
--> 605 score_elements, preds = self._nuisance_est(self.__smpls, n_jobs_cv,
606 external_predictions=ext_prediction_dict,
607 return_models=store_models)
609 self._set_score_elements(score_elements, self._i_rep, self._i_treat)
611 # calculate rmses and store predictions and targets of the nuisance models
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml_plr.py:231, in DoubleMLPLR._nuisance_est(self, smpls, n_jobs_cv, external_predictions, return_models)
226 g_hat = {'preds': external_predictions['ml_g'],
227 'targets': None,
228 'models': None}
229 else:
230 # get an initial estimate for theta using the partialling out score
--> 231 psi_a = -np.multiply(d - m_hat['preds'], d - m_hat['preds'])
232 psi_b = np.multiply(d - m_hat['preds'], y - l_hat['preds'])
233 theta_initial = -np.nanmean(psi_b) / np.nanmean(psi_a)
TypeError: unsupported operand type(s) for -: 'decimal.Decimal' and 'float'
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from doubleml import DoubleMLData, DoubleMLPLR

df = pd.read_parquet("/...")
x_cols = [x for x in df.columns if "pre_" in x]
d_col = "event_action"
y_col = "post_outcome"
dml_data = DoubleMLData(df, y_col=y_col, d_cols=d_col, x_cols=x_cols)

learner = RandomForestRegressor(n_jobs=-1)
lasso = LassoCV()
dml_plr = DoubleMLPLR(dml_data, ml_l=learner, ml_g=learner, ml_m=lasso, score="IV-type", n_folds=2)
dml_plr.fit(n_jobs_cv=-1)
Model should fit successfully.
Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.26
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
DoubleML 0.7.1
Scikit-Learn 1.3.2
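A minimal workaround sketch, assuming the Decimal values arrive as object-dtype columns (which is typically how pandas represents decimal parquet columns): cast the outcome and treatment columns to float64 before building the DoubleMLData object. The toy frame stands in for the elided parquet file; the column names are the ones from the reproduction code.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Toy stand-in for the parquet data: object-dtype columns holding decimal.Decimal values.
df = pd.DataFrame({
    "post_outcome": [Decimal("1.5"), Decimal("2.25"), Decimal("3.0")],
    "event_action": [Decimal("0"), Decimal("1"), Decimal("1")],
    "pre_x1": [0.1, 0.2, 0.3],
})

# Mixing Decimal and float in arithmetic is what fails inside the nuisance step.
try:
    Decimal("1.5") - 0.5
except TypeError as err:
    print("TypeError:", err)

# Cast outcome and treatment (and any Decimal covariates) to float64 up front.
num_cols = ["post_outcome", "event_action"]
df[num_cols] = df[num_cols].astype("float64")
print(df.dtypes)

# Arithmetic with float predictions now works.
residual = df["event_action"].to_numpy() - np.array([0.2, 0.5, 0.8])
print(residual)
```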
Extend the NonLinearScoreMixin to handle data with clustering.
It should be sufficient to extend the estimation of the coefficient/parameter, as the variance estimation is identical.
For dml1, just use the current implementation without clustering (it should be identical).
For dml2, adjust the scaling of the score analogously to LinearScoreMixin.
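To make the dml1/dml2 distinction concrete, here is a minimal sketch for a generic score that has to be solved numerically, as a nonlinear score does. It assumes a hypothetical function score(theta, idx) returning per-observation score values for the observations idx that is monotone in theta; clustering is not handled, and plain bisection stands in for whatever root finder the package actually uses.

```python
import numpy as np


def solve_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Bisection for a root of a monotone function f on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if abs(f_mid) < tol or (hi - lo) < tol:
            return mid
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def est_dml1(score, folds, lo=-10.0, hi=10.0):
    # dml1: solve the empirical moment condition separately on each fold, then average.
    thetas = [solve_root(lambda t, idx=idx: np.mean(score(t, idx)), lo, hi) for idx in folds]
    return float(np.mean(thetas))


def est_dml2(score, folds, lo=-10.0, hi=10.0):
    # dml2: pool the score over all folds and solve a single moment condition.
    all_idx = np.concatenate(folds)
    return solve_root(lambda t: np.mean(score(t, all_idx)), lo, hi)
```

With a linear score and equal fold sizes both procedures coincide; they differ in general once the score is nonlinear in theta or the folds have different sizes.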
If set_ml_nuisance_params('ml_g', 'd', params) is called multiple times, only the params dict from the last call is used. Instead, set_ml_nuisance_params() should not overwrite already set parameter sets (that also stem from set_ml_nuisance_params() calls) but combine them: add new keys to the dict and replace existing ones.
Currently, if you want to evaluate a policy (e.g. one derived by the IRM policy_tree()), the gate() method is the best option. However, this has two disadvantages: Firstly, you can only have … the gate() function does not provide sensitivity analysis.
With this feature request I suggest adding an option weights to the IRM model.
Allow the DoubleMLIRM object to take an ATE score as proposed in the Long Story Short paper to get a weighted average treatment effect. weights=None should be the default and estimate an ATE which is equivalent to the current ATE implementation. If score='ATTE', then no weights should be allowed.
Additionally, in a later step, we might add an evaluate_policy() function that computes the policy value and changes the weights of an existing object without refitting (if possible).
The alternative would be to add weights to the sensitivity_analysis() function. This, however, would be far more complex, as currently the coefficient is not recalculated, and furthermore it would change the DoubleML class, with implications for every other model.
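A minimal sketch of the proposed weighted ATE, assuming weights w(X) that depend on covariates only, and cross-fitted nuisance predictions g0 = E[Y|D=0,X], g1 = E[Y|D=1,X] and m = P(D=1|X). The function name and signature are hypothetical, not the DoubleMLIRM API. The weights are normalized to mean one (an assumption of this sketch), and with weights=None it reduces to the usual doubly robust ATE, mirroring the requested default.

```python
import numpy as np


def weighted_ate(y, d, g0, g1, m, weights=None):
    """Doubly robust (weighted) ATE from cross-fitted nuisance predictions (sketch)."""
    y, d, g0, g1, m = (np.asarray(a, dtype=float) for a in (y, d, g0, g1, m))
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.mean()  # normalize so that the weights average to one
    # Per-observation part of the orthogonal ATE score (without -theta).
    phi = g1 - g0 + d * (y - g1) / m - (1 - d) * (y - g0) / (1 - m)
    psi = w * phi
    theta = psi.mean()
    se = np.sqrt(np.mean((psi - theta) ** 2) / len(y))
    return theta, se
```

Because of the normalization, rescaling all weights by a constant leaves the estimate unchanged, so weights=None and constant weights give the same result.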
Hello, I am trying to take a deep dive into the ATE and ATT components of the DML-IRM model. For example, the IRM ATE is:
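For reference, this is the standard doubly robust ATE score, which to our understanding is the form used by the IRM model, with g(d, X) = E[Y | D = d, X] and m(X) = P(D = 1 | X). It is linear in theta with psi_a = -1, so the estimate is the sample mean of the remaining terms.

```latex
\psi(W; \theta, \eta) = g(1, X) - g(0, X)
  + \frac{D\,(Y - g(1, X))}{m(X)}
  - \frac{(1 - D)\,(Y - g(0, X))}{1 - m(X)}
  - \theta
```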
I understand that since
Allowing the estimated IRM model to output the individual scores.
I think these estimates are already available because they can be called for CATE estimation. But this requires specifying a basis function to project the CATE estimates onto. Being able to call it directly from IRM model output would be great and more flexible.
The example https://docs.doubleml.org/stable/examples/py_double_ml_pension.html shows a ValueError in the PLR model:
ValueError: Gram matrix passed in via 'precompute' parameter did not pass validation when a single element was checked - please check that it was computed properly. For element (4,5) we computed 3375.771728515625 but the user-supplied value was 3375.773193359375.
https://docs.doubleml.org/stable/examples/py_double_ml_pension.html
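import numpy as np
import doubleml as dml
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV, LogisticRegressionCV

# data_dml_base is the DoubleMLData object built earlier in the pension example notebook
Cs = 0.0001 * np.logspace(0, 4, 10)
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5, max_iter=10000))
lasso_class = make_pipeline(StandardScaler(),
                            LogisticRegressionCV(cv=5, penalty='l1', solver='liblinear',
                                                 Cs=Cs, max_iter=1000))
np.random.seed(123)
dml_plr_lasso = dml.DoubleMLPLR(data_dml_base,
                                ml_g=lasso,
                                ml_m=lasso_class,
                                n_folds=3)
try:
    dml_plr_lasso.fit(store_predictions=True)
except ValueError as ve:
    print('ignore exception ValueError', ve)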
no ValueError
Linux-5.13.0-30-generic-x86_64-with-glibc2.29
Python 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0]
DoubleML 1.0 -> pip list says DoubleML 0.4.1
Scikit-Learn 1.0
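One plausible workaround, assuming the mismatch is float32 rounding (the two values in the message agree to only about seven significant digits, roughly float32 precision), is to upcast all float32 columns to float64 before creating the DoubleMLData object, similar to the dtype cast in the 401(k) snippet elsewhere on this page. Passing precompute=False to LassoCV may be another option to try. The toy frame is a stand-in for the pension data.

```python
import numpy as np
import pandas as pd

# Toy stand-in: some columns stored as float32, e.g. after loading data from disk.
data = pd.DataFrame({
    "net_tfa": np.array([1.0, 2.5, 3.25], dtype="float32"),
    "inc": np.array([10.5, 20.25, 30.0], dtype="float32"),
    "e401": np.array([0, 1, 1], dtype="int64"),
})

# Upcast every float32 column to float64; integer columns are left untouched.
float32_cols = data.select_dtypes(include="float32").columns
data[float32_cols] = data[float32_cols].astype("float64")
print(data.dtypes)
```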
Even when I set both prediction functions to be lasso, which supports sparse matrices in sklearn, the doubleml package throws an error that sparse matrices are not supported. Transforming the matrix to dense format would blow up my memory usage. Is there any particular reason that sparse matrices cannot be used?
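scikit-learn's Lasso does accept scipy.sparse input, so the restriction appears to come from DoubleML's input validation rather than from the learner. As a rough sketch that bypasses DoubleML entirely, the partialling-out PLR estimator can be computed by hand on a sparse design, with cross-fitting via cross_val_predict. The simulated data, alpha=0.001 and the true effect of 0.5 are assumptions for illustration only.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(42)
n, p, theta_true = 2000, 50, 0.5

# Sparse covariate matrix in CSR format (about 10% non-zero entries).
X = sparse.random(n, p, density=0.1, format="csr", random_state=42)
beta, gamma = rng.normal(size=p), rng.normal(size=p)
d = X @ beta + rng.normal(size=n)
y = theta_true * d + X @ gamma + rng.normal(size=n)

# Cross-fitted nuisance predictions; Lasso works directly on the sparse matrix.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
l_hat = cross_val_predict(Lasso(alpha=0.001), X, y, cv=cv)  # estimate of E[Y | X]
m_hat = cross_val_predict(Lasso(alpha=0.001), X, d, cv=cv)  # estimate of E[D | X]

# Partialling-out: regress the residualized outcome on the residualized treatment.
u, v = y - l_hat, d - m_hat
theta_hat = np.sum(v * u) / np.sum(v * v)
print(theta_hat)
```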
Hello,
in the Python 401(k) case study, when entering the flexible model data into the dml.DoubleMLData object and then printing it, the y-variable (net_tfa) appears in x_cols even though it was specified as y_col.
This leads to the lasso model on the flexible specification not estimating the coefficient correctly. For some reason it is only an issue with the flexible model, not with the base model. This is a recent issue; last week it was working properly.
If one wants to create an object for a multi-treatment problem, in which each time only a one-dimensional parameter theta_j for treatment j is estimated while including the remaining treatments /j together with the set of covariates X, it outputs an error asking for the use of the option use_other_treat_as_covariate, even though it is True by default.
import numpy as np
import pandas as pd
import doubleml as dml
from doubleml.datasets import fetch_401K

data = fetch_401K(return_type='DataFrame')
dtypes = data.dtypes
dtypes['nifa'] = 'float64'
dtypes['net_tfa'] = 'float64'
dtypes['tw'] = 'float64'
dtypes['inc'] = 'float64'
data = data.astype(dtypes)

features_base = ['age', 'inc', 'educ', 'fsize', 'marr',
                 'twoearn', 'db', 'pira', 'hown']

# Initialize DoubleMLData (data-backend of DoubleML)
data_dml_base = dml.DoubleMLData(data,
                                 y_col='net_tfa',
                                 d_cols=['e401', 'pira'],
                                 x_cols=features_base,
                                 use_other_treat_as_covariate=True)
I would expect a successful creation of the data object.
ValueError Traceback (most recent call last)
Cell In[6], line 5
1 features_base = ['age', 'inc', 'educ', 'fsize', 'marr',
2 'twoearn', 'db', 'pira', 'hown']
4 # Initialize DoubleMLData (data-backend of DoubleML)
----> 5 data_dml_base = dml.DoubleMLData(data,
6 y_col='net_tfa',
7 d_cols=['e401', 'pira'],
8 x_cols=features_base,
9 use_other_treat_as_covariate=True)
File ~/first_env/lib/python3.8/site-packages/doubleml/double_ml_data.py:151, in DoubleMLData.__init__(self, data, y_col, d_cols, x_cols, z_cols, t_col, use_other_treat_as_covariate, force_all_x_finite)
149 self.t_col = t_col
150 self.x_cols = x_cols
--> 151 self._check_disjoint_sets_y_d_x_z_t()
152 self.use_other_treat_as_covariate = use_other_treat_as_covariate
153 self.force_all_x_finite = force_all_x_finite
File ~/first_env/lib/python3.8/site-packages/doubleml/double_ml_data.py:634, in DoubleMLData._check_disjoint_sets_y_d_x_z_t(self)
631 # note that the line xd_list = self.x_cols + self.d_cols in method set_x_d needs adaption if an intersection of
632 # x_cols and d_cols as allowed (see https://github.com/DoubleML/doubleml-for-py/issues/83)
633 if not d_cols_set.isdisjoint(x_cols_set):
--> 634 raise ValueError('At least one variable/column is set as treatment variable (d_cols) and as covariate'
635 '(x_cols). Consider using parameter use_other_treat_as_covariate.')
637 if self.z_cols is not None:
638 z_cols_set = set(self.z_cols)
ValueError: At least one variable/column is set as treatment variable (d_cols) and as covariate(x_cols). Consider using parameter use_other_treat_as_covariate.
Linux-4.15.0-194-generic-x86_64-with-glibc2.17
Python 3.8.2 (default, Feb 26 2020, 14:31:49)
[GCC 6.3.0 20170516]
DoubleML 0.6.1
Scikit-Learn 1.2.2
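In this snippet pira is listed both in d_cols and in features_base, which is exactly what the disjointness check rejects. use_other_treat_as_covariate=True already adds the other treatment to the covariates of each treatment's nuisance models, so pira does not need to be in x_cols. A small sketch that drops outcome, treatment and instrument columns from a candidate covariate list (the helper name is hypothetical):

```python
def disjoint_x_cols(candidates, y_col, d_cols, z_cols=None):
    """Return candidates without outcome, treatment and instrument columns (order preserved)."""
    d_cols = [d_cols] if isinstance(d_cols, str) else list(d_cols)
    if z_cols is None:
        z_cols = []
    elif isinstance(z_cols, str):
        z_cols = [z_cols]
    excluded = {y_col, *d_cols, *z_cols}
    return [c for c in candidates if c not in excluded]


features_base = ['age', 'inc', 'educ', 'fsize', 'marr',
                 'twoearn', 'db', 'pira', 'hown']
x_cols = disjoint_x_cols(features_base, y_col='net_tfa', d_cols=['e401', 'pira'])
print(x_cols)
```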
At doubleml-for-py/doubleml/double_ml_irm.py, lines 199 to 201 in 6ea72e4, the predictions are not yet trimmed. Presumably, it would be more reasonable to perform the trimming during the "ML estimation and prediction step". Otherwise, users might question whether the trimming really happens.
'discard'
… trimming_rule 'truncate'. As an alternative, we also want to offer the trimming_rule 'discard'. For this we need to find a stable way to exclude observations from subsequent steps. Predictions can simply be set to np.nan; in subsequent steps these observations then need to be excluded. In the repeated cross-fitting case this can result in different numbers of observations being evaluated for different random sample splits. At the beginning we might want to avoid these technically challenging cases and only allow trimming_rule = 'discard' for n_rep == 1.
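A minimal numpy sketch of the two trimming rules for estimated propensity scores, assuming a symmetric threshold (0.01 here, chosen for illustration): 'truncate' clips the scores into [threshold, 1 - threshold], while 'discard' marks out-of-range observations as np.nan so that later steps can drop them, for example via np.nanmean. The helper name is hypothetical and this is not the package's implementation.

```python
import numpy as np


def trim_propensity(m_hat, threshold=0.01, rule="truncate"):
    """Apply a trimming rule to estimated propensity scores (sketch)."""
    m_hat = np.asarray(m_hat, dtype=float)
    if rule == "truncate":
        return np.clip(m_hat, threshold, 1 - threshold)
    if rule == "discard":
        out = m_hat.copy()
        out[(m_hat < threshold) | (m_hat > 1 - threshold)] = np.nan
        return out
    raise ValueError(f"unknown trimming rule: {rule!r}")


m_hat = np.array([0.001, 0.3, 0.6, 0.995])
print(trim_propensity(m_hat, rule="truncate"))
m_disc = trim_propensity(m_hat, rule="discard")

# Discarded observations propagate as NaN into the score and are skipped by nanmean,
# so the number of observations actually used can differ between sample splits.
d = np.array([0.0, 1.0, 0.0, 1.0])
ipw_part = d / m_disc - (1 - d) / (1 - m_disc)
print(np.nanmean(ipw_part), np.sum(~np.isnan(ipw_part)))
```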
Enable the safe use of classifiers for the PLR model (see #234).
The model should accept classifiers and switch to predict_proba() (but only for ml_l).
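A sketch of the requested switch, assuming a binary {0, 1} target: when the learner is a classifier, use the positive-class column of predict_proba(); otherwise use predict(). sklearn.base.is_classifier does the type check; the helper name predict_nuisance is hypothetical.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.linear_model import LinearRegression, LogisticRegression


def predict_nuisance(fitted_learner, X):
    """Return P(target = 1 | X) for classifiers and E[target | X] for regressors."""
    if is_classifier(fitted_learner):
        # Column 1 corresponds to the second entry of classes_, i.e. label 1 for {0, 1} targets.
        return fitted_learner.predict_proba(X)[:, 1]
    return fitted_learner.predict(X)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
target = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

p_clf = predict_nuisance(LogisticRegression().fit(X, target), X)
p_reg = predict_nuisance(LinearRegression().fit(X, target), X)
```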
Thanks for your contributions to this project. However, compared with EconML, this package lacks support for prediction (a predict method).
I tried using the package with XGBoost to estimate the ml_g and ml_m terms. The existence of nulls is no problem for XGBoost, as it is able to infer the correct branch split for null values empirically. Indeed, XGBoost is commonly used to estimate propensity scores when the confounding features are potentially missing/null.
Unfortunately, the doubleml package calls sklearn.utils.check_X_y and is configured to throw an error if there are missing values in the confounders X. This occurs several times in https://github.com/DoubleML/doubleml-for-py/blob/master/doubleml/double_ml_plr.py
It seems the fix is as simple as allowing users to pass the check_X_y kwarg force_all_finite=False, e.g. see https://scikit-learn.org/stable/modules/generated/sklearn.utils.check_X_y.html. Once this is changed, I've found that an XGBoost model with missing values runs with no issue. Naturally, the response variable y still cannot have null values.
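A sketch of the relaxed check, assuming a learner such as XGBoost that handles missing values natively. It uses "allow-nan" rather than False so that infinite values are still caught. In recent scikit-learn versions the keyword is ensure_all_finite (renamed from force_all_finite around version 1.6), so the snippet looks the name up. DoubleMLData also has a force_all_x_finite argument (it appears in the DoubleMLData.__init__ signature in a traceback on this page), which may be the more direct route in versions that have it.

```python
import inspect

import numpy as np
from sklearn.utils import check_X_y

# Newer scikit-learn renamed force_all_finite to ensure_all_finite; pick whichever exists.
_params = inspect.signature(check_X_y).parameters
FINITE_KW = "ensure_all_finite" if "ensure_all_finite" in _params else "force_all_finite"

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]])
y = np.array([0.5, 1.5, 2.5])

# "allow-nan" lets NaN through in X but still rejects +/- inf.
X_checked, y_checked = check_X_y(X, y, **{FINITE_KW: "allow-nan"})

# y is still validated strictly, so a missing response raises.
try:
    check_X_y(X, np.array([0.5, np.nan, 2.5]), **{FINITE_KW: "allow-nan"})
    y_nan_rejected = False
except ValueError:
    y_nan_rejected = True
print(y_nan_rejected)
```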
Due to a failure of the job for Python 3.6 on Ubuntu, most jobs are currently failing. We currently get the error message
Error: The version '3.6' with architecture 'x64' was not found for Ubuntu 22.04.
see for example actions/setup-python#544 (comment)
Currently we only check that predictions are finite in doubleml-for-py/doubleml/_utils.py, lines 204 to 208 in b3cbdb5.
We may additionally want to check that estimated probabilities are strictly in (0,1) (maybe with some eps threshold). Otherwise, the values of the score functions are likely infinite / missing. It may make sense not to let it fail directly but to throw a warning. This way one would, for example, still have the option to discard these observations from the estimation of the target parameter or to choose a score function where this is accounted for, e.g., a trimmed one.
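A sketch of such a check that keeps the existing finiteness error but only warns for probabilities outside (eps, 1 - eps). The function name and the default eps are assumptions for illustration.

```python
import warnings

import numpy as np


def check_probabilities(preds, learner_name, eps=1e-12):
    """Raise on non-finite predictions; warn if probabilities are not strictly inside (eps, 1 - eps)."""
    preds = np.asarray(preds, dtype=float)
    if not np.all(np.isfinite(preds)):
        raise ValueError(f"Predictions from {learner_name} are not finite.")
    n_bad = int(np.sum((preds <= eps) | (preds >= 1 - eps)))
    if n_bad > 0:
        warnings.warn(
            f"{n_bad} probability predictions from {learner_name} are not strictly inside "
            f"({eps}, {1 - eps}); the score may be infinite for these observations.",
            UserWarning,
        )
    return n_bad


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_bad = check_probabilities([0.2, 0.0, 0.7, 1.0], "ml_m")
```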
Hi,
I heard some time back that there were plans for the final stage score function estimation to be exposed so that we can use our own predictions from our own ML models. Has that feature been completed, and if so, are there examples on how to use it? Thanks!
In addition, I was wondering whether there is a DML multi-period diff-in-diff formulation in literature, and if there is, where is it on your priority list?
use_other_treat_as_covariate option and to #83.
We do have unit tests for model defaults, see https://github.com/DoubleML/doubleml-for-py/blob/master/doubleml/tests/test_doubleml_model_defaults.py. The intention behind such "default setting unit tests" is twofold:
Currently, we only have such "default setting unit tests" for the initialization of the model classes but it would also make sense to extend it to the most important methods.
Does the DoubleML package have an option to output the residuals of the nuisance models, for example when computing the RMSE for predicting D and Y, in order to compare different methods for estimating them? Maybe there is an existing code example somewhere that I couldn't find.
Thank you
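A sketch, assuming a fitted model that exposes predictions and nuisance_targets as dicts of arrays with shape (n_obs, n_rep, n_treat), as used in the IRM RMSE report on this page (dml_irm_obj.predictions['ml_g0'] and dml_irm_obj.nuisance_targets['ml_g0']). Residuals are then targets minus predictions; the toy arrays stand in for a fitted object, and NaN targets are ignored in the RMSE.

```python
import numpy as np


def nuisance_residuals(predictions, targets):
    """Residuals (target - prediction) and RMSE per learner, ignoring missing targets."""
    residuals, rmses = {}, {}
    for learner, preds in predictions.items():
        res = np.asarray(targets[learner], dtype=float) - np.asarray(preds, dtype=float)
        residuals[learner] = res
        # RMSE per repetition and treatment, averaging over observations (axis 0).
        rmses[learner] = np.sqrt(np.nanmean(res ** 2, axis=0))
    return residuals, rmses


# Toy stand-ins with shape (n_obs=4, n_rep=1, n_treat=1).
predictions = {"ml_l": np.array([1.0, 2.0, 3.0, 4.0]).reshape(4, 1, 1),
               "ml_m": np.array([0.2, 0.4, 0.6, 0.8]).reshape(4, 1, 1)}
targets = {"ml_l": np.array([1.5, 2.0, 2.0, 4.0]).reshape(4, 1, 1),
           "ml_m": np.array([0.0, 1.0, 1.0, 1.0]).reshape(4, 1, 1)}
residuals, rmses = nuisance_residuals(predictions, targets)
```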
According to the docs, when setting return_tune_res=True, tune() should return tune_res (a list containing detailed tuning results and the proposed hyperparameters, returned if return_tune_res is True). However, right now DoubleMLPLR.tune just returns None even when return_tune_res=True.
In the case of multiple instruments, the function DoubleMLPLIV.fit() throws an error when executed with the parameter 'store_predictions=True'.
import numpy as np
import doubleml as dml
from doubleml.datasets import make_pliv_CHS2015
from sklearn.ensemble import RandomForestRegressor
from sklearn.base import clone
np.random.seed(3141)
learner = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_l = clone(learner)
ml_m = clone(learner)
ml_r = clone(learner)
obj_dml_data = make_pliv_CHS2015(n_obs=500, alpha=1.0, dim_x=10, dim_z=10, return_type='DoubleMLData')
dml_pliv_obj = dml.DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)
dml_pliv_fit = dml_pliv_obj.fit(store_predictions=True)
Predictions for the whole list of learners ('params_names') are stored, i.e. for:
print(dml_pliv_obj.params_names)
['ml_l',
'ml_r',
'ml_m_Z1',
'ml_m_Z2',
'ml_m_Z3',
'ml_m_Z4',
'ml_m_Z5',
'ml_m_Z6',
'ml_m_Z7',
'ml_m_Z8',
'ml_m_Z9',
'ml_m_Z10']
After executing the code, the following error is stated:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/sb/q_1b_jtx6_x55nw95r50s0tr0002mt/T/ipykernel_44055/2685974828.py in <module>
11 obj_dml_data = make_pliv_CHS2015(n_obs=500, alpha=1.0, dim_x=10, dim_z=10, return_type='DoubleMLData')
12 dml_pliv_obj = dml.DoubleMLPLIV(obj_dml_data, ml_l, ml_m, ml_r)
---> 13 dml_pliv_fit = dml_pliv_obj.fit(store_predictions=True)
/opt/anaconda3/envs/py39/lib/python3.10/site-packages/doubleml/double_ml.py in fit(self, n_jobs_cv, keep_scores, store_predictions, store_models)
500
501 if store_predictions:
--> 502 self._store_predictions(preds['predictions'])
503 if store_models:
504 self._store_models(preds['models'])
/opt/anaconda3/envs/py39/lib/python3.10/site-packages/doubleml/double_ml.py in _store_predictions(self, preds)
1000 def _store_predictions(self, preds):
1001 for learner in self.params_names:
-> 1002 self._predictions[learner][:, self._i_rep, self._i_treat] = preds[learner]
1003
1004 def _store_models(self, models):
KeyError: 'ml_m_Z1'
macOS-10.16-x86_64-i386-64bit
Python 3.10.6 (main, Oct 24 2022, 11:04:34) [Clang 12.0.0 ]
DoubleML 0.6.dev0
Scikit-Learn 1.1.3
The IRM model constructs two versions of the learner ml_g on conditional samples for d==1 and d==0 to fit conditional expectations. To evaluate the out-of-sample performance, the outcome/target for each model is set to y, due to the implementation of dml_cv_predict. This is not correct; the evaluation should only be based on the conditional samples.
import numpy as np
import doubleml as dml
from doubleml.datasets import make_irm_data
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.base import clone
np.random.seed(3141)
ml_g = RandomForestRegressor(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
ml_m = RandomForestClassifier(n_estimators=100, max_features=20, max_depth=5, min_samples_leaf=2)
obj_dml_data = make_irm_data(n_obs=500, dim_x=5)
dml_irm_obj = dml.DoubleMLIRM(obj_dml_data, ml_g, ml_m)
dml_irm_obj.fit()
print(dml_irm_obj.rmses)
The RMSE should be calculated on the correct subsample, e.g. for ml_g0:
np.sqrt(np.power(dml_irm_obj.predictions['ml_g0'][obj_dml_data.d == 0] - dml_irm_obj.nuisance_targets['ml_g0'][obj_dml_data.d == 0], 2).mean())
This would result in
1.0904733380718747
{'ml_g0': array([[1.20999233]]), 'ml_g1': array([[1.1650356]]), 'ml_m': array([[0.43024777]])}
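The corrected computation generalizes to a small helper that evaluates a conditional learner only on the subsample it was trained for, e.g. d == 0 for ml_g0 and d == 1 for ml_g1. A sketch with toy arrays standing in for the predictions, nuisance targets and treatment of a fitted object; the helper name is hypothetical.

```python
import numpy as np


def conditional_rmse(preds, targets, d, treatment_value):
    """RMSE of a conditional learner, evaluated only where d == treatment_value."""
    preds, targets, d = (np.asarray(a, dtype=float).ravel() for a in (preds, targets, d))
    mask = d == treatment_value
    return float(np.sqrt(np.mean((preds[mask] - targets[mask]) ** 2)))


d = np.array([0, 0, 1, 1])
y = np.array([1.0, 2.0, 5.0, 7.0])
g0_preds = np.array([1.5, 2.5, 0.0, 0.0])  # predictions of E[Y | D=0, X]
g1_preds = np.array([9.0, 9.0, 5.0, 6.0])  # predictions of E[Y | D=1, X]
rmse_g0 = conditional_rmse(g0_preds, y, d, 0)
rmse_g1 = conditional_rmse(g1_preds, y, d, 1)
```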
Bug reported by @ShreyDixit:
Assume that one successfully initializes an object of class DoubleMLData and then alters a property like y_col in a way that violates some basic assumptions (e.g., the same variable cannot at the same time be the outcome variable y_col and the treatment variable d_cols). This results in a ValueError being raised. Nevertheless, the object mutates and violates the basic assumption.
--> So while the ValueError is appropriately raised, the object nevertheless mutates and the y_col property is changed. The root cause is in the setter for the y_col property: doubleml-for-py/doubleml/double_ml_data.py, lines 353 to 365 in 0690cc6.
Basically, the value shouldn't be set before all checks have been successfully applied. However, in its current form the _check_disjoint_sets() check requires that the properties have already been set. The same issue also applies to the setters for other properties like d_cols, x_cols, etc. Note, however, that this issue only becomes relevant if an object of class DoubleMLData has been initialized successfully and the user then alters one of the properties in a way that violates _check_disjoint_sets().
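One way to make the setters atomic is to run the disjointness check on the candidate values before assigning anything. A minimal sketch with a hypothetical stand-in class (not DoubleMLData itself), reduced to y_col and d_cols:

```python
class DataSketch:
    """Hypothetical stand-in showing validate-then-assign setters."""

    def __init__(self, y_col, d_cols):
        self._check_disjoint(y_col, d_cols)
        self._y_col, self._d_cols = y_col, list(d_cols)

    @staticmethod
    def _check_disjoint(y_col, d_cols):
        # The check works on candidate values, so the attributes need not be set first.
        if y_col in d_cols:
            raise ValueError(f"{y_col} cannot be set as outcome variable ``y_col`` "
                             "and treatment variable in ``d_cols``.")

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        self._check_disjoint(value, self._d_cols)  # raises before any state changes
        self._y_col = value


data = DataSketch(y_col="y", d_cols=["d"])
try:
    data.y_col = "d"
    raised = False
except ValueError as err:
    raised = True
    print(err)
print(data.y_col)
```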
Code block 1
from doubleml.datasets import make_plr_CCDDHNR2018
dml_data = make_plr_CCDDHNR2018()
print(dml_data.y_col)
dml_data.y_col = 'd'
Code block 2
print(dml_data.y_col)
First code block: dml_data.y_col == 'y' and an exception is raised:
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'y' should still hold.
First code block: dml_data.y_col == 'y' and an exception is raised:
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'd'
Python 3.9.7
DoubleML 0.4.1
Scikit-Learn 1.0.1
set('abc') == {'a', 'b', 'c'}, so we need to add brackets around self.y_col here: doubleml-for-py/doubleml/double_ml_data.py, line 259 in be32d1f.
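The pitfall in a few lines: set() on a string iterates over its characters, so the outcome column must be wrapped in a list or written as a set literal before the disjointness check. The column names are illustrative.

```python
y_col = "net_tfa"
d_cols = ["e401", "net_tfa"]  # deliberately overlapping with y_col

chars = set(y_col)  # the name split into characters: {'n', 'e', 't', '_', 'f', 'a'}
names = {y_col}     # {'net_tfa'}: what the check actually needs

# With set(y_col) the overlap goes unnoticed; with {y_col} it is detected.
print(chars.isdisjoint(d_cols), names.isdisjoint(d_cols))
```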
See e.g. the basics section in the user guide:
from sklearn.linear_model import LinearRegression

def est_ols(y, X):
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    ols = LinearRegression(fit_intercept=False)
    results = ols.fit(X, y)
    theta = results.coef_
    return theta
Especially add everything needed to build the documentation with sphinx.
After estimation of DoubleML models, we apply a multiplier bootstrap algorithm to obtain valid simultaneous inference (see also the user guide https://docs.doubleml.org/stable/guide/sim_inf.html or https://arxiv.org/abs/2103.09603). The current implementation is not aligned with that in the case of dml_procedure='dml1' and needs to be slightly adapted.
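For orientation, a generic sketch of a Gaussian multiplier bootstrap for simultaneous confidence bands, written in the pooled (dml2-style) form. It assumes a linear score whose per-observation values psi (evaluated at the estimate) and derivative terms psi_a both have shape (n_obs, n_params). It is not the package's implementation and does not contain the dml1 adaptation this issue is about.

```python
import numpy as np


def multiplier_bootstrap_cv(psi, psi_a, n_boot=1000, level=0.95, seed=None):
    """Critical value for simultaneous bands via a Gaussian multiplier bootstrap (sketch)."""
    rng = np.random.default_rng(seed)
    n = psi.shape[0]
    J = psi_a.mean(axis=0)                               # derivative of the score, per parameter
    sigma = np.sqrt(np.mean(psi ** 2, axis=0) / J ** 2)  # sd of sqrt(n) * (theta_hat - theta)
    xi = rng.standard_normal(size=(n_boot, n))           # Gaussian multiplier weights
    # Bootstrap draws of the studentized statistic, shape (n_boot, n_params).
    t_boot = (xi @ psi) / (np.sqrt(n) * J * sigma)
    # Taking the max over parameters makes the critical value valid for all of them jointly.
    return float(np.quantile(np.abs(t_boot).max(axis=1), level))


rng = np.random.default_rng(0)
psi = rng.standard_normal(size=(500, 5))
psi_a = -np.ones((500, 5))
cv_single = multiplier_bootstrap_cv(psi[:, :1], psi_a[:, :1], n_boot=5000, seed=1)
cv_joint = multiplier_bootstrap_cv(psi, psi_a, n_boot=5000, seed=1)
```

For a single parameter the critical value is close to the pointwise 1.96, while the joint critical value for several parameters is larger, which is the point of simultaneous inference.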
Is it possible to access the attributes of the nuisance functions? For example, if the nuisance function is a RandomForestRegressor, then the sklearn package allows one to access attributes such as estimators_, feature_importances_, etc. Attributes like feature_importances_ can perhaps help identify the confounding variables in the model.
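If the models are stored during fitting (fit(store_models=True)), the fitted learners can be read from the models attribute, indexed as in the multi-treatment report on this page, e.g. dml_plr_obj.models["ml_l"]["d1"][0][0].feature_importances_ (learner, treatment, then presumably repetition and fold). A sketch that averages feature_importances_ over repetitions and folds; the nested list of fitted random forests only stands in for that structure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def mean_feature_importances(models_for_treatment):
    """Average feature_importances_ over a nested list [repetition][fold] of fitted models."""
    importances = [model.feature_importances_
                   for rep_models in models_for_treatment
                   for model in rep_models]
    return np.mean(importances, axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 0] + rng.normal(size=200)

# Stand-in for models["ml_l"]["d"]: one repetition, one model per fold,
# each trained on the complement of its fold as in cross-fitting.
folds = np.array_split(np.arange(200), 2)
fold_models = []
for idx in folds:
    train = np.setdiff1d(np.arange(200), idx)
    fold_models.append(RandomForestRegressor(n_estimators=50, random_state=0).fit(X[train], y[train]))
avg_importance = mean_feature_importances([fold_models])
```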
My problem is more about the theory or practice of double ML than about the package per se. I am sorry for that, but I cannot find another place to ask the question. The original paper is way beyond my capability, and I learned all about double ML through your documentation.
The thing is that I read Belloni, Chernozhukov, and Hansen (2014, JEP) and found that in a case similar to the partially linear regression, they recommend applying variable selection methods (lasso) to the two reduced-form equations and using all of the selected controls in the traditional estimation (OLS) of the treatment effect of interest. There is no mention of cross-fitting for this double selection method. I wonder whether this double selection method is simply double ML without cross-fitting. And is double ML with cross-fitting strictly better than double selection in any specific cases?
One related question: I want to know how to do double ML when there are some covariates that I don't want to put into the ML algorithms but want to estimate in a traditional way (like the covariates in a simple 2SLS). If using double selection, I can add them to the final OLS. But how can I do this with DoubleMLPLR? Or does adding such variables make sense at all?
Thanks in advance for any suggestions.
If a variable is present in d_cols and in x_cols, the following lines might be problematic (the duplicate variable can end up twice in the covariate array ._x): doubleml-for-py/doubleml/double_ml_data.py, lines 361 to 367 in 3317fc5.
When I use my own data (three variables in D, four variables in X), the predictions for both "ml_l" and "ml_m" have shape (n_obs, iteration, number of variables in D). Shouldn't it be (n_obs, iteration, 1) for "ml_l"?
Furthermore, the feature importance scores of the models for both "ml_l" and "ml_m" have shape (6,); shouldn't it be (4,) in my case?
In your provided example it works fine, but it only has one variable in D, so it is hard to debug; you can reproduce it using my code.
I hope I haven't missed anything, but if I have, please let me know. Thanks!
import numpy as np
import pandas as pd
import doubleml as dml
from doubleml import DoubleMLData
from xgboost import XGBRegressor

test1 = pd.DataFrame({
    'd1': np.random.randn(100),
    'd2': np.random.randn(100),
    'd3': np.random.randn(100),
    'x1': np.random.randn(100),
    'x2': np.random.randn(100),
    'x3': np.random.randn(100),
    'x4': np.random.randn(100),
    'y': np.random.randn(100)
})
obj_dml_data_from_df = DoubleMLData(test1, 'y', ["d1", "d2", "d3"])
ml_l = XGBRegressor(random_state=0)
ml_m = XGBRegressor(random_state=0)
dml_plr_obj = dml.DoubleMLPLR(obj_dml_data_from_df, ml_l, ml_m).fit(store_models=True)
print(dml_plr_obj.predictions["ml_l"].shape)
print(dml_plr_obj.predictions["ml_m"].shape)
print(dml_plr_obj.models["ml_l"]["d1"][0][0].feature_importances_.shape)
print(dml_plr_obj.models["ml_m"]["d1"][0][0].feature_importances_.shape)
(100, 1, 1)
(100, 1, 3)
(4,)
(4,)
(100, 1, 3)
(100, 1, 3)
(6,)
(6,)
Linux-5.4.0-150-generic-x86_64-with-glibc2.27
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
DoubleML 0.7.1
Scikit-Learn 1.0.2
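A likely explanation (our reading, not confirmed in the thread): use_other_treat_as_covariate defaults to True, so for each treatment the nuisance learners also receive the two other treatment variables as inputs. That gives 4 + 2 = 6 features, and since the input set differs per treatment, ml_l is also fitted once per treatment, which would explain the third dimension of 3. The arithmetic as a tiny sketch (hypothetical helper):

```python
def n_nuisance_features(x_cols, d_cols, use_other_treat_as_covariate=True):
    """Number of input features for each treatment's nuisance models."""
    n_other = len(d_cols) - 1 if use_other_treat_as_covariate else 0
    return len(x_cols) + n_other


x_cols = ["x1", "x2", "x3", "x4"]
d_cols = ["d1", "d2", "d3"]
print(n_nuisance_features(x_cols, d_cols))         # 6 with the default setting
print(n_nuisance_features(x_cols, d_cols, False))  # 4 with covariates only
```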
Thanks for the package. Just wanted to report a very minor issue: you are missing a closing } at the end of the BibTeX entry for the Python package.
@Article{DoubleML2022Python,
title = {{DoubleML} -- {A}n Object-Oriented Implementation of Double Machine Learning in {P}ython},
author = {Philipp Bach and Victor Chernozhukov and Malte S. Kurz and Martin Spindler},
journal = {Journal of Machine Learning Research},
year = {2022},
volume = {23},
number = {53},
pages = {1--6},
url = {http://jmlr.org/papers/v23/21-0862.html}
} <---- this one.
I was trying to use DML for a continuous treatment (price) and a binary outcome (churn). Based on the docs, it's not possible to use any of these techniques in this case. Is there any way to adjust any of these algorithms for this setup? If there is a paper that I could implement and add to this library, I'm happy to do that as well.
See also DoubleML/doubleml-for-r#83
Thank you for your excellent work, which is very helpful for learning and using DML. I have a question I'd like your guidance on. If I apply DML to panel data, how can individual fixed effects be controlled for? Should I directly generate individual dummy variables? My current data has about 30,000 individuals but only three years of data, so it may not be appropriate to generate individual dummy variables. But I don't know how else to control for unobserved individual fixed effects. Thank you again!
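One commonly used heuristic (not a DoubleML feature, and only an approximation) is to remove additive individual fixed effects with the within transformation, i.e. demeaning every variable by individual, and then to apply DML to the demeaned data, which avoids creating 30,000 dummies. Note that after demeaning, the nuisance functions no longer depend on the current covariates alone, so this relies on additional assumptions. A pandas sketch with hypothetical column names:

```python
import pandas as pd


def within_transform(df, id_col, cols):
    """Subtract individual-specific means (within transformation) from the given columns."""
    out = df.copy()
    out[cols] = df[cols] - df.groupby(id_col)[cols].transform("mean")
    return out


panel = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "year": [2019, 2020, 2021, 2019, 2020, 2021],
    "y": [10.0, 12.0, 14.0, 1.0, 2.0, 3.0],
    "d": [0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
    "x1": [5.0, 6.0, 7.0, 0.5, 0.7, 0.9],
})
demeaned = within_transform(panel, "id", ["y", "d", "x1"])
```

Standard errors should then account for the repeated observations per individual, for example by clustering on id.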