Comments (2)
@pzivich, thanks so much for the suggestion. It turned out to be an issue with the exposure model (g-model), where somehow one of the split datasets contained only one class. After I increased the folds in the SL to 10, it worked. If I forgo the SL and use `sctmle.exposure_model(g_formula, GLMSL(sm.families.family.Binomial()))`, it also works.
from zepid.
Hi @miaow27, the `PerfectSeparation` error comes from the targeting step in TMLE. It's hard to tell exactly what is happening here. Can you copy the error here?
There are two possible causes: the random forest is over-fitting, or something in the g-model is highly correlated with the exposure (which ends up as a perfect separation when that model is fit within a random split).
Essentially, in the cross-fit process we break everything into two pieces and then fit the algorithm (an SL with one learner in the above). When the data gets split, random forests sometimes have a tendency to over-fit (especially within an SL).
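To make the splitting concrete, here is a minimal sketch in plain Python. It is not zEpid's implementation: the function name `single_class_training_folds`, the equal-sized halves, and the way folds are formed are all assumptions for illustration. It splits the data into two pieces, splits each piece into `k` folds, and flags any training set that contains only one exposure class, which is the kind of situation that can break the exposure model.

```python
import random


def single_class_training_folds(exposure, k, seed):
    """Return (piece, fold) pairs whose training data holds a single exposure class.

    Hypothetical helper: mimics a two-piece cross-fit split followed by
    k-fold splitting inside each piece. It does not call zEpid.
    """
    rng = random.Random(seed)
    idx = list(range(len(exposure)))
    rng.shuffle(idx)

    half = len(idx) // 2
    pieces = [idx[:half], idx[half:]]  # the two cross-fit pieces

    problems = []
    for p, piece in enumerate(pieces):
        # k roughly equal folds inside this piece
        folds = [piece[i::k] for i in range(k)]
        for f in range(k):
            # training data = everything in the piece except the held-out fold
            train = [i for j, fold in enumerate(folds) if j != f for i in fold]
            classes = {exposure[i] for i in train}
            if len(classes) < 2:
                problems.append((p, f))
    return problems


# A rare exposure: only 1 exposed unit out of 20
rare = [1] + [0] * 19
print(single_class_training_folds(rare, k=3, seed=0))
```

Running the check over a few seeds on your own exposure column is a cheap way to see whether a given split is likely to leave a learner with only one class to fit.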
- The easiest fix would be to tune the hyperparameters of the random forest. I would try changing `min_samples_split` to something like 5 or 10 (instead of 2).
- Another potential fix would be to increase the folds in the SL. 3 is pretty low: essentially it takes the split and then splits it into 3 pieces, and more pieces gives more data to fit with. You could also forgo the SL (since it only has one model).
- Lastly, you could instead add some 'smoother' learners to the Q SL. Something like a GAM or MARS would shrink the influence of the random forest (if the variance for the random forest is very high).
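As a rough back-of-the-envelope check on the fold suggestion, the sketch below computes how many observations each learner is trained on. It assumes the data is cut into two equal cross-fit pieces and each piece into `k` equal folds; the function name `sl_training_size` is made up for illustration and is not part of zEpid.

```python
def sl_training_size(n, k, n_pieces=2):
    """Smallest training-set size inside one cross-fit piece with k folds.

    Assumes equal-sized pieces and folds (an illustration, not zEpid's code).
    """
    piece = n // n_pieces          # observations in one cross-fit piece
    held_out = -(-piece // k)      # largest held-out fold (ceiling division)
    return piece - held_out        # observations left to fit the learner


for k in (3, 10):
    print(k, sl_training_size(1000, k))
```

With 1000 observations, 3 folds leave about 333 observations per fit while 10 folds leave 450, so the random forest sees noticeably more data in each fit.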
If it is the g-model correlation, then I would try a different seed. That might get the cross-fit to run. How to fix that issue is a little trickier to think through.