Comments (2)

miaow27 commented on July 17, 2024

@pzivich, thanks so much for the suggestion. It turned out to be an issue with the exposure model (g-model): somehow one of the data splits contained only one class. After I increased the folds in the SL to 10, it worked. If I forgo the SL and use sctmle.exposure_model(g_formula, GLMSL(sm.families.family.Binomial())), it also works.
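As a rough illustration of the "only one class" situation described above, here is a minimal sketch in plain Python. It is not zepid's splitting code: the helper name `one_class_pieces` and the simple shuffle-and-slice scheme are assumptions made for illustration only. It shuffles a binary exposure vector into random pieces and reports which pieces contain a single class.

```python
import random


def one_class_pieces(exposure, n_pieces, seed):
    """Return the indices of random pieces that contain only one exposure class.

    Hypothetical helper for illustration; this is NOT how zepid splits data.
    """
    rng = random.Random(seed)
    idx = list(range(len(exposure)))
    rng.shuffle(idx)
    # Deal the shuffled indices into n_pieces roughly equal pieces
    pieces = [idx[i::n_pieces] for i in range(n_pieces)]
    # A piece is a problem if fewer than two distinct classes appear in it
    return [k for k, piece in enumerate(pieces)
            if len({exposure[i] for i in piece}) < 2]


# With a very rare exposure, some piece is left without any exposed units
rare_exposure = [0] * 19 + [1]
print(one_class_pieces(rare_exposure, n_pieces=2, seed=0))
```

A binary model fit to a piece flagged this way cannot be estimated, so a check like this may help narrow down whether the split itself is the problem.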

from zepid.

pzivich commented on July 17, 2024

Hi @miaow27, the PerfectSeparation error comes from the targeting step in TMLE. It's hard to tell exactly what is happening here. Can you copy the error here?

There are two possible causes: the random forest is over-fitting, or something in the g-model is highly correlated with the exposure (which leads to perfect separation when that model is fit within a random split).

Essentially, in the cross-fit process we break everything into two pieces and then fit the algorithm (an SL with one learner in the above). When the data gets split, random forests sometimes tend to over-fit (especially within an SL).

  • The easiest fix would be to tune the hyperparameters of the random forest. I would try changing min_samples_split to something like 5 or 10 (instead of 2).
  • Another potential fix would be to increase the folds in the SL. Three is pretty low: essentially it takes the split and then splits it into 3 pieces, and more pieces give more data to fit with. You could also forgo the SL (since it only has one model).
  • Lastly, you could instead add some 'smoother' learners to the Q SL. Something like a GAM or MARS would shrink the influence of the random forest (if the variance of the random forest is very high).
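To make the "more pieces gives more data to fit with" point concrete, here is a small arithmetic sketch. The function name `sl_training_size` and the sample size of 600 are hypothetical. It assumes the cross-fit splits the data into two equal pieces and that each SL fold trains on everything in a piece except the held-out fold, which is a simplification of what the library does.

```python
def sl_training_size(n_obs, crossfit_splits=2, sl_folds=3):
    """Approximate number of observations each learner is trained on.

    Assumes equal-sized cross-fit pieces and equal-sized SL folds
    (an illustrative simplification, not zepid's exact bookkeeping).
    """
    split_size = n_obs // crossfit_splits   # size of one cross-fit piece
    held_out = split_size // sl_folds       # size of one held-out SL fold
    return split_size - held_out            # what remains for training


for folds in (3, 10):
    print(folds, sl_training_size(600, sl_folds=folds))
# prints:
# 3 200
# 10 270
```

Under these assumptions, going from 3 to 10 folds raises the training size within each piece from two-thirds to nine-tenths of that piece.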

If it is the g-model correlation, then I would try a different seed. That might get the cross-fit to run. How to fix that issue is a little trickier to think through.

from zepid.
