Comments (6)

juliasilge commented on September 1, 2024

Thanks so much for your discussion! 🙌 I'm cleaning up older issues. Currently tidymodels handles imputation in the recipes package; check out its recipe steps for imputation.
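
For readers who have not used those steps, here is a minimal sketch of single imputation inside a recipe. The `patients` data frame and its column names are made up for illustration, and `step_impute_median()` is only one of several imputation steps; the name assumes a reasonably recent recipes release (older releases called it `step_medianimpute()`).

```r
library(recipes)

# Hypothetical toy data: two predictors with missing values.
patients <- data.frame(
  outcome = c(1.2, 0.7, 1.9, 1.1, 0.4, 1.5),
  sbp     = c(120, NA, 135, 142, NA, 128),
  chol    = c(5.1, 4.8, NA, 6.0, 5.5, NA)
)

# Declare the preprocessing: fill each numeric predictor with its median.
rec <- recipe(outcome ~ ., data = patients) %>%
  step_impute_median(all_numeric_predictors())

# Estimate the medians from the training data, then apply them.
prepped <- prep(rec, training = patients)
imputed <- bake(prepped, new_data = NULL)

imputed
```

Because the medians are estimated in `prep()`, the same recipe can be re-estimated within each rsample resample, which keeps the imputation from leaking information across splits.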

from rsample.

topepo commented on September 1, 2024

That is an interesting prospect: can we have a tidy implementation of multiple imputation methods? I don't think that it belongs in rsample, mostly because I want to keep the scope of the package small and focused.

I think that it might be good to have a conversion function (or maybe a tidy method) that can take an imputation object and make it workable with purrr::map and other tidyverse components. Before I left Pfizer, we had a non-trivial data analysis workflow for a clinical trial that required more than a simple function call (say, to lm) to do the analysis, and we wrestled with how to do the MI with existing packages. A tidy approach would enable those types of analyses.

At first glance, it looks like Amelia (and mice and others) couple the imputation and the analysis. While that gives you a simple API, it does make things difficult if you want to control or modify the process. Perhaps they are decoupled in worker functions in those packages; I don't know enough about them. Perhaps the package authors would be interested in tidy approaches.
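
To make the decoupling idea concrete, here is a rough sketch (not an rsample feature, and not necessarily how either package intends to be used) that pulls the completed data sets out of a mice object as a plain list, runs an arbitrary analysis over them with purrr::map, and only then pools. It assumes the mice package and its bundled `nhanes` example data; the model formula is arbitrary.

```r
library(mice)
library(purrr)

# Step 1: imputation only. `nhanes` is a small example data set shipped with mice.
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: extract the m completed data sets as a list of data frames.
completed <- mice::complete(imp, action = "all")

# Step 3: run any analysis per data set. This could be an arbitrary
# multi-step function rather than a single lm() call.
fits <- map(completed, ~ lm(bmi ~ age + chl, data = .x))

# Step 4: combine the per-data-set fits with Rubin's rules.
pooled <- pool(as.mira(fits))
summary(pooled)
```

The point of the sketch is that step 3 is just a function mapped over a list, so it can be swapped for whatever workflow the analysis needs.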

I have some technical thoughts I could offer based on what I've learned in rsample, though.

(I must confess that I haven't used any multiple imputation methods (for inferential analysis) since graduate school; I'm usually worried about prediction, so a single imputation is usually how that's done.)

Now that I've written this, I realize that I'm rambling. What do you think?

jroberayalas commented on September 1, 2024

Thank you very much for your reply. I find your different ideas quite interesting. Currently, I'm comparing different indicators of cumulative blood pressure (BP) exposure, based on historical BP measures, to assess whether they can improve the performance of CVD predictive (Cox) models compared with those based on commonly used models. So far, I'm mostly following your examples using the recipes and rsample packages for survival analysis, since this seems a nice way to assess the importance of the cumulative BP indicators. However, the dataset I was using has some lipid variables (cholesterol, HDL, LDL, ...) with a high level of missingness (around 70%), which is why I asked about the possibility of merging Amelia with rsample, as the two seem to share a lot of features. In the end, I opted to simply omit the lipid variables, mainly because 70% missingness is too much and I do not think the models can benefit from them at all. Your examples with recipes and survival analysis are more appropriate for what I'm working on.
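
As an illustration of that kind of screening, here is a small base R sketch that computes the proportion of missing values per column and drops the columns above a cutoff. The data, the column names, and the 0.5 threshold are all hypothetical and not taken from the actual study.

```r
# Hypothetical data: `ldl` and `hdl` are mostly missing, `sbp` is not.
dat <- data.frame(
  sbp = c(120, 135, NA, 142, 128, 131, 150, 118, 125, 139),
  ldl = c(NA, NA, 3.1, NA, NA, NA, 2.8, NA, NA, 3.4),
  hdl = c(NA, 1.2, NA, NA, NA, 1.4, NA, NA, NA, 1.1)
)

# Proportion of missing values in each column.
miss_prop <- colMeans(is.na(dat))

# Keep only the columns at or below the (arbitrary) missingness threshold.
threshold <- 0.5
kept <- dat[, miss_prop <= threshold, drop = FALSE]

miss_prop
names(kept)
```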

I do agree that a tidy approach to MI packages could be quite useful, since a lot of health research (at least here in Oxford) seems to rely on MI to handle the uncertainty from missing values.

zq2323 commented on September 1, 2024

Thanks a lot for your discussion! I'm very interested in the "examples with recipes and survival analysis" you mentioned in your reply, but I can't find any link or resource for them. Would you mind sharing the example? I understand it may no longer be available after such a long time.

jroberayalas commented on September 1, 2024

The example I'm talking about can be found here: https://rsample.tidymodels.org/articles/Applications/Survival_Analysis.html

github-actions commented on September 1, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.
