
Comments (7)

alexpghayes commented on June 25, 2024

In general I'm a firm believer that PCA should default to a truncated SVD implementation (either irlba or RSpectra) and only switch to a full SVD when the user requests a large share of the components, something like num_comp > p / 4. It would also be nice to have a randomized SVD implementation (perhaps the rsvd package) for larger datasets, perhaps as step_pca_approximate().
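
As a rough illustration of the dispatch rule described above, here is a minimal Python/NumPy sketch. It is not how recipes or embed implement step_pca; the thread concerns R packages, and every name here (choose_svd_method, pca_full, pca_randomized, the oversample and n_iter defaults) is hypothetical. A randomized range finder stands in for the truncated solver, so it is closer in spirit to the rsvd suggestion than to irlba or RSpectra, which use different iterative algorithms.

```python
import numpy as np


def choose_svd_method(num_comp: int, p: int) -> str:
    """Heuristic from the comment: full SVD only when num_comp > p / 4."""
    return "full" if num_comp > p / 4 else "truncated"


def pca_full(x: np.ndarray, num_comp: int):
    """PCA through a complete thin SVD of the column-centered data."""
    xc = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    # Loadings are the leading right singular vectors (p x num_comp).
    return vt[:num_comp].T, s[:num_comp]


def pca_randomized(x: np.ndarray, num_comp: int,
                   oversample: int = 10, n_iter: int = 4, seed: int = 0):
    """Approximate PCA via a randomized range finder with power iterations.

    Assumption: this replaces the truncated solver named in the thread.
    """
    rng = np.random.default_rng(seed)
    xc = x - x.mean(axis=0)
    n, p = xc.shape
    k = min(num_comp + oversample, n, p)

    # Project onto k random directions and orthonormalize the result.
    q, _ = np.linalg.qr(xc @ rng.standard_normal((p, k)))
    # Power iterations sharpen the captured subspace; QR keeps them stable.
    for _ in range(n_iter):
        z, _ = np.linalg.qr(xc.T @ q)
        q, _ = np.linalg.qr(xc @ z)

    # A small k x p matrix now carries the leading singular structure.
    b = q.T @ xc
    _, s, vt = np.linalg.svd(b, full_matrices=False)
    return vt[:num_comp].T, s[:num_comp]


def pca(x: np.ndarray, num_comp: int):
    """Dispatch between the two solvers using the p / 4 heuristic."""
    if choose_svd_method(num_comp, x.shape[1]) == "full":
        return pca_full(x, num_comp)
    return pca_randomized(x, num_comp)
```

Calling pca(x, 5) on a matrix with 40 columns would take the approximate path, since 5 is not greater than 40 / 4. The threshold is only the rough figure floated in the comment, not a tuned value.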

from embed.

dgrtwo commented on June 25, 2024

Did this issue get resolved in #83, or should it be kept open for more step variants?

I don't think this is resolved, since step_pca still uses full PCA by default, and the above reprex (getting 5 principal components from a dataset with 62k observations) is still slow. I agree with Alex that the common use case can be made much faster by making a truncated SVD the default:

In general I'm a firm believer that PCA should default to a truncated SVD implementation (either irlba or RSpectra) and only switch to a full SVD when the user requests a large share of the components, something like num_comp > p / 4.

But maybe this issue belongs in the recipes package, since that's where step_pca lives?


juliasilge commented on June 25, 2024

Related to #73

We are definitely interested in functionality like this! This is mostly implemented already so we'll get a draft PR ready and would love some feedback on it and/or more contributions. We are fairly sure we want to include this in embed, along with a Bayesian implementation of sparse PCA.


alexpghayes commented on June 25, 2024

Also cc @topepo: https://github.com/DataSlingers/MoMA is a high-quality sparse PCA implementation by Michael Weylandt (of the high-quality glmnet replacement implementation).


EmilHvitfeldt commented on June 25, 2024

Did this issue get resolved in #83, or should it be kept open for more step variants?


topepo commented on June 25, 2024

I'd add an alternate PCA step here. Those package dependencies are a pita and I'd keep them here.


github-actions commented on June 25, 2024

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

