Comments (7)
In general I'm a firm believer that PCA should default to a truncated SVD implementation (either `irlba` or `RSpectra`) and only switch to a full SVD when the user requests something like `num_comp > p / 4` or something like that. It would also be nice to have a randomized SVD implementation (perhaps the `rsvd` package) for larger datasets, perhaps as `step_pca_approximate()`.
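As a rough illustration of the heuristic described in this comment, here is a minimal sketch in R. The helper name `pca_auto()` and its `threshold` argument are hypothetical and not part of recipes or embed. It assumes the `irlba` package may or may not be installed, and falls back to base R's `prcomp()` when it is not.

```r
# Hypothetical helper (not part of recipes or embed): pick a PCA backend
# based on how many components are requested relative to the number of columns.
pca_auto <- function(x, num_comp, threshold = 1 / 4) {
  x <- as.matrix(x)
  p <- ncol(x)

  # Use a truncated SVD only when few components are requested
  # and the irlba package is available.
  use_truncated <- num_comp <= p * threshold &&
    requireNamespace("irlba", quietly = TRUE)

  if (use_truncated) {
    # Truncated SVD: only the leading num_comp components are computed.
    fit <- irlba::prcomp_irlba(x, n = num_comp, center = TRUE, scale. = FALSE)
    method <- "truncated"
  } else {
    # Full decomposition; rank. only trims the components that are returned.
    fit <- stats::prcomp(x, center = TRUE, scale. = FALSE, rank. = num_comp)
    method <- "full"
  }

  list(method = method, rotation = fit$rotation, scores = fit$x)
}

# Simulated data, purely for illustration.
set.seed(1)
x <- matrix(rnorm(2000 * 40), nrow = 2000, ncol = 40)
res <- pca_auto(x, num_comp = 5)
dim(res$scores)
```

The `p / 4` cutoff is taken from the comment above and is a starting point rather than a tuned value. A randomized SVD backend such as the `rsvd` package could be slotted in as a third branch for larger datasets.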
from embed.
> Did this issue get resolved in #83 or should it be kept open for more step variants?
I don't think this is resolved, since `step_pca` still uses full PCA by default, and the above reprex (getting 5 principal components from a dataset with 62k observations) is still slow. I agree with Alex that it can be made much faster in the common use case by making truncated SVD the default:
> In general I'm a firm believer that PCA should default to a truncated SVD implementation (either irlba or RSpectra) and only switch to a full SVD when the user requests something like num_comp > p / 4 or something like that

But maybe this issue belongs in the recipes package, since that's where `step_pca` lives?
Related to #73
We are definitely interested in functionality like this! This is mostly implemented already so we'll get a draft PR ready and would love some feedback on it and/or more contributions. We are fairly sure we want to include this in embed, along with a Bayesian implementation of sparse PCA.
Also cc @topepo https://github.com/DataSlingers/MoMA is a high quality sparse PCA implementation by Michael Weylandt (of the high quality `glmnet` replacement implementation)
Did this issue get resolved in #83 or should it be kept open for more step variants?
I'd add an alternate PCA step here. Those package dependencies are a pita and I'd keep them here.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.