Comments (6)
I think this is great, but it would imitate a flaw of the vanilla sklearn design: you don't specify which variables/covariates the transformer ought to be applied to.
It makes sense to apply it by default to "all applicable" (e.g., one-hot to all categorical), but that's not always what one wants (e.g., PCA only to the variables coming from the questionnaire, etc.).
A potential solution is to specify pairs of transformer and variable names?
from mlj.jl.
Good point! However, making a column-selective transformer is, in my view, just a different kind of learning network, for which we could have another macro (or composite model type). So, e.g., you would do:
pca = PCA(n=2) # instantiate a PCA model
restricted_pca = @restrict pca features=[:x1, :x2, :x3, :x4] # defines and instantiates a restricted PCA model
composite_model = @pipeline restricted_pca random_forest # defines and instantiates transformer-predictor composite
from mlj.jl.
But would that not kill all other features entirely rather than just apply PCA only to these?
from mlj.jl.
No, no. It splits the incoming data into two, applies PCA to one part, then reassembles. Maybe "restrict" is a bad name. Perhaps "selective" is better. I am not going into the implementation here.
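To make the split/transform/reassemble idea concrete, here is a minimal sketch in plain Julia. It is not MLJ code and not how `@restrict` or `@pipeline` would be implemented: `apply_to_selected` and `toy_pca` are hypothetical names invented for illustration, the table is assumed to be a NamedTuple of column vectors, and the "PCA" is a bare SVD projection.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not an MLJ function): apply `f` to the columns named in
# `features` and pass every other column through unchanged.
function apply_to_selected(f, table::NamedTuple, features)
    selected   = NamedTuple{Tuple(features)}(table)       # columns to transform
    rest_names = filter(n -> n ∉ features, keys(table))   # everything else
    rest       = NamedTuple{rest_names}(table)
    return merge(rest, f(selected))                       # reassemble
end

# Toy PCA for illustration only: centre the columns and project onto the
# first `n` right singular vectors.
function toy_pca(cols::NamedTuple; n=2)
    X  = hcat(values(cols)...)          # rows are observations
    Xc = X .- mean(X, dims=1)
    V  = svd(Xc).V[:, 1:n]
    Z  = Xc * V
    names = Tuple(Symbol("pc", j) for j in 1:n)
    return NamedTuple{names}(Tuple(Z[:, j] for j in 1:n))
end

data = (x1 = [1.0, 2.0, 3.0, 4.0, 5.0],
        x2 = [2.0, 1.0, 4.0, 3.0, 6.0],
        x3 = [0.5, 0.1, 0.9, 0.3, 0.7],
        x4 = [5.0, 3.0, 4.0, 1.0, 2.0],
        age = [31, 45, 27, 60, 52])

result = apply_to_selected(c -> toy_pca(c; n=2), data, [:x1, :x2, :x3, :x4])
keys(result)   # (:age, :pc1, :pc2): `age` survives, x1..x4 are replaced
```

In this sketch the untouched column `age` is carried through, and only the selected columns are replaced by the transformer's output.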
from mlj.jl.
Ah, makes sense.
Though if you use "select" it would leave open how to tell the pipeline to transform resulting variables, i.e., new ones that are produced by the first transformer.
At least one of two issues arises, depending on the design:
(a) "by default apply to all, sequentially" - this has the problem that a simple pipeline either kills variables, or has problems with selection
(b) "by default, add variables" - this has the problem that it is not straightforward how to chain transformers, since you would need to refer to the resulting variables somehow
Your comment indicates that you favour (b)? Or, do you have an idea which altogether avoids the issues? E.g., a default convention for output variables?
from mlj.jl.
Implemented some time ago. Query ?@pipeline for details.
from mlj.jl.