Comments (4)
I looked at your first draft and it looks good for most AutoML algorithms like RandomSearch, TPE, GaussianProcess, Neuroevolution/GeneticAlgo, etc.
Just to make sure: when guessing the next best params (or when fitting), does the auto_ml_strategy have access to the probability distribution classes (pdf, rvs, etc.)? TPE needs the probability density function (pdf) and needs to sample from the probability distributions to guess the next best params, and most other AutoML strategies will probably need at least rvs.
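To make the question concrete, here is a minimal sketch (a hypothetical distribution class, not Neuraxle's actual API) exposing both rvs() for sampling and pdf() for density queries, which are the two calls a TPE-like strategy would make:

```python
import math
import random

class LogUniform:
    """Hypothetical hyperparameter distribution exposing the two
    methods a TPE-like strategy would need: rvs() to sample a value
    and pdf() to evaluate the density at a point."""

    def __init__(self, min_value: float, max_value: float):
        self.log_min = math.log(min_value)
        self.log_max = math.log(max_value)

    def rvs(self) -> float:
        # Sample uniformly in log space, then map back with exp().
        return math.exp(random.uniform(self.log_min, self.log_max))

    def pdf(self, x: float) -> float:
        # Log-uniform density: 1 / (x * (log(max) - log(min))) inside
        # the support, 0 outside of it.
        if x <= 0.0 or not (self.log_min <= math.log(x) <= self.log_max):
            return 0.0
        return 1.0 / (x * (self.log_max - self.log_min))
```

With such an interface, a strategy can both draw candidates (rvs) and weigh observed points by their prior density (pdf).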
As for trial crashes, I know that hyperopt uses some kind of status to know whether a trial succeeded or not. We could use a similar success flag; it should come from the score_function or the validation_technique, which would need to output a status flag.
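A minimal sketch of that idea (names are hypothetical): wrap each trial in a try/except and record a status flag alongside the score, so one crashing trial doesn't kill the whole AutoML loop:

```python
from enum import Enum

class TrialStatus(Enum):
    SUCCESS = "success"
    FAILED = "failed"

def run_trial(pipeline_fit_fn, hyperparams: dict) -> dict:
    """Run one trial and record a status flag instead of letting a
    crash abort the whole search (sketch; names are hypothetical)."""
    try:
        score = pipeline_fit_fn(hyperparams)
        return {"hyperparams": hyperparams, "score": score,
                "status": TrialStatus.SUCCESS}
    except Exception as err:
        # Failed trials are still saved so the strategy can avoid
        # re-suggesting the same crashing region of the space.
        return {"hyperparams": hyperparams, "score": None,
                "status": TrialStatus.FAILED, "error": str(err)}
```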
Most AutoML techniques can run in parallel. For instance, TPE needs the past history to suggest new points, but it is still faster to run multiple trials in parallel and update the hyperparams_repository slightly less often than to wait for each trial to complete before starting the next. As this article puts it: "The consequence of parallelization is that each proposal x∗ is based on less feedback. This makes search less efficient, though faster in terms of wall time". For parallelization, there should be a shared hyperparams_repository so that any Bayesian Optimization technique (GaussianProcess, TPE, Spearmint, etc.), and probably Genetic Algorithms too, can access the past history.
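The shared-repository idea could look like the following sketch (thread-based for illustration; a real implementation might use files, a database, or a multiprocessing manager instead):

```python
import threading

class SharedHyperparamsRepository:
    """Thread-safe trial store shared by parallel workers (a sketch
    with hypothetical names, not the actual Neuraxle API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._trials = []

    def save_trial(self, hyperparams: dict, score: float):
        with self._lock:
            self._trials.append((hyperparams, score))

    def load_all_trials(self):
        # Each worker reads the full history before suggesting a new
        # point, even while other trials are still running.
        with self._lock:
            return list(self._trials)
```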
Only Hyperband is a bit particular. The algorithm speeds up the search by first allocating few resources (epochs) to unpromising training runs, then allocating more and more resources to the more promising runs at each round. You will not be able to do Hyperband with this abstraction, but Hyperband is not the same kind of abstraction anyway.
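For reference, the resource-allocation loop at the heart of Hyperband (successive halving) can be sketched like this. Note that this naive version re-trains from scratch each round, whereas real Hyperband resumes the surviving runs, which is exactly why it doesn't fit the current abstraction:

```python
def successive_halving(configs, train_fn, min_epochs=1, eta=3):
    """Successive-halving sketch: train every config for a small epoch
    budget, keep the top 1/eta by score, multiply the budget by eta,
    and repeat until one config remains.
    train_fn(config, epochs) -> score (higher is better); hypothetical."""
    epochs = min_epochs
    while len(configs) > 1:
        scored = [(train_fn(cfg, epochs), cfg) for cfg in configs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(configs) // eta)
        configs = [cfg for _, cfg in scored[:keep]]
        epochs *= eta  # survivors get eta times more resources
    return configs[0]
```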
So I think this is a really good starting point.
@Eric2Hamel Thanks for the thoughts!
Good catch on the error status; adding a status to the hyperparams_repository would for sure be a good thing. Perhaps adding a try-catch (or a timeout when/if distributed?) would be interesting too.
For the PDFs, I thought about it and I'd need to add those methods to each of our Distribution classes. I'm not really sure how to express/formulate those PDFs in the code of those distributions (help wanted). For now, the auto_ml_strategy already has access to those distribution classes because we pass it the result of calling wrapped_pipeline.get_hyperparams_space(), which is a HyperparameterSpace dict containing the distribution instances.
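As a rough illustration of that flow (class and method names here are hypothetical, not the exact Neuraxle API): the strategy receives the space as a dict of distribution instances and can call rvs() on each one, which is all a random search needs:

```python
import random

class Uniform:
    """Stand-in distribution with the rvs() interface the strategy
    expects (hypothetical; a real space would hold Neuraxle's own
    distribution instances)."""
    def __init__(self, low, high):
        self.low, self.high = low, high
    def rvs(self):
        return random.uniform(self.low, self.high)

class RandomSearchStrategy:
    """Minimal strategy sketch: sample every hyperparameter in the
    received space independently to propose the next trial."""
    def guess_next_params(self, hyperparams_space: dict) -> dict:
        return {name: dist.rvs() for name, dist in hyperparams_space.items()}
```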
Also, yeah, when suggesting the next trial, the strategy could probably suggest a percentage of data to train on, or something like that, so as to do something like Hyperband; that'd be an interesting method to add to the auto_ml_strategy. However, if in Hyperband the next trials are continuations of the first ones, then it'd perhaps need a new class different from AutoMLSequentialWrapper. I'd like suggestions on this too.
You also somehow remind me that I haven't yet included the wrapped_pipeline.inspect() method that I discussed in the conference, where after training it'd be possible to save extra features of the trained model (e.g.: average and std of the neurons' weights, the train loss curve and validation loss curve, etc.). That'd be another thing to add to the hyperparams_repository.
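Such an inspect()-style hook could boil down to collecting a dict of post-training statistics to store with the trial. A sketch, with hypothetical function name and fields:

```python
import statistics

def inspect_trained_model(weights_per_layer, train_losses, val_losses) -> dict:
    """Collect extra features of a trained model to save alongside the
    trial in the hyperparams repository (sketch; names hypothetical)."""
    return {
        "weight_means": [statistics.mean(w) for w in weights_per_layer],
        "weight_stds": [statistics.pstdev(w) for w in weights_per_layer],
        "train_loss_curve": list(train_losses),
        "validation_loss_curve": list(val_losses),
    }
```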
@Eric2Hamel Nice! You can definitely add the pdf of the hyperparameter distributions. I'd suggest doing it for one distribution in a first PR for a first review, and then proceeding to all the other ones. I'm mostly unsure how, for example, Hyperopt and other algorithms would query the PDF to use it (e.g.: min, max, and mean? A continuous PDF function? A discrete array given a PDF resolution? etc.).
Also, it more and more sounds like Hyperband and Hyperband-like algorithms would be another meta wrapper step, I think. To continue training, it'd need to use a kind of minibatch pipeline, limit the number of epochs, and resume from there with pipeline serialization between each step.
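The train-some-epochs/serialize/resume cycle could be sketched like this (pickle stands in for whatever serialization the pipeline uses; names are hypothetical):

```python
import os
import pickle
import tempfile

def train_in_rounds(model, train_one_epoch_fn, epochs_per_round, checkpoint_path):
    """Hyperband-friendly training sketch: run a budget of epochs,
    serialize the model, and resume from the checkpoint on the next
    call. train_one_epoch_fn(model) -> model; names hypothetical."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            model = pickle.load(f)  # resume where the last round stopped
    for _ in range(epochs_per_round):
        model = train_one_epoch_fn(model)
    with open(checkpoint_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

Each round only pays for its own epoch budget, which is what lets a Hyperband-like wrapper give more epochs to the promising trials later.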