
Comments (4)

Eric2Hamel commented on May 21, 2024

I looked at your first draft and it looks good for most AutoML algorithms like RandomSearch, TPE, GaussianProcess, Neuroevolution/GeneticAlgo, etc.

Just to make sure: when guessing the next best params (or when fitting), does the auto_ml_strategy have access to the probability distribution classes (pdf, rvs, etc.)? TPE needs the probability density function (pdf) and needs to sample from the distributions in order to guess the next best params, and other AutoML strategies will probably need rvs at least.
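As a minimal sketch of what such a distribution class could expose, assuming a scipy.stats-like interface (the `Uniform` class and everything in it is illustrative, not the actual Neuraxle code):

```python
import random

class Uniform:
    """Hypothetical hyperparameter distribution exposing both sampling
    (rvs) and density (pdf), in the spirit of scipy.stats."""

    def __init__(self, low: float, high: float):
        self.low = low
        self.high = high

    def rvs(self) -> float:
        # Draw one sample; RandomSearch only needs this method.
        return random.uniform(self.low, self.high)

    def pdf(self, x: float) -> float:
        # Density at x; TPE needs this to weigh candidate points.
        if self.low <= x <= self.high:
            return 1.0 / (self.high - self.low)
        return 0.0

# A TPE-like strategy could then both sample candidates and evaluate them:
dist = Uniform(0.001, 0.1)
candidate = dist.rvs()
density = dist.pdf(candidate)
```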

As for trial crashes, I know that hyperopt uses some kind of status to know whether the trial succeeded or not. We could use some kind of success flag as well. It should come from the score_function or the validation_technique, which would need to output a status flag.
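Something like the following could work, with `Trial` and `TrialStatus` being hypothetical names rather than an existing API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TrialStatus(Enum):
    SUCCESS = "success"
    FAILED = "failed"

@dataclass
class Trial:
    hyperparams: dict
    status: TrialStatus
    score: Optional[float] = None
    error: Optional[str] = None

# The score_function / validation_technique would set the status, so the
# AutoML strategy can skip failed trials when fitting on past history:
ok = Trial({"lr": 0.01}, TrialStatus.SUCCESS, score=0.93)
bad = Trial({"lr": 10.0}, TrialStatus.FAILED, error="loss diverged (NaN)")
```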

Most AutoML techniques can run in parallel. For instance, TPE needs the past history to suggest new points, but it is still faster to run multiple trials in parallel and update the hyperparams_repository a bit late than to wait for each trial to complete before starting the next one. As this article puts it: "The consequence of parallelization is that each proposal x∗ is based on less feedback. This makes search less efficient, though faster in terms of wall time". For parallelization, there should be some shared hyperparams_repository so that any Bayesian Optimization technique (GaussianProcess, TPE, Spearmint, etc.), and probably Genetic Algo as well, has access to the past history.
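As a sketch of what that shared repository could look like (all names assumed), a lock-protected trial store that workers snapshot before proposing:

```python
import threading

class SharedHyperparamsRepository:
    """Illustrative thread-safe store of past trials that parallel workers
    consult before proposing a new point (class and method names assumed)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._trials = []  # list of (hyperparams, score) pairs

    def save_trial(self, hyperparams: dict, score: float) -> None:
        with self._lock:
            self._trials.append((hyperparams, score))

    def load_all_trials(self) -> list:
        with self._lock:
            return list(self._trials)  # snapshot; may be slightly stale

# Each worker proposes from a possibly-stale snapshot of the history:
#   history = repo.load_all_trials()
#   next_params = strategy.guess_next_params(history)
# Less feedback per proposal, but better wall time, as the quote says.
```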

Only Hyperband is a bit particular. The algorithm speeds up the search by first allocating fewer resources (epochs) to unpromising training runs, then allocating more and more resources to the promising ones each round. You will not be able to do Hyperband with this abstraction, but Hyperband is not the same kind of abstraction anyway.
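For reference, a toy sketch of the successive-halving loop at the heart of Hyperband, assuming a `train_for(config, budget) -> score` callable is provided:

```python
def successive_halving(configs, train_for, budget=1, eta=3, max_budget=81):
    """Toy sketch of Hyperband's inner loop: train every config on a small
    epoch budget, keep the best 1/eta, then retrain the survivors with eta
    times more epochs."""
    while len(configs) > 1 and budget <= max_budget:
        scored = [(train_for(c, budget), c) for c in configs]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best first
        keep = max(1, len(configs) // eta)  # only survivors get more resources
        configs = [c for _, c in scored[:keep]]
        budget *= eta
    return configs[0]
```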

So I think that this is a really good starting point.


guillaume-chevalier commented on May 21, 2024

@Eric2Hamel Thanks for the thoughts!

Good catch on the error status; adding a status to the hyperparams_repository would for sure be a good thing, and perhaps adding a try-catch (or a timeout, when/if distributed?) would be interesting.
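For instance, a try-catch around trial execution could look like this rough sketch (the repository signature and score_function here are assumptions):

```python
def run_trial_safely(pipeline, hyperparams, data, labels, repo, score_function):
    # Illustrative wrapper (names assumed): any exception marks the trial
    # as failed in the repository instead of crashing the AutoML loop.
    # A timeout could be enforced similarly in a distributed setting.
    try:
        pipeline = pipeline.set_hyperparams(hyperparams).fit(data, labels)
        score = score_function(pipeline, data, labels)
        repo.save_trial(hyperparams, score=score, status="success")
    except Exception as e:
        repo.save_trial(hyperparams, score=None, status="failed", error=str(e))
```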

For the PDFs, I thought about it and I'd need to add those methods to each of our Distribution classes; I'm not really sure how to express/formulate those PDFs in the code of those distributions (help wanted). For now, the auto_ml_strategy already has access to those distribution classes, because we pass it the result of calling wrapped_pipeline.get_hyperparams_space, which is a HyperparameterSpace dict containing the distribution instances.
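So, for example, a RandomSearch-style strategy could already guess the next params from that dict alone; a minimal sketch assuming the rvs() method discussed above:

```python
def guess_next_params(hyperparams_space: dict) -> dict:
    # The space is a dict of hyperparam name -> distribution instance,
    # so RandomSearch-style guessing is just one rvs() call per entry.
    return {name: dist.rvs() for name, dist in hyperparams_space.items()}

# space = wrapped_pipeline.get_hyperparams_space()  # HyperparameterSpace dict
# next_trial_params = guess_next_params(space)
```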

Also, yeah, when suggesting the next trial it could probably also suggest a percentage of data to try on, or something like that, so as to do something like Hyperband; that'd be an interesting method to add to the auto_ml_strategy. However, if in Hyperband the next trials are the continuation of the first ones, then it'd perhaps need a new class different from AutoMLSequentialWrapper. I'd like to have suggestions on this too.
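A rough sketch of what that extra method might look like, with every name here being hypothetical:

```python
class BudgetAwareStrategy:
    """Sketch of the extra method discussed above (all names assumed):
    suggest not only hyperparams but also what fraction of the data
    (or of the epochs) the next trial deserves, Hyperband-style."""

    def suggest_next_trial(self, history: list, space: dict):
        hyperparams = {name: dist.rvs() for name, dist in space.items()}
        data_fraction = 0.1 if len(history) < 10 else 1.0  # cheap early probes
        return hyperparams, data_fraction
```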

You also somehow remind me that I haven't yet included the wrapped_pipeline.inspect() method that I discussed at the conference, where after training it'd be possible to save extra features of the trained model (e.g., mean and std of the neurons' weights, the training loss curve and the validation loss curve, etc.). That'd be another thing to add to the hyperparams_repository.
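A sketch of what inspect() might return, where the weight and loss accessors on the trained pipeline are assumptions rather than an existing API:

```python
import statistics

def inspect(trained_pipeline) -> dict:
    """Sketch of a possible inspect(): after training, collect extra
    features of the trained model to store with the trial in the
    hyperparams_repository."""
    weights = trained_pipeline.get_flattened_weights()  # assumed accessor
    return {
        "weights_mean": statistics.mean(weights),
        "weights_std": statistics.stdev(weights),
        "train_loss_curve": trained_pipeline.train_losses,            # assumed
        "validation_loss_curve": trained_pipeline.validation_losses,  # assumed
    }
```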


Eric2Hamel commented on May 21, 2024


guillaume-chevalier commented on May 21, 2024

@Eric2Hamel Nice! You can definitely add the pdf to the hyperparameter distributions. I'd suggest doing it for one distribution in a first PR, for a first review, and then proceeding to all the other ones. I'm mostly unsure of how, for example, Hyperopt and other algorithms would query the PDF to use it (e.g., min, max and mean? A continuous PDF function? A discrete array given a PDF resolution? etc.).
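To make the question concrete, here are two of those query shapes as a sketch, assuming a distribution object with pdf(x) plus min()/max() bounds accessors (the bounds accessors are an assumption):

```python
def pdf_at(dist, x: float) -> float:
    # Continuous query: evaluate the density at a single point.
    return dist.pdf(x)

def pdf_discretized(dist, resolution: int = 100) -> list:
    # Discrete query: sample the density on an evenly spaced grid between
    # the distribution's bounds, for optimizers that want an array.
    lo, hi = dist.min(), dist.max()
    step = (hi - lo) / (resolution - 1)
    return [dist.pdf(lo + i * step) for i in range(resolution)]
```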

Also, it more and more sounds like Hyperband and Hyperband-like algorithms would be another meta wrapper step, I think. To continue training, it'd need to use a kind of minibatch pipeline, limit the number of epochs, and resume from there, with pipeline serialization between each step.
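A rough sketch of that resume-between-rounds idea, where `save`/`load` stand in for whatever pipeline serialization mechanism ends up being used:

```python
def train_one_round(pipeline, data, labels, epochs: int, save, load, path=None):
    """Sketch of continuing a trial across Hyperband rounds: reload the
    previous checkpoint if any, train a few more epochs, save again.
    All names here are placeholders, not an existing API."""
    if path is not None:
        pipeline = load(path)  # continue where the last round stopped
    for _ in range(epochs):
        pipeline = pipeline.fit(data, labels)  # e.g. one minibatched epoch
    return save(pipeline)  # checkpoint for the next round; returns a path
```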

