
dehb's Introduction

DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization


Installation

# from pypi
pip install dehb

# to run examples, install from github
git clone https://github.com/automl/DEHB.git
pip install -e DEHB  # -e stands for editable, lets you modify the code and rerun things

Tutorials/Example notebooks

To run the PyTorch example (note the additional requirements):

python examples/03_pytorch_mnist_hpo.py \
    --min_budget 1 \
    --max_budget 3 \
    --runtime 60 \
    --verbose

Running DEHB in a parallel setting

DEHB is designed to interface with a Dask client. It can either create a Dask client during instantiation and close it during garbage collection, or an existing client can be passed as an argument during instantiation (both modes are sketched after the list below).

  • Setting n_workers during instantiation
    If set to 1 (default) then the entire process is a sequential run without invoking Dask.
    If set to >1 then a Dask Client is initialized with as many workers as n_workers.
    This parameter is ignored if client is not None.
  • Setting client during instantiation
    When None (default), a Dask client is created using the specified n_workers.
    Else, any custom-configured Dask Client can be created and passed as the client argument to DEHB.
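
A minimal sketch of both modes (target_function and config_space are placeholders, and required constructor arguments beyond n_workers and client may vary across versions):

from dask.distributed import Client
from dehb import DEHB

# Option 1: DEHB spawns its own Dask client with 4 workers and
# tears it down itself
dehb = DEHB(f=target_function, cs=config_space, min_budget=1,
            max_budget=3, n_workers=4)

# Option 2: pass a custom-configured client; n_workers is then ignored
client = Client(n_workers=4, threads_per_worker=1)
dehb = DEHB(f=target_function, cs=config_space, min_budget=1,
            max_budget=3, client=client)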

Using GPUs in a parallel run

Certain target function evaluations (especially for deep learning) require computations to be carried out on GPUs. GPU devices are ordered by device ID, and if not configured otherwise, all spawned worker processes access the devices in the same order and can either run out of memory or fail to exhibit parallelism.

For n_workers>1 when running on a single node (or locally), single_node_with_gpus can be passed to DEHB's run() call. Setting it to False (default) leaves the machine's default setup untouched. Setting it to True reorders the GPU device IDs dynamically by setting the environment variable CUDA_VISIBLE_DEVICES for each worker process executing a target function evaluation. The reordering gives first priority to the device with the fewest active jobs assigned to it by that DEHB run.
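
A minimal sketch of the reordering logic (illustrative only, not DEHB's actual code; the per-device job counts are assumed to be tracked elsewhere):

import os

def reorder_gpus(active_jobs_per_device):
    """Expose devices with the fewest active jobs first (sketch)."""
    ordered = sorted(active_jobs_per_device, key=active_jobs_per_device.get)
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in ordered)

reorder_gpus({0: 2, 1: 0, 2: 1})
print(os.environ["CUDA_VISIBLE_DEVICES"])  # "1,2,0"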

To run the PyTorch MNIST example on a single node using 2 workers:

python examples/03_pytorch_mnist_hpo.py \
    --min_budget 1 \
    --max_budget 3 \
    --runtime 60 \
    --n_workers 2 \
    --single_node_with_gpus \
    --verbose

Multi-node runs

Multi-node parallelism is often contingent on the cluster setup it is deployed on. Dask provides useful frameworks to interface with various cluster designs. As long as the client passed to DEHB during instantiation is of type dask.distributed.Client, DEHB can interact with it and distribute its optimization process in parallel.

For instance, the Dask CLI can be used to create a dask-scheduler, which can dump its connection details to a file on a cluster node accessible to all processes. Multiple dask-worker processes can then be created to interface with the dask-scheduler by reading the connection details from the dumped file. Each dask-worker can be launched on any remote machine and configured as required, including mapping to specific GPU devices.

Some helper scripts can be found in the repository's utils/ directory; they can serve as a reference for running DEHB in a multi-node manner on clusters managed by SLURM (not expected to work off-the-shelf).

To run the PyTorch MNIST example on a multi-node setup using 4 workers:

bash utils/run_dask_setup.sh \
    -f dask_dump/scheduler.json \  # This is how the workers will be discovered by DEHB
    -e env_name \
    -n 4

# Make sure to sleep to allow the workers to setup properly
sleep 5
python examples/03_pytorch_mnist_hpo.py \
    --min_budget 1 \
    --max_budget 3 \
    --runtime 60 \
    --scheduler_file dask_dump/scheduler.json \
    --verbose

DEHB Hyperparameters

We recommend the default settings, which were chosen based on ablation studies over a collection of diverse problems and found to be generally useful across all cases tested. However, the parameters remain available for tuning to a specific problem.

The Hyperband components:

  • min_budget: Needs to be specified for every DEHB instantiation and is used in determining the budget spacing for the problem at hand.
  • max_budget: Needs to be specified for every DEHB instantiation. Represents the full-budget evaluation or the actual black-box setting.
  • eta: (default=3) Controls the aggressiveness of Hyperband's early stopping by retaining the top 1/eta configurations in every successive-halving round (a worked budget-spacing example follows this list)
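
As a worked example of the resulting budget spacing under the standard geometric scheme (a sketch; DEHB's internals may differ in rounding):

import numpy as np

min_budget, max_budget, eta = 1, 27, 3
# geometric spacing: budgets grow by a factor of eta from min to max
s_max = int(np.floor(np.log(max_budget / min_budget) / np.log(eta)))
budgets = [min_budget * eta**i for i in range(s_max + 1)]
print(budgets)  # [1, 3, 9, 27]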

The DE components:

  • strategy: (default=rand1_bin) Chooses the mutation and crossover strategies for DE. rand1 denotes the mutation strategy, while bin denotes the binomial crossover strategy (a generic sketch of rand1_bin follows this list).
    Other mutation strategies include: {rand2, rand2dir, best, best2, currenttobest1, randtobest1}
    Other crossover strategies include: {exp}
    Mutation and crossover strategies can be combined with a _ separator, e.g. rand2dir_exp.
  • mutation_factor: (default=0.5) A fraction within [0, 1] weighing the difference operation in DE
  • crossover_prob: (default=0.5) A probability within [0, 1] weighing the traits from a parent or the mutant
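
For concreteness, a generic sketch of the rand1_bin operator as defined in canonical DE (illustrative, not DEHB's exact implementation):

import numpy as np

rng = np.random.default_rng(0)

def rand1_bin(population, target_idx, mutation_factor=0.5, crossover_prob=0.5):
    """rand1 mutation followed by binomial crossover (canonical DE)."""
    n, d = population.shape
    # rand1: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct
    candidates = [i for i in range(n) if i != target_idx]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    mutant = population[r1] + mutation_factor * (population[r2] - population[r3])
    # bin: inherit each trait from the mutant with probability crossover_prob,
    # guaranteeing at least one mutant trait
    cross = rng.random(d) < crossover_prob
    cross[rng.integers(d)] = True
    trial = np.where(cross, mutant, population[target_idx])
    return np.clip(trial, 0, 1)  # assuming configs live in the unit hypercube

print(rand1_bin(rng.random((10, 4)), target_idx=0))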

To cite the paper or code

@inproceedings{awad-ijcai21,
  author    = {N. Awad and N. Mallik and F. Hutter},
  title     = {{DEHB}: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization},
  pages     = {2147--2153},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on
               Artificial Intelligence, {IJCAI-21}},
  publisher = {ijcai.org},
  editor    = {Z. Zhou},
  year      = {2021}
}


dehb's Issues

Introduce IDs for evaluated configurations

Some advantages of assigning a unique ID to each configuration could be:

  • Better logging (e.g. better readability, result dicts for each configuration)
  • Checkpointing (and restarting) an optimization run

Regarding the implementation, I would suggest using incrementing IDs. However, we would need a check every time we sample/generate a new configuration, testing whether it has been used before. For this check I suggest using the vector representation of the configuration, so that both using and not using ConfigSpace is supported out of the box.

Using an element-wise comparison between each hyperparameter specified in the config could be costly, namely O(n*d) where n is the number of sampled configs and d is the dimensionality of the configuration (the number of hyperparameters). However, since d tends to be rather small, I do not expect this to be a drastic computational overhead.

An alternative could be hashing the configurations and then using this hash as an ID directly, which would result in a faster comparison. However hash collisions could map different configs to the same ID.
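
A minimal sketch combining both ideas: the vector representation is hashed for the lookup (a Python dict falls back to full-key comparison on hash collision, so distinct configs cannot be merged), while incrementing IDs are handed out. All names here are hypothetical, not DEHB's actual classes:

class ConfigRegistry:
    """Hands out incrementing IDs; duplicate configs map to the same ID."""

    def __init__(self):
        self._ids = {}

    def get_id(self, vector):
        # hashing the vector (as a tuple) makes the duplicate check O(d)
        # on average instead of an O(n*d) scan over all sampled configs
        key = tuple(float(v) for v in vector)
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]

registry = ConfigRegistry()
print(registry.get_id([0.1, 0.5]), registry.get_id([0.1, 0.5]))  # 0 0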

Improve readability of DEHB examples and docstrings

It would be useful to have more elaborate explanatory text in the notebooks serving as examples for using DEHB.

It would also be useful to have more docstrings that explain DEHB's function arguments. The example notebooks should additionally highlight how information on the arguments can be retrieved, for example via ?dehb.run.

Implement versioning for documentation

Giving the user the opportunity to select the documentation of the specific package version they are using would increase the package's quality and usability.

Always training from scratch?

Hello! I'm interested in using DEHB for HPO of neural networks.
But I couldn't find any code related to model checkpointing. Does training for every budget start from scratch?

Updating and populating the documentation

Currently our documentation consists solely of what is extracted from docstrings. A landing page with a project overview, instructions on how to install the package, and perhaps a quick usage example would greatly improve the documentation.

Handling min_budget = max_budget situation

When specifying the minimum budget equal to the maximum budget, DEHB performs plain DE. There are two potential ways of dealing with this:

  1. Break and throw an error communicating to the user that this setup would result in plain DE and therefore won't be run, or
  2. Emit a warning and perform DE with a population consisting of one individual on the max_budget; however, only random configs are then used for mutation.

Personally, I would prefer option 2, since I do not feel the need to restrict the user. However, it should be clearly communicated that this setup does not perform DEHB.

Edit: Currently we break and throw an error. However, if we skip the assertion, it runs through normally --> TODO: check what happens under the hood

Using Dask Client as context manager

As described by @eddiebergman in #45:

Relying on __del__ for cleanup is usually bad practice and can lead to problems if DEHB were the last thing to be cleaned up.
The correct way is that Dask Clients can be used as context managers, which is likely what you want around your hot loop. In the case of a user-supplied Dask client, I am not sure what you want; it might be bad form to automatically shut down their client unless they've specified it somehow.

Before implementing this, we should definitely discuss how to handle the situation where the user supplies their own client. I agree with Eddie that simply shutting it down might be bad form.
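
A minimal sketch of the suggested pattern, with _optimize as a hypothetical stand-in for the optimization loop: a client DEHB creates itself is closed by the context manager, while a user-supplied client is left untouched:

from dask.distributed import Client

def _optimize(client):
    # stand-in for DEHB's optimization loop
    return client.submit(sum, [1, 2, 3]).result()

def run_with_client(user_client=None, n_workers=2):
    if user_client is not None:
        return _optimize(user_client)  # never shut down a user's client
    with Client(n_workers=n_workers) as client:  # closed automatically on exit
        return _optimize(client)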

Adhering to computation cost budget better

The current implementation waits for all started jobs when the runtime budget is exhausted. This makes sense when using function evaluations or the number of iterations as budget, but not when specifying the maximum computation cost in seconds.

Toy failure mode:
The computational budget is 1h, but a new job that would take e.g. 30 minutes is submitted after 59 minutes of optimization. The optimizer would then wait for this job to finish and therefore overshoot the maximum computational budget of 1h.

For now, a quick fix could be simply stopping all workers when the runtime budget is exhausted; however, this would potentially lose compute time. It might therefore also be interesting to think of a way to checkpoint the optimizer's state in order to resume later.
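
As a rough sketch of an alternative, submission could be gated on an estimate of the job's cost against the remaining wall-clock budget (all names hypothetical):

import time

def should_submit(start_time, total_cost, estimated_job_cost):
    """Submit only if the job is expected to finish within the budget."""
    elapsed = time.time() - start_time
    return elapsed + estimated_job_cost <= total_cost

# a 30-minute job submitted 59 minutes into a 1-hour budget is rejected
print(should_submit(time.time() - 59 * 60, 3600, 30 * 60))  # False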

Improve doc strings and add type annotations

Improve the docstrings of DEHB to fit the Google Style Guide for Python comments. When adjusting the docstrings, it would also make sense to add type annotations to the signatures. These changes would improve the general readability of the code.

[Question] Understanding how the parent pool is generated

In the mutation step, when the parent pool is filled with candidates generated using the global population, it appears to me that this is carried out in the following steps:

  1. Promote the top K individuals from the lower budget
  2. Set M equal to the number of additional candidates that need to be generated for the chosen mutation strategy (e.g. 3 - K in the case of rand1)
  3. Construct the "global population" by taking the union of all budget subpopulations
  4. Distil the global population to a smaller size M by performing DE mutation M times using the global population, and then append each mutant vector to the parent pool
  5. Perform mutation once using the parent pool

Section 4.2 of the paper gave me the impression that "sampled from the global population pool" meant chosen at random, but the code seems to perform a prior round of mutation to generate parents, particularly at the highest budgets where very few configurations are promoted.

DE Optimiser return syntax is deprecated

Dear @Neeratyoy, please see the log from running the DE optimiser:

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dehb/optimizers/de.py:810: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return np.array(self.traj), np.array(self.runtime), np.array(self.history)
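
The warning comes from NumPy building an ndarray out of ragged nested sequences; a possible fix (a sketch, not a confirmed patch to de.py) is to pass dtype=object explicitly:

import numpy as np

# illustrative ragged trajectory, similar to what the warning points at
history = [[0.1, 2.0], [0.2, 3.0, 9.0]]
arr = np.array(history, dtype=object)  # no VisibleDeprecationWarning
print(arr.shape)  # (2,)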

Support for python 3.7

Hello!

Is there any issue preventing the current implementation from supporting Python 3.7? DEHB used to work with Oríon on 3.7 until last weekend. For some reason pip had not been enforcing the '>=3.8' dependency since February and only started doing so the previous weekend. Would it be possible to change the Python dependency to '>=3.7', or would changes to DEHB's code be required?

Thank you!

Configuration and budgets in each brackets

Hi, is it possible to expose how many configurations and which budgets are observed in each bracket?
This would make runs easier to inspect and manipulate.

I also have a question: if I wanted to evaluate 100 models with a resource of 100 epochs each, how should I set the parameters?
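
For reference, the number of configurations and the budgets per bracket in the standard Hyperband scheme can be computed as follows (a sketch of the textbook layout; DEHB's exact bookkeeping may differ):

import math

def bracket_layout(min_budget, max_budget, eta=3):
    """Print (n_configs, budget) per successive-halving rung and bracket."""
    s_max = int(math.log(max_budget / min_budget, eta))
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) / (s + 1) * eta**s)
        rungs = [(math.floor(n * eta**-i), max_budget * eta**(i - s))
                 for i in range(s + 1)]
        print(f"bracket {s}: {rungs}")

bracket_layout(min_budget=1, max_budget=27, eta=3)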

DEHB and non-async DE optimiser update design

Dear @Neeratyoy and @noorawad, according to the paper "DEHB: Evolutionary Hyperband for Scalable, Robust, and Efficient Hyperparameter Optimization," DEHB employs an immediate update design for DE by utilising the AsyncDE class. However, despite passing the async strategy argument in DEHB, it does not change the async strategy of the AsyncDE class, which by default is 'deferred.'

As I understand it, the DE class currently only uses deferred updates. If one would like to compare DEHB and its DE component as separate optimisers for optimising a model's validation score, would the choice of deferred vs. immediate updates have a significant impact on the final validation score, or merely on the wall-clock time? I'd like to compare DE and DEHB using the same update design, and the paper suggests that the immediate update design outperforms deferred DE.

Thank you.

Upgrade ConfigSpace

The current version of DEHB uses version 0.4.16 of ConfigSpace, which doesn't package any wheels; this can cause issues, as the package then has to be built on the user's end when doing pip install dehb.

The newest versions of ConfigSpace, namely v0.5.x and v0.6.0, are compatible with the old one and now ship wheels, which should make installation easier. Wheels come with v0.5.0 onwards, and v0.6.0 provides a much easier interface for defining search spaces.
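
For illustration, the simplified interface looks roughly like this from v0.6 (a sketch; the exact shorthands accepted may differ by version):

from ConfigSpace import ConfigurationSpace

cs = ConfigurationSpace({
    "lr": (1e-5, 1e-1),            # float range
    "batch_size": (16, 256),       # integer range
    "optimizer": ["adam", "sgd"],  # categorical choices
})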

Continuing configuration evolution when run budget is exhausted.

Note: This issue is only relevant once the Config ID changes have been merged.

While waiting to fetch all results when the run budget is exhausted, we keep sampling/evolving new configurations:

if self.is_worker_available():
    job_info = self._get_next_job()
    if brackets is not None and job_info["bracket_id"] >= brackets:
        # ignore submission and only collect results

While this was not an issue before, we would end up filling the ConfigRepository with unnecessary configurations, which will never be evaluated. We should think about how to redesign this.

Keep active brackets in a dictionary

Changing the list of active brackets to a dictionary with the bracket_id as key would make the code more readable in some places. For example, this:

# pass information of job submission to Bracket Manager
for bracket in self.active_brackets:
    if bracket.bracket_id == job_info['bracket_id']:
        # registering is IMPORTANT for Bracket Manager to perform SH
        bracket.register_job(job_info['budget'])
        break

Could then be rewritten as:

bracket_id = job_info['bracket_id']
self.active_brackets[bracket_id].register_job(job_info['budget'])

This would also have runtime advantages, although they are rather small, given that our active_brackets list is mostly short.

To clean up the dictionary, we can then simply iterate over the key-value pairs and delete the brackets that are already done.
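
A sketch of that cleanup pass, where is_bracket_done() is a hypothetical predicate:

# rebuild the dict, dropping brackets that are already done
self.active_brackets = {
    bracket_id: bracket
    for bracket_id, bracket in self.active_brackets.items()
    if not bracket.is_bracket_done()
}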

Checkpointing optimization run

A nice addition to the package could be checkpointing the state of the optimizer in order to restart the optimization e.g. when an optimization run has finished or crashed.

This could also be interesting for issue #30 in order to restart the optimization with a higher computational budget.

PR #6 has already touched this topic, and it might make sense to get some inspiration from there.

We could then also adjust the pytorch example to feature this.

Example on how to train an RF leaks data

Specifically

train_X, test_X, train_y, test_y = train_test_split(
    _data.get("data"), 
    _data.get("target"), 
    test_size=0.1, 
    shuffle=True, 
    random_state=seed
)
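# BUG: the split below re-splits the full dataset rather than the
# held-out remainder, so test rows can leak into train/valid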
train_X, valid_X, train_y, valid_y = train_test_split(
    _data.get("data"), 
    _data.get("target"), 
    test_size=0.3, 
    shuffle=True, 
    random_state=seed
)

Should be:

train_x, rest_x, train_y, rest_y = train_test_split(
  _data.get("data"), 
  _data.get("target"), 
  train_size=0.6, 
  shuffle=True, 
  random_state=seed
)

# 50% of the 40% of the data left over from above (20%) 
valid_x, test_x, valid_y, test_y = train_test_split(
  rest_x, rest_y,
  test_size=0.5, 
  shuffle=True, 
  random_state=seed
)

.. or however you like, as long as the same data is not re-split multiple times

Implement parallel DE

Currently we only support running DEHB in parallel, while DE can only be run with a single worker.

It would be a nice addition to also implement the parallel setup for DE, so that our library supports both parallel DEHB and DE.

No support for Constant type in vector_to_configspace() and configspace_to_vector()

Hello,

I've been working with DEHB on my dataset and came across an issue with the processing of ConfigSpace in the de.py file. Specifically, the functions vector_to_configspace() and configspace_to_vector() appear to have no option or support for handling the ConfigSpace Constant type.

Is this an oversight or intentional? If it's the latter, should I avoid including the Constant type in my ConfigSpace or is there a recommended workaround?

Thanks for your assistance!
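
One possible workaround, purely an assumption rather than an official recommendation: encode a would-be Constant as a single-choice categorical, since categorical hyperparameters appear to be handled by both conversion functions:

from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import CategoricalHyperparameter

cs = ConfigurationSpace()
# stands in for Constant("activation", "relu")
cs.add_hyperparameter(CategoricalHyperparameter("activation", ["relu"]))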

Continuously develop unit tests

In order to ensure that the mechanics of DEHB function correctly (especially after changes/refactorings), it would be helpful to develop unit tests.

These could be divided into three major categories:

  1. DEHB
  2. Bracket Manager
  3. DE

I would prioritize these tests in the order given by the list above, since DEHB itself and the bracket manager are more likely to be adjusted in the future, and the unit tests will then act as a safety net to ensure the components still work as intended.

Depending on which DE implementation we used, there might even be some unit tests in the original repo. Or did you write the DE code yourself, @Neeratyoy?
