automl / dehb
Home Page: https://automl.github.io/DEHB/
License: Apache License 2.0
Apparently, the hang occurs whenever there are pending tasks, and it happens in this loop.
Currently our documentation consists solely of what is extracted from docstrings. In particular, a landing page with a project overview, installation instructions, and perhaps a quick usage example would greatly improve the documentation.
When specifying the minimum budget equal to the maximum budget, DEHB performs plain DE. There are two potential ways of dealing with this:
Personally, I would prefer option 2, since I do not feel the need to restrict the user. However, we should communicate clearly that this setup does not perform DEHB.
Edit: Currently we break and throw an error. However if we skip the assertion it runs through normally --> TODO: Check what happens under the hood
The current implementation waits for all started jobs when the runtime budget is exhausted. This does make sense when using function evaluations or number of iterations as budget, but not when specifying the maximum computation cost in seconds.
Toy failure mode:
The computational budget is 1h, but a new job that would take e.g. 30 mins is submitted after 59 mins of optimization. The optimizer would then wait for this job to finish and therefore overshoot the maximum computational budget of 1h.
For now, a quick fix could be simply stopping all workers when the runtime budget is exhausted; however, this would potentially result in lost compute time. Therefore, it might also be interesting to think of a way to checkpoint the optimizer's state in order to resume training.
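The quick fix described above could look roughly like the following sketch. It uses a thread pool from the standard library as a stand-in for DEHB's Dask workers; `run_with_deadline` and its parameters are hypothetical names, not part of DEHB. Jobs that are still running at the deadline are abandoned, which is exactly the lost compute time mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_with_deadline(jobs, budget_seconds, n_workers=2):
    """Run callables in parallel, but stop collecting once the budget is spent.

    Returns (results of finished jobs, number of abandoned jobs).
    """
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=n_workers)
    futures = [pool.submit(job) for job in jobs]

    # Wait only for the time that is left in the budget.
    remaining = max(0.0, budget_seconds - (time.monotonic() - start))
    done, not_done = wait(futures, timeout=remaining)

    # Do not block on unfinished jobs; queued ones are cancelled (Python 3.9+).
    # Threads that are already running cannot be killed and finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return [f.result() for f in futures if f in done], len(not_done)


def fast_job():
    return 1


def slow_job():
    time.sleep(1.0)
    return 2


results, abandoned = run_with_deadline([fast_job, slow_job], budget_seconds=0.2)
print(results, abandoned)
```

With real worker processes, the abandoned jobs would additionally have to be terminated, which this thread-based sketch cannot do.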
As described by @eddiebergman in #45:
Relying on __del__ for cleanup is usually bad practice and can lead to problems if DEHB would be the last thing to be cleaned up.
The correct way is to use Dask Clients as context managers, which is likely what you want to have around your hot loop. In the case of a user-supplied Dask client, I'm not sure what you want; it might be bad form to automatically shut down their client unless they've specified it somehow.
Before implementing this, we should definitely discuss how we should handle the situation where the user supplies their client. I would agree with Eddie, that simply shutting it down might be bad form.
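One possible policy for the user-supplied case is "only close what you created". The sketch below illustrates it with a context manager; `FakeClient` is a stand-in so the example runs without Dask, and `managed_client` is a hypothetical helper, not an existing DEHB function.

```python
from contextlib import contextmanager


class FakeClient:
    """Stand-in for a Dask client; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def managed_client(client=None, factory=FakeClient):
    """Yield a client and close it on exit only if it was created here."""
    owns_client = client is None
    active = factory() if owns_client else client
    try:
        yield active
    finally:
        if owns_client:
            active.close()  # a user-supplied client is left untouched


user_client = FakeClient()
with managed_client(user_client) as supplied:
    pass  # hot loop would go here

with managed_client() as internal:
    pass
```

Here `user_client` stays open after the block, while the internally created client is closed. Whether users should be able to opt in to automatic shutdown is still the open design question.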
In the part of mutation when the parent pool is filled with candidates generated using the global population, it appears to me this is carried out in the following steps:
Section 4.2 of the paper gave me the impression that "sampled from the global population pool" meant chosen at random, but the code seems to perform a prior round of mutation to generate parents, particularly at the highest budgets where very few configurations are promoted.
Hello! I'm interested in using DEHB for HPO of neural networks.
But I couldn't find any code related to model checkpointing. Does training for every budget start from scratch?
Hello!
Is there any issue with the current implementation that prevents supporting Python 3.7? DEHB used to work with Oríon using 3.7 until last weekend. For some reason pip had not been enforcing the dependency on '>=3.8' since February and started doing so last weekend. Would it be possible to change the Python dependency to '>=3.7', or would some changes to DEHB's code be required?
Thank you!
Currently we only support running DEHB in parallel, while DE can only be run with a single worker.
It would be a nice addition to also implement the parallel setup for DE, so that our library supports both parallel DEHB and DE.
A nice addition to the package could be checkpointing the state of the optimizer in order to restart the optimization e.g. when an optimization run has finished or crashed.
This could also be interesting for issue #30 in order to restart the optimization with a higher computational budget.
PR #6 has already touched on this topic and it might make sense to get some inspiration from there.
We could then also adjust the pytorch example to feature this.
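A minimal sketch of such checkpointing, assuming the optimizer state can be gathered into a picklable dictionary. The field names below are hypothetical; the real state would depend on DEHB's internals.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write to a temporary file first, then rename it, so a crash during
    writing never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_checkpoint(path):
    """Read a previously saved state back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical optimizer state, for illustration only.
state = {
    "population": [[0.1, 0.9], [0.4, 0.2]],
    "fitness": [0.5, 0.3],
    "cost_used": 12.0,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "dehb_state.pkl")
    save_checkpoint(state, checkpoint_path)
    restored = load_checkpoint(checkpoint_path)
```

Resuming with a higher computational budget would then amount to loading the state and continuing the run with the already used cost taken into account.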
Note: This issue is only relevant when the Config ID changes have been merged.
While waiting to fetch all results when the run budget is exhausted, we keep sampling/evolving new configurations:
if self.is_worker_available():
    job_info = self._get_next_job()
    if brackets is not None and job_info["bracket_id"] >= brackets:
        # ignore submission and only collect results
While this was not an issue before, we would end up filling the ConfigRepository with unnecessary configurations, which will never be evaluated. We should think about how to redesign this.
The CHANGELOG file was added after DEHB was already available as a package.
Would be nice to have the previous releases as old entries into the CHANGELOG for the sake of completeness.
Hello,
I've been working with DEHB on my dataset and came across an issue with the processing of ConfigSpace in the de.py file. Specifically, in the functions vector_to_configspace() and configspace_to_vector(), there seems to be no option or support for handling the ConfigSpace Constant type.
Is this an oversight or intentional? If it's the latter, should I avoid including the Constant type in my ConfigSpace, or is there a recommended workaround?
Thanks for your assistance!
Dear @Neeratyoy, please see the log from running the DE optimiser:
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dehb/optimizers/de.py:810: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return np.array(self.traj), np.array(self.runtime), np.array(self.history)
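The warning itself can be reproduced and silenced in isolation. The sketch below assumes the history holds entries of different lengths, which is inferred from the warning text rather than from the DEHB code; `history` here is a made-up list.

```python
import numpy as np

# Entries of different lengths, similar to a history with variable-size records.
history = [[0.1, 0.2], [0.3]]

# Passing dtype=object states the intent explicitly and avoids the warning
# (NumPy >= 1.24 raises a ValueError without it).
history_array = np.array(history, dtype=object)
print(history_array.shape)
```

The result is a one-dimensional array whose elements are the original Python lists.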
The current version of DEHB uses version 0.4.16 of ConfigSpace, which doesn't package any wheels. This could cause issues, as it then has to be built on the user's end when doing pip install dehb.
The newest versions of ConfigSpace, namely v0.5.x and v0.6.0, are compatible with the old one and now ship wheels, which should make installation easier. The wheels come with v0.5.0, and v0.6.0 gives a much easier interface for defining search spaces.
The DEHB paper says that each SH bracket samples
In reality, this line should be:
n0 = int(np.ceil(self.max_SH_iter / (s + 1) * self.eta ** s))
Note that self.max_SH_iter
is
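As a rough sanity check, the proposed expression can be evaluated for every bracket index s. The values of `max_sh_iter` and `eta` below are assumed example values, and iterating s from `max_sh_iter - 1` down to 0 is likewise an assumption; `math.ceil` replaces `np.ceil` so the sketch needs only the standard library.

```python
import math


def initial_configs(max_sh_iter, eta, s):
    """Evaluate the proposed expression for bracket index s."""
    return int(math.ceil(max_sh_iter / (s + 1) * eta ** s))


# Assumed example values, not taken from the issue.
max_sh_iter, eta = 4, 3
n0_per_bracket = [
    initial_configs(max_sh_iter, eta, s) for s in range(max_sh_iter - 1, -1, -1)
]
print(n0_per_bracket)  # [27, 12, 6, 4]
```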
Giving users the opportunity to select the documentation of the specific package version they are using would increase the package quality and usability.
Specifically
train_X, test_X, train_y, test_y = train_test_split(
    _data.get("data"),
    _data.get("target"),
    test_size=0.1,
    shuffle=True,
    random_state=seed
)
train_X, valid_X, train_y, valid_y = train_test_split(
    _data.get("data"),
    _data.get("target"),
    test_size=0.3,
    shuffle=True,
    random_state=seed
)
Should be:
train_x, rest_x, train_y, rest_y = train_test_split(
    _data.get("data"),
    _data.get("target"),
    train_size=0.6,
    shuffle=True,
    random_state=seed
)
# 50% of the 40% of the data left over from above (20%)
valid_x, test_x, valid_y, test_y = train_test_split(
    rest_x, rest_y,
    test_size=0.5,
    shuffle=True,
    random_state=seed
)
... or however you like, as long as the same data is not re-split multiple times.
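The property that matters is that the three parts never overlap. The following sketch shows this on plain indices using only the standard library; `split_indices` and its fractions are hypothetical and are not the example's actual code.

```python
import random


def split_indices(n_samples, seed, train_frac=0.6, valid_frac=0.2):
    """Shuffle the indices once, then cut them into three disjoint parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_valid = round(n_samples * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train_idx, valid_idx, test_idx = split_indices(100, seed=0)

# No sample may appear in more than one part.
overlap = (
    set(train_idx) & set(valid_idx)
    | set(train_idx) & set(test_idx)
    | set(valid_idx) & set(test_idx)
)
```

Splitting the full dataset twice, as in the first snippet, gives no such guarantee: a sample can land in both the test set of the first call and the training set of the second.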
Hi,
I'm using DEHB to find parameters for several models running in a loop. During execution I get the following error which effectively terminates the program.
TimeoutError: No valid workers found
Monitoring CPU usage, I see that utilization for the python process gradually climbs to 100%, even though each run only uses 2 workers, and the host computer has 8 cores.
The code instantiating the DEHB instance is this
dehb = DEHB(
    f = self.objective,
    cs = self.conf_space,
    dimensions = len(self.conf_space.get_hyperparameters()),
    min_budget = 2,
    max_budget = 10,
    n_workers = 2,
    output_path="./temp"
)
trajectory, runtime, history = dehb.run(
    total_cost = self.total_cost,
    verbose = False,
    save_intermediate = False,
    seed = self.seed,
    train_X = train_X,
    train_y = train_y,
    valid_X = valid_X,
    valid_y = valid_y,
    max_budget = dehb.max_budget,
    save_history = False,
)
The rest of the code is roughly taken from the random forest example found here: https://automl.github.io/DEHB/latest/examples/01.1_Optimizing_RandomForest_using_DEHB/
Here is the full error right before the No valid worker found error:
2024-05-13 16:53:36,517 - distributed.worker - ERROR - Failed to communicate with scheduler during heartbeat.
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.11/site-packages/distributed/comm/tcp.py", line 225, in read
frames_nosplit_nbytes_bin = await stream.read_bytes(fmt_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.11/site-packages/distributed/worker.py", line 1252, in heartbeat
response = await retry_operation(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/utils_comm.py", line 455, in retry_operation
return await retry(
^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/utils_comm.py", line 434, in retry
return await coro()
^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/core.py", line 1394, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/core.py", line 1153, in send_recv
response = await comm.read(deserializers=deserializers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/comm/tcp.py", line 237, in read
convert_stream_closed_error(self, e)
File "/opt/homebrew/lib/python3.11/site-packages/distributed/comm/tcp.py", line 142, in convert_stream_closed_error
raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) ConnectionPool.heartbeat_worker local=tcp://127.0.0.1:54877 remote=tcp://127.0.0.1:54831>: Stream is closed
2024-05-13 16:53:37,512 - distributed.nanny - WARNING - Restarting worker
2024-05-13 16:53:37,523 - distributed.nanny - WARNING - Restarting worker
2024-05-13 16:54:11,726 - distributed.core - ERROR - Exception while handling op scatter
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.11/site-packages/distributed/core.py", line 969, in _handle_comm
result = await result
^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/distributed/scheduler.py", line 6022, in scatter
raise TimeoutError("No valid workers found")
TimeoutError: No valid workers found
Improve the docstrings of DEHB to fit the Google Style Guide for Python comments. When adjusting the docstrings, it would make sense to also adjust the signatures to add type annotations. These changes would improve the general readability of the code.
Here the deprecated np.int type of numpy is used when initializing DEHB without a configspace object. This has been deprecated since numpy v1.20 and should be replaced by either np.int32 or np.int64.
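A generic before/after sketch of the replacement. How DEHB uses the type at that line is not shown in the issue, so the array below is purely illustrative; `np.int` was an alias for the builtin `int`, which therefore also works as a drop-in replacement.

```python
import numpy as np

dimensions = 3

# Before (fails on NumPy >= 1.24, where the alias was removed):
#     np.zeros(dimensions, dtype=np.int)
# The builtin int keeps the old behaviour; np.int64 fixes the width explicitly.
as_builtin = np.zeros(dimensions, dtype=int)
as_int64 = np.zeros(dimensions, dtype=np.int64)
```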
To ensure that the mechanics of DEHB function correctly (especially after changes/refactorings), it would be helpful to develop unit tests.
These could be divided into three major categories:
I would prioritize these tests in the order of the list above, since DEHB itself and the bracket manager are more likely to be adjusted in the future, so the unit tests will act as a safety net to ensure the components still work as they are supposed to.
Depending on which DE implementation we used, there might even be some unit tests in the original repo. Or did you write the DE code yourself, @Neeratyoy?
Some advantages of assigning a unique ID to each configuration could be:
Regarding the implementation, I would suggest using incrementing IDs. However, we would need to implement a check every time we sample/generate a new configuration, testing whether it has been used before. For this check I suggest using the vector representation of the configuration, so that both using and not using ConfigSpace is supported out of the box.
Using an element-wise comparison between each hyperparameter specified in the config could potentially be costly, namely O(n*d) if n is the number of sampled configs and d is the dimensionality of the configuration (number of hyperparameters). However, since d tends to be rather small, I think this will not be a drastic computational overhead.
An alternative could be hashing the configurations and then using this hash as an ID directly, which would result in a faster comparison. However, hash collisions could map different configs to the same ID.
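A possible middle ground between the two options is a dictionary keyed by the (rounded) vector. Python dictionaries resolve hash collisions by comparing keys for equality, so two different vectors never share an ID, while a lookup costs O(d) on average instead of O(n*d). `ConfigRegistry` and the rounding precision are hypothetical choices for this sketch.

```python
class ConfigRegistry:
    """Hypothetical helper that hands out incrementing IDs per unique vector."""

    def __init__(self, decimals=12):
        self._ids = {}  # rounded vector (as tuple) -> ID
        self._decimals = decimals

    def _key(self, vector):
        # Rounding guards against tiny floating point differences.
        return tuple(round(float(x), self._decimals) for x in vector)

    def get_id(self, vector):
        """Return the existing ID for a known vector or assign the next one."""
        key = self._key(vector)
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]


registry = ConfigRegistry()
first = registry.get_id([0.25, 0.75])
second = registry.get_id([0.5, 0.5])
repeat = registry.get_id([0.25, 0.75])  # same vector, same ID as `first`
```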
It will be useful to have more elaborate filler text in the notebooks serving as examples for using DEHB.
Also, it will be useful to have more docstrings that help explain DEHB function arguments. The example notebooks should also highlight how information on the arguments can be retrieved, for example via ?dehb.run.
For now we should limit the Numpy dependency to <2.0.
Given that we manipulate arrays and they form the crucial data structures for the algorithm, we should explicitly test what breaks and how significant the required changes are for the new Numpy version.
Please see this reproducible example and output. I believe it's because the configspace is never seeded:
from __future__ import annotations

from ConfigSpace import ConfigurationSpace
from dehb import DEHB

cs = ConfigurationSpace({"a": (1.0, 10.0)})

def f(config, fidelity):
    print(config, fidelity)
    return {"fitness": config["a"] ** 2, "cost": 1.0, "info": {}}

D = DEHB(cs, f=f, seed=1, min_fidelity=1, max_fidelity=100, n_workers=1)
D.run(1)
D = DEHB(cs, f=f, seed=1, min_fidelity=1, max_fidelity=100, n_workers=1)
D.run(1)
Found this while looking for where the Client is located.
Line 254 in 54ce41c
Hello,
since I want to use DEHB to optimize our hyperparameters, I thought it would be great if this package were pip-installable, e.g. via "pip install --editable git+https://github.com/automl/DEHB#egg=DEHB".
Are there any plans for doing so?
Hi,
The dehb package is producing a large amount of logs into stdout, and they're not contained within the dehb namespace, but rather polluting the global namespace, which makes it difficult to read logs coming from any host program.
Is there any way to turn off this behavior and only have it log errors like other programs?
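For comparison, this is what namespaced logging looks like with the standard library. It is a sketch of the requested behavior, not of what DEHB currently does, and the logger name "dehb" is an assumption.

```python
import logging

# Library side (hypothetical): log through a named logger and never
# configure the root logger on import.
library_logger = logging.getLogger("dehb")
library_logger.addHandler(logging.NullHandler())

# Host side: keep only errors from that namespace.
logging.getLogger("dehb").setLevel(logging.ERROR)

info_enabled = library_logger.isEnabledFor(logging.INFO)
error_enabled = library_logger.isEnabledFor(logging.ERROR)
```

With this layout the host program keeps full control over its own log output, and messages from the library can be filtered by name.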
Thanks,
Hi, would it be possible to report how many configurations and which budgets are evaluated in each bracket?
That would make the results easier to inspect and manipulate.
I have a question: if I wanted to run 100 models with a resource limit of 100 epochs, how should I set the parameters?
Similar to SMAC, an ask-and-tell interface would make DEHB more flexible and potentially easier to use.
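To illustrate the idea, here is a toy random-search optimizer with an ask-and-tell interface. It is not DEHB and does not reflect any existing DEHB API; `ToyAskTellOptimizer` and its method signatures are hypothetical.

```python
import random


class ToyAskTellOptimizer:
    """Random search with an ask/tell interface (illustration only, not DEHB)."""

    def __init__(self, low, high, seed=0):
        self._rng = random.Random(seed)
        self._low, self._high = low, high
        self.best_config, self.best_fitness = None, float("inf")

    def ask(self):
        """Return the next configuration to evaluate."""
        return {"a": self._rng.uniform(self._low, self._high)}

    def tell(self, config, fitness):
        """Report the result of an evaluation back to the optimizer."""
        if fitness < self.best_fitness:
            self.best_config, self.best_fitness = config, fitness


optimizer = ToyAskTellOptimizer(low=1.0, high=10.0)
for _ in range(20):
    config = optimizer.ask()
    fitness = config["a"] ** 2  # the user owns the evaluation loop
    optimizer.tell(config, fitness)
```

The flexibility comes from the loop living in user code: evaluations can be scheduled, parallelized, or interrupted however the user prefers.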
Changing the list of active brackets to a dictionary with the bracket_id as keys would make the code more readable on some occasions. For example this:
# pass information of job submission to Bracket Manager
for bracket in self.active_brackets:
    if bracket.bracket_id == job_info['bracket_id']:
        # registering is IMPORTANT for Bracket Manager to perform SH
        bracket.register_job(job_info['budget'])
        break
Could then be rewritten as:
bracket_id = job_info['bracket_id']
self.active_brackets[bracket_id].register_job(job_info['budget'])
This would also have runtime advantages, although these are rather small, given that our active_brackets list is mostly small.
To clean up the dictionary, we can then simply iterate over the key-value pairs and delete the brackets that are already done.
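A small sketch of the dictionary-based variant, including the cleanup step. The `Bracket` class is a minimal stand-in whose `register_job` merely counts down pending jobs; the real bracket manager does more.

```python
class Bracket:
    """Minimal stand-in for a bracket; only what the example needs."""

    def __init__(self, bracket_id, n_jobs):
        self.bracket_id = bracket_id
        self._pending = n_jobs

    def register_job(self, budget):
        self._pending -= 1

    def is_done(self):
        return self._pending == 0


active_brackets = {0: Bracket(0, n_jobs=1), 1: Bracket(1, n_jobs=2)}

job_info = {"bracket_id": 0, "budget": 9}
active_brackets[job_info["bracket_id"]].register_job(job_info["budget"])

# Deleting entries while iterating over the same dict raises a RuntimeError,
# so rebuild the dict (or iterate over a copy of the keys) instead.
active_brackets = {
    bracket_id: bracket
    for bracket_id, bracket in active_brackets.items()
    if not bracket.is_done()
}
```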
Dear @Neeratyoy and @noorawad, according to the paper "DEHB: Evolutionary Hyperband for Scalable, Robust, and Efficient Hyperparameter Optimization," DEHB employs an immediate update design for DE by utilising the AsyncDE class. However, despite passing the async strategy argument in DEHB, it does not change the async strategy of the AsyncDE class, which by default is 'deferred.'
As I understand, the DE class currently only uses deferred updates. If one would like to compare DEHB and its DE component as separate optimisers for optimising a model's validation score, would the update design of deferred vs immediate have a significant impact on the final validation score or simply the wall-clock time? I'd like to compare DE and DEHB using the same update design, and the paper suggests that the immediate update design outperforms deferred DE.
Thank you.