
jahs_bench_201's Introduction

JAHS-Bench-201

The first collection of surrogate benchmarks for Joint Architecture and Hyperparameter Search (JAHS), built to also support and facilitate research on multi-objective, cost-aware and (multi) multi-fidelity optimization algorithms.


Please see our documentation here. Precise details about the data collection and surrogate creation process, as well as our experiments, can be found in the associated publication.

Installation

Using pip

pip install jahs-bench

Optionally, you can download the data required to use the surrogate benchmark ahead of time with

python -m jahs_bench.download --target surrogates

To test if the installation was successful, you can, e.g., run a minimal example with

python -m jahs_bench_examples.minimal

This should randomly sample a configuration, and display both the sampled configuration and the result of querying the surrogate for that configuration. Note: We have recently discovered that XGBoost - the library used for our surrogate models - can suffer from some incompatibility issues with MacOS. Users who run into such an issue may consult this discussion for details.

Using the Benchmark

Creating Configurations

Configurations in our Joint Architecture and Hyperparameter (JAHS) space are represented as dictionaries, e.g.:

config = {
    'Optimizer': 'SGD',
    'LearningRate': 0.1,
    'WeightDecay': 5e-05,
    'Activation': 'Mish',
    'TrivialAugment': False,
    'Op1': 4,
    'Op2': 1,
    'Op3': 2,
    'Op4': 0,
    'Op5': 2,
    'Op6': 1,
    'N': 5,
    'W': 16,
    'Resolution': 1.0,
}

For a full description of the search space and configurations, see our documentation.

Evaluating Configurations

import jahs_bench

benchmark = jahs_bench.Benchmark(task="cifar10", download=True)

# Query a random configuration
config = benchmark.sample_config()
results = benchmark(config, nepochs=200)

# Display the outputs
print(f"Config: {config}")  # A dict
print(f"Result: {results}")  # A dict

More Evaluation Options

The API of our benchmark enables users to either query a surrogate model (the default), query the tables of raw performance data, or train a configuration from our search space from scratch using the same pipeline as was used by our benchmark. However, users should note that the latter functionality requires installing jahs_bench_201 with the optional data_creation component and its relevant dependencies. The relevant data can be downloaded automatically by our API. See our documentation for details, and the sketch below for selecting the data source.
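As a sketch of selecting the data source: the kind argument is visible with kind="surrogate" (the default) in the error logs further down this page; we assume here that kind="table" selects the performance tables instead:

import jahs_bench

# Assumption: kind="table" queries the recorded performance tables; only
# kind="surrogate" appears verbatim in the logs below.
benchmark = jahs_bench.Benchmark(task="cifar10", kind="table", download=True)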

Benchmark Data

We provide documentation for the performance dataset used to train our surrogate models and further information on our surrogate models.

Experiments and Evaluation Protocol

See our experiments repository and our documentation.

Leaderboards

We maintain leaderboards for several optimization tasks and algorithmic frameworks.

jahs_bench_201's People

Contributors

dastoll, eddiebergman, karibbov, neochaos12, worstseed


jahs_bench_201's Issues

Incompatible checksums error

Problem: unexpected error when running the commands of the README.

Environment:

  • macOS 11.6
  • conda 4.11.0

Steps to reproduce

conda create -n jahs_err python=3.7.13
conda activate jahs_err

git clone --recurse-submodules -- git@github.com:automl/jahs_bench_mf
cd jahs_bench_mf
pip install .

python JAHS-Bench-MF/jahs_bench/public_api.py

Error

Attempting to read surrogate model from: JAHS-Bench-MF/surrogates/thesis_cifar10
Traceback (most recent call last):
  File "JAHS-Bench-MF/jahs_bench/public_api.py", line 46, in <module>
    b = Benchmark(model_path=model_path)
  File "JAHS-Bench-MF/jahs_bench/public_api.py", line 23, in __init__
    self.surrogate = XGBSurrogate.load(model_path)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/jahs_bench/lib/surrogate.py", line 396, in load
    params: dict = joblib.load(outdir / cls.__params_filename)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/usr/local/anaconda3/envs/jahs_err/lib/python3.7/pickle.py", line 1436, in load_reduce
    stack[-1] = func(*args)
  File "stringsource", line 6, in ConfigSpace.hyperparameters.__pyx_unpickle_CategoricalHyperparameter
_pickle.PickleError: Incompatible checksums (58514084 vs 0xea77850 = (_choices_set, choices, choices_vector, default_value, meta, name, normalized_default_value, num_choices, probabilities, weights))

Additional information

The same error occurs with the latest Python version (3.10) as well as with 3.7.5.

Termination in colorectal_histology due to memory overflow

As mentioned in the title, the process is killed for colorectal_histology while both cifar10 and fashion_mnist work.
It seems only colorectal_histology requires 16+ GB of RAM when loading the surrogate benchmark.

Environment:

  • Ubuntu 18.04

$ conda create -n test-jahs python==3.8
$ pip install jahs-bench numpy ConfigSpace parzen_estimator

My code:

import os

import jahs_bench


DATA_DIR = f"{os.environ['HOME']}/tabular_benchmarks/jahs_bench_data/"


tasks = ["colorectal_histology", "cifar10", "fashion_mnist"]
benchmark = jahs_bench.Benchmark(task=tasks[0], download=False, save_dir=DATA_DIR)

config = benchmark.sample_config(random_state=42)
results = benchmark(config, nepochs=200)

print(config)
print(results)

Output (Termination happens only in colorectal_histology)

[00:22:56] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:22:56] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:08] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:08] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:12] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:12] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:16] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:16] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:25] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:25] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
Killed

XGBoost and MacOS incompatibility

This issue has been reported by multiple users who attempted to install and use JAHS-Bench-201 on MacOS. To the best of my knowledge, this boils down to an incompatibility between XGBoost and MacOS, but it needs to be investigated further and should be addressed at the earliest convenience so that Mac users can also use the benchmark.

The following is an error log that was sent to me in response to the installation instructions in our README:

WARNING: /Users/runner/work/xgboost/xgboost/src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
Traceback (most recent call last):
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/jahs_bench_examples/minimal.py", line 13, in <module>
    run()
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/jahs_bench_examples/minimal.py", line 5, in run
    benchmark = jahs_bench.Benchmark(task="cifar10", kind="surrogate", download=True)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/jahs_bench/api.py", line 96, in __init__
    loaders[kind]()
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/jahs_bench/api.py", line 106, in _load_surrogate
    self._surrogates[o] = XGBSurrogate.load(model_path / str(o))
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/jahs_bench/surrogate/model.py", line 462, in load
    model = joblib.load(outdir / cls.__model_filename)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 587, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/pickle.py", line 1088, in load
    dispatch[key[0]](self)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 331, in load_build
    Unpickler.load_build(self)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/pickle.py", line 1552, in load_build
    setstate(state)
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/xgboost/core.py", line 1452, in __setstate__
    _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
  File "/usr/local/Caskroom/miniconda/base/envs/meta_neps_02/lib/python3.7/site-packages/xgboost/core.py", line 218, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [10:43:09] /Users/runner/work/xgboost/xgboost/src/tree/tree_updater.cc:20: Unknown tree updater grow_gpu_hist
Stack trace:
  [bt] (0) 1   libxgboost.dylib                    0x00000001230ce4a4 dmlc::LogMessageFatal::~LogMessageFatal() + 116
  [bt] (1) 2   libxgboost.dylib                    0x0000000123217f39 xgboost::TreeUpdater::Create(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, xgboost::GenericParameter const*) + 729
  [bt] (2) 3   libxgboost.dylib                    0x0000000123159d8a xgboost::gbm::GBTree::LoadConfig(xgboost::Json const&) + 2634
  [bt] (3) 4   libxgboost.dylib                    0x0000000123178246 xgboost::LearnerConfiguration::LoadConfig(xgboost::Json const&) + 742
  [bt] (4) 5   libxgboost.dylib                    0x0000000123179827 xgboost::LearnerIO::Load(dmlc::Stream*) + 743
  [bt] (5) 6   libxgboost.dylib                    0x00000001230c9101 XGBoosterUnserializeFromBuffer + 145
  [bt] (6) 7   libffi.6.dylib                      0x000000010196a934 ffi_call_unix64 + 76
  [bt] (7) 8   ???                                 0x00007ff7beb9eb80 0x0 + 140702033505152

Moving away from pickles

Pickling our performance datasets and surrogate models was chosen as a convenient solution during the development phase of the repo, but it is inherently problematic for the repo's long-term maintenance and support. In particular, pickling makes the shared datasets and models very sensitive to the exact dependency versions and system configuration used when originally pickling them. Case in point: #6. Therefore, a future release should focus on moving away from pickles. Specifically,

  • Performance Datasets: Relatively straightforward, since Pandas DataFrames are quite flexible and support a variety of I/O options. Viable formats include HDF5, Feather and CSV; each comes with its own set of advantages and disadvantages that need to be carefully weighed (see the sketch after this list). It may be fine to keep using pickles for the interim data (checkpoints, metrics) generated during model training and only switch to a different format for the final dataset.
  • Surrogate Models: This is more nuanced since it involves a large number of moving parts. A different serialization scheme will need to be chosen based on the interoperability of SciKit-Learn, XGBoost and jahs_bench.surrogates.model.XGBSurrogate. Ultimately, it may be necessary to write a custom JSON encoder for XGBSurrogate that captures all relevant parameters of the trained model object and saves/loads them.
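As a minimal sketch of the pickle-free options for the performance datasets, here is standard pandas I/O for each candidate format (Feather requires pyarrow, HDF5 requires PyTables; the file names and columns are illustrative):

import pandas as pd

# Toy stand-in for a slice of the performance dataset.
df = pd.DataFrame({"epoch": [1, 2, 3], "valid-acc": [85.1, 86.3, 86.9]})

df.to_csv("metrics.csv", index=False)   # human-readable, universally supported
df.to_feather("metrics.feather")        # fast and schema-preserving
df.to_hdf("metrics.h5", key="metrics")  # supports partial/columnar reads

restored = pd.read_feather("metrics.feather")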

Ambiguity in the query

This is not exactly an issue but leads to misunderstandings, so I would like to mention it here.

  1. Which takes precedence: the nepochs argument of Benchmark.__call__, or the epoch entry in the config dict? (I know the answer is nepochs, but it is ambiguous until you run it.)
  2. We do not get any out-of-domain errors for numerical parameters (e.g. we can specify N=100), and it seems fidelity parameters are rounded somehow?
  3. Continuous parameters also do not raise out-of-domain errors and seem to be rounded?

I would really appreciate it if we could choose a mode: either clip invalid values automatically or raise an error whenever invalid items are passed.
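A minimal sketch of what such client-side validation could look like, assuming the discrete value sets below (inferred from the example configurations on this page; the authoritative search space is defined in the documentation):

def validate(config: dict, clip: bool = True) -> dict:
    # Assumed value sets; consult the official search-space documentation.
    allowed = {
        "N": [1, 3, 5],
        "W": [4, 8, 16],
        "Resolution": [0.25, 0.5, 1.0],
    }
    out = dict(config)
    for key, values in allowed.items():
        if key in out and out[key] not in values:
            if clip:
                # Snap to the nearest allowed value.
                out[key] = min(values, key=lambda v: abs(v - out[key]))
            else:
                raise ValueError(f"{key}={out[key]!r} not in {values}")
    return out

validate({"N": 100})              # -> {'N': 5} when clipping
validate({"N": 100}, clip=False)  # -> raises ValueError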

Issue with config and trajectory

Hiyo,

I ran into an issue while wrapping jahs-bench with a given config. Using the trajectory functionality fails, but manually iterating over the epochs succeeds. I've attached a reproduction script (which includes the config), the stack trace and my full environment.

Here's the reproduction script:

from jahs_bench import Benchmark, BenchmarkTasks

config = {
    'N': 3,
    'W': 4,
    'Op1': 1,
    'Op2': 3,
    'Op3': 4,
    'Op4': 1,
    'Op5': 2,
    'Op6': 3,
    'TrivialAugment': True,
    'Activation': 'Hardswish',
    'Optimizer': 'SGD',
    'Resolution': 0.25,
    'LearningRate': 0.10214993871440806,
    'WeightDecay': 0.00031212403229771485
}

bench = Benchmark(
    task=BenchmarkTasks.FashionMNIST,
    save_dir="data/jahs-bench-data",
    download=False
)

# This will fail
traj = bench(config, nepochs=200, full_trajectory=True)

# This works
traj = {f: bench(config, nepochs=f)[f] for f in range(1, 201)}

Full trace:

TypeError                                 Traceback (most recent call last)
<ipython-input-7-2bdcd586c58f> in <module>
----> 1 traj = bench(config, nepochs=200, full_trajectory=True)

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/jahs_bench/api.py in __call__(self, config, nepochs, full_trajectory, **kwargs)
    138                  full_trajectory: bool = False, **kwargs):
    139         return self._call_fn(config=config, nepochs=nepochs,
--> 140                              full_trajectory=full_trajectory, **kwargs)
    141 
    142     def _benchmark_surrogate(self, config: dict, nepochs: Optional[int] = 200,

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/jahs_bench/api.py in _benchmark_surrogate(self, config, nepochs, full_trajectory, **kwargs)
    155         outputs = []
    156         for model in self._surrogates.values():
--> 157             outputs.append(model.predict(features))
    158 
    159         outputs: pd.DataFrame = pd.concat(outputs, axis=1)

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/jahs_bench/surrogate/model.py in predict(self, features)
    435 
    436         features = features.loc[:, self.feature_headers]
--> 437         ypredict = self.model.predict(features)
    438         ypredict = pd.DataFrame(ypredict, columns=self.label_headers)
    439         return ypredict

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    111 
    112             # lambda, but not partial, allows help() to work with update_wrapper
--> 113             out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)  # noqa
    114         else:
    115 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/pipeline.py in predict(self, X, **predict_params)
    467         Xt = X
    468         for _, name, transform in self._iter(with_final=False):
--> 469             Xt = transform.transform(Xt)
    470         return self.steps[-1][1].predict(Xt, **predict_params)
    471 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    751             _transform_one,
    752             fitted=True,
--> 753             column_as_strings=fit_dataframe_and_transform_dataframe,
    754         )
    755         self._validate_output(Xs)

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in _fit_transform(self, X, y, func, fitted, column_as_strings)
    613                     message=self._log_message(name, idx, len(transformers)),
    614                 )
--> 615                 for idx, (name, trans, column, weight) in enumerate(transformers, 1)
    616             )
    617         except ValueError as e:

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1041             # remaining jobs.
   1042             self._iterating = False
-> 1043             if self.dispatch_one_batch(iterator):
   1044                 self._iterating = self._original_iterator is not None
   1045 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    859                 return False
    860             else:
--> 861                 self._dispatch(tasks)
    862                 return True
    863 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
    777         with self._lock:
    778             job_idx = len(self._jobs)
--> 779             job = self._backend.apply_async(batch, callback=cb)
    780             # A job can complete so quickly than its callback is
    781             # called before we get here, causing self._jobs to

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
    206     def apply_async(self, func, callback=None):
    207         """Schedule a func to be run"""
--> 208         result = ImmediateResult(func)
    209         if callback:
    210             callback(result)

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
    570         # Don't delay the application, to avoid keeping the input
    571         # arguments in memory
--> 572         self.results = batch()
    573 
    574     def get(self):

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    262             return [func(*args, **kwargs)
--> 263                     for func, args, kwargs in self.items]
    264 
    265     def __reduce__(self):

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
    261         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    262             return [func(*args, **kwargs)
--> 263                     for func, args, kwargs in self.items]
    264 
    265     def __reduce__(self):

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
    214     def __call__(self, *args, **kwargs):
    215         with config_context(**self.config):
--> 216             return self.function(*args, **kwargs)
    217 
    218 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/pipeline.py in _transform_one(transformer, X, y, weight, **fit_params)
    874 
    875 def _transform_one(transformer, X, y, weight, **fit_params):
--> 876     res = transformer.transform(X)
    877     # if we have a weight for this transformer, multiply output
    878     if weight is None:

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py in transform(self, X)
    511             handle_unknown=self.handle_unknown,
    512             force_all_finite="allow-nan",
--> 513             warn_on_unknown=warn_on_unknown,
    514         )
    515 

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/preprocessing/_encoders.py in _transform(self, X, handle_unknown, force_all_finite, warn_on_unknown)
    132         for i in range(n_features):
    133             Xi = X_list[i]
--> 134             diff, valid_mask = _check_unknown(Xi, self.categories_[i], return_mask=True)
    135 
    136             if not np.all(valid_mask):

~/code/mf-prior-bench/.venv/lib/python3.7/site-packages/sklearn/utils/_encode.py in _check_unknown(values, known_values, return_mask)
    259 
    260         # check for nans in the known_values
--> 261         if np.isnan(known_values).any():
    262             diff_is_nan = np.isnan(diff)
    263             if diff_is_nan.any():

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Env:

Python 3.7.12

Package                       Version
----------------------------- -----------
alabaster                     0.7.12
apeye                         1.2.0
argon2-cffi                   21.3.0
argon2-cffi-bindings          21.2.0
atomicwrites                  1.4.1
attrs                         22.1.0
autodocsumm                   0.2.9
automl-sphinx-theme           0.1.12
Babel                         2.10.3
backcall                      0.2.0
beautifulsoup4                4.11.1
black                         22.6.0
bleach                        5.0.1
CacheControl                  0.12.11
certifi                       2022.6.15
cffi                          1.15.1
cfgv                          3.3.1
charset-normalizer            2.1.0
click                         8.1.3
coloredlogs                   15.0.1
ConfigSpace                   0.4.21
coverage                      6.4.4
cssutils                      2.5.1
cycler                        0.11.0
Cython                        0.29.32
debugpy                       1.6.3
decopatch                     1.4.10
decorator                     5.1.1
defusedxml                    0.7.1
dict2css                      0.3.0
distlib                       0.3.5
docstring-to-markdown         0.10
docutils                      0.18.1
domdf-python-tools            3.3.0
entrypoints                   0.4
fastjsonschema                2.16.1
filelock                      3.8.0
flake8                        5.0.4
flatbuffers                   2.0
fonttools                     4.36.0
html5lib                      1.1
humanfriendly                 10.0
identify                      2.5.3
idna                          3.3
imagesize                     1.4.1
importlib-metadata            4.12.0
importlib-resources           5.9.0
ipykernel                     6.15.1
ipython                       7.34.0
ipython-genutils              0.2.0
ipywidgets                    8.0.1
isort                         5.10.1
jahs-bench                    1.0.0
jedi                          0.18.1
jedi-language-server          0.37.0
Jinja2                        3.1.2
joblib                        1.1.0
jsonschema                    4.13.0
jupyter                       1.0.0
jupyter-client                7.3.4
jupyter-console               6.4.4
jupyter-core                  4.11.1
jupyterlab-pygments           0.2.2
jupyterlab-widgets            3.0.2
kiwisolver                    1.4.4
lockfile                      0.12.2
lxml                          4.9.1
makefun                       1.14.0
markdown-it-py                2.1.0
MarkupSafe                    2.1.1
matplotlib                    3.5.3
matplotlib-inline             0.1.6
mccabe                        0.7.0
mdit-py-plugins               0.3.0
mdurl                         0.1.2
mf-prior-bench                0.1.0
mfp-bench                     0.0.1
mfpbench                      0.0.1
mistune                       0.8.4
more-itertools                8.14.0
mpmath                        1.2.1
msgpack                       1.0.4
mypy                          0.971
mypy-extensions               0.4.3
myst-parser                   0.18.0
natsort                       8.1.0
nbclient                      0.6.6
nbconvert                     6.5.3
nbformat                      5.4.0
nest-asyncio                  1.5.5
nodeenv                       1.7.0
notebook                      6.4.12
numpy                         1.21.6
numpydoc                      1.4.0
onnxruntime                   1.12.1
packaging                     21.3
pandas                        1.3.5
pandocfilters                 1.5.0
parso                         0.8.3
pathspec                      0.9.0
pexpect                       4.8.0
pickleshare                   0.7.5
Pillow                        9.2.0
pip                           22.2.2
pkgutil_resolve_name          1.3.10
platformdirs                  2.5.2
pluggy                        0.13.1
pre-commit                    2.20.0
prometheus-client             0.14.1
prompt-toolkit                3.0.30
protobuf                      4.21.5
psutil                        5.9.1
ptyprocess                    0.7.0
py                            1.11.0
pycodestyle                   2.9.1
pycparser                     2.21
pydantic                      1.9.2
pydocstyle                    6.1.1
pyflakes                      2.5.0
pygls                         0.12.1
Pygments                      2.13.0
pyparsing                     3.0.9
pyrsistent                    0.18.1
pytest                        4.6.0
pytest-cases                  3.6.13
pytest-cov                    3.0.0
python-dateutil               2.8.2
pytz                          2022.2.1
PyYAML                        6.0
pyzmq                         23.2.1
qtconsole                     5.3.1
QtPy                          2.2.0
requests                      2.28.1
ruamel.yaml                   0.17.21
ruamel.yaml.clib              0.2.6
scikit-learn                  1.0.2
scipy                         1.7.3
seaborn                       0.11.2
Send2Trash                    1.8.0
setuptools                    47.1.0
setuptools-scm                6.4.2
six                           1.16.0
snowballstemmer               2.2.0
soupsieve                     2.3.2.post1
Sphinx                        5.1.1
sphinx-autodoc-typehints      1.19.2
sphinx-gallery                0.11.0
sphinx-jinja2-compat          0.1.2
sphinx-prompt                 1.5.0
sphinx-tabs                   3.4.1
sphinx-toolbox                3.2.0
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        2.0.0
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.5
sympy                         1.10.1
tabulate                      0.8.10
terminado                     0.15.0
threadpoolctl                 3.1.0
tinycss2                      1.1.1
toml                          0.10.2
tomli                         2.0.1
tornado                       6.2
traitlets                     5.3.0
typed-ast                     1.5.4
typeguard                     2.13.3
typing_extensions             4.3.0
typing-inspect                0.8.0
urllib3                       1.26.11
virtualenv                    20.16.3
wcwidth                       0.2.5
webencodings                  0.5.1
wheel                         0.37.1
widgetsnbextension            4.0.2
xgboost                       1.5.2
yacs                          0.1.8
yahpo-gym                     1.0.1
zipp                          3.8.1

Provide a fixed tag version once stable

Not sure if the benchmark is stable yet, but having a tagged version to pin to would be ideal.

It would also be good to provide an option in the download module to use fixed, versioned surrogates/data once possible (a sketch follows the URLs below):

surrogate_url = "https://ml.informatik.uni-freiburg.de/research-artifacts/jahs_bench_201/v1.0.0/assembled_surrogates.tar"
metric_url = "https://ml.informatik.uni-freiburg.de/research-artifacts/jahs_bench_201/v1.0.0/metric_data.tar"
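A sketch of what such a version-aware option could look like; the helper and its default tag are hypothetical, built around the URLs above rather than the library's actual API:

BASE_URL = "https://ml.informatik.uni-freiburg.de/research-artifacts/jahs_bench_201"

def artifact_urls(version: str = "v1.0.0") -> dict:
    # Hypothetical helper: build the archive URLs for a given release tag.
    return {
        "surrogates": f"{BASE_URL}/{version}/assembled_surrogates.tar",
        "metrics": f"{BASE_URL}/{version}/metric_data.tar",
    }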

Wrong path when loading tabular data

hi guys,

loading the tabular data throws an error.

I think the problem is here:

def _load_table(self):
    assert self.save_dir.exists() and self.save_dir.is_dir()
    table_path = self.save_dir / self.task.value

self.save_dir should be self.table_dir:

def _load_table(self):
    assert self.table_dir.exists() and self.table_dir.is_dir()

    table_path = self.table_dir / self.task.value

Thanks in advance,
cheers.

pip install error

It seems the issue relates to this post.

I could resolve this issue by running pip uninstall typing.

My env:

  • python==3.8.3
  • Ubuntu 18.04

Error log:
Collecting git+https://github.com/automl/jahs_bench_201.git
  Cloning https://github.com/automl/jahs_bench_201.git to /tmp/pip-req-build-g73kff40
  Running command git clone -q https://github.com/automl/jahs_bench_201.git /tmp/pip-req-build-g73kff40
  Installing build dependencies ... error
  ERROR: Command errored out with exit status 1:
   command: /home/shuhei/anaconda3/bin/python /home/shuhei/anaconda3/lib/python3.8/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-0pev8moj/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'poetry-core>=1.0.0'
       cwd: None
  Complete output (44 lines):
  Traceback (most recent call last):
    File "/home/shuhei/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/home/shuhei/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/__main__.py", line 26, in <module>
      sys.exit(_main())
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/cli/main.py", line 73, in main
      command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/commands/__init__.py", line 104, in create_command
      module = importlib.import_module(module_path)
    File "/home/shuhei/anaconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
    File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
    File "<frozen importlib._bootstrap>", line 991, in _find_and_load
    File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
    File "<frozen importlib._bootstrap_external>", line 783, in exec_module
    File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 24, in <module>
      from pip._internal.cli.req_command import RequirementCommand, with_cleanup
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 16, in <module>
      from pip._internal.index.package_finder import PackageFinder
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/index/package_finder.py", line 21, in <module>
      from pip._internal.index.collector import parse_links
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_internal/index/collector.py", line 14, in <module>
      from pip._vendor import html5lib, requests
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_vendor/requests/__init__.py", line 114, in <module>
      from . import utils
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_vendor/requests/utils.py", line 25, in <module>
      from . import certs
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_vendor/requests/certs.py", line 15, in <module>
      from pip._vendor.certifi import where
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_vendor/certifi/__init__.py", line 1, in <module>
      from .core import contents, where
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/pip/_vendor/certifi/core.py", line 12, in <module>
      from importlib.resources import read_text
    File "/home/shuhei/anaconda3/lib/python3.8/importlib/resources.py", line 11, in <module>
      from typing import Iterable, Iterator, Optional, Set, Union   # noqa: F401
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/typing.py", line 1359, in <module>
      class Callable(extra=collections_abc.Callable, metaclass=CallableMeta):
    File "/home/shuhei/anaconda3/lib/python3.8/site-packages/typing.py", line 1007, in __new__
      self._abc_registry = extra._abc_registry
  AttributeError: type object 'Callable' has no attribute '_abc_registry'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/shuhei/anaconda3/bin/python /home/shuhei/anaconda3/lib/python3.8/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-0pev8moj/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'poetry-core>=1.0.0' Check the logs for full command output.

Issue with evaluating multiobjective benchmarks

Hi all -- I am interested in running the multiobjective variation of this benchmark and encountered the following issues, please advise:

  1. For the hypervolume metric to be comparable across different solvers, everyone needs to use the same reference point. Therefore, in https://automl.github.io/jahs_bench_201/evaluation_protocol you must provide a recommended reference point for everyone to measure hypervolume with respect to. The reference point must be a finite performance value for each objective for which it is impossible to do worse.

I looked up the reference point for your reported results for the multiobjective random sampling approach in lines 19-20 of this file: https://github.com/automl/jahs_bench_201_experiments/blob/master/jahs_bench_201_experiments/analysis/leaderboard.py, and it looks like you are calculating the reference point as the minimum of all observed values. Is that correct? Such a policy would indirectly reward methods that sample very bad configurations and penalize methods that never take a "bad" evaluation. (A small illustration of why a fixed reference point matters follows this list.)

  2. Also in the same file, it looks like the two objectives for the multiobjective variation of the benchmarks are to maximize validation accuracy and minimize latency? It would be nice to clarify this somewhere in https://automl.github.io/jahs_bench_201/evaluation_protocol as well, as it was difficult to find.
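To illustrate why a shared, fixed reference point matters, here is a generic textbook 2D hypervolume computation (not the benchmark's evaluation code); both objectives are treated as minimization, e.g. validation error and latency:

def hypervolume_2d(points, ref):
    # Dominated hypervolume of 2D minimization points w.r.t. a reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv, best_y = 0.0, ref[1]
    for x, y in pts:  # sweep in increasing x; the running best y decreases
        if y < best_y:
            hv += (ref[0] - x) * (best_y - y)
            best_y = y
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))  # 11.0
print(hypervolume_2d(front, ref=(6.0, 6.0)))  # 20.0: the value changes with the
                                              # reference, so solvers must share one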
