Comments (16)
I can give this a try
from syne-tune.
Hi Martin,
when testing DEHB, I also ran into an issue like this, for FCNet. It does not happen when you use max_resource_attr and pass it to the simulator backend as well, but this is of course not a proper fix.
Can we sync, and you tell me what you observed so far?
INFO:syne_tune.optimizer.schedulers.searchers.bayesopt.utils.debug_log:[0: random]
num_layers: 3
max_units: 181
batch_size: 91
learning_rate: 0.003162277660168382
weight_decay: 0.050005
momentum: 0.545
max_dropout: 0.5
INFO:syne_tune.optimizer.schedulers.searchers.bayesopt.utils.debug_log:[1: random]
num_layers: 1
max_units: 144
batch_size: 411
learning_rate: 0.009322270368292423
weight_decay: 0.05591940112979374
momentum: 0.7192215979996427
max_dropout: 0.11133237466616985
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 43.93, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 43.93, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 43.94, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 65.91, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 88.05, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 110.15, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 132.23, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 154.31, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 176.51, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 198.60, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push CompleteEvent:time = 1063.64, trial_id = 0]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 20.28, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 20.28, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 20.28, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 30.35, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 40.39, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 50.39, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 60.40, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 70.41, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 80.39, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push OnTrialResultEvent:time = 90.36, trial_id = 1]
DEBUG:syne_tune.backend.simulator_backend.simulator_backend:[push CompleteEvent:time = 480.27, trial_id = 1]
These cumulative times for lcbench-airlines seem odd: they are virtually the same for epochs 1, 2, and 3, as if epochs 2 and 3 took no time at all, while epoch 1 took quite a lot.
Note: the jump from the last OnTrialResultEvent to the CompleteEvent is only because I print just the first 10 OnTrialResultEvents; it does not mean any of them were skipped. But clearly there seems to be some data issue.
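To make the oddity concrete, here is a minimal sketch (plain Python, using the trial-0 times from the simulator log above, and assuming each OnTrialResultEvent time is the cumulative elapsed_time after that epoch):

```python
# Cumulative times at which trial 0's results arrive, taken from the
# simulator log above (assumption: each OnTrialResultEvent time is the
# cumulative elapsed_time after that epoch).
cumulative = [43.93, 43.93, 43.94, 65.91, 88.05]

# Per-epoch duration: the first epoch's duration is its cumulative
# time; later ones are differences of consecutive cumulative times.
durations = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

for epoch, duration in enumerate(durations, start=1):
    print(f"epoch {epoch}: {duration:.2f}s")
# Epochs 2 and 3 come out at (virtually) zero seconds, which is what
# trips up the simulator backend.
```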
OK, there are two things here.
First, there seem to be data errors in some of our blackboxes. For example, in lcbench-airlines, the results for the first 3 epochs return the same value for metric_elapsed_time.
Martin, can you confirm this? I hope this is not some other bug somewhere.
Second, the simulator backend should still work correctly even if some epochs take no time at all, and currently it does not. Fixing this is a bit trickier: it relates to the fact that the simulator backend simulates checkpointing behaviour without actually storing information about the trials. In particular, we need to know from which resource level to resume a trial.
I should say that everything works fine if you use max_resource_attr instead of max_t with HyperbandScheduler, and also pass max_resource_attr to the simulator backend. The value of this field is often "epochs".
To me, this is the preferred way of doing things anyway. The training script usually carries the maximum resource in its config, and both scheduler and backend should get this information from there.
If this is used, the backend knows at start time how many results a trial will emit, which prevents this bug from happening.
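The mechanism can be illustrated with a hypothetical mock (names like SimBackend and start_trial are stand-ins for illustration, not the real Syne Tune API):

```python
# Hypothetical sketch: if the backend is told which config entry holds
# the maximum resource, it knows at trial-start time how many results
# the trial will emit, instead of having to infer that from the
# (possibly buggy) elapsed-time data.

class SimBackend:
    def __init__(self, max_resource_attr):
        self.max_resource_attr = max_resource_attr

    def start_trial(self, config):
        # The number of per-epoch results this trial will report is
        # read directly from the trial's config (e.g. config["epochs"]).
        num_results = config[self.max_resource_attr]
        return num_results

backend = SimBackend(max_resource_attr="epochs")
print(backend.start_trial({"epochs": 50, "batch_size": 91}))  # 50
```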
OK, so #317 fixes this, in the sense that the code now runs. But the data errors in the blackbox tables remain.
I am closing this issue and opening another one for validating (and maybe fixing) the data in lcbench, and possibly also in fcnet.
The problem is that for many configs (maybe for all?), the elapsed_time values for the first 3 epochs are identical or nearly identical.
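A validation pass for that new issue could look like this sketch (plain Python; the data layout, one list of cumulative times per config, is a hypothetical simplification):

```python
def non_increasing_epochs(elapsed_times, tol=1e-6):
    """Return the 1-based epoch indices whose cumulative elapsed_time
    is not strictly larger than the previous epoch's."""
    return [
        i + 2
        for i, (earlier, later) in enumerate(
            zip(elapsed_times, elapsed_times[1:])
        )
        if later - earlier <= tol
    ]

# Hypothetical cumulative times shaped like the lcbench data:
# epochs 1-3 identical, then normal growth.
times = [33.52457, 33.52457, 33.52457, 50.357838, 67.25511]
print(non_increasing_epochs(times))  # [2, 3]
```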
I can't find the new issue, so I'll leave my comment here: I can confirm that apparently the first epoch requires much more time, while the following two take none at all. I could not find anything in the raw data that would explain this behavior.
OK, I keep running into this error: elapsed_time (called "time") and also "val_accuracy" are exactly the same for the first 3 epochs.
This does not happen for every sampled config, but for the large majority of them.
I pushed code with debug outputs to the branch debug_martin.
Debug code here: https://github.com/awslabs/syne-tune/blob/debug_martin/syne_tune/blackbox_repository/simulated_tabular_backend.py#L156
And here: https://github.com/awslabs/syne-tune/blob/debug_martin/syne_tune/blackbox_repository/utils.py#L54
If I then run python benchmarking/nursery/benchmark_automl/benchmark_main.py --num_seeds 1 --method ASHA --benchmark lcbench-airlines
I get output like this:
Index(['val_accuracy', 'time'], dtype='object') {'val_accuracy': 54.633484, 'time': 33.52457, 'epoch': 1} {'val_accuracy': 54.633484, 'time': 33.52457, 'epoch': 2} {'val_accuracy': 54.633484, 'time': 33.52457, 'epoch': 3} {'val_accuracy': 55.08092, 'time': 50.357838, 'epoch': 4} {'val_accuracy': 55.907024, 'time': 67.25511, 'epoch': 5} {'val_accuracy': 56.68802, 'time': 84.19331, 'epoch': 6} {'val_accuracy': 57.218803, 'time': 101.09334, 'epoch': 7} {'val_accuracy': 57.582386, 'time': 118.04459, 'epoch': 8} {'val_accuracy': 57.81063, 'time': 135.03284, 'epoch': 9} {'val_accuracy': 57.95385, 'time': 152.08, 'epoch': 10} {'val_accuracy': 58.067047, 'time': 169.16852, 'epoch': 11} {'val_accuracy': 58.141846, 'time': 186.28845, 'epoch': 12} {'val_accuracy': 58.18696, 'time': 203.40231, 'epoch': 13} {'val_accuracy': 58.220665, 'time': 220.46123, 'epoch': 14} {'val_accuracy': 58.24464, 'time': 237.45142, 'epoch': 15} {'val_accuracy': 58.243977, 'time': 254.37263, 'epoch': 16} {'val_accuracy': 58.246155, 'time': 271.27945, 'epoch': 17} {'val_accuracy': 58.26594, 'time': 288.19318, 'epoch': 18} {'val_accuracy': 58.282715, 'time': 305.1667, 'epoch': 19} {'val_accuracy': 58.286236, 'time': 322.18546, 'epoch': 20} {'val_accuracy': 58.30233, 'time': 339.26794, 'epoch': 21} {'val_accuracy': 58.31726, 'time': 356.34616, 'epoch': 22} {'val_accuracy': 58.31894, 'time': 373.41287, 'epoch': 23} {'val_accuracy': 58.321625, 'time': 390.44696, 'epoch': 24} {'val_accuracy': 58.326984, 'time': 407.4664, 'epoch': 25} {'val_accuracy': 58.333534, 'time': 424.4817, 'epoch': 26} {'val_accuracy': 58.336884, 'time': 441.51205, 'epoch': 27} {'val_accuracy': 58.336548, 'time': 458.55728, 'epoch': 28} {'val_accuracy': 58.336548, 'time': 475.6127, 'epoch': 29} {'val_accuracy': 58.338898, 'time': 492.60458, 'epoch': 30} {'val_accuracy': 58.338726, 'time': 509.54263, 'epoch': 31} {'val_accuracy': 58.341076, 'time': 526.4424, 'epoch': 32} {'val_accuracy': 58.34728, 'time': 543.30194, 'epoch': 33} 
{'val_accuracy': 58.353485, 'time': 560.1487, 'epoch': 34} {'val_accuracy': 58.3612, 'time': 577.0582, 'epoch': 35} {'val_accuracy': 58.36808, 'time': 594.0279, 'epoch': 36} {'val_accuracy': 58.373608, 'time': 611.07294, 'epoch': 37} {'val_accuracy': 58.375286, 'time': 628.1597, 'epoch': 38} {'val_accuracy': 58.3778, 'time': 645.2522, 'epoch': 39} {'val_accuracy': 58.379982, 'time': 662.3246, 'epoch': 40} {'val_accuracy': 58.383003, 'time': 679.3447, 'epoch': 41} {'val_accuracy': 58.385017, 'time': 696.27783, 'epoch': 42} {'val_accuracy': 58.388203, 'time': 713.1839, 'epoch': 43} {'val_accuracy': 58.390717, 'time': 730.07983, 'epoch': 44} {'val_accuracy': 58.39256, 'time': 746.9867, 'epoch': 45} {'val_accuracy': 58.391724, 'time': 763.9096, 'epoch': 46} {'val_accuracy': 58.391724, 'time': 780.819, 'epoch': 47} {'val_accuracy': 58.391216, 'time': 797.7084, 'epoch': 48} {'val_accuracy': 58.39038, 'time': 814.5716, 'epoch': 49} {'val_accuracy': 58.39038, 'time': 814.57153, 'epoch': 50} {'val_accuracy': 58.39038, 'time': 814.5716, 'epoch': 51} INFO:syne_tune.blackbox_repository.simulated_tabular_backend:Trial 23: Fetching results: r=1, elapsed_time = 33.525 r=2, elapsed_time = 33.525 r=3, elapsed_time = 33.525 r=4, elapsed_time = 50.358 r=5, elapsed_time = 67.255 r=6, elapsed_time = 84.193 r=7, elapsed_time = 101.093 r=8, elapsed_time = 118.045 r=9, elapsed_time = 135.033 r=10, elapsed_time = 152.080 r=11, elapsed_time = 169.169 r=12, elapsed_time = 186.288 r=13, elapsed_time = 203.402 r=14, elapsed_time = 220.461 r=15, elapsed_time = 237.451 r=16, elapsed_time = 254.373 r=17, elapsed_time = 271.279 r=18, elapsed_time = 288.193 r=19, elapsed_time = 305.167 r=20, elapsed_time = 322.185 r=21, elapsed_time = 339.268 r=22, elapsed_time = 356.346 r=23, elapsed_time = 373.413 r=24, elapsed_time = 390.447 r=25, elapsed_time = 407.466 r=26, elapsed_time = 424.482 r=27, elapsed_time = 441.512 r=28, elapsed_time = 458.557 r=29, elapsed_time = 475.613 r=30, elapsed_time = 492.605 
r=31, elapsed_time = 509.543 r=32, elapsed_time = 526.442 r=33, elapsed_time = 543.302 r=34, elapsed_time = 560.149 r=35, elapsed_time = 577.058 r=36, elapsed_time = 594.028 r=37, elapsed_time = 611.073 r=38, elapsed_time = 628.160 r=39, elapsed_time = 645.252 r=40, elapsed_time = 662.325 r=41, elapsed_time = 679.345 r=42, elapsed_time = 696.278 r=43, elapsed_time = 713.184 r=44, elapsed_time = 730.080 r=45, elapsed_time = 746.987 r=46, elapsed_time = 763.910 r=47, elapsed_time = 780.819 r=48, elapsed_time = 797.708 r=49, elapsed_time = 814.572 r=50, elapsed_time = 814.572 r=51, elapsed_time = 814.572
Interestingly, for this record, the values are the same for r=1,2,3 and also for r=49,50,51.
Maybe whoever created this data did some funny "imputing"?
This "r=1,2,3" and "r=49,50,51" being the same happens for all sorts of other configs as well.
OK, I can confirm Martin's observation that these errors are not in the raw data.
My best guess is that this has something to do with the surrogate regression. It should really be done with only the config as input, so that a record (config1, resource1) is only ever interpolated with data from the same resource level.
A safe way of making sure things work would be to run the surrogate regression for each resource level separately.
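That idea can be sketched as follows (a hypothetical stand-in, not the actual blackbox-repository code; a 1-nearest-neighbour lookup plays the role of the real regressor):

```python
# Per-resource-level surrogate regression: one independent model per
# resource level, so a prediction at (config, r) only ever draws on
# observations made at the same r, and values from different epochs
# can never bleed into each other.

def fit_per_level(data):
    """data: list of (config_vector, resource, metric) observations.
    Returns {resource: [(config_vector, metric), ...]}."""
    models = {}
    for config, resource, metric in data:
        models.setdefault(resource, []).append((config, metric))
    return models

def predict(models, config, resource):
    # Only observations at this exact resource level are consulted.
    candidates = models[resource]
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c, config))
    _, best_metric = min(candidates, key=lambda cm: dist(cm[0]))
    return best_metric

# Toy observations: (config, resource level, metric value).
data = [
    ((0.1, 3), 1, 54.6), ((0.1, 3), 2, 55.1),
    ((0.9, 1), 1, 50.2), ((0.9, 1), 2, 51.0),
]
models = fit_per_level(data)
print(predict(models, (0.2, 3), 2))  # uses only resource-level-2 data
```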
The bug as stated has been resolved, but the follow-up data error has moved to #319.