I tried the code in a Colab notebook and got the errors below.
<Cropped: error output identical to the part below>
== Status ==
Memory usage on this node: 2.7/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects
Result logdir: /root/ray_results/_objective_2021-10-05_11-51-07
Number of trials: 10/10 (9 ERROR, 1 RUNNING)
+------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------+
| Trial name | status | loc | learning_rate | num_train_epochs | per_device_train_batch_size | seed |
|------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------|
| _objective_86e23_00009 | RUNNING | | 7.96157e-06 | 2 | 32 | 38.0065 |
| _objective_86e23_00000 | ERROR | | 5.61152e-06 | 5 | 64 | 8.15396 |
| _objective_86e23_00001 | ERROR | | 1.56207e-05 | 2 | 16 | 7.08379 |
| _objective_86e23_00002 | ERROR | | 8.28892e-06 | 5 | 16 | 24.4435 |
| _objective_86e23_00003 | ERROR | | 1.09943e-06 | 2 | 8 | 29.158 |
| _objective_86e23_00004 | ERROR | | 2.3102e-06 | 5 | 8 | 25.0818 |
| _objective_86e23_00005 | ERROR | | 1.12076e-05 | 4 | 16 | 1.89943 |
| _objective_86e23_00006 | ERROR | | 1.67381e-05 | 2 | 32 | 2.81996 |
| _objective_86e23_00007 | ERROR | | 5.4041e-06 | 3 | 32 | 15.916 |
| _objective_86e23_00008 | ERROR | | 1.53049e-05 | 3 | 64 | 34.5377 |
+------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------+
Number of errored trials: 9
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| _objective_86e23_00000 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00000_0_learning_rate=5.6115e-06,num_train_epochs=5,per_device_train_batch_size=64,seed=8.154_2021-10-05_11-51-07/error.txt |
| _objective_86e23_00001 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00001_1_learning_rate=1.5621e-05,num_train_epochs=2,per_device_train_batch_size=16,seed=7.0838_2021-10-05_11-51-08/error.txt |
| _objective_86e23_00002 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00002_2_learning_rate=8.2889e-06,num_train_epochs=5,per_device_train_batch_size=16,seed=24.443_2021-10-05_11-51-09/error.txt |
| _objective_86e23_00003 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00003_3_learning_rate=1.0994e-06,num_train_epochs=2,per_device_train_batch_size=8,seed=29.158_2021-10-05_11-51-17/error.txt |
| _objective_86e23_00004 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00004_4_learning_rate=2.3102e-06,num_train_epochs=5,per_device_train_batch_size=8,seed=25.082_2021-10-05_11-51-17/error.txt |
| _objective_86e23_00005 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00005_5_learning_rate=1.1208e-05,num_train_epochs=4,per_device_train_batch_size=16,seed=1.8994_2021-10-05_11-51-25/error.txt |
| _objective_86e23_00006 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00006_6_learning_rate=1.6738e-05,num_train_epochs=2,per_device_train_batch_size=32,seed=2.82_2021-10-05_11-51-26/error.txt |
| _objective_86e23_00007 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00007_7_learning_rate=5.4041e-06,num_train_epochs=3,per_device_train_batch_size=32,seed=15.916_2021-10-05_11-51-34/error.txt |
| _objective_86e23_00008 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00008_8_learning_rate=1.5305e-05,num_train_epochs=3,per_device_train_batch_size=64,seed=34.538_2021-10-05_11-51-35/error.txt |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2021-10-05 11:51:52,275 ERROR trial_runner.py:773 -- Trial _objective_86e23_00009: Error processing event.
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/ray/tune/trial_runner.py", line 739, in _process_trial
results = self.trial_executor.fetch_result(trial)
File "/usr/local/lib/python3.7/dist-packages/ray/tune/ray_trial_executor.py", line 746, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/usr/local/lib/python3.7/dist-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/ray/worker.py", line 1621, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=823, ip=172.28.0.2, repr=<ray.tune.function_runner.ImplicitFunc object at 0x7f6b715d4990>)
File "/usr/local/lib/python3.7/dist-packages/ray/tune/trainable.py", line 178, in train_buffered
result = self.train()
File "/usr/local/lib/python3.7/dist-packages/ray/tune/trainable.py", line 237, in train
result = self.step()
File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 379, in step
self._report_thread_runner_error(block=True)
File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 527, in _report_thread_runner_error
("Trial raised an exception. Traceback:\n{}".format(err_tb_str)
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=823, ip=172.28.0.2, repr=<ray.tune.function_runner.ImplicitFunc object at 0x7f6b715d4990>)
File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 260, in run
self._entrypoint()
File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 329, in entrypoint
self._status_reporter.get_checkpoint())
File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 594, in _trainable_func
output = fn()
File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 282, in dynamic_modules_import_trainable
return trainable(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/ray/tune/utils/trainable.py", line 344, in inner
trainable(config, **fn_kwargs)
File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 183, in _objective
local_trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1241, in train
self.state.trial_params = hp_params(trial.assignments) if trial is not None else None
AttributeError: 'dict' object has no attribute 'assignments'
Result for _objective_86e23_00009:
{}
== Status ==
Memory usage on this node: 2.2/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/7.36 GiB heap, 0.0/3.68 GiB objects
Result logdir: /root/ray_results/_objective_2021-10-05_11-51-07
Number of trials: 10/10 (10 ERROR)
+------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------+
| Trial name | status | loc | learning_rate | num_train_epochs | per_device_train_batch_size | seed |
|------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------|
| _objective_86e23_00000 | ERROR | | 5.61152e-06 | 5 | 64 | 8.15396 |
| _objective_86e23_00001 | ERROR | | 1.56207e-05 | 2 | 16 | 7.08379 |
| _objective_86e23_00002 | ERROR | | 8.28892e-06 | 5 | 16 | 24.4435 |
| _objective_86e23_00003 | ERROR | | 1.09943e-06 | 2 | 8 | 29.158 |
| _objective_86e23_00004 | ERROR | | 2.3102e-06 | 5 | 8 | 25.0818 |
| _objective_86e23_00005 | ERROR | | 1.12076e-05 | 4 | 16 | 1.89943 |
| _objective_86e23_00006 | ERROR | | 1.67381e-05 | 2 | 32 | 2.81996 |
| _objective_86e23_00007 | ERROR | | 5.4041e-06 | 3 | 32 | 15.916 |
| _objective_86e23_00008 | ERROR | | 1.53049e-05 | 3 | 64 | 34.5377 |
| _objective_86e23_00009 | ERROR | | 7.96157e-06 | 2 | 32 | 38.0065 |
+------------------------+----------+-------+-----------------+--------------------+-------------------------------+----------+
Number of errored trials: 10
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| _objective_86e23_00000 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00000_0_learning_rate=5.6115e-06,num_train_epochs=5,per_device_train_batch_size=64,seed=8.154_2021-10-05_11-51-07/error.txt |
| _objective_86e23_00001 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00001_1_learning_rate=1.5621e-05,num_train_epochs=2,per_device_train_batch_size=16,seed=7.0838_2021-10-05_11-51-08/error.txt |
| _objective_86e23_00002 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00002_2_learning_rate=8.2889e-06,num_train_epochs=5,per_device_train_batch_size=16,seed=24.443_2021-10-05_11-51-09/error.txt |
| _objective_86e23_00003 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00003_3_learning_rate=1.0994e-06,num_train_epochs=2,per_device_train_batch_size=8,seed=29.158_2021-10-05_11-51-17/error.txt |
| _objective_86e23_00004 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00004_4_learning_rate=2.3102e-06,num_train_epochs=5,per_device_train_batch_size=8,seed=25.082_2021-10-05_11-51-17/error.txt |
| _objective_86e23_00005 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00005_5_learning_rate=1.1208e-05,num_train_epochs=4,per_device_train_batch_size=16,seed=1.8994_2021-10-05_11-51-25/error.txt |
| _objective_86e23_00006 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00006_6_learning_rate=1.6738e-05,num_train_epochs=2,per_device_train_batch_size=32,seed=2.82_2021-10-05_11-51-26/error.txt |
| _objective_86e23_00007 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00007_7_learning_rate=5.4041e-06,num_train_epochs=3,per_device_train_batch_size=32,seed=15.916_2021-10-05_11-51-34/error.txt |
| _objective_86e23_00008 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00008_8_learning_rate=1.5305e-05,num_train_epochs=3,per_device_train_batch_size=64,seed=34.538_2021-10-05_11-51-35/error.txt |
| _objective_86e23_00009 | 1 | /root/ray_results/_objective_2021-10-05_11-51-07/_objective_86e23_00009_9_learning_rate=7.9616e-06,num_train_epochs=2,per_device_train_batch_size=32,seed=38.007_2021-10-05_11-51-43/error.txt |
+------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
(pid=823) Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
(pid=823) - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
(pid=823) - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
(pid=823) Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'pre_classifier.weight', 'classifier.weight', 'pre_classifier.bias']
(pid=823) You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
(pid=823) 2021-10-05 11:51:52,224 ERROR function_runner.py:266 -- Runner Thread raised error.
(pid=823) Traceback (most recent call last):
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 260, in run
(pid=823) self._entrypoint()
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 329, in entrypoint
(pid=823) self._status_reporter.get_checkpoint())
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 594, in _trainable_func
(pid=823) output = fn()
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 282, in dynamic_modules_import_trainable
(pid=823) return trainable(*args, **kwargs)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/utils/trainable.py", line 344, in inner
(pid=823) trainable(config, **fn_kwargs)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 183, in _objective
(pid=823) local_trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1241, in train
(pid=823) self.state.trial_params = hp_params(trial.assignments) if trial is not None else None
(pid=823) AttributeError: 'dict' object has no attribute 'assignments'
(pid=823) Exception in thread Thread-2:
(pid=823) Traceback (most recent call last):
(pid=823) File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
(pid=823) self.run()
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 279, in run
(pid=823) raise e
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 260, in run
(pid=823) self._entrypoint()
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 329, in entrypoint
(pid=823) self._status_reporter.get_checkpoint())
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/function_runner.py", line 594, in _trainable_func
(pid=823) output = fn()
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 282, in dynamic_modules_import_trainable
(pid=823) return trainable(*args, **kwargs)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/ray/tune/utils/trainable.py", line 344, in inner
(pid=823) trainable(config, **fn_kwargs)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 183, in _objective
(pid=823) local_trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
(pid=823) File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1241, in train
(pid=823) self.state.trial_params = hp_params(trial.assignments) if trial is not None else None
(pid=823) AttributeError: 'dict' object has no attribute 'assignments'
(pid=823)
---------------------------------------------------------------------------
TuneError Traceback (most recent call last)
<ipython-input-9-1f1a81b84f40> in <module>()
42 direction="maximize",
43 backend="ray",
---> 44 n_trials=10 # number of trials
45 )
2 frames
/usr/local/lib/python3.7/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, loggers, ray_auto_init, run_errored_only, global_checkpoint_period, with_server, upload_dir, sync_to_cloud, sync_to_driver, sync_on_checkpoint, _remote)
553 if incomplete_trials:
554 if raise_on_failed_trial and not state[signal.SIGINT]:
--> 555 raise TuneError("Trials did not complete", incomplete_trials)
556 else:
557 logger.error("Trials did not complete: %s", incomplete_trials)
TuneError: ('Trials did not complete', [_objective_86e23_00000, _objective_86e23_00001, _objective_86e23_00002, _objective_86e23_00003, _objective_86e23_00004, _objective_86e23_00005, _objective_86e23_00006, _objective_86e23_00007, _objective_86e23_00008, _objective_86e23_00009])
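For context on the Ray-backend failure above: `trainer.py` line 1241 in transformers 4.11.2 calls `hp_params(trial.assignments)`, but Ray Tune hands the objective a plain hyperparameter dict, which has no `.assignments` attribute. A minimal sketch reproducing just that attribute access (the values are copied from trial `_objective_86e23_00009` above):

```python
# What Ray Tune passes into the objective is a plain config dict,
# not an object with an `.assignments` attribute.
trial = {
    "learning_rate": 7.96157e-06,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 32,
    "seed": 38.0065,
}

# trainer.py:1241 effectively does `trial.assignments`, which raises
# the same AttributeError seen in the log above.
try:
    trial.assignments
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'assignments'
```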
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
Model config DistilBertConfig {
"activation": "gelu",
"architectures": [
"DistilBertForMaskedLM"
],
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"tie_weights_": true,
"transformers_version": "4.11.2",
"vocab_size": 30522
}
loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99
loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4
loading file https://huggingface.co/distilbert-base-uncased/resolve/main/added_tokens.json from cache at None
loading file https://huggingface.co/distilbert-base-uncased/resolve/main/special_tokens_map.json from cache at None
loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
<Cropped: "Model config DistilBertConfig" output identical to the first occurrence above>
Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100% 3/3 [00:00<00:00, 47.70it/s]
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-6be500ff95cfa94a.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-0208e5893d9737cc.arrow
100% 2/2 [00:00<00:00, 6.15ba/s]
PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
<Cropped: "Model config DistilBertConfig" output identical to the first occurrence above>
loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[I 2021-10-05 11:57:53,131] A new study created in memory with name: no-name-b8bec492-da86-496a-8dc9-889c57c2949a
Trial:
loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333
<Cropped: "Model config DistilBertConfig" output identical to the first occurrence above>
loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_transform.weight', 'vocab_projector.weight', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
***** Running training *****
Num examples = 3668
Num Epochs = 2
Instantaneous batch size per device = 64
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 1
Total optimization steps = 116
[W 2021-10-05 11:57:54,441] Trial 0 failed because of the following error: AttributeError("'Trial' object has no attribute 'assignments'")
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 213, in _run_trial
value_or_values = func(trial)
File "/usr/local/lib/python3.7/dist-packages/transformers/integrations.py", line 150, in _objective
trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1241, in train
self.state.trial_params = hp_params(trial.assignments) if trial is not None else None
AttributeError: 'Trial' object has no attribute 'assignments'
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-ec85f0236770> in <module>()
42 direction="maximize",
43 backend="optuna",
---> 44 n_trials=10 # number of trials
45 )
8 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1239 self.callback_handler.train_dataloader = train_dataloader
1240 self.state.trial_name = self.hp_name(trial) if self.hp_name is not None else None
-> 1241 self.state.trial_params = hp_params(trial.assignments) if trial is not None else None
1242 # This should be the same if the state has been saved but in case the training arguments changed, it's safer
1243 # to set this after the load.
AttributeError: 'Trial' object has no attribute 'assignments'
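Both backends fail on the same line because transformers 4.11.2 assumes the SigOpt trial interface (`.assignments`) regardless of backend: Ray passes a plain dict, and Optuna passes an `optuna.trial.Trial`, which exposes the sampled values as `.params`. A hedged sketch of backend-aware extraction, roughly what the failing code path would need to do; the function name and dispatch order are illustrative, not the library's actual implementation:

```python
def extract_trial_params(trial):
    """Return the hyperparameter dict for whichever backend produced `trial`.

    Illustrative sketch only -- not the actual transformers code.
    """
    if isinstance(trial, dict):          # backend="ray": plain config dict
        return trial
    if hasattr(trial, "params"):         # backend="optuna": optuna.trial.Trial
        return trial.params
    if hasattr(trial, "assignments"):    # backend="sigopt": SigOpt Trial
        return trial.assignments
    raise TypeError(f"Unsupported trial type: {type(trial)!r}")
```

As a practical workaround until the library handles this on your version, pinning a transformers release whose `Trainer.train` does not assume `.assignments` (this code path changed in later releases) avoids the crash for both `backend="ray"` and `backend="optuna"`.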