I'm trying to run the model training script for some of my experiments. I installed all the packages in a separate Python environment following the instructions on the main repo page, but training fails with a ConfigurationError (full log below):
2023-07-08 08:58:24,308 - INFO - numexpr.utils - Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-07-08 08:58:24,308 - INFO - numexpr.utils - NumExpr defaulting to 8 threads.
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - evaluation = None
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - include_in_archive = None
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - random_seed = 13370
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - numpy_seed = 1337
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - pytorch_seed = 133
2023-07-08 08:58:24,573 - INFO - allennlp.common.checks - Pytorch version: 2.0.1
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - type = default
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - dataset_reader.type = muc
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = pretrained_transformer_mismatched
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.model_name = t5-large
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tags
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.max_length = 1024
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.tokenizer_kwargs = None
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.21k/1.21k [00:00<00:00, 11.2MB/s]
Downloading (…)ve/main/spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 12.1MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.39M/1.39M [00:00<00:00, 17.0MB/s]
/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
warnings.warn(
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.is_training = True
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.skip_docs_without_templates = False
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.skip_docs_without_spans = True
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.verbose = False
2023-07-08 08:58:26,265 - INFO - allennlp.common.params - train_data_path = resources/data/muc/preprocessed/tokenized/train.json
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - datasets_for_vocab_creation = None
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.type = muc
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.max_instances = None
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.manual_distributed_sharding = False
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.manual_multiprocess_sharding = False
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.type = pretrained_transformer_mismatched
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.model_name = t5-large
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.namespace = tags
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.max_length = 1024
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.tokenizer_kwargs = None
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.is_training = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.skip_docs_without_templates = True
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.skip_docs_without_spans = True
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.verbose = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_data_path = resources/data/muc/preprocessed/tokenized/dev.json
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_data_loader = None
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - test_data_path = resources/data/muc/preprocessed/tokenized/test.json
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - evaluate_on_test = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - batch_weight_key =
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:26,270 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 0it [00:00, ?it/s]2023-07-08 08:58:26,305 - WARNING - allennlp.data.fields.sequence_label_field - Your label namespace was 'event_types'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for `non_padded_namespaces` parameter in Vocabulary.
2023-07-08 08:58:26,306 - WARNING - allennlp.data.fields.label_field - Your label namespace was 'slot_types'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for `non_padded_namespaces` parameter in Vocabulary.
loading instances: 3995it [00:04, 792.24it/s] 2023-07-08 08:58:30,380 - WARNING - iterx.data.dataset.muc_dataset - Read 1300 documents. Of these, 672 had both templates and spans. 600 had no templates and 628 had no spans.
loading instances: 4032it [00:04, 980.93it/s]
2023-07-08 08:58:30,380 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 613it [00:00, 3097.39it/s]2023-07-08 08:58:30,604 - WARNING - iterx.data.dataset.muc_dataset - Read 200 documents. Of these, 111 had both templates and spans. 84 had no templates and 89 had no spans.
loading instances: 666it [00:00, 2997.71it/s]
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 409it [00:00, 757.83it/s]2023-07-08 08:58:31,343 - WARNING - iterx.data.dataset.muc_dataset - Read 200 documents. Of these, 123 had both templates and spans. 74 had no templates and 77 had no spans.
loading instances: 738it [00:00, 1000.58it/s]
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.type = from_files
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.directory = resources/data/muc/vocabulary
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.padding_token = @@PADDING@@
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.oov_token = @@UNKNOWN@@
2023-07-08 08:58:31,344 - INFO - allennlp.data.vocabulary - Loading token dictionary from resources/data/muc/vocabulary.
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.type = iterative_template_extraction
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.regularizer = None
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:31,345 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 211, in pop
value = self.params.pop(key)
^^^^^^^^^^^^^^^^^^^^
KeyError: 'type'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sidvash/.conda/envs/iterx/bin/allennlp", line 8, in <module>
sys.exit(run())
^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/__main__.py", line 39, in run
main(prog="allennlp")
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/__init__.py", line 120, in main
args.func(args)
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 111, in train_model_from_args
train_model_from_file(
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
return train_model(
^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 258, in train_model
model = _train_worker(
^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 494, in _train_worker
train_loop = TrainModel.from_params(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 638, in from_params
return constructor_to_call(**kwargs) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 770, in from_partial_objects
model_ = model.construct(
^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 636, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 206, in create_kwargs
constructed_arg = pop_and_construct_arg(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 314, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 348, in construct_arg
result = annotation.from_params(params=popped_params, **subextras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 585, in from_params
choice = params.pop_choice(
^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 314, in pop_choice
value = self.pop(key, default)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 216, in pop
raise ConfigurationError(msg)
allennlp.common.checks.ConfigurationError: key "type" is required at location "model.graph_encoder."
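The final error says the training config never specifies a "type" key under model.graph_encoder, so AllenNLP cannot resolve which registered component to construct. As a sketch only (the encoder name "pass_through" and the surrounding fields are placeholders I'm guessing at, not the repo's actual config — check the config files shipped with the repo for the real registered name), the stanza AllenNLP is looking for would have roughly this shape:

```jsonnet
// Hypothetical sketch, not the repo's actual config: AllenNLP resolves
// registrable components via the "type" key, and its absence under
// model.graph_encoder is exactly what raises the ConfigurationError above.
{
  "model": {
    "type": "iterative_template_extraction",
    "definition_file": "resources/data/muc/definitions.json",
    "graph_encoder": {
      "type": "pass_through"   // placeholder; use the encoder the repo registers
    }
  }
}
```

If the config file you're running already contains such a key, the mismatch may instead come from pointing the train command at a different or older config than the README intends, or from version skew: the log shows Python 3.11 and PyTorch 2.0.1, which may be newer than the versions the repo (and the last AllenNLP release) were tested against, so it's worth confirming the exact config path and pinned dependency versions you're using.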