wanmok / iterx
This is the repo for the IterX model. The paper was accepted to EACL 2023 and won an outstanding paper award.
Home Page: https://arxiv.org/abs/2210.06600
I'm trying to run the model training script for some of my experiments. I installed all the packages in a separate Python environment, following the instructions on the main repo page.
Then I ran:
PYTHONPATH=./src allennlp train \
--include-package iterx \
-s new_iterx_dir \
/data/sid/iterx/resources/training_configs/muc_config.jsonnet
and I get the following error:
allennlp.common.checks.ConfigurationError: key "type" is required at location "model.graph_encoder."
Below is the full log:
2023-07-08 08:58:24,308 - INFO - numexpr.utils - Note: NumExpr detected 12 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-07-08 08:58:24,308 - INFO - numexpr.utils - NumExpr defaulting to 8 threads.
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - evaluation = None
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - include_in_archive = None
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - random_seed = 13370
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - numpy_seed = 1337
2023-07-08 08:58:24,572 - INFO - allennlp.common.params - pytorch_seed = 133
2023-07-08 08:58:24,573 - INFO - allennlp.common.checks - Pytorch version: 2.0.1
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - type = default
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - dataset_reader.type = muc
2023-07-08 08:58:24,574 - INFO - allennlp.common.params - dataset_reader.max_instances = None
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.manual_distributed_sharding = False
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.manual_multiprocess_sharding = False
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.type = pretrained_transformer_mismatched
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.model_name = t5-large
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.namespace = tags
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.max_length = 1024
2023-07-08 08:58:24,575 - INFO - allennlp.common.params - dataset_reader.token_indexers.tokens.tokenizer_kwargs = None
Downloading (…)lve/main/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.21k/1.21k [00:00<00:00, 11.2MB/s]
Downloading (…)ve/main/spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 12.1MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.39M/1.39M [00:00<00:00, 17.0MB/s]
/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.
warnings.warn(
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.is_training = True
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.skip_docs_without_templates = False
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.skip_docs_without_spans = True
2023-07-08 08:58:26,249 - INFO - allennlp.common.params - dataset_reader.verbose = False
2023-07-08 08:58:26,265 - INFO - allennlp.common.params - train_data_path = resources/data/muc/preprocessed/tokenized/train.json
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - datasets_for_vocab_creation = None
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.type = muc
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.max_instances = None
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.manual_distributed_sharding = False
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.manual_multiprocess_sharding = False
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:26,266 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.type = pretrained_transformer_mismatched
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.token_min_padding_length = 0
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.model_name = t5-large
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.namespace = tags
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.max_length = 1024
2023-07-08 08:58:26,267 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.tokenizer_kwargs = None
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.is_training = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.skip_docs_without_templates = True
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.skip_docs_without_spans = True
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_dataset_reader.verbose = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_data_path = resources/data/muc/preprocessed/tokenized/dev.json
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - validation_data_loader = None
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - test_data_path = resources/data/muc/preprocessed/tokenized/test.json
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - evaluate_on_test = False
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - batch_weight_key =
2023-07-08 08:58:26,268 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:26,269 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:26,270 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 0it [00:00, ?it/s]
2023-07-08 08:58:26,305 - WARNING - allennlp.data.fields.sequence_label_field - Your label namespace was 'event_types'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for `non_padded_namespaces` parameter in Vocabulary.
2023-07-08 08:58:26,306 - WARNING - allennlp.data.fields.label_field - Your label namespace was 'slot_types'. We recommend you use a namespace ending with 'labels' or 'tags', so we don't add UNK and PAD tokens by default to your vocabulary. See documentation for `non_padded_namespaces` parameter in Vocabulary.
loading instances: 3995it [00:04, 792.24it/s]
2023-07-08 08:58:30,380 - WARNING - iterx.data.dataset.muc_dataset - Read 1300 documents. Of these, 672 had both templates and spans. 600 had no templates and 628 had no spans.
loading instances: 4032it [00:04, 980.93it/s]
2023-07-08 08:58:30,380 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:30,381 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 613it [00:00, 3097.39it/s]
2023-07-08 08:58:30,604 - WARNING - iterx.data.dataset.muc_dataset - Read 200 documents. Of these, 111 had both templates and spans. 84 had no templates and 89 had no spans.
loading instances: 666it [00:00, 2997.71it/s]
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.type = multiprocess
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.batch_size = None
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.drop_last = False
2023-07-08 08:58:30,604 - INFO - allennlp.common.params - data_loader.shuffle = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.type = bucket
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.batch_size = 1
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.sorting_keys = ['text']
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.padding_noise = 0.1
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.drop_last = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batch_sampler.shuffle = True
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.batches_per_epoch = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.num_workers = 0
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.max_instances_in_memory = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.start_method = fork
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.cuda_device = None
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.quiet = False
2023-07-08 08:58:30,605 - INFO - allennlp.common.params - data_loader.collate_fn = <allennlp.data.data_loaders.data_collator.DefaultDataCollator object at 0x7f7065851a90>
loading instances: 409it [00:00, 757.83it/s]
2023-07-08 08:58:31,343 - WARNING - iterx.data.dataset.muc_dataset - Read 200 documents. Of these, 123 had both templates and spans. 74 had no templates and 77 had no spans.
loading instances: 738it [00:00, 1000.58it/s]
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.type = from_files
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.directory = resources/data/muc/vocabulary
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.padding_token = @@PADDING@@
2023-07-08 08:58:31,343 - INFO - allennlp.common.params - vocabulary.oov_token = @@UNKNOWN@@
2023-07-08 08:58:31,344 - INFO - allennlp.data.vocabulary - Loading token dictionary from resources/data/muc/vocabulary.
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.type = iterative_template_extraction
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.regularizer = None
2023-07-08 08:58:31,344 - INFO - allennlp.common.params - model.definition_file = resources/data/muc/definitions.json
2023-07-08 08:58:31,345 - CRITICAL - root - Uncaught exception
Traceback (most recent call last):
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 211, in pop
value = self.params.pop(key)
^^^^^^^^^^^^^^^^^^^^
KeyError: 'type'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sidvash/.conda/envs/iterx/bin/allennlp", line 8, in <module>
sys.exit(run())
^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/__main__.py", line 39, in run
main(prog="allennlp")
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/__init__.py", line 120, in main
args.func(args)
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 111, in train_model_from_args
train_model_from_file(
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 177, in train_model_from_file
return train_model(
^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 258, in train_model
model = _train_worker(
^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 494, in _train_worker
train_loop = TrainModel.from_params(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 638, in from_params
return constructor_to_call(**kwargs) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/commands/train.py", line 770, in from_partial_objects
model_ = model.construct(
^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/lazy.py", line 82, in construct
return self.constructor(**contructor_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/lazy.py", line 66, in constructor_to_use
return self._constructor.from_params( # type: ignore[union-attr]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 604, in from_params
return retyped_subclass.from_params(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 636, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 206, in create_kwargs
constructed_arg = pop_and_construct_arg(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 314, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 348, in construct_arg
result = annotation.from_params(params=popped_params, **subextras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/from_params.py", line 585, in from_params
choice = params.pop_choice(
^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 314, in pop_choice
value = self.pop(key, default)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/sidvash/.conda/envs/iterx/lib/python3.11/site-packages/allennlp/common/params.py", line 216, in pop
raise ConfigurationError(msg)
allennlp.common.checks.ConfigurationError: key "type" is required at location "model.graph_encoder."
It looks like I need to change something in the config file?
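The traceback above shows the error being raised while AllenNLP pops the "type" key for the model's graph_encoder argument, so the config block at model.graph_encoder has no "type" entry. As a minimal sketch for locating such blocks, the Python below walks a config that has already been evaluated to plain JSON and reports which of the given dotted paths lack a "type" key. The helper name missing_type_keys, the sample config, and the hidden_dim key are hypothetical and only illustrate the check; they are not taken from the IterX code or from muc_config.jsonnet.

```python
import json


def missing_type_keys(config, paths):
    """Return the dotted paths from `paths` whose config object has no "type" key."""
    missing = []
    for path in paths:
        node = config
        # Walk down the nested dicts one path component at a time.
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        # Only report objects that exist but lack a "type" entry.
        if isinstance(node, dict) and "type" not in node:
            missing.append(path)
    return missing


# Hypothetical config fragment, for illustration only.
config = json.loads("""
{
  "model": {
    "type": "iterative_template_extraction",
    "graph_encoder": {"hidden_dim": 128}
  }
}
""")

result = missing_type_keys(config, ["model", "model.graph_encoder"])
print(result)
```

Which value the "type" key should take depends on the components registered in the iterx package, so this check only shows where an entry is missing, not what to put there.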
There are currently no instructions for running inference on a trained model in the README (only instructions for training and for scoring). We should fix this.
Traceback (most recent call last):
File "D:\anaconda3\envs\iterx\Lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
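For context, "Expecting value: line 1 column 1 (char 0)" is the message Python's json module produces when the input has no JSON value at its very first character, most commonly because the string or file is empty. The short sketch below reproduces the message with an empty string. It does not identify which file triggered the error in this report, since the traceback does not say.

```python
import json

try:
    # An empty string contains no JSON value at position 0.
    json.loads("")
except json.JSONDecodeError as err:
    message = str(err)
    print(message)
```

Checking that the file being read exists, is non-empty, and actually contains JSON is a reasonable first step when this message appears.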
Hi,
I am really interested in this paper on using imitation learning for document-level extraction, and congratulations on the work being accepted to EACL. Will you release the code?
The current version of resources/training_configs/muc_config.jsonnet does not work out of the box, due to some minor discrepancies between the public code release and the code for which this file was written. I will submit a PR with the relevant changes.