allenai / allennlp-as-a-library-example

A simple example for how to build your own model using AllenNLP as a dependency.

Python 53.04% CSS 27.84% HTML 11.52% Shell 7.60%

allennlp-as-a-library-example's Introduction

A simple example for how to build your own model using AllenNLP as a dependency. An explanation of all of the code in this repository is given in part 1 and part 2 of the AllenNLP tutorial.

There are two main pieces of code you need to write in order to make a new model: a DatasetReader and a Model. In this repository, we constructed a DatasetReader for reading academic papers formatted as a JSON lines file (you can see an example of the data in tests/fixtures/s2_papers.jsonl). We then constructed a model to classify the papers given some label (which we specified as the paper's venue in the DatasetReader). Finally, we added a script to use AllenNLP's training commands from a third-party repository, and an experiment configuration for running a real model on real data.
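
As a rough illustration of the first piece, the sketch below reads JSON lines in plain Python and pairs each paper's text with its venue label. It does not use AllenNLP's DatasetReader API. The helper name read_papers and the field names "title", "paperAbstract", and "venue" are assumptions made for illustration, so check tests/fixtures/s2_papers.jsonl for the real schema.

```python
import json
from typing import Dict, Iterable, Iterator


def read_papers(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one example per non-empty JSON line.

    The keys "title", "paperAbstract" and "venue" are assumed for
    illustration; they are not taken from the repository's code.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        paper = json.loads(line)
        yield {
            "title": paper.get("title", ""),
            "abstract": paper.get("paperAbstract", ""),
            "label": paper.get("venue", ""),
        }


# Hypothetical two-line sample standing in for a JSON lines file.
sample = [
    '{"title": "A", "paperAbstract": "About parsing.", "venue": "ACL"}',
    '{"title": "B", "paperAbstract": "About vision.", "venue": "CVPR"}',
]
examples = list(read_papers(sample))
```

In a real DatasetReader each of these dictionaries would be turned into an AllenNLP instance instead of being returned as plain strings.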

To train this model, first set up your development environment by running pip install -r requirements.txt, then run:

allennlp train experiments/venue_classifier.json -s /tmp/your_output_dir_here --include-package my_library

This example was written by the AllenNLP team. You can see a similar example repository written by others here.

allennlp-as-a-library-example's Issues

key "type" required at location "model.text_field_embedder."

I am using allennlp 0.8.4 and following this example. It gives the error: allennlp.common.checks.ConfigurationError: 'key "type" is required at location "model.text_field_embedder."'

Any idea how to fix it?

Tests are failing

When I sync down the project and run python -m pytest, one test fails.
The output is included below.

`
======================================================================================================= test session starts =======================================================================================================
platform darwin -- Python 3.6.3, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/paul.murphy/PycharmProjects/second-attempt/allennlp-as-a-library-example, inifile: pytest.ini
plugins: pythonpath-0.7.2, cov-2.5.1, flaky-3.4.0
collected 3 items

tests/dataset_readers/semantic_scholar_dataset_reader_test.py . [ 33%]
tests/models/academic_paper_classifier_test.py F [ 66%]
tests/predictors/predictor_test.py . [100%]

============================================================================================================ FAILURES =============================================================================================================
_________________________________________________________________________________ AcademicPaperClassifierTest.test_model_can_train_save_and_load __________________________________________________________________________________

self = <models.academic_paper_classifier_test.AcademicPaperClassifierTest testMethod=test_model_can_train_save_and_load>

def test_model_can_train_save_and_load(self):

  self.ensure_model_can_train_save_and_load(self.param_file)

tests/models/academic_paper_classifier_test.py:12:

../../untitled/venv/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py:81: in ensure_model_can_train_save_and_load
self.check_model_computes_gradients_correctly(model, model_batch)

model = AcademicPaperClassifier(
(text_field_embedder): BasicTextFieldEmbedder(
(token_embedder_tokens): Embedding(
...(_dropout): ModuleList(
(0): Dropout(p=0.2)
(1): Dropout(p=0.0)
)
)
(loss): CrossEntropyLoss(
)
)
model_batch = {'abstract': {'tokens': Variable containing:
18 80 6 ... 0 0 0
18 80 6 ... 0 ... 0 0
237 612 238 4 613 614 14 239 615 616 0 0
[torch.LongTensor of size 10x12]
}}

@staticmethod
def check_model_computes_gradients_correctly(model, model_batch):
    model.zero_grad()
    result = model(**model_batch)
    result["loss"].backward()
    has_zero_or_none_grads = {}
    for name, parameter in model.named_parameters():
        zeros = torch.zeros(parameter.size())
        if parameter.requires_grad:

            if parameter.grad is None:
                has_zero_or_none_grads[name] = "No gradient computed (i.e parameter.grad is None)"
            # Some parameters will only be partially updated,
            # like embeddings, so we just check that any gradient is non-zero.
            if (parameter.grad.data.cpu() == zeros).all():
                has_zero_or_none_grads[name] = f"zeros with shape ({tuple(parameter.grad.size())})"
        else:
            assert parameter.grad is None

    if has_zero_or_none_grads:
        for name, grad in has_zero_or_none_grads.items():
            print(f"Parameter: {name} had incorrect gradient: {grad}")

        raise Exception("Incorrect gradients found. See stdout for more info.")

E Exception: Incorrect gradients found. See stdout for more info.

../../untitled/venv/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py:161: Exception
------------------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------------------
Parameter: classifier_feedforward._linear_layers.0.weight had incorrect gradient: zeros with shape ((2, 4))
Parameter: classifier_feedforward._linear_layers.0.bias had incorrect gradient: zeros with shape ((2,))
Parameter: classifier_feedforward._linear_layers.1.weight had incorrect gradient: zeros with shape ((3, 2))
------------------------------------------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------------------------------------------
10it [00:00, 388.59it/s]
100%|██████████| 10/10 [00:00<00:00, 2317.55it/s]
10it [00:00, 557.01it/s]
10it [00:00, 551.77it/s]
20it [00:00, 2460.00it/s]
accuracy: 0.4000, accuracy3: 1.0000, loss: 1.0902 ||: 100%|##########| 1/1 [00:00<00:00, 50.55it/s]
accuracy: 0.4000, accuracy3: 1.0000, loss: 1.0898 ||: 100%|##########| 1/1 [00:00<00:00, 88.64it/s]
10it [00:00, 530.95it/s]
10it [00:00, 561.45it/s]
-------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------
bucket_iterator.py 92 WARNING shuffle parameter is set to False, while bucket iterators by definition change the order of your data.
bucket_iterator.py 92 WARNING shuffle parameter is set to False, while bucket iterators by definition change the order of your data.
===Flaky Test Report===

===End Flaky Test Report===
=============================================================================================== 1 failed, 2 passed in 3.06 seconds ================================================================================================
`

Dead code in paper_classifier_predictor.py

allennlp-as-a-library-example/my_library/predictors/paper_classifier_predictor.py

Line 21 in 583f2fd

all_labels = [label_dict[i] for i in range(len(label_dict))]

The changes for 0.6.1 removed the usages of all_labels, but its initialization is still here. I found this very confusing when I tried to understand how the predictor found all the labels, and it turned out that this is dead code.
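
For anyone puzzling over the same line, here is a small sketch of what that expression computes, assuming label_dict maps integer indices to label strings. The dictionary contents below are made up for illustration.

```python
# Hypothetical index-to-label mapping; in the predictor it would come
# from the trained model's vocabulary rather than being written by hand.
label_dict = {0: "ACL", 1: "AI", 2: "ML"}

# Builds a list of labels ordered by index. This relies on the keys
# being exactly 0..n-1; a missing index would raise a KeyError.
all_labels = [label_dict[i] for i in range(len(label_dict))]
```

If nothing reads all_labels afterwards, the line has no effect on the prediction and can be deleted.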

Unable to --include-package my_library

I am able to train with the default models:

/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master$ allennlp train ../../tutorials/getting_started/simple_tagger.json --serialization-dir /tmp/tutorials/getting_started

This goes through successfully.
However, I am unable to train when I include my_library:

~/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master$ allennlp train experiments/venue_classifier.json -s /tmp/venue_output_dir  --include-package my_library
/home/kenome/anaconda3/envs/allennlp/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/kenome/anaconda3/envs/allennlp/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Traceback (most recent call last):
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/run.py", line 18, in <module>
    main(prog="python -m allennlp.run")
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/commands/__init__.py", line 62, in main
    import_submodules(package_name)
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/common/util.py", line 256, in import_submodules
    importlib.import_module(package_name + '.' + name)
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'my_library.models.archival'

Here the import seems to look for 'my_library.models.archival', whereas it should be looking for 'allennlp.models.archival'. The call is in allennlp/common/util.py, line 256, in import_submodules:
importlib.import_module(package_name + '.' + name)

PYTHONPATH is as follows:
:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master:/home/kenome/Desktop/MentionDetector/allennlp-master:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/allennlp
Please advise.

Make releases for versions that this tutorial covers

It looks like this tutorial has been updated for 0.6.1, but the releases/tags only go up to 0.4.2. It would be good to get that updated.

Running AcademicPaperClassifierTest, dataset_reader's type not found.

python3 academic_paper_classifier_test.py
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
2019-03-11 21:14:11,580 - INFO - allennlp.common.checks - Pytorch version: 1.0.1.post2

2019-03-11 21:14:11,610 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'type': 's2_papers'} and extras {}
E

ERROR: test_model_can_train_save_and_load (__main__.AcademicPaperClassifierTest)

Traceback (most recent call last):
File "academic_paper_classifier_test.py", line 12, in setUp
self.set_up_model('/home/kindler/Projects/2019/M03/re_all/dataset/allennlp_test/allennlp-as-a-library-example/tests/fixtures/academic_paper_classifier.json', '/home/kindler/Projects/2019/M03/re_all/dataset/allennlp_test/allennlp-as-a-library-example/tests/fixtures/s2_papers.jsonl'),
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py", line 25, in set_up_model
reader = DatasetReader.from_params(params['dataset_reader'])
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/from_params.py", line 275, in from_params
default_to_first_choice=default_to_first_choice)
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/params.py", line 317, in pop_choice
raise ConfigurationError(message)
allennlp.common.checks.ConfigurationError: "s2_papers not in acceptable choices for dataset_reader.type: ['ccgbank', 'conll2003', 'conll2000', 'ontonotes_ner', 'coref', 'winobias', 'event2mind', 'interleaving', 'language_modeling', 'multiprocess', 'ptb_trees', 'squad', 'quac', 'triviaqa', 'qangaroo', 'srl', 'semantic_dependencies', 'seq2seq', 'sequence_tagging', 'snli', 'universal_dependencies', 'sst_tokens', 'quora_paraphrase', 'atis', 'nlvr', 'wikitables', 'template_text2sql', 'grammar_based_text2sql', 'quarel', 'simple_language_modeling', 'babi', 'copynet_seq2seq', 'text_classification_json']"

Multi labels prediction using Allennlp

Hi,
We have a use case where we need to predict a label as well as the classes within that label. It is like predicting two columns (multi-label target prediction). I am not sure whether I can use the existing library example for this requirement.
Please let me know whether this is possible using allennlp.
Thanks in advance.

Update to the latest Allennlp version

The requirements.txt file has the allennlp version set to 0.8.1.

Tutorial no longer works out of the box

As of May 14, the tutorial no longer works out of the box and must be updated for the current AllenNLP version.

git clone https://github.com/allenai/allennlp-as-a-library-example.git
cd allennlp-as-a-library-example
allennlp train experiments/venue_classifier.json -s /tmp/your_output_dir_here --include-package my_library

Output:
AssertionError: No super class method found for "decode"

Removing the @overrides decorator for decode in the model class leads to other errors (a key error for the data loader).

training failing with "cuda_device": 0

When I set "cuda_device": 0, training fails with the following error:
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'tensor1'

Do I need to set the cuda flag somewhere else too?

Fail to load model archive after training

Hi all, I followed an example and got the trained model saved in model.tar.gz. I want to load the model to make a predictor, as follows:

from allennlp.models.archival import load_archive
from allennlp.service.predictors import Predictor

archive = load_archive('model.tar.gz')
# predictor = Predictor.from_archive(archive, 'paper-classifier')

However, I got the following error when I try to load the model.

ConfigurationError                        Traceback (most recent call last)
<ipython-input-2-741f7f19114e> in <module>()
----> 1 archive = load_archive('output/model.tar.gz')

~/anaconda3/lib/python3.6/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    147                        weights_file=weights_path,
    148                        serialization_dir=serialization_dir,
--> 149                        cuda_device=cuda_device)
    150 
    151     if tempdir:

~/anaconda3/lib/python3.6/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    293         # This allows subclasses of Model to override _load.
    294         # pylint: disable=protected-access
--> 295         return cls.by_name(model_type)._load(config, serialization_dir, weights_file, cuda_device)
    296 
    297 

~/anaconda3/lib/python3.6/site-packages/allennlp/common/registrable.py in by_name(cls, name)
     54     def by_name(cls: Type[T], name: str) -> Type[T]:
     55         if name not in Registrable._registry[cls]:
---> 56             raise ConfigurationError("%s is not a registered name for %s" % (name, cls.__name__))
     57         return Registrable._registry[cls].get(name)
     58 

ConfigurationError: 'paper-classifier is not a registered name for Model'

Is there a way to register paper-classifier so that it can be loaded and used later?
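
A likely explanation, offered as an assumption and not a confirmed diagnosis, is that a name such as paper-classifier enters the registry only when the module defining the class is imported, so a fresh session that never imports my_library will not know the name. The sketch below mimics that behaviour in plain Python. REGISTRY, register, and by_name are simplified stand-ins, not AllenNLP's actual implementation.

```python
from typing import Callable, Dict, Type

# Simplified stand-in for a name -> class registry.
REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that records the class under `name` when it is defined."""
    def decorator(cls: Type) -> Type:
        REGISTRY[name] = cls
        return cls
    return decorator


def by_name(name: str) -> Type:
    """Look up a registered class, failing if the name was never registered."""
    if name not in REGISTRY:
        raise KeyError(f"{name} is not a registered name")
    return REGISTRY[name]


# Before the defining code has run, the lookup fails.
try:
    by_name("paper-classifier")
    found_before = True
except KeyError:
    found_before = False


# Defining the class (in practice: importing the module that defines it)
# is what triggers the registration.
@register("paper-classifier")
class PaperClassifier:
    pass


found_after = by_name("paper-classifier") is PaperClassifier
```

If this is the cause, importing the modules that define the model and dataset reader before calling load_archive should make the name available.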
