
allennlp-as-a-library-example's Introduction

A simple example of how to build your own model using AllenNLP as a dependency. All of the code in this repository is explained in part 1 and part 2 of the AllenNLP tutorial.

There are two main pieces of code you need to write in order to make a new model: a DatasetReader and a Model. In this repository, we constructed a DatasetReader for reading academic papers formatted as a JSON lines file (you can see an example of the data in tests/fixtures/s2_papers.jsonl). We then constructed a model to classify the papers given some label (which we specified as the paper's venue in the DatasetReader). Finally, we added a script to use AllenNLP's training commands from a third-party repository, and an experiment configuration for running a real model on real data.
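As a rough illustration of the first piece described above, the sketch below parses a JSON lines stream using only the Python standard library. It is not the repository's reader: the function name read_papers and the field names title, paperAbstract, and venue are assumptions made for illustration, and an actual AllenNLP DatasetReader would go on to turn each record into an Instance.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_papers(lines: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one record per non-empty line of a JSON lines stream."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines rather than failing on them.
            continue
        paper = json.loads(line)
        # The key names below are assumptions, not confirmed by the README.
        yield {
            "title": paper["title"],
            "abstract": paper["paperAbstract"],
            "label": paper["venue"],  # the venue serves as the class label
        }


# Two made-up records separated by a blank line.
sample = io.StringIO(
    '{"title": "A", "paperAbstract": "First abstract.", "venue": "ACL"}\n'
    '\n'
    '{"title": "B", "paperAbstract": "Second abstract.", "venue": "AI"}\n'
)
records = list(read_papers(sample))
```

Reading from a file-like object rather than a path keeps the sketch easy to test; in practice you would pass an open file handle for a .jsonl file.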

To train this model, first set up your development environment by running pip install -r requirements.txt, then run:

allennlp train experiments/venue_classifier.json -s /tmp/your_output_dir_here --include-package my_library

This example was written by the AllenNLP team. You can see a similar example repository written by others here.

allennlp-as-a-library-example's People

Contributors

amandalynne, ibeltagy, joelgrus, maksymdel, matt-gardner, mikerossgithub, nelson-liu, schmmd, seal6363

allennlp-as-a-library-example's Issues

Tests are failing

When I sync down the project and run python -m pytest, one test fails.
The output is included below.

======================================================================================================= test session starts =======================================================================================================
platform darwin -- Python 3.6.3, pytest-3.5.1, py-1.5.3, pluggy-0.6.0
rootdir: /Users/paul.murphy/PycharmProjects/second-attempt/allennlp-as-a-library-example, inifile: pytest.ini
plugins: pythonpath-0.7.2, cov-2.5.1, flaky-3.4.0
collected 3 items

tests/dataset_readers/semantic_scholar_dataset_reader_test.py . [ 33%]
tests/models/academic_paper_classifier_test.py F [ 66%]
tests/predictors/predictor_test.py . [100%]

============================================================================================================ FAILURES =============================================================================================================
_________________________________________________________________________________ AcademicPaperClassifierTest.test_model_can_train_save_and_load __________________________________________________________________________________

self = <models.academic_paper_classifier_test.AcademicPaperClassifierTest testMethod=test_model_can_train_save_and_load>

def test_model_can_train_save_and_load(self):
  self.ensure_model_can_train_save_and_load(self.param_file)

tests/models/academic_paper_classifier_test.py:12:


../../untitled/venv/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py:81: in ensure_model_can_train_save_and_load
self.check_model_computes_gradients_correctly(model, model_batch)


model = AcademicPaperClassifier(
(text_field_embedder): BasicTextFieldEmbedder(
(token_embedder_tokens): Embedding(
...(_dropout): ModuleList(
(0): Dropout(p=0.2)
(1): Dropout(p=0.0)
)
)
(loss): CrossEntropyLoss(
)
)
model_batch = {'abstract': {'tokens': Variable containing:
18 80 6 ... 0 0 0
18 80 6 ... 0 ... 0 0
237 612 238 4 613 614 14 239 615 616 0 0
[torch.LongTensor of size 10x12]
}}

@staticmethod
def check_model_computes_gradients_correctly(model, model_batch):
    model.zero_grad()
    result = model(**model_batch)
    result["loss"].backward()
    has_zero_or_none_grads = {}
    for name, parameter in model.named_parameters():
        zeros = torch.zeros(parameter.size())
        if parameter.requires_grad:

            if parameter.grad is None:
                has_zero_or_none_grads[name] = "No gradient computed (i.e parameter.grad is None)"
            # Some parameters will only be partially updated,
            # like embeddings, so we just check that any gradient is non-zero.
            if (parameter.grad.data.cpu() == zeros).all():
                has_zero_or_none_grads[name] = f"zeros with shape ({tuple(parameter.grad.size())})"
        else:
            assert parameter.grad is None

    if has_zero_or_none_grads:
        for name, grad in has_zero_or_none_grads.items():
            print(f"Parameter: {name} had incorrect gradient: {grad}")
        raise Exception("Incorrect gradients found. See stdout for more info.")

E Exception: Incorrect gradients found. See stdout for more info.

../../untitled/venv/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py:161: Exception
------------------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------------------
Parameter: classifier_feedforward._linear_layers.0.weight had incorrect gradient: zeros with shape ((2, 4))
Parameter: classifier_feedforward._linear_layers.0.bias had incorrect gradient: zeros with shape ((2,))
Parameter: classifier_feedforward._linear_layers.1.weight had incorrect gradient: zeros with shape ((3, 2))
------------------------------------------------------------------------------------------------------ Captured stderr call -------------------------------------------------------------------------------------------------------
10it [00:00, 388.59it/s]
100%|██████████| 10/10 [00:00<00:00, 2317.55it/s]
10it [00:00, 557.01it/s]
10it [00:00, 551.77it/s]
20it [00:00, 2460.00it/s]
accuracy: 0.4000, accuracy3: 1.0000, loss: 1.0902 ||: 100%|##########| 1/1 [00:00<00:00, 50.55it/s]
accuracy: 0.4000, accuracy3: 1.0000, loss: 1.0898 ||: 100%|##########| 1/1 [00:00<00:00, 88.64it/s]
10it [00:00, 530.95it/s]
10it [00:00, 561.45it/s]
-------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------
bucket_iterator.py 92 WARNING shuffle parameter is set to False, while bucket iterators by definition change the order of your data.
bucket_iterator.py 92 WARNING shuffle parameter is set to False, while bucket iterators by definition change the order of your data.
===Flaky Test Report===

===End Flaky Test Report===
=============================================================================================== 1 failed, 2 passed in 3.06 seconds ================================================================================================

Unable to --include-package my_library

I am able to train with the default models:

/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master$ allennlp train ../../tutorials/getting_started/simple_tagger.json --serialization-dir /tmp/tutorials/getting_started

This completes successfully.
However, I am unable to train when including my_library:

~/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master$ allennlp train experiments/venue_classifier.json -s /tmp/venue_output_dir  --include-package my_library
/home/kenome/anaconda3/envs/allennlp/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/kenome/anaconda3/envs/allennlp/lib/python3.6/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Traceback (most recent call last):
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/run.py", line 18, in <module>
    main(prog="python -m allennlp.run")
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/commands/__init__.py", line 62, in main
    import_submodules(package_name)
  File "/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/common/util.py", line 256, in import_submodules
    importlib.import_module(package_name + '.' + name)
  File "/home/kenome/anaconda3/envs/allennlp/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'my_library.models.archival'

Here the import seems to look for 'my_library.models.archival', whereas it should be looking for 'allennlp.models.archival'. The failing call is in allennlp/common/util.py, line 256, in import_submodules:
importlib.import_module(package_name + '.' + name)

PYTHONPATH is as follows
:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/allennlp-as-a-library-example-master:/home/kenome/Desktop/MentionDetector/allennlp-master:/home/kenome/Desktop/MentionDetector/allennlp-master/allennlp/allennlp
Please advise.

Running AcademicPaperClassifierTest, dataset_reader's type not found.

python3 academic_paper_classifier_test.py
Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.
2019-03-11 21:14:11,580 - INFO - allennlp.common.checks - Pytorch version: 1.0.1.post2

2019-03-11 21:14:11,610 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'type': 's2_papers'} and extras {}
E

ERROR: test_model_can_train_save_and_load (__main__.AcademicPaperClassifierTest)

Traceback (most recent call last):
File "academic_paper_classifier_test.py", line 12, in setUp
self.set_up_model('/home/kindler/Projects/2019/M03/re_all/dataset/allennlp_test/allennlp-as-a-library-example/tests/fixtures/academic_paper_classifier.json', '/home/kindler/Projects/2019/M03/re_all/dataset/allennlp_test/allennlp-as-a-library-example/tests/fixtures/s2_papers.jsonl'),
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/testing/model_test_case.py", line 25, in set_up_model
reader = DatasetReader.from_params(params['dataset_reader'])
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/from_params.py", line 275, in from_params
default_to_first_choice=default_to_first_choice)
File "/home/kindler/.local/lib/python3.6/site-packages/allennlp/common/params.py", line 317, in pop_choice
raise ConfigurationError(message)
allennlp.common.checks.ConfigurationError: "s2_papers not in acceptable choices for dataset_reader.type: ['ccgbank', 'conll2003', 'conll2000', 'ontonotes_ner', 'coref', 'winobias', 'event2mind', 'interleaving', 'language_modeling', 'multiprocess', 'ptb_trees', 'squad', 'quac', 'triviaqa', 'qangaroo', 'srl', 'semantic_dependencies', 'seq2seq', 'sequence_tagging', 'snli', 'universal_dependencies', 'sst_tokens', 'quora_paraphrase', 'atis', 'nlvr', 'wikitables', 'template_text2sql', 'grammar_based_text2sql', 'quarel', 'simple_language_modeling', 'babi', 'copynet_seq2seq', 'text_classification_json']"

Multi labels prediction using Allennlp

Hi,
We have a use case where we need to predict the label as well as the classes within the label. It is like predicting two columns (multi-label target prediction). I am not sure whether I can use the existing library example for this requirement.
Please let me know whether this is possible using AllenNLP.
Thanks in advance.

Tutorial no longer works out of the box

As of May 14, the tutorial no longer works out of the box and must be updated for the current AllenNLP version.

git clone https://github.com/allenai/allennlp-as-a-library-example.git
cd allennlp-as-a-library-example
allennlp train experiments/venue_classifier.json -s /tmp/your_output_dir_here --include-package my_library

Output:
AssertionError: No super class method found for "decode"

Removing the @overrides decorator for decode in the model class leads to other errors (a key error for the data loader).

training failing with "cuda_device": 0

When I set "cuda_device": 0, training fails with the following error:
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #4 'tensor1'

Do I need to set the cuda flag somewhere else too?

Fail to load model archive after training

Hi all, I followed an example and finally got the trained model saved in model.tar.gz. I want to load the model to create a predictor, as follows:

from allennlp.models.archival import load_archive
from allennlp.service.predictors import Predictor

archive = load_archive('model.tar.gz')
# predictor = Predictor.from_archive(archive, 'paper-classifier')

However, I get the following error when I try to load the model:

ConfigurationError                        Traceback (most recent call last)
<ipython-input-2-741f7f19114e> in <module>()
----> 1 archive = load_archive('output/model.tar.gz')

~/anaconda3/lib/python3.6/site-packages/allennlp/models/archival.py in load_archive(archive_file, cuda_device, overrides, weights_file)
    147                        weights_file=weights_path,
    148                        serialization_dir=serialization_dir,
--> 149                        cuda_device=cuda_device)
    150 
    151     if tempdir:

~/anaconda3/lib/python3.6/site-packages/allennlp/models/model.py in load(cls, config, serialization_dir, weights_file, cuda_device)
    293         # This allows subclasses of Model to override _load.
    294         # pylint: disable=protected-access
--> 295         return cls.by_name(model_type)._load(config, serialization_dir, weights_file, cuda_device)
    296 
    297 

~/anaconda3/lib/python3.6/site-packages/allennlp/common/registrable.py in by_name(cls, name)
     54     def by_name(cls: Type[T], name: str) -> Type[T]:
     55         if name not in Registrable._registry[cls]:
---> 56             raise ConfigurationError("%s is not a registered name for %s" % (name, cls.__name__))
     57         return Registrable._registry[cls].get(name)
     58 

ConfigurationError: 'paper-classifier is not a registered name for Model'

Is there a way to register paper-classifier so that it can be loaded and used later?
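The error above comes from a name lookup in a class registry, so a toy registry may help show why the name can be missing. The sketch below is a simplified stand-in written for illustration, not AllenNLP's implementation: the Model class here, its _registry dict, and the empty AcademicPaperClassifier body are all hypothetical. The point it makes is that a name only becomes known once the module containing the decorated class has actually been executed.

```python
from typing import Callable, Dict, Type


class Model:
    """Toy base class with a name-to-class registry (illustrative only)."""

    _registry: Dict[str, Type["Model"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type["Model"]], Type["Model"]]:
        # Returns a class decorator that records the subclass under `name`.
        def decorator(subclass: Type["Model"]) -> Type["Model"]:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def by_name(cls, name: str) -> Type["Model"]:
        if name not in cls._registry:
            raise KeyError(f"{name} is not a registered name for {cls.__name__}")
        return cls._registry[name]


def lookup_fails() -> bool:
    """Return True if 'paper-classifier' is not yet registered."""
    try:
        Model.by_name("paper-classifier")
    except KeyError:
        return True
    return False


# Before the decorated class definition runs, the lookup fails.
failed_before = lookup_fails()


@Model.register("paper-classifier")
class AcademicPaperClassifier(Model):
    pass


# After the definition has been executed, the same lookup succeeds.
found = Model.by_name("paper-classifier")
```

If AllenNLP's registry behaves the same way, the likely fix is to make sure the package defining the model is imported before the archive is loaded, for example by importing my_library in the notebook first, or by passing --include-package my_library when using the command line.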
