allenai / deep_qa Goto Github PK

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)

License: Apache License 2.0

Python 99.41% Shell 0.59%

deep_qa's Introduction

DEPRECATED

DeepQA is built on top of Keras. We've decided that pytorch is a better platform for NLP research. We re-wrote DeepQA into a pytorch library called AllenNLP. There will be no more development of DeepQA. But, we're pretty excited about AllenNLP - if you're doing deep learning for natural language processing, you should check it out!

DeepQA

DeepQA is a library for doing high-level NLP tasks with deep learning, particularly focused on various kinds of question answering. DeepQA is built on top of Keras and TensorFlow, and can be thought of as an interface to these systems that makes NLP easier.

Specifically, this library provides the following benefits over plain Keras / TensorFlow:

It is easy to get NLP right in DeepQA.
- In Keras, there are a lot of issues around padding sequences and masking that are not handled well in the main Keras code, and we have well-tested code that does the right thing for, e.g., computing attentions over padded sequences, padding all training instances to the same lengths (possibly dynamically by batch, to minimize computation wasted on padding tokens), or distributing text encoders across several sentences or words.
- DeepQA provides a nice, consistent API around building NLP models. This API has functionality around processing data instances, embedding words and/or characters, easily getting various kinds of sentence encoders, and so on. It makes building models for high-level NLP tasks easy.
DeepQA provides a clean interface to training, validating, and debugging Keras models. It is easy to experiment with variants of a model family just by changing some parameters in a JSON file. For example, the particulars of how words are represented, either with fixed GloVe vectors, fine-tuned word2vec vectors, or a concatenation of those with a character-level CNN, are all specified by parameters in a JSON file, not in your actual code. This makes it trivial to switch the details of your model based on the data that you're working with.
DeepQA contains a number of state-of-the-art models, particularly focused around question answering systems (though we've dabbled in models for other tasks, as well). The actual model code for these systems is typically 50 lines or less.

Running DeepQA

Setting up a development environment

DeepQA is built using Python 3. The easiest way to set up a compatible environment is to use Conda. This will set up a virtual environment with the exact version of Python used for development along with all the dependencies needed to run DeepQA.

Download and install Conda.
Create a Conda environment with Python 3.
```
conda create -n deep_qa python=3.5
```
Now activate the Conda environment.
```
source activate deep_qa
```
Install the required dependencies.
```
./scripts/install_requirements.sh
```
Set the PYTHONHASHSEED for repeatable experiments.
```
export PYTHONHASHSEED=2157
```

You should now be able to test your installation with pytest -v. Congratulations! You now have a development environment for deep_qa that uses TensorFlow with CPU support. (For GPU support, see requirements.txt for information on how to install tensorflow-gpu).

Using DeepQA as an executable

To train or evaluate a model using a clone of the DeepQA repository, the recommended entry point is to use the run_model.py script. The first argument to that script is a parameter file, described more below. The second argument determines the behavior, either training a model or evaluating a trained model against a test dataset. Current valid options for the second argument are train and test (omitting the argument is the same as passing train).

Parameter files specify the model class you're using, model hyperparameters, training details, data files, data generator details, and many other things. You can see example parameter files in the examples directory. You can get some notion of what parameters are available by looking through the documentation.

Actually training a model will require input files, which you need to provide. We have a companion library, DeepQA Experiments, which was originally designed to produce input files and run experiments, and can be used to generate required data files for most of the tasks we have models for. We're moving towards putting the data processing code directly into DeepQA, so that DeepQA Experiments is not necessary, but for now, getting training data files in the right format is most easily done with DeepQA Experiments.

Using DeepQA as a library

If you are using DeepQA as a library in your own code, it is still straightforward to run your model. Instead of using the run_model.py script to do the training/evaluation, you can do it yourself as follows:

from deep_qa import run_model, evaluate_model, load_model, score_dataset

# Train a model given a json specification
run_model("/path/to/json/parameter/file")


# Load a model given a json specification
loaded_model = load_model("/path/to/json/parameter/file")
# Do some more exciting things with your model here!


# Get predictions from a pre-trained model on some test data specified in the json parameters.
predictions = score_dataset("/path/to/json/parameter/file")
# Compute your own metrics, or do beam search, or whatever you want with the predictions here.


# Compute Keras' metrics on a test dataset, using a pre-trained model.
evaluate_model("/path/to/json/parameter/file", ["/path/to/data/file"])

The rest of the usage guidelines, examples, etc., are the same as when working in a clone of the repository.

Implementing your own models

To implement a new model in DeepQA, you need to subclass TextTrainer. There is documentation on what is necessary for this; see in particular the Abstract methods section. For a simple example of a fully functional model, see the simple sequence tagger, which has about 20 lines of actual implementation code.

In order to train, load and evaluate models which you have written yourself, simply pass an additional argument to the functions above and remove the model_class parameter from your json specification. For example:

from deep_qa import run_model
from .local_project import MyGreatModel

# Train a model given a json specification (without a "model_class" attribute).
run_model("/path/to/json/parameter/file", model_class=MyGreatModel)

If you're doing a new task, or a new variant of a task with a different input/output specification, you probably also need to implement an Instance type. The Instance handles reading data from a file and converting it into numpy arrays that can be used for training and evaluation. This only needs to happen once for each input/output spec.

Implemented models

DeepQA has implementations of state-of-the-art methods for a variety of tasks. Here are a few of them:

Reading comprehension

The attentive reader, from Teaching Machines to Read and Comprehend, by Hermann and others
Gated Attention Reader from Gated Attention Readers for Text Comprehension,
Bidirectional Attention Flow, from Bidirectional Attention Flow for Machine Comprehension,

Entailment

Decomposable Attention, from A Decomposable Attention Model for Natural Language Inference,

Datasets

This code allows for easy experimentation with the following datasets:

Note that the data processing code for most of this currently lives in DeepQA Experiments, however.

Contributing

If you use this code and think something could be improved, pull requests are very welcome. Opening an issue is ok, too, but we can respond much more quickly to pull requests.

Contributors

Matt Gardner
Mark Neumann
Nelson Liu.
Pradeep Dasigi (the initial author of this codebase)

License

This code is released under the terms of the Apache 2 license.

deep_qa's People

Contributors

Stargazers

Watchers

Forkers

deneutoy nelson-liu matt-gardner beckysharp codeaudit neverspill benjamesbabala royshan allensmile liwzhi cerisara pdasigi alwaysright matt-peters rpgone mohdkashif93 arkll alvinhom liyi193328 sachuin23 mrbot-ai adamage bhavanadalvi gopigrip7 haniesedghi oyvindtafjord nikett roshantanisha tranlm perryhau snuderek subramanyata sungjinlees canoefzh programning saadmahboob soroushmehr schmmd elhomosiguiente billpei janezkranjc spdut iqbal-chowdhury virajadduru colinsongf akshay107 dandersonw emachieng trietnm2 sanjeeku p4rk3r domsia rkadlec tsuresh83 vunb suraj-deshmukh lonehacker zergey goodrahstar hugochan singhranjodh waheedabro libertatis trungvan86 leezqcst chenglongchen valyome sandy4321 dlecnu junteudjio novellll chunlinx baokunguo szha381 pkoduganty zhangyang5511 yuanzhike x369903522 hibeautycode linhx13 githubuser6787 pyx123 arasharchor aiedward renchiyuan atendra12 marilenaoita meelement sainideepak shadowridgedev siddimore kaeflint appajisunilkumar saravananpsg ahmedmishfaq emediacode vbabenko ymihay guptam samsudin09

deep_qa's Issues

Remove pin on pandas version

See #345.

InputShape as a list is broken in latest keras (2.0.4)

AttributeError: 'list' object has no attribute 'shape'
Exception ignored in: <bound method BaseSession.del of <tensorflow.python.client.session.Session object at 0x1211e8c18>>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 582, in del
AttributeError: 'NoneType' object has no attribute 'TF_DeleteStatus'

Cannot run `bidaf_squad.json`

Should I be able to run the following "out of the box" or do I need to train first?

# python scripts/run_model.py example_experiments/reading_comprehension/bidaf_squad.json  test
2017-06-20 16:36:40,791 - INFO - deep_qa.run - Loading model from parameter file: example_experiments/reading_comprehension/bidaf_squad.json
2017-06-20 16:36:40,831 - PARAM - deep_qa.common.params - random_seed = 13370
2017-06-20 16:36:40,831 - PARAM - deep_qa.common.params - numpy_seed = 1337
Using TensorFlow backend.
2017-06-20 16:36:43,750 - INFO - deep_qa.common.checks - Keras version: 2.0.4
2017-06-20 16:36:43,750 - INFO - deep_qa.common.checks - Tensorflow version: 1.1.0
2017-06-20 16:36:44,233 - PARAM - deep_qa.common.params - processor = {}
2017-06-20 16:36:44,234 - PARAM - deep_qa.common.params - processor.word_splitter = simple
2017-06-20 16:36:44,234 - PARAM - deep_qa.common.params - processor.word_filter = pass_through
2017-06-20 16:36:44,234 - PARAM - deep_qa.common.params - processor.word_stemmer = pass_through
2017-06-20 16:36:44,325 - PARAM - deep_qa.common.params - model_class = BidirectionalAttentionFlow
2017-06-20 16:36:44,325 - PARAM - deep_qa.common.params - encoder = ConfigTree([('word', ConfigTree([('type', 'cnn'), ('ngram_filter_sizes', [5]), ('num_filters', 100)]))])
2017-06-20 16:36:44,325 - INFO - deep_qa.common.params - Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
2017-06-20 16:36:44,325 - INFO - deep_qa.common.params - CURRENTLY DEFINED PARAMETERS: 
2017-06-20 16:36:44,325 - PARAM - deep_qa.common.params - encoder.word.type = cnn
2017-06-20 16:36:44,325 - PARAM - deep_qa.common.params - encoder.word.ngram_filter_sizes = [5]
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - encoder.word.num_filters = 100
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - num_hidden_seq2seq_layers = 2
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - num_passage_words = None
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - num_question_words = None
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - num_highway_layers = 2
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - highway_activation = relu
2017-06-20 16:36:44,326 - PARAM - deep_qa.common.params - similarity_function = {'combination': 'x,y,x*y', 'type': 'linear'}
2017-06-20 16:36:44,326 - INFO - deep_qa.common.params - Converting Params object to dict; logging of default values will not occur when dictionary parameters are used subsequently.
2017-06-20 16:36:44,326 - INFO - deep_qa.common.params - CURRENTLY DEFINED PARAMETERS: 
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - similarity_function.combination = x,y,x*y
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - similarity_function.type = linear
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - embeddings = ConfigTree([('words', ConfigTree([('dimension', 100), ('pretrained_file', '/net/efs/aristo/dlfa/glove/glove.6B.100d.txt.gz'), ('project', True), ('fine_tune', False), ('dropout', 0.2)])), ('characters', ConfigTree([('dimension', 8), ('dropout', 0.2)]))])
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - data_generator = ConfigTree([('dynamic_padding', True), ('adaptive_batch_sizes', True), ('adaptive_memory_usage_constant', 440000), ('maximum_batch_size', 60)])
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - data_generator.dynamic_padding = True
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - data_generator.padding_noise = 0.2
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - data_generator.sort_every_epoch = True
2017-06-20 16:36:44,327 - PARAM - deep_qa.common.params - data_generator.adaptive_batch_sizes = True
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - data_generator.adaptive_memory_usage_constant = 440000
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - data_generator.maximum_batch_size = 60
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - data_generator.biggest_batch_first = False
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - dataset = {}
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - dataset.type = text
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - num_sentence_words = None
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - num_word_characters = None
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - tokenizer = {'type': 'words and characters'}
2017-06-20 16:36:44,328 - PARAM - deep_qa.common.params - tokenizer.type = words and characters
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - tokenizer.processor = {}
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - tokenizer.processor.word_splitter = simple
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - tokenizer.processor.word_filter = pass_through
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - tokenizer.processor.word_stemmer = pass_through
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - encoder = ConfigTree([('word', ConfigTree([('type', 'cnn'), ('ngram_filter_sizes', [5]), ('num_filters', 100)]))])
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - encoder_fallback_behavior = crash
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - seq2seq_encoder = ConfigTree([('default', ConfigTree([('type', 'bi_gru'), ('encoder_params', ConfigTree([('units', 100)])), ('wrapper_params', ConfigTree())]))])
2017-06-20 16:36:44,329 - PARAM - deep_qa.common.params - seq2seq_encoder_fallback_behavior = crash
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - train_files = ['/net/efs/aristo/dlfa/squad/processed/train.tsv']
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - validation_files = ['/net/efs/aristo/dlfa/squad/processed/dev.tsv']
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - test_files = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - max_training_instances = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - max_validation_instances = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - max_test_instances = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - train_steps_per_epoch = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - train_steps_per_epoch = None
2017-06-20 16:36:44,330 - PARAM - deep_qa.common.params - train_steps_per_epoch = None
2017-06-20 16:36:44,331 - PARAM - deep_qa.common.params - save_models = True
2017-06-20 16:36:44,331 - PARAM - deep_qa.common.params - model_serialization_prefix = /net/efs/aristo/dlfa/models/bidaf
2017-06-20 16:36:44,340 - PARAM - deep_qa.common.params - num_gpus = 1
2017-06-20 16:36:44,340 - PARAM - deep_qa.common.params - validation_split = 0.1
2017-06-20 16:36:44,340 - PARAM - deep_qa.common.params - batch_size = 32
2017-06-20 16:36:44,340 - PARAM - deep_qa.common.params - num_epochs = 20
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - optimizer = ConfigTree([('type', 'adadelta'), ('learning_rate', 0.5)])
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - optimizer.type = adadelta
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - gradient_clipping = {'value': 10, 'type': 'clip_by_norm'}
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - loss = categorical_crossentropy
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - metrics = ['accuracy']
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - validation_metric = val_loss
2017-06-20 16:36:44,341 - PARAM - deep_qa.common.params - patience = 3
2017-06-20 16:36:44,342 - PARAM - deep_qa.common.params - fit_kwargs = {}
2017-06-20 16:36:44,342 - PARAM - deep_qa.common.params - tensorboard_log = None
2017-06-20 16:36:44,342 - PARAM - deep_qa.common.params - tensorboard_frequency = 0
2017-06-20 16:36:44,342 - PARAM - deep_qa.common.params - debug = {}
2017-06-20 16:36:44,342 - PARAM - deep_qa.common.params - show_summary_with_masking_info = False
2017-06-20 16:36:44,342 - INFO - deep_qa.training.trainer - Loading serialized model
Traceback (most recent call last):
  File "scripts/run_model.py", line 35, in <module>
    main()
  File "scripts/run_model.py", line 22, in main
    evaluate_model(sys.argv[1])
  File "scripts/../deep_qa/run.py", line 214, in evaluate_model
    model = load_model(param_path, model_class=model_class)
  File "scripts/../deep_qa/run.py", line 160, in load_model
    model.load_model()
  File "scripts/../deep_qa/training/trainer.py", line 389, in load_model
    model_config_file = open("%s_config.json" % self.model_prefix)
FileNotFoundError: [Errno 2] No such file or directory: '/net/efs/aristo/dlfa/models/bidaf_config.json'

Figure out a good way to handle increasing the vocabulary of a pre-trained model

Say you train a model on SQuAD, then want to fine-tune it on SciQ. Presumably there will be words in SciQ that you have plenty of training data for, but were OOV in SQuAD. How do you handle updating the vocabulary in this setting? This is hard, because you basically need to append new rows onto an existing embedding matrix, which messes with an already-existing computation graph. Not sure at all how to do this, but it'd be pretty nice.

Let similarity functions return vectors instead of scalars

Upgrade to python 3.6

I think the main benefit to this is the ability to give type annotations on variables, instead of just methods and method arguments. What needs to be done is just updating all of the continuous integration services / build tools to use the new python version, and make sure everything still works.

Allow tokenizers to insert begin and end tokens

So that if you want, you can represents words as character sequences like [@BEGIN@, w, o, r, d, @END@]. This is potentially helpful for various kinds of encoders (probably not BOW, but many others).

Generator re-pads shuffled instances which are already padded

Training using the data generator results in a slow down due to incremental padding from the noise introduced during shuffling.

Enhance DataIndexer

A better DataIndexer that allows for the techniques described in https://arxiv.org/abs/1703.00993 would be fantastic.

to summarize, they:

index every word in the set of glove vectors, no matter if it shows up in the train set or not
when a word shows up a test time, assign it one of the glove vectors if available. else, assign oov words a random but unique vector

Model checkpointing is done by validation loss, not validation accuracy

This means that the "best epoch" as defined by our code may not have actually been saved by Keras. We need to be sure that the model checkpointing uses the same metric as early stopping.

pip install is failing

Running python setup.py sdist as advised in #296 raises the following error (on current master 672020e):

error in deep_qa setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Invalid requirement, parse error at "'+git://g'"

Fix serialization of WordAlignmentEntailment layers

Currently, WordAlignmentEntailment layers do not work properly with serialization, and should thus be fixed.

Maybe move the utility methods out of `TextTrainer`?

You could make the argument that the way data is handled and the way we build models are too tightly coupled, and should be decomposed. That would mean, basically, making a cleaner separation between the objects that read and process data and TextTrainer, and perhaps also splitting out the _embed_input, _get_encoder, and _get_seq2seq_encoder methods into a separate model utility class.

I'm not totally sold that this is necessary, though. In order to make the handling of word / word+character tokenizers transparent to the model class, you have to have a tight coupling between the data generator and the _embed_input method. I think it would be pretty difficult to make this work without the way that it's currently structured.

Documentation for doing model parallelism on multiple GPUs

With dropping theano support, it should be easy to make our models use multiple GPUs, not just with batch parallelism, and to put some parts of the model on the CPU (e.g., the embedding layer, as recommended by Matt Peters). I think this is pretty straightforward, but I haven't done it before. We should:

Write some documentation with recommendations for how and when to use this (thinking of people new to the codebase and to deep learning in general; can we give them some guidance on how to structure a model for optimal efficiency?).
Implement some reasonable defaults, like putting the embedding layer on the CPU, in TextTrainer.

Change name to semparse

Can we change the name of this repository? tacl2015-factorization doesn't make much sense anymore. I suggest the repo should be allenai/semparse, unless you have a better name (maybe something involving "open vocabulary"?). I would just change it myself, but I don't have access to the repo settings.

Get random seed into the parameter file

The plan for actually implementing model ensembling (#306) is to write a class that takes several model parameter files, loads them all, calls score_dataset on all of them, and averages the result. In order for this to work well, we really should have the random seed as part of the parameter file.

The only real trouble here is that the random seeds have to be set before keras is imported, so this has to be implemented carefully.

Easy profiling functionality

It'd be really nice to know where there are performance bottlenecks in your model. I think tensorflow 1.0 added some stuff that would make this relatively easy to diagnose; can we put in some simple tooling that tells you which layers take up the majority of your computation time?

Make docs incorporate READMEs

We have a few places under doc/ where we've written explanations of things. Those explanations really should be in READMEs in the code, and just read from the README and put into doc/, instead of duplicated in both places.

I don't know if this is easy or hard, because I don't know enough about how sphinx works, or how ReadTheDocs actually builds our docs for each commit. It's probably hard.

scripts for data processing?

Do you have script files for processing raw dataset such as snli_1.0 to processed/{train,dev}.tsv in your example json file?

Why ReadMe.md haven't mention the doc's url and No install instructions?

Hi, as the title.
I know the doc url is http://deep-qa.readthedocs.io/en/latest/models/about_models.html, but I suggest place it in the ReadMe.
To use the liabrary , I must add project path to PYTHONPATH manully?

Avoid instantiating huge tensors as input to similarity functions

I'm not sure how this would work, really, but it takes a whole lot of memory to do it like we do it, tiling everything and then doing elementwise multiplication. There might be some way to make this work using some kind of batch_dot or dot.

Better API around continuing training

Currently, if you want to load a model trained on one dataset and continue training, you can kind of do that, but it will end up overwriting the model that you loaded, which is a problem. We should add nice functionality around continuing training.

Investigate using cache better in CircleCI

Looks like our build scripts for CircleCI intentionally clear the cache before building the docs, which makes things take longer than they probably need to. It didn't use to matter much, because Travis was always way slower, but now sometimes CircleCI finishes after Travis. I don't know why clearing the cache is necessary; @nelson-liu, any particular reason you put that in there?

See if masking makes a difference with a CNN encoder

It's standard to just ignore masking when using a CNN on word / character sequences, because the max pooling can effectively ignore the padding tokens, anyway. It'd be interesting to actually verify that this is true, by implementing masking for our CNN and seeing what difference it makes, if any.

Pretty low priority, though. Just getting a thought out of my head and into an issue tracker.

GatedAttention doesn't load correctly

The GatedAttention layer doesn't implement get_config, and so the gating function doesn't get loaded correctly. You can check this in a test by specifying a non-default gating function when training the model; it will load with the wrong one.

Getting a nan softmax loss

I followed the instructions and started training on the Squad Dataset. It started off well but then the loss became a nan. What could be the possible reason and how can I correct it ?
`657/1474 [============>.................] - ETA: 6417s - loss: 4.8018 - span_begin_softmax_loss: 2.5148 - span_end_softmax_loss: 2.2870 - span_begin_softmax_acc: 0.3892 - span_end_softmax_acc: 0.4372

658/1474 [============>.................] - ETA: 6406s - loss: 4.8008 - span_begin_softmax_loss: 2.5144 - span_end_softmax_loss: 2.2864 - span_begin_softmax_acc: 0.3892 - span_end_softmax_acc: 0.4371

659/1474 [============>.................] - ETA: 6404s - loss: 4.8027 - span_begin_softmax_loss: 2.5155 - span_end_softmax_loss: 2.2872 - span_begin_softmax_acc: 0.3890 - span_end_softmax_acc: 0.4371

660/1474 [============>.................] - ETA: 6390s - loss: 4.8010 - span_begin_softmax_loss: 2.5145 - span_end_softmax_loss: 2.2865 - span_begin_softmax_acc: 0.3892 - span_end_softmax_acc: 0.4372

661/1474 [============>.................] - ETA: 6388s - loss: 4.8017 - span_begin_softmax_loss: 2.5152 - span_end_softmax_loss: 2.2865 - span_begin_softmax_acc: 0.3892 - span_end_softmax_acc: 0.4372

662/1474 [============>.................] - ETA: 6377s - loss: nan - span_begin_softmax_loss: nan - span_end_softmax_loss: nan - span_begin_softmax_acc: 0.3886 - span_end_softmax_acc: 0.4365`

Enable evaluation on epoch with max validation acc.

~~I think model loading is broken right now due to custom layers, but I would need to write a test and check. Until we fix model loading,~~ we can't evaluate on test set with the epoch of a model that had the max validation accuracy (done by serializing then reloading said epoch).

We also need hooks in the scala code to use these in experiments, and make sure the evaluation output is correct with it.

Make an LSTM / GRU layer that uses `tf.dynamic_rnn` instead of `K.rnn`

The intent being that it should probably be a lot faster. I think Matt Peters already has one of these that we can just use.

Though, with the dynamic padding that we're doing now, maybe this isn't as necessary...

Better parameter logging

It'd be nice if every time a parameter was read, there was a logging statement. This is for reproducibility across code changes, particularly with default parameters. If the default changes in a version of code, the saved parameter file associated with a model won't give the same result. If instead we print out the actual value used for each parameter, default or not, we'd have a complete picture of what parameters were used.

I'd probably do this by adding a method to common.params, something like get_with_default, that logs the value that actually gets returned, with a common prefix. Then you can grep the output for these statements, and rebuild a json file that has all of the parameters, including default values.

Remove theano support

This is really just dropping the theano condition from TravisCI, and removing the @requires_theano / @requires_tensorflow annotations. We're moving towards just using tensorflow, which will allow for a lot more flexibility in building models.

Produce training data from generators, as opposed to in a list

I ran into an issue where training a large dataset (Who Did What) with 50d character embeddings would promptly cause the memory usage on my system to explode (60 GB of RAM exhausted). when you use the words and characters tokenizer, you essentially multiply all your data arrays by the maximum word length (in characters). This can become problematic, and the fix is to make the training data come from a generator where we only generate it as needed, thus eliminating the need of storing one big array in memory.

I'm adding this as an issue here so I can track it (/ make sure i don't forget about it). Assigning myself.

evaluate_model() gives parsing error

I am trying to evaluate a model using evaluate_model()
evaluate_model("some.json")
but it gives a ParseException.

some.json-

{
    "model_class": "BidirectionalAttentionFlow",
    "model_serialization_prefix": "/models/bidaf",
    "encoder": {
        "word": {
          "type": "cnn",
          "ngram_filter_sizes": [5],
          "num_filters": 100
        }
    },
    "seq2seq_encoder": {
        "default": {
            "type": "bi_gru",
            "encoder_params": {
                "units": 100
            },
            "wrapper_params": {}
        }
    },
    "data_generator": {
      "dynamic_padding": true,
      "adaptive_batch_sizes": true,
      "adaptive_memory_usage_constant": 440000,
      "maximum_batch_size": 60
    },
    "patience": 3,
    "embeddings": {
      "words": {
        "dimension": 100,
        "pretrained_file": "/trained_vectors/glove.840B.300d.txt",
        "project": true,
        "fine_tune": false,
        "dropout": 0.2
      },
      "characters": {
        "dimension": 8,
        "dropout": 0.2
      }
    },
    "num_epochs": 20,
    "optimizer": {
      "type": "adadelta",
      "learning_rate": 0.5
    },
    "validation_files": ["dev_squad.json"],
    "train_files": ["train_squad.json"]
}

This is the stacktrace-

Using TensorFlow backend.
processor = {}
processor.word_splitter = simple
processor.word_filter = pass_through
processor.word_stemmer = pass_through
0 1175
Traceback (most recent call last):
  File "D:/backup/PycharmProjects/test/deep_qa/with_qa.py", line 79, in <module>
    evaluate_model("some.json")
  File "D:\backup\PycharmProjects\test\deep_qa\run.py", line 214, in evaluate_model
    model = load_model(param_path, model_class=model_class)
  File "D:\backup\PycharmProjects\test\deep_qa\run.py", line 147, in load_model
    param_dict = pyhocon.ConfigFactory.parse_file(param_path)
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyhocon\config_parser.py", line 51, in parse_file
    return ConfigFactory.parse_string(content, os.path.dirname(filename), resolve)
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyhocon\config_parser.py", line 90, in parse_string
    return ConfigParser().parse(content, basedir, resolve)
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyhocon\config_parser.py", line 272, in parse
    config = config_expr.parseString(content, parseAll=True)[0]
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 1216, in parseString
    raise exc
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 1210, in parseString
    se._parse( instring, loc )
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 1072, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 2545, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 1076, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "C:\Users\mohit.badwal.NOTEBOOK546\Anaconda3\lib\site-packages\pyparsing.py", line 2348, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected end of text (at char 0), (line:1, col:1)

Where am I going wrong ?

Collapsible Code Blocks for Training Logs

We're planning on putting training logs of various models in doc/models/about_models.rst, but they're quite long which reduces readability. It'd be nice to have these blocks collapsed by default, but I think this is a feature request for the readthedocs sphinx theme.

Alternatively, we could manually add the theme as a submodule to this repo in our doc folder (wherever sphinx looks for themes), remove the theme from the conf.py and requirements.txt, then build the docs with our locally changed version.

Remove `default_label` argument in `read_from_line` methods

It's no longer used, and hasn't been for a long time, and basically all of our new methods say "this argument is unused, but we're adding it for consistency".

Different Character Encodings

Using byte encoding on unicode characters could be a good idea, vs a single index for each unicode characters.

Allowing for different character encodings in tokenizers that return characters would thus be nice.

Allow for non-Keras optimizers

Using plain tensorflow optimizers instead of Keras optimizers is beneficial in some instances (particularly when you have a large embedding matrix, as Matt Peters has discovered). We should split out the actual training loop into something configurable, to allow for different means of optimizing the same computation graph. You can still use _build_model just like we normally do, you just pull out the inputs and outputs and pass them directly into a tensorflow optimizer instead of calling model.fit().

Unpin Sphinx version

See #362. We had to pin the sphinx version because of a conflict with how we're using python's logging functionality. We need to fix this somehow.

Clean up the embeddings API and parameter passing

Currently, there are keys around pretrained embeddings, projecting the embeddings, dropout, and so on, that are flat parameters to TextTrainer. There's also an embedding_dim parameter, which is a dict, with arbitrary allowed keys. We should make the flat parameters also a part of this dictionary, so the parameters look something like this:

"embeddings": {
  "words": {
    "dim": 100,
    "pretrained_file": "/path/to/glove",
    "fine_tune": false
  },
  "characters": {
    "dim": 16
  }
}

Enable batch-parallelization on multiple GPUs

This should be doable with Keras, as shown here: https://medium.com/@kuza55/transparent-multi-gpu-training-on-tensorflow-with-keras-8b0016fd9012. Matt Peters also got this working, though I'm not sure if he used the approach in that blog post, and he had to fork Keras to get it to work correctly.

Tests are Flaky

I've found the unit tests on CI recently to be extremely flaky (the jobs just get killed). I don't think its unreasonable for us to keep our memory usage low, so it'd be good to look into why this is happening.

Migrate dataset reader code from (scala) DeepQA Experiments to DeepQA

Firstly, Much thanks to this great project, which is what I would like to do; I'll continuously watch, use, and even contribute to this project.

But when I want to run some pipelines from scratch, but found that the data pre processing steps is in another project: https://github.com/allenai/deep_qa_experiments, the project's code is scala.

I think the preprocessing steps in another steps is complicated for someone wishing to start the stuff quickly.

Switch `run_on_aws.sh` to use ECR instead of bintray

See email from Jesse, which links to this document. This should just mean changing a couple of lines in that script, but there might also be some other setup required. In general, we should probably have a README somewhere with what the requirements are to use that script.

Make it so you can use `run_model.py` functionality outside of DeepQA

We made DeepQA pip-installable, but you still can't actually run a model unless you use our scripts, which basically means that you need to fork the code and implement your model inside of DeepQA. Instead, we should make it so that you can use all of DeepQA from a third-party codebase, including actually running the thing.

And when we do this, we should add a section to the main README that gives a "hello world" example of how to use DeepQA in third-party code.

Make layers importable from the module

Especially as we're doing one layer per file, an import like from ...layers.backend.add_mask import AddMask is a bit long and redundant. Better to just do from ...layers import AddMask.

Use virtualenv instead of conda

We use conda to create a consistent environment for Python. Is there a reason we use conda over virtualenv?

I was reading the AI2 Python Guide and it recommends using virtualenv. Are there any concerns if we used virtualenv rather than conda (other than the fact that we presently use conda and it works).

Fix paths in the code to be OS independent

I'm pretty sure this won't work on Windows at the moment, because we have plenty of places where / is hard-coded, instead of using os.path.join. This should be fixed.

It's also super low priority for us, and really hard to know if we've found everything, because we just use linux and macOS. Contributions welcome here.

Provide an easy model ensembling API

Not sure what this looks like, exactly, but it should be doable without much work.

Adaptive batch sizes

Because we're doing dynamic padding, it makes sense to vary the batch size to use the GPU memory optimally - when the padding lengths are small, we can increase the batch size, and when they're long, we can decrease the batch size.

The simplest thing to do is to expose a method to subclasses that lets them split the data into batches after sorting. The concrete model class can them specify some heuristics for how many instances will fit in a batch based on how large they are.

A much more exciting thing to do, but also probably close to impossible, is to have the library just figure out how many instances can go in each batch, by examining the computation graph, or something. Not at all sure how to do this.

Allow loading already-indexed data

This would cut down pre-processing time, at the expense of having to make sure you're using the right vocabulary files and such. It would probably also make some of the sequence tagging stuff simpler.

This depends on #328, and you would basically have an option in each script to output a pre-indexed file, running the data indexing code and saving the results. Or maybe this would be a stand-alone script that just ran the pre-processing and saved the data indexer... The second option is probably cleaner, and doesn't depend on #328. You'd have to also add an option to TextTrainer that tells it it's loading a pre-indexed dataset, and add a way to save and load IndexedInstances (maybe just pickling them...)

Figure out if our `switch` function is broken

It sure seems like computing gradients through a switch doesn't work. At least it definitely didn't when switch was used in a loss function. We need to figure out if we're actually getting correct gradients for other places where we use switch, and remove the use of switch if we're not.

allenai / deep_qa Goto Github PK

deep_qa's Introduction

DEPRECATED

DeepQA

Running DeepQA

Setting up a development environment

Using DeepQA as an executable

Using DeepQA as a library

Implementing your own models

Implemented models

Reading comprehension

Entailment

Datasets

Contributing

Contributors

License

deep_qa's People

Contributors

Stargazers

Watchers

Forkers

deep_qa's Issues

Recommend Projects

Recommend Topics

Recommend Org