parlai's People

Contributors

ajfisch, alexholdenmiller, belbs, bharathc346-zz, dependabot[bot], dexterju27, dianaglzrico, domrigoglioso, emilydinan, ericmichaelsmith, golovneva, ideanl, jackurb, jaseweston, jxmsml, kauterry, klshuster, liliarose, madrugado, meganung, mojtaba-komeili, moyapchen, mwillwork, samhumeau, spencerp, stephenroller, uralik, vedantpuri, yf225, young-k


parlai's Issues

Cuda Out of Memory in LSTM SQUAD model

Does this example require too much VRAM? I am trying to run this example on a box with an Nvidia 970M (3 GB VRAM), but I get this error:

05/16/2017 04:02:32 PM: [ Ok, let's go... ]
05/16/2017 04:02:32 PM: [ Training for 1000 iters... ]
05/16/2017 04:02:36 PM: [train] updates = 10 | train loss = 9.83 | exs = 310
05/16/2017 04:02:39 PM: [train] updates = 20 | train loss = 9.79 | exs = 623
05/16/2017 04:02:42 PM: [train] updates = 30 | train loss = 9.75 | exs = 938
THCudaCheck FAIL file=/py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "examples/drqa/train.py", line 178, in <module>
    main(opt)
  File "examples/drqa/train.py", line 113, in main
    train_world.parley()
  File "/home/ivan/ParlAI/parlai/core/worlds.py", line 505, in parley
    batch_act = self.batch_act(index, batch_observations[index])
  File "/home/ivan/ParlAI/parlai/core/worlds.py", line 479, in batch_act
    batch_actions = a.batch_act(batch_observation)
  File "/home/ivan/ParlAI/parlai/agents/drqa/agents.py", line 192, in batch_act
    self.model.update(batch)
  File "/home/ivan/ParlAI/parlai/agents/drqa/model.py", line 113, in update
    self.optimizer.step()
  File "/home/ivan/anaconda3/lib/python3.6/site-packages/torch/optim/adamax.py", line 68, in step
    torch.max(norm_buf, 0, out=(exp_inf, exp_inf.new().long()))
RuntimeError: cuda runtime error (2) : out of memory at /py/conda-bld/pytorch_1493680494901/work/torch/lib/THC/generic/THCStorage.cu:66

I haven't dug into the code, so I am not sure whether this is a bug or I just need more VRAM. Thank you.
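In case it helps anyone hitting the same wall: a generic workaround (independent of ParlAI) is to back off to a smaller batch whenever a step runs out of GPU memory. A rough sketch, where shrink_batch_on_oom is a hypothetical helper, not ParlAI API:

```python
def shrink_batch_on_oom(step_fn, batch):
    """Run step_fn on batch, halving the batch size whenever the step
    raises a CUDA out-of-memory RuntimeError, until the step fits."""
    size = len(batch)
    while size >= 1:
        try:
            return step_fn(batch[:size])
        except RuntimeError as err:
            if 'out of memory' not in str(err):
                raise  # some other failure; don't mask it
            size //= 2  # halve and retry with a smaller slice
    raise RuntimeError('even a single example does not fit in VRAM')
```

On a 3 GB card, the equivalent manual fix is simply passing a smaller batch size on the command line.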

Failed building wheel for pytorch

Could you please help? I'm using PyCharm and I tried to install pytorch, but I got the error below:

Install packages failed: Error occurred when installing package pytorch.

The following command was executed:

packaging_tool.py install --build-dir C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp pytorch

The error output of the command:

Failed building wheel for pytorch
Command "C:\Users\PC\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\PC\AppData\Local\Temp\pip-_plphgbs-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\

Collecting pytorch
Using cached pytorch-0.1.2.tar.gz
Building wheels for collected packages: pytorch
Running setup.py bdist_wheel for pytorch: started
Running setup.py bdist_wheel for pytorch: finished with status 'error'
Complete output from command C:\Users\PC\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\PC\AppData\Local\Temp\tmpvlzpjkgvpip-wheel- --python-tag cp36:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\setup.py", line 17, in <module>
    raise Exception(message)
Exception: You should install pytorch from http://pytorch.org


Running setup.py clean for pytorch
Failed to build pytorch
Installing collected packages: pytorch
Running setup.py install for pytorch: started
Running setup.py install for pytorch: finished with status 'error'
Complete output from command C:\Users\PC\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\PC\AppData\Local\Temp\pip-_plphgbs-record\install-record.txt --single-version-externally-managed --compile:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\PC\AppData\Local\Temp\pycharm-packaging3873295594079902182.tmp\pytorch\setup.py", line 13, in <module>
    raise Exception(message)
Exception: You should install pytorch from http://pytorch.org

----------------------------------------

module 'library.PositionalEncoder' not found

Tried to debug #58 by testing the LuaJIT installation with luajit -e "require 'library.PositionalEncoder'" from #58 (comment).

Received the following error:

~/code/ParlAI> luajit -e "require 'library.PositionalEncoder'"
luajit: (command line):1: module 'library.PositionalEncoder' not found:
        no field package.preload['library.PositionalEncoder']
        no file '/Users/brian/.luarocks/share/lua/5.1/library/PositionalEncoder.lua'
        no file '/Users/brian/.luarocks/share/lua/5.1/library/PositionalEncoder/init.lua'
        no file '/Users/brian/torch/install/share/lua/5.1/library/PositionalEncoder.lua'
        no file '/Users/brian/torch/install/share/lua/5.1/library/PositionalEncoder/init.lua'
        no file './library/PositionalEncoder.lua'
        no file '/Users/brian/torch/install/share/luajit-2.1.0-beta1/library/PositionalEncoder.lua'
        no file '/usr/local/share/lua/5.1/library/PositionalEncoder.lua'
        no file '/usr/local/share/lua/5.1/library/PositionalEncoder/init.lua'
        no file '/Users/brian/torch/install/lib/library/PositionalEncoder.dylib'
        no file '/Users/brian/.luarocks/lib/lua/5.1/library/PositionalEncoder.so'
        no file '/Users/brian/torch/install/lib/lua/5.1/library/PositionalEncoder.so'
        no file '/Users/brian/torch/install/lib/library/PositionalEncoder.dylib'
        no file './library/PositionalEncoder.so'
        no file '/usr/local/lib/lua/5.1/library/PositionalEncoder.so'
        no file '/usr/local/lib/lua/5.1/loadall.so'
        no file '/Users/brian/torch/install/lib/library.dylib'
        no file '/Users/brian/.luarocks/lib/lua/5.1/library.so'
        no file '/Users/brian/torch/install/lib/lua/5.1/library.so'
        no file '/Users/brian/torch/install/lib/library.dylib'
        no file './library.so'
        no file '/usr/local/lib/lua/5.1/library.so'
        no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
        [C]: in function 'require'
        (command line):1: in main chunk
        [C]: at 0x01000016c0

Windows 10 + python 3.5 (Anaconda) code page error

Hi,

I won't be using Windows much with ParlAI, but I decided to clone it and run it anyway. Sadly I am not very up to speed with code pages. Any idea whether this comes from your side or from Anaconda?

python setup.py develop
Traceback (most recent call last):
  File "setup.py", line 15, in <module>
    readme = f.read()
  File "e:\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 202: character maps to <undefined>
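The likely fix (a guess, since I don't have a Windows box handy) is to open the README with an explicit UTF-8 encoding instead of the platform default code page. A minimal sketch of the pattern, where read_text is a hypothetical helper rather than the actual setup.py code:

```python
import io


def read_text(path):
    # Force UTF-8 instead of the platform default code page (cp1252 on
    # Windows), which cannot decode bytes such as 0x9d.
    with io.open(path, encoding='utf-8') as f:
        return f.read()
```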

TF-IDF IR Baseline model performance on bAbI_dialog

Hey! I'm trying to use the TF-IDF model provided by the framework to get results comparable to the bAbI Dialog Tasks paper. I got some strange results that I don't understand.

For the eval_model script:

$ python eval_model.py -m ir_baseline -t dialog_babi:Task:1 -dt valid

[download_path:/home/chait/ParlAI/downloads]
[parlai_home:/home/chait/ParlAI]
[datatype:valid]
[task:dialog_babi:Task:1]
[model:ir_baseline]
[model_file:]
[datapath:/home/chait/ParlAI/data]
[batchsize:1]
[display_examples:False]
[model_params:]
[numthreads:1]
[num_examples:1000]
IrBaselineAgent
[Agent initializing.]
[length_penalty:0.5]
[parlai_home:/home/chait/ParlAI]
[creating task(s): dialog_babi:Task:1]
[DialogTeacher initializing.]
[loading fbdialog data:/home/chait/ParlAI/data/dialog-bAbI/dialog-bAbI-tasks/dialog-babi-task1-API-calls-dev.txt]
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.18181818181818182, 'total': 1, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.09090909090909091, 'total': 2, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.06060606060606061, 'total': 3, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.045454545454545456, 'total': 4, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03636363636363636, 'total': 5, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.030303030303030304, 'total': 6, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.05194805194805195, 'total': 7, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.045454545454545456, 'total': 8, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.04040404040404041, 'total': 9, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03636363636363636, 'total': 10, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03305785123966942, 'total': 11, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.04545454545454545, 'total': 12, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.04195804195804195, 'total': 13, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03896103896103896, 'total': 14, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03636363636363636, 'total': 15, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03409090909090909, 'total': 16, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.04278074866310161, 'total': 17, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.04040404040404041, 'total': 18, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03827751196172249, 'total': 19, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03636363636363636, 'total': 20, 'accuracy': 0.0}
.
.
. (for each of the 1000 dialogs in the dataset)
.
.
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03135435992578862, 'total': 980, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03132239829487548, 'total': 981, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03129050175893365, 'total': 982, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03125867011930097, 'total': 983, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031226903178122812, 'total': 984, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03119520073834807, 'total': 985, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031163562603724996, 'total': 986, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03113198857879721, 'total': 987, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03128450496871562, 'total': 988, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031252872506664336, 'total': 989, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.0312213039485768, 'total': 990, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031189799101000032, 'total': 991, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03115835777126112, 'total': 992, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031126979767463273, 'total': 993, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03127858057435535, 'total': 994, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031247144814984133, 'total': 995, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031215772179627725, 'total': 996, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031184462478344246, 'total': 997, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03133539806886513, 'total': 998, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.03130403130403143, 'total': 999, 'accuracy': 0.0}
---
{'hits@k': {1: 0.0, 10: 0.0, 50: 0.0, 100: 0.0, 5: 0.0}, 'f1': 0.031272727272727396, 'total': 1000, 'accuracy': 0.0}

And for the display_model script:

$ python display_model.py -m ir_baseline -t dialog_babi:Task:1 -dt valid

[model_params:]
[datatype:valid]
[parlai_home:/home/chait/ParlAI]
[model_file:]
[task:dialog_babi:Task:1]
[model:ir_baseline]
[numthreads:1]
[download_path:/home/chait/ParlAI/downloads]
[datapath:/home/chait/ParlAI/data]
[batchsize:1]
[num_examples:10]
IrBaselineAgent
[Agent initializing.]
[length_penalty:0.5]
[parlai_home:/home/chait/ParlAI]
[creating task(s): dialog_babi:Task:1]
[DialogTeacher initializing.]
[loading fbdialog data:/home/chait/ParlAI/data/dialog-bAbI/dialog-bAbI-tasks/dialog-babi-task1-API-calls-dev.txt]
[dialog_babi:Task:1]: hello
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: can you book a table for six people with french food
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: <SILENCE>
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: in bombay
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: i am looking for a cheap restaurant
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: <SILENCE>
   [IRBaselineAgent]: I don't know.
- - - - - - - - - - - - - - - - - - - - -
~~
[dialog_babi:Task:1]: hi
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: can you make a restaurant reservation with italian cuisine for six people in a cheap price range
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: <SILENCE>
   [IRBaselineAgent]: I don't know.
~~
[dialog_babi:Task:1]: rome please
   [IRBaselineAgent]: I don't know.
~~

Is this how the model is expected to perform? (The per-dialog accuracy is zero for all of them; is there a way to obtain the per-response accuracy?)

Pad inputs to a fixed length?

I want to pad my input sentences to a fixed length (so as to use a convolutional encoder instead of a recurrent one). I understand that data is loaded in dialog_teacher.py and I can modify that code to add my padding. Is there an API to do this? If not, can an API be added to do the same?
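For what it's worth, the padding itself is a one-liner; the open question is only where ParlAI should expose it. A sketch (pad_to_fixed is my own hypothetical helper, not existing ParlAI API):

```python
def pad_to_fixed(tokens, length, pad='<PAD>'):
    """Pad a token list to exactly `length` entries, truncating if it
    is longer, so a convolutional encoder sees fixed-size inputs."""
    return (tokens + [pad] * length)[:length]
```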

No module named 'torch'

I tried to run the ParlAI example python examples/display_data.py -t babi:task1k:1 but got the error below:

[task:babi:task1k:1]
[download_path:/Applications/My-Project/ParlAI/downloads]
[datatype:train]
[image_mode:raw]
[numthreads:1]
[batchsize:1]
[datapath:/Applications/My-Project/ParlAI/data]
[num_examples:10]
[parlai_home:/Applications/My-Project/ParlAI]
[creating task(s): babi:task1k:1]
Traceback (most recent call last):
  File "examples/display_data.py", line 42, in <module>
    main()
  File "examples/display_data.py", line 30, in main
    world = create_task(opt, agent)
  File "/Applications/My-Project/ParlAI/parlai/core/worlds.py", line 806, in create_task
    world = create_task_world(opt, user_agents)
  File "/Applications/My-Project/ParlAI/parlai/core/worlds.py", line 778, in create_task_world
    world_class, task_agents = _get_task_world(opt)
  File "/Applications/My-Project/ParlAI/parlai/core/worlds.py", line 773, in _get_task_world
    task_agents = _create_task_agents(opt)
  File "/Applications/My-Project/ParlAI/parlai/core/agents.py", line 371, in _create_task_agents
    my_module = importlib.import_module(module_name)
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/Applications/My-Project/ParlAI/parlai/tasks/babi/agents.py", line 7, in <module>
    from parlai.core.fbdialog_teacher import FbDialogTeacher
  File "/Applications/My-Project/ParlAI/parlai/core/fbdialog_teacher.py", line 42, in <module>
    from .dialog_teacher import DialogTeacher
  File "/Applications/My-Project/ParlAI/parlai/core/dialog_teacher.py", line 9, in <module>
    from .image_featurizers import ImageLoader
  File "/Applications/My-Project/ParlAI/parlai/core/image_featurizers.py", line 8, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

My system is macOS and the Python version is 3.6.1. I installed the dependencies with python setup.py develop, but I still get this error.

What's the problem?

Error while trying to display examples from ubuntu corpus

skc@Ultron:~/git/other_repos/ParlAI$ python3 examples/display_data.py -t ubuntu -n 10
[download_path:/Users/skc/git/other_repos/ParlAI/downloads]
[parlai_home:/Users/skc/git/other_repos/ParlAI]
[numthreads:1]
[num_examples:10]
[datatype:train]
[batchsize:1]
[datapath:/Users/skc/git/other_repos/ParlAI/data]
[no_images:False]
[task:ubuntu]
[creating task(s): ubuntu]
[building data: /Users/skc/git/other_repos/ParlAI/data/Ubuntu]
100% [......................................................................] 199132376 / 199132376
unpacking ubuntu.tar.gz
[DialogTeacher initializing.]
loading: /Users/skc/git/other_repos/ParlAI/data/Ubuntu/train.csv
Traceback (most recent call last):
  File "examples/display_data.py", line 42, in <module>
    main()
  File "examples/display_data.py", line 35, in main
    world.parley()
  File "/Users/skc/git/other_repos/ParlAI/parlai/core/worlds.py", line 219, in parley
    acts[0] = agents[0].act()
  File "/Users/skc/git/other_repos/ParlAI/parlai/core/dialog_teacher.py", line 130, in act
    action, self.epochDone = self.next_example()
  File "/Users/skc/git/other_repos/ParlAI/parlai/core/dialog_teacher.py", line 115, in next_example
    action, epoch_done = self.data.get(self.episode_idx, self.entry_idx)
  File "/Users/skc/git/other_repos/ParlAI/parlai/core/dialog_teacher.py", line 287, in get
    if table['labels'][0] not in table['label_candidates']:
TypeError: argument of type 'NoneType' is not iterable

Other datasets like babi, dialog_babi and MovieDialog are displaying fine.
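The crash is the membership test in dialog_teacher.py when a dataset (like this Ubuntu corpus build) supplies labels but no label_candidates. A defensive check along these lines would avoid iterating over None; this is a sketch of the idea, not the actual ParlAI fix:

```python
def label_in_candidates(table):
    """Safely check whether the first label appears among the candidates.

    Returns None (instead of raising TypeError) when the dataset
    provides no label_candidates to validate against.
    """
    labels = table.get('labels')
    candidates = table.get('label_candidates')
    if not labels or candidates is None:
        return None  # nothing to validate against
    return labels[0] in candidates
```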

Validation agent has datatype train

In examples/train_model.py there is the option to run validation every n seconds during training. However, the model agent that observes the teacher's act containing the validation data still has datatype train in its self.opt dictionary. In the example drqa agent, whether an act is intended for validation or training is inferred from the contents of the observation itself.

In my task it's not possible to distinguish validation from training data based on an observation's contents. The simple workaround I'm using is a custom act function in a class that inherits from DialogTeacher, which passes a validation flag inside the observation dictionary. This seems to go against the spirit of using an opt dictionary, but I'm leery of going deeper into the worlds and agent modules.
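For reference, the workaround reads roughly like this as code (a sketch; the is_validation key and the tag_act_with_datatype helper are my own invention, not a ParlAI convention):

```python
def tag_act_with_datatype(act, datatype):
    """Attach a validation flag to each teacher act so the observing
    agent can tell validation examples from training ones without
    relying on self.opt['datatype']."""
    tagged = dict(act)  # don't mutate the teacher's own act
    tagged['is_validation'] = datatype.startswith('valid')
    return tagged
```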

On a side note, having the X-Y-Z format for the parsed arguments and X_Y_Z for the dictionary keys causes some initial confusion and is a minor inconvenience when, for example, using Sublime Text's search-all-files function to find out what an option does or what its default value is.

Default python version recognized as python2.7 when running ./setup.sh

First of all, thanks for sharing this awesome library.

This is a very novice question.

When I run setup.sh, it seems that the script picks up the default Python version as 2.7. For example, sudo ./setup.sh prints:
Creating /usr/local/lib/python2.7/dist-packages/parlai.egg-link (link to .)

This causes an error when running base_train.py on the Opensubtitles dataset.

I also have Python 3.5 on my Ubuntu 14.04 machine, and python in bash links to python3.5 (not python2.7).

Any hints?
I really appreciate your answer.
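One way to at least fail fast on the wrong interpreter (a generic guard, not something setup.py currently does as far as I know) is to check the version at the top of the script:

```python
import sys

# Refuse to run under Python 2: ParlAI requires Python 3, so it is
# better to fail immediately with a clear message than to install
# into the python2.7 dist-packages directory.
if sys.version_info < (3, 0):
    raise RuntimeError(
        'ParlAI requires Python 3; this interpreter is %s'
        % sys.version.split()[0]
    )
```

Running setup with an explicit python3 setup.py develop sidesteps the ambiguity entirely.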

Running examples doesn't work

I installed ParlAI using the installation instructions provided in the readme file. I then tried to run the training example using

python examples/train_model.py -m drqa -t squad -bs 32 -mf /tmp/model

I get the following error message:

[ Loading model None ]
Traceback (most recent call last):
  File "examples/train_model.py", line 161, in <module>
    main()
  File "examples/train_model.py", line 93, in main
    agent = create_agent(opt)
  File "/root/ParlAI/parlai/core/agents.py", line 283, in create_agent
    return model_class(opt)
  File "/root/ParlAI/parlai/agents/drqa/drqa.py", line 139, in __init__
    self._init_from_saved(opt['pretrained_model'])
  File "/root/ParlAI/parlai/agents/drqa/drqa.py", line 161, in _init_from_saved
    map_location=lambda storage, loc: storage
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 229, in load
    return _load(f, map_location, pickle_module)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 362, in _load
    return legacy_load(f)
  File "/usr/local/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 297, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/usr/local/anaconda3/lib/python3.6/tarfile.py", line 1553, in open
    raise ValueError("nothing to open")
ValueError: nothing to open

It seems to me that even though --pretrained_model is set to None, the code in drqa.py still tries to load it. I tried changing the condition in drqa.py to

if 'pretrained_model' in self.opt and self.opt['pretrained_model']:

But similar errors start popping up:

Traceback (most recent call last):
  File "examples/train_model.py", line 161, in <module>
    main()
...
    with open(opt['embedding_file']) as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Is there something wrong with the argument parser? Or do the conditionals need to be updated?
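The same treatment is probably needed everywhere an optional opt is read; dict.get collapses both checks into one expression. A sketch of the pattern (wants_pretrained is a hypothetical helper, not drqa.py code):

```python
def wants_pretrained(opt):
    """True only when 'pretrained_model' is present AND non-empty, so
    an explicit None (the argparse default) is treated as absent."""
    return bool(opt.get('pretrained_model'))
```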

lua requirement missing

Error on running python examples/memnn_luatorch_cpu/full_task_train.py -t babi:task10k:1 -n 8

Traceback (most recent call last):
  File "examples/memnn_luatorch_cpu/full_task_train.py", line 100, in <module>
    main()
  File "examples/memnn_luatorch_cpu/full_task_train.py", line 68, in main
    agent = ParsedRemoteAgent(opt, {'dictionary': dictionary})
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/agents/remote_agent/agents.py", line 123, in __init__
    super().__init__(opt, shared)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/agents/remote_agent/agents.py", line 51, in __init__
    args=opt.get('remote_args', '')
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 859, in __init__
    restore_signals, start_new_session)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 1463, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'luajit'

"true label missing from candidate labels" in QACNN and QADailyMail

[download_path:/Users/filipeabperes/ParlAI/downloads]
[datatype:train]
[image_mode:raw]
[numthreads:1]
[batchsize:1]
[datapath:/Users/filipeabperes/ParlAI/data]
[model:repeat_label]
[model_file:None]
[dict_class:None]
[evaltask:None]
[display_examples:False]
[num_epochs:1]
[max_train_time:inf]
[log_every_n_secs:1]
[validation_every_n_secs:0]
[validation_patience:5]
[dict_build_first:True]
[parlai_home:/Users/filipeabperes/ParlAI]
[creating task(s): qacnn]
[loading fbdialog data:/Users/filipeabperes/ParlAI/data/QACNN/train.txt]
[ training... ]
Traceback (most recent call last):
  File "train_model.py", line 161, in <module>
    main()
  File "train_model.py", line 106, in main
    world.parley()
  File "/Users/filipeabperes/ParlAI/parlai/core/worlds.py", line 236, in parley
    acts[0] = agents[0].act()
  File "/Users/filipeabperes/ParlAI/parlai/core/dialog_teacher.py", line 130, in act
    action, self.epochDone = self.next_example()
  File "/Users/filipeabperes/ParlAI/parlai/core/dialog_teacher.py", line 115, in next_example
    action, epoch_done = self.data.get(self.episode_idx, self.entry_idx)
  File "/Users/filipeabperes/ParlAI/parlai/core/dialog_teacher.py", line 294, in get
    raise RuntimeError('true label missing from candidate labels')
RuntimeError: true label missing from candidate labels

Error running DrQA

I ran: python examples/drqa/train.py -t "#opensubtitles"

After 10 or 20 minutes:

05/25/2017 02:34:31 PM: [ Running validation... ]
Traceback (most recent call last):
  File "examples/drqa/train.py", line 180, in <module>
    main(opt)
  File "examples/drqa/train.py", line 118, in main
    valid_metric = validate(opt, doc_reader, iteration)
  File "examples/drqa/train.py", line 62, in validate
    valid_world.parley()
  File "/Users/bittlingmayer/Desktop/aca/ParlAI/parlai/core/worlds.py", line 221, in parley
    acts[1] = agents[1].act()
  File "/Users/bittlingmayer/Desktop/aca/ParlAI/parlai/agents/drqa/agents.py", line 157, in act
    [ex], null=self.word_dict['<NULL>'], cuda=self.opt['cuda']
  File "/Users/bittlingmayer/Desktop/aca/ParlAI/parlai/agents/drqa/utils.py", line 126, in batchify
    max_length = max([d.size(0) for d in docs])
  File "/Users/bittlingmayer/Desktop/aca/ParlAI/parlai/agents/drqa/utils.py", line 126, in <listcomp>
    max_length = max([d.size(0) for d in docs])
RuntimeError: dimension 0 out of range of 0D tensor at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:24

Virtualenv being created in the wrong directory?

Hey -

I started getting FileNotFoundError: [Errno 2] No such file or directory errors because the setup_aws script was trying to delete the parent_dir + "/venv" directory here.

However, it looks like the venv directory is actually being created in the mturk/task/ directory, because that's where the run.py script is being executed from (at least by me).

I can submit a PR to have the venv created in parent_dir if that's what you intended.
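The PR would essentially resolve the venv location from the script's own path rather than the working directory, so the create and delete sides always agree; roughly (venv_dir_for is a hypothetical helper, not the current setup_aws code):

```python
import os


def venv_dir_for(script_path):
    """Place the venv next to the script itself, so the result does not
    depend on the directory run.py happens to be invoked from."""
    parent_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(parent_dir, 'venv')
```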

Path handling in tasks

Hello,

Thanks a lot for your work and for sharing it with the world. I just started playing with the repository and would like your opinion on the following point: the paths are hard-coded with a slash in the string.

It leads to a few hiccups in the download files, for example cornell movie and opensubtitles (which could be updated to the 2016 version).

I would suggest replacing the string concatenation of paths with the built-in os.path.join function, which handles the platform-specific separator.
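Concretely, a hard-coded join like datapath + '/dialog-bAbI/train.txt' becomes (build_data_path being an illustrative wrapper, not existing ParlAI code):

```python
import os


def build_data_path(datapath, *parts):
    """Join path components with the platform's separator instead of
    concatenating strings with '/'."""
    return os.path.join(datapath, *parts)
```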

Need benchmark for sentence selection task

Just as DrQA is becoming a standard benchmark for the span selection task in reading comprehension, we might also need a benchmark for the sentence selection task (e.g. wikiqa). The ir_baseline is okay, just a little too simple.
I recently re-implemented the paper A Compare-Aggregate Model for Matching Text Sequences with PyTorch and ParlAI (the authors' original Torch implementation is here), which achieves 0.74 MAP on wikiqa. I can send a pull request if you would like to use it as an agent.

build_dict doesn't work in train_model without a custom dictionary class

Kind of an odd, finicky thing to point out: dictionary agent arguments need to be parsed in order for a dictionary to be built in train_model.py, and this only happens incidentally with the drqa agent, because that agent's parameters are added to the model. If an agent neither has a custom dictionary class nor calls the dictionary agent argument parser in its add_cmdline_args function, then no default dictionary arguments are available (and this causes a crash).

First example in the README.md does not work

The following is the output from running the first example from the README.md:

$ python display_data.py -t babi:task1k:1
[task:babi:task1k:1]
[download_path:/somewhere/code/ParlAI/downloads]
[datatype:train]
[numthreads:1]
[batchsize:1]
[no_images:False]
[datapath:/somewhere/code/ParlAI/data]
[parlai_home:/somewhere/code/ParlAI]
[num_examples:10]
Traceback (most recent call last):
  File "display_data.py", line 42, in <module>
    main()
  File "display_data.py", line 29, in main
    agent = RepeatLabelAgent(opt)
  File "/somewhere/code/ParlAI/parlai/agents/repeat_label/agents.py", line 27, in __init__
    super().__init__(opt)
TypeError: super() takes at least 1 argument (0 given)

Problems running the MemNN model

Hello, I am trying to run training for the MemNN model on the bAbI dialog tasks and I am receiving the following error. Can someone help?

(parlai) chait@chait:~/ParlAI/examples$ python memnn_luatorch_cpu/full_task_train.py --remote-cmd ~/torch/ -t dialog_babi:Task:1 -nt 8
[port:5555]
[datapath:/home/chait/ParlAI/data]
[parlai_home:/home/chait/ParlAI]
[datatype:train]
[download_path:/home/chait/ParlAI/downloads]
[numthreads:8]
[task:dialog_babi:Task:1]
[dict_language:english]
[num_examples:1000]
[remote_cmd:/home/chait/torch/]
[dict_nulltoken:<NULL>]
[dict_unktoken:<UNK>]
[remote_args:/home/chait/ParlAI/examples/memnn_luatorch_cpu/params_default.lua]
[dict_minfreq:0]
[num_its:100]
[batchsize:1]
[dict_max_ngram_size:-1]
Setting up dictionary.
[creating task(s): dialog_babi:Task:1]
[DialogTeacher initializing.]
[loading fbdialog data:/home/chait/ParlAI/data/dialog-bAbI/dialog-bAbI-tasks/dialog-babi-task1-API-calls-trn.txt]
[creating task(s): dialog_babi:Task:1]
[DialogTeacher initializing.]
[loading fbdialog data:/home/chait/ParlAI/data/dialog-bAbI/dialog-bAbI-tasks/dialog-babi-task1-API-calls-dev.txt]
Dictionary: saving dictionary to /tmp/dict.txt.
Dictionary ready, moving on to training.
Traceback (most recent call last):
  File "memnn_luatorch_cpu/full_task_train.py", line 104, in <module>
    main()
  File "memnn_luatorch_cpu/full_task_train.py", line 72, in main
    agent = ParsedRemoteAgent(opt, {'dictionary': dictionary})
  File "/home/chait/ParlAI/parlai/agents/remote_agent/agents.py", line 123, in __init__
    super().__init__(opt, shared)
  File "/home/chait/ParlAI/parlai/agents/remote_agent/agents.py", line 51, in __init__
    args=opt.get('remote_args', '')
  File "/home/chait/anaconda3/envs/parlai/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/home/chait/anaconda3/envs/parlai/lib/python3.5/subprocess.py", line 1282, in _execute_child
    raise child_exception_type(errno_num, err_msg)
PermissionError: [Errno 13] Permission denied

I saw this thread on stackoverflow, but was unable to solve the problem myself.

untar on zip file

Hello,

I think the untar function should be adjusted to handle zip files as well (see the Cornell dataset).
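A minimal sketch of such a dispatch (this is illustrative, not ParlAI's actual untar helper), using only the standard library:

```python
import tarfile
import zipfile

def extract(path, dest):
    """Extract a .tar.gz/.tgz or .zip archive into dest.

    Illustrative sketch: dispatches on the archive's actual contents
    rather than its file extension.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
    elif tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            tf.extractall(dest)
    else:
        raise ValueError('unknown archive type: %s' % path)
```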

Kind regards,

Run time error

When I run this command,

 python3 examples/display_data.py -t babi:task1k:1

just got this error. System is Ubuntu16.04, and I have installed pytorch already.

[task:babi:task1k:1]
[numthreads:1]
[num_examples:10]
[datatype:train]
[download_path:/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/downloads/]
[datapath:/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/data/]
[batchsize:1]
[Agent initializing.]
[creating task(s): babi:task1k:1]
[building data: /usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/data//bAbI/]
mkdir: cannot create directory ‘/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/data’: Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/agents.py", line 218, in _create_task_agents
    create_agent = getattr(my_module, 'create_agents')
AttributeError: module 'parlai.tasks.babi.agents' has no attribute 'create_agents'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/display_data.py", line 42, in <module>
    main()
  File "examples/display_data.py", line 30, in main
    world = create_task(opt, agent)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/worlds.py", line 673, in create_task
    world = create_task_world(opt, user_agents)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/worlds.py", line 648, in create_task_world
    world_class, task_agents = _get_task_world(opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/worlds.py", line 643, in _get_task_world
    task_agents = _create_task_agents(opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/agents.py", line 222, in _create_task_agents
    return create_task_agent_from_taskname(opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/agents.py", line 186, in create_task_agent_from_taskname
    task_agents = teacher_class(opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/tasks/babi/agents.py", line 28, in __init__
    opt['datafile'] = _path('', task.split(':')[2], opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/tasks/babi/agents.py", line 15, in _path
    build(opt)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/tasks/babi/build.py", line 18, in build
    build_data.make_dir(dpath)
  File "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/parlai/core/build_data.py", line 24, in make_dir
    raise RuntimeError('failed: ' + s)
RuntimeError: failed: mkdir -p "/usr/local/lib/python3.5/dist-packages/parlai-0.1.0-py3.5.egg/data//bAbI/"

So, what does this indicate?

num_hits flag in run_mturk.py may be incorrect

Hello from Mechanical Turk! I am part of the MTurk team, and I'm trying to walk through the MTurk integration in ParlAI.

In the "run_mturk.py" file, one of the parameters is "num_hits", which can be set to choose how many HITs are published for a specific task. I think there may be a logical bug here.

E.g. if I set num_hits = 3 while using model_evaluator:

  1. ParlAI publishes 3 HITs to MTurk containing the details of 1 task
  2. The same Worker could do all three HITs (doing one HIT does not stop you from doing another HIT), rating the same task over and over again
  3. The net result is that you have the same result (a score from ONE Worker for ONE task) repeated 3 times - so it's less information than you wanted and you're paying the same person 3 times.

I believe what you really want is 3 different Workers rating each task. MTurk supports this using the concept of assignments. What you want is to publish 1 HIT per Task, but with 3 assignments per HIT. You can define the number of assignments using the "MaxAssignments" flag when creating HITs (http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_CreateHITOperation.html).

If you ask for 3 assignments for a HIT, MTurk will make the HIT available to 3 different Workers automatically. A Worker is not allowed to complete more than one assignment per HIT.

Then, when you are retrieving results, you will want to check for all 3 assignments being done, instead of searching for all 3 HITs getting done.

To recap: right now for each task, ParlAI posts X HITs with 1 assignment each, and should instead post 1 HIT with X assignments.
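As a sketch, the CreateHIT parameters for the one-HIT, X-assignments approach would look roughly like this (values here are placeholders for illustration, not ParlAI's actual settings; the parameter names follow the MTurk CreateHIT API, e.g. as exposed by boto3's mturk client):

```python
# Placeholder values; only MaxAssignments is the point of this sketch.
hit_params = {
    'Title': 'Rate a dialog response',
    'Description': 'Evaluate one model response for one task',
    'Reward': '0.05',                      # reward is a string, in USD
    'MaxAssignments': 3,                   # 3 distinct Workers, one assignment each
    'LifetimeInSeconds': 3600,             # how long the HIT stays available
    'AssignmentDurationInSeconds': 600,    # time each Worker has to finish
}
# Then, hypothetically:
#   client = boto3.client('mturk')
#   client.create_hit(Question=question_xml, **hit_params)
```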

Please let me know if that makes sense and if I can help in any other way.

Pytorch Installation failed with requirements_ext.txt

My review points are below:

  1. We must provide installation steps in README.md.
  2. Installing the provided requirements_ext.txt with pip always fails, because the pytorch entry fails with the exception below:
Failed building wheel for pytorch
Running setup.py clean for pytorch
Running setup.py bdist_wheel for regex ... done
Stored in directory: /home/chetan/.cache/pip/wheels/22/bd/1e/4fb7e5438c893f68d145c1748e29dafe5d215bad5dedef8037
Successfully built regex
Failed to build pytorch
Installing collected packages: pytorch, regex
Running setup.py install for pytorch ... error
  Complete output from command /home/chetan/anaconda2/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-XTOHDh/pytorch/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-o2gbGm-record/install-record.txt --single-version-externally-managed --compile:
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-build-XTOHDh/pytorch/setup.py", line 13, in <module>
      raise Exception(message)
  Exception: You should install pytorch from http://pytorch.org
  Created new window in existing browser session.

It looks like we can probably append this to requirements.txt instead, or else use soumith's pip command as given at pytorch.org.
3. What is the need for setup.sh if setup.py is used, as per the installation steps in README.md?

OpenAI Gym possible integration

Hi to all,

Have you ever thought about using OpenAI Gym as a core framework for developing ParlAI? Why did you choose to develop it completely from scratch? Would you be interested in a possible integration with this framework? I would be very interested in trying to develop this kind of feature, so maybe I can help with the development.

Thank you in advance for your answer.

Alessandro

Add Dockerfile

Add a Dockerfile and a Docker image, which might make trying out ParlAI easier.

ImportError: No module named 'parlai'

I assume __init__.py or setup.py issues:

python examples/drqa/train.py -t "#opensubtitles" -bs 32
Traceback (most recent call last):
  File "examples/drqa/train.py", line 28, in <module>
    from parlai.agents.drqa.agents import SimpleDictionaryAgent
ImportError: No module named 'parlai'

Similar to #50.

Every response in corpus is a possible candidate?

In the dialogue corpora, the examples appear to contain every possible response as a candidate. What's the general logic behind this, or is it a byproduct of using fbdialog_teacher's setup_data function?

[opensubtitles]: Everybody' s a little nervous their first night in show business .
[labels: You just do what we rehearsed and everything will be fine .]
[cands: - I' m packing right now .| Then Pearl Harbor happened and everything changed .| Well , then maybe you can explain to me why that robust woman is eating my girlfriend' s din ...| This postmark is too faint to read .| - It' s not attached to the adjacent bulldings .| ...and 741300 more]
[RepeatLabelAgent]: You just do what we rehearsed and everything will be fine

[cornell_movie]: Edward, did you hear me?
[labels: I'm here.]
[cands: No, Kringelein, not tired, -- just -- Well -- well --|Now he's gone and made me fall in love with him, which I never wanted to do. I told him that.|By killing them?|I lost it. I lose everything.|Lynette, I told you already, it won't work.| ...and 97145 more]
[RepeatLabelAgent]: I'm here.

On a side note, I'm making my own task, which isn't a dialogue but can follow the format of one fairly easily. Putting the data into FB format and inheriting from fbdialog_teacher seems to be the simplest approach.
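For reference, the FB Dialog format looks roughly like this (contents invented for illustration): each line starts with an index, followed by tab-separated fields for text, labels, reward, and candidates, with '|' separating candidates and the index restarting at 1 at each new episode. When a per-example candidates field is absent, the teacher may fall back to the full label set as candidates, which could explain the behavior above (this is an assumption, not confirmed from the source).

```text
1 Hi , how are you ?	I am good , thanks .		I am good , thanks .|Not great .|Who is this ?
2 What did you do today ?	I went hiking .		I went hiking .|Nothing much .
1 Hello again .	Hello !		Hello !|Goodbye .
```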

error on running first example

python3 examples/display_data.py -t babi:task1k:1

[numthreads:1]
[task:babi:task1k:1]
[datatype:train]
[batchsize:1]
[num_examples:10]
[datapath:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/data/]
[download_path:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/downloads/]
[Agent initializing.]
[creating task(s): babi:task1k:1]
[building data: /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/data//bAbI/]
sh: wget: command not found
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/agents.py", line 218, in _create_task_agents
    create_agent = getattr(my_module, 'create_agents')
AttributeError: 'module' object has no attribute 'create_agents'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "examples/display_data.py", line 42, in <module>
    main()
  File "examples/display_data.py", line 30, in main
    world = create_task(opt, agent)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/worlds.py", line 673, in create_task
    world = create_task_world(opt, user_agents)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/worlds.py", line 648, in create_task_world
    world_class, task_agents = _get_task_world(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/worlds.py", line 643, in _get_task_world
    task_agents = _create_task_agents(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/agents.py", line 222, in _create_task_agents
    return create_task_agent_from_taskname(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/agents.py", line 186, in create_task_agent_from_taskname
    task_agents = teacher_class(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/tasks/babi/agents.py", line 28, in __init__
    opt['datafile'] = _path('', task.split(':')[2], opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/tasks/babi/agents.py", line 15, in _path
    build(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/tasks/babi/build.py", line 23, in build
    build_data.download(dpath, url)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/build_data.py", line 19, in download
    raise RuntimeError('failed: ' + s)
RuntimeError: failed: cd "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/data//bAbI/"; wget https://s3.amazonaws.com/fair-data/parlai/babi/babi.tar.gz


The solution is to install wget first; it isn't provided automatically, since it is not in the requirements file:
pip3 install wget
Also upgrade pip if you haven't already.

Error on running the readme instructions

I cloned the repository and tried to run examples

git clone https://github.com/facebookresearch/ParlAI.git
cd ParlAI/
python setup.py install
python examples/display_data.py -t babi:task1k:1

[task:babi:task1k:1]

[download_path:/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parlai-0.1.0-py2.7.egg/downloads/]
[datatype:train]
[numthreads:1]
[batchsize:1]
[datapath:/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parlai-0.1.0-py2.7.egg/data/]
[num_examples:10]
Traceback (most recent call last):
  File "examples/display_data.py", line 42, in <module>
    main()
  File "examples/display_data.py", line 29, in main
    agent = RepeatLabelAgent(opt)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parlai-0.1.0-py2.7.egg/parlai/agents/repeat_label/agents.py", line 27, in __init__
    super().__init__(opt)
TypeError: super() takes at least 1 argument (0 given)

Issue on 3rd example on Readme

python3 examples/eval_model.py -m ir_baseline -t "#moviedd-reddit" -dt valid

Output is

[model_file:]
[task:#moviedd-reddit]
[numthreads:1]
[batchsize:1]
[model_params:]
[num_examples:1000]
[model:ir_baseline]
[datatype:valid]
[download_path:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/downloads]
[display_examples:False]
[datapath:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/data]
IrBaselineAgent
Traceback (most recent call last):
  File "examples/eval_model.py", line 45, in <module>
    main()
  File "examples/eval_model.py", line 29, in main
    agent = create_agent(opt)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/parlai-0.1.0-py3.4.egg/parlai/core/agents.py", line 100, in create_agent
    my_module = importlib.import_module(module_name)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/importlib/__init__.py", line 109, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
  File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
  File "<frozen importlib._bootstrap>", line 2212, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 321, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 2254, in _gcd_import
  File "<frozen importlib._bootstrap>", line 2237, in _find_and_load
  File "<frozen importlib._bootstrap>", line 2224, in _find_and_load_unlocked
ImportError: No module named 'parlai.agents.ir_baseline'

AttributeError: module 'parlai.agents.drqa.agents' has no attribute 'DrqaAgent'

Following error when I try to run python examples/display_model.py -m drqa -t "#opensubtitles" -dt valid:

DrqaAgent

    Warning: no model found for 'en'

    Only loading the 'en' tokenizer.

Traceback (most recent call last):
  File "examples/display_model.py", line 42, in <module>
    main()
  File "examples/display_model.py", line 29, in main
    agent = create_agent(opt)
  File "/Users/bittlingmayer/Desktop/aca/ParlAI/parlai/core/agents.py", line 105, in create_agent
    model_class = getattr(my_module, class_name)
AttributeError: module 'parlai.agents.drqa.agents' has no attribute 'DrqaAgent'

Why not use "answer_start" in SQuAD for the target?

In the current drqa.py, _find_target() in the DrqaAgent class finds the start/end token span for all labels in the document and then returns a random one for training. This approach will inevitably introduce some false positives into the target. Since "answer_start" is a provided field in the SQuAD dataset, I wonder why we don't just use that to build the target?
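A minimal sketch of the proposed alternative: map the character-level answer_start to a token span using token offsets (this uses a trivial whitespace tokenizer for illustration, not DrQA's actual tokenizer):

```python
import re

def char_to_token_span(context, answer, answer_start):
    """Map SQuAD's character-level answer_start to a (start, end) token span.

    Illustrative sketch: tokens are whitespace-delimited runs, and we find
    the tokens containing the answer's first and last characters.
    """
    answer_end = answer_start + len(answer)
    # character offsets of each token in the context
    spans = [(m.start(), m.end()) for m in re.finditer(r'\S+', context)]
    start = end = None
    for i, (s, e) in enumerate(spans):
        if start is None and s <= answer_start < e:
            start = i
        if s < answer_end <= e:
            end = i
    return start, end

ctx = 'The cat sat on the mat .'
span = char_to_token_span(ctx, 'the mat', 15)  # -> (4, 5)
```

Using the annotated offset avoids matching other occurrences of the same answer string elsewhere in the document.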

a mistake in tutorial?

In the first tutorial where "we’ll set up the display loop.", the code is:

parser = ParlaiParser()
opt = parser.parse_args()

if 'task' not in opt:
    # if task not specified from the command line,
    # default to the 1000-training example bAbI task 1
    opt['task'] = 'babi:task1k:1'

agent = RepeatLabelAgent(opt)
world = create_task(opt, agent)

for _ in range(10):
    world.parley()
    print(world.display())
    if world.epoch_done():
        print('EPOCH DONE')
        break

But when I actually ran the code I found that, when the task is not specified, the 'task' key is still in opt but its value is None, so the line

if 'task' not in opt:

should instead check the value, e.g.

if opt.get('task') is None:

?
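The distinction can be shown in a couple of lines (a minimal sketch, assuming the parser registers --task with a default of None):

```python
# What parse_args() effectively yields when no -t/--task flag is given:
# the key exists, but its value is None.
opt = {'task': None}

assert 'task' in opt        # True: the key is present...
assert opt['task'] is None  # ...so membership tests never fire

# The guard should therefore inspect the value, not the key:
if opt.get('task') is None:
    opt['task'] = 'babi:task1k:1'
```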

PackageNotFoundError: Package missing in current win-64 channels:

conda install pytorch torchvision cuda80 -c soumith
Fetching package metadata ...............

PackageNotFoundError: Package missing in current win-64 channels:

  • pytorch

I tried to add a channel, but I got:

conda install pytorch torchvision cuda80 -c soumith
Fetching package metadata ...............

PackageNotFoundError: Package missing in current win-64 channels:

  • pytorch

Any suggestions?

drqa training time

python examples/drqa/train.py -t squad -bs 32 takes a very long time, at least on a CPU.

What are some good toy params for playing with DrQA?

word embeddings with DrQA

From the output below, I cannot really tell what happens with the word embeddings by default.

From the warning, it seems like it should train on the given text (similar to fastText unsupervised).

But from the param value pretrained_words True, it seems like it is using some pretrained .vec file that was never provided.

Any clarification?

05/25/2017 02:17:53 PM: Setting fix_embeddings to False as embeddings are random.
05/25/2017 02:17:53 PM: [ Initializing model from scratch ]
05/25/2017 02:17:56 PM: [ WARNING: No embeddings provided. Keeping random initialization. ]
05/25/2017 02:17:56 PM: [ Created with options: ] 
no_images	False
num_features	3
question_merge	self_attn
pretrained_words	True
rnn_type	lstm
dict_max_ngram_size	-1
vocab_size	146272
doc_layers	3
use_in_question	True
embedding_dim	300

feature request: progress bar

It would be really helpful if the percentage complete and so on were printed to the console during training, similar to fastText.

An estimated time to finish would be good too.
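A rough stdlib-only sketch of what such a progress line could look like (illustrative; the function name and format are invented, not ParlAI's):

```python
import time

def progress_line(done, total, start_time, now=None):
    """Format a one-line progress report with percent complete and a rough
    ETA extrapolated from elapsed time (illustrative sketch)."""
    now = time.time() if now is None else now
    frac = done / float(total)
    elapsed = now - start_time
    # extrapolate remaining time from the average rate so far
    eta = elapsed * (1 - frac) / frac if frac > 0 else float('inf')
    return 'progress: %5.1f%%  elapsed: %4.0fs  eta: %4.0fs' % (
        100 * frac, elapsed, eta)

# e.g. inside a training loop:
#   print('\r' + progress_line(i + 1, num_examples, t0), end='', flush=True)
```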
