
RMM: A Recursive Mental Model for Dialog Navigation

This repository contains code for the paper RMM: A Recursive Mental Model for Dialog Navigation.

@inproceedings{romanroman:EMNLP-Findings20,
  title={RMM: A Recursive Mental Model for Dialog Navigation},
  author={Homero Roman Roman and Yonatan Bisk and Jesse Thomason and Asli Celikyilmaz and Jianfeng Gao},
  booktitle={Findings of the 2020 Conference on Empirical Methods in Natural Language Processing},
  year={2020}
}

Installation / Build Instructions

This repository is built on the Matterport3DSimulator codebase. The original installation instructions are included in README_Matterport3DSimulator.md; this document covers only the steps needed to work with the CVDN (Cooperative Vision-and-Dialog Navigation) task.

We recommend using the mattersim Dockerfile to install the simulator. The simulator can also be built without Docker, but satisfying the project dependencies may be more difficult.

Prerequisites

  • Ubuntu 16.04
  • NVIDIA GPU with driver >= 384
  • Docker installed with GPU support
  • Note: the CUDA / cuDNN toolkits do not need to be installed (they are provided by the Docker image)

Building using Docker

Build the docker image:

docker build -t cvdn .

Run the docker container, mounting the git repo (and with it the data directories stored inside it, such as tasks/CVDN/data/ and img_features/):

docker run -it --volume `pwd`:/root/mount/Matterport3DSimulator -w /root/mount/Matterport3DSimulator cvdn

CVDN Dataset Download

Download the train, val_seen, val_unseen, and test splits of the whole CVDN dataset by executing the following script:

tasks/CVDN/data/download.sh
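After the script finishes, a quick sanity check can confirm that every split arrived. The sketch below assumes (hypothetically; check the actual contents of tasks/CVDN/data/) that each split is saved as <split>.json holding a JSON list of dialog entries. The function name count_split_entries is illustrative and not part of the repository.

```python
import json
from pathlib import Path

# Split names taken from the download instructions above.
SPLITS = ("train", "val_seen", "val_unseen", "test")


def count_split_entries(data_dir="tasks/CVDN/data", splits=SPLITS):
    """Return {split: number_of_entries}; None marks a missing split file.

    Assumes (hypothetically) that each split is stored as <split>.json
    and that the file holds a JSON list of dialog entries.
    """
    counts = {}
    for split in splits:
        path = Path(data_dir) / f"{split}.json"
        if not path.is_file():
            counts[split] = None  # not downloaded, or named differently
            continue
        with path.open() as f:
            counts[split] = len(json.load(f))
    return counts


if __name__ == "__main__":
    for split, n in count_split_entries().items():
        print(f"{split}: {'missing' if n is None else n}")
```

Run it from the repository root; any split reported as missing means the download did not complete or the file naming assumption does not hold.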

Matterport3D Dataset Download

To use the simulator you must first download the Matterport3D Dataset, which is available after requesting access from the Matterport3D project. The download script provided once access is granted lets you download only selected data types.

The experiments rely on ResNet-152 ImageNet features, which must be pre-processed beforehand.

Pre-processed features can be obtained as follows:

mkdir -p img_features/
cd img_features/
wget "https://www.dropbox.com/s/o57kxh2mn5rkx4o/ResNet-152-imagenet.zip?dl=1" -O ResNet-152-imagenet.zip
unzip ResNet-152-imagenet.zip
cd ..
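The unzipped archive is presumably a tab-separated file (assumed name: img_features/ResNet-152-imagenet.tsv). The sketch below reads it under the assumption that it follows the layout commonly used for Matterport3D feature files in R2R-style code: columns scanId, viewpointId, image_w, image_h, vfov and features, where features is a base64-encoded float32 array of 36 views x 2048 dimensions. Verify the column layout against the real file; load_features is a hypothetical helper, not the repository's loader.

```python
import base64
import csv
import sys
from array import array

# Assumed column layout of the features file; verify before relying on it.
FIELDNAMES = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]
NUM_VIEWS = 36      # assumed: 36 discretized views per panorama
FEATURE_DIM = 2048  # size of the ResNet-152 pooled feature vector


def load_features(tsv_path):
    """Map 'scanId_viewpointId' to a list of NUM_VIEWS float32 vectors."""
    # Each base64-encoded row is far larger than csv's default field limit.
    csv.field_size_limit(sys.maxsize)
    features = {}
    with open(tsv_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            flat = array("f")  # C float, i.e. float32 on common platforms
            flat.frombytes(base64.b64decode(row["features"]))
            if len(flat) != NUM_VIEWS * FEATURE_DIM:
                raise ValueError(f"unexpected feature size for {row['viewpointId']}")
            # Split the flat buffer into one vector per view.
            views = [flat[i * FEATURE_DIM:(i + 1) * FEATURE_DIM] for i in range(NUM_VIEWS)]
            features[f"{row['scanId']}_{row['viewpointId']}"] = views
    return features
```

Loading the full file keeps every panorama in memory, so for a quick check it may be enough to read only the first few rows.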

Train and Evaluate

Pretraining

Pretraining follows the classic speaker-follower setup: the navigation agent and the speaker are pretrained separately with the two commands below.

Agent pretraining:

python src/train.py --train_datasets=CVDN --eval_datasets=CVDN

Speaker pretraining:

python src/train.py --entity=speaker --train_datasets=CVDN --eval_datasets=CVDN

Pre-trained models are already included in results/baseline/CVDN_train_eval_CVDN/G1/v1/steps_4, so this step is optional.
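The two pretraining runs can be chained in a small wrapper. This is a minimal sketch that reuses only the flags shown above; the helper names and the "skip if the results directory is non-empty" heuristic are hypothetical conveniences, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Directory that already holds the shipped pre-trained models.
PRETRAINED_DIR = Path("results/baseline/CVDN_train_eval_CVDN/G1/v1/steps_4")
COMMON = ["--train_datasets=CVDN", "--eval_datasets=CVDN"]


def pretraining_commands(python=sys.executable):
    """Return the agent and speaker pretraining commands, in that order."""
    agent = [python, "src/train.py", *COMMON]
    speaker = [python, "src/train.py", "--entity=speaker", *COMMON]
    return [agent, speaker]


def should_pretrain(pretrained_dir=PRETRAINED_DIR):
    """Hypothetical heuristic: pretrain only if the directory is missing or empty."""
    path = Path(pretrained_dir)
    return not (path.is_dir() and any(path.iterdir()))


def run_pretraining(dry_run=False):
    """Run both pretraining jobs in sequence unless models already exist."""
    if not should_pretrain():
        print(f"Found existing models in {PRETRAINED_DIR}; skipping pretraining.")
        return []
    cmds = pretraining_commands()
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raise on the first failed job
    return cmds
```

Calling run_pretraining(dry_run=True) from the repository root prints the commands without running them; check=True stops the chain if agent pretraining fails, so the speaker job never starts on a broken setup.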

Training and evaluating RMM

To train RMM with single-branch evaluation, run the following command:

python src/train.py --mode=gameplay --rl_mode=agent_speaker --train_datasets=CVDN --eval_datasets=CVDN

To train RMM with multiple-branch evaluation that branches on the action probabilities, run:

python src/train.py --mode=gameplay --eval_branching=3 --action_probs_branching --train_datasets=CVDN --eval_datasets=CVDN
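This README does not spell out how branches are formed. As a generic illustration only (not the repository's implementation), branching on action probabilities can be pictured as keeping the k most probable actions at a decision point, with k playing the role of --eval_branching=3. The function top_k_actions is hypothetical.

```python
def top_k_actions(action_probs, k=3):
    """Return indices of the k most probable actions, most probable first.

    Ties keep their original order because sorted() is stable.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(range(len(action_probs)), key=lambda i: -action_probs[i])
    return ranked[:k]


# Example: four candidate actions, three branches kept.
print(top_k_actions([0.1, 0.5, 0.2, 0.2], k=3))  # [1, 2, 3]
```

Each returned index would seed one branch; how the branches are then rolled out and compared is specific to RMM and is not modeled here.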

By default, results are saved in:

results/gameplay/CVDN_train_eval_CVDN/G1/v1/steps_4/agent_rl_speaker_rl/agent_sample_speaker_sample

The file val_unseen_gps.csv contains the goal progress for every evaluation entry at each time step at which a question is asked, as well as the final goal progress for that entry.
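In CVDN, goal progress measures how much closer (in meters) the agent ends up to the goal than where it started, i.e. the start distance to the goal minus the end distance to the goal. The sketch below computes that quantity and averages the final goal progress over a results file. The CSV layout is an assumption (one row per evaluation entry, final goal progress in the last column); inspect val_unseen_gps.csv and adjust before use. Both function names are hypothetical.

```python
import csv


def goal_progress(start_dist_m, end_dist_m):
    """Reduction in distance to the goal, in meters."""
    return start_dist_m - end_dist_m


def mean_final_gp(csv_path, has_header=False):
    """Average the final goal progress over all entries in a results CSV.

    Hypothetical layout: one row per evaluation entry, with the final
    goal progress stored in the last column.
    """
    finals = []
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if row:  # ignore blank lines
                finals.append(float(row[-1]))
    if not finals:
        raise ValueError("no entries found")
    return sum(finals) / len(finals)
```

For example, an agent that starts 10 m from the goal and stops 4 m away has a goal progress of 6 m; a negative value means it ended farther from the goal than it started.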

Optional functionality

The flag --target_only makes the agent skip asking questions and use only the target as textual guidance. Similarly, the flag --current_q_a_only makes the agent use only the latest question-answer pair, discarding the rest of its dialogue history.
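The effect of the two flags on the agent's textual input can be illustrated with a small helper. This is a hypothetical sketch of the behavior described above, not the repository's code: it assumes the guidance is built from the target followed by the question-answer pairs, and it lets --target_only take precedence if both flags are set.

```python
def build_guidance(target, qa_pairs, target_only=False, current_q_a_only=False):
    """Assemble the text the agent conditions on (illustrative only).

    target: the navigation target, e.g. an object name.
    qa_pairs: dialog history as a list of (question, answer) tuples.
    target_only: mirrors --target_only; ignore all dialog.
    current_q_a_only: mirrors --current_q_a_only; keep only the latest pair.
    """
    if target_only or not qa_pairs:
        return [target]  # target_only wins if both flags are set
    history = qa_pairs[-1:] if current_q_a_only else qa_pairs
    parts = [target]
    for question, answer in history:
        parts.extend([question, answer])
    return parts
```

With two exchanges, the default keeps both pairs, --current_q_a_only keeps only the second, and --target_only keeps only the target.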

Acknowledgements

This repository is built upon the Matterport3DSimulator codebase.

The CVDN dataset was collected by Thomason et al., as described in the paper Vision-and-Dialog Navigation.
