
simpletod's Introduction

SimpleTOD: A Simple Language Model for Task-Oriented Dialogue

Authors: Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, and Richard Socher

(Figure: SimpleTOD model overview)

(Figure: SimpleTOD single turn)

Introduction

Task-oriented dialogue (TOD) systems accomplish a goal described by a user in natural language. They often use a pipeline approach: natural language understanding (NLU) for belief state tracking, dialogue management (DM) for deciding which actions to take based on those beliefs, and natural language generation (NLG) for generating responses.

We propose recasting task-oriented dialogue as a simple, causal (unidirectional) language modeling task. We show that such an approach can solve all the sub-tasks in a unified way using multi-task maximum likelihood training. The proposed Simple Task-Oriented Dialogue (SimpleTOD) approach models the inherent dependencies between the sub-tasks of task-oriented dialogue by optimizing for all of them in an end-to-end manner.

Paper link: https://arxiv.org/abs/2005.00796

Blog link: https://blog.einstein.ai/simpletod


Installation

The general package requirements are:

  • Python >= 3.6
  • PyTorch >= 1.2
  • Transformers >= 2.5.1

1- Install the required packages by running the following command.

pip install -r requirements.txt

2- Alternatively, build a Docker image to run inside a container.

docker build -t <image_name>:<tag> -f Dockerfile .

Usage

This section explains the steps to preprocess the MultiWOZ dataset and train the model.

Preprocessing:

This step downloads the MultiWOZ dataset, performs delexicalization, and creates the dataset for the language model:

create_dataset.sh

Each dialogue turn is represented as a single sequence containing the previous user/system turns, the belief state, the actions, and the delexicalized response:

<|endoftext|> <|context|> <|user|> i am looking for a college type attraction . <|system|> there are 18 colleges i have found , would you prefer 1 in town centre or in the west ? <|user|> i would like to visit on in town centre please . <|system|> sure , we have thirteen options , 10 of which are free . may i suggest king s college , or hughes hall ? <|user|> okay , may i have their postcode , entrance fee , and phone number ?<|endofcontext|> 
<|belief|> attraction type college , attraction name kings college|hughes hall , attraction area centre <|endofbelief|> 
<|action|> attraction inform name , attraction inform fee , attraction inform post , attraction inform phone <|endofaction|> 
<|response|> sure , the post code to [attraction_name] is [attraction_postcode] , the entrance fee is free , and phone number [attraction_phone] <|endofresponse|> <|endoftext|>
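For illustration, here is a minimal sketch of how such a sequence could be assembled. The function and argument names are hypothetical; the repository's actual preprocessing lives in prepare_simpletod_data.py.

# Minimal sketch (not the repository's preprocessing code) of how a training
# sequence is assembled from a turn's components; the function and argument
# names are hypothetical.
def build_sequence(context_turns, belief, actions, delex_response):
    # context_turns: list of (speaker, utterance) pairs,
    # e.g. ('user', 'i am looking for a college type attraction .')
    context = ' '.join(f'<|{speaker}|> {utt}' for speaker, utt in context_turns)
    return ('<|endoftext|> '
            f'<|context|> {context} <|endofcontext|> '
            f'<|belief|> {" , ".join(belief)} <|endofbelief|> '
            f'<|action|> {" , ".join(actions)} <|endofaction|> '
            f'<|response|> {delex_response} <|endofresponse|> <|endoftext|>')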

DST training:

This step trains the model to predict belief states.

train_dst.sh $GPU gpt2 $GPT2_TYPE $BATCH

For this task, we include none slot values in the sequence. We observed that this improves SimpleTOD's performance on DST by reducing the false positive rate (a minimal sketch follows the example below).

<|endoftext|> <|context|> <|user|> am looking for a place to to stay that has cheap price range it should be in a type of hotel <|endofcontext|> 
<|belief|> hotel name not mentioned , hotel area not mentioned , hotel parking not mentioned , hotel pricerange cheap , hotel stars not mentioned , hotel internet not mentioned , hotel type hotel <|endofbelief|> <|endoftext|>
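For illustration, a minimal sketch (not the repository's code) of how such a DST target could be built, with unconstrained slots filled in as "not mentioned":

# Minimal sketch (an illustration, not the repo's code) of the DST target:
# every slot of the active domain appears, with 'not mentioned' standing in
# for slots the user has not constrained yet.
HOTEL_SLOTS = ['name', 'area', 'parking', 'pricerange', 'stars', 'internet', 'type']

def build_dst_belief(domain, mentioned):
    # mentioned: dict of slot -> value pairs actually given by the user
    parts = [f'{domain} {slot} {mentioned.get(slot, "not mentioned")}'
             for slot in HOTEL_SLOTS]
    return '<|belief|> ' + ' , '.join(parts) + ' <|endofbelief|>'

# Reproduces the belief string in the example above:
print(build_dst_belief('hotel', {'pricerange': 'cheap', 'type': 'hotel'}))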

End-to-End training:

In this step, we train SimpleTOD on the sequence of context + belief + action + delexicalized response. In contrast to the DST task, we do not include none slot values, because of the sequence length limitation of GPT-2.

train_end2end.sh $GPU gpt2 $GPT2_TYPE $BATCH

Generation:

This script generates SimpleTOD beliefs, actions, and responses. Generation proceeds dialogue by dialogue: for each turn, it creates the context and saves the generated belief, action, and response.

CUDA_VISIBLE_DEVICES=$GPU python generate_dialogue.py $CHECKPOINT $DECODING

It will save the model output in a JSON file (MODEL_OUTPUT), which contains all dialogues along with the ground-truth user and system responses.

  • To use DB search during generation, set --use_db_search (this uses oracle DB search results)
  • To use DB search dynamically, set --use_db_search and --use_dynamic_db
  • To use oracle beliefs and actions, simply set --use_oracle_belief and --use_oracle_action

Evaluation

MultiWOZ evaluation has two parts: Dialogue State Tracking (DST) and End-to-End.

DST evaluation

To compute joint accuracy, simply run the following script on the generated MODEL_OUTPUT file. It uses the generated belief states to compute the metric, without any label cleaning (a minimal sketch of the metric itself follows the list below).

python compute_joint_acc.py $MODEL_OUTPUT 

There are two types of label cleaning that can be used when computing joint accuracy.

  • To use the default label cleaning suggested by the MultiWOZ authors, set --default_cleaning (for more details, please refer to MultiWOZ FAQ 5).
  • We also found another type of noisy annotation; please refer to the paper for details on the different types. Here, we provide an option to compute joint accuracy while fixing Type 2 noisy annotations (where one or more slots are not labeled in some turns) by setting --type2_cleaning.
  • The complete list of Type 2 noisy annotations is here. For more details on noisy annotations in the MultiWOZ dataset, please refer to the paper.
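As a reference for what the metric measures, here is a minimal sketch of joint (goal) accuracy, assuming beliefs are represented as sets of (domain, slot, value) triples; compute_joint_acc.py may represent them differently.

# Minimal sketch of joint (goal) accuracy: a turn counts as correct only if
# the predicted belief state matches the gold belief state exactly.
def joint_accuracy(predicted, gold):
    # predicted/gold: one set of (domain, slot, value) triples per turn
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

acc = joint_accuracy(
    [{('hotel', 'pricerange', 'cheap')}, {('hotel', 'area', 'north')}],
    [{('hotel', 'pricerange', 'cheap')}, {('hotel', 'area', 'centre')}],
)
print(acc)  # 0.5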

End-to-End evaluation

To compute inform/success/BLEU, simply run the following script. It loads the generated belief states and responses and computes the metrics.

python evaluate_multiwoz.py $MODEL_OUTPUT

Demo

To test the model in a real conversation with a human, we provide a simple script where the user can enter text in a multi-turn setting and see SimpleTOD's responses. It generates lexicalized responses and belief states at each turn. For more information, please read the blog.

python demo.py $CHECKPOINT $DECODING
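For orientation, the demo's multi-turn loop could look roughly like the sketch below, assuming a fine-tuned checkpoint loadable with the Hugging Face from_pretrained API and greedy decoding; this is not the repository's exact code.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

CHECKPOINT = 'gpt2'  # placeholder; point this at a fine-tuned SimpleTOD checkpoint
tokenizer = GPT2Tokenizer.from_pretrained(CHECKPOINT)
model = GPT2LMHeadModel.from_pretrained(CHECKPOINT).eval()

history = []  # alternating '<|user|> ...' / '<|system|> ...' strings
while True:
    user_input = input('You: ')
    history.append(f'<|user|> {user_input}')
    prompt = '<|endoftext|> <|context|> ' + ' '.join(history) + ' <|endofcontext|>'
    input_ids = tokenizer.encode(prompt, return_tensors='pt')
    with torch.no_grad():  # greedy decoding is generate's default
        output = model.generate(input_ids, max_length=1024,
                                pad_token_id=tokenizer.eos_token_id)
    decoded = tokenizer.decode(output[0][input_ids.shape[1]:])
    # keep only the (delexicalized) response segment for the next turn's context
    if '<|response|>' in decoded:
        response = decoded.split('<|response|>')[1].split('<|endofresponse|>')[0].strip()
    else:
        response = decoded
    print('SimpleTOD:', response)
    history.append(f'<|system|> {response}')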

Citation

@article{hosseini2020simple,
  title={A simple language model for task-oriented dialogue},
  author={Hosseini-Asl, Ehsan and McCann, Bryan and Wu, Chien-Sheng and Yavuz, Semih and Socher, Richard},
  journal={arXiv preprint arXiv:2005.00796},
  year={2020}
}

License

The code is released under the BSD-3 License; see LICENSE for details.


simpletod's Issues

no $CHECKPOINT when executing the demo

When I tried to run the demo with
python demo.py openai-gpt nucleus
I got this error:

 File "demo.py", line 602, in <module>
    TOP_P = float(sys.argv[3])
IndexError: list index out of range

What should the third argument be?

Some bugs

Hello, I recently read your code and found problems in several places:

  1. "lexixal" in "python preprocess_multiwoz.py lexixal" in create_dataset.sh should be "lexical";
  2. The last part of prepare_simpletod_data.py, which saves the data, needs to open files with encoding='utf-8';
  3. In the last part of the main function in main.py, when evaluating a checkpoint, the save path should be a relative path, without the "./" added in train.sh; otherwise, when eval_results.txt is saved later, the code looks for "output/gpt" from the root directory when the actual path is "output/gpt2/checkpoint-xxx".

I hope you can check these, thanks again!

Shouldn't the context be masked during training?

If I understand correctly, the idea is that the model generates belief states, DB search results, actions, and responses conditioned on some dialogue context. Shouldn't we then mask the context between <|context|> and <|endofcontext|> during training? We are not trying to generate the past dialogue history, but to generate belief states etc. given that history.

Basically, what I mean is that we modify the labels on this line

outputs = model(inputs, labels=labels)

to mask the context before passing them to the model.
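A minimal sketch of the proposed masking, placed inside the training loop where inputs, tokenizer, and model are already defined; this is an assumption about how one might change the code, not the repository's code, and it assumes <|endofcontext|> encodes to a single id in the vocabulary.

# Sketch of the proposed masking (an assumption, not the repository's code).
# Positions labeled -100 are ignored by the Hugging Face cross-entropy loss,
# so the model is only trained to generate belief/action/response.
labels = inputs.clone()
end_id = tokenizer.convert_tokens_to_ids('<|endofcontext|>')
for i in range(labels.size(0)):
    positions = (labels[i] == end_id).nonzero(as_tuple=True)[0]
    if len(positions) > 0:
        end = positions[0].item()
        labels[i, : end + 1] = -100  # mask up to and including the marker
outputs = model(inputs, labels=labels)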

generating the dialogue response

I tried to generate the dialogue with

python generate_dialogue.py $CHECKPOINT gpt2 $DECODING greedy

I got a result, but generated_response was ['', '', '', ''].

When I tried to use the last saved checkpoint, as in

python generate_dialogue.py --checkpoint="/output/gpt2/checkpoint-350000/pytorch_model.bin"

I got this error

Traceback (most recent call last):
  File "/simpletod/generate_dialogue.py", line 89, in <module>
    tokenizer = GPT2Tokenizer.from_pretrained(model_checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 1625, in from_pretrained
    f"Calling {cls.__name__}.from_pretrained() with the path to a single file or url is not "
ValueError: Calling GPT2Tokenizer.from_pretrained() with the path to a single file or url is not supported for this tokenizer. Use a model identifier or the path to a directory instead.

How can I generate the response using a checkpoint?

missing dependencies

It seems the packages tensorboardX and simplejson are required by create_dataset.sh but are missing from requirements.txt. You may want to add them to requirements.txt.

About the DST training script

Hi Ehsan,

Thanks for the amazing work. I am trying to reproduce the DST results. However, I couldn't find the training hyper-parameters (batch size, learning rate, early stopping) in the repo or the paper.

Is there any plan to release the training script for reproducing the results of the paper?

Thanks,
Zhaojiang

Question on encoding the special token <|context|>

Hi,

I tried to run the DST training script in VS Code debug mode. I found that <|context|> in train.history_belief was encoded as a list of tokens rather than a single token:
['Ġ<', '|', 'context', '|', '>'], with corresponding ids [1279, 91, 22866, 91, 29]

I traced the tokenizer and found that the token "<|context|>" is not added to the GPT-2 vocabulary, apparently on purpose.

I'm wondering where I went wrong, or whether this result is expected.
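For reference, this is how the markers could be registered as atomic special tokens, if that were the intent; whether the repository means to do so is exactly what this issue asks.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Sketch: registering the SimpleTOD markers as atomic special tokens so that
# <|context|> encodes to a single id instead of several sub-word pieces.
SPECIAL_TOKENS = ['<|context|>', '<|endofcontext|>', '<|user|>', '<|system|>',
                  '<|belief|>', '<|endofbelief|>', '<|action|>',
                  '<|endofaction|>', '<|response|>', '<|endofresponse|>']

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer.add_special_tokens({'additional_special_tokens': SPECIAL_TOKENS})
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix

print(tokenizer.encode('<|context|>'))  # now a single id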


can't get the result reported in the paper when using end-to-end training without dbsearch

Hi all,
With the default parameters, I ran end-to-end training without dbsearch; the train file is resources/gpt2/train.history_belief_action_sys_delex, with 56778 samples. The perplexity on the valid set stops decreasing after only 2 epochs, ending at ppl = 2.30. The success rate on the test set is around 18.5%, much lower than the 70.5% reported in Table 3 of the paper, and the belief accuracy is around 42%, also much lower than the 55% in Table 1. I wonder whether the model trains well with the default params; would you please release your hyper-parameters and training details for end-to-end training?

dst evaluation

The paper reports a joint accuracy of 56.45 on MultiWOZ, but I can't reproduce the result with the default cleaning method. I noticed type_2_noisy_annotations.json in the noisy_annotations directory; did you replace the original DST annotations with the annotations in this file when evaluating?

Which checkpoint is the best?

Hello, I am also trying to reproduce the results. I notice that many checkpoints are saved; which checkpoint should I use, and how do you determine which one is best?

Error in create_dataset.sh !!

# preprocess multiwoz with delexicalized responses
python preprocess_multiwoz.py delex

# preprocess multiwoz with lexicalized responses
python preprocess_multiwoz.py lexixal

# create dataset for language modeling with SimpleTOD
python prepare_simpletod_data.py

The 'lexixal' should be 'lexical'. That typo prevents the script from running correctly!

demo: return q[-1] IndexError: list index out of range

When I try to run the demo by

python demo.py gpt2 greedy

I got this output with the following error

(venv) D:\simpletod-master\simpletod-master>python demo.py gpt2 greedy

Loading Model

SimpleTOD is ready to chat. What would you like to ask?

You: hello, I need a cheap hotel

Traceback (most recent call last):
  File "D:\simpletod-master\simpletod-master\demo.py", line 696, in <module>
    domain = get_turn_domain(beliefs, domain_queue)
  File "D:\simpletod-master\simpletod-master\demo.py", line 587, in get_turn_domain
    return q[-1]
IndexError: list index out of range

When I printed the output line by line, I found that belief_text and beliefs are empty:

belief_text = get_belief_new_dbsearch(tmp_pred)
print(belief_text)

beliefs = convert_belief(belief_text)
print(beliefs)

their values are:

 []
{}

Is there a problem with the arguments?

Distributed Training

Hi, I found that using DataParallel is really slow, so I am looking at the DistributedDataParallel part of the code. However, I'm not clear on the default configuration needed to use DistributedDataParallel; can someone help me with this? Thanks!
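For reference, a generic PyTorch DistributedDataParallel skeleton looks like the sketch below; this is standard PyTorch usage, not this repository's specific configuration, which is what the issue asks about.

# Generic PyTorch DDP skeleton (standard library usage, not this repo's setup).
# Launch with, e.g.: python -m torch.distributed.launch --use_env --nproc_per_node=N train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

local_rank = int(os.environ.get('LOCAL_RANK', 0))
dist.init_process_group(backend='nccl')  # reads rank/world size from env vars
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])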

Use of Aggregated Belief

In the evaluation script evaluate_multiwoz.py, at line 320, the use of aggregated_belief has been commented out and beliefs is used instead. This makes pred_beliefs a list of dictionaries rather than a single dictionary (as it would be with aggregated_belief).

This makes the later check at line 332 fail, yielding empty venues for the dialogues and significantly decreasing the Match and Success rates.

Switching to aggregated_belief (and changing lines 335 and 336 accordingly) significantly increases the Match and Success rates.

What is the reason behind using pred_beliefs = dial['beliefs'] instead of pred_beliefs = dial['aggregated_belief']?
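For context, a minimal sketch of the aggregation being discussed; the dict-of-dicts structure ({domain: {slot: value}}) is an assumption for illustration.

# Minimal sketch: collapsing per-turn belief dicts into one aggregated dict,
# the shape dial['aggregated_belief'] would have (structure is assumed).
def aggregate_beliefs(per_turn_beliefs):
    # per_turn_beliefs: list with one {domain: {slot: value}} dict per turn
    aggregated = {}
    for turn_belief in per_turn_beliefs:
        for domain, slots in turn_belief.items():
            aggregated.setdefault(domain, {}).update(slots)
    return aggregated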

ValueError: Input <|endoftext|> is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

When running demo.py with 'gpt2' as the model, I came across this issue:

Loading Model
Traceback (most recent call last):
  File "demo.py", line 620, in <module>
    break_tokens = tokenizer.encode(tokenizer._eos_token) + tokenizer.encode('?') + tokenizer.encode('!')
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils_base.py", line 1430, in encode
    **kwargs,
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils_base.py", line 1742, in encode_plus
    **kwargs,
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils.py", line 454, in _encode_plus
    first_ids = get_input_ids(text)
  File "/balboa/projects/conv_ai/packages/transformers/src/transformers/tokenization_utils.py", line 442, in get_input_ids
    f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input <|endoftext|> is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

The error was fixed when I changed line 620 to: break_tokens = tokenizer.encode(tokenizer.eos_token) + tokenizer.encode('?') + tokenizer.encode('!')

Issues on end-to-end evaluation

When I train the end-to-end model with the default parameters, the results are very different from those in the paper; there is also a gap of about 5% in the DST evaluation. Is there anything that needs to be modified? Thanks.

issues on dst

Hi,

When evaluating the JGA for DST, did you remove both the none slots and the dontcare slots?

When I ran generate_dialogue.py, the generated belief states were always empty in the MODEL_OUTPUT file, so could you please provide more details about how the model is trained for DST?

Thanks!

all_venues.json

What is the content of the file all_venues.json used in evaluate_multiwoz.py?

self.venues = json.load(open('resources/all_venues.json', 'r'))

How can I get it, please?
