siat-nlp / galaxy

Official repository of the AAAI'2022 paper "GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection"

License: Apache License 2.0

Python 95.16% Shell 4.84%

pre-trained-model dialogue-generation task-oriented-dialogue semi-supervised

galaxy's Issues

About domain overlap in the dataset

Hello!
I found that the MultiWOZ dataset is already included in your pre-training data, UniDA, and is also used as fine-tuning data. Does this make the low-resource experiment results unfair?
Thanks in advance for your response!

running on CPU

Can we run this code on CPU only?

Question for codes and dataset

Thank you for open-sourcing the code for your impressive methods and results.

I'd like to ask when your code and dataset will be released.

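For response generation, should the input use the golden previous system responses or the model-generated ones?

At test time, I noticed that the context part of the input used to generate the system response appears to consist of the system responses the model generated earlier plus the user utterances from the dataset.

However, when I looked at the MultiWOZ benchmark and read the code of some earlier models listed there, the test-time input seems to use the golden system responses rather than the model's own predictions. Clearly, using the model's own generated utterances as context lowers performance.

I had always assumed that E2E-TOD evaluation of system responses should use the model's own predictions as context, but now it seems that both approaches exist, which confuses me. It may also be that I have not read the code carefully enough. I would appreciate a clarification. Thank you very much!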
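
The two evaluation settings contrasted in this question can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function `build_context` and the turn keys `user`, `resp`, and `resp_gen` are hypothetical names, and the boolean merely mirrors the `USE_TRUE_PREV_RESP` flag that appears in the inference script quoted elsewhere on this page.

```python
from typing import Dict, List


def build_context(turns: List[Dict[str, str]], use_true_prev_resp: bool) -> List[str]:
    """Assemble the dialog history that precedes the current user utterance.

    Each turn is a dict with hypothetical keys:
      'user'     - the user utterance from the dataset
      'resp'     - the golden system response from the dataset
      'resp_gen' - the response the model generated for that turn
    """
    context: List[str] = []
    for turn in turns:
        context.append(turn["user"])
        # Golden history versus the model's own earlier output.
        key = "resp" if use_true_prev_resp else "resp_gen"
        context.append(turn[key])
    return context


# Made-up example turn, for illustration only.
history = [
    {
        "user": "i need a cheap hotel",
        "resp": "what area ?",
        "resp_gen": "which part of town ?",
    },
]
gold_context = build_context(history, use_true_prev_resp=True)
generated_context = build_context(history, use_true_prev_resp=False)
```

With the flag off, any error in an earlier generated response propagates into later turns, which is why the two settings are not directly comparable.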

FileNotFoundError

sh scripts/multiwoz2.0/infer.sh

params as below:

#!/bin/bash
set -ux

# CUDA environment settings.

export CUDA_VISIBLE_DEVICES=0

# Parameters.

DATA_NAME=multiwoz
PROJECT_NAME=GALAXY
MODEL=UnifiedTransformer
PROJECT_ROOT=/data/cll/${PROJECT_NAME}
SAVE_ROOT=/data/cll/${PROJECT_NAME}
VOCAB_PATH=${PROJECT_ROOT}/model/Bert/vocab.txt
VERSION=2.0
LOAD_MODEL_DIR=110-35
LOAD_MODEL_NAME=state_epoch_7
INIT_CHECKPOINT=${SAVE_ROOT}/outputs/${DATA_NAME}${VERSION}/${LOAD_MODEL_DIR}/${LOAD_MODEL_NAME}
WITH_JOINT_ACT=false
USE_TRUE_PREV_BSPN=false
USE_TRUE_PREV_ASPN=false
USE_TRUE_PREV_RESP=false
USE_TRUE_CURR_BSPN=false
USE_TRUE_CURR_ASPN=false
USE_TRUE_DB_POINTER=false
USE_ALL_PREVIOUS_CONTEXT=true
BATCH_SIZE=1
BEAM_SIZE=1
NUM_GPU=1
SEED=10
SAVE_DIR=${SAVE_ROOT}/outputs/${DATA_NAME}${VERSION}/${LOAD_MODEL_DIR}.infer

# Main run.

python -u run.py \
  --do_infer=true \
  --model=${MODEL} \
  --save_dir=${SAVE_DIR} \
  --data_name=${DATA_NAME} \
  --data_root=${PROJECT_ROOT} \
  --vocab_path=${VOCAB_PATH} \
  --init_checkpoint=${INIT_CHECKPOINT} \
  --with_joint_act=${WITH_JOINT_ACT} \
  --use_true_prev_bspn=${USE_TRUE_PREV_BSPN} \
  --use_true_prev_aspn=${USE_TRUE_PREV_ASPN} \
  --use_true_prev_resp=${USE_TRUE_PREV_RESP} \
  --use_true_curr_bspn=${USE_TRUE_CURR_BSPN} \
  --use_true_curr_aspn=${USE_TRUE_CURR_ASPN} \
  --use_true_db_pointer=${USE_TRUE_DB_POINTER} \
  --use_all_previous_context=${USE_ALL_PREVIOUS_CONTEXT} \
  --batch_size=${BATCH_SIZE} \
  --beam_size=${BEAM_SIZE} \
  --version=${VERSION} \
  --gpu=${NUM_GPU} \
  --seed=${SEED} \
  --max_len=1024 \
  --max_ctx_turn=20 \
  --num_act=20 \
  --num_type_embeddings=2 \
  --data_processed=data_for_galaxy_encoded.data.json

error:
Traceback (most recent call last):
  File "run.py", line 130, in <module>
    main()
  File "run.py", line 74, in main
    bpe = MultiWOZBPETextField(hparams)
  File "/data/cll/GALAXY/galaxy/data/field.py", line 356, in __init__
    self._build_vocab()
  File "/data/cll/GALAXY/galaxy/data/field.py", line 498, in _build_vocab
    self.vocab.load_vocab(vp)
  File "/data/cll/GALAXY/galaxy/utils/utils.py", line 199, in load_vocab
    self._freq_dict = json.loads(open(vocab_path + '.freq.json', 'r').read())
FileNotFoundError: [Errno 2] No such file or directory: '/data/cll/GALAXY/data/multiwoz2.0/vocab.freq.json'

The GALAXY project directory does not contain "data/multiwoz2.0/vocab.freq.json". Where can I get this file? The same kind of error occurs with "scripts/camrest/infer.sh": FileNotFoundError: [Errno 2] No such file or directory: 'data/camrest/CamRestOTGY.json'
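
A pre-flight check can surface all missing data files at once, before the run fails on the first one. The sketch below is illustrative only: `missing_files` is a hypothetical helper, and the list contains just the two paths named in the errors above, not a complete inventory of what the scripts need.

```python
from pathlib import Path
from typing import List


def missing_files(project_root: str, relative_paths: List[str]) -> List[str]:
    """Return the relative paths that do not exist as files under project_root."""
    root = Path(project_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Only the two files mentioned in the error messages above; the real
# scripts may need more (assumption: paths are relative to the project root).
REQUIRED = [
    "data/multiwoz2.0/vocab.freq.json",
    "data/camrest/CamRestOTGY.json",
]

for rel in missing_files(".", REQUIRED):
    print(f"missing: {rel}")
```

Running this from the project root before `infer.sh` lists every absent file in one pass.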

The pretrain code?

Hello, where is the pre-training code for GALAXY?

Evaluation on MultiWOZ

How did you obtain the 20.50 BLEU on MultiWOZ 2.0 reported in the paper?

Did you use the BLEUScorer class in galaxy/utils/eval.py?

Two questions about the evaluation

Hi,

Great thanks for providing this fantastic repo!

I have two questions about the evaluations:

1. How many random seeds did you use to get the main evaluation results on the MultiWOZ2.0 dataset, e.g., Table 3 in your AAAI paper? If more than one seed was used, what are the other seeds besides SEED=10 in GALAXY/scripts/multiwoz2.0/train.sh?
2. In GALAXY/scripts/multiwoz2.0/infer.sh there is the setting LOAD_MODEL_NAME=state_epoch_7. May I ask how you selected this checkpoint (epoch 7 of 60 training epochs)? Is there a way to select the best checkpoint automatically?

Looking forward to hearing from you!
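
One common way to automate checkpoint selection is to score every saved epoch on the development set and keep the best one. The sketch below assumes the widely used MultiWOZ combined score, 0.5 * (Inform + Success) + BLEU, as the criterion. Whether the authors chose epoch 7 this way is not stated here, and the metric values are made-up placeholders, not results from the paper or the repository.

```python
from typing import Dict


def combined_score(inform: float, success: float, bleu: float) -> float:
    """Commonly reported MultiWOZ combined score: 0.5 * (Inform + Success) + BLEU."""
    return 0.5 * (inform + success) + bleu


def select_best_epoch(dev_metrics: Dict[int, Dict[str, float]]) -> int:
    """Return the epoch whose development-set metrics give the highest combined score."""
    return max(dev_metrics, key=lambda epoch: combined_score(**dev_metrics[epoch]))


# Placeholder numbers for illustration only.
dev_metrics = {
    6: {"inform": 90.0, "success": 80.0, "bleu": 19.0},
    7: {"inform": 92.0, "success": 82.0, "bleu": 19.5},
    8: {"inform": 91.0, "success": 81.0, "bleu": 19.2},
}
best_epoch = select_best_epoch(dev_metrics)
```

Selecting on the development set rather than the test set keeps the reported test numbers free of selection bias.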

bug in the code

When we run pretrain_trainer.py with a limited number of batches (say 16), it does not stop after 16 batches but runs over all available batches (2372275/32 = 74133 batches), even though we load only 16 batches with the DataLoader object in data_loader.py. I could not find the cause. Please help resolve this issue.
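
Without diagnosing the repository's loader, one way to guarantee that a loop stops after a fixed number of batches is to cap the iterator explicitly with `itertools.islice`. The sketch below uses plain Python stand-ins: `batches` and `run_epoch` are hypothetical names, not functions from pretrain_trainer.py or data_loader.py.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(samples: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group samples into lists of batch_size (the last batch may be smaller)."""
    batch: List[int] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_epoch(batch_iter: Iterable[List[int]], max_batches: int) -> int:
    """Consume at most max_batches batches and return how many were processed."""
    steps = 0
    for _batch in islice(batch_iter, max_batches):
        steps += 1  # a real trainer would run the forward/backward pass here
    return steps


steps_run = run_epoch(batches(range(1000), batch_size=32), max_batches=16)
```

If the step count still exceeds the cap in a real run, the loop is probably iterating over a different object than the capped one, which is worth checking first.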

When will the Chinese dialogue pre-training model be released?

How to obtain delexicalized representations?

@HwwAncient Hello, thanks for your work!

I'm referring to the downloaded MultiWOZ data. In data_for_galaxy.json, the fields user_delex and resp hold delexicalized utterances. I have the following questions:

How are they generated?
How are they used in training and evaluation?
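
For readers unfamiliar with the term, delexicalization replaces concrete slot values with placeholder tokens. The sketch below shows the general idea with simple string matching. It is not the procedure used to build data_for_galaxy.json, and the placeholder names `[value_name]` and `[value_area]`, like the function `delexicalize` itself, are assumptions chosen for illustration.

```python
import re
from typing import Dict


def delexicalize(utterance: str, slot_values: Dict[str, str]) -> str:
    """Replace each known slot value in the utterance with its placeholder token."""
    text = utterance.lower()
    # Replace longer values first so a short value cannot split a longer one.
    for value, placeholder in sorted(slot_values.items(), key=lambda kv: -len(kv[0])):
        pattern = r"\b" + re.escape(value.lower()) + r"\b"
        text = re.sub(pattern, placeholder, text)
    return text


# Hypothetical slot values and placeholder names.
slot_values = {
    "Acorn Guest House": "[value_name]",
    "north": "[value_area]",
}
delex = delexicalize("Acorn Guest House is in the north .", slot_values)
```

At evaluation time, the placeholders in a generated response can be compared against the delexicalized reference, or filled back in from database results.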

Questions about the labeled `UniDA` with development/test sets

In Table 1 and Readme, I found that among the 975780 utterances, some datasets such as MultiWOZ and SimJoint also use development set and test set during pre-training.

But the paper further evaluates on MultiWOZ test set. Would it make the evaluations unfair as the model already uses partial labeled information of MultiWOZ?

Question about dynamic booking pointer during dialogue generation

I am interested in coding a little demo with the pretrained multiwoz model. However I am not able to figure out how to inject book info into db pointer dynamically. When should the model trigger to check if booking is possible or not? Let's say in a real world scenario if the predicted dialogue act is [offerbook] do we then check to see if booking is possible or not? It feels like we also need the user act here - something like [accept book] or [reject book] and only after then if predicted user act is [accept book] then system should check whether a booking is possible or not and add that result to db pointer.

Here we see that the ground-truth book pointer is used. What should the process and sequence of actions be to obtain the book pointer in a real-world scenario? I've tried keeping it as [book_nores], but this causes the dialogue to go into a loop asking whether the user would like to book or not. One solution is to change it to [book_success] once the predicted dialogue act contains [offerbook], but when the user then says "no i changed my mind", the system still outputs something like "booking was successful." because [book_success] was added to the db pointer and the user's preference had no effect.

I have a very dirty kaggle notebook, you may take a look at my attempt under Inference section.
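
The gating idea raised in this question (query the booking backend only after the system has offered a booking and the user has accepted) can be sketched as below. Everything here is hypothetical: `book_pointer`, the `user_accepts` signal (which would have to come from some user-act classifier), the `try_booking` callback, and the `[book_fail]` token are assumptions. Only `[book_nores]`, `[book_success]`, and `[offerbook]` are taken from the question itself.

```python
from typing import Callable, List

BOOK_NORES = "[book_nores]"      # no booking result yet
BOOK_SUCCESS = "[book_success]"  # booking attempted and succeeded
BOOK_FAIL = "[book_fail]"        # assumed token for a failed booking


def book_pointer(
    prev_system_acts: List[str],
    user_accepts: bool,
    try_booking: Callable[[], bool],
) -> str:
    """Decide the booking token for the current turn.

    The backend is queried only when the previous system turn offered a
    booking and the user accepted it; otherwise the pointer stays neutral.
    """
    if "[offerbook]" in prev_system_acts and user_accepts:
        return BOOK_SUCCESS if try_booking() else BOOK_FAIL
    return BOOK_NORES


# The user declined after an offer, so the backend is never called.
pointer = book_pointer(["[offerbook]"], user_accepts=False, try_booking=lambda: True)
```

Under this scheme a reply such as "no i changed my mind" leaves the pointer at `[book_nores]`, so the model is not told that a booking succeeded.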

MultiWOZ 2.2 implementation (data processing)

Hi,
Thank you for releasing the code.

I want to run GALAXY on MultiWOZ 2.2, but there's no code for generating data_for_galaxy.json and generating data_for_galaxy_encoded.data.json.
Could you release the code for creating these files for MultiWOZ 2.2?

Thank you.

Reproducing GALAXY

Thank you for releasing the code to the public!
I am trying to reproduce the pre-trained checkpoint you shared on GitHub, but for some reason I could not get the same checkpoint. So several questions came to mind:
Q1: Stopping criterion for choosing pre-training and fine-tuning checkpoints. It seems to me that the stopping criterion is not based on validation loss. What was the criterion for choosing the final epoch number? For example, you said epoch 14 for pre-training and epoch 7 for MultiWOZ2.0. I wonder how you arrived at these numbers.
Q2: The number of pre-training data. The UniDA dataset you shared on Github has 463,039, but this seems smaller than the sum of the training sets in eight datasets used for UniDA (according to the paper). Did you get the same checkpoint with the data you currently uploaded?
Q3: GPU machines used for pre-training. It would be great if you could share what GPU machines you used to pre-train the GALAXY checkpoint. I am guessing that might be one of the reasons why I do not get the same result. Thanks!

code and dataset

Roughly when will the data and code be open-sourced?