An unofficial PyTorch implementation of the paper 'A Document-grounded Matching Network for Response Selection in Retrieval-based Chatbots' [paper].
The code has been tested on the CMUDoG dataset with 1×V100 GPU; training took 2 hours.
- Python 3.7
- PyTorch 1.7.0
- fitlog
Model | CMUDoG R@1 | CMUDoG R@2 | CMUDoG R@5 | PersonaChat (Original Persona) R@1 | PersonaChat (Original Persona) R@2 | PersonaChat (Original Persona) R@5 | PersonaChat (Revised Persona) R@1 | PersonaChat (Revised Persona) R@2 | PersonaChat (Revised Persona) R@5
---|---|---|---|---|---|---|---|---|---
DGMN (Original) | 65.6 | 78.3 | 91.2 | 67.6 | 80.2 | 92.9 | 58.8 | 62.5 | 87.7
DGMN (Reproduced) | 71.6 | 83.4 | 94.7 | - | - | - | - | - | -
DGMN + 100d_w2v (Reproduced) | 72.5 | 85.9 | 97.1 | - | - | - | - | - | -
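
The R@k columns above are recall-at-k scores for response selection, shown as percentages. The following is a minimal sketch of how such a metric can be computed, assuming each context has exactly one correct response among its candidates and that a higher score means a better match. The function name `recall_at_k` and its arguments are hypothetical and are not taken from this repository's code.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    gold: Sequence[int],
    k: int,
) -> float:
    """Return the fraction of examples whose gold candidate is ranked in the top k.

    scores[i][j] is the model score of candidate j for example i.
    gold[i] is the index of the correct candidate for example i.
    Assumption: exactly one correct candidate per example.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    if not scores:
        raise ValueError("at least one example is required")

    hits = 0
    for cand_scores, gold_idx in zip(scores, gold):
        gold_score = cand_scores[gold_idx]
        # Zero-based rank: number of candidates scored strictly higher than
        # the gold one. Ties are resolved in favour of the gold candidate.
        rank = sum(1 for s in cand_scores if s > gold_score)
        if rank < k:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy example with three contexts and three candidates each.
    toy_scores = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5], [0.2, 0.3, 0.9]]
    toy_gold = [1, 2, 0]
    for k in (1, 2, 3):
        print(f"R@{k} = {100 * recall_at_k(toy_scores, toy_gold, k):.1f}")
```

The function returns a fraction in [0, 1]; multiply by 100 to compare against the table.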
You can download the datasets and the corresponding embedding tables used in the original paper from the following links.
- PERSONA-CHAT and its embedding and vocabulary files.
- CMU_DoG and its embedding and vocabulary files.
Unzip the datasets into the dataset folder and run the preprocessing code provided in JasonForjoy/FIRE. You will then obtain the preprocessed files in dataset/personachat_preprocessed and dataset/cmudog_preprocessed.
Alternatively, we have preprocessed the data in advance, so you can simply run this command to download the preprocessed data:
sh download_dataset.sh
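
Whichever route you take, it can help to confirm that the preprocessed folders are in place before training. The snippet below is a small sketch that only checks for the two directory names mentioned above; the helper `missing_dirs` is hypothetical and not part of this repository, and it does not verify the contents of the folders.

```python
from pathlib import Path
from typing import Iterable, List

# Directory names taken from the instructions above.
EXPECTED_DIRS = (
    "dataset/personachat_preprocessed",
    "dataset/cmudog_preprocessed",
)


def missing_dirs(root: str = ".", expected: Iterable[str] = EXPECTED_DIRS) -> List[str]:
    """Return the expected directories that do not exist under root."""
    base = Path(root)
    return [d for d in expected if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing preprocessed folders:", ", ".join(missing))
    else:
        print("All preprocessed folders found.")
```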
Currently, only CMUDoG has been tested. Here is an example of training DGMN on the CMUDoG dataset:
sh run_cmudog.sh
Thanks to Xueliang Zhao for providing the TensorFlow source code for reference; it helped a lot.
- TODO: Evaluate on the PersonaChat dataset.
- TODO: Check why the reproduced version outperforms the results reported in the original paper.