RC-Transformer

Improving Transformer with Sequential Context Representations for Abstractive Text Summarization

Recent dominant approaches for abstractive text summarization are mainly RNN-based encoder-decoder framework, these methods usually suffer from the poor semantic representations for long sequences. In this paper, we propose a new abstractive summarization model, called RC-Transformer (RCT). The model is not only capable of learning long-term dependencies, but also addresses the inherent shortcoming of Transformer on insensitivity to word order information.

We extend the Transformer with an additional RNN-based encoder to capture the sequential context representations. In order to extract salient information effectively, we further construct a convolution module to filter the sequential context with local importance. The experimental results on Gigaword and DUC-2004 datasets show that our proposed model achieves the state-of-the-art performance, even without introducing external information. In addition, our model also owns an advantage in speed over the RNN-based models.

Overview

Training

If you want to train your own model, please follow the following steps. Two GPUs with 12GB memory or more will be helpful.

Prepare a parallel linguistics corpus for abstractive text summarization (without tokenization), like Gigaword or CNN Daily/Mail.
Create a dataset folder. Set the prefix, vocab_size, emb_size in config/config.py.
Run python preprocess.py to generate sub-word vocabulary and word2vec embeddings. The input format should be "source"\t"summary". The *_bpe.vocab and word2vbvec.model will appear in the dataset/prefix folder.
Edit the config/config.py for training, Run python train_run.py --cuda --lexical --pretrain_emb. It will take at least 2 days to train a good model with batch_size 64.
We also implement a REINFORCE training, running python train_RL.py --cuda --lexical --pretrain_emb after step4 always get a better model.
*Copy mechanism is also introduced in our experiments. But we did not achieve the desired results. Maybe effective for the CNN Daily/Mail dataset

Testing

please set configuration in config.py
run

python eval.py --cuda --lexical --pretrain_emb

Results

	ROUGE-1	ROUGE-2	ROUGE-L
Gigaword	37.16	17.73	34.41
DUC2004	33.16	14.7	30.52

details see paper

Other

.
├── config             配置文件
│   ├── config.py      模型训练
│   └── eval_config.py 模型预测
├── data
│   ├── data_loader.py load训练数据
│   ├── eval_batcher.py load测试数据
│   ├── generate_topic.py 生成keyword
│   ├── __init__.py
│   └── utils.py   dataloader工具
├── eval.py            模型预测
├── finetune.py        用语言模型微调
├── length_eval.py     实验：不同输出文本长度的影响
├── model
│   ├── beam_search.py  beam search
│   ├── __init__.py
│   ├── lr_scheduler.py 学习率衰减
│   ├── model.py        模型
│   └── optims.py       optimizer优化
├── nn
│   ├── __init__.py
│   ├── modules
│   │   ├── attention.py      self-attention
│   │   ├── embedding.py         通用embedding
│   │   ├── __init__.py
│   │   ├── layers.py            encoder decoder layer
│   │   ├── position_wise.py     position_wise feed forward layer
│   └── └── transformer.py       label smoothing
├── preprocess.py                数据预处理，分词，训练embedding
├── README.md
├── requirements.txt
├── train_RL.py                强化训练
├── train_run.py               模型训练
├── train_copy.py              copy机制
└── visual_attention.py        attention可视化

colinsongf / rctransformer Goto Github PK

rctransformer's Introduction

RC-Transformer

Improving Transformer with Sequential Context Representations for Abstractive Text Summarization

Overview

Training

Testing

Results

Other

rctransformer's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent