chenrocks / fast_abs_rl

Code for the ACL 2018 paper "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting" by Chen and Bansal.

License: MIT License

Python 100.00%

abstractive-summarization deep-learning natural-language-processing pytorch reinforcement-learning

fast_abs_rl's People

stevenlol wakedupchan clairett zfang dodocho dongxf369 xuanhan863 kayamin qgzang jinyeong poka93 shaoyn0817 yorick76ee xingxinyu96 amoliu chiuyeelau niudong1001 cosecant-csc ruizhanggrammarly tangxiangru weili-nlp yaserkl mattzheng jfsantos shawnxiha quietwoods nn-tony panyang astorfi pked01 andrewhuang121 bung87 hanguo97 huaiwen dx2048 chenmoshushi shengleih caoxu915683474 zhangsh950618 mfx5218 lbda1 dfenglei ashishbaghudana codeaudit shubhampachori12110095 yucoian reloadbrain wurentidai ninikolov zhenyangiacas ifeynman shenjiawei19 pkulzb binwone richardsun-voyager chinmayapatnayak lhuang2019 jerrykuo7727 stephenlasky jpatrick9793 caozhen-alex qiujun1994 benhoff hezhihao10 tuyetkha amyzhangmin wjzhang392 hxw11 digifaire zhiyumeng howellyu susannawull kingqicai 0xdaksh agoloprem nuo97 vinace frankict bellamn charanrajt gm0616 zhoudan0215 tomgun132 jasonclei aishwarya-nr lzw-pku umangkeshri nicemartin legendtianjin nefujiangping munaachyuta aobo-y shaleenx fnan huangluyang001 shuandemorian sohamparikh zupiter jasonjimnz peter-xbs

fast_abs_rl's Issues

Is there a way to train on datasets other than CNN/Daily Mail?

The preprocessing assumes that the model will be run on CNN/Daily Mail. Can it be run on other datasets with minimal changes?

Specifically, is it possible to use pre-trained models on other datasets with minimal changes to the code, or would there have to be too many changes?

How can I use a pretrained model to output a summary for a single input article?

Training steps for Extractor and Abstractor and Full Model

Hi,
Thanks for the great work. Could you share the training steps that you used for Extractor/Abstractor ML training and for the joint model RL training? Basically, when did you stop training each of these models?

[Question] Applying the code to other datasets?

I would like to apply this great code to other datasets, but I ran into several obstacles:

The article is not sentence-separated
The summary is a single sentence, representing several sentences of the article

To overcome these obstacles, I tried the following approach:

Use a sentence tokenizer (SpaCy) to process the dataset before training.
As mentioned in #4, extract K sentences and concatenate them into a single input sentence for the abstractor.

But it's not a good approach.

First of all, the sentence tokenizer is not perfect, and the tokenized sentences can be messed up.

For datasets with a single-sentence summary, that sentence is usually highly abstractive, so creating the pseudo-labels is difficult. I believe that in this case ROUGE-L matching might not find the best corresponding sentence.
And if the pseudo-labels are wrong...

So my question is: do you have any leads or general ideas I could implement to improve the existing code for other datasets?
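The two preprocessing steps described in this issue (sentence splitting, then concatenating K extracted sentences into one abstractor input) can be sketched as follows. This is a minimal illustration, not code from fast_abs_rl: `split_sentences` and `flatten_extracted` are hypothetical helpers, the regex splitter is a naive stand-in for a proper tokenizer such as spaCy, and keeping the extracted sentences in document order is an assumption.

```python
import re

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
# Abbreviations such as "U.S." will be split wrongly; a real tokenizer
# (e.g. spaCy, as used in this issue) handles such cases better.
_SENT_END = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    """Split raw text into a list of non-empty sentence strings."""
    return [s.strip() for s in _SENT_END.split(text.strip()) if s.strip()]


def flatten_extracted(sentences, indices):
    """Join the extracted sentences, in document order, into one
    abstractor input for a dataset with single-sentence summaries."""
    return ' '.join(sentences[i] for i in sorted(indices))


doc = "The cat sat. It was tired! Then it slept? Yes."
sents = split_sentences(doc)
print(sents)  # ['The cat sat.', 'It was tired!', 'Then it slept?', 'Yes.']
print(flatten_extracted(sents, [2, 0]))  # The cat sat. Then it slept?
```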

Evaluation on dev set

Hi,
I noticed that in the abstractor's evaluation, you feed the ground-truth target tokens to make predictions (teacher forcing) rather than the argmax token predicted at the previous step. Even though no gradient is backpropagated on the dev set, shouldn't we act as if we don't really see its targets?
I observed that, because of this bias, the dev score is very close to the training score (even though the training and dev sets are not identical). This may also make the learning rate decay sooner and perhaps lead to a suboptimal model?

Thanks,
Hoa
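To make the concern above concrete, here is a toy sketch (not the repo's abstractor) contrasting teacher-forced evaluation, where every step sees the ground-truth previous token, with free-running greedy decoding, where every step sees the model's own previous prediction. The next-token table `NEXT` is invented data standing in for a trained decoder.

```python
# Toy next-token table standing in for a trained decoder (invented data).
NEXT = {'<s>': 'the', 'the': 'cat', 'cat': 'sat', 'sat': 'down',
        'dog': 'ran', 'ran': '</s>'}


def teacher_forced_accuracy(target):
    """Each step is conditioned on the ground-truth previous token."""
    inputs = ['<s>'] + target[:-1]
    preds = [NEXT.get(tok, '<unk>') for tok in inputs]
    return sum(p == t for p, t in zip(preds, target)) / len(target)


def greedy_accuracy(target):
    """Each step is conditioned on the model's own previous prediction."""
    prev, correct = '<s>', 0
    for gold in target:
        pred = NEXT.get(prev, '<unk>')
        correct += pred == gold
        prev = pred  # feed back the prediction, not the gold token
    return correct / len(target)


gold = ['the', 'dog', 'ran', '</s>']
print(teacher_forced_accuracy(gold), greedy_accuracy(gold))  # 0.75 0.25
```

With teacher forcing, one wrong prediction does not derail the later steps, which is why a teacher-forced dev score can sit much closer to the training score than a free-running one.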

Sentence Split

Hi, I want to know how you divide a document into sentences. In your code, it looks like each word is treated as a sentence. Is that right?

Is this a bug in model/rl.py?

Hi, I think there is a bug in the forward function of class PtrScorer in model/rl.py:
the LSTM states (h, c) are never updated in the for loop.

TypeError: scatter_add() missing 1 required positional arguments: "src"

Hi Chen,

I have tried to follow your instructions exactly, but the following happens when I run the pretrained model:

loading checkpoint ckpt-0-0...
loading checkpoint ckpt-42.508549-0...
/home/arnav-gulati/.local/lib/python3.6/site-packages/torch/nn/functional.py:1374: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
  File "decode_full_model.py", line 169, in <module>
    args.max_dec_word, args.cuda)
  File "decode_full_model.py", line 87, in decode
    all_beams = abstractor(ext_arts, beam_size, diverse)
  File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/decoding.py", line 116, in __call__
    all_beams = self._net.batched_beamsearch(*dec_args)
  File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/model/copy_summ.py", line 125, in batched_beamsearch
    token, states, attention, beam_size)
  File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/model/copy_summ.py", line 245, in topk_step
    source=score.contiguous().view(beam*batch, -1) * copy_prob
TypeError: scatter_add() missing 1 required positional arguments: "src"

The command I used was:

python3 decode_full_model.py --path=/home/arnav-gulati/Documents/other_models/fast_abs_rl-master --model_dir=/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/pretrained/acl --beam=5 --test

and I followed your instructions on exporting the data path with:
export DATA=/home/arnav-gulati/Documents/cnn-dailymail-master/finished_files/

I am not sure what causes the following line:
TypeError: scatter_add() missing 1 required positional arguments: "src"

Do you have any advice on what I should do? I believe I have installed all the dependencies, but just in case you want to see them, here they are:

boto 2.49.0, boto3 1.9.171, botocore 1.12.171, bz2file 0.98, certifi 2019.6.16, chardet 3.0.4, cytoolz 0.9.0.1, docutils 0.14, futures 3.2.0, gensim 3.7.3, idna 2.8, jmespath 0.9.4, numpy 1.16.4, pip 19.1.1, protobuf 3.8.0, pyrouge 0.1.3, python-dateutil 2.8.0, requests 2.22.0, s3transfer 0.2.1, scipy 1.2.2, setuptools 41.0.1, six 1.12.0, smart-open 1.8.4, tensorboardX 1.7, toolz 0.9.0, torch 0.4.0, urllib3 1.25.3, wheel 0.33.4

Any help would be appreciated!

I get an error when I change the number of LSTM layers

When I change the number of LSTM layers and run train_full_rl.py, I get the following error:
Start training
Traceback (most recent call last):
  File "train_full_rl.py", line 231, in <module>
    train(args)
  File "train_full_rl.py", line 186, in train
    trainer.train()
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 193, in train_step
    self._stop_reward_fn, self._stop_coeff
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 60, in a2c_train_step
    (inds, ms), bs = agent(raw_arts)
  File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 221, in forward
    outputs = self._ext(enc_art)
  File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 130, in forward
    self._hop_v, self.hop_wq)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 74, in attention
    PtrExtractorRL.attention_score(attention, query, v, w), dim=-1)
  File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 66, in attention_score
    sum = attention + torch.mm(query, w)
RuntimeError: The size of tensor a (13) must match the size of tensor b (3) at non-singleton dimension 0

RuntimeError: sizes must be non-negative

Hey,
Thank you very much for this great work.
I am having a problem training the extractor model on my own data.
I am using PyTorch 0.4.0 with CUDA 10.0 on Ubuntu 18.04.
I hope you can help me with this issue.
Thank you very much in advance.

This is the error traceback:

Traceback (most recent call last):
  File "/home/me/fast_abs_rl-master/model/extract.py", line 33, in forward
    for conv in self._convs], dim=1)
  File "/home/me/fast_abs_rl-master/model/extract.py", line 33, in <listcomp>
    for conv in self._convs], dim=1)
  File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: sizes must be non-negative

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_extractor_ml.py", line 237, in <module>
    main(args)
  File "train_extractor_ml.py", line 179, in main
    trainer.train()
  File "/home/me/fast_abs_rl-master/training.py", line 212, in train
    log_dict = self._pipeline.train_step()
  File "/home/me/fast_abs_rl-master/training.py", line 97, in train_step
    net_out = self._net(*fw_args)
  File "/home/me/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/fast_abs_rl-master/model/extract.py", line 287, in forward
    enc_out = self._encode(article_sents, sent_nums)
  File "/home/me/fast_abs_rl-master/model/extract.py", line 309, in _encode
    for art_sent in article_sents]
  File "/home/me/fast_abs_rl-master/model/extract.py", line 309, in <listcomp>
    for art_sent in article_sents]
  File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/fast_abs_rl-master/model/extract.py", line 40, in forward
    for conv in self._convs])
  File "/home/me/fast_abs_rl-master/model/extract.py", line 40, in <listcomp>
    for conv in self._convs])
  File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: sizes must be non-negative
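One likely cause of `sizes must be non-negative` is a sentence (or a padded batch of sentences) shorter than the widest convolution window in the sentence encoder. The sketch below uses the standard 1-D convolution output-length formula to show this, and pads short sentences up to a minimum length. The kernel widths 3, 4 and 5 are an assumption to check against model/extract.py, and `pad_to_min_len` and `pad_id=0` are hypothetical.

```python
# Assumption: the sentence CNN uses windows of width 3, 4 and 5.
KERNEL_SIZES = (3, 4, 5)


def conv1d_out_len(seq_len, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula as nn.Conv1d)."""
    return (seq_len + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def pad_to_min_len(token_ids, pad_id=0, min_len=max(KERNEL_SIZES)):
    """Right-pad a sentence so every convolution window fits at least once."""
    return token_ids + [pad_id] * max(0, min_len - len(token_ids))


short_sent = [17, 42]  # a two-token sentence
# Zero or negative lengths are consistent with the error above.
print([conv1d_out_len(len(short_sent), k) for k in KERNEL_SIZES])  # [0, -1, -2]
padded = pad_to_min_len(short_sent)
print(padded, [conv1d_out_len(len(padded), k) for k in KERNEL_SIZES])
# [17, 42, 0, 0, 0] [3, 2, 1]
```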

A question about the RL training function

for action, p, r, b in zip(indices, probs, reward, baseline):
    advantage = r - b
    avg_advantage += advantage
    losses.append(-p.log_prob(action) * (advantage/len(indices)))  # divide by T*B

I have a question about this piece of code.
If I understand correctly, the variable b here is a tensor with gradients enabled, so optimizing the tensors in losses will both maximize the reward by changing the policy weights and minimize the advantage by maximizing the baseline. I don't understand why the baseline is optimized here, because as far as I know, the baseline should only be optimized when training the critic.
In fact, I used this training function for a different summarization task, and I found that avg_advantage keeps dropping.
Thank you very much.
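For reference, a common advantage actor-critic setup computes discounted returns and subtracts the critic's baseline while treating that baseline as a constant in the policy loss; the critic is fitted only by its own regression loss (a `F.mse_loss(baseline, reward)` call appears in a traceback further down this page). The pure-Python sketch below shows that data flow with plain floats. The function names are hypothetical, and `gamma=0.95` is taken from the `train_params` printed in another issue on this page. In PyTorch, treating the baseline as a constant corresponds to using `b.detach()` when forming the advantage; whether the repo intends the baseline gradient described in the question is for the authors to confirm.

```python
def discounted_returns(rewards, gamma=0.95):
    """R_t = r_t + gamma * R_{t+1}, computed backwards over one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def advantages(returns, baseline):
    # The baseline enters as plain numbers, i.e. as a constant: nothing in
    # the policy loss can push the critic's predictions up or down.
    return [r - b for r, b in zip(returns, baseline)]


rets = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
print(rets)                                  # [0.25, 0.5, 1.0]
print(advantages(rets, [0.25, 0.25, 0.5]))   # [0.0, 0.25, 0.5]
```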

net is not defined when CUDA is not used

Line 156 of train_abstractor.py:
if args.cuda:
    net = net.cuda()
pipeline = BasicPipeline(meta['net'], net,
                         train_batcher, val_batcher, args.batch, val_fn,
                         criterion, optimizer, grad_fn)

Since CUDA is not available, net is not defined, so the train_step function of BasicPipeline in training.py produces a segmentation fault.

Russian data

Hi,

Using your repo, I want to train my own model to summarize the articles in my data into titles.
Do you think the results would be similar if I did this with your repo?

Also, how should I prepare my data?

When my dataset has single-sentence abstracts

Dear Chen,
I trained this model on another dataset with single-sentence abstracts, but during implementation I found that the abstractor learns a mapping from a single sentence to a single abstract sentence.
If I select K sentences by ROUGE in the extractor stage and then flatten these sentences to train the abstractor and the full RL model, do you think this is feasible?
Does fast_abs_rl apply to single-sentence abstracts? I would be very grateful for your reply and advice.

Why need to remove the last id of target?

fast_abs_rl/data/batcher.py

Line 170 in aebf539

remove_last = lambda tgt: tgt[:-1]

Hi, I found a line in batcher.py that removes the last id of the target, but I am puzzled about why this is done. Could you kindly explain it?
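One common reason for dropping the last id is the teacher-forcing shift used to train sequence decoders: the decoder input omits the final token and the labels omit the first, so position t is trained to predict token t + 1. The sketch below assumes targets are wrapped as `[START, ..., END]`; `make_decoder_io` and the token ids are hypothetical, and whether this is exactly what batcher.py line 170 does should be checked against the surrounding code.

```python
START, END = 1, 2  # hypothetical special-token ids


def make_decoder_io(target_ids):
    """Build (decoder input, training labels) for teacher forcing.

    target_ids is assumed to look like [START, w1, ..., wn, END].
    """
    dec_in = target_ids[:-1]  # drop END: nothing is predicted after it
    labels = target_ids[1:]   # drop START: it is never a prediction target
    return dec_in, labels


dec_in, labels = make_decoder_io([START, 10, 11, 12, END])
print(dec_in)  # [1, 10, 11, 12]
print(labels)  # [10, 11, 12, 2]
```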

About the paper

Hi Chen,
I read your paper, which is very interesting. I have several questions; could you please help me figure them out?

The extractor extracts salient sentences (d_1, d_2, ..., d_n) from the original document. For each salient sentence, the abstractor function g generates a summary sentence, and all the summary sentences are concatenated to form the entire summary.
Is my understanding correct?
In the paragraph just above Section 2.1, I wonder whether there is a typo:

S_i should be the set of summary sentences in y_i, while D_i is the set of document sentences in x_i.

Thanks

invalid gradient at index 0 - expected shape [] but got [1]

Hi,

I am new to PyTorch and was trying to use your model to summarize some text. I trained on news data, and then again after adding only a single new file; in both cases, train_full_rl fails in rl.py with the error below. I tried switching to a GPU and still get the same problem. Have you seen this error before? On the GPU it reported that the input and target sizes do not match, which is why I am printing the sizes (torch.Size([700])):

{'net_args': {'extractor': {'net_args': {'conv_hidden': 100, 'emb_dim': 128, 'bidirectional': True, 'vocab_size': 30004, 'lstm_layer': 1, 'lstm_hidden': 256}, 'traing_params': {'batch_size': 32, 'lr_decay': 0.5, 'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2}, 'net': 'ml_rnn_extractor'}, 'abstractor': {'net_args': {'emb_dim': 128, 'bidirectional': True, 'vocab_size': 30004, 'n_hidden': 256, 'n_layer': 1}, 'traing_params': {'batch_size': 32, 'lr_decay': 0.5, 'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0}, 'net': 'base_abstractor'}}, 'train_params': {'gamma': 0.95, 'lr_decay': 0.5, 'stop_reward': 'rouge-1', 'clip_grad_norm': 2, 'batch_size': 32, 'reward': 'rouge-l', 'optimizer': ('adam', {'lr': 0.0001}), 'stop_coeff': 1.0}, 'net': 'rnn-ext_abs_rl'}
Start training
torch.Size([700])
torch.Size([700])
Traceback (most recent call last):
  File "Geneva_ABS.py", line 266, in <module>
    main(args)
  File "Geneva_ABS.py", line 224, in main
    train_full_rl().train(path=output_path+'/model/', abs_dir=output_path+'/abstractor/', ext_dir = output_path+'/extractor/')
  File "/home/train_full_rl.py", line 195, in train
    trainer.train()
  File "/home/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/home/rl.py", line 177, in train_step
    self._stop_reward_fn, self._stop_coeff
  File "/home/rl.py", line 108, in a2c_train_step
    [torch.ones(1).to(critic_loss.device)]*(1+len(losses))
  File "/home/venv/lib/python3.5/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: invalid gradient at index 0 - expected shape [] but got [1]

where are [path/to/abstractor/model] and [path/to/extractor/model]?

Hi,
I have reached the step of training the abstractor and extractor with ML objectives. I want to know where the abstractor and extractor model paths are, so that I can run the next step.

Using a different word embedding model

Hey Chen,

Recently, a very interesting model called BERT (https://github.com/google-research/bert) came out and has been shown to achieve state-of-the-art results on several NLP tasks. Do you think your model's results could be improved by using its embeddings instead of word2vec?

I think it is an interesting idea, but I suspect it would require many changes to the current architecture of your model.

Kind regards,

Nick

Attempt to reproduce the results of rnn-ext + RL in the paper

Hi, I tried to reproduce the results of rnn-ext + RL (without abstractor).
First, I pretrain an extractor using train_extractor_ml.py, constructing the proxy training labels with the ROUGE-L F1 score. Then I run train_full_rl.py without specifying the abstractor path. After training, the results are significantly lower than the values reported in your paper. Is it incorrect to use the ROUGE-L F1 score to construct the proxy labels for rnn-ext, or did you use other reward functions for it? Thank you so much for your help!

Possible bug in compute_rouge_n & compute_rouge_l & compute_rouge_l_summ

fast_abs_rl/metric.py

Lines 33 to 38 in 9e6c45d

if mode == 'p':
    score = precision
if mode == 'r':
    score = recall
else:
    score = f_score

I assume this is what was intended:
if mode == 'p':
    score = precision
elif mode == 'r':
    score = recall
else:
    score = f_score
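A self-contained ROUGE-N sketch using the `if`/`elif` chain from the suggested fix is below. It is a re-implementation for illustration, not the repo's metric.py, and the `rouge_n` signature is an assumption. With two independent `if` statements, `mode='p'` first sets `score = precision` and then falls into the `else` of the second `if`, which overwrites it with the F-score.

```python
from collections import Counter


def _ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(output, reference, n=1, mode='f'):
    """ROUGE-N precision ('p'), recall ('r') or F1 (any other mode)."""
    out_grams, ref_grams = _ngrams(output, n), _ngrams(reference, n)
    overlap = sum((out_grams & ref_grams).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(out_grams.values())
    recall = overlap / sum(ref_grams.values())
    f_score = 2 * precision * recall / (precision + recall)
    if mode == 'p':
        score = precision
    elif mode == 'r':  # a plain `if` here would let 'p' reach the `else`
        score = recall
    else:
        score = f_score
    return score


out, ref = ['the', 'cat', 'sat'], ['the', 'cat', 'sat', 'down']
print(rouge_n(out, ref, mode='p'), rouge_n(out, ref, mode='r'))  # 1.0 0.75
```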

How can I restore my model?

Hello, dear Chen! If my training is interrupted, can I restore my model from the previous checkpoint?
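Resuming generally means loading the saved weights (and the optimizer state, if the checkpoint stores it) before continuing training. The first step, finding the newest checkpoint, can be sketched in plain Python. This assumes checkpoint files are named `ckpt-<score>-<step>`, a pattern inferred from the `loading checkpoint ckpt-1.482692-6000...` lines on this page rather than from the repo's code; `latest_checkpoint` is a hypothetical helper.

```python
import os
import re

# Assumed naming scheme "ckpt-<score>-<step>", e.g. "ckpt-1.482692-6000".
_CKPT = re.compile(r'^ckpt-([0-9.]+)-([0-9]+)$')


def latest_checkpoint(ckpt_dir):
    """Return the path of the checkpoint with the highest step, or None."""
    found = []
    for name in os.listdir(ckpt_dir):
        match = _CKPT.match(name)
        if match:
            found.append((int(match.group(2)), name))
    if not found:
        return None
    _, name = max(found)
    return os.path.join(ckpt_dir, name)
```

The returned path could then be passed to `torch.load` to restore the saved state before restarting the training loop.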

Is a GPU required?

model/util.py has the following two lines that require a GPU:
LineNum 59: order = torch.LongTensor(order).to(sequence_emb.get_device())
LineNum 73: order = torch.LongTensor(order).to(lstm_states[0].get_device())

However, my server has no GPU. When I train my own model using
python train_abstractor.py --no-cuda --path=mypath/abstractor --w2v=mypath/word2vec.128d.226k.bin
I get the error:
fast_abs_rl/model/util.py", line 73, in reorder_lstm_states
order = torch.LongTensor(order).to(lstm_states[0].get_device())
RuntimeError: _th_get_device is not implemented for type torch.FloatTensor

I just checked the PyTorch docs. They say that only CUDA tensors support get_device():
"For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown." Here that error is "RuntimeError: get_device is not implemented for type torch.FloatTensor".

Does the code require a GPU? Thank you.

How to get the extractive summary

Hello, I can train the model, but I only get the extractor's labels. I can't find how to get the first-stage extractor summary. Could you tell me how to do this?

`bidirectional` for StackedLSTMCells and a typo(?)

I noticed two unusual places in the code that puzzled me.

(1) In class StackedLSTMCells in rnn.py, line 102:
@property
def bidirectional(self):
    return self._cells[0].bidirectional  # ==> LSTMCell has no bidirectional?

I suppose self._cells stores nn.LSTMCell instances, which have no bidirectional attribute? Do I understand this correctly?

(2) In class _CopyLinear in copy_summ.py:

Inside the __init__ function, I noticed the line
self.regiser_module(None, '_b'). Did you mean register_module?
If I set bias=False, I get an error either way. Could you comment on this?

Thank you!

How to utilize the results from the extractor?

Thanks for the great work!
I'm wondering whether we can use the extractor's results on their own.

Inference

Hi, is there an easy way to run inference on texts? Say I have an article and want to see the model's summary, without any evaluation.
Thanks for your work!

Dropout Layer in CNN

Hello,

In the CNN model here, the dropout layer seems to be placed before the convolutional layer, as opposed to after the max-pooling layer as in Kim (2014):

fast_abs_rl/model/extract.py

Line 28 in b1b66c1

conv_in = F.dropout(emb_input.transpose(1, 2),

Is this intentional or a bug? If it's intentional, what is the rationale behind it?

Thanks,

Felix

Why pretrain word2vec embeddings only on the training set?

fast_abs_rl/train_word2vec.py

Line 24 in aebf539

self._path = join(DATA_DIR, 'train')

Excuse me, I found that you pretrained the word2vec embeddings only on the training set, not including the validation set. Is there a deeper reason for this? I am confused about it. Thanks!

[Question] Why the recall of ROUGE-L for pseudo-labels?

I read your paper again and couldn't find why you use the recall of the ROUGE-L score to make the pseudo-labels. Specifically:

Why ROUGE-L? (and not ROUGE-1, for example)

Why recall (and not precision)?
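For context, the labeling scheme discussed here can be sketched as greedy matching: for each summary sentence, pick the unused article sentence with the highest ROUGE-L recall against it. This is an illustrative re-implementation under that reading of the paper, not the repo's labeling script, and all names are hypothetical. One property of recall in this role is that it measures how much of the summary sentence an article sentence covers, without penalizing extra material in the article sentence.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l_recall(candidate, reference):
    return lcs_len(candidate, reference) / len(reference) if reference else 0.0


def greedy_extraction_labels(art_sents, abs_sents):
    """For each summary sentence, pick the not-yet-used article sentence
    with the highest ROUGE-L recall against it (first index wins ties)."""
    available = list(range(len(art_sents)))
    labels = []
    for abst in abs_sents:
        if not available:
            break
        best = max(available, key=lambda i: rouge_l_recall(art_sents[i], abst))
        labels.append(best)
        available.remove(best)
    return labels


art = [['the', 'cat', 'sat'], ['a', 'dog', 'barked', 'loudly'],
       ['the', 'dog', 'barked']]
abst = [['dog', 'barked'], ['cat', 'sat']]
print(greedy_extraction_labels(art, abst))  # [1, 0]
```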

RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'

Traceback (most recent call last):
  File "train_extractor_ml.py", line 237, in <module>
    main(args)
  File "train_extractor_ml.py", line 179, in main
    trainer.train()
  File "/home/eddiewng/LCSTS/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/home/eddiewng/LCSTS/fast_abs_rl/training.py", line 96, in train_step
    net_out = self._net(*fw_args)
  File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 276, in forward
    enc_out = self._encode(article_sents, sent_nums)
  File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 296, in _encode
    for art_sent in article_sents]
  File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 296, in <listcomp>
    for art_sent in article_sents]
  File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 31, in forward
    for conv in self._convs], dim=1)
  File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 31, in <listcomp>
    for conv in self._convs], dim=1)
  File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 3 (while checking arguments for cudnn_convolution)

I am trying to apply the model to a Chinese dataset. I succeeded in training the abstractive model, but when I try to train the extractive model, it throws this error. I have no idea what's going on. Does anyone have any idea?

Error: AttributeError: 'float' object has no attribute 'item'.

Hey,
Thank you very much for this great work.
I am getting this error when I train my abstractor model:
AttributeError: 'float' object has no attribute 'item'.
Any ideas why?
Thanks in advance.

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Ubuntu 16.04
PyTorch 0.4.0
Python 3.6.5
CUDA 9.0
##########
What should I do?

RuntimeError: cuda runtime error (2) : out of memory

(Training the full RL model)
I am trying to run python train_full_rl.py --path=[path/to/save/model] --abs_dir=[path/to/abstractor/model] --ext_dir=[path/to/extractor/model]
My GPU is a Tesla K40m with 12 GB of memory (NVIDIA-SMI 387.26, Driver Version: 387.26).
Could you state in the README how much GPU memory training requires?
Thank you!
I got the following error:

loading checkpoint ckpt-1.482692-6000...
loading checkpoint ckpt-3.515528-3000...
loading checkpoint ckpt-1.482692-6000...
start training with the following hyper-parameters:
{'net': 'rnn-ext_abs_rl', 'net_args': {'abstractor': {'net': 'base_abstractor', 'net_args': {'vocab_size': 30004, 'emb_dim': 128, 'n_hidden': 256, 'bidirectional': True, 'n_layer': 1}, 'traing_params': {'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5}}, 'extractor': {'net': 'ml_rnn_extractor', 'net_args': {'vocab_size': 30004, 'emb_dim': 128, 'conv_hidden': 100, 'lstm_hidden': 256, 'lstm_layer': 1, 'bidirectional': True}, 'traing_params': {'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5}}}, 'train_params': {'optimizer': ('adam', {'lr': 0.0001}), 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5, 'gamma': 0.95, 'reward': 'rouge-l', 'stop_coeff': 1.0, 'stop_reward': 'rouge-1'}}
Start training
WARNING: Exploding Gradients 1829948160.00
WARNING: Exploding Gradients 110035.23
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_full_rl.py", line 228, in <module>
    train(args)
  File "train_full_rl.py", line 182, in train
    trainer.train()
  File "/home/project/fast_abs_rl/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/home/project/fast_abs_rl/fast_abs_rl/rl.py", line 173, in train_step
    self._stop_reward_fn, self._stop_coeff
  File "/home/project/fast_abs_rl/fast_abs_rl/rl.py", line 64, in a2c_train_step
    summaries = abstractor(ext_sents)
  File "/home/project/fast_abs_rl/fast_abs_rl/decoding.py", line 94, in __call__
    decs, attns = self._net.batch_decode(*dec_args)
  File "/home/project/fast_abs_rl/fast_abs_rl/model/copy_summ.py", line 63, in batch_decode
    attention, init_dec_states = self.encode(article, art_lens)
  File "/home/project/fast_abs_rl/fast_abs_rl/model/summ.py", line 81, in encode
    init_enc_states, self.embedding
  File "/home/project/fast_abs_rl/fast_abs_rl/model/rnn.py", line 41, in lstm_encoder
    lstm_out = reorder_sequence(lstm_out, reorder_ind, lstm.batch_first)
  File "/home/project/fast_abs_rl/fast_abs_rl/model/util.py", line 58, in reorder_sequence
    sorted = sequence_emb.index_select(index=order, dim=batch_dim)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

Error when training extractor

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 190, in get_context
    ctx = _concrete_contexts[method]
KeyError: 'forkserver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 237, in <module>
    main(args)
  File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 131, in main
    args.cuda, args.debug)
  File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 71, in build_batchers
    single_run=False, fork=not debug)
  File "D:\lab\2018match\fast_abs_rl-master\fast_abs_rl-master\data\batcher.py", line 216, in __init__
    ctx = mp.get_context('forkserver')
  File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 238, in get_context
    return super().get_context(method)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 192, in get_context
    raise ValueError('cannot find context for %r' % method)
ValueError: cannot find context for 'forkserver'
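The traceback shows batcher.py requesting the `'forkserver'` start method, which Python's `multiprocessing` does not provide on Windows (only `'spawn'` is available there). A minimal fallback can be sketched with only the standard `multiprocessing.get_all_start_methods()` and `multiprocessing.get_context()` calls; `get_mp_context` is a hypothetical helper, and whether the batcher behaves correctly under `'spawn'` has not been verified.

```python
import multiprocessing as mp


def get_mp_context(preferred='forkserver'):
    """Return a multiprocessing context, falling back to the platform's
    default start method when the preferred one is unavailable."""
    methods = mp.get_all_start_methods()  # the default method is listed first
    method = preferred if preferred in methods else methods[0]
    return mp.get_context(method)
```

In principle, `ctx = get_mp_context()` could stand in for the hard-coded `mp.get_context('forkserver')` call shown in the traceback.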

AttributeError: 'float' object has no attribute 'item'

Hi,
When I run train_abstractor.py, I get the following error:

Traceback (most recent call last):
  File "train_abstractor.py", line 220, in <module>
    main(args)
  File "train_abstractor.py", line 166, in main
    trainer.train()
  File "/data/comp/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/data/comp/fast_abs_rl/training.py", line 107, in train_step
    log_dict.update(self._grad_fn())
  File "/data/comp/fast_abs_rl/training.py", line 20, in f
    grad_norm = grad_norm.item()
AttributeError: 'float' object has no attribute 'item'

How to solve this problem? Thanks!
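The type of the gradient-norm value evidently differs across PyTorch versions: training.py calls `.item()` on it, but here it arrived as a plain Python float. Assuming `grad_norm` comes from `torch.nn.utils.clip_grad_norm_`, a version-tolerant conversion could look like this sketch (`to_float` is a hypothetical helper):

```python
def to_float(value):
    """Convert a gradient norm to a Python float.

    Tensors (and NumPy scalars) expose .item(); a plain float does not,
    so fall back to float() in that case.
    """
    return value.item() if hasattr(value, 'item') else float(value)
```

Replacing `grad_norm = grad_norm.item()` with `grad_norm = to_float(grad_norm)` would then work for either return type.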

pin_memory when training

Hey Chen,

I have a question about the DataLoader in the files where the extractor, abstractor and RL are trained, more specifically, about this line of code:

val_loader = DataLoader(
    MatchDataset('val'), batch_size=BUCKET_SIZE,
    shuffle=False, num_workers=4 if cuda and not debug else 0,
    collate_fn=coll_fn
)

I was wondering why pin_memory is not set to True. I have read that this can improve performance when training on NVIDIA GPUs. Also, would you advise setting num_workers higher when using a high-end GPU (Tesla V100 16GB) and an 8-core CPU?

Thanks!

Nick

Bug when limiting the number of sentences in the summary

When I limit the number of summary sentences to 1, for example, the forward function of PtrExtractorRL in /model/rl.py is called.
The loop for _ in range(n_step): runs once (n_step=1), and the output is returned.
Yet in decode_full_model.py, the last element of the extracted output is removed as EOE:
ext = extractor(raw_art_sents)[:-1] # exclude EOE.

So in this case the only sentence being extracted is discarded as EOE.

One solution I can think of is to append an EOE in the forward function of PtrExtractorRL. Specifically, insert the following lines just before the return call:

if out.item() != score.size()[1]:
    append(torch.tensor([[score.size()[1]]], device=out.get_device()))
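The proposed fix amounts to removing the EOE index only when it is actually present. A plain-Python sketch of that check, applied to turning extracted indices into an extractive summary, is below. It assumes, as the proposal above does, that EOE is encoded as the index equal to the number of article sentences; `strip_eoe` and `extractive_summary` are hypothetical names.

```python
def strip_eoe(indices, n_sents):
    """Drop the trailing end-of-extraction (EOE) index only if present.

    A plain [:-1] would instead discard a real sentence whenever
    extraction stops at the step limit without emitting EOE.
    """
    if indices and indices[-1] == n_sents:
        return indices[:-1]
    return indices


def extractive_summary(sentences, indices):
    """Map extracted indices to sentences, in extraction order."""
    return [sentences[i] for i in strip_eoe(indices, len(sentences))]


sents = ['A.', 'B.', 'C.']
print(extractive_summary(sents, [2, 3]))  # ['C.']  (EOE removed)
print(extractive_summary(sents, [2]))     # ['C.']  (nothing to remove)
```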

[Clarification] Why EOE is trained only at RL time ?

I understand that the EOE token allows the agent to stop during RL training.

But why not also train the extractor (alone) to extract the EOE token?

I'm trying to add another token to your architecture, and I followed your design by adding it only at RL time.

However, the results are not satisfying. I'm wondering whether I should also train the extractor with this additional token.

What do you think ?

Possible bug in beam search

L127-L128 have no effect:

fast_abs_rl/decoding.py

Lines 126 to 130 in b1b66c1

if i == UNK:
    art_sent[max(range(len(art_sent)),
                 key=lambda j: attn[j].item())]
else:
    seq.append(id2word[i])

I assume this is what was intended: seq.append(art_sent[max(range(len(art_sent)), key=lambda j: attn[j].item())]).
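Written out as a standalone function, the intended UNK replacement looks like the sketch below. It uses plain lists of floats instead of tensors (hence no `.item()`), and `UNK = 1`, `ids_to_words` and the argument layout are assumptions for illustration.

```python
UNK = 1  # hypothetical id of the unknown-word token


def ids_to_words(ids, id2word, art_sent, attns):
    """Map decoded ids to words, replacing each UNK with the source word
    that received the most attention at that decoding step.

    attns[t][j] is the attention weight on art_sent[j] at step t.
    """
    seq = []
    for i, attn in zip(ids, attns):
        if i == UNK:
            best = max(range(len(art_sent)), key=lambda j: attn[j])
            seq.append(art_sent[best])  # the append the original lines omit
        else:
            seq.append(id2word[i])
    return seq


print(ids_to_words([5, UNK], {5: 'hello'}, ['bob', 'smith'],
                   [[0.5, 0.5], [0.1, 0.9]]))  # ['hello', 'smith']
```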

attention for LSTMPointerNet - self._attn_wm is not used

Hi,

Thanks for sharing the excellent work and high-quality code! I notice in the attention mechanism for LSTMPointerNet, you didn't use self._attn_wm in attention_score function. I guess the result is already good without it. Did you decide to drop self._attn_wm because it does not help much? Thanks for the comments.

Error when running train_full_rl on my own dataset

WARNING: Exploding Gradients 8048.54
WARNING: Exploding Gradients 219.81
WARNING: Exploding Gradients 360.95
WARNING: Exploding Gradients 233.41
Traceback (most recent call last):
  File "train_full_rl.py", line 228, in <module>
    train(args)
  File "train_full_rl.py", line 182, in train
    trainer.train()
  File "/data/xiaqiang/nmt/fast_abs_rl/training.py", line 211, in train
    log_dict = self._pipeline.train_step()
  File "/data/xiaqiang/nmt/fast_abs_rl/rl.py", line 174, in train_step
    self._stop_reward_fn, self._stop_coeff
  File "/data/xiaqiang/nmt/fast_abs_rl/rl.py", line 101, in a2c_train_step
    critic_loss = F.mse_loss(baseline, reward)
  File "/data/xiaqiang/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1716, in mse_loss
    return _pointwise_loss(lambda a, b: (a - b) ** 2, torch._C._nn.mse_loss, input, target, reduction)
  File "/data/xiaqiang/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1674, in _pointwise_loss
    return lambd_optimized(input, target, reduction)
RuntimeError: input and target shapes do not match: input [204], target [200] at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/generic/MSECriterion.cu:12

Running the training on a Spark cluster

How can I use this code to train my model and run inference in a distributed environment, namely a Spark cluster with multiple CPU nodes?

[Question] Why not ff-ext with RL ?

From the results of your paper:

For both the abstractive and the extractive approaches, it seems that:

ff-ext > rnn-ext

Why not compare ff-ext + RL and ff-ext + RL + rerank with the other approaches as well?

About ROUGE for evaluations.

Hi,
I followed the instructions to install and set up pyrouge and ROUGE-1.5.5.
When I evaluate the full model with the "eval_full_model.py" script, the --meteor flag works well, but the --rouge flag outputs errors like:

Cannot open exception db file for reading: /home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/data/WordNet-2.0.exc.db
Traceback (most recent call last):
  File "eval_full_model.py", line 53, in <module>
    main(args)
  File "eval_full_model.py", line 26, in main
    output = eval_rouge(dec_pattern, dec_dir, ref_pattern, ref_dir)
  File "/home/zqj/Workspace/fast_abs_rl/evaluate.py", line 40, in eval_rouge
    output = sp.check_output(cmd.split(' '), universal_newlines=True)
  File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl', '-e', '/home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/data', '-c', '95', '-r', '1000', '-n', '2', '-m', '-a', '/tmp/tmpf5fnmfg3/settings.xml']' returned non-zero exit status 255

My OS is: Ubuntu 16.04.
Perl version: perl 5, version 22.

May I ask how to solve this issue?

Data Format

I read the source code and found that the data format is JSON. How exactly is it constructed? I want to use my own data. What do I need to do?
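As a starting point, here is a sketch of writing one example in the layout the preprocessed CNN/Daily Mail data appears to use: one numbered JSON file per example under `train/`, `val/` or `test/`, with `article` and `abstract` stored as lists of space-tokenized sentence strings. Those keys and the file-naming scheme are assumptions, not a documented spec, so verify them against data/data.py before relying on this; `write_example` is hypothetical. The extractor additionally needs pseudo-labels derived from these two fields.

```python
import json
import os


def write_example(split_dir, idx, article_sents, abstract_sents):
    """Write one example as <split_dir>/<idx>.json (assumed layout).

    article_sents / abstract_sents: lists of sentences, each a string of
    space-separated tokens.
    """
    os.makedirs(split_dir, exist_ok=True)
    record = {'article': article_sents, 'abstract': abstract_sents}
    path = os.path.join(split_dir, '{}.json'.format(idx))
    with open(path, 'w') as f:
        json.dump(record, f)
    return path
```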

is there a way to extract a fixed number of summary sentences?

Hey,
Thanks for this great work. I just want to know whether it is easily possible to force the model to predict exactly n summary sentences, so that all test articles have the same number of summary lines.

No such file or directory: 'path/to/raw-stories/val'

Where should I create the val/test folders?

I am trying to evaluate the pretrained model. Following the instructions, after downloading and detokenizing the dataset, I have:

Pretrained model folder
*/stories folder
*_stories_tokenized folder

ishandutta2007@MacBook-Pro:~/Documents/Projects/fast_abs_rl$ export DATA=../cnn-dailymail/cnn/stories

ishandutta2007@MacBook-Pro:~/Documents/Projects/fast_abs_rl$ python decode_full_model.py --path=../cnn-dailymail/cnn_stories_tokenized --model_dir=./pretrained/acl --beam=5 --val

loading checkpoint ckpt-0-0...
loading checkpoint ckpt-42.508549-0...
Traceback (most recent call last):
  File "decode_full_model.py", line 169, in <module>
    args.max_dec_word, args.cuda)
  File "decode_full_model.py", line 49, in decode
    dataset = DecodeDataset(split)
  File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/decoding.py", line 30, in __init__
    super().__init__(split, DATASET_DIR)
  File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/data/data.py", line 14, in __init__
    self._n_data = _count_data(self._data_path)
  File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/data/data.py", line 29, in _count_data
    names = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '../cnn-dailymail/cnn/stories/val'
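The traceback shows the code listing `$DATA/val`, so `DATA` apparently has to point at a directory containing `train/`, `val/` and `test/` subfolders of preprocessed files, not at the raw `stories` folder (another issue on this page exports `DATA=.../finished_files/`). A quick pre-flight check can be sketched as follows; `missing_splits` is a hypothetical helper, and the three split names are taken from the `--val`/`--test` usage on this page.

```python
import os

SPLITS = ('train', 'val', 'test')


def missing_splits(data_dir):
    """Return the split folders that do not exist under data_dir."""
    return [s for s in SPLITS if not os.path.isdir(os.path.join(data_dir, s))]
```

For example, `missing_splits(os.environ['DATA'])` returning a non-empty list would point to the misconfiguration before decoding starts.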

IndexError: too many indices for tensor of dimension 1

batcher.py#L119: there is an error when the target summary has only one sentence.

No beam search during training