chenrocks / fast_abs_rl
Code for ACL 2018 paper: "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. Chen and Bansal"
License: MIT License
The preprocessing assumes that the model is going to be run on CNN or Daily Mail. Can it be run on other datasets with minimal changes?
Specifically, is it possible to use the pre-trained models on other datasets with minimal changes to the code, or would too many changes be required?
Hi,
Thanks for the great work. Could you share the training steps that you used for Extractor/Abstractor ML training and the joint model RL training? Basically when did you stop training each of these models?
I would like to apply this great code to other datasets, but I ran into several obstacles:
To overcome these obstacles, I tried the following approach:
But it's not a good approach.
First of all, the sentence tokenizer is not perfect, and the tokenized sentences might be messed up.
For datasets with a single-sentence summary, that sentence is usually very abstractive, so creating the pseudo-labels is difficult. I believe that in this case ROUGE-L matching might not pick the best corresponding sentence.
And if the pseudo-labels are wrong...
So my question is: do you have any leads or general ideas that I could implement to improve the existing code for other datasets?
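For readers unfamiliar with the pseudo-labelling step discussed in this question, here is a minimal sketch of greedy ROUGE-L matching between summary sentences and article sentences. It is an illustration only, not the repository's code: the function names `lcs_length`, `rouge_l_recall` and `greedy_labels` are hypothetical, sentences are assumed to be pre-tokenized lists of strings, and the "never reuse an article sentence" rule is an assumption of this sketch.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """LCS length divided by the reference length (0.0 for an empty reference)."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


def greedy_labels(article: List[List[str]], abstract: List[List[str]]) -> List[int]:
    """For each summary sentence, pick the unused article sentence with the best score."""
    used = set()
    labels = []
    for abs_sent in abstract:
        candidates = [i for i in range(len(article)) if i not in used]
        if not candidates:
            break  # more summary sentences than article sentences
        best = max(candidates, key=lambda i: rouge_l_recall(article[i], abs_sent))
        labels.append(best)
        used.add(best)
    return labels


article = [
    "the cat sat on the mat".split(),
    "stocks fell sharply on monday".split(),
    "the dog barked".split(),
]
abstract = ["stocks fell on monday".split(), "a cat sat on a mat".split()]
print(greedy_labels(article, abstract))  # → [1, 0]
```

With a single, highly abstractive summary sentence the best score may still be low for every article sentence, which is exactly the weakness described above; logging the winning score alongside each label makes such cases easy to spot.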
Hi,
I noticed that in the evaluation part of the abstractor, you feed the ground-truth target tokens to make predictions (not the argmax prediction of the previous step). Even though no gradient is backpropagated through the dev set, shouldn't we act as if we never see its targets?
I observed that, because of this bias, the dev score is very close to the training score (even though the training and dev sets are not that similar). Could this also lead to a quicker drop in learning rate and perhaps to a suboptimal model?
Thanks,
Hoa
Hi, I would like to know how you split a document into sentences. In your code, it looks to me like each word is treated as a sentence. Is that right?
Hi, I think there is a bug in the PtrScorer.forward function in model/rl.py:
the LSTM states (h, c) are never updated in the for loop.
Hi Chen,
I have tried to follow your instructions exactly, but the following happens when I run the pretrained model:
loading checkpoint ckpt-0-0...
loading checkpoint ckpt-42.508549-0...
/home/arnav-gulati/.local/lib/python3.6/site-packages/torch/nn/functional.py:1374: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
File "decode_full_model.py", line 169, in <module>
args.max_dec_word, args.cuda)
File "decode_full_model.py", line 87, in decode
all_beams = abstractor(ext_arts, beam_size, diverse)
File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/decoding.py", line 116, in __call__
all_beams = self._net.batched_beamsearch(*dec_args)
File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/model/copy_summ.py", line 125, in batched_beamsearch
token, states, attention, beam_size)
File "/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/model/copy_summ.py", line 245, in topk_step
source=score.contiguous().view(beam*batch, -1) * copy_prob
TypeError: scatter_add() missing 1 required positional arguments: "src"
The command I used was:
python3 decode_full_model.py --path=/home/arnav-gulati/Documents/other_models/fast_abs_rl-master --model_dir=/home/arnav-gulati/Documents/other_models/fast_abs_rl-master/pretrained/acl --beam=5 --test
and I followed your instructions on exporting the data path with:
export DATA=/home/arnav-gulati/Documents/cnn-dailymail-master/finished_files/
I am not sure what causes the following line:
TypeError: scatter_add() missing 1 required positional arguments: "src"
Do you have any advice on what I should do? I believe I have installed all the dependencies, but just in case you want to see them, here they are:
boto 2.49.0 boto3 1.9.171 botocore 1.12.171 bz2file 0.98 certifi 2019.6.16 chardet 3.0.4 cytoolz 0.9.0.1 docutils 0.14 futures 3.2.0 gensim 3.7.3 idna 2.8 jmespath 0.9.4 numpy 1.16.4 pip 19.1.1 protobuf 3.8.0 pyrouge 0.1.3 python-dateutil 2.8.0 requests 2.22.0 s3transfer 0.2.1 scipy 1.2.2 setuptools 41.0.1 six 1.12.0 smart-open 1.8.4 tensorboardX 1.7 toolz 0.9.0 torch 0.4.0 urllib3 1.25.3 wheel 0.33.4
Any help would be appreciated!
When I modify the number of LSTM layers and run train_full_rl.py, something goes wrong:
Start training
Traceback (most recent call last):
File "train_full_rl.py", line 231, in <module>
train(args)
File "train_full_rl.py", line 186, in train
trainer.train()
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/training.py", line 211, in train
log_dict = self._pipeline.train_step()
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 193, in train_step
self._stop_reward_fn, self._stop_coeff
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/rl.py", line 60, in a2c_train_step
(inds, ms), bs = agent(raw_arts)
File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 221, in forward
outputs = self._ext(enc_art)
File "/home/zhangxiaoyi/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 130, in forward
self._hop_v, self.hop_wq)
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 74, in attention
PtrExtractorRL.attention_score(attention, query, v, w), dim=-1)
File "/home/zhangxiaoyi/pyworkspace/bytecup8/fast_abs_rl/model/rl.py", line 66, in attention_score
sum = attention + torch.mm(query, w)
RuntimeError: The size of tensor a (13) must match the size of tensor b (3) at non-singleton dimension 0
Hey,
Thank you very much for this great work.
I am having a problem training the extractor model on my own data.
I am using PyTorch 0.4.0 with CUDA 10.0 on Ubuntu 18.04.
I hope you can help me with this issue.
Thank you very much in advance.
This is the error traceback:
Traceback (most recent call last):
File "/home/me/fast_abs_rl-master/model/extract.py", line 33, in forward
for conv in self._convs], dim=1)
File "/home/me/fast_abs_rl-master/model/extract.py", line 33, in <listcomp>
for conv in self._convs], dim=1)
File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
self.padding, self.dilation, self.groups)
RuntimeError: sizes must be non-negative
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_extractor_ml.py", line 237, in <module>
main(args)
File "train_extractor_ml.py", line 179, in main
trainer.train()
File "/home/me/fast_abs_rl-master/training.py", line 212, in train
log_dict = self._pipeline.train_step()
File "/home/me/fast_abs_rl-master/training.py", line 97, in train_step
net_out = self._net(*fw_args)
File "/home/me/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/me/fast_abs_rl-master/model/extract.py", line 287, in forward
enc_out = self._encode(article_sents, sent_nums)
File "/home/me/fast_abs_rl-master/model/extract.py", line 309, in _encode
for art_sent in article_sents]
File "/home/me/fast_abs_rl-master/model/extract.py", line 309, in <listcomp>
for art_sent in article_sents]
File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/me/fast_abs_rl-master/model/extract.py", line 40, in forward
for conv in self._convs])
File "/home/me/fast_abs_rl-master/model/extract.py", line 40, in <listcomp>
for conv in self._convs])
File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/me/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
self.padding, self.dilation, self.groups)
RuntimeError: sizes must be non-negative
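One plausible cause of this error (an assumption, not something confirmed in this thread) is a sentence that is shorter than the widest convolution kernel, or an empty sentence. With stride 1 and no padding, a 1-D convolution over L tokens with kernel width k produces L - k + 1 positions, which is not positive when L < k. The sketch below pads short sentences before they reach the encoder. `KERNEL_SIZES`, `PAD_ID` and both function names are hypothetical placeholders; check them against your own model configuration.

```python
from typing import List

KERNEL_SIZES = (3, 4, 5)  # assumed kernel widths; verify against your model
PAD_ID = 0                # assumed id of the padding token


def conv_output_length(seq_len: int, kernel: int) -> int:
    """Output length of a stride-1, unpadded 1-D convolution."""
    return seq_len - kernel + 1


def pad_to_min_length(token_ids: List[int],
                      min_len: int = max(KERNEL_SIZES),
                      pad_id: int = PAD_ID) -> List[int]:
    """Right-pad a sentence so that every kernel yields at least one position."""
    return token_ids + [pad_id] * max(0, min_len - len(token_ids))


short = [7, 8]                       # a two-token sentence
print(conv_output_length(len(short), 5))  # → -2
print(pad_to_min_length(short))           # → [7, 8, 0, 0, 0]
```

Filtering out empty sentences and very short articles during preprocessing is an alternative to padding, and may be preferable if such sentences carry no content.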
for action, p, r, b in zip(indices, probs, reward, baseline):
    advantage = r - b
    avg_advantage += advantage
    losses.append(-p.log_prob(action)
                  * (advantage/len(indices))) # divide by T*B
I have a question about this piece of code.
If I understand correctly, the variable b here is a tensor with gradients enabled, so optimizing the terms in losses will both increase the reward by changing the policy weights and shrink the advantage by increasing the baseline. I don't understand why the baseline is optimized here; as far as I know, the baseline should only be optimized when training the critic.
I actually used this training function on a different summarization task, and I found that avg_advantage keeps dropping.
Thank you very much.
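The concern can be made concrete with a short derivation. This is a sketch under the assumption that the actor loss is exactly the expression in the snippet and that the baseline comes from a differentiable critic; the symbols \pi_\theta, R_t, b_t and T are notation introduced here, not names from the repository.

```latex
% Actor loss over T steps, with advantage A_t = R_t - b_t:
\[
  \mathcal{L}_{\text{actor}}
  = -\frac{1}{T}\sum_{t=1}^{T} \log \pi_\theta(a_t \mid s_t)\,\bigl(R_t - b_t\bigr)
\]
% If b_t is not detached, the actor loss also sends a gradient into the critic:
\[
  \frac{\partial \mathcal{L}_{\text{actor}}}{\partial b_t}
  = \frac{1}{T}\,\log \pi_\theta(a_t \mid s_t) \;\le\; 0
\]
% The derivative is non-positive because \log \pi_\theta \le 0 for a discrete policy,
% so a gradient-descent step raises b_t and shrinks the advantage.
% The usual actor-critic setup treats b_t as a constant in the actor term
% and trains the critic only through its own regression loss:
\[
  \mathcal{L}_{\text{critic}} = \frac{1}{T}\sum_{t=1}^{T}\bigl(b_t - R_t\bigr)^2
\]
```

In PyTorch, a tensor is commonly treated as a constant by calling `.detach()` on it before it enters the expression, which here would mean detaching b before forming the advantage.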
Line 156 of train_abstractor.py:
if args.cuda:
    net = net.cuda()
pipeline = BasicPipeline(meta['net'], net,
                         train_batcher, val_batcher, args.batch, val_fn,
                         criterion, optimizer, grad_fn)
Without CUDA, net is not defined, so the train_step function of BasicPipeline in training.py produces a segmentation fault.
Hi,
Using your repo, I want to train my own model that summarizes an article into its title, on my own data.
Do you think the results would be similar to yours if I do this with your repo?
Also, how should I prepare my data?
Dear Chen,
I am training this model on another dataset with single-sentence abstracts. In the process, I found that the abstractor maps a single sentence to a single abstract sentence.
If I select K sentences by ROUGE in the extractor stage and then concatenate them into one input for training the abstractor and full_rl, do you think this is feasible?
Does the fast_abs_rl model apply to single-sentence abstracts? I would be very grateful for your reply and advice.
Line 170 in aebf539
Hi Chen,
I read your paper, which is very interesting. I have several questions; could you please help me figure them out ?
The extractor extracts salient sentences (d_1,d_2,...d_n) from the original documents. For each salient sentence, the abstractor function (g) generates a summary sentence; all the summary sentences are concatenated to form the entire summary.
I am wondering if my understanding is correct.
In the paragraph above section 2.1, I am wondering if this is a typo?
Thanks
Hi,
I am new to PyTorch and was trying to use your model to summarize some text. I tried training with news data, and then with only a single new file added; in both cases train_full_rl fails in rl.py with the error below. I switched to GPU and still get the same problem. Have you seen this error before? On GPU, the error said that the input and target sizes do not match, which is why I am printing the sizes (torch.Size([700])):
{'net_args': {'extractor': {'net_args': {'conv_hidden': 100, 'emb_dim': 128, 'bidirectional': True, 'vocab_size': 30004, 'lstm_layer': 1, 'lstm_hidden': 256}, 'traing_params': {'batch_size': 32, 'lr_decay': 0.5, 'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2}, 'net': 'ml_rnn_extractor'}, 'abstractor': {'net_args': {'emb_dim': 128, 'bidirectional': True, 'vocab_size': 30004, 'n_hidden': 256, 'n_layer': 1}, 'traing_params': {'batch_size': 32, 'lr_decay': 0.5, 'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0}, 'net': 'base_abstractor'}}, 'train_params': {'gamma': 0.95, 'lr_decay': 0.5, 'stop_reward': 'rouge-1', 'clip_grad_norm': 2, 'batch_size': 32, 'reward': 'rouge-l', 'optimizer': ('adam', {'lr': 0.0001}), 'stop_coeff': 1.0}, 'net': 'rnn-ext_abs_rl'}
Start training
torch.Size([700])
torch.Size([700])
Traceback (most recent call last):
File "Geneva_ABS.py", line 266, in <module>
main(args)
File "Geneva_ABS.py", line 224, in main
train_full_rl().train(path=output_path+'/model/', abs_dir=output_path+'/abstractor/', ext_dir = output_path+'/extractor/')
File "/home/train_full_rl.py", line 195, in train
trainer.train()
File "/home/training.py", line 211, in train
log_dict = self.pipeline.train_step()
File "/home/rl.py", line 177, in train_step
self._stop_reward_fn, self._stop_coeff
File "/home/rl.py", line 108, in a2c_train_step
[torch.ones(1).to(critic_loss.device)]*(1+len(losses))
File "/home/venv/lib/python3.5/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: invalid gradient at index 0 - expected shape [] but got [1]
Hi,
I am at the step of training the abstractor and extractor using ML objectives. I would like to know which paths to pass for the abstractor and extractor when running the scripts.
Hey Chen,
Recently, a very interesting model called BERT (https://github.com/google-research/bert) came out and has been shown to achieve state-of-the-art results on several NLP tasks. Do you think the results of your model could be improved by using its embeddings instead of word2vec?
I think it is an interesting idea, but I suspect I would need to make many changes to the current architecture of your model.
Kind regards,
Nick
Hi, I tried to reproduce the results of rnn-ext + RL (without abstractor).
First, I pre-trained an extractor using train_extractor_ml and constructed the proxy training labels using the ROUGE-L F1 score. Then I ran train_full_rl.py without specifying the abstractor path. After training finished, the results were significantly lower than the values reported in your paper. I wonder whether it is incorrect to use the ROUGE-L F1 score to construct the proxy labels for rnn-ext, or whether you used another reward function for it. Thank you so much for your help!
Lines 33 to 38 in 9e6c45d
Hello, dear Chen! If my training is interrupted, can I restore my model from the previous checkpoint?
model/util.py has the following two lines that require a GPU:
LineNum 59: order = torch.LongTensor(order).to(sequence_emb.get_device())
LineNum 73: order = torch.LongTensor(order).to(lstm_states[0].get_device())
However, my server has no GPU. When I train my own model using
python train_abstractor.py --no-cuda --path=mypath/abstractor --w2v=mypath/word2vec.128d.226k.bin
I will get the error:
fast_abs_rl/model/util.py", line 73, in reorder_lstm_states
order = torch.LongTensor(order).to(lstm_states[0].get_device())
RuntimeError: _th_get_device is not implemented for type torch.FloatTensor
I just checked the PyTorch docs, which say that only GPU tensors support the get_device() method:
"For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown." Hence "RuntimeError: get_device is not implemented for type torch.FloatTensor".
Does the code require a GPU? Thank you.
Hello, I can train the model and I get the extractor labels, but I can't find how to get the summary from the first-phase extractor. Could you tell me how to do it?
I noticed two unusual places in the code and was puzzled by them.
(1) In class StackedLSTMCells in rnn.py, line 102:
@property
def bidirectional(self):
    return self._cells[0].bidirectional # ==> LSTMCell has no bidirectional?
I suppose self._cells stores nn.LSTMCell, which has no bidirectional attribute? Do I understand this correctly?
(2) In class _CopyLinear in copy_summ.py, inside the __init__ function, I noticed the line self.regiser_module(None, '_b'). Do you mean register_module?
If I set bias=False, I get an error either way. Could you comment on this?
Thank you!
Thanks for the great work!
I'm just wondering if we can utilize the results from the extractor independently.
Hi, is there an easy way to run inference on raw text? Say I have an article and I want to see the summary the model produces, without any evaluation.
Thanks for your work!
Hello,
Here in the CNN model, it seems the dropout layer is placed before the convolutional layer as opposed to after the max pooling layer as per Kim 2014:
Line 28 in b1b66c1
Is this intentional or a bug? If it's intentional, what is the rationale behind it?
Thanks,
Felix
Line 24 in aebf539
I read your paper again and couldn't find the reason why you use the recall of the ROUGE-L score to make the pseudo-labels. Specifically:
Why ROUGE-L? (and not ROUGE-1, for example)
Why recall?
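For reference while reading this question, the standard ROUGE-L quantities are defined below. This is only the textbook definition with notation introduced here (X, Y, m, n, \beta); it does not by itself explain the authors' choice.

```latex
% X: candidate sentence with m tokens (here, an article sentence)
% Y: reference sentence with n tokens (here, a summary sentence)
% LCS(X, Y): length of their longest common subsequence
\[
  R_{\mathrm{lcs}} = \frac{\mathrm{LCS}(X, Y)}{n}, \qquad
  P_{\mathrm{lcs}} = \frac{\mathrm{LCS}(X, Y)}{m}, \qquad
  F_{\mathrm{lcs}} = \frac{(1+\beta^{2})\,R_{\mathrm{lcs}}\,P_{\mathrm{lcs}}}
                          {R_{\mathrm{lcs}} + \beta^{2}\,P_{\mathrm{lcs}}}
\]
```

Recall measures how much of the summary sentence is covered by the article sentence regardless of the article sentence's length, whereas precision penalizes long article sentences; which of the two suits pseudo-labelling better is the open question here.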
Traceback (most recent call last):
File "train_extractor_ml.py", line 237, in <module>
main(args)
File "train_extractor_ml.py", line 179, in main
trainer.train()
File "/home/eddiewng/LCSTS/fast_abs_rl/training.py", line 211, in train
log_dict = self._pipeline.train_step()
File "/home/eddiewng/LCSTS/fast_abs_rl/training.py", line 96, in train_step
net_out = self._net(*fw_args)
File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 276, in forward
enc_out = self._encode(article_sents, sent_nums)
File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 296, in _encode
for art_sent in article_sents]
File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 296, in <listcomp>
for art_sent in article_sents]
File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 31, in forward
for conv in self._convs], dim=1)
File "/home/eddiewng/LCSTS/fast_abs_rl/model/extract.py", line 31, in <listcomp>
for conv in self._convs], dim=1)
File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/eddiewng/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 176, in forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 3 (while checking arguments for cudnn_convolution)
I am trying to apply the model to a Chinese dataset. I succeeded in training the abstractive model, but when I try to train the extractive model, it throws this error. I have no idea what's going on. Does anyone have any idea?
Hey,
Thank you very much for this great work.
I am getting this error when I train my abstractor model:
AttributeError: 'float' object has no attribute 'item'.
Any ideas why?
Thanks in advance.
(train the full RL model)
I am trying to run python train_full_rl.py --path=[path/to/save/model] --abs_dir=[path/to/abstractor/model] --ext_dir=[path/to/extractor/model]
My GPU has 12GB memory. NVIDIA-SMI 387.26 , Driver Version: 387.26
0 Tesla K40m .
Could you note in the README how much GPU memory training requires?
Thank you!
I got the following error:
loading checkpoint ckpt-1.482692-6000...
loading checkpoint ckpt-3.515528-3000...
loading checkpoint ckpt-1.482692-6000...
start training with the following hyper-parameters:
{'net': 'rnn-ext_abs_rl', 'net_args': {'abstractor': {'net': 'base_abstractor', 'net_args': {'vocab_size': 30004, 'emb_dim': 128, 'n_hidden': 256, 'bidirectional': True, 'n_layer': 1}, 'traing_params': {'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5}}, 'extractor': {'net': 'ml_rnn_extractor', 'net_args': {'vocab_size': 30004, 'emb_dim': 128, 'conv_hidden': 100, 'lstm_hidden': 256, 'lstm_layer': 1, 'bidirectional': True}, 'traing_params': {'optimizer': ['adam', {'lr': 0.001}], 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5}}}, 'train_params': {'optimizer': ('adam', {'lr': 0.0001}), 'clip_grad_norm': 2.0, 'batch_size': 32, 'lr_decay': 0.5, 'gamma': 0.95, 'reward': 'rouge-l', 'stop_coeff': 1.0, 'stop_reward': 'rouge-1'}}
Start training
WARNING: Exploding Gradients 1829948160.00
WARNING: Exploding Gradients 110035.23
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train_full_rl.py", line 228, in <module>
train(args)
File "train_full_rl.py", line 182, in train
trainer.train()
File "/home/project/fast_abs_rl/fast_abs_rl/training.py", line 211, in train
log_dict = self._pipeline.train_step()
File "/home/project/fast_abs_rl/fast_abs_rl/rl.py", line 173, in train_step
self._stop_reward_fn, self._stop_coeff
File "/home/project/fast_abs_rl/fast_abs_rl/rl.py", line 64, in a2c_train_step
summaries = abstractor(ext_sents)
File "/home/project/fast_abs_rl/fast_abs_rl/decoding.py", line 94, in __call__
decs, attns = self._net.batch_decode(*dec_args)
File "/home/project/fast_abs_rl/fast_abs_rl/model/copy_summ.py", line 63, in batch_decode
attention, init_dec_states = self.encode(article, art_lens)
File "/home/project/fast_abs_rl/fast_abs_rl/model/summ.py", line 81, in encode
init_enc_states, self.embedding
File "/home/project/fast_abs_rl/fast_abs_rl/model/rnn.py", line 41, in lstm_encoder
lstm_out = reorder_sequence(lstm_out, reorder_ind, lstm.batch_first)
File "/home/project/fast_abs_rl/fast_abs_rl/model/util.py", line 58, in reorder_sequence
sorted = sequence_emb.index_select(index=order, dim=batch_dim)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
Traceback (most recent call last):
File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 190, in get_context
ctx = _concrete_contexts[method]
KeyError: 'forkserver'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 237, in <module>
main(args)
File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 131, in main
args.cuda, args.debug)
File "D:/lab/2018match/fast_abs_rl-master/fast_abs_rl-master/train_extractor_ml.py", line 71, in build_batchers
single_run=False, fork=not debug)
File "D:\lab\2018match\fast_abs_rl-master\fast_abs_rl-master\data\batcher.py", line 216, in __init__
ctx = mp.get_context('forkserver')
File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 238, in get_context
return super().get_context(method)
File "C:\Users\admin\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 192, in get_context
raise ValueError('cannot find context for %r' % method)
ValueError: cannot find context for 'forkserver'
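The traceback shows that the 'forkserver' start method is not available on this Windows machine. A possible workaround, sketched below, is to ask Python which start methods the platform supports and fall back to 'spawn'. The helper name `pick_context` is hypothetical, and whether the rest of the batcher works correctly under 'spawn' is an untested assumption.

```python
import multiprocessing as mp


def pick_context(preferred=("forkserver", "spawn")):
    """Return the first multiprocessing context supported on this platform."""
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return mp.get_context(method)
    raise RuntimeError(f"none of {preferred} is available; got {available}")


if __name__ == "__main__":
    ctx = pick_context()
    print(ctx.get_start_method())  # 'forkserver' on Linux, 'spawn' on Windows
```

Note that 'spawn' starts a fresh interpreter for each worker, so worker targets must be picklable and the entry script needs the `if __name__ == "__main__":` guard.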
Hi,
when I run the train_abstractor.py, I got the following error:
Traceback (most recent call last):
File "train_abstractor.py", line 220, in <module>
main(args)
File "train_abstractor.py", line 166, in main
trainer.train()
File "/data/comp/fast_abs_rl/training.py", line 211, in train
log_dict = self._pipeline.train_step()
File "/data/comp/fast_abs_rl/training.py", line 107, in train_step
log_dict.update(self._grad_fn())
File "/data/comp/fast_abs_rl/training.py", line 20, in f
grad_norm = grad_norm.item()
AttributeError: 'float' object has no attribute 'item'
How to solve this problem? Thanks!
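The traceback indicates that `grad_norm` is already a plain Python float in this environment, so calling `.item()` on it fails. The return type of the gradient-clipping utility appears to differ between PyTorch versions (an assumption here, not something verified in this thread). A version-tolerant conversion could look like the sketch below; the helper name `as_float` is hypothetical.

```python
def as_float(value) -> float:
    """Convert a gradient norm to a Python float.

    Accepts plain numbers as well as objects that expose .item(),
    such as zero-dimensional tensors.
    """
    item = getattr(value, "item", None)
    return float(item()) if callable(item) else float(value)


print(as_float(2.5))  # → 2.5
```

In the failing line this would amount to replacing `grad_norm = grad_norm.item()` with `grad_norm = as_float(grad_norm)`, so the same code runs whichever type is returned.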
Hey Chen,
I have a question about the DataLoader in the files where the extractor, abstractor and RL are trained, more specifically, about this line of code:
val_loader = DataLoader(
MatchDataset('val'), batch_size=BUCKET_SIZE,
shuffle=False, num_workers=4 if cuda and not debug else 0,
collate_fn=coll_fn
)
I was wondering why pin_memory is not set to True? I have read that this can improve performance when training on NVIDIA GPUs. Also, would you advise setting num_workers higher when using a high-end GPU (Tesla V100 16GB) and an 8-core CPU?
Thanks!
When I limit the number of sentences in the summary to 1, for example, the forward function in PtrExtractorRL in /model/rl.py is called.
The for loop for _ in range(n_step): is run once (n_step=1) and the output is returned.
Yet in decode_full_model.py the last element of the extracted output is removed as EOE:
ext = extractor(raw_art_sents)[:-1] # exclude EOE
So in this case the only sentence being extracted is discarded as EOE.
One solution I can think of is to append an EOE in the forward function of PtrExtractorRL. Specifically, insert the following lines just before the return call:
if out.item() != score.size()[1]:
    append(torch.tensor([[score.size()[1]]], device=out.get_device()))
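An alternative to appending an EOE inside the model is to make the decoding side defensive: drop the last index only when it really is the EOE marker. The sketch below assumes the extractor output has been converted to a list of plain integers and that EOE is encoded as an index equal to the number of article sentences; both are assumptions, and `strip_eoe` is a hypothetical helper, not a function from the repository.

```python
from typing import List


def strip_eoe(indices: List[int], n_sents: int) -> List[int]:
    """Drop the final index only if it is the EOE marker (assumed to be n_sents)."""
    if indices and indices[-1] == n_sents:
        return indices[:-1]
    return list(indices)


print(strip_eoe([2, 0, 5], n_sents=5))  # → [2, 0]
print(strip_eoe([3], n_sents=5))        # → [3]
```

With this check, a one-step extraction that never emits EOE keeps its single sentence instead of losing it to the unconditional `[:-1]`.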
I understand that the EOE token allows the agent to stop during RL training.
But why not also train the (ML-only) extractor to extract the EOE token?
I'm trying to add another token to your architecture, following your design by adding it only at the RL stage.
However, the results are not satisfying. I'm wondering if I should train the extractor with this additional token too.
What do you think?
L127-L128 have no effect:
Lines 126 to 130 in b1b66c1
I assume this is intended: seq.append(art_sent[max(range(len(art_sent)), key=lambda j: attn[j].item())])
Hi,
Thanks for sharing the excellent work and high-quality code! I noticed that in the attention mechanism for LSTMPointerNet, you didn't use self._attn_wm in the attention_score function. I guess the result is already good without it. Did you decide to drop self._attn_wm because it does not help much? Thanks for the comments.
WARNING: Exploding Gradients 8048.54
WARNING: Exploding Gradients 219.81
WARNING: Exploding Gradients 360.95
WARNING: Exploding Gradients 233.41
Traceback (most recent call last):
File "train_full_rl.py", line 228, in <module>
train(args)
File "train_full_rl.py", line 182, in train
trainer.train()
File "/data/xiaqiang/nmt/fast_abs_rl/training.py", line 211, in train
log_dict = self._pipeline.train_step()
File "/data/xiaqiang/nmt/fast_abs_rl/rl.py", line 174, in train_step
self._stop_reward_fn, self._stop_coeff
File "/data/xiaqiang/nmt/fast_abs_rl/rl.py", line 101, in a2c_train_step
critic_loss = F.mse_loss(baseline, reward)
File "/data/xiaqiang/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1716, in mse_loss
return _pointwise_loss(lambda a, b: (a - b) ** 2, torch._C._nn.mse_loss, input, target, reduction)
File "/data/xiaqiang/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1674, in _pointwise_loss
return lambd_optimized(input, target, reduction)
RuntimeError: input and target shapes do not match: input [204], target [200] at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THCUNN/generic/MSECriterion.cu:12
How can I use this code to train and run inference with my model in a distributed environment, such as a Spark cluster with multiple CPU nodes?
Hi,
I followed the instructions to install and set up pyrouge and ROUGE-1.5.5.
When I evaluate the full model by executing the "eval_full_model.py" script, the --meteor flag works well, but the --rouge flag outputs errors like:
Cannot open exception db file for reading: /home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/data/WordNet-2.0.exc.db
Traceback (most recent call last):
File "eval_full_model.py", line 53, in <module>
main(args)
File "eval_full_model.py", line 26, in main
output = eval_rouge(dec_pattern, dec_dir, ref_pattern, ref_dir)
File "/home/zqj/Workspace/fast_abs_rl/evaluate.py", line 40, in eval_rouge
output = sp.check_output(cmd.split(' '), universal_newlines=True)
File "/usr/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/usr/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl', '-e', '/home/zqj/Workspace/Tools/pyrouge-master/tools/ROUGE-1.5.5/data', '-c', '95', '-r', '1000', '-n', '2', '-m', '-a', '/tmp/tmpf5fnmfg3/settings.xml']' returned non-zero exit status 255
My OS is: Ubuntu 16.04.
Perl version: perl 5, version 22.
May I ask how to solve this issue?
I read the source code and found that the data format is JSON. How exactly is it structured? I want to use my own data. What do I need to do?
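As a starting point for this question, the sketch below writes one example in a layout that is an assumption on my part: one numbered JSON file per example inside a split directory, with an `article` key and an `abstract` key, each holding a list of tokenized sentence strings. Please verify the key names and directory layout against the repository's data-loading code before relying on it; `write_sample` is a hypothetical helper.

```python
import json
import os
import tempfile


def write_sample(split_dir: str, index: int, article, abstract) -> str:
    """Write one example as <split_dir>/<index>.json and return its path."""
    os.makedirs(split_dir, exist_ok=True)
    path = os.path.join(split_dir, f"{index}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"article": article, "abstract": abstract}, f)
    return path


root = tempfile.mkdtemp()  # stands in for the directory that DATA points to
path = write_sample(
    os.path.join(root, "train"), 0,
    article=["the first source sentence .", "the second source sentence ."],
    abstract=["a one sentence summary ."],
)
with open(path, encoding="utf-8") as f:
    sample = json.load(f)
print(sample["abstract"])  # → ['a one sentence summary .']
```

Under this assumption, a custom dataset would need its own sentence splitting and tokenization step that produces these lists for every split.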
Hey,
Thanks for this great work, I just want to know if it is easily possible to force the model to predict exactly n abstract lines, so all tested articles will have the same number of summary lines.
Where should I create val/test folder ?
I am trying to evaluate from the pretrained model. As per the instructions, after downloading and detokenising the dataset I get:
ishandutta2007@MacBook-Pro:~/Documents/Projects/fast_abs_rl$ export DATA=../cnn-dailymail/cnn/stories
ishandutta2007@MacBook-Pro:~/Documents/Projects/fast_abs_rl$ python decode_full_model.py --path=../cnn-dailymail/cnn_stories_tokenized --model_dir=./pretrained/acl --beam=5 --val
loading checkpoint ckpt-0-0...
loading checkpoint ckpt-42.508549-0...
Traceback (most recent call last):
File "decode_full_model.py", line 169, in <module>
args.max_dec_word, args.cuda)
File "decode_full_model.py", line 49, in decode
dataset = DecodeDataset(split)
File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/decoding.py", line 30, in __init__
super().__init__(split, DATASET_DIR)
File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/data/data.py", line 14, in __init__
self._n_data = _count_data(self._data_path)
File "/Users/ishandutta2007/Documents/Projects/fast_abs_rl/data/data.py", line 29, in _count_data
names = os.listdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '../cnn-dailymail/cnn/stories/val'
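The traceback shows the loader listing a `val` folder directly under the path stored in DATA, so DATA apparently needs to point at a processed dataset directory with one sub-folder per split rather than at the raw stories. A small pre-flight check is sketched below; the helper name `missing_splits` is hypothetical, and the `train` and `test` folder names are assumed by analogy with `val`.

```python
import os
import tempfile


def missing_splits(data_dir, splits=("train", "val", "test")):
    """Return the split sub-directories that do not exist under data_dir."""
    return [s for s in splits if not os.path.isdir(os.path.join(data_dir, s))]


root = tempfile.mkdtemp()               # stands in for $DATA
os.makedirs(os.path.join(root, "train"))
print(missing_splits(root))  # → ['val', 'test']
```

Running such a check on `os.environ["DATA"]` before decoding turns the late FileNotFoundError into an immediate, readable message about which folders are missing.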
batcher.py#L119. There is an ERROR when the target summary has one sentence only.
Hello,
Could you help me understand why beam search (e.g. BeamAbstractor) is not used during training (either abstractor training or RL training)?
Thanks,
Felix