yale-lily / DYLE
Repository for ACL'22 paper: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Home Page: https://arxiv.org/abs/2110.08168
License: MIT License
Hi, thanks for your outstanding work!
I noticed that when preprocessing data for the generator, the length of the content for a turn of conversation is determined by window_size and max_source_len. You set the window size to 0, which means the content is limited to the sentence itself, without any contextual information (assuming the sentence contains fewer words than the maximum length).
May I ask why the window size is set to 0? Is adding some contextual information not beneficial for performance?
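For concreteness, here is a minimal sketch of what a symmetric context window could look like. The parameter names window_size and max_source_len mirror the config names mentioned above, but the function and its details are assumptions for illustration, not DYLE's actual preprocessing code.

```python
# Hypothetical sketch: expand a turn with `window_size` neighboring turns on
# each side, then truncate to `max_source_len` whitespace tokens. With
# window_size=0 (as in the repo config), only the turn itself is kept.
def build_turn_input(turns, idx, window_size=0, max_source_len=128):
    lo = max(0, idx - window_size)
    hi = min(len(turns), idx + window_size + 1)
    words = " ".join(turns[lo:hi]).split()
    return " ".join(words[:max_source_len])
```

With window_size=0 the function returns just turns[idx]; with window_size=1 it prepends and appends the adjacent turns before truncation.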
Thank you for your outstanding work! However, when I evaluated the QMSum checkpoint you provided, I did not reproduce the results in the paper on my server; the scores were 1-2 points lower. May I ask what the reason might be? The server uses an RTX 3080.
Hello, sorry to bother you. I am having difficulty generating the oracles for the arXiv training set. I simply ran arxiv_oracle.py and waited; after 4 days it had only produced 4300 files in index_train. Could you check this Python file? I'm really confused now.
I was wondering what the minimum amount of VRAM is to even test the models, and also to train them? I have two 16 GB cards, but I couldn't make it work even with batch size 1.
In your code, you note that oracles are needed during both training and testing. But in your paper, you say "No extractive oracles are used during test time". Which is correct?
Thanks for your outstanding work!
When I was training your model, I encountered an out-of-memory issue. I am using a Tesla V100S 32 GB GPU, and the problem persists even after reducing the batch size to 1. Is there any way to reduce memory consumption during training?
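One generic workaround when batch size 1 still overflows memory is gradient accumulation: keep the per-step memory footprint of a single example while averaging gradients over several steps. The sketch below is a standard PyTorch pattern, not DYLE's actual training loop; the function and argument names are assumptions.

```python
import torch

# Generic gradient-accumulation sketch (not DYLE's code): simulate an
# effective batch of `accum_steps` examples while holding only one
# example's activations in memory at a time.
def train_steps(model, loss_fn, batches, optimizer, accum_steps=4):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = loss_fn(model(x), y) / accum_steps  # scale so grads average
        loss.backward()                            # grads accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Activation (gradient) checkpointing via torch.utils.checkpoint is another common lever for long-input models, trading recomputation for memory.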
Hello, I have read your paper and am trying to run your code. You say the model can denoise by down-weighting irrelevant snippets, but I can't find this filtering in your code. How does it recognize irrelevant snippets? By scoring similarity?
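One plausible reading of "down-weighting" is that there is no explicit hard filter: a softmax over per-snippet relevance scores gives low-scoring snippets weights near zero, so they contribute little to the marginalized output. The sketch below illustrates that generic mechanism; it is an assumption about the idea, not DYLE's implementation.

```python
import math

# Softmax over per-snippet relevance scores: irrelevant (low-scoring)
# snippets get weights near zero instead of being removed outright.
def snippet_weights(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```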
Hi, thanks for your outstanding work!
I saw that detach_generator_consistency is set to False everywhere in the source code. In this setting, the dynamic_mlp can only be optimized through the seq_loss path after marginalizing. I wonder why we do not set detach_generator_consistency to True. Is it because the actual training results were worse? I did not see this reported in the paper.
If it got worse, maybe the model collapsed because the dynamic scores and doc scores are easily predicted to be all zeros, or there were other reasons. Could you give some explanation?
Thanks a lot! @MaoZiming
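To make the question concrete, here is an illustrative sketch of the detach pattern being asked about: a consistency term between extractor-side and generator-side score distributions, where a detach flag controls whether gradients from this term flow back into the generator scores. This is an assumption for illustration, not DYLE's exact loss.

```python
import torch
import torch.nn.functional as F

# Illustrative consistency loss (assumed form, not DYLE's code): KL between
# the extractor's score distribution and the generator-side distribution.
# With detach_generator=True, the consistency loss cannot update whatever
# produced `generator_scores`.
def consistency_loss(extractor_scores, generator_scores, detach_generator=False):
    if detach_generator:
        generator_scores = generator_scores.detach()
    p = F.log_softmax(extractor_scores, dim=-1)   # input must be log-probs
    q = F.softmax(generator_scores, dim=-1)       # target as probabilities
    return F.kl_div(p, q, reduction="batchmean")
```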
Hello! Sorry to bother you again. In Experiment.py, the code uses "no_improvement = self.seq_evaluate_gen(test=False, beam_size=beam_size)" and "if no_improvement and self.iter_num > config.start_decay:" to implement learning-rate decay. But self.seq_evaluate_gen doesn't return anything. I personally set no_improvement = True when the evaluation ROUGE scores don't improve, i.e., no_improvement = False if "metric > self.best_metric". Am I right?
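The suggested fix can be sketched as a small helper that tracks the best metric; the names mirror the attributes mentioned in the question (best_metric), but this is the asker's proposed patch, not verified against Experiment.py.

```python
# Sketch of the proposed fix: return True (no improvement -> decay LR)
# unless the new metric beats the best seen so far.
def update_no_improvement(metric, best_metric):
    if metric > best_metric:
        return False, metric       # improved: keep LR, update best
    return True, best_metric       # no improvement: trigger decay
```

seq_evaluate_gen would then return the first element of this tuple after updating self.best_metric with the second.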
I am trying to reproduce the scores reported in your paper, but my scores are much lower and do not improve much over training iterations.