yale-lily / DYLE
Repository for ACL'22 paper: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Home Page: https://arxiv.org/abs/2110.08168
License: MIT License
Hi, thanks for your outstanding work!
I noticed that when preprocessing data for the generator, the length of the content for a turn of conversation is determined by window_size and max_source_len. You set the window size to 0, which means the content is limited to the sentence itself, without any contextual information (assuming the sentence contains fewer words than the maximum length).
May I ask why the window size is set to 0? Is adding some contextual information not beneficial for performance?
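For concreteness, here is a minimal sketch of what a symmetric context window could look like. The parameter names window_size and max_source_len mirror the config names mentioned above, but the function and its details are assumptions for illustration, not DYLE's actual preprocessing code.

```python
# Hypothetical sketch: expand a turn with `window_size` neighboring turns on
# each side, then truncate to `max_source_len` whitespace tokens. With
# window_size=0 (as in the repo config), only the turn itself is kept.
def build_turn_input(turns, idx, window_size=0, max_source_len=128):
    lo = max(0, idx - window_size)
    hi = min(len(turns), idx + window_size + 1)
    words = " ".join(turns[lo:hi]).split()
    return " ".join(words[:max_source_len])
```

With window_size=0 the function returns just turns[idx]; with window_size=1 it prepends and appends the adjacent turns before truncation.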
Thank you for your outstanding work! However, when I evaluated the QMSum checkpoint you provided, I did not reproduce the results in the paper on my server; the scores were 1-2 points lower. May I ask what the reason might be? The server uses an RTX 3080.
Hello, sorry to bother you. I am having difficulty generating the oracles for the arXiv training set. I simply ran arxiv_oracle.py and waited; after 4 days it had only produced 4300 files in index_train. Could you check this Python file? I'm really confused now.
I was wondering what the minimum amount of VRAM is to even test the models, and also to train them? I have two 16 GB cards, but I couldn't make it work even with batch size 1.
In your code, you note that oracles are needed during both training and testing. But in your paper, you say "No extractive oracles are used during test time". Which is correct?
Thanks for your outstanding work!
When I was training your model, I encountered an out-of-memory issue. I am using a Tesla V100S 32 GB GPU, and the problem persists even after reducing the batch size to 1. Is there any way to reduce memory consumption during training?
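One generic workaround when batch size 1 still overflows memory is gradient accumulation: keep the per-step memory footprint of a single example while averaging gradients over several steps. The sketch below is a standard PyTorch pattern, not DYLE's actual training loop; the function and argument names are assumptions.

```python
import torch

# Generic gradient-accumulation sketch (not DYLE's code): simulate an
# effective batch of `accum_steps` examples while holding only one
# example's activations in memory at a time.
def train_steps(model, loss_fn, batches, optimizer, accum_steps=4):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = loss_fn(model(x), y) / accum_steps  # scale so grads average
        loss.backward()                            # grads accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Activation (gradient) checkpointing via torch.utils.checkpoint is another common lever for long-input models, trading recomputation for memory.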
Hello, I have read your paper and am trying to run your code. You say the model can denoise by down-weighting irrelevant snippets, but I can't find this filtering in your code. How does it recognize irrelevant snippets? By scoring similarity?
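One plausible reading of "down-weighting" is that there is no explicit hard filter: a softmax over per-snippet relevance scores gives low-scoring snippets weights near zero, so they contribute little to the marginalized output. The sketch below illustrates that generic mechanism; it is an assumption about the idea, not DYLE's implementation.

```python
import math

# Softmax over per-snippet relevance scores: irrelevant (low-scoring)
# snippets get weights near zero instead of being removed outright.
def snippet_weights(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```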
Hi, thanks for your outstanding work!
I saw that detach_generator_consistency is set to False everywhere in the source code. In this setting, the dynamic_mlp can only be optimized through the seq_loss path after marginalizing. I wonder why we do not set detach_generator_consistency to True. Is it because the actual training results were worse? I did not see this reported in the paper.
If it got worse, maybe the model collapsed because the dynamic scores and doc scores are easily predicted to be all zeros, or there were other reasons. Could you give some explanation?
Thanks a lot! @MaoZiming
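To make the question concrete, here is an illustrative sketch of the detach pattern being asked about: a consistency term between extractor-side and generator-side score distributions, where a detach flag controls whether gradients from this term flow back into the generator scores. This is an assumption for illustration, not DYLE's exact loss.

```python
import torch
import torch.nn.functional as F

# Illustrative consistency loss (assumed form, not DYLE's code): KL between
# the extractor's score distribution and the generator-side distribution.
# With detach_generator=True, the consistency loss cannot update whatever
# produced `generator_scores`.
def consistency_loss(extractor_scores, generator_scores, detach_generator=False):
    if detach_generator:
        generator_scores = generator_scores.detach()
    p = F.log_softmax(extractor_scores, dim=-1)   # input must be log-probs
    q = F.softmax(generator_scores, dim=-1)       # target as probabilities
    return F.kl_div(p, q, reduction="batchmean")
```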
Hello! Sorry to bother you again. In Experiment.py, the code uses "no_improvement = self.seq_evaluate_gen(test=False, beam_size=beam_size)" and "if no_improvement and self.iter_num > config.start_decay:" to implement learning-rate decay. But self.seq_evaluate_gen doesn't return anything. I personally set no_improvement = True when the evaluation ROUGE scores don't improve, i.e., no_improvement = False if "metric > self.best_metric". Am I right?
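The suggested fix can be sketched as a small helper that tracks the best metric; the names mirror the attributes mentioned in the question (best_metric), but this is the asker's proposed patch, not verified against Experiment.py.

```python
# Sketch of the proposed fix: return True (no improvement -> decay LR)
# unless the new metric beats the best seen so far.
def update_no_improvement(metric, best_metric):
    if metric > best_metric:
        return False, metric       # improved: keep LR, update best
    return True, best_metric       # no improvement: trigger decay
```

seq_evaluate_gen would then return the first element of this tuple after updating self.best_metric with the second.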
I am trying to reproduce the scores reported in your paper, but my scores are much lower and do not improve much over training iterations.