
genread's People

Contributors

wyu97


genread's Issues

Practical Application

Let's consider an example: "I want to know the tuition fee for the Computer Science department at New York University in 2024."

In this case, the LLM that replaces the retriever cannot generate documents containing the information I need, so the reader will not be able to answer my question. How should I solve this?

Thank you for your attention.

Code for DPR+InstructGPT

Hello Wenhao:

Thanks for the great work, and I really enjoyed the insights you provide in the paper! I am wondering whether there is any code snippet that could help me quickly reproduce the results of DPR+InstructGPT in the zero-shot setting. I assume the prompt should be the same as in inprompts/regular.jsonl, except for the background document?
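A minimal sketch of what such a prompt builder might look like, assuming the same template as the generated-document setting but with DPR-retrieved passages substituted as the background. The function name and template below are hypothetical; the actual format in inprompts/regular.jsonl may differ.

```python
def build_prompt(passages, question):
    """Hypothetical prompt builder: prepend retrieved passages as the
    background document, then append the question (assumed template)."""
    background = "\n".join(passages)
    return f"{background}\n\nQuestion: {question}\nAnswer:"

# Example usage with a single retrieved passage.
prompt = build_prompt(
    ["The Eiffel Tower is located in Paris, France."],
    "Where is the Eiffel Tower?",
)
print(prompt)
```

The same string would then be sent to InstructGPT via the completions endpoint, exactly as in the generated-document pipeline.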

Reproducing table 2

Dear Authors,
Hello, I am a graduate student studying information retrieval from South Korea.
First of all, thank you for sharing your great work.

I am facing difficulty in reproducing the experimental results on NQ data.

I will try to be as brief as possible.

Model used:
M1) GenRead-3B-NQ,

Contexts used:
C1) supervised:clustering (Recall@10: 71.3)
C2) DPR (FiD-distil) (The one provided by the FiD authors from here (https://github.com/facebookresearch/FiD)) (Recall@10: 80.3)
Note that I fixed the number of documents used to 10 (the --n_context argument).

Since we have one model and two contexts, there are two possible combinations, i.e., M1+C1 and M1+C2.

The command I ran from the cloned FiD repo is shown below.
python test_reader.py --model_path {model_path} --eval_data {test_json_path} --per_gpu_batch_size 1 --n_context 10

M1+C1 is reported as 45.6 in table 2, but my experiment came up with 46.2. This seems like a reasonable margin of error.

However, M1+C2 (similar to row 5 in Table 2) came out to be 41.3, which is very different from the reported 50.1.

In summary, FiD-xl produces the same result as the paper when used with generated documents, but the result is very different when used with retrieved documents.
Do you have any suggestions on what I'm doing wrong?

Best regards,
Eunseong

Request for Access to Full Dataset for Each Step

I recently read your paper and was impressed by your work. I am interested in using your dataset for my own research and would like to request access to the full dataset for each step.

I noticed you have already released a test dataset, but I am interested in analyzing the full dataset for each step to save money. Would it be possible to obtain access to this data?

If you have already released the full dataset and I missed it, please let me know where I can find it. If the dataset is not available, I would appreciate any information you can provide about your plans to release it in the future.

Some questions about the proposed clustering-based prompts.

Dear Authors

Thanks for sharing the great work.

For the proposed clustering-based prompts, I have a few questions:
Q1: Step 1 obtains one initial document per question. Why do we need this step? It does not seem to increase diversity.
Q2: In Step 2, the title says "encoding documents" but the paragraph says "encode question-document pairs"; what do you actually encode?
Q3: Step 3 samples K question-document pairs from K clusters; how do you make sure each document contains information relevant to its question?
Q4: So the intuition of this method is: as in in-context learning, provide a "prompt example" for the LLM to generate diverse documents, so that given the same instruction, the LLM will generate documents with high diversity?

Thanks for your attention!

Best,
Dayu
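The three steps asked about above (encode question-document pairs, cluster them, sample one pair per cluster) might be sketched as follows. This is a toy illustration with hand-made 2D embeddings and a naive k-means written from scratch; the paper presumably uses a real text encoder and a library clustering implementation, and all names here are illustrative.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Naive k-means over plain Python lists; returns a cluster id per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign

def sample_one_per_cluster(pairs, embeddings, k, seed=0):
    """Step 3: pick one (question, document) pair from each of the k clusters."""
    rng = random.Random(seed)
    assign = kmeans(embeddings, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [p for p, a in zip(pairs, assign) if a == c]
        if members:
            chosen.append(rng.choice(members))
    return chosen

# Toy data: embeddings of four question-document pairs forming two clusters.
pairs = [("q0", "d0"), ("q1", "d1"), ("q2", "d2"), ("q3", "d3")]
embeddings = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]]
print(sample_one_per_cluster(pairs, embeddings, k=2))
```

The sampled pairs then serve as in-context demonstrations, one per cluster, so each of the K prompts nudges the LLM toward a different "style" of background document.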

About Evaluation metrics

Hi, thanks for your work! After downloading the results you provided for zero-shot setting, I got the following results (a little bit higher than reported):
(screenshot of results omitted)

I just want to know whether this is an improved version, or whether there is something wrong with my installed Python packages. Thanks!

Reproduce table 1

Dear author,

Thank you for your outstanding work! I would like to reproduce the experiments mentioned in Table 1. Could you please guide me on how to train DPR on the FEVER dataset? I noticed that the KILT knowledge source is vast, and the official repository does not provide a DPR checkpoint trained on KILT FEVER.

Besides, could you please share details about "Contriever + InstructGPT"? Did you just load the Contriever checkpoint released at https://github.com/facebookresearch/contriever?

Thank you in advance for your assistance.
