
ambigdocs's Introduction

AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

[Paper] [Homepage] [Dataset]

Introduction

This is the repository for the paper AmbigDocs: Reasoning across Documents on Different Entities under the Same Name.

We introduce AmbigDocs, a benchmark for testing the ability of current LMs to distinguish between confusing entity mentions and to generate a cohesive answer. A single instance consists of a question about an ambiguous entity and a list of gold document-answer pairs, one for each disambiguated entity.

Dataset Contents

Download the data from here and place it under src/data. We additionally use the Wikipedia snapshot from December 20th, 2018: please place its documents (psgs_w100.tsv), which can be downloaded from the DPR repo, in the same directory.

Each data instance consists of question, ambiguous_entity, qid, and documents (a list). Each element of documents consists of title (the disambiguated entity), text, pid (the passage ID referencing psgs_w100.tsv), and answer.
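A minimal loading sketch for the fields listed above. It assumes that a split file such as test.json holds a JSON list of instances, that qid, pid and answer can be treated as strings (answer as a single string), and that psgs_w100.tsv keeps DPR's id/text/title header row. The class and function names (Document, Instance, load_instances, load_passages) are hypothetical and not part of this repository.

```python
import csv
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Document:
    title: str   # the disambiguated entity
    text: str
    pid: str     # passage ID in psgs_w100.tsv (assumed castable to str)
    answer: str  # assumption: a single answer string per document


@dataclass
class Instance:
    qid: str
    question: str
    ambiguous_entity: str
    documents: List[Document]


def parse_instance(raw: dict) -> Instance:
    """Convert one raw JSON record into typed objects (field names from the README)."""
    docs = [
        Document(title=d["title"], text=d["text"], pid=str(d["pid"]), answer=d["answer"])
        for d in raw["documents"]
    ]
    return Instance(
        qid=str(raw["qid"]),
        question=raw["question"],
        ambiguous_entity=raw["ambiguous_entity"],
        documents=docs,
    )


def load_instances(path: str) -> List[Instance]:
    """Assumption: the file is a JSON list of instance records."""
    with open(path, encoding="utf-8") as f:
        return [parse_instance(r) for r in json.load(f)]


def load_passages(tsv_path: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Stream psgs_w100.tsv and keep only the passages whose id is in `wanted`."""
    wanted_ids = set(wanted)
    found: Dict[str, str] = {}
    with open(tsv_path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header row (id, text, title)
        for row in reader:
            if row and row[0] in wanted_ids:
                found[row[0]] = row[1]
    return found
```

Streaming the TSV and filtering by ID avoids holding the full Wikipedia dump in memory when only a subset of passages is needed.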

Setup

pip install -r requirements.txt

For evaluation, please place the necessary LMs under src/models. For generation, please place question_converter-3b and t5_xxl_true_nli_mixture under src/models.

Dataset Generation

For dataset generation, please refer to the src/generation subdirectory.

Evaluation

  1. Executing the command below will run inference on the test split. mode is one of the following: 1: Gold Only, 2: Gold+Retrieved, 3: Retrieved Only, 4: Few-shot. Put the name of the model you are using in model. If the name contains "gpt", pass your OpenAI API key as the last argument; otherwise, pass the model path.

    python qa.py [data_path] [mode] [model] [OpenAI API key/path_to_QA_model]
    
  2. Executing the command below will run the preliminary steps needed to compute the Disambig-F1 score.

    sh df1.sh [mode] [model]
    
  3. Executing the command below will compute the Answer Recall / Entity Recall / Entity-Answer Recall / Disambig-F1 scores.

    python eval.py [mode] [model]
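The recall-style metrics computed by eval.py can be approximated as follows. This is a simplified sketch, not the logic in eval.py: it assumes each gold document contributes one (entity, answer) pair, applies SQuAD-style normalization with whole-word substring matching, and counts an entity-answer hit when both the entity and its answer appear anywhere in the model output. The actual script may handle aliases, require co-occurrence within a sentence, or treat disambiguating parentheticals in titles differently. Disambig-F1, which relies on a QA model, is not covered. The function names are hypothetical.

```python
import re
import string
from typing import Dict, List, Tuple

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def mentions(output: str, span: str) -> bool:
    """Whole-word match of the normalized span inside the normalized output."""
    target = normalize(span)
    return bool(target) and f" {target} " in f" {normalize(output)} "


def recall_scores(output: str, pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """pairs: one (disambiguated entity, gold answer) tuple per gold document (hypothetical format)."""
    if not pairs:
        raise ValueError("need at least one (entity, answer) pair")
    ans_hits = [mentions(output, answer) for _, answer in pairs]
    ent_hits = [mentions(output, entity) for entity, _ in pairs]
    # Simplifying assumption: a pair counts if both its entity and its answer appear anywhere.
    pair_hits = [a and e for a, e in zip(ans_hits, ent_hits)]
    n = len(pairs)
    return {
        "answer_recall": sum(ans_hits) / n,
        "entity_recall": sum(ent_hits) / n,
        "entity_answer_recall": sum(pair_hits) / n,
    }


# Toy example with made-up pairs
output = "Paris, the capital of France, has the Louvre. Paris in Texas has a small Eiffel Tower."
pairs = [("France", "Louvre"), ("Texas", "Big Ben"), ("Ontario", "Eiffel Tower"), ("Kentucky", "Brant County")]
print(recall_scores(output, pairs))
# {'answer_recall': 0.5, 'entity_recall': 0.5, 'entity_answer_recall': 0.25}
```

Under this definition, Entity-Answer Recall can never exceed either Answer Recall or Entity Recall, since it requires both hits for the same pair.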
    

While our study mainly focuses on the Gold Only setting, we also experiment with a retrieved corpus. We use GTR as our retriever, with code taken from the ALCE repo. Please download the necessary pre-computed embeddings and the GTR model, then run retrieval.py:

python retrieval.py [path_to_gtr_wikipedia_index.pkl] [path_to_GTR_model] \
--retriever gtr \
--data_file ../../../data/test.json \
--output_file ../../../data/test_retrieved.json
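For intuition, the core of dense retrieval with a model like GTR is scoring every passage embedding against the question embedding and keeping the top-k by inner product. The pure-Python sketch below illustrates only that scoring step on toy vectors; the function names and inputs are hypothetical, and the ALCE-derived retrieval.py (question encoding, index loading, batched scoring) is not reproduced here. If the embeddings are L2-normalized, the inner product equals cosine similarity.

```python
import heapq
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k_passages(
    query_emb: Sequence[float], passage_embs: Sequence[Sequence[float]], k: int = 5
) -> List[Tuple[int, float]]:
    """Return (passage index, score) for the k highest-scoring passages, best first."""
    scores = [inner_product(query_emb, p) for p in passage_embs]
    best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return [(i, scores[i]) for i in best]


# Toy 2-d "embeddings" (hypothetical), already unit-length
passages = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k_passages([0.0, 1.0], passages, k=2))
# [(1, 1.0), (2, 0.8)]
```

At Wikipedia scale the same computation is normally done as a batched matrix product over the pre-computed embeddings rather than a Python loop.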

Citations

If you find our work helpful, please cite us as

@article{lee2024ambigdocs,
    title={AmbigDocs: Reasoning across Documents on Different Entities under the Same Name},
    author={Lee, Yoonsang and Ye, Xi and Choi, Eunsol},
    journal={arXiv preprint arXiv:2404.12447},
    year={2024}
}


ambigdocs's Issues

Question about evaluation metrics

Hi!

Thank you for putting this dataset together; it is really helpful to our research :)

I want to ask about the method names in the eval.py script:

  • is "R" answer recall, and "Entity" entity recall?
  • is there existing code for the answer-entity recall (EAR) metric you report in the paper?

(I could implement it myself using the two if I'm understanding correctly, but I thought I should ask.)

Thank you!

Best,
Sophia
