
multi-hop-analysis's People

Contributors

dnanhkhoa, xanhho

multi-hop-analysis's Issues

Reproduced results are far from the paper mentioned.

Hi, I used your provided data (https://www.dropbox.com/s/dcrr5m0sxhexr84/2wiki.zip?dl=0) and the provided checkpoints (https://www.dropbox.com/s/b0d65poctqs38w8/checkpoints.zip?dl=0) to reproduce the dev set results. Here are my commands:

python preprocess.py
sh predict_dev_all_settings.sh

And here are my scores on the 2Wiki dev set (using the checkpoint in the 2wiki_3task directory):

{   
    "em": 67.09,
    "f1": 72.84,
    "prec": 72.56,
    "recall": 74.13,
    "sp_em": 78.46,
    "sp_f1": 92.68,
    "sp_prec": 96.28,
    "sp_recall": 90.73,
    "evi_em": 39.86,
    "evi_f1": 71.27,
    "evi_prec": 77.12,
    "evi_recall": 69.37,
    "joint_em": 32.36,
    "joint_f1": 55.15,
    "joint_prec": 59.93,
    "joint_recall": 54.2
}

The Answer task scores (em, f1) and the Evidence task scores (evi_em, evi_f1) are far from the metrics reported in the original paper (screenshot: multihop_analysis_scores).

Did I make a mistake somewhere in my reproduction process?
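When comparing answer-level em/f1 across runs, it helps to know how these numbers are usually computed. Below is a minimal sketch, assuming SQuAD/HotpotQA-style answer normalization (lowercase, strip punctuation, drop English articles, collapse whitespace) and per-example scores averaged and scaled to percentages. It is not taken from this repository; the function names are hypothetical, and it omits the special-casing of yes/no answers that the official HotpotQA script applies, so this repo's evaluator may differ.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_precision_recall(prediction: str, gold: str) -> tuple[float, float, float]:
    """Token-overlap F1, precision and recall on normalized answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return f1, precision, recall


def corpus_em_f1(predictions: dict[str, str], golds: dict[str, str]) -> dict[str, float]:
    """Average per-example EM/F1 over all gold ids, as percentages.

    A missing prediction is scored as an empty string (i.e. 0 for both).
    """
    em_total = f1_total = 0.0
    for qid, gold in golds.items():
        pred = predictions.get(qid, "")
        em_total += exact_match(pred, gold)
        f1_total += f1_precision_recall(pred, gold)[0]
    n = len(golds)
    return {"em": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}
```

For example, `exact_match("The Eiffel Tower!", "eiffel tower")` is 1.0 after normalization, while `f1_precision_recall("the Eiffel Tower", "Eiffel Tower in Paris")` gives precision 1.0 and recall 0.5. Differences in normalization or in how missing predictions are counted can shift em/f1 noticeably, so they are worth checking before comparing against published numbers.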

How do I get the pre-processed 2Wiki data (in more detail)?

Hi @xanhho. You said the data processing is based on HGN, but I see some new attributes in the Example class, such as evidence_ids. I also found it difficult to follow HGN's entire pipeline. If I want to replace the paragraph selection step with a more precise selector, how can I get the pre-processed data quickly?
