alab-nii / multi-hop-analysis

License: Apache License 2.0

Python 98.23% Shell 1.77%

machine-reading-comprehension multi-hop-model multi-hop-reasoning

multi-hop-analysis's People

Contributors

Stargazers

Watchers

Forkers

xanhho anshiquanshu66 dnanhkhoa

multi-hop-analysis's Issues

Reproduced results are far from the paper mentioned.

Hi, I used the data you provided (https://www.dropbox.com/s/dcrr5m0sxhexr84/2wiki.zip?dl=0) and the checkpoints you provided (https://www.dropbox.com/s/b0d65poctqs38w8/checkpoints.zip?dl=0) to reproduce the dev set results. Here are my commands:

python preprocess.py
sh predict_dev_all_settings.sh

And here are my scores on the 2wiki dev set (using the checkpoint in dir 2wiki_3task):

{   
    "em": 67.09,
    "f1": 72.84,
    "prec": 72.56,
    "recall": 74.13,
    "sp_em": 78.46,
    "sp_f1": 92.68,
    "sp_prec": 96.28,
    "sp_recall": 90.73,
    "evi_em": 39.86,
    "evi_f1": 71.27,
    "evi_prec": 77.12,
    "evi_recall": 69.37,
    "joint_em": 32.36,
    "joint_f1": 55.15,
    "joint_prec": 59.93,
    "joint_recall": 54.2
}

The Answer task scores (em, f1) and the Evidence-level task scores (evi_em, evi_f1) are far from the metrics reported in the original paper.

Did I make a mistake somewhere in my reproduction process?
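
To make the gap concrete when reporting it, the reproduced scores can be compared key by key against the paper's numbers. The snippet below is only a sketch: `metric_gaps` is a hypothetical helper, not part of this repository, and the `reference` dict is left empty on purpose because the paper's values are not quoted in this issue and have to be filled in by hand.

```python
from typing import Dict

# Scores copied from the JSON above (2wiki dev set, 2wiki_3task checkpoint).
reproduced: Dict[str, float] = {
    "em": 67.09,
    "f1": 72.84,
    "evi_em": 39.86,
    "evi_f1": 71.27,
}


def metric_gaps(reproduced: Dict[str, float], reference: Dict[str, float]) -> Dict[str, float]:
    """Return (reproduced - reference) for every metric present in both dicts."""
    return {
        name: round(reproduced[name] - reference[name], 2)
        for name in reproduced
        if name in reference
    }


# Assumption: fill this in with the values from the paper's results table.
reference: Dict[str, float] = {}

for name, gap in metric_gaps(reproduced, reference).items():
    print(f"{name}: {gap:+.2f}")
```

With an empty `reference` nothing is printed; once the paper's values are added, each line shows how far the reproduced metric is above or below the reported one.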

How to get the pre-processed data of 2Wiki (in more detail)?

Hi @xanhho. You said the data processing is based on HGN, but I see some new attributes in the Example class, such as evidence_ids. I also found it difficult to follow the entire HGN pipeline. If I want to replace the paragraph selection part with a more precise selection method, how can I get the pre-processed data quickly?
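
To illustrate what swapping in a different paragraph selector could look like on the raw data, here is a minimal sketch. It assumes the raw 2Wiki examples follow the HotpotQA-style layout, where `context` is a list of `[title, sentences]` pairs and `supporting_facts` is a list of `[title, sentence_index]` pairs. `filter_context` and the `keep` callback are hypothetical names, not part of this repository or of HGN, and the sketch does not produce the Example objects (with `evidence_ids`) that the question asks about.

```python
import copy
from typing import Any, Callable, Dict, List


def filter_context(
    example: Dict[str, Any],
    keep: Callable[[str, List[str]], bool],
) -> Dict[str, Any]:
    """Return a copy of `example` whose context holds only the paragraphs `keep` accepts."""
    selected = copy.deepcopy(example)  # leave the original example untouched
    selected["context"] = [
        [title, sentences]
        for title, sentences in selected["context"]
        if keep(title, sentences)
    ]
    return selected


# Toy example in the assumed raw format (contents are made up for illustration).
toy_example = {
    "_id": "toy",
    "question": "placeholder question",
    "context": [["Title A", ["sentence 1"]], ["Title B", ["sentence 2"]]],
    "supporting_facts": [["Title A", 0]],
}

# Oracle selector: keep only paragraphs that contain a gold supporting fact.
gold_titles = {title for title, _ in toy_example["supporting_facts"]}
filtered = filter_context(toy_example, lambda title, sentences: title in gold_titles)
```

The filtered examples would still need to go through the repository's own preprocessing afterwards; a real selector would replace the oracle lambda with a model-based scoring function.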
