
entqa's People

Contributors

dependabot[bot], wenzhengzhang


entqa's Issues

Probability of predicted entities decreases with increase in number of candidates

Hello, thank you for your very interesting work. I noticed that the final probability of the detected entities tends to decrease when I increase the number of candidates, which sometimes makes the result less accurate at larger k because the probability falls below the threshold. Is this normal behavior? Could it be due to the re-ranking function?
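For context on the effect described here: when scores are normalized over a larger candidate pool, the top probability shrinks even if the ranking itself is unchanged. A minimal illustration with a plain softmax (not EntQA's actual reader scoring, just the general effect):

# Minimal illustration (not EntQA's actual scoring): a softmax over more
# candidates spreads probability mass, so the top probability drops even
# when the ranking is unchanged.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

top3 = np.array([5.0, 2.0, 1.5])                  # scores of the top 3 candidates
top50 = np.concatenate([top3, np.full(47, 1.0)])  # same top 3 plus 47 weaker candidates

print(softmax(top3)[0])   # ~0.93
print(softmax(top50)[0])  # ~0.52 -> may now fall below a fixed threshold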

Some issues in data preprocessing

Hi, thanks for your amazing work on EntQA! I am currently adapting your work to a custom dataset, and I found the following issues in the data preprocessing script that may raise errors or cause incorrect behavior.

EntQA/preprocess_data.py

Lines 238 to 242 in 7b3cec5

def char2token(text, index):
    char2token_list = []
    for i, tok in enumerate(text):
        char2token_list += [i] * len(tok.replace("##", ""))
    return char2token_list[index]

This char2token function forgets the blank characters between tokens when building the mapping from characters to tokens. To fix this issue, the length should be char2token_list += [i] * (len(tok.replace("##", "")) + 1).
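For reference, the patched function would then read as follows (a sketch of the fix suggested above; it assumes each token's characters are followed by one blank in the text being indexed):

# Sketch of the patched mapping, following the fix suggested above.
def char2token(text, index):
    char2token_list = []
    for i, tok in enumerate(text):
        # +1 accounts for the blank character after the token
        char2token_list += [i] * (len(tok.replace("##", "")) + 1)
    return char2token_list[index]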

EntQA/preprocess_data.py

Lines 268 to 269 in 7b3cec5

max_ent_length=args.max_ent_len,
pad_to_max_ent_length=True,

This call appears to have two odd parameters, max_ent_length and pad_to_max_ent_length. I guess they should be max_length=args.max_length and padding='max_length'.
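For illustration, with a Hugging Face tokenizer the corrected call would look roughly like this (the model name, input, and lengths are placeholders, not EntQA's actual settings):

# Illustrative Hugging Face tokenizer call with the corrected arguments.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "an example entity description",
    max_length=128,         # instead of max_ent_length=...
    padding="max_length",   # instead of pad_to_max_ent_length=True
    truncation=True,
)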

Hope this helps future developers.

Cannot reproduce the results

Hi, Wenzheng,

Thanks for your great work!
According to Table 1 and Table 3 in your paper, the test F1 and val F1 should be 85.8 and 87.5, respectively, but I got lower results using the trained reader and reader inputs provided in this repo:

Test results:
{"pred_total": 4760, "gold_total": 4485, "strong_correct_num": 3902}
test recall 0.8700 | test precision 0.8197 | test F1 0.8441

Val results:
{"pred_total": 5110, "gold_total": 4791, "strong_correct_num": 4323}
val recall 0.9023 | val precision 0.8460 | val F1 0.8732

May I know if the above results are expected, and how I can reproduce the results in your paper? Thanks!
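For what it's worth, the recall/precision/F1 values above follow directly from the counts in the JSON lines; a quick re-computation in plain Python (strong-match metrics only, not EntQA's evaluation code):

# Recompute strong-match recall / precision / F1 from the reported counts.
def prf(pred_total, gold_total, correct):
    recall = correct / gold_total
    precision = correct / pred_total
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

print(prf(4760, 4485, 3902))  # test: ~(0.8700, 0.8197, 0.8441)
print(prf(5110, 4791, 4323))  # val:  ~(0.9023, 0.8460, 0.8732)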

Questions about the AIDA CoNLL datasets

Hello!
I would like to know how to generate the three files "aida-yago2-dataset-train.tsv", "aida-yago2-dataset-val.tsv" and "aida-yago2-dataset-test.tsv". I can only generate a single dataset file, "AIDA-YAGO2-dataset.tsv", from the website you recommend.
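In case it helps, a rough way to produce three files from the single AIDA-YAGO2-dataset.tsv is to split on the document markers: documents tagged "testa" form the validation set and those tagged "testb" the test set (this is the standard CoNLL split; the output file names and exact convention below are my assumption, not taken from the EntQA repo):

# Sketch: split AIDA-YAGO2-dataset.tsv by the "testa"/"testb" markers
# that appear in the -DOCSTART- lines of the standard CoNLL release.
def split_aida(path="AIDA-YAGO2-dataset.tsv"):
    outs = {
        "train": open("aida-yago2-dataset-train.tsv", "w"),
        "val": open("aida-yago2-dataset-val.tsv", "w"),
        "test": open("aida-yago2-dataset-test.tsv", "w"),
    }
    split = "train"
    with open(path) as f:
        for line in f:
            if line.startswith("-DOCSTART-"):
                if "testa" in line:
                    split = "val"
                elif "testb" in line:
                    split = "test"
                else:
                    split = "train"
            outs[split].write(line)
    for fh in outs.values():
        fh.close()

split_aida()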

The file in gerbil-SpotWrapNifWS4Test/repository

Dear author, I am trying to run the GERBIL evaluation.
In the gerbil-SpotWrapNifWS4Test/pom.xml, I found this:
<!-- Let's use a local repository for the local libraries of this project -->
<repository>
    <id>local repository</id>
    <url>file://${project.basedir}/repository</url>
</repository>
When I tried to configure an experiment, there was an error caused by missing artifacts in this local repository.
The error is as follows:
org.eclipse.aether.resolution.ArtifactResolutionException: The following artifacts could not be resolved: org.aksw:gerbil.nif.transfer:jar:1.1.0-SNAPSHOT, org.restlet:org.restlet:jar:2.2.1, org.restlet:org.restlet.ext.servlet:jar:2.2.1: Could not find artifact org.aksw:gerbil.nif.transfer:jar:1.1.0-SNAPSHOT in local repository (file:/data/luxy/EntQA/gerbil-SpotWrapNifWS4Test/repository)
How can I get the artifacts in the local repository?

Evaluating on GERBIL

How long did you wait for the evaluation of EntQA on AIDA-testb in GERBIL?
I have been waiting for 9 hours without any sentence being transferred to the annotator.
How can I solve this?

Could you post a video on installing and running EntQA?

I tried to set everything up following the instructions, but because I cannot get around the firewall, I keep running into errors at various points. I hope you could publish a tutorial video on Bilibili or another short-video platform to help people who are less familiar with this area run your code. Would that be possible?

ValueError in np.load(args.cands_embeds_path)

Hi,

I encountered an error while running the run_retriever.py script with the downloaded precomputed candidate embeddings.

This is what the program returns:

all_cands_embeds = np.load(args.cands_embeds_path)
  File "/home/m/anaconda3/envs/entqa/lib/python3.8/site-packages/numpy/lib/npyio.py", line 440, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/home/m/anaconda3/envs/entqa/lib/python3.8/site-packages/numpy/lib/format.py", line 787, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 2336706528 into shape (5903531,1024)

Would you be so kind as to help me understand what I am not doing properly? :)
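For context, the shape in the header implies 5,903,531 × 1,024 = 6,045,215,744 values, but only 2,336,706,528 were read, so the file appears truncated (for example, an interrupted download). A quick sanity check, as a sketch that uses only the numbers from the error above (the on-disk dtype and expected file size are unknown, so it just compares element counts):

# Compare the element count implied by the header shape with what was read.
expected_elems = 5903531 * 1024     # 6,045,215,744 values implied by shape (5903531, 1024)
read_elems = 2336706528             # values actually present in the file
print(read_elems / expected_elems)  # ~0.39, i.e. the file is truncated

If the ratio is well below 1, re-downloading the embeddings file (and checking its size or checksum, if one is provided) before calling np.load should resolve the error.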

Some issues in reproducing the GERBIL results

Hello, really nice work!

I have a few questions, corrections, and suggestions for reproducing the GERBIL results.

  1. the "pip -r install requirements.txt" in readme should be "pip install -r requirements.txt"
  2. 'torch=1.7.1' in requirements.txt shoud be 'torch==1.7.1'
  3. in readme, the parameter '--raw_kb_dir' command should be '--raw_kb_path' used by preprocess_data.py
  4. As for gerbil evaluation, the EntQA package uses similar interface to "end2end_neural_el" and "rel". I am trying to use the provided gerbil snapshot to evaluate on "rel" locally, but find the metrics is much lower than its reported values. It shows many wikipedia pages are unknown which should be valid pages. Do you have any ideas ?
  5. Can you explain the actual input/output (an example is greatly appreciated) of the "python gerbil_experiments/server.py" if it is setup.
