
entqa's People

Contributors

dependabot[bot], wenzhengzhang


entqa's Issues

Probability of predicted entities decreases with increase in number of candidates

Hello, thank you for your very interesting work. I noticed that the final probability of the detected entities tends to decrease when I increase the number of candidates, which sometimes makes the result less accurate at larger k because the probability falls below the threshold. Is this normal behavior? Could it be due to the re-ranking function?
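For context on the effect described here: when scores are normalized over a larger candidate pool, the top probability shrinks even if the ranking itself is unchanged. A minimal illustration with a plain softmax (not EntQA's actual reader scoring, just the general effect):

# Minimal illustration (not EntQA's actual scoring): a softmax over more
# candidates spreads probability mass, so the top probability drops even
# when the ranking is unchanged.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

top3 = np.array([5.0, 2.0, 1.5])                  # scores of the top 3 candidates
top50 = np.concatenate([top3, np.full(47, 1.0)])  # same top 3 plus 47 weaker candidates

print(softmax(top3)[0])   # ~0.93
print(softmax(top50)[0])  # ~0.52 -> may now fall below a fixed threshold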

Some issues in data preprocessing

Hi, thanks for your amazing work on EntQA! I am currently adapting your work to a custom dataset, and I found the following issues in the data preprocessing script that may raise errors or cause incorrect behavior.

EntQA/preprocess_data.py

Lines 238 to 242 in 7b3cec5

def char2token(text, index):
    char2token_list = []
    for i, tok in enumerate(text):
        char2token_list += [i] * len(tok.replace("##", ""))
    return char2token_list[index]

This char2token function forgets the blank characters between tokens when building the mapping from characters to tokens. To fix this issue, the length should be char2token_list += [i] * (len(tok.replace("##", "")) + 1).
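For reference, the patched function would then read as follows (a sketch of the fix suggested above; it assumes each token's characters are followed by one blank in the text being indexed):

# Sketch of the patched mapping, following the fix suggested above.
def char2token(text, index):
    char2token_list = []
    for i, tok in enumerate(text):
        # +1 accounts for the blank character after the token
        char2token_list += [i] * (len(tok.replace("##", "")) + 1)
    return char2token_list[index]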

EntQA/preprocess_data.py

Lines 268 to 269 in 7b3cec5

max_ent_length=args.max_ent_len,
pad_to_max_ent_length=True,

This call appears to have two odd parameters, max_ent_length and pad_to_max_ent_length. I guess they should be max_length=args.max_length and padding='max_length'.
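For illustration, with a Hugging Face tokenizer the corrected call would look roughly like this (the model name, input, and lengths are placeholders, not EntQA's actual settings):

# Illustrative Hugging Face tokenizer call with the corrected arguments.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer(
    "an example entity description",
    max_length=128,         # instead of max_ent_length=...
    padding="max_length",   # instead of pad_to_max_ent_length=True
    truncation=True,
)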

Hope this helps future developers.

Cannot reproduce the results

Hi, Wenzheng,

Thanks for your great work!
According to Table 1 and Table 3 in your paper, the test F1 and val F1 should be 85.8 and 87.5, respectively, but I got lower results using the trained reader and reader inputs provided in this repo:

Test results:
{"pred_total": 4760, "gold_total": 4485, "strong_correct_num": 3902}
test recall 0.8700 | test precision 0.8197 | test F1 0.8441

Val results:
{"pred_total": 5110, "gold_total": 4791, "strong_correct_num": 4323}
val recall 0.9023 | val precision 0.8460 | val F1 0.8732

May I know if the above results are expected, and how I can reproduce the results in your paper? Thanks!
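For what it's worth, the recall/precision/F1 values above follow directly from the counts in the JSON lines; a quick re-computation in plain Python (strong-match metrics only, not EntQA's evaluation code):

# Recompute strong-match recall / precision / F1 from the reported counts.
def prf(pred_total, gold_total, correct):
    recall = correct / gold_total
    precision = correct / pred_total
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

print(prf(4760, 4485, 3902))  # test: ~(0.8700, 0.8197, 0.8441)
print(prf(5110, 4791, 4323))  # val:  ~(0.9023, 0.8460, 0.8732)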

Questions about the AIDA CoNLL datasets

Hello!
I would like to know how to generate the three files "aida-yago2-dataset-train.tsv", "aida-yago2-dataset-val.tsv" and "aida-yago2-dataset-test.tsv". I can only generate a single dataset file, "AIDA-YAGO2-dataset.tsv", from the website you recommend.
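In case it helps, a rough way to produce three files from the single AIDA-YAGO2-dataset.tsv is to split on the document markers: documents tagged "testa" form the validation set and those tagged "testb" the test set (this is the standard CoNLL split; the output file names and exact convention below are my assumption, not taken from the EntQA repo):

# Sketch: split AIDA-YAGO2-dataset.tsv by the "testa"/"testb" markers
# that appear in the -DOCSTART- lines of the standard CoNLL release.
def split_aida(path="AIDA-YAGO2-dataset.tsv"):
    outs = {
        "train": open("aida-yago2-dataset-train.tsv", "w"),
        "val": open("aida-yago2-dataset-val.tsv", "w"),
        "test": open("aida-yago2-dataset-test.tsv", "w"),
    }
    split = "train"
    with open(path) as f:
        for line in f:
            if line.startswith("-DOCSTART-"):
                if "testa" in line:
                    split = "val"
                elif "testb" in line:
                    split = "test"
                else:
                    split = "train"
            outs[split].write(line)
    for fh in outs.values():
        fh.close()

split_aida()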

The file in gerbil-SpotWrapNifWS4Test/repository

Dear author, I am trying to run the GERBIL evaluation.
In the gerbil-SpotWrapNifWS4Test/pom.xml, I found this:
<!-- Let's use a local repository for the local libraries of this project -->
<repository>
    <id>local repository</id>
    <url>file://${project.basedir}/repository</url>
</repository>
When I tried to configure an experiment, there was an error caused by missing artifacts in this local repository.
The error is as follows:
org.eclipse.aether.resolution.ArtifactResolutionException: The following artifacts could not be resolved: org.aksw:gerbil.nif.transfer:jar:1.1.0-SNAPSHOT, org.restlet:org.restlet:jar:2.2.1, org.restlet:org.restlet.ext.servlet:jar:2.2.1: Could not find artifact org.aksw:gerbil.nif.transfer:jar:1.1.0-SNAPSHOT in local repository (file:/data/luxy/EntQA/gerbil-SpotWrapNifWS4Test/repository)
How can I get the artifacts in the local repository?

Evaluating on GERBIL

How long did you wait for the evaluation of EntQA on AIDA-testb in GERBIL?
I have been waiting for 9 hours without any sentence being transferred to the annotator.
How can I solve this?

Could you post a video on installing and running EntQA?

I tried to set everything up following the instructions, but because I cannot get around the firewall, I keep running into errors at various points. I hope you could publish a tutorial video on Bilibili or another short-video platform to help people who are less familiar with this area run your code. Would that be possible?

ValueError in np.load(args.cands_embeds_path)

Hi,

I encountered an error while running the run_retriever.py script with the downloaded precomputed candidate embeddings.

This is what the program returns:

all_cands_embeds = np.load(args.cands_embeds_path)
  File "/home/m/anaconda3/envs/entqa/lib/python3.8/site-packages/numpy/lib/npyio.py", line 440, in load
    return format.read_array(fid, allow_pickle=allow_pickle,
  File "/home/m/anaconda3/envs/entqa/lib/python3.8/site-packages/numpy/lib/format.py", line 787, in read_array
    array.shape = shape
ValueError: cannot reshape array of size 2336706528 into shape (5903531,1024)

Would you be so kind as to help me understand what I am not doing properly? :)
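For context, the shape in the header implies 5,903,531 × 1,024 = 6,045,215,744 values, but only 2,336,706,528 were read, so the file appears truncated (for example, an interrupted download). A quick sanity check, as a sketch that uses only the numbers from the error above (the on-disk dtype and expected file size are unknown, so it just compares element counts):

# Compare the element count implied by the header shape with what was read.
expected_elems = 5903531 * 1024     # 6,045,215,744 values implied by shape (5903531, 1024)
read_elems = 2336706528             # values actually present in the file
print(read_elems / expected_elems)  # ~0.39, i.e. the file is truncated

If the ratio is well below 1, re-downloading the embeddings file (and checking its size or checksum, if one is provided) before calling np.load should resolve the error.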

Some issues in reproducing the GERBIL results

Hello, really nice work!

I have a few questions, corrections, and suggestions for reproducing the GERBIL results.

  1. the "pip -r install requirements.txt" in readme should be "pip install -r requirements.txt"
  2. 'torch=1.7.1' in requirements.txt shoud be 'torch==1.7.1'
  3. in readme, the parameter '--raw_kb_dir' command should be '--raw_kb_path' used by preprocess_data.py
  4. As for gerbil evaluation, the EntQA package uses similar interface to "end2end_neural_el" and "rel". I am trying to use the provided gerbil snapshot to evaluate on "rel" locally, but find the metrics is much lower than its reported values. It shows many wikipedia pages are unknown which should be valid pages. Do you have any ideas ?
  5. Can you explain the actual input/output (an example is greatly appreciated) of the "python gerbil_experiments/server.py" if it is setup.
