Comments (14)
@heisenbugfix You've hit a rare instance of Wikipedia/Wikidata deprecating or merging articles. In general this happens when an article is deemed spammy or too similar to another article. My suggestion here is to look up "Aspect of History" (what Q17524420 originally stood for), find out what it used to point to/was used for, and either find a good substitute or remove it altogether.
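One way to check whether a Q-id like Q17524420 is still live, or has been merged into another item, is to ask the Wikidata API directly. This is a rough sketch, not part of deeptype: the classify_qid/check_qid helper names are my own, and the logic assumes the standard wbgetentities response shape (a "missing" key for deleted ids, and a canonical "id" field that differs from the requested id when it was redirected).

```python
import json
from urllib.request import urlopen

API = "https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&props=info&ids="

def classify_qid(response, qid):
    """Given a parsed wbgetentities response, report whether `qid`
    is live, missing, or redirected to another id."""
    entity = response.get("entities", {}).get(qid, {})
    if "missing" in entity:
        return ("missing", None)
    target = entity.get("id", qid)
    if target != qid:
        # API resolved a redirect: the canonical id differs from the one asked for
        return ("redirected", target)
    return ("ok", qid)

def check_qid(qid):
    """Fetch the entity over HTTP and classify it (requires network)."""
    with urlopen(API + qid) as resp:
        return classify_qid(json.load(resp), qid)
```

For a redirected id, the second element of the tuple gives you the replacement Q-id to substitute into your ruleset.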
from deeptype.
I got the same error for HISTORY - "Q309". I searched for this id in wikidata_ids.txt in "data/wikidata/" and it was not found. This might be the cause: if it's not in wikidata_ids.txt, then the item is not captured at all. I followed the exact steps given in README.md and am still getting the above error. @JonathanRaiman - please help!
I have the same problem both when extracting data and when using the human-generated type system:
sh extraction/full_preprocess.sh ${DATA_DIR} en
ends with KeyError for
people Q2472587
export LANGUAGE=fr
export DATA_DIR=data/
python3 extraction/project_graph.py ${DATA_DIR}wikidata/ extraction/classifiers/type_classifier.py
crashes with KeyError for:
Q10855242 (race horse)
Q23038290 (fossil taxon)
enwiki/food
I stopped removing them after this point; I assume it is a deep rabbit hole. Is there an easy way around this? I really want to use this research, but sadly I cannot.
@jcklie I suggest either finding an older Wikidata dump to download (one that no longer faces invalidation issues), or starting your own ruleset/script from scratch. I've had this issue when upgrading to newer Wikidata dumps in the past, and usually there were ~3 broken ids, typically associated with merges of infrequently used Q-ids.
@JonathanRaiman Thank you for your quick response. I have another question about it: why does it also crash with "full_preprocess.sh"? I thought that this script automagically collects all the data from scratch using the newest Wikipedia and Wikidata.
@jcklie The final step of full_preprocess.sh calls fast_link_fixer.py, a step that uses several Q-ids to construct inheritance rules for "fixing" wikipedia links (e.g. changing the counts so that they get grouped in more semantic ways). This step is optional (i.e. the "non-fixed" counts are also compatible with the code). Nonetheless, if errors occur on this step, you can re-run just that script separately from the "full_preprocess.sh" pipeline and replace/update the missing Q-id with its valid new Q-id.
For instance, fast_link_fixer.py has on line 99: PEOPLE = wkd(c, "Q2472587"), and if Q2472587 has vanished, then I would suggest finding other parent classes for an instance of "people" (e.g. Jewish people Q7325 has "nation" and "ethnoreligious group" as possible alternative parents).
Concerning Q2472587 specifically, I'm a bit confused because it still shows up here, so I'm not sure what went wrong in the extraction process. If you can post the traceback/error, that might help track down where/why some Q-ids went missing.
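If you just want the pipeline to survive stale Q-ids rather than crash, one possible workaround (a sketch of my own, not an official fix) is to make the wkd helper in extraction/fast_link_fixer.py tolerant of missing ids, returning None instead of raising:

```python
def wkd(c, name):
    """Look up a Q-id in the collection's name -> index trie.
    Instead of raising KeyError for Q-ids that were merged or deleted
    in newer Wikidata dumps, warn and return None so callers can skip
    the rules built from them."""
    try:
        return c.name2index[name]
    except KeyError:
        print("warning: %s missing from this Wikidata dump, skipping." % name)
        return None
```

Downstream, any rule built from a Q-id that came back None would then need a guard (e.g. `if PEOPLE is not None: ...`); whether silently dropping those rules measurably changes the resulting type system is something you'd want to verify yourself.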
I ran sh extraction/full_preprocess.sh ${DATA_DIR} en
The end of the output is:
Construct mapping 100% (254777507 lines) |######################|Time: 1:04:32
loaded trie
61455967/254777507 anchor_tags could not be found in wikidata
3264/254777507 anchor_tags links were malformed/too long
Missing anchor_tags sample:
Anarchism -> Anarchism
self-governed -> Self-governance
hierarchies -> Hierarchy
stateless societies -> Stateless society
anarcho-capitalism -> anarcho-capitalism
anarchist legal philosophy -> Anarchist law
anti-authoritarian interpretations -> Libertarian socialism
collectivism -> Collectivist anarchism
individualism -> individualism
social -> Social anarchism
/home/klie/entity-linking/deeptype/venv/lib64/python3.6/site-packages/wikidata_linker_utils/type_collection.py:351: UserWarning: Node 'Q3679160' under `bad_node` is not a known wikidata id.
...
loading wikidata id -> index
done
Traceback (most recent call last):
File "extraction/fast_link_fixer.py", line 594, in <module>
main()
File "extraction/fast_link_fixer.py", line 456, in main
initialize_globals(c)
File "extraction/fast_link_fixer.py", line 99, in initialize_globals
PEOPLE = wkd(c, "Q2472587")
File "extraction/fast_link_fixer.py", line 72, in wkd
return c.name2index[name]
File "/home/klie/entity-linking/deeptype/venv/lib64/python3.6/site-packages/wikidata_linker_utils/wikidata_ids.py", line 20, in __getitem__
value = self.marisa[key]
File "marisa_trie.pyx", line 462, in marisa_trie.BytesTrie.__getitem__ (src/marisa_trie.cpp:8352)
KeyError: 'Q2472587'
I have the same problem and I tried looking at the constructed trie. It looks like a lot of category links and anchor tags are missing:
32644079/33088413 category links could not be found in wikidata
85/33088413 category links were malformed
Missing links sample:
'enwiki/Anarchism' -> 'enwiki/Category:Anarchism'
'enwiki/Anarchism' -> 'enwiki/Category:Anti-capitalism'
'enwiki/Anarchism' -> 'enwiki/Category:Anti-fascism'
'enwiki/Anarchism' -> 'enwiki/Category:Far-left politics'
'enwiki/Anarchism' -> 'enwiki/Category:Libertarian socialism'
'enwiki/Anarchism' -> 'enwiki/Category:Political culture'
'enwiki/Anarchism' -> 'enwiki/Category:Political ideologies'
'enwiki/Anarchism' -> 'enwiki/Category:Social theories'
'enwiki/Autism' -> 'enwiki/Category:Autism'
'enwiki/Autism' -> 'enwiki/Category:Articles containing video clips'
162759256/255575773 anchor_tags could not be found in wikidata
3286/255575773 anchor_tags links were malformed/too long
Missing anchor_tags sample:
Anarchism -> Anarchism
anti-authoritarian -> anti-authoritarian
political philosophy -> political philosophy
self-governed -> Self-governance
cooperative -> cooperative
hierarchies -> Hierarchy
stateless societies -> Stateless society
free associations -> Free association (communism and anarchism)
state -> State (polity)
far-left -> Far-left politics
Most of the entities in fast_link_fixer.py are not in the trie but are still available on Wikidata. It seems fine to me to just ignore them, but I'm not sure whether it will affect the results?
Also, it may be a stupid question, but I couldn't find the place to download the same Wiki dump as mentioned in the paper. Can you point me to it?
All the best.
Hi. Can you tell me how to map a Qxxx id to an entity using the Wikidata dump we download? I can't find any file that maps these things. Please help. Thanks.
def load_aucs():
paths = [
"/home/jonathanraiman/en_field_auc_w10_e10.json",
"/home/jonathanraiman/en_field_auc_w10_e10-s1234.json",
"/home/jonathanraiman/en_field_auc_w5_e5.json",
"/home/jonathanraiman/en_field_auc_w5_e5-s1234.json"
]
Where do these come from?
Could you please tell us what dump file you used in your paper? We failed at this step, so we cannot do the next step. Thank you very much.
I have the same issue with "Q2472587". Did anyone fix it?
I am having the same problem. Did anyone manage to fix it?
Same problem here with Q20871948
Related Issues (20)
- Getting NaN loss after few epochs ~30 HOT 1
- Evaluate Learnability does not give output graph as per the jupyter notebook LearnabilityStudy.ipynb
- FileNotFoundError: [Errno 2] No such file or directory: 'data/location_classification/classes.txt' HOT 1
- Getting coverage for only 6 million wikidata items in evolved type HOT 4
- How can I find associated types of an entity ? HOT 1
- how much disk for decompress the file latest-all.json.bz2 HOT 9
- No space left on device error when running full_preprocess.sh HOT 2
- where do you get candidate entities for a mention from? HOT 7
- How to write the function classify() in extraction/type_classifier.py? HOT 2
- Where is this file, or from which file and step it was generated? HOT 1
- How can i use an evolved type system only? HOT 1
- alpha in equation and how to select parameters in training type classifier HOT 1
- Compatibility with Tensorflow 1.14 HOT 2
- Train in a custom dataset HOT 4
- Is there a pre-trained, plug-n-play model that we can play with? HOT 5
- Use Python 3.6 and Cython 0.26 to install wikidata_linker_utils HOT 1
- FileNotFoundError: [Errno 2] No such file or directory: 'data/wikidata/wikidata_wikititle2wikidata.tsv' HOT 1
- Issue replicating accuracy of 0.98 HOT 2
- TypeError: 'NoneType' object is not iterable when running full_preprocess.sh HOT 3
- load_wikidata_ids tries to build a marisatrie.RecordTrie. Python runs out of memory.