Comments (14)

JonathanRaiman commented on August 15, 2024

@heisenbugfix You've hit a rare instance of Wikipedia/Wikidata deprecating or merging articles. In general, this happens when an article is deemed spammy or too similar to another article. My suggestion is to look up "Aspect of History" (what Q17524420 originally stood for), find out what it used to point to or was used for, and then either find a good substitute or remove it altogether.

from deeptype.

heisenbugfix commented on August 15, 2024

I got the same error for HISTORY ("Q309"). I searched for this id in wikidata_ids.txt in "data/wikidata/" and it was not found. That is probably the cause: if the id is not in wikidata_ids.txt, the item is not captured at all. I followed the exact steps given in README.md and am still getting the above error. @JonathanRaiman, please help!
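
To check several ids at once instead of searching by hand, something like the sketch below might help. It assumes wikidata_ids.txt holds one Wikidata id per line, which is an assumption about the file layout and not something confirmed in this thread; `find_missing_ids` is a hypothetical helper, not part of deeptype.

```python
from pathlib import Path


def find_missing_ids(ids_path, wanted):
    """Return the ids from `wanted` that never appear in the ids file.

    Assumption: the file stores one Wikidata id per line.
    """
    remaining = set(wanted)
    with Path(ids_path).open("rt", encoding="utf-8") as fin:
        for line in fin:
            # Drop each id as soon as it is seen in the file.
            remaining.discard(line.strip())
            if not remaining:
                break  # everything was found, no need to read further
    return sorted(remaining)
```

Calling `find_missing_ids("data/wikidata/wikidata_ids.txt", ["Q309", "Q17524420"])` would then list whichever of the two ids are absent from the extracted dump.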

jcklie commented on August 15, 2024

I have the same problem both when I extract the data and when I use the human-generated type system.

Running

sh extraction/full_preprocess.sh ${DATA_DIR} en

ends with a KeyError for:

people Q2472587

Running

export LANGUAGE=fr
export DATA_DIR=data/
python3 extraction/project_graph.py ${DATA_DIR}wikidata/ extraction/classifiers/type_classifier.py

crashes with a KeyError for:

Q10855242 (race horse)
Q23038290 (fossil taxon)
enwiki/food

I stopped removing entries after this point, since I assume it is a deep rabbit hole. Is there an easy way around this? I really want to use this research, but sadly I cannot.

JonathanRaiman commented on August 15, 2024

@jcklie I suggest either downloading an older Wikidata dump (which does not face these invalidation issues) or writing your own ruleset/script from scratch. I've had this issue in the past when upgrading to newer Wikidata dumps; usually about 3 Q-ids were broken, typically because of merges of infrequently used Q-ids.

jcklie commented on August 15, 2024

@JonathanRaiman Thank you for your quick response. I have another question about it: why does it also crash with "full_preprocess.sh"? I thought that this script automagically collects all the data from scratch while using the newest Wikipedia and Wikidata.

JonathanRaiman commented on August 15, 2024

@jcklie The final step of full_preprocess calls fast_link_fixer.py, a step that uses several Q-ids to construct inheritance rules for "fixing" Wikipedia links (i.e. changing the counts so that they get grouped in more semantic ways). This step is optional (the "non-fixed" counts are also compatible with the code). Nonetheless, if errors occur at this step, you can re-run just that script separately from the "full_preprocess.sh" pipeline and replace the missing Q-id with its valid new Q-id.

For instance, fast_link_fixer.py has on line 99: PEOPLE = wkd(c, "Q2472587"). If Q2472587 has vanished, I would suggest finding other parent classes for an instance of "people" (e.g. Jewish people, Q7325, has "nation" and "ethnoreligious group" as possible alternative parents).

Concerning Q2472587 specifically, I'm a bit confused because it still shows up here, so I'm not sure what went wrong in the extraction process. If you can post the traceback/error, that might help track down where and why some Q-ids went missing.
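
As a rough illustration of the "replace the missing Q-id" idea, here is a minimal sketch of a more forgiving lookup. It is not the code in fast_link_fixer.py: `wkd_or_none` and the `replacements` table are hypothetical names, and the only thing assumed about the index is that it behaves like a mapping that raises KeyError for unknown ids.

```python
import warnings


def wkd_or_none(name2index, name, replacements=None):
    """Look up a Q-id, trying a hand-picked replacement before giving up.

    `name2index` is any mapping from Q-id to index that raises KeyError
    for unknown ids. `replacements` maps a retired Q-id to the id chosen
    to stand in for it (hypothetical table, filled in by hand).
    """
    replacements = replacements or {}
    for candidate in (name, replacements.get(name)):
        if candidate is None:
            continue  # no replacement registered for this id
        try:
            return name2index[candidate]
        except KeyError:
            continue  # try the next candidate, if any
    warnings.warn("%s is not in this dump and has no usable replacement" % name)
    return None
```

A caller would still have to skip any rule built from an id that comes back as None; whether the rest of the script tolerates that has not been checked here.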

jcklie commented on August 15, 2024

@JonathanRaiman

I ran sh extraction/full_preprocess.sh ${DATA_DIR} en

The end of the output is:

Construct mapping 100% (254777507 lines) |######################|Time:  1:04:32
loaded trie
61455967/254777507 anchor_tags could not be found in wikidata
3264/254777507 anchor_tags links were malformed/too long
Missing anchor_tags sample:
    Anarchism -> Anarchism
    self-governed -> Self-governance
    hierarchies -> Hierarchy
    stateless societies -> Stateless society
    anarcho-capitalism -> anarcho-capitalism
    anarchist legal philosophy -> Anarchist law
    anti-authoritarian interpretations -> Libertarian socialism
    collectivism -> Collectivist anarchism
    individualism -> individualism
    social -> Social anarchism
/home/klie/entity-linking/deeptype/venv/lib64/python3.6/site-packages/wikidata_linker_utils/type_collection.py:351: UserWarning: Node 'Q3679160' under `bad_node` is not a known wikidata id.
  el

...

loading wikidata id -> index
done
Traceback (most recent call last):
  File "extraction/fast_link_fixer.py", line 594, in <module>
    main()
  File "extraction/fast_link_fixer.py", line 456, in main
    initialize_globals(c)
  File "extraction/fast_link_fixer.py", line 99, in initialize_globals
    PEOPLE = wkd(c, "Q2472587")
  File "extraction/fast_link_fixer.py", line 72, in wkd
    return c.name2index[name]
  File "/home/klie/entity-linking/deeptype/venv/lib64/python3.6/site-packages/wikidata_linker_utils/wikidata_ids.py", line 20, in __getitem__
    value = self.marisa[key]
  File "marisa_trie.pyx", line 462, in marisa_trie.BytesTrie.__getitem__ (src/marisa_trie.cpp:8352)
KeyError: 'Q2472587'

dungtn commented on August 15, 2024

I have the same problem and I tried looking at the constructed trie. It looks like a lot of category links and anchor tags are missing:

32644079/33088413 category links could not be found in wikidata
85/33088413 category links were malformed
Missing links sample:
'enwiki/Anarchism' -> 'enwiki/Category:Anarchism'
'enwiki/Anarchism' -> 'enwiki/Category:Anti-capitalism'
'enwiki/Anarchism' -> 'enwiki/Category:Anti-fascism'
'enwiki/Anarchism' -> 'enwiki/Category:Far-left politics'
'enwiki/Anarchism' -> 'enwiki/Category:Libertarian socialism'
'enwiki/Anarchism' -> 'enwiki/Category:Political culture'
'enwiki/Anarchism' -> 'enwiki/Category:Political ideologies'
'enwiki/Anarchism' -> 'enwiki/Category:Social theories'
'enwiki/Autism' -> 'enwiki/Category:Autism'
'enwiki/Autism' -> 'enwiki/Category:Articles containing video clips'

162759256/255575773 anchor_tags could not be found in wikidata
3286/255575773 anchor_tags links were malformed/too long
Missing anchor_tags sample:
Anarchism -> Anarchism
anti-authoritarian -> anti-authoritarian
political philosophy -> political philosophy
self-governed -> Self-governance
cooperative -> cooperative
hierarchies -> Hierarchy
stateless societies -> Stateless society
free associations -> Free association (communism and anarchism)
state -> State (polity)
far-left -> Far-left politics

Most of the entities in fast_link_fixer.py are not in the trie but are still available on Wikidata. Ignoring them seems fine to me, but I'm not sure whether that will affect the results.
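
For anyone who wants to list which ids the script refers to before checking them against the trie, a small sketch follows. It only recognizes calls written in the form seen in the traceback, wkd(c, "Q..."), and `referenced_qids` is a hypothetical helper, not something shipped with deeptype.

```python
import re

# Matches calls shaped like wkd(c, "Q2472587"); other ways of referring
# to a Q-id in the script are not detected by this pattern.
WKD_CALL = re.compile(r'wkd\(\s*\w+\s*,\s*"(Q\d+)"\s*\)')


def referenced_qids(source_text):
    """Return the Q-ids passed to wkd(...), in order of first appearance."""
    seen = []
    for qid in WKD_CALL.findall(source_text):
        if qid not in seen:
            seen.append(qid)
    return seen
```

Reading extraction/fast_link_fixer.py into a string and passing it to `referenced_qids` gives the list of ids to look up in the constructed index.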

Also, this may be a stupid question, but I couldn't find where to download the Wiki dump mentioned in the paper. Can you point me to it?

All the best.

linhlt2689 commented on August 15, 2024

Hi. Can you tell me how to map a Q-id (Qxxx) to an entity using the Wikidata dump we download? I can't find any file that maps these. Please help. Thanks.

muscleSunFlower commented on August 15, 2024

def load_aucs():
    paths = [
        "/home/jonathanraiman/en_field_auc_w10_e10.json",
        "/home/jonathanraiman/en_field_auc_w10_e10-s1234.json",
        "/home/jonathanraiman/en_field_auc_w5_e5.json",
        "/home/jonathanraiman/en_field_auc_w5_e5-s1234.json"
    ]

Where do these files come from?

Lavine24 commented on August 15, 2024

Could you please tell us which dump file you used in your paper? We failed at this step and so cannot proceed to the next one. Thank you very much.

lbozarth commented on August 15, 2024

I have the same issue with "Q2472587". Did anyone fix it?

commented on August 15, 2024

I have the same issue with "Q2472587" did anyone fix it?

I am having the same problem. Did anyone manage to fix it??

zbeloki commented on August 15, 2024

Same problem here with Q20871948
