Comments (7)

ewheeler commented on June 13, 2024

Good suggestion @ppKrauss

Is there a listing of these stable country pages on Wikidata? I have not found a listing or category for them, or a way to crawl or fetch them all programmatically.

from country-codes.

ppKrauss commented on June 13, 2024

Hi @ewheeler, thanks! I will check the best strategy next week. There are two ways:

  1. Use a list of countries on Wikipedia as the source, parsing it with a small adaptation of this wikitext2CSV script. Audit advantage: the list is human-readable and audited by the English Wikipedia community.

  2. Use SPARQL and trust only Wikidata, looking for all instances of Q6256 (country). Or use some trusted algorithm from DBpedia (acting as Wikidata curators) to get the list.

Item 2 is the ideal solution, since it generates the CSV automatically.

ppKrauss commented on June 13, 2024

Testing the solution from item 2:

SELECT ?item ?itemLabel 
WHERE {
  ?item wdt:P31 wd:Q6256.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Run this query here and download the result as CSV to check the JOIN.

Perhaps better: a CSV with only the Wikidata ID and two-letter country code columns:

SELECT * 
WHERE {
  ?item wdt:P297 ?code
} ORDER BY ?code

here.
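
The second query can also be run without the web interface. The following is a minimal sketch, assuming the public endpoint at https://query.wikidata.org/sparql and CSV output requested through an `Accept: text/csv` header. The function names and the User-Agent string are illustrative and not part of the repository, and the query names its two columns explicitly instead of using `SELECT *`.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumed public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?item ?code
WHERE {
  ?item wdt:P297 ?code
} ORDER BY ?code
"""


def build_request(query: str) -> urllib.request.Request:
    """Build a GET request asking the endpoint for CSV output."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "text/csv",
        # Hypothetical identifier; use one that describes your own tool.
        "User-Agent": "country-codes-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def parse_rows(csv_text: str) -> list[tuple[str, str]]:
    """Turn the CSV response into (wikidata_id, iso2_code) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Items come back as full entity URIs; keep only the trailing Q-id.
        wd_id = row["item"].rsplit("/", 1)[-1]
        rows.append((wd_id, row["code"]))
    return rows


def fetch_country_items() -> list[tuple[str, str]]:
    """Run the query over the network and return the parsed pairs."""
    with urllib.request.urlopen(build_request(QUERY)) as response:
        return parse_rows(response.read().decode("utf-8"))
```

Calling `fetch_country_items()` needs network access. `parse_rows` is kept separate so the parsing step can be checked offline against a saved CSV download.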

ppKrauss commented on June 13, 2024

Migration problem

Hi @ewheeler, can you help check the cause of the errors at https://github.com/ppKrauss/country-codes ?
The dataset looks good, but running goodtables datapackage.json in the terminal says it is not.

Wikidata minor problem

I am using SQL to check and JOIN... The JOIN is:

  SELECT  c.*, w.item as "wdId" 
  FROM dataset.vw_country_codes c LEFT JOIN wikidata_country w 
    ON w.code=c.iso3166_1_alpha_2 AND c.iso3166_1_alpha_2 IS NOT NULL 
    AND w.item NOT IN ('Q165783', 'Q2895', 'Q1249802', 'Q29999', 'Q407199', 'Q838261')

Only Namibia and Sark end up with a null wdId.

item      code  action
Q165783   BQ    delete
Q27561    BQ    preserve
Q2895     BY    delete
Q184      BY    preserve
Q1249802  FK    delete
Q9648     FK    preserve
Q29999    NL    delete
Q55       NL    preserve
Q407199   PS    delete
Q219060   PS    preserve
Q838261   YU    delete
Q83286    YU    preserve

The duplicate pairs arise because Wikidata also has records for "grouping nations", such as "Kingdom of the Netherlands" in the NL pair.
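
For anyone doing the join outside SQL, the same filter-then-join logic can be sketched in Python. This is an illustration only: the function name and the row shapes are assumptions, while the excluded item IDs are the ones from the NOT IN list in the query. Unlike the SQL LEFT JOIN, which would return one row per match, this sketch raises an error when an ISO code still maps to two items after filtering, so a new duplicate gets noticed.

```python
# Items that duplicate an ISO code and are dropped (the "delete" rows).
EXCLUDED_ITEMS = frozenset(
    {"Q165783", "Q2895", "Q1249802", "Q29999", "Q407199", "Q838261"}
)


def left_join_wd_ids(countries, wikidata_rows, excluded=EXCLUDED_ITEMS):
    """Attach a wdId to each country row, or None when no item matches.

    countries: list of dicts with an "iso3166_1_alpha_2" key (may be None).
    wikidata_rows: iterable of (item, code) pairs from the Wikidata export.
    """
    by_code = {}
    for item, code in wikidata_rows:
        if item in excluded:
            continue
        if code in by_code and by_code[code] != item:
            raise ValueError(
                f"code {code} maps to both {by_code[code]} and {item}"
            )
        by_code[code] = item

    joined = []
    for row in countries:
        code = row.get("iso3166_1_alpha_2")
        # Rows without a code never match, mirroring the IS NOT NULL condition.
        wd_id = by_code.get(code) if code else None
        joined.append({**row, "wdId": wd_id})
    return joined
```

With the NL pair from the table, a country row with code NL receives Q55, because Q29999 is filtered out before the lookup is built.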

ppKrauss commented on June 13, 2024

Hi @ewheeler, sorry for coming back so late. The problems are now solved and everything is automatic.

I am submitting pull request 65 to add sh wd_countries.sh to your makefile.

I suppose you would prefer to adapt your Python scripts to do the join and add a new column, wd_id. You can join the tables on iso2_code=ISO3166-1-Alpha-2.

Only Sark is missing, because it has no iso2_code, but you can add it as Q3405693.

Wikidata has persistent IDs (it's safe!), so the rule of thumb is to preserve the older Wikidata ID (wd_id) of a country when somebody tries to duplicate it by editing Wikidata. For "future new nations", the rule is to check the Wikidata item at the stable English Wikipedia page. The "manual filter" is the grep line in wd_countries.sh, and it is cumulative.
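
The rule of thumb can be expressed as a small merge step. The sketch below is only one possible reading of it, not the logic of wd_countries.sh: the function name is hypothetical, and "older" is taken to mean the wd_id already recorded in the dataset.

```python
def merge_wd_ids(existing, fresh):
    """Merge a fresh Wikidata export into the recorded code -> wd_id mapping.

    existing: dict of iso2_code -> wd_id already stored in the dataset.
    fresh: iterable of (iso2_code, wd_id) pairs from a new export.

    A code that already has a wd_id keeps it. Any different ID offered for
    that code is reported as a conflict, so it can be added to the manual
    filter. Codes seen for the first time are accepted as-is and should be
    checked by hand against the English Wikipedia page.
    """
    merged = dict(existing)
    conflicts = []
    for code, wd_id in fresh:
        if code not in merged:
            merged[code] = wd_id
        elif merged[code] != wd_id:
            conflicts.append((code, merged[code], wd_id))
    return merged, conflicts
```

For example, `merge_wd_ids({"NL": "Q55"}, [("NL", "Q29999")])` keeps Q55 for NL and reports `("NL", "Q55", "Q29999")` as a conflict.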

valerio-bozzolan commented on June 13, 2024

What is the blockage at the moment? Is any help needed on this? :) Thank you so much!

rufuspollock commented on June 13, 2024

@valerio-bozzolan A PR to add this is welcome.
