Comments (7)
Good suggestion @ppKrauss
Is there any listing of these stable country pages on Wikidata? I've not found a listing/category for these, or a way to crawl/fetch them all programmatically.
from country-codes.
Hi @ewheeler, thanks (!), I will check the best strategy next week. There are two ways:
1. Use a list of countries at Wikipedia as the source, parsing it with a small adaptation of this wikitext2CSV script. Audit advantages: it is human-readable and audited by the English Wikipedia community.
2. Use SPARQL and trust only Wikidata, looking for all instances of Q6256... Or use some trusted DBpedia (as Wikidata curators) algorithm to get it.
Item 2 is the ideal solution and generates an automatic CSV.
Testing the solution of item 2:
SELECT ?item ?itemLabel
WHERE {
?item wdt:P31 wd:Q6256.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Run this query here and download the result as CSV to check the JOIN.
Perhaps better! A CSV with only Wikidata-ID and 2-letter-country-code columns:
SELECT *
WHERE {
?item wdt:P297 ?code
} ORDER BY ?code
here.
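For anyone who prefers to script the download rather than use the web form, here is a minimal Python sketch of the same P297 query. It assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its CSV output via the Accept header; the function names and the User-Agent string are placeholders of mine, not part of this repository.

```python
import csv
import io
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service

QUERY = """
SELECT ?item ?code
WHERE { ?item wdt:P297 ?code }
ORDER BY ?code
"""


def build_request(query):
    """Build a GET request that asks the endpoint for CSV output."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "text/csv",
        # Placeholder User-Agent; replace it with your own contact details.
        "User-Agent": "country-codes-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def parse_rows(csv_text):
    """Turn the CSV result into (wikidata_id, iso2_code) pairs."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The service returns full entity URIs; keep only the trailing Q-id.
        qid = row["item"].rsplit("/", 1)[-1]
        rows.append((qid, row["code"]))
    return rows


def fetch_pairs():
    """Run the query over the network (only when called explicitly)."""
    with urllib.request.urlopen(build_request(QUERY)) as resp:
        return parse_rows(resp.read().decode("utf-8"))
```

Calling fetch_pairs() should give the same two columns as the CSV download, with the entity URI shortened to the bare Q-id so it can be joined against other tables.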
Migration problem
Hi @ewheeler, can you help check the cause of the errors at https://github.com/ppKrauss/country-codes ?
The dataset is good, but running goodtables datapackage.json in the terminal says that it is not.
Wikidata minor problem
I am using SQL to check and JOIN... The JOIN is:
SELECT c.*, w.item as "wdId"
FROM dataset.vw_country_codes c LEFT JOIN wikidata_country w
ON w.code=c.iso3166_1_alpha_2 AND c.iso3166_1_alpha_2 IS NOT NULL
AND w.item NOT IN ('Q165783', 'Q2895', 'Q1249802', 'Q29999', 'Q407199', 'Q838261')
The wdId nulls are for Namibia and Sark only.
item | code | action |
---|---|---|
Q165783 | BQ | delete |
Q27561 | BQ | preserve |
Q2895 | BY | delete |
Q184 | BY | preserve |
Q1249802 | FK | delete |
Q9648 | FK | preserve |
Q29999 | NL | delete |
Q55 | NL | preserve |
Q407199 | PS | delete |
Q219060 | PS | preserve |
Q838261 | YU | delete |
Q83286 | YU | preserve |
The duplicated pairs come from Wikidata's records for "grouping nations", such as "Kingdom of the Netherlands" in the NL pair.
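The same check can be sketched in Python for readers without a SQL database at hand. This is only an illustration of the LEFT JOIN above, assuming the country rows are dictionaries keyed by iso3166_1_alpha_2 and the Wikidata rows are (item, code) pairs; the function name add_wd_id is hypothetical.

```python
# Wikidata items to skip, copied from the "delete" rows of the table above.
EXCLUDED = {"Q165783", "Q2895", "Q1249802", "Q29999", "Q407199", "Q838261"}


def add_wd_id(countries, wikidata_pairs, excluded=EXCLUDED):
    """Left-join country rows with (item, code) pairs on the alpha-2 code."""
    by_code = {}
    for item, code in wikidata_pairs:
        if item not in excluded:
            by_code[code] = item

    joined = []
    for row in countries:
        code = row.get("iso3166_1_alpha_2")
        # Rows without a code, or without a match, get wdId = None.
        joined.append({**row, "wdId": by_code.get(code) if code else None})
    return joined


# Usage with two of the duplicated pairs from the table:
pairs = [("Q29999", "NL"), ("Q55", "NL"), ("Q2895", "BY"), ("Q184", "BY")]
countries = [{"iso3166_1_alpha_2": "NL"}, {"iso3166_1_alpha_2": "BY"}]
for row in add_wd_id(countries, pairs):
    print(row["iso3166_1_alpha_2"], row["wdId"])
```

Unlike the SQL join, the dictionary keeps a single item per code, so any duplicate that is not yet in the exclusion set would be silently collapsed instead of producing an extra row; checking for repeated codes before the join would make that visible.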
Hi @ewheeler, sorry for coming back so late ... Now the problems are solved, and everything is automatic.
Submitting pull request 65 to add sh wd_countries.sh in your makefile.
Supposing that you prefer to adapt your Python scripts to the join, there is a new column wd_id. You can join the tables on iso2_code=ISO3166-1-Alpha-2.
Only Sark is not there, because it has no iso2_code, but you can add it as Q3405693.
Wikidata has persistent IDs (it's safe!), so the rule of thumb is to preserve the older Wikidata ID (wd_id) of a country when somebody tries to duplicate it by editing Wikidata. For "future new nations" the rule is to check the Wikidata item at the stable English Wikipedia page. The "manual filter" is the grep line at wd_countries.sh, and it is cumulative.
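The "preserve the older ID" rule can be roughly approximated in code. The sketch below is not the grep filter from wd_countries.sh (which is not shown in this thread); it assumes that a smaller numeric Q-id means an older item, which is my assumption, and the function name is hypothetical. On the duplicated pairs listed earlier in this thread, it selects the same items that were marked "preserve".

```python
def oldest_item_per_code(pairs):
    """Keep, for each code, the item with the smallest numeric Q-id.

    Assumption: smaller Q-id numbers were created earlier on Wikidata.
    """
    best = {}
    for item, code in pairs:
        number = int(item[1:])  # "Q55" -> 55
        if code not in best or number < int(best[code][1:]):
            best[code] = item
    return best


# (item, code) pairs taken from the duplicate table earlier in this thread.
PAIRS = [
    ("Q165783", "BQ"), ("Q27561", "BQ"),
    ("Q2895", "BY"), ("Q184", "BY"),
    ("Q1249802", "FK"), ("Q9648", "FK"),
    ("Q29999", "NL"), ("Q55", "NL"),
    ("Q407199", "PS"), ("Q219060", "PS"),
    ("Q838261", "YU"), ("Q83286", "YU"),
]

preserved = oldest_item_per_code(PAIRS)
```

A heuristic like this is best used to flag candidates for the manual filter rather than to replace it, since a newly created duplicate is only one of the ways two items can end up sharing a code.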
What is the blockage at the moment? Is any help needed on this? :) Thank you so much!
@valerio-bozzolan PR is welcome to add this.