datasets / country-codes

Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others
Home Page: https://datahub.io/core/country-codes
The currency for Latvia is out of date; it has been EUR since 2014-01-01.
Formal and official lists that define the usual "spatial location" and "spatial relations":

- BO (Bolivia) has the UTM cells {19L,20L,19K,20K,21K}
- BO is bordered by Argentina, Brazil, ..., that is, the set {AR,BR,CL,PE,PY}. We can adopt as "formal official neighbor" the _Touches_ operator of DE-9IM, which is an OGC SQL standard.

Just wondering if https://github.com/lipis/flag-icon-css can be merged in some way...
I suggest using the Public Domain Dedication and License (PDDL). For an example datapackage.json for this, see http://data.okfn.org/tools/dp/create.json
The link in the repository description (http://data.okfn.org/data/country-codes) now redirects to https://datahub.io/
Should update the link to either
or
https://datahub.io/core/country-codes (however this one does not seem to update automatically to reflect repo changes)
BTW: what would long names be?
For the above dataset, metadata points to:
http://data.okfn.org/tools/country-codes/datapackage.json
Cannot GET /tools/country-codes/datapackage.json
Namibia's two-character ISO code is 'NA', but it is missing in the data.
Probably, pandas read_csv was used somewhere in the processing, which interprets 'NA' as missing by default (use keep_default_na=False to prevent this).
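The pitfall (and the fix) can be reproduced with a short pandas sketch; the CSV snippet below is illustrative, not the real file:

```python
import io
import pandas as pd

csv_text = "code,name\nNA,Namibia\nUS,United States\n"

# Default behaviour: pandas treats the string 'NA' as a missing value (NaN).
df_default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False preserves 'NA' as a literal string.
df_fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```

With the default settings Namibia's code silently becomes NaN; with keep_default_na=False it survives as the string "NA".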
@ewheeler It would be nice to add the two-digit continent code to the country-codes. We could retrieve data about countries and their continents from here
@rgrp, @pdehaye, would you mind leaving your thoughts here?
What do we prefer out of:
Suggest change title to:
List of Countries and Associated Codes including ISO 2/3 digit codes (ISO-3166)
There are official languages other than en and fr, but there are many.... As #15 suggests, the CSV can include "official langs", so we can add more columns (or another new CSV) to show name_es, name_pt, etc. for the most used languages.
specify field types (string, integer, number, etc) in schema describing data/country-codes-comprehensive.csv
Perhaps I'm missing it, and perhaps it isn't viewed as important... but as someone just now finding this resource, my very first question was "WHEN"? I have no idea if this data is from yesterday or from 1990. I just don't see a date anywhere. So just a suggestion that a "last updated" could be useful on the download page, if not the file itself.
I'm trying to parse this dataset using the python datapackage utility. Among other issues (bugs in datapackage), I am unable to deal with the Palestine row due to the fact that there are two values for the GAUL column:
"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj", , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,No universal currency,,In contention
I also see that there are multiple country codes specified (comma-separated), and while that isn't triggering an error (since it's just a "string"), it is going to result in unhelpful data.
If this is conforming to the tabular data spec, I can work with the datapackage author to get this resolved there. If not, I'd propose either:
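Whatever the eventual fix, consumers can work around multi-valued cells today by splitting them after parsing. A minimal sketch (the row is the Palestine record quoted above; the helper name is hypothetical):

```python
import csv
import io

# The Palestine row as it appears in data/country-codes.csv.
ROW = ('"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj",'
       ' , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,'
       'No universal currency,,In contention')

def split_multivalue(cell):
    """Split a comma-separated cell like '91,267' into a list of values."""
    return [v.strip() for v in cell.split(",") if v.strip()]

fields = next(csv.reader(io.StringIO(ROW)))
gaul_values = split_multivalue(fields[12])  # GAUL is the 13th column (0-indexed: 12)
```

This keeps the column a plain string in the schema while still letting downstream code treat it as a list.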
Hi,
the Czech Republic changed its English name to Czechia.
Wikipedia
https://en.wikipedia.org/wiki/Name_of_the_Czech_Republic
Adoption of Czechia
[...]
In 2013, Czech president Miloš Zeman recommended the wider official use of Czechia,[34] and on 14 April 2016, the country's political leadership agreed to make Czechia the official short name. The new name was approved by the Czech cabinet on 2 May 2016[35] as the Czech Republic's official short name and was published in the United Nations UNTERM and UNGEGN country name databases on July 5th, 2016. [36]
Ministry of Foreign Affairs
Short country name "Česko"/"Czechia" to be entered in UN databases
http://www.mzv.cz/jnp/en/issues_and_press/factsheets/x2016_04_21_the_completion_of_translations_of_the.html
It's not in the UN Statistics Division list yet (last updated in 2013, it seems), but it is in use, e.g., in the UN Treaty Collection.
Not sure if it's worth updating/patching but wanted to note it here.
Apparently the SEC has its own code system for countries.
https://www.sec.gov/edgar/searchedgar/edgarstatecodes.htm
I can add them if there's consensus that they are useful.
CLDR v30.0.3 was released on 2016-12-02 (http://cldr.unicode.org/index/downloads/cldr-30) and includes the following relevant changes:
Oops, should I have opened this issue in the https://github.com/datasets/language-codes repo?
Please split the Dominican Republic's three dial codes into separate values.
See https://en.wikipedia.org/wiki/Republic_of_Macedonia
"the former Yugoslav Republic of Macedonia" was provisional, and today not used. Use "Republic of Macedonia" instead.
For Serbia, the 'Dial' field value is "381 p". Can you fix this, please?
The json file in the data directory looks a bit incongruous; should it be there?
The entry for Namibia has ZAR incorrectly listed as the primary currency, presumably due to the country appearing twice in the official source data [1] (as ZAR is legal tender there):
How should the import script handle this?
[1] http://www.currency-iso.org/dam/downloads/lists/list_one.xml
Suggestion to use Goodtables.io. See the example of datasets-br/state-codes, which is already using it. You can run it offline with goodtables-py#cli.
The datatype of the ISO4217-currency_minor_unit column must be an array or list; likewise for ISO4217-currency_numeric_code. The type "number" is for a single number.
[27,12] [type-or-format-error] The value "2,2" in row 27 and column 12 is not type "number" and format "default"
[27,14] [type-or-format-error] The value "356,064" in row 27 and column 14 is not type "number" and format "default"
[59,12] [type-or-format-error] The value "2,2" in row 59 and column 12 is not type "number" and format "default"
[59,14] [type-or-format-error] The value "192,931" in row 59 and column 14 is not type "number" and format "default"
[72,12] [type-or-format-error] The value "2,2" in row 72 and column 12 is not type "number" and format "default"
[72,14] [type-or-format-error] The value "222,840" in row 72 and column 14 is not type "number" and format "default"
[101,12] [type-or-format-error] The value "2,2" in row 101 and column 12 is not type "number" and format "default"
[101,14] [type-or-format-error] The value "332,840" in row 101 and column 14 is not type "number" and format "default"
[127,12] [type-or-format-error] The value "2,2" in row 127 and column 12 is not type "number" and format "default"
[127,14] [type-or-format-error] The value "426,710" in row 127 and column 14 is not type "number" and format "default"
[153,12] [type-or-format-error] The value "2,2" in row 153 and column 12 is not type "number" and format "default"
[153,14] [type-or-format-error] The value "516,710" in row 153 and column 14 is not type "number" and format "default"
[169,12] [type-or-format-error] The value "2,2" in row 169 and column 12 is not type "number" and format "default"
[169,14] [type-or-format-error] The value "590,840" in row 169 and column 14 is not type "number" and format "default"
PS: please change the URL http://data.okfn.org/data/country-codes (on this project's home page) to a correct one.
Would it be possible to get the README as markdown? This is what is currently expected by http://data.okfn.org registry (though that behaviour could be modified to expect README and README.txt ...)
Looks like the fix for #47 broke the ability for this dataset to be loaded (at least by the python datapackage), since a CSV column name no longer matches what's in the datapackage.json file.
The value is 91 267 (with a space between 91 and 267), causing int parsing to fail in MATLAB.
As @wodow suggested, it would be great to have some 'smoke tests' or simple assertions to help avoid regressions.

One approach could be to save the output of csvkit's csvstat for data/country-codes.csv for the last commit that touches the file, and then diff it with the current output after making a change.

Example output for the 71eded8 version of data/country-codes.csv is pasted below.

These stats would naturally change as various codes change over time, but highlighting per-column changes in Unique values and the presence of Nulls would be useful for surfacing regressions.
1. name
<type 'unicode'>
Nulls: True
Unique values: 249
Max length: 38
2. official_name_en
<type 'unicode'>
Nulls: True
Unique values: 241
Max length: 52
3. official_name_fr
<type 'unicode'>
Nulls: True
Unique values: 241
Max length: 62
4. ISO3166-1-Alpha-2
<type 'unicode'>
Nulls: True
Unique values: 248
Max length: 4
5. ISO3166-1-Alpha-3
<type 'unicode'>
Nulls: True
Unique values: 249
Max length: 4
6. ISO3166-1-numeric
<type 'unicode'>
Nulls: False
Unique values: 251
Max length: 3
7. ITU
<type 'unicode'>
Nulls: True
Unique values: 232
5 most frequent values:
NOR: 2
LIE: 1
EGY: 1
AGL: 1
BGD: 1
Max length: 4
8. MARC
<type 'unicode'>
Nulls: True
Unique values: 243
5 most frequent values:
uik: 3
gw: 1
gv: 1
gu: 1
gt: 1
Max length: 14
9. WMO
<type 'unicode'>
Nulls: True
Unique values: 213
5 most frequent values:
BX: 2
NU: 2
VI: 2
AT: 2
BD: 1
Max length: 4
10. DS
<type 'unicode'>
Nulls: True
Unique values: 175
5 most frequent values:
F: 10
USA: 7
AUS: 5
NZ: 4
FIN: 2
Max length: 4
11. Dial
<type 'unicode'>
Nulls: True
Unique values: 228
5 most frequent values:
44: 4
672: 3
262: 3
590: 3
61: 3
Max length: 17
12. FIFA
<type 'unicode'>
Nulls: True
Unique values: 237
Max length: 15
13. FIPS
<type 'unicode'>
Nulls: True
Unique values: 247
5 most frequent values:
NL: 2
BD: 1
BE: 1
BF: 1
BG: 1
Max length: 26
14. GAUL
<type 'int'>
Nulls: True
Min: 1
Max: 91267
Sum: 245883
Mean: 1011.86419753
Median: 145
Standard Deviation: 7183.59897321
Unique values: 243
15. IOC
<type 'unicode'>
Nulls: True
Unique values: 226
Max length: 4
16. ISO4217-currency_alphabetic_code
<type 'unicode'>
Nulls: True
Unique values: 149
5 most frequent values:
EUR: 33
USD: 17
XOF: 8
XCD: 8
XAF: 6
Max length: 4
17. ISO4217-currency_country_name
<type 'unicode'>
Nulls: True
Unique values: 238
Max length: 44
18. ISO4217-currency_minor_unit
<type 'int'>
Nulls: True
Values: 0, 2, 3
19. ISO4217-currency_name
<type 'unicode'>
Nulls: True
Unique values: 150
5 most frequent values:
Euro: 33
US Dollar: 17
CFA Franc BCEAO: 8
East Caribbean Dollar: 8
CFA Franc BEAC: 6
Max length: 29
20. ISO4217-currency_numeric_code
<type 'unicode'>
Nulls: True
Unique values: 149
5 most frequent values:
978: 33
840: 17
951: 8
952: 8
950: 6
Max length: 4
21. is_independent
<type 'unicode'>
Nulls: True
Unique values: 18
5 most frequent values:
Yes: 195
Territory of GB: 12
Part of FR: 8
Territory of AU: 4
Part of NL: 4
Max length: 22
22. Capital
<type 'unicode'>
Nulls: True
Unique values: 242
5 most frequent values:
Kingston: 2
Kinshasa: 1
East Jerusalem: 1
Kiev: 1
Paris: 1
Max length: 19
23. Continent
<type 'unicode'>
Nulls: True
Unique values: 6
5 most frequent values:
AF: 58
AS: 52
EU: 52
OC: 27
SA: 14
Max length: 4
24. TLD
<type 'unicode'>
Nulls: True
Unique values: 246
5 most frequent values:
.gp: 3
.cx: 1
.cy: 1
.cz: 1
.ro: 1
Max length: 4
25. Languages
<type 'unicode'>
Nulls: True
Unique values: 243
5 most frequent values:
fr: 3
en: 2
fr-CI: 1
zh-CN,yue,wuu,dta,ug,za: 1
ar-TN,fr: 1
Max length: 89
26. geonameid
<type 'int'>
Nulls: True
Min: 49518
Max: 7909807
Sum: 593982118
Mean: 2385470.35341
Median: 2363686
Standard Deviation: 1541571.57676
Unique values: 249
27. EDGAR
<type 'unicode'>
Nulls: True
Unique values: 214
Max length: 4
Row count: 251
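A pure-Python version of such a smoke test could compute just the two signals called out above (unique values and null presence) per column, so the output is small enough to diff between commits. This is only a sketch; the file path would be data/country-codes.csv in practice:

```python
import csv

def column_stats(path):
    """Per-column unique-value count and null presence, for diffing across commits."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        stats = {name: {"values": set(), "nulls": False} for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                if value is None or value == "":
                    stats[name]["nulls"] = True   # empty cell counts as a null
                else:
                    stats[name]["values"].add(value)
    return {name: {"unique": len(s["values"]), "nulls": s["nulls"]}
            for name, s in stats.items()}
```

Saving this dict as JSON per commit and diffing would surface regressions like the Namibia 'NA' loss (unique count drops, nulls flips to True) without depending on csvkit.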
It's really great that you provide this data to us all - thanks a lot - this helps very much with my project ;)
There is a general problem when using this data with MySQL statements: hyphens in column names cause trouble. Usually you can enclose them in backticks, except when using a JOIN; then there's always an error, no matter whether you use backticks or brackets to enclose the column name. The only way is to alter the column names to use underscores instead of hyphens. Please consider this in your work.
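On the consumer side, the column names can be sanitized before the CSV ever reaches MySQL. A minimal sketch (the helper name is hypothetical; it simply swaps hyphens for underscores so identifiers need no quoting):

```python
def mysql_safe(columns):
    """Replace hyphens with underscores so names like 'ISO3166-1-Alpha-2'
    become valid unquoted MySQL identifiers ('ISO3166_1_Alpha_2')."""
    return [c.replace("-", "_") for c in columns]
```

Applying this to the header row at import time avoids the backtick problem entirely, at the cost of diverging from the published column names.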
It is available as .pdf and .doc here: http://www.itu.int/pub/T-SP-LT.1-2013. Maybe use antiword to parse the .doc.
Wikipedia has stable pages for all countries, and Wikidata supplies an ID for each. Today Wikidata IDs play an important role as "concept identifiers", for the Semantic Web in general and for open projects like OpenStreetMap, etc.

Example: BR is https://www.wikidata.org/wiki/Q155 , so the wd_id column of the BR line is Q155. With the Wikidata API we can fill the wd_id column automatically.
$ goodtables datapackage http://data.okfn.org/data/core/country-codes/datapackage.json
https://github.com/frictionlessdata/datapackage-py/issues/122

DATASET
=======
{'error-count': 2, 'table-count': 1, 'time': 4.537, 'valid': False}
TABLE [1]
=========
{'datapackage': 'http://data.okfn.org/data/core/country-codes/datapackage.json',
'error-count': 2,
'headers': ['name',
'official_name_en',
'official_name_fr',
'ISO3166-1-Alpha-2',
'ISO3166-1-Alpha-3',
'ISO3166-1-numeric',
'ITU',
'MARC',
'WMO',
'DS',
'Dial',
'FIFA',
'FIPS',
'GAUL',
'IOC',
'ISO4217-currency_alphabetic_code',
'ISO4217-currency_country_name',
'ISO4217-currency_minor_unit',
'ISO4217-currency_name',
'ISO4217-currency_numeric_code',
'is_independent',
'Capital',
'Continent',
'TLD',
'Languages',
'geonameid',
'EDGAR'],
'row-count': 252,
'source': 'https://raw.github.com/datasets/country-codes/master/data/country-codes.csv',
'time': 3.752,
'valid': False}
---------
[-,6] [non-matching-header] Header in column 6 doesn't match field name UN Statistics M49 numeric codes
[-,26] [non-matching-header] Header in column 26 doesn't match field name Geoname ID
Hi,
The values in the ISO3166-1-numeric column are actually UN M49 codes (which is where you scrape them from).

These two code lists overlap almost completely, except for the top two entries: Channel Islands and Sark are not in the ISO 3166-1 standard, but are in UN M49. So maybe it's good to either remove the top two entries or change the column name and description.
What's difference between these 2 columns?
This is identical for ISO 3166-2 and is more comprehensive. Suggest we deprecate country-codes in favour of this repo.
I want to add TLDs and the ITU designation in English (from here), but since ISO no longer provides ISO 3166 lists for free, ./scripts/get_countries_of_earth.py fails.
Can I expect that list-en1-semic-3.txt and list-fr1-semic.txt are already downloaded, or should the script be fixed (add the files to the repo, or change it to fetch the info from another source like MaxMind)?
So how should I proceed with editing get_countries_of_earth.py?
Very handy to have that! The timezone name and the difference from UTC as two separate columns.
Please consider using customary country names instead of the current official names. Customary country names provided by the Unicode CLDR are used widely by major websites including Google.
As a bonus, the Unicode CLDR provides additional country names in many languages, and is thus better suited for wider application of the country-codes.
I have provided a full country name list from the latest Unicode CLDR 25 in the forked version below:
https://github.com/hanteng/country-codes/tree/master/data/country-names
Please consider replacing or adding the names into the country-codes dataset.
The raw CSV contains several locations with double space characters. Maybe there could be a search-and-replace for those items, e.g.
...,"Chine,  région administrative spéciale de Hong Kong",...
In C# style, I'd write:
str = str.Replace("  ", " ");
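The same cleanup in Python (the language of the repo's scripts) would collapse any run of two or more spaces, which covers triple spaces too; a small sketch:

```python
import re

def collapse_spaces(text):
    """Collapse runs of two or more spaces into a single space."""
    return re.sub(r" {2,}", " ", text)
```

Running this over the name columns during the build step would stop such artifacts from reappearing in future imports.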
Currency codes for GB and US, at least, were lost in the jumbo commit d4e4895.
It's too long to check all the edits, so maybe revert?
Do we want naturally capitalized names rather than upper case, e.g. United States rather than UNITED STATES?