
country-codes's Introduction


Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others. Provided as a Tabular Data Package: view datapackage

Data

Data are fetched from multiple sources:

Official formal and short names (in English, French, Spanish, Arabic, Chinese, and Russian) are from United Nations Protocol and Liaison Service

Customary English short names are from Unicode Common Locale Data Repository (CLDR) Project.

Note: CLDR shorter names "ZZ-alt-short" are used when available

ISO 3166 official short names (in English, French, Spanish, Arabic, Chinese, and Russian) are from United Nations Department of Economic and Social Affairs Statistics Division

ISO 4217 currency codes are from currency-iso.org

Many other country codes are from statoids.com

Special thanks to Gwillim Law for his excellent statoids.com site (some of the field descriptions are excerpted from his site), which is more up-to-date than most similar resources and is much easier to scrape than multiple Wikipedia pages.

Capital cities, languages, continents, TLDs, and geonameid are from geonames.org

EDGAR codes are from sec.gov

Preparation

This package includes Python scripts that fetch current country information from various data sources and output a CSV of combined country code information.

CSV output is provided via the in2csv and csvcut utilities from csvkit

NOTE/TODO: currently, preparation requires a manual step to download and rename 6 CSV files from https://unstats.un.org/unsd/methodology/m49/overview/

data/country-codes.csv

Install requirements:

pip install -r scripts/requirements.pip

Run GNU Make to generate the data file:

make country-codes.csv

License

This material is licensed by its maintainers under the Public Domain Dedication and License.

Nevertheless, it should be noted that this material is ultimately sourced from ISO and other standards bodies, and their rights and licensing policies are somewhat unclear. As this is a short, simple database of facts, there is a strong argument that no rights can subsist in this collection. However, ISO states on its site:

ISO makes the list of alpha-2 country codes available for internal use and non-commercial purposes free of charge.

This carries the implication (though not spelled out) that other uses are not permitted and that, therefore, there may be rights preventing further general use and reuse.

If you intend to use this data in a public or commercial product, please check the original sources for any specific restrictions.


country-codes's Issues

Use Goodtables for continuous data validation

Suggestion to use Goodtables.io. See the example of datasets-br/state-codes, which is already using it. You can run it offline with goodtables-py's CLI.

Warnings

  • Data Package "datapackage.json" has a validation error Descriptor validation error: 'title' is a required property at "sources/0" in descriptor and at "properties/sources/items/required" in profile
  • ... same warning for "sources/1", "sources/2"... "sources/6".

Errors

The datatype of the columns ISO4217-currency_minor_unit and ISO4217-currency_numeric_code must be an array or list; the type "number" describes a single number.

[27,12] [type-or-format-error] The value "2,2" in row 27 and column 12 is not type "number" and format "default"
[27,14] [type-or-format-error] The value "356,064" in row 27 and column 14 is not type "number" and format "default"
[59,12] [type-or-format-error] The value "2,2" in row 59 and column 12 is not type "number" and format "default"
[59,14] [type-or-format-error] The value "192,931" in row 59 and column 14 is not type "number" and format "default"
[72,12] [type-or-format-error] The value "2,2" in row 72 and column 12 is not type "number" and format "default"
[72,14] [type-or-format-error] The value "222,840" in row 72 and column 14 is not type "number" and format "default"
[101,12] [type-or-format-error] The value "2,2" in row 101 and column 12 is not type "number" and format "default"
[101,14] [type-or-format-error] The value "332,840" in row 101 and column 14 is not type "number" and format "default"
[127,12] [type-or-format-error] The value "2,2" in row 127 and column 12 is not type "number" and format "default"
[127,14] [type-or-format-error] The value "426,710" in row 127 and column 14 is not type "number" and format "default"
[153,12] [type-or-format-error] The value "2,2" in row 153 and column 12 is not type "number" and format "default"
[153,14] [type-or-format-error] The value "516,710" in row 153 and column 14 is not type "number" and format "default"
[169,12] [type-or-format-error] The value "2,2" in row 169 and column 12 is not type "number" and format "default"
[169,14] [type-or-format-error] The value "590,840" in row 169 and column 14 is not type "number" and format "default"
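The failing values above are multi-valued cells: "2,2" and "356,064" each hold two codes for countries with more than one currency, so they can never parse as a single number. One way such cells could be normalized (a sketch; the splitting rule and the decision to keep parts as strings, preserving leading zeros like "064", are assumptions):

```python
def split_multivalue(cell):
    """Split a comma-separated cell like '356,064' into its parts.

    Parts are kept as strings so numeric codes with leading zeros
    (e.g. '064') survive intact.
    """
    return [p.strip() for p in str(cell).split(",") if p.strip()]

# Multi-valued cells become lists; single values become one-element lists.
assert split_multivalue("2,2") == ["2", "2"]
assert split_multivalue("356,064") == ["356", "064"]
assert split_multivalue("840") == ["840"]
```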

PS: please change the URL http://data.okfn.org/data/country-codes (at this project's home) to a correct one.

Validation errors

$ goodtables datapackage http://data.okfn.org/data/core/country-codes/datapackage.json
DATASET
=======
{'error-count': 2, 'table-count': 1, 'time': 4.537, 'valid': False}

TABLE [1]
=========
{'datapackage': 'http://data.okfn.org/data/core/country-codes/datapackage.json',
 'error-count': 2,
 'headers': ['name',
             'official_name_en',
             'official_name_fr',
             'ISO3166-1-Alpha-2',
             'ISO3166-1-Alpha-3',
             'ISO3166-1-numeric',
             'ITU',
             'MARC',
             'WMO',
             'DS',
             'Dial',
             'FIFA',
             'FIPS',
             'GAUL',
             'IOC',
             'ISO4217-currency_alphabetic_code',
             'ISO4217-currency_country_name',
             'ISO4217-currency_minor_unit',
             'ISO4217-currency_name',
             'ISO4217-currency_numeric_code',
             'is_independent',
             'Capital',
             'Continent',
             'TLD',
             'Languages',
             'geonameid',
             'EDGAR'],
 'row-count': 252,
 'source': 'https://raw.github.com/datasets/country-codes/master/data/country-codes.csv',
 'time': 3.752,
 'valid': False}
---------
[-,6] [non-matching-header] Header in column 6 doesn't match field name UN Statistics M49 numeric codes
[-,26] [non-matching-header] Header in column 26 doesn't match field name Geoname ID

from frictionlessdata/datapackage-py#122

get_countries_of_earth.py fails since ISO removed iso3166 list

I want to add TLDs and the ITU designation in English (from here), but since ISO no longer provides the ISO 3166 lists for free, ./scripts/get_countries_of_earth.py fails.
Can I assume that list-en1-semic-3.txt & list-fr1-semic.txt are already downloaded, or should the script be fixed (add the files to the repo, or change it to fetch the info from another source like MaxMind)?
So how should I proceed with editing get_countries_of_earth.py?

Add official geographical relations

Formal and official lists that define usual "spatial location" and "spatial relations":

  • list of UTM zones of the country, or list of "UTM grid cells"; even big countries have a maximum of 30 or 40 cells. This is useful for validating some geographical data and for roughly defining the geographical area/context of the country. Example: Bolivia (BO) has the UTM cells {19L,20L,19K,20K,21K}.
    PS: this suggestion is a "spatial version" of the "time suggestion #5".
  • a geographical summary of spatial relations: the list of "neighbor countries", as simple ISO two-letter codes. Example: Bolivia (BO) is bordered by Argentina, Brazil, ..., that is, the set {AR,BR,CL,PE,PY}. We could adopt as the "formal official neighbor" relation the _Touches_ operator of DE-9IM, which is an OGC SQL standard.
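The neighbor-country suggestion above could be represented as a simple set-valued column. A minimal sketch (the sample data covers only Bolivia as given in the issue; the second entry is partial and purely illustrative):

```python
# Proposed "neighbors" field sketch: each country maps to the set of
# ISO alpha-2 codes of its land neighbors (the DE-9IM Touches relation).
neighbors = {
    "BO": {"AR", "BR", "CL", "PE", "PY"},   # Bolivia, from the issue
    "AR": {"BO", "BR", "CL", "PY", "UY"},   # partial, for illustration only
}

def borders(a, b):
    """True if country a lists country b as a neighbor."""
    return b in neighbors.get(a, set())

# Touches is symmetric, so a useful sanity check on the data would be
# that borders(a, b) == borders(b, a) for every pair present.
assert borders("BO", "BR")
assert not borders("BO", "UY")
```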

CLDR v30.0.3 released

CLDR v30.0.3 was released on 2016-12-02 (http://cldr.unicode.org/index/downloads/cldr-30) and includes the following relevant changes:

  • New script codes for Adlam, Bhaiksuki, Marchen, Newa, Osage
  • Some support for new region codes EZ, UN (though names for EZ are not available in languages other than English).
  • Updated English names for bn/Beng (“Bangla”), mic (“Mi'kmaq”), and or (“Odia”).
  • Documented the use of script subtag “Zxxx” to indicate spoken or otherwise unwritten content.
  • The set of language and script names for which translations are requested was revamped, leading to a substantial increase in the number of such names.
  • Substantial new data has been added for likely subtags (e.g. to get the main script for each language).

Oops, should I have opened this issue in the https://github.com/datasets/language-codes repo?

Czech Republic English name

Hi,

the Czech Republic changed its English name to Czechia.

Wikipedia
https://en.wikipedia.org/wiki/Name_of_the_Czech_Republic

Adoption of Czechia
[...]
In 2013, Czech president Miloš Zeman recommended the wider official use of Czechia,[34] and on 14 April 2016, the country's political leadership agreed to make Czechia the official short name. The new name was approved by the Czech cabinet on 2 May 2016[35] as the Czech Republic's official short name and was published in the United Nations UNTERM and UNGEGN country name databases on July 5th, 2016. [36]

Ministry of Foreign Affairs
Short country name "Česko"/"Czechia" to be entered in UN databases
http://www.mzv.cz/jnp/en/issues_and_press/factsheets/x2016_04_21_the_completion_of_translations_of_the.html

It's not in the UN Statistics Division list yet (last updated in 2013, it seems), but it is in use, e.g. in the UN Treaty Collection.

Not sure if it's worth updating/patching but wanted to note it here.

tests or simple assertions on output

as @wodow suggested, it would be great to have some 'smoke tests' or simple assertions to help avoid regressions

one approach could be to save the output of csvkit's csvstat for data/country-codes.csv at the last commit that touches the file, and then diff it against the current output after making a change.

example output for 71eded8 version of data/country-codes.csv is pasted below

these stats would naturally change as various codes change over time, but highlighting per-column changes in Unique values and presence of Nulls would be useful for surfacing regressions

  1. name
    <type 'unicode'>
    Nulls: True
    Unique values: 249
    Max length: 38
  2. official_name_en
    <type 'unicode'>
    Nulls: True
    Unique values: 241
    Max length: 52
  3. official_name_fr
    <type 'unicode'>
    Nulls: True
    Unique values: 241
    Max length: 62
  4. ISO3166-1-Alpha-2
    <type 'unicode'>
    Nulls: True
    Unique values: 248
    Max length: 4
  5. ISO3166-1-Alpha-3
    <type 'unicode'>
    Nulls: True
    Unique values: 249
    Max length: 4
  6. ISO3166-1-numeric
    <type 'unicode'>
    Nulls: False
    Unique values: 251
    Max length: 3
  7. ITU
    <type 'unicode'>
    Nulls: True
    Unique values: 232
    5 most frequent values:
        NOR:    2
        LIE:    1
        EGY:    1
        AGL:    1
        BGD:    1
    Max length: 4
  8. MARC
    <type 'unicode'>
    Nulls: True
    Unique values: 243
    5 most frequent values:
        uik:    3
        gw: 1
        gv: 1
        gu: 1
        gt: 1
    Max length: 14
  9. WMO
    <type 'unicode'>
    Nulls: True
    Unique values: 213
    5 most frequent values:
        BX: 2
        NU: 2
        VI: 2
        AT: 2
        BD: 1
    Max length: 4
 10. DS
    <type 'unicode'>
    Nulls: True
    Unique values: 175
    5 most frequent values:
        F:  10
        USA:    7
        AUS:    5
        NZ: 4
        FIN:    2
    Max length: 4
 11. Dial
    <type 'unicode'>
    Nulls: True
    Unique values: 228
    5 most frequent values:
        44: 4
        672:    3
        262:    3
        590:    3
        61: 3
    Max length: 17
 12. FIFA
    <type 'unicode'>
    Nulls: True
    Unique values: 237
    Max length: 15
 13. FIPS
    <type 'unicode'>
    Nulls: True
    Unique values: 247
    5 most frequent values:
        NL: 2
        BD: 1
        BE: 1
        BF: 1
        BG: 1
    Max length: 26
 14. GAUL
    <type 'int'>
    Nulls: True
    Min: 1
    Max: 91267
    Sum: 245883
    Mean: 1011.86419753
    Median: 145
    Standard Deviation: 7183.59897321
    Unique values: 243
 15. IOC
    <type 'unicode'>
    Nulls: True
    Unique values: 226
    Max length: 4
 16. ISO4217-currency_alphabetic_code
    <type 'unicode'>
    Nulls: True
    Unique values: 149
    5 most frequent values:
        EUR:    33
        USD:    17
        XOF:    8
        XCD:    8
        XAF:    6
    Max length: 4
 17. ISO4217-currency_country_name
    <type 'unicode'>
    Nulls: True
    Unique values: 238
    Max length: 44
 18. ISO4217-currency_minor_unit
    <type 'int'>
    Nulls: True
    Values: 0, 2, 3
 19. ISO4217-currency_name
    <type 'unicode'>
    Nulls: True
    Unique values: 150
    5 most frequent values:
        Euro:   33
        US Dollar:  17
        CFA Franc BCEAO:    8
        East Caribbean Dollar:  8
        CFA Franc BEAC: 6
    Max length: 29
 20. ISO4217-currency_numeric_code
    <type 'unicode'>
    Nulls: True
    Unique values: 149
    5 most frequent values:
        978:    33
        840:    17
        951:    8
        952:    8
        950:    6
    Max length: 4
 21. is_independent
    <type 'unicode'>
    Nulls: True
    Unique values: 18
    5 most frequent values:
        Yes:    195
        Territory of GB:    12
        Part of FR: 8
        Territory of AU:    4
        Part of NL: 4
    Max length: 22
 22. Capital
    <type 'unicode'>
    Nulls: True
    Unique values: 242
    5 most frequent values:
        Kingston:   2
        Kinshasa:   1
        East Jerusalem: 1
        Kiev:   1
        Paris:  1
    Max length: 19
 23. Continent
    <type 'unicode'>
    Nulls: True
    Unique values: 6
    5 most frequent values:
        AF: 58
        AS: 52
        EU: 52
        OC: 27
        SA: 14
    Max length: 4
 24. TLD
    <type 'unicode'>
    Nulls: True
    Unique values: 246
    5 most frequent values:
        .gp:    3
        .cx:    1
        .cy:    1
        .cz:    1
        .ro:    1
    Max length: 4
 25. Languages
    <type 'unicode'>
    Nulls: True
    Unique values: 243
    5 most frequent values:
        fr: 3
        en: 2
        fr-CI:  1
        zh-CN,yue,wuu,dta,ug,za:    1
        ar-TN,fr:   1
    Max length: 89
 26. geonameid
    <type 'int'>
    Nulls: True
    Min: 49518
    Max: 7909807
    Sum: 593982118
    Mean: 2385470.35341
    Median: 2363686
    Standard Deviation: 1541571.57676
    Unique values: 249
 27. EDGAR
    <type 'unicode'>
    Nulls: True
    Unique values: 214
    Max length: 4

Row count: 251
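The diff-based smoke test described above could be sketched in pure stdlib Python; it tracks just the two signals called out as useful (unique-value counts and presence of nulls) rather than the full csvstat output. The snapshot format and column treatment are assumptions:

```python
import csv

def column_stats(rows):
    """Per-column 'unique values' and 'has nulls' stats.

    rows: a list of dicts, e.g. list(csv.DictReader(f)).
    """
    stats = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        stats[col] = {
            "unique": len(set(v for v in values if v)),
            "has_nulls": any(not v for v in values),
        }
    return stats

def diff_stats(old, new):
    """Columns whose stats changed between a saved snapshot and a new run."""
    return {c: (old.get(c), new.get(c))
            for c in set(old) | set(new)
            if old.get(c) != new.get(c)}

# Usage sketch against the repo's data file:
# with open("data/country-codes.csv", newline="") as f:
#     current = column_stats(list(csv.DictReader(f)))
# print(diff_stats(saved_snapshot, current))  # empty dict == no regressions
```

An empty diff means no per-column regressions; any non-empty entry flags a column worth inspecting before committing.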

Palestine row violates schema, renders dataset unusable.

I'm trying to parse this dataset using the Python datapackage utility. Among other issues (bugs in datapackage), I am unable to deal with the Palestine row due to the fact that there are two values for the GAUL column:

"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj", , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,No universal currency,,In contention

I also see that there are multiple country codes specified (comma-separated), and while that isn't triggering an error (since it's just a "string"), it is going to result in unhelpful data.

If this is conforming to the tabular data spec, I can work with datapackage author to get this resolved there. If not, I'd propose either:

  1. Moving this entry to two rows.
  2. Changing the affected field types to arrays (though this seems quite inelegant, since all other rows would have arrays of a single value and seems to run contrary to the essence of this table)
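Option 1 above could be sketched as a row-expansion step. This is purely illustrative: the choice of multi-valued columns and the assumption that their parts pair up positionally (GAUL "91,267" and FIPS "GZ,WE" both describing Gaza then the West Bank) are mine, not the dataset's:

```python
def expand_row(row, multi_cols):
    """Expand one row into N rows by zipping its comma-separated cells.

    Assumes each column in multi_cols lists its parts in the same order;
    columns with fewer parts are padded with empty strings.
    """
    parts = {c: [p.strip() for p in row[c].split(",")] for c in multi_cols}
    n = max(len(v) for v in parts.values())
    out = []
    for i in range(n):
        new = dict(row)  # copy single-valued columns unchanged
        for c in multi_cols:
            new[c] = parts[c][i] if i < len(parts[c]) else ""
        out.append(new)
    return out

row = {"name": "Palestine, State of", "GAUL": "91,267", "FIPS": "GZ,WE"}
expanded = expand_row(row, ("GAUL", "FIPS"))
assert len(expanded) == 2
assert expanded[0]["GAUL"] == "91" and expanded[1]["FIPS"] == "WE"
```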

MySQL hyphen problem

It's really great that you provide this data to us all - thanks a lot - this helps very much with my project ;)

There is a general problem when using this data in MySQL statements: hyphens in column names cause trouble. Usually you can enclose them in backticks, except when using a JOIN; then there is always an error, no matter whether backticks or brackets enclose the column name. The only way around it is to alter the column names to use underscores instead of hyphens. Please consider this in your work.
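A possible workaround on the consumer side, sketched in Python (the repo's scripting language): rewrite the CSV header with underscores before loading into MySQL. The transformation rule is an assumption, not something the dataset provides:

```python
import csv
import io

def underscore_headers(csv_text):
    """Return the CSV with hyphens in the header row replaced by underscores.

    Only the first row is touched; data rows pass through unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [h.replace("-", "_") for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

sample = "ISO3166-1-Alpha-2,name\nUS,United States\n"
assert underscore_headers(sample).startswith("ISO3166_1_Alpha_2,name")
```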

Several locations with double spaces

the raw CSV contains several location names with double space characters

Maybe there could be a search&replace for those items, e.g.

...,"Chine,  région administrative spéciale de Hong Kong",...

in C# style, I'd write:

str = str.Replace("  ", " ");

ISO 3166-1 numeric contains invalid values

Hi,

The values in the ISO3166-1-numeric column are actually UN M49 codes (which is where you scrape them from).
The two code lists overlap almost completely, except for two entries: Channel Islands and Sark are not in the ISO 3166-1 standard, but are in UN M49. So it might be good to either remove those two entries or change the column name & description.

Include column for Wikidata identifier, suggestion

Wikipedia has stable pages for all countries, and Wikidata supplies an ID for each. Today, Wikidata IDs play an important role as "concept identifiers", for the Semantic Web in general and for open projects like OpenStreetMap, etc.

Example: BR is https://www.wikidata.org/wiki/Q155 , so the wd_id column for the BR row would be Q155. With the Wikidata API we could fill the wd_id column automatically.
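One way to fill such a column would be a SPARQL query against the Wikidata endpoint, mapping ISO 3166-1 alpha-2 codes to item IDs via Wikidata property P297 (ISO 3166-1 alpha-2 code). This sketch only builds the query string; actually running it requires an HTTP request to https://query.wikidata.org/sparql:

```python
def wd_id_query(alpha2_codes):
    """Build a SPARQL query mapping ISO alpha-2 codes to Wikidata items.

    wdt:P297 is Wikidata's 'ISO 3166-1 alpha-2 code' property, so
    e.g. "BR" should resolve to wd:Q155 (Brazil).
    """
    values = " ".join('"%s"' % c for c in alpha2_codes)
    return (
        "SELECT ?code ?item WHERE { "
        "VALUES ?code { %s } "
        "?item wdt:P297 ?code . }" % values
    )

query = wd_id_query(["BR", "BO"])
assert "P297" in query and '"BR"' in query
```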

Please consider using customary country names provided by the Unicode CLDR.

Please consider using customary country names instead of the current official names. Customary country names provided by the Unicode CLDR are used widely by major websites, including Google.

As a bonus, the Unicode CLDR provides additional country names in many languages, and is thus better suited for wider application of the country-codes dataset.

I have provided a full country name list from the latest Unicode CLDR 25 in the forked version below:
https://github.com/hanteng/country-codes/tree/master/data/country-names

Please consider replacing or adding the names into the country-codes dataset.

More here:
http://people.oii.ox.ac.uk/hanteng/zh/2014/05/28/whats-in-a-name-country-names-are-technical-cultural-and-political/

add field types to schema

specify field types (string, integer, number, etc) in schema describing data/country-codes-comprehensive.csv

No date?

Perhaps I'm missing it, and perhaps it isn't viewed as important... but as someone just now finding this resource, my very first question was "WHEN"? I have no idea if this data is from yesterday or from 1990. I just don't see a date anywhere. So just a suggestion that a "last updated" could be useful on the download page, if not the file itself.

ISO3166-1-Alpha-2 for Namibia is missing

Namibia's two-letter ISO code is 'NA', but it is missing from the data.

Probably pandas' read_csv was used somewhere in the processing; it interprets 'NA' as a missing value by default (use keep_default_na=False to prevent this).
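The pitfall is easy to reproduce. A minimal demonstration (the two-row CSV is made up for illustration; the column name matches the dataset's header):

```python
import io

import pandas as pd

csv_text = "name,ISO3166-1-Alpha-2\nNamibia,NA\n"

# By default, read_csv treats the literal string "NA" as a missing value,
# so Namibia's alpha-2 code silently becomes NaN.
default = pd.read_csv(io.StringIO(csv_text))
assert pd.isna(default.loc[0, "ISO3166-1-Alpha-2"])

# keep_default_na=False disables that list, preserving "NA" as a string.
fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
assert fixed.loc[0, "ISO3166-1-Alpha-2"] == "NA"
```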
