
country-codes's Introduction


Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others. Provided as a Tabular Data Package: view datapackage

Data

Data are fetched from multiple sources:

Official formal and short names (in English, French, Spanish, Arabic, Chinese, and Russian) are from United Nations Protocol and Liaison Service

Customary English short names are from Unicode Common Locale Data Repository (CLDR) Project.

Note: CLDR shorter names "ZZ-alt-short" are used when available

ISO 3166 official short names (in English, French, Spanish, Arabic, Chinese, and Russian) are from United Nations Department of Economic and Social Affairs Statistics Division

ISO 4217 currency codes are from currency-iso.org

Many other country codes are from statoids.com

Special thanks to Gwillim Law for his excellent statoids.com site (some of the field descriptions are excerpted from his site), which is more up-to-date than most similar resources and is much easier to scrape than multiple Wikipedia pages.

Capital cities, languages, continents, TLDs, and geonameid are from geonames.org

EDGAR codes are from sec.gov

Preparation

This package includes Python scripts that fetch current country information from various data sources and output a CSV of combined country code information.

CSV output is provided via the in2csv and csvcut utilities from csvkit

NOTE/TODO: currently, preparation requires a manual step to download and rename 6 CSV files from https://unstats.un.org/unsd/methodology/m49/overview/

data/country-codes.csv

Install requirements:

pip install -r scripts/requirements.pip

Run GNU Make to generate the data file:

make country-codes.csv

License

This material is licensed by its maintainers under the Public Domain Dedication and License.

Nevertheless, it should be noted that this material is ultimately sourced from ISO and other standards bodies, and their rights and licensing policies are somewhat unclear. As this is a short, simple database of facts, there is a strong argument that no rights can subsist in this collection. However, ISO states on its site:

ISO makes the list of alpha-2 country codes available for internal use and non-commercial purposes free of charge.

This carries the implication (though not spelled out) that other uses are not permitted and that, therefore, there may be rights preventing further general use and reuse.

If you intend to use this data in a public or commercial product, please check the original sources for any specific restrictions.


country-codes's Issues

Use Goodtables for continuous data validation

Suggestion to use Goodtables.io. See the example of datasets-br/state-codes, which is already using it. You can run it offline with goodtables-py's CLI.

Warnings

  • Data Package "datapackage.json" has a validation error Descriptor validation error: 'title' is a required property at "sources/0" in descriptor and at "properties/sources/items/required" in profile
  • ... same warning for "sources/1", "sources/2"... "sources/6".

Errors

The datatype of the columns ISO4217-currency_minor_unit and ISO4217-currency_numeric_code must be an array or list; the type "number" describes a single number.

[27,12] [type-or-format-error] The value "2,2" in row 27 and column 12 is not type "number" and format "default"
[27,14] [type-or-format-error] The value "356,064" in row 27 and column 14 is not type "number" and format "default"
[59,12] [type-or-format-error] The value "2,2" in row 59 and column 12 is not type "number" and format "default"
[59,14] [type-or-format-error] The value "192,931" in row 59 and column 14 is not type "number" and format "default"
[72,12] [type-or-format-error] The value "2,2" in row 72 and column 12 is not type "number" and format "default"
[72,14] [type-or-format-error] The value "222,840" in row 72 and column 14 is not type "number" and format "default"
[101,12] [type-or-format-error] The value "2,2" in row 101 and column 12 is not type "number" and format "default"
[101,14] [type-or-format-error] The value "332,840" in row 101 and column 14 is not type "number" and format "default"
[127,12] [type-or-format-error] The value "2,2" in row 127 and column 12 is not type "number" and format "default"
[127,14] [type-or-format-error] The value "426,710" in row 127 and column 14 is not type "number" and format "default"
[153,12] [type-or-format-error] The value "2,2" in row 153 and column 12 is not type "number" and format "default"
[153,14] [type-or-format-error] The value "516,710" in row 153 and column 14 is not type "number" and format "default"
[169,12] [type-or-format-error] The value "2,2" in row 169 and column 12 is not type "number" and format "default"
[169,14] [type-or-format-error] The value "590,840" in row 169 and column 14 is not type "number" and format "default"
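The failing values above are multi-valued cells: "2,2" and "356,064" each hold two codes for countries with more than one currency, so they can never parse as a single number. One way such cells could be normalized (a sketch; the splitting rule and the decision to keep parts as strings, preserving leading zeros like "064", are assumptions):

```python
def split_multivalue(cell):
    """Split a comma-separated cell like '356,064' into its parts.

    Parts are kept as strings so numeric codes with leading zeros
    (e.g. '064') survive intact.
    """
    return [p.strip() for p in str(cell).split(",") if p.strip()]

# Multi-valued cells become lists; single values become one-element lists.
assert split_multivalue("2,2") == ["2", "2"]
assert split_multivalue("356,064") == ["356", "064"]
assert split_multivalue("840") == ["840"]
```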

PS: please change the URL http://data.okfn.org/data/country-codes (at this project's home) to a correct one.

Validation errors

$ goodtables datapackage http://data.okfn.org/data/core/country-codes/datapackage.json
DATASET
=======
{'error-count': 2, 'table-count': 1, 'time': 4.537, 'valid': False}

TABLE [1]
=========
{'datapackage': 'http://data.okfn.org/data/core/country-codes/datapackage.json',
 'error-count': 2,
 'headers': ['name',
             'official_name_en',
             'official_name_fr',
             'ISO3166-1-Alpha-2',
             'ISO3166-1-Alpha-3',
             'ISO3166-1-numeric',
             'ITU',
             'MARC',
             'WMO',
             'DS',
             'Dial',
             'FIFA',
             'FIPS',
             'GAUL',
             'IOC',
             'ISO4217-currency_alphabetic_code',
             'ISO4217-currency_country_name',
             'ISO4217-currency_minor_unit',
             'ISO4217-currency_name',
             'ISO4217-currency_numeric_code',
             'is_independent',
             'Capital',
             'Continent',
             'TLD',
             'Languages',
             'geonameid',
             'EDGAR'],
 'row-count': 252,
 'source': 'https://raw.github.com/datasets/country-codes/master/data/country-codes.csv',
 'time': 3.752,
 'valid': False}
---------
[-,6] [non-matching-header] Header in column 6 doesn't match field name UN Statistics M49 numeric codes
[-,26] [non-matching-header] Header in column 26 doesn't match field name Geoname ID

from frictionlessdata/datapackage-py#122

get_countries_of_earth.py fails since ISO removed iso3166 list

I want to add TLDs and the ITU designation in English (from here), but since ISO no longer provides the ISO 3166 lists for free, ./scripts/get_countries_of_earth.py fails.
Can I assume that list-en1-semic-3.txt & list-fr1-semic.txt are already downloaded, or should the script be fixed (add the files to the repo, or change it to fetch the info from another source like MaxMind)?
So how should I proceed with editing get_countries_of_earth.py?

Add official geographical relations

Formal and official lists that define usual "spatial location" and "spatial relations":

  • list of UTM zones of the country, or list of "UTM grid cells"; even big countries have a maximum of 30 or 40 cells. This is useful for validating some geographical data and for roughly defining the geographical area/context of the country. Example: Bolivia (BO) has the UTM cells {19L,20L,19K,20K,21K}.
    PS: this suggestion is a "spatial version" of the "time suggestion #5".
  • a geographical summary of spatial relations: the list of "neighbor countries", as simple ISO two-letter codes. Example: Bolivia (BO) is bordered by Argentina, Brazil, ..., that is, the set {AR,BR,CL,PE,PY}. We could adopt as the "formal official neighbor" relation the _Touches_ operator of DE-9IM, which is an OGC SQL standard.
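The neighbor-country suggestion above could be represented as a simple set-valued column. A minimal sketch (the sample data covers only Bolivia as given in the issue; the second entry is partial and purely illustrative):

```python
# Proposed "neighbors" field sketch: each country maps to the set of
# ISO alpha-2 codes of its land neighbors (the DE-9IM Touches relation).
neighbors = {
    "BO": {"AR", "BR", "CL", "PE", "PY"},   # Bolivia, from the issue
    "AR": {"BO", "BR", "CL", "PY", "UY"},   # partial, for illustration only
}

def borders(a, b):
    """True if country a lists country b as a neighbor."""
    return b in neighbors.get(a, set())

# Touches is symmetric, so a useful sanity check on the data would be
# that borders(a, b) == borders(b, a) for every pair present.
assert borders("BO", "BR")
assert not borders("BO", "UY")
```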

CLDR v30.0.3 released

CLDR v30.0.3 was released on 2016-12-02 (http://cldr.unicode.org/index/downloads/cldr-30) and includes the following relevant changes:

  • New script codes for Adlam, Bhaiksuki, Marchen, Newa, Osage
  • Some support for new region codes EZ, UN (though names for EZ are not available in languages other than English).
  • Updated English names for bn/Beng (“Bangla”), mic (“Mi'kmaq”), and or (“Odia”).
  • Documented the use of script subtag “Zxxx” to indicate spoken or otherwise unwritten content.
  • The set of language and script names for which translations are requested was revamped, leading to a substantial increase in the number of such names.
  • Substantial new data has been added for likely subtags (e.g. to get the main script for each language).

Oops, should I have opened this issue in the https://github.com/datasets/language-codes repo?

Czech Republic English name

Hi,

the Czech Republic changed its English name to Czechia.

Wikipedia
https://en.wikipedia.org/wiki/Name_of_the_Czech_Republic

Adoption of Czechia
[...]
In 2013, Czech president Miloš Zeman recommended the wider official use of Czechia,[34] and on 14 April 2016, the country's political leadership agreed to make Czechia the official short name. The new name was approved by the Czech cabinet on 2 May 2016[35] as the Czech Republic's official short name and was published in the United Nations UNTERM and UNGEGN country name databases on July 5th, 2016. [36]

Ministry of Foreign Affairs
Short country name "Česko"/"Czechia" to be entered in UN databases
http://www.mzv.cz/jnp/en/issues_and_press/factsheets/x2016_04_21_the_completion_of_translations_of_the.html

It's not in the UN Statistics Division list yet (last updated in 2013, it seems), but it is in use, e.g. in the UN Treaty Collection.

Not sure if it's worth updating/patching but wanted to note it here.

tests or simple assertions on output

as @wodow suggested, it would be great to have some 'smoke tests' or simple assertions to help avoid regressions

one approach could be to save the output of csvkit's csvstat for data/country-codes.csv at the last commit that touches the file, and then diff it against the current output after making a change.

example output for 71eded8 version of data/country-codes.csv is pasted below

these stats would naturally change as various codes change over time, but highlighting per-column changes in Unique values and presence of Nulls would be useful for surfacing regressions

  1. name
    <type 'unicode'>
    Nulls: True
    Unique values: 249
    Max length: 38
  2. official_name_en
    <type 'unicode'>
    Nulls: True
    Unique values: 241
    Max length: 52
  3. official_name_fr
    <type 'unicode'>
    Nulls: True
    Unique values: 241
    Max length: 62
  4. ISO3166-1-Alpha-2
    <type 'unicode'>
    Nulls: True
    Unique values: 248
    Max length: 4
  5. ISO3166-1-Alpha-3
    <type 'unicode'>
    Nulls: True
    Unique values: 249
    Max length: 4
  6. ISO3166-1-numeric
    <type 'unicode'>
    Nulls: False
    Unique values: 251
    Max length: 3
  7. ITU
    <type 'unicode'>
    Nulls: True
    Unique values: 232
    5 most frequent values:
        NOR:    2
        LIE:    1
        EGY:    1
        AGL:    1
        BGD:    1
    Max length: 4
  8. MARC
    <type 'unicode'>
    Nulls: True
    Unique values: 243
    5 most frequent values:
        uik:    3
        gw: 1
        gv: 1
        gu: 1
        gt: 1
    Max length: 14
  9. WMO
    <type 'unicode'>
    Nulls: True
    Unique values: 213
    5 most frequent values:
        BX: 2
        NU: 2
        VI: 2
        AT: 2
        BD: 1
    Max length: 4
 10. DS
    <type 'unicode'>
    Nulls: True
    Unique values: 175
    5 most frequent values:
        F:  10
        USA:    7
        AUS:    5
        NZ: 4
        FIN:    2
    Max length: 4
 11. Dial
    <type 'unicode'>
    Nulls: True
    Unique values: 228
    5 most frequent values:
        44: 4
        672:    3
        262:    3
        590:    3
        61: 3
    Max length: 17
 12. FIFA
    <type 'unicode'>
    Nulls: True
    Unique values: 237
    Max length: 15
 13. FIPS
    <type 'unicode'>
    Nulls: True
    Unique values: 247
    5 most frequent values:
        NL: 2
        BD: 1
        BE: 1
        BF: 1
        BG: 1
    Max length: 26
 14. GAUL
    <type 'int'>
    Nulls: True
    Min: 1
    Max: 91267
    Sum: 245883
    Mean: 1011.86419753
    Median: 145
    Standard Deviation: 7183.59897321
    Unique values: 243
 15. IOC
    <type 'unicode'>
    Nulls: True
    Unique values: 226
    Max length: 4
 16. ISO4217-currency_alphabetic_code
    <type 'unicode'>
    Nulls: True
    Unique values: 149
    5 most frequent values:
        EUR:    33
        USD:    17
        XOF:    8
        XCD:    8
        XAF:    6
    Max length: 4
 17. ISO4217-currency_country_name
    <type 'unicode'>
    Nulls: True
    Unique values: 238
    Max length: 44
 18. ISO4217-currency_minor_unit
    <type 'int'>
    Nulls: True
    Values: 0, 2, 3
 19. ISO4217-currency_name
    <type 'unicode'>
    Nulls: True
    Unique values: 150
    5 most frequent values:
        Euro:   33
        US Dollar:  17
        CFA Franc BCEAO:    8
        East Caribbean Dollar:  8
        CFA Franc BEAC: 6
    Max length: 29
 20. ISO4217-currency_numeric_code
    <type 'unicode'>
    Nulls: True
    Unique values: 149
    5 most frequent values:
        978:    33
        840:    17
        951:    8
        952:    8
        950:    6
    Max length: 4
 21. is_independent
    <type 'unicode'>
    Nulls: True
    Unique values: 18
    5 most frequent values:
        Yes:    195
        Territory of GB:    12
        Part of FR: 8
        Territory of AU:    4
        Part of NL: 4
    Max length: 22
 22. Capital
    <type 'unicode'>
    Nulls: True
    Unique values: 242
    5 most frequent values:
        Kingston:   2
        Kinshasa:   1
        East Jerusalem: 1
        Kiev:   1
        Paris:  1
    Max length: 19
 23. Continent
    <type 'unicode'>
    Nulls: True
    Unique values: 6
    5 most frequent values:
        AF: 58
        AS: 52
        EU: 52
        OC: 27
        SA: 14
    Max length: 4
 24. TLD
    <type 'unicode'>
    Nulls: True
    Unique values: 246
    5 most frequent values:
        .gp:    3
        .cx:    1
        .cy:    1
        .cz:    1
        .ro:    1
    Max length: 4
 25. Languages
    <type 'unicode'>
    Nulls: True
    Unique values: 243
    5 most frequent values:
        fr: 3
        en: 2
        fr-CI:  1
        zh-CN,yue,wuu,dta,ug,za:    1
        ar-TN,fr:   1
    Max length: 89
 26. geonameid
    <type 'int'>
    Nulls: True
    Min: 49518
    Max: 7909807
    Sum: 593982118
    Mean: 2385470.35341
    Median: 2363686
    Standard Deviation: 1541571.57676
    Unique values: 249
 27. EDGAR
    <type 'unicode'>
    Nulls: True
    Unique values: 214
    Max length: 4

Row count: 251
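The diff-based smoke test described above could be sketched in pure stdlib Python; it tracks just the two signals called out as useful (unique-value counts and presence of nulls) rather than the full csvstat output. The snapshot format and column treatment are assumptions:

```python
import csv

def column_stats(rows):
    """Per-column 'unique values' and 'has nulls' stats.

    rows: a list of dicts, e.g. list(csv.DictReader(f)).
    """
    stats = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        stats[col] = {
            "unique": len(set(v for v in values if v)),
            "has_nulls": any(not v for v in values),
        }
    return stats

def diff_stats(old, new):
    """Columns whose stats changed between a saved snapshot and a new run."""
    return {c: (old.get(c), new.get(c))
            for c in set(old) | set(new)
            if old.get(c) != new.get(c)}

# Usage sketch against the repo's data file:
# with open("data/country-codes.csv", newline="") as f:
#     current = column_stats(list(csv.DictReader(f)))
# print(diff_stats(saved_snapshot, current))  # empty dict == no regressions
```

An empty diff means no per-column regressions; any non-empty entry flags a column worth inspecting before committing.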

Palestine row violates schema, renders dataset unusable.

I'm trying to parse this dataset using the Python datapackage utility. Among other issues (bugs in datapackage), I am unable to deal with the Palestine row due to the fact that there are two values for the GAUL column:

"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj", , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,No universal currency,,In contention

I also see that there are multiple country codes specified (comma-separated), and while that isn't triggering an error (since it's just a "string"), it is going to result in unhelpful data.

If this is conforming to the tabular data spec, I can work with datapackage author to get this resolved there. If not, I'd propose either:

  1. Moving this entry to two rows.
  2. Changing the affected field types to arrays (though this seems quite inelegant, since all other rows would have arrays of a single value and seems to run contrary to the essence of this table)
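Option 1 above could be sketched as a row-expansion step. This is purely illustrative: the choice of multi-valued columns and the assumption that their parts pair up positionally (GAUL "91,267" and FIPS "GZ,WE" both describing Gaza then the West Bank) are mine, not the dataset's:

```python
def expand_row(row, multi_cols):
    """Expand one row into N rows by zipping its comma-separated cells.

    Assumes each column in multi_cols lists its parts in the same order;
    columns with fewer parts are padded with empty strings.
    """
    parts = {c: [p.strip() for p in row[c].split(",")] for c in multi_cols}
    n = max(len(v) for v in parts.values())
    out = []
    for i in range(n):
        new = dict(row)  # copy single-valued columns unchanged
        for c in multi_cols:
            new[c] = parts[c][i] if i < len(parts[c]) else ""
        out.append(new)
    return out

row = {"name": "Palestine, State of", "GAUL": "91,267", "FIPS": "GZ,WE"}
expanded = expand_row(row, ("GAUL", "FIPS"))
assert len(expanded) == 2
assert expanded[0]["GAUL"] == "91" and expanded[1]["FIPS"] == "WE"
```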

MySQL hyphen problem

It's really great that you provide this data to us all - thanks a lot - this helps very much with my project ;)

There is a general problem when using this data in MySQL statements: hyphens in column names cause trouble. Usually you can enclose them in backticks, except when using a JOIN; then there is always an error, no matter whether backticks or brackets enclose the column name. The only way around it is to alter the column names to use underscores instead of hyphens. Please consider this in your work.
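A possible workaround on the consumer side, sketched in Python (the repo's scripting language): rewrite the CSV header with underscores before loading into MySQL. The transformation rule is an assumption, not something the dataset provides:

```python
import csv
import io

def underscore_headers(csv_text):
    """Return the CSV with hyphens in the header row replaced by underscores.

    Only the first row is touched; data rows pass through unchanged.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [h.replace("-", "_") for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

sample = "ISO3166-1-Alpha-2,name\nUS,United States\n"
assert underscore_headers(sample).startswith("ISO3166_1_Alpha_2,name")
```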

Several locations with double spaces

the raw CSV contains several location names with double space characters

Maybe there could be a search&replace for those items, e.g.

...,"Chine,  région administrative spéciale de Hong Kong",...

in C# style, I'd write:

str = str.Replace("  ", " ");

ISO 3166-1 numeric contains invalid values

Hi,

The values in the ISO3166-1-numeric column are actually UN M49 codes (which is where you scrape them from).
The two code lists overlap almost completely, except for two entries: Channel Islands and Sark are not in the ISO 3166-1 standard, but are in UN M49. So it might be good to either remove those two entries or change the column name & description.

Include column for Wikidata identifier, suggestion

Wikipedia has stable pages for all countries, and Wikidata supplies an ID for each. Today, Wikidata IDs play an important role as "concept identifiers", for the Semantic Web in general and for open projects like OpenStreetMap, etc.

Example: BR is https://www.wikidata.org/wiki/Q155 , so the wd_id column for the BR row would be Q155. With the Wikidata API we could fill the wd_id column automatically.
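One way to fill such a column would be a SPARQL query against the Wikidata endpoint, mapping ISO 3166-1 alpha-2 codes to item IDs via Wikidata property P297 (ISO 3166-1 alpha-2 code). This sketch only builds the query string; actually running it requires an HTTP request to https://query.wikidata.org/sparql:

```python
def wd_id_query(alpha2_codes):
    """Build a SPARQL query mapping ISO alpha-2 codes to Wikidata items.

    wdt:P297 is Wikidata's 'ISO 3166-1 alpha-2 code' property, so
    e.g. "BR" should resolve to wd:Q155 (Brazil).
    """
    values = " ".join('"%s"' % c for c in alpha2_codes)
    return (
        "SELECT ?code ?item WHERE { "
        "VALUES ?code { %s } "
        "?item wdt:P297 ?code . }" % values
    )

query = wd_id_query(["BR", "BO"])
assert "P297" in query and '"BR"' in query
```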

Please consider using customary country names provided by the Unicode CLDR.

Please consider using customary country names instead of the current official names. Customary country names provided by the Unicode CLDR are used widely by major websites, including Google.

As a bonus, the Unicode CLDR provides additional country names in many languages, and is thus better suited for wider application of the country-codes dataset.

I have provided a full country name list from the latest Unicode CLDR 25 in the forked version below:
https://github.com/hanteng/country-codes/tree/master/data/country-names

Please consider replacing or adding the names into the country-codes dataset.

More here:
http://people.oii.ox.ac.uk/hanteng/zh/2014/05/28/whats-in-a-name-country-names-are-technical-cultural-and-political/

add field types to schema

specify field types (string, integer, number, etc) in schema describing data/country-codes-comprehensive.csv

No date?

Perhaps I'm missing it, and perhaps it isn't viewed as important... but as someone just now finding this resource, my very first question was "WHEN"? I have no idea if this data is from yesterday or from 1990. I just don't see a date anywhere. So just a suggestion that a "last updated" could be useful on the download page, if not the file itself.

ISO3166-1-Alpha-2 for Namibia is missing

Namibia's two-letter ISO code is 'NA', but it is missing from the data.

Probably pandas' read_csv was used somewhere in the processing; it interprets 'NA' as a missing value by default (use keep_default_na=False to prevent this).
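The pitfall is easy to reproduce. A minimal demonstration (the two-row CSV is made up for illustration; the column name matches the dataset's header):

```python
import io

import pandas as pd

csv_text = "name,ISO3166-1-Alpha-2\nNamibia,NA\n"

# By default, read_csv treats the literal string "NA" as a missing value,
# so Namibia's alpha-2 code silently becomes NaN.
default = pd.read_csv(io.StringIO(csv_text))
assert pd.isna(default.loc[0, "ISO3166-1-Alpha-2"])

# keep_default_na=False disables that list, preserving "NA" as a string.
fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
assert fixed.loc[0, "ISO3166-1-Alpha-2"] == "NA"
```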
