datasets / country-codes

Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others
Home Page: https://datahub.io/core/country-codes
The currency for Latvia is out of date; it has been EUR since 2014-01-01.
Formal and official lists that define the usual "spatial location" and "spatial relations":

- BO (Bolivia) has the UTM cells {19L,20L,19K,20K,21K}
- BO is bordered by Argentina, Brazil, ..., that is, the set {AR,BR,CL,PE,PY}. We can adopt as "formal official neighbor" the _Touches_ operator of DE-9IM, which is an OGC SQL standard.

Just wondering if https://github.com/lipis/flag-icon-css can be merged in some way...
I suggest using the Public Domain Dedication and License (PDDL). For an example datapackage.json for this, see http://data.okfn.org/tools/dp/create.json
The link in the repository description (http://data.okfn.org/data/country-codes) now redirects to https://datahub.io/
Should update the link to either
or
https://datahub.io/core/country-codes (however this one does not seem to update automatically to reflect repo changes)
BTW: what would long names be?
For the above dataset, metadata points to:
http://data.okfn.org/tools/country-codes/datapackage.json
Cannot GET /tools/country-codes/datapackage.json
Namibia's two-character ISO code is 'NA', but it is missing in the data.
Probably, pandas read_csv was used somewhere in the processing, which interprets 'NA' as missing by default (use keep_default_na=False to prevent this).
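The pitfall (and the fix) can be reproduced with a short pandas sketch; the CSV snippet below is illustrative, not the real file:

```python
import io
import pandas as pd

csv_text = "code,name\nNA,Namibia\nUS,United States\n"

# Default behaviour: pandas treats the string 'NA' as a missing value (NaN).
df_default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False preserves 'NA' as a literal string.
df_fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```

With the default settings Namibia's code silently becomes NaN; with keep_default_na=False it survives as the string "NA".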
@ewheeler It would be nice to add the two-digit continent code to the country-codes. We could retrieve data about countries and their continents from here
@rgrp, @pdehaye, would you mind leaving your thoughts here?
What do we prefer out of:
Suggest change title to:
List of Countries and Associated Codes including ISO 2/3 digit codes (ISO-3166)
There are official languages other than en and fr, but there are many.... As #15 suggests, the CSV can include "official langs", so we can add more columns (or another new CSV) to show name_es, name_pt, etc. for the most used languages.
specify field types (string, integer, number, etc) in schema describing data/country-codes-comprehensive.csv
Perhaps I'm missing it, and perhaps it isn't viewed as important... but as someone just now finding this resource, my very first question was "WHEN"? I have no idea if this data is from yesterday or from 1990. I just don't see a date anywhere. So just a suggestion that a "last updated" could be useful on the download page, if not the file itself.
I'm trying to parse this dataset using the python datapackage utility. Among other issues (bugs in datapackage), I am unable to deal with the Palestine row due to the fact that there are two values for the GAUL column:
"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj", , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,No universal currency,,In contention
I also see that there are multiple country codes specified (comma-separated), and while that isn't triggering an error (since it's just a "string"), it is going to result in unhelpful data.
If this is conforming to the tabular data spec, I can work with the datapackage author to get this resolved there. If not, I'd propose either:
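Whatever the eventual fix, consumers can work around multi-valued cells today by splitting them after parsing. A minimal sketch (the row is the Palestine record quoted above; the helper name is hypothetical):

```python
import csv
import io

# The Palestine row as it appears in data/country-codes.csv.
ROW = ('"Palestine, State of","Palestine, État de",PS,PSE,275, ,"gz,wj",'
       ' , ,970,PLE,"GZ,WE","91,267",PLE,,"PALESTINE, STATE OF",,'
       'No universal currency,,In contention')

def split_multivalue(cell):
    """Split a comma-separated cell like '91,267' into a list of values."""
    return [v.strip() for v in cell.split(",") if v.strip()]

fields = next(csv.reader(io.StringIO(ROW)))
gaul_values = split_multivalue(fields[12])  # GAUL is the 13th column (0-indexed: 12)
```

This keeps the column a plain string in the schema while still letting downstream code treat it as a list.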
Hi,
the Czech Republic changed its English name to Czechia.
Wikipedia
https://en.wikipedia.org/wiki/Name_of_the_Czech_Republic
Adoption of Czechia
[...]
In 2013, Czech president Miloš Zeman recommended the wider official use of Czechia,[34] and on 14 April 2016, the country's political leadership agreed to make Czechia the official short name. The new name was approved by the Czech cabinet on 2 May 2016[35] as the Czech Republic's official short name and was published in the United Nations UNTERM and UNGEGN country name databases on July 5th, 2016. [36]
Ministry of Foreign Affairs
Short country name "Česko"/"Czechia" to be entered in UN databases
http://www.mzv.cz/jnp/en/issues_and_press/factsheets/x2016_04_21_the_completion_of_translations_of_the.html
It's not in the UN Statistics Division list yet (last updated in 2013, it seems), but it is in use, e.g., in the UN Treaty Collection.
Not sure if it's worth updating/patching but wanted to note it here.
Apparently the SEC has its own code system for countries.
https://www.sec.gov/edgar/searchedgar/edgarstatecodes.htm
I can add them if there's consensus that they are useful.
CLDR v30.0.3 was released on 2016-12-02 (http://cldr.unicode.org/index/downloads/cldr-30) and includes the following relevant changes:
Oops, should I have opened this issue in the https://github.com/datasets/language-codes repo?
Please split the Dominican Republic's three dial codes into separate values.
See https://en.wikipedia.org/wiki/Republic_of_Macedonia
"the former Yugoslav Republic of Macedonia" was provisional, and today not used. Use "Republic of Macedonia" instead.
For Serbia, the 'Dial' field value is "381 p". Can you fix this, please?
The json file in the data directory looks a bit incongruous; should it be there?
The entry for Namibia has ZAR incorrectly listed as the primary currency, presumably due to the country appearing twice in the official source data [1] (as ZAR is legal tender there):
How should the import script handle this?
[1] http://www.currency-iso.org/dam/downloads/lists/list_one.xml
Suggestion to use Goodtables.io. See the example of datasets-br/state-codes, which is already using it. You can run it offline with goodtables-py#cli.
The datatype of the ISO4217-currency_minor_unit column must be an array or list; likewise for ISO4217-currency_numeric_code. The type "number" is for a single number.
[27,12] [type-or-format-error] The value "2,2" in row 27 and column 12 is not type "number" and format "default"
[27,14] [type-or-format-error] The value "356,064" in row 27 and column 14 is not type "number" and format "default"
[59,12] [type-or-format-error] The value "2,2" in row 59 and column 12 is not type "number" and format "default"
[59,14] [type-or-format-error] The value "192,931" in row 59 and column 14 is not type "number" and format "default"
[72,12] [type-or-format-error] The value "2,2" in row 72 and column 12 is not type "number" and format "default"
[72,14] [type-or-format-error] The value "222,840" in row 72 and column 14 is not type "number" and format "default"
[101,12] [type-or-format-error] The value "2,2" in row 101 and column 12 is not type "number" and format "default"
[101,14] [type-or-format-error] The value "332,840" in row 101 and column 14 is not type "number" and format "default"
[127,12] [type-or-format-error] The value "2,2" in row 127 and column 12 is not type "number" and format "default"
[127,14] [type-or-format-error] The value "426,710" in row 127 and column 14 is not type "number" and format "default"
[153,12] [type-or-format-error] The value "2,2" in row 153 and column 12 is not type "number" and format "default"
[153,14] [type-or-format-error] The value "516,710" in row 153 and column 14 is not type "number" and format "default"
[169,12] [type-or-format-error] The value "2,2" in row 169 and column 12 is not type "number" and format "default"
[169,14] [type-or-format-error] The value "590,840" in row 169 and column 14 is not type "number" and format "default"
PS: please change the URL http://data.okfn.org/data/country-codes (on this project's home page) to a correct one.
Would it be possible to get the README as markdown? This is what is currently expected by http://data.okfn.org registry (though that behaviour could be modified to expect README and README.txt ...)
Looks like the fix for #47 broke the ability for this dataset to be loaded (at least by the python datapackage), since a CSV column name no longer matches what's in the datapackage.json file.
The value is 91 267 (with a space between 91 and 267), causing int parsing to fail in MATLAB.
As @wodow suggested, it would be great to have some 'smoke tests' or simple assertions to help avoid regressions.

One approach could be to save the output of csvkit's csvstat for data/country-codes.csv for the last commit that touches the file, and then diff it with the current output after making a change.

Example output for the 71eded8 version of data/country-codes.csv is pasted below.

These stats would naturally change as various codes change over time, but highlighting per-column changes in Unique values and the presence of Nulls would be useful for surfacing regressions.
1. name
<type 'unicode'>
Nulls: True
Unique values: 249
Max length: 38
2. official_name_en
<type 'unicode'>
Nulls: True
Unique values: 241
Max length: 52
3. official_name_fr
<type 'unicode'>
Nulls: True
Unique values: 241
Max length: 62
4. ISO3166-1-Alpha-2
<type 'unicode'>
Nulls: True
Unique values: 248
Max length: 4
5. ISO3166-1-Alpha-3
<type 'unicode'>
Nulls: True
Unique values: 249
Max length: 4
6. ISO3166-1-numeric
<type 'unicode'>
Nulls: False
Unique values: 251
Max length: 3
7. ITU
<type 'unicode'>
Nulls: True
Unique values: 232
5 most frequent values:
NOR: 2
LIE: 1
EGY: 1
AGL: 1
BGD: 1
Max length: 4
8. MARC
<type 'unicode'>
Nulls: True
Unique values: 243
5 most frequent values:
uik: 3
gw: 1
gv: 1
gu: 1
gt: 1
Max length: 14
9. WMO
<type 'unicode'>
Nulls: True
Unique values: 213
5 most frequent values:
BX: 2
NU: 2
VI: 2
AT: 2
BD: 1
Max length: 4
10. DS
<type 'unicode'>
Nulls: True
Unique values: 175
5 most frequent values:
F: 10
USA: 7
AUS: 5
NZ: 4
FIN: 2
Max length: 4
11. Dial
<type 'unicode'>
Nulls: True
Unique values: 228
5 most frequent values:
44: 4
672: 3
262: 3
590: 3
61: 3
Max length: 17
12. FIFA
<type 'unicode'>
Nulls: True
Unique values: 237
Max length: 15
13. FIPS
<type 'unicode'>
Nulls: True
Unique values: 247
5 most frequent values:
NL: 2
BD: 1
BE: 1
BF: 1
BG: 1
Max length: 26
14. GAUL
<type 'int'>
Nulls: True
Min: 1
Max: 91267
Sum: 245883
Mean: 1011.86419753
Median: 145
Standard Deviation: 7183.59897321
Unique values: 243
15. IOC
<type 'unicode'>
Nulls: True
Unique values: 226
Max length: 4
16. ISO4217-currency_alphabetic_code
<type 'unicode'>
Nulls: True
Unique values: 149
5 most frequent values:
EUR: 33
USD: 17
XOF: 8
XCD: 8
XAF: 6
Max length: 4
17. ISO4217-currency_country_name
<type 'unicode'>
Nulls: True
Unique values: 238
Max length: 44
18. ISO4217-currency_minor_unit
<type 'int'>
Nulls: True
Values: 0, 2, 3
19. ISO4217-currency_name
<type 'unicode'>
Nulls: True
Unique values: 150
5 most frequent values:
Euro: 33
US Dollar: 17
CFA Franc BCEAO: 8
East Caribbean Dollar: 8
CFA Franc BEAC: 6
Max length: 29
20. ISO4217-currency_numeric_code
<type 'unicode'>
Nulls: True
Unique values: 149
5 most frequent values:
978: 33
840: 17
951: 8
952: 8
950: 6
Max length: 4
21. is_independent
<type 'unicode'>
Nulls: True
Unique values: 18
5 most frequent values:
Yes: 195
Territory of GB: 12
Part of FR: 8
Territory of AU: 4
Part of NL: 4
Max length: 22
22. Capital
<type 'unicode'>
Nulls: True
Unique values: 242
5 most frequent values:
Kingston: 2
Kinshasa: 1
East Jerusalem: 1
Kiev: 1
Paris: 1
Max length: 19
23. Continent
<type 'unicode'>
Nulls: True
Unique values: 6
5 most frequent values:
AF: 58
AS: 52
EU: 52
OC: 27
SA: 14
Max length: 4
24. TLD
<type 'unicode'>
Nulls: True
Unique values: 246
5 most frequent values:
.gp: 3
.cx: 1
.cy: 1
.cz: 1
.ro: 1
Max length: 4
25. Languages
<type 'unicode'>
Nulls: True
Unique values: 243
5 most frequent values:
fr: 3
en: 2
fr-CI: 1
zh-CN,yue,wuu,dta,ug,za: 1
ar-TN,fr: 1
Max length: 89
26. geonameid
<type 'int'>
Nulls: True
Min: 49518
Max: 7909807
Sum: 593982118
Mean: 2385470.35341
Median: 2363686
Standard Deviation: 1541571.57676
Unique values: 249
27. EDGAR
<type 'unicode'>
Nulls: True
Unique values: 214
Max length: 4
Row count: 251
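A pure-Python version of such a smoke test could compute just the two signals called out above (unique values and null presence) per column, so the output is small enough to diff between commits. This is only a sketch; the file path would be data/country-codes.csv in practice:

```python
import csv

def column_stats(path):
    """Per-column unique-value count and null presence, for diffing across commits."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        stats = {name: {"values": set(), "nulls": False} for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                if value is None or value == "":
                    stats[name]["nulls"] = True   # empty cell counts as a null
                else:
                    stats[name]["values"].add(value)
    return {name: {"unique": len(s["values"]), "nulls": s["nulls"]}
            for name, s in stats.items()}
```

Saving this dict as JSON per commit and diffing would surface regressions like the Namibia 'NA' loss (unique count drops, nulls flips to True) without depending on csvkit.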
It's really great that you provide this data to us all - thanks a lot - this helps very much with my project ;)
There is a general problem when using this data with MySQL statements: hyphens in column names cause trouble. Usually you can enclose them in backticks, except when using a JOIN; then there's always an error, no matter whether you use backticks or brackets to enclose the column name. The only way is to alter the column names to use underscores instead of hyphens. Please consider this in your work.
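On the consumer side, the column names can be sanitized before the CSV ever reaches MySQL. A minimal sketch (the helper name is hypothetical; it simply swaps hyphens for underscores so identifiers need no quoting):

```python
def mysql_safe(columns):
    """Replace hyphens with underscores so names like 'ISO3166-1-Alpha-2'
    become valid unquoted MySQL identifiers ('ISO3166_1_Alpha_2')."""
    return [c.replace("-", "_") for c in columns]
```

Applying this to the header row at import time avoids the backtick problem entirely, at the cost of diverging from the published column names.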
It is available as .pdf and .doc here: http://www.itu.int/pub/T-SP-LT.1-2013. Maybe use antiword to parse the .doc.
Wikipedia has stable pages for all countries, and Wikidata supplies an ID for each. Today Wikidata IDs play an important role as "concept identifiers", for the Semantic Web in general and for open projects like OpenStreetMap, etc.

Example: BR is https://www.wikidata.org/wiki/Q155 , so the wd_id column of the BR line is Q155. With the Wikidata API we can fill the wd_id column automatically.
$ goodtables datapackage http://data.okfn.org/data/core/country-codes/datapackage.json
https://github.com/frictionlessdata/datapackage-py/issues/122

DATASET
=======
{'error-count': 2, 'table-count': 1, 'time': 4.537, 'valid': False}
TABLE [1]
=========
{'datapackage': 'http://data.okfn.org/data/core/country-codes/datapackage.json',
'error-count': 2,
'headers': ['name',
'official_name_en',
'official_name_fr',
'ISO3166-1-Alpha-2',
'ISO3166-1-Alpha-3',
'ISO3166-1-numeric',
'ITU',
'MARC',
'WMO',
'DS',
'Dial',
'FIFA',
'FIPS',
'GAUL',
'IOC',
'ISO4217-currency_alphabetic_code',
'ISO4217-currency_country_name',
'ISO4217-currency_minor_unit',
'ISO4217-currency_name',
'ISO4217-currency_numeric_code',
'is_independent',
'Capital',
'Continent',
'TLD',
'Languages',
'geonameid',
'EDGAR'],
'row-count': 252,
'source': 'https://raw.github.com/datasets/country-codes/master/data/country-codes.csv',
'time': 3.752,
'valid': False}
---------
[-,6] [non-matching-header] Header in column 6 doesn't match field name UN Statistics M49 numeric codes
[-,26] [non-matching-header] Header in column 26 doesn't match field name Geoname ID
Hi,
The values in the ISO3166-1-numeric column are actually UN M49 codes (which is where you scrape them from).

These two code lists overlap almost completely, except for the top two entries: Channel Islands and Sark are not in the ISO 3166-1 standard, but are in UN M49. So maybe it's good to either remove the top two entries or change the column name and description.
What's difference between these 2 columns?
This is identical for ISO 3166-2 and is more comprehensive. Suggest we deprecate country-codes in favour of this repo.
I want to add TLDs and the ITU designation in English (from here), but since ISO no longer provides ISO 3166 lists for free, ./scripts/get_countries_of_earth.py fails.
Can I expect that list-en1-semic-3.txt and list-fr1-semic.txt are already downloaded, or should the script be fixed (add the files to the repo, or change it to fetch the info from another source like MaxMind)?
So how should I proceed with editing get_countries_of_earth.py?
Very handy to have that! The timezone name and the difference from UTC as two separate columns.
Please consider using customary country names instead of the current official names. Customary country names provided by the Unicode CLDR are used widely by major websites including Google.
As a bonus, the Unicode CLDR provides additional country names in many languages, and is thus better suited for wider application of the country-codes.
I have provided a full country name list from the latest Unicode CLDR 25 in the forked version below:
https://github.com/hanteng/country-codes/tree/master/data/country-names
Please consider replacing or adding the names into the country-codes dataset.
The raw CSV contains several locations with double space characters. Maybe there could be a search-and-replace for those items, e.g.
...,"Chine,  région administrative spéciale de Hong Kong",...
In C# style, I'd write:
str = str.Replace("  ", " ");
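The same cleanup in Python (the language of the repo's scripts) would collapse any run of two or more spaces, which covers triple spaces too; a small sketch:

```python
import re

def collapse_spaces(text):
    """Collapse runs of two or more spaces into a single space."""
    return re.sub(r" {2,}", " ", text)
```

Running this over the name columns during the build step would stop such artifacts from reappearing in future imports.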
Currency codes for GB and US, at least, were lost in the jumbo commit d4e4895.
It's too long to check all the edits, so maybe revert?
Do we want naturally capitalized names rather than upper case, e.g. United States rather than UNITED STATES?