Tools to create a geolocation API similar to that offered by Google
- Takes list of place names from GeoNames
- Takes list of languages and prevalences by country from Wikipedia
- Parses and imports into MySQL DB
- Sets up API with Flask
- Responds to queries of text location strings with coordinates and country name
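As a rough illustration of the parsing step, the sketch below maps one line of the GeoNames dump (tab-separated, 19 fields) onto the columns of the `places` table described in this README. The function name, the lowercasing rule for `clean_name` and the fallback from `elevation` to `dem` are assumptions, not taken from the project's code, and the sample line is synthetic.

```python
# Column order of the GeoNames "geoname" dump (tab-separated, 19 fields).
GEONAMES_COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def parse_geonames_line(line):
    """Turn one tab-separated dump line into a dict shaped like a `places` row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GEONAMES_COLUMNS):
        raise ValueError(
            f"expected {len(GEONAMES_COLUMNS)} fields, got {len(fields)}"
        )
    raw = dict(zip(GEONAMES_COLUMNS, fields))
    return {
        "name": raw["name"],
        # Assumption: clean_name is simply the lowercased name.
        "clean_name": raw["name"].lower(),
        "lat": float(raw["latitude"]),
        "lon": float(raw["longitude"]),
        "country": raw["country_code"],
        "population": int(raw["population"] or 0),
        # Assumption: when elevation is blank, fall back to the DEM value.
        "elevation": int(raw["elevation"] or raw["dem"] or 0),
        "admin_name": raw["admin1_code"],
        "feature": raw["feature_code"],
    }


# Synthetic line built from the Mount Kpa values quoted in this README;
# the id, timezone and date fields are placeholders, not real GeoNames data.
sample = "\t".join([
    "0", "Mount Kpa", "Mount Kpa", "", "6.58333", "-9.35", "T", "MT", "LR",
    "", "11", "", "", "", "0", "", "322", "Africa/Monrovia", "1970-01-01",
])
row = parse_geonames_line(sample)
print(row)
```

Rows in this shape could then be bulk-inserted into MySQL with whichever driver the project uses.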
There are four tables: `iso`, `places`, `language` and `admin1`
`places` lists all place names and maps them to coordinates and other info

name | clean_name | lat | lon | country | population | elevation | admin_name | feature |
---|---|---|---|---|---|---|---|---|
Suhūl az̧ Z̧afrah | suhūl az̧ z̧afrah | 22.75 | 53.1667 | AE | 0 | 119 | 00 | 00 |

`admin1` lists all first-level administrative divisions, e.g. in the US these are states such as New York or Arizona

| code | name | ascii_name | pop | country | admin_code |
|------|------|------------|-----|---------|------------|
| AD.06 | Sant Julià de Loria | Sant Julia de Loria | 3039162 | AD |

`iso` lists all countries and their ISO codes

name | iso2 | iso3 |
---|---|---|
Afghanistan | AF | AFG |

`language` lists countries and their languages along with ISO codes

language | country_name | iso2 | status | lang_iso | level |
---|---|---|---|---|---|
Brunei Malay | brunei | BN | regional | NULL | 2 |

`level` indicates the importance of a language in that country, e.g. 'Significant minority' is level 2 while 'Official' is level 1
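
A minimal sketch of how a status label might be turned into a `level`. Only the pairs that appear in this README are listed ('Official', 'Significant minority', and the 'regional' sample row); the dictionary and function names are hypothetical, and other status labels from the Wikipedia source are deliberately left unmapped.

```python
# Only the status/level pairs mentioned in this README are listed here;
# any other status labels are unknown and map to None.
STATUS_LEVELS = {
    "official": 1,
    "significant minority": 2,
    "regional": 2,
}


def language_level(status):
    """Return the level for a status label, or None when it is not listed."""
    return STATUS_LEVELS.get(status.strip().lower())


print(language_level("Official"))
print(language_level("Significant minority"))
print(language_level("some other status"))
```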
Each place has an associated feature type, such as a populated place or a geographical feature. The (partial) counts of the most common feature types are:

| Code | Count | Description |
|------|-------|-------------|
| PPLA3 | 90397 | Seat of a 3rd order division |
| PPLX | 91773 | Section of a populated place |
| HMSD | 99105 | Homestead |
| ADM3 | 108767 | 3rd level admin division |
| RSTN | 116788 | Railroad station |
| LCTY | 131307 | Locality (a minor area or place of unspecified or mixed character and indefinite boundaries) |
| PPLA4 | 131855 | Seat of a 4th order division |
| HTL | 133210 | Hotel |
| LK | 161605 | Lake |
| HLL | 173397 | Hill |
| STMI | 194574 | Intermittent stream |
| ADM4 | 206125 | 4th level admin division |
| FRM | 218814 | Farm |
| ISL | 220766 | Island |
| MT | 503068 | Mountain |
| STM | 593570 | Stream |
| PPL | 5812629 | Populated place |

- Make `name` the primary key in the `places` table; this speeds up queries based on `where` statements
- Eliminate all feature types except PPL and any features with zero population
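
The two suggestions above can be tried out on a toy table. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL, and because place names are not necessarily unique it creates a plain index on `clean_name` instead of a primary key; that is a swapped-in choice, not what the list item states. The index name and the `Exampleville` row are made up for illustration, and the `DELETE` reflects one reading of the second item (drop non-PPL features and drop zero-population rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute(
    "CREATE TABLE places (name TEXT, clean_name TEXT, lat REAL, lon REAL, "
    "country TEXT, population INTEGER, elevation INTEGER, "
    "admin_name TEXT, feature TEXT)"
)
rows = [
    # Row taken from the Mount Kpa example in this README.
    ("Mount Kpa", "mount kpa", 6.58333, -9.35, "LR", 0, 322, "11", "MT"),
    # Made-up row, only here so that something survives the clean-up below.
    ("Exampleville", "exampleville", 0.0, 0.0, "XX", 1000, 10, "01", "PPL"),
]
conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)

# Index the column used in WHERE clauses so lookups avoid a full table scan.
conn.execute("CREATE INDEX idx_places_clean_name ON places (clean_name)")

# Ask SQLite how it would run the lookup; the plan should mention the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM places WHERE clean_name = ?",
    ("mount kpa",),
).fetchall()
uses_index = any("idx_places_clean_name" in str(step) for step in plan)
print(uses_index)

# Keep only populated places (PPL) that have a non-zero population.
conn.execute("DELETE FROM places WHERE feature <> 'PPL' OR population = 0")
remaining = conn.execute("SELECT COUNT(*) FROM places").fetchone()[0]
print(remaining)
```

Note that the clean-up would also remove features such as Mount Kpa, so it trades coverage for speed.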
Set up the API with `python app.py`, which serves to http://127.0.0.1:5000/
Query the DB for a location with http://127.0.0.1:5000/loc=`location`
e.g. http://127.0.0.1:5000/loc=Mount%20Kpa
[{"name":"Mount Kpa","clean_name":"mount kpa","lat":6.58333,"lon":-9.35,"country":"LR","pop":0,"elevation":322,"admin_name":"11","feature":"MT"}]
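
A short sketch of how a client might read the response quoted above. The helper name is hypothetical, and returning `None` for an empty list assumes that the API answers with `[]` when nothing matches, which this README does not state.

```python
import json

# The response body quoted above, reproduced as a string.
body = (
    '[{"name":"Mount Kpa","clean_name":"mount kpa","lat":6.58333,"lon":-9.35,'
    '"country":"LR","pop":0,"elevation":322,"admin_name":"11","feature":"MT"}]'
)


def first_coordinates(response_text):
    """Return (lat, lon, country) of the first match, or None if there is none."""
    matches = json.loads(response_text)
    if not matches:
        return None
    top = matches[0]
    return top["lat"], top["lon"], top["country"]


print(first_coordinates(body))  # (6.58333, -9.35, 'LR')
```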
Query the DB for a location with a country hint with http://127.0.0.1:5000/loc=`location`&country=`country`
- Uses two-letter ISO codes (the `iso2` column) for countries
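
Because the URL form above contains no `?`, the `loc=...&country=...` part belongs to the URL path and not to the query string, so it has to be split by hand. The sketch below shows one way to do that with the standard library; the function name is hypothetical and this is not necessarily how `app.py` handles it.

```python
from urllib.parse import parse_qs, urlsplit


def parse_hinted_path(url):
    """Split a /loc=...&country=... URL into a dict of decoded parameters."""
    parts = urlsplit(url)
    # There is no "?" in the URL, so the parameters sit in the last path segment.
    segment = parts.path.rsplit("/", 1)[-1]
    params = parse_qs(segment)
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in params.items()}


print(parse_hinted_path("http://127.0.0.1:5000/loc=Mount%20Kpa&country=LR"))
```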
Query the DB for a location with a language hint with http://127.0.0.1:5000/loc=`location`&langs=`lang1,lang2...`
- Uses two-letter ISO codes for languages
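
The URL forms shown above can be assembled with a small helper. This is a sketch: `build_query_url` is a hypothetical name, the `en` and `fr` hints are only examples, and whether the country and language hints may be combined in one request is not stated here.

```python
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:5000"  # address served by `python app.py`


def build_query_url(location, country=None, langs=None):
    """Build a /loc= URL, optionally adding a country or language hint."""
    url = f"{BASE_URL}/loc={quote(location)}"
    if country:
        url += f"&country={quote(country)}"
    if langs:
        # Language hints are sent as a comma-separated list.
        url += "&langs=" + ",".join(quote(code) for code in langs)
    return url


print(build_query_url("Mount Kpa"))
print(build_query_url("Mount Kpa", country="LR"))
print(build_query_url("Mount Kpa", langs=["en", "fr"]))
```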
Query a large messy string, e.g. an entire document, with http://127.0.0.1:5000/raw/loc=`rawString`. Narrow the search down to a single country with http://127.0.0.1:5000/raw/loc=`rawString`&country=`XX`
- Uses NLTK stopwords
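
How the raw endpoint uses the stopwords is not described here. One plausible approach, sketched below, is to split the text at stopwords and treat the remaining runs of words as candidate place names to look up against `clean_name`. The small hard-coded set stands in for NLTK's English stopword list so the example runs without downloading NLTK data, and the function name is hypothetical.

```python
import re

# Tiny stand-in for NLTK's English stopword list (nltk.corpus.stopwords),
# hard-coded so the sketch runs without the NLTK data files.
STOPWORDS = {
    "a", "an", "and", "at", "by", "from", "in", "of", "on",
    "the", "to", "was", "were", "with",
}


def candidate_phrases(raw_text):
    """Split raw text into runs of consecutive non-stopword tokens."""
    # Letters only (any script); digits and punctuation act as separators.
    tokens = re.findall(r"[^\W\d_]+", raw_text.lower())
    phrases, current = [], []
    for token in tokens:
        if token in STOPWORDS:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(token)
    if current:
        phrases.append(" ".join(current))
    return phrases


print(candidate_phrases("Flooding was reported in Mount Kpa and the surrounding area."))
# ['flooding', 'reported', 'mount kpa', 'surrounding area']
```

Each candidate, such as `mount kpa`, could then be checked against the `places` table.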
Error codes follow W3 guidelines; they need to be updated to the Heroku spec
The following values sometimes appear in the admin level 1 column:
- 00/0 = the entire country
- Values that do not appear in the `admin1` table are not a regular part of the country, e.g. the Tunb islands of UAE: feature code is ISL and admin code is 11
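
These rules can be expressed as a small lookup. The sketch assumes that `admin1` codes are formed as country code, dot, admin code, which is inferred from the `AD.06` sample row; the function name and return strings are hypothetical.

```python
def describe_admin_code(country, admin_code, admin1_codes):
    """Classify a place's admin level 1 value using the rules noted above."""
    if admin_code in ("00", "0"):
        return "entire country"
    # Assumption: admin1 codes look like "<country>.<admin_code>", e.g. "AD.06".
    if f"{country}.{admin_code}" in admin1_codes:
        return "regular admin1 division"
    return "not a regular part of the country"


# Only the sample code from the admin1 table in this README is loaded here.
known = {"AD.06"}
print(describe_admin_code("AE", "00", known))
print(describe_admin_code("AD", "06", known))
print(describe_admin_code("AE", "11", known))
```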
Non-core Dependencies
- Add in country names explicitly!
- Add in clues e.g. likely country, region, timezone or language
- Add in fuzzy matching e.g. Al Raqqah/Al Raqah
- Automatically query Google API and update DB
- Add in admin level 2 as well as level 1
- Add in Google reverse geocoding for placing lat/long coords
- Need to be updated to Heroku spec
- Add sparse/verbose return option e.g. name and lat/lon