osmlab / name-suggestion-index

Canonical common brand names, operators, transit and flags for OpenStreetMap.

License: BSD 3-Clause "New" or "Revised" License

name-suggestion-index's Introduction

name-suggestion-index ("NSI")

Canonical features for OpenStreetMap, collected manually and via the NSI Collector planet scan.

What is it?

The goal of this project is to maintain a canonical list of commonly used features for suggesting consistent spelling and tagging in OpenStreetMap.

👉 Watch the video from our talk at State of the Map US 2019 to learn more about this project!

Browse the index

You can browse the name-suggestion-index and check Wikidata links for accuracy at https://nsi.guide.

How it's used

When mappers create features in OpenStreetMap, they are not always consistent about how they name and tag things. For example, we may prefer McDonald's tagged as amenity=fast_food but we see many examples of other spellings (Mc Donald's, McDonalds, McDonald’s) and taggings (amenity=restaurant).

Building a canonical feature index allows two very useful things:

We can suggest the most "correct" way to tag things as users create them while editing.
We can scan the OSM data for "incorrect" features and produce lists for review and cleanup.
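
The two uses above both depend on recognizing spelling variants of one name. A minimal sketch of that idea follows; the `CANONICAL` record shape and the `simplify` and `suggest` helpers are hypothetical names used for illustration, not the project's actual code.

```javascript
// Hypothetical canonical entry; the real index format may differ.
const CANONICAL = {
  name: "McDonald's",
  tags: { amenity: 'fast_food' }
};

// Reduce a name to a comparison key: lowercase, with whitespace and
// both straight (U+0027) and curly (U+2019) apostrophes removed.
function simplify(name) {
  return name.toLowerCase().replace(/[\s'\u2019]/g, '');
}

// Return the canonical entry if `name` is a spelling variant of it, else null.
function suggest(name) {
  return simplify(name) === simplify(CANONICAL.name) ? CANONICAL : null;
}
```

With this, "Mc Donald's", "McDonalds" and "McDonald’s" all resolve to the same entry, which carries the preferred amenity=fast_food tag.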

The name-suggestion-index is in use in iD when adding a new item

About the index

See the project wiki for details.

Participate!

We're always looking for help!

Read the Code of Conduct and remember to be kind to one another.
See the project wiki for info about how to contribute to this index.

If you have any questions or want to reach out to a maintainer, ping @bhousel, @1ec5, or @tas50 on:

OpenStreetMap US Slack (#poi or #general channels)

License

name-suggestion-index is available under the 3-Clause BSD License. See the LICENSE.md file for more details.

name-suggestion-index's Issues

Add licensing info

Some files have no licensing headers. It would be useful to have information on each file, and if you wish, as a LICENSE.md file too so that Github (and others) can parse the license.

Add healthcare=pharmacy alongside amenity=pharmacy

As of openstreetmap/iD@8d6f59c iD will start adding the healthcare=* tags alongside the existing in use healthcare tags (we are not deprecating the existing tags, just adding support for healthcare tagging to them).

This means that in this name suggestions list, the amenity=pharmacy suggestions should also get the healthcare=pharmacy tag.
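
As a rough sketch of the change being asked for, assuming a suggestion's tags are held in a plain object (the `addHealthcareTag` helper is a hypothetical name, not something in the repository):

```javascript
// Return a copy of `tags` with healthcare=pharmacy added when the
// suggestion is an amenity=pharmacy and has no healthcare tag yet.
// Hypothetical helper for illustration only.
function addHealthcareTag(tags) {
  if (tags.amenity === 'pharmacy' && !tags.healthcare) {
    return Object.assign({}, tags, { healthcare: 'pharmacy' });
  }
  return tags;
}
```

The existing amenity tag is left in place, matching the note above that the older tags are not being deprecated.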

cc @1ec5 - it's been a while and we should revisit if the pharmacy suggestions are still confusing here and fix if needed.

Name Suggestions missing presets

Since the name suggestion index was recently rebuilt, there are now a handful of name suggestions that don't correspond to presets in iD.

This isn't necessarily a bad thing, but I am just listing them here so that we can determine next steps to clean them up.

If a name suggestion doesn't have a preset, it either means that

the thing is uncommon, or
we should make a preset for those, or
the name suggestion might be assigned the wrong tag.

WARN: no preset for suggestion = amenity,ice_cream,Grido
WARN: no preset for suggestion = shop,ice_cream,Мороженое

^ Not sure what the best tagging is for ice cream shop these days. There is disagreement here.

WARN: no preset for suggestion = amenity,sauna,Баня
WARN: no preset for suggestion = amenity,driving_school,Автошкола

^ Maybe we should add presets to iD for these?

WARN: no preset for suggestion = shop,energy,Punto Enel

^ Not sure what is a shop=energy?

WARN: no preset for suggestion = shop,charity,British Heart Foundation
WARN: no preset for suggestion = shop,charity,Cancer Research UK
WARN: no preset for suggestion = shop,charity,Oxfam
WARN: no preset for suggestion = shop,charity,Scope
WARN: no preset for suggestion = shop,charity,Age UK
WARN: no preset for suggestion = shop,charity,Goodwill
WARN: no preset for suggestion = shop,charity,Sue Ryder

^ Probably should add a shop=charity preset to iD.

WARN: no preset for suggestion = man_made,windmill,De Hoop

^ I don't see why this is included in the name suggestion index?
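
A check like the one that produced these warnings could be sketched as follows. The input shapes are assumptions: suggestions as `{ key, value, name }` objects and presets identified by "key/value" strings. `findMissingPresets` is a hypothetical name.

```javascript
// suggestions: array of { key, value, name }  (assumed shape)
// presetIDs:   Set of "key/value" strings     (assumed shape)
// Returns one warning line per suggestion that has no matching preset.
function findMissingPresets(suggestions, presetIDs) {
  return suggestions
    .filter(s => !presetIDs.has(s.key + '/' + s.value))
    .map(s => 'WARN: no preset for suggestion = ' + [s.key, s.value, s.name].join(','));
}
```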

Stop suggesting some amenity=fuel names

It seems that some amenity=fuel names shouldn't be suggested at all.
For example, OSM currently has 8615 objects with brand=Shell and 10228 objects with name=Shell.

Most probably people are wrongly using the brand as the name because the default OSM style doesn't display the brand (see gravitystorm/openstreetmap-carto#1874); i.e., users are tagging for the renderer.

Suggesting names that are clearly brands will just keep feeding the wrong names.

Validate config files against a JSON Schema

There are now 2 config files for this project:
config/filters.json - filters names into keep/discard lists
config/canonical.json - defines the canonical representation(s) for each name and the tags to go with it

We can make a JSON Schema for each one and validate the files.
This is a nice way to guard against people messing up the files accidentally.
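
To illustrate the kind of mistake a schema would catch, here is a small sketch. The schema's property names (`keep`, `discard`) are an assumption about what filters.json might look like, and `validate` is a hand-rolled checker that understands only a few JSON Schema keywords (`type`, `required`, `properties`, `additionalProperties: false`, `items`). A real setup would more likely use an existing JSON Schema validator library.

```javascript
// Assumed shape for config/filters.json: two arrays of strings.
// The property names here are hypothetical.
const filtersSchema = {
  type: 'object',
  required: ['keep', 'discard'],
  additionalProperties: false,
  properties: {
    keep: { type: 'array', items: { type: 'string' } },
    discard: { type: 'array', items: { type: 'string' } }
  }
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Minimal checker for a small subset of JSON Schema keywords.
// Returns an array of error strings; empty means the data passed.
function validate(schema, data, path) {
  path = path || '$';
  const errors = [];
  const actual = Array.isArray(data) ? 'array' : (data === null ? 'null' : typeof data);
  if (schema.type && schema.type !== actual) {
    return [path + ': expected ' + schema.type + ', got ' + actual];
  }
  if (actual === 'object') {
    const props = schema.properties || {};
    (schema.required || []).forEach(key => {
      if (!has(data, key)) errors.push(path + ': missing property "' + key + '"');
    });
    Object.keys(data).forEach(key => {
      if (has(props, key)) {
        errors.push(...validate(props[key], data[key], path + '.' + key));
      } else if (schema.additionalProperties === false) {
        errors.push(path + ': unexpected property "' + key + '"');
      }
    });
  }
  if (actual === 'array' && schema.items) {
    data.forEach((item, i) => {
      errors.push(...validate(schema.items, item, path + '[' + i + ']'));
    });
  }
  return errors;
}
```

A misspelled key or a non-string list item then shows up as an error message instead of silently changing the build output.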

Remove "amenity/school|school"

Generic "school"

Devise way to handle nouns used as names

One of the larger issues with the index as it is, is that it doesn't really work internationally. It is quite possible, and it does happen (I believe "Apotheke" is such a case), that a noun in one language is used as the name of a chain in another. Currently there is no way to handle such a conflict or any regional differences in naming.

Localization of name suggestion

Currently the name suggestion index compiles global frequency counts of POIs for use in iD for suggestions / presets.

In reality the different brand names / chain stores etc are not evenly distributed and differ across countries. Many stores (and not least the usage of their localised names) are very confined.

To enhance the relevancy of the suggestions, it might be useful to partition the counts by country and provide suggestions based on the occurrence of the POIs in the country being edited.

As a first step on the backend, I have attempted a new branch for the project at my repo -
https://github.com/hlaw/name-suggestion-index/tree/countrycode

Format changes

The branch revises the project to add a new country code level at the top of the hierarchy in name-suggestions.json. The JSON format under each country is the same as the current global file. The threshold for generating topNames.json is lowered from 50 to 5 so that names from smaller / less well mapped countries show up.

Changes made

In my setup the original getRaw.js could not finish processing the Asia extract and got killed after eating up several GB of memory, and I could not get it to work under node. I have therefore rewritten it in C++, calling libosmium directly (the same backend as osmium-node). Besides counts, coordinates for each POI are saved for further processing.

In build.js, the process now checks the country code from the coordinates using https://github.com/hlaw/codegrid-js. It then counts the POIs by country.
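
The per-country counting step might look roughly like this. It is a sketch under stated assumptions: each POI already carries a `country` code (the coordinate lookup is left out), tags are "key/value" strings, and `countByCountry` is a hypothetical name rather than the branch's actual function.

```javascript
// pois: array of { country, tag, name }, where `country` has already been
// resolved from the POI's coordinates (assumed done in an earlier step).
// Returns { country: { tag: { name: count } } }, keeping only names whose
// count within a country reaches `threshold`.
function countByCountry(pois, threshold) {
  // Prototype-free objects, so names like "constructor" count correctly.
  const all = Object.create(null);
  for (const { country, tag, name } of pois) {
    const byTag = all[country] || (all[country] = Object.create(null));
    const byName = byTag[tag] || (byTag[tag] = Object.create(null));
    byName[name] = (byName[name] || 0) + 1;
  }

  const kept = Object.create(null);
  for (const country in all) {
    for (const tag in all[country]) {
      for (const name in all[country][tag]) {
        const count = all[country][tag][name];
        if (count < threshold) continue;   // too rare in this country
        const byTag = kept[country] || (kept[country] = Object.create(null));
        const byName = byTag[tag] || (byTag[tag] = Object.create(null));
        byName[name] = count;
      }
    }
  }
  return kept;
}
```

Applying the threshold per country rather than globally is what lets a name that is common in one small country survive, which is the reason given above for lowering it.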

Sample data

The branch contains demo data based on a recent pbf extract of Asia (with 315M nodes / 10M ways). I have not downloaded the planet to test, but I would guess that the files would be 8-10 times the current size when run on the planet.

To use data from the branch, iD would need to be able to load presets / suggestions dynamically when a user moves to a different country. This would probably require a set of country specific preset files to be built before deployment. For most users this should result in smaller download size and more relevant results in suggestions. I will try to explore how this could be done in iD.

Meanwhile, as the change would break iD now, this is just posted for review. Thank you.

Remove "amenity/place_of_worship|Iglesia"

"Iglesia" means "church" in Spanish and shouldn't be suggested.

"Raiffeisenbank" is miscategorized as fuel not bank

Costco is canonicalized twice

I'm working on this repo a bunch this week. My plan is to close a bunch of issues and make it easier to use. I've also reprocessed a planet dump into topNames and there are quite a lot more names now.

Anyway, in the process of updating canonical.json I found that "Costco" matches twice - once for "Costco Gasoline" and once for "Costco Wholesale". Thankfully it's the only string that is this way.

The effect of this is that they end up merged together. Currently in iD, if you type "Costco" you are offered a single preset called "Costco Gasoline" but tagged as shop=supermarket.
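
A check for this kind of collision could be sketched as below. It assumes canonical.json maps each canonical name to an entry with a `matches` array; `findAmbiguousMatches` is a hypothetical helper name.

```javascript
// canonical: { canonicalName: { matches: [variant, ...] } }  (assumed shape)
// Returns [variant, [canonicalName, ...]] pairs for every variant string
// that is claimed by more than one canonical entry.
function findAmbiguousMatches(canonical) {
  const owners = new Map();   // variant -> canonical names that list it
  for (const [name, entry] of Object.entries(canonical)) {
    for (const variant of entry.matches || []) {
      if (!owners.has(variant)) owners.set(variant, []);
      owners.get(variant).push(name);
    }
  }
  return [...owners].filter(([, names]) => names.length > 1);
}
```

Running something like this as part of the build would flag a string such as "Costco" before two entries end up merged.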

Update instructions

I tried to follow the instructions and ran into a couple of questions.

The existing example covers the "matches" operator. How do the "nix_value" and "tag" operators work? Specifically, I tried to disentangle cases where Costco may be shop=*, amenity=fuel (tire shop, glasses shop). This would be a good example to add.
Is each pull request expected to run "Updating topNames.json from planet"? This is a prohibitively large download. If it is optional, so is the "Installation" section (after git-clone).
What is the process for updating live data? Assuming this is something that happens automatically in the background, how long before the live data is updated? I want to rerun my Overpass query checking for bad combinations.

I can do the updates, but would need guidance.

Mark Lewiatan as case where both shop=convenience and shop=supermarket are OK

It is on the border between these two, so it is tagged in both styles.

As a result, it appears in the output of make.

Adding

    "Lewiatan":{
        "nix_value":[
        ]
    },

to canonical.json is not removing

Lewiatan
	 in shop/convenience - 565 times
	 and shop/supermarket - 255 times

from output of make.
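
One way the requested behaviour could work is an explicit list of names that are allowed to appear under several categories. The sketch below is only an illustration of that idea: the input uses the "key/value|name" form seen elsewhere in this index, while the `allowed` set and the `multiCategoryNames` function are hypothetical and do not describe how make currently builds its report.

```javascript
// counts:  { "key/value|name": count }
// allowed: Set of names for which multiple categories are acceptable
//          (hypothetical allowlist).
// Returns [{ name, uses: [{ category, count }, ...] }] for every name found
// in more than one category and not on the allowlist.
function multiCategoryNames(counts, allowed) {
  const byName = new Map();
  for (const [id, count] of Object.entries(counts)) {
    const sep = id.indexOf('|');
    const category = id.slice(0, sep);
    const name = id.slice(sep + 1);
    if (!byName.has(name)) byName.set(name, []);
    byName.get(name).push({ category, count });
  }
  const report = [];
  for (const [name, uses] of byName) {
    if (uses.length > 1 && !allowed.has(name)) report.push({ name, uses });
  }
  return report;
}
```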

Adding mobile_phone shop name

To begin with, I'm looking to add another of the most popular mobile phone shops in the UK to the presets. Currently O2, Vodafone and Carphone Warehouse are available in iD - I'd like to add EE to the list.

Can someone please explain which files to edit to make this happen? Thanks

Eventually I want to add "3/3 Store/Three/Three Store" to the list, but I need to work out which is the correct name for the shop!

Some changes to default name suggestions to resolve inconsistencies

“ALDI”: default should be “Aldi” because also “Aldi Süd” and “Aldi Nord” do not use all-capitals.
“Adler Apotheke” is wrong orthography. Default should be “Adler-Apotheke” which is correct orthography.
“Citroen” does probably only exist because many keyboard layouts don’t provide “ë”. Default should be “Citroën”, which also the enterprise itself uses in general.
“NETTO”: Default should be “Netto” because all the other German supermarkets in the list are also not all-capitals.
“REWE Getränkemarkt”: Default should be “Rewe Getränkemarkt” because also the main store “Rewe” does not use all-capitals.
“LAWSON”: I suppose that also here the default should be “Lawson”, but I’m not sure here because I’ve never been in one of these stores.

German clothes recycling containers

Can you please add operator names of German clothes recycling containers?

Is Starbucks everywhere (almost) the same?

If yes - amenity=restaurant, shop=coffee, maybe also amenity=fast_food should be nixed.

Starbucks
	 in amenity/cafe - 8463 times
	 in shop/coffee - 169 times
	 in amenity/restaurant - 54 times
	 and amenity/fast_food - 60 times

Consider deleting old-python branch

It is 4 years old, and I think its only effect is confusing people who forked the repository, submitted PRs, deleted the branches for those PRs, and then noticed that they still have local branches.

Chemist/pharmacy confusion

I note iD is now showing chemist chains (e.g. Boots, Rowlands, Lloyds, Numark, Well) as 'amenity=pharmacy' when in fact they are 'building=retail' + 'shop=chemist' and should have pharmacy=yes by default. Asda (Wal-Mart) in the UK often has a pharmacy, and pharmacy=yes is more appropriate than a separate 'amenity=pharmacy', I think. Some have opticians in store too.

That raises the question of how a mailbox in Asda should be shown: a node can show its location, whereas mailbox=yes as a subtag on the building doesn't.

Note: the chain's name is 'Well', but the branding is displayed as '+Well', which is how I first came across it, but that is in fact incorrect.

ubuntu 18.04 + node 10 "TypeError: reader.apply is not a function"

I have installed the latest master - on Ubuntu18.04 + node 10
and I have received this error message - after running:

/osm/name-suggestion-index/getRaw.js:57
reader.apply(handler, { "with_location_handler": false });
       ^

TypeError: reader.apply is not a function
    at Object.<anonymous> (/osm/name-suggestion-index/getRaw.js:57:8)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
    at startup (internal/bootstrap/node.js:236:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:560:3)

My Dockerfile

FROM ubuntu:18.04

RUN apt-get update \
    && apt-get install  -y --no-install-recommends \
       apt-utils \
       build-essential \
       ca-certificates \
       git \
       gcc \
       g++ \
       lsb-release \
       make \
       gnupg2 \
       curl \
       wget \
    && rm -rf /var/lib/apt/lists/

RUN mkdir -p  /osm/name-suggestion-index
WORKDIR /osm/name-suggestion-index

RUN git clone  --quiet --depth 1 https://github.com/osmlab/name-suggestion-index.git /osm/name-suggestion-index
RUN wget http://download.geofabrik.de/europe/monaco-latest.osm.pbf

RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \
    && apt-get update \
    && apt-get install  -y --no-install-recommends \
       nodejs \
    && npm install \
    && rm -rf /var/lib/apt/lists/

RUN node -v && npm -v
RUN node getRaw ./monaco-latest.osm.pbf

Millenium ???

Just took a quick look at the list, mostly out of curiosity, and straight away fell over the word "Millenium" in canonical.json. Ouch...

Unless of course there is a language in which "millenium" is the correct way of writing it, or that bank really managed to trademark a spelling mistake... :-)

Suspicious "Union School (historical)"

from looking at https://github.com/osmlab/name-suggestion-index/blob/master/name-suggestions.presets.xml

Doesn't work anymore

Just tried to run an update. Totally died, likely because of newer ubuntu and me missing something.

I should just rebuild this. This was my first node project, I know better now. Might be something for a slow night.

Correct apostrophe handling

Instead of using the correct apostrophe character, people often use the character U+0027 because it’s easier to access on many keyboard layouts (as it dates from the times of ASCII).

To quote from Unicode Standard about U+0027:

neutral (vertical) glyph with mixed usage

2019 is preferred for apostrophe

preferred characters in English for paired quotation marks are 2018 & 2019

[…]

Of course U+0027 is used in technical context (programming languages…), but indeed I don’t know of any typographically valid use case for U+0027 in normal text.

I would suggest to always use a variant with U+2019 as default (canonical) variant.

I’ve checked the current canonical.json (2017-09-22), and for all names in this list it indeed seems that for every occurrence of U+0027 the correct character would be U+2019 (and not 2018, 2019, 2032).
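
The proposed default could be applied mechanically. A minimal sketch, assuming every U+0027 in a name really is an apostrophe (as the check above suggests for the current list); `fixApostrophes` is a hypothetical helper name:

```javascript
// Replace every straight apostrophe (U+0027) in a name with the
// typographic apostrophe (U+2019). Assumes U+0027 is never used as a
// quotation mark or prime in these names.
function fixApostrophes(name) {
  return name.replace(/'/g, '\u2019');
}
```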

Add name, brand, and brand:wikidata=* for common franchises

For example, if I'm going to use the McDonald's iD preset to tag a restaurant, it would automatically add operator:wikidata=Q38076.

The tag brand:wikidata=* could conceivably also be used if any iD presets make use of brand.

Limit names to a single classification

I didn't think this would be necessary but initial implementation of Quick Add presets in iD shows it really is. Probably needs to be handled in canonical somehow or we just automatically revert to the most common usage based on count.

So right now we have amenity=restaurant name=McDonald's and amenity=fast_food name=McDonald's. We should only have one, it makes searching by name less useful to have more than one.
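
The "revert to the most common usage based on count" option could be as simple as the sketch below. The input shape (category mapped to count) and the name `mostCommonClassification` are assumptions for illustration.

```javascript
// usage: { "key/value": count } for a single name  (assumed shape)
// Returns the category with the highest count, or null if there is none.
// On a tie, the category listed first wins.
function mostCommonClassification(usage) {
  let best = null;
  for (const [category, count] of Object.entries(usage)) {
    if (best === null || count > best.count) best = { category, count };
  }
  return best && best.category;
}
```

This picks a winner automatically but cannot tell when the less common tagging is the correct one, which is why handling it in canonical is the other option mentioned.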

Merge Rewe and REWE

Rewe and REWE are the same German supermarket chain, just with different spellings.

Remove "Dentista"

That's just a generic name meaning "dentist" in Portuguese.

Remove "amenity/restaurant|Pizza"

Most probably somebody is trying to say that the restaurant sells pizza, and not that the name is "Pizza"

Remove "amenity/school|Escola Primária"

"Escola Primária" is a descriptive name for "primary school", and not a proper name.

Remove "amenity/school|Escuela"

amenity/school|Escuela should be removed.
It's just a generic name for "School".

Remove "amenity/library|Library"

A descriptive name like this shouldn't be suggested.

Two banks mapped to one?

Why is Resona Bank mapped to Mizuho Bank?

    "りそな銀行": {
        "matches": [
            "みずほ銀行 (Mizuho Bank)"
        ],
        "tags": {
            "name:en": "Mizuho Bank"
        }
    },

shop=wholesale

This is continuation to: openstreetmap/iD#4657

Per documentation: https://wiki.openstreetmap.org/wiki/Tag:shop%3Dwholesale
...at least first 3 stores listed in the example section (Costco, Sam's Club, BJ's Wholesale Club) should be updated, possibly same for "Makro" and "Real Canadian Superstore". (Pricesmart is not mentioned in topNames.json)

OLD: shop/supermarket OR shop/department_store
NEW: shop=wholesale AND wholesale=supermarket

Note, some items are tagged as "amenity/fuel", those should not be changed.
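
The OLD to NEW mapping above can be written as a small tag transform. This is a sketch only; `toWholesale` is a hypothetical name, and it assumes the entry's tags are a plain object.

```javascript
// Convert shop=supermarket or shop=department_store to
// shop=wholesale + wholesale=supermarket. Anything else, including
// amenity=fuel entries, is returned unchanged.
function toWholesale(tags) {
  if (tags.shop !== 'supermarket' && tags.shop !== 'department_store') {
    return tags;
  }
  return Object.assign({}, tags, { shop: 'wholesale', wholesale: 'supermarket' });
}
```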

Additionally, there are many name variations especially for Costco fuel station. These probably should be normalized to "Costco Gasoline".

I could attempt this change with some hints. :)

current installation instructions fail as osmium installation fails on Ubuntu 16.04 64 bit

npm install fails, even with the packages listed as required installed. I narrowed the problem down to a failed osmium installation and reported it at osmcode/node-osmium#93

Remove "amenity/place_of_worship|Chapelle"

"Chapelle" is just a generic name for "Chapel"
The source of so many "Chapelle" seems to be an import; for example, https://www.openstreetmap.org/way/68238798

Don't use name=Nails to designate a Nail Salon.

While searching for nail salons in iD, I saw 'Nails' emerge as one of the first results, and its tags were beauty=shop and name=Nails, as indicated below.

This isn't the proper tagging for a nail salon (shop=beauty and beauty=nails). I'm skeptical that 'Nails' is the name of many beauty salons; I think it is an error whose usage is being reinforced by newer mappers who don't know that this is the incorrect tag for a nail salon.

Merge dm and DM Drogeriemarkt

It is the same chain in Germany.

Disable tower:type=light

There are
215: tower:type=light
and 7171: tower:type=lighting

The "light" likely refers to "lighting". Please remove "light" as one of the available values in iD, and apply a global change to the existing ones.

document is it used somewhere

There is "In iD we want to help suggest the most common names with the correct formatting and spelling." in the readme, but it is unclear whether the data collected here is really used anywhere (so it is not clear whether it makes sense to put effort in here, or whether one should find the real data source used by iD to blacklist descriptive names).

Bradesco and Banco Bradesco are the same thing

Currently, if you type "Bradesco" in iD, you get suggestions for Banco Bradesco and Bradesco. These are all the same and need to be merged.

Produce lists of features that need fixing

While we're processing the planet file, we can also produce some output about which features need updating (by comparing against the accepted entries in config/canonical.json).

e.g.

missing tag (brand:wikidata, brand, etc)
nonstandard name ("McDonalds" should be "McDonald's")
nonstandard tag (amenity=restaurant should be amenity=fast_food)
maybe more
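
A per-feature check producing the three kinds of findings listed above might look like this. It is a sketch: the `entry.tags` shape is an assumed form for an accepted entry, `checkFeature` is a hypothetical name, and it assumes the feature has already been matched to that entry.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// tags:  the OSM feature's tags, e.g. { name: 'McDonalds', amenity: 'restaurant' }
// entry: the accepted entry it was matched to, e.g. { tags: { name: ..., ... } }
// Returns human-readable problems; an empty array means nothing to fix.
function checkFeature(tags, entry) {
  const problems = [];
  if (tags.name !== entry.tags.name) {
    problems.push('nonstandard name ("' + tags.name + '" should be "' + entry.tags.name + '")');
  }
  for (const [key, value] of Object.entries(entry.tags)) {
    if (key === 'name') continue;   // already handled above
    if (!hasOwn(tags, key)) {
      problems.push('missing tag (' + key + ')');
    } else if (tags[key] !== value) {
      problems.push('nonstandard tag (' + key + '=' + tags[key] +
        ' should be ' + key + '=' + value + ')');
    }
  }
  return problems;
}
```

Collecting these per feature while scanning the planet file would give the review lists described in this issue.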

Remove "amenity/bar|Bar" and "amenity/pub|Bar"

"Bar" is clearly a generic name for amenity=bar and amenity=pub, and shouldn't be suggested

malformed markdown

Github displays readme sections as "###Contributing We need help finding all the 'incorrect' names" (probably space after # is missing)

at https://github.com/osmlab/name-suggestion-index

Shell Gas Station, Shell and SHELL are the same

"amenity/fuel|SHELL" and "amenity/fuel|Shell Gas Station" should be only "amenity/fuel|Shell" (while I don't agree that it should suggest such names at all #29)

Add name suggestions for common trees

I have two issues which could be solved with this idea:

a. For each tree I add, I have to do a lot of clicks to specify leaf_type and leaf_cycle. That is very tedious, especially since in my region both have a relation in nearly all cases (as in "nearly all trees which are broadleaved are also deciduous, nearly all trees which are needleleaved are also evergreen").

b. I nearly never tag the type/genus, because I need to look up the correct spelling to not mess up the database. Even though in quite some cases I could at least specify the broader genus (as in, I know that a tree is a genus:de=Eiche (EN: Oak), but I would not know that it is a species:de=Stieleiche).

This would also solve different spelling like "oak" (taginfo:43), "Oak" (taginfo:15), "en:Oak" (taginfo:2)

My suggestion:

Add name suggestion for some commonly used trees.

Commonly could be defined as "all that have been tagged > 1000"

https://taginfo.openstreetmap.org/keys/genus#values ~60
https://taginfo.openstreetmap.org/keys/genus:de#values ~20 with GER names

Example

Suggestion name EN: Tree: Oak tree
Suggestion name DE: Baum: Eiche
Tags:

leaf_type=broadleaved
leaf_cycle=deciduous
genus=Oak
genus:de=Eiche

Suggestion name EN: Tree: Picea tree
Suggestion name DE: Baum: Fichte
Tags:

leaf_type=needleleaved
leaf_cycle=evergreen
genus=Picea
species:de=Fichte

Wiki pages about trees:

Would this name-suggestion-index be the right place for this idea?

provide installation instructions

from readme: run make

there are no installation instructions and it seems that installation instructions are necessary, as simply running make for me results in:

mateusz@grisznak:~/Desktop/tmp/name-suggestion-index$ make
module.js:328
    throw err;
    ^

Error: Cannot find module 'json-stable-stringify'
    at Function.Module._resolveFilename (module.js:326:15)
    at Function.Module._load (module.js:277:25)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/home/mateusz/Desktop/tmp/name-suggestion-index/build.js:6:17)
    at Module._compile (module.js:410:26)
    at Object.Module._extensions..js (module.js:417:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Function.Module.runMain (module.js:442:10)
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1

I guessed and expected npm install to work but it also failed.

"Warzywniak" is Polish for greengrocer, "piekarnia" for bakery, not a brand or a shop name

name-suggestion-index/config/canonical.json

Line 25999 in d4c1b84

"shop/greengrocer|Warzywniak": {

I think that this name should not appear as possible shop name, but I am not 100% sure.

I am 100% sure that it is not a brand.

steak input to return "steak house" as cuisine not as name

Hello,
While editing in iD, I searched for steak house and it returned a result with the tags amenity=restaurant and name="Steak House".

I'm thinking this is an unintended error; I don't think there is any chain named "Steak House", and although there are very few uses of this tag combination (at most 75), I would caution that new iD users may want to add a steak house and just click on that top entry, and we want to prevent bad data from being created.

Instead, what it returns should be "amenity=restaurant and cuisine=steak_house".

Sorry if I'm filing this in the wrong place; as I understand it, iD is still pulling tag suggestions from the name-suggestion-index.

Add vending machines

from openstreetmap/iD#5260 - a request for Amazon Parcel Pickup lockers.

This project currently doesn't check the planet for amenity=vending_machine so I'll add it to the list and will see what kinds of vending machines currently show up frequently. (Downloading latest planet file now).