
name-suggestion-index's Introduction


name-suggestion-index ("NSI")

Canonical features for OpenStreetMap, collected manually and via the NSI Collector planet scan.

What is it?

The goal of this project is to maintain a canonical list of commonly used features for suggesting consistent spelling and tagging in OpenStreetMap.

👉   Watch the video from our talk at State of the Map US 2019 to learn more about this project!

Browse the index

You can browse the name-suggestion-index and check Wikidata links for accuracy at https://nsi.guide.


How it's used

When mappers create features in OpenStreetMap, they are not always consistent about how they name and tag things. For example, we may prefer McDonald's tagged as amenity=fast_food but we see many examples of other spellings (Mc Donald's, McDonalds, McDonald’s) and taggings (amenity=restaurant).

Building a canonical feature index allows two very useful things:

  • We can suggest the most "correct" way to tag things as users create them while editing.
  • We can scan the OSM data for "incorrect" features and produce lists for review and cleanup.
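As a rough illustration of the first point, the sketch below maps spelling variants onto one canonical entry. It is only a minimal sketch: the `simplify` and `suggest` helpers and the single-entry `canonical` list are hypothetical and are not taken from the project's code.

```javascript
// Hypothetical helper: reduce a name to a comparison key by lowercasing
// and dropping whitespace, apostrophes (straight and curly), and some punctuation.
function simplify(name) {
  return name.toLowerCase().replace(/[\s'\u2019\-.,&]/g, "");
}

// Hypothetical canonical entry; the real index stores far more detail.
const canonical = [
  { name: "McDonald's", tags: { amenity: "fast_food" } },
];

// Build a lookup from simplified key to canonical entry.
const byKey = new Map(canonical.map((entry) => [simplify(entry.name), entry]));

// Return the canonical entry for a typed name, or null if nothing matches.
function suggest(typedName) {
  return byKey.get(simplify(typedName)) || null;
}

for (const variant of ["Mc Donald's", "McDonalds", "McDonald\u2019s"]) {
  const match = suggest(variant);
  console.log(variant, "->", match ? match.name : "(no suggestion)");
}
```

All three variants from the example above reduce to the same key, so each one resolves to the canonical "McDonald's" entry and its amenity=fast_food tag.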


The name-suggestion-index is in use in iD when adding a new item

Currently used in:

About the index

See the project wiki for details.

Participate!

We're always looking for help!

If you have any questions or want to reach out to a maintainer, ping @bhousel, @1ec5, or @tas50 on:

License

name-suggestion-index is available under the 3-Clause BSD License. See the LICENSE.md file for more details.

name-suggestion-index's People

Contributors

1ec5, adamant36, anticompositenumber, arch0345, arrival-spring, bhousel, bmillemathias, cj-malone, codeinabox, contrib1043, davidhicks, dimitar5555, doublah, ent8r, hanchao, identitaet, kjonosm, laoshubaby, m-hue, matkoniecz, maxerickson, mortein, rickeyrichards, robot8a, serhii-muchychka, sguinetti, tas50, tommylung, ukchris-osm, willemarcel


name-suggestion-index's Issues

Add licensing info

Some files have no licensing headers. It would be useful to have licensing information in each file and, if you wish, in a LICENSE.md file too, so that GitHub (and others) can parse the license.

Add healthcare=pharmacy alongside amenity=pharmacy

As of openstreetmap/iD@8d6f59c, iD will start adding the healthcare=* tags alongside the existing in-use tags (we are not deprecating the existing tags, just adding healthcare tagging to them).

This means that, for the amenity=pharmacy suggestions in this name suggestion list, we should add the healthcare=pharmacy tag too.

cc @1ec5 - it's been a while and we should revisit if the pharmacy suggestions are still confusing here and fix if needed.
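The tag addition described above could be sketched as follows. This is only an illustration: `withHealthcareTag` is a hypothetical helper, and the example suggestion is made up.

```javascript
// Return a copy of the tags with healthcare=pharmacy added when the
// suggestion is an amenity=pharmacy. Existing tags are never removed.
function withHealthcareTag(tags) {
  if (tags.amenity === "pharmacy" && tags.healthcare === undefined) {
    return { ...tags, healthcare: "pharmacy" };
  }
  return { ...tags };
}

// Hypothetical suggestion, for illustration only.
const before = { amenity: "pharmacy", name: "Example Pharmacy" };
console.log(withHealthcareTag(before));
```

Tags that are not amenity=pharmacy pass through unchanged, which matches the intent of adding to the existing tagging rather than replacing it.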

Name Suggestions missing presets

Since the name suggestion index was recently rebuilt, there are now a handful of name suggestions that don't correspond to presets in iD.

This isn't necessarily a bad thing, but I am just listing them here so that we can determine next steps to clean them up.

If a name suggestion doesn't have a preset, it either means that

  1. the thing is uncommon, or
  2. we should make a preset for those, or
  3. the name suggestion might be assigned the wrong tag.

WARN: no preset for suggestion = amenity,ice_cream,Grido
WARN: no preset for suggestion = shop,ice_cream,Мороженое

^ Not sure what the best tagging is for ice cream shop these days. There is disagreement here.

WARN: no preset for suggestion = amenity,sauna,Баня
WARN: no preset for suggestion = amenity,driving_school,Автошкола

^ Maybe we should add presets to iD for these?

WARN: no preset for suggestion = shop,energy,Punto Enel

^ Not sure what a shop=energy is.

WARN: no preset for suggestion = shop,charity,British Heart Foundation
WARN: no preset for suggestion = shop,charity,Cancer Research UK
WARN: no preset for suggestion = shop,charity,Oxfam
WARN: no preset for suggestion = shop,charity,Scope
WARN: no preset for suggestion = shop,charity,Age UK
WARN: no preset for suggestion = shop,charity,Goodwill
WARN: no preset for suggestion = shop,charity,Sue Ryder

^ Probably should add a shop=charity preset to iD.

WARN: no preset for suggestion = man_made,windmill,De Hoop

^ I don't see why this is included in the name suggestion index?
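Warnings in the format shown above could come from a check along these lines. This is a sketch under assumptions: the `presets` set and the `[key, value, name]` triples are hypothetical stand-ins, not the actual data structures used by iD or this project.

```javascript
// Hypothetical inputs: the "key/value" pairs that have a preset in the editor,
// and name suggestions as [key, value, name] triples.
const presets = new Set(["amenity/fast_food", "shop/supermarket"]);
const suggestions = [
  ["amenity", "fast_food", "McDonald's"],
  ["shop", "charity", "Oxfam"],
  ["man_made", "windmill", "De Hoop"],
];

// Produce one warning line for every suggestion whose key/value has no preset.
function missingPresetWarnings(presets, suggestions) {
  return suggestions
    .filter(([key, value]) => !presets.has(`${key}/${value}`))
    .map(([key, value, name]) => `WARN: no preset for suggestion = ${key},${value},${name}`);
}

missingPresetWarnings(presets, suggestions).forEach((line) => console.log(line));
```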

Stop suggesting some amenity=fuel names

It seems that some amenity=fuel names shouldn't be suggested at all.
For example, right now we have in OSM 8615 objects with brand=Shell and 10228 objects with name=Shell.

Most probably people are wrongly using the brand as the name because the default OSM style doesn't display the brand (see gravitystorm/openstreetmap-carto#1874); i.e., users are tagging for the renderer.

Suggesting names that are clearly brands will just keep reinforcing the wrong tagging.

Validate config files against a JSON Schema

There are now 2 config files for this project:

  • config/filters.json - filter names into keep/discard lists
  • config/canonical.json - defines canonical representation(s) for each name and tags to go with it

We can make a JSON Schema for each one and validate the files.
This is a nice way to guard against people messing up the files accidentally.
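To give a feel for what such a check would catch, here is a hand-written structural check. It is not a JSON Schema validator (a real setup would pair a schema file with a validator library), and the assumed entry shape (`matches` as an array of strings, `tags` as a string-to-string map) is inferred only from the canonical.json snippet quoted in the Resona Bank issue; the function name is hypothetical.

```javascript
// Check a canonical.json-style object and return a list of problems.
// Assumed shape: { "<name>": { "matches": [string, ...], "tags": { key: string } } }
function validateCanonical(config) {
  const problems = [];
  for (const [name, entry] of Object.entries(config)) {
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      problems.push(`${name}: entry must be an object`);
      continue;
    }
    if ("matches" in entry) {
      const ok = Array.isArray(entry.matches) &&
        entry.matches.every((m) => typeof m === "string");
      if (!ok) problems.push(`${name}: "matches" must be an array of strings`);
    }
    if ("tags" in entry) {
      const tags = entry.tags;
      const ok = tags !== null && typeof tags === "object" && !Array.isArray(tags) &&
        Object.values(tags).every((v) => typeof v === "string");
      if (!ok) problems.push(`${name}: "tags" must map keys to strings`);
    }
  }
  return problems;
}

// Hypothetical config with one well-formed and one malformed entry.
const sample = {
  "McDonald's": { matches: ["McDonalds"], tags: { amenity: "fast_food" } },
  "Broken": { matches: "not-an-array", tags: { amenity: 1 } },
};
validateCanonical(sample).forEach((p) => console.log(p));
```

Running a check like this as part of the build would flag an accidental edit (for example, a string where an array is expected) before it reaches the generated files.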

Devise way to handle nouns used as names

One of the larger issues with the index as is, is that it doesn't really work internationally. It is quite possible, and it does happen (I believe "Apotheke" is such a case), that a noun in one language is used as the name of a chain in another. Currently there is no way to handle such a conflict or any regional differences in naming.

Localization of name suggestion

Currently the name suggestion index compiles global frequency counts of POIs for use in iD for suggestions / presets.

In reality the different brand names / chain stores etc are not evenly distributed and differ across countries. Many stores (and not least the usage of their localised names) are very confined.

To enhance the relevancy of the suggestions, it might be useful to partition the counts by country and provide suggestions based on the occurrence of the POIs in the country being edited.

As a first step on the backend, I have attempted a new branch for the project at my repo -
https://github.com/hlaw/name-suggestion-index/tree/countrycode

Format changes

The branch revises the project to add a new country code level at the top of the hierarchy of name-suggestions.json. The JSON format under each country is the same as the current global file. The threshold for generating topNames.json is lowered from 50 to 5 so that names from smaller / less well mapped countries show up.

Changes made

In my setup the original getRaw.js could not finish processing the Asia extract and got killed after eating up several GB of memory, and I could not get it to work under node. I have therefore rewritten it in C++, calling libosmium directly (the same backend as osmium-node). Besides counts, coordinates for each POI are saved for further processing.

In build.js, the process now checks the country code from the coordinates using https://github.com/hlaw/codegrid-js. It then counts the POIs by country.
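The per-country counting and the lowered threshold could look roughly like the sketch below. It makes several assumptions: the POI record shape, the nested `country -> key/value -> name -> count` layout, and the function names are all hypothetical, and `toyLookup` is a deliberately fake stand-in for a real coordinate-to-country lookup such as codegrid-js.

```javascript
// Count POIs per country, then per "key/value", then per name.
// `lookupCountry(lon, lat)` is supplied by the caller.
function countByCountry(pois, lookupCountry) {
  const counts = {};
  for (const poi of pois) {
    const country = lookupCountry(poi.lon, poi.lat);
    const kv = `${poi.key}/${poi.value}`;
    counts[country] = counts[country] || {};
    counts[country][kv] = counts[country][kv] || {};
    counts[country][kv][poi.name] = (counts[country][kv][poi.name] || 0) + 1;
  }
  return counts;
}

// Keep only names seen at least `threshold` times within a country.
function applyThreshold(counts, threshold) {
  const kept = {};
  for (const [country, byTag] of Object.entries(counts)) {
    for (const [kv, byName] of Object.entries(byTag)) {
      for (const [name, n] of Object.entries(byName)) {
        if (n < threshold) continue;
        kept[country] = kept[country] || {};
        kept[country][kv] = kept[country][kv] || {};
        kept[country][kv][name] = n;
      }
    }
  }
  return kept;
}

// Toy lookup for illustration only; it does not reflect real borders.
const toyLookup = (lon, lat) => (lon < 100 ? "de" : "jp");

// Hypothetical input: five "Rewe" nodes and one "Lawson" node.
const pois = [
  ...Array.from({ length: 5 }, () => ({
    lon: 10, lat: 50, key: "shop", value: "supermarket", name: "Rewe",
  })),
  { lon: 139, lat: 35, key: "shop", value: "convenience", name: "Lawson" },
];

const counts = countByCountry(pois, toyLookup);
console.log(JSON.stringify(applyThreshold(counts, 5), null, 2));
```

With a threshold of 5, the five "Rewe" nodes survive under their country while the single "Lawson" node is dropped, which is the effect the lowered threshold is meant to soften for less well mapped countries.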

Sample data

The branch contains demo data based on a recent pbf extract of Asia (with 315M nodes / 10M ways). I have not downloaded the planet to test, but I would guess that the files would be 8-10 times the current size when run on the planet.

To use data from the branch, iD would need to be able to load presets / suggestions dynamically when a user moves to a different country. This would probably require a set of country specific preset files to be built before deployment. For most users this should result in smaller download size and more relevant results in suggestions. I will try to explore how this could be done in iD.

Meanwhile, as the change would break iD right now, this is just posted for review. Thank you.

Costco is canonicalized twice

I'm working on this repo a bunch this week. My plan is to close a bunch of issues and make it easier to use. I've also reprocessed a planet dump into topNames and there are quite a lot more names now.

Anyway, in the process of updating canonical.json I found that "Costco" matches twice - once for "Costco Gasoline" and once for "Costco Wholesale". Thankfully it's the only string that is this way.

The effect of this is that they end up merged together. Currently in iD, if you type "Costco" you are offered a single preset called "Costco Gasoline" but tagged as shop=supermarket.
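A check for this kind of collision might look like the sketch below. It assumes each canonical entry lists its variants in a `matches` array; the tags in the sample data and the function name are hypothetical.

```javascript
// Hypothetical excerpt shaped like canonical.json; tags are illustrative only.
const canonical = {
  "Costco Gasoline": { matches: ["Costco"], tags: { amenity: "fuel" } },
  "Costco Wholesale": { matches: ["Costco"], tags: { shop: "supermarket" } },
};

// Return [matchString, [canonicalNames...]] pairs for every match string
// that is claimed by more than one canonical entry.
function findDuplicateMatches(canonical) {
  const owners = new Map();
  for (const [name, entry] of Object.entries(canonical)) {
    for (const match of entry.matches || []) {
      if (!owners.has(match)) owners.set(match, []);
      owners.get(match).push(name);
    }
  }
  return [...owners].filter(([, names]) => names.length > 1);
}

for (const [match, names] of findDuplicateMatches(canonical)) {
  console.log(`"${match}" is matched by: ${names.join(", ")}`);
}
```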

Update instructions

I tried to follow the instructions and ran into a couple of questions.

  1. The existing example covers the "matches" operator. How do the "nix_value" and "tag" operators work? Specifically, I tried to untangle cases where Costco may be shop=*, amenity=fuel, etc. (tire shop, glasses shop). This would be a good example to add.
  2. Is each pull request expected to run "Updating topNames.json from planet"? This is a prohibitively large download. If it is optional, so is the "Installation" section (after git-clone).
  3. What is the process for updating live data? Assuming this is something that happens automatically in the background, how long before live data is updated? I want to rerun my Overpass query checking for bad combinations.

I can do the updates, but would need guidance.

Adding mobile_phone shop name

To begin with, I'm looking to add another of the most popular mobile phone shops in the UK to the presets. Currently O2, Vodafone and Carphone Warehouse are available in iD - I'd like to add EE to the list.

Can someone please explain which files to edit to make this happen? Thanks

Eventually I want to add "3/3 Store/Three/Three Store" to the list, but need to work out which is the correct name for the shop!

Some changes to default name suggestions to resolve inconsistencies

  • “ALDI”: Default should be “Aldi” because “Aldi Süd” and “Aldi Nord” also do not use all-capitals.

  • “Adler Apotheke” is incorrect orthography. Default should be “Adler-Apotheke”, which is correct.

  • “Citroen” probably only exists because many keyboard layouts don’t provide “ë”. Default should be “Citroën”, which the company itself also generally uses.

  • “NETTO”: Default should be “Netto” because none of the other German supermarkets in the list are all-capitals either.

  • “REWE Getränkemarkt”: Default should be “Rewe Getränkemarkt” because the main store “Rewe” also does not use all-capitals.

  • “LAWSON”: I suppose the default should be “Lawson” here as well, but I’m not sure because I’ve never been in one of these stores.

Is Starbucks everywhere (almost) the same?

If yes - amenity=restaurant, shop=coffee, maybe also amenity=fast_food should be nixed.

Starbucks
	 in amenity/cafe - 8463 times
	 in shop/coffee - 169 times
	 in amenity/restaurant - 54 times
	 and amenity/fast_food - 60 times

Consider deleting old-python branch

The branch is 4 years old. I think its only effect is confusing people who forked the repository, submitted PRs, deleted the branches for those PRs, and then noticed that they still have local branches.

Chemist/pharmacy confusion

I note iD is now showing chemist chains (e.g. Boots, Rowlands, Lloyds, Numark, Well) as amenity=pharmacy when in fact they are building=retail and shop=chemist, and should have pharmacy=yes by default. Asda (Walmart) in the UK often has a pharmacy, and I think pharmacy=yes is more appropriate there than a separate amenity=pharmacy. Some have opticians in store too.

That raises the question of how a mailbox should be shown in Asda: a node can show its location, whereas mailbox=yes as a subtag on the building doesn't.

  • Note: the chain's name is 'Well', but the branding is displayed as '+Well', which is how I first came across it and which is in fact incorrect.

ubuntu 18.04 + node 10 "TypeError: reader.apply is not a function"

I installed the latest master on Ubuntu 18.04 with Node 10 and received this error message after running getRaw.js:

/osm/name-suggestion-index/getRaw.js:57
reader.apply(handler, { "with_location_handler": false });
       ^

TypeError: reader.apply is not a function
    at Object.<anonymous> (/osm/name-suggestion-index/getRaw.js:57:8)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
    at startup (internal/bootstrap/node.js:236:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:560:3)

My Dockerfile

FROM ubuntu:18.04

RUN apt-get update \
    && apt-get install  -y --no-install-recommends \
       apt-utils \
       build-essential \
       ca-certificates \
       git \
       gcc \
       g++ \
       lsb-release \
       make \
       gnupg2 \
       curl \
       wget \
    && rm -rf /var/lib/apt/lists/

RUN mkdir -p  /osm/name-suggestion-index
WORKDIR /osm/name-suggestion-index

RUN git clone  --quiet --depth 1 https://github.com/osmlab/name-suggestion-index.git /osm/name-suggestion-index
RUN wget http://download.geofabrik.de/europe/monaco-latest.osm.pbf

RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \
    && apt-get update \
    && apt-get install  -y --no-install-recommends \
       nodejs \
    && npm install \
    && rm -rf /var/lib/apt/lists/

RUN node -v && npm -v
RUN node getRaw ./monaco-latest.osm.pbf

Millenium ???

Just took a quick look at the list, mostly out of curiosity, and straight away fell over the word "Millenium" in canonical.json. Ouch...

Unless of course there is a language in which "millenium" is the correct way of writing it, or that bank really managed to trademark a spelling mistake... :-)

Doesn't work anymore

Just tried to run an update. Totally died, likely because of newer ubuntu and me missing something.

I should just rebuild this. This was my first node project, I know better now. Might be something for a slow night.

Correct apostrophe handling

Instead of using the correct apostrophe character, people often use the character U+0027 because it’s easier to access on many keyboard layouts (as it dates from the times of ASCII).

To quote from Unicode Standard about U+0027:

  • neutral (vertical) glyph with mixed usage
  • 2019 is preferred for apostrophe
  • preferred characters in English for paired quotation marks are 2018 & 2019
  • […]

Of course U+0027 is used in technical context (programming languages…), but indeed I don’t know of any typographically valid use case for U+0027 in normal text.

I would suggest to always use a variant with U+2019 as default (canonical) variant.

I’ve checked the current canonical.json (2017-09-22), and for all names in this list, it seems that for all occurrences of U+0027 the correct character would indeed be U+2019 (and not 2018, 2019, 2032).
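The proposed normalization could be sketched as a single replacement. This is a minimal sketch: `curlApostrophes` is a hypothetical name, and it blindly converts every U+0027, which is only safe if (as claimed above) every occurrence in the list is a true apostrophe.

```javascript
// Replace every U+0027 (typewriter apostrophe) with U+2019
// (right single quotation mark, the preferred apostrophe).
function curlApostrophes(name) {
  return name.replace(/\u0027/g, "\u2019");
}

const fixed = curlApostrophes("McDonald's");
// Print the code point of the replaced character as a check.
console.log(fixed, fixed.codePointAt(8).toString(16));
```

Names that contain no U+0027 pass through unchanged, so the function can be applied to the whole list.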

Limit names to a single classification

I didn't think this would be necessary but initial implementation of Quick Add presets in iD shows it really is. Probably needs to be handled in canonical somehow or we just automatically revert to the most common usage based on count.

So right now we have amenity=restaurant name=McDonald's and amenity=fast_food name=McDonald's. We should only have one, it makes searching by name less useful to have more than one.
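The "revert to the most common usage based on count" option could be sketched as below. The counts are the Starbucks figures quoted in the Starbucks issue; the function name and the `key/value -> count` layout are assumptions made for illustration.

```javascript
// Usage counts for one name, keyed by "key/value"
// (figures taken from the Starbucks issue).
const starbucks = {
  "amenity/cafe": 8463,
  "shop/coffee": 169,
  "amenity/restaurant": 54,
  "amenity/fast_food": 60,
};

// Return the "key/value" with the highest count, or null if there are none.
function mostCommonClassification(counts) {
  let best = null;
  for (const [kv, n] of Object.entries(counts)) {
    if (best === null || n > counts[best]) best = kv;
  }
  return best;
}

console.log(mostCommonClassification(starbucks));
```

A purely count-based rule is simple, but it would also lock in a wrong majority tagging, which is why handling this in canonical is listed as the other option.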

Merge Rewe and REWE

Rewe and REWE are the same German supermarket chain, just with different spellings.

Two banks mapped to one?

Why is Resona Bank mapped to Mizuho Bank?

    "りそな銀行": {
        "matches": [
            "みずほ銀行 (Mizuho Bank)"
        ],
        "tags": {
            "name:en": "Mizuho Bank"
        }
    },

shop=wholesale

This is continuation to: openstreetmap/iD#4657

Per the documentation (https://wiki.openstreetmap.org/wiki/Tag:shop%3Dwholesale), at least the first 3 stores listed in the example section (Costco, Sam's Club, BJ's Wholesale Club) should be updated, and possibly also "Makro" and "Real Canadian Superstore". (Pricesmart is not mentioned in topNames.json.)

OLD: shop/supermarket OR shop/department_store
NEW: shop=wholesale AND wholesale=supermarket

Note, some items are tagged as "amenity/fuel", those should not be changed.
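The OLD to NEW mapping above could be expressed as a small retagging function. This is a sketch only: `retagWholesale` is a hypothetical helper, and deciding which names it should be applied to is left out.

```javascript
// Retag shop=supermarket or shop=department_store as
// shop=wholesale + wholesale=supermarket. Anything else, including
// amenity=fuel entries, is returned unchanged (as a copy).
function retagWholesale(tags) {
  if (tags.shop === "supermarket" || tags.shop === "department_store") {
    return { ...tags, shop: "wholesale", wholesale: "supermarket" };
  }
  return { ...tags };
}

console.log(retagWholesale({ shop: "supermarket", name: "Costco" }));
console.log(retagWholesale({ amenity: "fuel", name: "Costco Gasoline" }));
```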

Additionally, there are many name variations especially for Costco fuel station. These probably should be normalized to "Costco Gasoline".

I could attempt this change with some hints. :)

Don't use name=Nails to designate a Nail Salon.

While searching for nail salons in iD, I saw 'Nails' emerge as one of the first results, with the tags beauty=shop and name=Nails.

This isn't the proper tagging for a nail salon (shop=beauty and beauty=nails). I'm skeptical that 'Nails' is the actual name of many beauty salons; I think it is an error whose usage is being reinforced by newer mappers who don't know that this is the incorrect tagging for a nail salon.

Disable tower:type=light

There are
215: tower:type=light
and 7171: tower:type=lighting

The "light" likely refers to "lighting". Please remove "light" from the available values in iD, and apply a global change to the existing ones.

Document whether it is used somewhere

The readme says "In iD we want to help suggest the most common names with the correct formatting and spelling.", but it is unclear whether the data collected here is really used anywhere (so it is not clear whether it makes sense to put effort in here, or whether one should find the real data source used by iD to blacklist descriptive names).

Produce lists of features that need fixing

While we're processing the planet file, we can also produce some output about which features need updating (by comparing against the accepted entries in config/canonical.json).

e.g.

  • missing tag (brand:wikidata, brand, etc)
  • nonstandard name ("McDonalds" should be "McDonald's")
  • nonstandard tag (amenity=restaurant should be amenity=fast_food)
    maybe more
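The three kinds of findings listed above could be produced by a comparison like the following. It is a sketch under assumptions: the feature and canonical-entry shapes and the `findIssues` name are hypothetical, and the real entries in config/canonical.json may be structured differently.

```javascript
// Compare one OSM feature against the canonical entry it was matched to
// and describe every difference found.
function findIssues(feature, canonicalEntry) {
  const issues = [];
  if (feature.name !== canonicalEntry.name) {
    issues.push(`nonstandard name ("${feature.name}" should be "${canonicalEntry.name}")`);
  }
  for (const [key, value] of Object.entries(canonicalEntry.tags)) {
    if (!(key in feature.tags)) {
      issues.push(`missing tag (${key})`);
    } else if (feature.tags[key] !== value) {
      issues.push(`nonstandard tag (${key}=${feature.tags[key]} should be ${key}=${value})`);
    }
  }
  return issues;
}

// Hypothetical canonical entry and feature, for illustration only.
const entry = { name: "McDonald's", tags: { amenity: "fast_food", brand: "McDonald's" } };
const feature = { name: "McDonalds", tags: { amenity: "restaurant" } };

findIssues(feature, entry).forEach((issue) => console.log(issue));
```

Collecting these strings per feature while the planet file is being processed would give the review lists described above without a second pass over the data.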

Add name suggestions for common trees

I have two issues which could be solved with this idea:

a. For each tree I add, I have to do a lot of clicks to specify leaf_type and leaf_cycle. That is very tedious, especially since in my region both have a relation in nearly all cases (as in "nearly all trees which are broadleaved are also deciduous, nearly all trees which are needleleaved are also evergreen").

b. I nearly never tag the type/genus, because I need to look up the correct spelling to not mess up the database. Even though in quite some cases I could at least specify the broader genus (as in, I know that a tree is a genus:de=Eiche (EN: Oak), but I would not know that it is a species:de=Stieleiche).

This would also solve the problem of different spellings like "oak" (taginfo:43), "Oak" (taginfo:15), "en:Oak" (taginfo:2).

My suggestion:

Add name suggestion for some commonly used trees.

"Commonly used" could be defined as "all that have been tagged > 1000 times".

Example

Suggestion name EN: Tree: Oak tree
Suggestion name DE: Baum: Eiche
Tags:

leaf_type=broadleaved
leaf_cycle=deciduous
genus=Oak
genus:de=Eiche

Suggestion name EN: Tree: Picea tree
Suggestion name DE: Baum: Fichte
Tags:

leaf_type=needleleaved
leaf_cycle=evergreen
genus=Picea
species:de=Fichte

Wiki pages about trees:

Would this name-suggestion-index be the right place for this idea?

provide installation instructions

from readme: run make

there are no installation instructions and it seems that installation instructions are necessary, as simply running make for me results in:

mateusz@grisznak:~/Desktop/tmp/name-suggestion-index$ make
module.js:328
    throw err;
    ^

Error: Cannot find module 'json-stable-stringify'
    at Function.Module._resolveFilename (module.js:326:15)
    at Function.Module._load (module.js:277:25)
    at Module.require (module.js:354:17)
    at require (internal/module.js:12:17)
    at Object.<anonymous> (/home/mateusz/Desktop/tmp/name-suggestion-index/build.js:6:17)
    at Module._compile (module.js:410:26)
    at Object.Module._extensions..js (module.js:417:10)
    at Module.load (module.js:344:32)
    at Function.Module._load (module.js:301:12)
    at Function.Module.runMain (module.js:442:10)
Makefile:2: recipe for target 'all' failed
make: *** [all] Error 1

I guessed and expected npm install to work but it also failed.

steak input to return "steak house" as cuisine not as name

Hello,
While editing in iD, I searched for steak house and it returned a result with the tags amenity=restaurant and name="Steak House".

I think this is an unintended error: I don't think there is any chain named "Steak House", and although there are very few uses of this tag combination (at most 75), I would caution that new iD users who want to add a steak house may just click on that top entry, and we want to prevent bad data from being created.

Instead, what it returns should be "amenity=restaurant and cuisine=steak_house".

Sorry if I'm filing this in the wrong place; as I understand it, iD is still pulling tag suggestions from the name-suggestion-index.

Add vending machines

from openstreetmap/iD#5260 - a request for Amazon Parcel Pickup lockers.

This project currently doesn't check the planet for amenity=vending_machine so I'll add it to the list and will see what kinds of vending machines currently show up frequently. (Downloading latest planet file now).
