mapbox / osm-compare

Functions that identify what changed during a feature edit on OpenStreetMap.

License: ISC License

JavaScript 6.43% Python 0.28% Shell 0.03% Jupyter Notebook 93.26%

osm-compare's Introduction

⚠️ This repo is archived because it is no longer used and no stacks are running.
Previously used only in transferred and discontinued mapbox/osmcha project.
Shutdown ticket in Jira for reference.

osm-compare

Compare functions are small atomic functions that are designed to identify what changed during a feature edit on OpenStreetMap. Compare functions can be broadly split up into two categories:

Property (tags) checking compare functions
Geometry checking compare functions

Compare functions take as inputs the following:

oldVersion - GeoJSON of the feature's old version
newVersion - GeoJSON of the feature's new version

Compare functions output the following:

result - Object containing key value pairs representing findings of the compare function or an empty object.

# Format of compare function result where value can be primary data types or objects
{
    'result:comparator_name': value,
    'message': Any custom message which corresponds to the catch
}

# Format of compare function result if there is no finding (default)
false
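
As a minimal sketch of the contract described above, a compare function could look like the following. The function name `nameChanged`, the result key and the `(newVersion, oldVersion)` argument order are assumptions for illustration, not taken from the package itself.

```javascript
// Hypothetical compare function: reports a changed `name` tag.
// Assumes tags live directly in the GeoJSON `properties` object.
function nameChanged(newVersion, oldVersion) {
  // A created or deleted feature has only one version, so there is nothing to compare.
  if (!newVersion || !oldVersion) return false;

  const oldName = (oldVersion.properties || {}).name;
  const newName = (newVersion.properties || {}).name;
  if (oldName === newName) return false;

  return {
    'result:name_changed': { from: oldName, to: newName },
    message: 'Name tag changed'
  };
}
```

The function returns `false` when there is no finding and an object keyed by `result:comparator_name` otherwise, matching the two formats shown above.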

Install

# Install osm-compare from the Mapbox namespace.
npm install @mapbox/osm-compare

Docs

How do I build an npm package?

We use Semantic Versioning Specification for versioning releases.
Create an appropriate version of the npm package with npm version [major|minor|patch].
Push the package tag commit with git push --tags
Publish the NPM package with npm publish


osm-compare's Issues

Changelog to record all notable changes made

Edited a place CF review

I reviewed the output of the Edited a place CF today. I did not flag any of the 50 features I reviewed as a bad edit. The main reason is that most of the edits were to non-English village names, for which I lack local context as a mapper. More notes below:

The CF looks at all changes in a feature; the label should differentiate between a tag change and a geometry change.
Related to ☝️, small geometry changes were not detected. See for example: https://osmcha.mapbox.com/45210141/features/node-582887071/
For better monitoring, let's split this CF into "major" places (country, city, town) and "minor" places (village, hamlet, neighbourhood, locality). "Major" places are more prone to intentional bad edits, so monitoring them should be a priority.
I'm not sure whether V1 edits can be categorised as Edited a place.

Comparator to monitor primary tags on OpenStreetMap

There are 26 recognized primary tags on OpenStreetMap.

aerialway, aeroway, amenity, barrier, boundary, building, craft, emergency, geological,
highway, historic, landuse, leisure, man_made, military, natural, office, place, power,
public_transport, railway, route, shop, sport, tourism, waterway

The primary-osm-tag compare function will:

Flag when a primary tag of a feature is deleted
- Ex: highway
Flag when a new primary tag is added to an existing feature
- Ex: waterway is added to a building feature
Flag features with more than one primary tag
- Ex: A name translation is added to a feature with both natural and place tags
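
The three checks above can be sketched as follows. This is only an illustration: the function name `primaryOsmTag`, the result key and the assumption that tags sit directly in `properties` are mine, not the package's.

```javascript
// The 26 primary tags listed above.
const PRIMARY_TAGS = [
  'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building', 'craft',
  'emergency', 'geological', 'highway', 'historic', 'landuse', 'leisure',
  'man_made', 'military', 'natural', 'office', 'place', 'power',
  'public_transport', 'railway', 'route', 'shop', 'sport', 'tourism', 'waterway'
];

// Returns the primary tag keys present on a feature (empty list if the feature is missing).
function primaryTagsOf(feature) {
  const props = (feature && feature.properties) || {};
  return PRIMARY_TAGS.filter((key) => Object.prototype.hasOwnProperty.call(props, key));
}

// Hypothetical comparator covering deleted, added and multiple primary tags.
function primaryOsmTag(newVersion, oldVersion) {
  if (!newVersion || !oldVersion) return false;

  const before = primaryTagsOf(oldVersion);
  const after = primaryTagsOf(newVersion);
  const deleted = before.filter((key) => !after.includes(key));
  const added = after.filter((key) => !before.includes(key));

  if (deleted.length === 0 && added.length === 0 && after.length <= 1) return false;

  return {
    'result:primary_osm_tag': { deleted, added, current: after },
    message: 'Primary tag deleted, added, or more than one primary tag present'
  };
}
```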

Uncertainties

@amishas157 is there an existing compare function doing something similar?
What are the other potential interesting cases with primary feature tags?

cc: @geohacker @planemad @bsrinivasa

Flag pokemon string from name tag

Based on this overpass query, filtering poke related strings within the name tag should detect most pokemon bad edits.

[out:xml][timeout:900];
(
  node["name"~"pok(é|e)(mon|stop|gym|go)", i];
  way["name"~"pok(é|e)(mon|stop|gym|go)", i];
  relation["name"~"pok(é|e)(mon|stop|gym|go)", i];
);
out meta;
>;
out meta qt;

Steps

Get all new or modified features with the following primary tags: natural, water, highway, building, leisure, tourism.
Within the name and name:* tags, run a regex filter similar to "pok(é|e)(mon|stop|gym|go)".
proposed CF name: pokemon_edits
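
A rough sketch of these steps in JavaScript, assuming tags are stored directly in the feature's `properties`. The regex is the one from the Overpass query; the function shape and result key are hypothetical.

```javascript
// Case-insensitive pattern taken from the Overpass query above.
const POKEMON_PATTERN = /pok(é|e)(mon|stop|gym|go)/i;
// Primary tags named in step 1.
const MONITORED_KEYS = ['natural', 'water', 'highway', 'building', 'leisure', 'tourism'];

// Hypothetical `pokemon_edits` comparator: only looks at the new version.
function pokemonEdits(newVersion) {
  const props = (newVersion && newVersion.properties) || {};
  if (!MONITORED_KEYS.some((key) => key in props)) return false;

  // Collect every name or name:* key whose value matches the pattern.
  const matches = Object.keys(props).filter((key) =>
    (key === 'name' || key.startsWith('name:')) &&
    typeof props[key] === 'string' &&
    POKEMON_PATTERN.test(props[key])
  );
  if (matches.length === 0) return false;

  return {
    'result:pokemon_edits': matches,
    message: 'Name tag contains a Pokemon-related string'
  };
}
```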

@krishnanammala @amishas157 @batpad

Good detections by compare functions

NOTE: Ticket to track good detections by compare functions. Feel free to edit this post.

New footway created

http://osmcha.mapbox.com/46677598/

Add waterway=* to highway=* node drags

cc @amishas157

Name

@bkowshik - could this have a better name? compare-geojson is very generic while this project is OSM specific. We don't need to go overly descriptive, but osm-compare or similar would be better.

Document all our current compare functions

Let's put a README.md in comparators. @amishas157 as part of the current push can you take a stab at this?

Edited a name tag comparator flags a change from a previous changeset

In this changeset, a road was flagged by the Edited a name comparator. However, the name change was not made in this changeset but in a previous one.

@amisha157 @batpad @manoharuss

Features that are rarely created in the real world

Ref: #112

There are features that are not created often in the real world. Ex: It is not every day that a new airport is constructed. So, if these features are rare in the real world, shouldn't they also be rare on OpenStreetMap? Should we flag these rare features for manual review?

http://www.openstreetmap.org/node/4638801492

A harmful aeroway=aerodrome created by a new user in Georgetown

What other features are rarely created in the real world? 💭

cc: @planemad @manoharuss @geohacker @amishas157

Set a flag for suspicious transformations over a certain threshold

From #34

It is possible for compare functions to determine certain suspicious scenarios:

A whole way was displaced by X distance
A node in the way was dragged by X distance

A movement greater than X=100 metres is a definite problem and should probably be flagged as invalid. It is currently very hard to deduce this by relying on the output score of the compare function.
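
One possible shape for such a flag, as a sketch only: it assumes Point or LineString geometries whose node count is unchanged, measures per-node movement with the haversine formula, and uses a hypothetical result key. The 100 metre default comes from the paragraph above.

```javascript
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius in metres

// Great-circle distance in metres between two [lon, lat] positions.
function haversineMetres([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Returns the list of node positions, or null for missing or unsupported geometry.
function coordsOf(feature) {
  const geometry = feature && feature.geometry;
  if (!geometry) return null;
  if (geometry.type === 'Point') return [geometry.coordinates];
  if (geometry.type === 'LineString') return geometry.coordinates;
  return null;
}

// Hypothetical comparator: explicit flag when any node moved more than the threshold.
function suspiciousDisplacement(newVersion, oldVersion, thresholdMetres = 100) {
  const before = coordsOf(oldVersion);
  const after = coordsOf(newVersion);
  if (!before || !after || before.length !== after.length) return false;

  const moved = before.map((position, i) => haversineMetres(position, after[i]));
  const maxDisplacement = Math.max(...moved);
  if (maxDisplacement <= thresholdMetres) return false;

  return {
    'result:suspicious_displacement': {
      maxDisplacement,
      // true when every node moved past the threshold, i.e. the whole way was displaced
      wholeFeatureMoved: moved.every((distance) => distance > thresholdMetres)
    },
    message: 'Feature or node displaced beyond threshold'
  };
}
```

Pairing nodes by index only makes sense when no nodes were added or removed, which is why the sketch bails out when the counts differ.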

Given that this has been the most common class of breakage (https://github.com/mapbox/vandalism-dynamosm/issues/35), it may be worth focusing on catching these issues first.

cc @bkowshik @batpad @mikelmaron

Detect invalid turn restrictions

Common cases of invalid TR

TR relation with more than 3 members. For example, 3 from/via.
From/to members that don't share a common node with the via node/way

Sketch for detection

Get old and new version of TR relation
Count number of members. If more than 3, flag as invalid.
Check for connectivity using turfjs. If from/to does not share a node with via, flag as invalid.
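
A sketch of the two checks, with assumptions stated up front: the relation is given as a plain object with a `members` array (`{ type, ref, role, nodes }`), which is a hypothetical shape, and connectivity is checked by shared node IDs rather than with turfjs.

```javascript
// Node IDs that a relation member touches (hypothetical member shape).
function nodeIdsOf(member) {
  if (member.type === 'node') return [member.ref];
  return member.nodes || [];
}

// Hypothetical comparator for a turn restriction relation.
function invalidTurnRestriction(relation) {
  const members = (relation && relation.members) || [];
  const reasons = [];

  if (members.length > 3) reasons.push('more than 3 members');

  // from/to members must share at least one node with the via member(s).
  const viaNodes = new Set(
    members.filter((m) => m.role === 'via').flatMap(nodeIdsOf)
  );
  members
    .filter((m) => m.role === 'from' || m.role === 'to')
    .forEach((m) => {
      if (!nodeIdsOf(m).some((id) => viaNodes.has(id))) {
        reasons.push(m.role + ' member does not share a node with via');
      }
    });

  if (reasons.length === 0) return false;
  return {
    'result:invalid_turn_restriction': reasons,
    message: 'Turn restriction looks invalid'
  };
}
```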

Blockers

TR relations should be included in our filters
Can turfjs do topology?

Util for fetching vector tiles

Let's write a util to fetch vector tiles and return geojson feature collection so compare functions can use this to check context of what's on the map. Related to #112 #129

Here's what I think this should look like:

util receives vector tile ID, x, y and z. Default vector tile ID to mapbox-streets
request for the tile by hitting the maps end point
use https://github.com/mapbox/vector-tile-js#methods-1 to convert the pbf to geojson
return the featurecollection

Additionally:

the util can accept a feature filter https://github.com/mapbox/feature-filter and return only features that satisfy this.

cc @amishas157 @bkowshik @batpad @lukasmartinelli @ian29 @manoharuss

npm package for osm-compare

osm-compare is currently called compare-geojson on npm and I am the owner of the package.

Todo

Publish a new npm package called osm-compare
Move osm-compare to Mapbox organization on npm

cc: @amishas157 @batpad

Improving the null_island comparator

Looked at all the changesets flagged by the null_island compare function on osmcha-django.

A total of 51 changesets are flagged by the comparator
18 changesets were manually reviewed, approx 35%
None of the changesets reviewed are problematic.

One of the false positives looks like below. Ex:

Should we reduce the size of the bbox for null island from the current bbox below?

Synchronous interface for sync functions

Functions like https://github.com/mapbox/osm-compare/blob/master/comparators/wikidata_wikipedia_tag_deleted.js are of synchronous nature.

They should be exposed as sync functions - we can write a wrapper function that turns these sync functions into the async callback interface again in https://github.com/mapbox/osm-compare/blob/master/index.js.

This way we can use them in other projects too.
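
The wrapper could be as small as the sketch below. It assumes a Node-style `callback(err, result)` and a `(newVersion, oldVersion)` argument order; the name `toCallbackInterface` is made up for illustration.

```javascript
// Wraps a synchronous compare function so it can be called with a callback.
function toCallbackInterface(syncComparator) {
  return function (newVersion, oldVersion, callback) {
    let result;
    try {
      result = syncComparator(newVersion, oldVersion);
    } catch (err) {
      // Report failures through the callback instead of throwing.
      return callback(err);
    }
    return callback(null, result);
  };
}
```

The callback is invoked outside the `try` block so that an error thrown by the callback itself is not mistaken for a comparator failure. Note that this sketch calls back synchronously; deferring with `process.nextTick` would be an option if callers expect asynchronous behaviour.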

I'm on it.

Features on OpenStreetMap with more than one primary tag

Per voice with @planemad @geohacker @amishas157

We wanted to 👀 what percentage of features on OpenStreetMap have 2 primary tags. There are 26 primary tags. Ex: building, natural, etc.

https://wiki.openstreetmap.org/wiki/Map_Features

Using the key combinations API from TagInfo, we get the percentage of features with one primary tag that also have another primary tag. Ex: 60.05% of features with the geological primary tag also have the natural primary tag.

Querying the taginfo API for all combinations of the 26 primary tags, we have a correlogram. Ex: 90.66% of features with the sport tag on OpenStreetMap also have the leisure tag.

NOTE: Cells that are empty represent a zero value.

Category	Count
percentage == 0	286
0 < percentage < 1	330
percentage >= 1	60

npm install should not require Python

This makes it hard to use in a lambda.

Let's port https://github.com/mapbox/osm-compare/blob/master/scripts/download_common_tag_values.py to JS or Bash.

I'm on it.

Flag `name` tags that are the same as the OSM user name

It'd be good to monitor edits where name=* is the same as the OSM user name.
E.g.: https://www.openstreetmap.org/node/4557487556

@batpad @amishas157 @manoharuss @krishnanammala @planemad

Compare function for long name tag values

Per chat with @geohacker, idea based on a diary post on OSM. It would be good to have a compare function that flags very long name=* tag values so they can be verified during validation. Example: when the name tag value is longer than 40-50 characters.
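
A minimal sketch of such a check. The default limit of 50 characters is taken from the upper end of the range above; the function name and result key are hypothetical.

```javascript
// Hypothetical comparator: flags name values longer than maxLength characters.
function longName(newVersion, maxLength = 50) {
  const name = newVersion && newVersion.properties && newVersion.properties.name;
  if (typeof name !== 'string') return false;

  // Array.from counts code points, so characters outside the BMP count once.
  const length = Array.from(name).length;
  if (length <= maxLength) return false;

  return {
    'result:long_name': length,
    message: 'Name tag value is unusually long'
  };
}
```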

cc @planemad @bkowshik @geohacker @chtnha

Flag any changes in and around null island

cc @mapbox/vandals

Detect changes to disputed borders

Disputed borders from taginfo

Edit wars usually happen on disputed national borders. The DWG and OSMF have a standing position to revert any activity related to disputed areas. We need to catch these edits so that they do not escalate into widespread edit wars between different OSM communities.

Sketch for detection

Get changes from ways/relations with the following tags: name, name:*, disputed, admin_level=2.
Flag any tag changes and geometry changes (need to define a threshold).

Guiding principles to build useful compare functions

Per @planemad's post:

We should have consistent design principles that will serve as a guide to build useful compare functions without being constrained by limitations of osmcha.

Created this ticket to continue the large design discussion:

Differentiate between no data and a positive result.
Comparators should pass on as much useful knowledge as possible to a human reviewer to make the final decision.
Withholding findings will lead to duplicate human effort on the same activity.

cc: @amishas157 @batpad @geohacker

Setup code linting with eslint

Improve new_mapper compare functions with more data points

The OpenStreetMap API gives the following details on a mapper:

User ID
Display name
Date the account was created
Description
Whether the user has agreed to the contributor-terms
Hyperlink to user's gravatar
Number of changesets created
Number of traces done
Number of blocks received and blocks active

Currently, we use just the User ID to decide whether the mapper is a new mapper or not. But, we could potentially use the following in some way to make a better guess:

Date the account was created
Number of changesets created
Number of traces done
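
As a sketch of how these data points might be combined: the `user` object shape (`accountCreated`, `changesetCount`, `traceCount`) and the limits of 30 days and 50 changesets are assumptions for illustration, not values from the existing comparator.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: decides whether a mapper looks new from several data points.
function newMapperSignals(user, now = new Date(), limits = { accountAgeDays: 30, changesets: 50 }) {
  if (!user) return false;

  const created = new Date(user.accountCreated).getTime();
  const accountAgeDays = Math.floor((now.getTime() - created) / DAY_MS);
  const isNew = accountAgeDays < limits.accountAgeDays ||
    user.changesetCount < limits.changesets;
  if (!isNew) return false;

  return {
    'result:new_mapper': {
      accountAgeDays,
      changesetCount: user.changesetCount,
      // Passed along for the reviewer; not used in the decision here.
      traceCount: user.traceCount
    },
    message: 'Edit by a new mapper'
  };
}
```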

Flag new cities and towns

I think we should flag any new place=city or place=town.

cc @amishas157 @bkowshik @manoharuss @batpad @maning

Compare function to catch highway=footway

Let's quickly wire a compare function to catch features that meet the following:

highway=footway
version 1
by a new user

@batpad @amishas157 @bkowshik

Name modification is noisy

@amishas157 @bkowshik this one https://github.com/mapbox/osm-compare/blob/master/comparators/name_modified.js seems to be super noisy. Just this morning, we have over 2000 entries in OSMCha.

Can we take a look and find ways to make the scope tight? Thank you!

cc @manoharuss @planemad @batpad

Test that compare functions don't fail with incomplete inputs

There are several cases when we pass "incomplete" inputs into compare functions. Compare functions should exit gracefully and not throw errors if any of the following is missing:

Either oldVersion or newVersion: this is quite obvious - either will be missing if the feature is created or deleted. We need tests to make sure no function throws an error in this case.
Missing geometries: We want to be able to run compare functions (potentially) on features where we do not have or are unable to get full geometries. Compare functions MUST handle cases of geometry being null.

We need tests for the above ^ - a set of fixtures with each of these cases - i.e. oldVersion missing, newVersion missing, and geometries missing (set to null) - the test suite should run these fixtures against all compare functions and ensure that none throw errors.
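
A sketch of such a fixture run, assuming each compare function can be called synchronously as `fn(newVersion, oldVersion)` and that `comparators` is a plain object mapping names to functions (both assumptions; the helper name is hypothetical).

```javascript
// Runs every comparator against the three incomplete-input fixtures
// and returns a list of the ones that threw.
function runIncompleteInputChecks(comparators, completeFeature) {
  const withoutGeometry = Object.assign({}, completeFeature, { geometry: null });
  const fixtures = [
    { name: 'oldVersion missing', newVersion: completeFeature, oldVersion: null },
    { name: 'newVersion missing', newVersion: null, oldVersion: completeFeature },
    { name: 'geometries missing', newVersion: withoutGeometry, oldVersion: withoutGeometry }
  ];

  const failures = [];
  Object.keys(comparators).forEach((cfName) => {
    fixtures.forEach((fixture) => {
      try {
        comparators[cfName](fixture.newVersion, fixture.oldVersion);
      } catch (err) {
        failures.push({ comparator: cfName, fixture: fixture.name, error: err.message });
      }
    });
  });
  return failures;
}
```

A test suite would then assert that the returned list is empty and print the entries otherwise.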

cc @amishas157 @geohacker

Version 4 of compare-geojson

Updates

Improve interface to compare functions
Versioning compare functions
Behaviour for newly created and deleted objects
Changes to important features and
User blocks #19
Changelog to record all notable changes made #3

Using context for detecting harmful features

@bkowshik brought up an interesting idea of context-based detection for flagging harmful features.
Following are notes from the discussion we had on this idea :

So far we haven't worked on any feature detector where we use context while doing any detection.

Context can have different meanings here. It can refer to the surrounding area where a feature is created, or to the time when a feature is created. The following examples illustrate the idea of context:

Say there is a lake that has all valid tags and looks 💯 when checked individually. But what if it overlaps other features in the area, which makes it a bad feature? Ref: A lake was created in Manhattan overlapping many buildings.
Another example of context is distinguishing areas that are well mapped (for example: Germany, San Francisco) from those that are not (for example: India). If someone adds lakes or hills in a well-mapped place, that is much more suspicious than adding the same kind of features in a less well-mapped area, where such features could plausibly still be missing. Such additions are not expected in well-mapped areas.
Time-based context: for example, it is not common for people to add new airports to the map now, because we expect such features to have been added to OSM already, so any new addition should be treated as suspicious.

☝️ can help in following ways:

To flag as suspicious a feature which otherwise looks good.
It also helps in flagging critical issues.

@geohacker @batpad @planemad Would be great to hear your thoughts on above and how can we leverage context more to find harmful features.

Review wikipedia/wikidata comparator

Let's nail scenarios that need escalation for edits to features that have wikipedia/wikidata tags. For example: https://osmcha.mapbox.com/44862344/

@manoharuss can you inspect and lay out what we can do to narrow the scope?

cc @batpad @amishas157 @krishnanammala @chtnha @bkowshik

Hello, world!

Large building threshold too low?

I'm seeing features like https://osmcha.mapbox.com/46511367/features/way-477982666/ come through flagged Added a large building. Is the threshold too low?

@manoharuss @bkowshik can you help verify this?

Compare function for mis-spelled tags

Make a list of common mis-spellings in tags and values. Then check if the newVersion of the feature has any of these bad spellings. If it does, return errors.
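
A sketch of that lookup. The misspelling lists below are short illustrative placeholders, not a vetted list; in practice they would need to be compiled from real data. Function name and result key are hypothetical.

```javascript
// Illustrative placeholder lists (assumptions), mapping a misspelling to its likely fix.
const MISSPELLED_KEYS = { hihgway: 'highway', buidling: 'building', amenty: 'amenity' };
const MISSPELLED_VALUES = { residental: 'residential', restuarant: 'restaurant' };

const hasOwn = (object, key) => Object.prototype.hasOwnProperty.call(object, key);

// Hypothetical comparator: checks keys and values of the new version only.
function misspelledTags(newVersion) {
  const props = (newVersion && newVersion.properties) || {};
  const errors = [];

  Object.keys(props).forEach((key) => {
    if (hasOwn(MISSPELLED_KEYS, key)) {
      errors.push({ key, suggestion: MISSPELLED_KEYS[key] });
    }
    const value = props[key];
    if (typeof value === 'string' && hasOwn(MISSPELLED_VALUES, value)) {
      errors.push({ key, value, suggestion: MISSPELLED_VALUES[value] });
    }
  });

  if (errors.length === 0) return false;
  return { 'result:misspelled_tags': errors, message: 'Possible misspelled tag' };
}
```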

cc @Rub21

Flag stray lakes

@bkowshik @amishas157 @manoharuss - can we write a compare function to flag stray water features? Any new water features by a new user.

city_deleted vs city-deleted

Two compare functions look similar; let's remove one:

Comparator for minor road changed

Catch modifications/additions for the following: highway=pedestrian,footway,cycleway,track,path

An inverse of the major-road-changed to monitor PokemonGo edits. Context: https://github.com/mapbox/data/issues/2328

Rename package: compare-geojson -> osm-compare

Can we rename the package https://github.com/mapbox/osm-compare/blob/master/package.json for consistency with the repository name?

cc @bkowshik @amishas157

Write tests for null oldVersion or newVersion to ensure functions don't throw errors

We should have one fixture that includes null for oldVersion and newVersion respectively and run that against all compare functions and test that they don't throw errors.

cc @bkowshik

Evaluation of compare_geometries

Evaluated the compare geometries compare function w/ @bkowshik for various geometry transformation scenarios to see how the output scores compare relative to each issue.

New road, no geometry change
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":0,"geometryTransformation":3}} 

New road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":67082835.94906078}} 

New road, one node dragged to 1000 times the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":32931423208.534332,"centroidDisplacement":1196165.8065632868,"geometryTransformation":118174327210540240}} 

Mature road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":626106468.8579007}} 

Mature road, split in half
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-21589.85576373099,"centroidDisplacement":243.93158052782456,"geometryTransformation":-147460533.91481057}} 

Mature road, displaced by 100m perpendicular to axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-0.07723668643666315,"centroidDisplacement":111.2298332293394,"geometryTransformation":-240.54866504303104}} 

Mature road, displaced by 100m along axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":109.77333342690834,"geometryTransformation":3073.6533359534333}} 

Mature road, new node added and dragged by 1km
{"result:compare_geometries":{"cfVersion":2,"areaDelta":295886.4365296521,"centroidDisplacement":126.03898566162799,"geometryTransformation":1044210337.2744685}}

Observations

The area delta multiplier is strongly affecting the scores; we might need to use the square root to normalize its effect.
A node in the way getting displaced is weighted much, much higher than the whole way getting displaced, even though both are definitely invalid transformations. We are currently not catching way displacements because of this: https://github.com/mapbox/vandalism-dynamosm/issues/35
We need to be able to differentiate an incorrect displacement or distortion from a legitimate improvement to a feature. This could possibly be done by comparing the number of nodes in the feature before and after the change.
When we are certain a transformation is invalid, it might be better to set an explicit flag rather than trying to quantify the problem as a numerical score.

Going to create some tickets to discuss possible improvements based on these findings.

cc @batpad @mikelmaron

Flagging suspicious edits on ocean

Recently there have been many fantasy edits in the ocean, where suspicious features are added in the middle of the sea. Apart from what's happening on land, we also need to catch these edits and take action immediately.

Other than route=ferry and any other ferry related tags, any new features in the middle of the ocean should be caught and flagged in OSMCha.

http://osmcha.mapbox.com/45228754/

http://osmcha.mapbox.com/45228653/

cc @mapbox/vandals

Consistent naming

@amishas157 @bkowshik can we make the names consistent? Use either camelcase, underscores or hyphens - but not all. For each CF, can we also standardise on title case for the reason?

Notes on edited name comparator

We reviewed the edited name comparator, notes below:

compare function works as expected, no unusual detection was noticed.
name changes includes names in other languages for example, name:nl.

However, it is difficult to assess if the changes are harmful without additional context such as:

local knowledge;
external reference/source for names;
a list of important features we should monitor;
watchlist of words like curse words.

On the other hand, this is definitely useful for the local mapper/community if they monitor local changes in their area.

@krishnanammala @poornibadrinath @amishas157

flag name change to place=island

Flag any name changes happening to place=island tag 💣

Right now our place-edited comparator doesn't consider place=island when checking for name changes. We need to add the place=island tag alongside the rest of the place tags in the compare function.

One harmful changeset where a major island name was changed : http://osmcha.mapbox.com/45315203/

cc @mapbox/vandals

Valid highway tags missing

Seeing some valid highway tags missing from https://github.com/mapbox/osm-compare/blob/master/comparators/invalid-highway-tags.js

highway=milestone for one.

@kupendrayadav @bkowshik @amishas157 would it make sense to rely on taginfo for these?

Best practices for working with external APIs

As compare functions use data from external APIs like osm-comments, Wikidata, etc we need a list of best practices that compare functions are designed with.

Capturing from @amishas157 post: #105 (comment)

Although we are assuming that we won't be hitting the Wikidata API too hard, just to be 💯, we can catch the errors returned when the Wikidata API is rate limited and find a way for them to be reported to us. Maybe we can also use `'result:wikidataApiLimitExceeded': true`, the way we do it for escalate, and then read it on the vandalism side to send us these errors. This lists the error codes returned by the Wikidata API. We can catch `ratelimited`. If we get such errors from vandalism, we can figure out some other way to avoid hitting the Wikidata API hard while still being sure that this comparator has worked as expected.

Does the feature overlap with another feature

Per voice with @geohacker and @amishas157
Ref: #112, #108

As part of the context-based validation, this compare function will flag suspiciously overlapping features. A couple of examples we have seen in the recent past are:

https://osmcha.mapbox.com/46963151/

Lake in Manhattan overlapping existing buildings.

http://osmcha.mapbox.com/45536604/

A residential area becomes a Park for what looks like a Pokemon mapper.

In scenarios like ^, the feature that was created/modified overlapped quite a lot of existing features on OpenStreetMap, which form the context of where the feature sits. It is super important to detect features like these and correct them as soon as possible, as they affect the very look of the map. Ex:

Residential areas rendered as lake and park due to harmful edits.

Workflow

NOTE: Focussing on just new/updated water bodies in iteration 1 to keep our focus narrow.

A new lake or an existing lake gets edited on OpenStreetMap.
Calculate all map tiles (xyz) that cover the lake.
Download vector map tiles from Mapbox API, convert to a FeatureCollection with all features in the tile.
Check if the new/modified lake overlaps with any feature in the FeatureCollection and flag if there is a suspicious overlap.
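
Step 2 of the workflow is pure arithmetic and can be sketched without any dependency. The code below uses the standard Web Mercator (slippy map) tile formula and covers the lake's bounding box rather than its exact shape. The helper names are hypothetical, and the zoom level is left as a parameter since the right value is still open.

```javascript
// Bounding box [west, south, east, north] of a polygon's outer ring of [lon, lat] positions.
function bboxOfRing(ring) {
  const lons = ring.map((c) => c[0]);
  const lats = ring.map((c) => c[1]);
  return [Math.min(...lons), Math.min(...lats), Math.max(...lons), Math.max(...lats)];
}

// Standard slippy-map conversion from lon/lat to tile x/y at a zoom level.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so positions on the map edge stay inside the valid tile range.
  const clamp = (v) => Math.min(Math.max(v, 0), n - 1);
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// All tiles at `zoom` that intersect the bounding box.
function tilesCoveringBbox([west, south, east, north], zoom) {
  const topLeft = lonLatToTile(west, north, zoom);
  const bottomRight = lonLatToTile(east, south, zoom);
  const tiles = [];
  for (let x = topLeft.x; x <= bottomRight.x; x++) {
    for (let y = topLeft.y; y <= bottomRight.y; y++) {
      tiles.push({ x, y, z: zoom });
    }
  }
  return tiles;
}
```

For a lake polygon, `tilesCoveringBbox(bboxOfRing(lake.geometry.coordinates[0]), zoom)` would give the tiles to download in step 3. Features crossing the antimeridian are not handled in this sketch.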

Uncertainties

What zoom level tiles should be downloaded from the API?
- Tiles at lower zoom levels don't have all the data. Ex: Buildings generally show up in tiles greater than zoom 15
There could be good overlaps too. We need to differentiate between a good and a harmful overlap.
Lakes and parks features fit this use-case well. What other feature types can something like this handle?

cc: @planemad @manoharuss

Comparator to detect displacements

From #34

The current compare geometries function makes no distinction between a feature transformed by a genuine extension with new nodes and one transformed by a node being dragged.

This could be done by confirming there was no change in the number of nodes, but only in the spatial configuration.
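
A sketch of that distinction for LineString features: same node count with different coordinates is treated as a displacement, while a different node count is reported separately. The function name, result key and labels are hypothetical.

```javascript
// Hypothetical comparator: separates node drags from edits that add or remove nodes.
function classifyGeometryChange(newVersion, oldVersion) {
  const oldGeometry = oldVersion && oldVersion.geometry;
  const newGeometry = newVersion && newVersion.geometry;
  if (!oldGeometry || !newGeometry) return false;
  if (oldGeometry.type !== 'LineString' || newGeometry.type !== 'LineString') return false;

  const before = oldGeometry.coordinates;
  const after = newGeometry.coordinates;
  // Coordinates are plain nested arrays of numbers, so a JSON comparison is sufficient.
  if (JSON.stringify(before) === JSON.stringify(after)) return false;

  const kind = before.length === after.length ? 'displacement' : 'node_count_changed';
  return {
    'result:geometry_change': {
      kind,
      nodesBefore: before.length,
      nodesAfter: after.length
    },
    message: kind === 'displacement'
      ? 'Nodes moved without nodes being added or removed'
      : 'Nodes were added or removed'
  };
}
```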

cc @bkowshik @batpad

Compare function for change in highway classification - Learning

As per the show & tell given by @batpad and @amishas157 on compare functions, I would like to start by writing one simple compare function for changes in highway classification. This should help me understand compare functions better.

Will work with @amishas157 on this.

cc @planemad
