mapbox / osm-compare
Functions that identify what changed during a feature edit on OpenStreetMap.
License: ISC License
Let's quickly wire a compare function to catch features that meet the following:
NOTE: Ticket to track good detections by compare functions. Feel free to edit this post.
It'd be good to monitor edits where name=* is the same as OSM names.
For E.g: https://www.openstreetmap.org/node/4557487556
@amishas157 @bkowshik can we make the names consistent? Use either camelcase, underscores or hyphens, but not all three. For each CF, can we also standardise on title case for the reason?
I'm seeing features like https://osmcha.mapbox.com/46511367/features/way-477982666/ come through flagged "Added a large building". Is the threshold too low?
@manoharuss @bkowshik can you help verify this?
The OpenStreetMap API gives the following details on a mapper:
contributor-terms
changesets created
traces done
blocks received and blocks active
Currently, we use just the User ID to decide whether the mapper is a new mapper or not. But, we could potentially use the following in some way to make a better guess:
changesets created
traces done
Make a list of common misspellings in tags and values. Then check if the newVersion of the feature has any of these bad spellings. If it does, return errors.
cc @Rub21
Recently there have been many fantasy edits in the ocean, where suspicious features are added in the middle of the sea. Apart from what's happening on land, we also need to catch these edits and take action immediately.
Other than route=ferry and any other ferry-related tags, any new features in the middle of the ocean should be caught and flagged in OSMCha.
http://osmcha.mapbox.com/45228754/
http://osmcha.mapbox.com/45228653/
cc @mapbox/vandals
@bkowshik @amishas157 @manoharuss - can we write a compare function to flag stray water features? Any new water features by a new user.
I think we should flag any new place=city or place=town.
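A minimal sketch of what such a check might look like. The function name and the `(newVersion, oldVersion)` argument order are assumptions for illustration, as is treating a missing `oldVersion` as "feature was created".

```javascript
// Hypothetical helper, not part of osm-compare.
// Place values to flag, taken from the suggestion above.
const FLAGGED_PLACES = ['city', 'town'];

function isNewCityOrTown(newVersion, oldVersion) {
  // Assumption: a feature with no oldVersion was newly created.
  if (oldVersion || !newVersion || !newVersion.properties) return false;
  return FLAGGED_PLACES.indexOf(newVersion.properties.place) !== -1;
}
```

A real compare function would likely wrap this in the callback interface and return a result object rather than a boolean.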
@bkowshik brought an interesting idea of context based detection for flagging harmful features.
Following are notes from the discussion we had on this idea :
So far we haven't worked on any feature detector where we use context while doing any detection.
Context can have different meanings here. It can refer to the surrounding area where a feature is created, or it can refer to the time when a feature is created. Following are a few examples to understand the idea of context better:
☝️ can help in the following ways:
@geohacker @batpad @planemad Would be great to hear your thoughts on the above and how we can leverage context more to find harmful features.
We reviewed the edited name comparator, notes below:
name:nl.
However, it is difficult to assess whether the changes are harmful without additional context such as:
On the other hand, this is definitely useful for the local mapper/community if they monitor local changes in their area.
Disputed borders from taginfo
Edit wars usually happen on disputed national borders, and the DWG and OSMF have a standing position to revert any activity related to disputed areas. We need to catch these edits so as not to escalate widespread edit wars between different OSM communities.
name, name:*, disputed, admin_level=2.
As part of the context-based validation, this compare function will flag suspiciously overlapping features. A couple of examples we have seen in the recent past are:
Lake in Manhattan overlapping existing buildings.
A residential area becomes a Park for what looks like a Pokemon mapper.
In scenarios like ^, the feature that was created/modified overlapped quite a lot of existing features on OpenStreetMap, which form the context of where the feature actually is. It is super important to detect features like these and correct them as soon as possible, as they affect the very look of the map. Ex:
Residential areas rendered as lake and park due to harmful edits.
NOTE: Focussing on just new/updated water bodies in iteration 1 to keep our focus narrow.
Lakes and parks features fit this use-case well. What other feature types can something like this handle?
cc: @planemad @manoharuss
Looked at all the changesets flagged by the null_island compare function on osmcha-django.
51 changesets are flagged by the comparator
18 changesets were manually reviewed, approx 35%
One of the false positives looks like below. Ex:
Should we reduce the size of the bbox for null island from the current bbox below?
Catch modified/additions for the following: highway=pedestrian,footway,cycleway,track,path
An inverse of the major-road-changed to monitor PokemonGo edits. Context: https://github.com/mapbox/data/issues/2328
Seeing some valid highway tags missing from https://github.com/mapbox/osm-compare/blob/master/comparators/invalid-highway-tags.js
highway=milestone for one.
@kupendrayadav @bkowshik @amishas157 would it make sense to rely on taginfo for these?
As per the show & tell given by @batpad and @amishas157 on compare functions, I would like to start by writing one simple compare function for changes in highway classification. This should help me understand compare functions better.
Will work with @amishas157 on this.
cc @planemad
Based on this overpass query, filtering poke-related strings within the name tag should detect most bad pokemon edits.
[out:xml][timeout:900];
(
node["name"~"pok(é|e)(mon|stop|gym|go)", i];
way["name"~"pok(é|e)(mon|stop|gym|go)", i];
relation["name"~"pok(é|e)(mon|stop|gym|go)", i];
);
out meta;
>;
out meta qt;
Steps
natural, water, highway, building, leisure, tourism.
On the name and name:* tags, run a regex filter similar to this: "pok(é|e)(mon|stop|gym|go)"
pokemon_edits
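The regex step above could be sketched roughly as follows. The helper name is hypothetical and the feature is assumed to be GeoJSON with OSM tags in `properties`; the pattern itself is the one from the overpass query, made case-insensitive.

```javascript
// Hypothetical helper, not part of osm-compare.
// Same pattern as the overpass query, with the "i" (case-insensitive) flag.
const POKEMON_PATTERN = /pok(é|e)(mon|stop|gym|go)/i;

function hasPokemonName(feature) {
  if (!feature || !feature.properties) return false;
  const props = feature.properties;
  // Check the name tag and every name:* tag (name:en, name:nl, ...).
  return Object.keys(props).some(function (key) {
    const isNameKey = key === 'name' || key.indexOf('name:') === 0;
    return isNameKey && typeof props[key] === 'string' && POKEMON_PATTERN.test(props[key]);
  });
}
```

Other name-like keys such as `old_name` or `alt_name` are deliberately not checked here, since the steps only mention name and name:*.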
There are 26 recognized primary tags on OpenStreetMap.
aerialway, aeroway, amenity, barrier, boundary, building, craft, emergency, geological,
highway, historic, landuse, leisure, man_made, military, natural, office, place, power,
public_transport, railway, route, shop, sport, tourism, waterway
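As a rough illustration, the list above can be used to find which primary tags a feature carries. The helper name is an assumption, and the feature is assumed to be GeoJSON with OSM tags in `properties`.

```javascript
// The 26 primary tags listed above.
const PRIMARY_TAGS = [
  'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building',
  'craft', 'emergency', 'geological', 'highway', 'historic', 'landuse',
  'leisure', 'man_made', 'military', 'natural', 'office', 'place', 'power',
  'public_transport', 'railway', 'route', 'shop', 'sport', 'tourism',
  'waterway'
];

// Hypothetical helper: returns the primary tag keys present on a feature.
function getPrimaryTags(feature) {
  if (!feature || !feature.properties) return [];
  return PRIMARY_TAGS.filter(function (key) {
    return Object.prototype.hasOwnProperty.call(feature.properties, key);
  });
}
```

Comparing `getPrimaryTags(oldVersion)` with `getPrimaryTags(newVersion)` would be one way to notice a primary tag being added to an existing feature.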
The primary-osm-tag compare function will:
highway
waterway is added to a building feature
1 primary tags
natural and places tags
2 compare functions which look similar, let's remove one:
@bkowshik - could this have a better name? compare-geojson is very generic while this project is OSM specific. We don't need to go overly descriptive, but osm-compare or similar would be better.
Let's write a util to fetch vector tiles and return a GeoJSON feature collection, so compare functions can use this to check the context of what's on the map. Related to #112 #129
Here's what I think this should look like:
Additionally:
cc @amishas157 @bkowshik @batpad @lukasmartinelli @ian29 @manoharuss
From this changeset, a road was flagged by the "Edited a name" comparator. But the change did not happen in this changeset; it came from a previous changeset.
@amisha157 @batpad @manoharuss
As compare functions use data from external APIs like osm-comments, Wikidata, etc we need a list of best practices that compare functions are designed with.
Capturing from @amishas157 post: #105 (comment)
Though we are assuming that we won't be hitting the Wikidata API too hard, just to be 💯, what we can do is catch the errors when the Wikidata API is rate limited and find a way for it to report to us. Maybe we can also use `result:wikidataApiLimitExceeded: true`, the way we do it for escalate, and then read it on the vandalism side to send us these errors. This lists the error codes returned by the Wikidata API. We can catch for `ratelimited`. If we get such errors from vandalism, we can figure out some other way, so as to not hit the Wikidata API hard and also be sure that this comparator has worked the way it is expected.
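A minimal sketch of the idea, assuming the parsed response body follows the usual MediaWiki action API error shape (`{error: {code, info}}`). The function name is hypothetical; the result key is the one suggested above.

```javascript
// Hypothetical helper, not part of osm-compare.
// Turns a parsed Wikidata API response into a compare function result.
function resultFromWikidataResponse(body) {
  // Assumption: rate limiting is reported as error.code === 'ratelimited'.
  if (body && body.error && body.error.code === 'ratelimited') {
    return {'result:wikidataApiLimitExceeded': true};
  }
  return {};
}
```

The consumer on the vandalism side could then look for this key and report it, instead of silently treating the comparator as having passed.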
I reviewed the output of the Edited a place CF today. No bad edits were flagged out of the 50 features I reviewed. Main reason was most of the edits were on non-english village names for which I lack the local context as a mapper. More notes below:
Per @planemad's post:
We should have consistent design principles that will serve as a guide to build useful compare functions without being constrained by limitations of osmcha.
Created this ticket to continue the large design discussion:
From #34
The current compare geometries function makes no distinction between a feature that was transformed by genuinely adding new nodes and one that was transformed by dragging an existing node.
This could be done by confirming that there was no change in the number of nodes, only in their spatial configuration.
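One way this might look for ways, as a sketch only: it assumes both versions are GeoJSON LineString features and compares coordinates position by position. The function name and return labels are hypothetical.

```javascript
// Hypothetical helper, not part of osm-compare.
// Returns 'nodes_changed' when the node count differs, 'nodes_moved' when the
// count is the same but at least one coordinate differs, 'unchanged' otherwise,
// or null when the inputs are missing or are not LineStrings.
function classifyGeometryChange(newVersion, oldVersion) {
  if (!newVersion || !oldVersion || !newVersion.geometry || !oldVersion.geometry) return null;
  if (newVersion.geometry.type !== 'LineString' || oldVersion.geometry.type !== 'LineString') return null;

  const newCoords = newVersion.geometry.coordinates;
  const oldCoords = oldVersion.geometry.coordinates;
  if (newCoords.length !== oldCoords.length) return 'nodes_changed';

  const moved = newCoords.filter(function (coord, i) {
    return coord[0] !== oldCoords[i][0] || coord[1] !== oldCoords[i][1];
  }).length;
  return moved > 0 ? 'nodes_moved' : 'unchanged';
}
```

This does not catch the case where one node is deleted and another added in the same edit, since the count stays the same; comparing node IDs would be needed for that.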
Per voice with @planemad @geohacker @amishas157
We wanted to 👀 what percentage of features on OpenStreetMap have 2 primary tags. There are 26 primary tags. Ex: building, natural, etc.
Using the key combinations API from TagInfo, we get the percentage of features with one primary tag that also have another primary tag. Ex: 60.05 % of features with the geological primary tag also have the natural primary tag.
Querying the taginfo API for all combinations of the 26 primary tags, we have a correlogram. Ex: 90.66 % of features with the sport tag on OpenStreetMap also have the leisure tag.
NOTE: Cells that are empty represent a zero value.
Category | Count |
---|---|
percentage == 0 | 286 |
0 < percentage < 1 | 330 |
percentage >= 1 | 60 |
From #34
It is possible for compare functions to determine certain suspicious scenarios:
A movement greater than X=100 metres is a definite problem and should probably be flagged as invalid. It's currently very hard to deduce this by relying on the output score of the compare function.
Given that this has been the most common class of breakage (https://github.com/mapbox/vandalism-dynamosm/issues/35), it may be worth trying to focus on catching these issues first before others.
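A sketch of a simple threshold check on the existing output. It reads the `centroidDisplacement` field from the `result:compare_geometries` object that the compare geometries function emits, and assumes that value is in metres; the helper name and constant are hypothetical.

```javascript
// Hypothetical helper, not part of osm-compare.
// X = 100 metres, from the suggestion above.
const MAX_DISPLACEMENT_METRES = 100;

function isLargeMovement(result) {
  const scores = result && result['result:compare_geometries'];
  if (!scores || typeof scores.centroidDisplacement !== 'number') return false;
  // Assumption: centroidDisplacement is expressed in metres.
  return scores.centroidDisplacement > MAX_DISPLACEMENT_METRES;
}
```

Centroid displacement alone will miss a single dragged node on a long way, where the centroid barely moves, so this is only a first filter.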
Can we rename the package https://github.com/mapbox/osm-compare/blob/master/package.json for consistency with the repository name?
Per chat with @geohacker, an idea based on a diary post on OSM: it would be good to have a compare function that flags very long name=* values for verification during validation. Example: when the name tag value is longer than 40-50 characters.
cc @planemad @bkowshik @geohacker @chtnha
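A minimal sketch of the length check. The helper name is hypothetical, and 50 is picked from the upper end of the 40-50 range mentioned above; the right cut-off would need tuning against real data.

```javascript
// Hypothetical helper, not part of osm-compare.
const MAX_NAME_LENGTH = 50;

function nameTooLong(newVersion) {
  if (!newVersion || !newVersion.properties) return false;
  const name = newVersion.properties.name;
  return typeof name === 'string' && name.length > MAX_NAME_LENGTH;
}
```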
We should have one fixture that includes null for oldVersion and newVersion respectively, run that against all compare functions, and test that they don't throw errors.
cc @bkowshik
osm-compare is currently called compare-geojson on npm and I am the owner of the package.
osm-compare
osm-compare to Mapbox organization on npm
cc: @amishas157 @batpad
Evaluated the compare geometries compare function w/ @bkowshik for various geometry transformation scenarios to see how the output scores compare relative to each issue.
New road, no geometry change
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":0,"geometryTransformation":3}}
New road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":67082835.94906078}}
New road, one node dragged to 1000 times the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":32931423208.534332,"centroidDisplacement":1196165.8065632868,"geometryTransformation":118174327210540240}}
Mature road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":626106468.8579007}}
Mature road, split in half
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-21589.85576373099,"centroidDisplacement":243.93158052782456,"geometryTransformation":-147460533.91481057}}
Mature road, displaced by 100m perpendicular to axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-0.07723668643666315,"centroidDisplacement":111.2298332293394,"geometryTransformation":-240.54866504303104}}
Mature road, displaced by 100m along axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":109.77333342690834,"geometryTransformation":3073.6533359534333}}
Mature road, new node added and dragged by 1km
{"result:compare_geometries":{"cfVersion":2,"areaDelta":295886.4365296521,"centroidDisplacement":126.03898566162799,"geometryTransformation":1044210337.2744685}}
Observations
Going to create some tickets to discuss possible improvements based on these findings.
cc @amishas157
Let's nail scenarios that need escalation for edits to features that have wikipedia/wikidata tags. For example: https://osmcha.mapbox.com/44862344/
@manoharuss can you inspect and lay out what we can do to make the scope narrow?
cc @batpad @amishas157 @krishnanammala @chtnha @bkowshik
This makes it hard to use in a lambda.
Let's port https://github.com/mapbox/osm-compare/blob/master/scripts/download_common_tag_values.py to JS or Bash.
I'm on it.
Ref: #112
There are features that are not created often in the real world. Ex: It is not every day that a new airport is constructed. So, if these features are rare in the real world, shouldn't they also be rare on OpenStreetMap? Should we flag these rare features for manual review?
A harmful aeroway=aerodrome created by a new user in Georgetown
What other features are rarely created in the real world? 💭
Let's put a README.md in comparators. @amishas157 as part of the current push can you take a stab at this?
@amishas157 @bkowshik this one, https://github.com/mapbox/osm-compare/blob/master/comparators/name_modified.js, seems to be super noisy. Just this morning, we have over 2000 entries in OSMCha.
Can we take a look and find ways to make the scope tight? Thank you!
cc @mapbox/vandals
There are several cases when we pass "incomplete" inputs into compare functions. Compare functions should exit gracefully and not throw errors if any of the following is missing:
Either oldVersion or newVersion: this is quite obvious - either will be missing if the feature is created or deleted. We need tests to make sure no function throws an error in this case.
Missing geometries: We want to be able to run compare functions (potentially) on features where we do not have or are unable to get full geometries. Compare functions MUST handle cases of geometry being null.
We need tests for the above ^ - a set of fixtures with each of these cases - i.e. oldVersion missing, newVersion missing, and geometries missing (set to null) - the test suite should run these fixtures against all compare functions and ensure that none throw errors.
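A rough sketch of such a harness. The fixture contents, the `findThrowingCases` name, and the `(newVersion, oldVersion, callback)` argument order are assumptions for illustration; a real test would load the actual comparators and use the project's test runner.

```javascript
// Hypothetical fixtures covering the three incomplete-input cases above.
const incompleteFixtures = [
  {
    description: 'oldVersion missing',
    newVersion: {type: 'Feature', properties: {}, geometry: {type: 'Point', coordinates: [0, 0]}},
    oldVersion: null
  },
  {
    description: 'newVersion missing',
    newVersion: null,
    oldVersion: {type: 'Feature', properties: {}, geometry: {type: 'Point', coordinates: [0, 0]}}
  },
  {
    description: 'geometries missing',
    newVersion: {type: 'Feature', properties: {}, geometry: null},
    oldVersion: {type: 'Feature', properties: {}, geometry: null}
  }
];

// Runs every fixture against every compare function and collects the ones
// that throw. compareFunctions is an object of name -> function.
function findThrowingCases(compareFunctions, fixtures) {
  const failures = [];
  Object.keys(compareFunctions).forEach(function (name) {
    fixtures.forEach(function (fixture) {
      try {
        compareFunctions[name](fixture.newVersion, fixture.oldVersion, function () {});
      } catch (err) {
        failures.push({compareFunction: name, fixture: fixture.description, message: err.message});
      }
    });
  });
  return failures;
}
```

This only catches errors thrown synchronously; a compare function that throws inside a later async step would need its callback awaited as well.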
Flag any name changes happening to the place=island tag 💣
Right now our place-edited comparator doesn't consider place=island when checking for name changes. We need to add the place=island tag along with the rest of the place tags in the compare function.
One harmful changeset where a major island name was changed : http://osmcha.mapbox.com/45315203/
cc @mapbox/vandals
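For illustration only, a standalone version of the island check might look like this. It is not the place-edited comparator itself; the function name and `(newVersion, oldVersion)` argument order are assumptions.

```javascript
// Hypothetical helper, not the actual place-edited comparator.
function islandNameChanged(newVersion, oldVersion) {
  if (!newVersion || !oldVersion || !newVersion.properties || !oldVersion.properties) return false;
  const isIsland = newVersion.properties.place === 'island' ||
    oldVersion.properties.place === 'island';
  return isIsland && newVersion.properties.name !== oldVersion.properties.name;
}
```

In the real comparator this would more likely be a matter of adding island to the existing list of place values.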
Functions like https://github.com/mapbox/osm-compare/blob/master/comparators/wikidata_wikipedia_tag_deleted.js are synchronous in nature.
They should be exposed as sync functions - we can write a wrapper function that turns these sync functions into the async callback interface again in https://github.com/mapbox/osm-compare/blob/master/index.js.
This way we can use them in other projects too.
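The wrapper could be sketched roughly like this. The `toAsync` name and the `(newVersion, oldVersion, callback)` argument order are assumptions, not the project's confirmed interface.

```javascript
// Hypothetical wrapper: turns a sync compare function (newVersion, oldVersion) => result
// into a Node-style callback function (newVersion, oldVersion, callback).
function toAsync(syncCompare) {
  return function (newVersion, oldVersion, callback) {
    let result;
    try {
      result = syncCompare(newVersion, oldVersion);
    } catch (err) {
      // Report thrown errors through the callback instead of crashing the caller.
      return callback(err);
    }
    return callback(null, result);
  };
}
```

Note that the callback is invoked synchronously here; deferring it with `process.nextTick` would be an option if callers rely on it always being async.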
I'm on it.