osm-compare's Introduction

⚠️ This repo is archived because it is no longer used and no stacks are running.
Previously used only in transferred and discontinued mapbox/osmcha project.
Shutdown ticket in Jira for reference.


osm-compare

Compare functions are small atomic functions that are designed to identify what changed during a feature edit on OpenStreetMap. Compare functions can be broadly split up into two categories:

  1. Property (tags) checking compare functions
  2. Geometry checking compare functions

Compare functions take as inputs the following:

  1. oldVersion - GeoJSON of the feature's old version
  2. newVersion - GeoJSON of the feature's new version

Compare functions output the following:

  1. result - Object containing key value pairs representing findings of the compare function or an empty object.
# Format of compare function result where value can be primary data types or objects
{
    'result:comparator_name': value,
    'message': Any custom message which corresponds to the catch
}

# Format of compare function output if there is no result (default)
false
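
As a rough illustration of this contract, here is a minimal sketch of a compare function. The name `nameChanged`, the `(newVersion, oldVersion)` argument order, and the assumption that tags sit directly on the GeoJSON `properties` object are hypothetical and not taken from the library.

```javascript
// Hypothetical comparator: reports when the `name` tag differs between versions.
// Assumes OSM tags are stored directly on the GeoJSON `properties` object.
function nameChanged(newVersion, oldVersion) {
  // A created or deleted feature has only one version, so there is nothing to compare.
  if (!newVersion || !oldVersion) return false;

  const oldName = (oldVersion.properties || {}).name;
  const newName = (newVersion.properties || {}).name;
  if (oldName === newName) return false;

  // Result object in the documented shape: a 'result:<comparator>' key plus a message.
  return {
    'result:name_changed': { from: oldName, to: newName },
    message: 'Name tag was modified'
  };
}
```

Returning `false` for the no-result case follows the default shown above; a caller can then treat any truthy return value as a finding.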

Install

# Install osm-compare from the Mapbox namespace.
npm install @mapbox/osm-compare

Docs

How do I build an npm package?

osm-compare's People

Contributors

aburkut, alice-hawthorne, amishas157, bkowshik, bugdebugger, geochetan, geohacker, kepta, krishnanammala, kupendrayadav, lukasmartinelli, maning, manoharuss, mzagorskirs, planemad, tmcw, willemarcel

osm-compare's Issues

Edited a place CF review

I reviewed the output of the Edited a place CF today. No bad edits were flagged out of the 50 features I reviewed. The main reason was that most of the edits were on non-English village names, for which I lack the local context as a mapper. More notes below:

  • The CF looks at all changes in a feature, we should differentiate the label to whether a tag was changed or a geometry was changed.
  • In relation to ☝️, small geometry changes were not detected; see for example: https://osmcha.mapbox.com/45210141/features/node-582887071/
  • For better monitoring, let's split this CF into "major" places (country, city, town) and "minor" places (village, hamlet, neighbourhood, locality). "Major" places are more prone to intentional bad edits, and monitoring them should be a priority.
  • I'm not sure whether V1 edits can be categorised as Edited a place.

Comparator to monitor primary tags on OpenStreetMap

There are 26 recognized primary tags on OpenStreetMap.

aerialway, aeroway, amenity, barrier, boundary, building, craft, emergency, geological,
highway, historic, landuse, leisure, man_made, military, natural, office, place, power,
public_transport, railway, route, shop, sport, tourism, waterway

The primary-osm-tag compare function will:

  • Flag when a primary tag of a feature is deleted
    • Ex: highway
  • Flag when a new primary tag is added to an existing feature
    • Ex: waterway is added to a building feature
  • Flag features with more than one primary tag
    • Ex: A name translation is added to a feature with both natural and place tags
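
The three checks above could be sketched roughly as follows. This is only an illustration: the function name `primaryOsmTag`, the result key, and the assumption that tags live directly on the GeoJSON `properties` object are hypothetical. The tag list is the one quoted in this issue.

```javascript
// The 26 primary tags listed in this issue.
const PRIMARY_TAGS = [
  'aerialway', 'aeroway', 'amenity', 'barrier', 'boundary', 'building', 'craft',
  'emergency', 'geological', 'highway', 'historic', 'landuse', 'leisure',
  'man_made', 'military', 'natural', 'office', 'place', 'power',
  'public_transport', 'railway', 'route', 'shop', 'sport', 'tourism', 'waterway'
];

// Primary tag keys present on a feature (assumes tags sit on `properties`).
function primaryKeys(feature) {
  const props = (feature && feature.properties) || {};
  return PRIMARY_TAGS.filter((key) => Object.prototype.hasOwnProperty.call(props, key));
}

// Hypothetical comparator covering the three cases from the list above.
function primaryOsmTag(newVersion, oldVersion) {
  if (!newVersion) return false; // deleted feature: nothing to inspect

  const before = primaryKeys(oldVersion);
  const after = primaryKeys(newVersion);
  const findings = {};

  if (oldVersion) {
    const removed = before.filter((key) => !after.includes(key));
    const added = after.filter((key) => !before.includes(key));
    if (removed.length) findings.removed = removed; // primary tag deleted
    if (added.length) findings.added = added;       // primary tag added to an existing feature
  }
  if (after.length > 1) findings.multiple = after;  // more than one primary tag

  return Object.keys(findings).length ? { 'result:primary_osm_tag': findings } : false;
}
```

A change to the value of a primary tag (for example `highway=residential` to `highway=primary`) is deliberately not flagged here, since the list above only talks about keys being added or removed.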

Uncertainties

  • @amishas157 is there an existing compare function doing something similar?
  • What are the other potential interesting cases with primary feature tags?

cc: @geohacker @planemad @bsrinivasa

Flag pokemon string from name tag

Based on this overpass query, filtering poke related strings within the name tag should detect most pokemon bad edits.

[out:xml][timeout:900];
(
  node["name"~"pok(é|e)(mon|stop|gym|go)", i];
  way["name"~"pok(é|e)(mon|stop|gym|go)", i];
  relation["name"~"pok(é|e)(mon|stop|gym|go)", i];
);
out meta;
>;
out meta qt;

Steps

  • Get all new or modified edits from the following primary tags: natural, water, highway, building, leisure, tourism.
  • Within the name and name:* tags, run a regex filter similar to this: ~"pok(é|e)(mon|stop|gym|go)"
  • proposed CF name: pokemon_edits
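
A minimal sketch of these steps, reusing the regex from the Overpass query. The function name follows the proposed `pokemon_edits`; the result key and the assumption that tags sit directly on the GeoJSON `properties` object are hypothetical.

```javascript
// Same pattern as the Overpass query above, case-insensitive.
const POKEMON_PATTERN = /pok(é|e)(mon|stop|gym|go)/i;

// Tag keys named in the first step of this issue.
const MONITORED_KEYS = ['natural', 'water', 'highway', 'building', 'leisure', 'tourism'];

// Hypothetical comparator: flags name / name:* tags that match the pattern.
function pokemonEdits(newVersion) {
  if (!newVersion || !newVersion.properties) return false;
  const props = newVersion.properties;

  // Only look at features carrying one of the monitored keys.
  const monitored = MONITORED_KEYS.some((key) => key in props);
  if (!monitored) return false;

  const hits = Object.entries(props)
    .filter(([key]) => key === 'name' || key.startsWith('name:'))
    .filter(([, value]) => typeof value === 'string' && POKEMON_PATTERN.test(value))
    .map(([key]) => key);

  return hits.length ? { 'result:pokemon_edits': hits } : false;
}
```

The result lists which name tags matched, so a reviewer can see whether the hit came from `name` or from a translation such as `name:en`.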

@krishnanammala @amishas157 @batpad

Name

@bkowshik - could this have a better name? compare-geojson is very generic while this project is OSM specific. We don't need to go overly descriptive, but osm-compare or similar would be better.

Features that are rarely created in the real world

Ref: #112


There are features that are not created often in the real world. Ex: It is not every day that a new airport is constructed. So, if these features are rare in the real world, shouldn't they also be rare on OpenStreetMap? Should we flag these rare features for manual review?

screen shot 2017-03-30 at 12 43 04 pm

A harmful aeroway=aerodrome created by a new user in Georgetown

What other features are rarely created in the real world? 💭


cc: @planemad @manoharuss @geohacker @amishas157

Set a flag for suspicious transformations over a certain threshold

From #34

It is possible for compare functions to determine certain suspicious scenarios:

  • A whole way was displaced by X distance
  • A node in the way was dragged by X distance

A movement greater than X=100 metres is a definite problem and should probably be flagged as invalid. It's currently very hard to deduce this by relying on the output score of the compare function.

Given that this has been the most common class of breakage (https://github.com/mapbox/vandalism-dynamosm/issues/35), it may be worth trying to focus on catching these issues first before others.
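
One way to set such an explicit flag is sketched below. It assumes LineString features whose old and new versions have the same number of nodes, so nodes can be compared by position. The function names, the result key, and the use of a haversine distance are illustrative choices, not the existing compare_geometries logic.

```javascript
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius in metres

// Great-circle distance in metres between two [lon, lat] positions.
function haversineMetres([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Coordinates of a LineString feature, or null for anything else (including null geometry).
function lineCoords(feature) {
  const geometry = feature && feature.geometry;
  return geometry && geometry.type === 'LineString' ? geometry.coordinates : null;
}

// Hypothetical comparator: flags when any node moved further than the threshold.
function displacementOverThreshold(newVersion, oldVersion, thresholdMetres = 100) {
  const before = lineCoords(oldVersion);
  const after = lineCoords(newVersion);
  if (!before || !after || before.length !== after.length) return false;

  const distances = after.map((coord, i) => haversineMetres(before[i], coord));
  const maxDisplacement = Math.max(...distances);
  if (maxDisplacement <= thresholdMetres) return false;

  const movedNodes = distances.filter((d) => d > thresholdMetres).length;
  return {
    'result:displacement_over_threshold': {
      maxDisplacement,
      movedNodes,
      wholeWayMoved: movedNodes === after.length // every node moved: the way was displaced
    },
    message: 'A node or the whole way moved further than the threshold'
  };
}
```

Reporting `wholeWayMoved` separately keeps the two scenarios from the list apart: a single dragged node versus a way displaced as a whole.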

cc @bkowshik @batpad @mikelmaron

Detect invalid turn restrictions

Common cases of invalid TR

  • TR relation with more than 3 members. For example, 3 from/via.
  • From/to members that don't share a common node with the via node/way

Sketch for detection

  • Get old and new version of TR relation
  • Count number of members. If more than 3, flag as invalid.
  • Check for connectivity using turfjs. If from/to does not share a node with via, flag as invalid.
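
The sketch above could look roughly like this without turfjs, by comparing node ids instead of geometries. The input shape is hypothetical: each member is assumed to carry its `role` and a `nodes` array of node ids (a via node being a single-element array), which is not how the relation necessarily arrives in a compare function.

```javascript
// Hypothetical comparator for a turn restriction relation.
// Assumed input: { members: [{ role: 'from' | 'via' | 'to', nodes: [nodeId, ...] }] }
function invalidTurnRestriction(relation) {
  if (!relation || !Array.isArray(relation.members)) return false;
  const members = relation.members;

  // More than 3 members is treated as invalid, as described above.
  if (members.length > 3) {
    return { 'result:invalid_turn_restriction': { reason: 'more than 3 members', count: members.length } };
  }

  const via = members.find((member) => member.role === 'via');
  if (!via) return false; // connectivity cannot be checked without a via member

  // from/to members must share at least one node id with the via member.
  const viaNodes = new Set(via.nodes || []);
  const disconnected = members
    .filter((member) => member.role === 'from' || member.role === 'to')
    .filter((member) => !(member.nodes || []).some((id) => viaNodes.has(id)))
    .map((member) => member.role);

  return disconnected.length
    ? { 'result:invalid_turn_restriction': { reason: 'not connected to via', disconnected } }
    : false;
}
```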

Blockers

  • TR relations should be included in our filters
  • Can turfjs do topology?

Util for fetching vector tiles

Let's write a util to fetch vector tiles and return geojson feature collection so compare functions can use this to check context of what's on the map. Related to #112 #129

Here's what I think this should look like:

Additionally:

cc @amishas157 @bkowshik @batpad @lukasmartinelli @ian29 @manoharuss

Features on OpenStreetMap with more than one primary tag

Per voice with @planemad @geohacker @amishas157


We wanted to 👀 what percentage of features on OpenStreetMap have 2 primary tags. There are 26 primary tags. Ex: building, natural, etc.

Using the key combinations API from TagInfo, we get the percentage of features with one primary tag that also have another primary tag. Ex: 60.05 % of features with the geological primary tag also have the natural primary tag.

screen shot 2017-03-02 at 5 58 18 pm

Querying the taginfo API for all combinations of the 26 primary tags, we have a correlogram. Ex: 90.66 % of features with the sport tag on OpenStreetMap also have the leisure tag.

screen shot 2017-03-02 at 5 57 41 pm

NOTE: Cells that are empty represent a zero value.

Category              Count
percentage == 0       286
0 < percentage < 1    330
percentage >= 1       60

Detect changes to disputed borders

screen shot 2016-12-12 at 12 35 20
Disputed borders from taginfo

Edit wars usually happen on disputed national borders. The DWG and OSMF have a standing position to revert any activities related to disputed areas. We need to catch these edits so as not to escalate widespread edit wars between different OSM communities.

Sketch for detection

  • Get changes from ways/relations with the following tags: name, name:*, disputed, admin_level=2.
  • Flag any tag changes and geometry changes (need to define a threshold).

Guiding principles to build useful compare functions

Per @planemad's post:

We should have consistent design principles that will serve as a guide to build useful compare functions without being constrained by limitations of osmcha.

Created this ticket to continue the large design discussion:

  • Differentiate between no data and a positive result.
  • Comparators should pass on as much useful knowledge as possible to a human reviewer to make the final decision.
  • Withholding findings will lead to duplicate human effort on the same activity.

cc: @amishas157 @batpad @geohacker

Improve new_mapper compare functions with more data points

The OpenStreetMap API gives the following details on a mapper:

  • User ID
  • Display name
  • Date the account was created
  • Description
  • Whether the user has agreed to the contributor-terms
  • Hyperlink to user's gravatar
  • Number of changesets created
  • Number of traces done
  • Number of blocks received and blocks active

Currently, we use just the User ID to decide whether the mapper is a new mapper or not. But, we could potentially use the following in some way to make a better guess:

  • Date the account was created
  • Number of changesets created
  • Number of traces done

Test that compare functions don't fail with incomplete inputs

There are several cases when we pass "incomplete" inputs into compare functions. Compare functions should exit gracefully and not throw errors if any of the following is missing:

  • Either oldVersion or newVersion: this is quite obvious - either will be missing if the feature is created or deleted. We need tests to make sure no function throws an error in this case.

  • Missing geometries: We want to be able to run compare functions (potentially) on features where we do not have or are unable to get full geometries. Compare functions MUST handle cases of geometry being null.

We need tests for the above ^ - a set of fixtures with each of these cases - i.e. oldVersion missing, newVersion missing, and geometries missing (set to null) - the test suite should run these fixtures against all compare functions and ensure that none throw errors.
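
A test helper along these lines could run the three fixtures against every compare function. This is a sketch only: the helper name, the `(newVersion, oldVersion)` call signature, and the map of compare functions passed in are assumptions, and it is not tied to any particular test framework.

```javascript
// Runs every compare function against the three incomplete-input cases and
// returns a list of failure descriptions (empty when nothing threw).
function checkIncompleteInputs(compareFunctions, feature) {
  const withoutGeometry = Object.assign({}, feature, { geometry: null });
  const fixtures = [
    { name: 'oldVersion missing', newVersion: feature, oldVersion: null },
    { name: 'newVersion missing', newVersion: null, oldVersion: feature },
    { name: 'geometries missing', newVersion: withoutGeometry, oldVersion: withoutGeometry }
  ];

  const failures = [];
  for (const [cfName, compareFunction] of Object.entries(compareFunctions)) {
    for (const fixture of fixtures) {
      try {
        compareFunction(fixture.newVersion, fixture.oldVersion);
      } catch (err) {
        failures.push(`${cfName} threw on "${fixture.name}": ${err.message}`);
      }
    }
  }
  return failures;
}
```

A test suite would then assert that the returned list is empty, and print the entries when it is not so the offending function and fixture are easy to find.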

cc @amishas157 @geohacker

Version 4 of compare-geojson

Updates

  • Improve interface to compare functions
  • Versioning compare functions
  • Behaviour for newly created and deleted objects
  • Changes to important features and
  • User blocks #19
  • Changelog to record all notable changes made #3

Using context for detecting harmful features

@bkowshik brought an interesting idea of context based detection for flagging harmful features.
Following are notes from the discussion we had on this idea :

So far we haven't worked on any feature detector where we use context while doing any detection.

Context can have different meaning here. It can refer to surrounding area where a feature is created or it can also refer to the time when a feature is created. Following are few examples to understand the idea of context better:

  • Say there is a lake with all valid tags that looks 💯 when checked individually. But what if it overlaps other features in the area, which makes it a bad feature? Ref: A lake was created in Manhattan overlapping many buildings.
  • Another example of context is distinguishing areas that are well mapped (for example: Germany, San Francisco) from those that are not (for example: India). If someone adds lakes or hills in a well-mapped place, that appears much more suspicious than adding the same kinds of features in a poorly mapped area, where they could plausibly still be missing from the map. Such additions are not expected in well-mapped areas.
  • Time-based context: for example, it is not common for people to add new airports to the map now. We expect these kinds of features to have been added to OSM by now, so any new addition should be treated as suspicious.

☝️ can help in following ways:

  • To catch a suspicious feature which otherwise looks good.
  • It also helps in flagging critical issues.

@geohacker @batpad @planemad Would be great to hear your thoughts on the above and how we can leverage context more to find harmful features.

Evaluation of compare_geometries

Evaluated the compare geometries compare function w/ @bkowshik for various geometry transformation scenarios to see how the output scores compare relative to each issue.

New road, no geometry change
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":0,"geometryTransformation":3}} 

New road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":67082835.94906078}} 

New road, one node dragged to 1000 times the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":32931423208.534332,"centroidDisplacement":1196165.8065632868,"geometryTransformation":118174327210540240}} 

Mature road, one node dragged to double the length
{"result:compare_geometries":{"cfVersion":2,"areaDelta":38433.83745320714,"centroidDisplacement":581.8036084369106,"geometryTransformation":626106468.8579007}} 

Mature road, split in half
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-21589.85576373099,"centroidDisplacement":243.93158052782456,"geometryTransformation":-147460533.91481057}} 

Mature road, displaced by 100m perpendicular to axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":-0.07723668643666315,"centroidDisplacement":111.2298332293394,"geometryTransformation":-240.54866504303104}} 

Mature road, displaced by 100m along axis
{"result:compare_geometries":{"cfVersion":2,"areaDelta":0,"centroidDisplacement":109.77333342690834,"geometryTransformation":3073.6533359534333}} 

Mature road, new node added and dragged by 1km
{"result:compare_geometries":{"cfVersion":2,"areaDelta":295886.4365296521,"centroidDisplacement":126.03898566162799,"geometryTransformation":1044210337.2744685}} 

Observations

  • The area delta multiplier is strongly affecting the scores; we might need to use the square root to normalize its effect
  • A node in the way getting displaced is weighted much much higher than the whole way getting displaced, even though both are definitely invalid transformations. We are currently not catching way displacements because of this https://github.com/mapbox/vandalism-dynamosm/issues/35
  • We need to be able to differentiate an incorrect displacement or distortion from a legitimate improvement to a feature. This could possibly be done by comparing the number of nodes in the feature for changes
  • When we are certain a transformation is invalid, it might be better to set an explicit flag rather than trying to quantify the problem as a numerical score.

Going to create some tickets to discuss possible improvements based on these findings.

cc @batpad @mikelmaron

Consistent naming

@amishas157 @bkowshik can we make the names consistent? Use either camelcase, underscores or hyphens - but not all. For each CF, can we also standardise title case for the reason?

Notes on edited name comparator

We reviewed the edited name comparator, notes below:

  • compare function works as expected, no unusual detection was noticed.
  • name changes include names in other languages, for example, name:nl.

However, it is difficult to assess if the changes are harmful without additional context such as:

  • local knowledge;
  • external reference/source for names;
  • a list of important features we should monitor;
  • watchlist of words like curse words.

On the other hand, this is definitely useful for the local mapper/community if they monitor local changes in their area.

@krishnanammala @poornibadrinath @amishas157

Best practices for working with external APIs

As compare functions use data from external APIs like osm-comments, Wikidata, etc., we need a list of best practices that compare functions are designed with.

Capturing from @amishas157 post: #105 (comment)

Though we are assuming that we won't be hitting the Wikidata API too hard, just to be 💯, what we can do is catch the errors when the Wikidata API is rate limited and find a way for it to report to us. Maybe we can also use 'result:wikidataApiLimitExceeded': true, the way we do it for escalate, and then read it on the vandalism side to send us these errors. This lists the error codes returned by the Wikidata API. We can catch for ratelimited. If we get such errors from vandalism, we can figure out some other way to avoid hitting the Wikidata API hard while still making sure that this comparator has worked the way it is expected to.
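
A small sketch of that idea follows. It assumes the parsed JSON body of a Wikidata API response is available and that errors arrive in the usual MediaWiki Action API envelope, an `error` object with a `code` field. The helper name is hypothetical; the result key is the one suggested in the comment above.

```javascript
// Hypothetical helper: turns a rate-limit error from the Wikidata API into a
// result flag that downstream code can read, instead of failing silently.
function wikidataLimitResult(responseBody) {
  const code = responseBody && responseBody.error && responseBody.error.code;
  if (code === 'ratelimited') {
    return { 'result:wikidataApiLimitExceeded': true };
  }
  return false; // no rate-limit error: continue with the normal comparator logic
}
```

A comparator would call this on every API response and return the flag straight away when it is set, so that a rate-limited run is distinguishable from a run that simply found nothing.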

Does the feature overlap with another feature


As part of the context based validation, this compare function will flag suspiciously overlapping features. A couple of examples we have seen in the recent past are:

screen shot 2017-03-30 at 10 59 17 am

Lake in Manhattan overlapping existing buildings.

screen shot 2017-03-30 at 11 02 32 am

A residential area becomes a Park for what looks like a Pokemon mapper.

In scenarios like ^, the feature that was created/modified overlapped quite a lot of existing features on OpenStreetMap, which form the context of where the feature actually is. It is super important to detect features like these and correct them as soon as possible, as they affect the very look of the map. Ex:

screen shot 2017-03-30 at 11 08 52 am

Residential areas rendered as lake and park due to harmful edits.


Workflow

NOTE: Focussing on just new/updated water bodies in iteration 1 to keep our focus narrow.

  1. A new lake or an existing lake gets edited on OpenStreetMap.
  2. Calculate all map tiles (xyz) that cover the lake.
  3. Download vector map tiles from Mapbox API, convert to a FeatureCollection with all features in the tile.
  4. Check if new/modified lake overlaps with any feature in the FeatureCollection and flag if there is a suspicious overlap.
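
Step 2 of this workflow can be sketched with the standard Web Mercator (slippy map) tile formulas. The function names are hypothetical, the bounding box is assumed to be already computed as `[west, south, east, north]`, and boxes crossing the antimeridian are not handled.

```javascript
// Tile containing a lon/lat position at a given zoom (standard slippy map formula).
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp to the valid tile range for this zoom.
  const clamp = (value) => Math.min(n - 1, Math.max(0, value));
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// All xyz tiles that cover a [west, south, east, north] bounding box.
function tilesCoveringBbox([west, south, east, north], zoom) {
  const topLeft = lonLatToTile(west, north, zoom);     // smallest x and y
  const bottomRight = lonLatToTile(east, south, zoom); // largest x and y
  const tiles = [];
  for (let x = topLeft.x; x <= bottomRight.x; x++) {
    for (let y = topLeft.y; y <= bottomRight.y; y++) {
      tiles.push({ x, y, z: zoom });
    }
  }
  return tiles;
}
```

Because the number of tiles grows quickly with zoom, a large lake at a high zoom level can cover many tiles, which is worth keeping in mind when picking the zoom to download.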

Uncertainties

  • What zoom level tiles should be downloaded from the API
    • Tiles at lower zoom levels don't have all the data. Ex: Buildings generally show up in tiles greater than zoom 15
  • There could be good overlaps too. We need to differentiate between a good and a harmful overlap.
  • Lakes and parks features fit this use-case well. What other feature types can something like this handle?

cc: @planemad @manoharuss

Comparator to detect displacements

From #34

The current compare geometries function makes no distinction between a feature that was transformed by a genuine extension with new nodes and one that was transformed by a node being dragged.

This could be done by confirming there was no change in the number of nodes, but only in the spatial configuration.
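
A minimal sketch of that check, assuming LineString features and comparing nodes by position in the coordinate array. The function name, result key, and labels are hypothetical.

```javascript
// Coordinates of a LineString feature, or null for anything else (including null geometry).
function lineCoords(feature) {
  const geometry = feature && feature.geometry;
  return geometry && geometry.type === 'LineString' ? geometry.coordinates : null;
}

// Hypothetical comparator: separates "nodes were added or removed" from
// "same number of nodes, but some of them moved".
function classifyGeometryChange(newVersion, oldVersion) {
  const before = lineCoords(oldVersion);
  const after = lineCoords(newVersion);
  if (!before || !after) return false;

  if (before.length !== after.length) {
    return {
      'result:geometry_change': {
        type: 'node_count_changed',
        nodeDelta: after.length - before.length
      }
    };
  }

  // Same node count: count the positions whose coordinates differ.
  const movedNodes = after.filter(
    (coord, i) => coord[0] !== before[i][0] || coord[1] !== before[i][1]
  ).length;
  if (movedNodes === 0) return false;

  return { 'result:geometry_change': { type: 'displaced', movedNodes } };
}
```

Only the `displaced` case points at a dragged node or a moved way; a changed node count is more likely a genuine extension or simplification and could be scored differently.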

cc @bkowshik @batpad
