
simplification's Introduction


Simplification

Simplify a LineString using the Ramer–Douglas–Peucker or Visvalingam–Whyatt algorithms


Installation

pip install simplification

Installing for local development

  1. Ensure you have a copy of librdp from https://github.com/urschrei/rdp/releases, and that it is in the src/simplification subdirectory
  2. Run pip install -e .[test] --use-pep517
  3. Run pytest .

Supported Python Versions (Linux x86_64 + aarch64, macOS x86_64 + arm64, Windows amd64)

  • Python 3.8
  • Python 3.9
  • Python 3.10
  • Python 3.11
  • Python 3.12

Supported Platforms

  • Linux (manylinux-compatible) x86_64 and aarch64
  • macOS Darwin x86_64 and arm64
  • Windows 64-bit

Usage

from simplification.cutil import (
    simplify_coords,
    simplify_coords_idx,
    simplify_coords_vw,
    simplify_coords_vw_idx,
    simplify_coords_vwp,
)

# Using Ramer–Douglas–Peucker
coords = [
    [0.0, 0.0],
    [5.0, 4.0],
    [11.0, 5.5],
    [17.3, 3.2],
    [27.8, 0.1]
]

# For RDP, try an epsilon of 1.0 to start with. Other sensible values include 0.01 and 0.001
simplified = simplify_coords(coords, 1.0)

# simplified is [[0.0, 0.0], [5.0, 4.0], [11.0, 5.5], [27.8, 0.1]]

# Using Visvalingam-Whyatt
# You can also pass numpy arrays, in which case you'll get numpy arrays back
import numpy as np
coords_vw = np.array([
    [5.0, 2.0],
    [3.0, 8.0],
    [6.0, 20.0],
    [7.0, 25.0],
    [10.0, 10.0]
])
simplified_vw = simplify_coords_vw(coords_vw, 30.0)

# simplified_vw is [[5.0, 2.0], [7.0, 25.0], [10.0, 10.0]]

Passing empty and/or 1-element lists will return them unaltered.
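For instance (a minimal check of the behaviour just described):

from simplification.cutil import simplify_coords

assert simplify_coords([], 1.0) == []
assert simplify_coords([[5.0, 4.0]], 1.0) == [[5.0, 4.0]]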

But I only want the simplified Indices

simplification now has:

  • cutil.simplify_coords_idx
  • cutil.simplify_coords_vw_idx

The values returned by these functions are the retained indices. In order to use them to e.g. index into a NumPy array, something like the following will work:

import numpy as np
from simplification.cutil import simplify_coords_idx

# an array of coordinates (values reused from the RDP example above)
orig = np.array([[0.0, 0.0], [5.0, 4.0], [11.0, 5.5], [17.3, 3.2], [27.8, 0.1]])
simplified = simplify_coords_idx(orig, 1.0)
# build new geometry using only retained coordinates
orig_simplified = orig[simplified]

But I need to ensure that the resulting geometries are valid

You can use the topology-preserving variant of VW for this: simplify_coords_vwp. It's slower, but has a far greater likelihood of producing a valid geometry.
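For example, reusing the coords_vw array from the usage example above (the epsilon is illustrative):

from simplification.cutil import simplify_coords_vwp

simplified_vwp = simplify_coords_vwp(coords_vw, 30.0)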

But I Want to Simplify Polylines

No problem; decode them to LineStrings first.

# pip install pypolyline before you do this
from pypolyline.cutil import decode_polyline
# an iterable of Google-encoded Polylines, so precision is 5. For OSRM &c., it's 6
decoded = (decode_polyline(line, 5) for line in polylines)
simplified = [simplify_coords(line, 1.0) for line in decoded]

How it Works

FFI and a Rust binary

Is It Fast

I should think so.

What does that mean

Using numpy arrays for input and output, the library can reasonably be expected to process around 2,500 1,000-point LineStrings per second on a Core i7 or equivalent, for a 98%+ reduction in size.
A larger LineString containing 200k+ points can be reduced to around 3k points (a 98.5%+ reduction) in around 50 ms using RDP.
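A rough, self-contained sketch of that kind of measurement (this is not the original harness; the random-walk input and epsilon are assumptions):

import time

import numpy as np
from simplification.cutil import simplify_coords

rng = np.random.default_rng(0)
line = np.cumsum(rng.normal(size=(1000, 2)), axis=0)  # a synthetic 1,000-point LineString
start = time.perf_counter()
for _ in range(2500):
    simplify_coords(line, 1.0)
elapsed = time.perf_counter() - start
print(f"{2500 / elapsed:.0f} LineStrings per second")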

The original figures are based on a test harness available here.

Disclaimer

All benchmarks are subjective, and pathological input will greatly increase processing time. Error-checking is non-existent at this point.

License

MIT

Citing Simplification

If Simplification has been significant in your research, and you would like to acknowledge the project in your academic publication, we suggest citing it as follows (example in APA style, 7th edition):

Hügel, S. (2021). Simplification (Version X.Y.Z) [Computer software]. https://doi.org/10.5281/zenodo.5774852

In BibTeX format:

@software{Hugel_Simplification_2021,
author = {Hügel, Stephan},
doi = {10.5281/zenodo.5774852},
license = {MIT},
month = {12},
title = {{Simplification}},
url = {https://github.com/urschrei/simplification},
version = {X.Y.Z},
year = {2021}
}

simplification's People

Contributors

dependabot-preview[bot], dependabot[bot], mogost, urschrei, utapyngo


simplification's Issues

Error in usage example

The line in usage:
from simplification.cutil import simplify_coords, simplify_coordsvw

Needs to be changed to:
from simplification.cutil import simplify_coords, simplify_coords_vw

Support for the LANG algorithm

Hello!

I wonder if you would be interested in adding the LANG algorithm to your simplification library.

I have already used your library for simplification with Ramer–Douglas–Peucker and Visvalingam–Whyatt, but when testing some different algorithms to simplify vectors that resulted from vectorizing raster images, I had the best results with LANG.

So, I implemented LANG (or at least something very similar to it) in Python.

However, performance-wise it should (or at least might) be significantly faster in Rust... and since I am facing a bottleneck caused by simplification in some processes... I wondered...

This is the code in Python:

import math
from typing import Union

import numpy as np


def simplify_coords_lang(
        coords: Union[np.ndarray, list],
        tolerance: float,
        lookahead: int = 8) -> Union[np.ndarray, list]:
    """
    Simplify a line using lang algorithm.
    Args:
        coords (Union[np.ndarray, list]): list of coordinates to be simplified.
        tolerance (float): distance tolerance to use.
        lookahead (int, optional): the number of points to consider for removing
            in a moving window. Defaults to 8.
    Returns:
        Return the coordinates kept after simplification. 
        If input coords is np.ndarray, returns np.ndarray, otherwise returns a list.
    """

    # Init variables 
    if isinstance(coords, np.ndarray):
        coords_arr = coords
    else:
        coords_arr = np.array(list(coords))

    # Determine the coordinates that need to be kept 
    coords_to_keep_idx = simplify_coords_lang_idx(
            coords=coords_arr,
            tolerance=tolerance,
            lookahead=lookahead)
    coords_simplified_arr = coords_arr[coords_to_keep_idx]

    # If input was np.ndarray, return np.ndarray, otherwise list
    if isinstance(coords, np.ndarray):
        return coords_simplified_arr
    else:
        return coords_simplified_arr.tolist()

def simplify_coords_lang_idx(
        coords: Union[np.ndarray, list],
        tolerance: float,
        lookahead: int = 8) -> Union[np.ndarray, list]:
    """
    Simplify a line using lang algorithm and return the coordinate indexes to 
    be kept.
    Inspiration for the implementation came from:
        * https://github.com/giscan/Generalizer/blob/master/simplify.py
        * https://github.com/keszegrobert/polyline-simplification/blob/master/6.%20Lang.ipynb
        * https://web.archive.org/web/20171005193700/http://web.cs.sunyit.edu/~poissad/projects/Curve/about_algorithms/lang.php
    Args:
        coords (Union[np.ndarray, list]): list of coordinates to be simplified.
        tolerance (float): distance tolerance to use.
        lookahead (int, optional): the number of points to consider for removing
            in a moving window. Defaults to 8.
    Returns:
        Return the indexes of coordinates that need to be kept after 
        simplification. 
        If input coords is np.ndarray, returns np.ndarray, otherwise returns a list.  
    """
    
    def point_line_distance(point_x, point_y, line_x1, line_y1, line_x2, line_y2) -> float:
        denominator = math.sqrt((line_x2-line_x1)*(line_x2-line_x1)+(line_y2-line_y1)*(line_y2-line_y1))
        if denominator == 0:
            return float('Inf')
        else:
            numerator = abs((line_x2-line_x1)*(line_y1-point_y)-(line_x1-point_x)*(line_y2-line_y1)) 
            return numerator/denominator

    # Init variables 
    if isinstance(coords, np.ndarray):
        line_arr = coords
    else:
        line_arr = np.array(list(coords))

    # Prepare lookahead
    nb_points = len(line_arr)
    if lookahead == -1:
        window_size = nb_points-1
    else:
        window_size = min(lookahead, nb_points-1)

    mask = np.zeros(nb_points, dtype='bool')
    mask[0] = True
    window_start = 0
    window_end = window_size
    
    # Apply simplification till the window_start arrives at the last point.
    ready = False
    while ready is False:
        # Check if all points between window_start and window_end are within  
        # tolerance distance to the line (window_start, window_end).
        all_points_in_tolerance = True
        for i in range(window_start+1, window_end):
            distance = point_line_distance(
                    line_arr[i, 0], line_arr[i, 1],
                    line_arr[window_start, 0], line_arr[window_start, 1],
                    line_arr[window_end, 0], line_arr[window_end, 1])
            # Distance is Inf when the window start and end points coincide,
            # so a degenerate segment also fails the tolerance check
            if distance > tolerance:
                all_points_in_tolerance = False
                break

        # If not all the points are within the tolerance distance... 
        if not all_points_in_tolerance:
            # Move window_end to previous point, and try again
            window_end -= 1
        else:
            # All points are within tolerance: keep the window end point and
            # drop the intermediate points
            mask[window_end] = True

            # Move the window forward
            window_start = window_end
            if window_start == nb_points - 1:
                ready = True
            window_end = window_start + window_size
            if window_end >= nb_points: 
                window_end = nb_points - 1
    
    # Prepare result: convert the mask to a list of indices of points to keep. 
    idx_to_keep_arr = mask.nonzero()[0]

    # If input was np.ndarray, return np.ndarray, otherwise list
    if isinstance(coords, np.ndarray):
        return idx_to_keep_arr
    else:
        return idx_to_keep_arr.tolist()
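A quick usage sketch of the function above (the coordinates, tolerance, and lookahead values are illustrative):

coords = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 5.0], [4.0, 6.0], [5.0, 7.0]]
print(simplify_coords_lang(coords, tolerance=0.5, lookahead=4))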

Question re Applicability to a Voltage vs Time data set

I need to model a phenomenon known as contact bounce to understand its effect on the performance of the circuit in which the switch is used. I have taken measurements on contact bounce with an oscilloscope to get some representative data. My oscilloscope allows me to save the measurement data in a CSV file; each data point is a Time, Voltage pair.

The oscilloscope is a high speed digital scope - which only means that it stores many data points while capturing the waveform. I have found that these CSV files are in the range of 100 MB - far too large for my SPICE analysis software to "digest", and most of these data points are irrelevant to the result.

Consequently I need to "thin" my data set to something on the order of 1% of its current size - perhaps even more. I'm posting here because I'd like to verify these algorithms are appropriate for my data set. It seems like a good fit, but I've read that RDP "does not always preserve the property of non-self-intersection". And the mention of lines and cartographic applications for these algorithms leads me to wonder if they are well suited to thinning a function.

Any advice or recommendations are appreciated.
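For what it's worth, a sketch of how that thinning might look. The filename, column layout, and epsilon are hypothetical; for a single-valued function like V(t) there is no self-intersection to preserve, but since time and voltage are in different units it makes sense to normalise each column before simplifying:

import numpy as np
from simplification.cutil import simplify_coords_idx

data = np.loadtxt("scope_capture.csv", delimiter=",")  # hypothetical time,voltage rows
# normalise each column to [0, 1] so the tolerance is unit-free
scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
keep = simplify_coords_idx(np.ascontiguousarray(scaled), 0.001)  # illustrative epsilon
thinned = data[keep]
print(f"kept {len(thinned)} of {len(data)} points")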

Python 3.9 support on Windows

It seems there is no support for Windows any more, starting from Python 3.9.

Would it be possible to reintroduce support?

pylance and pylint are unable to resolve package content

Describe the bug

VS Code fails to resolve functions imported from simplification (e.g., via from simplification.cutil import simplify_coords_vw_idx), and pylint fails on GitLab's CI/CD with E0611: No name 'simplify_coords_vw_idx' in module 'simplification.cutil' (no-name-in-module) despite all tests passing (i.e., the import works at runtime).

System (please complete the following information)

  • OS: linux (GitLab CI/CD)
  • Architecture: x86_64
  • simplification 0.7.10, python 3.12, pylint 2.17.7

Additional context

I suppose a solution might be to add an __init__.py to the src directory. For Python 3.3+, that's not strictly required due to namespace packages existing, but it seems pylint cannot handle that perfectly yet, nor can VS Code's Pylance.

pylint-dev/pylint#4057

Before/After metadata when simplified points are returned

Let's say we have not only X,Y points, but also a heading and a timestamp for each point,
so the input looks like [[X, Y, HEADING, TIMESTAMP]].
After X,Y simplification, can we still return that metadata, or somehow join the result back to the initial dataset?
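For what it's worth, the index-returning variants described above cover this case. A sketch, with the [[X, Y, HEADING, TIMESTAMP]] layout assumed from the question:

import numpy as np
from simplification.cutil import simplify_coords_idx

rows = np.array([
    [0.0, 0.0, 90.0, 1000.0],   # X, Y, HEADING, TIMESTAMP
    [5.0, 4.0, 85.0, 1001.0],
    [11.0, 5.5, 80.0, 1002.0],
    [27.8, 0.1, 70.0, 1003.0],
])
# simplify on X,Y only, then use the retained indices to keep whole rows
kept = simplify_coords_idx(np.ascontiguousarray(rows[:, :2]), 1.0)
rows_simplified = rows[kept]  # heading and timestamp columns survive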

Can't install any specific version of the library from PyPI

First, thanks for the library; it's blazingly fast and we rely a lot on it.

However, I recently updated a project that depended on simplification and it broke in Travis. I realized the only dependency I had not pinned down was simplification, so I tried to pin it down, but it didn't work. Apparently, v0.3.2 is only available for Python 3.5, while for Python 3.6 and 3.4 the latest available is v0.2.11, making CI testing and packaging simplification as part of another app very complicated. Is there going to be a release of v0.3.3 that is available for all of Python 3.4, 3.5, and 3.6 on PyPI, at least for Linux and Mac?

Simplification build for the arm64 Python Docker image

Is there a possibility to get a Python package for the arm64 Python Docker image?
I'm working on an M1 Mac and all of our services are containerized.
To reproduce (on an M1 Mac) you can use the following commands:

docker run -it --entrypoint bash python
# then inside the container
pip install simplification

I get the following output:

ERROR: Could not find a version that satisfies the requirement simplification (from versions: none)
ERROR: No matching distribution found for simplification

Version 0.6 seems to add 0/0 Points?

I use the simplification package to reduce the point density of multipolygon structures.
Up to version 0.5.12 this worked perfectly for me. In version 0.6+ it stopped working for some polygons: it seems to insert some 0/0 points.

In version 0.5.12: [screenshot]

In version 0.6.11: [screenshot]

There is no exception thrown. I have tried different shapefiles, but the result is always as shown above (the only difference is how many of these "outliers" appear).

Python 3.7 Windows build

Sorry to bug you, would it be possible to provide a "simplification" Wheel for Windows Py3.7?

Python 3.7 support

Thanks for updating pypolyline for py3.7. Could you please update this lib too?

One possible improvement for RDP?

Hi,
The algorithm is fast and accurate. Kudos to you!
One problem I see with all implementations of RDP (or with RDP itself) is that it does not handle the two initial end points well.
For example:

import numpy as np
from simplification.cutil import simplify_coords
a = np.array([[0.0, 1.0], [0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
b = simplify_coords(a, 1.0)  # an epsilon is required; 1.0 is assumed here for illustration

This results in b being equal to a, which is not what I expected: I would expect b to consist of only 4 vertices. I understand that the theory behind RDP doesn't cover this case, but I honestly can't think of a good solution other than checking for cases like this manually.
Any better ideas?
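One heuristic sketch, not part of the library: since RDP always retains the first and last points, run it a second time with the ring's start point rotated, and keep only the indices retained by both runs. The half-length shift is an arbitrary choice, and aggressive tolerances can leave very few points:

import numpy as np
from simplification.cutil import simplify_coords_idx

def simplify_closed(coords, epsilon):
    coords = np.ascontiguousarray(coords, dtype=np.float64)
    n = len(coords)
    shift = n // 2
    kept_a = set(simplify_coords_idx(coords, epsilon))
    # rotated[i] corresponds to coords[(i + shift) % n]
    rotated = np.ascontiguousarray(np.roll(coords, -shift, axis=0))
    kept_b = {(i + shift) % n for i in simplify_coords_idx(rotated, epsilon)}
    return coords[sorted(kept_a & kept_b)]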

Extension to 3D lines

Hi, and thank you for your hard work.
I'd like to ask if it's possible to extend this work to 3D lines. I'm not sure what the impact of this feature would be on the project.

Thank you in advance,
Giovanni

Support for np.ndarray that is not C-contiguous

Hi!

The following code:

from simplification.cutil import simplify_coords
import numpy as np

x = np.array([1,2,3,4,5])
y = np.array([0,1,1,1,0])
coords = np.transpose(np.stack((x,y)))

simplified = simplify_coords(coords, 1.0)

returns the error

Traceback (most recent call last):
  File "simplification-c-contiguous-bug.py", line 10, in <module>
    simplified = simplify_coords(coords, 1.0)
  File "simplification/cutil.pyx", line 51, in simplification.cutil.simplify_coords
  File "simplification/cutil.pyx", line 65, in simplification.cutil.simplify_coords
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: ndarray is not C-contiguous

Adding the line coords = np.ascontiguousarray(coords) solves the error.

I would guess that this implicit assumption of a contiguous array (see here for an explanation) in your code is unwanted behavior and could be fixed by an explicit check.

Thanks!

simplification using a %-like value

Thanks for your work on these simplification algorithms!

Line simplification can be done using the Douglas–Peucker or Visvalingam–Whyatt algorithm. Both operate via an epsilon/tolerance parameter, which, depending on the algorithm, represents a distance (DP) or an area (VW).
The required tolerance in a given situation is therefore hard to decide on, since it also depends implicitly on the projection (e.g. metres or degrees).

Mapshaper provides the option to reduce a linestring by a percentage of points. It would be nice to have this possibility for these algorithms as well; see the sketch below.

(You will probably also like this blog post, which incorporates your package: http://martinfleischmann.net/line-simplification-algorithms/)
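Until something like that exists, one way to approximate a percentage target is to bisect epsilon on top of the existing API. simplify_to_fraction below is a hypothetical helper, not part of the library; it assumes the retained-point count decreases monotonically with epsilon (which holds in practice for RDP), and the bounds and iteration count are arbitrary:

import numpy as np
from simplification.cutil import simplify_coords_idx

def simplify_to_fraction(coords, keep_fraction, iters=40):
    coords = np.ascontiguousarray(coords, dtype=np.float64)
    target = max(2, int(len(coords) * keep_fraction))
    lo = 0.0
    span = coords.max(axis=0) - coords.min(axis=0)
    hi = float(np.hypot(span[0], span[1]))  # bounding-box diagonal: keeps endpoints only
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if len(simplify_coords_idx(coords, mid)) > target:
            lo = mid  # too many points kept: raise the tolerance
        else:
            hi = mid  # at or below target: try a smaller tolerance
    return coords[simplify_coords_idx(coords, hi)]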

3D line?

Is there any way to make this library work on 3D data?

Adding to conda-forge

Is there any plan to add simplification to the conda-forge channel? It would be great if this happens so other packages that are available on conda-forge can use it as a dependency in their build.

Import simplification doesn't work on Mac M1: Symbol not found. Expected in: flat namespace

I have been trying to install simplification in a miniforge environment on a Mac M1 running Big Sur, but was not able to do so. I get the following error when running from simplification.cutil import simplify_coords_vwp:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: dlopen(/Users/akashyadav/miniforge3/envs/xview_env_tensor/lib/python3.8/site-packages/simplification/cutil.cpython-38-darwin.so, 2): Symbol not found: _drop_float_array
  Referenced from: /Users/akashyadav/miniforge3/envs/xview_env_tensor/lib/python3.8/site-packages/simplification/cutil.cpython-38-darwin.so
  Expected in: flat namespace
 in /Users/akashyadav/miniforge3/envs/xview_env_tensor/lib/python3.8/site-packages/simplification/cutil.cpython-38-darwin.so

The installation works when installing directly from Anaconda, however. I need to use a miniforge environment to be able to use TensorFlow on my machine.

Don't require numpy

Is there a way to make the numpy dependency optional?
As far as I understand, the algorithm itself is implemented in Rust, and numpy is required just to make it possible to pass numpy arrays to simplify_coords() and get them back.
numpy is a heavyweight library, and I don't want my project to depend on it.
