lonboard's Introduction

Lonboard

A Python library for fast, interactive geospatial vector data visualization in Jupyter.

Building on cutting-edge technologies like GeoArrow and GeoParquet in conjunction with GPU-based map rendering, Lonboard aims to enable visualizing large geospatial datasets interactively through a simple interface.

3 million points rendered from a GeoPandas GeoDataFrame in JupyterLab. Example notebook.

Install

To install Lonboard using pip:

pip install lonboard

Lonboard is on conda-forge and can be installed using conda, mamba, or pixi. To install Lonboard using conda:

conda install -c conda-forge lonboard

To install from source, refer to the developer documentation.

Get Started

For the simplest rendering, pass geospatial data into the top-level viz function.

import geopandas as gpd
from lonboard import viz

gdf = gpd.GeoDataFrame(...)
viz(gdf)

Under the hood, this delegates to a ScatterplotLayer, PathLayer, or PolygonLayer. Refer to the documentation and examples for more control over rendering.

Documentation

Refer to the documentation at developmentseed.org/lonboard.

Why the name?

This is a new binding to the deck.gl geospatial data visualization library. A "deck" is the part of a skateboard you ride on. What's a fast, geospatial skateboard? A lonboard.

lonboard's People

Contributors

abarciauskas-bgse, chrisgervang, dependabot[bot], emmalu, giswqs, jorisvandenbossche, jtmiclat, jwass, kylebarron, naomatheus, shriv, vgeorge, vincentsarago, willemarcel

lonboard's Issues

Allow pandas series to FloatAccessor

TraitError: The 'get_radius' trait of a ScatterplotLayer instance expected a float value or numpy ndarray or pyarrow array representing an array of floats, not the Series ...
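Until the accessor accepts a Series directly, one workaround is to convert the values to a numpy array before passing them to the layer. The following is a minimal sketch; `to_float_array` is a hypothetical helper (not part of lonboard), and the choice of float32 is an assumption.

```python
import numpy as np


def to_float_array(values) -> np.ndarray:
    """Coerce a pandas Series, list, or array to a 1-D float32 numpy array.

    Hypothetical helper for illustration; not part of lonboard.
    """
    # pandas Series and Index objects expose to_numpy(); lists, tuples,
    # and existing ndarrays go straight through np.asarray.
    if hasattr(values, "to_numpy"):
        values = values.to_numpy()
    arr = np.asarray(values, dtype=np.float32)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1-D array of floats, got shape {arr.shape}")
    return arr
```

A call such as `get_radius=to_float_array(gdf["radius"])` (where the `radius` column is made up for this example) would then hand the layer a plain ndarray.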

Manually created geoarrow table support in ScatterplotLayer

I was able to load 20 million polygons in lonboard. It was amazing! Now I am trying to figure out how to load 60 million points without having to use GeoPandas but I keep hitting code paths that expect either an interleaved list or paths that go back to numpy or paths that expect a byte like object from C.

Here is roughly what I am trying:

import gzip

import geoarrow.pyarrow as ga
import lonboard
import pyarrow as pa
import pyarrow.csv as pv

with gzip.open("/Users/x/data/points_s2_level_4_gzip/397_buildings.csv.gz") as fp:
    table = pv.read_csv(fp)

# x is longitude and y is latitude
points = ga.point().from_geobuffers(None, table["longitude"], y=table["latitude"])

geoarrow_schema = pa.schema([pa.field("geometry", points.type, metadata={b"ARROW:extension:name": b"geoarrow.point"})])

point_table = pa.Table.from_arrays([points], schema=geoarrow_schema)
point_table.schema.field("geometry").metadata.get(b"ARROW:extension:name")
map_ = lonboard.ScatterplotLayer(table=point_table)

Type hints for vectorized accessor callbacks

Recall that you can use a Protocol with a __call__ method to define the API for a function callback. So for accessors like get_fill_color, you should define this protocol to take in a GeoDataFrame and return an NDArray[np.uint8].

You should also have runtime checks to verify the correct data format.
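A rough sketch of what such a protocol and runtime check could look like. The names `ColorAccessorCallback` and `apply_color_callback` are hypothetical, and `Any` stands in for `GeoDataFrame` so the example does not need geopandas installed.

```python
from typing import Any, Protocol

import numpy as np
from numpy.typing import NDArray


class ColorAccessorCallback(Protocol):
    """Callable that maps a table of features to one RGB(A) color per row."""

    def __call__(self, gdf: Any) -> NDArray[np.uint8]: ...


def apply_color_callback(
    callback: ColorAccessorCallback, gdf: Any, n_rows: int
) -> NDArray[np.uint8]:
    """Call the callback, then verify dtype and shape at runtime."""
    colors = np.asarray(callback(gdf))
    if colors.dtype != np.uint8:
        raise TypeError(f"expected dtype uint8, got {colors.dtype}")
    if colors.ndim != 2 or colors.shape[1] not in (3, 4):
        raise ValueError(f"expected shape (n, 3) or (n, 4), got {colors.shape}")
    if colors.shape[0] != n_rows:
        raise ValueError(f"expected {n_rows} rows, got {colors.shape[0]}")
    return colors
```

The protocol only helps static type checkers; the explicit checks are what catch a wrong dtype or shape when the callback actually runs.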

Group examples by amount of data downloaded

e.g. some examples should be illustrative and just use a very small data download. other examples should show off performance, and thus require large datasets (and maybe a large filter of an even larger dataset) but should be grouped in such a way to make it very clear.

docs note about datashader vs lonboard

One note about the difference between datashader and my deck.gl-based visualization: it looked like datashader was re-rendering in a specific area when Joris zoomed in and panned around. So in that sense datashader is "minimizing rendering" based on the viewport. My deck.gl-based renderer does not minimize rendering. When I'm rendering 3 million points, all 3 million of those are loaded onto the user's GPU at once. So in that sense it's not "infinitely scalable"; it just uses your hardware better than any previous library.

When you zoom in with datashader, it re-rasterizes based on a new aggregation for the current viewport, and can do that up to your RAM size. So datashader is limited by your RAM (I think; it may support larger-than-RAM data) while lonboard is limited by your GPU RAM.

More informative validation error messages

e.g.

class ColorAccessorWidget(Widget):
    color = ColorAccessor()

def test_color_accessor_validation():
    color_arr = np.array([1, 2, 3]).reshape(-1, 3)
    ColorAccessorWidget(color=color_arr)

raises

TraitError: The 'color' trait of a ColorAccessorWidget instance expected a tuple or list representing an RGB(A) color or numpy ndarray or pyarrow FixedSizeList representing an array of RGB(A) colors, not the ndarray array([[1, 2, 3]]).

The issue here is that the input array has a dtype of np.int64 instead of np.uint8, but the error message does not say so.
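One way to get a more informative message is to check each requirement separately and report the first one that fails. This is only a sketch: `describe_color_array_problem` is a hypothetical helper, and the set of checks is an assumption about what the accessor requires.

```python
from typing import Optional

import numpy as np


def describe_color_array_problem(arr: np.ndarray) -> Optional[str]:
    """Return a specific reason why arr is not a valid color array.

    Returns None when the array passes every check.
    """
    if arr.dtype != np.uint8:
        return f"expected dtype uint8, got {arr.dtype}"
    if arr.ndim != 2:
        return f"expected a 2-D array, got {arr.ndim} dimension(s)"
    if arr.shape[1] not in (3, 4):
        return f"expected 3 (RGB) or 4 (RGBA) columns, got {arr.shape[1]}"
    return None
```

A validator could append the returned string to the TraitError text so that the dtype mismatch is named directly.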

React issues

Goals:

  • Render multiple dataframes/layers on a single map
  • Enable updates of Python properties like get_fill_color to propagate to the map

Attempt:

Python side

On the Python side I attempted to write a sort of "container widget" (ref manzt/anywidget#194), where each Layer object is a Python jupyter widget, and where the Map object collects each of the underlying Widgets. It's useful to have each Layer be its own Widget, because that enables event handling on each layer.

import traitlets
from ipywidgets import Widget, widget_serialization  # Base widget class and nested-widget serializer
from anywidget import AnyWidget  # High-level widget helper that subclasses Widget

class BaseLayer(Widget):
    """Base class for our layer types"""
    ...

class PointLayer(BaseLayer):
    ...

class LineStringLayer(BaseLayer):
    ...

class PolygonLayer(BaseLayer):
    ...

class Map(AnyWidget):
    _esm = "path to esm JS bundle"
    _css = "optional path to CSS styling"

    # List of instances of classes that subclass BaseLayer, synced to the JS side
    layers = traitlets.List(trait=traitlets.Instance(BaseLayer)).tag(
        sync=True, **widget_serialization
    )

Then a user will create a variety of layers and instantiate a Map object:

import geopandas as gpd

point_data = gpd.GeoDataFrame(...)
polygon_data = gpd.GeoDataFrame(...)

point_layer = PointLayer(point_data)
polygon_layer = PolygonLayer(polygon_data)

map = Map(layers=[point_layer, polygon_layer])
map # putting this last in a cell "prints" the object..., which in this case renders the map

The goal with this setup is to let a user run

point_layer.fill_color = [255, 0, 0]

and the points on the map turn red.

In order for this to happen, the JS side needs to be able to receive these events and re-render.

JS Side

When you render the map object

This will then sync data with the JS side and render the App object

function App() {

The data from the Python side is available on the model object on the JS side. This can be accessed either via useModel or via anywidget's helper useModelState. useModelState is a small shim around useState and useEffect to keep track of the state of that value and propagate updates when the model announces that a field has changed.

The crux of the issue is that if you just use model.get(), you can access the initial value, but you never know when the value has been updated. Using useModelState from the "top level" works well, but only lets you access the attributes on the top level model.

My current code appeared to work well, but uses a synchronous private attribute that didn't appear to work in colab. And then switching to the async function didn't seem to work for that. See #34 for a description of this issue.

So the goal is to define the JS object in such a way that we hook into a model's event handlers so that we know when the on:change events happen and can update the map accordingly.

Support for rendering inside VSCode

Hi, I'm really looking forward to playing with lonboard but I can't get viz to render. I've tried the two example notebooks. They render in your Binder and Colab links, but when I try them in local Jupyter, nothing gets displayed but a blank white bar in the cell output. The objects returned from viz seem valid: I can see coordinates etc. when I print them as a string. I assume there might be a widget support issue or something like that. I'm using conda to create envs on a Windows machine. I tried various Python 3 versions and tried upgrading and downgrading Jupyter-related packages, but no luck so far. I also tried upgrading and downgrading pyogrio, lonboard, pyarrow...

Polygon winding order

The deck.gl SolidPolygonLayer has a render option _windingOrder which says

This prop is only effective with _normalize: false. It specifies the winding order of rings in the polygon data, one of:

  • 'CW': outer-ring is clockwise, and holes are counter-clockwise
  • 'CCW': outer-ring is counter-clockwise, and holes are clockwise

The proper value depends on the source of your data. Most geometry formats enforce a specific winding order. Incorrectly set winding order will cause an extruded polygon's surfaces to be flipped, affecting culling and the lighting effect.

Thus, this is probably not the highest priority, given that it only happens with extruded polygons, but should be fixed eventually.

In GEOS, polygon winding order is unspecified, so we'd need to check/force it manually. There's no vectorized shapely function to do this, ref shapely/shapely#1366. So the options are either:

  • non-vectorized shapely implementation of orient. This would be unacceptably slow.
  • geoarrow-based orient implementation in python. This would be ideal but not likely to be imminently implemented.
  • JS-based orientation implementation. This either brings in a wasm implementation (not ideal here) or implements a custom JS function on geoarrow arrays (preferred, but the tooling for geoarrow in pure JS isn't there yet)

So the end goal here is:

  • Implement a fast winding order algorithm in rust on geoarrow memory to do in python.
  • Implement an orientation checking function in pure JS on geoarrow memory (maybe in geoarrow/deck.gl-layers for now)
  • Set the geoarrow winding order flag when winding order is checked/validated in python so it doesn't get done again in js.

Ref geoarrow/deck.gl-layers#36
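For reference, the orientation check itself is the shoelace (signed area) formula. The sketch below is a plain-Python illustration of the logic on a single ring, assuming a y-up coordinate system; it is not vectorized and is not the proposed Rust or JS implementation. The function names are made up for this example.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def signed_area(ring: Sequence[Coord]) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        # Wrap around so the last vertex connects back to the first.
        x2, y2 = ring[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def orient_ring(ring: Sequence[Coord], ccw: bool = True) -> List[Coord]:
    """Return the ring with the requested winding order, reversing it if needed."""
    is_ccw = signed_area(ring) > 0
    return list(ring) if is_ccw == ccw else list(reversed(ring))
```

For a polygon, the outer ring and the holes would be oriented in opposite directions, matching whichever of 'CW' or 'CCW' is passed to _windingOrder.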

compute_view crashes on empty geometries

compute_view crashes when empty points exist (giving an infinite bounding box)

Maybe give a warning that null points exist, and then filter them out for creating a bbox?
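A minimal sketch of that idea, assuming the coordinates are available as an (n, 2) numpy array and that empty points show up as NaN coordinates. `finite_bounds` is a hypothetical name, not the existing compute_view code.

```python
import warnings

import numpy as np


def finite_bounds(coords):
    """Return (minx, miny, maxx, maxy) for an (n, 2) array, skipping non-finite rows."""
    coords = np.asarray(coords, dtype=np.float64)
    # A row is usable only if both x and y are finite (not NaN or inf).
    mask = np.isfinite(coords).all(axis=1)
    n_dropped = int((~mask).sum())
    if n_dropped:
        warnings.warn(
            f"ignoring {n_dropped} empty or non-finite point(s) "
            "when computing the bounding box"
        )
    valid = coords[mask]
    if valid.shape[0] == 0:
        raise ValueError("no finite coordinates to compute a bounding box from")
    minx, miny = valid.min(axis=0)
    maxx, maxy = valid.max(axis=0)
    return float(minx), float(miny), float(maxx), float(maxy)
```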

Separate into multiple widgets/layers?

The rendering API/options will be different based on the type of layer. Should you have a PointWidget, LineStringWidget, PolygonWidget, and then have .get_fill_color as an autocompletion-able attribute on only the PolygonWidget? And have like create_widget(gdf) as a top-level API that creates the table and then switches to create one of the widgets?

Integrate `mapclassify`

From the geopandas.plot docstring

Name of a choropleth classification scheme (requires mapclassify). A mapclassify.MapClassifier object will be used under the hood. Supported are all schemes provided by mapclassify (e.g. ‘BoxPlot’, ‘EqualInterval’, ‘FisherJenks’, ‘FisherJenksSampled’, ‘HeadTailBreaks’, ‘JenksCaspall’, ‘JenksCaspallForced’, ‘JenksCaspallSampled’, ‘MaxP’, ‘MaximumBreaks’, ‘NaturalBreaks’, ‘Quantiles’, ‘Percentiles’, ‘StdMean’, ‘UserDefined’). Arguments can be passed in classification_kwds.

Let widget fill available height

When using with jupyter sidecar, the map can be placed on the right side of the notebook screen:

image

Right now the div containing the deck.gl widget is hard-coded to 500px:

<div style={{ height: 500 }}>

This means that when used with sidecar, if the screen is more than 500px tall, it'll have a weird empty space at the bottom.

Ideally we want to let the widget fill all available height in its containing div, but I can't figure out how to do that. When I switch to, say,

style={{ display: "flex", flexFlow: "column", flexGrow: 1, overflow: "auto" }}

it creates a div with zero height:
image

cc @vgeorge

Select data by bounding box

A user draws a bounding box to select an array of feature indices that fall within the bounding box. The features are highlighted on the map and selected in the geodataframe.

Useful for exploratory data analysis.

Kyle to add details.
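Pending those details, the Python-side selection for point data could be as simple as a vectorized mask. This is a sketch under assumptions: `indices_in_bbox` is a hypothetical helper, the bounding box is given as (minx, miny, maxx, maxy), and only points are handled (lines and polygons would need an intersection test).

```python
import numpy as np


def indices_in_bbox(coords, bbox):
    """Return the row indices of (n, 2) point coordinates that fall inside bbox."""
    minx, miny, maxx, maxy = bbox
    coords = np.asarray(coords)
    x, y = coords[:, 0], coords[:, 1]
    # Inclusive comparison on all four edges of the box.
    mask = (x >= minx) & (x <= maxx) & (y >= miny) & (y <= maxy)
    return np.flatnonzero(mask)
```

The returned indices could then be used as `gdf.iloc[indices]` to select the matching rows in the GeoDataFrame.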

Docs: Performance characteristics & advice

  • discuss impact of being on a remote server
  • ultimately dependent on the user's GPU for rendering
  • In contrast to datashader, doesn't minimize the amount of data being rendered; just does it more effectively
  • Use arrow data types in pandas
  • Exclude columns from dataframe before passing into layer

Sync view state between Python and JS

Right now we include an _initial_view_state that lets Python set the initial view state.

deck.gl allows you to pass in an initialViewState param which then lets deck manage the internal view state. Or you can manage the view state independently from deck, which you update with onViewStateChange and pass into deck's viewState parameter.

  • Set the state from python but allow the JS side to vary independently (otherwise you couldn't pan)
  • Debounce for messages from JS -> Python to not clog the web socket
  • Do not debounce when setting the view state from onViewStateChange (because we don't want to slow the deck updates)

The existing implementation of useModelState (in anywidget/react) is:

export function useModelState(key) {
  let model = useModel();
  let [value, setValue] = React.useState(model.get(key));
  React.useEffect(() => {
    let callback = () => setValue(model.get(key));
    model.on(`change:${key}`, callback);
    return () => model.off(`change:${key}`, callback);
  }, [model, key]);
  return [
    value,
    (value) => {
      model.set(key, value);
      model.save_changes();
    },
  ];
}

We probably want something like useModelStateDebounced which returns a callback that immediately calls model.set(key, value) but debounces for model.save_changes().

Note to self: https://www.joshwcomeau.com/snippets/javascript/debounce/ for implementation of debounce + note to use useMemo in react. It's unclear if we do want useMemo because we seemingly do want to re-render the react component on every view state change, because deck is reactive and won't re-render the full map

Align class names with deck.gl

I'm thinking it's better to start aligned with deck.gl and then change names in the future if we find it easier... 🤷‍♂️

So that means starting with e.g. the ScatterplotLayer instead of the PointLayer. We can also link to the ScatterplotLayer docs for this and it'll be hopefully more clear that we're exposing the same api as upstream

colormap helpers

provide at least a helper that takes in values 0-1 and maps them into the user-provided colormap

maybe have different clamping options, just like the GPU. either discrete which rounds to the nearest 1/256 color integer, or continuous which takes the ideal color in between the two nearest choices
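A possible shape for such a helper, as a sketch only. `apply_colormap` and its `mode` parameter are hypothetical; the colormap is assumed to be an (n, 3) or (n, 4) uint8 array and the values a 1-D array. Here "discrete" picks the nearest colormap entry and "continuous" linearly interpolates between the two nearest entries.

```python
import numpy as np


def apply_colormap(values, cmap, mode="discrete"):
    """Map 1-D values in [0, 1] onto a colormap of uint8 colors."""
    # Clamp out-of-range input so indexing stays within the colormap.
    values = np.clip(np.asarray(values, dtype=np.float64), 0.0, 1.0)
    cmap = np.asarray(cmap, dtype=np.uint8)
    position = values * (len(cmap) - 1)
    if mode == "discrete":
        return cmap[np.rint(position).astype(np.intp)]
    if mode == "continuous":
        lower = np.floor(position).astype(np.intp)
        upper = np.minimum(lower + 1, len(cmap) - 1)
        weight = (position - lower)[:, np.newaxis]
        blended = (1.0 - weight) * cmap[lower] + weight * cmap[upper]
        return np.rint(blended).astype(np.uint8)
    raise ValueError(f"unknown mode: {mode!r}")
```

The output has one uint8 color per input value, which is the array form the color accessors expect.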

Per-environment warnings

E.g. it's easy to check if you're in colab, and then print a warning over, say, 1M coordinates that it tends to get unstable
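A sketch of such a check using only the standard library. The function names and the exact wording of the warning are made up, and the 1M threshold is the rough figure from the note above rather than a measured limit.

```python
import sys
import warnings

# Rough threshold from the note above; an assumption, not a measured limit.
COLAB_COORD_LIMIT = 1_000_000


def running_in_colab() -> bool:
    """Heuristic: the google.colab module is loaded inside Colab kernels."""
    return "google.colab" in sys.modules


def warn_if_too_large(n_coords: int, limit: int = COLAB_COORD_LIMIT) -> bool:
    """Warn when rendering many coordinates in Colab; return True if a warning was issued."""
    if running_in_colab() and n_coords > limit:
        warnings.warn(
            f"Rendering {n_coords:,} coordinates in Colab may be unstable "
            f"(more than {limit:,})."
        )
        return True
    return False
```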

Try to deduplicate `@traitlets.validate`

It takes multiple names, so maybe it would be possible to have a single validator and register it with

@traitlets.validate("get_radius", "get_fill_color", ...)

instead of having a separate one for each one
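A small sketch of one validator shared across several traits. The `Layer` class and the non-negative rule are made up for illustration; `proposal["trait"].name` identifies which trait is being set, so a single method can still produce a trait-specific message.

```python
import traitlets


class Layer(traitlets.HasTraits):
    # Hypothetical traits standing in for the real accessors.
    get_radius = traitlets.Float(1.0)
    get_line_width = traitlets.Float(1.0)

    @traitlets.validate("get_radius", "get_line_width")
    def _validate_non_negative(self, proposal):
        # The same method runs for every trait named in the decorator.
        if proposal["value"] < 0:
            name = proposal["trait"].name
            raise traitlets.TraitError(f"{name} must be non-negative")
        return proposal["value"]
```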

image

Auto-downcast numeric attribute types in `from_geopandas`

Check for float, signed int, unsigned int data types, and call pd.to_numeric(downcast=...).

It would be nice to check if this works with pyarrow-based data types as well.

This should be a kwarg, maybe named auto_downcast: bool = True?
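A sketch of the downcasting step on a plain pandas DataFrame. `auto_downcast` here is a hypothetical standalone function, not the from_geopandas keyword itself; non-numeric columns (including a geometry column) are left untouched.

```python
import pandas as pd
from pandas.api import types as pdt


def auto_downcast(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast to the smallest fitting dtype."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if pdt.is_float_dtype(col):
            out[name] = pd.to_numeric(col, downcast="float")
        elif pdt.is_unsigned_integer_dtype(col):
            out[name] = pd.to_numeric(col, downcast="unsigned")
        elif pdt.is_signed_integer_dtype(col):
            out[name] = pd.to_numeric(col, downcast="signed")
    return out
```

Smaller dtypes mean less data to serialize and send to the browser, which is the motivation for making this the default.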

Switch dataframe to be stored on widget as geodataframe?

Instead of storing the buffer on the widget, you could instead store a more structured object and customize the ipywidgets serialization.

Note that this would mean the widget depends on geopandas instead of just interfacing with geopandas, so it is probably not desired.

Probably the best middle ground is to store the GeoArrow table representation (as a pyarrow.Table) on the widget

Data compression over the wire

Right now data is transferred from Python to JS fully uncompressed:

feather.write_feather(table, bio, compression="uncompressed")

Uncompressed data is fine for local kernels, where Python and the browser are on the same machine, but not ideal for remote kernels, like JupyterHub or Colab, where Python is on a remote server and data has to be downloaded before it can be rendered on a map.

Data Compression options

There are a few options for data compression:

  • Uncompressed
  • Apply a simple compression like gzip to the entire table buffer. This is simple to implement on both the Python and JS sides, but is quite slow
  • Apply compression in the Arrow IPC format. This file format supports only "light compression" (LZ4 or ZSTD) and doesn't do any other encoding like delta encoding for smaller file size. The downside is that reading compressed IPC files is not currently supported by Arrow JS.
  • Use Parquet. This has the most efficient compression, but it has the downsides of requiring a WebAssembly-based parser on the JS side. Adding the Wasm could make the build setup more difficult.

Different settings for local/remote?

Another question is whether it's possible to have different compression defaults based on whether the Python session is local or remote. Ideally a local Python kernel could use no compression while a remote Python kernel could use the most efficient compression.

The problem is that because Python-Jupyter follows a server-client model, I don't know of a good way to know from Python whether the attached client is running locally or remotely. There could be some heuristics like checking whether "google.colab" is in sys.modules, but that's only valid in the Colab case.

So it seems like the best default would be fast, moderate-size compression, and then have a parameter to let the user choose either no compression or slow, small-file-size compression.

Unscientific benchmarks

Unscientific benchmarks using the utah dataset of 1 million buildings (7M coords):

Compression type               File size   Write time
Feather (uncompressed)         144 MB      17 ms
gzip full-buffer compression   64 MB       13 s
Feather (ZSTD)                 80 MB       200 ms
Feather (LZ4)                  97 MB       147 ms
Parquet (Snappy)               82 MB       444 ms
Parquet (gzip)                 60 MB       4.5 s
Parquet (brotli)               45 MB       3.7 s
Parquet (ZSTD)                 74 MB       466 ms
Parquet (ZSTD level 22)        41.6 MB     11 s
Parquet (ZSTD level 18)        41.6 MB     9.8 s
Parquet (ZSTD level 16)        48.3 MB     5.7 s
Parquet (ZSTD level 14)        49.8 MB     2.7 s
Parquet (ZSTD level 12)        49.8 MB     1.9 s
Parquet (ZSTD level 10)        49.8 MB     1.7 s
Parquet (ZSTD level 8)         50.3 MB     1.4 s
Parquet (ZSTD level 7)         50.3 MB     1.25 s
Parquet (ZSTD level 6)         51.4 MB     1.2 s
Parquet (ZSTD level 4)         57.8 MB     800 ms
Parquet (ZSTD level 2)         69.1 MB     560 ms

Given this, Parquet with ZSTD at around level 7 seems to offer a very good combination of write speed and file size, and likely makes sense as a default.

Sync the clicked index back to Python

It would be great, besides a tooltip to display on the JS side, to sync the index of the object that was clicked. Then the user can do gdf.iloc[map_.clicked_index] to retrieve the specific row

Note that this can probably be an array of indices?
