developmentseed / lonboard
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
Home Page: https://developmentseed.org/lonboard/latest/
License: MIT License
compute_view crashes when empty points exist (giving an infinite bounding box)
Maybe give a warning that null points exist, and then filter them out for creating a bbox?
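A minimal sketch of the "warn, then filter" idea, written against plain Python tuples rather than lonboard's actual compute_view internals. The function name finite_bounds and the treatment of None or non-finite coordinates as "empty" are assumptions made for illustration.

```python
import math
import warnings
from typing import Iterable, List, Optional, Tuple

Point = Optional[Tuple[float, float]]
Bounds = Tuple[float, float, float, float]


def finite_bounds(points: Iterable[Point]) -> Optional[Bounds]:
    """Return (minx, miny, maxx, maxy) over usable points, or None if there are none."""
    kept: List[Tuple[float, float]] = []
    skipped = 0
    for point in points:
        # Treat None and any point with a NaN/inf coordinate as "empty".
        if point is None or not all(math.isfinite(c) for c in point):
            skipped += 1
            continue
        kept.append(point)

    if skipped:
        warnings.warn(
            f"Ignoring {skipped} null or empty point(s) when computing the bounding box"
        )
    if not kept:
        # Nothing usable: let the caller fall back to a default view.
        return None

    xs = [p[0] for p in kept]
    ys = [p[1] for p in kept]
    return (min(xs), min(ys), max(xs), max(ys))
```

Returning None when every point is empty leaves the choice of a fallback view to the caller instead of producing an infinite box.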
The rendering API/options will be different based on the type of layer. Should you have a PointWidget, LineStringWidget, PolygonWidget, and then have .get_fill_color as an autocompletion-able attribute on only the PolygonWidget? And have something like create_widget(gdf) as a top-level API that creates the table and then switches to create one of the widgets?
maybe print a warning and then reproject automatically?
Right now we include an _initial_view_state that lets Python set the initial view state.
deck.gl allows you to pass in an initialViewState param which then lets deck manage the internal view state. Or you can manage the view state independently from deck, which you update with onViewStateChange and pass into deck's viewState parameter.
onViewStateChange (because we don't want to slow the deck updates)
The existing implementation of useModelState (in anywidget/react) is:
export function useModelState(key) {
  let model = useModel();
  let [value, setValue] = React.useState(model.get(key));
  React.useEffect(() => {
    let callback = () => setValue(model.get(key));
    model.on(`change:${key}`, callback);
    return () => model.off(`change:${key}`, callback);
  }, [model, key]);
  return [
    value,
    (value) => {
      model.set(key, value);
      model.save_changes();
    },
  ];
}
We probably want something like useModelStateDebounced which returns a callback that immediately calls model.set(key, value) but debounces the call to model.save_changes().
Note to self: https://www.joshwcomeau.com/snippets/javascript/debounce/ for implementation of debounce + note to use useMemo in React. It's unclear if we do want useMemo, because we seemingly do want to re-render the React component on every view state change, because deck is reactive and won't re-render the full map.
From the geopandas.plot docstring
Name of a choropleth classification scheme (requires mapclassify). A mapclassify.MapClassifier object will be used under the hood. Supported are all schemes provided by mapclassify (e.g. ‘BoxPlot’, ‘EqualInterval’, ‘FisherJenks’, ‘FisherJenksSampled’, ‘HeadTailBreaks’, ‘JenksCaspall’, ‘JenksCaspallForced’, ‘JenksCaspallSampled’, ‘MaxP’, ‘MaximumBreaks’, ‘NaturalBreaks’, ‘Quantiles’, ‘Percentiles’, ‘StdMean’, ‘UserDefined’). Arguments can be passed in classification_kwds.
Provide at least a helper that takes in values 0-1 and maps them into the user-provided colormap.
Maybe have different clamping options, just like the GPU: either discrete, which rounds to the nearest 1/256 color integer, or continuous, which takes the ideal color in between the two nearest choices.
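As a rough sketch of such a helper, the following maps values in [0, 1] onto a user-provided list of RGB colors. The name apply_colormap, the mode argument, and the pure-Python representation are all hypothetical. The discrete mode snaps to the nearest colormap entry (with a 256-entry colormap this is the "nearest 1/256" behaviour), while the continuous mode linearly interpolates between the two nearest entries.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, ...]


def apply_colormap(
    values: Sequence[float],
    colormap: Sequence[Sequence[int]],
    mode: str = "continuous",
) -> List[Color]:
    """Map each value in [0, 1] to a color from `colormap`."""
    if len(colormap) < 2:
        raise ValueError("colormap needs at least two colors")
    if mode not in ("discrete", "continuous"):
        raise ValueError(f"unknown mode: {mode!r}")

    last = len(colormap) - 1
    out: List[Color] = []
    for value in values:
        # Clamp out-of-range inputs to the ends of the colormap.
        t = min(max(value, 0.0), 1.0)
        position = t * last
        if mode == "discrete":
            # Snap to the nearest colormap entry.
            out.append(tuple(colormap[int(position + 0.5)]))
        else:
            # Blend linearly between the two neighbouring entries.
            lower = min(int(position), last - 1)
            frac = position - lower
            a, b = colormap[lower], colormap[lower + 1]
            out.append(tuple(round(x + (y - x) * frac) for x, y in zip(a, b)))
    return out
```

A vectorized version would be needed for real datasets; this only pins down the two clamping behaviours.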
The current map is missing any attribution, and should be fixed
I'm thinking it's better to start aligned with deck.gl and then change names in the future if we find it easier... 🤷♂️
So that means starting with e.g. the ScatterplotLayer instead of the PointLayer. We can also link to the ScatterplotLayer docs for this, and it'll hopefully be more clear that we're exposing the same API as upstream.
It would be great, besides a tooltip to display on the JS side, to sync the index of the object that was clicked. Then the user can do gdf.iloc[map_.clicked_index] to retrieve the specific row.
Note that this can probably be an array of indices?
Check for float, signed int, unsigned int data types, and call pd.to_numeric(downcast=...).
It would be nice to check if this works with pyarrow-based data types as well.
This should be a kwarg, maybe named auto_downcast: bool = True?
Refer to pydeck binary serialization. Might be possible to store the Table object directly on the widget, with a custom "to_json" which creates {"data": memoryview(feather_buffer)} or similar.
https://github.com/visgl/deck.gl/blob/master/bindings/pydeck/pydeck/widget/widget.py#L62C75-L62C75
https://github.com/visgl/deck.gl/blob/master/bindings/pydeck/pydeck/data_utils/binary_transfer.py
Should be more careful to signify what's public and what's not
Recall that you can use a Protocol with a __call__ method to define the API for a function callback. So for accessors like get fill color, you should define this protocol to take in a geodataframe and return an NDArray[np.uint8].
You should also have runtime checks to verify the correct data format.
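A sketch of what that could look like, assuming numpy is installed. To stay self-contained it types the input as Any (anything with a length) instead of a GeoDataFrame. The names ColorAccessor, validate_colors, resolve_colors and constant_red are hypothetical and are not lonboard's API.

```python
from typing import Any, Protocol

import numpy as np
from numpy.typing import NDArray


class ColorAccessor(Protocol):
    """Callback signature for color accessors such as a fill color function."""

    def __call__(self, table: Any) -> NDArray[np.uint8]: ...


def validate_colors(colors: Any, n_rows: int) -> NDArray[np.uint8]:
    """Runtime check: one RGB(A) row of uint8 values per input row."""
    if not isinstance(colors, np.ndarray):
        raise TypeError(f"expected a numpy ndarray, got {type(colors).__name__}")
    if colors.dtype != np.uint8:
        # Name the offending dtype so the failure is easy to diagnose.
        raise TypeError(f"expected dtype uint8, got {colors.dtype}")
    if colors.ndim != 2 or colors.shape[0] != n_rows or colors.shape[1] not in (3, 4):
        raise ValueError(f"expected shape ({n_rows}, 3) or ({n_rows}, 4), got {colors.shape}")
    return colors


def resolve_colors(accessor: ColorAccessor, table: Any) -> NDArray[np.uint8]:
    """Call the accessor and validate its output before it reaches the map."""
    return validate_colors(accessor(table), len(table))


def constant_red(table: Any) -> NDArray[np.uint8]:
    """Example accessor: every row is red."""
    return np.tile(np.array([255, 0, 0], dtype=np.uint8), (len(table), 1))
```

The Protocol only helps static type checkers, which is why the explicit validate_colors step is still needed at runtime.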
Not quite supported yet in deck.gl-layers
E.g. it's easy to check if you're in colab, and then print a warning over, say, 1M coordinates that it tends to get unstable
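One possible shape for that check, as a sketch: the google.colab-in-sys.modules test is the heuristic mentioned elsewhere in these notes, and the 1M threshold is simply the figure from the sentence above, not a measured limit. The function names are hypothetical.

```python
import sys
import warnings
from typing import Optional

# Threshold taken from the note above; an assumption, not a benchmarked value.
COORD_WARNING_THRESHOLD = 1_000_000


def running_in_colab() -> bool:
    """Heuristic: the google.colab module is already imported inside Colab kernels."""
    return "google.colab" in sys.modules


def warn_if_too_large(n_coords: int, in_colab: Optional[bool] = None) -> bool:
    """Warn (and return True) when rendering this many coordinates in Colab."""
    if in_colab is None:
        in_colab = running_in_colab()
    if in_colab and n_coords > COORD_WARNING_THRESHOLD:
        warnings.warn(
            f"Rendering {n_coords:,} coordinates in Colab tends to get unstable"
        )
        return True
    return False
```

The in_colab override exists only so the behaviour can be exercised outside Colab.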
One note about the difference between datashader and my deck.gl-based visualization... It looked like datashader was re-rendering in a specific area when Joris zoomed in and panned around. So in that sense datashader is "minimizing rendering" based on the viewport. My deck.gl-based renderer does not minimize rendering... When I'm rendering 3 million points, all 3 million of those are loaded onto the user's GPU at once. So in that sense it's not "infinitely scalable", it just uses your hardware better than any previous library.
Like when you zoom in with datashader, it'll re-rasterize based on a new aggregation with the current viewport, and can do that up to your RAM size. So datashader is limited by your RAM (I think? Maybe it supports larger-than-RAM data) while lonboard is limited by your GPU RAM.
TraitError: The 'get_radius' trait of a ScatterplotLayer instance expected a float value or numpy ndarray or pyarrow array representing an array of floats, not the Series ...
Goals:
get_fill_color to propagate to the map
On the Python side I attempted to write a sort of "container widget" (ref manzt/anywidget#194), where each Layer object is a Python jupyter widget, and where the Map object collects each of the underlying Widgets. It's useful to have each Layer be its own Widget, because that enables event handling on each layer.
import traitlets
from ipywidgets import Widget, widget_serialization  # Base widget class
from anywidget import AnyWidget  # high level widget helper that subclasses Widget

class BaseLayer(Widget):
    """Base class for our layer types"""
    ...

class PointLayer(BaseLayer):
    ...

class LineStringLayer(BaseLayer):
    ...

class PolygonLayer(BaseLayer):
    ...

class Map(AnyWidget):
    _esm = "path to esm JS bundle"
    _css = "optional path to CSS styling"
    # list of instances of classes that subclass from BaseLayer
    layers = traitlets.List(trait=traitlets.Instance(BaseLayer)).tag(
        sync=True, **widget_serialization
    )
Then a user will create a variety of layers and instantiate a Map object:
import geopandas as gpd
point_data = gpd.GeoDataFrame(...)
polygon_data = gpd.GeoDataFrame(...)
point_layer = PointLayer(point_data)
polygon_layer = PolygonLayer(polygon_data)
map = Map(layers=[point_layer, polygon_layer])
map # putting this last in a cell "prints" the object..., which in this case renders the map
The goal with this setup is to let a user run
point_layer.fill_color = [255, 0, 0]
and the points on the map turn red.
In order for this to happen, the JS side needs to be able to receive these events and re-render.
When you render the map object, this will then sync data with the JS side and render the App object
Line 142 in 8cd1a19
The data from the Python side is available on the model object on the JS side. This can be accessed either via useModel or via anywidget's helper useModelState. useModelState is a small shim around useState and useEffect to keep track of the state of that value and propagate updates when the model announces that a field has changed.
The crux of the issue is that if you just use model.get(), you can access the initial value, but you never know when the value has been updated. Using useModelState from the "top level" works well, but only lets you access the attributes on the top level model.
My current code appeared to work well, but uses a synchronous private attribute that didn't appear to work in colab. And then switching to the async function didn't seem to work for that. See #34 for a description of this issue.
So the goal is to define the JS object in such a way that we hook into a model's event handlers so that we know when the on:change events happen and can update the map accordingly.
e.g.
class ColorAccessorWidget(Widget):
    color = ColorAccessor()

def test_color_accessor_validation():
    color_arr = np.array([1, 2, 3]).reshape(-1, 3)
    ColorAccessorWidget(color=color_arr)
raises
TraitError: The 'color' trait of a ColorAccessorWidget instance expected a tuple or list representing an RGB(A) color or numpy ndarray or pyarrow FixedSizeList representing an array of RGB(A) colors, not the ndarray array([[1, 2, 3]]).
The issue here is that the input array has a dtype of np.int64 instead of np.uint8, but the error isn't displaying properly.
E.g. some examples should be illustrative and just use a very small data download. Other examples should show off performance, and thus require large datasets (and maybe a large filter of an even larger dataset), but the two kinds should be grouped in such a way as to make the distinction very clear.
When using with jupyter sidecar, the map can be placed on the right side of the notebook screen:
Right now the div containing the deck.gl widget is hard-coded to 500px:
lonboard/src/scatterplot-layer.tsx
Line 80 in d4a05a3
This means that when used with sidecar, if the screen is more than 500px tall, it'll have a weird empty space at the bottom.
Ideally we want to let the widget fill all available height in its containing div, but I can't figure out how to do that. When I switch to, say,
style={{ display: "flex", flexFlow: "column", flexGrow: 1, overflow: "auto" }}
it creates a div with zero height:
cc @vgeorge
There's so much helper code here to create geoarrow-formatted data and validate other attributes that it would be nice to have a private method to export test data for the JS lib.
Can create a binder badge to the repo from this page: https://mybinder.org/; more env docs here: https://mybinder.readthedocs.io/en/latest/introduction.html
Mutate existing map objects whenever possible. Every time you create a new map object from scratch, you have to download all that new data to your browser.
I was able to load 20 million polygons in lonboard. It was amazing! Now I am trying to figure out how to load 60 million points without having to use GeoPandas but I keep hitting code paths that expect either an interleaved list or paths that go back to numpy or paths that expect a byte like object from C.
Here is roughly what I am trying:
import gzip

import geoarrow.pyarrow as ga
import lonboard
import pyarrow as pa
import pyarrow.csv as pv

with gzip.open("/Users/x/data/points_s2_level_4_gzip/397_buildings.csv.gz") as fp:
    table = pv.read_csv(fp)

# x is longitude and y is latitude
points = ga.point().from_geobuffers(None, table["longitude"], y=table["latitude"])
geoarrow_schema = pa.schema([pa.field("geometry", points.type, metadata={b"ARROW:extension:name": b"geoarrow.point"})])
point_table = pa.Table.from_arrays([points], schema=geoarrow_schema)
point_table.schema.field("geometry").metadata.get(b"ARROW:extension:name")
map_ = lonboard.ScatterplotLayer(table=point_table)
Right now data is transferred from Python to JS fully uncompressed:
Line 68 in 6a64c6f
Uncompressed data is fine for local kernels, where Python and the browser are on the same machine, but not ideal for remote kernels, like JupyterHub or Colab, where Python is on a remote server and data has to be downloaded before it can be rendered on a map.
There are a few options for data compression:
Another question is whether it's possible to have different compression defaults based on whether the Python session is local or remote. Ideally a local Python kernel could use no compression while a remote Python kernel could use the most efficient compression.
The problem is that because Python-Jupyter follows a server-client model, I don't know of a good way to know from Python whether the attached client is running locally or remotely. There could be some heuristics like checking if google.colab is in sys.modules, but that's only valid in the Colab case.
So it seems like the best default would be fast, moderate-size compression, and then have a parameter to let the user choose either no compression or slow, small-file-size compression.
Unscientific benchmarks using the utah dataset of 1 million buildings (7M coords):
Compression Type | File size | Write time |
---|---|---|
Feather (uncompressed) | 144 MB | 17 ms |
gzip full-buffer compression | 64 MB | 13 s |
Feather (ZSTD) | 80 MB | 200 ms |
Feather (LZ4) | 97 MB | 147 ms |
Parquet (Snappy) | 82 MB | 444 ms |
Parquet (gzip) | 60 MB | 4.5 s |
Parquet (brotli) | 45 MB | 3.7 s |
Parquet (ZSTD) | 74 MB | 466 ms |
Parquet (ZSTD level 22) | 41.6 MB | 11 s |
Parquet (ZSTD level 18) | 41.6 MB | 9.8 s |
Parquet (ZSTD level 16) | 48.3 MB | 5.7 s |
Parquet (ZSTD level 14) | 49.8 MB | 2.7 s |
Parquet (ZSTD level 12) | 49.8 MB | 1.9 s |
Parquet (ZSTD level 10) | 49.8 MB | 1.7 s |
Parquet (ZSTD level 8) | 50.3 MB | 1.4 s |
Parquet (ZSTD level 7) | 50.3 MB | 1.25 s |
Parquet (ZSTD level 6) | 51.4 MB | 1.2 s |
Parquet (ZSTD level 4) | 57.8 MB | 800 ms |
Parquet (ZSTD level 2) | 69.1 MB | 560 ms |
Given this, Parquet with ZSTD at around level 7 seems to have a very good combination of write speed and file size, and likely makes sense as a default.
Hi, I'm really looking forward to playing with lonboard but I can't get viz to render. I've tried the two example notebooks... They render in your Binder and Colab links, but when I try them in local Jupyter, nothing gets displayed but a blank white bar in the cell output. The objects returned from viz seem valid: I can see coordinates etc. when I print them as a string. I assume there might be a widget support issue or something like that. I'm using conda to create envs on a Windows machine. Tried various Python 3 versions, tried upgrading/downgrading Jupyter-related packages but no luck so far. Also tried upgrading & downgrading pyogrio, lonboard, pyarrow...
It would be great to have a tooltip that shows the row of data when hovered or clicked. This relies on geoarrow/deck.gl-layers#30
Instead of storing the buffer on the widget, you could instead store a more structured object and customize the ipywidgets serialization.
Note that this will mean that the widget depends on geopandas instead of just interface with geopandas, so probably not desired.
Probably the best middle ground is to store the GeoArrow table representation (as a pyarrow.Table) on the widget
Maybe before erroring allow __geo_interface__ too?
A user draws a bounding box to select an array of feature indices that fall within the bounding box. The features are highlighted on the map and selected in the geodataframe.
Useful for exploratory data analysis.
Kyle to add details.
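Until those details exist, here is a minimal sketch of the selection step alone, assuming point features given as (x, y) tuples and a bounding box given as (minx, miny, maxx, maxy). The function name select_in_bbox is hypothetical, and a real implementation would presumably run against the rendered data rather than a Python list.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def select_in_bbox(points: Sequence[Tuple[float, float]], bbox: BBox) -> List[int]:
    """Return the positional indices of the points inside bbox (edges inclusive)."""
    minx, miny, maxx, maxy = bbox
    return [
        index
        for index, (x, y) in enumerate(points)
        if minx <= x <= maxx and miny <= y <= maxy
    ]
```

The returned positions are the kind of index array that could then be passed to gdf.iloc[...] to select the matching rows.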
The deck.gl SolidPolygonLayer has a render option _windingOrder which says:
This prop is only effective with _normalize: false. It specifies the winding order of rings in the polygon data, one of:
'CW': outer-ring is clockwise, and holes are counter-clockwise
'CCW': outer-ring is counter-clockwise, and holes are clockwise
The proper value depends on the source of your data. Most geometry formats enforce a specific winding order. Incorrectly set winding order will cause an extruded polygon's surfaces to be flipped, affecting culling and the lighting effect.
Thus, this is probably not the highest priority, given that it only happens with extruded polygons, but should be fixed eventually.
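For reference, checking or forcing the winding order of a single ring can be sketched with the shoelace formula. This is a plain-Python, non-vectorized illustration with hypothetical function names, and it assumes coordinates where y increases upward (as with lon/lat). It is not how shapely or lonboard handle it.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def signed_area(ring: Sequence[Coord]) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around, so closed and open rings both work
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_ccw(ring: Sequence[Coord]) -> bool:
    """True when the ring is wound counter-clockwise."""
    return signed_area(ring) > 0


def force_winding(ring: Sequence[Coord], ccw: bool) -> List[Coord]:
    """Return the ring with the requested orientation, reversing it if needed."""
    return list(ring) if is_ccw(ring) == ccw else list(reversed(ring))
```

For the 'CCW' convention quoted above, exterior rings would be forced with ccw=True and holes with ccw=False.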
In GEOS, polygon winding order is unspecified, so we'd need to check/force it manually. There's no vectorized shapely function to do this, ref shapely/shapely#1366. So the options are either:
So the end goal here is: