I assume the answer will be no , in which case feel free to close this issue.<

Zero-copy compatibility with the JS implementation? about static_aabb2d_index HOT 4 OPEN

jbuckmccready commented on July 18, 2024

Zero-copy compatibility with the JS implementation?

from static_aabb2d_index.

Comments (4)

kylebarron commented on July 18, 2024 3

I'm trying to build out an open ecosystem for extensible, modular geospatial analytical data processing. I'm excited about Rust to speed up Python and JS via compiled extension modules. It's true that you can create Python bindings to a Rust library, have Rust manage the memory, and never need to worry about zero-copy. But when someone else writes a C library that would like to interface with your data, if you don't have an ABI-stable way to share the data, you need to serialize it and they need to deserialize it.

For example, In Python, Shapely (and by extension the C library GEOS) is used for most geospatial data storage. But separate Python libraries with C extensions can't use the same GEOS memory because the underlying storage isn't ABI-stable. So there has to be a serde step in between.

Apache Arrow solves this problem for generic tabular data, because it defines a language-independent, ABI-stable memory layout. So you can move memory between Python/Rust/C just by changing ownership of the pointer (e.g. Polars does this from Rust). GeoArrow, a WIP spec I'm a part of, builds on top of Arrow and solves that problem for geometries, so you can share an array of geometries between Python/Rust/C for free. (I've been working on a rust implementation of this).

But it's very useful to be able to share large spatial data, declare that the data is already spatially ordered, and share a spatial index for free. Of the RTree implementations I'm aware of, the only one that's fast, compact, and has an ABI-stable memory layout is the Flatbush algorithm.

My time horizon is pretty long... I'm planning on 2-5 years to maturity of GeoArrow. But watching how much innovation in the non-spatial realm is happening from Arrow, I think there's massive potential in a geospatial zero-copy ecosystem. And given that spatial indexes are absolutely core to geospatial algorithms, it's worth investing in a zero-copy rtree implementation.

from static_aabb2d_index.

jbuckmccready commented on July 18, 2024 1

Hmm, I don't have a use case for this but I understand the utility if trying to interop with other languages using the same library in the same memory space. I think it could be done in Rust using the bytemuck crate (https://docs.rs/bytemuck/latest/bytemuck/), create a buffer of bytes and then cast subslices of the buffer to the different types (number type and index type) as needed. It would require additional constraints on the generic traits and add a dependency on bytemuck.

I am open to pull requests that add this functionality as an additional module behind a feature flag (to avoid bytemuck dependency if not needed). The module would contain new spatial index types which utilize a single byte buffer, have different generic number constraints to work with bytemuck (or possibly not generic at all?), and share the same core algorithm (spatial index/math functions could be shared between the the modules where possible).

If you are wanting to share the data using the same format but don't need to have it zero copy in the same memory space then writing a serializer/deserializer to/from a byte buffer in the same format would be simpler.

from static_aabb2d_index.

kylebarron commented on July 18, 2024

Thanks for the reply! If you don't have an FFI-related use case, it might be simpler for me to prototype the library from scratch instead of trying to shoehorn it into how you've written the library so far. And then maybe at some point in the future once my version is implemented, we can see whether it makes sense to keep them as two libraries or combine them.

If you are wanting to share the data using the same format but don't need to have it zero copy in the same memory space then writing a serializer/deserializer to/from a byte buffer in the same format would be simpler.

Yeah definitely, but I'm really focused on applications that share the same memory space (e.g. for rust-python), so even though zero-copy will be more work, I'm inclined towards that.

from static_aabb2d_index.

jbuckmccready commented on July 18, 2024

EDIT: Removed most of this reply because it was already answered!

I am curious what your use case is that causes the non-zero copy solution to be noticeable/the slow step, and if you do pursue the zero copy byte reinterpretation approach I'd be curious to see how it works out, cool project.

from static_aabb2d_index.

Zero-copy compatibility with the JS implementation? about static_aabb2d_index HOT 4 OPEN

Comments (4)

Related Issues (3)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent