Comments (4)

kylebarron commented on July 18, 2024

I'm trying to build out an open ecosystem for extensible, modular geospatial analytical data processing. I'm excited about Rust to speed up Python and JS via compiled extension modules. It's true that you can create Python bindings to a Rust library, have Rust manage the memory, and never need to worry about zero-copy. But when someone else writes a C library that would like to interface with your data, if you don't have an ABI-stable way to share the data, you need to serialize it and they need to deserialize it.

For example, in Python, Shapely (and by extension the C library GEOS) is used for most geospatial data storage. But separate Python libraries with C extensions can't use the same GEOS memory because the underlying storage isn't ABI-stable. So there has to be a serde step in between.

Apache Arrow solves this problem for generic tabular data, because it defines a language-independent, ABI-stable memory layout. So you can move memory between Python/Rust/C just by changing ownership of the pointer (e.g. Polars does this from Rust). GeoArrow, a WIP spec I'm a part of, builds on top of Arrow and solves that problem for geometries, so you can share an array of geometries between Python/Rust/C for free. (I've been working on a Rust implementation of this.)

But it's very useful to be able to share large spatial data, declare that the data is already spatially ordered, and share a spatial index for free. Of the RTree implementations I'm aware of, the only one that's fast, compact, and has an ABI-stable memory layout is the Flatbush algorithm.

My time horizon is pretty long... I'm planning on 2-5 years to maturity of GeoArrow. But watching how much innovation in the non-spatial realm is happening from Arrow, I think there's massive potential in a geospatial zero-copy ecosystem. And given that spatial indexes are absolutely core to geospatial algorithms, it's worth investing in a zero-copy rtree implementation.

from static_aabb2d_index.

jbuckmccready commented on July 18, 2024

Hmm, I don't have a use case for this, but I understand the utility if you're trying to interop with other languages using the same library in the same memory space. I think it could be done in Rust using the bytemuck crate (https://docs.rs/bytemuck/latest/bytemuck/): create a buffer of bytes and then cast subslices of the buffer to the different types (number type and index type) as needed. It would require additional constraints on the generic traits and add a dependency on bytemuck.
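As a rough illustration of the "cast subslices of one byte buffer" idea, here is a minimal std-only sketch. It is not the layout of static_aabb2d_index or Flatbush: the layout (an 8-byte count, then 4 `f64` values per box, then one `u32` index per box) and the names `Plain`, `cast_slice`, `IndexView`, `view` and `as_bytes` are all hypothetical. In a real implementation, `bytemuck::Pod` and `bytemuck::cast_slice` would take the place of the hand-written `Plain` trait and `cast_slice` function.

```rust
use std::convert::TryFrom;
use std::mem::{align_of, size_of};

/// Hypothetical marker for types where every bit pattern is valid
/// (a stand-in for what `bytemuck::Pod` expresses).
unsafe trait Plain: Copy {}
unsafe impl Plain for f64 {}
unsafe impl Plain for u32 {}

/// Reinterpret bytes as a slice of `T` without copying.
/// Returns None if the length or the alignment does not fit `T`.
fn cast_slice<'a, T: Plain>(bytes: &'a [u8]) -> Option<&'a [T]> {
    let size = size_of::<T>();
    if bytes.len() % size != 0 || (bytes.as_ptr() as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, `T: Plain` accepts any
    // bit pattern, and the result borrows from `bytes` for the same lifetime.
    Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<T>(), bytes.len() / size) })
}

/// Borrowed view over a buffer in the assumed layout:
/// [count: u64][4 * count f64 box values][count u32 indices], native endian.
struct IndexView<'a> {
    boxes: &'a [f64],
    indices: &'a [u32],
}

/// Build a view over `bytes`; no box or index data is copied.
fn view<'a>(bytes: &'a [u8]) -> Option<IndexView<'a>> {
    const HEADER: usize = 8;
    let mut raw = [0u8; HEADER];
    raw.copy_from_slice(bytes.get(..HEADER)?);
    let count = usize::try_from(u64::from_ne_bytes(raw)).ok()?;
    // Checked arithmetic so a corrupt count cannot overflow the offsets.
    let boxes_end = HEADER.checked_add(count.checked_mul(4 * size_of::<f64>())?)?;
    let indices_end = boxes_end.checked_add(count.checked_mul(size_of::<u32>())?)?;
    Some(IndexView {
        boxes: cast_slice(bytes.get(HEADER..boxes_end)?)?,
        indices: cast_slice(bytes.get(boxes_end..indices_end)?)?,
    })
}

/// View 8-byte-aligned storage as raw bytes (the direction that is always aligned).
fn as_bytes(words: &[u64]) -> &[u8] {
    // SAFETY: u8 has alignment 1 and the byte length matches the u64 slice exactly.
    unsafe { std::slice::from_raw_parts(words.as_ptr().cast::<u8>(), words.len() * size_of::<u64>()) }
}
```

The alignment check is the part that tends to bite in practice: a plain `Vec<u8>` is not guaranteed to be 8-byte aligned, which is why this sketch assumes the buffer is backed by `u64` storage (or handed over by a producer that guarantees alignment).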

I am open to pull requests that add this functionality as an additional module behind a feature flag (to avoid the bytemuck dependency if not needed). The module would contain new spatial index types which utilize a single byte buffer, have different generic number constraints to work with bytemuck (or possibly not be generic at all?), and share the same core algorithm (spatial index/math functions could be shared between the modules where possible).

If you are wanting to share the data using the same format but don't need to have it zero copy in the same memory space then writing a serializer/deserializer to/from a byte buffer in the same format would be simpler.
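For comparison, a copying serializer/deserializer can be sketched with nothing but the standard library. The format here (a little-endian `u64` count followed by 4 little-endian `f64` values per box) and the function names `serialize_boxes` and `deserialize_boxes` are assumptions made for illustration, not the format of any existing library.

```rust
/// Size in bytes of one box: min_x, min_y, max_x, max_y as f64.
const BOX_BYTES: usize = 32;

/// Copy the boxes into a new byte buffer: [count: u64 LE][boxes: f64 LE ...].
fn serialize_boxes(boxes: &[[f64; 4]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + boxes.len() * BOX_BYTES);
    out.extend_from_slice(&(boxes.len() as u64).to_le_bytes());
    for b in boxes {
        for v in b {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Copy the boxes back out of a byte buffer.
/// Returns None if the buffer length does not match the count in the header.
fn deserialize_boxes(bytes: &[u8]) -> Option<Vec<[f64; 4]>> {
    if bytes.len() < 8 {
        return None;
    }
    let (header, body) = bytes.split_at(8);
    let mut raw = [0u8; 8];
    raw.copy_from_slice(header);
    let count = u64::from_le_bytes(raw);
    if body.len() % BOX_BYTES != 0 || (body.len() / BOX_BYTES) as u64 != count {
        return None;
    }
    let mut boxes = Vec::with_capacity(body.len() / BOX_BYTES);
    for chunk in body.chunks_exact(BOX_BYTES) {
        let mut b = [0.0f64; 4];
        // Decode each 8-byte group; this works at any alignment because it copies.
        for (dst, src) in b.iter_mut().zip(chunk.chunks_exact(8)) {
            let mut v = [0u8; 8];
            v.copy_from_slice(src);
            *dst = f64::from_le_bytes(v);
        }
        boxes.push(b);
    }
    Some(boxes)
}
```

Because every value is copied, this version has no alignment or endianness requirements on the buffer, at the cost of one allocation and one pass over the data on each side.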

kylebarron commented on July 18, 2024

Thanks for the reply! If you don't have an FFI-related use case, it might be simpler for me to prototype the library from scratch instead of trying to shoehorn it into how you've written the library so far. And then maybe at some point in the future once my version is implemented, we can see whether it makes sense to keep them as two libraries or combine them.

> If you are wanting to share the data using the same format but don't need to have it zero copy in the same memory space then writing a serializer/deserializer to/from a byte buffer in the same format would be simpler.

Yeah definitely, but I'm really focused on applications that share the same memory space (e.g. Rust-Python interop), so even though zero-copy will be more work, I'm inclined towards that.

jbuckmccready commented on July 18, 2024

EDIT: Removed most of this reply because it was already answered!

I am curious which use case makes the copying (non-zero-copy) solution the noticeably slow step. If you do pursue the zero-copy byte reinterpretation approach, I'd be curious to see how it works out. Cool project.
