Comments (14)

paleolimbot commented on June 1, 2024

Yes, that's very recent (just being implemented!). The idea is that libraries producing an array only have to produce something that implements __arrow_c_array__ instead of an actual pyarrow.Array. On the pyarrow side, anything that expected an Array will (eventually) be able to accept anything that implements __arrow_c_array__ by checking hasattr(x, "__arrow_c_array__").
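A minimal sketch of that duck-typed check, in plain Python. `MyArray`, `FakeCapsule`, `supports_arrow_c_array`, and `import_array` are hypothetical names made up for illustration, and the placeholder objects stand in for the real PyCapsules an actual producer would return. This is not pyarrow's code.

```python
from typing import Any


class FakeCapsule:
    """Placeholder standing in for a real PyCapsule (assumption for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name


class MyArray:
    """Hypothetical producer: not a pyarrow.Array, but it implements the protocol."""

    def __arrow_c_array__(self, requested_schema: Any = None):
        # A real implementation returns two PyCapsules named
        # "arrow_schema" and "arrow_array"; these are stand-ins.
        return FakeCapsule("arrow_schema"), FakeCapsule("arrow_array")


def supports_arrow_c_array(obj: Any) -> bool:
    """The consumer-side check described above."""
    return hasattr(obj, "__arrow_c_array__")


def import_array(obj: Any):
    """Accept any object that implements the protocol, whatever its class."""
    if not supports_arrow_c_array(obj):
        raise TypeError(f"{type(obj).__name__} does not implement __arrow_c_array__")
    return obj.__arrow_c_array__()
```

The consumer never needs to import the producer's package or know its class. It only checks for the method.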

In R we don't have the ability to do hasattr()...the closest we can do is define generics. The as_nanoarrow_array() generic is easier for an arbitrary library to implement than arrow::as_arrow_array() because nanoarrow is easier to depend on (and it would be a required dependency because nanoarrow is where the S3 method is defined). The adbcdrivermanager package takes advantage of this...you can do write_adbc(<anything that implements as_nanoarrow_array_stream()>, con) and S3 dispatch takes care of the rest.
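R's S3 dispatch has no exact Python counterpart, but `functools.singledispatch` is a rough analogy for the pattern described here: the generic lives in one lightweight package, and any other package registers a method for its own class. Everything below (`as_array_stream`, `GeometryColumn`, `write_table`) is a hypothetical stand-in, not the nanoarrow or adbcdrivermanager API.

```python
from functools import singledispatch


@singledispatch
def as_array_stream(x):
    """Generic defined once, in the package everyone depends on."""
    raise TypeError(f"no as_array_stream() method for {type(x).__name__}")


class GeometryColumn:
    """Hypothetical class owned by some other package."""

    def __init__(self, geometries):
        self.geometries = list(geometries)


@as_array_stream.register(GeometryColumn)
def _geometry_as_array_stream(x):
    # The owning package registers its own method for the generic.
    # A "stream" in this sketch is just a list of chunks.
    return [x.geometries]


def write_table(x, con):
    """Consumer that accepts anything with a registered method."""
    con.extend(as_array_stream(x))
    return con
```

The consumer (`write_table`) depends only on the generic, which is why the package that defines the generic has to be cheap to depend on.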

from arrow-extendr.

JosiahParry commented on June 1, 2024

ToArrowRobj is now implemented using {nanoarrow} instead of {arrow}.

It is implemented for:

  • DataType
  • ArrayData
  • PrimitiveArray
  • Field
  • Schema
  • RecordBatch

It is less clear how to handle FromArrowRobj. Right now it expects arrow class objects. The approach I am leaning towards right now is to check the class of the object and process accordingly.

That means the arrow class objects DataType, Field, Schema, RecordBatch, and ArrayData will be processed into their corresponding arrow-rs types. A nanoarrow_array will be processed into ArrayData, and a nanoarrow_schema can be processed into Field, Schema, or DataType. I think a nanoarrow_array_stream will need to be processed into a RecordBatchReader.
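The class-based routing described above can be sketched as a lookup table. This is only an illustration in Python of the idea, not the arrow-extendr implementation. `TARGETS_BY_CLASS` and `targets_for` are hypothetical names, and the stream entry uses `nanoarrow_array_stream`, the S3 class name nanoarrow gives to streams.

```python
# Candidate arrow-rs targets for each R class, as listed in the comment above.
TARGETS_BY_CLASS = {
    "DataType": ("DataType",),
    "Field": ("Field",),
    "Schema": ("Schema",),
    "RecordBatch": ("RecordBatch",),
    "ArrayData": ("ArrayData",),
    "nanoarrow_array": ("ArrayData",),
    "nanoarrow_schema": ("Field", "Schema", "DataType"),
    "nanoarrow_array_stream": ("RecordBatchReader",),
}


def targets_for(r_classes):
    """Return the targets for the first recognised entry in an R class vector."""
    for cls in r_classes:
        if cls in TARGETS_BY_CLASS:
            return TARGETS_BY_CLASS[cls]
    raise TypeError(f"unsupported class: {list(r_classes)!r}")
```

The cost of this approach is that every new input class needs a new table entry (or match arm) on the consuming side.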

paleolimbot commented on June 1, 2024

The To/From thing is still new to me, but if I were in Rust and I wanted an arrow DataType, Field, or Schema from arbitrary user SEXP input, I'd want to call as_nanoarrow_schema() on the SEXP and then do the FFI import based on the C object. I think the same pattern applies for ArrayData...I'm less clear what the arrow-rs equivalents of Table and ChunkedArray are, but those would use as_nanoarrow_array_stream() (as would RecordBatchReader).

That will get you all Arrow objects for free (because as_nanoarrow_XXX() are implemented for them already) plus any objects that have as_nanoarrow_array() methods defined in other packages (e.g., sfc objects as of five minutes ago in geoarrow/geoarrow-c/r!)

paleolimbot commented on June 1, 2024

Oh, and for an array you can get the schema from nanoarrow::infer_nanoarrow_schema() 🙂 .

JosiahParry commented on June 1, 2024

Jinx

JosiahParry commented on June 1, 2024

@eitsupi If I understand correctly, that's exactly what I'm aiming for here! There should be no matching necessary!

JosiahParry commented on June 1, 2024

Probably related: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html

Prior to this, many libraries simply provided export to PyArrow data structures, using the _import_from_c and _export_to_c methods. However, this always required PyArrow to be installed. In addition, those APIs could cause memory leaks if handled improperly.

eitsupi commented on June 1, 2024

Ref: pola-rs/r-polars#5

JosiahParry commented on June 1, 2024

To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet. The RecordBatch serves the purpose of the Table.


Another question, if you'd be so kind: getting an arrow array using {arrow} isn't so bad with the export_to_c() function, which takes pointers to a schema and an array and moves them (I think that's what is happening).

Using nanoarrow, I'm not so sure how to move the single pointer of the array into schema + array (or maybe that just doesn't happen?).

eitsupi commented on June 1, 2024

Recently, the polars package has started using the R! macro to execute as_* functions on the R side and then load the Arrow objects on the Rust side.
With this method, we don't need a match arm on the Rust side; we just define the S3 method for as_nanoarrow_array_stream on the R side. Isn't that simpler, with a wider range of support?

paleolimbot commented on June 1, 2024

I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!

To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet.

Good to know! It's a bit of a bummer...the ability to leave chunks as they are is often helpful (but not something you have to deal with now 🙂 )

JosiahParry commented on June 1, 2024

I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!

Yeah, this did the trick! It turns out that the arrow-rs FFI module requires a schema. Those aren't present on the array so I used infer_nanoarrow_schema() and also exported that pointer.
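For readers unfamiliar with the "address as a string" hand-off, here is a rough sketch of the mechanism using Python's ctypes. In the real workflow the destination struct is allocated by arrow-rs and filled from R by nanoarrow; here `fake_pointer_export` is a hypothetical stand-in for the producer, and only the struct layout (from the Arrow C data interface) is taken as given.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """Field layout of struct ArrowSchema from the Arrow C data interface."""

    _fields_ = [
        ("format", ctypes.c_char_p),
        ("name", ctypes.c_char_p),
        ("metadata", ctypes.c_char_p),
        ("flags", ctypes.c_int64),
        ("n_children", ctypes.c_int64),
        ("children", ctypes.c_void_p),    # struct ArrowSchema**
        ("dictionary", ctypes.c_void_p),  # struct ArrowSchema*
        ("release", ctypes.c_void_p),     # release callback
        ("private_data", ctypes.c_void_p),
    ]


# Keeps exported byte strings alive for as long as the module is loaded.
_KEEPALIVE = []


def address_as_string(struct: ctypes.Structure) -> str:
    """Consumer side: format the struct's address as a decimal string."""
    return str(ctypes.addressof(struct))


def fake_pointer_export(format_string: bytes, ptr_dst: str) -> None:
    """Hypothetical producer: parse the address and fill the struct in place."""
    _KEEPALIVE.append(format_string)
    dst = ArrowSchema.from_address(int(ptr_dst))
    dst.format = format_string


# Usage: the consumer allocates an empty struct and hands over only its address.
schema = ArrowSchema()
fake_pointer_export(b"i", address_as_string(schema))  # "i" is the int32 format
```

The same hand-off is done twice in the case described above: once for the array and once for the inferred schema.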

JosiahParry commented on June 1, 2024

@eitsupi
I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html


Aside: closing this issue, since arrow-extendr now uses nanoarrow for Rust -> R while still allowing both arrow -> Rust and nanoarrow -> Rust.

eitsupi commented on June 1, 2024

I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html

Looks great!
