Comments (14)
Yes, that's very recent (just being implemented!). The idea is that libraries producing an array only have to produce something that implements __arrow_c_array__ instead of an actual pyarrow.Array. On the pyarrow side, anything that expected an Array will (eventually) be able to accept anything that implements __arrow_c_array__ by checking hasattr(x, "__arrow_c_array__").
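A minimal sketch of the hasattr() check described above, using only the standard library. ToyProducer and import_array are hypothetical names, and the strings returned by ToyProducer are placeholders standing in for the pair of PyCapsules (schema, array) that a real producer would return.

```python
class ToyProducer:
    """Hypothetical producer; a real one would return two PyCapsules."""

    def __arrow_c_array__(self, requested_schema=None):
        # Placeholders standing in for the (schema capsule, array capsule) pair.
        return ("schema-capsule-placeholder", "array-capsule-placeholder")


def import_array(obj):
    """Hypothetical consumer: accept anything that implements __arrow_c_array__."""
    if not hasattr(obj, "__arrow_c_array__"):
        raise TypeError(
            f"{type(obj).__name__} does not implement __arrow_c_array__"
        )
    # The consumer never needs to know the concrete class of the producer.
    schema_capsule, array_capsule = obj.__arrow_c_array__()
    return schema_capsule, array_capsule
```

The consumer only depends on the method being present, so the producing library does not need pyarrow installed.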
In R we don't have the ability to do hasattr()...the closest we can do is define generics. The as_nanoarrow_array() generic is easier for an arbitrary library to implement than arrow::as_arrow_array() because nanoarrow is easier to depend on (and it would be a required dependency because nanoarrow is where the S3 method is defined). The adbcdrivermanager package takes advantage of this...you can do write_adbc(<anything that implements as_nanoarrow_array_stream()>, con) and S3 dispatch takes care of the rest.
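As a rough analogy only (this is Python, not R, and not how nanoarrow is implemented), generic dispatch of this kind can be sketched with functools.singledispatch. The names as_array_stream, ThirdPartyTable, and write_rows are all hypothetical.

```python
from functools import singledispatch


@singledispatch
def as_array_stream(x):
    # Default method: no conversion has been registered for this type.
    raise TypeError(f"no as_array_stream() method for {type(x).__name__}")


class ThirdPartyTable:
    """Hypothetical class owned by some other library."""

    def __init__(self, rows):
        self.rows = rows


@as_array_stream.register(ThirdPartyTable)
def _(x):
    # The owning library registers its own conversion: one batch of rows.
    return iter([x.rows])


def write_rows(x):
    """Hypothetical consumer: convert through the generic, then count rows."""
    return sum(len(batch) for batch in as_array_stream(x))
```

The consumer calls the generic and never inspects the input type itself, which is the property the comment above attributes to S3 dispatch.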
from arrow-extendr.
ToArrowRobj is now implemented using {nanoarrow} instead of {arrow}.
It is implemented for:
- DataType
- ArrayData
- PrimitiveArray
- Field
- Schema
- RecordBatch
It is less clear how to handle FromArrowRobj. Right now it expects arrow class objects. The approach I am leaning towards is to check the class of the object and process it accordingly.
That means the arrow class objects DataType, Field, Schema, RecordBatch, and ArrayData will be processed into their corresponding arrow-rs types. A nanoarrow_array will be processed into ArrayData, and a nanoarrow_schema can be processed into Field, Schema, or DataType. I think a nanoarrow_array_stream will need to be processed into a RecordBatchReader.
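The class-matching approach described above can be sketched as a lookup table. This is an illustration in Python rather than the Rust implementation; TARGETS and resolve_targets are hypothetical names, and the class names nanoarrow_schema and nanoarrow_array_stream are assumed to be the nanoarrow classes the comment refers to.

```python
# R class name -> arrow-rs target type(s), following the comment above.
TARGETS = {
    "DataType": ("DataType",),
    "Field": ("Field",),
    "Schema": ("Schema",),
    "RecordBatch": ("RecordBatch",),
    "ArrayData": ("ArrayData",),
    "nanoarrow_array": ("ArrayData",),
    "nanoarrow_schema": ("Field", "Schema", "DataType"),
    "nanoarrow_array_stream": ("RecordBatchReader",),
}


def resolve_targets(r_classes):
    """Return the targets for the first recognised entry of an R class vector."""
    for cls in r_classes:
        if cls in TARGETS:
            return TARGETS[cls]
    raise TypeError(f"unsupported class vector: {list(r_classes)}")
```

Every new input class needs a new entry in the table, which is the maintenance cost of matching on class.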
from arrow-extendr.
The To/From thing is still new to me, but if I were in Rust and I wanted an arrow DataType, Field, or Schema from arbitrary user SEXP input, I'd want to call as_nanoarrow_schema() on the SEXP and then do the FFI import based on the C object. I think the same pattern applies for ArrayData...I'm less clear what the arrow-rs equivalents are of Table and ChunkedArray, but those would use as_nanoarrow_array_stream() (as would RecordBatchReader).
That will get you all Arrow objects for free (because as_nanoarrow_XXX() are implemented for them already) plus any objects that have as_nanoarrow_array() methods defined in other packages (e.g., sfc objects as of five minutes ago in geoarrow/geoarrow-c/r!)
from arrow-extendr.
Oh, and for an array you can get the schema from nanoarrow::infer_nanoarrow_schema() 🙂.
from arrow-extendr.
Jinx
from arrow-extendr.
@eitsupi If I understand correctly, that's exactly what I'm aiming for here! There should be no matching necessary!
from arrow-extendr.
Probably related: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
Prior to this, many libraries simply provided export to PyArrow data structures, using the _import_from_c and _export_to_c methods. However, this always required PyArrow to be installed. In addition, those APIs could cause memory leaks if handled improperly.
from arrow-extendr.
Ref: pola-rs/r-polars#5
from arrow-extendr.
To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet. The RecordBatch serves the purpose of the Table.
Another question if you feel so kind: getting an arrow array using {arrow} isn't so bad with the export_to_c() function, which takes pointers to a schema and an array and moves them (I think that's what is happening).
Using nanoarrow, I'm not so sure how to move the single pointer of the array into schema + array (or maybe that just doesn't happen?)
from arrow-extendr.
Recently, the polars package has started using the R! macro to execute as_* functions on the R side and then load Arrow objects on the Rust side.
With this method, we don't need a match arm on the Rust side; we just define the S3 method for as_nanoarrow_array_stream on the R side. Isn't that simpler, with a wider range of support?
from arrow-extendr.
I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!
To my knowledge there is no concept of a Table or a ChunkedArray in arrow-rs as of yet.
Good to know! It's a bit of a bummer...the ability to leave chunks as they are is often helpful (but not something you have to deal with now 🙂 )
from arrow-extendr.
I think nanoarrow::nanoarrow_pointer_export(<the_nanoarrow_object>, <the address of the arrow-rs FFI object as a string>) is what you want!
Yeah, this did the trick! It turns out that the arrow-rs FFI module requires a schema. Those aren't present on the array, so I used infer_nanoarrow_schema() and also exported that pointer.
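To illustrate why two pointers are involved, here is a sketch of the two Arrow C data interface structs using Python's ctypes. The field layout follows the C data interface specification; the release callbacks are simplified to untyped pointers, and empty_ffi_pair and address_string are hypothetical helpers. That the receiving side expects a decimal address string is an assumption.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    pass


class ArrowArray(ctypes.Structure):
    pass


# Type information lives only in ArrowSchema ("format", "name", ...).
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_void_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.c_void_p),  # simplified: really a function pointer
    ("private_data", ctypes.c_void_p),
]

# ArrowArray carries lengths and buffers, but no type description.
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.c_void_p),  # simplified: really a function pointer
    ("private_data", ctypes.c_void_p),
]


def empty_ffi_pair():
    """Allocate zeroed structs for a producer to export into."""
    return ArrowSchema(), ArrowArray()


def address_string(struct):
    """Address of a struct as a decimal string (assumed format)."""
    return str(ctypes.addressof(struct))
```

Because the array struct has no format field, a consumer needs the schema struct alongside it to interpret the buffers, which matches exporting both the array and its inferred schema.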
from arrow-extendr.
@eitsupi I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html
Aside: closing this issue since it now uses nanoarrow for Rust -> R, while still allowing both arrow -> Rust and nanoarrow -> Rust.
from arrow-extendr.
I used the DBI example (thank you!!!!!) in the docs. Does this look like what you're after?
https://josiahparry.github.io/arrow-extendr/arrow_extendr/index.html
Looks great!
from arrow-extendr.