arrow_extendr

arrow-extendr is a crate that facilitates the transfer of Apache Arrow memory between R and Rust. It utilizes extendr, the {nanoarrow} R package, and arrow-rs.

Versioning

At present, different versions of arrow-rs are not compatible with each other. If your crate uses arrow-rs version 48.0.1, then arrow-extendr must use that same version. For this reason, arrow-extendr follows the arrow-rs version numbers, which makes it easy to pick the release that matches your arrow-rs dependency.

Versions:

  • 51.0.0
  • 50.0.0 (compatible with geoarrow-rs 0.1.0)
  • 49.0.0-geoarrow (not available on crates.io but is the current Git version)
  • 49.0.0
  • 48.0.1

Motivating Example

Say we have the following DBI connection, which we will query using Arrow. The result of dbGetQueryArrow() is a nanoarrow_array_stream. We want to count the number of rows in each batch of the stream using Rust.

# adapted from https://github.com/r-dbi/DBI/blob/main/vignettes/DBI-arrow.Rmd

library(DBI)
con <- dbConnect(RSQLite::SQLite())
data <- data.frame(
  a = runif(10000, 0, 10),
  b = rnorm(10000, 4.5),
  c = sample(letters, 10000, TRUE)
)

dbWriteTable(con, "tbl", data)

We can write an extendr function that creates an ArrowArrayStreamReader from an &Robj. In the function we instantiate a counter to keep track of the total number of rows. For each chunk we print its number of rows and add it to the counter.

use extendr_api::prelude::*;
use arrow_extendr::from::FromArrowRobj;
use arrow::ffi_stream::ArrowArrayStreamReader;

#[extendr]
/// @export
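fn process_stream(stream: Robj) -> i32 {
    // Convert the R object (a nanoarrow_array_stream) into an Arrow stream reader
    let rb = ArrowArrayStreamReader::from_arrow_robj(&stream)
        .unwrap();

    // Running total of rows across all chunks
    let mut n = 0;

    rprintln!("Processing `ArrowArrayStreamReader`...");
    for chunk in rb {
        // Each item is a Result; unwrap() panics if a chunk cannot be read
        let chunk_rows = chunk.unwrap().num_rows();
        rprintln!("Found {chunk_rows} rows");
        n += chunk_rows as i32;
    }

    // Return the total to R as an integer
    n
}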

We can call this function on the output of dbGetQueryArrow() or other Arrow-related DBI functions.

query <- dbGetQueryArrow(con, "SELECT * FROM tbl WHERE a < 3")
process_stream(query)
#> Processing `ArrowArrayStreamReader`...
#> Found 256 rows
#> Found 256 rows
#> Found 256 rows
#> ... truncated ...
#> Found 256 rows
#> Found 256 rows
#> Found 143 rows
#> [1] 2959
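
The counting loop can also be written so that it reports failures instead of panicking. The following is a minimal sketch using only the Rust standard library. `Batch`, `CountError`, and `count_rows` are hypothetical names introduced for illustration and are not part of arrow-extendr, arrow-rs, or extendr. `Batch` stands in for an Arrow record batch, and the input iterator stands in for the stream reader, which yields one `Result` per chunk.

```rust
/// Hypothetical stand-in for an Arrow record batch; only the row count matters here.
struct Batch {
    rows: usize,
}

impl Batch {
    fn num_rows(&self) -> usize {
        self.rows
    }
}

/// Hypothetical error type covering the two ways counting can fail.
#[derive(Debug, PartialEq)]
enum CountError {
    /// A batch could not be read from the stream.
    Read(String),
    /// The total does not fit in an R integer (a 32-bit signed value).
    Overflow,
}

/// Sum the rows of every batch, stopping at the first read error.
fn count_rows<I>(batches: I) -> Result<i32, CountError>
where
    I: IntoIterator<Item = Result<Batch, String>>,
{
    let mut total: usize = 0;
    for batch in batches {
        // Propagate a failed read instead of calling unwrap()
        let rows = batch.map_err(CountError::Read)?.num_rows();
        total = total.checked_add(rows).ok_or(CountError::Overflow)?;
    }
    // Convert with a range check rather than a lossy `as i32` cast
    i32::try_from(total).map_err(|_| CountError::Overflow)
}
```

With eleven batches of 256 rows followed by one batch of 143 rows, as in the output above, `count_rows` returns `Ok(2959)`. The checked conversion matters because an `as i32` cast silently wraps once the total exceeds the 32-bit range.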

Using arrow-extendr in a package

To use arrow-extendr in an R package, first create an R package and make it an extendr package with:

usethis::create_package("my_package")
rextendr::use_extendr()

Next, ensure that nanoarrow is a dependency of the package, since arrow-extendr calls functions from nanoarrow to convert between R and Arrow memory. To do this, run usethis::use_package("nanoarrow"), which adds it to the Imports field of the DESCRIPTION file.

Contributors

era127, josiahparry
