arrow-js-ffi

Interpret Arrow memory across the WebAssembly boundary without serialization.

Why?

Arrow is a high-performance memory layout for analytical programs. Because Arrow's memory layout is defined to be identical in every implementation, programs that use Arrow in WebAssembly use exactly the same layout that Arrow JS implements! This means we can use plain ArrayBuffers to move highly structured data to and from WebAssembly memory, entirely avoiding serialization.

I wrote an interactive blog post that goes into more detail on why this is useful and how this library implements Arrow's C Data Interface in JavaScript.

Usage

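This package exports three functions: parseField for parsing the ArrowSchema struct into an arrow.Field, parseVector for parsing the ArrowArray struct into an arrow.Vector, and parseRecordBatch for parsing an ArrowArray struct plus an ArrowSchema struct into an arrow.RecordBatch.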

parseField

Parse an ArrowSchema C FFI struct into an arrow.Field instance. The Field is necessary for later using parseVector below.

  • buffer (ArrayBuffer): The ArrayBuffer backing the WebAssembly.Memory instance to read from (that is, its buffer property).
  • ptr (number): The numeric pointer in buffer where the C struct is located.
import { parseField } from "arrow-js-ffi";
const WASM_MEMORY: WebAssembly.Memory = ...
const field = parseField(WASM_MEMORY.buffer, fieldPtr);
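
To make the idea of reading a C struct out of an ArrayBuffer concrete, here is a minimal sketch that decodes a few fields of an ArrowSchema by hand. It is not this library's implementation. It assumes a wasm32 target (4-byte pointers, little-endian, natural alignment), under which the C Data Interface places format at byte 0, name at byte 4, and n_children at byte 24. The helpers readCString and readSchemaHeader and the hand-built fakeMemory buffer are hypothetical and exist only for this illustration.

```typescript
// Byte offsets of the ArrowSchema fields used here, ASSUMING wasm32
// (4-byte pointers, natural alignment, little-endian).
const FORMAT_OFFSET = 0; // const char* format
const NAME_OFFSET = 4; // const char* name
const N_CHILDREN_OFFSET = 24; // int64_t n_children

// Hypothetical helper: decode a NUL-terminated UTF-8 string starting at `ptr`.
function readCString(buffer: ArrayBuffer, ptr: number): string {
  const bytes = new Uint8Array(buffer);
  let end = ptr;
  while (end < bytes.length && bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(ptr, end));
}

// Hypothetical helper: read a few header fields of the ArrowSchema at `ptr`.
function readSchemaHeader(buffer: ArrayBuffer, ptr: number) {
  const view = new DataView(buffer);
  return {
    format: readCString(buffer, view.getUint32(ptr + FORMAT_OFFSET, true)),
    name: readCString(buffer, view.getUint32(ptr + NAME_OFFSET, true)),
    nChildren: Number(view.getBigInt64(ptr + N_CHILDREN_OFFSET, true)),
  };
}

// Stand-in for Wasm memory: a struct at byte 64 pointing at two strings.
const fakeMemory = new ArrayBuffer(256);
const fakeBytes = new Uint8Array(fakeMemory);
fakeBytes.set(new TextEncoder().encode("i\0"), 16); // format "i" means int32
fakeBytes.set(new TextEncoder().encode("col\0"), 32); // field name
const fakeView = new DataView(fakeMemory);
fakeView.setUint32(64 + FORMAT_OFFSET, 16, true);
fakeView.setUint32(64 + NAME_OFFSET, 32, true);
fakeView.setBigInt64(64 + N_CHILDREN_OFFSET, BigInt(0), true);

const header = readSchemaHeader(fakeMemory, 64);
console.log(header.format, header.name, header.nChildren); // i col 0
```

In real use, the buffer would be WASM_MEMORY.buffer and the pointer would come from the Wasm module. parseField does this work for you and also maps the format string to an arrow.DataType.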

parseVector

Parse an ArrowArray C FFI struct into an arrow.Vector instance. Multiple Vector instances can be joined to make an arrow.Table.

  • buffer (ArrayBuffer): The ArrayBuffer backing the WebAssembly.Memory instance to read from (that is, its buffer property).
  • ptr (number): The numeric pointer in buffer where the C struct is located.
  • dataType (arrow.DataType): The type of the vector to parse. This is retrieved from field.type on the result of parseField.
  • copy (boolean): If true, copies data across the Wasm boundary, so the Wasm-side copy can be deleted afterwards. If false, the resulting arrow.Vector objects are views on Wasm memory. Views require careful usage: the arrays become invalid if the underlying memory region in Wasm changes, for example when the memory grows or the data is freed.
import { parseVector } from "arrow-js-ffi";
const WASM_MEMORY: WebAssembly.Memory = ...
const wasmVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type);
// Copy arrays into JS instead of creating views
const copiedVector = parseVector(WASM_MEMORY.buffer, arrayPtr, field.type, true);
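
The difference between a view and a copy can be seen with nothing but a WebAssembly.Memory and typed arrays. The sketch below does not use this library. It stands in for the copy flag with a plain typed-array view (the copy = false case) and a slice() (the copy = true case), and it assumes a non-shared memory, where growing detaches the old ArrayBuffer. The byte offset 16 and the values written are arbitrary.

```typescript
// One 64 KiB page of Wasm linear memory, standing in for a real module's memory.
const memory = new WebAssembly.Memory({ initial: 1 });

// Pretend the Wasm side wrote four int32 values at byte offset 16.
new Int32Array(memory.buffer, 16, 4).set([10, 20, 30, 40]);

// Analogue of copy = false: a typed-array view directly onto Wasm memory.
const viewOnWasm = new Int32Array(memory.buffer, 16, 4);
// Analogue of copy = true: slice() allocates a new buffer owned by JS.
const copyInJs = viewOnWasm.slice();

// Growing a non-shared memory detaches the old ArrayBuffer,
// which invalidates every view created on it.
memory.grow(1);

console.log(viewOnWasm.length); // 0
console.log(Array.from(copyInJs)); // [ 10, 20, 30, 40 ]
```

Any allocation on the Wasm side can trigger such a grow, so views are safest when they are consumed before control returns to Wasm code that may allocate.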

parseRecordBatch

Parse an ArrowArray C FFI struct plus an ArrowSchema C FFI struct into an arrow.RecordBatch instance. Note that the underlying array and field must be of Struct type. In essence, a Struct array stands in for a RecordBatch while remaining a single array: each child of the Struct becomes one column of the batch.

  • buffer (ArrayBuffer): The ArrayBuffer backing the WebAssembly.Memory instance to read from (that is, its buffer property).
  • arrayPtr (number): The numeric pointer in buffer where the ArrowArray C struct is located.
  • schemaPtr (number): The numeric pointer in buffer where the ArrowSchema C struct is located.
  • copy (boolean): If true, copies data across the Wasm boundary, so the Wasm-side copy can be deleted afterwards. If false, the vectors in the resulting arrow.RecordBatch are views on Wasm memory. Views require careful usage: the arrays become invalid if the underlying memory region in Wasm changes, for example when the memory grows or the data is freed.
import { parseRecordBatch } from "arrow-js-ffi";
const WASM_MEMORY: WebAssembly.Memory = ...
// Pass `true` to copy arrays across the boundary instead of creating views.
const recordBatch = parseRecordBatch(WASM_MEMORY.buffer, arrayPtr, schemaPtr, true);
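
As a rough illustration of how a single Struct array carries a whole batch, the sketch below walks the children of an ArrowArray struct by hand and reports each child's length. It is not this library's code. It assumes a wasm32 layout (4-byte pointers, little-endian), which puts length at byte 0, n_children at byte 32, and the children pointer at byte 44. The childLengths helper and the hand-built batchMemory buffer are hypothetical.

```typescript
// ArrowArray field offsets, ASSUMING wasm32 (4-byte pointers, little-endian).
const LENGTH_OFFSET = 0; // int64_t length
const ARRAY_N_CHILDREN_OFFSET = 32; // int64_t n_children
const CHILDREN_OFFSET = 44; // struct ArrowArray** children

// Hypothetical helper: row count of each child (column) of a Struct array.
function childLengths(buffer: ArrayBuffer, arrayPtr: number): number[] {
  const view = new DataView(buffer);
  const nChildren = Number(
    view.getBigInt64(arrayPtr + ARRAY_N_CHILDREN_OFFSET, true),
  );
  const childrenPtr = view.getUint32(arrayPtr + CHILDREN_OFFSET, true);
  const lengths: number[] = [];
  for (let i = 0; i < nChildren; i++) {
    // `children` is an array of pointers, 4 bytes each on wasm32.
    const childPtr = view.getUint32(childrenPtr + i * 4, true);
    lengths.push(Number(view.getBigInt64(childPtr + LENGTH_OFFSET, true)));
  }
  return lengths;
}

// Stand-in memory: pointer list at 16, parent struct at 64,
// child structs at 128 and 192.
const batchMemory = new ArrayBuffer(256);
const batchView = new DataView(batchMemory);
batchView.setUint32(16, 128, true);
batchView.setUint32(20, 192, true);
batchView.setBigInt64(64 + LENGTH_OFFSET, BigInt(3), true);
batchView.setBigInt64(64 + ARRAY_N_CHILDREN_OFFSET, BigInt(2), true);
batchView.setUint32(64 + CHILDREN_OFFSET, 16, true);
batchView.setBigInt64(128 + LENGTH_OFFSET, BigInt(3), true);
batchView.setBigInt64(192 + LENGTH_OFFSET, BigInt(3), true);

const columnLengths = childLengths(batchMemory, 64);
console.log(columnLengths); // [ 3, 3 ]
```

Both children report the same length as the parent, which is what lets the Struct array be read as a batch of equal-length columns.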

Type Support

Most of the unsupported types should be pretty straightforward to implement; they just need some testing.

Primitive Types

  • Null
  • Boolean
  • Int8
  • Uint8
  • Int16
  • Uint16
  • Int32
  • Uint32
  • Int64
  • Uint64
  • Float16
  • Float32
  • Float64

Binary & String

  • Binary
  • Large Binary (Not implemented by Arrow JS but supported by downcasting to Binary.)
  • String
  • Large String (Not implemented by Arrow JS but supported by downcasting to String.)
  • Fixed-width Binary
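
The "large" variants differ from their regular counterparts in using 64-bit offsets instead of 32-bit ones, so downcasting comes down to narrowing the offsets buffer. The following sketch shows one way that narrowing could look. It is an assumption about what downcasting involves, not this library's actual code, and downcastOffsets is a hypothetical helper. It throws when an offset does not fit in a signed 32-bit integer, which is the case where downcasting cannot work.

```typescript
// Hypothetical helper: convert 64-bit "large" offsets into the 32-bit offsets
// used by the regular Binary/String/List layouts. Always allocates a new array.
function downcastOffsets(largeOffsets: BigInt64Array): Int32Array {
  const INT32_MAX = BigInt(2147483647);
  const ZERO = BigInt(0);
  const offsets = new Int32Array(largeOffsets.length);
  largeOffsets.forEach((value, i) => {
    if (value < ZERO || value > INT32_MAX) {
      throw new RangeError(`offset ${value} does not fit in int32`);
    }
    offsets[i] = Number(value);
  });
  return offsets;
}

// Offsets for two values occupying bytes [0, 5) and [5, 11) of the data buffer.
const largeOffsets = BigInt64Array.from([BigInt(0), BigInt(5), BigInt(11)]);
const smallOffsets = downcastOffsets(largeOffsets);
console.log(Array.from(smallOffsets)); // [ 0, 5, 11 ]
```

Because the offsets change width, this step needs its own allocation; a zero-copy view is not possible for the offsets buffer, although the data buffer itself can be left untouched.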

Decimal

  • Decimal128 (failing a test)
  • Decimal256 (failing a test)

Temporal Types

  • Date32
  • Date64
  • Time32
  • Time64
  • Timestamp (with timezone)
  • Duration
  • Interval

Nested Types

  • List
  • Large List (Not implemented by Arrow JS but supported by downcasting to List.)
  • Fixed-size List
  • Struct
  • Map
  • Dense Union
  • Sparse Union
  • Dictionary-encoded arrays

Extension Types

  • Field metadata is preserved.

TODO:

  • Call the release callback on the C structs. This requires figuring out how to call C function pointers from JS.

Contributors

  • kylebarron
