BigArrays.jl

Store and access large Julia arrays using different backends.

Features

serverless, clients do IO directly
arbitrary subset cutout (writes should be aligned to the chunk size)
extensible with multiple backends
arbitrary shape, the dataset boundary can be curve-like
arbitrary dataset size (in theory; tested with a dataset of ~9 TB)
chunk compression with gzip/blosclz/jpeg
highly scalable due to the serverless design
arbitrary data type
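
To make the chunk-alignment requirement for writes concrete, here is a minimal sketch of an alignment check. The helper `is_chunk_aligned` is hypothetical and not part of BigArrays; it assumes 1-based indices and a zero dataset offset.

```julia
# Hypothetical helper, not part of BigArrays: true when every range starts on a
# chunk boundary and covers a whole number of chunks.
function is_chunk_aligned(ranges, chunksize)
    all(zip(ranges, chunksize)) do (r, c)
        (first(r) - 1) % c == 0 && length(r) % c == 0
    end
end

is_chunk_aligned((1:128, 65:128), (64, 64))  # both ranges sit on 64-element boundaries
is_chunk_aligned((10:73, 1:64), (64, 64))    # first range starts inside a chunk
```

Reads do not need this check, since a cutout can cover any subset of the chunks it touches.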

Supported backends

AWS S3
Google Cloud Storage
Local HDF5 files

Installation

Pkg.clone("https://github.com/jingpengwu/AWS.jl.git")
Pkg.clone("https://github.com/jingpengwu/GoogleCloud.jl.git")
Pkg.clone("https://github.com/seung-lab/BigArrays.jl.git")
Pkg.clone("https://github.com/seung-lab/S3Dicts.jl.git")
Pkg.clone("https://github.com/seung-lab/GSDicts.jl.git")

Usage

BigArrays places no limit on dataset size. If the index range you read lies outside the range of the existing files, an array filled with zeros is returned.
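
The zero-fill behaviour can be pictured with a toy 1-D store. This is only a sketch under assumptions, not the BigArrays implementation: the function `cutout`, the `Dict` layout, and the chunk size are all made up for illustration.

```julia
# Hypothetical sketch: chunks live in a Dict keyed by the index of their first
# element; chunks that were never written are simply absent.
chunk = 4
store = Dict{Int,Vector{UInt8}}(1 => fill(0x07, chunk))  # only indices 1:4 exist

function cutout(store, r::UnitRange{Int}, chunk::Int)
    out = zeros(UInt8, length(r))                # anything not found stays zero
    for (k, i) in enumerate(r)
        start = fld(i - 1, chunk) * chunk + 1    # first index of the chunk holding i
        if haskey(store, start)
            out[k] = store[start][i - start + 1]
        end
    end
    return out
end

cutout(store, 3:6, chunk)  # indices 3 and 4 come from the stored chunk, 5 and 6 read as zero
```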

Use the HDF5 files backend

using BigArrays.H5sBigArrays
ba = H5sBigArray("/directory/of/hdf5/files/");
# use it as normal array

ba[101:200, 201:300, 1:3] = rand(UInt8, 100,100,3)
@show ba[101:200, 201:300, 1:3]

Use the AWS S3 backend

Set up the info file

The info file is a JSON file that defines the full configuration of the dataset. Its format is defined by Neuroglancer.
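
As a rough picture of what such a file contains, the sketch below writes one as a Julia `Dict` that maps one-to-one onto the JSON. The field names follow the Neuroglancer precomputed `info` layout; every value is made up, and which of these fields BigArrays actually reads is an assumption.

```julia
# Illustrative values only; serialize with a JSON package of your choice.
info = Dict(
    "type"         => "image",
    "data_type"    => "uint8",
    "num_channels" => 1,
    "scales"       => [
        Dict(
            "key"          => "4_4_40",         # directory name of this scale
            "size"         => [2048, 2048, 256],
            "resolution"   => [4, 4, 40],       # nanometres per voxel
            "voxel_offset" => [0, 0, 0],
            "chunk_sizes"  => [[64, 64, 64]],
            "encoding"     => "raw",
        ),
    ],
)

info["scales"][1]["chunk_sizes"][1]  # the chunk size that writes should align to
```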

test example

Use the Google Cloud Storage backend

The info configuration file is the same as for the S3 backend.

test example

Development

BigArrays is a high-level architecture that turns a key-value store (backend) into a Julia array (frontend). It provides an AbstractArray interface by implementing the getindex and setindex! functions.
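
The frontend idea can be sketched with a toy 1-D array backed by a plain `Dict`. The type `ChunkedVector` is hypothetical and much simpler than BigArrays itself; it only shows how scalar `getindex` and `setindex!` are enough for the rest of the AbstractArray machinery to work.

```julia
# Hypothetical toy frontend, not the BigArrays implementation.
struct ChunkedVector{T} <: AbstractVector{T}
    store::Dict{Int,Vector{T}}   # backend: chunk number => chunk contents
    chunk::Int                   # elements per chunk
    len::Int                     # logical length of the array
end

Base.size(a::ChunkedVector) = (a.len,)
Base.IndexStyle(::Type{<:ChunkedVector}) = IndexLinear()

function Base.getindex(a::ChunkedVector{T}, i::Int) where T
    c, off = fldmod1(i, a.chunk)                 # chunk number and offset inside it
    haskey(a.store, c) ? a.store[c][off] : zero(T)
end

function Base.setindex!(a::ChunkedVector{T}, v, i::Int) where T
    c, off = fldmod1(i, a.chunk)
    block = get!(() -> zeros(T, a.chunk), a.store, c)  # create the chunk on first write
    block[off] = v
    return a
end

a = ChunkedVector(Dict{Int,Vector{UInt8}}(), 4, 10)
a[6] = 0x09
a[5:7]  # range indexing is provided by the generic AbstractArray methods
```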

Add new backend

The backends are different key-value stores. To add a new backend, you can simply do the following:

Wrap the key-value store as a Julia Associative type. S3Dicts is a good example.
Implement the getindex and setindex! functions. S3Dicts example
Make sure that the key-value store has a configDict field containing the block size and data type.
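
The three steps above can be sketched with an in-memory backend. Everything here is hypothetical: the type `MemDict`, the key string, and the `:chunkSize` / `:dataType` entries are assumed names; only the `configDict` field name comes from the text. On Julia 0.7 and later the abstract type is called `AbstractDict`; older releases named it `Associative`.

```julia
# Hypothetical in-memory backend, for illustration only.
struct MemDict <: AbstractDict{String,Vector{UInt8}}
    data::Dict{String,Vector{UInt8}}   # key => raw chunk bytes
    configDict::Dict{Symbol,Any}       # block size and data type (key names assumed)
end

Base.getindex(d::MemDict, key::String) = d.data[key]
Base.setindex!(d::MemDict, value, key::String) = (d.data[key] = value; d)
Base.get(d::MemDict, key, default) = get(d.data, key, default)

# Length and iteration let generic AbstractDict code (printing, keys, values) work.
Base.length(d::MemDict) = length(d.data)
Base.iterate(d::MemDict, state...) = iterate(d.data, state...)

d = MemDict(Dict{String,Vector{UInt8}}(),
            Dict{Symbol,Any}(:chunkSize => (4, 4, 2), :dataType => UInt8))
d["chunk-0"] = zeros(UInt8, 4 * 4 * 2)   # the key format is arbitrary here
d.configDict[:chunkSize]
```

A real backend would replace the inner `Dict` with calls to the storage service, keeping the same three methods as the contract with the frontend.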
