datafusion-rs's Introduction

DataFusion: Big Data Platform for Rust

DataFusion is a distributed data processing platform implemented in Rust. It is very much inspired by Apache Spark and has a similar programming style through the use of DataFrames and SQL.

DataFusion can also be used as a crate dependency in your project if you want to run SQL queries and DataFrame-style data manipulation in-process.

Project Home Page

The project home page is now at https://datafusion.rs

Current Status

There are two working examples. Both run a trivial query against a trivial CSV file using a single thread.

Roadmap

I've started defining milestones and issues in GitHub issues, but here's a high-level summary of the plan with some rough guesses at timescales.

POC (Q1 2018)

For the POC, I want to run a single worker process (preferably Dockerized), send it a query (via JSON), and have it execute that query. This will be sufficient to run some representative (but trivial) workloads to compare with Apache Spark.

The workloads will read and write CSV files from HDFS.
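
As a rough illustration of the "send it a query via JSON" idea, here is a minimal sketch of how a client might build such a request using only the Rust standard library. The struct QueryRequest, its field names (sql, output_path), and the HDFS path are all assumptions made up for this example; they are not DataFusion's actual wire format.

```rust
// Hypothetical request shape: the field names `sql` and `output_path` are
// assumptions for illustration, not DataFusion's actual wire format.
struct QueryRequest {
    sql: String,
    output_path: String,
}

/// Escape a string so it can be placed inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl QueryRequest {
    /// Serialize the request as a single-line JSON object.
    fn to_json(&self) -> String {
        format!(
            "{{\"sql\":\"{}\",\"output_path\":\"{}\"}}",
            escape_json(&self.sql),
            escape_json(&self.output_path)
        )
    }
}

fn main() {
    // Placeholder query and HDFS location, chosen only for the example.
    let req = QueryRequest {
        sql: "SELECT id, name FROM people WHERE id > 10".to_string(),
        output_path: "hdfs://namenode/out/people.csv".to_string(),
    };
    println!("{}", req.to_json());
}
```

A real client would more likely use a serialization crate than hand-written escaping; the manual version is used here only to keep the sketch dependency-free.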

MVP (Q2 2018)

The MVP should be fully deployable, with a good UX and good documentation. It could still lack major features such as JOIN, GROUP BY, and user-defined functions.

1.0 (Q4 2018)

The 1.0 release should be able to support real-world workloads with performance, scalability, and reliability that generally exceed those of Apache Spark.

Contributing

Contributors are welcome! Please see CONTRIBUTING.md for details.
