llamas2

Installation

llamas2 is written for the Rust ecosystem.

  • You will need to install Rust. Rustup is suggested.
  • Currently runs on stable, but may shift to using procedural macros, which are currently unstable and require nightly.
  • This crate is unpublished. To use as a dependency, in Cargo.toml:
[dependencies]
llamas2 = { git = "https://github.com/hwchen/llamas2" }

Use

Please see examples.

Implemented

  • heterogeneous datatypes in a table
  • basic support for adding new columns to a table
  • apply method, to apply a function to a column
  • melt macro
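
To make the first three items concrete, here is a minimal sketch of a table whose columns have different types, with column insertion and an apply step. Every name in it (`Column`, `Table`, `add_col`, `apply_int`) is hypothetical and chosen for illustration; this is not the llamas2 API, and the crate's examples remain the reference for real usage.

```rust
// Hypothetical sketch; these names are NOT the llamas2 API.

/// One column: a single type within a column, different types across columns.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int(Vec<i64>),
    Float(Vec<f64>),
    Str(Vec<String>),
}

impl Column {
    /// Number of rows in the column.
    pub fn len(&self) -> usize {
        match self {
            Column::Int(v) => v.len(),
            Column::Float(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }
}

/// A table is a list of named columns of equal length.
#[derive(Debug, Default)]
pub struct Table {
    names: Vec<String>,
    columns: Vec<Column>,
}

impl Table {
    pub fn new() -> Self {
        Table::default()
    }

    /// Add a column; rejects duplicate names and row-count mismatches.
    pub fn add_col(&mut self, name: &str, col: Column) -> Result<(), String> {
        if self.names.iter().any(|n| n == name) {
            return Err(format!("column `{}` already exists", name));
        }
        if let Some(first) = self.columns.first() {
            if first.len() != col.len() {
                return Err(format!("expected {} rows, got {}", first.len(), col.len()));
            }
        }
        self.names.push(name.to_string());
        self.columns.push(col);
        Ok(())
    }

    /// Look up a column by name.
    pub fn col(&self, name: &str) -> Option<&Column> {
        let idx = self.names.iter().position(|n| n == name)?;
        self.columns.get(idx)
    }

    /// Apply `f` to every value of an integer column, in place.
    pub fn apply_int<F: Fn(i64) -> i64>(&mut self, name: &str, f: F) -> Result<(), String> {
        let idx = self
            .names
            .iter()
            .position(|n| n == name)
            .ok_or_else(|| format!("no column `{}`", name))?;
        match &mut self.columns[idx] {
            Column::Int(values) => {
                for v in values.iter_mut() {
                    *v = f(*v);
                }
                Ok(())
            }
            _ => Err(format!("column `{}` is not an integer column", name)),
        }
    }
}
```

In this sketch the enum wraps a whole column rather than each value, so a column cannot mix types, and a type mismatch in `apply_int` surfaces as an `Err` instead of a panic.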

I'm currently implementing only what I need for a proof of concept (see Motivation section below).

Most important at this stage is to make sure that Rust's type and macro systems are flexible enough to implement dynamic dataframes while also tying into performant data structures (without too much pain).

Motivation

My work uses pandas for ETL, which mainly consists of reshaping tables. I would love to be able to use Rust so that I don't have to use Python, which can give me headaches.

Since ETL is the major use case, the focus areas are:

  • reshaping (melting and pivoting)
  • splitting string cols
  • apply (or map to dict)
  • group by
  • filtering
  • reading and writing CSV (perhaps compressed also)
  • generating SQL for creating tables
  • (performance and ergonomics of course)
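
As an illustration of the reshaping item, the following is a rough sketch of a melt (wide to long) over a table of strings. `StrTable` and `melt` are hypothetical names for this example only. The crate implements melt as a macro, and this sketch makes no claim about how that macro works. The output column names `variable` and `value` follow the pandas convention.

```rust
// Hypothetical sketch of melting; not the llamas2 melt macro.

/// A minimal table of strings: a header plus row-major data.
pub struct StrTable {
    pub header: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Keep `id_cols` as-is and turn every other column into
/// (variable, value) pairs, one output row per non-id input cell.
pub fn melt(wide: &StrTable, id_cols: &[&str]) -> Result<StrTable, String> {
    // Resolve the positions of the id columns in the header.
    let mut id_idx = Vec::new();
    for id in id_cols {
        let pos = wide
            .header
            .iter()
            .position(|h| h.as_str() == *id)
            .ok_or_else(|| format!("unknown id column `{}`", id))?;
        id_idx.push(pos);
    }

    // Every remaining column becomes a "variable".
    let value_idx: Vec<usize> = (0..wide.header.len())
        .filter(|i| !id_idx.contains(i))
        .collect();

    let mut header: Vec<String> = id_cols.iter().map(|s| s.to_string()).collect();
    header.push("variable".to_string());
    header.push("value".to_string());

    let mut rows = Vec::new();
    for row in &wide.rows {
        if row.len() != wide.header.len() {
            return Err(format!("row has {} cells, expected {}", row.len(), wide.header.len()));
        }
        for &vi in &value_idx {
            // id values, then the source column name, then the cell value.
            let mut out: Vec<String> = id_idx.iter().map(|&i| row[i].clone()).collect();
            out.push(wide.header[vi].clone());
            out.push(row[vi].clone());
            rows.push(out);
        }
    }
    Ok(StrTable { header, rows })
}
```

With a header of `id, q1, q2` and `id` as the only id column, each input row produces two output rows, one for `q1` and one for `q2`. This sketch emits output one input row at a time; pandas may order the melted rows differently.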

Non-focus areas:

  • numerical computing (but maybe in the future)
  • operations on single rows
  • matching the ease of use of Python/pandas

Design/Influences

This project is most influenced by my time using pandas for ETL. There are some pandas idiosyncrasies (like non-nullable integers!) that I would love to resolve. In that vein, I've been following the development of pandas 2 closely. I'm inspired by the project's focus on performance and ergonomics, and the use of C++ data structures on the backend.

In particular, I want to be able to have an Array representation which combines a null bitvec with a Vec of a primitive type or struct. In order to have the most compact representation (and the best alignment?), I'm trying to design this so that each value is not stored as an enum, as it is in some other libraries.

I've seen e.g. InnerType::Float(x) in Utah, or Nullable::Value(T) and Nullable::Null in brassfibres. In addition to both not having the most compact representation, the usage of InnerType would seem to allow the use of mixed types within a series, which I would not want to allow.
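
A minimal sketch of the layout described above, assuming one validity bit per slot packed into `u64` words next to a dense `Vec<T>`. `NullableArray` and its methods are hypothetical names for illustration, not the crate's actual types, and a real design might well use a dedicated bitvec crate instead of hand-rolled words.

```rust
use std::mem::size_of;

// Hypothetical sketch; not the llamas2 Array type.

/// Dense values plus one validity bit per slot.
pub struct NullableArray<T> {
    values: Vec<T>,     // a null slot holds T::default() as a placeholder
    validity: Vec<u64>, // bit i set => values[i] is present
}

impl<T: Copy + Default> NullableArray<T> {
    pub fn new() -> Self {
        NullableArray { values: Vec::new(), validity: Vec::new() }
    }

    /// Append a value, or a null when `value` is `None`.
    pub fn push(&mut self, value: Option<T>) {
        let i = self.values.len();
        if i % 64 == 0 {
            // Start a new 64-slot validity word.
            self.validity.push(0);
        }
        if value.is_some() {
            self.validity[i / 64] |= 1u64 << (i % 64);
        }
        self.values.push(value.unwrap_or_default());
    }

    /// `None` means the slot is null (or `i` is out of bounds).
    pub fn get(&self, i: usize) -> Option<T> {
        if i >= self.values.len() {
            return None;
        }
        let valid = (self.validity[i / 64] & (1u64 << (i % 64))) != 0;
        if valid { Some(self.values[i]) } else { None }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    /// Number of null slots, computed from the validity bits.
    pub fn null_count(&self) -> usize {
        let set: usize = self.validity.iter().map(|w| w.count_ones() as usize).sum();
        self.values.len() - set
    }
}

/// Bytes per slot for a per-value enum (`Option<f64>`) versus a bare `f64`.
/// The bitmap layout adds one extra bit per slot on top of the second number.
pub fn bytes_per_f64_slot() -> (usize, usize) {
    (size_of::<Option<f64>>(), size_of::<f64>())
}
```

Because `f64` has no spare bit pattern for the compiler to reuse as a tag, `Option<f64>` is larger than `f64` (typically 16 bytes versus 8 on 64-bit targets), which is the overhead the bitmap layout tries to avoid. The element type is also fixed by `T`, so a single array cannot mix types.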

My other influence is from databases. At work we use a columnar database (MonetDB in particular) as a backend to an OLAP service. My desire to learn more about databases also led me to Bradfield, where I took a computer architecture course and a databases course. As the project for the databases course, I also wrote a toy SQL database executor in Rust (link).

Other dataframe/etl projects

  • datafusion has a dataframe-like representation, but is meant to be used with SQL and a query planner on the frontend.
  • brassfibres
  • utah
