Rover

Simple, powerful data frames for Ruby

⛰️ Designed for data exploration and machine learning, and powered by Numo

🌲 Uses Vega for visualization

Installation

Add this line to your application’s Gemfile:

gem "rover-df"

Intro

A data frame is an in-memory table. It’s a useful data structure for data analysis and machine learning. It uses columnar storage for fast operations on columns.
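To show why column-oriented storage suits column operations, here is a plain-Ruby sketch that stores the same small table both ways. It does not use Rover, and `ROWS`, `COLUMNS`, `sum_from_rows`, `sum_from_columns`, and `rows_to_columns` are names made up for this example. Rover keeps its columns in Numo arrays rather than Ruby arrays, so treat this only as an illustration of the layout.

```ruby
# Row-oriented layout: one hash per row.
ROWS = [
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
].freeze

# Column-oriented layout: one array per column.
COLUMNS = {
  a: [1, 2, 3],
  b: ["one", "two", "three"]
}.freeze

# Summing a column from rows visits every row hash and looks up the key each time.
def sum_from_rows(rows, key)
  rows.sum { |row| row[key] }
end

# With columns, the values already sit together in a single array.
def sum_from_columns(columns, key)
  columns[key].sum
end

# Convert the row layout into the column layout.
def rows_to_columns(rows)
  rows.first.keys.to_h { |key| [key, rows.map { |row| row[key] }] }
end

puts sum_from_rows(ROWS, :a)       # prints 6
puts sum_from_columns(COLUMNS, :a) # prints 6
```

Both layouts hold the same data. The difference is that a whole-column operation touches one array in the columnar layout instead of every row.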

Try it out for forecasting by clicking the button below (it can take a few minutes to start):

Binder

Use the Run button (or SHIFT + ENTER) to run each line.

Creating Data Frames

From an array

Rover::DataFrame.new([
  {a: 1, b: "one"},
  {a: 2, b: "two"},
  {a: 3, b: "three"}
])

From a hash

Rover::DataFrame.new({
  a: [1, 2, 3],
  b: ["one", "two", "three"]
})

From Active Record

Rover::DataFrame.new(User.all)

From a CSV

Rover.read_csv("file.csv")
# or
Rover.parse_csv("CSV,data,string")

From Parquet (requires the red-parquet gem)

Rover.read_parquet("file.parquet")
# or
Rover.parse_parquet("PAR1...")

Attributes

Get number of rows

df.count

Get column names

df.keys

Check if a column exists

df.include?(name)

Selecting Data

Select a column

df[:a]

Note that strings and symbols are different keys, just as they are in a Ruby hash. Data frames created from Active Record, a CSV, or Parquet use string keys.
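The string-versus-symbol distinction is the same one a plain Ruby hash makes. The sketch below uses only core Ruby. `symbolize_column_names` is a hypothetical helper written for this example, not part of Rover.

```ruby
# A plain Ruby hash treats "a" and :a as two unrelated keys.
columns = {"a" => [1, 2, 3]}

columns.key?("a") # => true
columns.key?(:a)  # => false, because the symbol is a different key

# Hypothetical helper: return a copy of the hash with every key converted to a symbol.
def symbolize_column_names(columns)
  columns.transform_keys(&:to_sym)
end

symbolized = symbolize_column_names(columns)
symbolized.key?(:a) # => true
```

If a lookup on a data frame loaded from a CSV comes back empty, check whether the column name should be a string rather than a symbol.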

Select multiple columns

df[[:a, :b]]

Select first rows

df.head
# or
df.first(5)

Select last rows

df.tail
# or
df.last(5)

Select rows by index

df[1]
# or
df[1..3]
# or
df[[1, 4, 5]]

Iterate over rows

df.each_row { |row| ... }

Iterate over a column

df[:a].each { |item| ... }
# or
df[:a].each_with_index { |item, index| ... }

Filtering

Filter on a condition

df[df[:a] == 100]
df[df[:a] != 100]
df[df[:a] > 100]
df[df[:a] >= 100]
df[df[:a] < 100]
df[df[:a] <= 100]

In

df[df[:a].in?([1, 2, 3])]
df[df[:a].in?(1..3)]
df[df[:a].in?(["a", "b", "c"])]

Not in

df[!df[:a].in?([1, 2, 3])]

And, or, and exclusive or

df[(df[:a] > 100) & (df[:b] == "one")] # and
df[(df[:a] > 100) | (df[:b] == "one")] # or
df[(df[:a] > 100) ^ (df[:b] == "one")] # xor
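Conceptually, each comparison above produces one true/false value per row, and `&`, `|`, and `^` combine those values element by element before the rows are selected. The plain-Ruby sketch below mimics that idea on arrays. `filter_rows` and `combine_masks` are hypothetical helpers for this example and say nothing about how Rover implements filtering internally.

```ruby
# Hypothetical helper: keep the rows whose mask entry is true.
def filter_rows(rows, mask)
  rows.select.with_index { |_row, index| mask[index] }
end

# Hypothetical helper: combine two masks element by element with :&, :| or :^.
def combine_masks(left, right, operator)
  left.zip(right).map { |l, r| l.public_send(operator, r) }
end

rows = [
  {a: 50,  b: "one"},
  {a: 150, b: "one"},
  {a: 200, b: "two"}
]

a_mask = rows.map { |row| row[:a] > 100 }    # [false, true, true]
b_mask = rows.map { |row| row[:b] == "one" } # [true, true, false]

filter_rows(rows, combine_masks(a_mask, b_mask, :&)) # only the a: 150 row
filter_rows(rows, combine_masks(a_mask, b_mask, :^)) # the a: 50 and a: 200 rows
```

The parentheses in the Rover examples matter for the same reason they would here: Ruby's `&`, `|`, and `^` bind more tightly than comparison operators such as `>` and `==`.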

Operations

Basic operations

df[:a] + 5
df[:a] - 5
df[:a] * 5
df[:a] / 5
df[:a] % 5
df[:a] ** 2

Summary statistics

df[:a].count
df[:a].sum
df[:a].mean
df[:a].median
df[:a].percentile(90)
df[:a].min
df[:a].max
df[:a].std
df[:a].var

Count occurrences

df[:a].tally

Cross tabulation

df[:a].crosstab(df[:b])
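A cross tabulation counts how often each pair of values from two columns occurs together. The plain-Ruby sketch below shows only that counting idea. `crosstab_counts` is a hypothetical helper, and the nested-hash result is an assumption made for this example, not the shape Rover returns.

```ruby
# Hypothetical helper: count how often each (a, b) pair occurs,
# returned as a nested hash of the form {a_value => {b_value => count}}.
def crosstab_counts(a_values, b_values)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  a_values.zip(b_values).each { |a, b| counts[a][b] += 1 }
  counts
end

a = ["x", "x", "y", "y", "y"]
b = ["red", "blue", "red", "red", "blue"]

crosstab_counts(a, b)
# => {"x"=>{"red"=>1, "blue"=>1}, "y"=>{"red"=>2, "blue"=>1}}
```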

Grouping

Group

df.group(:a).count

Works with all summary statistics

df.group(:a).max(:b)

Multiple groups

df.group(:a, :b).count
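For readers who want to see what grouping computes, the plain-Ruby sketch below reproduces the idea with `group_by` on an array of hashes. `group_count` and `group_max` are hypothetical helpers written for this example. The hash results are an assumption for illustration and are not the objects Rover returns.

```ruby
# Hypothetical helper: number of rows for each distinct combination of the given keys.
def group_count(rows, *keys)
  rows.group_by { |row| row.values_at(*keys) }.transform_values(&:size)
end

# Hypothetical helper: largest value of `column` within each group of `key`.
def group_max(rows, key, column)
  rows.group_by { |row| row[key] }
      .transform_values { |group| group.map { |row| row[column] }.max }
end

rows = [
  {a: 1, b: 10},
  {a: 1, b: 30},
  {a: 2, b: 20}
]

group_count(rows, :a)     # => {[1]=>2, [2]=>1}
group_max(rows, :a, :b)   # => {1=>30, 2=>20}
group_count(rows, :a, :b) # => {[1, 10]=>1, [1, 30]=>1, [2, 20]=>1}
```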

Visualization

Add Vega to your application’s Gemfile:

gem "vega"

And use:

df.plot(:a, :b)

Specify the chart type (line, pie, column, bar, area, or scatter)

df.plot(:a, :b, type: "pie")

Group data

df.plot(:a, :b, group: :c)

Stacked columns or bars

df.plot(:a, :b, group: :c, stacked: true)

Updating Data

Add a new column

df[:a] = 1
# or
df[:a] = [1, 2, 3]

Update a single element

df[:a][0] = 100

Update multiple elements

df[:a][0..2] = 1
# or
df[:a][0..2] = [1, 2, 3]

Update all elements

df[:a] = df[:a].map { |v| v.gsub("a", "b") }
# or
df[:a].map! { |v| v.gsub("a", "b") }

Update elements matching a condition

df[:a][df[:a] > 100] = 0

Clamp

df[:a].clamp!(0, 100)

Delete columns

df.delete(:a)
# or
df.except!(:a, :b)

Rename a column

df[:new_a] = df.delete(:a)

Sort rows

df.sort_by! { |r| r[:a] }

Clear all data

df.clear

Combining Data Frames

Add rows

df.concat(other_df)

Add columns

df.merge!(other_df)

Inner join

df.inner_join(other_df)
# or
df.inner_join(other_df, on: :a)
# or
df.inner_join(other_df, on: [:a, :b])
# or
df.inner_join(other_df, on: {df_col: :other_df_col})

Left join

df.left_join(other_df)
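The difference between the two joins is which rows survive. An inner join keeps only rows whose key appears on both sides, while a left join keeps every row from the left side. The plain-Ruby sketch below illustrates this on arrays of hashes. `inner_join_rows` and `left_join_rows` are hypothetical helpers, and filling unmatched columns with `nil` is an assumption for this example, not a statement about how Rover represents missing values.

```ruby
# Hypothetical helper: keep only left rows that have at least one match on `key`.
def inner_join_rows(left, right, key)
  right_by_key = right.group_by { |row| row[key] }
  left.flat_map do |l|
    (right_by_key[l[key]] || []).map { |r| l.merge(r) }
  end
end

# Hypothetical helper: keep every left row; unmatched rows get nil for the right-hand columns.
def left_join_rows(left, right, key)
  right_by_key = right.group_by { |row| row[key] }
  right_columns = right.flat_map(&:keys).uniq - [key]
  blank = right_columns.to_h { |column| [column, nil] }
  left.flat_map do |l|
    matches = right_by_key[l[key]]
    matches ? matches.map { |r| l.merge(r) } : [l.merge(blank)]
  end
end

left  = [{a: 1, b: "one"}, {a: 2, b: "two"}, {a: 3, b: "three"}]
right = [{a: 1, c: "x"}, {a: 3, c: "z"}]

inner_join_rows(left, right, :a).size # => 2 (the a: 2 row is dropped)
left_join_rows(left, right, :a).size  # => 3 (the a: 2 row is kept with c: nil)
```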

Encoding

One-hot encoding

df.one_hot

Drop a variable in each category to avoid the dummy variable trap

df.one_hot(drop: true)
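The dummy variable trap arises because the indicator columns for one category always sum to 1 in every row, so any one of them can be derived from the others. Dropping one removes that redundancy. The plain-Ruby sketch below shows the effect. `one_hot_column` is a hypothetical helper, and both the `name_value` column naming and the choice to drop the first-seen category are assumptions made for this example, not Rover's documented behavior.

```ruby
# Hypothetical helper: build 0/1 indicator columns for one categorical column.
# With drop: true, the first-seen category is left out.
def one_hot_column(name, values, drop: false)
  categories = values.uniq
  categories = categories.drop(1) if drop
  categories.to_h do |category|
    ["#{name}_#{category}", values.map { |value| value == category ? 1 : 0 }]
  end
end

colors = ["red", "blue", "red"]

one_hot_column("color", colors)
# => {"color_red"=>[1, 0, 1], "color_blue"=>[0, 1, 0]}

one_hot_column("color", colors, drop: true)
# => {"color_blue"=>[0, 1, 0]}
```

With the dropped category, a row of all zeros now stands for "red", so no information is lost.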

Conversion

Array of hashes

df.to_a

Hash of arrays

df.to_h

Numo array

df.to_numo

CSV

df.to_csv

Parquet (requires the red-parquet gem)

df.to_parquet

Types

You can specify column types when creating a data frame

Rover::DataFrame.new(data, types: {"a" => :int64, "b" => :float64})

Or

Rover.read_csv("data.csv", types: {"a" => :int64, "b" => :float64})

Supported types are:

  • boolean - :bool
  • float - :float64, :float32
  • integer - :int64, :int32, :int16, :int8
  • unsigned integer - :uint64, :uint32, :uint16, :uint8
  • object - :object
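When choosing among the integer types, it can help to know the range each one holds. The plain-Ruby sketch below computes those ranges from the bit widths in the type names. `INTEGER_BITS`, `integer_range`, and `smallest_integer_type` are hypothetical names for this example and are not part of Rover.

```ruby
# Bit width of each integer type listed above, ordered from narrowest to widest.
INTEGER_BITS = {
  int8: 8, uint8: 8,
  int16: 16, uint16: 16,
  int32: 32, uint32: 32,
  int64: 64, uint64: 64
}.freeze

# Hypothetical helper: the range of values a fixed-width integer type can hold.
def integer_range(type)
  bits = INTEGER_BITS.fetch(type)
  if type.to_s.start_with?("u")
    0..(2**bits - 1)
  else
    -(2**(bits - 1))..(2**(bits - 1) - 1)
  end
end

# Hypothetical helper: the first (narrowest) type whose range covers all the values.
def smallest_integer_type(values)
  min, max = values.minmax
  INTEGER_BITS.keys.find do |type|
    range = integer_range(type)
    range.cover?(min) && range.cover?(max)
  end
end

integer_range(:int8)             # => -128..127
integer_range(:uint8)            # => 0..255
smallest_integer_type([0, 200])  # => :uint8
smallest_integer_type([-1, 200]) # => :int16
```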

Get column types

df.types

For a specific column

df[:a].type

Change the type of a column

df[:a] = df[:a].to(:int32)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/rover.git
cd rover
bundle install
bundle exec rake test
