ChimpDB

This project is in a VERY early stage.

ChimpDB is a distributed database with built-in machine learning capabilities. To achieve the best possible performance, ChimpDB does not rely on any existing database technologies: the entire stack (storage, replication, networking, API) will be built from scratch.

Building

ChimpDB comes with a Vagrant configuration file. The code has been tested on an Ubuntu Precise 64-bit machine. Launching the Vagrant machine installs all build dependencies. To build:

cd /vagrant
./autogen.sh
./configure
make
make test (optional)
make install (optional)

Running

Launching ChimpDB is as easy as running chimp in your shell.

Example

Suppose we want to perform linear regression on the famous iris dataset. The following example uses Python:

>>> import pychimp
>>> c = pychimp.Client(('localhost', 8000))    # instantiate ChimpDB client

Suppose we only have data for the setosa class:

>>> data = [
# sepalLength, sepalWidth, petalLength, petalWidth, species
5.1,3.5,1.4,0.2,1.0,
4.9,3.0,1.4,0.2,1.0,
4.7,3.2,1.3,0.2,1.0,
4.6,3.1,1.5,0.2,1.0,
5.0,3.6,1.4,0.2,1.0,
5.4,3.9,1.7,0.4,1.0,
4.6,3.4,1.4,0.3,1.0,
5.0,3.4,1.5,0.2,1.0,
4.4,2.9,1.4,0.2,1.0,
4.9,3.1,1.5,0.1,1.0,
5.4,3.7,1.5,0.2,1.0,
4.8,3.4,1.6,0.2,1.0,
4.8,3.0,1.4,0.1,1.0,
4.3,3.0,1.1,0.1,1.0,
5.8,4.0,1.2,0.2,1.0,
5.7,4.4,1.5,0.4,1.0,
5.4,3.9,1.3,0.4,1.0,
5.1,3.5,1.4,0.3,1.0,
5.7,3.8,1.7,0.3,1.0,
5.1,3.8,1.5,0.3,1.0,
5.4,3.4,1.7,0.2,1.0,
5.1,3.7,1.5,0.4,1.0,
4.6,3.6,1.0,0.2,1.0,
5.1,3.3,1.7,0.5,1.0,
4.8,3.4,1.9,0.2,1.0,
5.0,3.0,1.6,0.2,1.0,
5.0,3.4,1.6,0.4,1.0,
5.2,3.5,1.5,0.2,1.0,
5.2,3.4,1.4,0.2,1.0,
4.7,3.2,1.6,0.2,1.0,
4.8,3.1,1.6,0.2,1.0,
5.4,3.4,1.5,0.4,1.0,
5.2,4.1,1.5,0.1,1.0,
5.5,4.2,1.4,0.2,1.0,
4.9,3.1,1.5,0.1,1.0,
5.0,3.2,1.2,0.2,1.0,
5.5,3.5,1.3,0.2,1.0,
4.9,3.1,1.5,0.1,1.0,
4.4,3.0,1.3,0.2,1.0,
5.1,3.4,1.5,0.2,1.0,
5.0,3.5,1.3,0.3,1.0,
4.5,2.3,1.3,0.3,1.0,
4.4,3.2,1.3,0.2,1.0,
5.0,3.5,1.6,0.6,1.0,
5.1,3.8,1.9,0.4,1.0,
4.8,3.0,1.4,0.3,1.0,
5.1,3.8,1.6,0.2,1.0,
4.6,3.2,1.4,0.2,1.0,
5.3,3.7,1.5,0.2,1.0,
5.0,3.3,1.4,0.2,1.0,
]

Load the data into ChimpDB:

# Create a new dataset
>>> c.dsnew("iris", 5)
>>> c.dsappend("iris", data)

Next, let's create two linear regression models: one that models sepal length from sepal width, and another that models sepal length from petal width.

>>> c.modnew("model1", "linear_regression")
>>> c.modnew("model2", "linear_regression")

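Now build the models. The features and the response are given as zero-based column indices into the dataset (column 0 is sepal length, 1 is sepal width, and 3 is petal width):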

>>> c.modbuild("model1", {"dataset_name": "iris", "features": [1], "response": 0})
>>> c.modbuild("model2", {"dataset_name": "iris", "features": [3], "response": 0})

Predicting is very simple:

>>> c.modpredict("model1", [3.4, 4.1])
[200, {'responses': [4.993564620877732, 5.477162697854874]}, None]

In the example above, we predict the sepal length for two data points, with sepal widths of 3.4 and 4.1.
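The example does not say how the linear_regression model is fitted. Assuming it is ordinary least squares with an intercept (an assumption, not confirmed here), model1 is the simple regression of sepal length on sepal width, which is easy to reproduce locally as a sanity check. In the sketch below, `column` and `fit_simple_linear_regression` are hypothetical helpers, and only the first four setosa rows are used so the block is self-contained; calling `column(data, 5, 1)` and `column(data, 5, 0)` on the full `data` list gives the inputs model1 is built from.

```python
def column(flat, num_columns, index):
    """Return one column from a flat, row-major list of values."""
    if len(flat) % num_columns != 0:
        raise ValueError("list length is not a multiple of num_columns")
    # Every num_columns-th value, starting at the requested column.
    return flat[index::num_columns]


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("the feature has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# The first four setosa rows from the example, laid out like `data`.
sample = [
    5.1, 3.5, 1.4, 0.2, 1.0,
    4.9, 3.0, 1.4, 0.2, 1.0,
    4.7, 3.2, 1.3, 0.2, 1.0,
    4.6, 3.1, 1.5, 0.2, 1.0,
]
xs = column(sample, 5, 1)   # feature: sepal width (column 1)
ys = column(sample, 5, 0)   # response: sepal length (column 0)
intercept, slope = fit_simple_linear_regression(xs, ys)
predictions = [intercept + slope * x for x in [3.4, 4.1]]
print(predictions)
```

With only four rows the fit is rough; the block illustrates the computation rather than reproducing the numbers returned above.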

Interfacing With ChimpDB

ChimpDB works over TCP/IP and uses MessagePack for data transport. MessagePack has a small traffic footprint and is relatively easy to parse, so writing a ChimpDB client should be a breeze in any language.
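Because MessagePack is a published, language-neutral format, the bytes a client sends can be worked out by hand. The sketch below is a minimal, hypothetical `packb` encoder for only the value types the protocol examples use (nil, booleans, integers, floats, strings, arrays and maps); it is for illustration, and a real client would normally use an existing MessagePack library. How ChimpDB delimits messages on the TCP stream is not described here; since MessagePack values are self-delimiting, writing encoded messages back to back is one plausible design, but that is an assumption.

```python
import struct


def packb(obj) -> bytes:
    """Encode a value using a small subset of the MessagePack format."""
    if obj is None:
        return b"\xc0"                                          # nil
    if obj is True:
        return b"\xc3"                                          # true
    if obj is False:
        return b"\xc2"                                          # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack("B", obj)                        # positive fixint
        if -32 <= obj < 0:
            return struct.pack("b", obj)                        # negative fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack("B", obj)              # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)             # uint 16
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)             # uint 32
        return b"\xd3" + struct.pack(">q", obj)                 # int 64
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)                 # float 64
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return struct.pack("B", 0xA0 | len(raw)) + raw      # fixstr
        if len(raw) <= 0xFFFF:
            return b"\xda" + struct.pack(">H", len(raw)) + raw  # str 16
        return b"\xdb" + struct.pack(">I", len(raw)) + raw      # str 32
    if isinstance(obj, (list, tuple)):
        body = b"".join(packb(item) for item in obj)
        if len(obj) <= 15:
            return struct.pack("B", 0x90 | len(obj)) + body     # fixarray
        if len(obj) <= 0xFFFF:
            return b"\xdc" + struct.pack(">H", len(obj)) + body # array 16
        return b"\xdd" + struct.pack(">I", len(obj)) + body     # array 32
    if isinstance(obj, dict):
        body = b"".join(packb(k) + packb(v) for k, v in obj.items())
        if len(obj) <= 15:
            return struct.pack("B", 0x80 | len(obj)) + body     # fixmap
        if len(obj) <= 0xFFFF:
            return b"\xde" + struct.pack(">H", len(obj)) + body # map 16
        return b"\xdf" + struct.pack(">I", len(obj)) + body     # map 32
    raise TypeError(f"cannot encode {type(obj).__name__}")


print(packb(["PING"]))                      # b'\x91\xa4PING'
print(packb(["DSNEW", "dataset_name", 5]))  # b'\x93\xa5DSNEW\xacdataset_name\x05'
```

Negative integers below -32 always use int 64 here, which is valid MessagePack but not the most compact encoding.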

The Protocol

ChimpDB uses a message-based protocol. Each message is an array of items.

Every request message consists of at least one item, the command name, e.g.:

["PING"]
["DSNEW", "dataset_name", 5]

In the example above there are two commands: PING and DSNEW. The PING command takes no arguments, whereas DSNEW takes two additional parameters. Response messages are also arrays, but always have exactly three items:

[200, None, None]
[400, None, "error message"]

The first item in the response is the response code. Response codes fall into three categories: success (2xx), user error (4xx), and server error (5xx). The second item is the response data, if any, and the last item is the error message, if any.
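Since every response has the same three-item shape, a client can check the response code in one place and turn error codes into exceptions. The sketch below is a hypothetical helper, not part of pychimp (the `modpredict` call in the example returns the raw triple): `unwrap_response` returns the data item for 2xx codes and raises `ChimpUserError` or `ChimpServerError` for 4xx and 5xx codes.

```python
class ChimpError(Exception):
    """Hypothetical base error for a non-2xx ChimpDB response."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


class ChimpUserError(ChimpError):
    """4xx: the request itself was invalid."""


class ChimpServerError(ChimpError):
    """5xx: the server failed while handling the request."""


def unwrap_response(response):
    """Return the data item of a [code, data, error] response, or raise."""
    if not isinstance(response, (list, tuple)) or len(response) != 3:
        raise ValueError(f"expected a 3-item response, got {response!r}")
    code, data, error = response
    if 200 <= code < 300:
        return data
    if 400 <= code < 500:
        raise ChimpUserError(code, error)
    if 500 <= code < 600:
        raise ChimpServerError(code, error)
    raise ValueError(f"unknown response code: {code}")


print(unwrap_response([200, {"responses": [4.99, 5.48]}, None]))
```

Both error classes derive from `ChimpError`, so callers can catch either a specific category or any failed request.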

Commands

  • PING - Pings the server, a noop command.
  • SHUTDOWN - Shuts down the server.
  • DSNEW <name> <num columns> - Creates a new dataset with the specified number of columns.
  • DSLIST - Lists all available datasets.
  • MODNEW <model name> <model type> - Creates a new model. Currently linear_regression is the only valid model type.
  • MODBUILD <model name> [<model specific parameters>] - Builds the given model.
  • MODPREDICT <model name> [<model specific data>] - Uses the given model to make a prediction.
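The command list maps naturally onto a thin client. The sketch below is hypothetical (the class name `SketchClient` and the pluggable `transport` callable are illustrations, not pychimp's API): each method builds the request array for one command and hands it to a transport, which in a real client would serialize the array with MessagePack and exchange it over TCP. Passing the MODBUILD parameters as a single map mirrors the `c.modbuild(...)` call in the example, but how pychimp actually encodes them is an assumption. The transport here is a fake that records requests, so the sketch runs without a server.

```python
from typing import Any, Callable, List

# A transport sends one request array and returns one [code, data, error] array.
Transport = Callable[[List[Any]], List[Any]]


class SketchClient:
    """Hypothetical client: one method per command, pluggable transport."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def _call(self, command: str, *args: Any) -> List[Any]:
        # Every request is an array whose first item is the command name.
        return self._transport([command, *args])

    def ping(self) -> List[Any]:
        return self._call("PING")

    def dslist(self) -> List[Any]:
        return self._call("DSLIST")

    def dsnew(self, name: str, num_columns: int) -> List[Any]:
        return self._call("DSNEW", name, num_columns)

    def modnew(self, model_name: str, model_type: str) -> List[Any]:
        return self._call("MODNEW", model_name, model_type)

    def modbuild(self, model_name: str, params: dict) -> List[Any]:
        # Assumption: the model-specific parameters travel as one map.
        return self._call("MODBUILD", model_name, params)


# A fake transport that records requests instead of talking to a server.
sent: List[List[Any]] = []


def fake_transport(request: List[Any]) -> List[Any]:
    sent.append(request)
    return [200, None, None]


client = SketchClient(fake_transport)
client.dsnew("iris", 5)
client.modbuild("model1", {"dataset_name": "iris", "features": [1], "response": 0})
print(sent)
```

SHUTDOWN and the prediction command would follow the same pattern; keeping the transport separate lets the request-building logic be tested without a running server.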
