Sequins!!!

Sequins is a dead-simple static database. It indexes and serves SequenceFiles over HTTP, so it's perfect for serving data created with Hadoop.

Installing

There are tarballs on the releases page. There's also a docker image, if you're into that.

Building

To create a sequins binary (you'll need go on your path):

$ git clone https://github.com/stripe/sequins
$ cd sequins
$ make

Or, to install a binary to $GOPATH/bin:

$ make install

Usage

$ sequins -b ':9599' -cr 1m hdfs://namenode:8020/path/to/mydata

That tells sequins to load your data from HDFS, check for new versions every minute, and bind to port 9599 to listen for requests. The URL can point to HDFS or S3, or it can just be a local path.

Sequins expects your data to be versioned. Inside the top-level directory you specify, you should have one subdirectory per version, like this:

/mydata/
  version0/
    part-00000
    part-00001
    ...
  version1/
    ...

The version names can be timestamps, dates, or anything else; sequins will automatically choose whichever version is the greatest in lexicographical order.
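To make the selection rule concrete, here is a minimal Python sketch of picking the greatest name in lexicographical order. The helper name `pick_latest_version` is hypothetical and this is not sequins's own code; it uses Python's default string comparison as a stand-in, which should agree with byte order for plain ASCII names.

```python
def pick_latest_version(names):
    """Return the greatest name in lexicographical order, or None if empty.

    Hypothetical helper for illustration only. Python compares strings by
    code point, which matches byte order for plain ASCII names.
    """
    return max(names, default=None)


# Fixed-width names such as dates sort the way you would expect.
print(pick_latest_version(["20140901", "20140904", "20140830"]))  # 20140904

# Unpadded counters do not: "version9" sorts after "version10".
print(pick_latest_version(["version9", "version10"]))  # version9
```

Fixed-width names, such as Unix timestamps or zero-padded dates, avoid the surprise in the second call.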

This may seem a little weird, but it works really well for aggregates that you produce periodically, and it allows sequins to easily hotload new data (see 'Hotloading', below).

Once sequins has started and built the index, you can get the value for a given key over HTTP. If the key exists, the body of the response will be its value; if it doesn't, you'll get a 404. For example:

$ http localhost:9599/foo
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 3
Content-Type: text/plain; charset=utf-8
Date: Thu, 04 Sep 2014 11:42:01 GMT
Last-Modified: Thu, 04 Sep 2014 11:39:38 GMT

bar
$ http localhost:9599/baz
HTTP/1.1 404 Not Found
Content-Length: 0
Content-Type: text/plain; charset=utf-8
Date: Thu, 04 Sep 2014 11:42:20 GMT


Note the Last-Modified header: this corresponds to the last time sequins was given new data (see 'hotloading', below). Sequins will also happily (and correctly) respond to requests with a Range header.
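The same lookups can be made from a program. The following is a rough client sketch using only Python's standard library; the function name `get_value` and its `byte_range` parameter are made up for this example, and it assumes keys are URL-safe strings and that a sequins instance is reachable at the base URL you pass in.

```python
import urllib.error
import urllib.request


def get_value(base_url, key, byte_range=None):
    """Fetch the value for `key`, or return None if the server answers 404.

    `byte_range` is an optional (first, last) tuple of inclusive byte
    offsets, sent as a standard HTTP Range header. Hypothetical helper;
    it assumes `key` needs no URL escaping.
    """
    request = urllib.request.Request(f"{base_url}/{key}")
    if byte_range is not None:
        first, last = byte_range
        request.add_header("Range", f"bytes={first}-{last}")
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the key does not exist
        raise  # any other status is treated as a real error
```

Against the server from the example above, `get_value("http://localhost:9599", "foo")` would be expected to return `b"bar"`, and the same call with `"baz"` would return `None`.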

Hotloading

Sequins knows how to hotload new data without dropping any requests. After you've dropped a new, lexicographically-greater version into your top-level directory, just send SIGHUP to the running process:

kill -HUP <pid>

and it'll download the files (if necessary), build an index in the background, then switch when it's done. If it fails while building the new index for some reason, it'll continue to serve the current one.

You can also tell sequins to automatically look for new versions with the --refresh-period option.

If you're working with Hadoop output, hotloading might accidentally load a partial result, because Hadoop creates directories when it starts a job. To mitigate this, you can pass in --check-for-success, which will tell sequins to only load versions with a _SUCCESS file in them (Hadoop creates these files automatically when it's done running a job).
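If you want to check which version would qualify before triggering a reload, the rule can be approximated for a local path as follows. This is only a sketch of the behaviour described above, not sequins's implementation; `latest_complete_version` is a hypothetical name, and HDFS or S3 paths would need their own listing calls.

```python
import os


def latest_complete_version(root):
    """Return the greatest subdirectory of `root` that contains _SUCCESS.

    Returns None when no version qualifies. Hypothetical helper that
    mirrors the --check-for-success rule for a local directory only.
    """
    complete = [
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir()
        and os.path.exists(os.path.join(entry.path, "_SUCCESS"))
    ]
    # Same lexicographical ordering as for version selection in general.
    return max(complete, default=None)
```

A version directory that Hadoop is still writing to has no `_SUCCESS` file yet, so the function skips it and falls back to the newest finished version.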

Status

Sending a plain GET request to / will make sequins dump out its current status, like so:

$ http localhost:9599/ | python -m json.tool
{
    "count": 3,
    "path": "path/to/stuff/1401490544",
    "started": 1409830778,
    "updated": 1409830778
}
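For monitoring, the status body can be parsed like any other JSON. The sketch below assumes that `started` and `updated` are Unix timestamps in seconds, which is how the sample output reads but is not stated outright; `summarize_status` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone


def summarize_status(body):
    """Parse a status response and convert its timestamps to UTC datetimes.

    Assumes `started` and `updated` are Unix timestamps in seconds; that is
    an inference from the sample output, not a documented guarantee.
    """
    status = json.loads(body)
    return {
        "path": status["path"],
        "count": status["count"],
        "started": datetime.fromtimestamp(status["started"], tz=timezone.utc),
        "updated": datetime.fromtimestamp(status["updated"], tz=timezone.utc),
    }


sample = (
    '{"count": 3, "path": "path/to/stuff/1401490544", '
    '"started": 1409830778, "updated": 1409830778}'
)
print(summarize_status(sample)["updated"].isoformat())
# 2014-09-04T11:39:38+00:00
```

Under that assumption, the `updated` value in the sample lines up with the Last-Modified header shown in the earlier key lookup example.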

Miscellany, Caveats

  • Here's a Scalding sink for generating sequins-compatible SequenceFiles. It works for anything that can be converted to a JSON value.
  • The HDFS code uses this hdfs library, which currently only supports Hadoop 2.0.0 and up (including CDH5).
  • SequenceFiles don't strictly enforce that you have only one value for each key; if your data has multiple values for a key, sequins will load it without complaint, but only index one value for the key (probably nondeterministically).
  • Currently, there's no support for compressed SequenceFiles, or for key/value serializations other than BytesWritable.
