cascading.cassandra

A Cascading Scheme and Tap for Cassandra that can operate as both a Sink and a Source.

Installing

The project can be built using Maven and installed into your local repository:

mvn install

Alternatively, there's a snapshot available from Conjars.

Restrictions

This is very much an early Work-in-Progress and all contributions are welcome. It only supports regular column families currently (no super columns or counter columns yet).

Usage

cascading.cassandra is heavily influenced by cascading.hbase and uses Cassandra's ColumnFamilyOutputFormat for its sink.

First, create a CassandraScheme and specify the field to be used as the row key (currently the Scheme will only work with single-field keys). The second parameter is an array of Fields that represent the columns you wish to store. The column names will be serialized from the name provided and the values will come from the Tuples during the flow.

Using as a Sink

For narrow rows with known column names, use the NarrowRowScheme, specifying the field to use for the key and the fields to use as column values. The field names are used as the written column names:

Fields keyFields = new Fields("num");
Fields nameFields = new Fields("lower", "upper");
CassandraScheme scheme = new NarrowRowScheme(keyFields, nameFields);

Finally, hook the CassandraScheme into a CassandraTap and provide the Cassandra Thrift RPC Host and Port that the ColumnFamilyOutputFormat should connect to, as well as the keyspace and column family names you wish to store/retrieve values for.

Tap sink = new CassandraTap(getRpcHost(), getRpcPort(), keyspaceName, columnFamilyName, scheme);
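To make the narrow-row mapping concrete, here is a minimal plain-Java sketch of how a tuple with the fields "num", "lower" and "upper" would be split into a row key and named columns. It does not use Cascading or this library: the NarrowRowMapping class and its methods are hypothetical names for illustration, and a Map stands in for a Cascading tuple.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of cascading.cassandra.
class NarrowRowMapping {
    // The row key is the value of the single key field.
    static Object rowKey(Map<String, Object> tuple, String keyField) {
        return tuple.get(keyField);
    }

    // Each name field becomes a column: the field name is the column name,
    // and the tuple's value for that field is the column value.
    static Map<String, Object> columns(Map<String, Object> tuple, List<String> nameFields) {
        Map<String, Object> columns = new LinkedHashMap<>();
        for (String name : nameFields) {
            columns.put(name, tuple.get(name));
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("num", 1);
        tuple.put("lower", "a");
        tuple.put("upper", "A");

        System.out.println(rowKey(tuple, "num"));                            // prints 1
        System.out.println(columns(tuple, Arrays.asList("lower", "upper"))); // prints {lower=a, upper=A}
    }
}
```

With the scheme shown earlier, this tuple would be expected to produce a row keyed by the "num" value with two columns named "lower" and "upper".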

For dumping wide rows, use the WideRowScheme, which takes no constructor arguments:

CassandraScheme scheme = new WideRowScheme();

This scheme expects each sunk tuple to consist of a row key followed by any number of column name / value pairs.
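The tuple layout described above can be sketched in plain Java. This is an illustration under assumptions rather than the library's own code: WideRowLayout and flatten are hypothetical names, and a List stands in for the values of a Cascading tuple.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of cascading.cassandra.
class WideRowLayout {
    // Flattens a row key and its columns into
    // [rowKey, name1, value1, name2, value2, ...].
    static List<Object> flatten(Object rowKey, Map<String, Object> columns) {
        List<Object> values = new ArrayList<>();
        values.add(rowKey);
        for (Map.Entry<String, Object> column : columns.entrySet()) {
            values.add(column.getKey());
            values.add(column.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("lower", "a");
        columns.put("upper", "A");

        System.out.println(flatten("1", columns)); // prints [1, lower, a, upper, A]
    }
}
```

In a real flow, the flattened values would be carried by a Cascading tuple; the point here is only the ordering: the key first, then alternating column names and values.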

Using as a Source

Using the NarrowRowScheme with a source is identical to usage with a sink. The WideRowScheme cannot currently be used as a source.

License

Licensed under the Apache 2.0 license.

Copyright

Copyright © Paul Ingles, 2011.
