
VoltDB Hive Export Conduit

An experimental VoltDB to Hive export conduit that takes advantage of Hive's new streaming HCatalog API. It allows export table writers to push data directly into the corresponding Hive tables.

How to build artifacts and set up Eclipse

  • Install Gradle

On a Mac with Homebrew set up, install the Gradle bottle:

brew install gradle

On Linux, set up GVM and install Gradle as follows:

gvm install gradle
  • Create a gradle.properties file and set the voltdbhome property to the base directory of your VoltDB installation
echo voltdbhome=/voltdb/home/dirname > gradle.properties
  • Invoke Gradle to compile the artifacts
gradle shadowJar
  • To set up an Eclipse project, run Gradle as follows
gradle cleanEclipse eclipse

Then import it into your Eclipse workspace using the File->Import projects menu option.

Configuration

  • Copy the built jar from build/libs to lib/extension under your VoltDB installation directory

  • Edit your deployment file and use the following export XML stanza as a template

<?xml version="1.0"?>
<deployment>
    <cluster hostcount="1" sitesperhost="4" kfactor="0" />
    <httpd enabled="true">
        <jsonapi enabled="true" />
    </httpd>
    <export>
        <configuration stream="default" enabled="true" type="custom"
            exportconnectorclass="org.voltdb.exportclient.hive.HiveExportClient">
            <property name="hive.uri">thrift://hive-host:9083</property>
            <property name="hive.db">meco</property>
            <property name="hive.table">alerts</property>
            <property name="hive.partition.columns">ALERTS:CONTINENT|COUNTRY</property>
        </configuration>
    </export>
</deployment>

This tells VoltDB to write to the alerts table in Hive, via the export table of the same name in VoltDB, using the CONTINENT and COUNTRY columns to supply the Hive partition values. For example, the alerts table is defined in Hive as:

create table alerts ( id int , msg string )
     partitioned by (continent string, country string)
     clustered by (id) into 5 buckets
     stored as orc; -- currently ORC is required for streaming

while the VoltDB export table is defined as:

FILE -inlinebatch END_OF_EXPORT

create table alerts (
  id integer not null,
  msg varchar(128),
  continent varchar(64),
  country varchar(64)
)
;
partition table alerts on column id
;
export table alerts
;
END_OF_EXPORT

For example, when the following row is inserted into the export table:

INSERT INTO ALERTS (ID,MSG,CONTINENT,COUNTRY) VALUES (1,'fab-02 inoperable','EU','IT');

The continent ('EU') and country ('IT') column values are used to specify the Hive table partition.
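
The mapping from an exported row to a Hive partition can be sketched in Java as follows. This is only an illustration, not the conduit's actual code: the class PartitionSplitSketch and its methods are hypothetical, and the sketch assumes that the partition columns are taken out of the row and the remaining columns (id and msg) are what gets written to the Hive table. The substitution for empty or null values described under Configuration Properties is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the conduit's source.
final class PartitionSplitSketch {

    // Position of a column name in the row, ignoring case, or -1 if absent.
    static int indexOfIgnoreCase(List<String> names, String wanted) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).equalsIgnoreCase(wanted)) {
                return i;
            }
        }
        return -1;
    }

    // Values of the partition columns, in the order the columns are listed.
    static List<String> partitionValues(List<String> columnNames, List<Object> row,
                                        List<String> partitionColumns) {
        List<String> values = new ArrayList<>();
        for (String partitionColumn : partitionColumns) {
            int idx = indexOfIgnoreCase(columnNames, partitionColumn);
            if (idx < 0) {
                throw new IllegalArgumentException("no such column: " + partitionColumn);
            }
            // Partition columns must be VARCHAR, so the value is a String.
            values.add((String) row.get(idx));
        }
        return values;
    }

    // Values of all columns that are not partition columns (assumed row payload).
    static List<Object> dataValues(List<String> columnNames, List<Object> row,
                                   List<String> partitionColumns) {
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            if (indexOfIgnoreCase(partitionColumns, columnNames.get(i)) < 0) {
                values.add(row.get(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("ID", "MSG", "CONTINENT", "COUNTRY");
        List<Object> row = Arrays.<Object>asList(1, "fab-02 inoperable", "EU", "IT");
        List<String> partitionColumns = Arrays.asList("CONTINENT", "COUNTRY");

        System.out.println(partitionValues(columns, row, partitionColumns)); // [EU, IT]
        System.out.println(dataValues(columns, row, partitionColumns));      // [1, fab-02 inoperable]
    }
}
```

For the INSERT above, the sketch yields the partition values EU and IT, matching the continent and country partition columns of the Hive table.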

Configuration Properties

  • hive.uri (mandatory) Thrift URI of the Hive host
  • hive.db (mandatory) Hive database
  • hive.table (mandatory) Hive table
  • hive.partition.columns (mandatory if the Hive table is partitioned) format: table-1:column-1|column-2|...|column-n,table-2:column-1|column-2|...|column-n,...,table-n:column-1|column-2|...|column-n
  • timezone (optional, default: local timezone) time zone used to format timestamp values

Partition columns must be of type VARCHAR. Any empty or null partition column values are converted to __VoltDB_unspecified__.
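
To make the hive.partition.columns format and the empty/null rule concrete, here is a minimal Java sketch. It is not taken from the conduit: the class PartitionColumnsSketch and the methods parse and normalize are hypothetical names, and the error handling for malformed entries is an assumption about reasonable behavior rather than a description of what the export client does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the conduit's source.
final class PartitionColumnsSketch {

    // Placeholder documented above for empty or null partition values.
    static final String UNSPECIFIED = "__VoltDB_unspecified__";

    // Parses "table-1:col-1|col-2,table-2:col-1" into table -> ordered columns.
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> byTable = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return byTable; // unpartitioned table: nothing to parse
        }
        for (String entry : spec.split(",")) {
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("malformed entry: " + entry);
            }
            List<String> columns = new ArrayList<>();
            for (String column : parts[1].split("\\|")) {
                String name = column.trim();
                if (name.isEmpty()) {
                    throw new IllegalArgumentException("empty column name in: " + entry);
                }
                columns.add(name);
            }
            byTable.put(parts[0].trim(), columns);
        }
        return byTable;
    }

    // Applies the documented rule: empty or null values become the placeholder.
    static String normalize(String value) {
        return (value == null || value.isEmpty()) ? UNSPECIFIED : value;
    }

    public static void main(String[] args) {
        System.out.println(parse("ALERTS:CONTINENT|COUNTRY")); // {ALERTS=[CONTINENT, COUNTRY]}
        System.out.println(normalize(null));                   // __VoltDB_unspecified__
        System.out.println(normalize("EU"));                   // EU
    }
}
```

A LinkedHashMap is used so that tables come back in the order they were listed, and each column list keeps the order given in the property, which matters because Hive partition values are positional.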

Contributors

jhugg, akhanzode, pdshaw, vtkstef
