hive-bulkload-hbase

Import a Hive table into HBase as fast as possible.

Directories

  • bin: Contains the shell script that starts the program.
  • src: Contains the source code and the test code.
  • schema: Contains the schema file of a table.

Compilation

$ mvn clean compile

$ mvn clean package

$ mvn assembly:assembly

Description

HBase provides random read and write access to your big data, but getting that data into HBase in the first place can be a challenge. There are three common ways to do it.

  1. Use the client API to put the data one row at a time (a minimal sketch of this method follows the list).
  2. Use the Hive-HBase integration; see the HBaseIntegration page in the Hive wiki.
  3. Use HBase's built-in bulk load capability.
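
As a rough illustration of the first method, here is a minimal sketch using the old (pre-1.0) HBase client API; the table, column family, and column names are placeholders, not taken from this project.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal sketch of method 1: one Put per row through the normal HBase write path.
public class SinglePutExample {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hbase_table"); // placeholder table name
        Put put = new Put(Bytes.toBytes("row-1"));
        // Placeholder column family "cf" and qualifier "c1".
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("v1"));
        table.put(put);
        table.close();
    }
}

This is simple, but every cell goes through the full write path (WAL and MemStore), which is why it is far slower than bulk loading.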

However, the first two methods are much slower than the third, which bypasses the normal write path entirely: you create the HFiles yourself and copy them directly into HDFS. The HBase bulk load process consists of two steps if Hive and HBase are on the same cluster.

  1. HFile preparation via a MapReduce job.
  2. Importing the HFiles into HBase using LoadIncrementalHFiles.doBulkLoad (e.g. Driver2.java).

The bulk load process consists of three steps if Hive and HBase are on different clusters.

  1. HFile preparation via a MapReduce job.
  2. Copying the HFiles from the Hive cluster to the HBase cluster.
  3. Importing the HFiles into HBase via HBase commands on the HBase cluster.

Usage

The MapReduce job generates HBase data files (HFiles) from your input RCFile using HFileOutputFormat. Before you generate the HFiles, you need the Hive table's schema, which you can obtain in the following ways.

  • Reading the Hive metadata.
    • Using JDBC to read it from MySQL.
    • Using HCatalog to read it from MySQL.
  • Parsing a file that records the schema (see the sketch after this list). In my opinion, this is more efficient than reading the metadata, even if a table contains several thousand columns.
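
As a rough illustration of the schema-file approach, the sketch below parses a hypothetical file with one "columnName:type" entry per line; the actual file format under the schema directory may differ.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read column names from a one-entry-per-line schema file.
public class SchemaFileParser {
    public static List<String> readColumns(String path) throws IOException {
        List<String> columns = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                // Keep only the column name; the type after ':' is ignored here.
                columns.add(line.split(":")[0]);
            }
        } finally {
            reader.close();
        }
        return columns;
    }
}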

The Mapper class outputs ImmutableBytesWritable keys and KeyValue values; these types are used by the subsequent partitioner and reducer to create the HFiles.
There is no need to write your own reducer: HFileOutputFormat.configureIncrementalLoad(), called in the driver code, sets up the correct reducer and partitioner for you.
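
To make that concrete, here is a minimal, hypothetical sketch of such a mapper and driver. For simplicity it reads tab-separated text rather than RCFile (which this project actually uses), and the class, table, family, and column names are placeholders rather than the real code in src.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileGenerator {

    // The mapper only emits (row key, KeyValue) pairs; sorting and HFile writing
    // are handled by the reducer and partitioner that configureIncrementalLoad sets up.
    public static class HFileMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        private static final byte[] FAMILY = Bytes.toBytes("cf"); // placeholder column family

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            byte[] rowKey = Bytes.toBytes(fields[0]);
            KeyValue kv = new KeyValue(rowKey, FAMILY, Bytes.toBytes("c1"), Bytes.toBytes(fields[1]));
            context.write(new ImmutableBytesWritable(rowKey), kv);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hive-bulkload-hbase");
        job.setJarByClass(HFileGenerator.class);
        job.setMapperClass(HFileMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Inspect the target table's regions and configure a matching
        // partitioner and reducer so the HFiles line up with region boundaries.
        HTable htable = new HTable(conf, "hbase_table"); // placeholder table name
        HFileOutputFormat.configureIncrementalLoad(job, htable);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
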
Then, if Hive and HBase are on different clusters, copy the generated HFiles from the Hive cluster to the HBase cluster.

hadoop distcp hdfs://mycluster-hive/hfile/hbase hdfs://mycluster-hbase/hbase/test

Finally, import the HFiles into HBase via HBase commands on the HBase cluster.

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /hbase/test hbase_table

Or import the HFiles into HBase via Java code on the HBase cluster (e.g. Driver2.java).

// Importing the generated HFiles into an HBase table
LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
loader.doBulkLoad(new Path(outputPath), htable);
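
For context, here is a slightly fuller hedged sketch of how conf, outputPath, and htable might be set up around that call; the path and table name are placeholders, not the project's actual configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

// Placeholder setup around doBulkLoad; the target table must already exist.
public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String outputPath = "/hbase/test";               // directory containing the generated HFiles
        HTable htable = new HTable(conf, "hbase_table"); // placeholder table name
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(outputPath), htable);
        htable.close();
    }
}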
