CCIndex V1.1.2 User Manual

## File Packing List

The CCIndex V1.1.2 package consists of:

ict.ocrabase.main.java.client
ict.ocrabase.main.java.regionserver
ict.ocrabase.main.java.test
org.apache.hadoop.hbase.filter
org.apache.hbase.coprocessor


## Installation

Prerequisites: the Hadoop and HBase environments are already correctly configured. The Hadoop version is hadoop-2.5.1 and the HBase version is hbase-1.1.2.

Copy the file CCIndex-1.1.2.jar to $HADOOP_ENV$/share/hadoop/yarn.

Add the source files to your project if you want to debug your code. (If not, add CCIndex-1.1.2.jar to your path instead.)

Add the following configuration to $HBASE-ENV$/conf/hbase-site.xml:

    <property>
      <name>hbase.coprocessor.region.classes</name>             
      <value>org.apache.hbase.coprocessor.PutObserver,org.apache.hbase.coprocessor.DeleteObserver</value>
    </property>

At this point, the environment has been configured.
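
To double-check the hbase-site.xml entry, a small standalone program can list the classes registered under `hbase.coprocessor.region.classes`. This is a minimal sketch using only the JDK's XML parser; `CoprocessorConfigCheck` and `regionCoprocessors` are hypothetical names introduced for this example and are not part of CCIndex or HBase.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of CCIndex or HBase.
final class CoprocessorConfigCheck {

    // Returns the class names listed under hbase.coprocessor.region.classes
    // in the given hbase-site.xml text (empty list if the property is absent).
    static List<String> regionCoprocessors(String siteXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("hbase-site.xml is not well-formed XML", e);
        }
        List<String> classes = new ArrayList<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if ("hbase.coprocessor.region.classes".equals(childText(property, "name"))) {
                // The value is a comma-separated list of class names.
                for (String cls : childText(property, "value").split(",")) {
                    if (!cls.trim().isEmpty()) {
                        classes.add(cls.trim());
                    }
                }
            }
        }
        return classes;
    }

    // Trimmed text of the first child element with the given tag, or "" if none.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Pass the path to hbase-site.xml as the only argument.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String cls : regionCoprocessors(xml)) {
            System.out.println(cls);
        }
    }
}
```

With the configuration shown above, both `org.apache.hbase.coprocessor.PutObserver` and `org.apache.hbase.coprocessor.DeleteObserver` should appear in the list.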

## How to use

There are two ways to use CCIndex: without bulkload and with bulkload.

### No bulkload

Without bulkload, you have to put the data into your table yourself. With a large amount of data, this can take a very long time. There are some examples in the ict.ocrabase.main.java.test package.

Example:

ict.ocrabase.main.java.test.CreateTableWithIndexTest.java creates the HBase table and the CCIndex table. When it finishes, three tables exist in HBase.

ict.ocrabase.main.java.test.PutTableWithIndexTest.java puts data into the HBase table and the CCIndex table.

### Bulkload

Bulkload uses MapReduce to put data into the base table and the CCIndex table.

Example:

First, upload the TPC-H test data to HDFS:

    bin/hadoop fs -put ../test-data/xaa1000.txt /index-data

ict.ocrabase.main.java.test.CreateTableTestBulkload.java creates the HBase table and the CCIndex table. After running it, three tables exist in HBase.

To import the data, ict.ocrabase.main.java.client.cli.Import.java loads the HDFS data into HBase. Its configuration parameters look like this:

    -s /index-data -ts real_table_with_index,SEMICOLON,f:c1:STRING:CCINDEX:real_table_with_index-f_c1,f:c2:STRING,f:c3:STRING,f:c4:STRING:CCINDEX:real_table_with_index-f_c4,f:c5:STRING,f:c6:STRING,f:c7:STRING,f:c8:STRING -l 32
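
The `-ts` value packs the table name, the field delimiter, and one entry per column into a single comma-separated string. The sketch below shows one way to read it, assuming each column entry is `family:qualifier:TYPE`, optionally followed by `:CCINDEX:indexTableName`. That layout is inferred from the single example above and is not taken from the Import.java source; `TableSpecSketch` and its members are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: NOT the parser used by Import.java. The field layout is
// an assumption inferred from the example parameters in this manual.
final class TableSpecSketch {

    static final class Column {
        final String family;
        final String qualifier;
        final String type;
        final String indexTable; // null when the column has no CCINDEX entry

        Column(String family, String qualifier, String type, String indexTable) {
            this.family = family;
            this.qualifier = qualifier;
            this.type = type;
            this.indexTable = indexTable;
        }
    }

    // Parses "table,DELIMITER,col,col,..." and returns the column entries.
    static List<Column> parseColumns(String spec) {
        String[] parts = spec.split(",");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected table,delimiter,columns...");
        }
        List<Column> columns = new ArrayList<>();
        for (int i = 2; i < parts.length; i++) {
            String[] f = parts[i].split(":");
            boolean plain = f.length == 3;
            boolean indexed = f.length == 5 && "CCINDEX".equals(f[3]);
            if (!plain && !indexed) {
                throw new IllegalArgumentException("unrecognized column entry: " + parts[i]);
            }
            columns.add(new Column(f[0], f[1], f[2], indexed ? f[4] : null));
        }
        return columns;
    }

    // Qualifiers of the columns that carry a CCINDEX entry, in spec order.
    static List<String> indexedQualifiers(String spec) {
        List<String> result = new ArrayList<>();
        for (Column c : parseColumns(spec)) {
            if (c.indexTable != null) {
                result.add(c.qualifier);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String spec = "real_table_with_index,SEMICOLON,"
                + "f:c1:STRING:CCINDEX:real_table_with_index-f_c1,f:c2:STRING,f:c3:STRING,"
                + "f:c4:STRING:CCINDEX:real_table_with_index-f_c4,"
                + "f:c5:STRING,f:c6:STRING,f:c7:STRING,f:c8:STRING";
        for (Column c : parseColumns(spec)) {
            if (c.indexTable != null) {
                System.out.println(c.qualifier + " -> " + c.indexTable);
            }
        }
    }
}
```

Read this way, the example spec declares eight columns in family `f`, of which `c1` and `c4` each name an index table.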

After running the import, you will see some output.

PS: here the index columns are c1 and c4.

## Query

Scan without index

Example:

ict.ocrabase.main.java.test.QueryByCondition.java

Scan with index

Example:

ict.ocrabase.main.java.test.QueryMultiColumnUseCCIndex.java

## Simple query test

Data source: TPC-H ORDERS table
A sample of the data source used in our test is available for download
Total: 10 million rows
Index columns: c1 (CUSTKEY) and c4 (ORDERDATE)
The SQL query is:

    SELECT * FROM ORDERS WHERE CUSTKEY > 100000 AND CUSTKEY < 300000 AND ORDERDATE > '1994-01-01' AND ORDERDATE < '1994-01-30'
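
To make the difference between the two scan styles concrete, here is a toy in-memory sketch of the same range condition. It does not use HBase or the CCIndex API and says nothing about how CCIndex stores its index tables. It only illustrates that a sorted index on ORDERDATE lets the scan visit just the keys inside the date range, while a scan without an index must test every row. All names (`RangeQuerySketch`, `Order`, `fullScan`, `indexedScan`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy illustration only; not CCIndex code.
final class RangeQuerySketch {

    static final class Order {
        final long orderKey;
        final long custKey;
        final String orderDate; // yyyy-MM-dd, so string order equals date order

        Order(long orderKey, long custKey, String orderDate) {
            this.orderKey = orderKey;
            this.custKey = custKey;
            this.orderDate = orderDate;
        }
    }

    static boolean custKeyInRange(Order o) {
        return o.custKey > 100000 && o.custKey < 300000;
    }

    // Scan without index: every row is tested against both conditions.
    static List<Order> fullScan(List<Order> table) {
        List<Order> hits = new ArrayList<>();
        for (Order o : table) {
            boolean dateInRange = o.orderDate.compareTo("1994-01-01") > 0
                    && o.orderDate.compareTo("1994-01-30") < 0;
            if (dateInRange && custKeyInRange(o)) {
                hits.add(o);
            }
        }
        return hits;
    }

    // Sorted index keyed by ORDERDATE; each key maps to the rows with that date.
    static TreeMap<String, List<Order>> buildDateIndex(List<Order> table) {
        TreeMap<String, List<Order>> index = new TreeMap<>();
        for (Order o : table) {
            index.computeIfAbsent(o.orderDate, k -> new ArrayList<>()).add(o);
        }
        return index;
    }

    // Scan with index: only keys strictly inside the date range are visited,
    // then the CUSTKEY condition is applied to those rows.
    static List<Order> indexedScan(TreeMap<String, List<Order>> dateIndex) {
        List<Order> hits = new ArrayList<>();
        for (List<Order> bucket
                : dateIndex.subMap("1994-01-01", false, "1994-01-30", false).values()) {
            for (Order o : bucket) {
                if (custKeyInRange(o)) {
                    hits.add(o);
                }
            }
        }
        return hits;
    }

    // Order keys of the given rows in ascending order, for comparing results.
    static List<Long> sortedKeys(List<Order> rows) {
        List<Long> keys = new ArrayList<>();
        for (Order o : rows) {
            keys.add(o.orderKey);
        }
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        List<Order> table = Arrays.asList(
                new Order(1, 150000, "1994-01-15"),
                new Order(2, 150000, "1994-02-03"),
                new Order(3, 50000, "1994-01-20"));
        System.out.println(sortedKeys(fullScan(table)));
        System.out.println(sortedKeys(indexedScan(buildDateIndex(table))));
    }
}
```

Both methods return the same rows; they differ only in how many rows they have to look at.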

### caching number=1, threads number=1

Table 1

|          | CCIndex | HBase  |
|----------|---------|--------|
| time(ms) | 19962   | 101593 |

Table 2

|          | CCIndex | HBase |
|----------|---------|-------|
| time(ms) | 13841   | 91988 |

Table 1 uses only c4 as the query condition; the result count is 33514. Table 2 uses both c1 and c4 as query conditions; the result count is 13649.

### caching number=10000, threads number=10

Table 1

|          | CCIndex | HBase |
|----------|---------|-------|
| time(ms) | 4921    | 37599 |

Table 2

|          | CCIndex | HBase |
|----------|---------|-------|
| time(ms) | 4744    | 56022 |
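
The timings reported in this section can be turned into speedup factors by dividing the HBase time by the CCIndex time for each table. The short program below does that arithmetic using only the numbers reported above; `SpeedupSketch` is a hypothetical name for this example. In these measurements the ratio falls between roughly 5x and 12x.

```java
import java.util.Locale;

// Arithmetic on the timings reported in this manual; not part of CCIndex.
final class SpeedupSketch {

    // How many times faster the CCIndex run was than the plain HBase run.
    static double speedup(long hbaseMs, long ccindexMs) {
        return (double) hbaseMs / ccindexMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "caching=1, threads=1, Table 1",
            "caching=1, threads=1, Table 2",
            "caching=10000, threads=10, Table 1",
            "caching=10000, threads=10, Table 2",
        };
        long[] ccindexMs = {19962, 13841, 4921, 4744};
        long[] hbaseMs = {101593, 91988, 37599, 56022};
        for (int i = 0; i < labels.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%s: %.2fx",
                    labels[i], speedup(hbaseMs[i], ccindexMs[i])));
        }
    }
}
```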

## Reference

CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries
