CCIndex V1.1.2 User Manual

## File Packing List

The CCIndex V1.1.2 package consists of:

  • ict.ocrabase.main.java.client
  • ict.ocrabase.main.java.regionserver
  • ict.ocrabase.main.java.test
  • org.apache.hadoop.hbase.filter
  • org.apache.hbase.coprocessor

The overall layout looks like this:

[image: packing-list]

## Installation

Prerequisites: Hadoop and HBase must already be correctly configured. The Hadoop version is hadoop-2.5.1 and the HBase version is hbase-1.1.2.

  • Copy the file CCIndex-1.1.2.jar to $HADOOP_ENV$/share/hadoop/yarn, like this:

[image: yarn-jar]

  • Add the source files to your project if you want to debug your code. (Otherwise, add CCIndex-1.1.2.jar to your classpath.)
  • Add the following configuration to $HBASE-ENV$/conf/hbase-site.xml:
    <property>
      <name>hbase.coprocessor.region.classes</name>             
      <value>org.apache.hbase.coprocessor.PutObserver,org.apache.hbase.coprocessor.DeleteObserver</value>
    </property>

At this point, the environment is configured.
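To double-check the configuration step, the region coprocessor list can be read back from the contents of hbase-site.xml. The following is a minimal sketch that uses only the JDK's XML parser; the class `CoprocessorConfigCheck` and its method names are hypothetical helpers for illustration and are not part of CCIndex or HBase.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of CCIndex or HBase.
final class CoprocessorConfigCheck {
    static final String PROPERTY = "hbase.coprocessor.region.classes";

    /** Returns the class names configured under PROPERTY, or an empty list if it is absent. */
    static List<String> regionCoprocessors(String siteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            List<String> classes = new ArrayList<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (!PROPERTY.equals(childText(prop, "name"))) {
                    continue;
                }
                // The value is a comma-separated list of class names.
                for (String cls : childText(prop, "value").split(",")) {
                    if (!cls.trim().isEmpty()) {
                        classes.add(cls.trim());
                    }
                }
            }
            return classes;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse hbase-site.xml content", e);
        }
    }

    /** Trimmed text of the first child element with the given tag, or "" if there is none. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>hbase.coprocessor.region.classes</name>"
                + "<value>org.apache.hbase.coprocessor.PutObserver,"
                + "org.apache.hbase.coprocessor.DeleteObserver</value>"
                + "</property></configuration>";
        // Both observers from the manual should be listed.
        System.out.println(regionCoprocessors(xml));
    }
}
```

In practice the XML string would be read from the real hbase-site.xml file; an empty result means the property has not been added yet.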

## How to use

There are two ways to use CCIndex: without bulkload and with bulkload.

### No bulkload

Without bulkload, you put data into your table directly; with a large amount of data, this can take a long time. There are some examples in the ict.ocrabase.main.java.test package.

Example:

  • ict.ocrabase.main.java.test.CreateTableWithIndexTest.java creates the HBase table and the CCIndex tables. After it completes, three tables exist in HBase:

[image: index-table]

  • ict.ocrabase.main.java.test.PutTableWithIndexTest.java puts data into the HBase table and the CCIndex tables.

### Bulkload

Bulkload uses MapReduce to put data into the base table and the CCIndex tables.

Example:

  • First, upload the TPC-H test data (download) to HDFS:

bin/hadoop fs -put ../test-data/xaa1000.txt /index-data

  • ict.ocrabase.main.java.test.CreateTableTestBulkload.java creates the HBase table and the CCIndex tables. After running it, three tables exist:

[image: index-table]

  • Import the data: ict.ocrabase.main.java.client.cli.Import.java imports the HDFS data into HBase. Its arguments look like this:

-s /index-data -ts real_table_with_index,SEMICOLON,f:c1:STRING:CCINDEX:real_table_with_index-f_c1,f:c2:STRING,f:c3:STRING,f:c4:STRING:CCINDEX:real_table_with_index-f_c4,f:c5:STRING,f:c6:STRING,f:c7:STRING,f:c8:STRING -l 32
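The layout of the `-ts` value is not spelled out in this manual. Reading the example, it appears to be a comma-separated list: the table name, the name of the field delimiter, then one entry per column of the form `family:qualifier:type`, with `:CCINDEX:indexTableName` appended for indexed columns. The sketch below parses a value under that assumption; `TableSpecSketch` and its members are hypothetical names and do not reflect how Import.java actually parses its arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for illustration; the format is inferred from the example above.
final class TableSpecSketch {
    /** One column entry; indexTable is null when the column is not indexed. */
    static final class Column {
        final String family;
        final String qualifier;
        final String type;
        final String indexTable;

        Column(String family, String qualifier, String type, String indexTable) {
            this.family = family;
            this.qualifier = qualifier;
            this.type = type;
            this.indexTable = indexTable;
        }
    }

    /** Parses the column entries that follow the table name and delimiter name. */
    static List<Column> parseColumns(String spec) {
        String[] parts = spec.split(",");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected table,delimiter,column[,column...]");
        }
        List<Column> columns = new ArrayList<>();
        for (int i = 2; i < parts.length; i++) {
            String[] f = parts[i].trim().split(":");
            // Assumed shapes: family:qualifier:type or family:qualifier:type:CCINDEX:indexTable
            if (f.length != 3 && f.length != 5) {
                throw new IllegalArgumentException("unexpected column entry: " + parts[i]);
            }
            columns.add(new Column(f[0], f[1], f[2], f.length == 5 ? f[4] : null));
        }
        return columns;
    }

    /** Qualifiers of the columns that carry an index, in declaration order. */
    static List<String> indexedQualifiers(String spec) {
        List<String> result = new ArrayList<>();
        for (Column c : parseColumns(spec)) {
            if (c.indexTable != null) {
                result.add(c.qualifier);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String spec = "real_table_with_index,SEMICOLON,"
                + "f:c1:STRING:CCINDEX:real_table_with_index-f_c1,f:c2:STRING,f:c3:STRING,"
                + "f:c4:STRING:CCINDEX:real_table_with_index-f_c4,"
                + "f:c5:STRING,f:c6:STRING,f:c7:STRING,f:c8:STRING";
        for (Column c : parseColumns(spec)) {
            System.out.println(c.family + ":" + c.qualifier + " " + c.type
                    + (c.indexTable != null ? " -> index table " + c.indexTable : ""));
        }
    }
}
```

For the example value, the two indexed columns each name their own index table, which matches the three tables (one base table plus two index tables) created earlier.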

After the import runs, you will see output like this:

[image: out-put]

Note: the index columns here are c1 and c4.

## Query

  • Scan without index. Example:

ict.ocrabase.main.java.test.QueryByCondition.java

  • Scan with index. Example:

ict.ocrabase.main.java.test.QueryMultiColumnUseCCIndex.java

[image: paramater]

## Simple query test

  • Data source: TPC-H ORDERS table
  • Sample data used in our test: download
  • 10 million rows in total
  • Index columns: c1 (CUSTKEY) and c4 (ORDERDATE)
  • The SQL query looks like this:
select * from ORDERS where CUSTKEY > 100000 and CUSTKEY < 300000 and ORDERDATE > '1994-01-01' and ORDERDATE < '1994-01-30'
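The range condition in this query is strict at both ends for each column. As a plain-Java illustration of what a matching row looks like, the sketch below checks one row against the same bounds; `OrdersRangePredicate` is a hypothetical name, and the code does not use the HBase or CCIndex query API. It assumes ORDERDATE is stored as a `yyyy-MM-dd` string, so lexicographic comparison agrees with date order.

```java
// Hypothetical illustration of the test query's condition; not CCIndex API.
final class OrdersRangePredicate {
    static final long CUSTKEY_LOW = 100000L;
    static final long CUSTKEY_HIGH = 300000L;
    static final String DATE_LOW = "1994-01-01";
    static final String DATE_HIGH = "1994-01-30";

    /** True when the row satisfies both strict range conditions. */
    static boolean matches(long custKey, String orderDate) {
        boolean custKeyInRange = custKey > CUSTKEY_LOW && custKey < CUSTKEY_HIGH;
        // yyyy-MM-dd strings sort in date order, so compareTo is sufficient here.
        boolean dateInRange = orderDate.compareTo(DATE_LOW) > 0
                && orderDate.compareTo(DATE_HIGH) < 0;
        return custKeyInRange && dateInRange;
    }

    public static void main(String[] args) {
        System.out.println(matches(150000L, "1994-01-15")); // inside both ranges
        System.out.println(matches(100000L, "1994-01-15")); // CUSTKEY on the excluded lower bound
        System.out.println(matches(150000L, "1994-01-30")); // ORDERDATE on the excluded upper bound
    }
}
```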

### caching number=1, threads number=1

Table 1

| | CCIndex | HBase |
| --- | --- | --- |
| time (ms) | 19962 | 101593 |

Table 2

| | CCIndex | HBase |
| --- | --- | --- |
| time (ms) | 13841 | 91988 |

Table 1 uses only c4 as the query condition and returns 33514 results; Table 2 uses both c1 and c4 as query conditions and returns 13649 results.

### caching number=10000, threads number=10

Table 1

| | CCIndex | HBase |
| --- | --- | --- |
| time (ms) | 4921 | 37599 |

Table 2

| | CCIndex | HBase |
| --- | --- | --- |
| time (ms) | 4744 | 56022 |
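To compare the timings above at a glance, the ratio of HBase time to CCIndex time can be computed for each table. This is only arithmetic on the reported numbers; `SpeedupSketch` is a hypothetical name introduced for the example.

```java
// Hypothetical helper: computes HBase time divided by CCIndex time for the reported runs.
final class SpeedupSketch {
    /** Ratio of plain HBase scan time to CCIndex scan time (both in milliseconds). */
    static double speedup(long hbaseMs, long ccindexMs) {
        if (ccindexMs <= 0) {
            throw new IllegalArgumentException("CCIndex time must be positive");
        }
        return (double) hbaseMs / ccindexMs;
    }

    public static void main(String[] args) {
        // Rows: label, CCIndex ms, HBase ms (values copied from the tables above).
        Object[][] runs = {
            {"caching=1, threads=1, Table 1", 19962L, 101593L},
            {"caching=1, threads=1, Table 2", 13841L, 91988L},
            {"caching=10000, threads=10, Table 1", 4921L, 37599L},
            {"caching=10000, threads=10, Table 2", 4744L, 56022L},
        };
        for (Object[] run : runs) {
            double ratio = speedup((Long) run[2], (Long) run[1]);
            System.out.println(run[0] + ": " + String.format("%.2f", ratio) + "x");
        }
    }
}
```

On these four measurements, CCIndex is between roughly 5 and 12 times faster than the plain HBase scan.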

## Reference

CCIndex: A Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries
