
hindex - Secondary Index for HBase

The solution is 100% Java, compatible with Apache HBase 0.94.8, and open-sourced under the ASL.

The following capabilities are currently supported:

  • multiple indexes per table,
  • multi-column indexes,
  • indexes on part of a column value,
  • equals and range-condition scans using the index, and
  • bulk loading data into an indexed table (indexing is done as part of the bulk load).

How it works

HBase Secondary Index is a 100% server-side implementation built on coprocessors, which persists index data in a separate table. Indexing is region-wise, and a custom load balancer co-locates the index table regions with the corresponding user table regions.


The server reads the index specification passed during table creation and creates the index table. There is one index table per user table, and all index information for that user table goes into that index table.

Put Operation

When a row is put into the HBase (user) table, coprocessors prepare and put the index information into the corresponding index table:

Index table rowkey = region startkey + index name + indexed column value + user table rowkey

E.g.:

Table -> tab1, column family -> cf1

Indexes -> idx1 on cf1:c1 and idx2 on cf1:c2

Index table -> tab1_idx (the user table name with the suffix "_idx")
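As a rough illustration of the rowkey composition (a sketch only: the real implementation also pads each part to a fixed width, and the exact on-disk layout is not shown here), the index entry for a put could be assembled like this:

    import org.apache.hadoop.hbase.util.Bytes;

    // Illustrative sketch: concatenate the documented parts in order
    // (region startkey + index name + indexed column value + user rowkey).
    byte[] regionStartKey = Bytes.toBytes("region-start"); // startkey of the hosting region
    byte[] indexName      = Bytes.toBytes("idx1");
    byte[] columnValue    = Bytes.toBytes("c1-value");     // value of cf1:c1 from the Put
    byte[] userRowKey     = Bytes.toBytes("row1");

    byte[] indexRowKey = Bytes.add(
        Bytes.add(regionStartKey, indexName),
        Bytes.add(columnValue, userRowKey));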


Scan Operation

For a user table scan, the coprocessor creates a scanner on the index table, scans the index data, and seeks to the exact rows in the user table. These seeks on HFiles are based on the rowkeys obtained from the index data, which helps skip blocks where the data is not present; sometimes entire HFiles can be skipped as well.


Usage

Clients need to pass an IndexedHTableDescriptor with the index name and columns while creating the table:

    // Describe the user table and the index to be created on it
    IndexedHTableDescriptor htd = new IndexedHTableDescriptor(usertableName);
    IndexSpecification iSpec = new IndexSpecification(indexName);
    HColumnDescriptor hcd = new HColumnDescriptor(columnFamily);
    // Index this qualifier, treating the value as a String of max length 10
    iSpec.addIndexColumn(hcd, indexColumnQualifier, ValueType.String, 10);
    htd.addFamily(hcd);
    htd.addIndex(iSpec);
    admin.createTable(htd);

No client-side changes are required for Puts and Deletes; the corresponding index operations are handled internally by the coprocessors.

No change is needed in the client application's scan code either.

There is no need to specify which index(es) to use: the secondary index implementation finds the best index for a Scan by analyzing the filters used in the query.
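For example, an ordinary client-side scan like the following sketch (using the tab1/cf1/idx1 names from the example above; table is assumed to be an HTable on tab1) is transparently served through idx1 when the evaluator picks it:

    Scan scan = new Scan();
    // A plain filter on the indexed column cf1:c1; no index hint is set.
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cf1"), Bytes.toBytes("c1"),
        CompareOp.EQUAL, Bytes.toBytes("some-value")));
    // The IndexRegionObserver coprocessor analyzes this filter on the server
    // and scans via the index table if idx1 is the best fit.
    ResultScanner scanner = table.getScanner(scan);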

Source

This repository contains the source for secondary index support on Apache HBase 0.94.8.

Building from source and testing

The build procedure is the same as for the HBase source itself, so it requires:

  • Java 1.6 or later
  • Maven 3.X

A separate test source tree (secondaryindex\src\test\java\) is available for running the secondary index tests.

Note

Configure the following properties in hbase-site.xml to use the secondary index.

Property

  • name - hbase.use.secondary.index
  • value - true
  • description - Enable this property when using the secondary index.

Property

  • name - hbase.coprocessor.master.classes
  • value - org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver
  • description - A comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes are called in order. org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver defines the coprocessor hooks that support secondary index operations on the master process.

Property

  • name - hbase.coprocessor.region.classes
  • value - org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver
  • description - A comma-separated list of coprocessors loaded by default on all tables. For any overridden coprocessor method, these classes are called in order. org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver defines the coprocessor hooks that support secondary index operations on the region server.

Property

  • name - hbase.coprocessor.wal.classes
  • value - org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver
  • description - Classes that define coprocessor hooks for WAL operations. org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver defines the coprocessor hooks that support secondary index WAL operations.
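Putting the enable flag and the three coprocessor properties above together, a minimal hbase-site.xml fragment looks like this:

    <property>
      <name>hbase.use.secondary.index</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
    </property>
    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
    </property>
    <property>
      <name>hbase.coprocessor.wal.classes</name>
      <value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
    </property>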

Future Work

  • Dynamically add/drop index
  • Integrate Secondary Index Management in the HBase Shell
  • Optimize range scan scenarios
  • HBCK tool support for Secondary index tables
  • WAL Optimizations for Secondary index table entries
  • Make Scan Evaluation Intelligence Pluggable

Contributors

jyothi-mandava, priyankrastogi

Issues

Support indexing part of the RK

Currently an index can be specified on one or more columns. Could a part of the rowkey (RK) also be added to the index specification?

A point raised by James Taylor.

There is an exception on the regionserver when I execute a search on hindex

The data was uploaded to HBase with the bulkload tool. The regionserver log is:

ERROR org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver: Exception occured in postScannerOpen for the table test_CDR2
java.util.NoSuchElementException
at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:375)
at java.util.LinkedHashMap$KeyIterator.next(LinkedHashMap.java:384)
at org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator.selectBestFitIndexForColumn(ScanFilterEvaluator.java:1086)
at org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator.selectBestFitAndPossibleIndicesForSCVF(ScanFilterEvaluator.java:1064)
at org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator.evalFilterForIndexSelection(ScanFilterEvaluator.java:480)
at org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator.evaluate(ScanFilterEvaluator.java:128)
at org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver.postScannerOpen(IndexRegionObserver.java:484)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1315)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2560)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

Thanks!

Move changes in SplitTransaction to secondary index code rather than the core code

SplitTransaction uses ThreadLocal variables to track the details of the parent's daughter regions and the index region's daughter regions.
I think the CP hooks that were added can be contributed (maybe they already have been), but the other changes, like adding an atomic Put to the main and index regions and carrying the daughter regions' info in the ThreadLocal, have to be either moved out of the core code or reworked to go through a CP hook in the core code. If we go ahead with the CF approach, this may not be needed.

WAL optimizations for secondary index

I saw this in the future work list.

  • The WAL for the secondary index can be created from the WAL of the main region.
  • But the order of region opening may be important.
  • New hooks in the creation/reading of recovered.edits may be needed if the current ones are not sufficient.

As I said, you can close this issue if you already have plans to implement this. I am just using it as a way to add comments. Thank you.

How to use hindex for scanning data?

I have deployed Hadoop and hindex successfully, created a table, and inserted data; the index table also exists. So how do I scan for a specific qualifier that has an index? Something like this?

    get 'test','rowkey','Family:Qualifier','value'

Have separate split policy for index regions.

Presently, while splitting from an external client or after a compaction, we have a check for the index table region, which is a kernel change. Instead we can define a split policy that returns false on an explicit split, as sketched below.
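A minimal sketch of such a policy, assuming the stock HBase 0.94 split-policy API (the class name and exact semantics here are assumptions, not the project's actual implementation):

    import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

    // Hypothetical policy for index table regions: refuse splits initiated
    // on the index region itself, so it only splits together with the user region.
    public class IndexRegionSplitPolicy extends ConstantSizeRegionSplitPolicy {
      @Override
      protected boolean shouldSplit() {
        return false; // never split the index region on its own
      }
    }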

Steps to implement hindex on my HBase cluster

I was able to build the project and run the MapReduce bulk insert and incremental HFile load:

hbase org.apache.hadoop.hbase.index.mapreduce.IndexImportTsv
hbase org.apache.hadoop.hbase.index.mapreduce.IndexLoadIncrementalHFile

But something strange is happening: after the process completed for just 6 GB of data, the size of the HBase table kept growing until it reached 200 GB, at which point I had to shut down the cluster.

Please suggest what's going wrong here.

Thanks

Support secure access to the index tables

Security checks are currently bypassed when accessing the index tables. A basic implementation could grant the same ACL as the user table. We need to work out the best solution, considering the per-cell ACLs in HBase >= 0.98.

Auto-index the table after modifying it

If we create a table first and there is already some data in it, and we now want to use hindex: I use the HtableAdmin.modifyAdmin method, which can create the index table, but it cannot index the data that already exists in the table. Is there some method to resolve this issue?

Putting data is very slow

Hi, I'm a Chinese coder and I use your project for secondary indexing, but putting data is much slower than with native HBase.

Test environment:

  • put data size: more than 100 million rows
  • cluster: 1 HMaster; 4 regionservers
  • cluster settings:
    • heapsize: 12 GB
    • memstore:
      • flushSize: 256 MB
      • lowerLimit: 0.38
      • upperLimit: 0.4
  • index table setting: one index, with one column in the index
  • 128 pre-created regions
  • puts from 8 reducers, 1000 rows per batch
  • test results:
    • with index: 5 hrs, 14 mins, 13 sec
    • without index: 1 hr, 12 mins, 23 sec

Thanks for your help.

hindex write issues

These days I am doing write performance tests. There are two tables, one with an index (called idx_table) and one without (called no_idx_table).
Writing to no_idx_table is OK, but after writing some data to idx_table, the region server keeps throwing exceptions:

org.apache.hadoop.hbase.NotServingRegionException:
xxxx. is closing

At http://hmaster:60010, under "Regions in Transition", there are two regions (one from idx_table, the other from idx_table_idx) displaying:

state=PENDING_CLOSE, ts=xxxxx, server=null

Can anyone tell me what is wrong? This issue happens every time.

How to use the index?

How do I deploy hindex into a cluster? Is it just like HBase 0.90.4? How do I build the project? And does it work with Hadoop 1.X?

Does hindex support CDH4 HBase?

I found some interfaces not implemented when compiling in Eclipse against the CDH 4.3.0 HBase jars. Does hindex take supporting CDH4 HBase into consideration?

The org.apache.hadoop.hbase.index.mapreduce.TableIndexer arguments problem

Hi, I used bulkload to import data, and there is some code in TableIndexer like this:

String[] tableName = conf.getStrings(TABLE_NAME_TO_INDEX);
    if (tableName == null) {
      System.out
          .println("Wrong usage.  Usage is pass the table -Dindex.tablename='table1' "
              + "-Dtable.columns.index='IDX1=>cf1:[q1->datatype& length],[q2],"
              + "[q3];cf2:[q1->datatype&length],[q2->datatype&length],[q3->datatype& lenght]#IDX2=>cf1:q5,q5'");
      System.out.println("The format used here is: ");

The actual value of TABLE_NAME_TO_INDEX is tablename.to.index, but the usage message says -Dindex.tablename='table1'. Is that right?

TableIndexer cannot index data

When I run this MapReduce job, I found it can read from the HBase table, but the map output is 0.
I found that IndexUtils.getIndexedHTableDescriptor(tablename, conf) always returns null when executed, so the mapper cannot output any records to the reducer.

Issues in put operation

I get blocked in the situation below.

When doing a put operation, this is the case:

  1. The put into userRegion is OK.
  2. At this time, indexRegion.closing is set true, but it waits for the lock.
  3. Executing the indexRegion.batchMutateForIndex method logs WARN inside doMiniBatchMutation, because indexRegion.closing is true:
try {
      acquiredLockId = getLock(providedLockId, mutation.getRow(), shouldBlock);
} catch (IOException ioe) {
      LOG.warn("Failed getting lock in batch put, row=" + Bytes.toStringBinary(mutation.getRow()), ioe);
}

doMiniBatchMutation always returns 0L, so indexRegion.batchMutateForIndex becomes an endless loop:

while (!batchOp.isDone()) {
    ....
}

It keeps logging WARN to the file until the disk is full.

Patch file for core changes

Can you create a patch for all the core changes (which are not present in the 0.94 code base) and attach it as a file in this Git repo?

Is this a bug?

When I insert test data, there is an error:

org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 5000 actions: NotServingRegionException: 5000 times, servers with issues: xxxxx:60020,
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1677)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1453)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:738)

Support index data maintenance on separate column family as well.

If we maintain index data in a separate column family instead of a separate table, we can avoid all the admin operations (create, enable, disable, split) on the index table. We can achieve this with minimal changes to our code base, and most of the region-wise operations would then be taken care of by the kernel alone.

migrate hindex to hbase 0.96 or higher

Is it possible to migrate hindex to a higher version of HBase, like 0.96?

Based on my understanding, there are huge differences between 0.96 and 0.94.8: the project structure changed, some core APIs changed, etc.

How can I merge hindex's code into HBase 0.96?

Thank you

Deadlock in put operation

Are the user region and index region balanced at the same time?

When putting data into HBase, it causes deadlock issues. After tracing the code, I found this in the HRegion.batchMutate method:

    startRegionOperation();
    if (coprocessorHost != null) {
        coprocessorHost.postStartRegionOperation(); ------a
    }
    try {
        ......
    } finally {
        closeRegionOperation(); ----------b
        if (coprocessorHost != null) {
            coprocessorHost.postCloseRegionOperation();
        }
    }
  ......  

When step 'a' executes and the index region has not yet been balanced to this server, an IOException is thrown and step 'b' does not execute, which then causes a deadlock when closing this region.

Could you check it?

scan.setAttribute() question

Hi, I use hindex to scan data like this:

public void testSingleIndexExpressionWithMoreEqualsExpsAndOneRangeExp() throws Exception {
        String indexName = "IDX1";
        SingleIndexExpression singleIndexExpression = new SingleIndexExpression(indexName);

        byte[] value1 = "g".getBytes();
        Column column = new Column(FAMILY1, QUALIFIER9);
        EqualsExpression equalsExpression = new EqualsExpression(column, value1);
        singleIndexExpression.addEqualsExpression(equalsExpression);

        column = new Column(FAMILY1, QUALIFIER2);
        byte[] value2_1 = Bytes.toBytes("1383633260000");
        byte[] value2_2 = Bytes.toBytes("1383633262000");
        RangeExpression re = new RangeExpression(column, value2_1, value2_2, true, false);
        singleIndexExpression.setRangeExpression(re);

        Scan scan = new Scan();
        scan.setAttribute(Constants.INDEX_EXPRESSION, IndexUtils.toBytes(singleIndexExpression));
        FilterList fl = new FilterList(Operator.MUST_PASS_ALL);
        Filter filter = new SingleColumnValueFilter(FAMILY1, QUALIFIER9, CompareOp.EQUAL, value1);
        fl.addFilter(filter);
        filter = new SingleColumnValueFilter(FAMILY1, QUALIFIER2, CompareOp.GREATER_OR_EQUAL, value2_1);
        fl.addFilter(filter);
        filter = new SingleColumnValueFilter(FAMILY1, QUALIFIER2, CompareOp.LESS, value2_2);
        fl.addFilter(filter);
        scan.setFilter(fl);

        HTablePool pool = new HTablePool(configuration, 1000);
        HTableInterface table = pool.getTable(tableName);
        long current = System.currentTimeMillis();
        outputResult(scan, table);
        System.out.println(System.currentTimeMillis() - current);
    }

When I use scan.setAttribute(Constants.INDEX_EXPRESSION, IndexUtils.toBytes(singleIndexExpression)), this method can't scan any data, but if I remove that line, I scan the correct data. I'm confused.

Some log data from a run without scan.setAttribute():

2013-11-26 10:58:05,224 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Checking for the best index(s) for the cols combination : [[cf1 : a2, cf1 : a9]]
2013-11-26 10:58:05,224 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a2, cf1 : a9]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a2, cf1 : a9] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Best index selected for the cols [cf1 : a2, cf1 : a9] : null
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Not even one index found for the cols combination : [[cf1 : a2, cf1 : a9]]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Checking for the best index(s) for the cols combination : [[cf1 : a2], [cf1 : a9]]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a2]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a2] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Index Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9] seems to be suitable for the columns [cf1 : a2]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Best index selected for the cols [cf1 : a2] : Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a9]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a9] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Best index selected for the cols [cf1 : a9] : null
2013-11-26 10:58:05,225 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Index(s) which will be used for columns [cf1 : a2, cf1 : a9] : {[cf1 : a2]=Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]}
2013-11-26 10:58:05,225 INFO org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Index using for the columns [cf1 : a2] : Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]
2013-11-26 10:58:05,231 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionScannerForAND: Removing scanner org.apache.hadoop.hbase.index.coprocessor.regionserver.LeafIndexRegionScanner@3aa49259 from the list.
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Checking for the best index(s) for the cols combination : [[cf1 : a2, cf1 : a9]]
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a2, cf1 : a9]
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a2, cf1 : a9] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Best index selected for the cols [cf1 : a2, cf1 : a9] : null
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Not even one index found for the cols combination : [[cf1 : a2, cf1 : a9]]
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Checking for the best index(s) for the cols combination : [[cf1 : a2], [cf1 : a9]]
2013-11-26 10:58:05,873 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a2]
2013-11-26 10:58:05,874 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a2] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]
2013-11-26 10:58:05,874 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Index Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9] seems to be suitable for the columns [cf1 : a2]
2013-11-26 10:58:05,874 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Best index selected for the cols [cf1 : a2] : Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]
2013-11-26 10:58:05,874 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Trying to find a best index for the cols : [cf1 : a9]
2013-11-26 10:58:05,874 DEBUG org.apache.hadoop.hbase.index.coprocessor.regionserver.ScanFilterEvaluator: Possible indices for cols [cf1 : a9] : [Index : IDX1,Index Columns : [CF : cf1,Qualifier : a2, CF : cf1,Qualifier : a3, CF : cf1,Qualifier : a9]]

postStartRegionOperation in batchMutate should be called in the try block

Presently we call postStartRegionOperation right after startRegionOperation:

    startRegionOperation();
    if (coprocessorHost != null) {
      coprocessorHost.postStartRegionOperation();
    }

In startRegionOperation we acquire a read lock. If there is any exception in postStartRegionOperation, the exception is thrown out and the lock won't be released. This will block region closing and any other operations that need the write lock.

If we call it in the try block, then we can release the lock in closeRegionOperation, as sketched below.
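A sketch of the suggested restructuring (illustrative, not the actual patch):

    startRegionOperation();          // acquires the region read lock
    try {
      if (coprocessorHost != null) {
        coprocessorHost.postStartRegionOperation(); // may throw
      }
      // ... batch mutate work ...
    } finally {
      closeRegionOperation();        // read lock released even on exception
      if (coprocessorHost != null) {
        coprocessorHost.postCloseRegionOperation();
      }
    }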

Some test cases hang while closing all regions during shutdown because of this issue.

CP hook for IndexHalfStoreFileReader

Currently the creation of this new class / HalfStoreFileReader (to read half files) happens based on table-name checks in the core code. We could add a CP hook in the core code and implement this using the new hook.
