
How to use hindex for scanning data? #49

Closed
xuxc opened this issue Jul 28, 2014 · 20 comments


xuxc commented Jul 28, 2014

I have deployed Hadoop and hindex successfully, created a table and inserted data, and the index table also exists. So how do I scan for a specific qualifier that has an index? Something like:
get 'test', 'rowkey', 'Family:Qualifier', 'value' ?

@xuxc xuxc changed the title Run Error with Eclipse How to use hindex for scanning? Jul 28, 2014
@xuxc xuxc changed the title How to use hindex for scanning? How to use hindex for scanning data? Jul 28, 2014

hy2014 commented Jul 30, 2014

Just as before: if you add a filter that includes the indexed column, hindex will use the index. No need to add any code on the client.

@chrajeshbabu (Member)

You need not make any client changes to make use of the index. Internally we have a filter evaluator to check whether to make use of the index or not.


xuxc commented Nov 3, 2014

Now I have used 3 filters to filter the data, and all three columns have indexes, but for 2 million rows it takes almost 40 s. I'd like to know whether the indexes worked.
@chrajeshbabu
Thank you.

@anoopsjohn

Can you be a bit clearer about your table schema and query? How many rows of data do you have in total?


xuxc commented Nov 3, 2014

The table has one CF, "info", with 17 columns under it, and I created an index on every column at table-creation time. I use the filters like this:

List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);

My hbase-site.xml is correct:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master</name>
  <value>hdfs://namenode:60000</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/home/hadoop/tmp/data</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>namenode,datanode1,datanode2</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>${hbase.tmp.dir}/zookeeper</value>
</property>
<property>
  <name>hbase.use.secondary.index</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
</property>
<property>
  <name>hbase.coprocessor.wal.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
</property>
It takes almost 40 seconds to get the result for 2 million rows.
I'm afraid the indexes don't work...

@anoopsjohn

So the total data over which the scan happened is 2 million rows, and the data satisfying the condition is less than that? Or is the actual fetched data 2 million rows? Just trying to understand the data size.
How big is the cluster? How many regions in total?
I assume you are using the default HFile block size, i.e. 64 KB. You can try reducing that.


xuxc commented Nov 3, 2014

The cluster has just 3 nodes.
Does hindex's 2nd filter select data (result sets) from what was returned by the 1st filter, or does every filter do a full scan on the index table?

@anoopsjohn

None of the filters does a full scan on the index table.
Your indexed column type is String only and you use an equals condition, so for the index table scan we create a start and stop row.
As this query covers 2 indexes, we will have 2 index scanners which retrieve data (on the server side) simultaneously, and using AND we find the matching row keys. If there were a single index covering both these columns that would be better anyway, just saying. Do you have any idea how long the above query takes when no index is used?
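The AND step described above can be pictured as intersecting the sorted sets of row keys returned by the two bounded index scans. A minimal pure-Java sketch of that idea (plain collections, not hindex's actual scanner code; the row keys and column values are made up):

```java
import java.util.TreeSet;

public class IndexIntersection {
    // Each index scan is bounded by a start/stop row derived from the
    // equals condition, so it returns only the row keys whose indexed
    // column matches that value. AND-ing two conditions is then a set
    // intersection of the two sorted key sets.
    static TreeSet<String> intersect(TreeSet<String> scan1, TreeSet<String> scan2) {
        TreeSet<String> result = new TreeSet<String>(scan1);
        result.retainAll(scan2); // keep only row keys present in both scans
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> styleMatches = new TreeSet<String>();   // e.g. style_No = 4674
        TreeSet<String> countryMatches = new TreeSet<String>(); // e.g. country_No = 3871
        styleMatches.add("rk001"); styleMatches.add("rk007"); styleMatches.add("rk042");
        countryMatches.add("rk007"); countryMatches.add("rk042"); countryMatches.add("rk100");
        System.out.println(intersect(styleMatches, countryMatches)); // [rk007, rk042]
    }
}
```

This is why only rows matching both conditions are fetched from the main table, even when each condition alone matches many rows.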


xuxc commented Nov 3, 2014

Sorry to put forward such an unprofessional problem >.<
What I wanted to say is:

  1. I first created 17 indexes, one on each column, and created the table;
  2. loaded 2 million rows into the table, and the indexes were built;
  3. I query using filters via the Java API (just as shown above);
  4. I found it needs 40 seconds to get the results.

May I have a .java example for reference? Maybe my code has something wrong...

@anoopsjohn

So your total row count is 2 million. Can you tell me how many rows satisfy the above condition (col1=? AND col2=?)?
Also, do you have any idea what time a normal full table scan (no index declared) takes?


xuxc commented Nov 3, 2014

Fewer than 10 rows satisfy the above condition, and it spends 40 seconds getting the results. All columns have indexes.


xuxc commented Nov 3, 2014

'qx', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

'qx_idx', {METHOD => 'table_att', MAX_FILESIZE => '9223372036854775807', CONFIG => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME => 'd', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

@anoopsjohn

Only 10 rows and taking 40 secs seems too much!
How many regions in total across these 3 nodes?
I doubt whether the index is getting used or not...

Is your data distributed the following way: many rows satisfy the col1 condition alone, many rows satisfy the col2 condition alone, and both together match at most 10?


xuxc commented Nov 3, 2014

Yeah, I think so.
Many rows may satisfy one of the 2 conditions, but both together match at most 10.
There are 5 regions across the 3 nodes.
So I want to know if the index works~

I use the filters like this:

HTablePool pool = new HTablePool(configuration, 1000);
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);
// ResultScanner rs = table.getScanner(scan); // hindex filter
ResultScanner rs = pool.getTable(tableName).getScanner(scan);
// ...code...

May I have your email so I can send you some pictures?
@hy2014 @chrajeshbabu @anoopsjohn

@anoopsjohn

[email protected]


xuxc commented Nov 4, 2014

I got the point:
the index name is "contry_No", but when I loaded the data into HBase the column name was "country_No"...
BTW, I found an interesting thing. I created this filter:

new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("contry_No"), CompareOp.EQUAL, Bytes.toBytes("8600"));

while the actual column is "info:country", and yet after a full scan of the table it got the results correctly!!
Thank you, @anoopsjohn
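That full-scan behavior matches the default semantics of SingleColumnValueFilter: unless setFilterIfMissing(true) is called, rows that do not contain the named column pass the filter, so filtering on a misspelled column name lets every row through. A pure-Java sketch of that semantic (a toy row model, not the actual HBase classes):

```java
import java.util.HashMap;
import java.util.Map;

public class FilterIfMissingDemo {
    // Mimics SingleColumnValueFilter with CompareOp.EQUAL on one column:
    // if the row lacks the column and filterIfMissing is false (the HBase
    // default), the row is included anyway.
    static boolean rowPasses(Map<String, String> row, String column,
                             String value, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing; // missing column: include unless told otherwise
        }
        return actual.equals(value);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("info:country_No", "8600"); // the real column name
        // Filter on the misspelled "info:contry_No": the column is missing
        // from the row, so with the default it still passes.
        System.out.println(rowPasses(row, "info:contry_No", "8600", false)); // true
        System.out.println(rowPasses(row, "info:contry_No", "8600", true));  // false
    }
}
```

So the "correct" results from the misspelled filter were most likely just every row passing through, not a real match.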

@anoopsjohn

So after correcting the name, how long does the query take when using the index? I hope it is much, much lower than 40 sec.


xuxc commented Nov 4, 2014

Within 1 sec. hindex is so fast!!
I get the ResultScanner rs in Dao.java and return it, and I want to show the data from rs in another page. But in Action.java, rs is not null, yet there is no Result r in it. Can a ResultScanner not be returned?
Code as follows:

Dao.java -------
{
    // ...
    rs = pool.getTable(tableName).getScanner(scan);
    return rs;
}

Action.java --------
{
    QueryHbaseService qhsi = new QueryHbaseServiceImpl();
    rs = qhsi.queryHbase(condMap); // calls Dao.java and returns rs
    for (Result r : rs) { // the body of this loop is never reached
        for (KeyValue keyValue : r.raw()) { ... }
    }
}
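One common cause of an "empty" scanner handed across layers is that a ResultScanner is a live, single-use cursor tied to the table it came from: if the Dao releases the pooled table before the Action iterates, or the scanner has already been iterated once, nothing comes out. A safer pattern is to drain the scanner into a List inside the Dao and return the list. Sketched here with a toy one-shot cursor in plain Java (illustrative names, not the HBase API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DrainScannerDemo {
    // Toy stand-in for a ResultScanner: a one-shot cursor over rows.
    static class ToyScanner implements Iterable<String> {
        private final Iterator<String> it;
        ToyScanner(List<String> rows) { this.it = rows.iterator(); }
        public Iterator<String> iterator() { return it; } // same cursor every time
    }

    // Dao-side fix: copy the cursor's contents into a plain List before
    // returning, so the caller no longer depends on the cursor's state.
    static List<String> drain(ToyScanner scanner) {
        List<String> out = new ArrayList<String>();
        for (String row : scanner) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        rows.add("r1"); rows.add("r2");
        ToyScanner scanner = new ToyScanner(rows);
        System.out.println(drain(scanner)); // first pass drains both rows: [r1, r2]
        System.out.println(drain(scanner)); // cursor already exhausted: []
    }
}
```

Applied to the code above, the Dao would loop over rs, collect each Result into a List<Result>, close the scanner, and return the list to the Action.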

@anoopsjohn

There should be no problem using the ResultScanner in the same class or passing it to another one. Not sure what problem you are facing. Can you check the logs on the client and the region server side?


xuxc commented Nov 5, 2014

hindex is perfect for indexing data in HBase.
Further tests will be done.
Thank you very much!
