
How to use hindex for scanning data? #49

Closed
xuxc opened this issue Jul 28, 2014 · 20 comments


xuxc commented Jul 28, 2014

I have deployed Hadoop and hindex successfully, created a table and inserted data, and the index table also exists. So how do I scan for a specific qualifier that has an index? Something like:
get 'test', 'rowkey', 'Family:Qualifier', 'value' ?

@xuxc xuxc changed the title Run Error with Eclipse How to use hindex for scanning? Jul 28, 2014
@xuxc xuxc changed the title How to use hindex for scanning? How to use hindex for scanning data? Jul 28, 2014

hy2014 commented Jul 30, 2014

Just as before: if you add a filter that includes the indexed column, hindex will use the index. No need to add any code on the client.

@chrajeshbabu (Member)

You need not make any client changes to make use of the index. Internally we have a filter evaluator to check whether to make use of the index or not.


xuxc commented Nov 3, 2014

Now I have used 3 filters to filter the data, and all three columns have indexes, but for 2 million rows it takes almost 40 s. I'd like to know whether the indexes worked.
@chrajeshbabu
Thank you.

@anoopsjohn

Can you be a bit clearer about your table schema and query? How many rows of data do you have in total?


xuxc commented Nov 3, 2014

The table has one CF, "info", with 17 columns under it, and I created an index on every column at table-creation time. I use the filters like this:

List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);

My hbase-site.xml is correct:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master</name>
  <value>hdfs://namenode:60000</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/home/hadoop/tmp/data</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>namenode,datanode1,datanode2</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>${hbase.tmp.dir}/zookeeper</value>
</property>
<property>
  <name>hbase.use.secondary.index</name>
  <value>true</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.master.IndexMasterObserver</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.regionserver.IndexRegionObserver</value>
</property>
<property>
  <name>hbase.coprocessor.wal.classes</name>
  <value>org.apache.hadoop.hbase.index.coprocessor.wal.IndexWALObserver</value>
</property>
It takes almost 40 seconds to get the result for 2 million rows.
I'm afraid the indexes don't work...

@anoopsjohn

So the total data over which the scan happened is 2 million rows, and the data satisfying the condition is less than that? Or is the actual fetched data 2 million rows? Just trying to understand the data size.
How big is the cluster? How many regions in total?
I assume you are using the default HFile block size, i.e. 64 KB. You can try reducing that.


xuxc commented Nov 3, 2014

The cluster has just 3 nodes.
Does hindex's 2nd filter select data (result sets) from what was returned by the 1st filter, or does every filter do a full scan on the index table?

@anoopsjohn

None of the filters does a full scan on the index table.
Your indexed column type is String only and you use an equals condition, so for the index table scan we create a start and stop row.
As this query covers 2 indexes, we will have 2 index scanners which retrieve data (on the server side) simultaneously, and using AND we find the matching row keys. If there were a single index covering both these columns that would be better anyway, just saying. Do you have any idea how long the above query takes when no index is used?
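The AND step described above can be pictured as intersecting the sorted sets of row keys returned by the two bounded index scans. A minimal pure-Java sketch of that idea (plain collections, not hindex's actual scanner code; the row keys and column values are made up):

```java
import java.util.TreeSet;

public class IndexIntersection {
    // Each index scan is bounded by a start/stop row derived from the
    // equals condition, so it returns only the row keys whose indexed
    // column matches that value. AND-ing two conditions is then a set
    // intersection of the two sorted key sets.
    static TreeSet<String> intersect(TreeSet<String> scan1, TreeSet<String> scan2) {
        TreeSet<String> result = new TreeSet<String>(scan1);
        result.retainAll(scan2); // keep only row keys present in both scans
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> styleMatches = new TreeSet<String>();   // e.g. style_No = 4674
        TreeSet<String> countryMatches = new TreeSet<String>(); // e.g. country_No = 3871
        styleMatches.add("rk001"); styleMatches.add("rk007"); styleMatches.add("rk042");
        countryMatches.add("rk007"); countryMatches.add("rk042"); countryMatches.add("rk100");
        System.out.println(intersect(styleMatches, countryMatches)); // [rk007, rk042]
    }
}
```

This is why only rows matching both conditions are fetched from the main table, even when each condition alone matches many rows.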


xuxc commented Nov 3, 2014

Sorry to put forward such an unprofessional problem >.<
What I wanted to say is:

  1. I first created 17 indexes, one on each column, and created the table;
  2. loaded 2 million rows into the table, and the indexes were built;
  3. I query using filters via the Java API (just as shown above);
  4. I found it needs 40 seconds to get the results.

May I have a .java example for reference? Maybe my code has something wrong...

@anoopsjohn

So your total row count is 2 million. Can you tell me how many rows satisfy the above condition (col1=? AND col2=?)?
Also, do you have any idea what time a normal full table scan (no index declared) takes?


xuxc commented Nov 3, 2014

Fewer than 10 rows satisfy the above condition, and it spends 40 seconds getting the results. All columns have indexes.


xuxc commented Nov 3, 2014

'qx', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

'qx_idx', {METHOD => 'table_att', MAX_FILESIZE => '9223372036854775807', CONFIG => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME => 'd', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

@anoopsjohn

Only 10 rows and taking 40 secs seems too much!
How many regions in total across these 3 nodes?
I doubt whether the index is getting used or not...

Is your data distributed the following way: many rows satisfy the col1 condition alone, many rows satisfy the col2 condition alone, and both together match at most 10?


xuxc commented Nov 3, 2014

Yeah, I think so.
Many rows may satisfy one of the 2 conditions, but both together match at most 10.
There are 5 regions across the 3 nodes.
So I want to know if the index works~

I use the filters like this:

HTablePool pool = new HTablePool(configuration, 1000);
List<Filter> filters = new ArrayList<Filter>();
Filter filter1 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("style_No"), CompareOp.EQUAL, Bytes.toBytes("4674"));
filters.add(filter1);
Filter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("country_No"), CompareOp.EQUAL, Bytes.toBytes("3871"));
filters.add(filter2);
FilterList filterList1 = new FilterList(filters);
Scan scan = new Scan();
scan.setFilter(filterList1);
// ResultScanner rs = table.getScanner(scan); // hindex filter
ResultScanner rs = pool.getTable(tableName).getScanner(scan);
// ...code...

May I have your email so I can send you some pictures?
@hy2014 @chrajeshbabu @anoopsjohn

@anoopsjohn

[email protected]


xuxc commented Nov 4, 2014

I got the point:
the index name is "contry_No", but when I loaded the data into HBase the column name was "country_No"...
BTW, I found an interesting thing. I created this filter:

new SingleColumnValueFilter(Bytes.toBytes("info"),
        Bytes.toBytes("contry_No"), CompareOp.EQUAL, Bytes.toBytes("8600"));

while the actual column is "info:country", and yet after a full scan of the table it got the results correctly!!
Thank you, @anoopsjohn
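That full-scan behavior matches the default semantics of SingleColumnValueFilter: unless setFilterIfMissing(true) is called, rows that do not contain the named column pass the filter, so filtering on a misspelled column name lets every row through. A pure-Java sketch of that semantic (a toy row model, not the actual HBase classes):

```java
import java.util.HashMap;
import java.util.Map;

public class FilterIfMissingDemo {
    // Mimics SingleColumnValueFilter with CompareOp.EQUAL on one column:
    // if the row lacks the column and filterIfMissing is false (the HBase
    // default), the row is included anyway.
    static boolean rowPasses(Map<String, String> row, String column,
                             String value, boolean filterIfMissing) {
        String actual = row.get(column);
        if (actual == null) {
            return !filterIfMissing; // missing column: include unless told otherwise
        }
        return actual.equals(value);
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("info:country_No", "8600"); // the real column name
        // Filter on the misspelled "info:contry_No": the column is missing
        // from the row, so with the default it still passes.
        System.out.println(rowPasses(row, "info:contry_No", "8600", false)); // true
        System.out.println(rowPasses(row, "info:contry_No", "8600", true));  // false
    }
}
```

So the "correct" results from the misspelled filter were most likely just every row passing through, not a real match.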

@anoopsjohn

So after correcting the name, how long does the query take when using the index? I hope it is much, much lower than 40 sec.


xuxc commented Nov 4, 2014

Within 1 sec. hindex is so fast!!
I get the ResultScanner rs in Dao.java and return it, and I want to show the data from rs in another page. But in Action.java, rs is not null, yet there is no Result r in it. Can a ResultScanner not be returned?
Code as follows:

Dao.java -------
{
    // ...
    rs = pool.getTable(tableName).getScanner(scan);
    return rs;
}

Action.java --------
{
    QueryHbaseService qhsi = new QueryHbaseServiceImpl();
    rs = qhsi.queryHbase(condMap); // calls Dao.java and returns rs
    for (Result r : rs) { // the body of this loop is never reached
        for (KeyValue keyValue : r.raw()) { ... }
    }
}
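One common cause of an "empty" scanner handed across layers is that a ResultScanner is a live, single-use cursor tied to the table it came from: if the Dao releases the pooled table before the Action iterates, or the scanner has already been iterated once, nothing comes out. A safer pattern is to drain the scanner into a List inside the Dao and return the list. Sketched here with a toy one-shot cursor in plain Java (illustrative names, not the HBase API):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DrainScannerDemo {
    // Toy stand-in for a ResultScanner: a one-shot cursor over rows.
    static class ToyScanner implements Iterable<String> {
        private final Iterator<String> it;
        ToyScanner(List<String> rows) { this.it = rows.iterator(); }
        public Iterator<String> iterator() { return it; } // same cursor every time
    }

    // Dao-side fix: copy the cursor's contents into a plain List before
    // returning, so the caller no longer depends on the cursor's state.
    static List<String> drain(ToyScanner scanner) {
        List<String> out = new ArrayList<String>();
        for (String row : scanner) {
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<String>();
        rows.add("r1"); rows.add("r2");
        ToyScanner scanner = new ToyScanner(rows);
        System.out.println(drain(scanner)); // first pass drains both rows: [r1, r2]
        System.out.println(drain(scanner)); // cursor already exhausted: []
    }
}
```

Applied to the code above, the Dao would loop over rs, collect each Result into a List<Result>, close the scanner, and return the list to the Action.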

@anoopsjohn

There should be no problem using the ResultScanner in the same class or passing it to another one. Not sure what problem you are facing. Can you check the logs on the client and the region server side?


xuxc commented Nov 5, 2014

hindex is perfect for indexing data in HBase.
Further tests will be done.
Thank you very much!
