HBase access and index

Tags:

indexing

hbase

I have an HBase table with about 50 million rows, each with several columns. My goal is to retrieve the rows that have a given value in a given column, e.g. rows whose column 'col_1' has the value 'val_1'.

I see two options:

  1. Scan the table from beginning to end and check each row to see whether it should be retrieved.
  2. Build an index for the table (e.g., on the values in column 'col_1'); then, for a given column value 'val_1', get all the row keys stored under the index entry 'val_1' and retrieve the corresponding rows by key. As I understand it, this involves random access to the original HBase table.
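To make the two options concrete, here is a minimal in-memory sketch in Python. It is only a model of the access patterns, not HBase client code: the `table` contents and the helper names `full_scan`, `build_index`, and `indexed_lookup` are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for the HBase table: row key -> {column: value}.
# The rows and all helper names below are hypothetical.
table = {
    "row1": {"col_1": "val_1", "col_2": "a"},
    "row2": {"col_1": "val_2", "col_2": "b"},
    "row3": {"col_1": "val_1", "col_2": "c"},
}

def full_scan(table, column, value):
    """Option 1: visit every row and keep the keys of the rows that match."""
    return sorted(key for key, cols in table.items() if cols.get(column) == value)

def build_index(table, column):
    """Option 2, step 1: map each column value to the row keys that hold it."""
    index = defaultdict(list)
    for key, cols in table.items():
        if column in cols:
            index[cols[column]].append(key)
    return index

def indexed_lookup(table, index, value):
    """Option 2, step 2: fetch the matching rows by key (random access)."""
    return {key: table[key] for key in index.get(value, [])}

index = build_index(table, "col_1")
print(full_scan(table, "col_1", "val_1"))
print(sorted(indexed_lookup(table, index, "val_1")))
```

In this model the scan touches every row regardless of how many match, while the lookup touches only the matching rows. The index has to be built once and kept in sync on every write, which the sketch does not model.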

Can anyone suggest which option runs faster, or is there a better option?

Thanks a lot!

RecSys_2010 asked Nov 18 '25 06:11



1 Answer

Are you asking whether adding an index will make it faster? The answer is of course yes. You can see the wiki for thoughts on secondary indexes in HBase.
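One common way to lay out a secondary index, sketched here as an assumption rather than something taken from the wiki, is a separate index table whose row key is the indexed value followed by the original row key. Because HBase keeps rows sorted by row key, all entries for one value are then adjacent and can be read with a single prefix scan. The Python below models that with a sorted list; the separator `SEP` and the helpers `index_key` and `prefix_scan` are hypothetical names.

```python
import bisect

SEP = "\x00"  # hypothetical separator between the indexed value and the row key

def index_key(value, row_key):
    """Composite index row key: indexed value, separator, original row key."""
    return f"{value}{SEP}{row_key}"

# A sorted list stands in for the lexicographically ordered index table.
source_rows = [("row1", "val_1"), ("row2", "val_2"), ("row3", "val_1")]
index_rows = sorted(index_key(value, key) for key, value in source_rows)

def prefix_scan(index_rows, value):
    """Return the original row keys of all index entries for `value`."""
    prefix = value + SEP
    start = bisect.bisect_left(index_rows, prefix)  # seek to the first candidate
    matches = []
    for entry in index_rows[start:]:
        if not entry.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(entry[len(prefix):])
    return matches

print(prefix_scan(index_rows, "val_1"))
```

The row keys returned by the prefix scan would then be used to fetch the full rows from the main table, which is the random-access step described in the question.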

Xodarap answered Nov 21 '25 09:11
