HBase column qualifier limits

This is the question's setup:

I'm working on a project that uses HBase to store information about book availability. One of the queries I need to answer (and the one that will run most often) is: give me all the available books in a range of dates. To solve that I came up with a schema where each book ISBN has its own row, with a column family holding one qualifier for each day of the year, and in each qualifier I store how many books there are for that day. This way I have 365 columns per row, and using ColumnRangeFilter I can return the availability of any book (given its ISBN) in a given date range.
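A minimal in-memory sketch of this schema may help, assuming qualifiers are encoded as zero-padded day-of-year strings (`b"001"` to `b"365"`); the question does not say how days are encoded. HBase sorts and compares qualifiers as raw bytes in lexicographic order, and ColumnRangeFilter selects by that order, so fixed-width encoding matters: an unpadded `b"10"` sorts before `b"9"`. The helpers `day_qualifier` and `column_range` are hypothetical; `column_range` only mimics the shape of ColumnRangeFilter's bounds (a min and a max column, each with an inclusive flag) and is not the HBase API.

```python
from typing import Dict, List, Optional, Tuple


def day_qualifier(day_of_year: int) -> bytes:
    """Hypothetical encoding: day of year (1..366) as a fixed-width qualifier, e.g. 7 -> b'007'."""
    if not 1 <= day_of_year <= 366:
        raise ValueError("day_of_year must be in 1..366")
    return f"{day_of_year:03d}".encode("ascii")


def column_range(
    row: Dict[bytes, int],
    min_col: Optional[bytes], min_inclusive: bool,
    max_col: Optional[bytes], max_inclusive: bool,
) -> List[Tuple[bytes, int]]:
    """Model of a column-range filter: keep cells whose qualifier falls between
    the bounds under byte-wise lexicographic order. None means unbounded."""
    result = []
    for qualifier in sorted(row):  # qualifiers compared as raw bytes
        if min_col is not None:
            if qualifier < min_col or (qualifier == min_col and not min_inclusive):
                continue
        if max_col is not None:
            if qualifier > max_col or (qualifier == max_col and not max_inclusive):
                continue
        result.append((qualifier, row[qualifier]))
    return result


# One row for one ISBN: day-of-year qualifier -> number of books that day (made-up data).
row = {day_qualifier(d): d % 4 for d in range(1, 366)}

# Availability from day 9 to day 12, both inclusive.
window = column_range(row, day_qualifier(9), True, day_qualifier(12), True)
print(window)  # [(b'009', 1), (b'010', 2), (b'011', 3), (b'012', 0)]

# Why padding matters: unpadded day numbers do not sort chronologically as bytes.
print(sorted([b"9", b"10"]))  # [b'10', b'9']
```

With day-of-year qualifiers, a date range that crosses New Year needs two separate ranges; a `yyyymmdd` qualifier would keep a single chronological byte order across years.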

And this is the question itself:

Is there a limit on the number of column qualifiers a row can have? Or at least a best practice for this? Right now I only have 365 column qualifiers per row, but if this project succeeds there's a chance of having around 10,000 qualifiers per row. I'd like to know whether this schema scales well for such a scenario.

asked Oct 24 '25 10:10 by Diego



2 Answers

Although there is no actual limit of this kind, you must design your model with your use case in mind. IMHO, you are better off with wider rows if you need to do atomic operations across a broad range of data, rather than a larger number of rows with fewer columns each. The advantage of this approach is that you can act atomically on several columns, as your use case demands. It also means that readers accessing a row concurrently will see the entire update on that row rather than only part of it.
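To make the atomicity point concrete, here is a minimal in-memory model, assuming the one-row-per-ISBN layout from the question. HBase applies all changes in a single mutation to one row atomically; the sketch mimics that by validating every affected day first and only then writing, all under a per-row lock. `BookRow` and `reserve` are hypothetical names for illustration, not HBase classes, and the sketch does not reproduce the HBase client API.

```python
import threading
from typing import Dict


class BookRow:
    """Hypothetical model of one wide row (one ISBN).

    Keys are zero-padded day-of-year qualifiers; values are book counts.
    A per-row lock stands in for HBase's row-level atomicity.
    """

    def __init__(self, counts: Dict[str, int]):
        self._counts = dict(counts)
        self._lock = threading.Lock()

    def reserve(self, first_day: int, last_day: int) -> bool:
        """Take one book for every day in [first_day, last_day], or change nothing."""
        days = [f"{d:03d}" for d in range(first_day, last_day + 1)]
        with self._lock:
            # Check every column before modifying any of them.
            if any(self._counts.get(q, 0) < 1 for q in days):
                return False
            for q in days:
                self._counts[q] -= 1
            return True

    def snapshot(self) -> Dict[str, int]:
        """Readers take the same lock, so they never see a half-applied update."""
        with self._lock:
            return dict(self._counts)


row = BookRow({"010": 2, "011": 2, "012": 0})
print(row.reserve(10, 11))  # True: days 10 and 11 both had a book
print(row.reserve(10, 12))  # False: day 12 has none, so days 10 and 11 are untouched
print(row.snapshot())       # {'010': 1, '011': 1, '012': 0}
```

The all-or-nothing behaviour is only available within one row; spreading the same days over many rows would require coordinating several independent mutations.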

But I see a possible downside with this approach. You might face performance issues, because a row is never split across regions and is therefore always served by a single server, which puts extra load on that server. To avoid this kind of problem you could consider having multiple column families, if that's possible in your case.

answered Oct 26 '25 05:10 by Tariq



As far as I know, there is no declared limit on the number of columns per row. Things change, but people do discuss cases with millions of columns, which is definitely not your case. There is a limit on row size, but it is far away from your numbers.
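To put "far away from your numbers" in perspective, here is a rough back-of-the-envelope estimator, assuming HBase's classic KeyValue cell layout: a 4-byte key length, a 4-byte value length, a 2-byte row length, the row key, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, a 1-byte key type, then the value. It ignores compression, block encoding, tags, index overhead and multiple versions, so treat the result as an order-of-magnitude figure. The default sizes (13-byte ISBN-13 row key, 1-byte family name, 8-byte `yyyymmdd` qualifier, 4-byte integer value) are illustrative assumptions.

```python
# Fixed per-cell overhead in the KeyValue layout, in bytes:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # = 20


def cell_size(row_key_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate uncompressed size of one cell (single version, no tags)."""
    return KEYVALUE_FIXED_OVERHEAD + row_key_len + family_len + qualifier_len + value_len


def row_size(num_columns: int, row_key_len: int = 13, family_len: int = 1,
             qualifier_len: int = 8, value_len: int = 4) -> int:
    """Approximate size of a row with `num_columns` cells of the same shape.

    Note that the row key is repeated in every cell of the row.
    """
    return num_columns * cell_size(row_key_len, family_len, qualifier_len, value_len)


for columns in (365, 10_000):
    size = row_size(columns)
    print(f"{columns:>6} columns ~ {size:,} bytes ({size / 1024:.0f} KiB)")
```

With these assumptions each cell is 46 bytes, so 10,000 columns come to about 460,000 bytes (roughly 449 KiB) per row before compression.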

Here is another potential issue with 'too wide' rows. If you don't specify exact qualifiers, any scan will return whole rows, so you could get much more data than you actually need. Think about ranges like 'this month'. Again, do you really want to rely on slow intra-row scanning to get the columns you need inside a row?
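When a request does target a range like 'this month', the column bounds can at least be computed exactly so that only the needed columns are requested. A minimal sketch, assuming `yyyymmdd` date qualifiers (an illustrative choice; the question does not say how dates are encoded), that returns an inclusive lower bound and an exclusive upper bound. That pair matches ColumnRangeFilter's min/max columns with the inclusive flags set to true and false respectively. `month_bounds` is a hypothetical helper.

```python
from datetime import date
from typing import Tuple


def month_bounds(any_day: date) -> Tuple[bytes, bytes]:
    """Return (first qualifier of the month, first qualifier of the next month).

    Use the first bound inclusively and the second exclusively, so no
    month-length or leap-year arithmetic is needed.
    """
    start = any_day.replace(day=1)
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return start.strftime("%Y%m%d").encode("ascii"), end.strftime("%Y%m%d").encode("ascii")


print(month_bounds(date(2024, 2, 17)))  # (b'20240201', b'20240301')
print(month_bounds(date(2024, 12, 5)))  # (b'20241201', b'20250101')
```

Computing bounds this way narrows what is returned, but the server still has to seek within the row to find them, which is the cost the answer is questioning.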

I'd recommend thinking about a better row key design instead. Hope this helps in some way.
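One common alternative, offered here as an illustration rather than as the answerer's specific proposal, is a "tall" layout: one row per book per day, with a composite row key such as `<ISBN>#<yyyymmdd>`. Because HBase keeps rows sorted by key, a date range for one book becomes a contiguous row range with an inclusive start row and an exclusive stop row, which is the shape of a range scan. The sketch models that with a sorted list and `bisect`; the key format, the ISBNs and the function names are hypothetical.

```python
import bisect
from typing import List


def row_key(isbn: str, yyyymmdd: str) -> bytes:
    """Hypothetical composite key: ISBN, a separator, then the date."""
    return f"{isbn}#{yyyymmdd}".encode("ascii")


def scan(sorted_keys: List[bytes], start_row: bytes, stop_row: bytes) -> List[bytes]:
    """Model of a row-range scan: start_row inclusive, stop_row exclusive."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


isbns = ["9780000000001", "9780000000002"]  # made-up ISBNs
days = [f"202403{d:02d}" for d in range(1, 32)]
table = sorted(row_key(i, d) for i in isbns for d in days)  # rows kept sorted by key

# Rows for the first book, March 10 to 12 inclusive (stop row is the 13th, exclusive).
hits = scan(table, row_key(isbns[0], "20240310"), row_key(isbns[0], "20240313"))
print([k.decode() for k in hits])
# ['9780000000001#20240310', '9780000000001#20240311', '9780000000001#20240312']
```

The trade-off is that each day becomes its own row: a multi-day update is no longer a single atomic row mutation, but the data for one book is spread over many rows, which HBase can split across regions.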

answered Oct 26 '25 05:10 by Roman Nikitchenko
