HBase checkAndPut atomicity clarification

I want to be clear on how HBase checkAndPut works. According to the HBase documentation:

Atomically checks if a row/family/qualifier value matches the expected value. If it does, it adds the put. If the passed value is null, the check is for the lack of column (ie: non-existance)

When it says "atomically", I assume it locks and isolates the row, then does the comparison before the put, so that no other operation can touch the row in between. But checkAndPut also works for checking non-existence: if the row key does not exist, what does it isolate or lock?

I have two theories on this:

  1. HBase checkAndPut doesn't isolate anything if the row does not exist. Does that mean that when two checkAndPut calls hit the same non-existent row at the same time, both can succeed?

  2. Does it isolate by row key?

I just want to confirm which one is the actual implementation; to me, the second would be ideal.

Or is HBase checkAndPut not suited to checking the existence of a row? Maybe it is only meant for rows that already exist, checking just the family/qualifier? I ask because the Java API's check is expressed in terms of a row, a family and a qualifier.

Azel asked Oct 26 '25 03:10


1 Answer

Before trying to understand how checkAndPut behaves in case of a non-existing row, you should first understand how mutations work in HBase.

Mutations in HBase

A mutation in HBase is any write operation, such as a Put or a Delete. Since HBase is a strongly consistent system that provides atomicity guarantees for a single row (across column families), all mutations for a particular row have to go through the same server. The HBase documentation on regions and regionservers explains how HBase divides responsibility for non-overlapping partitions of the row key space across a set of servers.

Whenever a regionserver receives a mutation for a particular row, it acquires an in-memory write lock on the value of that row key. This essentially means four things:

  1. Since one row can be written by only one regionserver, there can never be more than one server trying to write to, and acquire the lock for, the same row.
  2. Since the lock is in memory, if the server crashes immediately after acquiring the lock, the lock is automatically released. Responsibility for the region then moves gracefully to a new server, but your operation will have failed (not accounting for automatic retries on the client).
  3. Since the write lock covers the whole row, a mutation to column x blocks other mutations to column y of the same row until the lock is released.
  4. Since the lock is on the value of the row key (the regionserver maintains a list of currently locked rows in memory), the row does not necessarily have to exist beforehand.
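The fourth point is the one that answers the question, so here is a minimal sketch of the idea. It is a toy model and not HBase's source code: the class name RowLockTable and its methods are hypothetical, and it assumes the lock table is simply a map from row key to lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of a regionserver's in-memory row locks.
// Hypothetical names, for illustration only; this is not HBase code.
class RowLockTable {
    // Locks are keyed by the VALUE of the row key, so a lock entry can be
    // created for a row that has never been written.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Acquire (creating it if needed) the write lock for a row key.
    ReentrantLock lockRow(String rowKey) {
        ReentrantLock lock = locks.computeIfAbsent(rowKey, k -> new ReentrantLock());
        lock.lock();
        return lock;
    }

    // True if some thread currently holds the lock for this row key.
    boolean isLocked(String rowKey) {
        ReentrantLock lock = locks.get(rowKey);
        return lock != null && lock.isLocked();
    }
}
```

Nothing in lockRow looks at stored data: locking "never-written-row" works exactly like locking a row that already has cells.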

checkAndPut is no different from a regular Put in terms of locking semantics. The only difference is that, after locking the row key, it does an extra Get to verify the existing value of a column for that row key (the value can be null, and the row key might not exist at all yet). So two concurrent checkAndPut calls that both expect a missing column on the same row are serialized by the row lock: the first passes the check and writes, and the second then sees the written value and fails its check. This is also why the row key of the Put has to be the same as the row key used for the check; otherwise the in-memory lock would not be able to provide consistency guarantees. This fits with HBase's other ACID guarantees, which are also provided only at the level of a single row.
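To make the lock, check, then put sequence concrete, here is a small in-memory sketch. It is an assumption-laden toy, not the HBase implementation: ToyRegion, its String-based checkAndPut and raceOnMissingRow are hypothetical names, and a column is modeled as a plain "family:qualifier" string.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of one region. Hypothetical names; not the HBase API.
class ToyRegion {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    // row key -> (column -> value)
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    // Lock the row key, compare the current value with `expected`
    // (null means "column must be absent"), and write only on a match.
    boolean checkAndPut(String row, String column, String expected, String newValue) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, k -> new ReentrantLock());
        lock.lock();
        try {
            String current = get(row, column); // the extra Get, done under the lock
            if (!Objects.equals(current, expected)) {
                return false;
            }
            rows.computeIfAbsent(row, k -> new ConcurrentHashMap<>()).put(column, newValue);
            return true;
        } finally {
            lock.unlock();
        }
    }

    String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    // Starts `threads` writers that all expect "row-1" to be missing,
    // and returns how many of them had their put applied.
    static int raceOnMissingRow(int threads) {
        ToyRegion region = new ToyRegion();
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String value = "writer-" + i;
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (region.checkAndPut("row-1", "cf:q", null, value)) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

In this model, ToyRegion.raceOnMissingRow(8) returns 1: because the check runs while the row key is locked, only the first writer sees the column as absent. This corresponds to the second theory in the question. In the real Java client the equivalent call is Table.checkAndPut(row, family, qualifier, value, put) with value set to null, which newer client versions deprecate in favor of checkAndMutate.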

Ashu Pachauri answered Oct 29 '25 08:10
