I want to be clear on how HBase checkAndPut works. The HBase documentation says:
Atomically checks if a row/family/qualifier value matches the expected value. If it does, it adds the put. If the passed value is null, the check is for the lack of column (ie: non-existance)
When it says "atomically", I assume it locks and isolates the row and does the comparison before it does the put, preventing any other operation on this row. But checkAndPut also works for checking non-existence: if the row key does not exist, what does it isolate/lock?
I have two theories on this:
Either HBase checkAndPut doesn't isolate anything when the row doesn't exist. Does that mean that if two checkAndPut calls hit the same non-existing row at the same time, both can succeed?
Or does it isolate by row key?
I just want to confirm which is the correct behavior, but ideally it would be the second one.
Or is HBase checkAndPut not suitable for checking whether a row exists? Maybe it is only suitable when the row already exists and only the family/qualifier is being checked, because the Java API always takes a family and qualifier: checkAndPut(row, family, qualifier, value, put).
Before trying to understand how checkAndPut behaves in the case of a non-existing row, you should first understand how mutations work in HBase.
A mutation in HBase is any write operation, e.g. Put, Delete, etc. Since HBase is a strongly consistent system that provides atomicity guarantees for a single row (across column families), all the mutations for a particular row have to go through the same server. You should read more about regions and regionservers in the HBase documentation to understand how HBase divides the responsibility of serving non-overlapping partitions of the row key space across a bunch of servers.
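To make the region idea concrete, here is a hypothetical sketch (not HBase code): a table's row key space is split into non-overlapping ranges by region start key, and every row key routes to exactly one server. The class name, the region boundaries and the server names rs1/rs2/rs3 are all made up for illustration. Real HBase compares row keys as unsigned byte arrays; String ordering matches that here only because the sample keys are ASCII.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model: each region covers [startKey, nextStartKey) and lives on one server. */
public class RegionRouting {
    // Region start key -> server hosting that region.
    // The first region starts at the empty key "", so every row key falls in some region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    /** The hosting region is the one with the greatest start key <= rowKey. */
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(rowKey);
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionRouting table = new RegionRouting();
        table.addRegion("", "rs1");  // rows before "g"
        table.addRegion("g", "rs2"); // rows in ["g", "p")
        table.addRegion("p", "rs3"); // rows from "p" onwards
        for (String row : new String[] {"apple", "grape", "zebra", "grape"}) {
            System.out.println(row + " -> " + table.serverFor(row));
        }
    }
}
```

Because every mutation for "grape" lands on the same server (rs2 in this toy layout), that one server can serialize them with a purely local lock.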
Whenever a regionserver gets a mutation for a particular row, it acquires an in-memory write lock keyed on that row key. This essentially means four things:
x will cause operations to column y of the same row to get blocked.
CheckAndPut is no different from a regular Put in terms of locking semantics. The only difference is that, after locking the row key, it does an extra Get to verify the existing value of a column for that row key (that value may be null, and the row key might not exist at all yet). This is also why the Put must be for the same row key as the Get used for the check. Otherwise, the in-memory locking semantics wouldn't be able to provide consistency guarantees.
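A minimal sketch of these semantics, assuming a hypothetical in-memory model: RowLockModel, its methods, and column names like "cf:owner" are illustrative, not HBase classes. Locks live in a map keyed by row key and are created on demand, so a row that has never been written still gets a lock. Because the lock is per row rather than per column, a checkAndPut on one column also blocks a put to another column of the same row. Taking a single rowKey for both the check and the put mirrors the same-row requirement.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of row-level locking; not the HBase implementation. */
public class RowLockModel {
    // row key -> ("family:qualifier" -> value)
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();
    // One lock per row key, created on first use whether or not the row exists.
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String rowKey) {
        return rowLocks.computeIfAbsent(rowKey, k -> new ReentrantLock());
    }

    /** Plain put: lock the row, write one column, unlock. */
    public void put(String rowKey, String column, String value) {
        ReentrantLock lock = lockFor(rowKey);
        lock.lock();
        try {
            rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>()).put(column, value);
        } finally {
            lock.unlock();
        }
    }

    /** Same lock as put(), plus a read under it. expected == null means "column must be absent". */
    public boolean checkAndPut(String rowKey, String checkColumn, String expected,
                               String putColumn, String newValue) {
        ReentrantLock lock = lockFor(rowKey);
        lock.lock();
        try {
            Map<String, String> row = rows.get(rowKey); // null if the row doesn't exist yet
            String current = (row == null) ? null : row.get(checkColumn);
            if (!Objects.equals(current, expected)) {
                return false;
            }
            rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>()).put(putColumn, newValue);
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Reads in this model don't take the row lock. */
    public String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) throws InterruptedException {
        RowLockModel region = new RowLockModel();
        int clients = 8;
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int i = 0; i < clients; i++) {
            String me = "client-" + i;
            pool.submit(() -> {
                start.await(); // release all clients at once
                // "create only if absent" on a row that does not exist yet
                if (region.checkAndPut("user42", "cf:owner", null, "cf:owner", me)) {
                    winners.incrementAndGet();
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("successful checkAndPut calls: " + winners.get());
        System.out.println("owner: " + region.get("user42", "cf:owner"));
    }
}
```

Running main should report exactly one successful checkAndPut out of eight concurrent attempts on the non-existent row "user42"; which client wins varies between runs. The model never removes entries from rowLocks, a simplification a real server would not make.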
This works well with HBase's other ACID guarantees, which are also provided only at the level of a single row.