What is the principle behind updating data in HBase?

2 years ago

Olivia Parker

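In HBase, an update is not an in-place modification. It is a new write (a Put) to an existing row and column, which the Region Server stores as a newer version of the cell. The write goes through the following steps: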

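The client finds the region that contains the row key by looking it up in the hbase:meta catalog table. That table also records which Region Server hosts the region, and the client caches this location for later requests.
The client sends the update directly to that Region Server. The HBase Master is not on this path: it handles region assignment and schema changes but does not forward reads or writes.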
The Region Server does not look up the existing value first, because there is no read-before-write. It appends the edit to the Write-Ahead Log (WAL) so that the change can be recovered if the server crashes before the data reaches disk.
It then writes the new cell, tagged with a timestamp, into the MemStore (the in-memory write buffer kept for each column family of each region). The old value is not overwritten; it remains stored as an older version.
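Once the edit is in the WAL and the MemStore, the Region Server acknowledges the write to the client. It does not wait for the data to reach an HFile.
When the MemStore reaches a configured size (hbase.hregion.memstore.flush.size, 128 MB by default), the Region Server flushes its contents to a new HFile on disk, normally in HDFS. HFiles are immutable, so existing files are never rewritten; compactions later merge HFiles and discard versions beyond the column family's configured maximum.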
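The steps above can be sketched as a small, self-contained Python simulation. This is a toy model under stated assumptions, not the HBase client API: the classes `RegionServer` and `Client`, the `flush_threshold` parameter (standing in for `hbase.hregion.memstore.flush.size`), and the integer counter used as a timestamp are all hypothetical. It shows the client routing a row key straight to the hosting Region Server, the WAL append happening before the MemStore write, an update being stored as a new version, and a flush producing a new immutable file.

```python
import bisect


class RegionServer:
    """Toy model of one Region Server's write path (hypothetical, not the HBase API)."""

    def __init__(self, name, flush_threshold=3):
        self.name = name
        self.flush_threshold = flush_threshold  # stands in for hbase.hregion.memstore.flush.size
        self.wal = []        # append-only log of (row, column, timestamp, value) edits
        self.memstore = {}   # (row, column) -> {timestamp: value}, kept in memory
        self.hfiles = []     # flushed snapshots, never modified after creation
        self._clock = 0      # counter standing in for the cell timestamp

    def put(self, row, column, value):
        self._clock += 1
        ts = self._clock
        self.wal.append((row, column, ts, value))                # 1. append to the WAL
        self.memstore.setdefault((row, column), {})[ts] = value  # 2. add a new version
        cells = sum(len(v) for v in self.memstore.values())
        if cells >= self.flush_threshold:
            self.flush()  # real HBase flushes in the background, after the ack
        return "OK"       # 3. acknowledge; no read-before-write happened

    def flush(self):
        # Write the MemStore contents to a new immutable "HFile" and empty the MemStore.
        self.hfiles.append({k: dict(v) for k, v in self.memstore.items()})
        self.memstore = {}

    def get(self, row, column):
        # Merge the MemStore and all HFiles; the highest timestamp wins.
        versions = dict(self.memstore.get((row, column), {}))
        for hfile in self.hfiles:
            versions.update(hfile.get((row, column), {}))
        return versions[max(versions)] if versions else None


class Client:
    """Routes each row key to the server hosting its region, like a cached hbase:meta lookup."""

    def __init__(self, meta):
        # meta: (region start key, server) pairs sorted by start key; the first key is "".
        self.start_keys = [start for start, _ in meta]
        self.servers = [server for _, server in meta]

    def locate(self, row):
        # The region whose start key is the greatest one <= row contains the row.
        return self.servers[bisect.bisect_right(self.start_keys, row) - 1]

    def put(self, row, column, value):
        return self.locate(row).put(row, column, value)  # sent directly, no Master hop


rs1 = RegionServer("rs1", flush_threshold=2)
rs2 = RegionServer("rs2")
client = Client([("", rs1), ("m", rs2)])

client.put("apple", "cf:price", "1")
client.put("apple", "cf:price", "2")  # an "update" is just a newer version; triggers a flush
client.put("apple", "cf:price", "3")
client.put("zebra", "cf:price", "9")

print(client.locate("apple").name, client.locate("zebra").name)  # rs1 rs2
print(rs1.get("apple", "cf:price"))  # 3
```

After the third put, the first HFile in this model still holds versions 1 and 2. In real HBase, such excess versions are likewise only discarded when a compaction rewrites the files.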

In general, an update in HBase is an append rather than an in-place change. The client sends the request directly to the Region Server that hosts the row, located through hbase:meta, without involving the Master. The Region Server first records the edit in the WAL for durability, then stores it in the in-memory MemStore as a new version, acknowledges the client, and later flushes the MemStore to a new HFile on disk. Reads return the version with the latest timestamp by default, so clients see the updated value.
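Why the WAL is written before the MemStore can be illustrated with a simplified crash-and-recovery model, loosely in the spirit of HBase's WAL replay. `ToyRegion`, its integer sequence IDs, and `flush_threshold` are hypothetical; the sketch assumes only the behavior described above: edits are logged before entering the MemStore, and flushed edits live in immutable files. After a crash wipes the MemStore, replaying the WAL edits newer than the last flushed sequence ID restores the update that had not yet reached an HFile.

```python
class ToyRegion:
    """Simplified region: hypothetical model of WAL replay, not HBase code."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        # Durable state that survives a crash.
        self.wal = []             # append-only (seq_id, row, value) edits
        self.hfiles = []          # immutable tuples of flushed edits
        self.flushed_seq_id = 0   # highest seq_id already persisted in an HFile
        # Volatile state that a crash wipes out.
        self.memstore = []
        self.next_seq_id = 1

    def put(self, row, value):
        edit = (self.next_seq_id, row, value)
        self.next_seq_id += 1
        self.wal.append(edit)       # 1. log the edit first
        self.memstore.append(edit)  # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.hfiles.append(tuple(self.memstore))    # write a new immutable file
        self.flushed_seq_id = self.memstore[-1][0]  # edits are appended in seq order
        self.memstore = []

    def crash(self):
        self.memstore = []  # unflushed edits disappear from memory

    def recover(self):
        # Rebuild the MemStore from WAL edits newer than the last flush.
        self.memstore = [e for e in self.wal if e[0] > self.flushed_seq_id]
        self.next_seq_id = self.wal[-1][0] + 1 if self.wal else 1

    def get(self, row):
        # The newest version (highest seq_id) across HFiles and MemStore wins.
        edits = [e for hfile in self.hfiles for e in hfile] + self.memstore
        matches = [e for e in edits if e[1] == row]
        return max(matches)[2] if matches else None


region = ToyRegion(flush_threshold=2)
region.put("row1", "a")
region.put("row1", "b")  # second edit triggers a flush to an HFile
region.put("row1", "c")  # only in the WAL and the MemStore
region.crash()
after_crash = region.get("row1")  # stale: only flushed data is visible
region.recover()
after_recovery = region.get("row1")
print(after_crash, after_recovery)  # b c
```

Skipping edits at or below the flushed sequence ID avoids re-applying data that is already safe in an HFile; if the WAL append were skipped, the value "c" would be lost for good.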