HBase as the primary NoSQL Hadoop storage

"Three admins once walked into a NoSQL bar, but left a little while later because they could not find a table," says one popular joke. The statement is arguable from my point of view, but it captures the general idea – the difference in approaches to treating data. Most developers start their careers working with RDBMS products like Oracle or MS SQL Server. These systems keep the data in a normalized format represented by sets of tables and the relations between them. In many cases this model ideally suits the data-related requirements of a product. But sometimes such a structure brings certain overheads which negatively affect the end performance of the application. In order to overcome these restrictions, an alternative NoSQL approach to keeping and managing data was invented. There are many different implementations of this technique, each targeting its own set of tasks, but from the Hadoop perspective HBase is probably the most commonly used NoSQL database, providing a strong and reliable mechanism for managing huge amounts of data across a distributed environment. In this article I want to describe the general idea of this product and show some examples of working with it.

The origin of HBase is closely related to another product, invented and driven by Google, called Bigtable. This technology keeps petabytes of data across thousands of machines while offering close-to-real-time access to that data. HBase is an implementation of this approach within the Hadoop environment. It is oriented toward keeping the data in a set of big KeyValue-based tables. The key usually represents some common unique attribute of the data, and the values contain a set of dynamic attributes which can vary from one row to another. Here are some features of HBase which I want to point out from personal experience:

  • Oriented toward big amounts of data, like billions of rows; for smaller data sets an RDBMS would probably be a better option
  • Does not support ACID features and transactions by default
  • Provides close-to-real-time data access by the key attribute, but searching by other attributes implies full-table scans
  • Does not support secondary indexes in the default implementation
  • Allows keeping a different set of columns for each individual row within the same table
  • Keeps all the data distributed across the nodes of the cluster inside HDFS DataNodes

The primary container of information in HBase is a Table. A set of tables is grouped into a Namespace – the equivalent of a Database in an RDBMS. Each table contains a collection of KeyValues called Rows. A row is represented by a primary attribute called a Key (a unique binary array) and a collection of values combined into special groups called Column Families. Each column family has a set of column cells called Column Qualifiers which hold the end values at some moment in time. A common HBase table is therefore a three-dimensional dynamic container with rows-columns-time dimensions for storing the data:

DB name      Table name      Column Family         Column Qualifier at Time             Value
NameSpace => peopleTable  => {1, PersonalCF}    => {FamilyName at Time 11/05/2010}  =  Jone
                                                => {FamilyName at Time 03/07/2013}  =  Boyle
                                                => {ForeName   at Time 11/05/2010}  =  Marta
                             {1, ProfessionCF}  => {Occupation at Time 01/05/2014}  =  Painter
                             {2, PersonalCF}    => {FamilyName at Time 02/01/2015}  =  Spencer
                                                => {ForeName   at Time 02/01/2015}  =  Mark

In such a structure we can see certain differences compared to the standard relational model. First, we can have a different set of columns for each individual row, which saves us from storing the empty cells a normalized schema would require. Besides that, we can keep several versions of the value of a certain cell. Some RDBMS products are also extended to persist a time dimension within their data storage, but in HBase this feature is embedded as a fundamental part of the system.
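
To make the time dimension a bit more tangible, here is a minimal sketch of reading several versions of one cell through the native Java client (covered later in this article). The table and family names are just the illustrative ones from the diagram above, and the PersonalCF family is assumed to have been created with more than one version retained (VERSIONS > 1):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVersionsExample {
    public static void main(String[] args) throws Exception {
        // Illustrative names only: the peopleTable/PersonalCF pair from the diagram above,
        // with the family assumed to be configured to keep several versions
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("peopleTable"))) {

            Get get = new Get(Bytes.toBytes("1"));
            get.setMaxVersions(3); // ask for up to three timestamped versions of each cell

            Result result = table.get(get);
            // Every Cell carries its own timestamp, i.e. the third dimension of an HBase table
            for (Cell cell : result.getColumnCells(Bytes.toBytes("PersonalCF"),
                                                   Bytes.toBytes("FamilyName"))) {
                System.out.println(cell.getTimestamp() + " -> "
                        + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
    }
}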

HBase architecture

When we develop custom applications we usually don't care much about the cost of the access operations related to our data structures. We use binary trees, red-black trees, heaps or vectors to organize the information in order to perform basic operations upon it. But once our data structure goes outside the scope of virtual memory to some less performant storage like a hard drive, the price of such access increases dramatically. If you have some experience of working with RDBMS products, you probably know that most systems use a B+-tree structure for keeping the data. Such a model significantly improves the performance of working with the data by minimizing the number of IO operations against the local hard drives. This approach perfectly suits static data, but when the information changes frequently it leads to a higher fragmentation factor across the storage and, as a result, to additional IO operations affecting the end performance. HBase uses another data structure called an LSM-tree for managing the data. In this model the data consists of two parts – an in-memory tree which contains the most recent updates, and a disk store which arranges the rest of the data into immutable, sequentially written B-tree files located on the hard drive. From time to time the HBase service decides that it has accumulated enough changes in memory to flush them into the file storage. In that case it performs a rolling merge of data from the virtual space to disk, executing an operation similar to the merge step of the merge sort algorithm:

Figure 1 – LSM-tree Merge process

Such a technique guarantees that all the persisted data on the local hard drive is always stored sequentially and that every read operation requires a minimal number of random IO operations against the hard drive. At the same time the most recent part of the information, with all its deletes, inserts and updates, is always kept in memory, which by its nature does not have such restrictions on the number of random reads. When an end client wants to perform a read operation, the service first tries to get the latest state of the information from the memory tree and then tries to find it in the file store on the hard drive.
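
The following minimal Java sketch is not HBase code, just an illustration of the LSM idea under these assumptions: recent writes live in a sorted in-memory map, reads check memory before the on-disk part, and a flush combines the two sorted sides in the spirit of the merge step of merge sort:

import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch only: the "disk" side is modelled by a second in-memory sorted map
public class LsmSketch {
    private final NavigableMap<String, String> memStore = new TreeMap<>(); // most recent updates, sorted by key
    private NavigableMap<String, String> diskStore = new TreeMap<>();      // stands in for the sorted immutable files

    public void put(String key, String value) {
        memStore.put(key, value); // writes only touch memory (the real system also appends them to a WAL)
    }

    public String get(String key) {
        String recent = memStore.get(key); // the memory tree always holds the latest state
        return recent != null ? recent : diskStore.get(key);
    }

    public void flush() {
        // Both sides are sorted by key, so they can be combined sequentially;
        // newer in-memory values simply override older on-disk ones
        NavigableMap<String, String> merged = new TreeMap<>(diskStore);
        merged.putAll(memStore);
        diskStore = merged;
        memStore.clear();
    }
}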

In the HBase infrastructure such a data model is based on several components which organize all the data across the cluster as collections of LSM-trees located on slave servers and driven by the main master service. From a high-level perspective HBase consists of the following components:

  1. HMaster – the primary HBase service which maintains the correct state of the slave Region Server nodes by managing and balancing the data among them. Besides that, it drives the changes of metadata information in the storage, like table or column creations and updates.
  2. Zookeeper – a distributed coordination store used by the HBase services and their clients to keep reconciled, up-to-date information about naming and configuration.
  3. Region Servers – HBase worker nodes which perform the management and storage of pieces of the information in LSM-tree fashion
  4. HDFS – used by the Region Servers behind the scenes for the actual storage of the data

At a lower level, most of the HBase functionality is located within the Region Server, which performs the read-write work upon the tables. Every table can technically be distributed across different Region Servers as a collection of separate pieces called HRegions. A single Region Server node can hold several HRegions of one table. Each HRegion holds a certain range of rows, shared between the memory and disk space and sorted by the key attribute. These ranges do not intersect between different regions, so we can rely on their sequential behavior across the cluster. An individual Region Server HRegion includes the following parts:

  1. Write Ahead Log (WAL) file – the first place where data is persisted on every write operation before getting into memory. As I've mentioned earlier, the first part of the LSM-tree is kept in memory, which means that it can be affected by external factors like a power loss, for example. Keeping a log of such operations in a separate place allows this part to be restored easily without any losses.
  2. Memstore – keeps a sorted collection of the most recent updates of the information in memory. It is the actual implementation of the in-memory part of the LSM-tree structure described earlier. Periodically it performs rolling merges into store files called HFiles on the local hard drives
  3. HFile – represents a small piece of data received from the Memstore and saved in HDFS. Each HFile contains a sorted KeyValue collection and a B+-tree index which allows seeking the data without reading the whole file. Periodically HBase performs merge sort operations upon these files to make them fit the configured size of a standard HDFS block and to avoid the small files problem
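
As a small illustration of the WAL's role, the native Java client (covered in the next section) lets a writer state how strictly an edit must reach the log before the call returns. This is only a sketch and assumes the people table created later in this article:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurabilityExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("people"))) {
            Put put = new Put(Bytes.toBytes("2"));
            put.addColumn(Bytes.toBytes("personaldata"), Bytes.toBytes("firstname"), Bytes.toBytes("anna"));
            // SYNC_WAL makes the edit hit the Write Ahead Log before the call returns,
            // so a Region Server crash cannot lose it while it still lives only in the Memstore
            put.setDurability(Durability.SYNC_WAL);
            table.put(put);
        }
    }
}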

In this diagram I tried to summarize this model in a single workflow:

Figure 2 – HBase architecture

Accessing the storage

HBase provides several techniques for accessing the data:

  • HBase CLI – allows accessing the data directly from the cluster using the hbase shell tool through a set of special commands
  • HBase native Java API – provides a set of Java classes which allow performing the most commonly used operations upon the HBase data store. Considered to be the native HBase client (see the short sketch right after this list)
  • Stargate RESTful API – provides a set of web service methods upon HBase. Supports Kerberos authentication through the SPNEGO protocol
  • Thrift API – allows creating lightweight cross-language proxy interfaces for accessing the data through the native API
  • Hive external tables – HBase storage can be accessed from Hive using the HBaseStorageHandler by means of the HiveQL language
  • Integration through Phoenix – Phoenix is a special engine which works on top of HBase and allows performing SQL-based queries and decorating the storage with extra features like secondary indexes
  • Integration through Drill – Apache Drill is another framework which works on top of HBase and allows operating with it in a SQL-based fashion
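
For example, the native Java API mentioned above boils down to a handful of classes. Here is a minimal sketch of writing and reading one row of the people table used later in this article (connection settings are picked up from the hbase-site.xml on the classpath):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NativeApiExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to locate Zookeeper and the cluster
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("people"))) {

            // Write one cell: row key "1", column family "personaldata", qualifier "firstname"
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("personaldata"), Bytes.toBytes("firstname"), Bytes.toBytes("oleksii"));
            table.put(put);

            // Read the row back by its key
            Result result = table.get(new Get(Bytes.toBytes("1")));
            byte[] value = result.getValue(Bytes.toBytes("personaldata"), Bytes.toBytes("firstname"));
            System.out.println(Bytes.toString(value));
        }
    }
}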

As you can see, there is a great variety of techniques for working with this product, and each of them is probably worth an individual article to cover at least its primary concepts. Personally, I advise you to start working with the data using the CLI, as it gives some basic understanding of the internal processes without any high-level abstractions.

Now I want to show you how things work inside the storage. For this purpose we are going to use the HBase CLI client. Let's launch this tool in our Sandbox, then create a new table and put some data inside it:

hbase shell

hbase(main):002:0> create 'people', 'personaldata', 'professionaldata'

hbase(main):002:0> list

people
1 row(s) in 0.0580 seconds

hbase(main):002:0> put 'people','1','personaldata:firstname','oleksii'

hbase(main):002:0> put 'people','1','personaldata:lastname','yermolenko'

hbase(main):002:0> put 'people','1','professionaldata:profession','developer'

hbase(main):002:0> get 'people','1'

COLUMN                          CELL
 personaldata:firstname         timestamp=1487674414844, value=oleksii
 personaldata:lastname          timestamp=1487674425019, value=yermolenko
 professionaldata:profession    timestamp=1487674562249, value=developer
3 row(s) in 0.0200 seconds

Our new table people now contains one row with the key equal to "1" and three column qualifiers stored in two column families. I've mentioned before that in order to read the data, clients first need to identify its location using metadata. Technically this information is represented by the hbase:meta table located on one of the Region Servers of the cluster. Clients use Zookeeper to identify the actual region which holds this data. For our CLI this operation is transparent, but you can connect to Zookeeper and find this information yourself using the Zookeeper CLI tool:

/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181

[zk: localhost:2181(CONNECTED) 1] get /hbase-unsecure/meta-region-server

regionserver:1602����Y�HPBUF
(localhost�}
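
The native Java client performs the same lookup on your behalf. As a hedged sketch, you can also ask it explicitly which region and Region Server serve a given row key (assuming the people table created above):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("people"))) {
            // Resolves hbase:meta through Zookeeper behind the scenes, exactly as described above
            HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("1"));
            System.out.println("Region: " + location.getRegionInfo().getRegionNameAsString());
            System.out.println("Server: " + location.getHostnamePort());
        }
    }
}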

Now let's look into the metadata of our new table. For search operations HBase uses scanners which iterate over the Region Servers and filter the records according to the search conditions. To get the information about our people table from the metadata table we can apply a ROWPREFIXFILTER scanner:

hbase(main):005:0> scan 'hbase:meta', {ROWPREFIXFILTER => 'people'}

people,,1487673905633.358f2c1572b7275fafe5a263abec0, column=info:regioninfo, value={NAME => 'people,,1487673905633.358f2c1572b7275fafe5a263abec0', STARTKEY => '', ENDKEY => ''}

people,,1487673905633.358f2c1572b7275fafe5a263abec0, column=info:server, value=localhost:16020

As you can see, the metadata defines the name of the region, its actual location, and the start and end keys, which are empty because the table currently consists of a single region covering the whole key space. If we check our physical storage on HDFS, we can locate all the actual data inside the cluster directly. The primary location of the data store is defined in the hbase.rootdir property of the hbase-site.xml file, /apps/hbase in my case. This directory stores a number of folders, but I want to point out two of them – /data/WALs, which contains the log files, and /data/data, which contains the rest of the information. If you are lucky and HBase hasn't flushed the data to disk yet, you'll be able to read it directly from the log files using the special hbase wal command:

hdfs dfs -lsr /apps/hbase/data/WALs

/apps/hbase/data/WALs/bil-hdp-app-02.prometric.qc2,16020,1486983757064
/apps/hbase/data/WALs/bil-hdp-app-02.prometric.qc2,16020,1486983757064/bil-hdp-app-02.prometric.qc2%2C16020%2C1486983757064.default.1487675047638

hbase wal /apps/hbase/data/WALs/localhost,16020,1486983757064/localhost%2C16020%2C1486983757064.default.1487675047638

Sequence=5 , region=358f2c1572b7275fafe5a263abec0 at write timestamp=Tue Feb 21 06:53:46 EST 2017
row=1, column=personaldata:firstname
Sequence=7 , region=358f2c1572b7275fafe5a263abec0 at write timestamp=Tue Feb 21 06:53:52 EST 2017
row=1, column=personaldata:lastname
Sequence=9 , region=358f2c1572b7275fafe5a263abec0 at write timestamp=Tue Feb 21 06:53:58 EST 2017
row=1, column=professionaldata:profession

The data files of our table are located in the /data/data/default/people catalog. It contains a separate partition folder for each region of the table. If the data is still in memory, we will not see any data files there yet:

hdfs dfs -lsr /apps/hbase/data/data/default/people

drwxr-xr-x   0 /default/people/358f2c1572b7275fafe5a263abec0
-rw-r--r--  41 /default/people/358f2c1572b7275fafe5a263abec0/.regioninfo
drwxr-xr-x   0 /default/people/358f2c1572b7275fafe5a263abec0/personaldata
drwxr-xr-x   0 /default/people/358f2c1572b7275fafe5a263abec0/professionaldata

Now let's perform the flush operation manually and review the content of our WAL file and data folder:

echo "flush 'people'" | hbase shell

hdfs dfs -lsr /apps/hbase/data/data/default/people

drwxr-xr-x      0 /default/people/358f2c1572b7275fafe5a263abec0
-rw-r--r--     41 /default/people/358f2c1572b7275fafe5a263abec0/.regioninfo
drwxr-xr-x      0 /default/people/358f2c1572b7275fafe5a263abec0/personaldata
-rw-r--r--   5054 /apps/hbase/data/data/default/people/358f2c1572b7275fafe5a263abec0/personaldata/5345037f8ae04e78896e901cff2a73e6
drwxr-xr-x      0 /default/people/358f2c1572b7275fafe5a263abec0/professionaldata
-rw-r--r--   5018 /apps/hbase/data/data/default/people/358f2c1572b7275fafe5a263abec0/professionaldata/df462fa12f0645f79af574bf9b8a7962

hbase wal /apps/hbase/data/WALs/localhost,16020,1486983757064/localhost%2C16020%2C1486983757064.default.1487675047638

Sequence=11 , region=358f2c1572b7275fafe5a263abec0 at write timestamp=Tue Feb 21 07:39:35 EST 2017 row=\x00, column=METAFAMILY:HBASE::FLUSH
Sequence=12 , region=358f2c1572b7275fafe5a263abec0 at write timestamp=Tue Feb 21 07:39:36 EST 2017 row=\x00, column=METAFAMILY:HBASE::FLUSH

We see that our data store now contains files in the column family related catalogs. These files are HFiles, which hold the end data at the bottom of HBase. Besides that, after the flush operation we see new records in the WAL log file which say that the data has been successfully flushed into the local storage. If you remember, an HFile keeps its internal data in a B-tree structure, so if you look into its bare output you will see lots of encoded metadata. HBase provides another great tool for working with these files called hfile. We can use it to look inside the content of the file:

hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /apps/hbase/data/data/default/people/358f2c1572b7275fafe5a263abec0/personaldata/5345037f8ae04e78896e901cff2a73e6

K: 1/personaldata:firstname/1487678026507/Put/vlen=7/seqid=5 V: oleksii
K: 1/personaldata:lastname/1487678032785/Put/vlen=10/seqid=7 V: yermolenko
Scanned kv count -> 2

Well done, we have just walked through an end-to-end implementation of the LSM tree in HBase using our own data. This process is fully transparent for the end clients and HBase does all this work behind the scenes. But it is very important to know how such a mechanism works in order to properly maintain it and build custom solutions upon it.

Conclusion

In this article I tried to give you the general idea of HBase data storage and the processes within it. This product extends Hadoop with an extra piece of functionality which allows structuring big amounts of data in a single place and performing close-to-real-time queries upon it. Besides that, it can be complemented with applications like Hive, Phoenix or Drill in order to enable access through a well-known SQL-based syntax. The data within HBase can also be used as an input for MapReduce jobs or Spark queries. As you can see, the level of support for this storage in the Hadoop ecosystem is really great. I'm sure that we will keep meeting this product in our next Hadoop dives, as lots of interesting things can be done with this wonderful data system.
