Understanding HBase

Column in HBase

An HBase table can have multidimensional columns.
So if you come from an RDBMS background, the HBase table structure may be hard to grasp at first. Expressing it in JSON helps a little.

1. Run hbase shell

hadoop@delmonte:~$ hbase shell

2. Create Table

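You can create a table and its columns (strictly speaking, column families) in two ways. The first form also sets per-family options such as VERSIONS and TTL; the second uses the defaults.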

hbase> create 'blogposts', {NAME => 'post'}, {NAME => 'image', 
VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}

or

hbase> create 'blogposts', 'post', 'image'

Expressed as JSON:
blogposts = {
'post':{},
'image':{}
}

3. Add Data

hbase> put 'blogposts', 'post1', 'post:title', 'Hello World'
hbase> put 'blogposts', 'post1', 'post:author', 'The Author'
hbase> put 'blogposts', 'post1', 'post:body', 'This is a blog post'
hbase> put 'blogposts', 'post1', 'image:header', 'image1.jpg'
hbase> put 'blogposts', 'post1', 'image:bodyimage', 'image2.jpg'

Expressed as JSON:
blogposts = {
  'post1':{ // row (rowkey)
    'post':{ // column family
      'title':'Hello World', // column and its cell value
      'author':'The Author', // column and its cell value
      'body':'This is a blog post' // column and its cell value
    },
    'image':{ // column family
      'header':'image1.jpg', // column and its cell value
      'bodyimage':'image2.jpg' // column and its cell value
    }
  }
}
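
To make the nesting concrete, here is a small Python sketch of the same structure as a plain dictionary. It is only a toy model, not how HBase stores data internally. The helper names `put` and `get_latest` are made up for this example, and the extra timestamp level is there because HBase keeps a timestamp for every cell value.

```python
import time

def put(table, row, column, value, timestamp=None):
    """Store a value under row -> family -> qualifier -> timestamp (toy model)."""
    family, qualifier = column.split(":", 1)
    if timestamp is None:
        timestamp = int(time.time() * 1000)  # milliseconds since the epoch
    versions = (
        table.setdefault(row, {})
        .setdefault(family, {})
        .setdefault(qualifier, {})
    )
    versions[timestamp] = value

def get_latest(table, row, column):
    """Return the value with the highest timestamp for one column of one row."""
    family, qualifier = column.split(":", 1)
    versions = table[row][family][qualifier]
    return versions[max(versions)]

blogposts = {}
put(blogposts, "post1", "post:title", "Hello World", timestamp=1)
put(blogposts, "post1", "post:title", "Hello HBase", timestamp=2)  # newer version
put(blogposts, "post1", "image:header", "image1.jpg", timestamp=1)

latest_title = get_latest(blogposts, "post1", "post:title")
```

Unlike this sketch, a real table keeps only as many old versions as the family's VERSIONS setting allows.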

4. Look at the Data

hbase> get 'blogposts', 'post1'
COLUMN          CELL
image:bodyimage timestamp=1229953133260, value=image2.jpg
image:header    timestamp=1229953110419, value=image1.jpg
post:author     timestamp=1229953071910, value=The Author
post:body       timestamp=1229953072029, value=This is a blog post
post:title      timestamp=1229953071791, value=Hello World
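
The listing above is just the nested structure flattened into sorted 'family:qualifier' lines. A rough Python sketch of that flattening, with a hypothetical helper `format_row` and the values copied from the output above (the column alignment of the real shell is not reproduced):

```python
def format_row(row):
    """Flatten {family: {qualifier: (timestamp, value)}} into sorted text lines."""
    lines = []
    for family in sorted(row):
        for qualifier in sorted(row[family]):
            timestamp, value = row[family][qualifier]
            lines.append(f"{family}:{qualifier} timestamp={timestamp}, value={value}")
    return lines

post1 = {
    "post": {
        "title": (1229953071791, "Hello World"),
        "author": (1229953071910, "The Author"),
        "body": (1229953072029, "This is a blog post"),
    },
    "image": {
        "header": (1229953110419, "image1.jpg"),
        "bodyimage": (1229953133260, "image2.jpg"),
    },
}

lines = format_row(post1)
for line in lines:
    print(line)
```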


Row in HBase

In HBase, keep in mind that a table can consist of more than a billion rows. To find a particular row, HBase uses a key called the ‘rowkey’. It plays a role similar to the key in a hash table.

Many people recommend MD5 for generating the ‘rowkey’, and there are reasons for that.

First of all, MD5 turns a long name into a short, fixed-size one (16 bytes, or 32 hexadecimal characters).
e.g. A URL ‘http:// do-buffalo-buffalo-buffalo-buffalo-buffalo-buffalo-buffalo-buffalo.com’ can become ‘739729cc1870c16e78c1cb1395bf2bc4’.
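
A minimal Python sketch of this, using the standard `hashlib` module. The function name `md5_rowkey` and the example.com URLs are placeholders for illustration only.

```python
import hashlib

def md5_rowkey(name: str) -> bytes:
    """Return the 16-byte MD5 digest of a name, usable as a fixed-size rowkey."""
    return hashlib.md5(name.encode("utf-8")).digest()

short_key = md5_rowkey("http://example.com/")
long_key = md5_rowkey("http://example.com/" + "a" * 500)

# Both keys are 16 bytes no matter how long the input was.
print(len(short_key), len(long_key))
# The hexadecimal form is the 32-character string you usually see written down.
print(short_key.hex())
```

Keep in mind that the hash is one-way: if you need the original name back, store it in a column as well.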

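Second, if you increase the ‘rowkey’ monotonically, like ‘r1’, ‘r2’, ‘r3’ … ‘rn’, you will run into a problem called ‘RegionServer hotspotting’. HBase stores rows sorted by rowkey, so consecutive keys land in the same region and all writes hit a single server. The ikaisays.com post listed in the references explains this really well with comics.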
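
The effect can be imitated with a rough simulation in Python. This is only a sketch under simplifying assumptions: the key space is cut into four fixed ranges by the first byte of the key (real regions split and move over time), and `NUM_REGIONS` and `region_for` are invented names.

```python
import hashlib
from collections import Counter

NUM_REGIONS = 4  # assumption: four regions with fixed, equal key ranges

def region_for(rowkey: bytes) -> int:
    """Pick a region from the first byte, imitating contiguous sorted key ranges."""
    return rowkey[0] * NUM_REGIONS // 256

# 'r1', 'r2', ... all start with the same byte, so they sort next to each other.
sequential = [f"r{i}".encode("utf-8") for i in range(1, 1001)]
# MD5 digests of the same names are scattered over the whole key space.
hashed = [hashlib.md5(key).digest() for key in sequential]

sequential_load = Counter(region_for(key) for key in sequential)
hashed_load = Counter(region_for(key) for key in hashed)

print("sequential:", dict(sequential_load))
print("hashed:    ", dict(hashed_load))
```

With the sequential keys every write ends up in one region, while the hashed keys are spread over all of them. The price is that hashed keys no longer sort in their original order, so range scans over ‘r1’ to ‘rn’ are lost.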

Of course, you can also build a ‘rowkey’ by combining several values. The choice depends on what your site wants to store in HBase.
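
One possible way to combine values, sketched in Python: a short hash prefix to spread the writes, followed by the readable fields. This is only one option among many; the function `composite_rowkey`, the ‘|’ separator, and the field names are all assumptions made for this example.

```python
import hashlib

def composite_rowkey(site: str, post_id: str, prefix_len: int = 4) -> bytes:
    """Build a key as <first bytes of the MD5 digest>|<site>|<post_id>."""
    natural = f"{site}|{post_id}".encode("utf-8")
    prefix = hashlib.md5(natural).digest()[:prefix_len]
    return prefix + b"|" + natural

key = composite_rowkey("example.com", "post1")
print(key)
```

The prefix is derived from the readable part, so the same inputs always give the same key, and the readable part stays visible when you inspect a row.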

References for Column in HBase
http://www.evanconkle.com/2011/11/hbase-tutorial-creating-table/
http://wiki.apache.org/hadoop/Hbase/Shell
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

References for Row in HBase
http://hbase.apache.org/book/rowkey.design.html
http://entireboy.egloos.com/viewer/4689269
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
http://hbase.apache.org/book/schema.casestudies.html

Published by

Raphael, Eom

Working these days with Meteor, Redux, MongoDB email: gblue1223@gmail.com
