4. Setting up HBase with ZooKeeper for use in Java via a Maven project

HBase is a NoSQL database used in the Hadoop world to store “Big Data”. This tutorial extends “Understanding HBase (NoSQL) database basics in 7 steps”.

Step 1: Create a Maven-based Java project

Step 2: Import the project into Eclipse & modify the pom.xml file to add the Hadoop & HBase libraries

HBase stores its data files on HDFS (i.e. the Hadoop Distributed File System).

Step 3: Java class to create an HBase table with column-qualifier

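By default, “HBaseConfiguration.create()” points the client at localhost:2181, where your ZooKeeper is expected to be running, unless an hbase-site.xml on the classpath says otherwise. If you want to specify the host:port yourself, set the “hbase.zookeeper.quorum” and “hbase.zookeeper.property.clientPort” properties on the returned configuration.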
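A minimal sketch of the host:port override, using only the JDK so it runs without the HBase jars. The two property names are the standard HBase client settings for the ZooKeeper quorum and client port; the helper class “ZkEndpoint” and its method are hypothetical names introduced here for illustration, not part of HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns "host:port" into the two HBase client properties. */
final class ZkEndpoint {
    static final String QUORUM_KEY = "hbase.zookeeper.quorum";
    static final String PORT_KEY = "hbase.zookeeper.property.clientPort";
    static final String DEFAULT_PORT = "2181"; // ZooKeeper's default client port

    private ZkEndpoint() { }

    /** Accepts "host" or "host:port"; falls back to port 2181 when none is given. */
    static Map<String, String> toHBaseProperties(String hostPort) {
        String trimmed = hostPort.trim();
        int colon = trimmed.lastIndexOf(':');
        String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
        String port = colon < 0 ? DEFAULT_PORT : trimmed.substring(colon + 1);
        if (host.isEmpty()) {
            throw new IllegalArgumentException("Missing host in: " + hostPort);
        }
        Integer.parseInt(port); // throws NumberFormatException on a non-numeric port

        Map<String, String> props = new LinkedHashMap<>();
        props.put(QUORUM_KEY, host);
        props.put(PORT_KEY, port);
        return props;
    }

    public static void main(String[] args) {
        // Prints the key=value pairs for the default local setup.
        toHBaseProperties("localhost:2181")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With the HBase client on the classpath, each entry would be applied with “conf.set(key, value)” on the object returned by “HBaseConfiguration.create()”, before the connection is opened.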

Creates an HBase table named “eai_systems”. The column-family is “i

Step 4: Check the table creation in an HBase shell

Key Points to get HBase working

The above tutorial was based on Cloudera QuickStart 5.4.2. You need to have the following services running:

1) ZooKeeper: provides an infrastructure for cross-node synchronization.
2) HDFS: stores the HBase data files.
3) HBase

HBase runs on top of Hadoop and relies on ZooKeeper for distributed coordination, such as tracking which region servers are alive and electing the active master.

Hadoop eco system services on Cloudera

Start & stop ZooKeeper services on the command line

The “zookeeper-server” script is in the “/usr/bin” folder.

Install & uninstall ZooKeeper services on Ubuntu

Other CDH components can also be installed & removed from the command line.

ZooKeeper file location

The configuration is in the “zoo.cfg” file.

Note: If the ZooKeeper, HBase, or HDFS service is down, the above code will not work.
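Before running the table-creation class, a quick reachability check can save some debugging. The sketch below uses only the JDK and tests whether a TCP port accepts connections. Only localhost:2181 for ZooKeeper comes from this tutorial; the class name “ServiceCheck” is hypothetical, and the HBase and HDFS ports depend on your installation, so pass in whatever your cluster uses. An open port only shows that something is listening, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a service port accepts TCP connections. */
final class ServiceCheck {
    private ServiceCheck() { }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or the host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // 2181 is the ZooKeeper client port used throughout this tutorial.
        boolean zooKeeperUp = isReachable("localhost", 2181, 2000);
        System.out.println("ZooKeeper localhost:2181 reachable: " + zooKeeperUp);
    }
}
```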
