HBASE Installation in Ubuntu

Step 1 :- Install HBASE

$ sudo apt-get install hbase


Step 2 :- To list the installed files on Ubuntu and Debian systems:


$ dpkg -L hbase

Step 3 :-  Enable Java-based client access

$ sudo gedit .bashrc
export CLASSPATH=$CLASSPATH:/usr/lib/hbase/*:.
export CLASSPATH=$CLASSPATH:
/usr/lib/hbase/lib/*:.

Step 4 :- Setting the ulimit in for the users

$ sudo gedit /etc/security/limits.conf

hdfs  -       nofile  32768
hdfs  -       nproc   2048
hbase -       nofile  32768
hbase -       nproc   2048
 
To apply the changes in /etc/security/limits.conf on Ubuntu and Debian systems, add the following line in the /etc/pam.d/common-session file:

session required pam_limits.so
 

Step 5 :- Using dfs.datanode.max.transfer.threads with HBase


A Hadoop HDFS DataNode has an upper bound on the number of files that it can serve at any one time. The upper bound is controlled by the dfs.datanode.max.transfer.threads property (the property is spelled in the code exactly as shown here). Before loading, make sure you have configured the value for dfs.datanode.max.transfer.threads in the conf/hdfs-site.xml file (by default found in /etc/hadoop/conf/hdfs-site.xml) to at least 4096 as shown below:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>

Step 6 :- Installing the HBase Master

$ sudo apt-get install hbase-master

Step 7 :- Configuring HBase in Pseudo-Distributed Mode

7.1.  Modifying the HBase Configuration
To enable pseudo-distributed mode, you must first make some configuration changes. Open /etc/hbase/conf/hbase-site.xml in your editor of choice, and insert the following XML properties between the <configuration> and </configuration> tags. The hbase.cluster.distributed property directs HBase to start each process in a separate JVM. The hbase.rootdir property directs HBase to store its data in an HDFS filesystem, rather than the local filesystem. Be sure to replace myhost with the hostname of your HDFS NameNode (as specified by fs.default.name or fs.defaultFS in your conf/core-site.xml file); you may also need to change the port number from the default (8020).

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>

7.2.  Creating the /hbase Directory in HDFS


Before starting the HBase Master, you need to create the /hbase directory in HDFS. The HBase master runs as hbase:hbase so it does not have the required permissions to create a top level directory.
To create the /hbase directory in HDFS:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase

7.3.  Starting the HBase Master

After ZooKeeper is running, you can start the HBase master in standalone mode.

$ sudo service hbase-master start


7.4. Starting an HBase RegionServer

The RegionServer is the part of HBase that actually hosts data and processes requests. The region server typically runs on all of the slave nodes in a cluster, but not the master node

To enable the HBase RegionServer on Ubuntu and Debian systems:

$ sudo apt-get install hbase-regionserver

To start the RegionServer:

$ sudo service hbase-regionserver start

[You should be able to navigate to http://localhost:60010 and verify that the local RegionServer has registered with the Master.]

Step 8 :-Installing and Starting the HBase Thrift Server

The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. Thrift can be run collocated along with the region servers, but should not be collocated with the NameNode or the JobTracker.

To enable the HBase Thrift Server on Ubuntu and Debian systems:
$ sudo apt-get install hbase-thrift
To start the Thrift server:
$ sudo service hbase-thrift start
 
Step 9 :- Configuring for Distributed Operation

After you have decided which machines will run each process, you can edit the configuration so that the nodes can locate each other. In order to do so, you should make sure that the configuration files are synchronized across the cluster. Cloudera strongly recommends the use of a configuration management system to synchronize the configuration files, though you can use a simpler solution such as rsync to get started quickly.
The only configuration change necessary to move from pseudo-distributed operation to fully-distributed operation is the addition of the ZooKeeper Quorum address in hbase-site.xml. Insert the following XML property to configure the nodes with the address of the node where the ZooKeeper quorum peer is running:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
The hbase.zookeeper.quorum property is a comma-separated list of hosts on which ZooKeeper servers are running. If one of the ZooKeeper servers is down, HBase will use another from the list. By default, the ZooKeeper service is bound to port 2181. To change the port, add the hbase.zookeeper.property.clientPort property to hbase-site.xml and set the value to the port you want ZooKeeper to use.

Trouble Shooting [https://hbase.apache.org/book.html]

Though above steps are able enough to run HBASE successfully but if it fails with an error like JAVA_HOME not set then do the following :-

$ sudo gedit /etc/hbase/conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0


2 comments:

  1. Configuring for Distributed Operation in the process of hbase installation for ubuntu was explained very well. Thanks.

    ReplyDelete
  2. Nice and good article. It is very useful for me to learn and understand easily. Thanks for sharing your valuable information and time. Please keep updating
    IELTS Coaching in chennai

    German Classes in Chennai

    GRE Coaching Classes in Chennai

    TOEFL Coaching in Chennai

    spoken english classes in chennai | Communication training

    ReplyDelete