Step 1 :- Install HBase
$ sudo apt-get install hbase
Step 2 :- To list the installed files on Ubuntu and Debian systems:
$ dpkg -L hbase
Step 3 :- Enable Java-based client access
$ gedit ~/.bashrc
export CLASSPATH=$CLASSPATH:/usr/lib/hbase/*:.
export CLASSPATH=$CLASSPATH:/usr/lib/hbase/lib/*:.
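If .bashrc is sourced more than once in a session, the two export lines above append the same entries again each time. A minimal sketch of a guard against that, assuming a POSIX shell; the helper name add_to_classpath is made up for this example and is not part of HBase:

```shell
# Hypothetical helper: append an entry to CLASSPATH only if it is not already there.
add_to_classpath() {
    case ":${CLASSPATH}:" in
        *":$1:"*) ;;  # already listed, nothing to do
        *) CLASSPATH="${CLASSPATH:+${CLASSPATH}:}$1" ;;
    esac
}

# Quote the entries so the shell passes the * through to Java unexpanded.
add_to_classpath '/usr/lib/hbase/*'
add_to_classpath '/usr/lib/hbase/lib/*'
add_to_classpath '.'
export CLASSPATH
```

Running echo "$CLASSPATH" in a new terminal should then show each entry exactly once.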
Step 4 :- Setting the ulimit for the hdfs and hbase users
$ sudo gedit /etc/security/limits.conf
hdfs - nofile 32768
hdfs - nproc 2048
hbase - nofile 32768
hbase - nproc 2048
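A quick way to confirm that the entries above were saved is to read them back from the file. This is only a sketch: limit_for is a made-up helper name, and it reports what the file says, not the limits currently in effect for a running process.

```shell
# Hypothetical helper: print the limit configured for USER and ITEM in a
# limits.conf-style file (columns: domain, type, item, value).
limit_for() {  # usage: limit_for FILE USER ITEM
    awk -v user="$2" -v item="$3" '
        $1 == user && $3 == item { value = $4 }   # keep the last matching entry
        END { if (value != "") print value }
    ' "$1"
}

# Example: report the four values set in this step.
conf=/etc/security/limits.conf
if [ -r "$conf" ]; then
    for user in hdfs hbase; do
        for item in nofile nproc; do
            echo "$user $item: $(limit_for "$conf" "$user" "$item")"
        done
    done
fi
```

Commented-out entries are skipped because their first field starts with # and so never equals the user name.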
To apply the changes in /etc/security/limits.conf on Ubuntu and
Debian systems, add the following line to the
/etc/pam.d/common-session file:
session required pam_limits.so
Step 5 :- Using dfs.datanode.max.transfer.threads with HBase
A Hadoop HDFS DataNode has an upper bound on the number of files that it can
serve at any one time. The upper bound is controlled by the
dfs.datanode.max.transfer.threads property (the
property is spelled in the code exactly as shown here). Before loading,
make sure you have configured the value for
dfs.datanode.max.transfer.threads in the
conf/hdfs-site.xml file (by default found in
/etc/hadoop/conf/hdfs-site.xml) to at least
4096 as shown below:
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
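To check the value after editing, the property can be read back from the file. The following is a rough sketch rather than a proper XML parser: hadoop_property is a made-up helper that does plain text matching with awk, so it does not understand XML comments, and the path is the default location mentioned above.

```shell
# Hypothetical helper: print the <value> of a named property in a Hadoop-style XML file.
hadoop_property() {  # usage: hadoop_property FILE NAME
    awk -v name="$2" '
        { text = text $0 }                       # join the file into one string
        END {
            n = split(text, props, "</property>")
            for (i = 1; i <= n; i++) {
                if (index(props[i], "<name>" name "</name>")) {
                    value = props[i]
                    sub(/.*<value>/, "", value)   # drop everything up to the value
                    sub(/<\/value>.*/, "", value) # drop everything after it
                    print value
                }
            }
        }
    ' "$1"
}

site=/etc/hadoop/conf/hdfs-site.xml
if [ -r "$site" ]; then
    threads=$(hadoop_property "$site" dfs.datanode.max.transfer.threads)
    if [ "${threads:-0}" -ge 4096 ]; then
        echo "dfs.datanode.max.transfer.threads = $threads"
    else
        echo "dfs.datanode.max.transfer.threads is unset or below 4096" >&2
    fi
fi
```

The DataNode reads this file at startup, so it has to be restarted before a changed value takes effect.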
Step 6 :- Installing the HBase Master
$ sudo apt-get install hbase-master
Step 7 :- Configuring HBase in Pseudo-Distributed Mode
7.1. Modifying the HBase Configuration
To enable pseudo-distributed mode, you must first make some configuration
changes. Open /etc/hbase/conf/hbase-site.xml in your
editor of choice, and insert the following XML properties between the
<configuration> and
</configuration> tags. The
hbase.cluster.distributed property directs HBase to
start each process in a separate JVM. The hbase.rootdir
property directs HBase to store its data in an HDFS filesystem, rather
than the local filesystem. Be sure to replace localhost
with the hostname of your HDFS NameNode (as specified by
fs.default.name or fs.defaultFS in
your conf/core-site.xml file) if it runs on another machine; you may also need to
change the port number from the default (8020).
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:8020/hbase</value>
</property>
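Because hbase.rootdir must use the same host and port as the NameNode address, a simple string comparison can catch a mismatch before the Master is started. A small sketch, assuming both values are copied by hand from the two files; rootdir_matches_fs is a made-up helper name:

```shell
# Hypothetical helper: succeed if the hbase.rootdir URI lives under the given fs.defaultFS URI.
rootdir_matches_fs() {  # usage: rootdir_matches_fs ROOTDIR DEFAULT_FS
    case "$1" in
        "${2%/}"/*) return 0 ;;   # same scheme, host and port, followed by a path
        *) return 1 ;;
    esac
}

# Values copied by hand from hbase-site.xml and core-site.xml.
if rootdir_matches_fs "hdfs://localhost:8020/hbase" "hdfs://localhost:8020"; then
    echo "hbase.rootdir is on the default filesystem"
else
    echo "hbase.rootdir points at a different host or port" >&2
fi
```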
7.2. Creating the /hbase Directory in HDFS
Before starting the HBase Master, you need to create the /hbase
directory in HDFS. The HBase master runs as
hbase:hbase so it does not have the required
permissions to create a top level directory.
To create the /hbase directory in HDFS:
$ sudo -u hdfs hadoop fs -mkdir /hbase
$ sudo -u hdfs hadoop fs -chown hbase /hbase
7.3. Starting the HBase Master
After ZooKeeper is running, you can start the HBase master in standalone mode.
$ sudo service hbase-master start
7.4. Starting an HBase RegionServer
The RegionServer is the part of HBase that actually hosts data and processes requests. The RegionServer typically runs on all of the slave nodes in a cluster, but not on the master node.
To enable the HBase RegionServer on Ubuntu and Debian systems:
$ sudo apt-get install hbase-regionserver
To start the RegionServer:
$ sudo service hbase-regionserver start
You should be able to navigate to http://localhost:60010 and verify that the local RegionServer has registered with the Master. (On HBase 1.0 and later, the Master web UI listens on port 16010 by default.)
Step 8 :- Installing and Starting the HBase Thrift Server
The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. Thrift can be run collocated along with the region servers, but should not be collocated with the NameNode or the JobTracker.
To enable the HBase Thrift Server on Ubuntu and Debian systems:
$ sudo apt-get install hbase-thrift
To start the Thrift server:
$ sudo service hbase-thrift start
Step 9 :- Configuring for Distributed Operation
After you have decided which machines will run each process, you can edit the
configuration so that the nodes can locate each other. In order to do
so, you should make sure that the configuration files are synchronized
across the cluster. Cloudera strongly recommends the use of a
configuration management system to synchronize the configuration files,
though you can use a simpler solution such as rsync to
get started quickly.
The only configuration change necessary to move from pseudo-distributed
operation to fully-distributed operation is the addition of the
ZooKeeper Quorum address in hbase-site.xml. Insert the
following XML property to configure the nodes with the address of the
node where the ZooKeeper quorum peer is running:
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
The hbase.zookeeper.quorum property is a comma-separated list of hosts on which ZooKeeper servers are running. If one of the ZooKeeper servers is down, HBase will use another from the list. By default, the ZooKeeper service is bound to port 2181. To change the port, add the hbase.zookeeper.property.clientPort property to hbase-site.xml and set the value to the port you want ZooKeeper to use.
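For a quorum of several hosts, the two properties can be generated instead of typed by hand. A sketch only: zookeeper_properties is a made-up helper, and the zk*.example.com hostnames are placeholders, not values from this guide.

```shell
# Hypothetical helper: print the ZooKeeper properties for hbase-site.xml.
# usage: zookeeper_properties PORT HOST [HOST...]
zookeeper_properties() {
    port=$1
    shift
    quorum=$(IFS=,; echo "$*")   # join the remaining arguments with commas
    cat <<EOF
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>$quorum</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>$port</value>
</property>
EOF
}

# Placeholder hostnames; replace them with the machines that run ZooKeeper.
zookeeper_properties 2181 zk1.example.com zk2.example.com zk3.example.com
```

The output is meant to be pasted between the <configuration> tags of hbase-site.xml on every node.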
Troubleshooting [https://hbase.apache.org/book.html]
The steps above should be enough to run HBase successfully. If it fails with an error such as "JAVA_HOME is not set", set JAVA_HOME in hbase-env.sh as follows :-
$ sudo gedit /etc/hbase/conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0
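The path in the export line depends on where the JDK is installed on your machine. One way to find a candidate is to resolve the java binary and strip the trailing directories. This is a sketch, assuming java is on the PATH and GNU readlink is available; java_home_from_binary is a made-up helper name.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the resolved path of the java binary.
java_home_from_binary() {  # usage: java_home_from_binary /path/to/bin/java
    home=${1%/bin/java}   # drop the trailing /bin/java
    home=${home%/jre}     # some JDK layouts keep java under jre/bin
    printf '%s\n' "$home"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the symlink chain from /usr/bin/java to the real binary.
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

Use the printed directory as the JAVA_HOME value in hbase-env.sh after checking that it exists.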