Since we are using the Cloudera distribution, installing Hive is a very easy job.
To install Hive, just type the below command:
$ sudo yum install hadoop-hive
Now, to use Hive from the shell, type:
$ hive
hive>
You can also install Hive with HBase support. As of CDH3u5, Hive no longer has a dependency on HBase; instead, if you want to use Hive with HBase (referred to as Hive-HBase integration), you need to install the hadoop-hive-hbase package, as follows:
$ sudo yum install hadoop-hive-hbase
To run a Hive script that uses Hive-HBase integration:
hive --auxpath /usr/lib/hive/lib/hbase.jar,/usr/lib/hive/lib/hive-hbase-handler-0.7.1-cdh3u6.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/guava-r06.jar -hiveconf hbase.zookeeper.quorum=<zookeeper_quorum> -f <script>
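The --auxpath list above is easy to mistype. A minimal sketch (not from the original post): build the comma-separated value from a directory of JARs instead of spelling out each path. HIVE_LIB and the JAR names here are stand-ins created for illustration; on a real node you would point at /usr/lib/hive/lib.

```shell
# Stand-in directory and JARs for illustration only
HIVE_LIB=./hive-lib-example
mkdir -p "$HIVE_LIB"
touch "$HIVE_LIB/hbase.jar" "$HIVE_LIB/zookeeper.jar"

# Join every JAR path with commas, as --auxpath expects
AUXPATH=$(ls "$HIVE_LIB"/*.jar | paste -sd, -)
echo "$AUXPATH"

# The real invocation would then be:
# hive --auxpath "$AUXPATH" -hiveconf hbase.zookeeper.quorum=<zookeeper_quorum> -f <script>
```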
Using Hive with HBase
To allow Hive scripts to use HBase, add the following statements to the top of each script. Replace the <component_version> strings with current version numbers for CDH, Guava and the Hive HBase handler. (You can find current version numbers for CDH dependencies such as Guava in CDH's root pom.xml file for the current release, for example cdh-root-4.4.0.pom.)
ADD JAR /usr/lib/hive/lib/zookeeper.jar
ADD JAR /usr/lib/hive/lib/hbase.jar
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<Hive-HBase-Handler_version>-cdh<CDH_version>.jar
ADD JAR /usr/lib/hive/lib/guava-<Guava_version>.jar
For example,
ADD JAR /usr/lib/hive/lib/zookeeper.jar
ADD JAR /usr/lib/hive/lib/hbase.jar
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar
Note:
Instead of adding these statements to each script, you can populate the hive.aux.jars.path property in hive-site.xml; for example:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar,file:///usr/lib/hive/lib/hbase.jar</value>
</property>
Configuring Hive :-
In order to make setup easy for new users, Hive's metastore is configured to store metadata locally in an embedded Apache Derby database. Unfortunately, this configuration allows only a single user to access the metastore at a time. We strongly encourage users to use a MySQL database instead. This section describes how to configure Hive to use a remote MySQL database, which allows Hive to support multiple users. See the Hive Metastore documentation for additional information.
Prerequisite :-
Step 1: Install and start MySQL if you have not already done so
To install MySQL on a Red Hat system:
$ sudo yum install mysql-server
Step 2: Configure the MySQL Service and Connector
Before you can run the Hive metastore with a remote MySQL database, you must configure a connector to the remote MySQL database, set up the initial database schema, and configure the MySQL user account for the Hive user.
To install the MySQL connector on a Red Hat 6 system:
Install mysql-connector-java and symbolically link the file into the /usr/lib/hive/lib/ directory.
$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
To set the MySQL root password:
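A minimal sketch to go with the symlink step: verify that the connector link actually resolves to a file, mimicking the layout the two commands above create. The paths here are stand-ins made for illustration; on a real host you would check /usr/lib/hive/lib/mysql-connector-java.jar.

```shell
# Stand-in directories mimicking /usr/share/java and /usr/lib/hive/lib
mkdir -p ./java-example ./hive-lib-example
touch ./java-example/mysql-connector-java.jar
ln -sf ../java-example/mysql-connector-java.jar ./hive-lib-example/mysql-connector-java.jar

# [ -e ... ] follows symlinks, so this confirms the link and its target both exist
if [ -e ./hive-lib-example/mysql-connector-java.jar ]; then LINK_OK=yes; else LINK_OK=no; fi
echo "connector symlink resolves: $LINK_OK"
```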
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
To make sure the MySQL server starts at boot:
On Red Hat systems:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Step 3. Create the Database and User
The instructions in this section assume you are using Remote mode, and that the MySQL database is installed on a separate host from the metastore service, which is running on a host named metastorehost in the example.
Note:
If the metastore service will run on the host where the database is installed, replace 'metastorehost' in the CREATE USER example with 'localhost'. Similarly, the value of javax.jdo.option.ConnectionURL in /etc/hive/conf/hive-site.xml (discussed in the next step) must be jdbc:mysql://localhost/metastore. For more information on adding MySQL users, see http://dev.mysql.com/doc/refman/5.5/en/adding-users.html.
Create the initial database schema using the hive-schema-0.10.0.mysql.sql file located in the /usr/lib/hive/scripts/metastore/upgrade/mysql directory.
Example
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
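A minimal sketch: the interactive session above written as a script. The statements are saved to a file; actually feeding it to mysql is left commented out, since that requires a running MySQL server and the root password.

```shell
# Write the same statements the interactive session runs into a SQL file
cat > create-metastore.sql <<'EOF'
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
EOF

# On a real host you would then run:
# mysql -u root -p < create-metastore.sql

# Count the statements we just wrote (each line ends with a semicolon)
STATEMENTS=$(grep -c ';$' create-metastore.sql)
echo "statements in script: $STATEMENTS"
```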
You also need a MySQL user account for Hive to use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema.
Important:
If you fail to restrict the ability of the metastore MySQL user account to create and alter tables, it is possible that users will inadvertently corrupt the metastore schema when they use older or newer versions of Hive.
Example
mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
Step 4: Configure the Metastore Service to Communicate with the MySQL Database
This step shows the configuration properties you need to set in hive-site.xml to configure the metastore service to communicate with the MySQL database, and provides sample settings. Though you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
Given a MySQL database running on myhost and the user account hive with the password mypassword, set the configuration as follows (overwriting any existing values).
Note:
The hive.metastore.local property is no longer supported as of Hive 0.10; setting hive.metastore.uris is sufficient to indicate that you are using a remote metastore.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
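After editing hive-site.xml it is easy to drop one of these properties. A minimal sketch of a sanity check (not from the original post): grep the file for the property names this step sets. The file below is a stand-in created for illustration; point HIVE_SITE at /etc/hive/conf/hive-site.xml on a real metastore host.

```shell
# Stand-in hive-site.xml with two of the required properties
HIVE_SITE=./hive-site-example.xml
cat > "$HIVE_SITE" <<'EOF'
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://myhost/metastore</value></property>
<property><name>hive.metastore.uris</name><value>thrift://10.0.0.1:9083</value></property>
EOF

# Flag any property name that does not appear in the file
MISSING=0
for prop in javax.jdo.option.ConnectionURL hive.metastore.uris; do
  grep -q "<name>$prop</name>" "$HIVE_SITE" || { echo "missing: $prop"; MISSING=1; }
done
echo "missing required properties: $MISSING"
```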
Step 5: Test connectivity to the metastore:
$ hive -e "show tables;"
Note:
This will take a while the first time.
Configuring HiveServer2
You must make the following configuration changes before using HiveServer2. Failure to do so may result in unpredictable behavior.
Table Lock Manager (Required)
You must properly configure and enable Hive's Table Lock Manager. This requires installing ZooKeeper and setting up a ZooKeeper ensemble.
Important:
Failure to do this will prevent HiveServer2 from handling concurrent query requests and may result in data corruption.
Enable the lock manager by setting properties in /etc/hive/conf/hive-site.xml as follows (substitute your actual ZooKeeper node names for those in the example):
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
Important:
Enabling the Table Lock Manager without specifying a list of valid Zookeeper quorum nodes will result in unpredictable behavior. Make sure that both properties are properly configured.
hive.zookeeper.client.port
If ZooKeeper is not using the default value for ClientPort, you need to set hive.zookeeper.client.port in /etc/hive/conf/hive-site.xml to the same value that ZooKeeper is using. Check /etc/zookeeper/conf/zoo.cfg to find the value for ClientPort. If ClientPort is set to any value other than 2181 (the default), set hive.zookeeper.client.port to the same value. For example, if ClientPort is set to 2222, set hive.zookeeper.client.port to 2222 as well:
<property>
<name>hive.zookeeper.client.port</name>
<value>2222</value>
<description>
The port at which the clients will connect.
</description>
</property>
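A minimal sketch of the check described above: read the client port out of zoo.cfg so hive.zookeeper.client.port can be kept in sync with it. The zoo.cfg below is a stand-in created for illustration (ZooKeeper spells the key clientPort in the file); on a real host read /etc/zookeeper/conf/zoo.cfg.

```shell
# Stand-in zoo.cfg with a non-default client port
ZOO_CFG=./zoo-example.cfg
printf 'tickTime=2000\nclientPort=2222\ndataDir=/var/lib/zookeeper\n' > "$ZOO_CFG"

# Extract the value after "clientPort="
CLIENT_PORT=$(sed -n 's/^clientPort=//p' "$ZOO_CFG")
echo "ZooKeeper clientPort: $CLIENT_PORT"

# If it is not the default 2181, hive-site.xml needs to match it
if [ "$CLIENT_PORT" != 2181 ]; then
  echo "non-default port: set hive.zookeeper.client.port to $CLIENT_PORT"
fi
```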
JDBC driver
The connection URL format and the driver class are different for HiveServer2 and HiveServer1:
HiveServer version | Connection URL | Driver Class |
HiveServer2 | jdbc:hive2://<host>:<port> | org.apache.hive.jdbc.HiveDriver |
HiveServer1 | jdbc:hive://<host>:<port> | org.apache.hadoop.hive.jdbc.HiveDriver |
Authentication
HiveServer2 can be configured to authenticate all connections; by default, it allows any client to connect. HiveServer2 supports either Kerberos or LDAP authentication; configure this in the hive.server2.authentication property in the hive-site.xml file. You can also configure Pluggable Authentication, which allows you to use a custom authentication provider for HiveServer2; and HiveServer2 Impersonation, which allows users to execute queries and access HDFS files as the connected user rather than the super user who started the HiveServer2 daemon.
Configuring HiveServer2 for YARN
To use HiveServer2 with YARN, you must set the HADOOP_MAPRED_HOME environment variable: add the following line to /etc/default/hive-server2:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Running HiveServer2 and HiveServer Concurrently
Cloudera recommends running HiveServer2 instead of the original HiveServer (HiveServer1) package in most cases; HiveServer1 is included for backward compatibility. Both HiveServer2 and HiveServer1 can be run concurrently on the same system, sharing the same data sets. This allows you to run HiveServer1 to support, for example, Perl or Python scripts that use the native HiveServer1 Thrift bindings.
Both HiveServer2 and HiveServer1 bind to port 10000 by default, so at least one of them must be configured to use a different port. You can set the port for HiveServer2 in hive-site.xml by means of the hive.server2.thrift.port property. For example:
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
<description>TCP port number to listen on, default 10000</description>
</property>
You can also specify the port (and the host IP address in the case of HiveServer2) by setting these environment variables:
HiveServer version | Port | Host Address |
HiveServer2 | HIVE_SERVER2_THRIFT_PORT | HIVE_SERVER2_THRIFT_BIND_HOST |
HiveServer1 | HIVE_PORT | <Host bindings cannot be specified> |
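A minimal sketch of the environment-variable route from the table above: move HiveServer2 to port 10001 so HiveServer1 can keep 10000. The values are examples only; in practice the export lines go in the service's environment file before the daemons are started.

```shell
# HiveServer2 takes a non-default port and an explicit bind address
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0

# HiveServer1 keeps the default port (host bindings cannot be specified for it)
export HIVE_PORT=10000

echo "HiveServer2 on $HIVE_SERVER2_THRIFT_BIND_HOST:$HIVE_SERVER2_THRIFT_PORT, HiveServer1 on port $HIVE_PORT"
```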
Using Custom UDFs with HiveServer2
To use custom User-Defined Functions (UDFs) with HiveServer2, do the following:
- Copy the UDF JAR files to the machine(s) hosting the HiveServer2 server(s).
Save the JARs to any directory you choose, and make a note of the path.
- Make the JARs available to the current instance of HiveServer2 by setting HIVE_AUX_JARS_PATH to the JARs' full pathname (the one you noted in Step 1) in hive-config.sh.
Note:
The path can be the directory, or each JAR's full pathname in a comma-separated list.
If you are using Cloudera Manager, use the HiveServer2 Service Environment Safety Valve to set HIVE_AUX_JARS_PATH.
- Add each JAR file's full pathname to the hive.aux.jars.path config property in hive-site.xml and re-start HiveServer2.
This is to allow JARs to be passed to MapReduce jobs started by Hive.
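The steps above can be sketched in shell. This builds HIVE_AUX_JARS_PATH as a comma-separated list of UDF JARs; the directory and JAR names are stand-ins created for illustration.

```shell
# Stand-in UDF directory with two example JARs
UDF_DIR=./udf-jars-example
mkdir -p "$UDF_DIR"
touch "$UDF_DIR/my-udfs.jar" "$UDF_DIR/text-udfs.jar"

# Join every JAR path with commas (the glob expands in sorted order)
HIVE_AUX_JARS_PATH=
for jar in "$UDF_DIR"/*.jar; do
  HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH:+$HIVE_AUX_JARS_PATH,}$jar
done
export HIVE_AUX_JARS_PATH
echo "$HIVE_AUX_JARS_PATH"
```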
Starting the Metastore
Important:
If you are running the metastore in Remote mode, you must start the metastore before starting HiveServer2.
You can run the metastore from the command line:
$ hive --service metastore
Use Ctrl-c to stop the metastore process running from the command line.
To run the metastore as a daemon, the command is:
$ sudo service hive-metastore start