Since we are using the Cloudera distribution, installing Hive is a very easy job.
To install Hive, just type the below command:
$ sudo yum install hadoop-hive
Now, to use Hive from the shell, type:
$ hive
hive>
You can also install Hive with HBase support. As of CDH3u5, Hive no longer has a dependency on HBase; instead, if you want to use Hive with HBase (referred to as Hive-HBase integration), you need to install the hadoop-hive-hbase package, as follows:
$ sudo yum install hadoop-hive-hbase
To run a Hive script that uses Hive-HBase integration:
hive --auxpath /usr/lib/hive/lib/hbase.jar,/usr/lib/hive/lib/hive-hbase-handler-0.7.1-cdh3u6.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/guava-r06.jar -hiveconf hbase.zookeeper.quorum=<zookeeper_quorum> -f <script>
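The --auxpath list above is easy to mistype. A minimal sketch (not from the original post): build the comma-separated value from a directory of JARs instead of spelling out each path. HIVE_LIB and the JAR names here are stand-ins created for illustration; on a real node you would point at /usr/lib/hive/lib.

```shell
# Stand-in directory and JARs for illustration only
HIVE_LIB=./hive-lib-example
mkdir -p "$HIVE_LIB"
touch "$HIVE_LIB/hbase.jar" "$HIVE_LIB/zookeeper.jar"

# Join every JAR path with commas, as --auxpath expects
AUXPATH=$(ls "$HIVE_LIB"/*.jar | paste -sd, -)
echo "$AUXPATH"

# The real invocation would then be:
# hive --auxpath "$AUXPATH" -hiveconf hbase.zookeeper.quorum=<zookeeper_quorum> -f <script>
```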
Using Hive with HBase
To allow Hive scripts to use HBase, add the following statements to the top of each script. Replace the <component_version> strings with current version numbers for CDH, Guava and the Hive HBase handler. (You can find current version numbers for CDH dependencies such as Guava in CDH's root pom.xml file for the current release, for example cdh-root-4.4.0.pom.)
ADD JAR /usr/lib/hive/lib/zookeeper.jar
ADD JAR /usr/lib/hive/lib/hbase.jar
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-<Hive-HBase-Handler_version>-cdh<CDH_version>.jar
ADD JAR /usr/lib/hive/lib/guava-<Guava_version>.jar
For example,
ADD JAR /usr/lib/hive/lib/zookeeper.jar
ADD JAR /usr/lib/hive/lib/hbase.jar
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.4.0.jar
ADD JAR /usr/lib/hive/lib/guava-11.0.2.jar
Note:
Instead of adding these statements to each script, you can populate the hive.aux.jars.path property in hive-site.xml; for example:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/lib/hive/lib/zookeeper.jar,file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.5.0.jar,file:///usr/lib/hive/lib/guava-11.0.2.jar,file:///usr/lib/hive/lib/hbase.jar</value>
</property>
Configuring Hive :-
In order to make setup easy for new users, Hive's metastore is configured to store metadata locally in an embedded Apache Derby database. Unfortunately, this configuration allows only a single user to access the metastore at a time. We strongly encourage users to use a MySQL database instead. This section describes how to configure Hive to use a remote MySQL database, which allows Hive to support multiple users. See the Hive Metastore documentation for additional information.
Prerequisite :-
Step 1: Install and start MySQL if you have not already done so
To install MySQL on a Red Hat system:
$ sudo yum install mysql-server
Step 2: Configure the MySQL Service and Connector
Before you can run the Hive metastore with a remote MySQL database, you must configure a connector to the remote MySQL database, set up the initial database schema, and configure the MySQL user account for the Hive user.
To install the MySQL connector on a Red Hat 6 system:
Install mysql-connector-java and symbolically link the file into the /usr/lib/hive/lib/ directory.
$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar
To set the MySQL root password:
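A minimal sketch to go with the symlink step: verify that the connector link actually resolves to a file, mimicking the layout the two commands above create. The paths here are stand-ins made for illustration; on a real host you would check /usr/lib/hive/lib/mysql-connector-java.jar.

```shell
# Stand-in directories mimicking /usr/share/java and /usr/lib/hive/lib
mkdir -p ./java-example ./hive-lib-example
touch ./java-example/mysql-connector-java.jar
ln -sf ../java-example/mysql-connector-java.jar ./hive-lib-example/mysql-connector-java.jar

# [ -e ... ] follows symlinks, so this confirms the link and its target both exist
if [ -e ./hive-lib-example/mysql-connector-java.jar ]; then LINK_OK=yes; else LINK_OK=no; fi
echo "connector symlink resolves: $LINK_OK"
```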
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
To make sure the MySQL server starts at boot:
On Red Hat systems:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Step 3. Create the Database and User
The instructions in this section assume you are using Remote mode, and that the MySQL database is installed on a separate host from the metastore service, which is running on a host named metastorehost in the example.
Note:
If the metastore service will run on the host where the database is installed, replace 'metastorehost' in the CREATE USER example with 'localhost'. Similarly, the value of javax.jdo.option.ConnectionURL in /etc/hive/conf/hive-site.xml (discussed in the next step) must be jdbc:mysql://localhost/metastore. For more information on adding MySQL users, see http://dev.mysql.com/doc/refman/5.5/en/adding-users.html.
Create the initial database schema using the hive-schema-0.10.0.mysql.sql file located in the /usr/lib/hive/scripts/metastore/upgrade/mysql directory.
Example
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
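A minimal sketch: the interactive session above written as a script. The statements are saved to a file; actually feeding it to mysql is left commented out, since that requires a running MySQL server and the root password.

```shell
# Write the same statements the interactive session runs into a SQL file
cat > create-metastore.sql <<'EOF'
CREATE DATABASE metastore;
USE metastore;
SOURCE /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql;
EOF

# On a real host you would then run:
# mysql -u root -p < create-metastore.sql

# Count the statements we just wrote (each line ends with a semicolon)
STATEMENTS=$(grep -c ';$' create-metastore.sql)
echo "statements in script: $STATEMENTS"
```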
You also need a MySQL user account for Hive to use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema.
Important:
If you fail to restrict the ability of the metastore MySQL user account to create and alter tables, it is possible that users will inadvertently corrupt the metastore schema when they use older or newer versions of Hive.
Example
mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';
...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
Step 4: Configure the Metastore Service to Communicate with the MySQL Database
This step shows the configuration properties you need to set in hive-site.xml to configure the metastore service to communicate with the MySQL database, and provides sample settings. Though you can use the same hive-site.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
Given a MySQL database running on myhost and the user account hive with the password mypassword, set the configuration as follows (overwriting any existing values).
Note:
The hive.metastore.local property is no longer supported as of Hive 0.10; setting hive.metastore.uris is sufficient to indicate that you are using a remote metastore.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
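After editing hive-site.xml it is easy to drop one of these properties. A minimal sketch of a sanity check (not from the original post): grep the file for the property names this step sets. The file below is a stand-in created for illustration; point HIVE_SITE at /etc/hive/conf/hive-site.xml on a real metastore host.

```shell
# Stand-in hive-site.xml with two of the required properties
HIVE_SITE=./hive-site-example.xml
cat > "$HIVE_SITE" <<'EOF'
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://myhost/metastore</value></property>
<property><name>hive.metastore.uris</name><value>thrift://10.0.0.1:9083</value></property>
EOF

# Flag any property name that does not appear in the file
MISSING=0
for prop in javax.jdo.option.ConnectionURL hive.metastore.uris; do
  grep -q "<name>$prop</name>" "$HIVE_SITE" || { echo "missing: $prop"; MISSING=1; }
done
echo "missing required properties: $MISSING"
```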
Step 5: Test connectivity to the metastore:
$ hive -e "show tables;"
Note:
This will take a while the first time.
Configuring HiveServer2
You must make the following configuration changes before using HiveServer2. Failure to do so may result in unpredictable behavior.
Table Lock Manager (Required)
You must properly configure and enable Hive's Table Lock Manager. This requires installing ZooKeeper and setting up a ZooKeeper ensemble.
Important:
Failure to do this will prevent HiveServer2 from handling concurrent query requests and may result in data corruption.
Enable the lock manager by setting properties in /etc/hive/conf/hive-site.xml as follows (substitute your actual ZooKeeper node names for those in the example):
<property>
<name>hive.support.concurrency</name>
<description>Enable Hive's Table Lock Manager Service</description>
<value>true</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<description>Zookeeper quorum used by Hive's Table Lock Manager</description>
<value>zk1.myco.com,zk2.myco.com,zk3.myco.com</value>
</property>
Important:
Enabling the Table Lock Manager without specifying a list of valid Zookeeper quorum nodes will result in unpredictable behavior. Make sure that both properties are properly configured.
hive.zookeeper.client.port
If ZooKeeper is not using the default value for ClientPort, you need to set hive.zookeeper.client.port in /etc/hive/conf/hive-site.xml to the same value that ZooKeeper is using. Check /etc/zookeeper/conf/zoo.cfg to find the value for ClientPort. If ClientPort is set to any value other than 2181 (the default), set hive.zookeeper.client.port to the same value. For example, if ClientPort is set to 2222, set hive.zookeeper.client.port to 2222 as well:
<property>
<name>hive.zookeeper.client.port</name>
<value>2222</value>
<description>
The port at which the clients will connect.
</description>
</property>
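A minimal sketch of the check described above: read the client port out of zoo.cfg so hive.zookeeper.client.port can be kept in sync with it. The zoo.cfg below is a stand-in created for illustration (ZooKeeper spells the key clientPort in the file); on a real host read /etc/zookeeper/conf/zoo.cfg.

```shell
# Stand-in zoo.cfg with a non-default client port
ZOO_CFG=./zoo-example.cfg
printf 'tickTime=2000\nclientPort=2222\ndataDir=/var/lib/zookeeper\n' > "$ZOO_CFG"

# Extract the value after "clientPort="
CLIENT_PORT=$(sed -n 's/^clientPort=//p' "$ZOO_CFG")
echo "ZooKeeper clientPort: $CLIENT_PORT"

# If it is not the default 2181, hive-site.xml needs to match it
if [ "$CLIENT_PORT" != 2181 ]; then
  echo "non-default port: set hive.zookeeper.client.port to $CLIENT_PORT"
fi
```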
JDBC driver
The connection URL format and the driver class are different for HiveServer2 and HiveServer1:
HiveServer version | Connection URL | Driver Class |
HiveServer2 | jdbc:hive2://<host>:<port> | org.apache.hive.jdbc.HiveDriver |
HiveServer1 | jdbc:hive://<host>:<port> | org.apache.hadoop.hive.jdbc.HiveDriver |
Authentication
HiveServer2 can be configured to authenticate all connections; by default, it allows any client to connect. HiveServer2 supports either Kerberos or LDAP authentication; configure this in the hive.server2.authentication property in the hive-site.xml file. You can also configure Pluggable Authentication, which allows you to use a custom authentication provider for HiveServer2; and HiveServer2 Impersonation, which allows users to execute queries and access HDFS files as the connected user rather than the super user who started the HiveServer2 daemon.
Configuring HiveServer2 for YARN
To use HiveServer2 with YARN, you must set the HADOOP_MAPRED_HOME environment variable: add the following line to /etc/default/hive-server2:
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
Running HiveServer2 and HiveServer Concurrently
Cloudera recommends running HiveServer2 instead of the original HiveServer (HiveServer1) package in most cases; HiveServer1 is included for backward compatibility. Both HiveServer2 and HiveServer1 can be run concurrently on the same system, sharing the same data sets. This allows you to run HiveServer1 to support, for example, Perl or Python scripts that use the native HiveServer1 Thrift bindings.
Both HiveServer2 and HiveServer1 bind to port 10000 by default, so at least one of them must be configured to use a different port. You can set the port for HiveServer2 in hive-site.xml by means of the hive.server2.thrift.port property. For example:
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
<description>TCP port number to listen on, default 10000</description>
</property>
You can also specify the port (and the host IP address in the case of HiveServer2) by setting these environment variables:
HiveServer version | Port | Host Address |
HiveServer2 | HIVE_SERVER2_THRIFT_PORT | HIVE_SERVER2_THRIFT_BIND_HOST |
HiveServer1 | HIVE_PORT | <Host bindings cannot be specified> |
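A minimal sketch of the environment-variable route from the table above: move HiveServer2 to port 10001 so HiveServer1 can keep 10000. The values are examples only; in practice the export lines go in the service's environment file before the daemons are started.

```shell
# HiveServer2 takes a non-default port and an explicit bind address
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=0.0.0.0

# HiveServer1 keeps the default port (host bindings cannot be specified for it)
export HIVE_PORT=10000

echo "HiveServer2 on $HIVE_SERVER2_THRIFT_BIND_HOST:$HIVE_SERVER2_THRIFT_PORT, HiveServer1 on port $HIVE_PORT"
```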
Using Custom UDFs with HiveServer2
To use custom User-Defined Functions (UDFs) with HiveServer2, do the following:
- Copy the UDF JAR files to the machine(s) hosting the HiveServer2 server(s).
Save the JARs to any directory you choose, and make a note of the path.
- Make the JARs available to the current instance of HiveServer2 by setting HIVE_AUX_JARS_PATH to the JARs' full pathname (the one you noted in Step 1) in hive-config.sh.
Note:
The path can be the directory, or each JAR's full pathname in a comma-separated list.
If you are using Cloudera Manager, use the HiveServer2 Service Environment Safety Valve to set HIVE_AUX_JARS_PATH.
- Add each JAR file's full pathname to the hive.aux.jars.path config property in hive-site.xml and re-start HiveServer2.
This is to allow JARs to be passed to MapReduce jobs started by Hive.
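The steps above can be sketched in shell. This builds HIVE_AUX_JARS_PATH as a comma-separated list of UDF JARs; the directory and JAR names are stand-ins created for illustration.

```shell
# Stand-in UDF directory with two example JARs
UDF_DIR=./udf-jars-example
mkdir -p "$UDF_DIR"
touch "$UDF_DIR/my-udfs.jar" "$UDF_DIR/text-udfs.jar"

# Join every JAR path with commas (the glob expands in sorted order)
HIVE_AUX_JARS_PATH=
for jar in "$UDF_DIR"/*.jar; do
  HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH:+$HIVE_AUX_JARS_PATH,}$jar
done
export HIVE_AUX_JARS_PATH
echo "$HIVE_AUX_JARS_PATH"
```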
Starting the Metastore
Important:
If you are running the metastore in Remote mode, you must start the metastore before starting HiveServer2.
You can run the metastore from the command line:
$ hive --service metastore
Use Ctrl-c to stop the metastore process running from the command line.
To run the metastore as a daemon, the command is:
$ sudo service hive-metastore start