Installing Pig
To install Pig on Red Hat-compatible systems:
$ sudo yum install pig
To install Pig on SLES systems:
$ sudo zypper install pig
To install Pig on Ubuntu and other Debian systems:
$ sudo apt-get install pig

Pig automatically uses the active Hadoop configuration (standalone, pseudo-distributed, or
distributed). After installing the Pig package, you can start the Grunt shell.
To start the Grunt Shell (MRv1):
$ export PIG_CONF_DIR=/usr/lib/pig/conf
$ export PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar
$ pig
grunt>
To start the Grunt Shell (YARN):

For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop in
a YARN installation, set the HADOOP_MAPRED_HOME environment variable as follows:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
$ export PIG_CONF_DIR=/usr/lib/pig/conf
$ export PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar
$ pig
...
grunt>
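Because these variables are lost when the shell exits, each user who runs Pig under YARN would otherwise have to re-export them every session. One way to persist them (an approach assumed here, not prescribed by the original text) is to append the exports to the user's ~/.bashrc, using the same CDH4 paths shown above:

```shell
# Sketch: persist the YARN/Pig environment variables for future shell
# sessions. Paths match the CDH4 layout above; adjust for your install.
cat >> ~/.bashrc <<'EOF'
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export PIG_CONF_DIR=/usr/lib/pig/conf
export PIG_CLASSPATH=/usr/lib/hbase/hbase-0.94.2-cdh4.2.1-security.jar:/usr/lib/zookeeper/zookeeper-3.4.5-cdh4.2.1.jar
EOF
```

New login shells will then pick up the variables automatically; run `source ~/.bashrc` to apply them to the current session.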
To verify that the input and output directories from the example grep job exist, list an HDFS directory from the Grunt shell:
grunt> ls
hdfs://localhost/user/joe/input <dir>
hdfs://localhost/user/joe/output <dir>
To run a grep example job over the input files using Pig:
grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> DUMP B;
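DUMP prints the matching lines to the console. To write them back to HDFS instead, Pig's STORE operator saves a relation to a directory. A sketch of the same job with STORE (the output path 'pig-grep-output' is an illustrative name, not from the original; the directory must not already exist):

```
grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> STORE B INTO 'pig-grep-output';
```

The results can then be inspected afterwards with ls or cat from the Grunt shell.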
To check the status of your job while it is running, look at the JobTracker web console at
http://localhost:50030/.