Installing Hadoop in pseudo-distributed mode

Step 1: Run the following command to install Hadoop from the yum repository in pseudo-distributed mode

sudo yum install hadoop-0.20-conf-pseudo

Step 2: Verify that the packages were installed properly

rpm -ql hadoop-0.20-conf-pseudo

Step 3: Format the namenode

sudo -u hdfs hdfs namenode -format
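
Formatting initializes a brand-new, empty HDFS and erases any existing metadata, so only run it on a fresh install. As an optional sanity check, the freshly formatted name directory should now contain a VERSION file; the path below is the usual CDH pseudo-distributed default (set by dfs.name.dir in /etc/hadoop/conf/hdfs-site.xml), so adjust it if your configuration differs:

# Assumed CDH default metadata path; verify dfs.name.dir if this file is missing.
sudo -u hdfs cat /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current/VERSION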

Step 4: Stop any existing services (since Hadoop was already installed for you, some services might be running)

$ for service in /etc/init.d/hadoop*
> do
>   sudo $service stop
> done
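
To confirm that nothing is left running, jps (a standard JDK tool; run via sudo so it can see daemons owned by other users) lists the Java processes on the machine, and should show no Hadoop daemons at this point:

$ sudo jps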

Step 5: Start HDFS

$ for service in /etc/init.d/hadoop-hdfs-*
> do
>   sudo $service start
> done

Step 6: Verify that HDFS has started properly (in the browser)

http://localhost:50070
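
Besides the browser check, HDFS health can be confirmed from the shell: the dfsadmin report prints configured capacity and the number of live datanodes, which should be one in a pseudo-distributed setup:

$ sudo -u hdfs hdfs dfsadmin -report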

Step 7: Create the /tmp directory

$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

Mode 1777 makes /tmp world-writable with the sticky bit set, just like /tmp on a local Linux filesystem, so every user can create files there but only delete their own.
Step 8: Create MapReduce-specific directories

sudo -u hdfs hadoop fs -mkdir /var
sudo -u hdfs hadoop fs -mkdir /var/lib
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
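
Each parent directory must exist before its child can be created, hence the long chain above. On newer Hadoop releases, hadoop fs -mkdir accepts a -p flag (create parents as needed, like mkdir -p on Linux); if your version supports it, the chain collapses to a sketch like this:

# Assumes a Hadoop release whose "hadoop fs -mkdir" supports -p.
sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
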
Step 9: Verify the directory structure

$ sudo -u hdfs hadoop fs -ls -R /
Output should be:

drwxrwxrwt - hdfs   supergroup 0 2012-04-19 15:14 /tmp
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt - mapred supergroup 0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
Step 10: Start MapReduce

$ for service in /etc/init.d/hadoop-0.20-mapreduce-*
> do
>   sudo $service start
> done
Step 11: Verify that MapReduce has started properly (in the browser)

http://localhost:50030
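
If a browser is not available, the MR1 job client offers a quick check that the JobTracker is answering; on a fresh cluster it should simply report zero running jobs:

$ hadoop job -list
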
Step 12: Verify that the installation works by running an example program

Step 12.1: Create a home directory on HDFS for the user

sudo -u hdfs hadoop fs -mkdir /user/training
sudo -u hdfs hadoop fs -chown training /user/training
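
The commands above hard-code the user name training. If you are logged in as a different user, a small sketch using standard shell substitution (not Hadoop-specific) creates a home directory for whoever you are:

# Create an HDFS home directory for the current login user.
U=$(whoami)
sudo -u hdfs hadoop fs -mkdir /user/$U
sudo -u hdfs hadoop fs -chown $U /user/$U
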
Step 12.2: Make a directory in HDFS called input and copy some XML files into it by running the following commands

$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/mapred-site.xml
(This sample output shows user joe; your own user name will appear instead.)
Step 12.3: Run an example Hadoop job that greps your input data with a regular expression. Under the hood the grep example runs two MapReduce jobs: the first counts matches of the expression, and the second sorts those counts in descending order.

$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'
Step 12.4: After the job completes, you can find the results in the HDFS directory named output, since that is the output directory you passed to the job.

$ hadoop fs -ls
Found 2 items:
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output
Step 12.5: List the output files

$ hadoop fs -ls output
Found 3 items
drwxr-xr-x - joe supergroup    0 2009-02-25 10:33 /user/joe/output/_logs
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output/part-00000
-rw-r--r-- 1 joe supergroup    0 2009-02-25 10:33 /user/joe/output/_SUCCESS
Step 12.6: Read the output

$ hadoop fs -cat output/part-00000 | head
1 dfs.datanode.data.dir
1 dfs.namenode.checkpoint.dir
1 dfs.namenode.name.dir
1 dfs.replication
1 dfs.safemode.extension
1 dfs.safemode.min.
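
To keep a local copy of the results, hadoop fs -get copies a file out of HDFS to the local filesystem; the local destination path below is just an example:

# Copy the result file out of HDFS; the local path is arbitrary.
hadoop fs -get output/part-00000 ~/grep-output.txt
head ~/grep-output.txt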
