Step 1: Run the following command to install Hadoop from the yum repository in pseudo-distributed mode
sudo yum install hadoop-0.20-conf-pseudo
Step 2: Verify that the packages were installed properly
rpm -ql hadoop-0.20-conf-pseudo
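You can also confirm that the hadoop client is on your PATH and see which release was installed (the exact version string will vary with your CDH release):
$ hadoop version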
Step 3: Format the NameNode (do this only once on a fresh install; formatting erases any existing HDFS metadata)
sudo -u hdfs hdfs namenode -format
Step 4: Stop existing services (as Hadoop was already installed for you, some services might be running)
$ for service in /etc/init.d/hadoop*
> do
>   sudo $service stop
> done
Step 5: Start HDFS
$ for service in /etc/init.d/hadoop-hdfs-*
> do
>   sudo $service start
> done
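If you want to confirm the daemons came up before moving on, the same loop works with the status argument, assuming these init scripts support it (typical for CDH packages):
$ for service in /etc/init.d/hadoop-hdfs-*
> do
>   sudo $service status
> done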
Step 6: Verify that HDFS has started properly by opening the NameNode web UI in a browser
http://localhost:50070
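A quick command-line check works too; if HDFS is up, listing the root of the filesystem should succeed without errors:
$ sudo -u hdfs hadoop fs -ls /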
Step 7: Create the /tmp directory (mode 1777 sets the sticky bit, so any user can write to /tmp but only owners can delete their own files)
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
Step 8: Create MapReduce-specific directories
sudo -u hdfs hadoop fs -mkdir /var
sudo -u hdfs hadoop fs -mkdir /var/lib
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred
sudo -u hdfs hadoop fs -mkdir /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chmod 1777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
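If you prefer, the repeated -mkdir calls can be folded into a single loop; this is a minimal sketch that creates exactly the same paths as above (the -chmod and -chown commands are still needed afterwards):
$ for dir in /var /var/lib /var/lib/hadoop-hdfs /var/lib/hadoop-hdfs/cache /var/lib/hadoop-hdfs/cache/mapred /var/lib/hadoop-hdfs/cache/mapred/mapred /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
> do
>   sudo -u hdfs hadoop fs -mkdir $dir
> done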
Step 9: Verify the directory structure
$ sudo -u hdfs hadoop fs -ls -R /
Output should be:
drwxrwxrwt - hdfs   supergroup 0 2012-04-19 15:14 /tmp
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs
drwxr-xr-x - hdfs   supergroup 0 2012-04-19 15:16 /var/lib/hadoop-hdfs/cache
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:19 /var/lib/hadoop-hdfs/cache/mapred
drwxr-xr-x - mapred supergroup 0 2012-04-19 15:29 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt - mapred supergroup 0 2012-04-19 15:33 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
Step 10: Start MapReduce
$ for service in /etc/init.d/hadoop-0.20-mapreduce-*
> do
>   sudo $service start
> done
Step 11: Verify that MapReduce has started properly by opening the JobTracker web UI in a browser
http://localhost:50030
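You can also check the running daemons from the shell with jps, which ships with the JDK; in pseudo-distributed mode you should see processes such as NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker (names can vary slightly between releases):
$ sudo jps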
Step 12: Verify that the installation went well by running an example program
Step 12.1: Create a home directory on HDFS for the user (relative HDFS paths in the next steps resolve against this home directory)
sudo -u hdfs hadoop fs -mkdir /user/training
sudo -u hdfs hadoop fs -chown training /user/training
Step 12.2: Make a directory in HDFS called input and copy some XML files into it by running the following commands
$ hadoop fs -mkdir input
$ hadoop fs -put /etc/hadoop/conf/*.xml input
$ hadoop fs -ls input
Found 3 items:
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/core-site.xml
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/hdfs-site.xml
-rw-r--r-- 1 joe supergroup 1348 2012-02-13 12:21 input/mapred-site.xml
Step 12.3: Run an example Hadoop job to grep with a regular expression in your input data
$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar grep input output 'dfs[a-z.]+'
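The examples jar bundles several other sample jobs besides grep. For instance, a word count over the same input could be run as sketched below (assuming the same jar path; wordcount-output is just a hypothetical output directory name):
$ /usr/bin/hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount input wordcount-output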
Step 12.4: After the job completes, you can find the output in the HDFS directory named output, because you specified that output directory to Hadoop
$ hadoop fs -ls
Found 2 items:
drwxr-xr-x - joe supergroup 0 2009-08-18 18:36 /user/joe/input
drwxr-xr-x - joe supergroup 0 2009-08-18 18:38 /user/joe/output
Step 12.5: List the output files
$ hadoop fs -ls output
Found 3 items
drwxr-xr-x - joe supergroup    0 2009-02-25 10:33 /user/joe/output/_logs
-rw-r--r-- 1 joe supergroup 1068 2009-02-25 10:33 /user/joe/output/part-00000
-rw-r--r-- 1 joe supergroup    0 2009-02-25 10:33 /user/joe/output/_SUCCESS
Step 12.6: Read the output
$ hadoop fs -cat output/part-00000 | head
1 dfs.datanode.data.dir
1 dfs.namenode.checkpoint.dir
1 dfs.namenode.name.dir
1 dfs.replication
1 dfs.safemode.extension
1 dfs.safemode.min.
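When you are done experimenting, you may want to remove the job output so a rerun does not fail because the output directory already exists; on this 0.20-era CLI the recursive remove is -rmr (newer Hadoop releases use hadoop fs -rm -r instead):
$ hadoop fs -rmr output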