Opening the New Java Project wizard
The New Java Project wizard can be used to create a new Java project. There are several ways to open this wizard:
- By clicking on the File menu and choosing New > Java Project
- By right-clicking anywhere in the Project Explorer and selecting New > Java Project
- By clicking on the New button in the toolbar and selecting Java Project
Using the New Java Project wizard
The New Java Project wizard has two pages. On the first page:
- Enter the Project Name
- Select the Java Runtime Environment (JRE) or leave it at the default
- Select the Project Layout, which determines whether there will be separate folders for source code and class files. The recommended option is to create separate folders for sources and class files.
Click Finish to create the project, or click Next to change the Java build settings.
On the second page you can change the Java build settings, such as project dependencies (if the workspace contains multiple projects) and additional JAR files on the build path. The MapReduce classes below import org.apache.hadoop packages, so the Hadoop JAR files must be on the project's build path for them to compile.
Writing the Mapper Class
To start with some basic MapReduce code, we will write a word count program, which counts how many times each word occurs in the input files and writes each word with its count to the output.
The mapper class, WordCountMapper, receives the input one line at a time and emits the pair (word, 1) for every word in the line:
package com.hadoop.training;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
// Input: (byte offset of the line, line text). Output: (word, 1) for every word.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused Writable objects, so new ones are not allocated for every record.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on whitespace (spaces, tabs, newlines).
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
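To see exactly what the mapper emits, the map step can be imitated in plain Java without a cluster. This is a minimal sketch for illustration only: MapStepSketch and its map method are hypothetical names, not part of Hadoop. It uses the same StringTokenizer splitting as WordCountMapper, so it also shows that tokens are case-sensitive and keep any attached punctuation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of the map step (no Hadoop dependencies).
// MapStepSketch is a hypothetical class used only for illustration.
public class MapStepSketch {

    // Emits one (token, 1) pair per whitespace-separated token,
    // mirroring what WordCountMapper writes to the Context.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [Hello,=1, hello=1, world=1]: "Hello," and "hello" are different keys.
        System.out.println(map("Hello, hello world"));
    }
}
```

Normalizing case or stripping punctuation would need extra code in the mapper; the version above counts "Hello," and "hello" as separate words.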
Writing the Reducer Class
The reducer class, WordCountReducer, receives each word together with all of the 1s emitted for it and adds them up to get the word's total count:
package com.hadoop.training;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
// Input: (word, all the counts emitted for it). Output: (word, total count).
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output object, updated for each key.
    private IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Add up the counts for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
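Between the mapper and the reducer, Hadoop's shuffle and sort phase groups all values that share a key, so reduce is called once per word with every count for that word. The sketch below imitates that grouping and the reduce call in plain Java. It assumes Java 9 or later (for List.of and Map.entry), and ReduceStepSketch, reduce and shuffleAndReduce are hypothetical names. A TreeMap stands in for the sort, since a reducer sees its keys in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of shuffle/sort + reduce (no Hadoop dependencies).
// ReduceStepSketch is a hypothetical class used only for illustration.
public class ReduceStepSketch {

    // Same logic as WordCountReducer.reduce: add up every count for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the shuffle and sort: group values by key (sorted by TreeMap),
    // then call reduce once per key.
    static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs =
                List.of(Map.entry("the", 1), Map.entry("cat", 1), Map.entry("the", 1));
        System.out.println(shuffleAndReduce(pairs)); // prints {cat=1, the=2}
    }
}
```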
Writing the MapReduce driver class
The driver class, WordCount, configures the job (input and output paths, mapper, reducer and output types) and submits it:
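package com.hadoop.training;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Driver: configures the word count job and submits it.
public class WordCount {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(-1);
        }
        // Job.getInstance replaces the deprecated Job() constructor.
        Job job = Job.getInstance(new Configuration(), "Word Count");
        // Tells Hadoop which JAR to ship by locating the one that contains this class.
        job.setJarByClass(WordCount.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // The output directory must not exist yet, otherwise the job fails at submission.
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // Output types of the job; also used for the mapper's output, since no separate map output types are set.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Run the job, print progress, and exit with 0 on success or 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}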
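The driver only wires the pieces together; Hadoop then runs the map, shuffle/sort and reduce phases across the cluster. As a rough local model of that data flow (not how Hadoop actually executes the job), the sketch below runs all three phases in one process over a few in-memory lines and formats each result as word, tab, count, which is how the default TextOutputFormat writes key/value pairs. LocalWordCountSketch and run are hypothetical names; the sketch assumes Java 9 or later, and its sort order matches Hadoop's byte-wise Text ordering at least for plain ASCII words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, single-process sketch of the word count data flow:
// input lines -> map -> shuffle/sort -> reduce -> "word<TAB>count" lines.
// LocalWordCountSketch is a hypothetical class; no Hadoop is involved.
public class LocalWordCountSketch {

    static List<String> run(List<String> inputLines) {
        // Map + shuffle: emit (word, 1) and group the 1s under each word, sorted by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: sum each word's counts and format as tab-separated text.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            output.add(e.getKey() + "\t" + sum);
        }
        return output;
    }

    public static void main(String[] args) {
        for (String out : run(List.of("the quick fox", "the lazy dog"))) {
            System.out.println(out);
        }
    }
}
```

With Hadoop's default of a single reduce task, the real job's output directory should contain one part-r-00000 file with lines in this same sorted, tab-separated form.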
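Running the MapReduce program
Package the three classes into a JAR file (WC.jar here) and submit it with the hadoop jar command, passing the driver class, the HDFS input path and an output path that does not exist yet: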
$ hadoop jar WC.jar com.hadoop.training.WordCount hdfs://localhost:8020/user/rajeev/input hdfs://localhost:8020/user/rajeev/output