1. Hadoop MapReduce Basic Tutorial

Input Data & How Hadoop Reads the Data

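The input file scores.data is in the folder /Users/arulk/projects. Each line holds a subject followed by its comma-separated scores, e.g. Science followed by 80, 75, 89, 90.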

Mapper Input

The Hadoop class “org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat” reads each input line as a key/value pair, splitting the line at the first occurrence of the delimiter. The default delimiter is a tab, but our data uses “,”. You can change the delimiter with the “mapreduce.input.keyvaluelinerecordreader.key.value.separator” property.
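To make the splitting rule concrete, here is a minimal plain-Java sketch that mimics how a line is cut at the first delimiter. It is not Hadoop code: the class name KeyValueSplitSketch, the method split, and the sample line are assumptions made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java imitation of the key/value split; hypothetical names, no Hadoop involved.
class KeyValueSplitSketch {

    // Splits a line at the FIRST occurrence of the separator.
    // If the separator is missing, the whole line becomes the key and the value is empty.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Assumed sample line: a subject followed by its scores.
        Map.Entry<String, String> kv = split("Science,80,75,89,90", ',');
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // prints Science -> 80,75,89,90
    }
}
```

Only the first comma acts as the key/value boundary, so the remaining commas stay inside the value and are left for the mapper to handle.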

MapReduce reading values as key/value pairs using  KeyValueTextInputFormat

Mapper Output

The mapper goes through each mark in the comma-separated value, e.g. 80, 75, 89, 90, and converts the marks to 80_75_89_90_.
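The conversion itself needs no Hadoop classes, so it can be tried in isolation. The following is a rough sketch of that step only; the class name MarkJoinSketch and the method joinMarks are hypothetical, and trimming the spaces around each mark is an assumption about how the input is cleaned.

```java
// Plain-Java sketch of the mapper's conversion step; hypothetical names.
class MarkJoinSketch {

    // Turns "80, 75, 89, 90" into "80_75_89_90_" (note the trailing underscore).
    static String joinMarks(String commaSeparated) {
        StringBuilder joined = new StringBuilder();
        for (String mark : commaSeparated.split(",")) {
            String trimmed = mark.trim(); // assumption: spaces around marks are not significant
            if (!trimmed.isEmpty()) {
                joined.append(trimmed).append('_');
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinMarks("80, 75, 89, 90")); // prints 80_75_89_90_
    }
}
```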

Mapper output key/value pairs.

Reducer Input

Same as mapper output.

Reducer Output

The reducer splits the “_”-delimited value 80_75_89_90_ into a string array [80, 75, 89, 90], finds the maximum score for each key (i.e. each subject, such as Science), and stores the result as “max score is: 90”.
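As a minimal sketch of the reducer's calculation, again without any Hadoop classes: the class name MaxScoreSketch and the method maxScore are hypothetical, and the scores are assumed to be whole numbers.

```java
// Plain-Java sketch of the reducer's max calculation; hypothetical names.
class MaxScoreSketch {

    // Splits "80_75_89_90_" on "_" and returns "max score is: 90".
    static String maxScore(String underscoreDelimited) {
        int max = Integer.MIN_VALUE;
        boolean found = false;
        for (String token : underscoreDelimited.split("_")) {
            if (token.isEmpty()) {
                continue; // skip empty tokens, e.g. when the value is empty
            }
            max = Math.max(max, Integer.parseInt(token)); // assumption: scores are integers
            found = true;
        }
        if (!found) {
            throw new IllegalArgumentException("no scores in: " + underscoreDelimited);
        }
        return "max score is: " + max;
    }

    public static void main(String[] args) {
        System.out.println(maxScore("80_75_89_90_")); // prints max score is: 90
    }
}
```

Java's String.split drops trailing empty strings, so the trailing “_” does not produce an extra array element.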

Hadoop Reducer output

Hadoop MapReduce Steps

Step 1: Create a simple Maven project from a Unix command prompt. Press Enter to accept the default answer to every question.

Import the new Maven project into Eclipse or the IDE of your choice.

Step 2: Add the Hadoop dependency to the pom.xml.

Step 3: Write the Hadoop-based mapper class “ScoreMapper”, which can be executed in parallel by multiple nodes. It processes each input line as a key/value pair, e.g. Science/80, 75, 89, 90.

Step 4: Write the Hadoop-based reducer class “ScoreReducer”, which can also be executed in parallel by multiple nodes. It processes the mapper output as key/value pairs, e.g. Science/80_75_89_90_, and emits output key/value pairs such as Science/max score is: 90.

Step 5: Finally, write the executable main Java class “MaxScoreMain” that ties everything together.
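One possible shape for the three classes is sketched below. It is not necessarily the code this tutorial was written with. It assumes the Hadoop client libraries are on the classpath, uses the “org.apache.hadoop.mapreduce” API, keeps the mapper and reducer as nested static classes only so that the sketch fits in one file, assumes integer scores, and uses a made-up job name.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxScoreMain {

    // Receives e.g. Science / "80, 75, 89, 90" and emits Science / "80_75_89_90_".
    public static class ScoreMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            StringBuilder joined = new StringBuilder();
            for (String mark : value.toString().split(",")) {
                String trimmed = mark.trim();
                if (!trimmed.isEmpty()) {
                    joined.append(trimmed).append('_');
                }
            }
            context.write(key, new Text(joined.toString()));
        }
    }

    // Receives e.g. Science / ["80_75_89_90_"] and emits Science / "max score is: 90".
    public static class ScoreReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int max = Integer.MIN_VALUE;
            boolean found = false;
            for (Text value : values) {
                for (String token : value.toString().split("_")) {
                    if (token.isEmpty()) {
                        continue;
                    }
                    max = Math.max(max, Integer.parseInt(token)); // assumption: integer scores
                    found = true;
                }
            }
            if (found) {
                context.write(key, new Text("max score is: " + max));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Our data is comma separated, so override the default tab delimiter.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "max score"); // job name is an arbitrary choice
        job.setJarByClass(MaxScoreMain.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setMapperClass(ScoreMapper.class);
        job.setReducerClass(ScoreReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Input file and output folder as used in this tutorial.
        FileInputFormat.addInputPath(job, new Path("/Users/arulk/projects/scores.data"));
        FileOutputFormat.setOutputPath(job, new Path("/Users/arulk/tempMapreduce"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Hadoop refuses to start a job whose output folder already exists, so delete “/Users/arulk/tempMapreduce” before running the job a second time.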

Step 6: The results will be written to the folder “/Users/arulk/tempMapreduce” in a file named “part-r-00000”. Each line of this file holds a key and its value separated by a tab, e.g. Science followed by “max score is: 90”.

This tutorial was created in a Unix environment. You may face additional challenges running it on a Windows machine, where you will need a Unix emulator such as Cygwin.

