3. Apache Pig: XPath for XML

This tutorial extends 1. Apache Pig Getting started and 2. Apache Pig: Regex (Regular expressions).

Input Data

The input file, scores.xml, is in the folder /Users/arulk/projects and holds the marks of 4 students in 3 subjects.

Step 1: Start Pig in local mode by running pig -x local, so that it reads from and writes to the local file system instead of HDFS.

Step 2: Extract the “Subjects” from the input XML file.

Dump the output:
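The load and dump for Step 2 might look like the sketch below. The piggybank.jar path is a placeholder, and the element name Subject is an assumption about how scores.xml is laid out, so adjust both to match your setup. XMLLoader comes from piggybank and emits one record per matching element, tags included.

```pig
-- Assumption: hypothetical jar location; point this at your own piggybank.jar.
REGISTER /path/to/piggybank.jar;

-- Assumption: each subject in scores.xml is wrapped in a <Subject> element.
-- XMLLoader returns every <Subject>...</Subject> block as a single text field.
subjects = LOAD '/Users/arulk/projects/scores.xml'
           USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
           AS (xml:chararray);

-- Print one tuple per subject element to the console.
DUMP subjects;
```

If the dump prints nothing, the tag name passed to XMLLoader most likely does not match the element name in the file.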

Step 3: Use XPath expressions to extract the marks scored by the 4 students in each subject. The XPath and XPathAll functions are UDFs defined in “piggybank.jar”.

Dump the output:
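A minimal sketch of Step 3 follows, assuming a hypothetical record layout in which each Subject element carries a name attribute and contains student elements with a mark child. The aliases, the jar path, and the XPath expressions are illustrative only. XPath should return the first match as a string, while XPathAll should return all matches together in a tuple.

```pig
-- Assumption: hypothetical jar location; point this at your own piggybank.jar.
REGISTER /path/to/piggybank.jar;
DEFINE XPath    org.apache.pig.piggybank.evaluation.xml.XPath();
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

-- Assumed record layout (not taken from the original file):
--   <Subject name="..."><student name="..."><mark>..</mark></student> ... </Subject>
subjects = LOAD '/Users/arulk/projects/scores.xml'
           USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
           AS (xml:chararray);

-- One output row per subject: its name plus the marks of all its students.
subject_marks = FOREACH subjects GENERATE
                    XPath(xml, 'Subject/@name')           AS subject,
                    XPathAll(xml, 'Subject/student/mark') AS marks;

DUMP subject_marks;
```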


Step 4: Store the output. Pig creates a new folder named “MARKS_FOR_SUBJECT.TXT” and writes the results into a file named “part-m-00000” inside it.
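The store step could be sketched as below, with the same assumptions as before: a placeholder jar path and a hypothetical Subject/student/mark layout. The output path is relative here, so the folder is created under the directory from which Pig was started.

```pig
-- Assumption: hypothetical jar location; point this at your own piggybank.jar.
REGISTER /path/to/piggybank.jar;
DEFINE XPath    org.apache.pig.piggybank.evaluation.xml.XPath();
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

subjects = LOAD '/Users/arulk/projects/scores.xml'
           USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
           AS (xml:chararray);

-- Assumed layout: <Subject name="..."><student><mark>..</mark></student>...</Subject>
subject_marks = FOREACH subjects GENERATE
                    XPath(xml, 'Subject/@name')           AS subject,
                    XPathAll(xml, 'Subject/student/mark') AS marks;

-- STORE writes a directory, not a single file; the rows land in part files inside it.
STORE subject_marks INTO 'MARKS_FOR_SUBJECT.TXT';
```

STORE fails if the target folder already exists, so remove “MARKS_FOR_SUBJECT.TXT” before re-running the script.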

