3. Apache Pig: XPath for XML

This tutorial extends 1. Apache Pig Getting started and 2. Apache Pig: Regex (Regular expressions).

Input Data

The input file scores.xml, in the folder /Users/arulk/projects, holds the marks of 4 students in 3 subjects.

Step 1: Start Pig in local mode (pig -x local), which reads from and writes to the local file system instead of HDFS.

Step 2: Extract the “Subjects” from the input XML file.

Dump the output:
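A minimal sketch of what Steps 1 and 2 might look like in the Grunt shell. The jar path is a placeholder, and the element name Subject is an assumption about how scores.xml is laid out; adjust both to match your setup.

```pig
-- Run inside the Grunt shell, started with: pig -x local

-- Placeholder path (assumption): point this at your own piggybank.jar
REGISTER /path/to/piggybank.jar;

-- XMLLoader emits one record per <Subject>...</Subject> element,
-- each record holding the raw XML of that element as a chararray.
-- 'Subject' is an assumed tag name.
subjects = LOAD '/Users/arulk/projects/scores.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
    AS (subject_xml:chararray);

DUMP subjects;
```

With 3 subjects in the file, the dump should print 3 records, one XML fragment per subject.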

Step 3: Use XPath expressions to extract the marks scored by the 4 students in each subject. The XPath and XPathAll UDFs (user-defined functions) are defined in “piggybank.jar”, which must be registered before they can be used.

Dump the output:
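A sketch of how Step 3 could be written, not necessarily the original script. It assumes each Subject element has a name child and one Student child per student, each containing a mark element; these element names and the jar path are hypothetical. XPath returns the first match as a chararray, while XPathAll returns every match as a tuple.

```pig
-- Placeholder path (assumption): point this at your own piggybank.jar
REGISTER /path/to/piggybank.jar;

-- Short aliases for the two piggybank UDFs
DEFINE XPath    org.apache.pig.piggybank.evaluation.xml.XPath();
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

-- One record per <Subject> element ('Subject' is an assumed tag name)
subjects = LOAD '/Users/arulk/projects/scores.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
    AS (subject_xml:chararray);

-- Assumed layout: <Subject><name>..</name><Student><mark>..</mark></Student>...</Subject>
marks_for_subject = FOREACH subjects GENERATE
    XPath(subject_xml, 'Subject/name')            AS subject_name,  -- first match only
    XPathAll(subject_xml, 'Subject/Student/mark') AS marks;         -- all matches, as a tuple

DUMP marks_for_subject;
```

XPathAll is the better fit here because each subject has 4 marks; XPath alone would return only the first one.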

Output:

Step 4: Store the output. Pig creates a folder named “MARKS_FOR_SUBJECT.TXT” and writes the results into a file named “part-m-00000” inside it.
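A sketch of the store step under the same assumptions as before: the jar path is a placeholder, the element names Subject, name, Student and mark are hypothetical, and the output folder is written relative to the directory Pig was started from.

```pig
-- Placeholder path (assumption): point this at your own piggybank.jar
REGISTER /path/to/piggybank.jar;
DEFINE XPath    org.apache.pig.piggybank.evaluation.xml.XPath();
DEFINE XPathAll org.apache.pig.piggybank.evaluation.xml.XPathAll();

-- One record per <Subject> element ('Subject' is an assumed tag name)
subjects = LOAD '/Users/arulk/projects/scores.xml'
    USING org.apache.pig.piggybank.storage.XMLLoader('Subject')
    AS (subject_xml:chararray);

-- Subject name plus all marks for that subject (assumed element names)
marks_for_subject = FOREACH subjects GENERATE
    XPath(subject_xml, 'Subject/name')            AS subject_name,
    XPathAll(subject_xml, 'Subject/Student/mark') AS marks;

-- STORE treats the target as a directory; the data lands in part files inside it
STORE marks_for_subject INTO 'MARKS_FOR_SUBJECT.TXT' USING PigStorage(',');
```

The output folder must not exist before the script runs; Pig refuses to overwrite an existing output location. Because this job has no reduce phase, the part file carries the map-only prefix “part-m-”.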

