01: Learn Hadoop API by examples in Java

These Hadoop tutorials assume that you have installed Cloudera QuickStart, which bundles Hadoop ecosystem components such as HDFS, Spark, Hive, HBase, and YARN.


List files in HDFS

The following Java code uses the Hadoop API to list files in HDFS.
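A minimal sketch of such a listing, assuming the Hadoop client libraries are on the classpath. The NameNode URI `hdfs://quickstart.cloudera:8020` and the directory `/user/cloudera` are placeholder assumptions, and the class name `HdfsListFiles` is only illustrative; substitute the values for your own cluster.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListFiles {

    // Returns the full path of every entry directly under "dir" (not recursive).
    static List<String> list(FileSystem fs, Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            names.add(status.getPath().toString());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: placeholder NameNode URI and directory; pass your own as arguments.
        String nameNodeUri = args.length > 0 ? args[0] : "hdfs://quickstart.cloudera:8020";
        String dir = args.length > 1 ? args[1] : "/user/cloudera";

        Configuration conf = new Configuration();
        // "fs" is the handle used for all further file system operations.
        try (FileSystem fs = FileSystem.get(URI.create(nameNodeUri), conf)) {
            for (String name : list(fs, new Path(dir))) {
                System.out.println(name);
            }
        }
    }
}
```

The `FileSystem` implementation is chosen from the URI scheme, so the same `list` method also works against a `file:///` URI when no cluster is available.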

You can use the “fs” handle to perform other file system operations, such as:

List files in a local Unix file system
Append contents to a file in HDFS
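The two operations above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the class name `HdfsFileOperations`, the NameNode URI, and the file paths are hypothetical placeholders. The target file must already exist and the cluster must permit appends.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOperations {

    // Prints every entry directly under a directory of the local file system.
    static void listLocal(Configuration conf, String localDir) throws IOException {
        FileSystem localFs = FileSystem.getLocal(conf);
        for (FileStatus status : localFs.listStatus(new Path(localDir))) {
            System.out.println(status.getPath());
        }
    }

    // Appends one line of text to an existing file on the given file system.
    static void appendLine(FileSystem fs, Path file, String line) throws IOException {
        try (FSDataOutputStream out = fs.append(file)) {
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        listLocal(conf, "/tmp");

        // Assumption: placeholder NameNode URI and HDFS path; pass your own as arguments.
        String nameNodeUri = args.length > 0 ? args[0] : "hdfs://quickstart.cloudera:8020";
        String file = args.length > 1 ? args[1] : "/user/cloudera/sample.txt";
        try (FileSystem fs = FileSystem.get(URI.create(nameNodeUri), conf)) {
            appendLine(fs, new Path(file), "appended line");
        }
    }
}
```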

The above Java code is equivalent to the following command-line.

How do you find the NameNode URI?

Option 1: On the edge node, look in /etc/hadoop/conf/core-site.xml. The fs.defaultFS property holds the NameNode URI.
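If you would rather read the value programmatically than open the file by hand, the lookup can be sketched with the JDK's XML parser alone. The class name `NameNodeUriFinder` is illustrative. The sketch assumes the usual Hadoop configuration layout of `<property>` elements with `<name>` and `<value>` children, and that the URI is stored under `fs.defaultFS`.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NameNodeUriFinder {

    // Returns the value of the named property, or null when it is absent.
    static String findProperty(InputStream xml, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if (propertyName.equals(textOf(property, "name"))) {
                return textOf(property, "value");
            }
        }
        return null;
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/etc/hadoop/conf/core-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println(findProperty(in, "fs.defaultFS"));
        }
    }
}
```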

Option 2: If you are on Cloudera, go to Cloudera Manager, click on “HDFS”, and then select NameNode to see its configuration details, including the IP address.

Option 3: If you are on Cloudera, go to Cloudera Manager, click on “HDFS”, open the Actions drop-down, and choose “Download Client Configuration”. This downloads a zip file containing all the config files, including core-site.xml.

What libraries do you need in the classpath?

The Hadoop examples in this post need the relevant JARs on the classpath, declared as dependencies in the pom.xml file. The spark-core dependency transitively brings in the Hadoop libraries.
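One quick way to confirm that the dependencies resolved as expected is to probe for a few Hadoop classes at runtime. This is only a sketch: the helper `ClasspathCheck` is hypothetical, and the three class names are simply the ones the file system examples typically rely on.

```java
public class ClasspathCheck {

    // Returns true when the class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```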
