Blog Archives

01: Apache Hadoop HDFS Tutorial

Step 1: Download the latest version of “Apache Hadoop common” from http://apache.claz.org/hadoop using wget, curl or a browser. This tutorial uses “http://apache.claz.org/hadoop/core/hadoop-2.7.1/”.

Step 2: Set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
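
A minimal sketch of the kind of entries this step refers to, assuming the hadoop-2.7.1 archive was unpacked under your home directory. The install path is an assumption, so adjust it to wherever you extracted Hadoop.

```shell
# Assumption: Hadoop 2.7.1 was unpacked to $HOME/hadoop-2.7.1 (adjust to your install path).
export HADOOP_HOME="$HOME/hadoop-2.7.1"
# Put the Hadoop client commands (bin) and the start/stop scripts (sbin) on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```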

You can run this in a Unix command prompt as

Step 3: You can verify that Hadoop has been set up properly with
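
One way to run this check is the `hadoop version` command. The wrapper below is only a sketch: the function name `check_hadoop` is hypothetical, and it exists just to give a clear message when the command is not on the PATH.

```shell
# Print the installed Hadoop version, or report that the command is missing.
check_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop version
  else
    echo "hadoop not found on PATH; check HADOOP_HOME and PATH in ~/.bashrc" >&2
    return 1
  fi
}
```

Calling `check_hadoop` in a new terminal should print the version banner if the environment variables were picked up.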

Step 4: The file $HADOOP_HOME/etc/hadoop/hadoop-env.sh holds the JAVA_HOME setting.
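
To see what that file currently sets, a small helper along these lines may be useful. It is a sketch that assumes the lowercase etc/hadoop configuration directory of a Hadoop 2.x install; the function name `show_java_home_setting` is hypothetical.

```shell
# Print every line of hadoop-env.sh that mentions JAVA_HOME, with line numbers.
show_java_home_setting() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  grep -n 'JAVA_HOME' "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"
}
```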




02: Java to write from/to Local to HDFS File System

This extends the Hadoop MapReduce Basic Tutorial and the Apache Hadoop HDFS Tutorial. Copying files between the local file system and HDFS could have been done on the command line as shown below after running “start-dfs.sh” to start the name and data nodes.
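
The command-line route can be sketched with the `hdfs dfs` shell, assuming “start-dfs.sh” has already started the name and data nodes. The function name and the example paths are hypothetical, chosen only for illustration.

```shell
# Copy a local file into an HDFS directory, list it, and copy it back.
# $1 = local file, $2 = HDFS target directory (both chosen by the caller).
copy_to_and_from_hdfs() {
  local_file="$1"
  hdfs_dir="$2"
  hdfs dfs -mkdir -p "$hdfs_dir" || return 1              # create the target directory if needed
  hdfs dfs -put -f "$local_file" "$hdfs_dir/" || return 1  # local -> HDFS, overwriting any old copy
  hdfs dfs -ls "$hdfs_dir" || return 1                     # confirm the file arrived
  # HDFS -> local, saved next to the original with a .copy suffix
  hdfs dfs -get "$hdfs_dir/$(basename "$local_file")" "$local_file.copy"
}
```

For example, `copy_to_and_from_hdfs /tmp/sample.txt /user/demo` would upload the file and fetch it back as /tmp/sample.txt.copy.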

The focus of this tutorial is to do the same via Java and Hadoop APIs.




03: Create or append a file to HDFS – Hadoop API tutorial

Step 1: Create a simple Maven project named “simple-hadoop”.

Step 2: Import the “simple-hadoop” Maven project into Eclipse or the IDE of your choice.

Step 3: Modify the pom.xml file to include 1) the relevant Hadoop libraries and 2) the shade plugin to create a single jar (i.e. …



04: Create new or append to an existing AVRO file tutorial

This extends Create or append a file to HDFS – Hadoop API tutorial to write an AVRO file to HDFS.

Step 1: Include the AVRO library files in the pom.xml file.

Step 2: The AVRO files are schema-based. …



05: Create or append a Sequence file to HDFS – Hadoop API tutorial

The following tutorial extends Create or append a file to HDFS – Hadoop API tutorial and Create or append an AVRO file to HDFS – Hadoop & AVRO API tutorial. In this tutorial we will write to a Sequence file, …


