Blog Archives
01: Spark tutorial – writing a file from a local file system to HDFS

This tutorial assumes that you have set up Cloudera as per the “cloudera quickstart vm tutorial installation” YouTube videos that you …


01B: Spark tutorial – writing to HDFS from Spark using Hadoop API

Step 1: The “pom.xml” that defines the dependencies for Spark & Hadoop APIs. Step 2: The Spark job that writes numbers …


02: Spark tutorial – reading a file from HDFS

This extends Spark tutorial – writing a file from a local file system to HDFS.

This tutorial assumes that you …



03: Spark tutorial – reading a Sequence File from HDFS

This extends Spark submit – reading a file from HDFS. A SequenceFile is a flat file consisting of binary key/value …


04: Running a Simple Spark Job in local & cluster modes

Step 1: Create a simple Maven Spark project using “-B” for non-interactive mode. Step 2: Import the Maven project “simple-spark” into …


05: Spark SQL & CSV with DataFrame Tutorial

Step 1: Create a simple Maven project. Step 2: Import the “simple-spark” Maven project into Eclipse or the IDE of your choice. …


05a: Spark DataFrame simple tutorial

A DataFrame is an immutable, distributed collection of data like an RDD, but unlike an RDD, its data is organized into named …


06: Spark Streaming with Flume Avro Sink Tutorial

This extends Running a Simple Spark Job in local & cluster modes and Apache Flume with JMS source (Websphere MQ) and …


07: spark-xml to split & read very large XML files

Processing very large XML files can be tricky because they cannot be processed line by line in parallel as …


08: Spark writing RDDs to multiple text files & HAR to solve small files issue

We know that the following code snippets in Spark will write each JavaRDD element to a single file. What if you …


09: Append to AVRO from Spark with distributed Zookeeper locking using Apache’s Curator framework

Step 1: The pom.xml file that has all the relevant dependencies for the Spark, Avro & Hadoop libraries. Step 2: Avro …


10: Spark RDDs to HBase & HBase to Spark RDDs

Step 1: pom.xml with library dependencies. It is important to note that 1) “https://repository.cloudera.com/artifactory/cloudera-repos/” is added as the “Cloudera Maven …


11: Spark streaming with “textFileStream” simple tutorial

With Spark Streaming, data can be ingested from many sources such as Kafka, Flume, HDFS, Unix/Windows file systems, etc. In this example, …

