Blog Archives

12: Spark streaming with “fileStream” and “PortableDataStream” simple tutorial

This tutorial extends the Spark streaming with “textFileStream” simple tutorial to use fileStream(…) and PortableDataStream. The pom.xml file is the same as in the previous Spark streaming tutorial. Step 1: Using “fileStream(…)”. What if you want to process the files that were already in the folder when the streaming job started?...

Members Only Content

This content is for the members with any one of the following paid subscriptions:

30-Day-Java-JEE-Career-Companion, 90-Day-Java-JEE-Career-Companion, 180-Day-Java-JEE-Career-Companion, 365-Day-Java-JEE-Career-Companion and 2-Year-Java-JEE-Career-Companion
Posted in member-paid, Spark Tutorials

11: Spark streaming with “textFileStream” simple tutorial

With Spark Streaming, data can be ingested from many sources such as Kafka, Flume, HDFS, and Unix/Windows file systems. In this example, let’s run Spark in local mode to ingest data from a Unix file system. Step 1: The pom.xml file. Using textFileStream(..): textFileStream watches a directory for new…...
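The idea of "watching a directory for new files" can be illustrated without Spark. The following is a minimal plain-Java sketch, not the tutorial's code and not how Spark itself detects new files: the class NewFilePoller and its method newSince are hypothetical names, and the sketch simply remembers the file names it has already reported.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: reports only the names it has not reported before.
public class NewFilePoller {

    // Names returned by earlier polls.
    private final Set<String> seen = new HashSet<>();

    // Returns the names in 'listing' that no earlier call has returned, sorted by name.
    public List<String> newSince(Collection<String> listing) {
        List<String> fresh = new ArrayList<>();
        for (String name : new TreeSet<>(listing)) {
            if (seen.add(name)) { // add() is true only the first time a name is seen
                fresh.add(name);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        // Assumption: the directory to poll is passed as the first argument.
        File dir = new File(args.length > 0 ? args[0] : ".");
        NewFilePoller poller = new NewFilePoller();

        // File.list() returns null when the path is not a readable directory.
        // It also returns sub-directory names, which this sketch does not filter out.
        String[] names = dir.list();
        List<String> listing = names == null ? new ArrayList<String>() : Arrays.asList(names);

        System.out.println("First poll:  " + poller.newSince(listing));
        System.out.println("Second poll: " + poller.newSince(listing));
    }
}
```

A streaming job would repeat the poll once per batch interval and process only the names returned by each call.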

Posted in member-paid, Spark Tutorials

05a: Spark DataFrame simple tutorial

A DataFrame is an immutable distributed collection of data like an RDD, but unlike an RDD, its data is organized into named columns, like a table in a relational database. This makes processing easier by imposing a structure onto a distributed collection of data. From Spark 2.0 onwards, DataFrame APIs will…...

Posted in member-paid, Spark Tutorials

07: Avro IDL (e.g. avdl) to Java objects & Avro Schemas (i.e. avsc) tutorial

An Avro IDL (i.e. Interface Definition Language) schema can be specified with two types of files: “avpr” (i.e. AVro PRotocol file) and “avdl” (i.e. AVro iDL). Step 1: Create a Maven-based Java project from the command line. Step 2: Import it into Eclipse as a Maven project. Step 3: Create new…...

Posted in Converting File Formats, member-paid

10: Spark RDDs to HBase & HBase to Spark RDDs

Step 1: pom.xml with library dependencies. It is important to note that 1) “https://repository.cloudera.com/artifactory/cloudera-repos/” is added as the “Cloudera Maven Repository” and 2) the hbase-spark dependency is used for writing to HBase from Spark RDDs and reading from HBase into Spark RDDs. Step 2: Write the Spark job that interacts with…...

Posted in member-paid, Spark Tutorials

09: Append to AVRO from Spark with distributed ZooKeeper locking using Apache’s Curator framework

Step 1: The pom.xml file that has all the relevant dependencies for the Spark, Avro & Hadoop libraries. Step 2: The Avro schema file /schema/employee.avsc under the src/main/resources folder. Step 3: A Spark job that writes random data into an RDD named “employeeRdd”, and the RDDs are processed in parallel by multiple Spark executors.…...

Posted in member-paid, Spark Tutorials

08: Spark writing RDDs to multiple text files & HAR to solve small files issue

We know that the following code snippets in Spark will write each JavaRDD element to a single file. What if you want to write each employee history to a separate file? Step 1: Create a JavaPairRDD from a JavaRDD. Step 2: Create a MultipleOutputFormat, which allows you to write the output…...
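The key-to-file idea behind those steps can be sketched in plain Java, without Spark or Hadoop. This is only an illustration under stated assumptions, not the tutorial's JavaPairRDD or MultipleOutputFormat code: the class PerKeyFileWriter, its methods, the "employeeId,event" line format and the sample records are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one output file per key, similar in spirit to pairing
// each record with a key and letting the key decide the output file name.
public class PerKeyFileWriter {

    // Groups "key,value" lines by key. Assumes every line contains a comma.
    public static Map<String, List<String>> groupByKey(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            int comma = line.indexOf(',');
            String key = line.substring(0, comma);
            String value = line.substring(comma + 1);
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return grouped;
    }

    // Writes each group to <key>.txt under outputDir, one value per line.
    public static void writePerKey(Map<String, List<String>> grouped, Path outputDir)
            throws IOException {
        Files.createDirectories(outputDir);
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            Files.write(outputDir.resolve(entry.getKey() + ".txt"), entry.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample records in the assumed "employeeId,event" format.
        List<String> history = Arrays.asList(
                "emp1,joined",
                "emp2,joined",
                "emp1,promoted");

        Path outputDir = Files.createTempDirectory("employee-history");
        writePerKey(groupByKey(history), outputDir);
        System.out.println("Wrote one file per employee under " + outputDir);
    }
}
```

Writing one file per key tends to produce many small files, which is the issue the post title says HAR is used to address.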

Posted in member-paid, Spark Tutorials

07: spark-xml to split & read very large XML files

Processing very large XML files can be tricky, as they cannot be processed line by line in parallel as you would with CSV files. The XML file has to be kept intact whilst matching the start and end entity tags, and if the tags are distributed in parts…...
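To see why line-by-line splitting does not work for XML, here is a minimal plain-Java sketch that splits on matching start and end tags instead of on newlines. It is not the spark-xml API: the class XmlRecordSplitter, the method splitRecords and the sample "employee" tag are hypothetical, and the sketch assumes the record tag has no attributes and is never nested inside itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: record boundaries come from tags, not from line breaks.
public class XmlRecordSplitter {

    // Returns every complete <tag>...</tag> block in document order.
    // A start tag without a matching end tag is treated as incomplete and skipped.
    public static List<String> splitRecords(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(open, from);
            if (start < 0) {
                break; // no more start tags
            }
            int end = xml.indexOf(close, start + open.length());
            if (end < 0) {
                break; // end tag missing: the record is not intact
            }
            records.add(xml.substring(start, end + close.length()));
            from = end + close.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // Each record spans several lines, so a single line is never a whole record.
        String xml = "<employees>\n"
                + "<employee>\n<name>John</name>\n</employee>\n"
                + "<employee>\n<name>Peter</name>\n</employee>\n"
                + "</employees>";

        for (String record : splitRecords(xml, "employee")) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

A record whose start tag and end tag land in different chunks of the file would be lost by this sketch, which is the kind of boundary problem that makes splitting large XML files harder than splitting CSV files.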

Posted in member-paid, Spark Tutorials