Before running a Spark job on a YARN cluster in Cloudera

Problem: When you run a Spark job via “spark-submit” command on a “YARN” cluster as shown below in a terminal,

It creates a folder and files in HDFS under

The files will be named with “application ids” like “application_1510971789108_0015”. This will cause a permission issue, and the history server running at “http://quickstart.cloudera:18088/” will not be able to read the files in the folder “/user/spark/applicationHistory/“.

Setting the HDFS permissions to enable viewing via Spark History server

Step 1: Before you can run the Spark history server, you must create the “/user/spark/applicationHistory/” directory in HDFS and set ownership and permissions as follows:

Now, if you run the Spark job on a YARN cluster in Cloudera

You can see the following created in “/user/spark/applicationHistory/” with the application id “application_1510971789108_0016”

Spark run creates a file in HDFS

The file is created with the Spark run application id as depicted below.

Hue – Spark job run history files

Spark history server shows the job history

The history server can be accessed via the URL http://quickstart.cloudera:18088/. You can also get to this via the “Cloudera Manager” Web UI by selecting “Spark” service –> “History Server Web UI“.

Spark jobs History Server

Drill into each job via Spark UI

to look at the logs, memory usage, stage failures, etc. Spark UI is a great tool get more insights to performance tune your Spark jobs.

Spark UI to analyze a job


Why & What are the benefits

🎯 Why java-success.com?

🎯 What are the benefits of Q&As approach?

Learn by categories such as FAQs – Core Java, Key Area – Low Latency, Core Java – Java 8, JEE – Microservices, Big Data – NoSQL, Architecture – Distributed, Big Data – Spark, etc. Some posts belong to multiple categories.

BigData on Cloudera
Module 1 Installing & getting started with Cloudera Quick Start+
Unit 1 Installing & getting started with Cloudera QuickStart on VMWare for windows in 17 steps  - Preview
Unit 2 ⏯ Cloudera Hue, Terminal Window (on edge node) & Cloudera Manager overview  - Preview
Unit 3 Understanding Cloudera Hadoop users  - Preview
Unit 4 Upgrading Java version to JDK 8 in Cloudera Quickstart  - Preview
Module 2 Getting started with HDFS on Cloudera+
Unit 1 ⏯ Hue and terminal window to work with HDFS  - Preview
Unit 2 Java program to list files in HDFS & write to HDFS using Hadoop API  - Preview
Unit 3 ⏯ Java program to list files on HDFS & write to a file in HDFS  - Preview
Unit 4 Write to & Read from a csv file in HDFS using Java & Hadoop API  - Preview
Unit 5 ⏯ Write to & read from HDFS using Hadoop API in Java  - Preview
Module 3 Running an Apache Spark job on Cloudera-
Unit 1 Before running a Spark job on a YARN cluster in Cloudera  - Preview
Unit 2 Running a Spark job on YARN cluster in Cloudera  - Preview
Unit 3 ⏯ Running a Spark job on YARN cluster  - Preview
Unit 4 Write to HDFS from Spark in YARN mode & local mode  - Preview
Unit 5 ⏯ Write to HDFS from Spark in YARN & local modes  - Preview
Unit 6 Spark running on YARN and Local modes reading from HDFS  - Preview
Unit 7 ⏯ Spark running on YARN and Local modes reading from HDFS  - Preview
Module 4 Hive on Cloudera+
Unit 1 Getting started with Hive  - Preview
Unit 2 ⏯ Getting started with Hive  - Preview
Module 5 HBase on Cloudera+
Unit 1 Write to HBase from Java  - Preview
Unit 2 Read from HBase in Java  - Preview
Unit 3 HBase shell commands to get, scan, and delete  - Preview
Unit 4 ⏯ Write to & read from HBase  - Preview
Module 6 Writing to & reading from Avro in Spark+
Unit 1 Write to an Avro file from a Spark job in local mode  - Preview
Unit 2 Read an Avro file from HDFS via a Spark job running in local mode  - Preview
Unit 3 ⏯ Write to & read from an Avro file on HDFS using Spark  - Preview
Unit 4 Write to HDFS as Avro from a Spark job using Avro IDL  - Preview
Unit 5 ⏯ Write to Avro using Avro IDL from a Spark job  - Preview
Unit 6 Create a Hive table over Avro data  - Preview
Unit 7 ⏯ Hive table over an Avro folder & avro-tools to generate the schema  - Preview
Module 7 Writing to & reading from Parquet in Spark+
Unit 1 Write to a Parquet file from a Spark job in local mode  - Preview
Unit 2 Read from a Parquet file in a Spark job running in local mode  - Preview
Unit 3 ⏯ Write to and read from Parquet data on HDFS via Spark  - Preview
Unit 4 Create a Hive table over Parquet data  - Preview
Unit 5 ⏯ Hive over Parquet data  - Preview
Module 8 Spark SQL+
Unit 1 Spark SQL read a Hive table  - Preview
Unit 2 Write to Parquet using Spark SQL & Dataframe  - Preview
Unit 3 Read from Parquet with Spark SQL & Dataframe  - Preview
Unit 4 ⏯ Spark SQL basics video tutorial  - Preview
Module 9 Spark streaming+
Unit 1 Spark streaming text files  - Preview
Unit 2 Spark file streaming in Java  - Preview
Unit 3 ⏯ Spark streaming video tutorial  - Preview
Top