Running a Spark job on a YARN cluster in Cloudera

Step 1: Open a “terminal” window in the VMware VM.

From it, create a Maven project named “simple-spark” in the folder “/home/cloudera/projects”.

Step 2: Open “Eclipse” in the VMware VM. Import “simple-spark” into Eclipse via “File –> Import –> Maven –> Existing Maven Projects” and browse to “/home/cloudera/projects/simple-spark/pom.xml”.

Step 3: Open the pom.xml file in Eclipse and replace its contents with a pom.xml that adds the Spark dependencies needed by the code in Step 4.

Step 4: Create a simple Spark class under “src/main/java” in the “com.mytutorial” package. The code is deliberately trivial: it prints a list of numbers and does not read from or write to HDFS.

Step 5: Package the project as a “jar” within Eclipse: right-click the project “simple-spark” and select “Run As” –> “Maven install”. This builds the jar file “simple-spark-1.0-SNAPSHOT.jar” in the folder “/home/cloudera/projects/simple-spark/target/”.

Step 6: Run the jar as a Spark job on YARN by submitting it with “spark-submit” from a “terminal” window.

Important: If your VMware VM does not have enough memory, the job may stay in the “ACCEPTED” state for a long time and then time out. In that case, try the following:

1) Exit Eclipse to free up memory for “spark-submit”.

2) The default driver and executor memory is 1 GB each. Since the code above is trivial, you can run it with less memory by passing smaller values to the “--driver-memory” and “--executor-memory” options of “spark-submit”.
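As a minimal Java sketch under stated assumptions, the “spark-submit” options relevant to Step 6 and to tip 2) can be assembled as an argument list. The main class name “com.mytutorial.SimpleSparkJob” is hypothetical (substitute the class you created in Step 4), and “--deploy-mode cluster” and the “512m” values are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: builds a spark-submit command line for the simple-spark jar.
 * MAIN_CLASS is HYPOTHETICAL; replace it with your own Step 4 class.
 */
class SparkSubmitCommand {

    // Jar produced by "Maven install" in Step 5.
    static final String JAR =
        "/home/cloudera/projects/simple-spark/target/simple-spark-1.0-SNAPSHOT.jar";

    // Hypothetical fully qualified main class in the com.mytutorial package.
    static final String MAIN_CLASS = "com.mytutorial.SimpleSparkJob";

    /**
     * Returns the spark-submit arguments. Pass null for a memory value to keep
     * Spark's 1 GB default, or a size such as "512m" to lower it.
     */
    static List<String> build(String driverMemory, String executorMemory) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
            "spark-submit",
            "--class", MAIN_CLASS,
            "--master", "yarn",
            "--deploy-mode", "cluster")); // assumption: cluster deploy mode
        if (driverMemory != null) {
            cmd.add("--driver-memory");
            cmd.add(driverMemory);
        }
        if (executorMemory != null) {
            cmd.add("--executor-memory");
            cmd.add(executorMemory);
        }
        cmd.add(JAR); // the application jar goes after all spark-submit options
        return cmd;
    }

    public static void main(String[] args) {
        // Prints a command you could paste into the terminal; 512m is illustrative.
        System.out.println(String.join(" ", build("512m", "512m")));
    }
}
```

Keeping the command as a list rather than a single string avoids shell quoting issues, and the same list can be handed to “new ProcessBuilder(...)” to launch the job from Java. In cluster deploy mode the driver runs inside YARN, so the printed numbers appear in the driver container's log rather than in the terminal; “--deploy-mode client” keeps the driver, and its output, in the terminal.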

Spark History Server

Step 7: When the Spark job runs, the logs show a tracking URL like “http://quickstart.cloudera:8088/proxy/application_1511049109960_0003/”, where “application_1511049109960_0003” is the YARN “application id” of the Spark run.

Open “http://quickstart.cloudera:8088/proxy/application_1511049109960_0003/” in a web browser inside the VMware VM to reach the Spark UI and drill into the job.
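The application id is also what the YARN tools expect. As a small illustrative Java sketch (the class name “YarnAppId” is hypothetical), this extracts the id from the tracking URL with a regular expression and splits it into its two numeric parts, assuming the standard YARN format “application_<cluster timestamp>_<sequence number>”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: extracts and decodes a YARN application id. */
class YarnAppId {

    // YARN ids look like application_<ResourceManager start time in ms>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_(\\d+)_(\\d+)");

    /** Returns the first application id found in the text, or null if there is none. */
    static String extract(String text) {
        Matcher m = APP_ID.matcher(text);
        return m.find() ? m.group() : null;
    }

    /** Start time of the ResourceManager that issued the id, in epoch milliseconds. */
    static long clusterTimestamp(String appId) {
        return Long.parseLong(parse(appId).group(1));
    }

    /** Sequence number of the application since that ResourceManager started. */
    static int sequence(String appId) {
        return Integer.parseInt(parse(appId).group(2));
    }

    private static Matcher parse(String appId) {
        Matcher m = APP_ID.matcher(appId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a YARN application id: " + appId);
        }
        return m;
    }

    public static void main(String[] args) {
        String url = "http://quickstart.cloudera:8088/proxy/application_1511049109960_0003/";
        String id = extract(url);
        System.out.println(id);                   // application_1511049109960_0003
        System.out.println(clusterTimestamp(id)); // 1511049109960
        System.out.println(sequence(id));         // 3
    }
}
```

The same id works with the YARN command-line tools, for example “yarn application -status application_1511049109960_0003”, or “yarn logs -applicationId application_1511049109960_0003” once the application has finished and log aggregation is enabled.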


You can list all your Spark job runs via the Spark History Server URL “http://quickstart.cloudera:18088/”.
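Besides the web UI, the Spark History Server serves a JSON REST API under “/api/v1” (in Spark 1.4 and later). A minimal Java sketch, assuming your Spark version exposes the “/api/v1/applications” endpoint and that “quickstart.cloudera” resolves from where you run it; the class name “HistoryServerClient” is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Sketch: lists applications known to the Spark History Server via its REST API. */
class HistoryServerClient {

    /** Appends the applications endpoint to a history server base URL. */
    static String applicationsUrl(String baseUrl) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + "api/v1/applications";
    }

    /** Performs an HTTP GET and returns the response body (JSON) as a string. */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // milliseconds
        conn.setReadTimeout(5000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires network access to the QuickStart VM's history server on port 18088.
        System.out.println(get(applicationsUrl("http://quickstart.cloudera:18088/")));
    }
}
```

The response is a JSON array with one entry per application, including its id, so it pairs naturally with the application id from the job logs.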


You can also get to it via “Cloudera Manager”:

1) Log in with cloudera/cloudera.

2) Select the “Spark” service, and then select “History Server Web UI” from the top menu.

Note: Cloudera Manager lets you stop, start, restart, and configure its services. If the history server is not running, start it from there.



