Problem: When you run a Spark job with the "spark-submit" command on a YARN cluster, as shown below in a terminal,
-bash-4.1$ spark-submit --class com.mytutorial.SparkSimpleRdd --master yarn --deploy-mode cluster /home/cloudera/projects/simple-spark/target/simple-spark-1.0-SNAPSHOT.jar
it creates a directory and files in HDFS under
/user/spark/applicationHistory/
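You can confirm this by listing the directory with the HDFS command-line client (a quick check, assuming the default Spark event-log location shown above):

```shell
# List the Spark event-log files written by completed applications
hdfs dfs -ls /user/spark/applicationHistory/
```

Each entry in the listing corresponds to one submitted application.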
The files are named after their "application ids", like…