Blog Archives

Debugging Spark applications written in Java locally by connecting to HDFS, Hive and HBase

This extends “Remotely debugging Spark submit Jobs in Java”. Running Spark in local mode: when you run Spark in local mode, both the Driver and Executor will be running in…



Finding your way around YARN and Spark on Cloudera

What is Apache Hadoop YARN? Apache Hadoop YARN (Yet Another Resource Negotiator) is a prerequisite for Enterprise Hadoop, as it provides dynamic allocation of the cluster resources. For example, when you run…



Remotely debugging Spark submit Jobs in Java

This extends “Remote debugging in Java with Java Debug Wire Protocol (JDWP)” to debug Spark jobs written in Java. We need to debug both the “Driver” and the “Executor”.

Debugging the Spark Driver in Java

Step 1: Run the Spark submit job on the remote machine, which waits on port “7777” for the Eclipse debugger to connect.…
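A minimal sketch of what Step 1 might look like, assuming a YARN cluster in client deploy mode. It only assembles a spark-submit command line that starts the driver JVM with a JDWP agent listening on port 7777 and suspended until the Eclipse debugger attaches. The class name com.example.MySparkJob, the jar my-spark-job.jar and the helper jdwpAgent are hypothetical placeholders. It also assumes JDK 9 or later, where a bare port binds to localhost only, hence the *:7777 address for remote attachment.

```java
import java.util.List;

public class DebugSubmitCommand {

    // Hypothetical helper: builds the JDWP agent option string.
    // suspend=y makes the driver JVM wait until a debugger attaches.
    public static String jdwpAgent(String address, boolean suspend) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend="
                + (suspend ? "y" : "n") + ",address=" + address;
    }

    public static void main(String[] args) {
        // Assumption: JDK 9+, where "*:7777" listens on all interfaces
        // (a bare "7777" would bind to localhost only).
        List<String> cmd = List.of(
                "spark-submit",
                "--class", "com.example.MySparkJob",   // hypothetical main class
                "--master", "yarn",
                "--deploy-mode", "client",
                "--driver-java-options", jdwpAgent("*:7777", true),
                "my-spark-job.jar");                   // hypothetical jar
        System.out.println(String.join(" ", cmd));
    }
}
```

In client mode the driver runs inside the spark-submit JVM, so the JDWP option is passed with --driver-java-options rather than set from SparkConf inside the application, by which point the driver JVM has already started.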



Spark understanding DAG for tuning performance interview Q&As

This extends “15 Apache Spark best practices & performance tuning interview FAQs” to delve into DAGs, Stages, Tasks, Partitions and Shuffling in Spark. If you can’t read Spark Event Timelines…



