Blog Archives

15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1

There are many different ways to solve a given big data problem in Spark, but some approaches can lead to performance and memory issues. Here are some best practices to keep in mind when writing Spark jobs. General Best Practices #1 Favor DataFrames over...
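The excerpt is cut off, but assuming the point is to favor DataFrames over RDDs, here is a minimal Java sketch of the idea (the input path and column names are made up for illustration). A DataFrame (Dataset<Row>) goes through the Catalyst optimizer and Tungsten's compact binary row format, whereas equivalent RDD code ships plain Java objects around with no query optimization.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DataFrameFirst {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameFirst")
                .master("local[*]")              // local mode purely for illustration
                .getOrCreate();

        // DataFrame route: Catalyst can push the filter down and plan the
        // aggregation before any user code runs on the executors.
        Dataset<Row> trades = spark.read()
                .option("header", "true")
                .csv("/tmp/trades.csv");         // hypothetical input file

        trades.filter("qty > 100")
              .groupBy("symbol")
              .count()
              .show();

        spark.stop();
    }
}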



15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-2

This extends 15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1. #7 Use Spark UI: Running Spark jobs without inspecting the Spark UI is a definite NO. It is a very handy debugging & …
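The live Spark UI runs on the driver (port 4040 by default) and disappears when the application ends. As a hedged sketch of keeping that information around for post-mortem inspection, assuming the standard event-log settings (the HDFS directory below is a placeholder), the Spark History Server (default port 18080) can replay the same UI from the logs:

import org.apache.spark.sql.SparkSession;

public class EventLogEnabled {
    public static void main(String[] args) {
        // While the job runs, browse http://<driver-host>:4040 to inspect
        // jobs, stages, storage, SQL plans and executor metrics.
        SparkSession spark = SparkSession.builder()
                .appName("EventLogEnabled")
                .config("spark.eventLog.enabled", "true")
                .config("spark.eventLog.dir", "hdfs:///spark-logs")  // placeholder directory
                .getOrCreate();

        spark.range(1000000).selectExpr("sum(id)").show();

        // After the job finishes, the Spark History Server can rebuild the
        // UI by reading the event logs written above.
        spark.stop();
    }
}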



Debugging Spark applications written in Java locally by connecting to HDFS, Hive and HBase

This extends Remotely debugging Spark submit Jobs in Java. Running Spark in local mode When you run Spark in local mode, both the Driver and the Executor run in the same JVM, which is very handy for debugging the logic of your transformations. …
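As a minimal sketch of that setup (the class name and input path are invented for the example), setting the master to local[*] keeps the Driver and Executor in one JVM, so an ordinary IDE breakpoint inside a transformation is hit directly:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class LocalModeDebug {
    public static void main(String[] args) {
        // local[*]: driver and executor share this JVM, using all local cores.
        SparkSession spark = SparkSession.builder()
                .appName("LocalModeDebug")
                .master("local[*]")
                .getOrCreate();

        Dataset<String> lines = spark.read().textFile("/tmp/sample.txt"); // hypothetical file

        // A breakpoint set inside this lambda is hit in the same JVM, so the
        // transformation logic can be stepped through from the IDE.
        Dataset<String> upper = lines.map(
                (MapFunction<String, String>) line -> line.toUpperCase(),
                Encoders.STRING());

        upper.show(5, false);
        spark.stop();
    }
}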



Finding your way around YARN and Spark on Cloudera

What is Apache Hadoop YARN? Apache Hadoop YARN (Yet Another Resource Negotiator) is the prerequisite for Enterprise Hadoop for dynamic allocation of the cluster resources. For example, when you run a Spark job on YARN as opposed to in standalone mode, you will take advantage of categorizing, isolating, and...
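The excerpt is cut off, but the isolation it alludes to is normally exercised when submitting the job. As a hedged example (the queue name, main class and jar path are placeholders), a Java job can be sent to a specific YARN queue with explicit resource sizing:

# Sketch only: submit to a named YARN queue with explicit executor sizing.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue etl \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  --class com.example.MyJob \
  /path/to/my-job.jar

YARN's scheduler then enforces the queue's capacity limits, so this job cannot starve workloads running in other queues.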



Remotely debugging Spark submit Jobs in Java

This extends Remote debugging in Java with Java Debug Wire Protocol (JDWP) to debug Spark jobs written in Java. We need to debug both the “Driver” and the “Executor”.

Debugging the Spark Driver in Java

Step 1: Run the Spark submit job in the remote machine,
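The excerpt is truncated; the sketch below shows one common way to attach a debugger to the Driver (the port, main class and jar path are placeholders, and this is not necessarily the exact command from the article). A JDWP agent string is passed through spark.driver.extraJavaOptions, and suspend=y makes the driver wait until a debugger attaches:

# Sketch only: suspend the driver until a debugger attaches on port 5005.
# On Java 9+ use address=*:5005 to accept connections from another machine.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.MyJob \
  /path/to/my-job.jar

Then attach a remote JVM debug configuration in the IDE to <remote-host>:5005. For the Executor side, the equivalent agent string would go into spark.executor.extraJavaOptions, typically with suspend=n.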





