Blog Archives

00: Apache Spark eco system & anatomy interview Q&As

Q01. Can you summarise the Spark eco system?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 components: Core, Spark SQL, Spark Streaming, Spark MLlib, …




02: Cleansing & pre-processing data in BigData & machine learning with Spark interview Q&As

Q1. Why are data cleansing & pre-processing important in analytics & machine learning?
A1. Garbage in gets you garbage out, no matter how good your machine learning algorithm is.
Q2. What are the general steps of cleansing data?
A2. …



12 Apache Spark getting started interview Q&As

Q01. Where is Apache Spark used in the Hadoop eco system?
A01. Spark is essentially a data processing framework that is faster & more flexible than “Map Reduce”. Spark itself has grown into an ecosystem with Spark SQL, Spark Streaming, Spark UI, GraphX, …




14: Q105 – Q108 Spark “map” vs “flatMap” interview questions & answers

Q105. What is the difference between “map” and “flatMap” operations in Spark?
A105. Both map and flatMap are transformation operations in Spark. The map transformation applies a function to each element of an RDD and returns the results as a new RDD. …
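
The distinction in A105 can be illustrated without a cluster. The following is a minimal plain-Python sketch of the semantics only, not Spark code: `rdd_map` and `rdd_flat_map` are hypothetical helper names that mimic what `RDD.map` and `RDD.flatMap` do to a collection of elements.

```python
from itertools import chain

def rdd_map(elements, fn):
    """Mimic RDD.map: exactly one output element per input element."""
    return [fn(x) for x in elements]

def rdd_flat_map(elements, fn):
    """Mimic RDD.flatMap: fn returns an iterable per element, and the
    per-element results are flattened into one collection."""
    return list(chain.from_iterable(fn(x) for x in elements))

# Hypothetical sample input: two lines of text.
lines = ["hello spark", "map vs flatMap"]

mapped = rdd_map(lines, lambda line: line.split(" "))
flattened = rdd_flat_map(lines, lambda line: line.split(" "))

print(mapped)     # [['hello', 'spark'], ['map', 'vs', 'flatMap']]
print(flattened)  # ['hello', 'spark', 'map', 'vs', 'flatMap']
```

With map, the result has the same number of elements as the input (here, two lists of words). With flatMap, the result can have more or fewer elements than the input (here, five words).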



15: Q109 – Q113 Spark RDD partitioning and “mapPartitions” interview questions & answers

Q109. What is the difference between “map” and “mapPartitions” transformations in Spark?
A109. The method map converts each element of the source RDD into a single element of the result RDD by applying a function. The method mapPartitions converts each partition of the source RDD into multiple elements of the...
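
As a rough illustration of A109, here is a plain-Python sketch of the semantics, not Spark code. An RDD is modelled as a list of partitions, and `rdd_map`, `rdd_map_partitions`, `double`, and `double_all` are hypothetical names introduced only for this example.

```python
def rdd_map(partitions, fn):
    """Mimic RDD.map: fn is called once per element."""
    return [[fn(x) for x in part] for part in partitions]

def rdd_map_partitions(partitions, fn):
    """Mimic RDD.mapPartitions: fn is called once per partition with an
    iterator over its elements and returns an iterable of any length."""
    return [list(fn(iter(part))) for part in partitions]

# Count how many times each user function is invoked.
calls = {"map": 0, "mapPartitions": 0}

def double(x):
    calls["map"] += 1
    return x * 2

def double_all(iterator):
    # Per-partition setup (for example, opening a connection) would go here.
    calls["mapPartitions"] += 1
    for x in iterator:
        yield x * 2

# Hypothetical data set split into two partitions.
partitions = [[1, 2, 3], [4, 5]]

by_element = rdd_map(partitions, double)
by_partition = rdd_map_partitions(partitions, double_all)

print(by_element)    # [[2, 4, 6], [8, 10]]
print(by_partition)  # [[2, 4, 6], [8, 10]]
print(calls)         # {'map': 5, 'mapPartitions': 2}
```

Both approaches produce the same values here, but the function passed to the mapPartitions stand-in runs once per partition rather than once per element, which is why it suits work with an expensive setup step.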



15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1

There are many different ways to solve a big data problem in Spark, but some approaches lead to performance and memory issues. Here are some best practices to keep in mind when writing Spark jobs.

General Best Practices

#1 Favor DataFrames over...



15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-2

This extends 15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1, where best practices 1-6 were covered with examples & diagrams.

#7 Use Spark UI: Running Spark jobs without inspecting the Spark UI is a definite NO.



