Blog Archives

00: Apache Spark eco system & anatomy interview Q&As

Q01. Can you summarise the Spark ecosystem?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, …




02: Cleansing & pre-processing data in BigData & machine learning with Spark interview Q&As

Q1. Why are data cleansing & pre-processing important in analytics & machine learning?
A1. Garbage in gets you garbage out, no matter how good your machine learning algorithm is. …



12 Apache Spark getting started interview Q&As

Q01. Where is Apache Spark used in the Hadoop ecosystem?
A01. Spark is essentially a data processing framework that is faster & more flexible than “Map Reduce”.




14: Q105 – Q108 Spark “map” vs “flatMap” interview questions & answers

Q105. What is the difference between “map” and “flatMap” operations in Spark?
A105. Both map and flatMap are transformation operations in Spark. …
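
As a rough illustration of the distinction this post asks about, here is a plain-Python analogy. It is not the Spark API: `map_like` and `flat_map_like` are hypothetical helpers that mimic the one-output-per-input behaviour of map and the flatten-the-results behaviour of flatMap on an ordinary list.

```python
from itertools import chain


def map_like(func, items):
    # Hypothetical helper: exactly one output element per input element.
    return [func(x) for x in items]


def flat_map_like(func, items):
    # Hypothetical helper: func returns an iterable per input element,
    # and the per-element results are flattened into one sequence.
    return list(chain.from_iterable(func(x) for x in items))


lines = ["hello spark", "map vs flatMap"]

# One list of words per line: the nesting is preserved.
mapped = map_like(str.split, lines)

# One flat list of words: the nesting is removed.
flattened = flat_map_like(str.split, lines)
```

With map the result has as many elements as the input (here, two lists of words); with flatMap the element count can grow or shrink (here, five words).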



15: Q109 – Q113 Spark RDD partitioning and “mapPartitions” interview questions & answers

Q109. What is the difference between “map” and “mapPartitions” transformations in Spark?
A109. The method map converts each element of the source RDD into a single element of the result...
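
The contrast can be sketched in plain Python, assuming an RDD is modelled as a list of partitions (each itself a list). This is an analogy, not the Spark API: `map_like`, `map_partitions_like`, and `add_offset_per_partition` are hypothetical names, and the "expensive setup" is simulated with a counter.

```python
def map_like(func, partitions):
    # Hypothetical helper: func is called once per element.
    return [[func(x) for x in part] for part in partitions]


def map_partitions_like(func, partitions):
    # Hypothetical helper: func is called once per partition and
    # receives an iterator over that partition's elements.
    return [list(func(iter(part))) for part in partitions]


setup_calls = {"count": 0}


def add_offset_per_partition(rows):
    # Simulated expensive setup (e.g. opening a connection),
    # performed once per partition rather than once per element.
    setup_calls["count"] += 1
    offset = 10
    for row in rows:
        yield row + offset


partitions = [[1, 2, 3], [4, 5]]

doubled = map_like(lambda x: x * 2, partitions)
shifted = map_partitions_like(add_offset_per_partition, partitions)
```

In this sketch the setup runs twice (once per partition) even though five elements are processed, which is the usual motivation for choosing a per-partition function when initialisation is costly.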



17: Spark interview Q&As with coding examples in pyspark (i.e. python)

Q01. How will you create a Spark context?
A01. …
Q02. How will you create a DataFrame by reading a file from an AWS S3 bucket? …



40+ Apache Spark best practices & optimisation interview FAQs – Part-1

There are many different ways to solve a given big data problem in Spark, but some approaches hurt performance and lead to memory issues.




40+ Apache Spark best practices & optimisation interview FAQs – Part-2

This extends 40+ Apache Spark best practices & optimisation interview FAQs – Part-1, where best practices 1-6 were covered with examples & diagrams. #11 Use Spark UI: Running Spark jobs...



40+ Apache Spark best practices & optimisation interview FAQs – part 03: Partitions & buckets

#31 Bucketing is another data optimisation technique that groups data with the same bucket value across a fixed number of “buckets”. Bucketing improves performance in wide transformations and joins by...



5 Spark Streaming & Apache Storm interview Q&As

Q116. What is “Spark Streaming” in the Spark ecosystem, alongside Spark Core, Spark SQL, Spark MLlib, Spark GraphX, etc.?
A116. Spark is a distributed and scalable batch processing framework that...



