Blog Archives

00: Apache Spark eco system & anatomy interview Q&As

Q01. Can you summarise the Spark ecosystem?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level …




02: Cleansing & pre-processing data in BigData & machine learning with Spark interview Q&As

Q1. Why are data cleansing & pre-processing important in analytics & machine learning?
A1. Garbage in gets you garbage out. No …



12 Apache Spark getting started interview Q&As

Q01. Where is Apache Spark used in the Hadoop ecosystem?
A01. Spark is essentially a data processing framework that is …




14: Q105 – Q108 Spark “map” vs “flatMap” interview questions & answers

Q105. What is the difference between “map” and “flatMap” operations in Spark?
A105. The map and flatMap are transformation operations in …



15: Q109 – Q113 Spark RDD partitioning and “mapPartitions” interview questions & answers

Q109. What is the difference between “map” and “mapPartitions” transformations in Spark?
A109. The method map converts each element of the …



15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1

There are many different ways to solve the big data problems at hand in Spark, but some approaches can impact …



15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-2

This extends “15+ Apache Spark best practices, memory mgmt & performance tuning interview FAQs – Part-1”, where best practices 1-6 …



17: Spark interview Q&As with coding examples in pyspark (i.e. python)

Q01. How will you create a Spark context?
A01. …
Q02. How will you create a DataFrame by reading a file from …



8 Apache Spark repartition Vs. coalesce scenarios interview Q&As

Q01: Why is partitioning of data required in Apache Spark?
A01: Partitioning is a key concept in distributed systems where the …




Debugging Spark applications written in Java locally by connecting to HDFS, Hive and HBase

This extends “Remotely debugging Spark submit Jobs in Java”.

Running Spark in local mode

When you run Spark in local …



Spark interview Q&As with coding examples in Scala – part 1

Some of these basic Apache Spark interview questions can make or break your chance to get an offer.

Q01. Why is …




Spark interview Q&As with coding examples in Scala – part 2

This extends “Spark interview Q&As with coding examples in Scala – part 1” with the key optimisation concepts.

Partition Pruning

Q13.




Spark interview Q&As with coding examples in Scala – part 3

This extends “Spark interview Q&As with coding examples in Scala – part 2” with more coding examples in a Databricks notebook. …




800+ Java & Big Data Interview Q&As
