Blog Archives

00: Apache Spark ecosystem & anatomy interview Q&As

Q01. Can you summarise the Spark ecosystem?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java,

Read more ›
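
A minimal sketch of that high-level API in Scala (assumes Spark 2.x+ run from spark-shell or a Databricks notebook, where a SparkSession named spark is already provided; the sample data is made up for illustration):

import org.apache.spark.sql.SparkSession

// spark-shell & Databricks notebooks already expose a SparkSession as `spark`;
// in a standalone application you would build one like this.
val spark = SparkSession.builder().appName("EcosystemSketch").getOrCreate()
import spark.implicits._

// The same DataFrame/SQL API is exposed in Scala, Java, Python & R.
val components = Seq(
  ("HDFS", "storage"),
  ("YARN", "resource management"),
  ("Spark", "in-memory processing")
).toDF("component", "role")

components.show()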



00: Data Lake Vs. Data Warehouse Vs. Delta Lake

Modern data architectures have both Data Lakes & Data Warehouses. Data Engineers build the data pipelines that data analysts and scientists use to build business reports &

Read more ›



00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting,

Read more ›



01: Databricks getting started – Spark, Shell, SQL


Step 1:
Sign up for the Databricks community edition – https://databricks.com/try-databricks. Fill in the details; you can leave your mobile number blank. Select “

Read more ›



01: Python Iterators, Generators & Decorators Tutorial

Assumes that Python3 is installed as described in Getting started with Python.

1. Iterators

Iterators don’t compute the value of each item when instantiated. They only compute it when you ask for it.

Read more ›



01: Scala Functional Programming basics – pure functions, referential transparency & side effects

Q1. What is a pure function?
A1. A pure function is one where the following conditions are met:

1) The input solely determines the output.

Read more ›
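
A minimal Scala sketch of the distinction (the function names below are illustrative, not from the linked post):

// Pure: the output depends only on the inputs and there are no side effects,
// so the call is referentially transparent and safe to cache or reorder.
def add(a: Int, b: Int): Int = a + b

// Impure: the result also depends on mutable external state, and the call
// mutates that state as a side effect.
var counter = 0
def addAndCount(a: Int, b: Int): Int = {
  counter += 1            // side effect
  a + b + counter         // output no longer determined solely by the inputs
}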



02: Databricks – Spark schemas, casting & PySpark API

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

Q: What is a Dataframe?
A: A DataFrame is a data abstraction or a domain-specific language (DSL) for working with structured and semi-structured data,

Read more ›
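
The linked post works through the PySpark API; purely as an illustrative sketch of the same two ideas – an explicit schema plus a cast – here is a Scala version (column names are made up, and spark is assumed from a shell or notebook):

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Define the schema explicitly instead of relying on inference.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age_str", StringType, nullable = true)
))

val raw = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row("Ann", "34"), Row("Bob", "29"))),
  schema
)

// Cast the string column to an integer column.
val typed = raw.withColumn("age", col("age_str").cast(IntegerType)).drop("age_str")
typed.printSchema()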



03: Databricks – Spark SCD Type 1

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is SCD Type 1?

SCD stands for Slowly Changing Dimension,

Read more ›
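
In short, SCD Type 1 overwrites the old attribute values with the latest ones and keeps no history. A minimal Scala sketch of that idea (table and column names are illustrative; spark is assumed from a shell or notebook):

import spark.implicits._

val current = Seq((1, "Ann", "Sydney"), (2, "Bob", "Perth")).toDF("id", "name", "city")
val updates = Seq((2, "Bob", "Melbourne"), (3, "Cat", "Brisbane")).toDF("id", "name", "city")

// Type 1: incoming rows simply replace matching rows; unmatched current rows are kept.
val scdType1 = current
  .join(updates, Seq("id"), "left_anti")   // current rows with no incoming change
  .unionByName(updates)                    // plus all incoming (new + changed) rows

scdType1.orderBy("id").show()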



12 Apache Spark getting started interview Q&As

Q01. Where is Apache Spark used in the Hadoop ecosystem?
A01. Spark is essentially a data processing framework that is faster & more flexible than “MapReduce”.

Read more ›
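
As a quick illustration of that flexibility, a word count that needs a full mapper/reducer pair in MapReduce is a few lines of Scala against an RDD that can also be cached in memory (spark is assumed from a shell or notebook; the tiny in-memory input is just for the sketch):

// The same code works on HDFS/S3 files via spark.sparkContext.textFile(...).
val lines = spark.sparkContext.parallelize(Seq("spark is fast", "spark is flexible"))

val counts = lines
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .cache()                 // intermediate results can be kept in memory across jobs

counts.collect().foreach(println)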



15+ SQL scenarios based interview Q&As – part 1

Q01. How will you go about identifying duplicate records in a table?
A01. This is a very popular interview question because there are many approaches,

Read more ›
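
One classic approach is to group on the candidate key and keep only groups with more than one row. A sketch via Spark SQL, to stay consistent with the Scala examples in this archive (table and column names are illustrative; spark is assumed from a shell or notebook):

import spark.implicits._

Seq((1, "a@x.com"), (2, "b@x.com"), (3, "a@x.com"))
  .toDF("id", "email")
  .createOrReplaceTempView("users")

// Duplicate detection: group on the candidate key and keep groups with count > 1.
spark.sql("""
  SELECT email, COUNT(*) AS cnt
  FROM users
  GROUP BY email
  HAVING COUNT(*) > 1
""").show()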



6 Key considerations in processing large files in Java

Q1. What are the key considerations in processing large files?
A1. Before jumping into coding, get the requirements.

#1 Trade-offs among CPU, Memory Usage &

Read more ›



Spark interview Q&As with coding examples in Scala – part 01: Key basics

Some of these basic Apache Spark interview questions can make or break your chance to get an offer.

Q01. Why is “===” used in the DataFrame join below?

Read more ›
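
To make the question concrete, here is the kind of join it refers to (DataFrame and column names are illustrative; spark is assumed from a shell or notebook). In Scala, === on a Column builds a Column expression that the join condition needs, whereas plain == would merely compare the two Column objects and return a Boolean, which the join will not accept:

import spark.implicits._

val employees   = Seq((1, "Ann", 10), (2, "Bob", 20)).toDF("id", "name", "dept_id")
val departments = Seq((10, "Engineering"), (20, "Finance")).toDF("dept_id", "dept_name")

// `===` yields a Column expression usable as a join condition;
// plain `==` would evaluate to a Boolean on the driver and cannot express the join.
val joined = employees.join(departments, employees("dept_id") === departments("dept_id"))
joined.show()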



Spark interview Q&As with coding examples in Scala – part 02: partition pruning & column projection

This extends Spark interview Q&As with coding examples in Scala – part 1 with the key optimisation concepts.

Partition Pruning

Q13. What do you understand by the concept Partition Pruning?

Read more ›
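
A small Scala sketch of the idea (path and column names are illustrative; spark and a writable path are assumed): when data is written partitioned by a column, a filter on that column lets Spark skip whole partition directories instead of scanning every file.

import spark.implicits._

val sales = Seq(("2023-01-01", "AU", 100.0), ("2023-01-01", "US", 250.0))
  .toDF("sale_date", "country", "amount")

// One sub-directory per country value, e.g. .../country=AU/
sales.write.mode("overwrite").partitionBy("country").parquet("/tmp/sales_by_country")

// Filtering on the partition column prunes directories – only country=AU is read.
val au = spark.read.parquet("/tmp/sales_by_country").filter($"country" === "AU")
au.explain()   // the physical plan lists the pruned PartitionFilters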



Spark interview Q&As with coding examples in Scala – part 05: Transformations, actions, pipelining & shuffling

This extends Spark interview Q&As with coding examples in Scala – part 4 with more coding examples on a Databricks notebook.

Prerequisite: Create a free account as per Databricks getting started.

Read more ›
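
A compact Scala sketch of those four ideas (spark is assumed from a shell or notebook): transformations are lazy, narrow ones are pipelined within a single stage, a wide one introduces a shuffle boundary, and nothing executes until an action is called.

val nums = spark.sparkContext.parallelize(1 to 100, numSlices = 4)

// Narrow transformations: lazy, and pipelined together inside one stage.
val evensDoubled = nums.filter(_ % 2 == 0).map(_ * 2)

// Wide transformation: redistributes data by key, so it introduces a shuffle boundary.
val byRemainder = evensDoubled.map(n => (n % 3, n)).reduceByKey(_ + _)

// Nothing has run yet; this action triggers the whole job (two stages).
byRemainder.collect().foreach(println)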



Why acquire skills & experience in low latency & Big Data? What specific skills are required?

According to Dice’s 2017 Salary Survey (PDF), those tech professionals who specialise in data warehousing, analytics, Big Data, low latency,

Read more ›


