Blog Archives

00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse

Modern data architectures typically include both Data Lakes and Data Warehouses. Data engineers build the data pipelines that data analysts and data scientists rely on to build business reports and models for analysing the data.

Q01. What is a Data Lake?

00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (AI). Processing a large volume of data is intensive on disk I/O,

01: Lambda, Kappa & Delta Data Architectures Interview Q&As – Overview

Q1. What is the Lambda Architecture? A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing (e.g. Hive, Pig, Spark batch). This means you have to build two separate pipelines. …
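The two-pipeline idea can be sketched in a few lines of Python. This is only a minimal illustration, not how Spark, Storm, or Hive implement it: the event data, `build_view`, and `query` are hypothetical names invented for the example, and both "pipelines" are reduced to the same in-memory aggregation.

```python
from collections import Counter

# Hypothetical click events as (user, clicks); the data is illustrative only.
batch_events = [("alice", 3), ("bob", 1), ("alice", 2)]  # covered by the last batch run
recent_events = [("alice", 1), ("carol", 4)]             # arrived after the last batch run


def build_view(events):
    """Aggregate click counts per user (stands in for either pipeline)."""
    view = Counter()
    for user, clicks in events:
        view[user] += clicks
    return view


def query(user, batch_view, speed_view):
    """Serving step: combine the batch view with the real-time view."""
    return batch_view[user] + speed_view[user]


batch_view = build_view(batch_events)   # batch pipeline output
speed_view = build_view(recent_events)  # streaming pipeline output
print(query("alice", batch_view, speed_view))  # 5 from batch + 1 from streaming
```

The point of the sketch is that a query has to consult both views, which is why the architecture needs two pipelines kept in agreement with each other.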

02: Q7 – Q15 Hadoop overview & architecture interview Q&As

This extends Q1 – Q6 Hadoop Overview & Architecture interview Q&As. Q7. What are the major machine roles in a Hadoop cluster? A7. The three major categories of machine roles in a Hadoop cluster are 1) Client machines. …

08: Q71 – Q75 ETL/ELT on BigData Interview Q&As

Q71. Can ETL in traditional data management (e.g. RDBMS) be migrated to an EDH (i.e. Enterprise Data Hub) powered by the Hadoop ecosystem? A71. Yes, it can be migrated, but it is not a direct & straightforward migration as there is a mismatch in underpinning concepts & …

10 Distributed storage & computing systems interview Q&As – Big Data

A distributed system consists of multiple software components that run on multiple computers (aka nodes) but operate as a single system. These components can be stateful, stateless, or serverless, and they can be written in different languages, run in hybrid environments, and build on open-source technologies, open standards, and interoperability.

16: Q114 – Q115 CAP theorem interview Q&As

Q114. What does CAP stand for in CAP theorem?
A114. CAP stands for Consistency, Availability, and Partition tolerance. In a distributed system having two or more nodes, and maintaining two or more copies of your data for fault tolerance, the CAP theorem can be depicted & explained as below:
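The trade-off can also be sketched in code. The toy store below is an assumption-laden illustration, not a real database: `Replica`, `TwoNodeStore`, and the `"CP"` / `"AP"` mode labels are hypothetical names made up for this example. It keeps two copies of the data and shows what each choice does to a write while the nodes cannot reach each other.

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when node A cannot reach node B

    def write(self, key, value):
        """Write via node A and replicate to node B when the network allows."""
        if self.partitioned:
            if self.mode == "CP":
                # Favour consistency: refuse the write rather than let copies diverge.
                raise RuntimeError("unavailable during partition")
            # Favour availability: accept the write on A only, so B becomes stale.
            self.a.data[key] = value
            return
        self.a.data[key] = value
        self.b.data[key] = value

    def read(self, node, key):
        """Read the key from node "a" or node "b"."""
        replica = self.a if node == "a" else self.b
        return replica.data.get(key)


ap = TwoNodeStore("AP")
ap.write("x", 1)       # replicated to both nodes
ap.partitioned = True  # simulate a network partition
ap.write("x", 2)       # accepted, but only node A sees it
print(ap.read("a", "x"), ap.read("b", "x"))  # nodes now disagree: 2 1
```

In `"CP"` mode the same second write raises an error instead, so the two copies stay identical at the cost of rejecting the request.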

