Blog Archives

00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse

Modern data architectures typically include both Data Lakes and Data Warehouses. Data engineers build the data pipelines that data analysts and data scientists rely on to build business reports and models for analysing the data.

Q01. What is a Data Lake?
A01. It is a distributed storage system to store different types of data from distributed source systems.…



00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (i.e. AI). Processing a large volume of data is intensive on disk I/O, CPU, and memory.…



01: Lambda, Kappa & Delta Data Architectures Interview Q&As – Overview

Q1. What is the Lambda Architecture?
A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing…
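As a rough illustration of combining batch and real-time processing, the sketch below keeps a batch view recomputed from historical events and a speed view updated per event, then merges both at query time. The function names, the page-count use case, and the event shape are assumptions made for this example; they are not taken from the post, and plain Python stands in for engines such as Spark or Storm.

```python
from collections import Counter


def batch_view(events):
    """Batch layer (hypothetical): recompute counts from the full history."""
    return Counter(e["page"] for e in events)


def update_speed_view(view, event):
    """Speed layer (hypothetical): fold one new event into the real-time view."""
    view[event["page"]] += 1
    return view


def query(batch, speed, page):
    """Serving layer (hypothetical): merge the batch and real-time views."""
    return batch[page] + speed[page]


# Historical events already processed by the batch job.
historical = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
batch = batch_view(historical)

# Events that arrived after the last batch run.
speed = Counter()
for event in [{"page": "home"}, {"page": "pricing"}]:
    update_speed_view(speed, event)

print(query(batch, speed, "home"))     # 3
print(query(batch, speed, "pricing"))  # 1
```

In this sketch the speed view would be discarded once the next batch run has absorbed the same events.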



02: Q7 – Q15 Hadoop overview & architecture interview Q&As

This extends Q1 – Q6 Hadoop Overview & Architecture interview Q&As.

Q7. What are the major machine roles in a Hadoop cluster?
A7. The three major categories of machine roles…



08: Q71 – Q75 ETL/ELT on BigData Interview Q&As

Q71. Can ETL in traditional data management (e.g. RDBMSs) be migrated to an EDH (i.e. Enterprise Data Hub) powered by the Hadoop ecosystem?
A71. Yes, it can be migrated, but it…



10 Distributed storage & computing systems interview Q&As

A distributed system consists of multiple software components that are on multiple computers (aka nodes), but run as a single system. These components can be stateful, stateless, or serverless, and they can be written in different languages, run on hybrid environments, and build on open-source technologies, open standards, and interoperability.…



16: Q114 – Q115 CAP theorem interview Q&As

Q114. What does CAP stand for in CAP theorem?
A114. In a distributed system with two or more nodes that maintains two or more copies of your data for fault tolerance, the CAP theorem can be depicted & explained as below:

Consistency – Every read should give the most recent write.…
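To make the consistency property concrete, here is a minimal sketch of a two-replica store. The `Replica` and `Cluster` classes, and the split between synchronous and asynchronous replication, are hypothetical constructs for this example only; they do not model any particular database.

```python
class Replica:
    """One copy of the data (hypothetical in-memory node)."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas with either synchronous or asynchronous replication."""

    def __init__(self, synchronous):
        self.primary = Replica()
        self.secondary = Replica()
        self.synchronous = synchronous
        self.pending = []  # writes not yet copied to the secondary

    def write(self, key, value):
        self.primary.data[key] = value
        if self.synchronous:
            # Copy before acknowledging, so every replica has the write.
            self.secondary.data[key] = value
        else:
            # Acknowledge immediately and copy later.
            self.pending.append((key, value))

    def replicate(self):
        """Apply the queued writes to the secondary."""
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()

    def read_from_secondary(self, key):
        return self.secondary.data.get(key)


sync_cluster = Cluster(synchronous=True)
sync_cluster.write("x", 1)
print(sync_cluster.read_from_secondary("x"))   # 1

async_cluster = Cluster(synchronous=False)
async_cluster.write("x", 1)
print(async_cluster.read_from_secondary("x"))  # None (stale read)
async_cluster.replicate()
print(async_cluster.read_from_secondary("x"))  # 1
```

The asynchronous cluster returns a stale result until replication catches up, which is the situation the consistency property rules out.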



