Blog Archives

00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1

Q1. What is dimensional modelling in a Data Warehouse (i.e. DWH)?
A1. Dimensional modelling is a data structure technique optimised for Data Warehousing tools (e.g. OLAP products). A dimensional model consists of Fact and Dimension tables.

A “Fact” is a numeric value (also known as a measure) that a business wishes to count or sum.…
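As a rough illustration of the Fact and Dimension split described above, here is a minimal in-memory sketch. The table names (`fact_sales`, `dim_product`), the column names, and the sample rows are all hypothetical and only meant to show a measure being summed per dimension attribute; a real warehouse would do this with SQL over proper tables.

```python
from collections import defaultdict

# Hypothetical dimension table: descriptive attributes keyed by a surrogate key.
dim_product = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}

# Hypothetical fact table: a foreign key to the dimension plus numeric measures.
fact_sales = [
    {"product_key": 1, "quantity": 2, "amount": 2400.0},
    {"product_key": 2, "quantity": 1, "amount": 800.0},
    {"product_key": 3, "quantity": 4, "amount": 600.0},
    {"product_key": 1, "quantity": 1, "amount": 1200.0},
]


def total_amount_by_category(facts, products):
    """Sum the 'amount' measure, grouped by the dimension's 'category' attribute."""
    totals = defaultdict(float)
    for row in facts:
        # Look up the dimension row through the fact's foreign key.
        category = products[row["product_key"]]["category"]
        totals[category] += row["amount"]
    return dict(totals)


totals = total_amount_by_category(fact_sales, dim_product)
print(totals)
```

The fact rows carry only keys and numbers, while everything descriptive lives in the dimension, which is what makes the "sum a measure by an attribute" query pattern straightforward.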



00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 2

This extends Q1 to Q5 at 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1.

Q6. What is a Factless Fact table?
A6. From…



00: Apache Spark eco system & anatomy interview Q&As

Q01. Can you summarise the Spark eco system?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 components: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX, and SparkR. All the functionality provided by Apache Spark is built on top of Spark Core.…



00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse

Modern data architectures typically have both Data Lakes and Data Warehouses. Data Engineers build the data pipelines that data analysts and data scientists use to build business reports & models to analyse the data.

Q01. What is a Data Lake?
A01. It is a distributed storage system to store different types of data from distributed source systems.…



00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (AI). Processing a large volume of data is intensive on disk I/O, CPU, and memory usage.…



01: Data Backfilling interview questions & answers

Q1. What is data backfilling?
A1. Backfilling data is the process of retroactively processing any missing data for a past time window.

Q2. Why do you need to backfill data?…
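The definition in A1 can be sketched as follows, assuming daily partitions and a set of dates that have already been processed. The function names (`missing_days`, `backfill`) and the `run_for_day` callback are hypothetical; a real pipeline would normally delegate this to its scheduler's backfill feature.

```python
from datetime import date, timedelta


def missing_days(start, end, processed):
    """Return the days in [start, end] that have no processed partition."""
    days = []
    current = start
    while current <= end:
        if current not in processed:
            days.append(current)
        current += timedelta(days=1)
    return days


def backfill(start, end, processed, run_for_day):
    """Run the (hypothetical) daily job for every missing day in the window."""
    for day in missing_days(start, end, processed):
        run_for_day(day)    # reprocess the past time window for this day
        processed.add(day)  # record it so a re-run skips this day


# Example: two days in the window were never processed.
processed = {date(2024, 1, 1), date(2024, 1, 3)}
gaps = missing_days(date(2024, 1, 1), date(2024, 1, 4), processed)

reprocessed = []
backfill(date(2024, 1, 1), date(2024, 1, 4), processed, reprocessed.append)
```

Recording each completed day in `processed` keeps the backfill idempotent: running it again over the same window does nothing.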



01: 8 Apache Kafka must-know basics interview Q&As

Apache Kafka is used in architectures ranging from Micro Services Architecture (MSA) to Big Data & low-latency applications.

Q1. What is Apache Kafka?
A1. Apache Kafka is a distributed messaging…



01: AWS Q&As on VPC, Subnets, Availability Zones, VPN, Route tables, NACLs & Security Groups

The above diagram addresses many of the questions that follow.

Q1. What is a VPC in AWS?
A1. A virtual private cloud (VPC) is a virtual network dedicated to your…



01: Lambda, Kappa & Delta Data Architectures Interview Q&As – Overview

Q1. What is the Lambda Architecture?
A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing…



01: NoSQL interview Q&As – Key Value, Wide Column, Document & Graph databases

Q1. What are the key differences between SQL & NoSQL databases?
A1. Nowadays you can choose which database to use based on your requirements. There are pros…




