Blog Archives

00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc.

Q1. What is dimensional modelling in a Data Warehouse (i.e. DWH)?
A1. A dimensional model is a data modelling technique optimised for Data Warehousing tools (e.g. OLAP products). Dimensional Modelling is built from two kinds of tables: Fact tables and Dimension tables.

A “Fact” is a numeric value (aka a measure…
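To make the Fact vs Dimension split concrete, here is a minimal plain-Python sketch of a star schema. The `dim_product` and `fact_sales` tables, their columns and the `revenue_by` helper are hypothetical: each fact row carries numeric measures plus a foreign key into the dimension table, and a query groups a measure by a descriptive dimension attribute. A real DWH would do this with a SQL join and GROUP BY; the dictionaries only stand in for tables.

```python
from collections import defaultdict

# Hypothetical dimension table: descriptive attributes, keyed by a surrogate key.
dim_product = {
    1: {"name": "Prepaid SIM", "category": "Mobile"},
    2: {"name": "Fibre 100", "category": "Broadband"},
    3: {"name": "Postpaid SIM", "category": "Mobile"},
}

# Hypothetical fact table: a foreign key to the dimension plus numeric measures.
fact_sales = [
    {"product_key": 1, "units": 10, "revenue": 100.0},
    {"product_key": 2, "units": 2, "revenue": 120.0},
    {"product_key": 3, "units": 5, "revenue": 150.0},
    {"product_key": 1, "units": 4, "revenue": 40.0},
]


def revenue_by(attribute: str) -> dict[str, float]:
    """Sum the revenue measure, grouped by one dimension attribute."""
    totals: dict[str, float] = defaultdict(float)
    for row in fact_sales:
        # "Join" the fact row to its dimension row via the foreign key.
        dim_row = dim_product[row["product_key"]]
        totals[dim_row[attribute]] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    print(revenue_by("category"))
```

The facts stay narrow and numeric, while all the descriptive text lives once in the dimension, which is what lets the same fact table be sliced by category, name or any other attribute.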



03: Databricks – Spark SCD Type 1

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is SCD Type 1

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.

Step 1: Remove all cells in the notebook with the “x” and then confirm or …
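SCD Type 1 simply overwrites the old attribute value, so no history is kept. The sketch below illustrates that idea in plain Python rather than Spark, so it is not the notebook code from the post. The function name `scd_type1`, the `customer_id` key and the sample customer rows are all hypothetical.

```python
def scd_type1(dimension, changes, key="customer_id"):
    """Apply SCD Type 1: overwrite changed attributes in place, insert new keys.

    No history is kept, so previous attribute values are lost.
    """
    # Index existing rows by their natural (business) key; copy so the input is untouched.
    by_key = {row[key]: dict(row) for row in dimension}
    for change in changes:
        # Matched key: overwrite the supplied columns. New key: start an empty row and fill it.
        by_key.setdefault(change[key], {}).update(change)
    return list(by_key.values())


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "name": "John", "city": "Sydney"},
        {"customer_id": 2, "name": "Peter", "city": "Melbourne"},
    ]
    updates = [
        {"customer_id": 1, "city": "Brisbane"},               # John moved: "Sydney" is lost
        {"customer_id": 3, "name": "Sam", "city": "Perth"},   # brand new customer
    ]
    for row in scd_type1(customers, updates):
        print(row)
```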



04: Databricks – Spark SCD Type 2

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 2

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.

Step 1: You may have to reattach the cluster to the notebook, as clusters auto-terminate after 2 hours. Create…
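SCD Type 2 keeps history by closing off the current row and inserting a new version. Here is a minimal plain-Python sketch, assuming a common convention that may differ from the post: each row carries `start_date`, `end_date` and an `is_current` flag, the current row’s `end_date` is a far-future placeholder (9999-12-31), and only the `city` column is tracked for changes. `scd_type2`, `HIGH_DATE` and the column names are hypothetical.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed placeholder end date for current rows


def scd_type2(dimension, changes, as_of, key="customer_id", tracked=("city",)):
    """Apply SCD Type 2: expire the current version of a changed row, append a new one."""
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}
    for change in changes:
        old = current.get(change[key])
        if old is not None and all(old[col] == change[col] for col in tracked):
            continue  # tracked columns unchanged: keep the existing current version
        if old is not None:
            # Close off the existing version instead of overwriting it.
            old["end_date"] = as_of
            old["is_current"] = False
        new_row = {**change, "start_date": as_of, "end_date": HIGH_DATE, "is_current": True}
        result.append(new_row)
        current[change[key]] = new_row
    return result


if __name__ == "__main__":
    dim = [{"customer_id": 1, "name": "John", "city": "Sydney",
            "start_date": date(2020, 1, 1), "end_date": HIGH_DATE, "is_current": True}]
    moved = [{"customer_id": 1, "name": "John", "city": "Brisbane"}]
    for row in scd_type2(dim, moved, as_of=date(2024, 6, 1)):
        print(row)
```

Unlike Type 1, both the Sydney and Brisbane versions survive, and a query for the current address simply filters on `is_current`.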



04a: Databricks – Spark SCD Type 1 with Merge

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 1

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.

Step 1: You may have to reattach the cluster to the notebook, as clusters auto-terminate after 2 hours. Create…
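A merge (upsert) expresses SCD Type 1 as two clauses: update the row when the key matches and a value has changed, and insert it when the key does not match. The plain-Python sketch below mimics those WHEN MATCHED / WHEN NOT MATCHED branches of a SQL `MERGE INTO` statement and counts each action. `merge_type1` and the sample data are hypothetical, and this is not the Databricks code from the post.

```python
def merge_type1(target, source, key="customer_id"):
    """Mimic MERGE semantics for SCD Type 1.

    WHEN MATCHED AND a value differs -> UPDATE; WHEN NOT MATCHED -> INSERT.
    Returns the merged rows and a count of each action taken.
    """
    merged = {row[key]: dict(row) for row in target}
    counts = {"updated": 0, "inserted": 0, "unchanged": 0}
    for row in source:
        existing = merged.get(row[key])
        if existing is None:
            merged[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
            counts["inserted"] += 1
        elif any(existing.get(col) != val for col, val in row.items()):
            existing.update(row)           # WHEN MATCHED AND changed THEN UPDATE
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1       # matched but identical: no write needed
    return list(merged.values()), counts


if __name__ == "__main__":
    target = [
        {"customer_id": 1, "name": "John", "city": "Sydney"},
        {"customer_id": 2, "name": "Peter", "city": "Melbourne"},
    ]
    source = [
        {"customer_id": 1, "name": "John", "city": "Brisbane"},
        {"customer_id": 2, "name": "Peter", "city": "Melbourne"},
        {"customer_id": 3, "name": "Sam", "city": "Perth"},
    ]
    rows, counts = merge_type1(target, source)
    print(counts)
```

Skipping identical matched rows is a design choice: it avoids rewriting rows that did not change, which matters more in a real table than in this toy version.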



04b: Databricks – Spark SCD Type 2 with Merge

Prerequisite:…



05: Q37-Q41 – Data lake & metadata interview Q&As

Q37. What is a Data Lake?
A37. A data lake is a storage repository that holds a vast amount of structured, semi-structured, and unstructured raw data in its native format (aka pristine condition). The data structure and requirements are not defined until the data is needed. You can also call…
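The “not defined until the data is needed” part is usually called schema-on-read. In this minimal sketch, assuming raw records land as JSON lines, nothing is validated on write; a schema (column name to converter function) is applied only when a consumer reads the data. `raw_zone`, `read_with_schema` and the records are hypothetical.

```python
import json

# Raw zone: records kept exactly as they arrived, with no schema enforced on write.
raw_zone = [
    '{"id": 1, "name": "John", "plan": "Mobile"}',
    '{"id": 2, "name": "Peter", "plan": "Broadband", "extra": {"router": "X1"}}',
    '{"id": "3", "name": "Sam"}',
]


def read_with_schema(lines, schema):
    """Apply a schema only at read time (schema-on-read).

    `schema` maps column name -> converter; missing fields become None,
    and fields not in the schema are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for col, convert in schema.items():
            value = record.get(col)
            row[col] = convert(value) if value is not None else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(read_with_schema(raw_zone, {"id": int, "plan": str}))
```

Two consumers can read the same raw lines with different schemas, which is the flexibility a data lake trades against the upfront modelling of a warehouse.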



10 ERD (Entity-Relationship Diagrams) Interview Q&As

Q01. Can you describe the business domain of a Telecom company offering multiple services to its customers?
A01. A Telecom company will have entities such as Customer, Account, Subscriptions & Products representing its business domain.
1) Each customer entity has a name, physical address, and an email address.
2) A…
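The entities in the answer can be sketched as Python dataclasses. The excerpt is cut off before the relationships are described, so the cardinalities here are assumptions: a customer has many accounts, an account has many subscriptions, and each subscription is for one product. The ID fields and the `products_of` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    product_id: int
    name: str


@dataclass
class Subscription:
    subscription_id: int
    product: Product  # assumed: each subscription is for exactly one product


@dataclass
class Account:
    account_id: int
    subscriptions: list[Subscription] = field(default_factory=list)  # assumed one-to-many


@dataclass
class Customer:
    customer_id: int
    name: str
    physical_address: str
    email: str
    accounts: list[Account] = field(default_factory=list)  # assumed one-to-many


def products_of(customer: Customer) -> set[str]:
    """Walk Customer -> Account -> Subscription -> Product and collect product names."""
    return {
        sub.product.name
        for account in customer.accounts
        for sub in account.subscriptions
    }
```

In an ERD the nested lists would become foreign keys (e.g. an account ID on each subscription row), but the one-to-many shape is the same.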



Apache Hive for Slowly Changing Dimension (i.e. SCD) interview Q&As

Q1. What is a Slowly Changing Dimension (i.e. SCD)?
A1. SCDs are dimensions whose attributes change slowly over time rather than on a regular basis, for example, a change in a customer’s name or address. There are different types of changing dimensions, and type 1 & type 2 are the most common…



Canonical Data Model (i.e. CDM) interview Q&As

Q01. What do you understand by the term canonicalizing?
A01. Canonicalizing is the activity of replacing multiple copies of an object/entity/URL with just a few objects/entities/URLs. A canonical URL is a good example: if you have a single page that’s accessible by multiple URLs, or different pages with similar content…
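Here is a minimal sketch of canonicalizing URLs with Python’s standard `urllib.parse`. The normalisation rules are assumptions chosen for illustration (lowercase scheme and host, drop the default port, drop a trailing slash, drop the query string and fragment); a site can also simply declare its preferred URL with a `<link rel="canonical">` tag instead of deriving it. `canonical_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url: str) -> str:
    """Map several URL variants of the same page to one canonical form.

    Assumed rules: lowercase scheme and host, drop the default port,
    drop a trailing slash on the path, drop the query string and fragment.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the scheme's default.
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, netloc, path, "", ""))


if __name__ == "__main__":
    variants = [
        "HTTPS://Example.com:443/Blog/",
        "https://example.com/Blog?utm_source=newsletter",
        "https://example.com/Blog#top",
    ]
    print({canonical_url(u) for u in variants})
```

All three variants collapse to a single canonical URL, which is the “many copies replaced by one” idea from the answer.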



Data categories in Data warehouse & Data lake

Q01. What are the different types of data that get stored in a data lake or data warehouse?
A01. An enterprise stores different types of data.
#1. Transactional Data
Transactional data describes business events like placing an order. Example 1: If you take a Telecom service provider, there could be…


