Blog Archives

0: 25 Big Data Engineering key concepts that Data Engineers, Analysts & Scientists must know

#01 Data Cardinality

In data modelling, cardinality is the numerical relationship between rows of one table & rows in another. Common cardinalities are one-to-one, one-to-many and many-to-many.

Data cardinality also refers to the uniqueness of the values contained in a database column. If most of the values are distinct, the column is considered to have high cardinality.… Read more ...
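To make "high cardinality" concrete, here is a minimal Python sketch. It measures a column's cardinality as the count of distinct values and as the ratio of distinct values to total rows. The sample columns and function names are illustrative assumptions, and so is the 0.9 threshold for calling a column "high". It is not a standard cut-off.

```python
def column_cardinality(values):
    """Return (distinct count, distinct-to-total ratio) for one column's values."""
    if not values:
        return 0, 0.0
    distinct = len(set(values))
    return distinct, distinct / len(values)


def is_high_cardinality(values, threshold=0.9):
    """Hypothetical rule of thumb: 'high' when most values are distinct."""
    _, ratio = column_cardinality(values)
    return ratio >= threshold


# customer_id is unique per row; order_status repeats a few values.
customer_ids = [101, 102, 103, 104, 105]
order_statuses = ["NEW", "SHIPPED", "NEW", "NEW", "SHIPPED"]

print(column_cardinality(customer_ids))     # (5, 1.0)
print(column_cardinality(order_statuses))   # (2, 0.4)
print(is_high_cardinality(customer_ids))    # True
print(is_high_cardinality(order_statuses))  # False
```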



00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1

Q1. What is dimensional modelling in a Data Warehouse (i.e. DWH)?
A1. A dimensional model is a data structure technique optimised for Data Warehousing tools (i.e. OLAP products). Dimensional Modelling is built around Fact and Dimension tables.

A “Fact” is a numeric value (aka a measure) that a business wishes to count or sum.… Read more ...
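Below is a minimal star schema sketch that uses Python's built-in sqlite3 module, so it runs without a database server. The table names (dim_store, dim_product, fact_sales) and the sample rows are hypothetical. The fact table holds keys plus numeric measures. The dimension tables supply the descriptive attributes you group by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- Fact table: foreign keys to the dimensions plus numeric measures
CREATE TABLE fact_sales (
    store_key   INTEGER REFERENCES dim_store(store_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_store   VALUES (1, 'Sydney'), (2, 'Melbourne');
INSERT INTO dim_product VALUES (10, 'Coffee'), (20, 'Tea');
INSERT INTO fact_sales  VALUES (1, 10, 2, 9.0), (1, 20, 1, 3.5), (2, 10, 4, 18.0);
""")

# Sum the measures, sliced by a dimension attribute.
rows = conn.execute("""
    SELECT s.store_name, SUM(f.quantity) AS units, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(rows)  # [('Melbourne', 4, 18.0), ('Sydney', 3, 12.5)]
```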

00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 2

This extends Q1 to Q5 in 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1.

Q6. What is a Factless Fact table?
A6. From the above store sales example, we know that a fact table is a collection of many facts having multiple keys joined with one or more dimension tables.… Read more ...

00: 18+ SQL best practices interview Q&As

You must know the order in which SQL clauses are executed. This is demonstrated with an example in #5 below. Keep this order of execution pinned where you can see it, and make sure you understand it. SQL is easy to learn, but mastering it requires lots of hands-on experience:

1) to translate business requirements into SQL.… Read more ...
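The sketch below lists the commonly cited logical processing order of a SELECT. It uses Python's built-in sqlite3 module and a hypothetical orders table to show one practical consequence. WHERE filters individual rows before GROUP BY, while HAVING filters the aggregated groups afterwards. The database engine may physically execute the steps differently, so treat the order as a logical model.

```python
import sqlite3

# Commonly cited logical processing order of a SELECT statement
# (the query optimiser may physically execute steps differently).
LOGICAL_ORDER = ["FROM/JOIN", "WHERE", "GROUP BY", "HAVING",
                 "SELECT", "DISTINCT", "ORDER BY", "LIMIT"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("alice", 50), ("alice", 200), ("bob", 30), ("bob", 40), ("carol", 500),
])

# WHERE runs before GROUP BY, so it drops individual rows (bob's 30);
# HAVING runs after GROUP BY, so it filters whole groups by their SUM.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 35
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY total DESC
    LIMIT 2
""").fetchall()
print(rows)  # [('carol', 500), ('alice', 250)]
```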

00: 25+ SQL interview questions & answers – beginner

SQL interview questions & answers are a must for any developer, as all non-trivial applications need to talk to a database with CRUD operations. Q3 – Q15 are very popular with interviewers.

If you want to quickly practise your SQL skills, try DB Fiddle or install MySQL locally as shown in the MySQL database getting started tutorial.… Read more ...

00: 25+ SQL interview questions & answers – intermediate to experienced

This continues 25+ SQL interview questions & answers – beginner.

Q16. Why do you need CASE statements in SQL?
A16. CASE statements in SQL are similar to IF and ELSE conditions in programming languages. They are used to return particular values based on certain conditions.

In the above…

Read more ...
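Here is a minimal CASE example in Python's built-in sqlite3 module. It assumes a hypothetical employees table, and the salary bands are chosen only for illustration. As with IF / ELSE IF / ELSE, the first WHEN condition that matches determines the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ann", 120000), ("Raj", 80000), ("Lee", 45000)])

# CASE behaves like IF / ELSE IF / ELSE: the first matching WHEN wins.
rows = conn.execute("""
    SELECT name,
           CASE
               WHEN salary >= 100000 THEN 'HIGH'
               WHEN salary >= 60000  THEN 'MEDIUM'
               ELSE 'LOW'
           END AS salary_band
    FROM employees
    ORDER BY name
""").fetchall()
print(rows)  # [('Ann', 'HIGH'), ('Lee', 'LOW'), ('Raj', 'MEDIUM')]
```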


00: A roadmap to become a Big Data Engineer – What skills are required?

What is all the hype about becoming a (Big) Data Engineer? There is a demand for Data Engineers as organisations have been ramping up their investments in big data related projects since 2019.

Why Big Data?

Confused about the various roles like Data Engineer, Technical Business Analyst, DevOps Engineer, Data Scientist, etc.… Read more ...

00: Apache Spark eco system & anatomy interview questions and answers

Q01. Can you summarise the Spark eco system?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 components: Spark Core, Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX, and SparkR. All the functionality provided by Apache Spark is built on top of Spark Core.… Read more ...

00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse Vs Data Fabric Vs Data Mesh

Modern data architectures typically include both Data Lakes and Data Warehouses. Data Engineers build the data pipelines that data analysts and scientists use to build business reports & models for analysing the data.

What is Big Data?

Big Data refers to huge volumes of structured (e.g. entries in tables, rows & columns), semi-structured (e.g.… Read more ...

00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (i.e. AI). Processing a large volume of data is intensive on disk I/O, CPU, and memory usage.… Read more ...

01: Data Backfilling interview questions & answers

Q1. What is data backfilling?
A1. Backfilling data is a process of reactively processing any missing data for a past time window.

Q2. Why do you need to backfill data?
A2. There are two types of data loads from source systems to target (aka sink) systems via ETL pipelines: 1)…

Read more ...
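Below is a minimal Python sketch of a backfill. It assumes the pipeline writes one partition per day and that you can list which days are already loaded. The load_fn callback is hypothetical and stands in for whatever job reloads a single day. It should be idempotent so that re-running it for a day is safe.

```python
from datetime import date, timedelta


def missing_dates(loaded, start, end):
    """Return the days in [start, end] that have no loaded partition, oldest first."""
    expected = (start + timedelta(days=n) for n in range((end - start).days + 1))
    return [d for d in expected if d not in loaded]


def backfill(loaded, start, end, load_fn):
    """Re-run the (hypothetical) daily load_fn for each gap in the past window."""
    gaps = missing_dates(loaded, start, end)
    for day in gaps:
        load_fn(day)      # assumed idempotent: safe to re-run for the same day
        loaded.add(day)   # record the partition as loaded
    return gaps


# Partitions loaded by the scheduled pipeline; Jan 3rd and 5th are missing.
loaded = {date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)}
reloaded = backfill(loaded, date(2024, 1, 1), date(2024, 1, 5),
                    load_fn=lambda d: print(f"reprocessing {d}"))
print(reloaded)  # [datetime.date(2024, 1, 3), datetime.date(2024, 1, 5)]
```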


01: 15+ Apache Kafka must-know basics interview Q&As – Part 1

Apache Kafka is used in architectures ranging from Micro Services Architecture (i.e. MSA) to Big Data & low-latency applications.

Q1. What is Apache Kafka?
A1. Apache Kafka is a distributed messaging broker. The purpose of the Kafka project is to provide a unified, high-throughput, and low latency platform for real-time data processing.… Read more ...



01: 15+ Apache Kafka must-know basics interview Q&As – Part 2

This extends 15+ Apache Kafka must-know basics interview Q&As – Part 1.

Q4. What do you understand by the term “data is presented to Kafka as stream”?
A4. This means that whether the data is acquired from source systems in real time or via a scheduled extract process, the data is…

Read more ...


01: 15+ Apache Kafka must-know basics interview Q&As – Part 3

This extends 15+ Apache Kafka must-know basics interview Q&As – Part 2.

Q10. What do you understand by the terms Kafka Consumer Groups & group.id?
A10. Consumers read from any single partition, allowing you to scale throughput of message consumption as depicted below. Consumers can also be organised into consumer groups…

Read more ...
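The following small Python simulation illustrates how partitions map to consumers. Within one consumer group, each partition is owned by exactly one consumer, and consumers beyond the partition count sit idle. Groups with different group.id values each consume all partitions independently. This simplified round-robin assignment is for illustration only and is not the Kafka client's code. Kafka delegates the real assignment to a configurable partition assignor. The consumer and group names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Simplified round-robin assignment of partitions within ONE consumer group.

    Each partition goes to exactly one consumer in the group; consumers
    beyond the partition count receive nothing and sit idle.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
print(assign_partitions(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}

# Different group.id values consume independently: each group covers every partition.
for group_id, members in {"billing": ["b1", "b2"], "analytics": ["a1"]}.items():
    assigned = assign_partitions(partitions, members)
    covered = sorted(p for owned in assigned.values() for p in owned)
    print(group_id, covered)  # each group covers [0, 1, 2, 3]
```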


01: 50+ SQL scenarios based interview Q&As on identifying & deleting duplicate records

50+ SQL interview questions and answers to solve real business scenarios. SQL is widely used in building microservices & Big Data projects. Learning SQL syntax is easy, but being able to convert a given business requirement into a query takes lots of practice. These scenario-based interview questions can assess your experience.… Read more ...
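Below is one common pattern for the duplicate-record scenarios, using Python's built-in sqlite3 module and a hypothetical customers table. GROUP BY ... HAVING COUNT(*) > 1 identifies the duplicates, and a DELETE keeps one row per group. The rowid column is SQLite-specific. In other databases you would typically use a primary key or ROW_NUMBER() instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    ("Ann", "ann@example.com"), ("Ann", "ann@example.com"),
    ("Bob", "bob@example.com"), ("Ann", "ann@example.com"),
    ("Cat", "cat@example.com"),
])

# 1) Identify duplicates: (name, email) combinations that occur more than once.
dupes = conn.execute("""
    SELECT name, email, COUNT(*) AS cnt
    FROM customers
    GROUP BY name, email
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('Ann', 'ann@example.com', 3)]

# 2) Delete duplicates, keeping the first-inserted row (lowest SQLite rowid) per group.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY name, email
    )
""")
remaining = conn.execute("SELECT name, email FROM customers ORDER BY name").fetchall()
print(remaining)
# [('Ann', 'ann@example.com'), ('Bob', 'bob@example.com'), ('Cat', 'cat@example.com')]
```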

01: Apache Kafka example with Java – getting started tutorial

This Apache Kafka with Java getting started tutorial demonstrates how quickly you can get started with Kafka using Docker.

Step 1: Make sure the Docker engine is installed on your computer. For example, on macOS run $ brew install --cask docker, or install Docker on Windows.

Step 2: Start the Docker engine on your operating system.… Read more ...



01: AWS interview Q&As on VPC, Subnets, Availability Zones, VPN, Route tables, NACLs & Security Groups

This extends Architecture Networking. The above diagram addresses many of the questions that follow.

Q1. What is a VPC in AWS?
A1. A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. It is logically isolated from other virtual networks in the AWS Cloud. You can…

Read more ...


01: Databricks interview questions & answers – overview

The best way to prepare for the Databricks interview is via the 28 tutorials on getting started with Databricks & PySpark. These tutorials will not only get you started on Databricks, but also help you prepare for the job interviews.

Here you will look at some high-level Databricks interview questions & answers.… Read more ...



01: High level & low level system design considerations for read heavy systems

Q1. What are some of the design considerations for a read heavy system?
A1. Before designing any system, one should gather the functional & non-functional requirements. The SLAs (i.e. Service Level Agreements) have to be clearly defined. Rough-cut capacity planning has to be done in terms of how many…

Read more ...


01: High level & low level system design considerations for write heavy systems

This extends High level & low level system design considerations for read heavy systems.

Q1. What are some of the design considerations for a write heavy system?
A1. Before designing any system, one should gather the functional & non-functional requirements. The SLAs (i.e. Service Level Agreements) have to be clearly…

Read more ...

