Blog Archives

0: 25 Big Data Engineering key concepts that Data Engineers, Analysts & Scientists must know

#01 Data Cardinality

In data modelling, cardinality is the numerical relationship between rows of one table & rows in another. Common cardinalities are one-to-one, one-to-many and many-to-many. Data cardinality also refers to the uniqueness of the values contained in a database column. If most of the values are distinct, then…

Read more ...
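The second meaning of cardinality above (uniqueness of the values in a column) can be illustrated with a minimal Python sketch. The `cardinality_ratio` helper and the sample columns are hypothetical and exist only for illustration; the ratio of distinct values to total rows is one common way to express the idea, not necessarily the measure the full article uses.

```python
def cardinality_ratio(values):
    """Return distinct-value count divided by row count (0.0 for an empty column)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample columns, for illustration only.
customer_ids = [101, 102, 103, 104, 105]        # every value is distinct
country_codes = ["AU", "AU", "NZ", "AU", "NZ"]  # only two distinct values

high = cardinality_ratio(customer_ids)   # ratio close to 1: high cardinality
low = cardinality_ratio(country_codes)   # ratio close to 0: low cardinality

print(f"customer_ids: {high:.2f}, country_codes: {low:.2f}")
```

A column whose ratio is near 1 behaves like an identifier, while a column with a small ratio behaves like a category.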

00: 13 Data modelling interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1

Q1. What is dimensional modelling in a Data Warehouse (i.e. DWH)?
A1. A dimensional model is a data structure technique optimised for Data Warehousing tools (i.e. OLAP products). Dimensional modelling comprises Fact and Dimension tables.

A “Fact” is a numeric value (aka a measure) that a business wishes to count or sum. A “Dimension” is essentially a descriptive text value used for getting at the facts.… Read more ...
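As a rough illustration of the Fact and Dimension split, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names (`dim_store`, `fact_sales`), columns, and rows are assumptions made up for this example and are not taken from the full article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star-schema fragment: one dimension table and one fact table.
conn.executescript("""
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT              -- descriptive attribute
);
CREATE TABLE fact_sales (
    sale_id  INTEGER PRIMARY KEY,
    store_id INTEGER REFERENCES dim_store(store_id),
    amount   REAL                -- numeric measure to sum
);
INSERT INTO dim_store VALUES (1, 'Store A', 'Sydney'), (2, 'Store B', 'Melbourne');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
""")

# Sum the measure (fact) and slice it by a descriptive attribute (dimension).
sales_by_city = conn.execute("""
    SELECT d.city, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_store AS d ON d.store_id = f.store_id
    GROUP BY d.city
    ORDER BY d.city
""").fetchall()

print(sales_by_city)
```

The fact table holds the numbers to aggregate, and the dimension table supplies the text used to group and filter them.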


00: 13 Data modelling interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 2

This extends Q1 to Q5 in 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc – part 1.

Q6. What is a Factless Fact table?
A6. From the above store sales example we know that a fact table is a collection of many facts having multiple keys joined…

Read more ...

00: 18+ SQL best practices interview Q&As

It is essential to know the order in which the SQL clauses are executed. This is demonstrated with an example below in #5. Keep this order of execution pinned somewhere visible and make sure you understand it. SQL is very easy to learn, but lots of hands-on experience is required to master:

1) to translate business requirements into SQL.
2) to write efficient & maintainable queries.… Read more ...
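The article's own example (#5) is not reproduced in this excerpt. As a separate sketch, the query below assumes the commonly described logical order FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT, and uses Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("alice", 60), ("alice", 70), ("alice", 5),
        ("bob", 200), ("bob", 8),
        ("carol", 95), ("carol", 9),
        ("dave", 150),
    ],
)

top_customers = conn.execute("""
    SELECT customer, SUM(amount) AS total   -- 5) compute the output columns
    FROM orders                             -- 1) pick the source rows
    WHERE amount > 10                       -- 2) filter individual rows
    GROUP BY customer                       -- 3) form groups
    HAVING SUM(amount) > 100                -- 4) filter whole groups
    ORDER BY total DESC                     -- 6) sort the result
    LIMIT 2                                 -- 7) keep the first rows
""").fetchall()

print(top_customers)
```

Because WHERE runs before grouping, carol's order of 9 is discarded first, so her group totals 95 and fails the HAVING test; had the row filter been applied after aggregation, her total of 104 would have passed.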


00: 50+ SQL scenarios based interview Q&As on identifying & deleting duplicate records

50+ SQL interview questions and answers to solve real business scenarios. SQL is widely used in building microservices & Big Data projects. Learning SQL syntax is easy, but being able to convert a given business requirement into a query takes lots of practice. These scenario-based interview questions can assess your experience.

Considerations & Tips

It is important to first understand the problem statement, and then ask the right questions to solve the problem.… Read more ...
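One possible way to identify and delete duplicate records is sketched below with Python's built-in `sqlite3` module. The `customer` table, the choice of `email` as the duplicate key, and the rule of keeping the lowest `id` are all assumptions for illustration; the scenarios in the full article may define duplicates differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "a@x.com", "Ann"),
        (2, "b@x.com", "Bob"),
        (3, "a@x.com", "Ann"),
        (4, "a@x.com", "Ann"),
        (5, "c@x.com", "Cat"),
    ],
)

# Identify: any email that appears more than once is treated as duplicated.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS copies
    FROM customer
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()

# Delete: keep the row with the smallest id per email and remove the rest.
conn.execute("""
    DELETE FROM customer
    WHERE id NOT IN (SELECT MIN(id) FROM customer GROUP BY email)
""")

remaining = conn.execute("SELECT id FROM customer ORDER BY id").fetchall()
print(duplicates, remaining)
```

Asking which columns define a duplicate and which copy should survive is exactly the kind of clarifying question the problem statement calls for.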

00: A roadmap to become a Big Data Engineer – What skills are required?

What is all the hype about becoming a (Big) Data Engineer? There is a demand for Data Engineers as organisations have been ramping up their investments in Big Data related projects since 2019.

Why Big Data?

Confused about the various roles like Data Engineer, Technical Business Analyst, DevOps Engineer, Data Scientist, etc.? Big Data projects often have all of the above roles complementing each other.… Read more ...


00: Apache Spark eco system & anatomy interview questions and answers

Q01. Can you summarise the Spark eco system?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 components: Core, Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX, and SparkR. All the functionality provided by Apache Spark is built on top of Spark Core. Spark Core is the foundation for in-memory, parallel and distributed processing of huge datasets with fault tolerance & recovery.… Read more ...


00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse Vs Data Fabric Vs Data Mesh

Modern data architectures have both Data Lakes and Data Warehouses. Data Engineers build the data pipelines that data analysts and scientists use to build business reports & models to analyse the data.

What is Big Data?

Big Data is huge volumes of structured (e.g. entries in tables, rows & columns), semi-structured (e.g. JSON, XML, log files, etc) and unstructured (i.e.… Read more ...

00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (i.e. AI). Processing a large volume of data is intensive on disk I/O, CPU, and memory. Big Data processing is based on distributed computing, where multiple nodes across several machines process data in parallel.… Read more ...


01: Data Backfilling interview questions & answers

Q1. What is data backfilling?
A1. Backfilling data is a process of reactively processing any missing data for a past time window.

Q2. Why do you need to backfill data?
A2. There are two types of data loads from source systems to target (aka sink) systems via ETL pipelines: 1)…

Read more ...
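The idea of processing missing data for a past time window can be sketched in plain Python as follows. The helper names (`missing_dates`, `backfill`), the daily granularity, and the sample dates are hypothetical; a real pipeline would read the loaded partitions from its target system and call its own load job.

```python
from datetime import date, timedelta


def missing_dates(start, end, loaded):
    """Return the dates in [start, end] that have no loaded data."""
    loaded = set(loaded)
    days = (end - start).days + 1
    window = [start + timedelta(days=i) for i in range(days)]
    return [d for d in window if d not in loaded]


def backfill(start, end, loaded, process):
    """Run `process` for every missing date and return the dates handled."""
    handled = []
    for d in missing_dates(start, end, loaded):
        process(d)  # stand-in for re-running the load for that day
        handled.append(d)
    return handled


# Hypothetical past window in which two daily loads were missed.
already_loaded = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
reprocessed = []
handled = backfill(date(2024, 1, 1), date(2024, 1, 5), already_loaded, reprocessed.append)

print(handled)
```

Computing the gap first and then replaying only those days keeps the backfill from reprocessing dates that were already loaded.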



FAQs are marked with 🔥 as some questions are not only more popular with interviewers, but also required to build robust systems. If you are an interviewer, cover well-rounded topics to judge real experience.

Don't be overwhelmed by the number of questions, as the technology stacks are vast. The quality of the answers you provide to some of the key technical & open-ended questions, along with your soft skills & attitude, will go a long way in getting job offers.

Note: Some Q&As belong to more than one category.