Blog Archives

0: 25 Big Data Engineering key concepts that Data Engineers, Analysts & Scientists must know

#01 Data Cardinality

In data modelling, cardinality is the numerical relationship between rows of one table & rows in another. Common cardinalities are one-to-one, one-to-many and many-to-many.

Data cardinality also refers to the uniqueness of the values contained in a database column. If most of the values are distinct, then it is considered to have high cardinality.… Read more ...
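The "uniqueness" sense of cardinality described above can be sketched in a few lines. This is an illustrative pure-Python snippet (the function name and sample data are assumptions, not from the post): it estimates a column's cardinality as the ratio of distinct values to total rows.

```python
# Hypothetical sketch: column cardinality as distinct values / total rows.
def cardinality_ratio(values):
    """Return the ratio of distinct values to total values in a column."""
    total = len(values)
    return len(set(values)) / total if total else 0.0

user_ids = ["u1", "u2", "u3", "u4"]    # every value distinct
countries = ["AU", "AU", "US", "AU"]   # only two distinct values

print(cardinality_ratio(user_ids))    # 1.0 -> high cardinality
print(cardinality_ratio(countries))   # 0.5 -> lower cardinality
```

A ratio near 1.0 (e.g. a primary-key column) indicates high cardinality; a ratio near 0 (e.g. a country or gender column) indicates low cardinality.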

00: A roadmap to become a Big Data Engineer – What skills are required?

What is all the hype about becoming a (Big) Data Engineer? There is a strong demand for Data Engineers, as organisations have been ramping up their investments in big data related projects since 2019. Why Big Data?

Confused about the various roles like Data Engineer, Technical Business Analyst, DevOps Engineer, Data Scientist, etc.… Read more ...


00: Data Lake Vs. Data Warehouse Vs. Data Lakehouse Vs. Data Fabric Vs. Data Mesh

Modern data architectures have both Data Lakes and Data Warehouses. Data Engineers build the data pipelines that data analysts & scientists use to build business reports & models to analyse the data.

What is Big Data?

Big Data is huge volumes of structured (e.g. entries in tables, rows & columns), semi-structured (e.g.… Read more ...

01: Q01 – Q07 General Big Data, Data Science & Data Analytics Interview Q&As

Q01. How is Big Data used in industries?
A01. The main goal for most organisations is to enhance customer experience, and consequently increase sales. Other goals include cost reduction, better targeted marketing, fraud detection, identifying data breaches to enhance security, making existing processes more efficient, mining medical records for drug discovery and genetic disease exploration, and the list goes on.… Read more ...


02: Cleansing & pre-processing data in BigData & machine learning with Spark interview questions & answers

Q1. Why are data cleansing & pre-processing important in analytics & machine learning?
A1. Garbage in gets you garbage out, no matter how good your machine learning algorithm is.

Q2. What are the general steps of cleansing data?
A2. The general steps involve deduplication, dropping/imputing missing values, fixing structural errors, removing outliers, encoding categorical values and scaling the features.… Read more ...
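Two of the steps listed above, deduplication and imputing missing values, can be illustrated with a minimal pure-Python sketch. The function names and sample rows are hypothetical, and a real pipeline would typically use pandas or Spark DataFrames instead:

```python
# Hypothetical sketch of two cleansing steps: deduplication and
# mean imputation of missing values, on rows modelled as dicts.
def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def impute_mean(rows, column):
    """Replace missing (None) values in a column with the column mean."""
    present = [r[column] for r in rows if r[column] is not None]
    mean = sum(present) / len(present)
    return [{**r, column: mean if r[column] is None else r[column]}
            for r in rows]

data = [
    {"id": 1, "age": 30},
    {"id": 1, "age": 30},    # exact duplicate -> dropped
    {"id": 2, "age": None},  # missing value -> imputed
    {"id": 3, "age": 50},
]
clean = impute_mean(deduplicate(data), "age")
# 3 rows remain; the missing age becomes 40.0, the mean of 30 and 50
```

The remaining steps (fixing structural errors, outlier removal, categorical encoding, feature scaling) follow the same pattern of column-wise transformations.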


08: Q71 – Q75 ETL/ELT on BigData Interview Q&As

Q71. Can ETL in traditional data management (e.g. RDBMSs) be migrated to an EDH (i.e. Enterprise Data Hub) powered by the Hadoop ecosystem? A71. Yes, it can be migrated, but it is not a direct & straightforward migration, as there is a mismatch in the underpinning concepts & technologies between RDBMSs…

Read more ...
