Blog Archives

00: 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc

Q1. What is dimensional modelling in a Data Warehouse (i.e. DWH)?
A1. Dimensional modelling is a data structuring technique optimised for Data Warehousing tools (i.e. OLAP products). It is built on two kinds of tables: Fact tables and Dimension tables.
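As a rough illustration of how a fact table relates to a dimension table, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`dim_product`, `fact_sales`), columns, and rows are hypothetical and chosen only for this example; a real warehouse would have more dimensions (date, customer, store, etc.).

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes (hypothetical schema)
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT
    );
    -- Fact table: numeric measures plus a foreign key to the dimension
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'),
        (2, 'Desk',   'Furniture');
    INSERT INTO fact_sales VALUES
        (1, 1, 2, 2000.0),
        (2, 1, 1, 1000.0),
        (3, 2, 4,  800.0);
""")

# Typical analytical query: aggregate the facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product d ON f.product_id = d.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(rows)  # [('Electronics', 3000.0), ('Furniture', 800.0)]
```

The measures (quantity, amount) live in the fact table, while the attributes you slice and group by live in the dimension table.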

Read more ›

00: Apache Spark eco system & anatomy interview Q&As

Q01. Can you summarise the Spark eco system?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 components: Core, Spark SQL, Spark Streaming, Spark MLlib,

Read more ›

00: Data Lake Vs. Data Warehouse Vs. Delta Lake

Modern data architectures have both Data Lakes & Data Warehouses. Q1. What questions do you need to ask when choosing a Data Warehouse over a Data Lake for your BI (i.e. Business Intelligence) reporting? A1. The gap between a data lake & … Read more ›

00: Q1 – Q6 Hadoop based Big Data architecture & basics interview Q&As

There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (i.e. AI). Processing large volumes of data is intensive on disk I/O,

Read more ›

01: Coding “Java way in Scala” Vs “Scala way in Scala”

Example #1: Read from a list & write to a list

Java Way in Scala

Output: List(Java Programming, Scala Programming, Ruby Programming)

Scala Way in Scala: Using the “map” functional combinator

Immutable code as shown below using the “map”

Read more ›

01: Lambda, Kappa & Delta Data Architectures Interview Q&As – Overview

Q1. What is the Lambda Architecture? A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing (e.g. Hive, Pig, Spark batch). This means you have to build 2 separate pipelines. … Read more ›

01: Python Iterators, Generators & Decorators Tutorial

Assumes that Python 3 is installed as described in Getting started with Python.

1. Iterators

Iterators don’t compute the value of each item when instantiated. They only compute it when you ask for it. This is known as lazy evaluation. Lazy evaluation is useful when you have a very large data set to compute.
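A minimal sketch of lazy evaluation with a hand-written iterator. The `Squares` class and its `computed` counter are hypothetical names made up for this illustration; the counter only exists to show that no values are computed until `next()` is called.

```python
class Squares:
    """Iterator over the squares of 0 .. limit-1, computed on demand."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0
        self.computed = 0  # how many values have actually been computed

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current ** 2
        self.current += 1
        self.computed += 1
        return value


squares = Squares(1_000_000)   # instantiation computes nothing
print(squares.computed)        # 0

first_three = [next(squares) for _ in range(3)]
print(first_three)             # [0, 1, 4]
print(squares.computed)        # 3
```

Only three of the one million possible values were ever computed, which is why this approach suits very large data sets.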

Read more ›

01: Q01 – Q07 General Big Data, Data Science & Data Analytics Interview Q&As

Q01. How is Big Data used in industries?
A01. The main goal for most organisations is to enhance customer experience, and consequently increase sales. The other goals include cost reduction, better targeted marketing, fraud detection, identifying data breaches to enhance security, making existing processes more efficient,

Read more ›

01: Scala Functional Programming basics – pure functions, referential transparency, side effects, etc

Q1. What is a pure function?
A1. A pure function is a function where the following conditions are met:

1) The Input solely determines the output.

2) The function does not change its input.

3) The function does not do anything else except computing the output.
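The three conditions can be sketched with a pure function next to an impure counterpart. This is illustrated in Python rather than Scala, and the function names (`scaled`, `scale_in_place`) and the `call_log` list are hypothetical, chosen only for this example.

```python
def scaled(values, factor):
    """Pure: the output depends only on the inputs, the input list is
    left untouched, and nothing else happens besides computing the result."""
    return [v * factor for v in values]


call_log = []  # external state used by the impure version


def scale_in_place(values, factor):
    """Impure: mutates its input (breaks condition 2) and writes to
    external state (breaks condition 3)."""
    for i in range(len(values)):
        values[i] *= factor
    call_log.append(factor)
    return values


data = [1, 2, 3]

# Pure: calling twice with the same input gives the same output.
pure_first = scaled(data, 2)
pure_second = scaled(data, 2)
print(pure_first, pure_second, data)   # [2, 4, 6] [2, 4, 6] [1, 2, 3]

# Impure: the same call gives different results because the input changed.
impure_first = list(scale_in_place(data, 2))
impure_second = list(scale_in_place(data, 2))
print(impure_first, impure_second)     # [2, 4, 6] [4, 8, 12]
print(call_log)                        # [2, 2]
```

Because `scaled(data, 2)` always evaluates to the same value, any call to it can be replaced by its result, which is what referential transparency means.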

Read more ›

02: Cleansing & pre-processing data in BigData & machine learning with Spark interview Q&As

Q1. Why are data cleansing & pre-processing important in analytics & machine learning? A1. Garbage in gets you garbage out, no matter how good your machine learning algorithm is. Q2. What are the general steps of cleansing data? A2. … Read more ›

02: Coding Scala Way – Recursion & Iterator in FP

This extends Coding Scala Way – Part 1. Example #4: FP using both recursion and a functional combinator like foldLeft. Can you rewrite the following Java code the Scala way? Java coding question on recursion and generics. 1. Define the Trait. 2.… Read more ›

02: Python comprehensions tutorial

Q. What is a comprehension?
A. Comprehensions are constructs that allow sequences to be built from other sequences. Python 2.0 introduced list comprehensions, Python 2.4 added generator expressions, and Python 3.0 added dictionary and set comprehensions (later backported to Python 2.7).

List Comprehension

Set Comprehension

Given a list return a set.
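A short sketch of the two comprehension forms named above. The `words` list is sample data made up for this illustration.

```python
words = ["java", "scala", "python", "java", "scala"]

# List comprehension: square brackets, result keeps order and duplicates.
upper_list = [w.upper() for w in words]

# Set comprehension: same expression in curly braces, duplicates are dropped.
upper_set = {w.upper() for w in words}

print(upper_list)          # ['JAVA', 'SCALA', 'PYTHON', 'JAVA', 'SCALA']
print(sorted(upper_set))   # ['JAVA', 'PYTHON', 'SCALA']
```

A set has no guaranteed order, so the example sorts it before printing to get a stable result.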

Read more ›

02: Q7 – Q15 Hadoop overview & architecture interview Q&As

This extends Q1 – Q6 Hadoop Overview & Architecture interview Q&As. Q7. What are the major machine roles in a Hadoop cluster? A7. The three major categories of machine roles in a Hadoop cluster are 1) Client machines. … Read more ›

