Blog Archives

01: Q01 – Q07 General Big Data, Data Science & Data Analytics Interview Q&As

Q01. How is Big Data used in industries?
A01. The main goal for most organisations is to enhance customer experience and, consequently, increase sales. Other goals include cost reduction, better-targeted marketing, fraud detection, identifying data breaches to enhance security, making existing processes more efficient, and healthcare uses ranging from medical records to drug discovery and genetic disease exploration. The list goes on.

Q02. What do you understand by the terms personalization, next best offer, next best action, and recommendation engines?
A02. Big data processing and machine learning techniques can be used for customer personalization. By gathering historical data from all users …


02: Cleansing & pre-processing data in BigData & machine learning with Spark interview questions & answers

Q1. Why are data cleansing & pre-processing important in analytics & machine learning?
A1. Garbage in, garbage out: no matter how good your machine learning algorithm is, it cannot produce reliable results from poor-quality data.

Q2. What are the general steps of cleansing data?
A2. The general steps are deduplication, dropping or imputing missing values, fixing structural errors, removing outliers, encoding categorical values, and scaling the features.
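The steps listed above can be sketched in plain Python. This is only a minimal illustration, not the Spark approach the full post covers: the `rows` records, the `cleanse` function, and the choices of mean imputation, one-hot encoding and min-max scaling are all assumptions made for the example, and outlier removal is left out to keep it short.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "city": "Sydney", "age": 30},
    {"id": 1, "city": "Sydney", "age": 30},      # exact duplicate
    {"id": 2, "city": "sydney ", "age": None},   # structural error + missing value
    {"id": 3, "city": "Perth", "age": 40},
    {"id": 4, "city": "Perth", "age": 50},
]

def cleanse(records):
    # 1. Deduplicate, keeping the first occurrence of each identical record.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))

    # 2. Fix structural errors: stray whitespace and inconsistent casing.
    for r in unique:
        r["city"] = r["city"].strip().title()

    # 3. Impute missing ages with the mean of the observed ages.
    fill = mean(r["age"] for r in unique if r["age"] is not None)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill

    # 4. One-hot encode the categorical "city" column.
    cities = sorted({r["city"] for r in unique})
    for r in unique:
        for c in cities:
            r["city_" + c] = 1 if r["city"] == c else 0

    # 5. Min-max scale "age" into the range [0, 1].
    lo = min(r["age"] for r in unique)
    hi = max(r["age"] for r in unique)
    for r in unique:
        r["age_scaled"] = (r["age"] - lo) / (hi - lo) if hi > lo else 0.0

    return unique

cleaned = cleanse(rows)
for record in cleaned:
    print(record)
```

On a real dataset the order of the steps matters: the structural fix would usually run before deduplication as well, so that near-duplicates such as "sydney " and "Sydney" are also caught.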

Step 1: The first step in data cleansing is removing unwanted observations from your dataset. This means removing duplicate data (which requires deduplication) and irrelevant data (which does not fit the specific problem).

#1. Deduplication

Remove the …


03: Simple Linear Regression interview Q&As

Q01. What is a gradient?
A01. In algebra we can represent a straight line with: y = mx + c. A parabola is represented as: y = m1x² + m2x + c, and so on. The diagram depicts the parabola y = x². A gradient in maths is the slope…
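As a rough illustration of the slope idea for the parabola y = x², the gradient at a point can be approximated numerically. The helper name `numerical_gradient` and the step size `h` are assumptions for this sketch; it uses a central difference, which is one common approximation and not necessarily the method the full post describes.

```python
def f(x):
    # The parabola y = x^2 from the text.
    return x ** 2

def numerical_gradient(func, x, h=1e-6):
    # Central difference: slope of the line through two nearby points on the curve.
    return (func(x + h) - func(x - h)) / (2 * h)

# The slope of y = x^2 changes with x (analytically it is 2x),
# unlike a straight line y = mx + c, whose slope is m everywhere.
for x in (-2.0, 0.0, 3.0):
    print(x, round(numerical_gradient(f, x), 4))
```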

04: Residuals, Cost/Loss functions, R-squared & Gradient Descent interview Q&As

Q01. What do you understand by the terms mean, variance, and standard deviation of the sample vs. the population?
A01. Given that the following are the number of job applications sent by 6 individuals:

Where X is the Sample. Mean: To calculate the mean we add up the observed…
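The sample vs. population distinction can be sketched with Python's standard `statistics` module. The six values in `applications` are hypothetical and are not the figures from the original post; they only show how the two sets of formulas differ (dividing by n for a population, by n - 1 for a sample).

```python
import statistics

# Hypothetical numbers of job applications sent by 6 individuals.
applications = [3, 5, 2, 8, 6, 6]

sample_mean = statistics.mean(applications)

# Population formulas divide the sum of squared deviations by n.
population_variance = statistics.pvariance(applications)
population_std = statistics.pstdev(applications)

# Sample formulas divide by n - 1 (Bessel's correction).
sample_variance = statistics.variance(applications)
sample_std = statistics.stdev(applications)

print("mean:", sample_mean)
print("population variance / std:", population_variance, population_std)
print("sample variance / std:", sample_variance, sample_std)
```

The sample variance is always the larger of the two for the same data, because the same sum of squared deviations is divided by a smaller number.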

05: Linear regression outputs, null hypothesis, t-test & p-value interview Q&As

Q1. How do you produce & interpret Linear Regression output?
A1. Scatter plots can only detect obvious relationships between variables by looking at the graph, but we can use statistics to comment about the variable relationships as outlined below. The link 11A: Databricks – Spark ML – Pandas Dataframe &…
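To make the idea of "using statistics to comment about the relationship" concrete, here is a minimal sketch of the numbers a simple linear regression output usually reports: slope, intercept, R-squared, and the t-statistic for the null hypothesis that the slope is zero. The function name `simple_linear_regression` and the `x`, `y` data are hypothetical, and this is ordinary least squares in plain Python, not the Spark ML or Pandas code the post links to. The p-value is not computed, since it needs the t-distribution with n - 2 degrees of freedom, which a statistics library would normally supply.

```python
import math

def simple_linear_regression(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Ordinary least squares estimates for y = intercept + slope * x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual and total sums of squares give R-squared.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - sse / sst

    # Standard error of the slope and t-statistic for H0: slope = 0.
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "se_slope": se_slope,
        "t_stat": t_stat,
    }

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = simple_linear_regression(x, y)
for name, value in result.items():
    print(name, round(value, 4))
```

A large absolute t-statistic corresponds to a small p-value, which is evidence against the null hypothesis that the two variables are unrelated.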

What do data analysts, engineers & scientists do?

Like Data Analysts, Data Engineers & Data Scientists must have the know-how listed below.
