Q01. Can you summarise the Spark ecosystem?
A01. Apache Spark is a general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python, and R. It has 6 …
Modern data architectures typically include both Data Lakes & Data Warehouses. Q1. What questions do you need to ask when choosing a Data Warehouse over a Data Lake for your …
There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (i.e. BI) reporting, Data …
Q1. What is the Lambda Architecture? A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing (e.g. …
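The idea can be sketched in plain Python, assuming a hypothetical page-view counting use case: a batch layer recomputes a complete view from the immutable master dataset, a speed layer incrementally counts events that arrived since the last batch run, and a serving layer merges both at query time. All names and data here are illustrative; a real system would use something like Spark for the batch layer and Spark Streaming or Storm for the speed layer.

```python
from collections import Counter

# Hypothetical master dataset: an append-only log of (user, page) events
master_dataset = [("alice", "/home"), ("bob", "/home"), ("alice", "/cart")]

def batch_view(events):
    """Batch layer: recompute page-view counts from scratch over all historical events."""
    return Counter(page for _, page in events)

class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run."""

    def __init__(self):
        self.recent = Counter()

    def on_event(self, event):
        _, page = event
        self.recent[page] += 1

    def reset(self):
        # Called once a newly computed batch view has absorbed these events
        self.recent.clear()

def query(page, batch, speed):
    """Serving layer: merge the complete-but-stale batch view with the real-time delta."""
    return batch.get(page, 0) + speed.recent.get(page, 0)

batch = batch_view(master_dataset)
speed = SpeedLayer()
for event in [("carol", "/home"), ("dave", "/cart")]:
    speed.on_event(event)

print(query("/home", batch, speed))  # 2 from the batch view + 1 real-time = 3
```

The trade-off this illustrates: the batch layer gives accurate results over all data but with latency, while the speed layer covers only the recent gap, so queries stay both complete and fresh.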
Q01. How is Big Data used in industries?
A01. The main goal for most organisations is to enhance customer experience, and consequently increase sales. The other goals include cost reduction, better …
Q1. Why are data cleansing & pre-processing important in analytics & machine learning? A1. Garbage in, garbage out, no matter how good your machine learning algorithm is. Q2. What …
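A few typical cleansing steps can be sketched in plain Python, assuming records arrive as dictionaries with hypothetical `id`, `age` and `city` fields: remove duplicates, drop rows with missing or implausible values, and normalise inconsistent text. The validation rules are illustrative only; in practice a library such as pandas or Spark DataFrames would usually do this work.

```python
def cleanse(records):
    """Return de-duplicated, validated and normalised copies of the input records."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # 1. Remove duplicates: skip any id that has already been accepted
        if rec.get("id") in seen_ids:
            continue
        # 2. Drop rows with missing or out-of-range values (hypothetical 0-120 age rule)
        age = rec.get("age")
        if age is None or not (0 <= age <= 120):
            continue
        # 3. Normalise inconsistent text: trim whitespace and fix letter case
        city = (rec.get("city") or "").strip().title()
        if not city:
            continue
        seen_ids.add(rec["id"])
        cleaned.append({"id": rec["id"], "age": age, "city": city})
    return cleaned

raw = [
    {"id": 1, "age": 34, "city": "  sydney "},
    {"id": 1, "age": 34, "city": "Sydney"},      # duplicate id
    {"id": 2, "age": None, "city": "Perth"},     # missing age
    {"id": 3, "age": 250, "city": "Melbourne"},  # implausible age
    {"id": 4, "age": 41, "city": "MELBOURNE"},   # inconsistent case
]
print(cleanse(raw))
```

Only records 1 and 4 survive, with their city names normalised to "Sydney" and "Melbourne".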
This extends Q1 – Q6 Hadoop Overview & Architecture interview Q&As. Q7. What are the major machine roles in a Hadoop cluster? A7. The three major categories of machine roles in …
This extends 02: Hadoop overview & architecture interview Q&As. Q16. What is MapReduce (i.e. MR)? A16. MapReduce is a parallel programming model used for processing large datasets across 10 to 1000 …
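The map, shuffle and reduce phases can be illustrated with the classic word-count example, simulated in a single Python process. This is only a sketch of the programming model: Hadoop would run many mappers and reducers in parallel across the cluster and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values emitted for the same key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: aggregate the grouped values for one key."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```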
Q01. What is a gradient? A01. In algebra we can represent a straight line with: $y = mx + c$. A parabola is represented as: $y = m_1x^2 + m_2x + \dots$
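The gradient of a straight line is the constant slope m, whereas a parabola's gradient changes with x: differentiating the quadratic gives 2·m1·x + m2 (a constant term does not affect it). A quick numerical check in Python, using hypothetical coefficients and a central finite difference to approximate dy/dx:

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient dy/dx of f at x with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical coefficients, chosen only for illustration
M, C = 3.0, 1.0               # straight line: y = M*x + C
M1, M2, C0 = 2.0, -4.0, 5.0   # parabola: y = M1*x^2 + M2*x + C0

def line(x):
    return M * x + C

def parabola(x):
    return M1 * x ** 2 + M2 * x + C0

for x in (0.0, 1.0, 2.0):
    # The line's gradient is M at every x; the parabola's is 2*M1*x + M2, so it varies with x
    print(f"x={x}: line {numerical_gradient(line, x):.4f}, "
          f"parabola {numerical_gradient(parabola, x):.4f} (exact {2 * M1 * x + M2})")
```

The parabola's gradient is zero at x = 1, its minimum. This "follow the gradient downhill" idea is what gradient-based optimisers in machine learning rely on.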
Q01. What do you understand by the terms mean, variance, and standard deviation of the sample vs. the population? A01. Given that the following are the number of job applications sent …
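The key difference is the divisor. Population variance is σ² = Σ(x − μ)² / N. Sample variance is s² = Σ(x − x̄)² / (n − 1), where dividing by n − 1 (Bessel's correction) compensates for a sample underestimating the spread of the wider population. Python's standard `statistics` module exposes both forms. The weekly counts below are hypothetical, not the data from the original question.

```python
import statistics

# Hypothetical data: job applications sent per week by one candidate
applications = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(applications)

# Population: divide the sum of squared deviations by N
pop_var = statistics.pvariance(applications)
pop_sd = statistics.pstdev(applications)

# Sample: divide by n - 1 (Bessel's correction)
sample_var = statistics.variance(applications)
sample_sd = statistics.stdev(applications)

# The same sample variance computed by hand from the formula
manual_sample_var = sum((x - mean) ** 2 for x in applications) / (len(applications) - 1)

print(mean, pop_var, pop_sd, round(sample_var, 3), round(sample_sd, 3))
```

Here the squared deviations sum to 32, so the population variance is 32 / 8 = 4 (standard deviation 2), while the sample variance is 32 / 7 ≈ 4.571.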
Q1. What is an ETL process? A1. ETL is an architectural style, and it stands for Extract, Transform and Load. Extract is the process of reading data from an input data …
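A minimal ETL pipeline can be sketched with Python's standard library. It assumes a hypothetical CSV of sales with `region` and `amount` columns as the source and an in-memory SQLite database as the target. Extract reads the raw rows, Transform validates, normalises and aggregates them, and Load writes the result into a table. The table name `sales_by_region` is illustrative.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would be a file, API or database
SOURCE_CSV = """region,amount
north,100.50
south,200
north,49.50
south,not-a-number
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop invalid amounts, normalise region names, total per region."""
    totals = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail validation
        region = row["region"].strip().upper()
        totals[region] = totals.get(region, 0.0) + amount
    return sorted(totals.items())

def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())
```

Keeping the three steps as separate functions mirrors the ETL style: each stage can be tested, rerun or swapped (e.g. a different source or target) independently.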
Q1. How do you produce & interpret Linear Regression output? A1. Scatter plots can only reveal obvious relationships between variables through visual inspection of the graph, but we can use statistics to …
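Simple (one-variable) linear regression by ordinary least squares can be sketched in plain Python. The slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², the intercept is ȳ − slope·x̄, and R² = 1 − SS_res / SS_tot gives the fraction of the variance in y explained by x. The data are hypothetical. A full regression output (e.g. from R's `lm` or Python's statsmodels) would also report standard errors, t-statistics and p-values, which this sketch omits.

```python
def linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; also return R-squared."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical data: advertising spend (x) vs. sales (y)
spend = [1, 2, 3, 4, 5]
sales = [3, 6, 7, 10, 11]
slope, intercept, r2 = linear_regression(spend, sales)
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

Interpretation for this data: slope 2.0 means each extra unit of spend is associated with about 2 more units of sales. The intercept 1.4 is the predicted sales at zero spend. R² ≈ 0.971 means roughly 97% of the variation in sales is explained by spend.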
Q37. Where do you use Apache Flume in the Big Data world? A37. Apache Flume is used to ingest big data into HDFS. Big Data is generally ingested from 1) sporadic bulk loading processes, …