Modern data architectures typically include both Data Lakes and Data Warehouses. Data engineers build the data pipelines that data analysts and data scientists rely on to build business reports & models to analyse the data.
Q01. What is a Data Lake?
A01.
…
There are a number of technologies to ingest & run analytical queries over Big Data (i.e. large volumes of data). Big Data is used in Business Intelligence (BI) reporting, Data Science, Machine Learning, and Artificial Intelligence (AI). Processing a large volume of data is intensive on disk I/O,
…
Q1. What is the Lambda Architecture?
A1. It is a data-processing architecture designed to handle Big Data by using both real-time streaming (e.g. Spark Streaming, Apache Storm) and batch processing (e.g. Hive, Pig, Spark batch). This means you have to build 2 separate pipelines. …
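The two pipelines in A1 can be illustrated with a minimal, in-memory sketch. It is not tied to Spark, Storm, or Hive: the function and class names (`batch_view`, `SpeedView`, `query`) and the sample page-view events are hypothetical, and the step that merges both results at query time is an assumption about how the pipelines are usually combined, not something stated in the excerpt above.

```python
from collections import Counter


def batch_view(events):
    """Batch pipeline: recompute page counts from the full historical data set."""
    return Counter(event["page"] for event in events)


class SpeedView:
    """Streaming pipeline: count events one at a time as they arrive."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["page"]] += 1


def query(page, batch, speed):
    """Combine the batch result with the not-yet-batched streaming result."""
    return batch[page] + speed.counts[page]


# Hypothetical sample data: events already processed by the last batch run,
# and events that arrived after it.
historical = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
recent = [{"page": "home"}, {"page": "checkout"}]

batch = batch_view(historical)
speed = SpeedView()
for event in recent:
    speed.update(event)

print(query("home", batch, speed))
```

The same counting logic appears twice, once per pipeline, which is the maintenance cost the answer refers to when it says 2 separate pipelines have to be built.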
This extends the Q1 – Q6 Hadoop Overview & Architecture interview Q&As.
Q7. What are the major machine roles in a Hadoop cluster?
A7. The three major categories of machine roles in a Hadoop cluster are 1) Client machines. …
Q71. Can ETL in traditional data management (e.g. RDBMS) be migrated to an EDH (i.e. Enterprise Data Hub) powered by the Hadoop ecosystem?
A71. Yes, it can be migrated, but it is not a direct & straightforward migration as there is a mismatch in underpinning concepts & …
A distributed system consists of multiple software components that run on multiple computers (aka nodes) but operate as a single system. These components can be stateful, stateless, or serverless, and they can be written in different languages and run in hybrid environments, building on open-source technologies, open standards, and interoperability.
…
Q114. What does CAP stand for in CAP theorem?
A114. In a distributed system having two or more nodes, and maintaining two or more copies of your data for fault tolerance, the CAP theorem can be depicted & explained as below:
Consistency –
…