Blog Archives

12: Q92 – Q97 Hadoop file formats and how to choose

Q92. What are the criteria for choosing storage file formats in Hadoop? A92. Choosing the wrong file format can significantly increase query times and storage space. Choosing a format that does not support flexible schema evolution may require massive re-processing just to add a new field. Here the key…...
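
As a rough illustration of why schema evolution matters, the sketch below reads records written under an older schema through a newer reader schema that supplies defaults for fields added later, so the stored data does not have to be rewritten. It is a toy model in plain Python, loosely in the spirit of reader-schema defaults in formats such as Avro. The `resolve` helper, the `READER_SCHEMA` dictionary, the field names, and the default values are all hypothetical.

```python
# Toy model of schema evolution: old records are read through a newer
# "reader schema" that supplies defaults for fields added later.
# All names here (resolve, READER_SCHEMA, the fields) are hypothetical.

READER_SCHEMA = {
    "user_id": None,       # present since the first version, no default
    "event": None,         # present since the first version, no default
    "country": "unknown",  # added later, so it carries a default
}


def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a stored record onto the reader schema."""
    resolved = {}
    for field, default in reader_schema.items():
        if field in record:
            # The writer stored this field, so use the stored value.
            resolved[field] = record[field]
        elif default is not None:
            # The field was added after this record was written.
            resolved[field] = default
        else:
            raise ValueError(f"missing required field: {field}")
    return resolved


old_record = {"user_id": 42, "event": "login"}
new_record = {"user_id": 43, "event": "logout", "country": "AU"}

resolved_old = resolve(old_record, READER_SCHEMA)
resolved_new = resolve(new_record, READER_SCHEMA)
```

With a format that cannot resolve old data this way, adding `country` would mean rewriting every existing record, which is the re-processing cost the answer refers to.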

Members Only Content

This content is for the members with any one of the following paid subscriptions:

30-Day-Java-JEE-Career-Companion, 90-Day-Java-JEE-Career-Companion, 180-Day-Java-JEE-Career-Companion, 365-Day-Java-JEE-Career-Companion and 2-Year-Java-JEE-Career-Companion
Posted in Hadoop, Spark & BigData Q&As, member-paid

11: Q88 – Q91 Read-Write Vs Append-Only File Systems

Q88. How will you modify a portion of an HDFS file? A88. HDFS is an “append-only” file system. The most common use case of Hadoop data ingestion is to append new sets of event-based and/or sub-transactional data. Large data processing applications are typically built around the premise that things…...

Posted in Hadoop, Spark & BigData Q&As, member-paid

10: ♥ Q80 – Q87 HBase Schema Design Interview Questions & Answers

Q80. Why is schema design for HBase different from relational database design? A80. HBase is a columnar NoSQL database. This means no two rows in a table need to have the same columns. In a columnar database table, each row contains cells containing

Values are stored against each

Posted in Hadoop, Spark & BigData Q&As, NoSQL

09: ♦ Q76 – Q79 BigData Partitioning, Bucketing & De-normalization Q&As

This extends Q71 – Q75 ETL or ELT on Hadoop Eco System Interview Q&As. Q76. What are the 3 key considerations in processing data in HDFS compared to traditional RDBMS or EDW (i.e. Enterprise Data Warehouse)? A76. 1) partitioning, 2) bucketing, and 3) denormalization. Q77. Can you explain the importance of…...
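
To make the first two terms concrete, here is a minimal sketch of how a record could be routed to a partition directory and then to a bucket within it. It assumes a Hive-style `column=value` directory layout and uses CRC32 as a stand-in hash; real engines use their own hash functions and file naming, and the function names, the `event_date` column, and the paths are hypothetical. Denormalization is not covered here.

```python
import zlib


def partition_path(table_root: str, event_date: str) -> str:
    """Partitioning: one directory per distinct value of the partition column."""
    return f"{table_root}/event_date={event_date}"


def bucket_for(key: str, num_buckets: int) -> int:
    """Bucketing: hash the key into a fixed number of buckets.

    CRC32 is only a deterministic stand-in for an engine's hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def file_path(table_root: str, event_date: str, key: str, num_buckets: int) -> str:
    """Combine both: pick the partition directory, then the bucket file."""
    bucket = bucket_for(key, num_buckets)
    return f"{partition_path(table_root, event_date)}/bucket_{bucket:05d}"


# Hypothetical table root, date, and customer key.
path = file_path("/data/events", "2020-01-01", "customer-17", num_buckets=8)
```

A query that filters on `event_date` only needs to read one directory, and rows with the same key always land in the same bucket, which is what makes bucketed joins and sampling cheaper.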

Posted in Hadoop, Spark & BigData Q&As, member-paid

08: ♦ Q71 – Q75 ETL/ELT on BigData Interview Q&As

Q71. Can ETL in traditional data management (e.g. RDBMS) be migrated to an EDH (i.e. Enterprise Data Hub) powered by the Hadoop ecosystem? A71. Yes, it can be migrated, but it is not a direct, straightforward migration, as there is a mismatch in underpinning concepts & technologies between RDBMS…...

Posted in Hadoop, Spark & BigData Q&As, member-paid

13: Q98 – Q104 Hive Basics Interview Q&As and Tutorial

Q98. What is Hive? A98. Hive is used for accessing and analyzing data in Hadoop using a SQL-like syntax known as HiveQL. Q99. What is the difference between Hive internal tables & external tables? A99. When you drop an internal table, it drops the data, and it also…...

Posted in Hadoop, Spark & BigData Q&As, Hive Tutorial & Q&As, member-paid

05: Q37-Q41 – Data lake & metadata interview questions & answers

Q37. What is a Data Lake? A37. A data lake is a storage repository that holds a vast amount of structured, semi-structured, and unstructured raw data in its native format (aka pristine condition). The data structure and requirements are not defined until the data is needed. You can also call…...

Posted in Hadoop, Spark & BigData Q&As, member-paid

07: Q62 – Q70 HDFS blocks Vs. splits & Spark partitions Interview Questions & Answers

Q62. Can you explain the difference between HDFS blocks and input splits? A62. A block is a physical representation of data, and a split is a logical division of your data or records. For example, an input split might split a large text file at the end of a record…...
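
The physical versus logical distinction can be sketched with a toy model: blocks cut the bytes at fixed offsets regardless of content, while splits keep each record whole. This is not Hadoop's actual split computation. The rule used here (a record belongs to the split in which its first byte falls), the function names, and the tiny block size are assumptions made for illustration.

```python
def physical_blocks(data: bytes, block_size: int) -> list:
    """Cut the raw bytes at fixed offsets, ignoring record boundaries."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def logical_splits(data: bytes, block_size: int) -> list:
    """Group whole newline-terminated records into splits.

    Toy rule: a record goes to the split whose byte range contains the
    record's first byte, so no record is ever cut in half.
    """
    num_splits = max(1, -(-len(data) // block_size))  # ceiling division
    splits = [[] for _ in range(num_splits)]
    offset = 0
    for record in data.splitlines(keepends=True):
        splits[offset // block_size].append(record)
        offset += len(record)
    return splits


# Three records of 8, 8 and 10 bytes, with a deliberately tiny block size.
data = b"alpha,1\nbravo,2\ncharlie,3\n"
blocks = physical_blocks(data, block_size=10)
splits = logical_splits(data, block_size=10)
```

Here the first block ends in the middle of the second record (`b"alpha,1\nbr"`), whereas the first split holds the first two records intact. A split can therefore need bytes from more than one block, and a split can also end up empty, as the last one does in this example.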

Posted in Hadoop, Spark & BigData Q&As, member-paid