Blog Archives

01: Databricks getting started – Spark, Shell, SQL


Step 1:
Sign up for Databricks Community Edition at https://databricks.com/try-databricks. Fill in the details; you can leave your mobile number blank. Select “COMMUNITY EDITION” and then click “GET STARTED”.

If you already have a cloud account, you can use that instead.

Step 2: Check your email, click the link in it, and reset your password.…



02: Databricks – Spark schemas, casting & PySpark API

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

Q: What is a DataFrame?
A: A DataFrame is a data abstraction or a domain-specific language (DSL) for working with structured and semi-structured data, i.e. datasets with a schema. DataFrames are immutable, stored in memory, resilient (i.e. fault-tolerant), distributed (i.e.…



03: Databricks – Spark SCD Type 1

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is SCD Type 1

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.
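As a rough illustration of the idea (not the notebook code from the post): SCD Type 1 overwrites a changed attribute in place and keeps no history. The sketch below models that in plain Python rather than Spark, so it can be run anywhere. The `customer_id` key, the sample rows and the function name are all assumptions made up for the example.

```python
def scd_type1_upsert(dimension, updates, key="customer_id"):
    """Return a new dimension in which incoming rows overwrite existing rows by key."""
    merged = {row[key]: dict(row) for row in dimension}
    for row in updates:
        # Type 1: a matched row is overwritten, an unmatched row is inserted.
        # The previous values are lost, so no history is kept.
        merged[row[key]] = dict(row)
    return [merged[k] for k in sorted(merged)]


# Hypothetical sample data
dimension = [
    {"customer_id": 1, "name": "John", "city": "Sydney"},
    {"customer_id": 2, "name": "Peter", "city": "Melbourne"},
]
updates = [
    {"customer_id": 2, "name": "Peter", "city": "Brisbane"},  # changed city
    {"customer_id": 3, "name": "Sam", "city": "Perth"},       # new customer
]

result = scd_type1_upsert(dimension, updates)
for row in result:
    print(row)
```

After the upsert, customer 2 shows only the new city; the old value is no longer recoverable from the dimension, which is the defining trade-off of Type 1.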

Step 1: Remove all cells in the notebook with the “x” and then confirm or create a new Python notebook.…



04: Databricks – Spark SCD Type 2

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 2

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As….
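For contrast with Type 1, SCD Type 2 keeps history: when an attribute changes, the current row is closed off and a new version is appended. The following is a minimal plain-Python sketch of that behaviour, not the post's Spark implementation. The `start_date`, `end_date` and `is_current` columns are one common convention and are assumed here, as are the sample rows and the function name.

```python
from datetime import date


def scd_type2_apply(dimension, updates, load_date, key="customer_id"):
    """Expire changed current rows and append new versions, keeping history."""
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}
    for update in updates:
        existing = current.get(update[key])
        if existing is not None:
            if all(existing.get(col) == val for col, val in update.items()):
                continue  # nothing changed, keep the current version
            # Close the old version instead of overwriting it.
            existing["end_date"] = load_date
            existing["is_current"] = False
        new_row = {**update, "start_date": load_date,
                   "end_date": None, "is_current": True}
        result.append(new_row)
        current[update[key]] = new_row
    return result


# Hypothetical sample data
dimension = [
    {"customer_id": 1, "city": "Sydney",
     "start_date": date(2020, 1, 1), "end_date": None, "is_current": True},
]
updates = [
    {"customer_id": 1, "city": "Brisbane"},  # changed city
    {"customer_id": 2, "city": "Perth"},     # new customer
]

history = scd_type2_apply(dimension, updates, date(2021, 6, 1))
for row in history:
    print(row)
```

Customer 1 ends up with two rows: the expired Sydney version and the current Brisbane version.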



04a: Databricks – Spark SCD Type 1 with Merge

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 1

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As….



04b: Databricks – Spark SCD Type 2 with Merge

Prerequisite:…



05: Databricks – Spark UDFs

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is a UDF?

User-defined functions (UDFs) are a Spark SQL feature for defining new Column-based functions that extend the vocabulary of Spark SQL’s DSL for transforming Datasets.

Step 1: Create a new Notebook in Databricks, and choose Python as the language.…



06: Databricks – Spark Window functions

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is a window function?

Q. What are the different types of functions in Spark SQL?
A. There are 4 types…
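To give a feel for what sets a window function apart from a plain aggregate: every input row is kept, and each row gains a value computed over its partition. The sketch below imitates a `row_number()`-style ranking in plain Python, partitioned by department and ordered by salary. It is only an illustration of the semantics, not Spark code, and the column names and data are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def row_number_over(rows, partition_by, order_by, descending=False):
    """Add a 1-based 'row_number' to each row within its partition."""
    ordered = sorted(rows, key=itemgetter(order_by), reverse=descending)
    # A stable sort on the partition key keeps the ordering inside each partition.
    ordered.sort(key=itemgetter(partition_by))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        for position, row in enumerate(group, start=1):
            numbered.append({**row, "row_number": position})
    return numbered


# Hypothetical sample data
employees = [
    {"name": "Ann", "dept": "IT", "salary": 90},
    {"name": "Bob", "dept": "IT", "salary": 120},
    {"name": "Cid", "dept": "HR", "salary": 70},
    {"name": "Dee", "dept": "HR", "salary": 80},
]

ranked = row_number_over(employees, "dept", "salary", descending=True)
for row in ranked:
    print(row)
```

Note that four rows go in and four rows come out; a `groupBy` aggregate over `dept` would instead collapse them to one row per department.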



07: Databricks – groupBy, collect_list & explode

Prerequisite: Extends Databricks – Spark Window functions.

Step 1: Create a new Python notebook, and attach it to a cluster.

Step 2: Let’s create some data using pyspark.
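Before running this in a notebook, it may help to see what the three operations in the title do to the shape of the data. The plain-Python sketch below mimics the semantics only: grouping with a `collect_list`-style aggregate turns many rows into one row per key holding a list, and `explode` turns each list element back into its own row. The helper functions and sample data are hypothetical; in PySpark the real functions live in `pyspark.sql.functions`, and `collect_list` does not guarantee element order there.

```python
from collections import defaultdict


def group_collect_list(rows, key, value):
    """Group rows by `key` and gather each group's `value` entries into a list."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return [{key: k, value + "_list": v} for k, v in grouped.items()]


def explode(rows, list_column, output_column):
    """Emit one row per element of `list_column`, copying the other columns."""
    return [
        {**{c: v for c, v in row.items() if c != list_column}, output_column: item}
        for row in rows
        for item in row[list_column]
    ]


# Hypothetical sample data
orders = [
    {"customer": "A", "item": "pen"},
    {"customer": "B", "item": "ink"},
    {"customer": "A", "item": "pad"},
]

collected = group_collect_list(orders, "customer", "item")
exploded = explode(collected, "item_list", "item")
print(collected)
print(exploded)
```

The two steps are roughly inverse to each other: exploding the collected lists gives back the original rows, although not necessarily in the original order.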



08: Databricks – Spark problem 1

Prerequisite: Extends Databricks – Spark Window functions.

Problem: Convert the table below

to

Where each column is counted for its occurrence.…



