Blog Archives

01: Databricks getting started – Spark, Shell, SQL


Step 1:
Sign up for the Databricks Community Edition – https://databricks.com/try-databricks. Fill in the details (you can leave your mobile number blank), select “COMMUNITY EDITION”, and then click “GET STARTED”.

If you already have a cloud account, you can use that instead.

Read more ›



02: Databricks – Spark schemas, casting & PySpark API

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

Q: What is a Dataframe?
A: A DataFrame is a data abstraction or a domain-specific language (DSL) for working with structured and semi-structured data …

Read more ›
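To give a flavour of what the post covers, here is a minimal, hypothetical sketch of defining an explicit schema and casting a column in PySpark. It assumes a Databricks notebook where spark is predefined; the data and column names are made up and are not taken from the post.

# Hypothetical sketch: explicit schema + cast (column names made up)
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col

schema = StructType([
    StructField("emp_name", StringType(), True),
    StructField("emp_age", StringType(), True),   # arrives as a string
])

df = spark.createDataFrame([("Alice", "34"), ("Bob", "45")], schema)
df = df.withColumn("emp_age", col("emp_age").cast(IntegerType()))  # cast string -> int
df.printSchema()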



03: Databricks – Spark SCD Type 1

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is SCD Type 1?

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.

Read more ›
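As a taster, here is a minimal, hypothetical sketch of the SCD Type 1 idea in PySpark: the latest value simply overwrites the old one and no history is kept. The data and column names are made up; this is not the post's actual code.

# Hypothetical sketch of SCD Type 1: incoming rows overwrite matching keys
dim = spark.createDataFrame([(1, "Sydney"), (2, "Perth")], ["emp_id", "emp_city"])
updates = spark.createDataFrame([(2, "Melbourne")], ["emp_id", "emp_city"])

# keep existing rows with no incoming match, then add the incoming rows
scd1 = dim.join(updates, "emp_id", "left_anti").unionByName(updates)
scd1.show()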



04: Databricks – Spark SCD Type 2

Prerequisite: Extends 03: Databricks – Spark SCD Type 1. What is SCD Type 2? SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As. … Read more ›
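For a rough idea of the technique, here is a hypothetical PySpark sketch of SCD Type 2: history is preserved by expiring the current row and inserting a new one rather than overwriting in place. The table layout, column names and data are made up, not the post's.

# Hypothetical sketch of SCD Type 2: expire the old row, insert the new "current" row
from pyspark.sql import functions as F

dim = spark.createDataFrame(
    [(1, "Sydney", "2023-01-01", "9999-12-31", True)],
    ["emp_id", "emp_city", "start_date", "end_date", "is_current"])
updates = spark.createDataFrame([(1, "Melbourne")], ["emp_id", "emp_city"])

today = F.date_format(F.current_date(), "yyyy-MM-dd")

expired = (dim.join(updates.select("emp_id"), "emp_id", "inner")
              .withColumn("end_date", today)
              .withColumn("is_current", F.lit(False)))
unchanged = dim.join(updates.select("emp_id"), "emp_id", "left_anti")
new_rows = (updates
            .withColumn("start_date", today)
            .withColumn("end_date", F.lit("9999-12-31"))
            .withColumn("is_current", F.lit(True)))

scd2 = unchanged.unionByName(expired).unionByName(new_rows)
scd2.show()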



04a: Databricks – Spark SCD Type 1 with Merge

Prerequisite: Extends 03: Databricks – Spark SCD Type 1. What is SCD Type 1? SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As. … Read more ›
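As a hypothetical sketch of the merge-based approach (not the post's actual code), an SCD Type 1 upsert can be expressed as a Delta Lake MERGE. This assumes dim_employee already exists as a Delta table and employee_updates is a table or temp view; both names and columns are made up.

# Hypothetical sketch: SCD Type 1 via Delta Lake MERGE (names made up)
spark.sql("""
  MERGE INTO dim_employee AS t
  USING employee_updates AS s
  ON t.emp_id = s.emp_id
  WHEN MATCHED THEN UPDATE SET t.emp_city = s.emp_city
  WHEN NOT MATCHED THEN INSERT (emp_id, emp_city) VALUES (s.emp_id, s.emp_city)
""")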



04b: Databricks – Spark SCD Type 2 with Merge

Prerequisite: … Read more ›



05: Databricks – Spark UDFs

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is a UDF?

User-Defined Functions (UDFs) are a feature of Spark SQL for defining new Column-based functions that extend the vocabulary of Spark SQL’s DSL for transforming Datasets.

Read more ›
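Here is a minimal, hypothetical PySpark UDF to illustrate the idea (the function and column names are made up; built-in functions are generally preferred where available, since UDFs are opaque to the optimiser).

# Hypothetical example of a PySpark UDF (names made up)
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def title_case(s):
    # simple Python logic wrapped as a Column-based function
    return s.title() if s is not None else None

df = spark.createDataFrame([("john smith",), ("jane doe",)], ["emp_name"])
df.withColumn("emp_name", title_case(col("emp_name"))).show()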



06: Databricks – Spark Window functions

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL. What is a window function? Q. What are the different types of functions in Spark SQL? A. There are 4 types of functions: 1) Built-in functions from org.apache.spark.sql. … Read more ›
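For illustration, here is a hypothetical window-function sketch in PySpark: rank employees by salary within each city. The data and column names are made up and are not taken from the post.

# Hypothetical window function example: rank by salary within each city
from pyspark.sql.window import Window
from pyspark.sql.functions import rank, col

df = spark.createDataFrame(
    [("Melbourne", "Alice", 90000), ("Melbourne", "Bob", 80000), ("Sydney", "Carol", 95000)],
    ["emp_city", "emp_name", "salary"])

w = Window.partitionBy("emp_city").orderBy(col("salary").desc())
df.withColumn("salary_rank", rank().over(w)).show()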



07: Databricks – groupBy, collect_list & explode

Prerequisite: Extends Databricks – Spark Window functions. Step 1: Create a new Python notebook, and attach it to a cluster. Step 2: Let’s create some data using PySpark. Output: agg() & … Read more ›
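As a rough sketch of the round trip the title describes: groupBy with collect_list rolls rows up into an array, and explode flattens the array back into rows. The data and column names below are made up, not the post's.

# Hypothetical sketch: groupBy + collect_list, then explode back to rows
from pyspark.sql.functions import collect_list, explode

df = spark.createDataFrame(
    [("Melbourne", "Alice"), ("Melbourne", "Bob"), ("Sydney", "Carol")],
    ["emp_city", "emp_name"])

grouped = df.groupBy("emp_city").agg(collect_list("emp_name").alias("emp_names"))
grouped.show(truncate=False)

grouped.select("emp_city", explode("emp_names").alias("emp_name")).show()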



08: Databricks – Spark problem 1

Prerequisite: Extends Databricks – Spark Window functions. Problem: Convert the below table to one where each column value is counted for its occurrence. For example, the emp_city column values “Melbourne” & “Sydney” each occur twice. … Read more ›
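One hypothetical way to count how often each value occurs in each column is sketched below; it is not necessarily the approach the post takes, and the data and column names are made up.

# Hypothetical sketch: per-column value occurrence counts
from pyspark.sql.functions import count

df = spark.createDataFrame(
    [("Alice", "Melbourne"), ("Bob", "Melbourne"), ("Carol", "Sydney"), ("Dave", "Sydney")],
    ["emp_name", "emp_city"])

for c in df.columns:
    df.groupBy(c).agg(count("*").alias("occurrence")).show()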



09: Databricks – Spark Problem 2

Prerequisite: Extends Databricks – Spark problem 1. Problem: Convert the below table to … Step 1: Create a new Python notebook, and attach it to a new cluster. Step 2: Let’s create some data using PySpark. … Read more ›



10: Databricks – Spark ML – Linear Regression

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

You can try these tutorials in Scala using a Databricks notebook. Scala tutorials are also covered in Spark using Scala on Zeppelin Notebook.

Problem statement: Predict land prices from the land area in square feet, based on a given set of known prices.

Read more ›
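A minimal, hypothetical Spark ML sketch of the stated problem follows: fit land price against land area in square feet. The data points and column names are made up and are not the post's.

# Hypothetical sketch: single-feature linear regression with Spark ML
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

data = spark.createDataFrame(
    [(1000.0, 280000.0), (1500.0, 400000.0), (2000.0, 520000.0)],
    ["sqft", "price"])

assembler = VectorAssembler(inputCols=["sqft"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="price")
model = lr.fit(assembler.transform(data))

# predict the price of an unseen land size
new_land = assembler.transform(spark.createDataFrame([(1750.0,)], ["sqft"]))
model.transform(new_land).show()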



11: Databricks – Spark ML – Multivariate Linear Regression

Prerequisite: Extends Databricks – Spark ML – Linear Regression. Problem statement: Predict house prices from the land area in square feet, the number of bedrooms, and the age of the house. … Read more ›
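The multivariate case differs mainly in assembling several input columns into one feature vector before fitting, as in the hypothetical sketch below (the data and column names are made up, not the post's).

# Hypothetical sketch: multivariate linear regression (area, bedrooms, age -> price)
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

data = spark.createDataFrame(
    [(1200.0, 3, 10, 450000.0), (1800.0, 4, 5, 650000.0),
     (900.0, 2, 30, 300000.0), (1500.0, 3, 15, 520000.0),
     (2000.0, 5, 2, 780000.0)],
    ["sqft", "bedrooms", "age", "price"])

assembler = VectorAssembler(inputCols=["sqft", "bedrooms", "age"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="price")
model = lr.fit(assembler.transform(data))
print(model.coefficients, model.intercept)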



