Blog Archives

01: Databricks getting started – PySpark, Shell, and SQL


Step 1:
Sign up for the Databricks Community Edition at https://databricks.com/try-databricks. Fill in the details; you can leave the mobile number blank. Select “COMMUNITY EDITION” => “GET STARTED”.

If you already have a cloud account, you can use that instead.

Read more ›

02: Databricks – Spark schemas, casting & PySpark API

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

Q: What is a DataFrame?
A: A DataFrame is a data abstraction or a domain-specific language (DSL) for working with structured and semi-structured data …

Read more ›

03: Databricks – Spark SCD Type 1

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is SCD Type 1?

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As.

Read more ›
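As a rough illustration of the idea behind SCD Type 1 (overwrite the old value, keep no history), here is a minimal plain-Python sketch. It is not the Spark code from the post; the function name `scd_type1_upsert`, the `id` key, and the sample rows are all assumptions made up for this example.

```python
def scd_type1_upsert(dimension, updates, key="id"):
    """Return a new dimension table in which incoming rows overwrite matching rows.

    SCD Type 1 keeps no history: a changed attribute simply replaces the old value,
    and rows with unseen keys are inserted.
    """
    merged = {row[key]: dict(row) for row in dimension}  # copy so the input is untouched
    for row in updates:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical sample data
current = [
    {"id": 1, "name": "Ann", "city": "Sydney"},
    {"id": 2, "name": "Bob", "city": "Melbourne"},
]
incoming = [
    {"id": 2, "name": "Bob", "city": "Brisbane"},  # changed city: overwritten in place
    {"id": 3, "name": "Cat", "city": "Perth"},     # new key: inserted
]

result = scd_type1_upsert(current, incoming)
for row in result:
    print(row)
```

After the upsert, Bob's old city is gone from the result; that loss of history is what distinguishes Type 1 from Type 2.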

04: Databricks – Spark SCD Type 2

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 2?

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As. …

Read more ›
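SCD Type 2 differs from Type 1 in that it keeps history: a changed row is closed off and a new current row is appended. The plain-Python sketch below illustrates that idea only; it is not the Spark code from the post, and the column names `start_date`, `end_date`, `is_current`, the `OPEN_END` sentinel, and the sample data are assumptions chosen for this example.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"


def scd_type2_apply(dimension, updates, load_date, key="id"):
    """Return a new dimension table with history preserved (SCD Type 2)."""
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current_by_key = {r[key]: r for r in result if r["is_current"]}
    for upd in updates:
        existing = current_by_key.get(upd[key])
        if existing is not None:
            if all(existing.get(k) == v for k, v in upd.items()):
                continue  # nothing changed, keep the current row as is
            # Close off the old version instead of overwriting it
            existing["end_date"] = load_date
            existing["is_current"] = False
        new_row = {**upd, "start_date": load_date, "end_date": OPEN_END, "is_current": True}
        result.append(new_row)
        current_by_key[upd[key]] = new_row
    return result


# Hypothetical sample data
dimension = [
    {"id": 1, "city": "Sydney", "start_date": date(2020, 1, 1),
     "end_date": OPEN_END, "is_current": True},
]
history = scd_type2_apply(dimension, [{"id": 1, "city": "Perth"}], date(2021, 6, 1))
for row in history:
    print(row)
```

The result holds two rows for `id` 1: the closed Sydney row and the current Perth row.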



04a: Databricks – Spark SCD Type 1 with Merge

Prerequisite: Extends 03: Databricks – Spark SCD Type 1.

What is SCD Type 1?

SCD stands for Slowly Changing Dimension, and it was explained in 10 Data warehouse interview Q&As. …

Read more ›



04b: Databricks – Spark SCD Type 2 with Merge

Prerequisite: … Read more ›...



05: Databricks – Spark UDFs

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is a UDF?

User-defined functions (UDFs) are a feature of Spark SQL for defining new Column-based functions that extend the vocabulary of Spark SQL’s DSL for transforming Datasets.

Read more ›



06: Databricks – Spark Window functions

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

What is a window function?

Q. What are the different types of functions in Spark SQL?
A. There are 4 types of functions: 1) Built-in functions: from org.apache.spark.sql. …

Read more ›



07: Databricks – groupBy, collect_list & explode

Prerequisite: Extends Databricks – Spark Window functions.

Step 1: Create a new Python notebook, and attach it to a cluster.

Step 2: Let’s create some data using PySpark.

agg( ) & …

Read more ›
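To give a feel for what grouping with collect_list and then explode do to the shape of the data, here is a plain-Python sketch of the two operations. It imitates the semantics on lists of dicts and is not PySpark code; the helper names `group_collect` and `explode`, the `_list` suffix, and the sample rows are assumptions for illustration. Note that in Spark the order of elements produced by collect_list is not guaranteed, whereas this sketch keeps input order.

```python
from collections import defaultdict


def group_collect(rows, key, value):
    """Group rows by `key` and gather each group's `value` entries into a list."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return [{key: k, value + "_list": v} for k, v in grouped.items()]


def explode(rows, list_col, out_col):
    """Emit one output row per element of `list_col` (rows with empty lists vanish)."""
    return [
        {**{k: v for k, v in row.items() if k != list_col}, out_col: item}
        for row in rows
        for item in row[list_col]
    ]


# Hypothetical sample data
rows = [
    {"dept": "IT", "name": "Ann"},
    {"dept": "HR", "name": "Bob"},
    {"dept": "IT", "name": "Cat"},
]

grouped = group_collect(rows, "dept", "name")
exploded = explode(grouped, "name_list", "name")
print(grouped)
print(exploded)
```

Grouping collapses three rows into two (one per department), and exploding the collected list restores one row per name.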



08: Databricks – Spark problem 1

Prerequisite: Extends Databricks – Spark Window functions.

Problem: Convert the below table

to

Where the values in each column are counted by occurrence. For example, the emp_city column values “Melbourne” …

Read more ›

09: Databricks – Spark Problem 2

Prerequisite: Extends Databricks – Spark problem 1.

Problem: Convert the below table

to

Step 1: Create a new Python notebook, and attach it to a new cluster. …

Read more ›



10: Databricks – Spark ML – Linear Regression

Prerequisite: Extends Databricks getting started – Spark, Shell, SQL.

You can also try these tutorials in Scala using a Databricks notebook. Scala versions are covered in Spark using Scala on Zeppelin Notebook.

Problem statement: Predict land prices from land area in square feet, based on a given set of known prices.

Read more ›
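The problem statement above is a single-feature linear regression. As a minimal sketch of the underlying arithmetic, the snippet below fits a line by ordinary least squares in plain Python. It does not use Spark ML, the helper name `fit_line` is an assumption, and the areas and prices are made-up numbers chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: price = 200 * area + 10000
areas = [1000.0, 1500.0, 2000.0, 2500.0]            # land area in square feet
prices = [210000.0, 310000.0, 410000.0, 510000.0]   # known prices

slope, intercept = fit_line(areas, prices)
predicted = slope * 1800.0 + intercept  # predicted price for 1800 sq ft
print(slope, intercept, predicted)
```

With real data the points will not sit exactly on a line, and the fitted slope and intercept are then the values that minimise the squared prediction error.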

11: Databricks – Spark ML – Multivariate Linear Regression

Prerequisite: Extends Databricks – Spark ML – Linear Regression.

Problem statement: Predict house prices from land area in square feet, number of bedrooms, and the age of the house. …

Read more ›



11A: Databricks – Spark ML – Pandas Dataframe & Matplotlib

Prerequisite: Extends 11: Databricks – Spark ML – Multivariate Linear Regression.

How do you convert a PySpark DataFrame to a Pandas DataFrame?

df.toPandas() converts a PySpark DataFrame to a Pandas DataFrame.

How to perform statistical data exploration? …

Read more ›



12: Databricks – Spark ML – Categorical Features

Prerequisite: Extends Databricks – Spark ML – Linear Regression.

Problem statement: Predict the house prices by land area in square feet, house condition as in “Bad”, “Average”, and “Good”, and house locality as in “Sydney”, …

Read more ›



13: Databricks – Spark ML – Dummy Variables

Prerequisite: Extends Databricks – Spark ML – Categorical Features – Linear Regression.

Problem statement: Predict the house prices by land area in square feet, house color as in “White”, “Grey”, and “Cream”, and house locality as in “Eastwood”, …

Read more ›



14: Databricks – Spark ML – StringIndexer & OneHotEncoder

Prerequisite: Extends Databricks – Spark ML – Linear Regression Categorical Features.

Problem statement: Predict the house prices by land area in square feet, house condition as in “Bad”, “Average”, and “Good”, and house locality as in “Sydney”, …

Read more ›



15: Databricks – Spark ML – Classification with Logistic Regression

Prerequisite: Extends Spark ML – StringIndexer & OneHotEncoder – LinearRegression.

Q. What is a Classification type prediction? How does it differ from Linear Regression?
A. Classification type predictions are 1) Is Email spam or not? …

Read more ›



16: Databricks – Spark ML Multiclass Logistic Regression & Pipeline

Prerequisite: Extends 15: Databricks – Spark ML – Classification with Logistic Regression & Databricks – Spark ML – StringIndexer & OneHotEncoder.

Problem statement: Predict the likelihood of leaning towards a political party based on age & …

Read more ›



17: Databricks – Spark ML K-Folds Cross Validation

Prerequisite: Extends Databricks – Spark ML Multiclass Logistic Regression & Pipeline.

Q: What is K-Folds Cross Validation in ML? How does it differ from the train-split validation?
A: Once we are done with training our model, …

Read more ›
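As a rough picture of what "K folds" means, the plain-Python sketch below splits row indices into k folds, where each fold serves once as the held-out set while the remaining rows are used for training. It is an illustration only and not the Spark ML API; the helper name `k_fold_indices` is an assumption, and it uses contiguous blocks for readability, whereas real cross-validation implementations typically assign rows to folds at random.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k folds; return a list of (train_idx, test_idx) pairs."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(10, 3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Every row is held out exactly once, so the k evaluation scores can be averaged. A single train/validation split, by contrast, scores the model on only one held-out portion.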


