01: Databricks getting started – Spark, Shell, SQL

Step 1: Sign up for Databricks Community Edition at https://databricks.com/try-databricks. Fill in the details (you can leave the mobile number blank), then select “COMMUNITY EDITION” ==> “GET STARTED”.

[Screenshot: Databricks getting started]

If you already have a cloud-based Databricks account, you can use that instead.

Step 2: Check your email, click the link in it, and reset your password.

Step 3: Log in to the Databricks workspace:

[Screenshot: Databricks getting started dashboard]

Step 4: Create a cluster. It takes a few minutes to come up, and it will shut down after 2 hours.

[Screenshot: Databricks create a cluster]

Step 5: Select “DATA” and upload a file named “employee.csv”. It is stored as “/FileStore/tables/employee.csv”.

[Screenshot: Databricks upload a .csv]

Step 6: Select “Create Table With UI” as shown below:

Note: Tick the “First row is header” checkbox on the LHS so that the column names are taken from the first row of the file.

[Screenshot: Databricks create a table with .csv]

Click “Create Table”.

Step 7: Click the “databricks” icon in the LHS menu, then select “Create a Blank Notebook”.

[Screenshot: Databricks blank notebook]

Spark in Python (i.e. PySpark)

Since we created the notebook with “python” as its language, we don’t need “%python”; Python is the default. To use Scala instead, add “%scala” as the first line of a cell.

Step 8: Run the PySpark code below to display the uploaded “/FileStore/tables/employee.csv” file as a DataFrame.

Click “Run” in the top RHS menu.

[Screenshot: Databricks notebook, simple DataFrame]

Use the “Down Arrow” on the RHS of a cell to create a new cell.

Step 9: In a separate cell, add a new column to the DataFrame, then run the cell.

[Screenshot: Databricks notebook, add new column to DataFrame]

Run a shell command

You can run a shell command by putting “%sh” as the first line of a cell, as shown below:

You can also click the “+” between cells to add a new cell.

[Screenshot: Databricks shell command]
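A “%sh” cell runs the command text as-is, for example “ls -l /tmp”. If you prefer to stay in a Python cell, the standard-library subprocess module does the same job. A minimal sketch, with /tmp used only as an example directory:

```python
import subprocess

# Run "ls -l /tmp" and capture its output as text.
result = subprocess.run(
    ["ls", "-l", "/tmp"],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the command exits non-zero
)
print(result.stdout)
```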

Run a SQL command

[Screenshot: Databricks SQL command]

Spark in Scala

[Screenshot: Databricks Spark in Scala]

Important: PySpark API

Keep the PySpark API documentation (PySpark modules) handy while you code. Click “DataFrame” to see which functions are available, for example, the withColumn function of DataFrame.

[Screenshot: Databricks PySpark modules & API]

Where are my notebooks saved?

Your notebooks are saved under “workspace” ==> “users” ==> “[your username]”.

What if you want to practice in Scala?

You can try the examples from “Tutorials – Spark Scala on Zeppelin” with some minor changes.
