03: Spark on Zeppelin – DataFrame Operations in Scala

Prerequisite: Docker is installed on your machine, on either Mac OS X (e.g. $ brew cask install docker; newer Homebrew releases use $ brew install --cask docker) or Windows 10.

This tutorial extends the earlier tutorials "Apache Zeppelin on Docker Tutorial – Docker pull from Docker hub" and "Spark stand-alone to read a file from local file system".

1. Print the schema of the DataFrame
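
A minimal sketch of this step as a Zeppelin paragraph. The sample rows are hypothetical stand-ins for the employee file loaded in the earlier tutorial; only the DataFrame name dfEmployees and the column names come from this post.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption, not the tutorial's actual file).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// Prints the column names, types and nullability as a tree.
// Columns built from Scala primitives report nullable = false;
// columns read from a file typically report nullable = true.
dfEmployees.printSchema()
```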

root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- location: string (nullable = true)
 |-- salary: double (nullable = true)

2. Show contents of a DataFrame
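
One way to display the rows is show(), sketched here on hypothetical sample data (the rows below are an assumption, not the tutorial's file).

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

dfEmployees.show()                    // up to 20 rows, long values truncated
dfEmployees.show(3, truncate = false) // first 3 rows, values printed in full
```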

3. Count the number of rows in a DataFrame
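
A short sketch of counting rows with count(), which returns a Long. The six sample rows are hypothetical and chosen only so the sketch is runnable on its own.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// count() is an action: it triggers a Spark job and returns the row count.
val rowCount: Long = dfEmployees.count()
println(rowCount)
```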

res12: Long = 6

4. Add a new column to a DataFrame
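
Columns are usually added with withColumn. The sketch below derives a hypothetical "bonus" column as 10% of salary; both the column name and the sample rows are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// withColumn returns a new DataFrame; dfEmployees itself is unchanged.
val dfWithBonus = dfEmployees.withColumn("bonus", col("salary") * 0.1)
dfWithBonus.show()
```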

You can drop a column with dfEmployees.drop("location").show(). Like the other operations here, drop returns a new DataFrame and leaves dfEmployees unchanged.

5. Select a few columns from a DataFrame
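
A sketch of column selection with select, first by name and then with a derived expression. The "monthly_salary" alias and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// Select by column name.
val dfNameSalary = dfEmployees.select("name", "salary")
dfNameSalary.show()

// Select with Column expressions; $"..." comes from spark.implicits._
dfEmployees.select($"name", ($"salary" / 12).as("monthly_salary")).show()
```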

6. Distinct values
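
Distinct values of a column can be obtained by combining select and distinct, as in this sketch on hypothetical sample rows. dropDuplicates is shown as a related option that keeps whole rows.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// Unique values of a single column.
val dfLocations = dfEmployees.select("location").distinct()
dfLocations.show()

// Keeps one full row per distinct location (which row is kept is not guaranteed).
val dfOnePerLocation = dfEmployees.dropDuplicates("location")
dfOnePerLocation.show()
```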

7. Sorting
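
A sketch of sorting with orderBy, here by salary descending and then by name ascending to break ties. The sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// Highest salary first; equal salaries are ordered alphabetically by name.
val dfSorted = dfEmployees.orderBy(desc("salary"), asc("name"))
dfSorted.show()
```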

8. Applying SQL queries
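
To query a DataFrame with SQL, it can first be registered as a temporary view. In this sketch the view name "employees", the query and the sample rows are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// The view lives only for the duration of this Spark session.
dfEmployees.createOrReplaceTempView("employees")

val dfHighEarners = spark.sql(
  "SELECT name, salary FROM employees WHERE salary > 50000 ORDER BY salary DESC")
dfHighEarners.show()
```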

9. Filtering by a predicate
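
A sketch of filtering with filter (where is an alias), once with Column expressions and once with a SQL-style string predicate. The thresholds and sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// Column-expression predicate; note === (not ==) for column equality.
val dfUsaHigh = dfEmployees.filter($"location" === "USA" && $"salary" >= 60000)
dfUsaHigh.show()

// The same idea expressed as a SQL predicate string.
val dfAustralia = dfEmployees.where("location = 'Australia'")
dfAustralia.show()
```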

10. Grouping & aggregation
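
Grouping is typically done with groupBy followed by agg. The sketch below computes a head count, average salary and maximum salary per location; the output column aliases and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, count, max}

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// One output row per location with three aggregates.
val dfByLocation = dfEmployees
  .groupBy("location")
  .agg(
    count("*").as("employees"),
    avg("salary").as("avg_salary"),
    max("salary").as("max_salary"))
dfByLocation.show()
```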

11. Map operations on DataFrame columns

We can apply a function to each row of a DataFrame using the map operation.
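
As a sketch of this, the paragraph below upper-cases each name and applies a hypothetical 10% raise to each salary. The transformation, the "raised_salary" column name and the sample rows are assumptions; the encoder for the resulting tuples comes from spark.implicits._.

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// map runs the function once per Row and yields a Dataset of tuples,
// which toDF turns back into a DataFrame with named columns.
val dfMapped = dfEmployees
  .map(row => (row.getAs[String]("name").toUpperCase, row.getAs[Double]("salary") * 1.1))
  .toDF("name", "raised_salary")
dfMapped.show()
```

For simple column-wise changes like this one, withColumn with built-in functions is often preferable, since map forces each row to be deserialized into JVM objects.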

12. Get some stats on your data
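
Basic summary statistics are available through describe, which reports count, mean, stddev, min and max for the requested columns. A sketch on hypothetical sample rows:

```scala
import org.apache.spark.sql.SparkSession

// Zeppelin already provides `spark`; getOrCreate() simply reuses that session.
val spark = SparkSession.builder().master("local[*]").appName("df-ops").getOrCreate()
import spark.implicits._

// Hypothetical sample rows (assumption).
val dfEmployees = Seq(
  (1, "John", "USA", 35000.0),
  (2, "Peter", "Australia", 45000.0),
  (3, "Sam", "Australia", 55000.0),
  (4, "Susan", "USA", 60000.0),
  (5, "David", "USA", 65000.0),
  (6, "Elliot", "USA", 65000.0)
).toDF("id", "name", "location", "salary")

// The result is itself a DataFrame: a "summary" column plus one
// string-typed column per described input column.
val dfStats = dfEmployees.describe("salary")
dfStats.show()
```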
