19: Docker Tutorial: Apache Spark SQL on Cloudera quickstart

This tutorial extends “Docker Tutorial: BigData on Cloudera quickstart via Docker”.

Step 1: Run the container from the command line.

Create a .csv file in HDFS

Step 2: Create a CSV file named “employee.csv” in the local file system.

Step 3: Copy this file to the HDFS file system.

Start spark-shell & create tables

Step 4: Start spark-shell.

Step 5: Create tables via Spark SQL.
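
A minimal sketch of Step 5, assuming the Spark 1.6 spark-shell that ships with the Cloudera quickstart image, where a Hive-enabled `sqlContext` is predefined (on Spark 2.x and later, `spark.sql` plays the same role). The table name `employee` and the columns `id`, `name` and `location` are assumptions for illustration; match them to the columns in your own employee.csv.

```scala
// Run inside spark-shell; `sqlContext` is predefined there (assumed Hive-enabled).
// Table name and columns are hypothetical; adjust them to your CSV layout.
sqlContext.sql(
  "CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, location STRING) " +
  "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")

// List the tables to confirm that the new one is registered.
sqlContext.sql("SHOW TABLES").show()
```

The `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` clause tells Hive to split each line of the file on commas, which is what lets a plain CSV file back the table.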

Load data

Step 6: Load data from HDFS into the table via Spark SQL.
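
Step 6 might look like the sketch below. It assumes a table named `employee` already exists and that the file was copied to `/user/cloudera/employee.csv`; that path is a placeholder, so substitute the HDFS location you used in Step 3.

```scala
// Run inside spark-shell. The HDFS path below is an assumed location;
// use the path you copied employee.csv to in Step 3.
val csvPath = "/user/cloudera/employee.csv"

// LOAD DATA INPATH (without LOCAL) moves the file from its HDFS location
// into the table's storage directory rather than copying it.
sqlContext.sql(s"LOAD DATA INPATH '$csvPath' INTO TABLE employee")
```

Because the file is moved, it no longer exists at the original HDFS path after the load; copy it to HDFS again if you need to repeat the step.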

Create a DataFrame

Step 7: Create a DataFrame via Spark SQL, and show the results.
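
As a sketch of Step 7, assuming a populated table named `employee` (a hypothetical name) and the predefined `sqlContext` of spark-shell:

```scala
// sqlContext.sql returns a DataFrame; the query only runs when an
// action such as show() is called.
val df = sqlContext.sql("SELECT * FROM employee")

// Print the column names and types, then the first rows (show() defaults to 20).
df.printSchema()
df.show()
```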

Filter by name

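Step 8: Use “filter” on the DataFrame to select rows by name. Please note that “===” is used, as opposed to “==” or “=”: in Scala, “===” on a column builds a Spark column expression, whereas “==” compares the two objects and returns a plain Boolean.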
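
A minimal sketch of the filter, assuming a table named `employee` with a string column `name`; both names and the value "John" are placeholders for illustration.

```scala
// Run inside spark-shell; load the (assumed) employee table as a DataFrame.
val df = sqlContext.table("employee")

// df("name") is a Column; `===` builds an equality expression that Spark
// evaluates for every row.
val johns = df.filter(df("name") === "John")
johns.show()

// The same filter written as a SQL expression string, where a single "=" is used.
df.filter("name = 'John'").show()
```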

Save DataFrame as a table
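
One way to do this is with `saveAsTable` on the DataFrame writer. The sketch below assumes a source table named `employee` with a `name` column; the target table name `employee_john` is hypothetical.

```scala
// Run inside spark-shell; build a DataFrame from the (assumed) employee table.
val df = sqlContext.table("employee")
val filtered = df.filter(df("name") === "John")

// Persist the DataFrame as a table; "overwrite" replaces the table if it
// already exists. The target name is a placeholder.
filtered.write.mode("overwrite").saveAsTable("employee_john")

// Read the saved table back to check the result.
sqlContext.sql("SELECT * FROM employee_john").show()
```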
