01A: Spark on Zeppelin – Docker pull from Docker Hub

Prerequisite: Docker is installed on your machine, either macOS (e.g. $ brew install --cask docker) or Windows 10.

What is Apache Zeppelin?

Zeppelin is a web-based notebook for executing arbitrary code in Scala, SQL, Spark, etc., and you can mix languages within a notebook. Apache Zeppelin helps data analysts, data scientists, and business users get a better understanding of their data. As described below, you can quickly explore data, create visualizations, and share your insights as web pages with various stakeholders. For example:

1) Prepare data with the Shell interpreter, say by downloading files with curl/wget and then ingesting them into HDFS.

2) Perform data analytics with Spark (i.e. Scala) or PySpark (i.e. Python).

3) Perform simple visualizations in SQL.

4) Export the results with Shell, and publish to create graphs.

How to install Apache Zeppelin on Docker

Step 1: Go to Docker Hub (https://hub.docker.com/), the registry of images that you can pull to create isolated containers.

Step 2: Search for “Zeppelin”.

Docker Hub – Zeppelin

Step 3: Select “apache/zeppelin”. Click on “Dockerfile” to inspect what gets installed, and on “Build details” to find the available versions (tags), for example “0.8.0” or “0.7.3”.

Step 4: Pull the image from Docker Hub with the following command. Pulling downloads a prebuilt image, so there is nothing to build locally.
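
The command itself is not reproduced in this copy of the post. A minimal sketch, assuming the “0.8.0” tag from Step 3 (substitute whichever tag you chose); the ZEPPELIN_IMAGE variable and the guard around the call are illustrative additions, not part of the original:

```shell
# Assumed image reference: the apache/zeppelin repository with the 0.8.0 tag.
ZEPPELIN_IMAGE="apache/zeppelin:0.8.0"

if command -v docker >/dev/null 2>&1; then
  # Download the image layers from Docker Hub into the local image cache.
  docker pull "$ZEPPELIN_IMAGE" || echo "docker pull failed; is Docker running?" >&2
else
  echo "docker not found on PATH; install Docker first." >&2
fi
```

Pinning an explicit tag rather than relying on “latest” keeps the walkthrough reproducible.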

The download may take several minutes. Once it completes, check that the image is available locally with:
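
The original listing is missing here. One way to check, sketched under the assumption that the image came from the “apache/zeppelin” repository selected in Step 3 (the ZEPPELIN_REPO variable is an illustrative name):

```shell
# Assumed repository name, as selected on Docker Hub in Step 3.
ZEPPELIN_REPO="apache/zeppelin"

if command -v docker >/dev/null 2>&1; then
  # List locally cached images for this repository, one row per tag.
  docker images "$ZEPPELIN_REPO" || echo "docker images failed; is Docker running?" >&2
else
  echo "docker not found on PATH; install Docker first." >&2
fi
```

A row showing the repository, the tag you pulled, and an image ID indicates the pull succeeded.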

Step 5: Run the above image to create a container with the following command.
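
The run command is also missing from this copy. A hedged sketch, assuming the “0.8.0” tag, a container name of “zeppelin”, and host port 8080 mapped to the container's port 8080 (which matches the URL used in Step 6); the variable names are illustrative:

```shell
# Assumed values: adjust the tag, container name, and host port as needed.
ZEPPELIN_IMAGE="apache/zeppelin:0.8.0"
CONTAINER_NAME="zeppelin"
HOST_PORT="8080"

if command -v docker >/dev/null 2>&1; then
  # -p maps the host port to Zeppelin's port 8080 inside the container;
  # --rm removes the container on exit; --name gives it a stable name.
  # This runs in the foreground, so Zeppelin's logs stream to this terminal.
  docker run -p "${HOST_PORT}:8080" --rm --name "$CONTAINER_NAME" "$ZEPPELIN_IMAGE" \
    || echo "docker run failed or was stopped." >&2
else
  echo "docker not found on PATH; install Docker first." >&2
fi
```

Because of --rm, notebooks created inside the container are lost when it stops unless you also mount a volume.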

You can open another terminal, and check if the container is up and running with:
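
A sketch of that check, assuming the container was started with the name “zeppelin” (if you used a different name or none, a plain docker ps lists every running container):

```shell
# Assumed container name; must match the --name used when starting it.
CONTAINER_NAME="zeppelin"

if command -v docker >/dev/null 2>&1; then
  # Show only running containers whose name matches the filter.
  docker ps --filter "name=${CONTAINER_NAME}" || echo "docker ps failed; is Docker running?" >&2
else
  echo "docker not found on PATH; install Docker first." >&2
fi
```

The STATUS column should read “Up …”, and the PORTS column should show the host port mapped to 8080.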

Step 6: Open a browser and go to “http://localhost:8080”.

Apache Zeppelin UI

Step 7: Select the link “Create new note”, name it “Simple Spark with Scala”, and select “spark” as the interpreter.

Type the following simple Spark code to add 1 to the given set of numbers.

Press the play button, and the output will be:

Spark using Scala
