Prerequisite: Docker is installed on your machine, either on Mac OS X (e.g. $ brew cask install docker) or on Windows 10.
What is Apache Zeppelin?
Zeppelin is a web-based notebook for executing arbitrary code in Scala, SQL, Spark, etc., and you can mix languages within a note. Apache Zeppelin helps data analysts, data scientists, and business users gain a better understanding of their data. As described below, you can quickly explore data, create visualizations, and share your insights as web pages with various stakeholders. For example:
1) Prepare data using Shell, say by downloading files with curl/wget, and then ingest them into HDFS.
2) Perform data analytics with Spark (i.e. Scala) or PySpark (i.e. Python).
3) Perform simple visualizations in SQL.
4) Export the results with Shell, and publish to create graphs.
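Step 1 of this workflow can be sketched in plain shell, which inside Zeppelin would typically live in a shell paragraph. This is only a minimal sketch under stated assumptions: the URL, the local file path, and the HDFS directory are hypothetical placeholders, and it presumes that curl and the hdfs client are available wherever the code runs. By default it only prints the commands it would run; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
# Sketch of workflow step 1: download a file, then copy it into HDFS.
# The three values below are hypothetical placeholders, not from the article.
DATA_URL="https://example.com/data/sales.csv"
LOCAL_FILE="/tmp/sales.csv"
HDFS_DIR="/user/zeppelin/input"

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

ingest() {
  run curl -fsSL -o "$LOCAL_FILE" "$DATA_URL"      # download to local disk
  run hdfs dfs -mkdir -p "$HDFS_DIR"               # create the target directory
  run hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"  # copy into HDFS, overwriting
}

ingest
```

The dry-run default makes it easy to review the commands before touching a real cluster.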
How to install Apache Zeppelin on Docker
Step 1: Go to Docker Hub (https://hub.docker.com/), the repository of images that you can pull to create isolated containers.
Step 2: Search for “Zeppelin”.
Step 3: Select “apache/zeppelin”. Click on “Dockerfile” to inspect what gets installed. Click on “Build details” to get the version or tag, for example “0.8.0” or “0.7.3”.
Step 4: Pull the image from Docker Hub with the following command.
    $ docker pull apache/zeppelin:0.8.0
The download may take several minutes. Once it is done, check the image with:
    $ docker images

    REPOSITORY        TAG     IMAGE ID       CREATED       SIZE
    apache/zeppelin   0.8.0   353d7641c769   2 weeks ago   2.58GB
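If you want to script this check rather than read the listing by eye, a small helper can scan the output for the expected repository and tag. The sketch below is one possible approach, not part of Docker itself: has_image is a hypothetical helper name, and it assumes the default column layout shown above (repository first, tag second).

```shell
#!/usr/bin/env bash
# Succeeds when the `docker images` listing on stdin contains repository $1
# with tag $2. Skips the header row and compares the first two columns.
# Appending "" forces string comparison, so tags such as 0.8 and 0.80 differ.
has_image() {
  awk -v repo="$1" -v tag="$2" '
    NR > 1 && ($1 "") == (repo "") && ($2 "") == (tag "") { found = 1 }
    END { exit !found }
  '
}

# Usage (requires Docker):
#   docker images | has_image apache/zeppelin 0.8.0 && echo "image present"
```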
Step 5: Run the above image to create a container with the following command.
    $ docker run -it -p 8080:8080 apache/zeppelin:0.8.0
You can open another terminal, and check if the container is up and running with:
    $ docker ps

    CONTAINER ID   IMAGE                   COMMAND                  CREATED          STATUS          PORTS                    NAMES
    b0461edaa8f0   apache/zeppelin:0.8.0   "/usr/bin/tini -- bi…"   24 minutes ago   Up 24 minutes   0.0.0.0:8080->8080/tcp   confident_panini
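The run command in Step 5 keeps the terminal attached, and notes created in the container are lost when the container is removed. A possible variation, sketched below, runs the container detached under a fixed name and mounts a local folder for notebooks. Treat the details as assumptions to verify: the container name "zeppelin" is a hypothetical choice, and the /notebook mount with the ZEPPELIN_NOTEBOOK_DIR variable follows the pattern I recall from the Zeppelin Docker documentation, so check it against the docs for your image version. The script prints the commands by default; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
IMAGE="apache/zeppelin:0.8.0"
NAME="zeppelin"   # hypothetical container name

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Start Zeppelin in the background (-d), remove the container on stop (--rm),
# and keep notebooks in ./notebook on the host (assumed mount point).
start_zeppelin() {
  run docker run -d --rm --name "$NAME" \
    -p 8080:8080 \
    -v "$PWD/notebook:/notebook" \
    -e ZEPPELIN_NOTEBOOK_DIR=/notebook \
    "$IMAGE"
}

# Stop the container by name instead of looking up its ID with docker ps.
stop_zeppelin() {
  run docker stop "$NAME"
}

start_zeppelin
```

A fixed name also means "docker logs zeppelin" works without copying the container ID.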
Step 6: Go to a browser and type: “http://localhost:8080”.
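Zeppelin can take a little while to start, so the page may not load right after the container comes up. As a rough sketch, you could poll the URL from a terminal until it answers with HTTP 200. The function name wait_for_http and its default of 30 attempts, 2 seconds apart, are illustrative choices, and the sketch assumes curl is installed.

```shell
#!/usr/bin/env bash
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as URL answers with HTTP 200, or 1 after ATTEMPTS tries.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" code i
  for ((i = 1; i <= attempts; i++)); do
    # -w prints only the status code; a failed connection yields 000.
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")"
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_http http://localhost:8080 && echo "Zeppelin is up"
```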
Step 7: Select the “Create new note” link, name the note “Simple Spark with Scala”, and select “spark” as the interpreter.
Type the following simple Spark code to add 1 to the given set of numbers.
    val primitiveDS = Seq(1, 2, 3).toDS()
    primitiveDS.map(_ + 1).collect() // Returns: Array(2, 3, 4)
Press the play button, and the output will be:
    Array(2, 3, 4)