Blog Archives

01A: Spark on Zeppelin – Docker pull from Docker hub

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As.

What is Apache Zeppelin?

Zeppelin is a web-based notebook for executing arbitrary code in Scala, SQL, Spark, etc. You can mix languages. Apache Zeppelin helps data analysts, data scientists, and business users get a better understanding of data.… Read more ...

01B: Spark on Zeppelin – custom Dockerfile

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As.

What is Apache Zeppelin?

Zeppelin is a web-based notebook for executing arbitrary code in Scala, SQL, Spark, etc. You can mix languages. Apache Zeppelin helps data analysts, data scientists, and business users get a better understanding of data.… Read more ...



02: Spark on Zeppelin – read a file from local file system

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook.

Step 1: Pull the image from Docker Hub, and build it with the following command.

You can verify the image with the “docker images” command.… Read more ...



03: Spark on Zeppelin – DataFrame Operations in Scala

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As.

This tutorial extends “Apache Zeppelin on Docker Tutorial – Docker pull from Docker hub” and “Spark stand-alone to read a file from local file system”.

Read more ...


04: Spark on Zeppelin – DataFrame joins in Scala

This tutorial extends the series: Spark on Apache Zeppelin Tutorials.

1. Create “Orders” DataFrame

Read more ...


05: Spark on Zeppelin – semi-structured log file

This tutorial extends the series: Spark on Apache Zeppelin Tutorials. Step 1: Pull the apache/zeppelin image from Docker Hub, and build the image with the following command.

“docker images” will show the image that was created.… Read more ...



06: Spark on Zeppelin – RDD operation zipWithIndex

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Q. Why do we need zipWithIndex? A. In the database world there are various instances where we want to assign a…

Read more ...


07: Spark on Zeppelin – window functions in Scala

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Q. What are the different types of functions in Spark SQL? A. There are 4 types of functions: 1) Built-in…

Read more ...


08: Spark on Zeppelin – convert DataFrames to RDD[Row] and RDD[Row] to DataFrame

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Important: It is not a best practice to mutate values or to use RDDs directly as opposed to using DataFrames…

Read more ...


09: Spark on Zeppelin – convert DataFrames to RDD and RDD to DataFrame

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Important: It is not a best practice to mutate values or to use RDDs directly as opposed to using DataFrames…

Read more ...


10: Spark on Zeppelin – union, udf and explode

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Step 1: Pull the image from Docker Hub, and build it with the following command.

You can verify…

Read more ...


11: Spark on Zeppelin – DataFrame groupBy, collect_list, explode & window

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Step 1: Pull the image from Docker Hub, and build it with the following command.

You can verify…

Read more ...


12: Spark on Zeppelin – DataFrame pivot

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Step 1: Pull the image from Docker Hub, and build it with the following command.

You can verify…

Read more ...


13: Spark on Zeppelin – DataFrame date & timestamp

Prerequisite: Docker is installed on your machine, running either Mac OS X (e.g. $ brew cask install docker) or Windows 10. See also: Docker interview Q&As. This tutorial extends setting up the Apache Zeppelin notebook. Step 1: Pull the image from Docker Hub, and build it with the following command.

You can verify…

Read more ...

