09: Docker Tutorial: Getting started with Hadoop Big Data on Cloudera quickstart

If you are not familiar with Docker, first get some hands-on experience with the series of step-by-step Docker tutorials with Java & Spring Boot examples.

Step 1: Docker Hub (https://hub.docker.com/) is a Docker registry where you can search for, pull & push images. Pull the “cloudera/quickstart” image from it. Some of the steps below are also described at: https://hub.docker.com/r/cloudera/quickstart.

DockerHub – Cloudera Quickstart
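As a minimal sketch of the pull step, assuming the Docker CLI is installed and the daemon is running. The image name is the one used in this tutorial; the “latest” tag is an assumption.

```shell
# Image name from this tutorial; the ":latest" tag is an assumption.
IMAGE="cloudera/quickstart:latest"

# Only attempt the pull when the Docker CLI is available on this machine.
if command -v docker >/dev/null 2>&1; then
  docker pull "$IMAGE"
fi
```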

It will take a few minutes to download.

Step 2: Check that the image was downloaded by listing the local images.
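This check can be sketched as follows, assuming the Docker CLI is installed. The repository argument simply narrows the listing to the image used in this tutorial.

```shell
# Repository name of the image pulled in Step 1.
REPO="cloudera/quickstart"

if command -v docker >/dev/null 2>&1; then
  docker images            # all local images
  docker images "$REPO"    # only the cloudera/quickstart image
fi
```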

Step 3: Create the container from the image “cloudera/quickstart”.
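A sketch of the run command, modelled on the instructions on the Docker Hub page. The hostname, the --privileged flag, and the /usr/bin/docker-quickstart startup script are assumptions to verify there. The published ports are the ones used in this tutorial: 8888 for HUE, 80 for the guide, and 7180 for Cloudera Manager.

```shell
# Image pulled in Step 1.
IMAGE="cloudera/quickstart"
# Hostname and startup script are assumptions taken from the Docker Hub page.
HOST_NAME="quickstart.cloudera"
START_SCRIPT="/usr/bin/docker-quickstart"

if command -v docker >/dev/null 2>&1; then
  # -t -i attaches an interactive terminal; each -p maps host:container ports.
  docker run --hostname="$HOST_NAME" --privileged=true -t -i \
    -p 8888:8888 -p 80:80 -p 7180:7180 \
    "$IMAGE" "$START_SCRIPT"
fi
```

Once the services have started, this terminal is left at a shell prompt inside the container.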

Step 4: List the running containers and get the container id.
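A minimal sketch of this step, to be run in a new terminal on the host. The “ancestor” filter is one possible way to pick out the container created from this image and is not necessarily what the original step used.

```shell
IMAGE="cloudera/quickstart"

if command -v docker >/dev/null 2>&1; then
  docker ps                                 # table with CONTAINER ID, IMAGE, PORTS, etc.
  docker ps -q --filter "ancestor=$IMAGE"   # prints only the matching container id(s)
fi
```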

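In the PORTS column you can see 0.0.0.0:8888->8888/tcp, which means that “0.0.0.0:8888” is the host IP address & port, and it is mapped to port 8888 inside the container. You can also inspect the container by opening a new terminal and running the docker inspect command with the container id.

The JSON output shows, among other things, the container's configuration, network settings, and port bindings.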
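To make the port mapping concrete, here is a small sketch that splits the mapping string with plain shell parameter expansion and then inspects the container. The variable names are hypothetical, and the container lookup assumes a container created from the “cloudera/quickstart” image is running.

```shell
# Mapping string as printed in the PORTS column of "docker ps".
MAPPING="0.0.0.0:8888->8888/tcp"

HOST_SIDE="${MAPPING%%->*}"            # part before "->"
CONTAINER_SIDE="${MAPPING##*->}"       # part after "->"
HOST_IP="${HOST_SIDE%:*}"              # host IP address
HOST_PORT="${HOST_SIDE##*:}"           # host port
CONTAINER_PORT="${CONTAINER_SIDE%/*}"  # container port without the protocol

echo "host $HOST_IP:$HOST_PORT -> container port $CONTAINER_PORT"

if command -v docker >/dev/null 2>&1; then
  CONTAINER_ID=$(docker ps -q --filter "ancestor=cloudera/quickstart" | head -n 1)
  if [ -n "$CONTAINER_ID" ]; then
    docker inspect "$CONTAINER_ID"   # full JSON description of the container
    docker port "$CONTAINER_ID"      # just the port mappings
  fi
fi
```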

HUE

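HUE stands for Hadoop User Experience. It is a web UI where you can browse the files in HDFS and run SQL-like queries against Hive & Impala tables.

Step 5: Open a browser and go to “http://0.0.0.0:8888” (or “http://localhost:8888” if your browser does not accept 0.0.0.0) to open the HUE GUI. Log in with: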

username: cloudera
password: cloudera

HDFS commands

You can run a number of HDFS commands on the command line. These commands are Unix-like.

Step 6: In the original terminal, which is attached to the container, you can practice Hadoop commands such as listing the folders under “/user”.
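For example, a minimal sketch of listing the folders under “/user”, assuming it is run inside the container where the hdfs client is on the PATH:

```shell
# HDFS folder to list; "/user" holds the per-user home folders.
HDFS_PATH="/user"

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$HDFS_PATH"
fi
```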

You can also see this via the HUE GUI by clicking on “File Browser” at the top right. Click on “/user” to see the same folders as above.

mkdir

Create a directory.

ls

List the contents of a directory.

touch

Create an empty file.
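The three commands above can be sketched as follows, assuming they are run inside the container. The working folder and file name are hypothetical; “/tmp” is used here on the assumption that it is writable by any user. In this Hadoop version the touch equivalent is “-touchz”.

```shell
# Hypothetical working folder and file name, for illustration only.
WORK_DIR="/tmp/hdfs-tutorial"
FILE_NAME="sample.txt"

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p "$WORK_DIR"               # mkdir: create the folder (and parents)
  hdfs dfs -touchz "$WORK_DIR/$FILE_NAME"      # touch: create a zero-length file
  hdfs dfs -ls "$WORK_DIR"                     # ls: list the folder contents
fi
```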

You can see what command options are available by typing:
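A short sketch of the help commands, again assuming the hdfs client inside the container. The sub-command name “ls” is only an example.

```shell
# Sub-command to look up; any hdfs dfs sub-command name can be used here.
SUBCOMMAND="ls"

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -help                  # describes every hdfs dfs sub-command
  hdfs dfs -help "$SUBCOMMAND"    # detailed help for one sub-command
  hdfs dfs -usage "$SUBCOMMAND"   # one-line usage summary
fi
```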

Cloudera guide

You can look at the guide & examples at “http://localhost:80”, which takes you to the Quickstart guide & tutorial.

Cloudera Manager

Cloudera Manager is not started by default. It requires around 10 GB of RAM. Cloudera Manager is a web UI to manage the Hadoop cluster and services like Hive, Spark, Impala, HBase, etc. You can stop, start, and restart the services. You can modify the configuration values. You can monitor the jobs and their statuses.

Cloudera Manager – manage cluster services & configurations

Go to http://localhost:7180/ to access Cloudera Manager. You can start all the services or only the services that you require, but running them can consume lots of resources. Services may show bad health due to a lack of resources.

The Cloudera Manager Server is the master service that manages the data model of the entire cluster in a database. The data model contains information regarding the services, roles, and configurations assigned for each node in the cluster. You can also upgrade the services via parcels & packages.

Cloudera Manager – Server & Agents

CDH stands for Cloudera's Distribution including Apache Hadoop. CDH upgrades contain updated versions of the Hadoop software and other components. You can use Cloudera Manager to upgrade CDH for major, minor, and maintenance upgrades.

Do you want to open multiple terminal windows?
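One way to do this, sketched under the assumption that the container from this tutorial is running, is to open another shell inside it with docker exec from a new host terminal:

```shell
IMAGE="cloudera/quickstart"

if command -v docker >/dev/null 2>&1; then
  # Look up the id of the running container created from the image.
  CONTAINER_ID=$(docker ps -q --filter "ancestor=$IMAGE" | head -n 1)
  if [ -n "$CONTAINER_ID" ]; then
    docker exec -it "$CONTAINER_ID" bash   # opens an additional shell in the container
  fi
fi
```

Repeat this in each new terminal window you need.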

How to stop the container?
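A minimal sketch, run from a host terminal. The container id is looked up by image here; you can equally paste the id shown by “docker ps”.

```shell
IMAGE="cloudera/quickstart"

if command -v docker >/dev/null 2>&1; then
  CONTAINER_ID=$(docker ps -q --filter "ancestor=$IMAGE" | head -n 1)
  if [ -n "$CONTAINER_ID" ]; then
    docker stop "$CONTAINER_ID"   # stops the container; it is kept until removed
  fi
fi
```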

How to remove all exited containers?
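A sketch of one common approach: list the ids of the exited containers with a status filter and pass them to docker rm. The variable names are hypothetical.

```shell
# Filter that selects only containers in the "exited" state.
STATUS_FILTER="status=exited"

if command -v docker >/dev/null 2>&1; then
  EXITED_IDS=$(docker ps -a -q -f "$STATUS_FILTER")
  if [ -n "$EXITED_IDS" ]; then
    # Left unquoted on purpose so that each id becomes a separate argument.
    docker rm $EXITED_IDS
  fi
fi
```

On newer Docker versions, “docker container prune” is an alternative that removes all stopped containers after a confirmation prompt.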

List all containers, including stopped ones:
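A minimal sketch of the listing. The second command and its format string are an optional, assumed variation that prints only a few columns.

```shell
# Columns for the compact listing: container id, image, and status.
COLUMNS_FORMAT='table {{.ID}}\t{{.Image}}\t{{.Status}}'

if command -v docker >/dev/null 2>&1; then
  docker ps -a                               # running and stopped containers
  docker ps -a --format "$COLUMNS_FORMAT"    # same list with fewer columns
fi
```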

Remove all containers, including stopped ones:
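A sketch of the removal, with a hypothetical FORCE variable. Without “-f”, docker rm refuses to remove containers that are still running, so stop them first or set FORCE to “-f”.

```shell
# Leave empty to remove only stopped containers; set to "-f" to also remove running ones.
FORCE=""

if command -v docker >/dev/null 2>&1; then
  ALL_IDS=$(docker ps -a -q)
  if [ -n "$ALL_IDS" ]; then
    # Left unquoted on purpose so that each id becomes a separate argument.
    docker rm $FORCE $ALL_IDS
  fi
fi
```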

How to list all the images?
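A minimal sketch, assuming the Docker CLI is installed. The “-a” flag additionally shows intermediate image layers, which are hidden by default.

```shell
if command -v docker >/dev/null 2>&1; then
  docker images      # top-level images
  docker images -a   # also includes intermediate images
fi
```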

What is next?

The next tutorials will drill into the Cloudera Quickstart services, CLIs, config files, etc. to get a good overview. This complements Getting started with BigData on Cloudera, which was on a Virtual Machine.

These tutorials are based on lighter Docker containers. Next: 10: Docker Tutorial: BigData services & folders on Cloudera quickstart.

