10: Docker Tutorial: Hadoop Big Data services & folders on Cloudera quickstart

You can also install it on VMware as illustrated in ⏯ Getting started with BigData on Cloudera.

If you are not familiar with Docker, get some hands-on experience with the series of step-by-step Docker tutorials with Java & Spring Boot examples.

This tutorial builds on 09: Docker Tutorial: Cloudera on Docker via DockerHub, where Cloudera Quickstart is installed on Docker for learning purposes.
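As a reminder, the Quickstart container from that tutorial is typically started with a command along these lines (the exact ports you publish may differ in your setup):

```shell
# Start the Cloudera Quickstart container interactively.
# --privileged is needed for the init scripts; 8888 is the Hue port.
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
    -p 8888:8888 \
    cloudera/quickstart /usr/bin/docker-quickstart
```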

When the above command runs, you will see all the services that get started:

zookeeper, journalnode, datanode, namenode, and secondary namenode

historyserver, nodemanager, resourcemanager, HBase master, rest, thrift, Hive Metastore, Hive Server2, and Sqoop Server

Spark history-server, HBase regionserver, hue, and Impala

All these services can be viewed via the “Cloudera Manager” admin console.

/etc/init.d services

init.d is a sub-directory of the /etc directory in the Linux file system. It contains the start/stop/reload/restart/status scripts that are used to control the Hadoop ecosystem daemons while the system is running or during boot. If you look inside /etc/init.d you will notice the scripts for the different services.
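For example, inside the running container you can list the scripts and control an individual daemon (exact service names vary with the CDH version on the image):

```shell
# List all the service scripts
ls /etc/init.d

# Each script supports start/stop/restart/status, e.g.:
service hadoop-hdfs-namenode status
service hbase-master restart
```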

netstat -anp to find ports

MySQL runs on port 3306.

impalad listens on multiple ports: 21000 for impala-shell, 21050 for the impalad front-end, 22000 for the back-end, and so on. You can check the Cloudera documentation for further details.
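Inside the container you can confirm these with netstat (the -anp flags show all sockets, numeric addresses, and the owning process):

```shell
# Which process is listening on MySQL's port 3306?
netstat -anp | grep 3306

# All the ports held open by the Impala daemon
netstat -anp | grep impalad
```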

ps auxwww to find service run details

HBase starts a number of services such as the master, region server, etc.
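You can grep the process listing to see each daemon's full command line, including its classpath and config directories:

```shell
# Show the HBase daemons and the full java command each was started with
ps auxwww | grep -i hbase
```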

/var/log folder

This is where the log files go.
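Each service writes into its own sub-folder under /var/log; a quick way to explore (the exact log file names below are illustrative and depend on the service):

```shell
# One sub-folder per service
ls /var/log

# Tail the HDFS daemon logs, for example
tail -f /var/log/hadoop-hdfs/*.log
```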

Java & Python versions
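You can check which Java and Python versions the image ships with:

```shell
java -version        # the JDK used by the Hadoop services
python --version     # python -V on older Python 2 images
which java python    # where the binaries live
```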

/usr/bin folder

/usr/lib folder

/usr/jars folder

All the jars used above in “/usr/lib/…
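You can explore these folders yourself; the exact contents vary by image version, but along these lines:

```shell
ls /usr/bin | grep -iE 'hadoop|hive|impala|spark'   # CLI entry points
ls /usr/lib                                         # per-service libraries
ls /usr/jars | head                                 # jar files gathered in one place
```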

/var/run/ or /run folder

Run-time variable data lives here. You can get the "pid" (i.e. process id) of each service, which also tells you what services are running.
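For example (the pid file path below is illustrative; each service keeps its own):

```shell
# One sub-folder per service, holding pid files and sockets
ls /var/run

# Read the namenode's process id
cat /var/run/hadoop-hdfs/hadoop-hdfs-namenode.pid
```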

Examples from Cloudera quickstart

The hadoop-mapreduce-examples.jar ships with a number of example jobs, and you can test your environment by running one of them as a MapReduce job.

The jobs that are available in hadoop-mapreduce-examples.jar:
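Running the jar with no arguments prints the list of available example jobs (wordcount, pi, grep, terasort, and so on); the jar path below is where CDH typically places it:

```shell
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
```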

Run a MapReduce job

Running the “pi” example MapReduce job:
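For example, with 10 map tasks and 100 samples per map (more samples give a better estimate; the jar path assumes the usual CDH location):

```shell
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
# The job output ends with an estimated value of Pi
```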

Do you want to open multiple terminal windows?
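From another terminal on the host, you can attach an extra shell to the same running container:

```shell
# Find the container id, then open another bash session in it
docker ps
docker exec -it <container-id> bash
```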

How to stop the container?
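Again using the container id from docker ps:

```shell
docker stop <container-id>

# or stop and remove it in one go:
docker rm -f <container-id>
```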

What is next?

In the next post, let’s look at CLIs like impala-shell, hdfs, hive, spark-shell, pyspark, etc.

800+ Java & Big Data Interview Q&As