Blog Archives

09: Docker Tutorial: Getting started with Hadoop Big Data on Cloudera quickstart

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials with Java & Spring Boot examples.

Step 1: https://hub.docker.com/ is a Docker image registry from which you can pull & push images, and where you can search for images. You can see some of the steps below at: https://hub.docker.com/r/cloudera/quickstart
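For example (a sketch: the hostname, init script and port mapping follow the cloudera/quickstart image's Docker Hub page, and the multi-GB image takes a while to download):

```shell
# Pull the Cloudera quickstart image from Docker Hub (several GB).
docker pull cloudera/quickstart:latest

# Run it interactively. --privileged and the docker-quickstart init
# script are required by the image; port 8888 exposes the Hue web UI.
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  -p 8888:8888 \
  cloudera/quickstart /usr/bin/docker-quickstart
```

The docker-quickstart script starts the Hadoop services inside the container and then drops you into a shell.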



10: Docker Tutorial: Hadoop Big Data services & folders on Cloudera quickstart

You can also install it on VMware as illustrated in ⏯ Getting started with BigData on Cloudera.

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials with Java & Spring Boot examples.

This tutorial builds on 09: Docker Tutorial: Cloudera on Docker via DockerHub, where Cloudera Quickstart is installed on Docker for learning purposes.…



11: Docker Tutorial: Hadoop Big Data CLIs on Cloudera quickstart

There are a number of CLIs that you can run from the edge node, which is the gateway node to the Hadoop cluster consisting of master & slave (aka worker) nodes. Let’s look at the different CLIs (i.e. Command Line Interfaces).

Most of the CLIs listed below are in /usr/bin.

hdfs CLI

The following command will give you the commands you can use with “hdfs”:

“dfs” – run a filesystem command on the file systems supported in Hadoop.…
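As a sketch (run inside the quickstart container, where HDFS is already up; the file and directory names below are illustrative), the “dfs” family mirrors familiar Unix file commands:

```shell
# Running "hdfs" with no arguments prints the available sub-commands
# (dfs, dfsadmin, fsck, ...).
hdfs

# "dfs" file-system commands (paths are illustrative):
hdfs dfs -ls /                              # list the HDFS root
hdfs dfs -mkdir -p /user/cloudera/input     # create a directory
hdfs dfs -put data.csv /user/cloudera/input # copy a local file into HDFS
hdfs dfs -cat /user/cloudera/input/data.csv # print its contents
```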



12: Docker Tutorial: Hadoop Big Data configuration files on Cloudera quickstart

The “etc” (i.e. et cetera) folder mainly hosts configuration files. For instance, to add a new hard drive to your system and have Linux auto-mount it on boot, you’d edit /etc/fstab. Key “Hadoop Cluster Configuration” files are:
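On the quickstart image the Hadoop client configuration sits under /etc/hadoop/conf (the standard CDH layout; treat the exact path as an assumption for other distributions):

```shell
# Key Hadoop cluster configuration files (standard CDH layout):
ls /etc/hadoop/conf
#   core-site.xml   - common settings, e.g. fs.defaultFS (NameNode URI)
#   hdfs-site.xml   - HDFS settings, e.g. dfs.replication
#   yarn-site.xml   - ResourceManager / NodeManager settings
#   mapred-site.xml - MapReduce settings, e.g. mapreduce.framework.name

# e.g. check which filesystem the clients talk to:
grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml
```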



13: Docker Tutorial: Apache Spark (spark-shell & pyspark) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line.
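For example, once inside the container (the quickstart image ships a Spark 1.x REPL, so the entry points are sc and sqlContext rather than the later spark session — treat the version detail as an assumption):

```shell
# Scala REPL:
spark-shell
# scala> sc.parallelize(1 to 100).sum()   // quick sanity check

# Python REPL:
pyspark
# >>> sc.parallelize(range(100)).count()
```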



14: Docker Tutorial: Hive (via beeline) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line.
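For example, once inside the container, beeline connects to HiveServer2 over JDBC (port 10000 is the HiveServer2 default; the user name below is illustrative):

```shell
beeline -u jdbc:hive2://localhost:10000 -n cloudera

# Once connected:
#   0: jdbc:hive2://localhost:10000> SHOW DATABASES;
#   0: jdbc:hive2://localhost:10000> !quit
```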



15: Docker Tutorial: Hive & parquet-tools – csv to parquet on Cloudera quickstart

CSV is a row-based storage format, whereas Parquet is columnar in nature and designed from the ground up for efficient storage, compression and encoding, which gives better performance.…
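As a sketch of the conversion (table names, columns and paths below are hypothetical, and the warehouse path is the Hive default), a CSV-backed text table can be rewritten as a Parquet table and then inspected with parquet-tools:

```shell
# Load the CSV into a text table, then rewrite it as Parquet.
hive -e "
  CREATE TABLE trades_csv (symbol STRING, qty INT, price DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  LOAD DATA LOCAL INPATH '/tmp/trades.csv' INTO TABLE trades_csv;
  CREATE TABLE trades_parquet STORED AS PARQUET
  AS SELECT * FROM trades_csv;
"

# Inspect the columnar file (path/filename illustrative):
parquet-tools schema /user/hive/warehouse/trades_parquet/000000_0
parquet-tools head   /user/hive/warehouse/trades_parquet/000000_0
```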



16: Docker Tutorial: Apache Spark (spark-shell) & parquet-tools – csv to parquet on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line.
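For example, inside spark-shell (Spark 1.x style, relying on the sqlContext implicits the REPL auto-imports; paths and columns are illustrative):

```shell
spark-shell
# scala> case class Trade(symbol: String, qty: Int, price: Double)
# scala> val df = sc.textFile("/user/cloudera/input/trades.csv")
#      |            .map(_.split(","))
#      |            .map(a => Trade(a(0), a(1).toInt, a(2).toDouble))
#      |            .toDF()
# scala> df.write.parquet("/user/cloudera/output/trades_parquet")
```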



17: Docker Tutorial: sqoop import – on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line.
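A sketch of the import (the quickstart image ships a sample MySQL database; the database name, user and password below are assumptions based on that sample setup):

```shell
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba --password cloudera \
  --table customers \
  --target-dir /user/cloudera/customers \
  --num-mappers 1
```

With a single mapper no --split-by column is needed; for parallel imports you would add one.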



18: Docker Tutorial: sqoop export – on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line.
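A sketch of the reverse direction (the target MySQL table must already exist; the connection details below are illustrative assumptions):

```shell
sqoop export \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba --password cloudera \
  --table customers_export \
  --export-dir /user/cloudera/customers \
  --input-fields-terminated-by ','
```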


