Blog Archives

09: Docker Tutorial: Getting started with Hadoop Big Data on Cloudera quickstart

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials with Java & Spring Boot examples.

Step 1: https://hub.docker.com/ is a Docker registry from which you can pull & push images.

Read more ›



10: Docker Tutorial: Hadoop Big Data services & folders on Cloudera quickstart

You can also install the Cloudera quickstart on VMware, as illustrated in ⏯ Getting started with BigData on Cloudera.

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials with Java & Spring Boot examples.

Read more ›



11: Docker Tutorial: Hadoop Big Data CLIs on Cloudera quickstart

There are a number of CLIs (i.e. Command Line Interfaces) that you can run from the edge node, which is the gateway to a Hadoop cluster made up of master & slave (aka worker) nodes. Let’s look at the different CLIs.

Most of the CLIs listed below are in /usr/bin.

hdfs CLI

The following command lists the commands you can use with “hdfs”.

Read more ›



12: Docker Tutorial: Hadoop Big Data configuration files on Cloudera quickstart

The “etc” (i.e. et cetera) folder is mainly used for configuration files. For instance, to add a new hard drive to your system and have Linux auto-mount it on boot, you’d edit /etc/fstab. The key “Hadoop Cluster Configuration” files are:

hadoop-env.sh

This file specifies environment variables that affect the JDK used by the Hadoop daemons (i.e. …

Read more ›
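hadoop-env.sh is a shell script made up largely of `export NAME=value` lines, JAVA_HOME being the best-known one. As a minimal sketch, assuming only the simple unquoted or double-quoted form, the Java class below pulls those assignments out of the file's text so you can see which settings the daemons would be handed. The class name `HadoopEnvParser` and the sample values are hypothetical, and the parser deliberately performs no shell expansion (a line such as `export JAVA_HOME=${JAVA_HOME}` comes back literally), so it is no substitute for sourcing the script in bash.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HadoopEnvParser {

    // Collects "export NAME=value" lines from hadoop-env.sh-style text.
    // Comments, blank lines and other statements are skipped; ${VAR} expansion is NOT performed.
    public static Map<String, String> parseExports(String script) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String raw : script.split("\\R")) {
            String line = raw.trim();
            if (!line.startsWith("export ")) {
                continue; // comment, blank line or non-export statement
            }
            String assignment = line.substring("export ".length()).trim();
            int eq = assignment.indexOf('=');
            if (eq <= 0) {
                continue; // e.g. "export FOO" without a value
            }
            String name = assignment.substring(0, eq).trim();
            String value = assignment.substring(eq + 1).trim();
            // Drop one pair of surrounding double quotes, if present
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            vars.put(name, value);
        }
        return vars;
    }

    public static void main(String[] args) {
        // Hypothetical file content, for illustration only
        String sample = String.join("\n",
                "# The java implementation to use.",
                "export JAVA_HOME=/usr/java/default",
                "export HADOOP_HEAPSIZE=\"1000\"");
        parseExports(sample).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

For the hypothetical sample, `main` prints `JAVA_HOME -> /usr/java/default` followed by `HADOOP_HEAPSIZE -> 1000`, in file order, because a `LinkedHashMap` preserves insertion order.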



13: Docker Tutorial: Apache Spark (spark-shell & pyspark) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Create a text file “simple.txt” in the local file system. Step 3: Copy this file onto the HDFS file system. … Read more ›



14: Docker Tutorial: Hive (via beeline) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Create a csv file “employee.csv” in the local file system. Step 3: Copy this file onto the HDFS file system. … Read more ›



15: Docker Tutorial: Hive & parquet-tools – csv to parquet on Cloudera quickstart

CSV is a row-based storage format, whereas Parquet is columnar and was designed from the ground up for efficient storage, compression and encoding, which gives better performance, particularly for queries that read only a subset of columns. Run the cloudera/quickstart. This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. … Read more ›
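The row-versus-columnar distinction can be illustrated with a toy Java sketch: it transposes CSV-style rows into one list per column, then run-length encodes a column, since values stored next to each other within a single column tend to repeat. This only shows the layout idea; Parquet's real on-disk format (row groups, pages, dictionary and other encodings) is far more involved. The class name `RowToColumnar` and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowToColumnar {

    // Transposes row-oriented records (as in a CSV file) into one list per column,
    // the basic layout idea behind columnar formats such as Parquet.
    public static List<List<String>> toColumns(List<String[]> rows) {
        List<List<String>> columns = new ArrayList<>();
        if (rows.isEmpty()) {
            return columns;
        }
        int width = rows.get(0).length;
        for (int c = 0; c < width; c++) {
            columns.add(new ArrayList<>());
        }
        for (String[] row : rows) {
            if (row.length != width) {
                throw new IllegalArgumentException("Ragged row: expected " + width + " fields");
            }
            for (int c = 0; c < width; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    // Toy run-length encoding: [A, A, A, B] becomes ["A x3", "B x1"].
    public static List<String> runLengthEncode(List<String> column) {
        List<String> encoded = new ArrayList<>();
        int i = 0;
        while (i < column.size()) {
            String value = column.get(i);
            int run = 1;
            while (i + run < column.size() && column.get(i + run).equals(value)) {
                run++;
            }
            encoded.add(value + " x" + run);
            i += run;
        }
        return encoded;
    }

    public static void main(String[] args) {
        // Hypothetical rows: id, name, department
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice", "Sales"},
                new String[] {"2", "Bob", "Sales"},
                new String[] {"3", "Carol", "Sales"},
                new String[] {"4", "Dave", "IT"});
        List<List<String>> columns = toColumns(rows);
        System.out.println(columns.get(2));                  // [Sales, Sales, Sales, IT]
        System.out.println(runLengthEncode(columns.get(2))); // [Sales x3, IT x1]
    }
}
```

In the row layout the three “Sales” values are separated by ids and names, so nothing repeats back to back; once the department column is stored on its own, the repeats become adjacent and collapse into a single run.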



16: Docker Tutorial: Apache Spark (spark-shell) & parquet-tools – csv to parquet on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Get parquet-tools. Step 2: Install wget. The “uname -a” command prints information about the kernel. … Read more ›



17: Docker Tutorial: sqoop import – on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Connect to MySQL DB. Step 2: Hive & Impala store their metadata in a database such as MySQL; the Hive metastore service connects to this metastore database to store metadata. … Read more ›



18: Docker Tutorial: sqoop export – on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Connect to MySQL DB. Step 2: Hive & Impala store their metadata in a database such as MySQL; the Hive metastore service connects to this metastore database to store metadata. … Read more ›



19: Docker Tutorial: Apache Spark SQL – on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker.

Step 1: Run the container on a command line.

Create a .csv in HDFS

Step 2: Create a csv file “employee.csv” in the local file system.

Read more ›
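Spark SQL lets you query CSV data with SQL once it is loaded as a table or DataFrame. To show what such a query computes, here is a plain-Java sketch (no Spark involved) of `SELECT department, AVG(salary) FROM employee GROUP BY department` over employee.csv-style lines. The column layout `id,name,department,salary`, the header row and the class name `EmployeeCsvQuery` are all hypothetical assumptions; the tutorial's actual file may differ, and the naive `split(",")` does not handle quoted fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class EmployeeCsvQuery {

    // Hypothetical schema for employee.csv: id,name,department,salary
    public static final class Employee {
        final int id;
        final String name;
        final String department;
        final double salary;

        Employee(int id, String name, String department, double salary) {
            this.id = id;
            this.name = name;
            this.department = department;
            this.salary = salary;
        }
    }

    // Parses simple comma-separated lines (no quoted fields), skipping the header row.
    public static List<Employee> parse(List<String> lines) {
        return lines.stream()
                .skip(1) // header: id,name,department,salary
                .filter(line -> !line.trim().isEmpty())
                .map(line -> line.split(","))
                .map(f -> new Employee(Integer.parseInt(f[0].trim()), f[1].trim(),
                        f[2].trim(), Double.parseDouble(f[3].trim())))
                .collect(Collectors.toList());
    }

    // Plain-Java counterpart of:
    //   SELECT department, AVG(salary) FROM employee GROUP BY department
    public static Map<String, Double> averageSalaryByDepartment(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy((Employee e) -> e.department, TreeMap::new,
                        Collectors.averagingDouble((Employee e) -> e.salary)));
    }

    public static void main(String[] args) {
        // Hypothetical file content, for illustration only
        List<String> lines = Arrays.asList(
                "id,name,department,salary",
                "1,Alice,Sales,50000",
                "2,Bob,Sales,70000",
                "3,Carol,IT,80000");
        System.out.println(averageSalaryByDepartment(parse(lines))); // {IT=80000.0, Sales=60000.0}
    }
}
```

A `TreeMap` is used only so the output is sorted by department; Spark SQL makes no such ordering guarantee unless the query adds an `ORDER BY`.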



20: Docker Tutorial: Apache Spark (spark-submit) in Java on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Java is already installed as part of the “cloudera/quickstart” Docker image. update-alternatives: If you have multiple versions or installations of Java, … Read more ›
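When several JDKs are registered with update-alternatives, it is easy to lose track of which one actually runs your code. A small sketch, relying only on the standard `java.version` and `java.home` system properties, prints what the current JVM reports; running it with the `java` that update-alternatives currently points to lets you confirm the selection. The class name `JavaVersionCheck` and its `majorVersion` helper are hypothetical; the helper handles both the legacy `1.8.0_x` numbering and the `11.0.x` style used from Java 9 onwards.

```java
public class JavaVersionCheck {

    // Extracts the major version from a "java.version" string.
    // Java 8 and earlier use "1.x.y_z" (e.g. "1.8.0_181"); Java 9+ use "x.y.z" (e.g. "11.0.2").
    public static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._\\-+]");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version);
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("major        = " + majorVersion(version));
    }
}
```

If `java.home` points somewhere other than the JDK you expected, the alternatives selection (or the JAVA_HOME your launcher uses) is the first thing to check.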



21: Docker Tutorial: Apache Spark (spark-submit) in Scala on Cloudera quickstart

This extends 20: Docker Tutorial: Apache Spark-submit in Java – on Cloudera quickstart, and Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Install Java 8. … Read more ›




800+ Java Q&As & tutorials
