Blog Archives

09: Docker Tutorial: Getting started with Hadoop Big Data on Cloudera quickstart

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials featuring Java & Spring Boot examples.

Step 1: https://hub.docker.com/ is a Docker registry from which you can pull & push images.
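As a quick sketch (the image name and flags below are assumed from the cloudera/quickstart image this series uses; tags may vary), pulling and running an image from Docker Hub looks like:

```shell
# Pull the Cloudera quickstart image from Docker Hub
docker pull cloudera/quickstart:latest

# Run it interactively; /usr/bin/docker-quickstart starts the Hadoop services.
# -p 8888:8888 exposes the Hue web UI on the host.
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  -p 8888:8888 cloudera/quickstart:latest /usr/bin/docker-quickstart
```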

Read more ›



10: Docker Tutorial: Hadoop Big Data services & folders on Cloudera quickstart

You can also install it on VMware, as illustrated in ⏯ Getting started with BigData on Cloudera.

If you are not familiar with Docker, get some hands-on experience with a series of step-by-step Docker tutorials featuring Java & Spring Boot examples.

Read more ›



11: Docker Tutorial: Hadoop Big Data CLIs on Cloudera quickstart

There are a number of CLIs that you can run from the edge node, which is the gateway node to the Hadoop cluster consisting of master & slave (aka worker) nodes. Let's look at the different CLIs (i.e. Command Line Interfaces).

Most of the CLIs listed below live in /usr/bin.

hdfs CLI

The following command lists the subcommands you can use with "hdfs".
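For instance (a sketch — the exact subcommand list varies by CDH version), running hdfs with no arguments prints its usage, and a few typical subcommands look like:

```shell
# Print the list of hdfs subcommands (dfs, namenode, datanode, fsck, ...)
hdfs

# A few typical examples:
hdfs dfs -ls /                            # list the HDFS root directory
hdfs dfs -mkdir -p /user/cloudera/input   # create a directory on HDFS
hdfs fsck /                               # check the health of the file system
```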

Read more ›



12: Docker Tutorial: Hadoop Big Data configuration files on Cloudera quickstart

The "etc" (i.e. etcetera) folder hosts various configuration files. For instance, to add a new hard drive to your system and have Linux auto-mount it on boot, you would edit /etc/fstab. Key "Hadoop Cluster Configuration" files are:

hadoop-env.sh

This file specifies environment variables that affect the JDK used by the Hadoop daemons (i.e.
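As a sketch, a typical hadoop-env.sh entry points the daemons at a JDK (the paths and values below are illustrative assumptions, not taken from the original post):

```shell
# /etc/hadoop/conf/hadoop-env.sh (illustrative snippet)

# JDK used by the Hadoop daemons:
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

# Maximum heap size (in MB) for the daemons:
export HADOOP_HEAPSIZE=1024
```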

Read more ›



13: Docker Tutorial: Apache Spark (spark-shell & pyspark) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Create a text file "simple.txt" in the local file system. Step 3: Copy this file onto the HDFS file system. … Read more ›
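The steps above can be sketched as follows (image name, flags, and file contents are assumptions for illustration):

```shell
# Step 1: run the container on a command line
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  cloudera/quickstart:latest /usr/bin/docker-quickstart

# Step 2: create a text file "simple.txt" in the local file system
echo "hello spark hello hdfs" > simple.txt

# Step 3: copy this file onto the HDFS file system
hdfs dfs -mkdir -p /user/cloudera
hdfs dfs -put simple.txt /user/cloudera/

# It can then be read from spark-shell, e.g. a word count:
#   val lines = sc.textFile("hdfs:///user/cloudera/simple.txt")
#   lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
```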



14: Docker Tutorial: Hive (via beeline) on Cloudera quickstart

This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. Step 1: Run the container on a command line. Step 2: Create a csv file "employee.csv" in the local file system. Step 3: Copy this file onto the HDFS file system. … Read more ›
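A minimal sketch of steps 2 and 3 plus a beeline session (the sample rows, table schema, and JDBC URL are assumptions for illustration):

```shell
# Step 2: create a csv file "employee.csv" in the local file system
printf '1,John,Sales\n2,Jane,Engineering\n' > employee.csv

# Step 3: copy this file onto the HDFS file system
hdfs dfs -mkdir -p /user/cloudera/employee
hdfs dfs -put employee.csv /user/cloudera/employee/

# Connect with beeline and map an external Hive table over the csv:
beeline -u jdbc:hive2://localhost:10000 -n cloudera -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS employee (
    id INT, name STRING, dept STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/cloudera/employee';
  SELECT * FROM employee;"
```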



15: Docker Tutorial: Hive & parquet-tools – csv to parquet on Cloudera quickstart

CSV is a row-based storage format, whereas Parquet is columnar in nature and designed from the ground up for efficient storage, compression and encoding, which gives better performance. Run the cloudera/quickstart container. This extends Docker Tutorial: BigData on Cloudera quickstart via Docker. … Read more ›
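One way to sketch the csv-to-parquet conversion is a Hive CREATE TABLE ... AS SELECT, then an inspection with parquet-tools (the table names, warehouse path, and output file name are assumptions for illustration):

```shell
# Convert a csv-backed Hive table to Parquet via CTAS:
beeline -u jdbc:hive2://localhost:10000 -n cloudera -e "
  CREATE TABLE employee_parquet STORED AS PARQUET
  AS SELECT * FROM employee;"

# Locate the Parquet output in the Hive warehouse:
hdfs dfs -ls /user/hive/warehouse/employee_parquet

# Inspect schema and sample rows with parquet-tools:
parquet-tools schema hdfs:///user/hive/warehouse/employee_parquet/000000_0
parquet-tools head -n 5 hdfs:///user/hive/warehouse/employee_parquet/000000_0
```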



