Docker Tutorial: Hadoop Big Data configuration files on Cloudera quickstart

The “etc” (i.e. “etcetera”) folder is mainly for configuration files; its purpose is to host the system’s various configuration files. For instance, to add a new hard drive to your system and have Linux auto-mount it on boot, you would edit /etc/fstab. The key “Hadoop Cluster Configuration” files are:

hadoop-env.sh

This file sets the environment variables used by the Hadoop daemons (started via scripts such as /usr/bin/hadoop). Because the Hadoop framework is written in Java and runs on a Java Runtime Environment, the most important environment variable for the Hadoop daemons is $JAVA_HOME.
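For example, JAVA_HOME can be exported in hadoop-env.sh. The JDK path below is illustrative (the Cloudera quickstart image ships its own JDK), so point it at whatever JDK is installed on your host:

```shell
# hadoop-env.sh -- environment picked up by the Hadoop daemons
# Illustrative path; adjust to the JDK actually installed on your host.
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
# Optional: daemon heap size in MB
export HADOOP_HEAPSIZE=1024
```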

core-site.xml

This file tells Hadoop where the NameNode runs in the cluster and contains the configuration settings for Hadoop Core. The commonly used NameNode port is 8020, and you can specify an IP address rather than a hostname.
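A minimal core-site.xml sketch; the hostname and port below match the Cloudera quickstart defaults, so adjust them for your own cluster:

```xml
<configuration>
  <!-- Where HDFS clients and daemons find the NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
</configuration>
```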

hdfs-site.xml

This file contains the configuration settings for HDFS daemons.
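For instance, the block replication factor and the NameNode storage directory are set here; a sketch with illustrative values (the path is an assumption, not a required location):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- 1 suits a single-node quickstart VM; 3 is the usual production default -->
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
  </property>
</configuration>
```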

yarn-site.xml

Configurations for ResourceManager and NodeManager.
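A hedged yarn-site.xml sketch; the ResourceManager hostname is an assumption taken from the quickstart image:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>quickstart.cloudera</value>
  </property>
  <property>
    <!-- auxiliary service required for the MapReduce shuffle phase on YARN -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```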

mapred-site.xml

This file contains the configuration settings for MapReduce, such as which framework (classic MRv1 or YARN) runs MapReduce jobs.
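On a YARN-based cluster, mapred-site.xml typically contains at least the framework selection:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- run MapReduce jobs on YARN rather than classic MRv1 -->
    <value>yarn</value>
  </property>
</configuration>
```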

slaves

The ‘slaves’ file at the Master node contains a list of hosts, one per line, that are to run the DataNode and TaskTracker servers. The ‘slaves’ file at a Slave node contains only its own IP address, not those of the other DataNodes in the cluster. In Hadoop 3.x, the worker hostnames or IP addresses are listed in the etc/hadoop/workers file instead.
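A ‘slaves’ (or, in Hadoop 3.x, ‘workers’) file is simply one host per line; a hypothetical three-worker example:

```
datanode1.example.com
datanode2.example.com
192.168.1.103
```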

Masters

This file tells the Hadoop daemons where the Secondary NameNode runs. The ‘masters’ file at the Master server contains the hostname of the Secondary NameNode server. In Hadoop 3.x, HDFS high availability supports one active NameNode and more than one standby NameNode.

/etc/init.d services

init.d is a sub-directory of the /etc directory in the Linux file system. It contains the start/stop/reload/restart/status scripts used to control the Hadoop ecosystem daemons while the system is running or during boot. If you look at /etc/init.d you will notice the scripts for the different services.
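On a Cloudera quickstart host these scripts can be driven directly or via the service wrapper; the commands below are illustrative, and the service names depend on what is installed:

```shell
# List the Hadoop-related init scripts on this host
ls /etc/init.d | grep hadoop
# Query the status of the NameNode daemon (name varies by installation)
sudo service hadoop-hdfs-namenode status
# Restart a DataNode via its init script directly
sudo /etc/init.d/hadoop-hdfs-datanode restart
```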

Cloudera Manager Agent/Server Architecture

Cloudera Manager runs a central server, also known as the “SCM Server”, that hosts the web UI and the application logic for managing CDH. Everything related to installing CDH, configuring services, and starting and stopping services is managed by the Cloudera Manager Server. From there you can also set all the configuration parameters that would otherwise live in files like hdfs-site.xml, core-site.xml, yarn-site.xml, etc.

The Cloudera Manager Agents are installed on every managed host. They are responsible for starting and stopping Linux processes, unpacking configurations, triggering various installation paths, and monitoring the host.

The Cloudera Manager Server is the master service that manages the data model of the entire cluster in a database. The data model contains information about the services, roles, and configurations assigned to each node in the cluster. You can also upgrade the services via parcels and packages.

Cloudera Manager – Server & Agents

CDH stands for Cloudera’s Distribution including Apache Hadoop. CDH upgrades contain updated versions of the Hadoop software and other components. You can use Cloudera Manager to perform major, minor, and maintenance upgrades of CDH.

Cloudera Manager Client vs. Server configurations

Novice Cloudera Manager administrators are often surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect. This is because service instances started by Cloudera Manager do not read configuration files from the default locations. Cloudera Manager distinguishes between server and client configuration. In the case of HDFS, the file /etc/hadoop/conf/hdfs-site.xml contains only configuration relevant to an HDFS client.

Let’s start the Cloudera Manager on Docker as described in Docker Tutorial: Cloudera BigData on Docker via DockerHub
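A typical invocation looks roughly like the following; verify the image tag and published ports against the linked tutorial and the current Docker Hub page:

```shell
# Start the Cloudera quickstart container with the Cloudera Manager UI port published
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  -p 7180:7180 -p 8888:8888 \
  cloudera/quickstart /usr/bin/docker-quickstart
# Then, inside the container, start Cloudera Manager (Express edition):
# /home/cloudera/cloudera-manager --express
```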

Open http://localhost:7180/ to access Cloudera Manager, where services can be not only stopped and started but also configured.

Processes started by Cloudera Manager obtain their configurations from a private per-process directory, under /var/run/cloudera-scm-agent/process/unique-process-name. Giving each process its own private execution and configuration environment allows Cloudera Manager to control each process independently. Here is an example:
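Listing the agent’s process directory on the quickstart host shows one numbered directory per managed role; the directory names below are hypothetical:

```shell
# One numbered directory per role instance managed by the agent
ls /var/run/cloudera-scm-agent/process/
# e.g. 879-hdfs-NAMENODE  880-hdfs-DATANODE  881-yarn-RESOURCEMANAGER
# Each directory holds that process's own copies of the config files:
ls /var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
# e.g. core-site.xml  hdfs-site.xml  log4j.properties  ...
```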

http://localhost:7180/

Login with cloudera/cloudera.

Click on HDFS –> Configuration, where you can modify the server configuration values.

Cloudera Manager – HDFS Configurations

Cloudera Manager Client Configurations

Cloudera Manager – Client Configurations

Cloudera Manager Client Configurations – download URLs

Cloudera Manager – Client Configuration download URLs

