The /etc (i.e. "et cetera") directory mainly hosts configuration files. For instance, to add a new hard drive to your system and have Linux auto-mount it on boot, you would edit /etc/fstab. The key Hadoop cluster configuration files are:
```shell
[root@quickstart /]# ls -ltr /etc/hadoop/conf/
total 40
-rwxr-xr-x 1 root root  2375 Feb 23  2016 yarn-site.xml
-rwxr-xr-x 1 root root  1104 Feb 23  2016 README
-rwxr-xr-x 1 root root  2890 Feb 23  2016 hadoop-metrics.properties
-rwxr-xr-x 1 root root  1366 Feb 23  2016 hadoop-env.sh
-rwxr-xr-x 1 root root 11291 Mar 23  2016 log4j.properties
-rw-rw-r-- 1 root root  1546 Apr  5  2016 mapred-site.xml
-rw-rw-r-- 1 root root  3739 Apr  5  2016 hdfs-site.xml
-rw-rw-r-- 1 root root  1915 Apr  5  2016 core-site.xml
```
hadoop-env.sh
This file specifies environment variables used by the Hadoop daemons (i.e. those started via /usr/bin/hadoop). As the Hadoop framework is written in Java and runs on the Java Runtime Environment, one of the most important environment variables for the Hadoop daemons is $JAVA_HOME.
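As a sketch, the relevant lines in hadoop-env.sh usually look like the following. The JDK path and heap size below are illustrative assumptions, not necessarily what your distribution ships, and the file is written to /tmp here purely for demonstration:

```shell
# Illustrative hadoop-env.sh settings; the JDK path is an assumption,
# not necessarily what your distribution installs.
cat > /tmp/hadoop-env-example.sh <<'EOF'
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export HADOOP_HEAPSIZE=1024
EOF

# Hadoop's launcher scripts source this file so the daemons inherit the JDK.
. /tmp/hadoop-env-example.sh
echo "JAVA_HOME is $JAVA_HOME"
```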
core-site.xml
This file contains the configuration settings for Hadoop Core and tells clients where the NameNode runs in the cluster. The NameNode commonly listens on port 8020, and you can specify an IP address rather than a hostname.
```shell
[root@quickstart /]# cat /etc/hadoop/conf/core-site.xml
.....
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
  .....
</configuration>
```
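Client libraries read fs.defaultFS from this file to discover the NameNode. As a rough sketch of that lookup, using a copy of the XML under /tmp and plain grep/sed rather than Hadoop's real XML parser:

```shell
# Sample core-site.xml mirroring the quickstart file above (path is illustrative).
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://quickstart.cloudera:8020</value>
  </property>
</configuration>
EOF

# Grab the <value> element that follows the fs.defaultFS <name> element.
default_fs=$(grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site.xml \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$default_fs"
```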
hdfs-site.xml
This file contains the configuration settings for HDFS daemons.
```shell
[root@quickstart /]# cat /etc/hadoop/conf/hdfs-site.xml
.....
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  .....
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  .....
</configuration>
```
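dfs.replication controls how many copies of each block HDFS keeps, so it directly multiplies the raw disk footprint. The quickstart VM uses 1 (no redundancy); a typical production default is 3. A quick back-of-envelope check (the 1024 MB file size is arbitrary):

```shell
# Raw disk consumed = logical file size x dfs.replication.
logical_mb=1024     # arbitrary example file size
replication=3       # common production default; quickstart uses 1
raw_mb=$((logical_mb * replication))
echo "A ${logical_mb} MB file occupies ${raw_mb} MB of raw HDFS storage"
```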
yarn-site.xml
This file contains the configuration settings for the ResourceManager and NodeManager daemons.
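As a sketch, a minimal yarn-site.xml might pin the ResourceManager host and the memory a NodeManager offers to containers. The hostname and memory value below are placeholders; the property names are the standard YARN ones:

```xml
<configuration>
  <!-- Placeholder host: where the ResourceManager runs -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
  <!-- Illustrative value: memory (MB) this NodeManager offers to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
```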
mapred-site.xml
This file contains the configuration settings for the MapReduce framework, including the JobHistory Server addresses.
```shell
[root@quickstart /]# cat /etc/hadoop/conf/mapred-site.xml
<configuration>
  ...
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
  .....
</configuration>
```
slaves
The ‘slaves’ file on the master node contains a list of hosts, one per line, that run the DataNode and TaskTracker daemons. The ‘slaves’ file on each slave node contains only its own IP address, not those of the other DataNodes in the cluster. In Hadoop 3.0 the worker hostnames or IP addresses live in the /etc/hadoop/workers file instead.
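For illustration, a Hadoop 3.x /etc/hadoop/workers file on the master might look like this, one worker per line (hostnames are hypothetical):

```
worker1.example.com
worker2.example.com
worker3.example.com
```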
Masters
This file tells the Hadoop daemons where the Secondary NameNode runs. The ‘masters’ file on the master server contains the hostname of the Secondary NameNode server. In Hadoop 3.0 high-availability setups you have one active NameNode and one or more standby NameNodes.
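For illustration, the ‘masters’ file typically holds just the single hostname (hypothetical here) where the Secondary NameNode should run:

```
secondarynamenode.example.com
```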
/etc/init.d services
init.d is a sub-directory of /etc in the Linux file system. It contains the start/stop/reload/restart/status scripts used to control the Hadoop ecosystem daemons during boot or while the system is running. If you look at /etc/init.d you will notice scripts for the different services:
```shell
[root@quickstart /]# cd /etc/init.d
[root@quickstart init.d]# ls
atd                            hadoop-httpfs                   hive-metastore      netconsole   sentry-store
cloudera-quickstart-init       hadoop-mapreduce-historyserver  hive-server2        netfs        single
cloudera-scm-agent             hadoop-yarn-nodemanager         htcacheclean        network      solr-server
cloudera-scm-server            hadoop-yarn-proxyserver         httpd               ntpd         spark-history-server
crond                          hadoop-yarn-resourcemanager     hue                 ntpdate      sqoop2-server
flume-ng-agent                 halt                            impala-catalog      oozie        sqoop-metastore
functions                      hbase-master                    impala-server       rdisc        sysstat
hadoop-hdfs-datanode           hbase-regionserver              impala-state-store  restorecond  udev-post
hadoop-hdfs-journalnode        hbase-rest                      iptables            rpcbind      zookeeper-server
hadoop-hdfs-namenode           hbase-solr-indexer              killall             rsyslog
hadoop-hdfs-secondarynamenode  hbase-thrift                    mysqld              sandbox
[root@quickstart init.d]# service hbase-master status
HBase master daemon is running                             [  OK  ]
[root@quickstart init.d]# service mysqld status
mysqld (pid  169) is running...
[root@quickstart init.d]# service impala-server status
Impala Server is running                                   [  OK  ]
```
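Under the hood, most of these init.d status checks reduce to a pid-file test. Here is a minimal, self-contained sketch of that pattern; the pid-file path is illustrative, and our own shell stands in for the daemon (this is not the real Hadoop script):

```shell
# Simulate a daemon by recording our own shell's pid in a pid file.
pidfile=/tmp/demo-daemon.pid
echo $$ > "$pidfile"

# A status check: is a process with that pid still alive?
# kill -0 sends no signal; it only tests whether the pid exists.
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
  status="running"
else
  status="stopped"
fi
echo "demo daemon is $status"
```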
Cloudera Manager Agent/Server Architecture
Cloudera Manager runs a central server, also known as the SCM Server, which hosts the web UI and the application logic for managing CDH. Everything related to installing CDH, configuring services, and starting and stopping services is handled by the Cloudera Manager Server. From it you can also manage all the configuration parameters found in files like hdfs-site.xml, core-site.xml, and yarn-site.xml.
The Cloudera Manager Agents are installed on every managed host. They are responsible for starting and stopping Linux processes, unpacking configurations, triggering various installation paths, and monitoring the host.
The Cloudera Manager Server is the master service that manages the data model of the entire cluster in a database. The data model contains information regarding the services, roles, and configurations assigned for each node in the cluster. You can also upgrade the services via parcels & packages.
CDH stands for Cloudera's Distribution including Apache Hadoop. CDH upgrades contain updated versions of the Hadoop software and other components, and you can use Cloudera Manager to perform major, minor, and maintenance upgrades.
Cloudera Manager Client Vs. Server configurations
Novice Cloudera Manager administrators are often surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect. This is because service instances started by Cloudera Manager do not read configuration files from the default locations. Cloudera Manager distinguishes between server and client configuration. In the case of HDFS, the file /etc/hadoop/conf/hdfs-site.xml contains only configuration relevant to an HDFS client.
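A quick way to convince yourself of this split: the daemon's private copy of a config file is unaffected by edits to the client copy. The directory layout below is simulated under /tmp purely for illustration (the real paths are /etc/hadoop/conf and /var/run/cloudera-scm-agent/process/...):

```shell
# Simulated layout: a "client" config dir and a daemon's private process dir.
mkdir -p /tmp/demo/etc-hadoop-conf /tmp/demo/process/42-hdfs-NAMENODE
echo "client-value" > /tmp/demo/etc-hadoop-conf/hdfs-site.xml
echo "server-value" > /tmp/demo/process/42-hdfs-NAMENODE/hdfs-site.xml

# Editing the client copy leaves the daemon's private copy untouched,
# which is why hand-editing /etc/hadoop/conf has no effect on the daemons.
echo "edited-client-value" > /tmp/demo/etc-hadoop-conf/hdfs-site.xml
cat /tmp/demo/process/42-hdfs-NAMENODE/hdfs-site.xml
```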
Let’s start Cloudera Manager on Docker as described in Docker Tutorial: Cloudera BigData on Docker via DockerHub:
```shell
[root@quickstart /]# cd /home/cloudera/
[root@quickstart cloudera]# ./cloudera-manager --enterprise --force
```
Open http://localhost:7180/ to access Cloudera Manager, where services can be not only stopped and started but also configured.
Processes started by Cloudera Manager obtain their configurations from a private per-process directory under /var/run/cloudera-scm-agent/process/unique-process-name. Giving each process its own private execution and configuration environment lets Cloudera Manager control each process independently. Here is an example:
```shell
[root@quickstart cloudera]# ls -ltr /var/run/cloudera-scm-agent/process
total 0
drwxr-x--x 3 zookeeper    zookeeper    260 May 28 10:16 1-zookeeper-init
drwxr-x--x 5 sqoop2       sqoop        260 May 28 10:16 2-sqoop-create-database
drwxr-xr-x 4 root         root         100 May 28 10:16 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_-2662724828499381521
drwxr-xr-x 7 root         root         200 May 28 10:17 ccdeploy_spark-conf_etcsparkconf.cloudera.spark_on_yarn_-1719474062264734402
drwxr-xr-x 4 root         root         100 May 28 10:17 ccdeploy_hive-conf_etchiveconf.cloudera.hive_8031995061896064328
drwxr-xr-x 6 root         root         140 May 28 10:17 ccdeploy_sqoop-conf_etcsqoopconf.cloudera.sqoop_client_7384993928373551612
drwxr-xr-x 4 root         root         100 May 28 10:17 ccdeploy_solr-conf_etcsolrconf.cloudera.solr_-516324481236666050
drwxr-xr-x 4 root         root         100 May 28 10:17 ccdeploy_hbase-conf_etchbaseconf.cloudera.hbase_-2161336867764635751
drwxr-xr-x 4 root         root         100 May 28 10:17 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_-9173769841997679756
drwxr-x--x 3 cloudera-scm cloudera-scm 240 May 28 10:17 3-cloudera-mgmt-EVENTSERVER
drwxr-x--x 3 cloudera-scm cloudera-scm 300 May 28 10:17 5-cloudera-mgmt-NAVIGATORMETASERVER
drwxr-x--x 3 cloudera-scm cloudera-scm 220 May 28 10:17 6-cloudera-mgmt-ALERTPUBLISHER
drwxr-x--x 3 cloudera-scm cloudera-scm 320 May 28 10:17 8-cloudera-mgmt-SERVICEMONITOR
drwxr-x--x 3 cloudera-scm cloudera-scm 260 May 28 10:17 4-cloudera-mgmt-HOSTMONITOR
drwxr-x--x 3 cloudera-scm cloudera-scm 340 May 28 10:17 7-cloudera-mgmt-REPORTSMANAGER
drwxr-x--x 3 cloudera-scm cloudera-scm 320 May 28 10:18 9-cloudera-mgmt-NAVIGATOR
```
```shell
[root@quickstart cloudera]# ls -ltr /var/run/cloudera-scm-agent/process/ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_-9173769841997679756/hadoop-conf/
total 36
-rw-r----- 1 root root 1510 May 28 10:17 topology.py
-rw-r----- 1 root root  201 May 28 10:17 topology.map
-rw-r----- 1 root root  315 May 28 10:17 ssl-client.xml
-rw-r----- 1 root root  314 May 28 10:17 log4j.properties
-rw-r----- 1 root root 1772 May 28 10:17 hdfs-site.xml
-rw-r----- 1 root root 2696 May 28 10:17 hadoop-env.sh
-rw-r----- 1 root root 3549 May 28 10:17 core-site.xml
-rw-r--r-- 1 root root   26 May 28 10:17 __cloudera_metadata__
-rw-r--r-- 1 root root   21 May 28 10:17 __cloudera_generation__
```
Open http://localhost:7180/ and log in with cloudera/cloudera. Then click HDFS –> Configuration, where you can modify the server configuration values.