Step 1: Log into your AWS Management Console [ https://aws.amazon.com/console/ ] and select “EC2” from “Services -> Compute -> EC2”. On these EC2 servers you can later host web application servers or multi-node Hadoop clusters. More and more organizations are hosting their applications on AWS infrastructure. EC2 stands for Elastic Compute Cloud.
Step 2: From “EC2 Dashboard” select “Network & Security” and set up the “Security Groups”. Create a group named “Hadoop Cluster”, which we will select when launching the instances. For example, Cloudera requires the following TCP ports to be open: 22 (SSH), 7180 (Cloudera Manager web console), 7182 (Agent heartbeat), 7183 (optional, Cloudera Manager web console with TLS), and 7432 (Embedded PostgreSQL).
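The port list above can also be written down as data, which makes it easier to review before you type it into the console. The following is only a minimal sketch: the dictionary layout follows the “IpPermissions” structure that boto3's authorize_security_group_ingress call accepts, but nothing here talks to AWS. The function name and the source range 203.0.113.0/24 are assumptions for illustration; replace the range with the addresses you actually connect from.

```python
# Ports listed in Step 2, mapped to their purpose.
CLOUDERA_TCP_PORTS = {
    22: "SSH",
    7180: "Cloudera Manager web console",
    7182: "Agent heartbeat",
    7183: "Cloudera Manager web console with TLS (optional)",
    7432: "Embedded PostgreSQL",
}


def build_ingress_rules(ports, cidr):
    """Return one inbound TCP rule per port, restricted to the given CIDR range."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": purpose}],
        }
        for port, purpose in sorted(ports.items())
    ]


# ASSUMPTION: 203.0.113.0/24 is a documentation-only range, not a real source network.
rules = build_ingress_rules(CLOUDERA_TCP_PORTS, "203.0.113.0/24")
for rule in rules:
    print(rule["FromPort"], rule["IpRanges"][0]["Description"])
```

Restricting the source range is a design choice rather than a Cloudera requirement; opening these ports to every address would also work, but it exposes the Cloudera Manager console to the whole internet.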
Step 3: You are now ready to create instances from “Instances -> Instances”. Click on “Launch Instance” and then select “Ubuntu Server 16.04 LTS (HVM)”. “t2.micro” is eligible for the free tier, but it has only 1 GB of memory and Cloudera requires 4 GB, so choose “t2.medium”.
WARNING: Check https://aws.amazon.com/ec2/pricing/ for the pricing of “t2.medium”. It is also important to stop the instances when you are not using them, so that you do not keep paying for compute capacity. You pay for compute capacity per hour or per second, depending on which instances you run. At the time of writing it is about $0.0464 per hour.
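To get a feel for what this rate means for the three-node cluster launched in Step 4, here is a rough estimate in Python. It is a sketch under stated assumptions: it uses the $0.0464 hourly figure quoted above, counts compute time only (storage and any other charges are ignored), and the function name is made up for illustration. Check the pricing page for the current rate in your region.

```python
HOURLY_RATE_USD = 0.0464  # t2.medium rate quoted in the text; verify on the pricing page
INSTANCE_COUNT = 3        # 1 master + 2 slave nodes


def cluster_cost(hours, rate=HOURLY_RATE_USD, instances=INSTANCE_COUNT):
    """Compute-only cost in USD of running all instances for the given hours."""
    return hours * rate * instances


for label, hours in [("1 hour", 1), ("8-hour day", 8), ("30 days non-stop", 24 * 30)]:
    print(f"{label}: ${cluster_cost(hours):.2f}")
```

The difference between the last two lines of output is the reason to stop the instances whenever you are not working on the cluster.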
Step 4: Click “Configure Instance Details”. Launch 3 instances: 1 master and 2 slave nodes. Leave the other settings as they are.
Step 5: Click “Add Storage” and increase the size to 32 GB.
Step 6: Click “Add Tags”. Tags are handy for identifying your instances when you have hundreds of them for different purposes. We will tag them as “Cloudera”. Click “Next: Configure Security Group”.
Step 7: Choose “Select an existing security group” and select the group named “Hadoop Cluster” that we set up earlier. Click on “Review & Launch”. Make sure that you have selected the right region at the top of the console. I have chosen “Sydney”.
Step 8: Heed the warning that the instance type is not free-tier eligible, and if you are happy to proceed click “Launch”.
Step 9: Create a key pair so that you can connect to the servers from SSH clients like “putty.exe” to install Java & Cloudera Quickstart. Download “cloudera-instances-private-key.pem”, which we will convert to a “cloudera-instances-private-key.ppk” file using the “puttygen.exe” program. The EC2 instances will keep the public key, and an SSH client like PuTTY will use the private key to connect without having to enter a password.
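The console choices made in Steps 3 to 9 can be summarised as one launch request. The sketch below only builds and prints a dictionary; it does not call AWS. The key names follow the parameters of boto3's run_instances call, but several values are assumptions: the AMI ID is a placeholder (look up the Ubuntu Server 16.04 LTS image ID for your region), “/dev/sda1” is assumed to be the root device of that image, the tag key “Purpose” is made up, and the key pair name is assumed to match the downloaded .pem file name.

```python
import json


def build_launch_request(ami_id, key_name, security_group):
    """Collect the settings from Steps 3 to 9 in one dictionary."""
    return {
        "ImageId": ami_id,                   # Step 3: Ubuntu Server 16.04 LTS (HVM)
        "InstanceType": "t2.medium",         # Step 3: 4 GB of memory for Cloudera
        "MinCount": 3,                       # Step 4: 1 master + 2 slave nodes
        "MaxCount": 3,
        "KeyName": key_name,                 # Step 9: key pair used for SSH
        "SecurityGroups": [security_group],  # Step 7: existing security group
        "BlockDeviceMappings": [
            # Step 5: 32 GB volume. ASSUMPTION: /dev/sda1 is the AMI's root device.
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 32}}
        ],
        "TagSpecifications": [
            # Step 6: ASSUMPTION: the tag key "Purpose" is hypothetical.
            {"ResourceType": "instance",
             "Tags": [{"Key": "Purpose", "Value": "Cloudera"}]}
        ],
    }


request = build_launch_request(
    ami_id="ami-PLACEHOLDER",  # hypothetical; not a real image ID
    key_name="cloudera-instances-private-key",
    security_group="Hadoop Cluster",
)
print(json.dumps(request, indent=2))
```

Writing the settings down this way is mainly useful as a checklist to compare against the “Review” page before clicking “Launch”.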
Step 10: Click on “Launch Instances”. You may get a warning:
Step 11: Three instances will start. Name them Master, Slave01, and Slave02, as names are easier to identify than IP addresses.
Each instance will be assigned public and private IP addresses. Note down the “Public DNS” of each instance, as we will use it to connect from PuTTY via SSH.
Step 12: Allocate 3 Elastic IP addresses and associate them with the 3 AWS EC2 instances. If you don’t use Elastic IP addresses, new public IP addresses will be allocated every time you stop and start the instances.
Note: In order to make your private IP addresses static you need to use a VPC. Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define.
Step 13: Associate the Elastic IP addresses with the AWS EC2 instances. From “Elastic IPs” select each IP address, and then from the “Actions” drop-down select “Associate Address”.
After associating each Elastic IP address with an AWS EC2 instance, go to the EC2 instances and check the details.
AWS EC2 instances overview
Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure. Typical use cases include Big Data analytics engines like the Hadoop/HDFS ecosystem and Amazon EMR clusters, relational and NoSQL databases like MySQL Server or HBase, stream and log processing applications like Kafka and Storm, and data warehousing applications like Teradata.
In the next tutorial we will set up:
SSH client connecting to EC2 instances
For example, PuTTY can connect to the EC2 instances without having to enter a password. SSH clients need to use the public DNS names.
SSH among EC2 instances
You need to use the private IP addresses. Set up aliases in the /etc/hosts file and generate a key pair to connect via SSH among the 3 EC2 instances, for example from “Master” to “Slave01” and “Slave02”.
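As a preview of the /etc/hosts aliases, the sketch below generates the lines that would be appended to the file on each node. The private IP addresses are made-up examples and the lowercase alias names are an assumption; use the private IPs shown in your own instance details. The script only prints the entries and does not modify any file.

```python
# ASSUMPTION: hypothetical private IPs; copy the real ones from the EC2 console.
NODES = {
    "master": "172.31.0.10",
    "slave01": "172.31.0.11",
    "slave02": "172.31.0.12",
}


def hosts_entries(nodes):
    """Format alias-to-IP pairs as /etc/hosts lines ("<ip><TAB><alias>")."""
    return "\n".join(f"{ip}\t{alias}" for alias, ip in nodes.items())


print(hosts_entries(NODES))
```

With the same entries on all three nodes, a command such as “ssh slave01” run from the master resolves to the private address instead of the public one.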