Install Hadoop and HBase on AWS (Amazon Web Services) #2

I tried to install Hadoop through Cloudera, but had no luck: the installation always failed with a server crash, and after the crash the services couldn’t start again, even after rebooting. So I decided to install Hadoop manually.

1. Install Hadoop on the AWS server

ubuntu@ip-xxx-xxx-xxx-xxx:~$ wget http://apache.mirror.gtcomm.net/hadoop/common/current/hadoop-2.6.0.tar.gz
ubuntu@ip-xxx-xxx-xxx-xxx:~$ tar -xvzf hadoop-2.6.0.tar.gz
ubuntu@ip-xxx-xxx-xxx-xxx:~$ ln -s hadoop-2.6.0 hadoop
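Note: the ‘current’ directory on that mirror moves on to newer releases over time, so the 2.6.0 tarball may no longer be there. If the download 404s, the Apache release archive keeps old versions and should still have it:

ubuntu@ip-xxx-xxx-xxx-xxx:~$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz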

2. Install Java

ubuntu@ip-xxx-xxx-xxx-xxx:~$ sudo apt-get update
ubuntu@ip-xxx-xxx-xxx-xxx:~$ sudo apt-get install default-jdk
ubuntu@ip-xxx-xxx-xxx-xxx:~$ java -version

3. Configure environment

ubuntu@ip-xxx-xxx-xxx-xxx:~$ sudo nano /etc/environment

...

JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/game$

ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano ~/.bashrc

...

# JAVA
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
export PATH=$PATH:$JAVA_HOME/bin

# HADOOP
export HADOOP_PREFIX=/home/ubuntu/hadoop
export HADOOP_HOME=/home/ubuntu/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_CONF=${HADOOP_HOME}/etc/hadoop

export YARN_HOME=${HADOOP_HOME}
export YARN_CONF=${HADOOP_HOME}/etc/hadoop

# HADOOP Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

ubuntu@ip-xxx-xxx-xxx-xxx:~$ source ~/.bashrc
ubuntu@ip-xxx-xxx-xxx-xxx:~$ echo $HADOOP_HOME
/home/ubuntu/hadoop
ubuntu@ip-xxx-xxx-xxx-xxx:~$ 

If you don’t know where Java is installed, you can find the path like this:

ubuntu@ip-xxx-xxx-xxx-xxx:~$ sudo update-alternatives --config java

or this

ubuntu@ip-xxx-xxx-xxx-xxx:~$ which java
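Since ‘which java’ usually returns a symlink under /etc/alternatives rather than the real JDK directory, it can help to resolve it (just a quick check, not one of the original steps). JAVA_HOME should then be the JVM directory that contains bin/java, e.g. /usr/lib/jvm/java-7-openjdk-amd64.

ubuntu@ip-xxx-xxx-xxx-xxx:~$ readlink -f $(which java)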

Now repeat steps 1 ~ 3 on the other three machines (the secondary name node and the two slaves),
or make an image (AMI) of this instance and duplicate it; see the sketch below.
I only found out about AMIs while struggling with the Hadoop installation,
and they are quite convenient when you have to set up several machines.
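If you go the AMI route, a rough command-line sketch looks like this (assuming the AWS CLI is installed and configured somewhere; the instance ID, AMI ID, instance type and key name are placeholders, and you can do the same from the EC2 console). Note that create-image reboots the source instance unless you pass --no-reboot, and run-instances may also need --subnet-id / --security-group-ids depending on your VPC setup:

$ aws ec2 create-image --instance-id i-xxxxxxxx --name "hadoop-2.6.0-base"
$ aws ec2 run-instances --image-id ami-xxxxxxxx --count 3 --instance-type m3.large --key-name name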

4. Configure SSH so the Master can connect to the Slaves without a password

ubuntu@ip-xxx-xxx-xxx-xxx:~$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/ubuntu/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/ubuntu/.ssh/id_dsa.
Your public key has been saved in /home/ubuntu/.ssh/id_dsa.pub.
The key fingerprint is:
f3:77:56:58:a8:bb:08:59:67:15:2c:0e:1d:d0:40:a3 ubuntu@ip-xxx-xxx-xxx-xxx
ubuntu@ip-xxx-xxx-xxx-xxx:~$ chmod 600 $HOME/.ssh/id_dsa*

After generating the key pair, add the public key to ‘authorized_keys’ and then check that the key works properly.

ubuntu@ip-xxx-xxx-xxx-xxx:~$ cat /home/ubuntu/.ssh/id_dsa.pub >> /home/ubuntu/.ssh/authorized_keys
ubuntu@ip-xxx-xxx-xxx-xxx:~$ ssh localhost
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-36-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Thu Dec 11 21:33:54 UTC 2014

  System load:  0.0               Processes:           97
  Usage of /:   2.6% of 62.86GB   Users logged in:     0
  Memory usage: 5%                IP address for eth0: 172.31.22.234
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud


Last login: Thu Dec 11 21:33:54 2014 from aftr-37-201-193-43.unity-media.net
ubuntu@ip-xxx-xxx-xxx-xxx:~$

To let the Master connect to the other machines without a password, just copy the ‘authorized_keys’ file to them.

ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp -i ~/.ssh/name.pem ~/.ssh/authorized_keys ubuntu@SECONDARY_NAME_NODE_ADDRESS:~/.ssh/
ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp -i ~/.ssh/name.pem ~/.ssh/authorized_keys ubuntu@SLAVE1_ADDRESS:~/.ssh/
ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp -i ~/.ssh/name.pem ~/.ssh/authorized_keys ubuntu@SLAVE2_ADDRESS:~/.ssh/
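Once the keys are copied, you can verify that passwordless SSH from the Master really works (same placeholder addresses as above); it should print the slave’s hostname without prompting for a password and without the .pem file:

ubuntu@ip-xxx-xxx-xxx-xxx:~$ ssh ubuntu@SLAVE1_ADDRESS hostname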

References:
https://macnugget.org/projects/publickeys/
http://haruair.com/blog/1827

5. Hadoop Cluster Setup – hadoop-env.sh

ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/hadoop-env.sh
#replace this
export JAVA_HOME=${JAVA_HOME}
#to this
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
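If you don’t want to open an editor on every machine, the same replacement can be done with a one-liner (a sketch; it simply hard-codes the JAVA_HOME value used throughout this post):

ubuntu@ip-xxx-xxx-xxx-xxx:~$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"|' $HADOOP_CONF/hadoop-env.sh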

6. Hadoop Cluster Setup – core-site.xml

ubuntu@ip-xxx-xxx-xxx-xxx:~$ mkdir hdfstmp
ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER_SERVER_ADDRESS:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/ubuntu/hdfstmp</value>
  </property>
</configuration>

MASTER_SERVER_ADDRESS can be an internal address like "ip-172-31-24-134.us-west-1.compute.internal". It could also be 'localhost' or '127.0.0.1', but if you want to set this up as a cluster you should use a public or private address instead of 'localhost' or '127.0.0.1', because we are going to copy these configuration files to the slave machines. (In Hadoop 2.x the non-deprecated name of this property is fs.defaultFS; fs.default.name still works, it just logs a deprecation warning.)

7. Hadoop Cluster Setup – hdfs-site.xml

ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/ubuntu/hdfstmp/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/ubuntu/hdfstmp/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

8. Hadoop Cluster Setup – yarn-site.xml

ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>MASTER_SERVER_ADDRESS:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>MASTER_SERVER_ADDRESS:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>MASTER_SERVER_ADDRESS:8040</value>
  </property>
 </configuration>

9. Hadoop Cluster Setup – mapred-site.xml

ubuntu@ip-xxx-xxx-xxx-xxx:~$ cp $HADOOP_CONF/mapred-site.xml.template $HADOOP_CONF/mapred-site.xml
ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

10. Move configuration files to other machines

ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp $HADOOP_CONF/hadoop-env.sh $HADOOP_CONF/core-site.xml $HADOOP_CONF/yarn-site.xml $HADOOP_CONF/hdfs-site.xml $HADOOP_CONF/mapred-site.xml ubuntu@SECONDARY_NAME_NODE_ADDRESS:/home/ubuntu/hadoop/etc/hadoop/
ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp $HADOOP_CONF/hadoop-env.sh $HADOOP_CONF/core-site.xml $HADOOP_CONF/yarn-site.xml $HADOOP_CONF/hdfs-site.xml $HADOOP_CONF/mapred-site.xml ubuntu@SLAVE1_ADDRESS:/home/ubuntu/hadoop/etc/hadoop/
ubuntu@ip-xxx-xxx-xxx-xxx:~$ scp $HADOOP_CONF/hadoop-env.sh $HADOOP_CONF/core-site.xml $HADOOP_CONF/yarn-site.xml $HADOOP_CONF/hdfs-site.xml $HADOOP_CONF/mapred-site.xml ubuntu@SLAVE2_ADDRESS:/home/ubuntu/hadoop/etc/hadoop/
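The three long commands above are easy to mistype, so here is the same copy as a small bash loop (same placeholder addresses; run it from the Master):

ubuntu@ip-xxx-xxx-xxx-xxx:~$ for host in SECONDARY_NAME_NODE_ADDRESS SLAVE1_ADDRESS SLAVE2_ADDRESS; do scp $HADOOP_CONF/{hadoop-env.sh,core-site.xml,yarn-site.xml,hdfs-site.xml,mapred-site.xml} ubuntu@$host:/home/ubuntu/hadoop/etc/hadoop/; done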

11. Configure Slaves on Master machine

We only do this on the Master machine, because in hadoop-2.6.0 you don’t need to configure the ‘slaves’ file on the slave machines.

ubuntu@ip-xxx-xxx-xxx-xxx:~$ nano $HADOOP_CONF/slaves
SLAVE1_ADDRESS
SLAVE2_ADDRESS

12. Hadoop Daemon Startup

On the master machine.

ubuntu@ip-172-31-22-234:~$ hdfs namenode -format
...
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-172-31-22-234.us-west-1.compute.internal/172.31.22.234
************************************************************/
ubuntu@ip-172-31-22-234:~$ start-dfs.sh
Starting namenodes on [ip-172-31-22-234.us-west-1.compute.internal]
The authenticity of host 'ip-172-31-22-234.us-west-1.compute.internal (172.31.22.234)' can't be established.
ECDSA key fingerprint is a3:bf:43:66:11:52:40:cc:6c:00:ea:81:73:11:b2:a8.
Are you sure you want to continue connecting (yes/no)? yes
ip-172-31-22-234.us-west-1.compute.internal: Warning: Permanently added 'ip-172-31-22-234.us-west-1.compute.internal' (ECDSA) to the list of known hosts.
ip-172-31-22-234.us-west-1.compute.internal: starting namenode, logging to /home/ubuntu/hadoop-2.6.0/logs/hadoop-ubuntu-namenode-ip-172-31-22-234.out
ip-172-31-0-61.us-west-1.compute.internal: starting datanode, logging to /home/ubuntu/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-ip-172-31-0-61.out
ip-172-31-0-60.us-west-1.compute.internal: starting datanode, logging to /home/ubuntu/hadoop-2.6.0/logs/hadoop-ubuntu-datanode-ip-172-31-0-60.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is a3:bf:43:66:11:52:40:cc:6c:00:ea:81:73:11:b2:a8.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/ubuntu/hadoop-2.6.0/logs/hadoop-ubuntu-secondarynamenode-ip-172-31-22-234.out
ubuntu@ip-172-31-22-234:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/ubuntu/hadoop/logs/yarn-ubuntu-resourcemanager-ip-172-31-22-234.out
ip-172-31-0-60.us-west-1.compute.internal: starting nodemanager, logging to /home/ubuntu/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-ip-172-31-0-60.out
ip-172-31-0-61.us-west-1.compute.internal: starting nodemanager, logging to /home/ubuntu/hadoop-2.6.0/logs/yarn-ubuntu-nodemanager-ip-172-31-0-61.out
ubuntu@ip-172-31-22-234:~$ jps
2253 NameNode
2865 Jps
2466 SecondaryNameNode
2600 ResourceManager
ubuntu@ip-172-31-22-234:~$ 

Cool~ Looks like it’s working!
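Beyond jps on the Master, you can ask HDFS for a cluster report and check that both datanodes registered, and run jps on a slave, where you should see a DataNode and a NodeManager. With the default ports, the NameNode web UI is at http://MASTER_SERVER_ADDRESS:50070 and the ResourceManager UI at http://MASTER_SERVER_ADDRESS:8088 (open those ports in the EC2 security group if you want to reach them from outside):

ubuntu@ip-172-31-22-234:~$ hdfs dfsadmin -report
ubuntu@ip-172-31-22-234:~$ ssh ubuntu@SLAVE1_ADDRESS jps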

References:
http://blog.c2b2.co.uk/2014/05/hadoop-v2-overview-and-cluster-setup-on.html
http://stackoverflow.com/questions/4681090/how-do-i-find-where-jdk-is-installed-on-my-windows-machine
