To implement Pseudo Node configuration for Hadoop on ubuntu
a) jdk
Single-Node Configuration
- Create a dedicated user account for hadoop
$sudo addgroup hadoop
$sudo adduser --ingroup hadoop hduser
$sudo usermod -a -G sudo hduser
$su - hduser
- Install java1.8 in folder /usr/local
$sudo chmod 777 /usr/local
$cd /usr/local
$sudo tar xvzf $HOME/Downloads/jdk1.8.tar.gz
- Install Hadoop
$cd /usr/local
$tar xvzf $HOME/Downloads/hadoop-2.5.1.tar.gz
$sudo chmod 777 hadoop-2.5.1
- Set the hadoop environment variables: Include the following lines in the $HOME/.bashrc file #Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop-2.5.1
#Set JAVA home directory
export JAVA_HOME=/usr/local/jdk1.8.0_31
#Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
- Set hadoop environment variables: Include the following lines /etc/profile file #--insert JAVA_HOME
JAVA_HOME=/usr/local/jdk1.8.0_31
#--insert HADOOP_PREFIX
HADOOP_PREFIX=/usr/local/hadoop-2.5.1
#--in PATH variable just append at the end of the line
PATH=$PATH: $JAVA_HOME/bin:$HADOOP_PREFIX/bin
#--Append HADOOP_PREFIX at end of the export statement export
PATH JAVA_HOME HADOOP_PREFIX
- Run the.bashrc & profile files from the $ prompt for updating the changes
$ source $HOME/.bashrc
$ source /etc/profile
Verify java & hadoop installation using
$ java -version
$ echo $HADOOP_PREFIX
$ cd $HADOOP_PREFIX
$ bin/hadoop version
- Configuration of the hadoop files: hadoop-env.sh, core-site.xml, mapred-site.xml, hdfs- site.xml and yarn-site.xml
path :: /usr/local/hadoop-2.5.1/etc/hadoop
a) hadoop-env.sh Include the following lines in hadoop-env.sh file
export JAVA_HOME=/usr/local/jdk1.8.0_31
export HADOOP_PREFIX=/usr/local/hadoop-2.5.1
b) core-site.xml Configure the directory for Hadoop to store its data files, the network ports it listens to, etc. Setup will use Hadoop’s Distributed File System (HDFS-single local machine)
$ mkdir -p /app/hadoop/tmp
$ chown hduser:hadoop /app/hadoop/tmp
Include the following lines in core-site.xml file between and tags
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
c) mapred-site.xml
$sudo cp mapred-site.xml.template mapred-site.xml
Include the following lines in mapred-site.xml file
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
d) hdfs-site.xml Include the following lines in hdfs-site.xml file
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
e) yarn-site.xml Include the following lines in yarn-site.xml file
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
- Format the Hadoop File system implemented on top of the local file system using
$/usr/local/hadoop-2.5.1/bin/hadoop namenode –format
- Start Hadoop using
$sbin/start-all.sh
Explore Hadoop using http://localhost:50070/ from the browser
- The commonly used HDFS Commands are as follows:
### mkdir :
##### Creates a directory in the given path:
syntax: bin/hdfs dfs -mkdir <paths>
### ls
##### Lists the files in a given path
syntax:bin/hdfs dfs -ls <args>
### cp
##### Copies files from source to destination. This command allows multiple sources as well in which case the destination must be a directory.
syntax:bin/hdfs dfs -cp <source> <dest>
- Create a directory ‘/input’ in HDFS
$bin/hdfs dfs -mkdir /input
- Copy the input files into the distributed file system
$cd $HOME/Downloads/
$ tar xvzf $HOME/Downloads/mrsampledata.tar.gz
- Run some of the examples provided
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce- examples-2.5.1.jar grep /input /output '(CSE)'
- Examine the output files
$bin/hdfs dfs -put $HOME/Downloads/mrsampledata/* /input
Copy the output files from the distributed file system to the local file system and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or View the output files on the distributed file system
$ bin/hdfs dfs -cat /output/*
Thus, the implementation of Pseudo Node configuration for Hadoop on ubuntu is successfully executed.