
Apache Hadoop installation steps

Set up SSH key authentication for the non-root account on the server

ssh-keygen -t rsa
chmod 0700 $HOME/.ssh
# Copy the public key to the target account (replace 'sathish' with your non-root user)
ssh-copy-id -i $HOME/.ssh/id_rsa.pub sathish@localhost
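
To confirm that key-based login works (a quick optional check, using the same account as above), an SSH session should now open without a password prompt:

ssh sathish@localhost 'hostname'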

Install OpenJDK Java, Git, and supporting tools as root using the commands below

su - root
yum -y install java-1.8.0-openjdk.x86_64
yum -y install java-1.8.0-openjdk-devel
yum -y install git
yum -y install lsof wget zip unzip net-tools
# Create the hadoop service account and set its password
useradd hadoop
passwd hadoop
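
As an optional sanity check, verify that the JDK is on the PATH before continuing:

java -version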

Install and set up Ant

su - hadoop
wget https://archive.apache.org/dist/ant/binaries/apache-ant-1.9.16-bin.tar.gz -P $HOME/
tar -xvf $HOME/apache-ant-1.9.16-bin.tar.gz -C $HOME/
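
To confirm the extracted Ant works (the path assumes the extraction location above), print its version:

$HOME/apache-ant-1.9.16/bin/ant -version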

Clone the Apache Hadoop installation files from GitHub

git clone https://github.com/skumarx87/apache_hadoop_ant_install.git

Update the bigdata.properties file with your server details and install locations

cd apache_hadoop_ant_install

bigdata.root=/usr/bigdata
bigdata.user=hadoop
namenode.hostname=laksha.home.com
dfs.replication.level=1

##Release version ##
bigdata.release.version=1.0.0

##Hadoop Components version ##

hadoop.version=3.2.0
hive.version=2.3.5
spark.version=3.0.3
spark.hadoop.version=3.2
tez.version=0.9.2
derby.version=10.10.2.0


##Download URL ##

apache.hadoop.site=https://archive.apache.org/dist
apache.hive.site=https://archive.apache.org/dist
apache.spark.site=https://dlcdn.apache.org/
apache.tez.site=https://dlcdn.apache.org
apache.derby.site=https://archive.apache.org/dist


deploy.local=true
deploy.local.dir=/usr/bigdata/buildtmp/parcel

Create the slaves file

  • Create the apache_hadoop_ant_install/conf/slaves file and add the hostname of each worker node, one per line, as in the example below.
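
A minimal slaves file for the single-node setup described here (assuming the NameNode host from bigdata.properties also runs the worker):

laksha.home.com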

Run the Ant build for the Hadoop installation

cd apache_hadoop_ant_install
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
$HOME/apache-ant-1.9.16/bin/ant -f hadoop_install.xml

Apache Hadoop administration

Update the $HOME/.bashrc file with the line below, then log off and log back in:

. /usr/bigdata/Env/1.0.0/scripts/bigdata-user-profile.sh.template
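
One way to append it (a convenience sketch; editing the file by hand works just as well):

echo '. /usr/bigdata/Env/1.0.0/scripts/bigdata-user-profile.sh.template' >> $HOME/.bashrc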

Format the NameNode and initialize the Hive metastore

/usr/bigdata/Env/1.0.0/scripts/bigadm.sh fresh_install

Managing Hadoop services

Stop, start, and check the status of the services:

/usr/bigdata/Env/1.0.0/scripts/bigadm.sh stop
/usr/bigdata/Env/1.0.0/scripts/bigadm.sh start
/usr/bigdata/Env/1.0.0/scripts/bigadm.sh status
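
To confirm the daemons actually came up, jps from the JDK lists the running Hadoop Java processes (NameNode, DataNode, ResourceManager, and so on; exactly which appear depends on this build's configuration):

jps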


Basic HDFS admin commands

# Create a local test file and the HDFS directories
touch /tmp/test.txt
hdfs dfs -mkdir /tmp
hdfs dfs -mkdir /user/
# Copy the file into HDFS, then back out under a new name
# (copying back to /tmp/test.txt would fail because that local file already exists)
hdfs dfs -copyFromLocal /tmp/test.txt /user/
hdfs dfs -copyToLocal /user/test.txt /tmp/test_copy.txt
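
To verify the round trip (optional; test_copy.txt matches the destination name used above), list the HDFS directory and compare the two local files:

hdfs dfs -ls /user/
diff /tmp/test.txt /tmp/test_copy.txt && echo "files match"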

Testing Spark from spark-shell
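
Start the shell first (shown with no extra flags, assuming the profile script sourced above put spark-shell on the PATH and configured the master):

spark-shell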

scala> sc.version
res0: String = 3.0.3

scala>

Testing the Spark context from a Jupyter Notebook

import os
import pyspark
from pyspark.sql import SQLContext, SparkSession

# Build (or reuse) a SparkSession pointing at the standalone master
sc = (SparkSession
      .builder
      .master('spark://192.168.198.128:7077')
      .appName("sparkFromJupyter")
      .getOrCreate())

sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
print("Spark Version: " + sc.version)
print("PySpark Version: " + pyspark.__version__)
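
One common way to get such a notebook session (an assumption about the environment, not something this build sets up for you) is to let pyspark start Jupyter as its driver:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
pyspark --master spark://192.168.198.128:7077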

Testing Hive

beeline -u "jdbc:hive2://localhost:10000/default"
create database test;
CREATE TABLE test.testTable (id INT, Name STRING);

0: jdbc:hive2://localhost:10000/default> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| test           |
+----------------+
0: jdbc:hive2://localhost:10000/default> show create table test.testTable;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `test.testTable`(                     |
|   `id` int,                                        |
|   `name` string)                                   |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.mapred.TextInputFormat'       |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION                                           |
|   'hdfs://laksha.home.com:9000/data/hive/warehouse/test.db/testtable' |
| TBLPROPERTIES (                                    |
|   'transient_lastDdlTime'='1638275125')            |
+----------------------------------------------------+
13 rows selected (0.406 seconds)
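
To go one step further (a small optional sketch using the table created above), insert a row and read it back through beeline; the INSERT runs as a cluster job, so it can take a few seconds:

beeline -u "jdbc:hive2://localhost:10000/default" -e "INSERT INTO test.testTable VALUES (1, 'sample')"
beeline -u "jdbc:hive2://localhost:10000/default" -e "SELECT * FROM test.testTable"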

Hadoop URLs

NameNode URL

https://${hostname}:9871/dfshealth.html

Spark URL

https://${hostname}:8480/

YARN URL

https://${hostname}:8088/cluster
