SparkHadoopCluster

Create your own Apache Spark cluster with Hadoop/HDFS installed.

full blog post...

  1. Install Java: sudo apt-get -y install openjdk-8-jdk-headless default-jre
    • This must be done on all nodes.
  2. Install Scala: sudo apt install scala
    • I'm not sure whether this is required on every node, but I installed it on all of them.
  3. Set up password-less SSH between all nodes.
    • sudo apt install openssh-server openssh-client
  4. Create a key pair: ssh-keygen -t rsa -P ""
    • Append the contents of the .pub key to ~/.ssh/authorized_keys on each worker node.
  5. Edit the hosts file: sudo vim /etc/hosts
    • Add a line for the master and for each worker with its IP address and name, something like the following.
    • 173.255.199.161 master
    • 198.58.124.54 worker1
  6. Install Spark on the master and on every worker node: wget https://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
    • Unpack: tar xvf spark-2.4.3-bin-hadoop2.7.tgz
    • Move: sudo mv spark-2.4.3-bin-hadoop2.7/ /usr/local/spark
  7. Add Spark and Java to your PATH: vim ~/.bashrc (no sudo needed, the file is in your own home directory)
    • export PATH=/usr/local/spark/bin:$PATH
    • export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
    • export PATH=$JAVA_HOME/bin:$PATH
    • Reload the file: source ~/.bashrc
  8. Edit the Spark environment file: vim /usr/local/spark/conf/spark-env.sh
    • export SPARK_MASTER_HOST=<master-ip> (replace <master-ip> with the IP address of your master node)
    • export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  9. Edit the ridiculously named worker list: vim /usr/local/spark/conf/slaves
    • Add the names of the master and workers from the /etc/hosts file above, one per line.
  10. Finally, start Spark: sh /usr/local/spark/sbin/start-all.sh
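
Steps 5, 8, and 9 all repeat the same node names and addresses, so one way to keep them in sync is to generate all three pieces from a single list. The following is a minimal sketch, not part of the original setup: the NODES variable, OUT_DIR, and the hosts.fragment file name are made up for illustration, and the addresses are the example ones from step 5. It writes only into a scratch directory and touches nothing under /etc or /usr/local/spark.

```shell
#!/bin/sh
# Hypothetical node list: one "ip name" pair per line, master first.
# Replace the example addresses with your own.
NODES="173.255.199.161 master
198.58.124.54 worker1"

# Scratch directory so nothing on the system is modified.
OUT_DIR="$(mktemp -d)"

# Lines to append to /etc/hosts on every node (step 5).
printf '%s\n' "$NODES" > "$OUT_DIR/hosts.fragment"

# conf/slaves takes one hostname per line (step 9).
printf '%s\n' "$NODES" | while read -r ip name; do
  echo "$name"
done > "$OUT_DIR/slaves"

# spark-env.sh needs the master's IP (step 8); the master is the first entry.
MASTER_IP="$(printf '%s\n' "$NODES" | head -n 1 | cut -d ' ' -f 1)"
cat > "$OUT_DIR/spark-env.sh" <<EOF
export SPARK_MASTER_HOST=${MASTER_IP}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
EOF

echo "Generated files in $OUT_DIR"
```

After reviewing the output, append hosts.fragment to /etc/hosts on each node and copy slaves and spark-env.sh into /usr/local/spark/conf/.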

Hadoop

  1. wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
    • tar -xvf hadoop-2.7.3.tar.gz
    • mv hadoop-2.7.3 hadoop
  2. vim ~/hadoop/etc/hadoop/hadoop-env.sh
    • export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  3. vim ~/hadoop/etc/hadoop/core-site.xml
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>
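  4. vim ~/hadoop/etc/hadoop/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/beach/data/nameNode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/beach/data/dataNode</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>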
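  5. cd ~/hadoop/etc/hadoop
    • mv mapred-site.xml.template mapred-site.xml
    • vim ~/hadoop/etc/hadoop/mapred-site.xml
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>800</value>
      </property>
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>400</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>400</value>
      </property>
    </configuration>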
  6. vim ~/hadoop/etc/hadoop/slaves
    • localhost
    • worker1
    • worker2
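  7. vim ~/hadoop/etc/hadoop/yarn-site.xml
    <configuration>
      <property>
        <name>yarn.acl.enable</name>
        <value>0</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>800</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>800</value>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>400</value>
      </property>
    </configuration>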
  8. Add the Hadoop binaries to your PATH: vim ~/.bashrc
    • export PATH=/home/beach/hadoop/bin:/home/beach/hadoop/sbin:$PATH
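
Hadoop's *-site.xml files all share one layout: a <configuration> element holding <property> entries, each with a <name> and a <value>. As a rough sketch, a small helper could generate core-site.xml and hdfs-site.xml from name/value pairs. The helper name write_site_xml and the CONF_DIR variable are hypothetical, not Hadoop tools. The sketch writes to a scratch directory rather than ~/hadoop/etc/hadoop, does no XML escaping, and uses dfs.datanode.data.dir, the property Hadoop reads for DataNode storage.

```shell
#!/bin/sh
# write_site_xml (hypothetical helper): first argument is the output file,
# the remaining arguments are alternating property names and values.
write_site_xml() {
  out="$1"
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    # Consume the arguments two at a time: name, then value.
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$out"
}

# Scratch directory; copy the files into ~/hadoop/etc/hadoop once they look right.
CONF_DIR="$(mktemp -d)"

write_site_xml "$CONF_DIR/core-site.xml" \
  fs.default.name hdfs://master:9000

write_site_xml "$CONF_DIR/hdfs-site.xml" \
  dfs.namenode.name.dir /home/beach/data/nameNode \
  dfs.datanode.data.dir /home/beach/data/dataNode \
  dfs.replication 2

cat "$CONF_DIR/hdfs-site.xml"
```

The same helper would cover mapred-site.xml and yarn-site.xml by passing their name/value pairs, which keeps the per-node memory settings in one script instead of four hand-edited files.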
