myHadoop

This is the logical continuation of Sriram Krishnan's original myHadoop project (http://sourceforge.net/projects/myhadoop/).

myHadoop provides a framework for launching Hadoop clusters within traditional high-performance computing clusters and supercomputers. It allows users to provision and deploy Hadoop clusters within the batch scheduling environment of such systems with minimal required expertise.

Quick Install

Assuming you unpacked myHadoop in /usr/local/myhadoop and your Hadoop binary distribution is located in /usr/local/hadoop-1.2.1:

```shell
cd /usr/local/hadoop-1.2.1/conf
patch < /usr/local/myhadoop/myhadoop-1.2.1.patch
```

That's it.

myHadoop also supports Hadoop 2 via the myhadoop-2.2.0.patch file. This patch has been tested to work against Hadoop 2.6, and it may work with newer versions as well.
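The Hadoop 2 procedure would look much the same; in the sketch below the install prefix is an assumption, and note that Hadoop 2 binary distributions keep their configuration templates under etc/hadoop rather than conf/:

```shell
# Sketch only: adjust the prefix to wherever you unpacked Hadoop 2.x.
# Hadoop 2 moved the config directory from conf/ to etc/hadoop.
cd /usr/local/hadoop-2.6.0/etc/hadoop
patch < /usr/local/myhadoop/myhadoop-2.2.0.patch
```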

See USERGUIDE.md for a more detailed installation guide and a brief introduction to using myHadoop.

About myHadoop

This framework comes with three principal scripts:

  • bin/myhadoop-configure.sh - creates a Hadoop configuration set based on the information presented by the system's batch scheduler. At present, myHadoop interfaces with Torque, SLURM, and Sun Grid Engine.
  • bin/myhadoop-cleanup.sh - cleans up after the Hadoop cluster has been torn down.
  • bin/myhadoop-bootstrap.sh - when run from within a job submission script or an interactive job, provides one-command configuration and spin-up of a Hadoop cluster, along with instructions on how the user can connect to his or her personal cluster.

myhadoop-configure.sh

This script takes a series of template configuration XML files, applies the necessary patches based on the job runtime environment provided by the batch scheduler, and (optionally) formats HDFS. The general syntax is

```shell
myhadoop-configure.sh -c /your/new/config/dir -s /path/to/node/local/storage
```

where

  • /your/new/config/dir is where you would like your Hadoop cluster's configuration directory to reside. The location is largely arbitrary, but it will then serve as your Hadoop cluster's $HADOOP_CONF_DIR.
  • /path/to/node/local/storage is the location of a non-shared filesystem on each node that can be used to store each node's configuration, state, and HDFS data.

The examples/ directory contains torque.qsub, which illustrates how this looks in practice.

Before calling myhadoop-configure.sh, you MUST have JAVA_HOME defined in your environment and HADOOP_HOME defined in either your environment or your myhadoop/etc/myhadoop.conf file. myhadoop-configure.sh will look in $HADOOP_HOME/conf for the configuration templates it will use for your personal Hadoop cluster.
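Putting these pieces together, a minimal Torque job script might look like the sketch below. Every path, the resource request, and the scratch location are assumptions to be adapted for your site; examples/torque.qsub is the authoritative version.

```shell
#!/usr/bin/env bash
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:30:00

# Required environment (paths are assumptions for this sketch):
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export HADOOP_CONF_DIR=$HOME/hadoop-conf.$PBS_JOBID

# Generate per-job configs and format HDFS on node-local scratch:
/usr/local/myhadoop/bin/myhadoop-configure.sh \
    -c "$HADOOP_CONF_DIR" \
    -s "/scratch/$USER/$PBS_JOBID"

"$HADOOP_HOME/bin/start-all.sh"     # spin up the cluster
# ... run hadoop jobs here ...
"$HADOOP_HOME/bin/stop-all.sh"      # tear it down

/usr/local/myhadoop/bin/myhadoop-cleanup.sh
```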

myhadoop-cleanup.sh

This is a courtesy script that simply deletes all of the data created by a Hadoop cluster on all of the cluster's nodes. A proper batch system should do this automatically.

To run myhadoop-cleanup.sh, you must have HADOOP_HOME and HADOOP_CONF_DIR defined in your environment.

myhadoop-bootstrap.sh

This is another courtesy script that wraps myhadoop-configure.sh to simplify the process of creating Hadoop clusters. When run from within a batch environment (e.g., a batch job or an interactive job), it gathers the necessary information from the batch system, runs myhadoop-configure.sh, and creates a file called "setenv.sourceme" containing all of the instructions a user needs to connect to the newly spawned Hadoop cluster and begin using it.

The examples/ directory contains bootstrap.qsub, which illustrates how myhadoop-bootstrap.sh may be used in all supported batch environments. The user simply submits this script and waits for "setenv.sourceme" to appear in the working directory; once it does, "cat" the file and follow the instructions within to reach an environment where the new cluster can be driven with the "hadoop" command.
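The workflow can be sketched as follows. The contents of setenv.sourceme here are invented for illustration (the real file is generated per job and per site), but the source-then-use pattern is the same:

```shell
# Fabricated stand-in for the file myhadoop-bootstrap.sh generates;
# real contents and paths will differ per job and per site.
cat > setenv.sourceme <<'EOF'
export HADOOP_CONF_DIR=$HOME/hadoop-conf
export PATH=/usr/local/hadoop-1.2.1/bin:$PATH
EOF

# Source it to pull the cluster settings into the current shell;
# after this, the "hadoop" command targets the personal cluster.
source setenv.sourceme
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```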

Known Issues

myhadoop requires bash 4

myhadoop requires bash 4, and the documentation should reflect this. On systems where bash 3 is the default, the scripts should use a #!/usr/bin/env bash shebang so that non-system bash 4 installations are picked up.
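A guard of the following shape (a sketch, not the project's actual code) makes the requirement explicit; the env shebang finds whichever bash is first in $PATH, so a user-installed bash 4 wins over an old /bin/bash:

```shell
#!/usr/bin/env bash
# BASH_VERSINFO[0] holds the major version of the running bash.
if (( BASH_VERSINFO[0] < 4 )); then
    echo "myHadoop requires bash >= 4; found $BASH_VERSION" >&2
    exit 1
fi
```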

myspark still calls start-master.sh

myspark still starts the master daemon using start-master.sh which contains the following code:

```shell
. "$sbin/spark-config.sh"

if [ -f "${SPARK_CONF_DIR}/spark-env.sh" ]; then
  . "${SPARK_CONF_DIR}/spark-env.sh"
fi
```

This causes start-master.sh to always ignore SPARK_CONF_DIR passed from the user environment, which can cause unexpected behavior within the myHadoop framework.
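The clobbering pattern is easy to reproduce in isolation; the paths below are invented for the demonstration:

```shell
# The caller exports a custom location...
export SPARK_CONF_DIR=/my/custom/conf

# ...but sourcing a file that assigns the variable silently replaces it,
# which is the pattern the issue above describes.
cat > /tmp/spark-env-demo.sh <<'EOF'
SPARK_CONF_DIR=/opt/spark/conf
EOF
. /tmp/spark-env-demo.sh

echo "$SPARK_CONF_DIR"   # now /opt/spark/conf, not /my/custom/conf
rm -f /tmp/spark-env-demo.sh
```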

bad umask can hose hdfs

With persistent mode and a group-writable umask (e.g. 0002),

```
2014-04-28 13:05:40,634 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /tmp/hdfs_data, expected: rwxr-xr-x, while actual: rwxrwxr-x
```

This may not be unique to persistent mode, but this is how it manifests on a system that makes a new group for every user.
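A sketch of the fix, assuming the umask is the culprit: HDFS insists that each dfs.data.dir be exactly rwxr-xr-x (755), so setting a stricter umask before myhadoop-configure.sh formats HDFS keeps the group-write bit off. The path below is a demo stand-in, not a myHadoop default.

```shell
umask 0022                          # directories are created 755, not 775
mkdir -p /tmp/hdfs_data_demo        # stand-in for a dfs.data.dir entry
stat -c '%a' /tmp/hdfs_data_demo    # prints 755 (GNU stat)
rmdir /tmp/hdfs_data_demo
```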

dfs.namenode.name.dir should be a URI in hdfs-site.xml

```
14/04/14 14:31:06 WARN common.Util: Path /scratch/glock/1296990.gordon-fe2.local/namenode_data should be specified as a URI in configuration files. Please update hdfs configuration.
14/04/14 14:31:06 WARN common.Util: Path /scratch/glock/1296990.gordon-fe2.local/namenode_data should be specified as a URI in configuration files. Please update hdfs configuration.
```
