HHDFS

Haskell Hadoop Distributed File System

HHDFS is a re-implementation of HDFS in Haskell. As this is a university project, our main focus is on the performance and scalability of HHDFS, so do not expect it to be very robust. Take a look at our features to see what we have implemented.

Features

  • Data chunking
  • Replication pipelining
  • Client-side streaming I/O
  • Remote client access
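
As an illustration of the first feature, data chunking can be sketched in a few lines of Haskell. This is a hypothetical sketch, not code from HHDFS; the 64 MiB chunk size is the HDFS default and HHDFS may use a different value.

```haskell
-- Hypothetical sketch of fixed-size data chunking, as done before a file
-- is distributed over Datanodes. Names and the chunk size are illustrative.
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

chunkSize :: Int64
chunkSize = 64 * 1024 * 1024  -- 64 MiB, the HDFS default block size

-- Split a lazy ByteString into chunks of at most chunkSize bytes.
chunks :: BL.ByteString -> [BL.ByteString]
chunks bs
  | BL.null bs = []
  | otherwise  = let (h, t) = BL.splitAt chunkSize bs
                 in h : chunks t

main :: IO ()
main = print (length (chunks (BL.replicate (150 * 1024 * 1024) 0)))
-- a 150 MiB file yields 3 chunks
```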

Architecture

The architecture of HHDFS is largely a copy of HDFS, so we recommend reading the HDFS architecture documentation first.

We made various simplifications and some changes to the architecture described in the docs.

The replication factor, for example, is not something that is specified for each file but is instead fixed on the Namenode side.

Most of the features described in the Robustness section of HDFS have not been implemented. There is also currently no notion of a Heartbeat in HHDFS, although it could easily be added.

One change we made to the architecture is adding a 'proxy' to each node. The proxy sits between the client and the Namenode/Datanode and allows clients to connect to the network remotely. The Namenode's proxy simply passes any messages received from the client on to the Namenode and sends any responses from the Namenode back to the client. The Datanode's proxy additionally handles reading from and writing to the local filesystem.
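
The forward-and-reply behaviour of the Namenode's proxy can be sketched as follows. This is a minimal model using in-process channels instead of real network transport; the message types and names are illustrative, not HHDFS's own.

```haskell
-- Minimal sketch of a proxy's forwarding loop, modelled with Chans
-- instead of sockets. All names here are illustrative assumptions.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Monad (forever)

data Request  = ListFiles              deriving Show
data Response = FileListing [FilePath] deriving Show

-- The proxy relays client requests to the Namenode and Namenode
-- responses back to the client, without interpreting either.
proxy :: Chan Request -> Chan Request -> Chan Response -> Chan Response -> IO ()
proxy fromClient toNamenode fromNamenode toClient = do
  _ <- forkIO . forever $ readChan fromClient   >>= writeChan toNamenode
  _ <- forkIO . forever $ readChan fromNamenode >>= writeChan toClient
  pure ()

main :: IO ()
main = do
  fromClient   <- newChan
  toNamenode   <- newChan
  fromNamenode <- newChan
  toClient     <- newChan
  proxy fromClient toNamenode fromNamenode toClient
  -- a toy "Namenode" that answers every request with a fixed listing
  _ <- forkIO . forever $ do
         _ <- readChan toNamenode
         writeChan fromNamenode (FileListing ["example.txt"])
  writeChan fromClient ListFiles
  readChan toClient >>= print
```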

Installation

To install, clone this repository to a folder. Then, from that folder, run:

cabal sandbox init
cabal install --only-dependencies
cabal build

Usage

You can manually start up the network or use our testing scripts, which are located in the test folder.

To manually start the network:

First, start up a Namenode, supplying the local host and a port:

./hhdfs namenode 127.0.0.1 44444

Now you can start a number of Datanodes. This time also supply the host and port of the Namenode:

./hhdfs datanode 127.0.0.1 44446 127.0.0.1 44444

In this case we are starting a Datanode on port 44446 and telling it that the Namenode is running on port 44444.

It is important that you do not use port 44445 for the Datanode in this case, as each node uses two ports: the given port and the given port plus one.
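
The two-ports-per-node convention can be stated as a one-liner. This is an illustrative sketch, not HHDFS source:

```haskell
-- Each node occupies its given port and that port plus one.
-- The function name is an assumption for illustration only.
nodePorts :: Int -> (Int, Int)
nodePorts p = (p, p + 1)

main :: IO ()
main = print (nodePorts 44444)
-- the Namenode on 44444 also occupies 44445,
-- so the next free Datanode port is 44446
```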

Finally, you can start up a Client, supplying only the host and port of the Namenode:

./hhdfs client 127.0.0.1 44444

The Client does not have to be on the same network as the Namenode and Datanodes, but you must then ensure yourself that any messages from the Client to the Namenode or Datanodes are correctly forwarded (e.g. via port forwarding).

Using our scripts to start up a network:

Linux:

Before you can start the network you need a working gnome-terminal, which should be the default terminal. You then have to add a terminal profile named "KEEPOPEN"; the only setting you need to change is that the terminal remains open after a script finishes executing.

OSX:

The startup script assumes you are using iTerm as your terminal application. run.osx.sh can easily be adapted to whatever terminal application you are using.


These scripts are known to work on our own systems: Linux Mint 17.3 and OSX 10.10.4.

  • ./build.sh
  • ./clean.sh
  • ./run.linux.sh
  • ./run.osx.sh
  • ./runTest.sh

./build.sh builds the project and copies the executable to the folders located in this directory.

./clean.sh removes any persistent data from the network (and the client).

./runTest.sh runs the test client, which immediately starts the first test. Press enter to run the next one; there are three tests in total.

Linux:

./run.linux.sh starts a Namenode, a number of Datanodes and a client. This script assumes you are using gnome-terminal.

OSX:

./run.osx.sh starts a Namenode, a client and a number of Datanodes. It is a simple osascript file that launches everything in a new terminal window.

Usage of the client:

The client supports five commands:

  • show - Lists all files on the network
  • write local remote - Writes the local file to the network under the remote name
  • read remote - Reads the remote file from the network
  • quit - Closes the client application
  • help - Shows the above commands
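
A parser for these five commands might look as follows. This is a hypothetical sketch; the actual HHDFS client may tokenize its input differently.

```haskell
-- Hypothetical command parser for the five client commands listed above.
-- The Command type and parseCommand are illustrative, not HHDFS's own.
data Command
  = Show
  | Write FilePath FilePath  -- local, remote
  | Read FilePath            -- remote
  | Quit
  | Help
  deriving (Show, Eq)

parseCommand :: String -> Maybe Command
parseCommand line = case words line of
  ["show"]                 -> Just Show
  ["write", local, remote] -> Just (Write local remote)
  ["read", remote]         -> Just (Read remote)
  ["quit"]                 -> Just Quit
  ["help"]                 -> Just Help
  _                        -> Nothing

main :: IO ()
main = print (parseCommand "write a.txt b.txt")
-- Just (Write "a.txt" "b.txt")
```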

Authors

Giovanni Garufi, Wilco Kusee, Ferdinand van Walree


hhdfs's Issues

Error: Kevent does not exist (No such file or Directory)

When running on OSX 10.10.4 I occasionally get this error on one of the Datanodes. The same operation may or may not cause the problem, so I am assuming it is somehow related to the OS scheduling of filesystem reads/writes. Once the Datanode throws the error it does not stop immediately; inspection of the log shows that some operations are still performed after the error is thrown, but the Datanode will eventually become unresponsive, potentially bringing the whole system to a halt.
