Ferry: Big Data Development Environment using Docker

Ferry lets you launch, run, and manage big data clusters on AWS, OpenStack, and your local machine. It does this by leveraging awesome technologies such as Docker. Ferry currently supports:

  • Hadoop/YARN (version 2.5.1)
  • Cassandra (version 2.1.0) + Titan (0.3.1)
  • Spark (version 1.1.0)
  • GlusterFS (version 3.5)
  • Open MPI (version 1.8.1)

All you have to do to start is specify your stack using YAML or download a pre-existing application.

Why?

Ferry is made for developers and data scientists who want to develop big data applications without the fuss of setting up the infrastructure. It will help you:

  • Experiment with big data technologies, such as Hadoop or Cassandra, without having to learn the intricacies of configuring each one
  • Share and evaluate other people's big data applications quickly and safely via Dockerfiles
  • Develop and test applications locally before deploying them to an operational cluster

Because Ferry uses Docker underneath, each virtual cluster is completely isolated. That means you can create multiple clusters for different applications.

Getting started

Ferry is a Python application and runs on your local machine. To get started, install Docker and then run pip install -U ferry. More detailed installation instructions and examples can be found in the project documentation.

Once installed, you can create your big data application using YAML files.

   backend:
      - storage:
           personality: "gluster"
           instances: 2
        compute:
           - personality: "yarn"
             instances: 2
   connectors:
      - personality: "hadoop-client"
        name: "control-0"

This stack consists of two GlusterFS data nodes and two Hadoop/YARN compute nodes. There's also an Ubuntu-based client that automatically connects to those backend components. Of course, you can substitute your own customized client.
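
To make the shape of that stack concrete, here is a minimal Python sketch that mirrors the YAML above as a nested dictionary and tallies how many containers each personality contributes. The dictionary layout and the count_instances helper are assumptions made for illustration only; they are not Ferry's own parser or internal data model.

```python
# Hypothetical in-memory form of the YAML stack shown above; Ferry's own
# parser and internal data model may differ.
stack = {
    "backend": [
        {
            "storage": {"personality": "gluster", "instances": 2},
            "compute": [{"personality": "yarn", "instances": 2}],
        }
    ],
    "connectors": [{"personality": "hadoop-client", "name": "control-0"}],
}


def count_instances(stack):
    """Return a dict mapping each personality to its total instance count."""
    counts = {}
    for backend in stack.get("backend", []):
        # A backend entry has one storage section and a list of compute sections.
        storage = backend.get("storage")
        if storage:
            name = storage["personality"]
            counts[name] = counts.get(name, 0) + storage.get("instances", 1)
        for compute in backend.get("compute", []):
            name = compute["personality"]
            counts[name] = counts.get(name, 0) + compute.get("instances", 1)
    # Assumption: each connector entry corresponds to a single client container.
    for connector in stack.get("connectors", []):
        name = connector["personality"]
        counts[name] = counts.get(name, 0) + 1
    return counts


print(count_instances(stack))
```

For the stack above, this reports two gluster instances, two yarn instances, and one hadoop-client, which matches the description of two data nodes, two compute nodes, and one client.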

To create this stack, just type ferry start yarn. Once you create the stack, you can log in by typing ferry ssh control-0.

Contributing

Contributions are totally welcome. Here are some suggestions on how to get started:

  • Use Ferry, report bugs, and request new features! By filing issues and sharing your experience, you will help improve the software for others.
  • Create Dockerfiles for your favorite backend, especially if you think the installation process is harder than it should be. The Dockerfile can be basic and we'll work together to get it ready for other users.
  • Create a new configuration module for your backend. This one is more complicated since it will involve actually hacking Ferry, but it's not so hard if we work together.

I strongly recommend using GitHub issues + pull requests for contributions. Tweets sent to @open_core_io are also welcome. Happy hacking!

Under the hood

Ferry leverages some awesome open source projects:

  • Docker simplifies the management of Linux containers
  • Python programming language
  • Hadoop is a general-purpose big data storage and processing framework
  • GlusterFS is a parallel filesystem actively developed by Red Hat
  • Open MPI is a scalable MPI implementation focused on modeling & simulation
  • Cassandra is a highly scalable column store
  • PostgreSQL is a popular relational database
