Standalone Apache Giraph Docker image

The graph processing system Giraph takes some effort to set up properly. We built this image to make it easier to try things out on Giraph. It is published on Docker Hub as riyadparvez/giraph-docker.

The image is based on the SequenceIQ pseudo-distributed standalone Hadoop Docker image. We built it from a snapshot of the Giraph repo for compatibility with Hadoop 2.7.x and YARN. Once a Giraph release supports these versions, we will switch to that release.

Using the image

Pull the image

The image is released through Docker's automated build repository. You can get it like this:

docker pull riyadparvez/giraph-docker

Start a container

Once you've pulled the image, you can run it like this:

docker run --volume $HOME:/myhome --rm --interactive --tty riyadparvez/giraph-docker /etc/giraph-bootstrap.sh -bash

Once it starts you'll be at a root prompt where you can run Giraph jobs.

(Explanation of flags: --volume $HOME:/myhome maps your home directory outside the container to the /myhome directory inside the container. --rm removes the container (not the image) once you shut it down, so stopped containers don't take up disk space. --interactive and --tty attach your terminal so the container behaves like an ordinary interactive shell session. riyadparvez/giraph-docker is the name of the image. /etc/giraph-bootstrap.sh is the script that starts the Hadoop and ZooKeeper daemons; with the -bash option it drops you into a shell so you can use them.)

Running an example

Here's how to run the Giraph single-source shortest paths example app on a small dataset.

First, change to the Giraph source directory.

cd $GIRAPH_HOME

Now, prepare some input. We've left a simple example graph in this directory; to process it with Giraph you must first copy it into HDFS:

$HADOOP_HOME/bin/hdfs dfs -put tiny-graph.txt /user/root/input/tiny-graph.txt
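If you would rather prepare your own input than use tiny-graph.txt, it helps to know the line format. As far as we understand JsonLongDoubleFloatDoubleVertexInputFormat, each line is a JSON array of the form [vertexId, vertexValue, [[targetId, edgeWeight], ...]]. The plain-Java sketch below builds such a line. The class name JsonVertexLines and the sample vertex are hypothetical, and the contents of the bundled tiny-graph.txt are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JsonVertexLines {
    /**
     * Formats one vertex as [id,value,[[target,weight],...]].
     * The layout is an assumption about what the JSON input format expects.
     */
    static String vertexLine(long id, double value, Map<Long, Float> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(id).append(',').append(value).append(",[");
        boolean first = true;
        for (Map.Entry<Long, Float> edge : edges.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('[').append(edge.getKey()).append(',').append(edge.getValue()).append(']');
            first = false;
        }
        return sb.append("]]").toString();
    }

    public static void main(String[] args) {
        // Hypothetical vertex 0 with value 0.0 and two weighted out-edges.
        Map<Long, Float> edges = new LinkedHashMap<>();
        edges.put(1L, 1.0f);
        edges.put(3L, 3.0f);
        System.out.println(vertexLine(0L, 0.0, edges)); // prints [0,0.0,[[1,1.0],[3,3.0]]]
    }
}
```

Write one such line per vertex to a text file, then copy the file into HDFS with the same hdfs dfs -put command.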

Now we can run the example:

$HADOOP_HOME/bin/hadoop jar \
 /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.7.1-jar-with-dependencies.jar \
 org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
 --yarnjars giraph-examples-1.1.0-for-hadoop-2.7.1-jar-with-dependencies.jar \
 --workers 1 \
 --vertexInputFormat org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
 --vertexInputPath /user/root/input/tiny-graph.txt \
 --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
 --outputPath /user/root/output

Eventually you'll see Completed Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation: SUCCEEDED. Now you can examine the output:

$HADOOP_HOME/bin/hdfs dfs -cat /user/root/output/part-m-00001
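To help interpret the output, here is a plain-Java sketch (no Giraph or Hadoop needed) of what a single-source shortest paths computation does, written as a loop of message-passing rounds that loosely mirrors Giraph supersteps. Everything in it is an assumption made for illustration: the class name ShortestPathsSketch is hypothetical, the graph is assumed to match the five-vertex sample from the Giraph Quick Start guide (the bundled tiny-graph.txt may differ), and the source vertex is taken to be 1.

```java
import java.util.Arrays;

final class ShortestPathsSketch {
    // EDGES[v] lists {target, weight} pairs for vertex v (assumed sample graph).
    static final int[][][] EDGES = {
        {{1, 1}, {3, 3}},
        {{0, 1}, {2, 2}, {3, 1}},
        {{1, 2}, {4, 4}},
        {{0, 3}, {1, 1}, {4, 4}},
        {{3, 4}, {2, 4}},
    };

    /** Returns the distance from source to every vertex. */
    static double[] shortestPaths(int[][][] edges, int source) {
        int n = edges.length;
        double[] dist = new double[n];
        Arrays.fill(dist, Double.MAX_VALUE);
        // inbox[v] holds the smallest message delivered to v this round.
        double[] inbox = new double[n];
        Arrays.fill(inbox, Double.MAX_VALUE);
        inbox[source] = 0.0;
        boolean anyMessage = true;
        while (anyMessage) {
            double[] outbox = new double[n];
            Arrays.fill(outbox, Double.MAX_VALUE);
            anyMessage = false;
            for (int v = 0; v < n; v++) {
                // A vertex only acts when it learns of a shorter distance.
                if (inbox[v] < dist[v]) {
                    dist[v] = inbox[v];
                    for (int[] edge : edges[v]) {
                        double candidate = dist[v] + edge[1];
                        outbox[edge[0]] = Math.min(outbox[edge[0]], candidate);
                        anyMessage = true;
                    }
                }
            }
            inbox = outbox; // messages are delivered in the next round
        }
        return dist;
    }

    public static void main(String[] args) {
        double[] dist = shortestPaths(EDGES, 1);
        for (int v = 0; v < dist.length; v++) {
            System.out.println(v + "\t" + dist[v]);
        }
    }
}
```

If the Giraph job ran on the same graph and source, the values in the part file should agree with what this sketch prints, one vertex ID and distance per line.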

Compiling and running your own Giraph code

Here's how to build a simple example. We'll use a quick-and-dirty method: copy the Giraph examples jar with dependencies and add our own classes to it, so that a single jar file contains all the necessary Giraph code along with our code.

First we'll create a workspace in the directory you mapped into the container. On your system (outside of Docker), create a working directory in your home directory, which is mounted at /myhome inside the container, along with a package directory for your code:

mkdir $HOME/giraph-work
mkdir $HOME/giraph-work/mypackage

Our example is a trivial computation that sets the value of every vertex to 1.0 and otherwise passes the graph through unmodified. Put the following code in $HOME/giraph-work/mypackage/DummyComputation.java on your system (outside of Docker):

package mypackage;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

import java.io.IOException;

/**
 * Sets every vertex value to 1.0 and halts; edges are left untouched.
 * Type parameters: vertex ID, vertex value, edge value, message.
 */
public class DummyComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
    @Override
    public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                        Iterable<DoubleWritable> messages) throws IOException {
        // Overwrite the vertex value; incoming messages are ignored.
        vertex.setValue(new DoubleWritable(1.0));
        // No messages are sent, so once every vertex has voted to halt the job ends.
        vertex.voteToHalt();
    }
}

Now, go inside the Docker container and compile the code. Set the classpath to include both the Giraph examples jar with dependencies and the auto-generated Hadoop classpath:

cd /myhome/giraph-work
javac -cp /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.7.1-jar-with-dependencies.jar:$($HADOOP_HOME/bin/hadoop classpath) mypackage/DummyComputation.java

Now, we'll make a copy of the Giraph examples jar and add our class files to it.

cp /usr/local/giraph/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.7.1-jar-with-dependencies.jar ./myjar.jar
jar uf myjar.jar mypackage
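Before submitting the job, it can be worth confirming that the class file actually landed in the jar under the right package path. The sketch below does this with the JDK's java.util.jar.JarFile. The class name JarEntryCheck is hypothetical, and the default entry name assumes javac wrote mypackage/DummyComputation.class next to the source file.

```java
import java.io.IOException;
import java.util.jar.JarFile;

final class JarEntryCheck {
    /** Returns true if the jar at jarPath contains an entry with the given name. */
    static boolean containsEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults match the file names used in this walkthrough.
        String jarPath = args.length > 0 ? args[0] : "myjar.jar";
        String entry = args.length > 1 ? args[1] : "mypackage/DummyComputation.class";
        String verdict = containsEntry(jarPath, entry) ? " found in " : " missing from ";
        System.out.println(entry + verdict + jarPath);
    }
}
```

Compile it with javac and run it from /myhome/giraph-work so that the default jar path resolves.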

Now we can run the code using the extended jar file:

$HADOOP_HOME/bin/hadoop \
 jar myjar.jar org.apache.giraph.GiraphRunner \
 mypackage.DummyComputation \
 --yarnjars myjar.jar \
 --workers 1 \
 --vertexInputFormat org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
 --vertexInputPath /user/root/input/tiny-graph.txt \
 --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
 --outputPath /user/root/dummy-output
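Once the job finishes, the part files under /user/root/dummy-output can be inspected with hdfs dfs -cat as before. Since DummyComputation sets every vertex value to 1.0, each output line should carry that value. The following plain-Java sketch checks this on lines you have saved locally. It assumes IdWithValueTextOutputFormat writes one vertex ID and value per line separated by whitespace; the class name OutputCheck and the sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OutputCheck {
    /** Parses "id<whitespace>value" lines into a map, skipping blank lines. */
    static Map<Long, Double> parse(List<String> lines) {
        Map<Long, Double> values = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            values.put(Long.parseLong(fields[0]), Double.parseDouble(fields[1]));
        }
        return values;
    }

    /** Returns true if every parsed vertex value equals expected. */
    static boolean allEqual(Map<Long, Double> values, double expected) {
        for (double value : values.values()) {
            if (value != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical sample of what the dummy job's output might look like.
        List<String> sample = Arrays.asList("0\t1.0", "1\t1.0", "2\t1.0");
        System.out.println("all values are 1.0: " + allEqual(parse(sample), 1.0));
    }
}
```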
