
GraphStorm

GraphStorm is a graph machine learning (GML) framework for enterprise use cases. It simplifies the development, training, and deployment of GML models on industry-scale graphs by providing scalable training and inference pipelines that handle extremely large graphs (measured in billions of nodes and edges). GraphStorm provides a collection of built-in GML models, and users can train a model with a single command without writing any code. To help develop state-of-the-art models, GraphStorm offers a large collection of configurations for customizing model implementations and training pipelines to improve model performance. GraphStorm also provides a programming interface for training any custom GML model in a distributed manner: users supply their own model implementation and use the GraphStorm training pipeline to scale it.

GraphStorm architecture

Get Started

Installation

GraphStorm is compatible with Python 3.7+. It requires PyTorch 1.13+, DGL 1.0, and transformers 4.3.0+.

To install GraphStorm in your environment, clone the repository and run python3 setup.py install. However, running GraphStorm in a distributed environment is non-trivial: users need to install dependencies and configure a distributed PyTorch runtime environment. For this reason, we highly recommend running GraphStorm in a Docker container. A guideline for building the GraphStorm Docker image and running it on Amazon EC2 can be found here.
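For a single-machine, from-source setup, the steps amount to cloning the repository and running the setup script. The sketch below is an outline under assumptions, not a documented procedure: it assumes the repository lives at github.com/awslabs/graphstorm and that PyTorch and DGL are installed separately following their own instructions for your CUDA version.

# Sketch of a from-source install (repository URL and prerequisite steps are assumptions).
git clone https://github.com/awslabs/graphstorm.git
cd graphstorm
# Install PyTorch 1.13+ and DGL 1.0 first, per their official instructions.
python3 setup.py install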

Run GraphStorm with OGB datasets

Note: we assume users have set up their Docker container following the Build a Docker image from source instructions. All of the following commands run inside the Docker container.

Start the GraphStorm docker container

First, start your Docker container by running the following command:

nvidia-docker run --network=host -v /dev/shm:/dev/shm/ -d --name test <graphstorm-image-name>

After running the container as a daemon, you need to connect to your container:

docker container exec -it test /bin/bash
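
Because GraphStorm currently requires GPUs (see the Limitation section below), a quick sanity check inside the container is to confirm that the GPUs are visible. The command below is a standard NVIDIA utility, not part of GraphStorm, and is only a suggested check.

# Optional: verify that the container can see the GPUs.
nvidia-smi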

Node classification on OGB arxiv graph

First, use the below command to download the OGB arxiv data and process it into a DGL graph for the node classification task.

python3 /graphstorm/tools/gen_ogb_dataset.py --savepath /tmp/ogbn-arxiv-nc/ --retain_original_features true

Second, use the below command to partition this arxiv graph into a distributed graph that GraphStorm can use as its input.

python3 /graphstorm/tools/partition_graph.py --dataset ogbn-arxiv \
                                             --filepath /tmp/ogbn-arxiv-nc/ \
                                             --num_parts 1 \
                                             --num_trainers_per_machine 4 \
                                             --output /tmp/ogbn_arxiv_nc_train_val_1p_4t
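
The partition step writes its output under the directory passed to --output. The ogbn-arxiv.json partition configuration referenced by the training command below should appear there, alongside per-partition data directories created by DGL. As a quick, optional check (this listing step is our suggestion, not part of the official workflow):

ls /tmp/ogbn_arxiv_nc_train_val_1p_4t/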

GraphStorm distributed training relies on ssh to launch training jobs. These containers run ssh services on port 2222. Users need to collect the IP addresses of all machines and put them in an ip_list.txt file, in which every row is one IP address. We suggest providing the ip_list.txt file's absolute path in the launch script. If you run GraphStorm training on a single machine, ip_list.txt contains only one row, as shown below.

127.0.0.1

NOTE: please do NOT leave blank lines in the ip_list.txt.
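
For the single-machine quick start, the file can be created with one command. The sketch below writes the loopback address into the arxiv workspace used by the launch commands; the MAG example further below expects a similar file under its own workspace (/tmp/ogbn-mag-lp/).

# Create a one-line ip_list.txt in the workspace; avoid trailing blank lines.
echo 127.0.0.1 > /tmp/ogbn-arxiv-nc/ip_list.txt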

Third, run the below command to train an RGCN model to perform node classification on the partitioned arxiv graph.

python3 ~/dgl/tools/launch.py \
        --workspace /tmp/ogbn-arxiv-nc \
        --num_trainers 1 \
        --num_servers 1 \
        --num_samplers 0 \
        --part_config /tmp/ogbn_arxiv_nc_train_val_1p_4t/ogbn-arxiv.json \
        --ip_config  /tmp/ogbn-arxiv-nc/ip_list.txt \
        --ssh_port 2222 \
        "python3 /graphstorm/training_scripts/gsgnn_np/gsgnn_np.py \
        --cf /graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
        --ip-config /tmp/ogbn-arxiv-nc/ip_list.txt \
        --part-config /tmp/ogbn_arxiv_nc_train_val_1p_4t/ogbn-arxiv.json \
        --save-perf-results-path /tmp/ogbn-arxiv-nc/"

Link Prediction on OGB MAG graph

First, use the below command to download the OGB MAG data and process it into a DGL graph for the link prediction task. The edge type for prediction is “author,writes,paper”. The command also sets aside 80% of the edges of this type for training and validation (the default is 10%), and the remaining 20% for testing.

python3 /graphstorm/tools/gen_mag_dataset.py --savepath /tmp/ogbn-mag-lp/ --edge_pct 0.8

Second, use the following command to partition the MAG graph into a distributed format.

python3 /graphstorm/tools/partition_graph_lp.py --dataset ogbn-mag \
                                                --filepath /tmp/ogbn-mag-lp/ \
                                                --num_parts 1 \
                                                --num_trainers_per_machine 4 \
                                                --predict_etypes author,writes,paper \
                                                --output /tmp/ogbn_mag_lp_train_val_1p_4t

Third, run the below command to train an RGCN model to perform link prediction on the partitioned MAG graph.

python3 ~/dgl/tools/launch.py \
        --workspace /tmp/ogbn-mag-lp/ \
        --num_trainers 1 \
        --num_servers 1 \
        --num_samplers 0 \
        --part_config /tmp/ogbn_mag_lp_train_val_1p_4t/ogbn-mag.json \
        --ip_config /tmp/ogbn-mag-lp/ip_list.txt \
        --ssh_port 2222 \
        "python3 /graphstorm/training_scripts/gsgnn_lp/gsgnn_lp.py \
        --cf /graphstorm/training_scripts/gsgnn_lp/mag_lp.yaml \
        --num-gpus 1 \
        --ip-config /tmp/ogbn-mag-lp/ip_list.txt \
        --part-config /tmp/ogbn_mag_lp_train_val_1p_4t/ogbn-mag.json \
        --feat-name paper:feat \
        --save-model-path /tmp/ogbn-mag/ \
        --save-perf-results-path /tmp/ogbn-mag/"

Limitation

The GraphStorm framework only works in GPU environments. It has only been tested on AWS instances equipped with NVIDIA GPUs, including P4, V100, A10, and A100.

License

This project is licensed under the Apache-2.0 License.
