TGL: A General Framework for Temporal Graph Training on Billion-Scale Graphs

Overview

This repo is the open-sourced code for our work TGL: A General Framework for Temporal Graph Training on Billion-Scale Graphs.

Requirements

  • python >= 3.6.13
  • pytorch >= 1.8.1
  • pandas >= 1.1.5
  • numpy >= 1.19.5
  • dgl >= 0.6.1
  • pyyaml >= 5.4.1
  • tqdm >= 4.61.0
  • pybind11 >= 2.6.2
  • g++ >= 7.5.0
  • openmp >= 201511

Our temporal sampler is implemented in C++. Compile it first with the following command:

python setup.py build_ext --inplace

Dataset

[2022/06/29] We previously uploaded the wrong version of the GDELT dataset and have since replaced it with the correct version. Please re-download all the files in the GDELT folder. Sorry for any inconvenience.

The four datasets used in our paper are available for download from an AWS S3 bucket using the down.sh script. The total download size is around 350GB.

To use your own dataset, put the following files in the folder /DATA/<NameOfYourDataset>/:

  1. edges.csv: The file that stores the temporal edge information. The CSV should have the header ,src,dst,time,ext_roll, where the columns are, in order, the edge index (starting from zero), source node index (starting from zero), destination node index, timestamp, and extrapolation roll (0 for training edges, 1 for validation edges, 2 for test edges). The CSV should be sorted by time in ascending order.
  2. ext_full.npz: The T-CSR representation of the temporal graph. We provide a script that generates this file from edges.csv. Run it with the following command:

    python gen_graph.py --data <NameOfYourDataset>

  3. edge_features.pt (optional): The torch tensor that stores the edge features row-wise with shape (num edges, dim edge features). Note: at least one of edge_features.pt or node_features.pt must be present.
  4. node_features.pt (optional): The torch tensor that stores the node features row-wise with shape (num nodes, dim node features). Note: at least one of edge_features.pt or node_features.pt must be present.
  5. labels.csv (optional): The file that contains the node labels for the dynamic node classification task. The CSV should have the header ,node,time,label,ext_roll, where the columns are, in order, the node label index (starting from zero), node index (starting from zero), timestamp, node label, and extrapolation roll (0 for training node labels, 1 for validation node labels, 2 for test node labels). The CSV should be sorted by time in ascending order.
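
As an illustration of the edges.csv layout described above, the following is a minimal sketch that writes and then checks such a file using only the Python standard library. The function names write_edges_csv and check_edges_csv are hypothetical helpers, not part of TGL, and the 70/15/15 chronological split is an assumption of this sketch; choose whatever split suits your dataset.

```python
import csv


def write_edges_csv(path, edges, train_frac=0.70, val_frac=0.15):
    """Write (src, dst, time) triples in the edges.csv layout described above.

    Assumption of this sketch: the earliest train_frac of the edges are
    training edges, the next val_frac are validation edges, the rest are test.
    """
    edges = sorted(edges, key=lambda e: e[2])  # sort by timestamp, ascending
    n = len(edges)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # The first column (edge index) has an empty header name.
        writer.writerow(["", "src", "dst", "time", "ext_roll"])
        for idx, (src, dst, t) in enumerate(edges):
            if idx < train_end:
                ext_roll = 0  # training edge
            elif idx < val_end:
                ext_roll = 1  # validation edge
            else:
                ext_roll = 2  # test edge
            writer.writerow([idx, src, dst, t, ext_roll])


def check_edges_csv(path):
    """Check the constraints listed above and return the number of edges."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    assert rows[0] == ["", "src", "dst", "time", "ext_roll"], "unexpected header"
    prev_time = float("-inf")
    for expected_idx, (idx, src, dst, t, roll) in enumerate(rows[1:]):
        assert int(idx) == expected_idx, "edge index must start at 0 and be consecutive"
        assert int(src) >= 0 and int(dst) >= 0, "node indices must be non-negative"
        assert float(t) >= prev_time, "rows must be sorted by time, ascending"
        assert roll in {"0", "1", "2"}, "ext_roll must be 0, 1 or 2"
        prev_time = float(t)
    return len(rows) - 1
```

A typical use would be to call write_edges_csv on the path of edges.csv inside your dataset folder, run check_edges_csv on the result, and then generate ext_full.npz with gen_graph.py.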

Configuration Files

We provide example configuration files for five temporal GNN methods: JODIE, DySAT, TGAT, TGN and APAN. The configuration files for single-GPU training are located at /config/, while the configuration files for multi-GPU training are located at /config/dist/.

The provided configuration files have all been tested and are known to work. If you want to use your own network architecture, please refer to /config/readme.yml for the meaning of each entry in the YAML configuration file. As our framework is still under development, it is possible that some combinations of configuration entries will lead to bugs.

Run

Currently, our framework only supports the extrapolation setting (inference for the future).

Single GPU Link Prediction

python train.py --data <NameOfYourDataset> --config <PathToConfigFile>

Multi-GPU Link Prediction

python -m torch.distributed.launch --nproc_per_node=<NumberOfGPUs+1> train_dist.py --data <NameOfYourDataset> --config <PathToConfigFile> --num_gpus <NumberOfGPUs>
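
Note that the command template above launches one more process than the number of GPUs (--nproc_per_node is NumberOfGPUs+1, while --num_gpus is NumberOfGPUs). The following is a minimal sketch of a helper that assembles this command so the two values stay consistent. The helper name build_launch_command is hypothetical and not part of TGL; it only builds the argument list and does not run anything.

```python
import shlex


def build_launch_command(data, config, num_gpus, script="train_dist.py"):
    """Assemble the multi-GPU launch command shown above as an argument list.

    Hypothetical helper for illustration: it mirrors the command template,
    using num_gpus + 1 processes per node and passing num_gpus to the script.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus + 1}",
        script,
        "--data", data,
        "--config", config,
        "--num_gpus", str(num_gpus),
    ]


def to_shell(cmd):
    """Render the argument list as a single shell-quoted string."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

For example, to_shell(build_launch_command("<NameOfYourDataset>", "<PathToConfigFile>", 4)) yields a command string with --nproc_per_node=5 and --num_gpus 4; the list form can also be passed to subprocess.run.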

Dynamic Node Classification

Currently, TGL only supports dynamic node classification using the dynamic node embeddings generated in link prediction.

For single-GPU models, directly run

python train_node.py --data <NameOfYourDataset> --config <PathToConfigFile> --model <PathToSavedModel>

For multi-GPU models, you first need to generate the dynamic node embeddings

python -m torch.distributed.launch --nproc_per_node=<NumberOfGPUs+1> extract_node_dist.py --data <NameOfYourDataset> --config <PathToConfigFile> --num_gpus <NumberOfGPUs> --model <PathToSavedModel>

After generating the node embeddings for multi-GPU models, run

python train_node.py --data <NameOfYourDataset> --model <PathToSavedModel>

Security

See CONTRIBUTING for more information.

Cite

If you use TGL in a scientific publication, we would appreciate citations to the following paper:

@article{zhou2022tgl,
    title={{TGL}: A General Framework for Temporal GNN Training on Billion-Scale Graphs},
    author={Zhou, Hongkuan and Zheng, Da and Nisa, Israt and Ioannidis, Vasileios and Song, Xiang and Karypis, George},
    year = {2022},
    journal = {Proc. VLDB Endow.},
    volume = {15},
    number = {8},
}

License

This project is licensed under the Apache-2.0 License.
