dmlc / dgl Goto Github PK

Python package built to ease deep learning on graph, on top of existing DL frameworks.

License: Apache License 2.0

Python 62.17% Shell 0.26% CMake 0.58% C++ 23.10% C 0.26% Batchfile 0.05% Cuda 6.39% Cython 0.20% Jupyter Notebook 6.98%

dgl's Introduction

DGL is an easy-to-use, high performance and scalable Python package for deep learning on graphs. DGL is framework agnostic, meaning if a deep graph model is a component of an end-to-end application, the rest of the logics can be implemented in any major frameworks, such as PyTorch, Apache MXNet or TensorFlow.

Figure: DGL Overall Architecture

Highlighted Features

A GPU-ready graph library

DGL provides a powerful graph object that can reside on either CPU or GPU. It bundles structural data as well as features for better control. We provide a variety of functions for computing with graph objects including efficient and customizable message passing primitives for Graph Neural Networks.

A versatile tool for GNN researchers and practitioners

The field of graph deep learning is still rapidly evolving and many research ideas emerge by standing on the shoulders of giants. To ease the process, DGl-Go is a command-line interface to get started with training, using and studying state-of-the-art GNNs. DGL collects a rich set of example implementations of popular GNN models of a wide range of topics. Researchers can search for related models to innovate new ideas from or use them as baselines for experiments. Moreover, DGL provides many state-of-the-art GNN layers and modules for users to build new model architectures. DGL is one of the preferred platforms for many standard graph deep learning benchmarks including OGB and GNNBenchmarks.

Easy to learn and use

DGL provides plenty of learning materials for all kinds of users from ML researchers to domain experts. The Blitz Introduction to DGL is a 120-minute tour of the basics of graph machine learning. The User Guide explains in more details the concepts of graphs as well as the training methodology. All of them include code snippets in DGL that are runnable and ready to be plugged into one’s own pipeline.

Scalable and efficient

It is convenient to train models using DGL on large-scale graphs across multiple GPUs or multiple machines. DGL extensively optimizes the whole stack to reduce the overhead in communication, memory consumption and synchronization. As a result, DGL can easily scale to billion-sized graphs. Get started with the tutorials and user guide for distributed training. See the system performance note for the comparison with other tools.

Get Started

Users can install DGL from pip and conda. You can also download GPU enabled DGL docker containers (backended by PyTorch) from NVIDIA NGC for both x86 and ARM based linux systems. Advanced users can follow the instructions to install from source.

For absolute beginners, start with the Blitz Introduction to DGL. It covers the basic concepts of common graph machine learning tasks and a step-by-step on building Graph Neural Networks (GNNs) to solve them.

For acquainted users who wish to learn more,

Experience state-of-the-art GNN models in only two command-lines using DGL-Go.
Learn DGL by example implementations of popular GNN models.
Read the User Guide (中文版链接), which explains the concepts and usage of DGL in much more details.
Go through the tutorials for advanced features like stochastic training of GNNs, training on multi-GPU or multi-machine.
Study classical papers on graph machine learning alongside DGL.
Search for the usage of a specific API in the API reference manual, which organizes all DGL APIs by their namespace.

All the learning materials are available at our documentation site. If you are new to deep learning in general, check out the open source book Dive into Deep Learning.

Community

Get connected

We provide multiple channels to connect you to the community of the DGL developers, users, and the general GNN academic researchers:

Our Slack channel, click to join
Our discussion forum: https://discuss.dgl.ai/
Our Zhihu blog (in Chinese)
Monthly GNN User Group online seminar (event link | past videos)

Take the survey here and leave any feedback to make DGL better fit for your needs. Thanks!

DGL-powered projects

DGL-LifeSci: a DGL-based package for various applications in life science with graph neural networks. https://github.com/awslabs/dgl-lifesci
DGL-KE: a high performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. https://github.com/awslabs/dgl-ke
Benchmarking GNN: https://github.com/graphdeeplearning/benchmarking-gnns
OGB: a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. https://ogb.stanford.edu/
Graph4NLP: an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing. https://github.com/graph4ai/graph4nlp
GNN-RecSys: https://github.com/je-dbl/GNN-RecSys
Amazon Neptune ML: a new capability of Neptune that uses Graph Neural Networks (GNNs), a machine learning technique purpose-built for graphs, to make easy, fast, and more accurate predictions using graph data. https://aws.amazon.com/cn/neptune/machine-learning/
GNNLens2: Visualization tool for Graph Neural Networks. https://github.com/dmlc/GNNLens2
RNAGlib: A package to facilitate construction, analysis, visualization and machine learning on RNA 2.5D Graphs. Includes a pre-built dataset: https://rnaglib.cs.mcgill.ca
OpenHGNN: Model zoo and benchmarks for Heterogeneous Graph Neural Networks. https://github.com/BUPT-GAMMA/OpenHGNN
TGL: A graph learning framework for large-scale temporal graphs. https://github.com/amazon-research/tgl
gtrick: Bag of Tricks for Graph Neural Networks. https://github.com/sangyx/gtrick
ArangoDB-DGL Adapter: Import ArangoDB graphs into DGL and vice-versa. https://github.com/arangoml/dgl-adapter
DGLD: DGLD is an open-source library for Deep Graph Anomaly Detection based on pytorch and DGL.

Awesome Papers Using DGL

Benchmarking Graph Neural Networks, Vijay Prakash Dwivedi, Chaitanya K. Joshi, Thomas Laurent, Yoshua Bengio, Xavier Bresson
Open Graph Benchmarks: Datasets for Machine Learning on Graphs, NeurIPS'20, Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec
DropEdge: Towards Deep Graph Convolutional Networks on Node Classification, ICLR'20, Yu Rong, Wenbing Huang, Tingyang Xu, Junzhou Huan
Discourse-Aware Neural Extractive Text Summarization, ACL'20, Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training, KDD'20, Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, Jie Tang
DGL-KE: Training Knowledge Graph Embeddings at Scale, SIGIR'20, Da Zheng, Xiang Song, Chao Ma, Zeyuan Tan, Zihao Ye, Jin Dong, Hao Xiong, Zheng Zhang, George Karypis
Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting, Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, Michael M. Bronstein
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving, Yuhuai Wu, Albert Q. Jiang, Jimmy Ba, Roger Grosse
Finding Patient Zero: Learning Contagion Source with Graph Neural Networks, Chintan Shah, Nima Dehmamy, Nicola Perra, Matteo Chinazzi, Albert-László Barabási, Alessandro Vespignani, Rose Yu
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems, SC'20, Yuwei Hu, Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, Zheng Zhang, Zhiru Zhang, Yida Wang

BP-Transformer: Modelling Long-Range Context via Binary Partitioning., Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
OptiMol: Optimization of Binding Affinities in Chemical Space for Drug Discovery, Jacques Boitreaud,Vincent Mallet, Carlos Oliver, Jérôme Waldispühl
JAKET: Joint Pre-training of Knowledge Graph and Language Understanding, Donghan Yu, Chenguang Zhu, Yiming Yang, Michael Zeng
Architectural Implications of Graph Neural Networks, Zhihui Zhang, Jingwen Leng, Lingxiao Ma, Youshan Miao, Chao Li, Minyi Guo
Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization, Quentin Cappart, Thierry Moisan, Louis-Martin Rousseau1, Isabeau Prémont-Schwarz, and Andre Cire
Therapeutics Data Commons: Machine Learning Datasets and Tasks for Therapeutics (code repo), Kexin Huang, Tianfan Fu, Wenhao Gao, Yue Zhao, Yusuf Roohani, Jure Leskovec, Connor W. Coley, Cao Xiao, Jimeng Sun, Marinka Zitnik
Sparse Graph Attention Networks, Yang Ye, Shihao Ji
On Self-Distilling Graph Neural Network, Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, Junzhou Huang
Learning Robust Node Representations on Graphs, Xu Chen, Ya Zhang, Ivor Tsang, and Yuangang Pan
Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs, Woojeong Jin, Meng Qu, Xisen Jin, Xiang Ren
Graph Neural Ordinary Differential Equations, Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks, Md. Khaledur Rahman, Majedul Haque Sujon, , Ariful Azad
An Efficient Neighborhood-based Interaction Model for Recommendation on Heterogeneous Graph, KDD'20 Jiarui Jin, Jiarui Qin, Yuchen Fang, Kounianhua Du, Weinan Zhang, Yong Yu, Zheng Zhang, Alexander J. Smola
Learning Interaction Models of Structured Neighborhood on Heterogeneous Information Network, Jiarui Jin, Kounianhua Du, Weinan Zhang, Jiarui Qin, Yuchen Fang, Yong Yu, Zheng Zhang, Alexander J. Smola
Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Protein Structures, Arian R. Jamasb, Pietro Lió, Tom L. Blundell
Graph Policy Gradients for Large Scale Robot Control, Arbaaz Khan, Ekaterina Tolstaya, Alejandro Ribeiro, Vijay Kumar
Heterogeneous Molecular Graph Neural Networks for Predicting Molecule Properties, Zeren Shui, George Karypis
Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-based and Graph-based Models, Dejun Jiang, Zhenxing Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zhe Wang, Chao Shen, Dongsheng Cao, Jian Wu, Tingjun Hou
Principal Neighbourhood Aggregation for Graph Nets, Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, Petar Veličković
Collective Multi-type Entity Alignment Between Knowledge Graphs, Qi Zhu, Hao Wei, Bunyamin Sisman, Da Zheng, Christos Faloutsos, Xin Luna Dong, Jiawei Han
Graph Representation Forecasting of Patient's Medical Conditions: towards A Digital Twin, Pietro Barbiero, Ramon Viñas Torné, Pietro Lió
Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery, Yong-Hao Long, Jie-Ying Wu, Bo Lu, Yue-Ming Jin, Mathias Unberath, Yun-Hui Liu, Pheng-Ann Heng and Qi Dou
Dark Reciprocal-Rank: Boosting Graph-Convolutional Self-Localization Network via Teacher-to-student Knowledge Transfer, Takeda Koji, Tanaka Kanji
Graph InfoClust: Leveraging Cluster-Level Node Information For Unsupervised Graph Representation Learning, Costas Mavromatis, George Karypis
GraphSeam: Supervised Graph Learning Framework for Semantic UV Mapping, Fatemeh Teimury, Bruno Roy, Juan Sebastian Casallas, David macdonald, Mark Coates
Comprehensive Study on Molecular Supervised Learning with Graph Neural Networks, Doyeong Hwang, Soojung Yang, Yongchan Kwon, Kyung Hoon Lee, Grace Lee, Hanseok Jo, Seyeol Yoon, and Seongok Ryu
A graph auto-encoder model for miRNA-disease associations prediction, Zhengwei Li, Jiashu Li, Ru Nie, Zhu-Hong You, Wenzheng Bao
Graph convolutional regression of cardiac depolarization from sparse endocardial maps, STACOM 2020 workshop, Felix Meister, Tiziano Passerini, Chloé Audigier, Èric Lluch, Viorel Mihalef, Hiroshi Ashikaga, Andreas Maier, Henry Halperin, Tommaso Mansi
AttnIO: Knowledge Graph Exploration with In-and-Out Attention Flow for Knowledge-Grounded Dialogue, EMNLP'20, Jaehun Jung, Bokyung Son, Sungwon Lyu
Learning from Non-Binary Constituency Trees via Tensor Decomposition, COLING'20, Daniele Castellana, Davide Bacciu
Inducing Alignment Structure with Gated Graph Attention Networks for Sentence Matching, Peng Cui, Le Hu, Yuanchao Liu
Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks, COLING'20, Peng Cui, Le Hu, Yuanchao Liu
Double Graph Based Reasoning for Document-level Relation Extraction, EMNLP'20, Shuang Zeng, Runxin Xu, Baobao Chang, Lei Li
Systematic Generalization on gSCAN with Language Conditioned Embedding, AACL-IJCNLP'20, Tong Gao, Qi Huang, Raymond J. Mooney
Automatic selection of clustering algorithms using supervised graph embedding, Noy Cohen-Shapira, Lior Rokach
Improving Learning to Branch via Reinforcement Learning, Haoran Sun, Wenbo Chen, Hui Li, Le Song
A Practical Guide to Graph Neural Networks, Isaac Ronald Ward, Jack Joyner, Casey Lickfold, Stash Rowe, Yulan Guo, Mohammed Bennamoun, code
APAN: Asynchronous Propagation Attention Network for Real-time Temporal Graph Embedding, SIGMOD'21, Xuhong Wang, Ding Lyu, Mengjian Li, Yang Xia, Qi Yang, Xinwen Wang, Xinguang Wang, Ping Cui, Yupu Yang, Bowen Sun, Zhenyu Guo, Junkui Li
Uncertainty-Matching Graph Neural Networks to Defend Against Poisoning Attacks, Uday Shankar Shanthamallu, Jayaraman J. Thiagarajan, Andreas Spanias
Computing Graph Neural Networks: A Survey from Algorithms to Accelerators, Sergi Abadal, Akshay Jain, Robert Guirado, Jorge López-Alonso, Eduard Alarcón
NHK_STRL at WNUT-2020 Task 2: GATs with Syntactic Dependencies as Edges and CTC-based Loss for Text Classification, Yuki Yasuda, Taichi Ishiwatari, Taro Miyazaki, Jun Goto
Relation-aware Graph Attention Networks with Relational Position Encodings for Emotion Recognition in Conversations, Taichi Ishiwatari, Yuki Yasuda, Taro Miyazaki, Jun Goto
PGM-Explainer: Probabilistic Graphical Model Explanations for Graph Neural Networks, Minh N. Vu, My T. Thai
A Generalization of Transformer Networks to Graphs, Vijay Prakash Dwivedi, Xavier Bresson
Discourse-Aware Neural Extractive Text Summarization, ACL'20, Jiacheng Xu, Zhe Gan, Yu Cheng, Jingjing Liu
Learning Robust Node Representations on Graphs, Xu Chen, Ya Zhang, Ivor Tsang, Yuangang Pan
Adaptive Graph Diffusion Networks with Hop-wise Attention, Chuxiong Sun, Guoshi Wu
The Photoswitch Dataset: A Molecular Machine Learning Benchmark for the Advancement of Synthetic Chemistry, Aditya R. Thawani, Ryan-Rhys Griffiths, Arian Jamasb, Anthony Bourached, Penelope Jones, William McCorkindale, Alexander A. Aldrick, Alpha A. Lee
A community-powered search of machine learning strategy space to find NMR property prediction models, Lars A. Bratholm, Will Gerrard, Brandon Anderson, Shaojie Bai, Sunghwan Choi, Lam Dang, Pavel Hanchar, Addison Howard, Guillaume Huard, Sanghoon Kim, Zico Kolter, Risi Kondor, Mordechai Kornbluth, Youhan Lee, Youngsoo Lee, Jonathan P. Mailoa, Thanh Tu Nguyen, Milos Popovic, Goran Rakocevic, Walter Reade, Wonho Song, Luka Stojanovic, Erik H. Thiede, Nebojsa Tijanic, Andres Torrubia, Devin Willmott, Craig P. Butts, David R. Glowacki, Kaggle participants
Adaptive Layout Decomposition with Graph Embedding Neural Networks, Wei Li, Jialu Xia, Yuzhe Ma, Jialu Li, Yibo Lin, Bei Yu, DAC'20
Transfer Learning with Graph Neural Networks for Optoelectronic Properties of Conjugated Oligomers, J. Chem. Phys. 154, Chee-Kong Lee, Chengqiang Lu, Yue Yu, Qiming Sun, Chang-Yu Hsieh, Shengyu Zhang, Qi Liu, and Liang Shi
Jet tagging in the Lund plane with graph networks, Journal of High Energy Physics 2021, Frédéric A. Dreyer and Huilin Qu
Global Attention Improves Graph Networks Generalization, Omri Puny, Heli Ben-Hamu, and Yaron Lipman
Learning over Families of Sets -- Hypergraph Representation Learning for Higher Order Tasks, SDM 2021, Balasubramaniam Srinivasan, Da Zheng, and George Karypis
SSFG: Stochastically Scaling Features and Gradients for Regularizing Graph Convolution Networks, Haimin Zhang, Min Xu
Application and evaluation of knowledge graph embeddings in biomedical data, PeerJ Computer Science 7:e341, Mona Alshahrani, Maha A. Thafar, Magbubah Essack
MoTSE: an interpretable task similarity estimator for small molecular property prediction tasks, bioRxiv 2021.01.13.426608, Han Li, Xinyi Zhao, Shuya Li, Fangping Wan, Dan Zhao, Jianyang Zeng
Reinforcement Learning For Data Poisoning on Graph Neural Networks, Jacob Dineen, A S M Ahsan-Ul Haque, Matthew Bielskas
Generalising Recursive Neural Models by Tensor Decomposition, IJCNN'20, Daniele Castellana, Davide Bacciu
Tensor Decompositions in Recursive Neural Networks for Tree-Structured Data, ESANN'20, Daniele Castellana, Davide Bacciu
Combining Self-Organizing and Graph Neural Networks for Modeling Deformable Objects in Robotic Manipulation, Frotiers in Robotics and AI, Valencia, Angel J., and Pierre Payeur
Joint stroke classification and text line grouping in online handwritten documents with edge pooling attention networks, Pattern Recognition, Jun-Yu Ye, Yan-Ming Zhang, Qing Yang, Cheng-Lin Liu
Toward Accurate Predictions of Atomic Properties via Quantum Mechanics Descriptors Augmented Graph Convolutional Neural Network: Application of This Novel Approach in NMR Chemical Shifts Predictions, The Journal of Physical Chemistry Letters, Peng Gao, Jie Zhang, Yuzhu Sun, and Jianguo Yu
A Graph Neural Network to Model User Comfort in Robot Navigation, Pilar Bachiller, Daniel Rodriguez-Criado, Ronit R. Jorvekar, Pablo Bustos, Diego R. Faria, Luis J. Manso
Medical Entity Disambiguation Using Graph Neural Networks, Alina Vretinaris, Chuan Lei, Vasilis Efthymiou, Xiao Qin, Fatma Özcan
Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning, Somesh Mohapatra, Joyce An, Rafael Gómez-Bombarelli
Characterizing and Forecasting User Engagement with In-app Action Graph: A Case Study of Snapchat, Yozen Liu, Xiaolin Shi, Lucas Pierce, Xiang Ren
GIPA: General Information Propagation Algorithm for Graph Learning, Qinkai Zheng, Houyi Li, Peng Zhang, Zhixiong Yang, Guowei Zhang, Xintan Zeng, Yongchao Liu
Graph Ensemble Learning over Multiple Dependency Trees for Aspect-level Sentiment Classification, NAACL'21, Xiaochen Hou, Peng Qi, Guangtao Wang, Rex Ying, Jing Huang, Xiaodong He, Bowen Zhou
Enhancing Scientific Papers Summarization with Citation Graph, AAAI'21, Chenxin An, Ming Zhong, Yiran Chen, Danqing Wang, Xipeng Qiu, Xuanjing Huang
Improving Graph Representation Learning by Contrastive Regularization, Kaili Ma, Haochen Yang, Han Yang, Tatiana Jin, Pengfei Chen, Yongqiang Chen, Barakeel Fanseu Kamhoua, James Cheng
Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework, WWW'21, Cheng Yang, Jiawei Liu, Chuan Shi
VIKING: Adversarial Attack on Network Embeddings via Supervised Network Poisoning, PAKDD'21, Viresh Gupta, Tanmoy Chakraborty
Knowledge Graph Embedding using Graph Convolutional Networks with Relation-Aware Attention, Nasrullah Sheikh, Xiao Qin, Berthold Reinwald, Christoph Miksovic, Thomas Gschwind, Paolo Scotton
SLAPS: Self-Supervision Improves Structure Learning for Graph Neural Networks, Bahare Fatemi, Layla El Asri, Seyed Mehran Kazemi
Finding Needles in Heterogeneous Haystacks, AAAI'21, Bijaya Adhikari, Liangyue Li, Nikhil Rao, Karthik Subbian
RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning, IJCAI 2021, Hankook Lee, Sungsoo Ahn, Seung-Woo Seo, You Young Song, Eunho Yang, Sung-Ju Hwang, Jinwoo Shin
Accurate Prediction of Free Solvation Energy of Organic Molecules via Graph Attention Network and Message Passing Neural Network from Pairwise Atomistic Interactions, Ramin Ansari, Amirata Ghorbani
DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction, Alex Morehead, Chen Chen, Ada Sedova, Jianlin Cheng
Coreference-Aware Dialogue Summarization, SIGDIAL'21, Zhengyuan Liu, Ke Shi, Nancy F. Chen
Document Structure aware Relational Graph Convolutional Networks for Ontology Population, arXiv, Abhay M Shalghar, Ayush Kumar, Balaji Ganesan, Aswin Kannan, Shobha G
Covid-19 Detection from Chest X-ray and Patient Metadata using Graph Convolutional Neural Networks, Thosini Bamunu Mudiyanselage, Nipuna Senanayake, Chunyan Ji, Yi Pan, Yanqing Zhang
Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in Rossmann fold proteins, Briefings in Bioinformatics, Kamil Kaminski, Jan Ludwiczak, Maciej Jasinski, Adriana Bukala, Rafal Madaj, Krzysztof Szczepaniak, Stanislaw Dunin-Horkawicz
LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations, ACL'21, Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, Kai Yu
Enhancing Graph Neural Networks via auxiliary training for semi-supervised node classification, Knowledge-Based System'21, Yao Wu, Yu Song, Hong Huang, Fanghua Ye, Xing Xie, Hai Jin
Modeling Graph Node Correlations with Neighbor Mixture Models, Linfeng Liu, Michael C. Hughes, Li-Ping Liu
COMBINING PHYSICS AND MACHINE LEARNING FOR NETWORK FLOW ESTIMATION, ICLR'21, Arlei Silva, Furkan Kocayusufoglu, Saber Jafarpour, Francesco Bullo, Ananthram Swami, Ambuj Singh
A Classification Method for Academic Resources Based on a Graph Attention Network, Future Internet'21, Jie Yu, Yaliu Li, Chenle Pan and Junwei Wang
Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture, Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, Jinjun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
Graph Attention Multi-Layer Perception, Wentao Zhang, Ziqi Yin, Zeang Sheng, Wen Ouyang, Xiaosen Li, Yangyu Tao, Zhi Yang, Bin Cui
GNNLens: A Visual Analytics Approach for Prediction Error Diagnosis of Graph Neural Networks, Zhihua Jin, Yong Wang, Qianwen Wang, Yao Ming, Tengfei Ma, Huamin Qu
How Attentive are Graph Attention Networks?, Shaked Brody, Uri Alon, Eran Yahav, code
SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks, Thomas Monninger*, Julian Schmidt*, Jan Rupprecht, David Raba, Julian Jordan, Daniel Frank, Steffen Staab, Klaus Dietmayer, code, *co-first authors

Contributing

Please let us know if you encounter a bug or have any suggestions by filing an issue.

We welcome all contributions from bug fixes to new features and extensions.

We expect all contributions discussed in the issue tracker and going through PRs. Please refer to our contribution guide.

Cite

If you use DGL in a scientific publication, we would appreciate citations to the following paper:

@article{wang2019dgl,
    title={Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks},
    author={Minjie Wang and Da Zheng and Zihao Ye and Quan Gan and Mufei Li and Xiang Song and Jinjing Zhou and Chao Ma and Lingfan Yu and Yu Gai and Tianjun Xiao and Tong He and George Karypis and Jinyang Li and Zheng Zhang},
    year={2019},
    journal={arXiv preprint arXiv:1909.01315}
}

The Team

DGL is developed and maintained by NYU, NYU Shanghai, AWS Shanghai AI Lab, and AWS MXNet Science Team.

License

DGL uses Apache License 2.0.

dgl's People

Contributors

Stargazers

Watchers

Forkers

jalamao sufeidechabei zhouyonglong jhy1993 chenghuige hyzcn dthboyd iszhaoxin libertatis hulalazz szha renly zheng-da onisimchukv renyi533 anonymous0429 giserh jermainewang zch42 chaoso chaoyue729 frankchu0229 lizimingchina allenanswerzq tornadomeet flylearning breakdawn yzh119 shaunstanislauslau lingfanyu fengziyjun crazyjvm yfyang86 ginobilinie bozheng-hit eric-haibin-lin kevinthesun ahoyosid sumeetkr ml-lab truongbuu hungz23 beesitech zhizhid hhy5277 ashwarypande nguyenducnhaty hsouporto franlucchini earlbabson stevenlol o7s8r6 vdeepak13 primus-zhao dailyactie ncantrell narayanmahto ziyuehuang mahshad92 cclauss jaykimbravekjh sungjinlees yflyzhang rishotics hdjsjyl fendaq mrunix gaoyz0625 barclayii vikingmew zhouyy92 astonzhang catyans zhuyiche hecosysu yf817 ruizewang pythseq patriciaxiao brettkoonce zcrwind subhankar21 szldmgy gaiyu0 tbmihailov csgcmai uranuszs handasontam ustc-miner azamatb yongwww cvignac mori97 codeaudit ashuein chocl8camellia gaimjkp gnn2qsu vishalbelsare yanjunli-cs

dgl's Issues

[Improvement] SSE consumes too much memory during forward

Just a simple note. It seems that there are memory issue when gradient is not required. Need verify.

[DISCUSSION] Potential confusion for accessing edges

In the doc for dgl.DGLGraph, it says that we can access the edges with either g.edges[src, dest] or g.edges[edge_id] for a single edge. But this can be a bit confusing as one might wonder what if I want to access two edges with edge ids. Although careful users will realize that we only allow accessing multiple edges by edge ids through a list of edge ids, this is just confusing. Perhaps we should forbid using a single int for edge id to access the associated edge and require the users to consistently use a list of ints for edge ids?

Dynamically maintaining line graph?

As discussed in the meeting earlier, Junction Tree VAE requires us to perform partial loopy-BP-like message passing (in particular, edge messages are computed in DFS order) while generating nodes when decoding a tree during inference.

It seems that we somehow need to maintain a dynamic non-backtracking line graph, handling (appended) new edges on-the-fly, in order to use the DGL reduce primitive there.

Just for bookkeeping this issue.

[RFC] DGLGraph APIs Related to Dynamic Graphs

Here are two small issues for DGLGraph APIs about whether the model/tutorial code can be cleaner:

set_n_initializer and set_e_initializer

For most of the time, we simply initialize features as zero tensors. Therefore it will be more convenient if we can simply call g.set_n_initializer() with

def set_n_initializer(self, initializer=None):
    """Set the initializer for empty node features.
    Initializer is a callable that returns a tensor given the shape, data type
    and device context.
    Parameters
    ----------
    initializer : callable
        The initializer.
    """
    if initializer is None:
        initializer = lambda shape, dtype, ctx: F.zeros(shape, device=ctx)
    self._node_frame.set_initializer(initializer)

Similar for set_e_initializer.

Set features
When we are generating a graph from scratch, the first time we add a feature after adding the first node, we must call g.ndata[key] = value. After there is at least a row for the column, the next time we add another node and its feature, we can call g.nodes[v].data[key] = value. This is due to L281, frame.py. Is it possible to allow g.nodes[v].data[key] = value even when the column is added as long as v == range(g.number_of_nodes())? Similar for edges.

[DISCUSSION] GraphIndex profiling result

Here is a profiling result on my machine (Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz). The code is in https://github.com/jermainewang/dgl-workspace/tree/master/profile-graph-index . It measure the time of each API calls with different graph sizes. All the time is measured in seconds.

Some highlights:

Adding 1M nodes takes 0.1s.
Adding 1M nodes and 5M edges takes 2.63s. Clear it takes ~1s.
Querying the edge ids of 5M edges takes 0.53s.
Querying in edges of 1M nodes (5M edges) takes 0.1s.
Geting all the edges of a 1M graph takes 50ms. (The cost is simply the memory copying from the internal vector to an IdArray).
Node subgraph:
- Creating a 10K size subgraph out of a 1M graph (E=5M) takes 0.0147s.
- Creating a 100K size subgraph out of a 1M graph (E=5M) takes 0.4841s.
Creating the adjacency matrix in pytorch of a 100K graph takes 0.0063s. Creating the incidence matrix in pytorch of a 100K graph takes 0.0127s.
Batching 100 graphs each of 50 size takes 0.0051s.

@zheng-da @BarclayII @ylfdq1118 @GaiYu0 Please see which one is the potential problem in current models.

Profiling add nodes: 100
Profiling add nodes: 1000
Profiling add nodes: 10000
Profiling add nodes: 100000
Profiling add nodes: 1000000

=== add nodes ===
+----------+----------+
| 1.0E+02  | 0.0000   |
+----------+----------+
| 1.0E+03  | 0.0001   |
+----------+----------+
| 1.0E+04  | 0.0009   |
+----------+----------+
| 1.0E+05  | 0.0106   |
+----------+----------+
| 1.0E+06  | 0.1023   |
+----------+----------+

Profiling clear nodes: 100
Profiling clear nodes: 1000
Profiling clear nodes: 10000
Profiling clear nodes: 100000
Profiling clear nodes: 1000000

=== clear nodes ===
+----------+----------+
| 1.0E+02  | 0.0000   |
+----------+----------+
| 1.0E+03  | 0.0000   |
+----------+----------+
| 1.0E+04  | 0.0000   |
+----------+----------+
| 1.0E+05  | 0.0005   |
+----------+----------+
| 1.0E+06  | 0.0059   |
+----------+----------+

Profiling add edges: 500
Profiling add edges: 5000
Profiling add edges: 50000
Profiling add edges: 500000
Profiling add edges: 5000000

=== add edges ===
+----------+----------+
| 5.0E+02  | 0.0001   |
+----------+----------+
| 5.0E+03  | 0.0007   |
+----------+----------+
| 5.0E+04  | 0.0084   |
+----------+----------+
| 5.0E+05  | 0.2007   |
+----------+----------+
| 5.0E+06  | 2.6291   |
+----------+----------+

Profiling clear edges: 500
Profiling clear edges: 5000
Profiling clear edges: 50000
Profiling clear edges: 500000
Profiling clear edges: 5000000

=== clear edges ===
+----------+----------+
| 5.0E+02  | 0.0000   |
+----------+----------+
| 5.0E+03  | 0.0002   |
+----------+----------+
| 5.0E+04  | 0.0014   |
+----------+----------+
| 5.0E+05  | 0.0613   |
+----------+----------+
| 5.0E+06  | 0.9252   |
+----------+----------+

Profiling has nodes: 5
Profiling has nodes: 50
Profiling has nodes: 500
Profiling has nodes: 5000
Profiling has nodes: 50000

=== has nodes (5%) ===
+----------+----------+
| 5.0E+00  | 0.0000   |
+----------+----------+
| 5.0E+01  | 0.0000   |
+----------+----------+
| 5.0E+02  | 0.0000   |
+----------+----------+
| 5.0E+03  | 0.0000   |
+----------+----------+
| 5.0E+04  | 0.0001   |
+----------+----------+

Profiling has edges: 25
Profiling has edges: 250
Profiling has edges: 2500
Profiling has edges: 25000
Profiling has edges: 250000

=== has edges (5%) ===
+----------+----------+
| 2.5E+01  | 0.0000   |
+----------+----------+
| 2.5E+02  | 0.0000   |
+----------+----------+
| 2.5E+03  | 0.0001   |
+----------+----------+
| 2.5E+04  | 0.0008   |
+----------+----------+
| 2.5E+05  | 0.0165   |
+----------+----------+

Profiling edge ids: 500
Profiling edge ids: 5000
Profiling edge ids: 50000
Profiling edge ids: 500000
Profiling edge ids: 5000000

=== edge ids ===
+----------+----------+
| 5.0E+02  | 0.0001   |
+----------+----------+
| 5.0E+03  | 0.0002   |
+----------+----------+
| 5.0E+04  | 0.0015   |
+----------+----------+
| 5.0E+05  | 0.0392   |
+----------+----------+
| 5.0E+06  | 0.5345   |
+----------+----------+

Profiling in edges: 100
Profiling in edges: 1000
Profiling in edges: 10000
Profiling in edges: 100000
Profiling in edges: 1000000

=== in edges ===
+----------+----------+
| 1.0E+02  | 0.0001   |
+----------+----------+
| 1.0E+03  | 0.0001   |
+----------+----------+
| 1.0E+04  | 0.0003   |
+----------+----------+
| 1.0E+05  | 0.0061   |
+----------+----------+
| 1.0E+06  | 0.1020   |
+----------+----------+

Profiling out edges: 100
Profiling out edges: 1000
Profiling out edges: 10000
Profiling out edges: 100000
Profiling out edges: 1000000

=== out edges ===
+----------+----------+
| 1.0E+02  | 0.0001   |
+----------+----------+
| 1.0E+03  | 0.0001   |
+----------+----------+
| 1.0E+04  | 0.0003   |
+----------+----------+
| 1.0E+05  | 0.0061   |
+----------+----------+
| 1.0E+06  | 0.0983   |
+----------+----------+

Profiling edges: 100
Profiling edges: 1000
Profiling edges: 10000
Profiling edges: 100000
Profiling edges: 1000000

=== edges ===
+----------+----------+
| 1.0E+02  | 0.0001   |
+----------+----------+
| 1.0E+03  | 0.0001   |
+----------+----------+
| 1.0E+04  | 0.0002   |
+----------+----------+
| 1.0E+05  | 0.0040   |
+----------+----------+
| 1.0E+06  | 0.0534   |
+----------+----------+

Profiling edges: 100
Profiling edges: 1000
Profiling edges: 10000
Profiling edges: 100000
Profiling edges: 1000000

=== edges sorted ===
+----------+----------+
| 1.0E+02  | 0.0001   |
+----------+----------+
| 1.0E+03  | 0.0004   |
+----------+----------+
| 1.0E+04  | 0.0037   |
+----------+----------+
| 1.0E+05  | 0.0432   |
+----------+----------+
| 1.0E+06  | 0.5492   |
+----------+----------+

Profiling in degrees: 100
Profiling in degrees: 1000
Profiling in degrees: 10000
Profiling in degrees: 100000
Profiling in degrees: 1000000

=== in degrees ===
+----------+----------+
| 1.0E+02  | 0.0000   |
+----------+----------+
| 1.0E+03  | 0.0000   |
+----------+----------+
| 1.0E+04  | 0.0001   |
+----------+----------+
| 1.0E+05  | 0.0006   |
+----------+----------+
| 1.0E+06  | 0.0058   |
+----------+----------+

Profiling out degrees: 100
Profiling out degrees: 1000
Profiling out degrees: 10000
Profiling out degrees: 100000
Profiling out degrees: 1000000

=== out degrees ===
+----------+----------+
| 1.0E+02  | 0.0000   |
+----------+----------+
| 1.0E+03  | 0.0000   |
+----------+----------+
| 1.0E+04  | 0.0001   |
+----------+----------+
| 1.0E+05  | 0.0006   |
+----------+----------+
| 1.0E+06  | 0.0043   |
+----------+----------+

Profiling node subgraph: 1
Profiling node subgraph: 10
Profiling node subgraph: 100
Profiling node subgraph: 996
Profiling node subgraph: 9947

=== node subgraph (1%) ===
+----------+----------+
| 100      | 0.0001   |
+----------+----------+
| 1000     | 0.0001   |
+----------+----------+
| 10000    | 0.0002   |
+----------+----------+
| 100000   | 0.0008   |
+----------+----------+
| 1000000  | 0.0147   |
+----------+----------+

Profiling node subgraph: 5
Profiling node subgraph: 48
Profiling node subgraph: 488
Profiling node subgraph: 4898
Profiling node subgraph: 48790

=== node subgraph (5%) ===
+----------+----------+
| 100      | 0.0001   |
+----------+----------+
| 1000     | 0.0001   |
+----------+----------+
| 10000    | 0.0005   |
+----------+----------+
| 100000   | 0.0071   |
+----------+----------+
| 1000000  | 0.1671   |
+----------+----------+

Profiling node subgraph: 10
Profiling node subgraph: 93
Profiling node subgraph: 951
Profiling node subgraph: 9510
Profiling node subgraph: 95170

=== node subgraph (10%) ===
+----------+----------+
| 100      | 0.0001   |
+----------+----------+
| 1000     | 0.0002   |
+----------+----------+
| 10000    | 0.0020   |
+----------+----------+
| 100000   | 0.0231   |
+----------+----------+
| 1000000  | 0.4841   |
+----------+----------+

Profiling node subgraph: 18
Profiling node subgraph: 178
Profiling node subgraph: 1842
Profiling node subgraph: 18193
Profiling node subgraph: 181262

=== node subgraph (20%) ===
+----------+----------+
| 100      | 0.0001   |
+----------+----------+
| 1000     | 0.0004   |
+----------+----------+
| 10000    | 0.0041   |
+----------+----------+
| 100000   | 0.0634   |
+----------+----------+
| 1000000  | 1.1784   |
+----------+----------+

Profiling adjacency matrix: 100
Profiling adjacency matrix: 1000
Profiling adjacency matrix: 10000
Profiling adjacency matrix: 100000
Profiling adjacency matrix: 1000000

=== adjmat ===
+----------+----------+
| 100      | 0.0002   |
+----------+----------+
| 1000     | 0.0002   |
+----------+----------+
| 10000    | 0.0005   |
+----------+----------+
| 100000   | 0.0063   |
+----------+----------+
| 1000000  | 0.1354   |
+----------+----------+

Profiling incidence matrix: 100
Profiling incidence matrix: 1000
Profiling incidence matrix: 10000
Profiling incidence matrix: 100000
Profiling incidence matrix: 1000000

=== incmat ===
+----------+----------+
| 100      | 0.0003   |
+----------+----------+
| 1000     | 0.0003   |
+----------+----------+
| 10000    | 0.0010   |
+----------+----------+
| 100000   | 0.0127   |
+----------+----------+
| 1000000  | 0.3241   |
+----------+----------+

Profiling line graph: 100
Profiling line graph: 1000
Profiling line graph: 10000
Profiling line graph: 100000
Profiling line graph: 1000000

=== line graph (density 1/n) ===
+----------+----------+
| 1.0E+02  | 0.0001   |
+----------+----------+
| 1.0E+03  | 0.0010   |
+----------+----------+
| 1.0E+04  | 0.0074   |
+----------+----------+
| 1.0E+05  | 0.1506   |
+----------+----------+
| 1.0E+06  | 2.8589   |
+----------+----------+

Profiling union 5 graph(n=50):
Profiling union 10 graph(n=50):
Profiling union 50 graph(n=50):
Profiling union 100 graph(n=50):
Profiling union 500 graph(n=50):
Profiling union 5 graph(n=100):
Profiling union 10 graph(n=100):
Profiling union 50 graph(n=100):
Profiling union 100 graph(n=100):
Profiling union 500 graph(n=100):
Profiling union 5 graph(n=500):
Profiling union 10 graph(n=500):
Profiling union 50 graph(n=500):
Profiling union 100 graph(n=500):
Profiling union 500 graph(n=500):
Profiling union 5 graph(n=1000):
Profiling union 10 graph(n=1000):
Profiling union 50 graph(n=1000):
Profiling union 100 graph(n=1000):
Profiling union 500 graph(n=1000):

=== disjoint union graph ===
+----------+----------+
| 50x5     | 0.0003   |
+----------+----------+
| 50x10    | 0.0006   |
+----------+----------+
| 50x50    | 0.0025   |
+----------+----------+
| 50x100   | 0.0051   |
+----------+----------+
| 50x500   | 0.0300   |
+----------+----------+
| 100x5    | 0.0007   |
+----------+----------+
| 100x10   | 0.0011   |
+----------+----------+
| 100x50   | 0.0055   |
+----------+----------+
| 100x100  | 0.0116   |
+----------+----------+
| 100x500  | 0.0604   |
+----------+----------+
| 500x5    | 0.0031   |
+----------+----------+
| 500x10   | 0.0059   |
+----------+----------+
| 500x50   | 0.0351   |
+----------+----------+
| 500x100  | 0.0744   |
+----------+----------+
| 500x500  | 0.4055   |
+----------+----------+
| 1000x5   | 0.0064   |
+----------+----------+
| 1000x10  | 0.0130   |
+----------+----------+
| 1000x50  | 0.0839   |
+----------+----------+
| 1000x100 | 0.1733   |
+----------+----------+
| 1000x500 | 0.9125   |
+----------+----------+

Profiling partition 5 graph(n=50):
Profiling partition 10 graph(n=50):
Profiling partition 50 graph(n=50):
Profiling partition 100 graph(n=50):
Profiling partition 500 graph(n=50):
Profiling partition 5 graph(n=100):
Profiling partition 10 graph(n=100):
Profiling partition 50 graph(n=100):
Profiling partition 100 graph(n=100):
Profiling partition 500 graph(n=100):
Profiling partition 5 graph(n=500):
Profiling partition 10 graph(n=500):
Profiling partition 50 graph(n=500):
Profiling partition 100 graph(n=500):
Profiling partition 500 graph(n=500):
Profiling partition 5 graph(n=1000):
Profiling partition 10 graph(n=1000):
Profiling partition 50 graph(n=1000):
Profiling partition 100 graph(n=1000):
Profiling partition 500 graph(n=1000):

=== disjoint partition graph ===
+----------+----------+
| 50x5     | 0.0003   |
+----------+----------+
| 50x10    | 0.0006   |
+----------+----------+
| 50x50    | 0.0022   |
+----------+----------+
| 50x100   | 0.0044   |
+----------+----------+
| 50x500   | 0.0256   |
+----------+----------+
| 100x5    | 0.0005   |
+----------+----------+
| 100x10   | 0.0009   |
+----------+----------+
| 100x50   | 0.0038   |
+----------+----------+
| 100x100  | 0.0075   |
+----------+----------+
| 100x500  | 0.0423   |
+----------+----------+
| 500x5    | 0.0019   |
+----------+----------+
| 500x10   | 0.0033   |
+----------+----------+
| 500x50   | 0.0194   |
+----------+----------+
| 500x100  | 0.0368   |
+----------+----------+
| 500x500  | 0.2000   |
+----------+----------+
| 1000x5   | 0.0036   |
+----------+----------+
| 1000x10  | 0.0071   |
+----------+----------+
| 1000x50  | 0.0347   |
+----------+----------+
| 1000x100 | 0.0645   |
+----------+----------+
| 1000x500 | 0.3294   |
+----------+----------+

[BUG] Loop detected in BatchedDGLGraph, but no loop found in each DGLGraph

I'm running the latest version of TreeLSTM code, but I met a bug when using the built-in topological traversal.

The problem is that, apply topological_nodes_generator on each graph in the batch does not yield any error, but when it comes to the batched graph, it says loop detected in the given graph.

More specifically, only 995 nodes visited but there are 999 nodes in the batched graph overall. I've checked the edge number and node number of both the batched graph(999, 975) and each graph in the batch, they're both ok.

I guess the problem is with the BatchedDGLGraph, could you guys worked on this module please help locate the bug? To reproduce the bug, you may use the code here and run python train.py --gpu 0.

@BarclayII @ylfdq1118 @jermainewang

Two issues in set_e_repr()

The following code fails:

import networkx as nx
from dgl import DGLGraph
import torch

G = nx.DiGraph()
G.add_nodes_from([0, 1, 2])
G.add_edges_from([(0, 1), (0, 2)])

erepr = torch.randn(2, 20)

DG = DGLGraph(G)
DG.set_e_repr({'features': erepr}, [0, 0], [1, 2])

There are two causes:
(1) Converting a DiGraph to a DGLGraph does not initialize the edge list appropriately. An easy fix would be passing graph_data into _init_state() and set self._edge_list = list(graph_data.edges). Although I'm not sure what the order of edges should be (I don't think it matters).
(2) In set_e_repr():

        if u_is_all:
            # ...
        else:
            eid = self.cached_graph.get_edge_id(u, v)
            if isinstance(h_uv, dict):
                for key, val in h_uv.items():
                    self._edge_frame[key] = F.scatter_row(self._edge_frame[key], eid, val)
            else:
                self._edge_frame[__REPR__] = F.scatter_row(self._edge_frame[__REPR__], eid, h_uv)

Setting a new field with given u and v will refer to a non-existent edge frame.
Which leads to the question: do we want to allow partial set of edge/node frames before initialization? And it is not even "partial" setting in the example above; the user is merely specifying the edge order by themselves.

Problem of Memory Leak

In my current capsule implementation, I have to manually call del self.g in my code, unless it will raise CUDA Out of Memory Error. Not sure what's the problem.
Code link: here
Run by python main.py, link

[TODO] Remove anonymous fields?

In retrospect, I think the anonymous fields unnecessarily complicates the code base.
Allowing anonymous fields requires us additional if/else logic to decide whether to return/update a dictionary or a tensor (and as a result, we have two helper functions _get_repr() and _set_e_repr() solely for dealing with anonymous attributes), which are fundamentally different. Similar logic is also appearing in executors (see scheduler.py). I would suspect that we would have such kind of code appearing all over the place as we add in more features.
Moreover, the semantics of get_n_repr(), set_n_repr(), etc., has subtleties if anonymous field is present. Basically, we are still allowing anonymous fields and explicit fields to co-exist. Thus, if a user sets an anonymous field and then adds in another explicit field, he/she is unable to get the anonymous field again by calling get_n_repr(). (This is related to #21)
So my suggestion would be removing the anonymous fields altogether, and force the users to always pass in dictionaries, which shouldn't be a heavy burden for users but would simplify our code base.

[BUG] Could not set attributes in BatchedDGLGraph

After updated to the latest version, the following code does not work anymore:

import dgl
import torch as th
a = dgl.DGLGraph()
a.add_nodes(4)
b = dgl.DGLGraph()
b.add_nodes(3)
c = dgl.batch([a, b])
c.ndata['h'] = th.ones(7, 1)

Error info:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zy1404/repos/dgl/python/dgl/view.py", line 61, in __setitem__
    self._graph.set_n_repr({key : val}, self._nodes)
  File "/home/zy1404/repos/dgl/python/dgl/graph.py", line 821, in set_n_repr
    self._node_frame[key] = val
  File "/home/zy1404/repos/dgl/python/dgl/frame.py", line 618, in __setitem__
    self.update_column(key, val, inplace=False)
  File "/home/zy1404/repos/dgl/python/dgl/frame.py", line 647, in update_column
    self._frame[name] = col
  File "/home/zy1404/repos/dgl/python/dgl/frame.py", line 291, in __setitem__
    self.update_column(name, data)
  File "/home/zy1404/repos/dgl/python/dgl/frame.py", line 364, in update_column
    (self.num_rows, len(col)))
dgl._ffi.base.DGLError: Expected data to have 0 rows, got 7.

About message and update function

I'm thinking about how to make the API cleaner. Basically, we only have two types of computations:

Computation on edge. It takes the states of the src and dst nodes, and the old states of the edge and returns the new states of the edges. The signature can be summarized as:
(node_states, node_states, edge_states) -> edge_states
Computation on node. It takes the states of the in-coming edges and the old node states to compute the new node states. The signature can be summarized as:
(list[edge_states], node_states) -> node_states

In networkx, all node and edge states are dictionaries (you can use type(g.nodes[node]) to verify this).

A message function can be viewed as a special edge computation as follows:

def edge_func(src_node_states, dst_node_states, edge_states):
  msg = ...  # compute message
  new_edge_states = edge_states
  new_edge_states['msg'] = msg
  return new_edge_states

Because this pattern is so common, we can let user only returns the msg and handle the edge state update by ourselves:

def mfunc(node_states, edge_states):
  #User-defined
  msg = ... # compute message
  return msg
def mfunc_wrapper(src_node_states, dst_node_states, edge_states):
  #System wrapper
  new_edge_states = edge_states
  new_edge_states['msg'] = mfunc(src_node_states, edge_states)
  return new_edge_states

Similarly, we can wrap the update function as node function as well:

def ufunc(msgs, edge_states, node_states):
  #User-defined
  new_node_states = ...  # compute new node state
  return new_node_states
def ufunc_wrapper(edge_states, node_states):
  #System wrapper
  msgs = [es['msg'] for es in edge_states]
  new_node_states = ufunc(msgs, edge_states, node_states)
  return new_node_states

I want to know your opinions about this implementation. Using this idea, we might have two basic APIs: register_edge_func and register_node_func. And have register_message_func and register_update_func be two APIs that simply wrap the above two.

State sharing between graphs

Let us discuss how to enable state sharing between graphs.

This page describes four kinds of semantics for graph duplication.

In the application of CDGNN (and many others), we are primarily interested in the "independent shallow" kind of semantics, which ensures that

g.nodes[node]['x'] is g.copy().nodes[node]['x']

The challenge is that we have to ensure this not only after duplication, but also after updating shared representations in either g or g.copy() (persistent sharing).

We can override copy, add_node_from and add_edge_from of networkx to let users tell us which representations to share. For example, after executing

h = g.copy(share=['x'])

the 'x' attribute will be shared between g and h, even if g or h is updated. We have to consider the implication of adding/deleting node/edge/attribute as well. The following is my suggestion:

Add a node/edge/attribute: in order to be consistent with the "independent shallow" semantics of networkx, this should only affect the graph that user adds node/edge/attribute to.
Delete a node/edge/attribute: I am not sure whether this is necessary. Maybe it suffices to override nx.subgraph as we override copy.

One possible implementation is to maintain a tensor containing shared representations and in-place update the tensor whenever an update of representation occurs. There are at least two problems with this:

In-place update of a tensor breaks autograd.
update_func may change the shape of representation, in which case we must create a new tensor for shared representations. After creating a new tensor, we have to update every graph attribute that refers to the previous tensor. Extra bookkeeping is necessary to know which attributes to update.

However, a similar solution may work. Pseudo-code:

class DGLGraph(nx.DiGraph):
  # ...
  def copy(self, as_view=False, share=[]):
    # For simplicity, assume that only node attributes may be shared.
    # The following line works if we override d.__getitem__, see below.
    for n, d in self.nodes.items():
      self.storage[n].update({x : d[x] for x in share})
    for d in self.nodes.values():
      for x in share:
        d[x] = __SHARED__
    g = super().copy(as_view)
    g.storage = self.storage
    return g

We override d.__getitem__ to lookup self.storage if an attribute equals __SHARED__. d.__setitem__ is override similarly.

[DISCUSSION] Overhead generated by TVM

I uploaded the JTNN profile result here. It's a JSON file generated by pyinstrument and it can be visualized using Python Flame Chart tool.

I found that among 401s of forward propagation including graph construction, 76s are spent on _make_tvm_args call alone, and another 10.6s on todgltensor call which converts an index to a TVMArray. Seems that the TVM backend is causing a non-trivial overhead in my case.

Probably it's time to switch to Cython and see how it goes?

Partial update with changing tensor shapes

Consider following node update function:

# Node update module
class NodeModule(nn.module):
  def __init__(self):
    self.linear = nn.Linear(in_features=100, out_features=10)
  def forward(self, node, accum):
    return nn.relu(self.linear(accum))

The update module transform the node features of length 100 to new features of length 10. Suppose we apply this update function to a subset of nodes, we have following issue:

In this example, only nodes 3,4,5 are updated. As a result, their node features are transformed to vector of length 10 (in the red part). In this case, they cannot be packed together with other node reprs in one tensor.

[BUG] Multi-Head Attention dimension error when calling built-in reduce function

When writing an attention module, a combination of fn.src_mul_edge and fn.sum is required, the first one for computing the attention score and the second one for computing the weighted sum of values(by attention score).

There are three ways to implement this:

def src_mul_edge_udf(edges):
    return {'sum': edges.src['h'] * edges.data['h']}

def sum_udf(nodes):
    return {'h': nodes.mailbox['sum'].sum(1)}

g.update_all(message_func=fn.src_mul_edge('h', 'h', 'sum'), reduce_func=fn.sum('sum', 'h')) # 1
g.update_all(message_func=src_mul_edge_1, reduce_func=fn.sum('sum', 'h')) # 2
g.update_all(message_func=src_mul_edge_1, reduce_func=sum_1) # 3

they all works fine when the node feature and edge feature are both 1d vectors.

However, when it comes to Multi-Head Attention, the node features are 2d tensors rather than a 1d vector.

import dgl
import torch as th
import dgl.function as fn

g = dgl.DGLGraph()
g.add_nodes(10)
for i in range(10):
    g.add_edges(i, range(10))

g.ndata['h'] = th.rand(10, 10, 5) # (N, number_of_heads, dim)
g.edata['h'] = th.rand(100, 10, 1) # (M, number_of_head, 1)

def src_mul_edge_1(edges):
    return {'sum': edges.src['h'] * edges.data['h']}

def sum_1(nodes):
    return {'h': nodes.mailbox['sum'].sum(1)}

#g.update_all(message_func=fn.src_mul_edge('h', 'h', 'sum'), reduce_func=fn.sum('sum', 'h'))
#g.update_all(message_func=src_mul_edge_1, reduce_func=fn.sum('sum', 'h'))
g.update_all(message_func=src_mul_edge_1, reduce_func=sum_1)

Then only the third statement works, which corresponds to using udf for both message_func and reduce_func. When calling the first two statements, we will get error message:

  File "/home/zy1404/repos/dgl/python/dgl/graph.py", line 1324, in update_all
    Runtime.run(prog)
  File "/home/zy1404/repos/dgl/python/dgl/runtime/runtime.py", line 8, in run
    exe.run()
  File "/home/zy1404/repos/dgl/python/dgl/runtime/ir/executor.py", line 252, in run
    C = F.spmm(spA, B)
  File "/home/zy1404/repos/dgl/python/dgl/backend/pytorch/tensor.py", line 109, in spmm
    return th.spmm(x, y)
RuntimeError: addmm: matrices expected, got 3D tensor

I wonder if that is the expected behavior?

Cannot get node attribute when setting by set_n_repr

Minimal Reproduce Code:

import dgl
import torch as th

g = dgl.DGLGraph()

g.add_node(0)
g.add_node(1)

g.set_n_repr(th.ones(2, 1))
print(g.nodes[0])
#{}
g.set_n_repr({'h': th.ones(2, 1)})
print(g.nodes[0])
#{}

However through gcn_batch.py example, we found when updata_all was called with batchable=True, everything works fine. When setting batchable=False in msg function ,it also fail to get the representation.

The link in the readme file is broken

Minor issue, as the title.

[BUG] Consecutive sends without recv

Here's the test I was running on cpp branch.

def test_send_twice():
    g = DGLGraph()
    g.add_nodes(2)
    g.add_edge(0, 1)
    def _message_a(src, edge):
        return {'a': src['a']}
    def _message_b(src, edge):
        return {'b': src['b']}
    def _reduce(node, msgs):
        return {'a': msgs['a'].sum(1), 'b': msgs['b'].sum(1)}

    old_a = th.randn(2, 5)
    old_b = th.randn(2, 5)
    g.set_n_repr({'a': old_a, 'b': old_b})

    g.send(0, 1, _message_a)
    g.send(0, 1, _message_b)
    g.recv([1], _reduce)

    new_repr = g.get_n_repr()

    assert th.allclose(new_repr['a'][1], old_a[0])
    assert th.allclose(new_repr['b'][1], old_b[0])

First, if I do two consecutive sends with different message keys, DGLGraph throws a KeyError: 'b' regardless of whether we are sending on the same node pairs/edges.

The next issue is that we are adding edges into a "message graph" during sends. This does not make a lot of sense on multigraphs if we allow sending between the same node pairs twice without recv, which will leave us with two edges in the message graph. Actually, in the test case above, I stepped into the _batch_recv method and found that the degree is 2 in the degree bucketing procedure.

So do we want to allow (1) consecutive sends without recv, and (2) consecutive sends between the same node pairs without recv?

If we were to allow both, I guess ultimately we would want to have the message graph represented as an edge subgraph, or somehow maintain a set of "already-added-edges" in the message graph.

Thoughts?

[BUG] 'TVMContext' object has no attribute 'type'

When running tree_lstm, I encountered a bug with the new tensor.py.

Description:

Traceback (most recent call last):
  File "train.py", line 123, in <module>
    main(args)
  File "train.py", line 78, in main
    giter = list(tensor_topo_traverse(graph, False, args))
  File "train.py", line 21, in tensor_topo_traverse
    adjmat = g._graph.adjacency_matrix().get(nd.cpu())
  File "/home/zy1404/repos/dgl/python/dgl/utils.py", line 284, in get
    self._ctx_dict[ctx] = self._generator(ctx)
  File "/home/zy1404/repos/dgl/python/dgl/graph_index.py", line 504, in <lambda>
    self._cache['adj'] = utils.CtxCachedObject(lambda ctx: F.copy_to(mat, ctx))
  File "/home/zy1404/repos/dgl/python/dgl/backend/pytorch/tensor.py", line 50, in copy_to
    if ctx.type == 'cpu':
AttributeError: 'TVMContext' object has no attribute 'type'

It seems in runtime_ctypes.py, TVMContext does not maintain type field, and the device_name method is not supported since _api_internel.py is empty.

Subgraph sampling

as proposed in this design, subgraph sampling has three steps:

compute the sampling probability. There are many different ways of computing sampling probability. The output of this step is the sampling probability associated with each vertex. That is, it's a 1D NDArray.

def compute_prob(g, method) => 1D NDArray

sample vertices based on the probability. The step traverses local neighborhoods from the seed vertices and sample neighbors based on the sampling probability. It returns one 1D array contains sampled vertices.

def sample_vertices(g, prob, seed_vertices, num_hops, traverse_method, max_samples) => 1D NDArray

construct an induced subgraph with sampled vertices. There is an efficient implementation in MXNet for CSRArrays. https://github.com/zheng-da/incubator-mxnet/blob/dgl_subgraph/src/operator/dgl_graph.cc#L192

def node_subgraph(g, vertices) => subgraph

Each step can be written as a module to make the design and implementation of each step simpler. A modular design also gives users the freedom of choosing a way to compute sampling probability and sampling strategies. Each step can be compute-intensive. Thus, it won't cause any performance overhead.

Running TODOs

Hi all,

Let's use this thread to book-keep all the major TODOs before our first major milestone.

High priority
Tasks that have a major contributor to be in charge of.

Low priority
Tasks that anyone can contribute when they don't have high priority tasks at the moment.

Applications

[RFC] C++ graph interface

The draft interface is here: https://github.com/jermainewang/dgl/blob/cpp/include/dgl/graph.h
It covers the requirements from networkx and current DGL's cached graph.

Currently, we have several assumptions on the graph:

Vertices/edges are indexed by integers starting from zero.
Deletion of vertices/edges is not allowed.
Attributes are not stored in the graph. Instread, they are stored in Frames.

The idea is to lower some graph manipulation logic to C++ side. For example, it should speedup adding multiple edges at once (i.e, DGLGraph::AddEdges), finding neighbors (i.e, DGLGraph::InEdges or DGLGraph::OutEdges) and subgraph creation (i.e, DGLGraph::Subgraph).

The graph is represented by an adjacency list structure.

Please leave your comments.

Sending and receiving multiple fields (batched)

There are several instances where messages and node reductions are over more than node['REPR'], for example:

FastGCN has a node weight vector which where q_i is applied on message v_i->v_j
Control variate sampling has a 'previous' representation on node reduction, e.g. sum(h[i], h_prev[i])

Presumably the best way to handle this is to pass in optional fields to the send_and_recv and related functions which batch a dictionary of tensors and returns the node['REPR'] batching on None.

Does this seem like a good strategy for this? I need to address this for 1, 2.

Forbid multiple edges when multigraph is False in DGLGraph

If I understand correctly, when a dgl.DGLGraph object is created, by default it is not a multigraph. However when multiple edges are added with the same end nodes and direction, no warnings or errors are raised and it gets multiple edges created as if the graph object is a multigraph.

[DISCUSSION] Pros and cons of the new signature of MessageFunc

The current interface of MessageFunc is

def MessageFunc(src, edge)

while its previous interface was

def MessageFunc(src, dst, edge)

The main advantage of the new interface is

many algorithms don't need the embedding of the destination vertices to generate messages. If we always pass the embedding of the destination vertices, we needs to fetch them from the Frame, which is unnecessary overhead.

The disadvantage of the new interface is

the expressiveness. There exist some algorithms (e.g., GAT, Genipath, Interaction Networks) that require both source and destination vertices to compute messages. With the new interface, the option of implementing these algorithms is to move all logics to the reduce function or use the edge update function.

The problem of moving all logics to the reduce function is that we actually prefer to keep the reduce function simple as discussed last time. For example, if the reduce function is simple, we can easily convert it into per-edge reduce function internally (this can be achieved in the Gluon symbol mode) to achieve better performance and reduce memory consumption.

The problem of using the edge update function is the memory complexity. If we need to generate edge data explicitly, which results in the memory complexity of O(|E|), while the algorithms can achieve O(|V|). Reducing memory complexity is very important in terms of both performance and scalability.

Another thing is that the new interface of MessageFunc and the one of EdgeUpdateFunc are now inconsistent for an unobvious reason. This might confuse users.

[DISCUSSION] Subgraph semantics

As mentioned in our last meeting, there are a couple of things to think about when implementing subgraph.

Does a subgraph support graph mutation? Can such mutation be seen by its parent graph?
We need to support both shared and non-shared mode on node/edge frames.
What if a subgraph is created from another subgraph? Who is its parent?

@ylfdq1118 and I have a couple of discussions last week. Here, we write down our thoughts.

For the python side subgraph class:

It needs to maintain a reference to its parent DGLGraph.
It needs to maintain node/edge mappings to its parent DGLGraph.

For graph index:

Currently, a subgraph is read-only on graph structure. So mutation is not allowed.
A new graph index will be created when a subgraph is created. There is no need to design fancy view data structure like networkx.

For node/edge frame:

Subgraph will have a copy_from_parent API that takes arguments of which node/edge features need to be copied. The API is used to copy node/edge features from its parent graph. [Note: name is tentative]
For "shared" mode, the copy_from_parent function will do nothing because the subgraph and parent graph already share the underlying frames.
For "non-shared" mode, the user needs to call copy_from_parent to explicitly copy node/edge features from parent graph.
- If the user tries to get node/edge features before copy_from_parent, s/he will get nothing.
- If the subgraph already has its own node/edge features, the copy_from_parent will override them.
- Any update on the subgraph's node/edge features will not be seen by the parent graph. And thus, the memory consumption is of the order of the subgraph size.
- To write the subgraph's node/edge features back to parent graph. There are two options:
  - The subgraph can have another write_to_parent API to write node/edge features back.
  - A functional API merge (currently here), to merge multiple subgraphs back to one parent. This should be more efficient than calling write_to_parent one by one.

For creating a subgraph from another subgraph:

Suppose we have a subgraph B created from another subgraph A, and subgraph A's parent is P.
If A is created in "shared" mode, then B can be viewed as created directly from P. (no need to maintain the tree hierarchy).
If A is created in "non-shared" mode, then B's parent is A. This means copy_from_parent and write_to_parent will only look at the node/edge frame of A.
- To ease the lookup of multiple hierarchy, we could have a recursive flag in the two calls to indicate that such read/write will recursively applied in the subgraph hierarchy.

@zheng-da please have a look.

How to support graph batch-norm?

With current API, we can implement batch-norm by adding a shadow node and pulling/pushing globally. However, this may not be very user-friendly. An ideal syntax is

batch_norm(node_reprs['x'])

set_e_repr(): behavior description when u and v equals ALL

It seems that for now set_e_repr() directly assigns the batched edge tensor storage when u == v == ALL. But how do I construct such a tensor storage? In particular, when u == v == ALL, which edge does each row of the tensor storage correspond to?

To mutable or not to mutable?

Here is a question we found when writing models using DGL. Let's first look at a following GCN example:

import torch.nn as nn
from dgl import DGLGraph
# Node update module
class NodeModule(nn.module):
  def __init__(self):
    self.linear = nn.Linear(in_features=100, out_features=10)
  def forward(self, node, accum):
    return nn.relu(self.linear(accum))
# GCN module
class GCN(nn.Module):
  def __init__(self):
    self.updmod = NodeModule()
  def forward(self, g):
    g.update_all(message_func='from_src',
                 reduce_func='sum',
                 update_func=self.updmod,
                 batchable=True)
    return g
# main loop
nx_graph, features, labels = load_data('cora')
gcn = GCN()
g = DGLGraph(nx_graph)
g.set_n_repr(features)
for epoch in range(MAX_EPOCH):
  gg = gcn(g)
  logits = gg.get_n_repr()
  loss = NLLLoss(logits, labels)
  loss.backward()
  # ...

Can you spot the bug in this code?

The bug is that we need to reset the input features to the graph at the beginning of each iteration:

# ...
g = DGLGraph(nx_graph)
for epoch in range(MAX_EPOCH):
  g.set_n_repr(features)  # Need to reset the features every iter!!
  gg = gcn(g)
# ...

This is a very subtle mistake, but we feel that this worth our attention. The reason of such mistake is that we are very used to what autograd DNN frameworks provide us -- immutability. In Pytorch, all tensor operations are immutable (i.e, they always return new tensors). As a result, following Pytorch code is totally fine:

X = # init features
for epoch in range(MAX_EPOCH):
  y = MyModel(X)
  loss = MyLossFunc(y)
  loss.backward()

As a contrast, because DGL is derived from networkx, our APIs are mutable. For example, the update_all does not return a new graph. It in fact changes the node representations internally. Because of this, even if user write gg = gcn(g), the gg and g are pointing to the same DGLGraph and the node reprs have been updated after update_all.

This is an inherent conflict of networkx and Pytorch/TF/MX. I want to know about the opinions from model developers (@GaiYu0 @BarclayII @ivanbrugere @zzhang-cn ). What do you guys think? Do you find this bug very subtle or actually a cunning pitfall? Do you like the current mutable graph or prefer immutable objects?

Here, I also want to share more about what if we want to support immutable graph object. What are the challenge and solution? To make DGLGraph immutable, we need to handle two things:

Transformation on node/edge features.
Transformation on graph structures.

For transformations on node/edge features (e.g. sendto, recv, update_all, ...), we can return a new graph containing different data as follows. In the following example, note that the update_all function returns a new graph g2. Initially the graph (i.e, g1) has three node features a1, a2, a3. The node update function reads all node features but only update a1 attribute. As a result, the new g2 node storage (called node frame) reuses the other two feature columns but use the newly generated a1 column. Such change to the system should have little overhead.

For transformations on graph structures, this is a little tricky. For users that are used to networkx, they actually expect mutable behaviors as follows:

import networkx as nx
g = nx.path_graph(3)
print(g.edges())  # (0, 1), (1, 2)
h = g
h.add_edge(0, 2)
print(g.edges())  # (0, 1), (0, 2), (1, 2)

We can change it to immutable by using Copy-On-Write. The question is shall we?

Anonymous representations

Currently, if an update function returns an object other than a dict, the object is supposed to be the new (anonymous) node/edge representation (named __N_REPR__ or __E_REPR__ implicitly). However, a node/edge with an anonymous representation may have explicitly named representations as well (those returned in a dict by an update function).

When we pass node/edge representations to user-defined functions, how to tell whether user demands anonymous representations or explicitly named representations? Maybe we need to enforce the rule that once a node/edge has an anonymous representation, it cannot have any explicitly named representations, and vice versa.

API rename

Currently, our API names are confusing (such as update_edge and update_by_edge). I proposed to have following changes:

sendto -> send to match recv
update_by_edge -> send_and_recv
update_to -> pull
update_from -> push

Two new APIs to easily apply changes to the node/edges:

apply_nodes: Equivalent to recv API without messages.
apply_edges: Equivalent to update_edge API without src/dst node features.

TODOs for C++ GraphIndex

Implementations
GraphIndex related:

Graph operators: Currently, only DisjointUnion has been implemented.
~~- [ ] Graph traversal: A topological traversal implementation in C++ that is used by tree-lstm example.~~ (Move to next PR for improving TreeLSTM)

BatchedGraph related batch.py:

Basic implementation: batch, unbatch.
~~- [ ] Need to finish other APIs: split, slice/update, readout.~~ (Move to next PR for improving DGMG)

Subgraph related: (@zheng-da )
Discussion thread #67

For graph index, currently only node subgraph has been implemented in C++ side.
~~On python side, we need to rewrite the DGLSubgraph class to support: shared-write on node/edge features.~~(move to next PR)

Necessary optimizations

Zero-copy between dgl.ndarray and torch.tensor.
Fused implementation of send_and_recv.
Lowering degree bucketing to C++.
Benchmark the throughput of GraphIndex

Project managements

New install script to also compile the cpp files.
New jenkins script for CI.
I suggest we setup dockerfile for our project to ease the above jenkins setup. Could borrow from TVM again.
Setup lint check for C++ files. Borrow from dmlc-core.

Others

Remove batchable=False mode and related tests/codes.
Remove anonymous representation.
Double check all the models. Benchmark their speed and accuracy under the new change. (TBD: LineGraph) (DGMG and JTNN will be included in the next PR)

The goal is to make sure that once the cpp branch is merged, all the tests and examples are still working. Therefore, the BatchedGraph and Subgraph utilities can be incomplete. They only need to serve the current examples.

No longer allowed to set new features for dgl.BatchedDGLGraph

In previous versions of dgl, it is allowed to add new field for a dgl.BatchedDGLGraph object. This seems no longer allowed now, which is a bit troublesome. For example, when performing batched readout, one may need to create some auxiliary fields.

You can use the code below for testing:

import dgl
import torch

g1 = dgl.DGLGraph()
g1.add_nodes(2)
g2 = dgl.DGLGraph()
g2.add_nodes(1)
bg = dgl.batch([g1, g2])
bg.ndata['h'] = torch.randn(3, 1)

[Doc] add more documents

define the signature of UDFs.
documents for NodeBatch and EdgeBatch. These are classes exposed to users.

[USABILITY] Error in send_and_recv on partial nodes

It seems that send_and_recv didn't properly process the index of partial nodes.

import dgl
import torch as th

g = dgl.DGLGraph()
g.add_nodes(100)
g.add_edges([20, 21, 22, 23, 24], [25, 26, 27, 28, 29])
g.set_e_repr({'e': th.ones(5)})
g.set_n_repr({'h': th.ones(100)})


def msg(src, edge):
    return {'h': src['h']}


def reduce(node, msg):
    return {'sum': msg['h']}


def update(node):
    return {'h': node['sum']}

## Works Fine
g.update_all(msg, reduce, update)

## Error
g.send_and_recv([20, 21, 22, 23, 24], [25, 26, 27, 28, 29], msg, reduce, update)

[BUG?] Messages are not passed and reduced in parallel with send_and_recv

Old

In the older version of API, with the following block of code, the messages are passed and reduced in parallel:

def gcn_msg(src, edge):
    return {'m': src['h']}

def gcn_reduce(node, msgs):
    return {'h': torch.sum(msgs['m'], 1)}

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 0], [1, 0, 2])
g.set_n_repr({'h': torch.tensor([[1.], [2.], [3.]])})
g.send_and_recv([0, 1, 0], [1, 0, 2], gcn_msg, gcn_reduce)

After send_and_recv,

g.get_n_repr()['h']

yields

tensor([[2.],
        [1.],
        [1.]])

New

With current APIs, a different behavior is observed:

def gcn_msg(edges):
    return {'m' : edges.src['h']}

def gcn_reduce(nodes):
    return {'h' : torch.sum(nodes.mailbox['m'], 1)}

g = dgl.DGLGraph()
g.add_nodes(3)
g.add_edges([0, 1, 0], [1, 0, 2])
g.ndata['h'] = torch.tensor([[1.], [2.], [3.]])
g.send_and_recv([[0, 1, 0], [1, 0, 2]], gcn_msg, gcn_reduce)

Now after send_and_recv,

g.ndata['h']

yields

tensor([[2.],
        [1.],
        [3.]])

I tried replacing send_and_recv above with update_all and such a discrepancy is not observed. Any idea why this is the case?

Tree-LSTM and msg/reduce/update functions

We met a problem when implementing tree-lstm with the new reduce function. The Child-Sum Tree-LSTM is described as follows:

@GaiYu0 has implemented this when there are only msg and update functions (see codes here). However, with the new reduce function, it is tricky. Remember that the msg/reduce/update function signatures are defined as follows:

def message_func(src, dst, edge):
   # return the message to be sent along the edge.
   ...

def reduce_func(msgs):
   # return the reduced msg
   ...

def update_func(node, msg_reduced):
   # return the new node state
   ...

Let's look at the equations. Apparently, equation (2) is a summation that should be implemented in the reduce function. Equations (3), (5), (6) rely on the node state of the receiver node (x_j) and the reduced hidden state (h_tilda), so they should be put in the update function. Equations (4) and (7) are problematic. Equation (4) needs to compute a "pair-wise" transformation using the node state of the receiver and sender nodes, and the result is applied as the weight in the summation in equation (7). This computation paradigm is similar to the GAT model but without the softmax. As a result, equation (4) and (7) should be put in the reduce function but the reduce function has no access to the receiver state (x_j). This means the user needs to also include the dst node state (x_j) in the message. @ylfdq1118 also mentioned this problem in the GAT model before. As a result, we propose to change the msg/reduce/update signatures to followings:

def message_func(src, edge):
   # return the message to be sent along the edge.
   ...

def update_edge_func(src, dst, edge):
   # return the new edge state
   ...

def reduce_func(node, msgs):
   # return the reduced msg
   ...

def update_func(node, msg_reduced):
   # return the new node state
   ...

Change#1: The message function no longer has access to the dst node state. By contrast, the update edge function still has access to both end points.
Change#2: The reduce function now has access to the receiver node state.

This seems to be a step back as the new reduce function signature is equal to the old update function signature before we decided to add reduce. Reduce function is a very powerful tool as user can fuse the message and update function into a single reduce function now:

def message_func(src, edge):
  # Dummy message function that simply returns all the inputs.
  return {'src' : src, 'edge': edge}
def reduce_func(node, msgs):
  # The reduce function now has access to both the dst node and all the src nodes.
  ...
def update_func(node, msg_reduced):
  # Dummy update function that simply returns the reduced msg.
  return msg_reduced

We think this change is necessary as message/reduce/update functions have different batching approach: message function is batched by all the edges; reduce function is batched by nodes with the same degree; update function is batched by all the nodes. Since batching by all the edges/nodes are more efficient by the constrained batching of the reduce function, it is user's responsibility to make the reduce function as light as possible while putting more logic in the message/update functions. The best case is when reduce function is the built-in reducer (e.g. "sum"); the worst case is when all the logic is fused in the reduce function. For example, a good implementation of tree-lstm using the new signature is as follows. You can see how the equations are computed in different functions.

def message_func(src, edge):
  return {'h' : src['h'], 'c' : src['c']}
def reduce_func(node, msgs):
  # equation (2)
  h_tild = th.sum(msgs['h'], 1)
  # equation (4)
  wx = th.mm(v['x'], W_f).unsqueeze(1)
  uh = th.mm(msgs['h'], U_f)
  f = th.sigmoid(wx + uh + b_f)
  # equation (7) second term
  c_tild = th.sum(f * msgs['c'], 1)
  return {'h_tild' : h_tild, 'c_tild' : c_tild}
def update_func(v, accum):
  # equation (3), (5), (6)
  iou = th.mm(v['x'], W_iou) + th.mm(accum['h_tild'], self.U_iou) + self.b_iou
  i, o, u = th.chunk(iou, 3, 1)
  i, o, u = th.sigmoid(i), th.sigmoid(o), th.tanh(u)
  # equation (7)
  c = i * u + accum['c_tild']
  # equation (8)
  h = o * th.tanh(c)
  return {'h' : h, 'c' : c}

I may have just brought up the problem that has been found by @ylfdq1118 when implementing the GAT model.

[DISCUSSION] [RFC] Multi-edge support

Had a discussion with @ylfdq1118 today. Here is the conclusion we have reached:

Storage

In our current codebase, we store the graph as adjacency lists (node to node). We propose to directly change the storage in C++ backend to incidence lists, more specifically, an incidence list for inbound edges and another for outbound ones. This implies that we are no longer handling simple graphs and multi-graphs separately like in NetworkX.
This will affect the implementation of the following backend methods:

pred and succ: Essentially, we will have to find the inbound/outbound edges of the given vertices first, then look up the source/target vertex of those edges (we can also have a predecessor/successor list in parallel to the inbound/outbound edges, but this is essentially duplicating the edge list and I'm not sure whether we would like it). These methods should support a boolean argument unique to dictate whether we expect a unique list of vertices (useful for graph traversal) or not (useful for deciding source/target nodes for message and apply_edge functions). EDIT4: this should be fine.
~~vertex_subgraph: Similar to pred/succ, we need to do two-step queries.~~ EDIT4: this should also be fine.
get_eid~~, in_edges, out_edges~~: Now it would return a list of edge IDs instead of one.
get_eids, batch_in_edges, batch_out_edges: We haven't go through the semantics of this method yet. I would suggest returning a list of lists if possible, following the same practice as igraph.Graph.neighborhood(). Returning a padded array (along with the lengths) can also do the job if we wish to enable padding (which is a separate functionality we wish to consider I assume). EDIT3: this should be fine since we are returning (source, target, edge ID) tuples.

EDIT4: after all, the changes in C++ interface only involves changing the returning type of get_eid to IdArray and batch_get_eid to EdgeArray.

User Interface

Right now, the following user interfaces assumes that no multi-edge exist:

set_e_repr
get_e_repr
apply_edges
send (and consequently send_and_recv, pull, push, and update_all)
- Note that pull, push and update_all only assumes that in the implementation due to usage of send; the interface itself is fine.
update_edge

Since the interfaces expects to take in and return tensors, we think that having multi-edges will complicate the semantics significantly for get_e_repr and set_e_repr in particular: the users often have to track of how many edges are between two given vertices (e.g. Given a multi-graph ((0, 1), (0, 1), (1, 2)), set_edge_repr([0, 1], [1, 2]) will actually require a tensor with 3 rows, quite unnatural).

EDIT2: Our suggestion is to

For get_e_repr and set_e_repr, simply raise an error if multi-edge exists between any of the given source/target pairs. We can have nicer semantics later.
For other methods, simply send/apply on all edges in between.
Have a counterpart working on edge IDs for apply_edges, send, send_and_recv, and update_edge (e.g. sendon) as well (set_e_repr and get_e_repr already have their edge ID counterparts), to allow sending/applying on individual edges.

We concluded that switching to incidence matrices in the backend should not break compatibility on the current user interfaces.

Side note: Filtering vertices/edges

Since RGCN already involves edges with different "types", which are represented by integers, we can expect that later on we will have models requiring to perform graph operations on a "type" of node/edge at a time. To be even more general, the users can do stuff on a subset of nodes/edges satisfying a certain predicate. We think it will be nice to have a filtering function which returns a subset of given nodes/edges that satisfies a given predicate. For example:

# returns all edges which has the field t equal to 2
eids = g.filter_edges(ALL, lambda e: e['t'] == 2)
# returns the nodes within @nids that has a maximum component greater than 1
# equivalent to something like nids[g.get_n_repr()['h'][nids].max(1) > 1]
nids = g.filter_nodes(nids, lambda n: n['h'].max(1) > 1)

Of course, the signature does not have to be exactly like this.

gSPMV Optimization

One thing we have noticed is that, if we use incidence matrices in our storage, then builtin reduce functions can be always represented by a gSPMV between the inbound incidence matrix and the message tensors.

However, if the message function is also a builtin, we don't want to actually broadcast the messages onto edges since it will potentially take up much more space. So

If both message and reduce functions are builtins, we still stick to gSPMV on adjacency matrix if possible.
Otherwise, if the reduce function is a builtin, we can do gSPMV on inbound incidence matrix.

EDIT: gSPMV on adjacency matrix with multi-edges is more subtle. Since multi-edges are analogous to duplicate entries in COO sparse matrix, depending on the message and reduce function form the actual implementation can be different:

For message function copy_src/src_mul_edge and reduce function sum, on PyTorch we can still stick with existing SPMM multiplications by directly constructing the (weighted) adjacency matrix from edge lists; PyTorch will automatically coalesce duplicated entries by summing them up. I don't know if this is the case for MXNet and Tensorflow.
For other cases, it is very likely that we need to write our own operators to coalesce the duplicate entries. For instance, if the reduce function is max, the coalescing operation is no longer a sum but something like max pooling, so direct sparse tensor construction in PyTorch no longer works.

RGCN example

Under the proposed changes, we expect that the RGCN code can be greatly simplified, and our behavior could match the official implementations. Namely, the message function is simply a batched dot product between the source tensor and the weight matrices corresponding to the edge types, while the reduce function is just a sum (@ylfdq1118 correct me if I'm wrong):

def msg(src, edges, W):
    # W is a 3D Tensor (n_types, in_features, out_features) inside a closure or a parameter in a PyTorch module
    return src[:, None] @ W[edges['type']]

reducer = builtin.sum

Comments and questions are welcome.

[DISCUSSION] JTVAE profiling result on GPU is up

See https://docs.google.com/spreadsheets/d/11ZttVv7J8tq7d5Qq9MuCxrY-5P3zT5BP0P12oLyamts/edit?usp=sharing

TLDR; DGL should work a lot faster after we switch to C++ graph backend, but it may still have trouble to catch up the official implementation since the latter has a lot less overhead, including (1) not explicitly constructing line graph for tree message passing, and (2) always uses padding and dense sum instead of sparse sum for reduction.

Some API corner cases

Hi all,

I want to use this issue as a reminder of the corner cases of the API semantics.
(1) Is message phase a prerequisite of the node update phase?. Our current implementation will check whether the predecessors of the node have sent messages and raise error if no message is found. This means there must be some sendto call before recvfrom. One argument for this is that if the node update does not require msgs, it can be computed outside of DGLGraph by readouting the node values and performing the transformation. I want to know your opinions on this. Do you think we should allow node update without messages?
(2) Should messages be transient tensors? Our current implementation saves messages with edges and never delete them. This increases total memory consumption. Also, what is the right semantics if recvfrom is called repeatedly?
(3) What is the msg if the node has no predecessors?
The following is the code example:

def mfunc(u, v, e_uv):
  return u['h']
def ufunc(u, msg):
  return u['h'] + msg
g = DGLGraph()
g.add_edge(0, 1)  # The graph has one edge 0->1
g.register_msg_func(mfunc)
g.register_update_func(ufunc)
g.register_reduce_func('sum')

# Case 1: what will happen?
g.recvfrom(1, [0])

# Case 2: what will happen?
g.sendto(0, 1)
g.recvfrom(1, [0])
g.recvfrom(1, [0])

# Case 3: what will happen?
g.recvfrom(0)

[BUG] update_edge API broken

import dgl
import torch as th
g = dgl.DGLGraph()
g.add_nodes(10)
g.add_edges([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])
g.set_e_repr({'e': th.ones(5)})
g.update_edge(edge_func=lambda u, v, edge: edge['e'])

Feature request: auto-batching preset tensor storages during construction from networkx

Just for booking this issue raised in discussion.

Reference: https://github.com/BarclayII/icml18-jtnn/blob/master/jtnn/mpn.py#L87-L161

In particular, for initializing a DGLGraph where each node and edge gets its own feature vector, I have an "ideal" snippet which looks roughly like this: basically adding nodes and edges along with their feature vectors one by one to a networkx.Graph, then converting this thing to a DGLGraph

def mol2dgl_ideal(smiles_batch):
    '''
    Get a batched DGLGraph object from the given batch of SMILES
    '''
    graphs = []
    for smiles in smiles_batch:
        mol = get_mol(smiles)
        graph = Graph()
        for atom in mol.GetAtoms():
            graph.add_node(atom.GetIdx(), features=atom_features(atom))
        for bond in mol.GetBonds():
            graph.add_edge(
                    bond.GetBeginAtom().GetIdx(),
                    bond.GetEndAtom().GetIdx(),
                    features=bond_features(bond),
                    )
        graphs.append(graph)

    graphs = nx.disjoint_union_all(graphs)
    return DGLGraph(graphs)

The problem is that these feature vectors are not batched and therefore we cannot do batched computation within our current implementation. The workaround is to keep track of the node and edge features by ourselves, and batch-set them after construction:

def mol2dgl(smiles_batch):
    graphs = []
    atom_feature_list = []
    bond_feature_list = []
    bond_begin_nodes = []
    bond_end_nodes = []
    n_nodes = 0
    for smiles in smiles_batch:
        mol = get_mol(smiles)
        graph = Graph()
        for atom in mol.GetAtoms():
            graph.add_node(atom.GetIdx())
            atom_feature_list.append(atom_features(atom))
        for bond in mol.GetBonds():
            begin_idx = bond.GetBeginAtom().GetIdx()
            end_idx = bond.GetEndAtom().GetIdx()
            features = bond_features(bond)
            graph.add_edge(begin_idx, end_idx)
            bond_feature_list.append(features)
            bond_begin_nodes.append(begin_idx + n_nodes)
            bond_end_nodes.append(end_idx + n_nodes)
            # set up the reverse direction
            bond_feature_list.append(features)
            bond_begin_nodes.append(end_idx + n_nodes)
            bond_end_nodes.append(begin_idx + n_nodes)

        graphs.append(graph)

    graphs = nx.disjoint_union_all(graphs)
    dgl_graph = DGLGraph(graphs)
    dgl_graph.set_n_repr({'features': torch.stack(atom_feature_list)})
    dgl_graph.set_e_repr(
            {'features': torch.stack(bond_feature_list)},
            bond_begin_nodes,
            bond_end_nodes,
            )

    return dgl_graph

A mechanism to automatically batch-store the tensors during construction, or something alike, would be preferred. (not among the highest priority though)

[GraphIndex] Graph traversal

BFS/Topological traversal

Process nodes/edges belonging to the same layer (BFS) / frontier (topological traversal) in the same batch.

bfs(G[, source, depth_limit]) and topo(G) returns a list of Tensor of layer/frontier nodes
bfs_edge(G[, source, depth_limit]) and topo_edge(G) returns a list of Tensor of layer/frontier edges

where source is a collection of nodes that specifies the initial BFS layer and depth_limit specifies the maximum BFS depth.

DFS

Process nodes/edges in the same strongly connected component in sequential DFS order.
Nodes/edges belonging to different paths in the same strongly connected component cannot be processed in the same batch because of possible dependency.
However, the traversal of different strongly connected components are processed in the same batch because there is no dependency.
We only consider strongly connected components that are graphs batched in DGLBatchedGraph.

NetworkX provides the following API's for DFS:

dfs_edges(G[, source, depth_limit])
dfs_tree(G[, source, depth_limit])
dfs_predecessors(G[, source, depth_limit])
dfs_successors(G[, source, depth_limit])
dfs_preorder_nodes(G[, source, depth_limit])
dfs_postorder_nodes(G[, source, depth_limit])
dfs_labeled_edges(G[, source, depth_limit]) (used in JT-VAE example)

We can extend these API's such that source can be a container of nodes (belonging to different graphs) and a call to them yields a list of Tensor of nodes instead of a list of nodes.

About the semantic of propagation

I met a problem when implementing the propagate method. If there is a graph like this: 0->1, 0->2, 1->2. If we call list(nx.bfs_edges(g, 0)), it gives edges [(0, 1), (0, 2)]. The edge (1, 2) is ignored. I wonder what is the right semantic for the bfs propagation in GNN? What are the messages to be computed?

Bug in _batch_reduce() with dicts

In graph.py:647

all_reduced_msgs = {key : F.pack(val) for key, val in reduced_msgs.items()}

reduced_msgs is always a list, so it throws an error if the reducer returns a dict.
Fix:

keys = reduced_msgs[0].keys()
all_reduced_msgs = {
        key : F.pack([msg[key] for msg in reduced_msgs]) 
        for key in keys}

Loopy belief propagation

The loopy belief propagation updates messages on a directed edge e = (u, v) by looking at the messages on all edges pointing to u, except the one that comes from v. An example from Junction-Tree VAE looks like this:

where m's are messages, x are input features and W are weights (the latter two should be fine).
The readout would be something like

and the output vector being the average of all h_u's.

I can't think of a direct way to do loopy belief propagation like this within the current framework.

A possible workaround for this particular phase I can think of is to divide the process into two stages, maintaining a field called msg_sum, containing the sum of incoming message for each node :

Send phase: compute m_uv like this:

As you can see, this require us to query the message on the reverse edge, which I wonder if is possible. AFAIK the message function does not allow you to look at edges other than the ones to be updated.
Recv phase: compute s_u as the sum of all incoming m_uv.
[EDIT] Final readout phase: we can directly use s_u to compute the h's.

Any ideas?

Is dependency on nltk intended?

`add_edges_from` in batch.py does not preserve edge order

In batch.py:28, it seems that the new batched graph adds its edges using add_edges_from, which does not preserve the order of the supplied edge list.

This will essentially fail the following test:

g1 = dgl.DGLGraph()
g1.add_nodes_from([0,1,2, 3, 4, 5])
g1.add_edges_from([(4, 5), (4, 3), (2, 3), (2, 1), (0, 1)])
g1.edge_list
e1 = torch.randn(5, 10)
g1.set_e_repr(e1)
g2 = dgl.DGLGraph()
g2.add_nodes_from([0, 1, 2, 3, 4, 5])
g2.add_edges_from([(0, 1), (1, 2), (2, 3), (5, 4), (4, 3), (5, 0)])
e2 = torch.randn(6, 10)
g2.set_e_repr(e2)
g = dgl.batch([g1, g2])
r1 = g.get_e_repr()[g.get_edge_id(4, 5)]
r2 = g1.get_e_repr()[g1.get_edge_id(4, 5)]
print(r1)
print(r2)
assert torch.equal(r1, r2)

[BUG] Tree-LSTM performs terrible after using new scheduler & executor

Tree LSTM behaves differently before and after #140 .

As a reference:
DGL version: commit 3e8b63e
The result of epoch 0:

Epoch 00000 training time 23.9037s
Epoch 00000 | Dev Acc 0.7920 | Root Acc 0.4278
Epoch 00000 | Test Acc 0.7897 | Root Acc 0.4380

DGL version: commit deb653f
The result of epoch 0:

Epoch 00000 training time 23.2815s
Epoch 00000 | Dev Acc 0.6856 | Root Acc 0.2552
Epoch 00000 | Test Acc 0.6859 | Root Acc 0.2271

In fact, as a 5-class classification task, the baseline is supposed to be close to 1/5, this being said the model is not being trained correctly.

Still have no idea about it.

[DISCUSSION] Decouple networkx and deprecate batchable=False mode?

With the upcoming new C++ graph library support, it is time to think about decoupling DGLGraph from networkx and whether we want to continue support batchable=False model.

Here is how the DGLGraph will look like when it is backed by C++. You can see several changes:
(1) DGLGraph no longer inherits from networkx's DiGraph. Instead, we have our own GraphIndex object that directly calls C apis.
(2) The weird CachedGraph object is removed. It is replaced by our own graph library, so python-igraph is no longer a dependency.
(3) There is only one kind of node/edge attribute storage that is dgl.Frame. Data cannot be stored in networkx's dict-of-dict-of-dict structure any more. As a result, batchable=False model cannot be supported.
(4) Networkx graph can be converted from/to DGLGraph very easily. We also plan to support automatically convert networkx's node/edge attributes from/to our frame storage (issue #41 ).

I think non-batchable mode has served its purpose as a very naive baseline in the very beginning. It adds extra confusion for new users.

What do you guys think?