
Self-Supervised Learning For Graphs

Introduction

This repository serves as a mini-tutorial on self-supervised learning for graphs. Self-supervised learning is a class of unsupervised machine learning methods that learn rich representations of data without access to any labels. This repository implements a variety of commonly used methods (augmentations, encoders, loss functions) for self-supervised learning on graphs, and it also includes options for loading commonly used graph datasets for a variety of downstream tasks. It is built on PyTorch Geometric (PyG), a library built on top of PyTorch for graph machine learning.
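
To make these pieces concrete, here is a minimal sketch, assuming PyTorch Geometric is installed, of two of the building blocks: loading one of the supported TU datasets and defining a small GCN encoder that produces one embedding per graph. The class name GCNEncoder and the hyperparameters below are illustrative choices, not this repository's code.

import torch
import torch.nn.functional as F
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

# Load a graph-classification dataset (same family as the datasets listed below).
dataset = TUDataset(root="data", name="PROTEINS")
loader = DataLoader(dataset, batch_size=32, shuffle=True)

class GCNEncoder(torch.nn.Module):
    """Small GCN that maps a batch of graphs to one embedding per graph."""
    def __init__(self, in_dim, hidden_dim=128, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = torch.nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)]
        )

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        return global_mean_pool(x, batch)            # one embedding per graph

encoder = GCNEncoder(dataset.num_node_features)
batch = next(iter(loader))
graph_embeddings = encoder(batch.x, batch.edge_index, batch.batch)   # shape [32, 128]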

Setting up the environment

Set up the conda environment, which ensures that the correct version of PyTorch Geometric (PyG) and all other dependencies are installed.

conda env create -f environment.yml
conda activate graphssl

Clone this repository.

git clone https://github.com/paridhimaheshwari2708/GraphSSL.git
cd GraphSSL/

Self-supervised pretraining

For training a self-supervised model, we use the run.py script, which requires a few important arguments:

--save                  Specify the folder name of the experiment where the logs and models are saved
--dataset               Specify the dataset on which you want to train the model
--model                 Specify the model architecture of the GNN Encoder
--feat_dim              Specify the dimension of node features in GNN
--layers                Specify the number of layers of GNN Encoder
--loss                  Specify the loss function for contrastive training
--augment_list          Specify the augmentations to be applied as space-separated strings

The options supported for the above arguments are:

Argument        Choices
dataset         proteins, enzymes, collab, reddit_binary, reddit_multi, imdb_binary, imdb_multi, dd, mutag, nci1
model           gcn, gin, resgcn, gat, graphsage, sgc
loss            infonce, jensen_shannon
augment_list    edge_perturbation, diffusion, diffusion_with_sample, node_dropping, random_walk_subgraph, node_attr_mask

As an example, run the following command to train a self-supervised model on the proteins dataset:

python3 run.py --save proteins_exp --dataset proteins --model gcn --loss infonce --augment_list edge_perturbation node_dropping
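
To give a rough feel for what the pretraining objective does, below is a minimal conceptual sketch, not the repository's implementation, of one contrastive step: build two views of each graph with a node-dropping augmentation, embed both views with the same encoder, and pull matched views together with a simplified InfoNCE loss. The helper names drop_nodes and info_nce, the drop probability, and the temperature are illustrative assumptions; dataset and encoder refer to the sketch in the Introduction.

import torch
import torch.nn.functional as F
from torch_geometric.loader import DataLoader
from torch_geometric.utils import subgraph

def drop_nodes(data, drop_prob=0.2):
    # Randomly remove a fraction of nodes together with their incident edges.
    keep = torch.rand(data.num_nodes) >= drop_prob
    keep[0] = True                                   # always keep at least one node
    edge_index, _ = subgraph(keep, data.edge_index,
                             relabel_nodes=True, num_nodes=data.num_nodes)
    view = data.clone()
    view.x, view.edge_index = data.x[keep], edge_index
    return view

def info_nce(z1, z2, temperature=0.5):
    # Two views of the same graph (same row in z1 and z2) are positives;
    # every other pair in the batch acts as a negative.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# One conceptual contrastive step on a small batch of graphs.
graphs = [dataset[i] for i in range(32)]
b1 = next(iter(DataLoader([drop_nodes(g) for g in graphs], batch_size=32)))
b2 = next(iter(DataLoader([drop_nodes(g) for g in graphs], batch_size=32)))
loss = info_nce(encoder(b1.x, b1.edge_index, b1.batch),
                encoder(b2.x, b2.edge_index, b2.batch))

The repository's actual augmentations (edge_perturbation, diffusion, random_walk_subgraph, node_attr_mask, etc.) and loss implementations are more complete; this sketch only illustrates the overall structure that run.py optimizes.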

Evaluation on graph classification

For training and evaluating the model on the downstream task (here, graph classification), we use the run_classification.py script. The arguments are:

--save                  Specify the folder name of the experiment where the logs and models are saved
--load                  Specify the folder name from which the self-supervised model is loaded
                        If present, the GNN loads the pretrained weights and only the classifier head is trained
                        If left empty, the model will be trained end-to-end without self-supervised learning
--dataset               Specify the dataset on which you want to train the model
--model                 Specify the model architecture of the GNN Encoder
--feat_dim              Specify the dimension of node features in GNN
--layers                Specify the number of layers of GNN Encoder
--train_data_percent    Specify the fraction of training samples which are labelled

The options supported for the above arguments are:

Argument        Choices
dataset         proteins, enzymes, collab, reddit_binary, reddit_multi, imdb_binary, imdb_multi, dd, mutag, nci1
model           gcn, gin, resgcn, gat, graphsage, sgc
augment_list    edge_perturbation, diffusion, diffusion_with_sample, node_dropping, random_walk_subgraph, node_attr_mask

As an example, run the following command to train only the classifier head on top of a pretrained self-supervised model on the proteins dataset:

python3 run_classification.py --save proteins_exp_finetuned --load proteins_exp --dataset proteins --model gcn --train_data_percent 1.0

For the same dataset, training the model end-to-end (without self-supervised pretraining) can be done as follows:

python3 run_classification.py --save proteins_exp_finetuned_e2e  --dataset proteins --model gcn --train_data_percent 1.0
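
When --load is given, the setup corresponds to linear evaluation: the pretrained encoder weights are loaded and frozen, and only the classifier head is optimized on the graph embeddings. Below is a minimal sketch under the same illustrative assumptions as before; the checkpoint filename pretrained_encoder.pt is hypothetical, and encoder, dataset, and loader come from the earlier sketch.

import torch
import torch.nn.functional as F

# Hypothetical checkpoint file; in the repository, checkpoints are saved under the --save folder.
encoder.load_state_dict(torch.load("pretrained_encoder.pt"))
for p in encoder.parameters():
    p.requires_grad = False                               # freeze the pretrained GNN

classifier = torch.nn.Linear(128, dataset.num_classes)    # 128 = encoder hidden_dim
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

for epoch in range(20):
    for batch in loader:
        with torch.no_grad():                             # encoder stays fixed
            z = encoder(batch.x, batch.edge_index, batch.batch)
        loss = F.cross_entropy(classifier(z), batch.y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Omitting --load simply skips the weight loading and freezing, so the optimizer updates the encoder and the classifier head together (the end-to-end setting above).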

Tutorial

We have also created a Colab Notebook which combines various techniques in self-supervised learning and provides an easy-to-use interface for training your own models.

Contributors

jianvora, paridhimaheshwari2708, sharmilanangi
