
tokengt's Introduction

Tokenized Graph Transformer (PyTorch)

Pure Transformers are Powerful Graph Learners
Jinwoo Kim, Tien Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong
NeurIPS 2022


Setting up experiments

Using the provided Docker image (recommended)

docker pull jw9730/tokengt:latest
docker run -it --gpus=all --ipc=host --name=tokengt -v /home:/home jw9730/tokengt:latest bash
# upon completion, you should be at /tokengt inside the container

Using the provided Dockerfile

git clone --recursive https://github.com/jw9730/tokengt.git /tokengt
cd tokengt
docker build --no-cache --tag tokengt:latest .
docker run -it --gpus all --ipc=host --name=tokengt -v /home:/home tokengt:latest bash
# upon completion, you should be at /tokengt inside the container

Using pip

sudo apt-get update
sudo apt-get install python3.9
git clone --recursive https://github.com/jw9730/tokengt.git tokengt
cd tokengt
bash install.sh

Running experiments

Synthetic second-order equivariant basis approximation

cd equivariant-basis-approximation/scripts

# Train and save logs, ckpts, and attention maps (--save_display)
bash [INPUT]-[NODE_IDENTIFIER]-[TYPE_IDENTIFIER].sh

# Test and save attention maps (--save_display)
bash [INPUT]-[NODE_IDENTIFIER]-[TYPE_IDENTIFIER]-test.sh

# For the visualization of saved attention maps, please see viz_multi.ipynb

PCQM4Mv2 large-scale graph regression

cd large-scale-regression/scripts

# TokenGT (ORF)
bash pcqv2-orf.sh

# TokenGT (Lap)
bash pcqv2-lap.sh

# TokenGT (Lap) + Performer
bash pcqv2-lap-performer-finetune.sh

# TokenGT (ablated)
bash pcqv2-ablated.sh

# Attention distance plot for TokenGT (ORF)
bash visualize-pcqv2-orf.sh

# Attention distance plot for TokenGT (Lap)
bash visualize-pcqv2-lap.sh

Pre-Trained Models

We provide checkpoints of TokenGT (ORF) and TokenGT (Lap), both trained on PCQM4Mv2. Please download ckpts.zip from this link. Then unzip it and place the ckpts directory in large-scale-regression/scripts, so that each trained checkpoint is located at large-scale-regression/scripts/ckpts/pcqv2-tokengt-[NODE_IDENTIFIER]-trained/checkpoint_best.pt. After that, you can resume training from these checkpoints by adding the option --pretrained-model-name pcqv2-tokengt-[NODE_IDENTIFIER]-trained to the training scripts.

References

Our implementation uses code from the following repositories:

Citation

If you find our work useful, please consider citing it:

@article{kim2022pure,
  author    = {Jinwoo Kim and Tien Dat Nguyen and Seonwoo Min and Sungjun Cho and Moontae Lee and Honglak Lee and Seunghoon Hong},
  title     = {Pure Transformers are Powerful Graph Learners},
  journal   = {arXiv},
  volume    = {abs/2207.02505},
  year      = {2022},
  url       = {https://arxiv.org/abs/2207.02505}
}

Acknowledgements

The development of this open-sourced code was supported in part by the National Research Foundation of Korea (NRF) (No. 2021R1A4A3032834).

tokengt's People

Contributors

jw9730


tokengt's Issues

How to get the test results?

Could you provide test scripts, similar to the training scripts, to reproduce the results reported in the paper? Thanks a lot.

SEB in node classification

Hello,

First of all, great work!

I am implementing a node classification task using your model. It works perfectly with small tweaks and additions. However, I can't figure out how to implement SEB (sparse equivariant basis). Where do I need to add the torch.coalesce call?

Thanks in advance for your answer!

code

Hello, I'm particularly interested in your work. Can you publish your code?

Using for chess engine

Hello, I wanted to include this in my experimentation with augmenting a chess engine. I have prepared datasets in which each successive board state is a graph of 64 nodes (one node per square on the board), with edges connecting nodes that represent legal moves. I also have the Stockfish evaluation of each board (a positive/negative float indicating who is winning, based on the specific heuristics Stockfish uses). I first want to see whether I can train a graph network to predict this board evaluation without any custom heuristics. Any chance you can point me in the right direction? I am familiar with PyTorch Lightning, so I was hoping I could just import your model. The data I have prepared is arranged using the networkx graph library, by the way.
Thanks!
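
(For context, graph data like this is usually flattened to plain tensors before a Transformer-style model sees it. Below is a rough, illustrative sketch of turning one board state into an edge_index tensor and a regression target, following the networkx setup described above; all names and feature choices are hypothetical, not part of this repository.)

import networkx as nx
import torch

# 64 nodes, one per square; edges connect squares related by a legal move.
G = nx.Graph()
G.add_nodes_from(range(64))
G.add_edges_from([(12, 28), (12, 20), (6, 21)])  # toy legal moves
G.graph["eval"] = 0.35  # Stockfish evaluation as the regression target

# Hypothetical node features: one-hot piece type per square
# (13 classes: empty + 6 piece types x 2 colors).
x = torch.zeros(64, 13)

# Undirected edges expanded to both directions, in the usual (2, num_edges) layout.
edges = list(G.edges)
edge_index = torch.tensor(edges + [(v, u) for u, v in edges]).t().contiguous()
y = torch.tensor([G.graph["eval"]])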

Reasoning behind `convert_to_single_emb`

I am trying to apply TokenGT to the 2D_data_npj molecular dataset for property prediction. I am struggling to understand why the following function is applied during the preprocessing stage:

@torch.jit.script
def convert_to_single_emb(x, offset: int = 512):
    feature_num = x.size(1) if len(x.size()) > 1 else 1
    feature_offset = 1 + torch.arange(0, feature_num * offset, offset, dtype=torch.long)
    x = x + feature_offset
    return x

This increases the feature values and leads to errors in the downstream embedding lookup, since the table is much smaller than, say, 51200 (with a node feature dimension of 100).
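
(For context, the usual motivation for this pattern is to let several categorical feature columns share a single embedding table: shifting column i by 1 + i * offset gives every column its own disjoint ID range, and the table must then be sized to cover all shifted IDs. A minimal illustrative sketch with hypothetical sizes; with 100 feature columns, the table would indeed need 1 + 100 * 512 rows.)

import torch
import torch.nn as nn

def convert_to_single_emb(x, offset: int = 512):
    # Shift column i by 1 + i * offset so each categorical feature
    # column occupies a disjoint ID range in one shared table.
    feature_num = x.size(1) if len(x.size()) > 1 else 1
    feature_offset = 1 + torch.arange(0, feature_num * offset, offset, dtype=torch.long)
    return x + feature_offset

num_columns, offset = 3, 512  # hypothetical: 3 categorical features, values < 512
table = nn.Embedding(1 + num_columns * offset, 64, padding_idx=0)

x = torch.tensor([[5, 0, 17]])            # raw categorical values
ids = convert_to_single_emb(x, offset)    # -> [[6, 513, 1042]], disjoint ranges
emb = table(ids).sum(dim=-2)              # one 64-dim embedding per node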

Run this code without Fairseq

Is there any way to run this code without fairseq? I want to train tokengt with my own graph data and check the results.

Missing best_valid

In your visualize folder, the best_valid.pt file loaded by torch.load("best_valid.pt") is provided neither with the checkpoints nor in the repo.
Where can I find it?

Fairseq advantage

Hello,

Can I ask what the advantage of using Fairseq is?
I mean, if we constructed the dataset using the PyTorch Geometric constructor, added your wrapper for the eig_values calculation, and then pushed everything through a regular Transformer (our own working implementation, for example), would this work? (Let's set aside the Performer and its dependencies for now.)

Thanks!
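
(For context, the tokenization itself does not depend on Fairseq. Below is a rough sketch of the pipeline the question describes, with Laplacian-eigenvector node identifiers and a plain nn.TransformerEncoder; dimensions and projections are illustrative, and the [graph] token and trainable type identifiers from the paper are omitted for brevity.)

import torch
import torch.nn as nn

def lap_node_identifiers(edge_index, num_nodes, d_p):
    # Dense graph Laplacian L = D - A; its eigenvectors serve as node identifiers.
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    L = torch.diag(A.sum(dim=1)) - A
    _, vecs = torch.linalg.eigh(L)
    return vecs[:, :d_p]  # (num_nodes, d_p)

# Toy graph: 4 nodes on a path, undirected edges stored as directed pairs.
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]])
n, d_p, d_model = 4, 4, 32
P = lap_node_identifiers(edge_index, n, d_p)

x_node = torch.randn(n, 8)                   # node features
x_edge = torch.randn(edge_index.size(1), 8)  # edge features

# Node token: [X_v, P_v, P_v]; edge token for (u, v): [X_e, P_u, P_v].
node_tok = torch.cat([x_node, P, P], dim=-1)
edge_tok = torch.cat([x_edge, P[edge_index[0]], P[edge_index[1]]], dim=-1)
tokens = torch.cat([node_tok, edge_tok], dim=0)

proj = nn.Linear(8 + 2 * d_p, d_model)       # project tokens to model width
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
out = encoder(proj(tokens).unsqueeze(0))     # (1, n + num_edges, d_model)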

Weight sharing compatibility

In the Transformer, a weight sharing scheme between the input embedding and the output projection layer is used to improve efficiency. Is there a reason this is not implemented here, and how could it be done?
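
(For reference, weight tying in its usual form just aliases the two matrices, as in the minimal sketch below; note that it presumes the input and output share a vocabulary, which is not obviously the case for a graph regression model.)

import torch.nn as nn

class TiedLM(nn.Module):
    # Standard weight tying: the output projection reuses the input
    # embedding matrix, so both map between the same vocabulary and d_model.
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        self.out.weight = self.embed.weight  # alias, not a copy

    def forward(self, ids):
        return self.out(self.embed(ids))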

Are token features and node/type identifiers added or concatenated?

Thanks for providing this great implementation!

After looking into the code, I have a quick question about the formation of the input features to the TokenGT model. If I understood the paper correctly, the node features, token identifiers, and token type identifiers are concatenated (C + 2 * d_p + d_e dimensions, according to Section 2 - Main Transformer in the paper), while in the code here they seem to be added together rather than concatenated. Am I misunderstanding the paper or the code? Or are the two approaches actually equivalent, or do they achieve similar performance?

Thank you for any help on this!
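
(For reference, one common reason the two formulations can coincide: concatenating the pieces and then applying a single linear projection is algebraically identical to projecting each piece with the matching row block of that matrix and summing. A quick numerical check of the identity, with illustrative shapes only:)

import torch

C, d_p, d_e, d_model = 8, 4, 2, 16
x = torch.randn(5, C)        # token features
p = torch.randn(5, 2 * d_p)  # node identifiers (two of dimension d_p)
e = torch.randn(5, d_e)      # type identifier
W = torch.randn(C + 2 * d_p + d_e, d_model)

# Concatenate, then apply one projection ...
out_cat = torch.cat([x, p, e], dim=-1) @ W
# ... equals projecting each part with its row block of W and adding.
Wx, Wp, We = W.split([C, 2 * d_p, d_e], dim=0)
out_add = x @ Wx + p @ Wp + e @ We

print(torch.allclose(out_cat, out_add, atol=1e-5))  # True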

question

What is the difference between the model in equivariant-basis-approximation and the one in large-scale-regression?
