
gated-graph-neural-network-samples's Introduction

Gated Graph Neural Networks

This repository is not maintained anymore. An updated version of the sparse codebase in this repo, together with many more GNN implementations, is available on https://github.com/microsoft/tf-gnn-samples.

This repository contains two implementations of the Gated Graph Neural Networks of Li et al. 2015 for learning properties of chemical molecules. The inspiration for this application comes from Gilmer et al. 2017.

This code was tested in Python 3.5 with TensorFlow 1.3. Running the code also requires the docopt package.

This code was maintained by the Deep Program Understanding project at Microsoft Research, Cambridge, UK.

Data Extraction

To download the related data, run get_data.py. It requires the rdkit Python package in your environment, which can be obtained, for example, with

conda install -c rdkit rdkit

Running Graph Neural Network Training

We provide four versions of Graph Neural Networks: Gated Graph Neural Networks (one implementation using dense adjacency matrices and a sparse variant), Asynchronous Gated Graph Neural Networks, and Graph Convolutional Networks (sparse). The dense version is faster for small or dense graphs, including the molecules dataset (though the difference is small there). In contrast, the sparse version is faster for large and sparse graphs, especially in cases where a dense representation of the adjacency matrix would require prohibitively large amounts of memory. Asynchronous GNNs do not propagate information from all nodes to all neighbouring nodes at each timestep; instead, they follow an update schedule in which messages are propagated in sequence. Their implementation is far less efficient (due to the small number of updates at each step), but a single propagation round (i.e., performing each propagation step along a few edges once) can suffice to propagate messages across a large graph.
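To make the memory trade-off concrete, here is a minimal sketch (not the repository's code) contrasting the two adjacency representations; the sizes and variable names are illustrative assumptions only:

import numpy as np

# Assumed sizes: v nodes, e edge types (illustrative only).
v, e = 30, 4

# Dense variant: one v x v adjacency matrix per edge type -> O(e * v^2) memory,
# regardless of how many edges actually exist in the graph.
dense_adj = np.zeros((e, v, v), dtype=np.float32)
dense_adj[0, 2, 5] = 1.0  # an edge of type 0 from node 2 to node 5

# Sparse variant: an explicit edge list per edge type -> memory proportional to
# the number of edges, which is why it scales to large, sparse graphs.
sparse_adj = {0: [(2, 5)]}  # edge type -> list of (source, target) pairs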

To run dense Gated Graph Neural Networks, use

python3 ./chem_tensorflow_dense.py

To run sparse Gated Graph Neural Networks, use

python3 ./chem_tensorflow_sparse.py

To run sparse Graph Convolutional Networks (as in Kipf et al. 2016), use

python3 ./chem_tensorflow_gcn.py

Finally, it turns out that the extension of GCNs to different edge types is a variant of GGNN, and you can run this variant (as in Schlichtkrull et al. 2017) by calling

python3 ./chem_tensorflow_sparse.py --config '{"use_edge_bias": false, "use_edge_msg_avg_aggregation": true, "residual_connections": {}, "layer_timesteps": [1,1,1,1,1,1,1,1], "graph_rnn_cell": "RNN", "graph_rnn_activation": "ReLU"}'

To run asynchronous Gated Graph Neural Networks, use

python3 ./chem_tensorflow_async.py

Restoring models

Suppose you have trained a model; for example, the following trains for a single epoch:

python3 ./chem_tensorflow_dense.py --config '{"num_epochs": 1}'
== Epoch 1
 Train: loss: 0.52315 | acc: 0:0.64241 | error_ratio: 0:9.65831 | instances/sec: 6758.04
 Valid: loss: 0.26930 | acc: 0:0.55949 | error_ratio: 0:8.41163 | instances/sec: 9902.71
  (Best epoch so far, cum. val. acc decreased to 0.55949 from inf. Saving to './2018-02-01-11-30-05_16306_model_best.pickle')

Note that a checkpoint was stored to './2018-02-01-11-30-05_16306_model_best.pickle'. To restore this model and continue training, use:

python3 ./chem_tensorflow_dense.py --restore ./2018-02-01-11-30-05_16306_model_best.pickle

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

gated-graph-neural-network-samples's People

Contributors

51alg, alexpolozov, mallamanis, microsoftopensource, mmjb, msftgits, ryotatomioka, sahilsuneja1


gated-graph-neural-network-samples's Issues

tying layers

Thanks for sharing the GGNN code. I am a little confused by the concept of layers. It seems that in an earlier version of the codebase, each layer represented one timestep, and there was an option called tie_gnn_layers to toggle the sharing of weights, biases, and RNN cells across all timesteps. Later, I noticed that after a commit, layers no longer share those tensors and cells. Does that mean a layer now represents an actual physical layer in the model instead of a timestep? Thanks.

PS: since tying layers is no longer an option, the README still includes the now-obsolete parameter "tie_gnn_layers": false for GCN. The comment about tying layers also seems to be obsolete.

reported loss

Hi!
It seems to me that the training/test 'loss' reported per epoch is the mean absolute error, averaged per molecule: average_per_molecule(|y - y*|). Is that correct? I am new to TensorFlow, so the syntax is not very intuitive to me.

Thanks!
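For what it's worth, here is a minimal sketch of what "mean absolute error averaged per molecule" would look like; this only illustrates the reading described in the question, and is not necessarily the repository's exact loss code (values are hypothetical):

import numpy as np

# Hypothetical per-molecule targets and predictions (illustrative values only).
y_true = np.array([0.12, -0.34, 0.56])
y_pred = np.array([0.10, -0.30, 0.50])

# Mean over molecules of |y - y*|.
mae = np.mean(np.abs(y_true - y_pred))
print(mae)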

Please explain comments like ' # [b x v x h]'

Hello,
I'm reading your code and I'd like to use GNNs in my work. I noticed that you added comments like '# [b x v x h]', and I guess understanding them will help me understand the code quickly, but I'm not sure what the characters actually mean. In particular, I can't figure out what 'b' stands for. Could you please explain what 'b', 'v', 'e', and 'h' mean in chem_tensorflow_dense.py? It would help me a lot.

Thank you very much!
Jennifer
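For readers with the same question, here is a minimal sketch of the usual reading of such shape comments, assuming b = batch size (number of graphs per minibatch), v = maximum number of nodes per graph, h = hidden node representation size, and e = number of edge types; the variable names and exact shapes below are illustrative, not copied from the repository:

import tensorflow as tf

# Illustrative sizes only.
b, v, h, e = 16, 30, 100, 4

node_states = tf.placeholder(tf.float32, [b, v, h])   # [b x v x h]
adjacency = tf.placeholder(tf.float32, [e, b, v, v])  # [e x b x v x v]

# Messages sent along edges of type 0: batched matmul keeps the [b x v x h] shape.
messages = tf.matmul(adjacency[0], node_states)       # [b x v x h]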

Chemical accuracy values

Hello,
I am trying to reproduce results from the Gilmer et al., 2017 paper and have noticed that they mention normalization of the targets. However, the chemical accuracies used do not seem to be normalized in their Supplementary Table. In your code, those values differ from both Faber et al., 2017 and Gilmer et al., 2017. Would you mind explaining how you derived them?
Also, does your validation split reproduce the one mentioned in Gilmer et al., 2017, or was it random?
Thank you!
Best,

Can I use float rather than int in the graph?

Hello:
I want to use this awesome tool for my chemical reaction path search, but I want to build graphs from molecule coordinates. The graph entries in the .json file would then be [<atom_idx1>, distance between atoms 1 and 2, <atom_idx2>] rather than single_bond = 1, double_bond = 2, trans_bond = 3; the target is the molecular energy, which changes with the atomic distances.

I don't know much about your algorithm, so I tried changing these ints into various floats; only chem_tensorflow_async.py works (chem_tensorflow_gcn.py runs, but the results stay the same), and I don't know whether it really did what I intended.
So I want to know: can I use floats in the graph? Does the algorithm support this?

Dense for small graph?

Hi,

I was wondering what order of graph size would typically be classified as "small", so that the dense version is more beneficial than the sparse version. I know I can benchmark this myself, but some expert comments would give me a good prior! Thanks!

Explaining data in .json files

The following is an example of the data downloaded for this model. Here is a JSON object from valid.json:
{ "targets": [ [ -0.3917742606773421 ] ], "graph": [ [ 0, 2, 1 ], [ 0, 1, 2 ], [ 0, 1, 3 ] ], "node_features": [ [ 0, 1, 0, 0, 0 ], [ 0, 0, 0, 1, 0 ], [ 1, 0, 0, 0, 0 ], [ 1, 0, 0, 0, 0 ] ] }

I wonder how the graph data (edges) are presented.

In the example above:

  • graph data are (0,2,1), (0,1,2), (0,1,3)
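For reference, a minimal parsing sketch, assuming (as the format suggests) that each graph entry is a [source_node, edge_type, target_node] triple and that valid.json holds a list of such objects; the variable names are illustrative:

import json

with open("valid.json") as f:
    data = json.load(f)

example = data[0]
for src, edge_type, dst in example["graph"]:
    # e.g. [0, 2, 1] would be an edge of type 2 from node 0 to node 1
    print("node %d --(type %d)--> node %d" % (src, edge_type, dst))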

Do you use GPU for training?

Thank you for your great work.

When you train your model, do you use TensorFlow with a GPU or not? How long does training take for you?

I ran pip install -r requirements.txt and then ran the code, and it showed that I am using the CPU. Even after installing tensorflow-gpu, it is still running on the CPU. Any advice? Thanks a lot.
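A quick way to check which devices TensorFlow 1.x actually sees (this snippet is not taken from the repository); if no GPU device is listed, the tensorflow-gpu build or the CUDA/cuDNN drivers are not being picked up:

from tensorflow.python.client import device_lib

# Lists CPU and (if available) GPU devices known to the local TensorFlow runtime.
print([d.name for d in device_lib.list_local_devices()])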

Unable to wget the sample data

Thank you for sharing the GGNN code. When I was trying to run the test samples, I found that I cannot get the sample data from the target URL, neither via get_data.py nor via wget ("unable to establish SSL connection") nor curl ("SSL_ERROR_SYSCALL"). May I please know whether the shared testing data sample is still reachable? Thanks a lot!
