daiquocnguyen / graph-transformer

Universal Graph Transformer Self-Attention Networks (TheWebConf WWW 2022) (Pytorch and Tensorflow)

License: Apache License 2.0

Python 98.06% C++ 1.26% Makefile 0.06% Cython 0.62%

graph-representation-learning graph-classification graph-neural-networks node-embeddings graph-embeddings graph-transformer self-attention transformer graph-machine-learning text-classification

graph-transformer's Issues

Area under the ROC Curve ??

Hello, thanks for your work. Is there any way to evaluate the model with the area under the ROC curve? I don't know what the current evaluation metric is (I looked through the paper https://arxiv.org/pdf/1909.11855.pdf but didn't find it).

I'd like to evaluate with ROC AUC because I'm currently working with a big dataset in a supervised setting, and I found it odd that the score stays exactly the same at every epoch.

Is there also any way to run training and testing separately? I'm trying to compare three splitting methods: index, random, and scaffold.

Thank you for any kind of help.
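
For readers with the same question, here is a minimal sketch of how ROC AUC can be computed from binary labels and predicted scores using the rank-sum formulation. It is not taken from the repository; the function name `roc_auc` and the sample values are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i:j] and give each the average rank.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # hypothetical predictions
flat = roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])   # constant scores
```

A model that outputs the same score for every graph gets an AUC of 0.5 under this definition, so a metric that never moves across epochs may indicate that the predictions have collapsed to a constant.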

Error while running "train_UGformerV2.py"

The error message is as follows:

Traceback (most recent call last):
File "D:/graph_code/Graph-Transformer-master/UGformerV2_PyTorch/train_UGformerV2.py", line 144, in
train_loss = train()
File "D:/graph_code/Graph-Transformer-master/UGformerV2_PyTorch/train_UGformerV2.py", line 101, in train
graph_label = label_smoothing(graph_label, num_classes)
File "D:\graph_code\Graph-Transformer-master\UGformerV2_PyTorch\UGformerV2.py", line 102, in label_smoothing
true_dist.scatter_(1, true_labels.data.unsqueeze(1), confidence)
IndexError: scatter_(): Expected dtype int64 for index.

My PyTorch version is 1.7.0. How can I solve this problem? Thanks for your answer.
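
The error message says that `scatter_` needs an int64 index tensor, so one likely fix is to cast the labels with `.long()` before scattering. The sketch below is an illustrative label-smoothing helper written under that assumption; `smooth_labels` and its default `smoothing=0.1` are hypothetical and not the repository's own function.

```python
import torch


def smooth_labels(true_labels, num_classes, smoothing=0.1):
    """Return a (batch, num_classes) tensor of smoothed one-hot targets."""
    confidence = 1.0 - smoothing
    # scatter_ requires an int64 (Long) index, so cast defensively.
    index = true_labels.long().unsqueeze(1)
    true_dist = torch.full(
        (true_labels.size(0), num_classes),
        smoothing / (num_classes - 1),
        device=true_labels.device,
    )
    true_dist.scatter_(1, index, confidence)
    return true_dist


# int32 labels reproduce the situation in the traceback; the cast makes them usable.
labels = torch.tensor([0, 2, 1], dtype=torch.int32)
targets = smooth_labels(labels, num_classes=3)
```

Casting where the labels are first created (for example when converting from a NumPy array) would work equally well; the cast inside the helper just keeps the fix in one place.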

Got a pytorch RuntimeError when reproducing the unsup result

I tried train_pytorch_U2GNN_UnSup.py and got:

Namespace(batch_size=4, dataset='PTC', dropout=0.5, ff_hidden_size=1024, fold_idx=1, learning_rate=0.005, model_name='PTC', num_epochs=50, num_hidden_layers=2, num_neighbors=4, num_self_att_layers=1, run_folder='../', sampled_num=512)
Loading data...
loading data
classes: 2
maximum node tag: 19
data: 344
19
Loading data... finished!
Writing to C:\Users\Administrator\Desktop\runs_pytorch_U2GNN_UnSup\PTC

Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\U2GNN\U2GNN_pytorch\train_pytorch_U2GNN_UnSup.py", line 206, in
train_loss = train()
File "C:\Users\Administrator\Desktop\U2GNN\U2GNN_pytorch\train_pytorch_U2GNN_UnSup.py", line 157, in train
logits = model(X_concat, input_x, input_y)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\Administrator\Desktop\U2GNN\U2GNN_pytorch\pytorch_U2GNN_UnSup.py", line 31, in forward
input_Tr = F.embedding(input_x, X_concat)
File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\functional.py", line 1724, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

This looks like an int32 vs. int64 dtype mismatch.
I fixed it by adding "input_x = input_x.astype(np.int64)" after "input_x = np.array(input_neighbors)" at line 124.
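
An equivalent way to apply this fix is to request int64 when the array is built. On Windows, NumPy releases before 2.0 use a 32-bit default integer, which would be consistent with the `torch.cuda.IntTensor` in the traceback above. The neighbor values below are made up for illustration.

```python
import numpy as np

# Hypothetical sampled neighbor indices, standing in for input_neighbors.
input_neighbors = [[0, 3, 5], [1, 2, 4]]

# Ask for int64 explicitly instead of relying on the platform default integer.
input_x = np.array(input_neighbors, dtype=np.int64)

# The cast from the post gives the same result when applied after the fact.
input_x_cast = np.array(input_neighbors).astype(np.int64)
```

Either form should yield a Long tensor once the array is converted to PyTorch, which is what the embedding lookup expects for its indices.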

No node attribute labels

Hi, thank you for your research and your code; the results are excellent.

My issue is with datasets which usually have node attributes, not just node labels. For instance the original ENZYMES dataset includes 18 attributes per node, but the txt file in the zip lacks these attributes. I also haven't been able to find any reference to node attribute labels in the code, but I may be missing something obvious.

Is reading in node attribute labels a feature you will eventually be able to add? Is it already implemented and I'm missing something? If it is impossible to add, will take a while to develop, or is currently low priority because you expect little impact on the results, would you mind clarifying that in the README?

Many thanks

transition layer

How do you implement the transition layer mentioned in your paper?

Essential related method: "Universal Transformers"

This is a highly cited study that apparently pioneers recurrent transformers: https://arxiv.org/abs/1807.03819
I am not completely convinced of the study quality, though. There are a few insufficiently substantiated claims, weird (buggy?) code excerpts, comparisons to alternatives that are not obviously fair, etc.

Code run error

Dear author,
This error occurred when I was running the code while reproducing your experimental results. Could you please suggest a solution?

`ValueError: The two structures don't have the same nested structure.

First structure: type=list str=[<tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/Identity:0' shape=() dtype=int32>, <tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/Identity_1:0' shape=(3, ?, 9, 65) dtype=float32>]

Second structure: type=list str=[<tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/add:0' shape=() dtype=int32>, (<tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/ffn/layer_postprocess/add:0' shape=(?, 9, 65) dtype=float32>, <tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/unstack:1' shape=(?, 9, 65) dtype=float32>, <tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/unstack:2' shape=(?, 9, 65) dtype=float32>)]

More specifically: Substructure "type=tuple str=(<tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/ffn/layer_postprocess/add:0' shape=(?, 9, 65) dtype=float32>, <tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/unstack:1' shape=(?, 9, 65) dtype=float32>, <tf.Tensor 'universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/unstack:2' shape=(?, 9, 65) dtype=float32>)" is a sequence, while substructure "type=Tensor str=Tensor("universal_transformer_encoder1/parallel_0_4/universal_transformer_encoder1/universal_transformer_encoder1/body/encoder/universal_transformer_basic/foldl/while/Identity_1:0", shape=(3, ?, 9, 65), dtype=float32)" is not`

order requirement ?

Hi,

The text file data is formatted as follows: https://github.com/muhanzhang/pytorch_DGCNN/tree/master/data

1st line: N number of graphs; then the following N blocks describe the graphs
for each block of text:
- a line contains n l, where n is number of nodes in the current graph, and l is the graph label
- following n lines:
  - the ith line describes the information of ith node (0 based), which starts with t m, where t is the tag of current node, and m is the number of neighbors of current node;
  - following m numbers indicate the neighbor indices (starting from 0).
  - following d numbers (if any) indicate the continuous node features (attributes)

Originally posted by @daiquocnguyen in #1 (comment)
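
The format described above can be read with a few lines of plain Python. This is a minimal sketch, not the repository's `load_data`; the function name `parse_graphs`, the returned dictionary keys, and the assumption that graph labels and node tags are integers are all mine.

```python
def parse_graphs(text):
    """Parse the text format described above into a list of graph dicts."""
    rows = iter(line.split() for line in text.splitlines() if line.strip())
    num_graphs = int(next(rows)[0])           # 1st line: N, the number of graphs
    graphs = []
    for _ in range(num_graphs):
        header = next(rows)                   # "n l": node count and graph label
        n, label = int(header[0]), int(header[1])
        tags, neighbors, features = [], [], []
        for _ in range(n):
            row = next(rows)                  # "t m", m neighbor indices, then d features
            tag, m = int(row[0]), int(row[1])
            tags.append(tag)
            neighbors.append([int(v) for v in row[2:2 + m]])
            features.append([float(v) for v in row[2 + m:]])
        graphs.append({"label": label, "tags": tags,
                       "neighbors": neighbors, "features": features})
    return graphs


# One graph with three nodes: node 0 is connected to nodes 1 and 2.
sample = """1
3 1
0 2 1 2
0 1 0
1 1 0
"""
graphs = parse_graphs(sample)
```

Because the i-th node line describes node i, the order of the node lines within a block matters: neighbor indices refer to positions in that block.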

Do you plan to apply to machine translation?

Would be interested to know whether you plan to apply graph transformer for a machine translation task in the future?

Format of the dataset

Could you please let me know how I can run a similar analysis on my own dataset? What is the format of the dataset fed into the code?

About the result

Hello, thanks for the code.
In the paper, you follow the evaluation strategy used in "How Powerful are Graph Neural Networks?". In your code, I can't find how you calculate the final result.
I ask because I found several other papers that use the same split strategy but compute the final result differently.
So, could you please tell me whether you followed this method:
"The cross-validation in our paper only uses training and validation sets (no test set) due to small dataset size. Specifically, after obtaining 10 validation curves corresponding to 10 folds, we first took average of validation curves across the 10 folds (thus, we obtain an averaged validation curve), and then selected a single epoch that achieved the maximum averaged validation accuracy. Finally, the standard deviation over the 10 folds was computed at the selected epoch."
to calculate the results reported in your paper?

Looking forward to your reply.

Thanks
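
The protocol quoted above can be sketched in a few lines. This only illustrates the quoted description and says nothing about what the repository actually does; the name `select_epoch`, the toy curves (3 folds instead of 10), and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def select_epoch(val_curves):
    """val_curves[f][e] is the validation accuracy of fold f at epoch e."""
    num_epochs = len(val_curves[0])
    # Average the validation curves across folds, epoch by epoch.
    averaged = [mean(curve[e] for curve in val_curves) for e in range(num_epochs)]
    # Pick the single epoch with the highest averaged validation accuracy.
    best_epoch = max(range(num_epochs), key=averaged.__getitem__)
    # Report the spread over folds at that epoch (population std is an assumption).
    at_best = [curve[best_epoch] for curve in val_curves]
    return best_epoch, averaged[best_epoch], pstdev(at_best)


# Toy example: 3 folds, 3 epochs each.
curves = [
    [0.70, 0.80, 0.75],
    [0.60, 0.90, 0.85],
    [0.65, 0.70, 0.74],
]
best_epoch, best_mean, best_std = select_epoch(curves)
```

The alternative the poster alludes to, taking each fold's own best epoch and averaging those maxima, generally gives a higher number, which is why the distinction matters when comparing papers.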

Please help me

When I run this command:
$ python train_TextGNN.py --dataset mr --learning_rate 0.0001 --batch_size 4096 --num_epochs 150 --num_GNN_layers 2 --hidden_size 384 --model GatedGT
an error occurred:
Traceback (most recent call last):
File "train_TextGNN.py", line 129, in
train_loss = train()
File "train_TextGNN.py", line 99, in train
torch.from_numpy(train_mask[idx]).float().to(device))
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
AttributeError: Caught AttributeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/zhang/anaconda3/envs/Gra/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/home/zhang/compatibility_analysis/Graph-Transformer/TextGNN/model_MPGNN.py", line 62, in forward
graph_embeddings = torch.sum(x, 1) * torch.amax(x, 1)
AttributeError: module 'torch' has no attribute 'amax'
My torch version is 1.5.0.
Can you help me solve this problem?
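
As the traceback shows, `torch.amax` is missing in this PyTorch install. Besides upgrading PyTorch, one possible workaround is to fall back to `torch.max` with a `dim` argument, which returns the same maximum values for a single dimension. The helper name `amax_compat` and the sample tensor are hypothetical.

```python
import torch


def amax_compat(x, dim):
    """Maximum along `dim`, using torch.amax when the installed torch provides it."""
    if hasattr(torch, "amax"):
        return torch.amax(x, dim)
    # torch.max with a dim returns (values, indices); keep the values only.
    return torch.max(x, dim)[0]


# Shape (batch=1, nodes=2, hidden=2), mirroring the failing line in the traceback.
x = torch.tensor([[[1.0, -2.0], [3.0, 0.5]]])
graph_embeddings = torch.sum(x, 1) * amax_compat(x, 1)
```

The forward values match, though the two calls may distribute gradients differently when several entries tie for the maximum.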

Graph input format

Hello,

How should the dataset.txt file that stores the graph be formatted so that load_data loads it correctly? I have a networkx graph dumped to JSON and would like to use your good work!

Thanks.
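
Assuming the graph was dumped in networkx's node-link JSON layout (a "nodes" list with "id" fields plus a "links" or "edges" list with "source" and "target"), a sketch like the following could render it as one block of the text format quoted in the "order requirement ?" issue above: an "n l" header line followed by one "t m neighbors..." line per node. The function name, the "tag" attribute, and the default tag of 0 are hypothetical choices, and the graph is treated as undirected.

```python
import json


def node_link_to_block(data, graph_label, default_tag=0):
    """Render one node-link JSON graph as a text block: 'n l' then 't m nbrs...' lines."""
    # Map arbitrary node ids to consecutive 0-based indices.
    index = {node["id"]: i for i, node in enumerate(data["nodes"])}
    neighbors = [set() for _ in index]
    # The edge list is stored under "links" or, in some dumps, under "edges".
    for edge in data.get("links", data.get("edges", [])):
        u, v = index[edge["source"]], index[edge["target"]]
        neighbors[u].add(v)
        neighbors[v].add(u)  # assumes an undirected graph
    lines = ["%d %d" % (len(index), graph_label)]
    for i, node in enumerate(data["nodes"]):
        tag = node.get("tag", default_tag)  # "tag" is a hypothetical attribute name
        nbrs = sorted(neighbors[i])
        lines.append(" ".join(str(v) for v in [tag, len(nbrs)] + nbrs))
    return "\n".join(lines)


raw = ('{"nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}], '
       '"links": [{"source": "a", "target": "b"}, {"source": "a", "target": "c"}]}')
block = node_link_to_block(json.loads(raw), graph_label=1)
```

A complete dataset file would start with a line holding the number of graphs, followed by one such block per graph.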

Where will the learned embeddings be saved?

Thanks for your great work, but where will the learned embeddings be saved?

Issue while running the demo code

Hello, I'm trying to run the Variant 1 demo with tensorflow:
python train_UGformerV1_Sup.py --dataset IMDBBINARY --batch_size 4 --ff_hidden_size 1024 --fold_idx 1 --num_neighbors 8 --num_epochs 50 --num_timesteps 4 --learning_rate 0.0005 --model_name IMDBBINARY_bs4_fold1_1024_8_idx0_4_1

However I obtain the following error:

`Traceback (most recent call last):
File "train_UGformerV1_Sup.py", line 11, in
from UGformerV1_Sup import UGformerV1
File "/content/Graph-Transformer/UGformerV1_TF/UGformerV1_Sup.py", line 2, in
import universal_transformer_modified
File "/content/Graph-Transformer/UGformerV1_TF/universal_transformer_modified.py", line 32, in
from tensor2tensor.models import transformer
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/models/init.py", line 51, in
from tensor2tensor.models.research import rl
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/models/research/rl.py", line 27, in
from tensor2tensor.envs import tic_tac_toe_env
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/envs/init.py", line 23, in
from tensor2tensor.envs import tic_tac_toe_env
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/envs/tic_tac_toe_env.py", line 244, in
register()
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/envs/tic_tac_toe_env.py", line 240, in register
"tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv", version="v0")
File "/usr/local/lib/python3.7/dist-packages/tensor2tensor/rl/gym_utils.py", line 360, in register_gym_env
return env_name, gym.make(env_name)
File "/usr/local/lib/python3.7/dist-packages/gym/envs/registration.py", line 610, in make
kwargs = spec.kwargs.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

CalledProcessError Traceback (most recent call last)
in
----> 1 get_ipython().run_cell_magic('bash', '', '#cd liblinear\n#cd U2GNN\ncd Graph-Transformer\ncd UGformerV1_TF\npython train_UGformerV1_Sup.py --dataset IMDBBINARY --batch_size 4 --ff_hidden_size 1024 --fold_idx 1 --num_neighbors 8 --num_epochs 50 --num_timesteps 4 --learning_rate 0.0005 --model_name IMDBBINARY_bs4_fold1_1024_8_idx0_4_1\n')

3 frames
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
2357 with self.builtin_trap:
2358 args = (magic_arg_s, cell)
-> 2359 result = fn(*args, **kwargs)
2360 return result
2361

/usr/local/lib/python3.7/dist-packages/IPython/core/magics/script.py in named_script_magic(line, cell)
140 else:
141 line = script
--> 142 return self.shebang(line, cell)
143
144 # write a basic docstring:

in shebang(self, line, cell)

/usr/local/lib/python3.7/dist-packages/IPython/core/magic.py in (f, *a, **k)
185 # but it's overkill for just that one bit of state.
186 def magic_deco(arg):
--> 187 call = lambda f, *a, **k: f(*a, **k)
188
189 if callable(arg):

/usr/local/lib/python3.7/dist-packages/IPython/core/magics/script.py in shebang(self, line, cell)
243 sys.stderr.flush()
244 if args.raise_error and p.returncode!=0:
--> 245 raise CalledProcessError(p.returncode, cell, output=out, stderr=err)
246
247 def _run_script(self, p, cell, to_close):

CalledProcessError: Command 'b'#cd liblinear\n#cd U2GNN\ncd Graph-Transformer\ncd UGformerV1_TF\npython train_UGformerV1_Sup.py --dataset IMDBBINARY --batch_size 4 --ff_hidden_size 1024 --fold_idx 1 --num_neighbors 8 --num_epochs 50 --num_timesteps 4 --learning_rate 0.0005 --model_name IMDBBINARY_bs4_fold1_1024_8_idx0_4_1\n'' returned non-zero exit status 1.`

I have all requirements up to date. Do you have any idea how to fix this, please? I'm really interested in this repo.

Thank you !

Build own dataset

@daiquocnguyen I enjoyed the work you created. How can I use it with my own dataset? Could you please describe the format the graphs should follow?

dataset

Dear author, I have a question about your datasets. Every dataset has a txt file, such as MUTAG.txt or COLLAB.txt. What does the data in those files mean?

Results decrease after shuffling the dataset

Hi,

I have a question about the results. I ran the code (UGformerV1_PyTorch/train_UGformerV1_UnSup.py) on a shuffled dataset, and the result drops sharply compared to the unshuffled dataset (please correct me if I ran it incorrectly and the result should stay the same after shuffling). I wonder what the reason is...

I found that, when the dataset is not shuffled, the graph order is strongly related to the graph labels in the original dataset (e.g., the first half of the dataset has label 0), and so is the global node id. But I don't know where the model (Transformer or SampledSoftmax) uses the global node id information...

Thanks

Error while running "train_pytorch_U2GNN_UnSup.py"

Hi, I am seeing the following error while running "train_pytorch_U2GNN_UnSup.py".
It says "cannot import name 'LogUniformSampler' from 'log_uniform' (unknown location)", but the file is already present in the mentioned path. I am not sure why I am getting this error.
Please help me solve this issue.

Traceback (most recent call last):
File "train_pytorch_U2GNN_UnSup.py", line 12, in
from pytorch_U2GNN_UnSup import *
File "/mnt/c/Users/sanagraw/Documents/Graph/Graph-Transformer/U2GNN_pytorch/pytorch_U2GNN_UnSup.py", line 6, in
from sampled_softmax import *
File "/mnt/c/Users/sanagraw/Documents/Graph/Graph-Transformer/U2GNN_pytorch/sampled_softmax.py", line 7, in
from log_uniform import LogUniformSampler
ImportError: cannot import name 'LogUniformSampler' from 'log_uniform' (unknown location)
