deepgraphlearning / nbfnet
Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
License: MIT License
Hello,
Thank you for your wonderful work.
Could you provide the NBFNet code for WikiKG90M? How can I reproduce that result? I would appreciate your help.
Hi
Do you have pretrained models available for FB15k-237/WN18RR that I can use to run evaluation?
Thanks
Hello,
I followed the instructions to install the torchdrug-related packages and the matching PyTorch/CUDA versions. However, I got the following error when the code initializes. Any ideas on how to fix this? The system has intel/19.0.3.199 loaded.
01:24:15 Epoch 0 begin
Traceback (most recent call last):
File "script/run.py", line 62, in <module>
train_and_validate(cfg, solver)
File "script/run.py", line 27, in train_and_validate
solver.train(**kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/core/engine.py", line 143, in train
loss, metric = model(batch)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/tasks/reasoning.py", line 85, in forward
pred = self.predict(batch, all_loss, metric)
File "~/Workspace/Python/NBFNet/nbfnet/task.py", line 288, in predict
pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "<decorator-gen-888>", line 2, in bellmanford
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 56, in wrapper
return forward(self, *args, **kwargs)
File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/layers/conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "~/Workspace/Python/NBFNet/nbfnet/layer.py", line 124, in message_and_aggregate
adjacency = graph.adjacency.transpose(0, 1)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
result = self.func(obj)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/data/graph.py", line 658, in adjacency
return utils.sparse_coo_tensor(self.edge_list.t(), self.edge_weight, self.shape)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 182, in sparse_coo_tensor
return torch_ext.sparse_coo_tensor_unsafe(indices, values, size)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 27, in __getattr__
return getattr(self.module, key)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
result = self.func(obj)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
_write_ninja_file_and_build_library(
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1378, in _write_ninja_file_and_build_library
check_compiler_abi_compatibility(compiler)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
if not check_compiler_ok_for_platform(compiler):
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 249, in check_compiler_ok_for_platform
version_string = subprocess.check_output([compiler, '-v'], stderr=subprocess.STDOUT).decode()
File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['icpc', '-v']' returned non-zero exit status 1.
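The last frames show PyTorch's extension builder running icpc -v and failing, so the Intel compiler is being picked up for the JIT build. A minimal diagnostic sketch follows. It assumes the builder takes its compiler from the CXX environment variable and falls back to c++ when CXX is unset; the function name suggest_cxx and the candidate list are hypothetical and not part of NBFNet or torchdrug.

```python
import os
import shutil


def suggest_cxx(env=None, candidates=("g++", "c++"), which=shutil.which):
    """Return (current compiler, suggested replacement or None).

    Assumption: the extension builder reads the compiler from the CXX
    environment variable and falls back to "c++" when CXX is unset.
    """
    env = os.environ if env is None else env
    current = env.get("CXX", "c++")
    # Only suggest a change when an Intel compiler driver is selected.
    if os.path.basename(current) not in ("icpc", "icc"):
        return current, None
    for name in candidates:
        if which(name):  # first candidate found on PATH
            return current, name
    return current, None


if __name__ == "__main__":
    current, suggestion = suggest_cxx()
    print("CXX in use:", current)
    if suggestion:
        print("Consider: export CXX=" + suggestion)
```

If the script suggests a replacement, exporting CXX before launching script/run.py (or unloading the Intel module) may be worth trying. Whether that resolves this particular setup is not confirmed here.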
First of all, thanks for the awesome code!
The authors state that they follow the experimental settings of GraIL, which draws 50 negative triplets for each positive triplet and uses filtered ranking. However, I cannot find the code that draws the 50 negative samples. Could the authors point me to it?
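For readers unfamiliar with the protocol in question, here is a rough sketch of ranking one positive triplet against 50 sampled, filtered negatives. It is not the repository's code, and GraIL's exact procedure may differ. All function names are hypothetical, sampling is with replacement, and ties count against the positive.

```python
import random


def sample_negative_tails(head, relation, num_entity, known, num_negative=50, rng=None):
    """Draw tails t such that (head, relation, t) is not a known triplet.

    "Filtered" here means known true triplets are never used as negatives.
    Sampling is with replacement, so duplicates are possible.
    """
    rng = rng or random.Random(0)
    negatives = []
    while len(negatives) < num_negative:
        t = rng.randrange(num_entity)
        if (head, relation, t) not in known:
            negatives.append(t)
    return negatives


def rank_against_negatives(pos_score, neg_scores):
    """Rank of the positive among its negatives (1 = best, ties are pessimistic)."""
    return 1 + sum(1 for s in neg_scores if s >= pos_score)


def hits_at_k(ranks, k=10):
    """Fraction of positives ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    known = {(0, 0, 1), (0, 0, 2)}
    tails = sample_negative_tails(0, 0, num_entity=100, known=known)
    print(len(tails), rank_against_negatives(0.9, [0.1, 0.95, 0.3]))
```

With 50 negatives the worst possible rank is 51, so metrics such as Hits@10 are not comparable to those computed against all entities.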
Hi!
Could you provide the code required for training on WikiKG90M?
Hello,
I was wondering whether there is a particular reason for adding the inverse relations in the graph.
Thanks in advance for your answer!
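To make the question concrete, here is a sketch of the augmentation usually meant by "adding inverse relations". A commonly cited motivation is that messages can then flow along an edge in both directions, though the authors' own reason is not stated here. The id offset (relation + num_relation) and the function name are assumptions for illustration, not taken from the repository.

```python
def add_inverse_relations(triplets, num_relation):
    """Return the triplets plus one inverse triplet per original.

    Each (head, relation, tail) gains (tail, relation + num_relation, head),
    so the augmented graph has 2 * num_relation relation types.
    """
    inverse = [(t, r + num_relation, h) for h, r, t in triplets]
    return list(triplets) + inverse


if __name__ == "__main__":
    edges = [(0, 0, 1), (1, 1, 2)]
    print(add_inverse_relations(edges, num_relation=2))
```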
Hi,
Thank you for your wonderful work and open source code.
How can I train on other KGs, such as FB15K, for the KG completion task?
Thanks very much.
Hi, Doctor. I ran into some problems when running the code on Linux.
I really need your help. Could you help me? This has been troubling me a lot.
15:43:32 Preprocess training set
15:43:36 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
15:43:36 Epoch 0 begin
Traceback (most recent call last):
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1666, in _run_ninja_build
subprocess.run(
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "script/run.py", line 62, in <module>
train_and_validate(cfg, solver)
File "script/run.py", line 27, in train_and_validate
solver.train(**kwargs)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/core/engine.py", line 143, in train
loss, metric = model(batch)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/tasks/reasoning.py", line 85, in forward
pred = self.predict(batch, all_loss, metric)
File "/data1/home/wza/nbfnet/nbfnet/task.py", line 288, in predict
pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/home/wza/nbfnet/nbfnet/model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 56, in wrapper
return forward(self, *args, **kwargs)
File "/data1/home/wza/nbfnet/nbfnet/model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "/data1/home/wza/nbfnet/nbfnet/layer.py", line 140, in message_and_aggregate
sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/spmm.py", line 378, in generalized_rspmm
return Function.apply(sparse.coalesce(), relation, input)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/spmm.py", line 172, in forward
forward = spmm.rspmm_add_mul_forward_cuda
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 27, in __getattr__
return getattr(self.module, key)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
result = self.func(obj)
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1080, in load
return _jit_compile(
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in _jit_compile
_write_ninja_file_and_build_library(
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1405, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'spmm': [1/3] /usr/local/cuda-10.2/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu -o rspmm.cuda.o
FAILED: rspmm.cuda.o
/usr/local/cuda-10.2/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu -o rspmm.cuda.o
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu: In instantiation of ‘at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()>::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’:
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:600: required from ‘struct at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:608: required from ‘at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:607: required from ‘struct at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:28: required from ‘at::Tensor at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:356:193: required from here
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:244:37: internal compiler error: in tsubst_copy, at cp/pt.c:13189
const int num_row_block = (num_row + row_per_block - 1) / row_per_block;
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
[2/3] /usr/local/cuda-10.2/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
FAILED: spmm.cuda.o
/usr/local/cuda-10.2/bin/nvcc -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu: In instantiation of ‘at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()>::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’:
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:506: required from ‘struct at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:514: required from ‘at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:512: required from ‘struct at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:28: required from ‘at::Tensor at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:315:157: required from here
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:217:37: internal compiler error: in tsubst_copy, at cp/pt.c:13189
const int num_row_block = (num_row + row_per_block - 1) / row_per_block;
^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
ninja: build stopped: subcommand failed.
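In a log this long, the useful lines are the FAILED markers and the compiler diagnostics. Here both failures are "internal compiler error: in tsubst_copy", and the README.Bugs path points at gcc-5 as the host compiler. A small sketch for pulling such lines out of a saved build log follows; the function name and the pattern list are hypothetical choices, not part of PyTorch or ninja.

```python
import re

# Lines worth keeping: ninja's FAILED markers and compiler diagnostics.
_PATTERN = re.compile(r"(^FAILED:|internal compiler error|\berror:)")


def summarize_build_log(text):
    """Return only the lines that report a failed target or a compiler error."""
    return [line.strip() for line in text.splitlines() if _PATTERN.search(line)]


if __name__ == "__main__":
    sample = (
        "[1/3] nvcc -c rspmm.cu -o rspmm.cuda.o\n"
        "FAILED: rspmm.cuda.o\n"
        "rspmm.cu:244:37: internal compiler error: in tsubst_copy, at cp/pt.c:13189\n"
        "ninja: build stopped: subcommand failed.\n"
    )
    for line in summarize_build_log(sample):
        print(line)
```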
Congratulations to the authors on NeurIPS'21! Looking forward to your talk at LoGaG.
While installing the project on VMs and local systems, I've been running into multiple issues getting the correct package versions installed, be it CUDA errors while installing torch-scatter and torchdrug or simply pybind11 issues. Having a Dockerfile would help prevent such errors and make reproducibility and experimentation easier.
I think it would be easier and better to have a Docker image for torchdrug itself, with the NBFNet image using it as the base image. More than happy to take this up.
This way one could also use the NVIDIA Container Toolkit to run experiments across multiple GPUs/nodes easily.
Hi,
I tried a new model on NBFNet and then tried to load it, but loading fails. The issue seems to come from torchdrug/patch.py. Do you have a good solution for this?
Traceback (most recent call last):
File "script/run.py", line 60, in <module>
solver = util.build_solver(cfg, dataset)
File "/shared-datadrive/shared-training/NBFNet/nbfnet/util.py", line 120, in build_solver
solver.load(cfg.checkpoint)
File "/home/azureuser/.pyenv/versions/nbfnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/core/engine.py", line 231, in load
self.model.load_state_dict(state["model"])
File "/home/azureuser/.pyenv/versions/nbfnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for KnowledgeGraphCompletion:
While copying the parameter named "graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>
While copying the parameter named "fact_graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>
I also checked in pdb that nn.Module is indeed overwritten by PatchedModule:
-> self.model.load_state_dict(state["model"])
(Pdb) nn.Module
<class 'torchdrug.patch.PatchedModule'>
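The error message says that the checkpoint holds Graph objects under the keys "graph" and "fact_graph", where the module doing the loading expects tensors. Below is a small diagnostic sketch for listing such entries before calling load_state_dict. The function name is hypothetical, and the tensor type is passed in so that the snippet runs without PyTorch; it only inspects the checkpoint and is not a fix.

```python
def non_tensor_entries(state_dict, tensor_type):
    """Map each key whose value is not a tensor_type instance to its type name."""
    return {
        key: type(value).__name__
        for key, value in state_dict.items()
        if not isinstance(value, tensor_type)
    }


if __name__ == "__main__":
    # Stand-in classes so the demo runs without torch or torchdrug installed.
    class FakeTensor:
        pass

    class Graph:
        pass

    state = {"model.weight": FakeTensor(), "graph": Graph(), "fact_graph": Graph()}
    print(non_tensor_entries(state, FakeTensor))
```

With PyTorch, one would pass torch.Tensor as tensor_type and state["model"] from the loaded checkpoint as state_dict. If "graph" and "fact_graph" show up, the loading code may not be going through the same patched module that saved them, but that reading is a guess.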
Hello!
First of all thanks so much for this awesome publication & codebase.
I'm in the process of tweaking a config for training NBFNet, and trying to understand the proportion of the training triples used when training on ogbl-biokg using the provided config config/knowledge_graph/ogbl-biokg.yaml.
Since batch_size: 8, batch_per_epoch: 200 and num_epoch: 10, and the number of training triples in ogbl-biokg is 4,762,678, is it correct to assume that only (8 * 200 * 10)/4,762,678 = 0.00335... ≈ 0.34% of the training triples are used for the entire training run?
It seems very small and I'm most likely missing some vital implementation details - I'd appreciate your help.
Thanks so much!
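The arithmetic in the question can be checked with a few lines. This sketch assumes each batch item is one positive training triple and that no triple is drawn twice, so the result is an upper bound. The num_gpus parameter is hypothetical: whether batch_size applies per GPU is not stated in the question.

```python
def sampled_fraction(batch_size, batch_per_epoch, num_epoch, num_train, num_gpus=1):
    """Upper bound on the fraction of training triples seen during a run."""
    samples = batch_size * batch_per_epoch * num_epoch * num_gpus
    return samples / num_train


if __name__ == "__main__":
    frac = sampled_fraction(8, 200, 10, 4_762_678)
    print(f"{frac:.5f} = {100 * frac:.2f}%")  # 0.00336 = 0.34%
```

Under these assumptions 16,000 of 4,762,678 triples are touched. With num_gpus=4 the bound would rise to about 1.34%.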
Hello. I am having difficulty reproducing the results of the SEAL baselines in the paper. Could you please provide more details, for example the code base and the hyperparameters you used?
Hi, I followed the instructions to reproduce the results but had a problem with the module 'spmm'. My torch version is 1.8.2 and torchdrug is 0.1.2. Any ideas on how to fix it?
12:53:15 Epoch 0 begin
Traceback (most recent call last):
File "script/run.py", line 78, in <module>
File "script/run.py", line 30, in train_and_validate
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\core\engine.py", line 143, in train
loss, metric = model(batch)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\tasks\reasoning.py", line 85, in forward
pred = self.predict(batch, all_loss, metric)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\task.py", line 288, in predict
pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 56, in wrapper
return forward(self, *args, **kwargs)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\layer.py", line 140, in message_and_aggregate
sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 378, in generalized_rspmm
return Function.apply(sparse.coalesce(), relation, input)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 172, in forward
forward = spmm.rspmm_add_mul_forward_cuda
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 27, in __getattr__
return getattr(self.module, key)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 21, in __get__
result = self.func(obj)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1079, in load
return _jit_compile(
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1317, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1700, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\imp.py", line 296, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'spmm'
Hey,
Most likely this is an error with torchdrug itself; however, when I try to run any of the examples from the README, the code crashes with the following error:
spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
FAILED: spmm.cuda.o
/opt/scp/software/CUDA/11.1.0/bin/nvcc --generate-dependencies-with-compile --dependency-output spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/user/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
/opt/software/CUDA/11.1.0/include/cuComplex.h: In function ‘float cuCabsf(cuFloatComplex)’:
/opt/software/CUDA/11.1.0/include/cuComplex.h:179:16: error: expected ‘)’ before numeric constant
This only occurs on a GPU Linux machine, which uses CUDA 11.1 and GCC 10.3.
The conda env is as follows:
blas 1.0 mkl
boost 1.74.0 py38hc10631b_3 conda-forge
boost-cpp 1.74.0 h9359b55_0 conda-forge
brotlipy 0.7.0 py38h497a2fe_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
cairo 1.16.0 h3fc0475_1005 conda-forge
certifi 2021.10.8 py38h578d9bd_1 conda-forge
cffi 1.15.0 py38hd667e15_1
charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 35.0.0 py38ha5dfef3_0 conda-forge
cudatoolkit 11.1.1 h6406543_8 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
decorator 4.4.2 py_0 conda-forge
easydict 1.9 py_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
glib 2.69.1 h4ff587b_1
icu 67.1 he1b5a44_0 conda-forge
idna 3.3 pyhd8ed1ab_0 conda-forge
intel-openmp 2021.4.0 h06a4308_3561
jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.1 py38h2531618_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 h14aa051_19 conda-forge
libgfortran4 7.5.0 h14aa051_19 conda-forge
libgomp 9.3.0 h5101ec6_17
libiconv 1.16 h516909a_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.0.10 hc3755c2_1005 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libuv 1.42.0 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.10 h68273f3_2 conda-forge
littleutils 0.2.2 py_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.0.1 py38h497a2fe_0 conda-forge
matplotlib 3.2.2 1 conda-forge
matplotlib-base 3.2.2 py38h5d868c9_1 conda-forge
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py38h497a2fe_0 conda-forge
mkl_fft 1.3.1 py38hd3c417c_0
mkl_random 1.2.2 py38h1abd341_0 conda-forge
ncurses 6.3 h7f8727e_2
networkx 2.5.1 pyhd8ed1ab_0 conda-forge
ninja 1.10.2 h4bd325d_0 conda-forge
numpy 1.21.2 py38h20f2e39_0
numpy-base 1.21.2 py38h79a1101_0
ogb 1.3.2 pyhd8ed1ab_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openssl 1.1.1m h7f8727e_0
outdated 0.2.1 pyhd8ed1ab_0 conda-forge
pandas 1.2.5 py38h1abd341_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 6.2.1 py38h6b7be26_0 conda-forge
pip 21.2.4 py38h06a4308_0
pixman 0.38.0 h516909a_1003 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycairo 1.20.1 py38hf61ee4a_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 py38h578d9bd_4 conda-forge
python 3.8.12 h12debd9_0
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytorch 1.8.2 py3.8_cuda11.1_cudnn8.0.5_0 pytorch-lts
pytorch-scatter 2.0.8 py38_torch_1.8.0_cu111 pyg
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py38h497a2fe_0 conda-forge
rdkit 2020.09.5 py38h2bca085_0 conda-forge
readline 8.1.2 h7f8727e_1
reportlab 3.5.68 py38hadf75a6_0 conda-forge
requests 2.27.1 pyhd8ed1ab_0 conda-forge
scikit-learn 1.0.2 py38h51133e4_1
scipy 1.7.3 py38hc147768_0
setuptools 58.0.4 py38h06a4308_0
six 1.16.0 pyh6c4a22f_0 conda-forge
sqlalchemy 1.3.23 py38h497a2fe_0 conda-forge
sqlite 3.37.0 hc218d9a_0
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h1ccaba5_0
torchdrug 0.1.2 ha710097 milagraph
tornado 6.1 py38h497a2fe_1 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
typing_extensions 4.0.1 pyha770c72_0 conda-forge
urllib3 1.26.8 pyhd8ed1ab_1 conda-forge
wheel 0.37.1 pyhd3eb1b0_0
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h516909a_0 conda-forge
zlib 1.2.11 h7f8727e_4
zstd 1.4.9 ha95c52a_0 conda-forge
Any ideas how to get this to run?
Many thanks!
Hi there.
I tried running this code on one of my machines with four RTX 3090 GPUs (24GB of memory each):
python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]
I did not change any other part of this repo. However, I encountered a CUDA error saying that more GPU memory was needed. I then changed the command as follows:
python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]
and ran it on a machine with one A100 GPU with 40GB of memory. The code ran successfully and used roughly 32GB of GPU memory. I am puzzled by this: why does the code not make use of the total 24GB*4=96GB of GPU memory, and still report a memory error? Is there something wrong with my setup?
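One possible explanation, offered as an assumption rather than a confirmed diagnosis: `torch.distributed.launch` starts one process per GPU, and data-parallel training keeps a full model replica plus that process's own batch on each device, so memory is not pooled across GPUs. If the batch size in the config is applied per process, the per-GPU requirement stays close to the roughly 32GB observed on the A100, which exceeds 24GB. The helper name `fits_on_each_gpu` below is hypothetical, and the arithmetic only illustrates this reasoning.

```python
def fits_on_each_gpu(required_gb_per_process: float, gpu_capacity_gb: float) -> bool:
    """Return True if one process's working set fits on a single GPU.

    Assumption: data-parallel training replicates the model on every GPU,
    so the limit is the per-device capacity, not the sum over all devices.
    """
    return required_gb_per_process <= gpu_capacity_gb


# Figure taken from the report above: roughly 32GB used on one A100 (40GB).
observed_gb = 32.0

print(fits_on_each_gpu(observed_gb, 40.0))  # A100 with 40GB
print(fits_on_each_gpu(observed_gb, 24.0))  # RTX 3090 with 24GB
```

Under this assumption, the 96GB total is never available to a single process, and reducing the per-process batch size would be the first thing to try on 24GB cards.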
Hi! I followed the instructions to install the packages, but now I'm getting an ImportError when reproducing the results. The error is as follows. I also tried rm -r ~/.cache/torch_extensions/*
as suggested in the README, but that caused more errors.
Traceback (most recent call last):
File "script/run.py", line 69, in <module>
train_and_validate(cfg, solver)
File "script/run.py", line 28, in train_and_validate
solver.evaluate("test")
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/core/engine.py", line 206, in evaluate
pred, target = model.predict_and_target(batch)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/tasks/task.py", line 27, in predict_and_target
return self.predict(batch, all_loss, metric), self.target(batch)
File "/home/lja/git_clone/NBFNet/nbfnet/task.py", line 277, in predict
t_pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 88, in wrapper
result = forward(self, *args, **kwargs)
File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "/home/lja/git_clone/NBFNet/nbfnet/layer.py", line 140, in message_and_aggregate
sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 378, in generalized_rspmm
return Function.apply(sparse.coalesce(), relation, input)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 172, in forward
forward = spmm.rspmm_add_mul_forward_cuda
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 27, in getattr
return getattr(self.module, key)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 21, in get
result = self.func(obj)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1382, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1776, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 556, in module_from_spec
File "", line 1166, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /home/lja/.cache/torch_extensions/spmm_0/spmm.so: cannot open shared object file: No such file or directory
I'm using torch 1.11 + CUDA 11.3 and torchdrug 0.1.2.
Do you know how to deal with this? Any help is appreciated!
By the way, I noticed in other issues that an environment.yml would be released. Where can I find it? Thanks!
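A minimal sketch for checking the state of the JIT build cache named in the ImportError, assuming the cache location `~/.cache/torch_extensions` and the extension directory `spmm_0` shown in the traceback. The function name `inspect_extension_cache` is hypothetical. It only reports what is on disk and does not rebuild anything.

```python
from pathlib import Path


def inspect_extension_cache(cache_root: Path, name: str = "spmm_0") -> dict:
    """Report whether the build directory and its shared objects exist."""
    build_dir = cache_root / name
    if build_dir.is_dir():
        shared_objects = sorted(p.name for p in build_dir.glob("*.so"))
    else:
        shared_objects = []
    return {
        "build_dir_exists": build_dir.is_dir(),
        "shared_objects": shared_objects,
        # Assumption: a leftover "lock" file may point to an interrupted build.
        "lock_present": (build_dir / "lock").exists(),
    }


if __name__ == "__main__":
    # Path taken from the traceback; adjust if the cache lives elsewhere.
    print(inspect_extension_cache(Path.home() / ".cache" / "torch_extensions"))
```

If the build directory exists but contains no `spmm.so`, the compilation step probably failed or was interrupted, so the compiler output from the first run is the more useful thing to look at than the ImportError itself.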
Hi,
I found that the Hits@10 of RotatE on FB15k-237 (0.553) is higher than in the original paper (0.533), while the other metrics are the same.
Is this a recording error, or did you improve the performance of RotatE?