daiquocnguyen / convkb

A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network (NAACL 2018) (Pytorch and Tensorflow)

License: Apache License 2.0

Python 69.85% Shell 1.82% C++ 12.68% Objective-C 2.68% C 12.97%

knowledge-base-completion knowledge-base-embeddings link-prediction knowledge-graph-completion knowledge-graph-embeddings wn18rr fb15k237 convolutional-neural-network pytorch-implementation knowledge-graphs

convkb's Introduction

A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network

This program provides the implementation of the CNN-based model ConvKB for knowledge graph embeddings as described in the paper. ConvKB uses a convolution layer with different filters of the same 1 × 3 shape and then concatenates output feature maps into a single vector which is multiplied by a weight vector to produce a score for the given triple.
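The scoring step described above can be sketched in plain Python. This is only an illustration, not the repository's implementation: the function name `convkb_score`, the ReLU activation, and the absence of bias terms are assumptions made for the example.

```python
def relu(x):
    """Rectified linear unit (assumed activation for this sketch)."""
    return x if x > 0.0 else 0.0


def convkb_score(h, r, t, filters, weights):
    """Score a triple from k-dimensional embeddings h, r and t.

    filters: list of 1 x 3 filters, each a (w_h, w_r, w_t) tuple.
    weights: flat weight vector of length len(filters) * k.
    """
    k = len(h)
    if len(r) != k or len(t) != k:
        raise ValueError("h, r and t must have the same dimension")
    if len(weights) != len(filters) * k:
        raise ValueError("weights must have len(filters) * k entries")

    features = []
    for w_h, w_r, w_t in filters:
        # Slide the 1 x 3 filter over the k rows of the matrix [h, r, t];
        # each filter yields one feature map with k entries.
        for i in range(k):
            features.append(relu(w_h * h[i] + w_r * r[i] + w_t * t[i]))

    # The concatenated feature maps are multiplied by the weight vector.
    return sum(f * w for f, w in zip(features, weights))


# Hypothetical 2-dimensional embeddings and two hand-picked filters.
score = convkb_score(
    h=[1.0, 2.0], r=[0.5, -1.0], t=[1.5, 1.0],
    filters=[(1.0, 1.0, -1.0), (1.0, 0.0, 0.0)],
    weights=[1.0, 1.0, 1.0, 1.0],
)
print(score)
```

With the filter (1, 1, -1), each feature equals h[i] + r[i] - t[i], which is zero for this toy triple; the second filter simply copies h.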

Usage

News

June 13, 2020: Updated the Pytorch (1.5.0) implementation. The ConvKB Pytorch implementation, which is based on the OpenKE framework, addresses issue #5.
May 30, 2020: The Tensorflow implementation was completed approximately three years ago and is now out of date.
March 06, 2018: Note that our Tensorflow implementation can leverage different filters of different n × 3 shapes, so we can tune the hyper-parameter n. In our paper, we set n to 1 for simplification.

Requirements

Python 3.6
Pytorch 1.5.0 or Tensorflow 1.6

Training

For the Pytorch implementation, first run "bash make.sh" to compile the base package, and then run commands such as:

$ python train_ConvKB.py --dataset WN18RR --hidden_size 50 --num_of_filters 64 --neg_num 10 --valid_step 50 --nbatches 100 --num_epochs 300 --learning_rate 0.01 --lmbda 0.2 --model_name WN18RR_lda-0.2_nneg-10_nfilters-64_lr-0.01 --mode train

$ python train_ConvKB.py --dataset FB15K237 --hidden_size 100 --num_of_filters 128 --neg_num 10 --valid_step 50 --nbatches 100 --num_epochs 300 --learning_rate 0.01 --lmbda 0.1 --model_name FB15K237_lda-0.1_nneg-10_nfilters-128_lr-0.01 --mode train

Dataset	MR	MRR	Hits@10
WN18RR	2741	0.220	50.8
FB15K-237	196	0.302	48.3
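For readers unfamiliar with the columns above, the following minimal sketch shows how MR, MRR and Hits@10 are commonly computed from a list of 1-based ranks. The function name `ranking_metrics` is hypothetical and this is not the repository's evaluation code; Hits@k is returned as a percentage, which appears to match the table.

```python
def ranking_metrics(ranks, k=10):
    """Return (MR, MRR, Hits@k in percent) for a list of 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(rank < 1 for rank in ranks):
        raise ValueError("ranks are 1-based, so every rank must be >= 1")

    n = len(ranks)
    mean_rank = sum(ranks) / n
    mean_reciprocal_rank = sum(1.0 / rank for rank in ranks) / n
    hits_at_k = 100.0 * sum(1 for rank in ranks if rank <= k) / n
    return mean_rank, mean_reciprocal_rank, hits_at_k


# Hypothetical ranks of the correct entity for four test triples.
mr, mrr, hits10 = ranking_metrics([1, 2, 20, 4])
print(mr, mrr, hits10)
```

Because every rank is at least 1, each reciprocal rank is at most 1, so an MRR computed this way can never exceed 1.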

Cite

Please cite the paper whenever ConvKB is used to produce published results or incorporated into other software:

@inproceedings{Nguyen2018,
  author={Dai Quoc Nguyen and Tu Dinh Nguyen and Dat Quoc Nguyen and Dinh Phung},
  title={A Novel Embedding Model for Knowledge Base Completion Based on Convolutional Neural Network},
  booktitle={Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)},
  pages={327--333},
  year={2018}
}

License

I would greatly appreciate your bug reports, comments, and suggestions about ConvKB. As a free open-source implementation, ConvKB is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

ConvKB is licensed under the Apache License 2.0.

convkb's Issues

Finding validation loss

Can you tell me how to find validation loss in your code? Thank you in advance.

Training on FB15k-237 without pre-trained embeddings.

Hi,
I am unable to reproduce the results reported on FB15k-237 without using the given pre-trained embeddings. Also, providing --pre-trained as True/False doesn't make any difference because in either case it is set to True (reference). The provided training command for FB15k-237 is

python train.py --embedding_dim 100 --num_filters 50 --learning_rate 0.000005 --name FB15k-237 --useConstantInit --model_name fb15k237

This uses a very small learning rate (almost 0), so basically it is not changing the provided pre-trained embeddings during training. Please provide the hyperparameters you used for learning the pre-trained embeddings.

Thanks in advance

Hi, I tried to implement this model myself using TransE embeddings that I trained myself, but I cannot reach the results reported in your paper. I have tried many times. Please help!

How can I get entity2vec100.init?

I am trying to run this on another dataset. How can I get entity2vec100.init?

Interpretability of results/test

Hi, appreciate your work on knowledge graphs!

After running the code (in PyTorch) I do not get Hits@10, MR or MRR values.
Am I interpreting the results wrongly, or are they not printed?

Best regards

no suitable image found. Did find:

`python train_ConvKB.py --dataset FB15K237 --hidden_size 100 --num_of_filters 128 --neg_num 10 --valid_step 50 --nbatches 100 --num_epochs 300 --learning_rate 0.01 --lmbda 0.1 --model_name FB15K237_lda-0.1_nneg-10_nfilters-128_lr-0.01 --mode train
Namespace(checkpoint_path=None, dataset='FB15K237', dropout=0.5, hidden_size=100, kernel_size=1, learning_rate=0.01, lmbda=0.1, lmbda2=0.01, mode='train', model_name='FB15K237_lda-0.1_nneg-10_nfilters-128_lr-0.01', nbatches=100, neg_num=10, num_epochs=300, num_of_filters=128, optim='adagrad', save_steps=1000, test_file='', use_init=1, valid_steps=50)
Writing to /Users/***/Downloads/ConvKB-master/runs_pytorch_ConvKB

Traceback (most recent call last):
File "train_ConvKB.py", line 42, in <module>
con = Config()
File "/Users//Downloads/ConvKB-master/ConvKB_pytorch/Config.py", line 34, in __init__
self.lib = ctypes.cdll.LoadLibrary(base_file)
File "/Users//opt/anaconda3/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
return self._dlltype(name)
File "/Users//opt/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users//Downloads/ConvKB-master/ConvKB_pytorch/release/Base.so, 6): no suitable image found. Did find:
/Users//Downloads/ConvKB-master/ConvKB_pytorch/release/Base.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
/Users//Downloads/ConvKB-master/ConvKB_pytorch/release/Base.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00`

Environment: macOS, torch 1.7.0.
When I run the command, I get the error above.
Why does this happen, and how can I fix it?
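A note on reading the error: the eight bytes quoted in the traceback begin with 0x7F 0x45 0x4C 0x46, which is the ELF magic number used by Linux binaries, whereas dlopen on macOS expects a Mach-O file. This suggests that the shipped Base.so was built on Linux and would probably need to be recompiled on the Mac, for example by rerunning the make.sh step from the README. The sketch below only classifies a file header by its magic bytes; the function name `binary_format` is hypothetical and not part of ConvKB.

```python
# Magic numbers of Mach-O files (32-bit, 64-bit and universal binaries,
# in both byte orders). The universal magic 0xCAFEBABE is also used by
# Java class files, so this check is only a heuristic.
MACH_O_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe", b"\xbe\xba\xfe\xca",
}


def binary_format(first_bytes):
    """Guess the binary format from the first four bytes of a file."""
    magic = bytes(first_bytes[:4])
    if magic == b"\x7fELF":
        return "ELF"      # Linux and most other Unix systems
    if magic in MACH_O_MAGICS:
        return "Mach-O"   # macOS
    return "unknown"


# The header bytes reported in the traceback above.
header = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
print(binary_format(header))  # prints "ELF"
```

To inspect a file on disk, the first bytes can be read with `open(path, "rb").read(4)` and passed to the same function.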

About python train.py

Hi,
When I run 'CUDA_VISIBLE_DEVICES=2 python train.py --embedding_dim 100 --num_filters 400 --learning_rate 0.00005 --name WN18RR --num_epochs 101 --saveStep 100 --model_name wn18rr_400_3',
the program hangs at 'Writing to /home/***/code/runs/fb15k237' and GPU-Util stays at 0.
The 'checkpoints' directory is empty. Why does this happen?
Thanks!

A problem in eval.py

I found a problem in eval.py. When the scores of the candidate entities are all the same, every entity is ranked 1. Can you explain this?
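One way such behaviour can arise, shown here as a sketch rather than a description of what eval.py actually does, is when the rank is defined as one plus the number of candidates with a strictly better score. The function name `rank_of_target` and the tie policies are hypothetical, and the sketch assumes that a lower score means a more plausible triple.

```python
def rank_of_target(scores, target_index, tie_policy="mean"):
    """Return the 1-based rank of scores[target_index].

    Assumes lower scores are better. Ties are resolved by tie_policy:
    "optimistic" (target ranked first among ties), "pessimistic"
    (target ranked last among ties) or "mean" (average of the two).
    """
    target = scores[target_index]
    better = sum(1 for s in scores if s < target)
    # Number of other candidates that share the target's score.
    ties = sum(1 for s in scores if s == target) - 1

    if tie_policy == "optimistic":
        return better + 1
    if tie_policy == "pessimistic":
        return better + ties + 1
    if tie_policy == "mean":
        return better + 1 + ties / 2.0
    raise ValueError("unknown tie_policy: %r" % (tie_policy,))


# Four candidates with identical scores; the target is candidate 2.
same = [0.5, 0.5, 0.5, 0.5]
print(rank_of_target(same, 2, "optimistic"))   # prints 1
print(rank_of_target(same, 2, "pessimistic"))  # prints 4
print(rank_of_target(same, 2, "mean"))         # prints 2.5
```

With the optimistic policy, a model that outputs a constant score for every candidate would obtain a perfect rank of 1, which is why the mean or pessimistic policy is often considered the safer choice for evaluation.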

No file named 'model-200.eval.0.txt'

Thank you for your code, it helps me a lot!
I am a beginner, and my English is poor.
I ran the code following your README.
When I run the eval code, I get the following error:

[FileNotFoundError]: No such file or directory: 'XXX/runs/fb15k237/checkpoints/model-200.eval.0.txt'

The "checkpoints" directory contains four files:
checkpoint model-200.data-00000-of-00001 model-200.index model-200.meta

How to use my own dataset?

I want to use my own dataset with the Pytorch implementation, but I don't know how to generate the .init file.

How to run ConvKB on Windows

ConvKB does not work on Windows. Do you have a Windows version?

How to implement ConvKB on triplet classification

Hi,
I have read your paper at http://www.semantic-web-journal.net/system/files/swj1867.pdf and found that you have applied ConvKB to the triplet classification task.
I tried to implement it myself on the WN11 dataset but failed: the model does not fit the test set or the dev set. I guess the key lies in the score function and the threshold.
I would appreciate it if you could provide the code for this task so that I can find where my mistake is.
Thanks a lot!
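For context, a common way to set the threshold in triple classification is to choose the value that maximises accuracy on the dev set, often separately for each relation. The sketch below illustrates that idea only; it is not the authors' code, the function name `best_threshold` is hypothetical, and it assumes that a lower score means a more plausible triple, so a triple is predicted valid when its score is at or below the threshold.

```python
def best_threshold(valid_scores, invalid_scores):
    """Return (threshold, accuracy) maximising dev-set accuracy.

    A triple is predicted valid when score <= threshold (assumption:
    lower scores are more plausible).
    """
    total = len(valid_scores) + len(invalid_scores)
    if total == 0:
        raise ValueError("at least one dev score is required")

    # Every observed score is a candidate; -inf means "predict all invalid".
    candidates = [float("-inf")] + sorted(set(valid_scores) | set(invalid_scores))

    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum(1 for s in valid_scores if s <= t)
        correct += sum(1 for s in invalid_scores if s > t)
        accuracy = correct / total
        if accuracy > best_acc:  # keep the smallest threshold on ties
            best_t, best_acc = t, accuracy
    return best_t, best_acc


# Hypothetical dev-set scores for one relation.
threshold, accuracy = best_threshold(
    valid_scores=[-2.0, -1.0, 0.5],
    invalid_scores=[0.0, 1.0, 2.0],
)
print(threshold, accuracy)
```

If the score direction is reversed in a given implementation, the two comparisons need to be flipped accordingly.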

Issue when trying to run the code

Hi there,

When I tried running "python train.py --embedding_dim 100 --num_filters 50 --learning_rate 0.000005 --name FB15k-237 --useConstantInit --model_name fb15k237", it seems the program did not create a new "runs" directory. Do you know why?

Bugs in eval.py

I used your code in my project and found that the calculated MRR is greater than 1. I'm not sure whether your code is implemented correctly.

Hyperparameter settings for different datasets

I tried to use the hyperparameters for FB15K_237 to train on NELL_995, but the results (hit-10 42.1%) are far below the benchmarks given in the KBAT paper (hit-10 54.5%). Are there recommended hyperparameter settings for other datasets, such as NELL_995?
Thanks.