thunlp / openke

An Open-Source Package for Knowledge Embedding (KE)

C++ 13.55% C 16.22% Python 70.17% Shell 0.07%

openke's Introduction

OpenKE (sub-project of OpenSKL)

OpenKE is a sub-project of OpenSKL, providing an Open-source Knowledge Embedding toolkit for knowledge representation learning (KRL), with TransR and PTransE as key features to handle complex relations and relational paths in large-scale knowledge graphs.

Overview

OpenKE is an efficient PyTorch-based implementation of knowledge embedding. Underlying operations such as data preprocessing and negative sampling are implemented in C++. Each specific model is implemented in PyTorch with Python interfaces, which makes it convenient to run models on GPUs. OpenKE contains 4 repositories:

OpenKE-PyTorch: the repository based on PyTorch, which provides the optimized and stable framework for knowledge graph embedding models.

OpenKE-Tensorflow1.0: OpenKE implemented with TensorFlow, also providing the optimized and stable framework for knowledge graph embedding models.

TensorFlow-TransX: light and simple version of OpenKE based on TensorFlow, including TransE, TransH, TransR and TransD.

Fast-TransX: efficient lightweight C++ implementations of TransE and its extended models, built on the framework of OpenKE, including TransH, TransR, TransD, TranSparse and PTransE.

More information (especially the embedding databases of popular knowledge graphs obtained by OpenKE and related documents) is available on our website http://openke.thunlp.org/

Models

Besides our proposed TransR and PTransE, we also support the following typical knowledge embedding models:

OpenKE (PyTorch):

RESCAL
DistMult, ComplEx, Analogy
TransE, TransH, TransR, TransD
SimplE
RotatE

OpenKE (Tensorflow):

RESCAL, HolE
DistMult, ComplEx, Analogy
TransE, TransH, TransR, TransD

TensorFlow-TransX (TensorFlow):

TransE, TransH, TransR, TransD

Fast-TransX (C++):

TransE, TransH, TransR, TransD, TranSparse, PTransE

We welcome issues and requests for model implementations and bug fixes.

Evaluation

To validate the effectiveness of this toolkit, we employ the link prediction task on large-scale knowledge graphs for evaluation.

Settings

For each test triplet, the head is removed and replaced by each entity in the entity set in turn. The models first score these corrupted triplets, and the scores are then sorted to obtain the rank of the correct entity. The whole procedure is repeated with the tail entity removed instead. We report the proportion of correct entities ranked in the top 10/3/1 (Hits@10, Hits@3, Hits@1). The mean rank (MR) and mean reciprocal rank (MRR) of the test triplets under this setting are also reported.

Some corrupted triplets may themselves appear in the training or validation set. Such triplets may be ranked above the test triplet, but this should not be counted as an error because both triplets are true. Hence, we remove the corrupted triplets that appear in the training, validation or test set, which ensures that the remaining corrupted triplets are not in the dataset. We report the proportion of correct entities ranked in the top 10/3/1 (Hits@10 (filter), Hits@3 (filter), Hits@1 (filter)) under this setting. The mean rank (MR (filter)) and mean reciprocal rank (MRR (filter)) of the test triplets under this setting are also reported.
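The raw and filtered settings described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenKE's own evaluation code: the function names `rank_of_true_tail` and `summarize` are hypothetical, and the sketch assumes a distance-style score where lower means more plausible.

```python
def rank_of_true_tail(scores, true_tail, known_tails=frozenset()):
    """Return the 1-based rank of true_tail among all candidate tails.

    scores[j] is the model score of the triplet (h, r, j); lower is better
    (assumption: a distance-style score). Candidates in known_tails are other
    correct answers and are skipped, which yields the filtered rank.
    An empty known_tails yields the raw rank.
    """
    rank = 1
    for j, value in enumerate(scores):
        if j == true_tail or j in known_tails:
            continue
        if value < scores[true_tail]:
            rank += 1
    return rank


def summarize(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and Hits@k from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


# Toy example: 5 entities, the true tail is entity 3.
scores = [0.9, 0.2, 0.1, 0.4, 0.7]
raw_rank = rank_of_true_tail(scores, true_tail=3)
# Entity 2 also forms a true triplet with (h, r), so it is filtered out.
filtered_rank = rank_of_true_tail(scores, true_tail=3, known_tails={2})
metrics = summarize([raw_rank, filtered_rank])
```

The same function covers head replacement if `scores` holds the scores of (j, r, t) and the known set holds the other correct heads.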

More details of the above-mentioned settings can be found in the TransE and ComplEx papers.

For large-scale entity sets, corrupting each triplet with the whole entity set is time-consuming. Hence, we also provide an experimental setting named "type constraint", which corrupts entities using limited entity sets determined by their relations.

Results

We provide hyper-parameters for some models that achieve state-of-the-art performance (Hits@10 (filter)) on FB15K237 and WN18RR. These scripts can be found in the folder "./examples/". The results of these models are as follows: the left two columns are the performance obtained with OpenKE, and the right two columns are the performance reported in the original papers. Overall, OpenKE can reproduce the results in the original papers.

Model	WN18RR	FB15K237	WN18RR (Paper*)	FB15K237 (Paper*)
TransE (2013)	0.512	0.476	0.501	0.486
TransH (2014)	0.507	0.490	-	-
TransR (2015)	0.519	0.511	-	-
TransD (2015)	0.508	0.487	-	-
DistMult (2014)	0.479	0.419	0.49	0.419
ComplEx (2016)	0.485	0.426	0.51	0.428
ConvE (2017)	0.506	0.485	0.52	0.501
RotatE (2019)	0.549	0.479	-	0.480
RotatE+adv (2019)	0.565	0.522	0.571	0.533

RotatE has the best performance by representing knowledge in complex space. Our proposed TransR has the second-best performance, and the real-valued representations learned by TransR can be more easily integrated with other neural network models, e.g. pre-trained language models. Please refer to our other toolkit, Knowledge-Plugin, for such integration.

Usage

Installation

Install PyTorch
Clone the OpenKE-PyTorch branch:

git clone -b OpenKE-PyTorch https://github.com/thunlp/OpenKE --depth 1
cd OpenKE
cd openke

Compile C++ files

bash make.sh

Quick Start

cd ../
cp examples/train_transe_FB15K237.py ./
python train_transe_FB15K237.py

Data Format

For training, datasets contain three files:

train2id.txt: training file. The first line is the number of triples for training. The following lines are all in the format (e1, e2, rel), which indicates that there is a relation rel between e1 and e2. Note that train2id.txt contains ids from entity2id.txt and relation2id.txt instead of the names of the entities and relations. If you use your own datasets, please check the format of your training file. Files in the wrong format may cause a segmentation fault.

entity2id.txt: all entities and corresponding ids, one per line. The first line is the number of entities.

relation2id.txt: all relations and corresponding ids, one per line. The first line is the number of relations.

For testing, datasets contain two additional files (five files in total):

test2id.txt: testing file, the first line is the number of triples for testing. Then the following lines are all in the format (e1, e2, rel) .

valid2id.txt: validating file, the first line is the number of triples for validating. Then the following lines are all in the format (e1, e2, rel) .

type_constrain.txt: type constraining file, the first line is the number of relations. Then the following lines are type constraints for each relation. For example, the relation with id 1200 has 4 types of head entities, which are 3123, 1034, 58 and 5733. The relation with id 1200 has 4 types of tail entities, which are 12123, 4388, 11087 and 11088. You can get this file through n-n.py in folder benchmarks/FB15K .
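Since a malformed file may cause a segmentation fault, it can help to check a triple file in Python before handing it to the C++ loader. The sketch below is a hypothetical helper (`parse_triples` is not part of OpenKE). It assumes whitespace-separated integer ids that start at 0, which the text above does not state explicitly.

```python
def parse_triples(text, num_entities, num_relations):
    """Parse train2id.txt-style content into a list of (e1, e2, rel) tuples.

    Raises ValueError on a wrong field count, an id outside the declared
    range, or a header count that does not match the number of triples.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    declared = int(lines[0])  # first line: number of triples
    triples = []
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 ids, got {len(fields)}")
        e1, e2, rel = (int(x) for x in fields)
        if not (0 <= e1 < num_entities and 0 <= e2 < num_entities):
            raise ValueError(f"line {lineno}: entity id out of range")
        if not 0 <= rel < num_relations:
            raise ValueError(f"line {lineno}: relation id out of range")
        triples.append((e1, e2, rel))
    if declared != len(triples):
        raise ValueError(f"header says {declared} triples, found {len(triples)}")
    return triples


# Two triples over 3 entities and 2 relations; note the (e1, e2, rel) order.
sample = "2\n0 1 0\n1 2 1\n"
triples = parse_triples(sample, num_entities=3, num_relations=2)
```

In practice `num_entities` and `num_relations` would come from the first lines of entity2id.txt and relation2id.txt.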

Citation

If you find OpenKE useful for your research, please consider citing the following paper:

 @inproceedings{han2018openke,
   title={OpenKE: An Open Toolkit for Knowledge Embedding},
   author={Han, Xu and Cao, Shulin and Lv, Xin and Lin, Yankai and Liu, Zhiyuan and Sun, Maosong and Li, Juanzi},
   booktitle={Proceedings of EMNLP},
   year={2018}
 }

This package is mainly contributed (in chronological order) by Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Xin Lv, Shulin Cao, Weize Chen, Jingqin Yang.

About OpenSKL

OpenSKL project aims to harness the power of both structured knowledge and natural languages via representation learning. All sub-projects of OpenSKL, under the categories of Algorithm, Resource and Application, are as follows.

Algorithm:
- OpenKE
  - An effective and efficient toolkit for representing structured knowledge in large-scale knowledge graphs as embeddings, with TransR and PTransE as key features to handle complex relations and relational paths.
  - This toolkit also includes three repositories:
- ERNIE
  - An effective and efficient toolkit for augmenting pre-trained language models with knowledge graph representations.
- OpenNE
  - An effective and efficient toolkit for representing nodes in large-scale graphs as embeddings, with TADW as key features to incorporate text attributes of nodes.
- OpenNRE
  - An effective and efficient toolkit for implementing neural networks for extracting structured knowledge from text, with ATT as key features to consider relation-associated text information.
  - This toolkit also includes two repositories:
    - JointNRE
    - NRE
Resource:
- The embeddings of large-scale knowledge graphs pre-trained by OpenKE, covering three typical large-scale knowledge graphs: Wikidata, Freebase, and XLORE. The embeddings are free to use under the MIT license, and please click the following link to submit download requests.
- OpenKE-Wikidata
  - Wikidata is a free and collaborative database, collecting structured data to provide support for Wikipedia. The original Wikidata contains 20,982,733 entities, 594 relations and 68,904,773 triplets. In particular, Wikidata-5M is the core subgraph of Wikidata, containing 5,040,986 high-frequency entities from Wikidata with their corresponding 927 relations and 24,267,796 triplets.
  - TransE version: Knowledge embeddings of Wikidata pre-trained by OpenKE.
  - TransR version of Wikidata-5M: Knowledge embeddings of Wikidata-5M pre-trained by OpenKE.
- OpenKE-Freebase
  - Freebase was a large collaborative knowledge base consisting of data composed mainly by its community members. It was an online collection of structured data harvested from many sources. Freebase contains 86,054,151 entities, 14,824 relations and 338,586,276 triplets.
  - TransE version: Knowledge embeddings of Freebase pre-trained by OpenKE.
- OpenKE-XLORE
  - XLORE is one of the most popular Chinese knowledge graphs developed by THUKEG. XLORE contains 10,572,209 entities, 138,581 relations and 35,954,249 triplets.
  - TransE version: Knowledge embeddings of XLORE pre-trained by OpenKE.
Application:
- Knowledge-Plugin
  - An effective and efficient toolkit of plug-and-play knowledge injection for pre-trained language models. Knowledge-Plugin is general for all kinds of knowledge graph embeddings mentioned above. In the toolkit, we plug the TransR version of Wikidata-5M into BERT as an example of applications. With the TransR embedding, we enhance the knowledge ability of BERT without fine-tuning the original model, e.g., up to 8% improvement on question answering.

openke's Issues

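Segmentation fault when the training set is too large

Hello, I hit a segmentation fault when testing my own samples on the platform. After debugging and searching extensively, I traced the error to the call self.lib.importTrainFiles() in config.py. From what I read online, the segfault happens because the training set is too large when it is imported and passed back from C to Python, but I could not find a solution despite a lot of searching. How do you handle this? My training set file is 22M.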

Request for functionality to pull out actual predictions on test examples.

Hi,
Thanks for the great framework!

Would you consider implementing functionality for returning a handle for the actual score predictions , i.e., predicted tensor elements on the test set? It will then be straightforward to rank the retrieved results and gain insight on how the model performs, on demonstrative examples.
This is essentially the same operation as your framework already covers while computing the MRR, hits@n, etc.

At the moment, I am implementing this atop the returned trained embeddings, and then carrying out my analysis. I maintain the mappings between entities/relations and their ids; and then fixing say the head 'h' and the relation 'r', I compute the corresponding scores for all possible tail candidates 't' (subject to type constraints), using their corresponding embeddings. Thereafter, I rank to retrieve the n top scoring results. My motivation behind this exercise is to then field those examples that have the true tail at a high rank, and those examples that perform poorly in their predictions.

I am, at the moment, also investigating the more proper (elegant) way of pulling out the relevant class methods and variables from your own code from config, models, etc. But it would be wonderful if you already have tested, working functionality of the same that you could share.
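The procedure described in this issue (fix the head and relation, score every candidate tail, keep the n best) can be sketched as follows. This is only an illustration on plain Python lists: it assumes a TransE-style L1 distance where lower is better, and the names `transe_score` and `top_n_tails` are hypothetical rather than part of OpenKE.

```python
def transe_score(h, r, t):
    """L1 distance ||h + r - t||; a lower value means a more plausible triple."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def top_n_tails(h_id, r_id, ent_emb, rel_emb, n=3, candidates=None):
    """Return the n best (tail_id, score) pairs for a fixed head and relation.

    candidates restricts the tails that are scored, e.g. to the entities
    allowed by a type constraint; by default every entity is scored.
    """
    if candidates is None:
        candidates = range(len(ent_emb))
    scored = [
        (transe_score(ent_emb[h_id], rel_emb[r_id], ent_emb[t]), t)
        for t in candidates
    ]
    scored.sort()  # ascending: best (smallest distance) first
    return [(t, s) for s, t in scored[:n]]


# Toy embeddings: 4 entities and 1 relation in 2 dimensions.
ent_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel_emb = [[1.0, 0.0]]
best = top_n_tails(0, 0, ent_emb, rel_emb, n=2)
constrained = top_n_tails(0, 0, ent_emb, rel_emb, n=2, candidates=[2, 3])
```

Mapping the returned ids back through entity2id.txt gives readable predictions for individual test examples.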

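Did the recent update change the TensorFlow version?

As the title says: I recall that the project previously targeted TF 0.12, and after updating, running the test code seems to hit a TensorFlow version incompatibility. Thanks.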

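Memory access error in the C++ library when using my own dataset

Hello,
I ran into some errors when using this code to read a dataset I generated myself.
The problem is solved; here is a reminder for everyone:
Watch out for the counterintuitive design of the dataset... The triple order is h t r, not the h r t commonly used in KGs!!!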
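Given the ordering pitfall reported here, a small conversion step can make the column order explicit. The sketch below is a hypothetical helper (`to_openke_lines` is not part of OpenKE): it takes named (head, relation, tail) triples, assigns ids in order of first appearance, and emits lines in the h t r order with the triple count as the first line.

```python
def to_openke_lines(named_triples):
    """Convert (head, relation, tail) name triples to 'h t r' id lines.

    Returns (lines, entity_ids, relation_ids). The first element of lines
    is the number of triples, matching the header line of train2id.txt.
    """
    entity_ids, relation_ids = {}, {}
    body = []
    for head, relation, tail in named_triples:
        # setdefault assigns the next free id the first time a name is seen.
        h = entity_ids.setdefault(head, len(entity_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        r = relation_ids.setdefault(relation, len(relation_ids))
        body.append(f"{h} {t} {r}")  # head, tail, relation
    return [str(len(body))] + body, entity_ids, relation_ids


lines, entity_ids, relation_ids = to_openke_lines([
    ("Beijing", "capital_of", "China"),
    ("Paris", "capital_of", "France"),
])
```

The id dictionaries can then be written out in the entity2id.txt and relation2id.txt layouts described in the Data Format section.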

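It would be great if a file with the real entity names could be uploaded

The data currently looks like /m/016qtt.
If the ids could be mapped to real names, the results would be much more intuitive.
Many thanks!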

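About the Triple Classification test

I read the introduction in the README and understand that this test judges whether a triple is correct by computing the dissimilarity between its head and tail nodes.
What puzzles me is how to interpret the result of this test. For example, on my own data the Triple Classification result is extremely high (0.99). How should I interpret this result, and what does it tell me?
I'm a beginner and would appreciate any guidance. Thank you! O(∩_∩)O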

Could you provide documentation in Chinese?

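Dimension of relations in the TransD model

In the TransD model, the relation and entity vectors have different dimensions, but in the code the relation and entity dimensions are defined to be the same. In that case, TransD would not differ much from TransE, so I would like to ask whether there is a problem with this part of the code.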

Could all related papers be listed in the README?

Many thanks!

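Question about the code that implements the model constraints

Hello, I have a question from reading the code (PyTorch version): taking TransE as an example, the algorithm in the original paper mentions entity normalization. Where is this implemented in the code? I could not find it. Thanks.
That is, e ← e/ ∥e∥ for each entity e ∈ E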
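For reference, the normalization step quoted in this issue amounts to the following operation. This is a plain-Python sketch of the formula only, with a hypothetical function name; it does not show where, or whether, OpenKE applies it.

```python
import math


def l2_normalize(vectors):
    """Apply e <- e / ||e|| to every vector; zero vectors are left unchanged."""
    normalized = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            normalized.append([x / norm for x in vec])
        else:
            normalized.append(list(vec))  # avoid dividing by zero
    return normalized


unit = l2_normalize([[3.0, 4.0], [0.0, 0.0]])
```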

Some Advice

Hi friends. I hope I can get more examples. When I run example.py with TransH, the output is nan, but I don't know why. I also hope you can post the configuration on the site, such as the TensorFlow version, and describe it as clearly as OpenNE does. Thank you.

how to get the embedding matrix result following the example.py

following the example.py, the result is saved in './res/model.vec'
I wonder how to get the embedding results? for example, how to get the numpy array for each entity?

raise TypeError(init)

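Are the vector representations in embedding.vec.json the ones obtained after filter processing?

I ask because, after running on the test set, statistics are produced for both the raw and filtered modes, yet the final vector representations seem to exist only in a single embedding.vec.json file.
So, do the vectors in embedding.vec.json correspond to raw or filtered? In my experiments, the prediction results of the two modes differ enormously.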

test tail

void testTail(REAL *con) {
INT h = testList[lastTail].h;
INT t = testList[lastTail].t;
INT r = testList[lastTail].r;

REAL minimal = con[t];
INT r_s = 0;
INT r_filter_s = 0;
INT r_s_constrain = 0;

for (INT j = 0; j <= entityTotal; j++) {
    REAL value = con[j];
    if (j != t && value < minimal) {
        r_s += 1;
        if (not _find(h, j, r))
            r_filter_s += 1;
    }
}

Would you please tell me what 'j' means? Thanks.

It seems that the valid sets are not used at all?

Hi there,

I suppose that valid set is typically used to do the early stop in the training phase. I understand that running a test on a valid set after each epoch is very time-consuming, while it is better to have that option.

Also, the other important function of valid sets, I think, is to select the hyper-parameters. However, the test phase seems to be only able to run on the test set. Are the valid sets used in this framework at all?

Implementation of TransR in the PyTorch version

The PyTorch version of the TransR code does not load the pre-trained TransE results. Could you add this?

segmentation fault

➜ OpenKE git:(master) ✗ python example.py
Input Files Path : ./data/FB15K/
Output Files Path : ./data/FB15K/
The toolkit is importing datasets.
[1] 21071 segmentation fault python example.p

Confused about the meaning of test results

Hi! I'm running the code following your instructions:
python example_train_transe.py
which runs perfectly. However, I'm having trouble understanding the results it gives:
overall results:
left 245.622864 0.382235 0.173181 0.000000
left(filter) 83.448372 0.579167 0.359415 0.000000
right 161.250595 0.460615 0.228031 0.000000
right(filter) 59.193935 0.631359 0.408847 0.000000
I'm not very familiar with C, so it's hard for me to understand your source code, and I'm having a very hard time figuring out which number is the mean rank, which is Hits@10, and why there are so many results. Could you kindly explain it to me? Thank you very much!

Is there any text data document ?

Is there any corresponding text data [such as (ID, entity_context) ] document available for download in order to be able to display the results visually? The data files downloaded from the website (http://openke.thunlp.org) only have IDs. Or I didn't find them. Thanks.

[feature request] consider supporting MRR metric in the test?

Hi,

I found that most recent papers do not use Mean Rank as their evaluation metrics anymore. Instead, Mean Reciprocal Rank (MRR) seems to be a more popular alternative. Do you think it is easy to support this metric?

Best,

Imposing Type Constraints for Link Prediction

Hi,
I was investigating the functionality around imposing type constraints - via the type_constrain.txt file.
If i understand correctly, this has not been implemented for link-prediction yet, has it? Only triple_classification appears to be dependent on it, and the training of a model for triple classification on FB15K returns a segmentation fault when type_constrain.txt is removed.
I have tried to understand the underlying implementation of type constraints in your code base, but have not been able to grasp it as clearly yet. Any elaboration on this would be much appreciated.

It would be interesting if type constraints are imposed on the link prediction task as one can force certain examples to label 0 based on the type constraints, which I feel would add valuable negative examples for learning. Otherwise, with uniform sampling for negative triples across the entire unrestricted range of entities, as is done now, we often return triples that could have directly been labeled as negative.

Is this something we can implement here for link prediction? Or am I missing something?

Thanks!

Chinese datasets

Are there any Chinese datasets, for example the dataset from COAE2016 Task 3?

Analogy implementation

Hi,

I have a question about ANALOGY's implementation in PyTorch: in line 44 we see that the softplus activation function is used. Should not it be the sigmoid function however? In the original implementation we see that a sigmoid was used.

Btw, I opened a pull request on the implementation of Analogy in TensorFlow, following the implementation in PyTorch.

Thanks for your attention and support.

It seems that there is no L2-norm constraint on the entity embeddings?

It seems that there is no L2-norm constraint on the entity embeddings in the TransE code implementation, although the constraint is recommended in the TransE paper. What was the consideration behind this change?

Are there any official benchmark results of the framework?

Hi,

Thanks for this wonderful framework!

I was wondering whether you have reported your official results (with hyper-parameters) on the WN18 or FB15K using this framework. I am trying to reproduce the results of TransE on the two benchmark datasets, while it seems to be hard to tune for a comparable result reported by other papers.

Thus, I was thinking that you might have some benchmark results to verify that the framework could possibly reproduce similar results.

An issue about the file "Reader.h"

In line 106, the range of 'j' is from 'lefHead[i] + 1' to 'rigHead[i]-1'. Why is it not from 'lefHead[i] + 1' to 'rigHead[i]'?

confused by testing

Hello,I am a little confused by the predicted results,could you please explain what is the meaning of them,thanks

confusing about the test or predict results of models.

Hi, thanks for your test case. I trained and tested successfully.
But I was confused by the test stage. Here is the core piece of code:

self.lib.getHeadBatch(self.test_h_addr, self.test_t_addr, self.test_r_addr)
res = self.test_step(self.test_h, self.test_t, self.test_r)

When I debug, I find that res is just a list with values ranging from -1 to 0. Do they stand for the probability of the indexed entity being chosen? So is the index of the minimum value the predicted entity id? If so, why is the range [-1, 0] rather than [0, 1]?

Ambiguous dimension.

Try to run the example.py under python 3.5. I got the error:

Traceback (most recent call last):
  File "example.py", line 21, in <module>
    con.set_model(models.TransE)
  File "/home/deeplearning/OpenKE/config/Config.py", line 148, in set_model
    self.trainModel = self.model(config = self)
  File "/home/deeplearning/OpenKE/models/Model.py", line 63, in __init__
    self.input_def()
  File "/home/deeplearning/OpenKE/models/Model.py", line 42, in input_def
    self.batch_h = tf.placeholder(tf.int64, [config.batch_seq_size])
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/array_ops.py", line 1494, in placeholder
    shape = tensor_shape.as_shape(shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 800, in as_shape
    return TensorShape(shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 436, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 436, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 378, in as_dimension
    return Dimension(value)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_shape.py", line 36, in __init__
    raise ValueError("Ambiguous dimension: %s" % value)
ValueError: Ambiguous dimension: 9662.84

Config.py needs a fix that forces this number to be an integer.
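A likely cause of a fractional dimension under Python 3 is that the `/` operator always performs true division, whereas Python 2 truncated when both operands were integers. The sketch below only illustrates that difference. The variable names and values are hypothetical, chosen so that the division reproduces the 9662.84 in the traceback; the actual expression in Config.py is not shown in this issue.

```python
# Hypothetical values, picked so that the quotient matches the traceback.
train_total = 483142
nbatches = 50

# Python 3: "/" is true division and returns a float.
float_batch = train_total / nbatches

# "//" is floor division and keeps the result an integer,
# which is what a tensor shape requires.
int_batch = train_total // nbatches
```

Using `//` or wrapping the quotient in `int(...)` gives the same integer when both operands are positive.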

OSError: [WinError 1114] A dynamic link library (DLL) initialization routine failed.

Hi, thunlp:

When I try to run example.py after running bash make.py, the console shows this. How can I fix it? Thanks.

May I ask what the values printed here mean?

https://github.com/thunlp/OpenKE/blob/master/base/Test.h#L110-L114

printf("overall results:\n");
printf("left %f %f %f %f \n", l_rank/ testTotal, l_tot / testTotal, l3_tot / testTotal, l1_tot / testTotal);
printf("left(filter) %f %f %f %f \n", l_filter_rank/ testTotal, l_filter_tot / testTotal,  l3_filter_tot / testTotal,  l1_filter_tot / testTotal);
printf("right %f %f %f %f \n", r_rank/ testTotal, r_tot / testTotal,r3_tot / testTotal,r1_tot / testTotal);
printf("right(filter) %f %f %f %f\n", r_filter_rank/ testTotal, r_filter_tot / testTotal,r3_filter_tot / testTotal,r1_filter_tot / testTotal);

Many thanks!
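Judging only from the variable names in the printf calls quoted above (`l_rank`, `l_tot`, `l3_tot`, `l1_tot`), each row plausibly holds the mean rank followed by Hits@10, Hits@3 and Hits@1, with "left" presumably referring to head replacement and "right" to tail replacement. That reading is an inference, not something the source confirms. Under that assumption, a small hypothetical parser can label the columns; the sample log is the output quoted in an earlier issue.

```python
# Assumed column meaning, inferred from the printf variable names.
COLUMNS = ("mean_rank", "hits@10", "hits@3", "hits@1")


def parse_overall_results(text):
    """Map each result row label to a dict of named metric values."""
    results = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skips the "overall results:" header and blank lines
        label, values = parts[0], parts[1:]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            continue  # not a metrics row
        results[label] = dict(zip(COLUMNS, numbers))
    return results


log = """overall results:
left 245.622864 0.382235 0.173181 0.000000
left(filter) 83.448372 0.579167 0.359415 0.000000
right 161.250595 0.460615 0.228031 0.000000
right(filter) 59.193935 0.631359 0.408847 0.000000"""
parsed = parse_overall_results(log)
```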

What is the exact format of the train2id, test2id, valid2id, entity2id and relation2id data in the datasets?

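Explanation of the results

I ran example_train_transe.py and obtained results for the TransE and TransR algorithms, but I don't quite understand what the printed results mean. For example: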
l_filter_s: 2139
0.001585 0.006737 3068.608398 2783.705811
r_filter_s: 152
0.311274 0.336437 1587.949707 1586.709473

And at the end:
overall results:
left 3068.608398 0.001585 0.000000 0.000000
left(filter) 2783.705811 0.006737 0.001387 0.000000
right 1587.949707 0.311274 0.085992 0.000000
right(filter) 1586.709473 0.336437 0.160491 0.000000

I found the corresponding C code that prints the results, but I'm still not clear on what these printed results mean.

Results of TransE

Sir, how can I get the accuracy? I want to compute precision from a confusion matrix, and I'm getting results something like this

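Does OpenKE have an interface for outputting prediction results?

After the update, training and testing no longer seem to run on my own dataset

I don't know whether the problem is on my side... but the same dataset ran fine before the update, and after the update it fails with: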
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
I saw earlier issues where this was caused by a wrong triple order, but I checked and mine is correct, and it ran normally with the previous version.

dataset for transr

I want to make a movie dataset for TransR, as it requires triples, entities, and relations. How are these three interpreted? For example, does (0 1 0) in train2id.txt mean that the 0's are entities and 1 is their relation, or are they just sequence numbers?

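A naive question: after obtaining knowledge embeddings, what is the most direct kind of inference to do with them?

A small example would be even better. Many thanks!!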

model can not be exported

Hi, I clone the code, and run the example, it seems good and print the log. but I can not get model.vec.tf file in res folder.
it only contain some files like this:

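Downloading pre-trained embeddings

I want to download the pre-trained wiki embeddings, but the page http://openke.thunlp.org/download/wikidata now shows "not found"?
Thanks!

Installation problem

Is it still not possible to install on Windows at the moment? Thanks!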

Error in indices

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 19670950215894 is not in [0, 14951)

I just executed the example.py

Add a license to OpenKE?

Have you considered adding a license to your code? Without an explicit license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work (source). Since you've created an open-source project, would you consider adding an open-source license?

GitHub created choosealicense.com to help repo managers select the right license. They also created the Open Source Guide, which goes more in-depth on the pros/cons of each license for different kinds of open-source projects.

The MIT license would be a great choice because it’s short, very easy to understand, and allows anyone to do anything so long as they keep a copy of the license, including your copyright notice. You’ll be able to release the project under a different license if you ever need to.

Thanks in advance!

Knowledge-Embedding

I downloaded the wiki embedding file, named entity2vec.bin, but how do I use it? What is the format of its content: the entity and its vector, or something else? For example, with word2vec I first look up a word to get its vector back; how do I use this embedding?
One example (dim=50):
give 0.14555487 0.05351801 0.10649003 0.45990866 0.08294252 -0.46757272
0.1927298 0.42644426 -0.44441804 0.47094241 -0.38926664 0.11177664
-0.38736385 -0.34790769 0.56327432 0.49133539 -0.10508542 0.07396693
0.4166784 -0.15814875 0.63791603 -0.52564949 0.02029056 -0.0246984
-0.1063034 -0.20384586 -0.10796665 0.0293985 0.49127114 -0.54218274
0.09538431 -0.3666929 -0.24478485 0.47898671 0.11772798 0.234074
0.16560279 0.25576589 0.17955644 0.10463575 -0.09829795 -0.24375246
0.37840015 0.18217942 0.64507526 0.52854127 0.15324134 0.73259991
0.09031539 0.0369201
what about you ?
thank you!
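The issue does not state how the .bin file is laid out. One common layout for such files is a flat array of 32-bit floats with no header, where row i holds the vector of the entity with id i. Under that assumption only, a file could be read with the standard library as sketched below. The function name `load_embeddings` is hypothetical, the `array` type code "f" uses the machine's native byte order, and the demo writes its own small file instead of using the real download.

```python
import array
import os
import tempfile


def load_embeddings(path, dim):
    """Read a headerless float32 file into a list of dim-sized vectors.

    Assumption: the file is a flat array of native-endian 32-bit floats.
    """
    values = array.array("f")
    n_floats = os.path.getsize(path) // values.itemsize
    if n_floats % dim != 0:
        raise ValueError("file size is not a multiple of the vector dimension")
    with open(path, "rb") as f:
        values.fromfile(f, n_floats)
    return [values[i:i + dim].tolist() for i in range(0, n_floats, dim)]


# Demo on a temporary file holding two 2-dimensional vectors.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        array.array("f", [0.5, -1.0, 2.0, 0.25]).tofile(f)
    vectors = load_embeddings(demo_path, dim=2)
```

If the assumption holds for the real file, `dim` would be 50 for the example above, and the row index would be looked up through the entity id mapping.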

Segmentation fault (core dumped)

Dear THUNLP,

I installed OpenKE successfully but it reports segmentation fault when I run example.py.

~/OpenKE$ python example.py
Input Files Path : ./data/FB15K/
Output Files Path : ./data/FB15K/
The toolkit is importing datasets.
Segmentation fault (core dumped)

Segmentation fault

Hi,

When I run TransE, both the Python version and the C version (from https://github.com/thunlp/KB2E) raise "Segmentation fault".
My env is Python 3 +MacOS (I have already modified the init files to make them adapt to Python3)

I would appreciate if you could help me on this. Thank you so much.