hwwang55 / kgcn


A TensorFlow implementation of Knowledge Graph Convolutional Networks

License: MIT License

Python 100.00%

recommender-systems knowledge-graph graph-convolutional-networks

kgcn's Introduction

KGCN

This repository is the implementation of KGCN (arXiv):

Knowledge Graph Convolutional Networks for Recommender Systems
Hongwei Wang, Miao Zhao, Xing Xie, Wenjie Li, Minyi Guo.
In Proceedings of The 2019 Web Conference (WWW 2019)

KGCN (Knowledge Graph Convolutional Networks) is a model for recommender systems that uses graph convolutional networks (GCN) to process knowledge graphs for the purpose of recommendation.

Files in the folder

data/
- movie/
  - item_index2entity_id.txt: the mapping from item indices in the raw rating file to entity IDs in the KG;
  - kg.txt: knowledge graph file;
- music/
  - item_index2entity_id.txt: the mapping from item indices in the raw rating file to entity IDs in the KG;
  - kg.txt: knowledge graph file;
  - user_artists.dat: raw rating file of Last.FM;
src/: implementations of KGCN.

Running the code

Movie
(The raw rating file of MovieLens-20M is too large to be contained in this repository. Download the dataset first.)

$ wget http://files.grouplens.org/datasets/movielens/ml-20m.zip
$ unzip ml-20m.zip
$ mv ml-20m/ratings.csv data/movie/
$ cd src
$ python preprocess.py -d movie

Music
- ```
$ cd src
$ python preprocess.py -d music
```
- open the src/main.py file;
- comment out the code blocks of parameter settings for MovieLens-20M;
- uncomment the code blocks of parameter settings for Last.FM;
- ```
$ python main.py
```


kgcn's Issues

Calculation of AUC metric

def ctr_eval(sess, model, data, batch_size):
    start = 0
    auc_list = []
    f1_list = []
    while start + batch_size <= data.shape[0]:
        auc, f1 = model.eval(sess, get_feed_dict(model, data, start, start + batch_size))
        auc_list.append(auc)
        f1_list.append(f1)
        start += batch_size
    return float(np.mean(auc_list)), float(np.mean(f1_list))

In this function, you calculate AUC for every batch and take the average as the final AUC. But as far as I know, AUC has to be computed from a global ranking over the whole test set. Can you explain this?
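The difference the question points at can be seen on a toy example. The sketch below is not code from this repository: the `auc` helper, the labels, and the scores are all made up for illustration. It compares the mean of per-batch AUCs with the AUC computed over the whole set.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for illustration.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: each batch is ranked perfectly on its own,
# but the two batches use very different score ranges.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2]
batch_size = 2

batch_aucs = [
    auc(labels[i:i + batch_size], scores[i:i + batch_size])
    for i in range(0, len(labels), batch_size)
]
mean_batch_auc = sum(batch_aucs) / len(batch_aucs)  # 1.0
global_auc = auc(labels, scores)                    # 0.75
```

Here the per-batch average hides the fact that a positive in the second batch is scored below a negative in the first one. How large the gap is in practice depends on the data and the batch size.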

What exactly is the item_index2entity_id file used for?

The first column is the item ID, and the second column is the ID of the movie in the knowledge graph.
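A minimal sketch of reading such a two-column file into a lookup table follows. It assumes the columns are separated by whitespace (tabs or spaces); the function name `load_item_to_entity` and the sample lines are hypothetical, not taken from the repository.

```python
def load_item_to_entity(lines):
    """Map item index (column 1) to KG entity ID (column 2).

    Assumes whitespace-separated columns; blank lines are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        item_index, entity_id = line.split()[:2]
        mapping[item_index] = entity_id
    return mapping


# Made-up sample lines standing in for the real file contents.
sample = ["1\t0", "2\t1", "", "5\t2"]
item_to_entity = load_item_to_entity(sample)
```

With a real file, the same function could be called as `load_item_to_entity(open(path, encoding="utf-8"))`, where `path` points at item_index2entity_id.txt.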

Cannot reproduce an equivalent dataset, help requested

Hello Wang,
Could you point me to a way to obtain item_index2entity_id.txt and kg.txt for Book-Crossing? My email is [email protected]
Thanks

I noticed that main.py contains a block of code for running the program on the Book-Crossing dataset, which is also mentioned in the paper. However, some of the data required to run it seems to be missing. Although I downloaded the Book-Crossing dataset from a previous issue, I could not find the other files, such as item_index2entity_id.txt and kg.txt, that are present in both the music and movie folders.

Would it be possible for you to share these files with me, please? Your assistance would be greatly appreciated.

Thank you and best regards,
Luthfi.

Where is the data set for the book

Hello Mr. or Ms. Wang,
Why can't I find the book data set in the data folder? Also, how did you obtain the data and process it into a usable data set? I would like to try building a new one.

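Question about prediction

Hello, while writing a prediction module I found that the amount of data I can predict on is tied to the batch_size used to train the model: the number of samples fed at prediction time must equal the training batch_size, otherwise no result is produced. What causes this?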
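If the model really only accepts inputs of exactly the training batch size, as the question describes, one possible workaround is to pad the prediction input up to a multiple of that size and discard the padded outputs afterwards. The sketch below shows only the padding step; `pad_to_batches` and the sample (user, item) pairs are hypothetical and not part of the repository.

```python
def pad_to_batches(rows, batch_size):
    """Split rows into full batches, repeating the last row as padding.

    Returns (batches, n_real) so the caller can drop padded predictions.
    """
    if not rows:
        return [], 0
    n_real = len(rows)
    padded = list(rows)
    remainder = n_real % batch_size
    if remainder:
        padded.extend([rows[-1]] * (batch_size - remainder))
    batches = [
        padded[i:i + batch_size] for i in range(0, len(padded), batch_size)
    ]
    return batches, n_real


# Three made-up (user, item) pairs with a batch size of 2.
batches, n_real = pad_to_batches([(0, 10), (0, 11), (1, 12)], batch_size=2)
```

After running the model on every batch, only the first `n_real` predictions of the concatenated output would be kept.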

Some things I don't understand

comment the code blocks of parameter settings for MovieLens-20M;

uncomment the code blocks of parameter settings for Last.FM;

What do these two sentences mean?

Request for the ratings.csv file for the movie data set

Hi Dr. Wang,

Thanks for sharing the code and data sets for the paper. The movie dataset appears to be missing one file. Could you please also share the ratings.csv file for the movie data? Thanks a lot for your help!

Best,
Duna

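Numbering in kg.txt

When building a kg.txt file from my own data, with the head entity representing a user and the tail entity representing an item, must user IDs and item IDs not overlap, since otherwise they cannot be told apart?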

The TensorFlow version

May I know the version of TF you run experiments on?

This step seems very expensive to compute

self.l2_loss = tf.nn.l2_loss(self.user_emb_matrix) + tf.nn.l2_loss(
self.entity_emb_matrix) + tf.nn.l2_loss(self.relation_emb_matrix)

Does the author have any suggestion for optimizing or replacing this?

For example, my emb has shape [100000000, 32]
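One commonly used alternative, which this repository is not confirmed to use, is to regularise only the embedding rows that are looked up in the current batch rather than the whole matrix. The sketch below illustrates the idea in plain Python; `l2_loss` mirrors the definition of `tf.nn.l2_loss` (sum of squares divided by 2), and the small matrix and batch indices are made up.

```python
def l2_loss(rows):
    """Sum of squared entries divided by 2, as tf.nn.l2_loss defines it."""
    return sum(x * x for row in rows for x in row) / 2.0


# A tiny made-up embedding matrix with 4 rows of dimension 2.
emb = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0], [2.0, 2.0]]

# Indices that appear in one hypothetical batch (row 0 appears twice).
batch_ids = [0, 2, 0]

full_penalty = l2_loss(emb)  # touches every row of the matrix

# Batch-only variant: penalise each looked-up row once.
unique_ids = sorted(set(batch_ids))
batch_penalty = l2_loss([emb[i] for i in unique_ids])
```

The batch-only penalty scales with the batch size instead of the table size. It is a different regulariser, though: rows that are rarely sampled are penalised less often, so results may not match the full-matrix version.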

Cannot find the rating file ./data/movie/ratings_final.txt

OSError: ../data/movie/ratings_final.txt not found.

File "D:\projects\KGCN\src\data_loader.py", line 21, in load_rating
rating_np = np.loadtxt(rating_file + '.txt', dtype=np.int64)

PyTorch implementation

Hello, Professor Wang. Will you implement KGCN in PyTorch in the future?

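Question about metrics

epoch 29 train auc: 0.9121 f1: 0.8071 eval auc: 0.6914 f1: 0.6629 test auc: 0.7041 f1: 0.6715
(1) Which stage do the metrics reported in the paper come from: training, evaluation, or test?
(2) The line above is from a run on my own data. Is there anything wrong with it?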

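Question about the meaning of each field in kg.txt

Hello, what does each field in the txt file under music mean? For example: 2086 music.artist.origin 3846.
In the code these correspond to head_old, relation_old, and tail_old, which leaves me confused. Could you give a plain explanation?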
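Read as a (head, relation, tail) triple, the example line can be parsed as sketched below. The names `head_old`, `relation_old`, and `tail_old` come from the question; the `reindex` helper, which maps the original IDs to contiguous integers, and the second sample line are assumptions made for illustration and are not taken from the repository.

```python
def parse_triple(line):
    """Split one kg.txt-style line into (head, relation, tail) strings."""
    head_old, relation_old, tail_old = line.strip().split()
    return head_old, relation_old, tail_old


def reindex(triples):
    """Assign contiguous integer IDs to entities and relations."""
    entity_id, relation_id, indexed = {}, {}, []
    for head_old, relation_old, tail_old in triples:
        for entity in (head_old, tail_old):
            if entity not in entity_id:
                entity_id[entity] = len(entity_id)
        if relation_old not in relation_id:
            relation_id[relation_old] = len(relation_id)
        indexed.append(
            (entity_id[head_old], relation_id[relation_old], entity_id[tail_old])
        )
    return indexed, entity_id, relation_id


lines = [
    "2086 music.artist.origin 3846",  # example line from the question
    "2086 music.artist.genre 77",     # made-up line for illustration
]
triples = [parse_triple(line) for line in lines]
indexed, entity_id, relation_id = reindex(triples)
```

In this reading, entity 2086 is linked to entity 3846 through the relation music.artist.origin.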

question about computing user-relation score

Hi, Wang! Should we use tf.reduce_sum instead of tf.reduce_mean to compute user_relation_scores? You said in the paper that it is an inner product operation. Thanks in advance.
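The two reductions differ only by a constant factor, the embedding dimension, as the sketch below shows with made-up vectors. If the scores are afterwards normalised with a softmax (an assumption here, not something stated in this thread), that factor acts like a temperature: the ranking of relations is unchanged but the weights become flatter.

```python
import math


def inner_product(u, r):
    """Sum of element-wise products (what a sum reduction gives)."""
    return sum(a * b for a, b in zip(u, r))


def mean_product(u, r):
    """Mean of element-wise products (what a mean reduction gives)."""
    return inner_product(u, r) / len(u)


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up user and relation embeddings of dimension 4.
user = [1.0, 2.0, 3.0, 4.0]
relations = [[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 0.0, 0.0]]

sum_scores = [inner_product(user, r) for r in relations]   # [5.0, 1.0]
mean_scores = [mean_product(user, r) for r in relations]   # [1.25, 0.25]

weights_from_sum = softmax(sum_scores)
weights_from_mean = softmax(mean_scores)
```

Both variants prefer the first relation, but the sum-based weights are much more peaked than the mean-based ones.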

The arXiv link in the README points to MKR, not KGCN

Is it https://arxiv.org/abs/1905.04413?

How is the kg.txt file produced?

Hello, I really do not know how to create a kg.txt file. I want to create one for ml-100k. Please help.

Hi, which version of TF is the code built on?

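Question about KG data processing

Hello, the KGCN method is quite novel compared with other methods. When I build the KG, processing is fast for small amounts of data, but the processing time grows many times over when the data is large. Have you looked into optimizing KG construction?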

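Hello Professor Wang, sorry to bother you. I keep getting a message that ratings_final.txt cannot be found.

Hello Professor Wang, I know the ratings_final.txt for movies can be obtained from the link, but I cannot find the ratings_final.txt for music. I have spent a whole day looking for it without success. Could you please point me in the right direction? Thank you.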

Request for the Book-Crossing Dataset

Dear Dr. Wang,

Thanks for your wonderful work and clear implementation. I wonder whether you could provide a copy of your processed Book-Crossing dataset with knowledge graph as described in the paper. My e-mail address is [email protected]. Thanks again.

Best

Where does kg.txt come from?

Hi, I am learning knowledge graph based recommender system, and your work is great!
However I have some trouble getting the dataset. For the movielens dataset I know where to download, but for the kg.txt file, I don't know where it comes from. I can't get access to the website https://www.satori.com/. So where do you get the satori knowledge graph dataset?

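Is the model's performance affected when the number of users is far larger than the number of items?

For example, if there are 100 times as many users as items and performance is affected, what optimization directions would you suggest? A smaller question: after training, can I take the dot product of the user embedding with the relation/entity embeddings (not for items) to explain the user's preferences? Thank you!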

How can I use the top-k evaluation?

Hi, I see there is an evaluation option called show top, but I'm wondering how I can use this evaluation method.
Thank you
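Independently of how the repository wires this option up, a top-k evaluation usually ranks candidate items by predicted score and measures how many of the top k are relevant. The sketch below is a generic illustration, not the repository's implementation; `precision_recall_at_k`, the scores, and the relevant set are hypothetical.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision@k and recall@k for one user's ranked item list."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


# Made-up predicted scores for four candidate items of one user.
scores = {"a": 0.9, "b": 0.1, "c": 0.7, "d": 0.4}
ranked = sorted(scores, key=scores.get, reverse=True)

# Made-up set of items the user actually interacted with in the test set.
relevant = {"a", "d", "e"}

precision, recall = precision_recall_at_k(ranked, relevant, k=2)
```

For a full evaluation, the two values would be averaged over all test users for each k of interest.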

Command is not correct

The command should be "python preprocess.py -d movie"?

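Question about the improvement of KGNN_LS over KGCN

The KGNN_LS loss requires an interaction_hashtable, which means reading in the entire training set. With a large training set this consumes a lot of resources, because the full dataset has to be held in TensorFlow memory. How much improvement does adding this loss bring?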

Questions about choosing neighbors for each entity

Hi, in your paper, you mention that you sample a fixed size set of neighbors for each entity. How do you choose neighbors for each entity? Random or in other ways?
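One simple way to obtain a fixed-size neighbor set is uniform random sampling: without replacement when an entity has enough neighbors, and with replacement when it has fewer. The sketch below illustrates that scheme under those assumptions; it is not confirmed here to be the authors' method, and `sample_neighbors` and the toy adjacency list are hypothetical.

```python
import random


def sample_neighbors(adjacency, entity, k, rng):
    """Return exactly k neighbors of entity, sampled uniformly at random.

    Samples without replacement if there are at least k neighbors,
    otherwise with replacement so the output size is still k.
    """
    neighbors = adjacency[entity]
    if not neighbors:
        raise ValueError("entity has no neighbors to sample from")
    if len(neighbors) >= k:
        return rng.sample(neighbors, k)
    return [rng.choice(neighbors) for _ in range(k)]


# Toy adjacency list: entity 0 has five neighbors, entity 1 has one.
adjacency = {0: [1, 2, 3, 4, 5], 1: [0]}
rng = random.Random(0)  # fixed seed so runs are repeatable

many = sample_neighbors(adjacency, 0, 3, rng)
few = sample_neighbors(adjacency, 1, 3, rng)
```

A fixed size keeps every entity's neighborhood the same shape, which makes it straightforward to store the sampled neighbors in one dense array.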

Hello, are the embedding vectors in the paper also learned?

Dear Professor, I want to know how to construct my own knowledge graph from Satori

Dear Professor, first of all, thank you for open-sourcing the code and data set. In your paper, you mention using Satori to construct a sub-KG, but I did not find any information about the KG on the Bing Satori web page, so I'd like to ask how to construct a KG from Satori.
Thank you again. I look forward to your reply.

About dependency

Hi, could you specify the dependency version (e.g. python3.6 or tf 1.4 etc.) in the repo?

item_index2entity_id.txt

Hello Mr. Wang, I'm trying to reproduce your model on a new dataset of jurisprudence documents. My only question is how the item_index2entity_id.txt file was generated. Are the item IDs the items that appear in the knowledge graph only as tail entities?

Or are the item IDs the items that appear in the knowledge graph as both head and tail entities? For example, if a movie item is a head entity in triple A and a tail entity in another triple B, is that the criterion for including the item in item_index2entity_id.txt, or is it enough that the item appears as a tail entity, as you detailed in your MKR paper?