ahuiwang / cikm2020-s3rec



Code for CIKM2020 "S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization"

Python 100.00%

self-supervised-learning sequential-recommendation

cikm2020-s3rec's People


feiwang96 rucaibox rcdnn koukoulala duanchao jianzhu ruihongqiu zxreaper cklee94 ll-c8 li-fangyu crabroe

cikm2020-s3rec's Issues

Could you provide the code of other models that you reproduced? Thanks

Hi,
I am a beginner in recommender systems and have recently been working on sequential recommendation. Thank you very much for providing the code for the S3-Rec model. I have reproduced the results and learned a lot from doing so.

But I ran into a problem. I noticed that the reported results for GRU4Rec, SASRec, BERT4Rec, and Caser vary widely across papers, and my coding ability is not sufficient to reproduce these models myself. I was so frustrated 😭. Could you provide the code for the versions of these models that you reproduced? Thanks! 😭

About inconsistent results when processing the Yelp dataset

Yelp Raw data has been processed! Lower than 0.0 are deleted!
User 5-core complete! Item 5-core complete!
Total User: 19855, Avg User: 10.4279, Min Len: 5, Max Len: 235
Total Item: 14541, Avg Item: 14.2387, Min Inter: 5, Max Inter: 317
Iteraction Num: 207045, Sparsity: 99.93%
Begin extracting meta infos...
100%|██████████| 150346/150346 [00:49<00:00, 3008.29it/s]
100%|██████████| 14541/14541 [00:00<00:00, 491683.26it/s]
before delete, attribute num:809
100%|██████████| 14541/14541 [00:00<00:00, 136748.70it/s]
after delete, attribute num:809
attributes len, Min:1, Max:36, Avg.:5.2599
Yelp & 19,855& 14,541 & 10.4& 14.2& 207,045& 99.93\%&809&5.3 \

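I used /data/data_process.py to process the dataset downloaded from the official site [https://www.yelp.com/dataset], using the files yelp_academic_dataset_review.json and yelp_academic_dataset_business.json, but I got the result shown above.

The Yelp statistics in the paper are:
Yelp
users: 30,431
items: 20,033
avg.: 10.4
actions: 316,354

Is this a problem with the dataset, or does the time range need to be adjusted?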
`def main(data_name, data_type='Amazon'):
    assert data_type in {'Amazon', 'Yelp'}
    np.random.seed(12345)
    rating_score = 0.0  # rating score smaller than this score would be deleted
    # user 5-core item 5-core
    user_core = 5
    item_core = 5
    attribute_core = 0

    if data_type == 'Yelp':
        date_max = '2019-12-31 00:00:00'
        date_min = '2019-01-01 00:00:00'
        datas = Yelp(date_min, date_max, rating_score)
    else:
        datas = Amazon(data_name+'_5', rating_score=rating_score)`
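To make the question concrete, here is a minimal sketch of how a date window followed by 5-core filtering determines the user, item, and action counts. It is not the repository's `data_process.py`: the helper names `filter_by_date`, `k_core_filter`, and `summarize` are hypothetical, and the inclusive date bounds are an assumption.

```python
from collections import Counter


def filter_by_date(records, date_min, date_max):
    """Keep (user, item, timestamp) records inside [date_min, date_max].

    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, so plain string comparison
    matches chronological order. Inclusive bounds are an assumption.
    """
    return [r for r in records if date_min <= r[2] <= date_max]


def k_core_filter(pairs, user_core=5, item_core=5):
    """Repeatedly drop (user, item) pairs whose user or item is too rare."""
    pairs = list(pairs)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = [(u, i) for u, i in pairs
                if user_counts[u] >= user_core and item_counts[i] >= item_core]
        if len(kept) == len(pairs):  # nothing removed, so the core is stable
            return kept
        pairs = kept


def summarize(pairs):
    """Return user, item, and action counts plus sparsity for the pairs."""
    if not pairs:
        return {"users": 0, "items": 0, "actions": 0, "sparsity": 0.0}
    users = {u for u, _ in pairs}
    items = {i for _, i in pairs}
    sparsity = 1.0 - len(pairs) / (len(users) * len(items))
    return {"users": len(users), "items": len(items),
            "actions": len(pairs), "sparsity": sparsity}
```

Under these assumptions, a different `date_min`/`date_max` window or a different release of the Yelp files changes which reviews enter the 5-core step, so the final counts can differ even when the script itself is unchanged. The sparsity formula agrees with the log above: 1 - 207045 / (19855 * 14541) is about 99.93%.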

Why change the prediction to 0?

Hi, may I ask why, at line 267 of trainer.py, the prediction scores of the input items are set to 0?
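One common reason for this kind of step in full-ranking evaluation is to keep items the user has already interacted with out of the top-k list. The sketch below illustrates that idea only; it is not the code from trainer.py, and `mask_seen_items`, `top_k`, and `fill_value` are hypothetical names.

```python
import numpy as np


def mask_seen_items(scores, seen_matrix, fill_value=0.0):
    """Overwrite the scores of already-seen items with `fill_value`.

    scores:      (num_users, num_items) float array of predicted scores.
    seen_matrix: (num_users, num_items) array, > 0 where the user has
                 already interacted with the item.
    """
    masked = scores.copy()
    masked[seen_matrix > 0] = fill_value
    return masked


def top_k(scores, k):
    """Indices of the k highest-scoring items per user, best first."""
    return np.argsort(-scores, axis=1)[:, :k]


scores = np.array([[0.9, 0.8, 0.1, 0.4]])
seen = np.array([[1, 1, 0, 0]])  # the user already interacted with items 0 and 1

before = top_k(scores, 2)                        # seen items dominate the list
after = top_k(mask_seen_items(scores, seen), 2)  # only unseen items remain
```

Note that a fill value of 0 only pushes seen items to the bottom when the unseen items score above 0. If scores can be negative, passing `fill_value=-np.inf` is the stricter choice.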

When the data size is 1,000,000, run_pretrain.py needs 200 hours. Why is it so slow?

Hi, Baseline implementation.

Hi,

Could you please provide the baseline implementation code?

Thank you.

Sports dataset parameters

About GPUs

Thanks for your work! I recently reproduced it, but it is super time-consuming. What kind of GPU did you use, how many GPUs did you use, and how long did the pre-training and fine-tuning phases take? 😭

About "attribute_file = './data_path/artist2attributes.json'"

Hi, thank you very much for sharing the code. May I ask where to find the "artist2attributes.json" file referenced in "data_preprocess.py"?

About the reproduce of run_finetune_full

Hi,
Thank you for your great work!

I tested run_finetune_full with the default settings on the Beauty dataset, using pre-train checkpoint 150, and got the following result:

Finetune_full-Beauty-150 {'Epoch': 0, 'HIT@5': '0.0381', 'NDCG@5': '0.0239', 'HIT@10': '0.0617', 'NDCG@10': '0.0316', 'HIT@20': '0.0982', 'NDCG@20': '0.0407'}

Everything runs fine, but the 'HIT@10' and 'NDCG@10' values differ from the results reported in the README.
Does run_finetune_full use different hyperparameter settings?
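For readers comparing numbers like the ones above, here is a minimal sketch of how HIT@k and NDCG@k are commonly computed when each user has a single ground-truth item. It is an illustration under that assumption, not the repository's evaluation code; `hit_at_k`, `ndcg_at_k`, and `average_metrics` are hypothetical names, and ranks are 0-based.

```python
import math


def hit_at_k(rank, k):
    """1.0 if the ground-truth item is within the top k, else 0.0."""
    return 1.0 if rank < k else 0.0


def ndcg_at_k(rank, k):
    """NDCG with one relevant item: 1 / log2(rank + 2) inside the top k."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


def average_metrics(ranks, ks=(5, 10, 20)):
    """Average HIT@k and NDCG@k over the users' ground-truth ranks."""
    result = {}
    for k in ks:
        result[f"HIT@{k}"] = sum(hit_at_k(r, k) for r in ranks) / len(ranks)
        result[f"NDCG@{k}"] = sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
    return result


# Three users whose ground-truth items are ranked 1st, 4th, and 13th.
metrics = average_metrics([0, 3, 12])
```

Because both metrics depend only on the rank of the ground-truth item, any change to the candidate set or the checkpoint being evaluated moves them together.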

About dataset

Hello author, I want to study your code, but the dataset I downloaded from the Internet seems to be missing some files. Could you share the download link for the dataset?
For example, I can't find "artist2attributes.json" or "artist2tags.json" in the LastFM dataset; the copy I downloaded from the Internet only contains the files below:

Looking forward to your reply, thanks a million!

About the processing of yelp datasets.

Thanks for your great work! I have two questions.

The first is about the metric calculation, which puzzled me when I read your S3-Rec code. The code in question is in get_sample_scores. Could you explain what this line of code means?

Second, when I use your data_process.py script to process the Yelp dataset downloaded from https://www.yelp.com/dataset, I get the following results, which differ from those in your paper. Am I doing something wrong?

Many thanks.

About the results of SASRec model

Hi, thanks for your great work!
But I have a question. I reran the source code of the SASRec model on several of the 5-core datasets you provided, and the results differ from the ones you reported. For example, using the code from https://github.com/pmixer/SASRec.pytorch and the Beauty.txt file in your data folder, I get NDCG@10: 0.3384 and HR@10: 0.5059. On the Sports dataset, I get NDCG@10: 0.3139 and HR@10: 0.5058. I also modified the code so that SASRec ranks over all items instead of sampling negatives, and those results are still far from the numbers you provided. Could you give some instructions on how to obtain your reported SASRec results on the 5-core datasets?
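Part of the gap described here may come from the evaluation protocol alone. The sketch below compares the rank of a target item among a small set of sampled negatives with its rank among all items. It is an illustration with random scores, not either repository's evaluation code; `target_rank`, the 10,000-item catalogue, and the 100 sampled negatives are all assumptions.

```python
import numpy as np


def target_rank(scores, target, candidates=None):
    """0-based rank of `target`: how many candidates score strictly higher.

    If `candidates` is None, the target is ranked against every item.
    """
    if candidates is None:
        candidates = np.arange(len(scores))
    return int(np.sum(scores[candidates] > scores[target]))


rng = np.random.default_rng(0)
num_items = 10_000                   # hypothetical catalogue size
scores = rng.normal(size=num_items)  # stand-in for model scores
target = 42                          # hypothetical ground-truth item

# Sampled protocol: rank the target against 100 random negatives only.
other_items = np.delete(np.arange(num_items), target)
negatives = rng.choice(other_items, size=100, replace=False)

sampled_rank = target_rank(scores, target, negatives)
full_rank = target_rank(scores, target)  # full protocol: rank against all items
```

The sampled rank can never be worse than the full rank, because the negatives are a subset of all items. Metrics such as HR@10 and NDCG@10 are therefore higher under sampling and not directly comparable with full-ranking numbers.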
Thanks

About Beauty datasets

Hi, thanks for your great work!
But I have a question. Did you use the smaller dataset mentioned at http://jmcauley.ucsd.edu/data/amazon/links.html, or did you contact the author to obtain the larger one? In other words, if I run your code to reproduce the results, do I need to download the datasets from another website, or are all of them in your repo?
Thanks