PreHash

This is our implementation for the paper:

Shaoyun Shi, Weizhi Ma, Min Zhang, Yongfeng Zhang, Xinxing Yu, Houzhi Shan, Yiqun Liu, and Shaoping Ma. 2020. Beyond User Embedding Matrix: Learning to Hash for Modeling Large-Scale Users in Recommendation. In SIGIR'20.

Please cite our paper if you use our code. Thanks!

Author: Shaoyun Shi (shisy13 AT gmail.com)

@inproceedings{shi2020prehash,
  title={Beyond User Embedding Matrix: Learning to Hash for Modeling Large-Scale Users in Recommendation},
  author={Shaoyun Shi and Weizhi Ma and Min Zhang and Yongfeng Zhang and Xinxing Yu and Houzhi Shan and Yiqun Liu and Shaoping Ma},
  booktitle={Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2020},
  pages={319--328},
  organization={ACM}
}

Environments

Python 3.7.6

Packages: see requirements.txt

pathos==0.2.5
tqdm==4.42.1
numpy==1.18.1
torch==1.1.0
pandas==1.0.1
scikit_learn==0.23.1

Datasets

The processed datasets can be downloaded from Tsinghua Cloud or Google Drive.

Place the datasets in ./dataset/. The directory tree should look like:

.
├── dataset
│   ├── Books-1-1
│   ├── Grocery-1-1
│   ├── Pet-1-1
│   ├── RecSys2017-1-1
│   └── VideoGames-1-1
└── src
    ├── data_loaders
    ├── data_processors
    ├── datasets
    ├── models
    ├── runners
    └── utils
  • Amazon Datasets: The original dataset can be found here.

  • RecSys2017 Dataset: The original dataset can be found here.

  • The code for processing the data can be found in ./src/datasets/
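
The layout above can be checked before launching a run. The snippet below is a minimal sketch and not part of the repository: the function name `missing_datasets` and the constant `EXPECTED_DATASETS` are hypothetical, and the folder names are simply copied from the tree above.

```python
from pathlib import Path

# Folder names copied from the directory tree above (assumed to be the full set).
EXPECTED_DATASETS = [
    "Books-1-1",
    "Grocery-1-1",
    "Pet-1-1",
    "RecSys2017-1-1",
    "VideoGames-1-1",
]


def missing_datasets(root="."):
    """Return the expected dataset folders that are absent under <root>/dataset."""
    dataset_dir = Path(root) / "dataset"
    return [name for name in EXPECTED_DATASETS if not (dataset_dir / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets(".")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

Run it from the repository root (the directory that contains both dataset and src).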

Example to run the code

# PreHash enhanced BiasedMF on Grocery dataset
> cd PreHash/src/
> python main.py --model_name PreHash --dataset Grocery-1-1 --rank 1 --metrics ndcg@10,precision@1 --lr 0.001 --l2 1e-7 --train_sample_n 1 --hash_u_num 1024 --sparse_his 0 --max_his 10 --sup_his 1 --random_seed 2018 --gpu 0

prehash's Issues

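Question about anchor user selection

PrehashDP.py mentions four ways to select anchor users: 1. random selection; 2. selecting the users with the most clicks; 3. kmeans; 4. jaccard.
Methods 1 and 2 each yield only k anchor users.
Methods 3 and 4 assign users to k clusters. Does that mean every user is treated as an anchor user?
If so, after PreHash bucketing, does each user always get the vector of exactly one bucket as its hash user vector?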
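
For reference, strategies 1 and 2 from the question can be sketched as follows. This is an illustrative sketch only, not the code in PrehashDP.py: the function names and the input format (a flat array holding one user id per interaction) are assumptions, and the clustering-based strategies 3 and 4 are not covered.

```python
import numpy as np


def select_anchors_random(user_ids, k, seed=2018):
    """Strategy 1 (sketch): pick k distinct users uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(user_ids), size=k, replace=False)


def select_anchors_most_active(interaction_user_ids, k):
    """Strategy 2 (sketch): pick the k users with the most interactions.

    interaction_user_ids holds one user id per logged interaction (assumed format).
    """
    users, counts = np.unique(interaction_user_ids, return_counts=True)
    # Stable sort on negated counts: most active first, ties broken by smaller id.
    order = np.argsort(-counts, kind="stable")
    return users[order[:k]]


if __name__ == "__main__":
    log = np.array([0, 0, 0, 1, 2, 2, 3])  # toy interaction log
    print(select_anchors_most_active(log, k=2))
    print(select_anchors_random(np.unique(log), k=2))
```

In both cases the result is a set of exactly k user ids, which matches the observation in the question that methods 1 and 2 produce only k anchor users.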

What is the relationship between the hash layer and the tree layer in the hash part of PreHash?

parser.add_argument('--hash_layers', type=str, default='[32]',
help='MLP layer sizes of hash')
parser.add_argument('--tree_layers', type=str, default='[64]',
help='Number of branches in each level of the hash tree')

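Which part of Hierarchical Hash do the tree layer and hash layer here correspond to, and what do they do? The paper does not seem to describe them in detail (or I may have missed it). Could you explain them in more detail? Thanks.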
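
To make the two arguments concrete: both are passed as strings that look like Python lists. The sketch below shows one way such strings might be turned into lists, and how many leaves a full tree with the given branching factors would have. The use of `ast.literal_eval` and the leaf-count interpretation are assumptions for illustration; neither is taken from the repository.

```python
import argparse
import ast
from functools import reduce
from operator import mul

parser = argparse.ArgumentParser()
parser.add_argument('--hash_layers', type=str, default='[32]',
                    help='MLP layer sizes of hash')
parser.add_argument('--tree_layers', type=str, default='[64]',
                    help='Number of branches in each level of the hash tree')

# Example invocation: a two-level tree with 8 branches per level.
args = parser.parse_args(['--tree_layers', '[8, 8]'])

# Safely convert the list-like strings into Python lists (assumed parsing step).
hash_layers = ast.literal_eval(args.hash_layers)
tree_layers = ast.literal_eval(args.tree_layers)

# If every node at level i has tree_layers[i] children, a full tree has
# prod(tree_layers) leaves (an interpretation, not confirmed by the source).
n_leaves = reduce(mul, tree_layers, 1)

print(hash_layers, tree_layers, n_leaves)  # prints: [32] [8, 8] 64
```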

Why does the code use split_l rather than split_p when computing ndcg?

                if metric.startswith('ndcg@'):
                    max_k = max([len(d) for d in split_l])
                    k_data = np.array([(list(d) + [0] * max_k)[:max_k] for d in split_l])
                    best_rank = -np.sort(-k_data, axis=1)
                    best_dcg = np.sum(best_rank[:, :k] / np.log2(np.arange(2, k + 2)), axis=1)
                    best_dcg[best_dcg == 0] = 1
                    dcg = np.sum(k_data[:, :k] / np.log2(np.arange(2, k + 2)), axis=1)
                    ndcgs = dcg / best_dcg
                    evaluations.append(np.average(ndcgs))
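
The snippet above can be exercised on toy data. The sketch below wraps it in a hypothetical function `ndcg_at_k` and assumes that each list in `split_l` holds one user's ground-truth labels already sorted by descending predicted score. Under that assumption the predictions only determine the order, so the labels alone are enough to compute DCG. This is one reading of the snippet, not something the snippet itself confirms. The helper `sort_labels_by_prediction` is also hypothetical, and the padding width is widened to at least k so the example runs on short lists.

```python
import numpy as np


def sort_labels_by_prediction(labels, preds):
    """Reorder one user's labels by descending predicted score (assumed preprocessing)."""
    order = np.argsort(-np.asarray(preds, dtype=float), kind="stable")
    return np.asarray(labels)[order].tolist()


def ndcg_at_k(split_l, k):
    """Per-user NDCG@k from label lists that are already ranked by prediction."""
    # Pad every list with zeros to a common width (at least k columns).
    width = max(max(len(d) for d in split_l), k)
    k_data = np.array([(list(d) + [0] * width)[:width] for d in split_l])
    discounts = np.log2(np.arange(2, k + 2))
    # Ideal ranking: the same labels sorted in descending order.
    best_rank = -np.sort(-k_data, axis=1)
    best_dcg = np.sum(best_rank[:, :k] / discounts, axis=1)
    best_dcg[best_dcg == 0] = 1  # avoid division by zero for users with no positives
    dcg = np.sum(k_data[:, :k] / discounts, axis=1)
    return dcg / best_dcg


split_l = [
    sort_labels_by_prediction([0, 1, 0], [0.2, 0.9, 0.5]),  # positive ranked first
    sort_labels_by_prediction([1, 0, 0], [0.1, 0.8, 0.3]),  # positive ranked last
    [0, 0, 0],                                              # no positive item
]
ndcgs = ndcg_at_k(split_l, k=3)
print(ndcgs)
```

The first user scores 1.0 because the positive item is ranked first, the second scores 0.5 because it is ranked third (1 / log2(4)), and the third scores 0.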
