
collmetric's Introduction

CollMetric

A TensorFlow implementation of Collaborative Metric Learning (CML):

Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, and Deborah Estrin. 2017. Collaborative Metric Learning. In Proceedings of the 26th International Conference on World Wide Web (WWW '17) (perm_link, pdf)

Note: the original Theano implementation is deprecated and is kept in the old_experiment_code branch.

Features

  • Produces embeddings that accurately capture the user-item, user-user, and item-item similarity.
  • Allows the exploitation of item features (e.g. tags, text, image features).
  • Outperforms state-of-the-art recommendation algorithms on a wide range of tasks.
  • Enjoys an extremely efficient Top-K search using fast KNN algorithms (see the sketch below).
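
As a sketch of how Top-K retrieval could work once embeddings are trained, the following uses scikit-learn's NearestNeighbors; the user_embeddings and item_embeddings arrays are hypothetical placeholders rather than the repository's variables.

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical learned embeddings: (n_users, d) users and (n_items, d) items.
user_embeddings = np.random.rand(1000, 100)
item_embeddings = np.random.rand(5000, 100)

# CML ranks items by Euclidean distance, so a plain KNN index applies directly.
knn = NearestNeighbors(n_neighbors=10, metric="euclidean")
knn.fit(item_embeddings)
_, top_k_items = knn.kneighbors(user_embeddings)  # (n_users, 10) item indices per user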

Utility Features

  • Parallel negative sampler that can sample user-item pairs while the model is being trained on the GPU (a minimal sketch of the idea follows this list).
  • Fast recall evaluation based on TensorFlow.
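
The sketch below illustrates the sampling idea in a single process with plain NumPy; it is not the repository's WarpSampler interface, and user_item_matrix stands for a hypothetical scipy.sparse matrix of positive interactions.

import numpy as np

def sample_batch(user_item_matrix, batch_size=10000, n_negative=10):
    # all positive (user, item) pairs, as a (num_positives, 2) array
    user_item_pairs = np.asarray(user_item_matrix.nonzero()).T
    pairs = user_item_pairs[np.random.randint(len(user_item_pairs), size=batch_size)]
    # draw candidate negatives uniformly over the items
    negatives = np.random.randint(user_item_matrix.shape[1], size=(batch_size, n_negative))
    # resample any candidate that is actually a positive item for that user
    positives = user_item_matrix.todok()
    for i, (user, _) in enumerate(pairs):
        for j in range(n_negative):
            while positives[user, negatives[i, j]] != 0:
                negatives[i, j] = np.random.randint(user_item_matrix.shape[1])
    return pairs, negatives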

Requirements

  • python3
  • tensorflow
  • scipy
  • scikit-learn

Usage

# install requirements
pip3 install -r requirements.txt
# run demo tensorflow model
python3 CML.py

Known Issue

  • AdaGrad does not seem to work on GPU. Try using AdamOptimizer instead (see the sketch after this list).
  • The WithFeature version does not seem to perform as well as the Theano version; this is being investigated. (With AdamOptimizer, the performance is now actually slightly better than the numbers reported in the paper!)
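
A minimal TF1-style sketch of the optimizer swap; the toy loss below merely stands in for the CML loss tensor and is not the repository's code.

import tensorflow as tf

x = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(x))  # placeholder for the CML hinge + covariance loss
learning_rate = 0.001

# train_op = tf.train.AdagradOptimizer(learning_rate).minimize(loss)  # reported to stall on GPU
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)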

Visuals

An illustration of the embedding learning procedure of CML

CML

Flickr photo recommendation embedding produced by CML (compared to original ImageNet features)

Embedding

TODO

  • Model Comparison.
  • TensorBoard visualization

collmetric's People

Contributors

changun, ylongqi


collmetric's Issues

run your code but there is an error

I downloaded your code, and there was an error when running the program with PyCharm on Windows.
The error is as follows:
(Child process traceback)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
  File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\dok.py", line 244, in __setitem__
    if (isintlike(i) and isintlike(j) and 0 <= i < self.shape[0]
  File "E:\anaconda\envs\tensorflow-gpu\lib\site-packages\scipy\sparse\base.py", line 576, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: shape not found

(Main process traceback)
Traceback (most recent call last):
  File "C:/Users/lenovo/Desktop/CollMetric-master/CML.py", line 331, in <module>
    sampler = WarpSampler(train, batch_size=BATCH_SIZE, n_negative=N_NEGATIVE, check_negative=True)
  File "C:\Users\lenovo\Desktop\CollMetric-master\sampler.py", line 61, in __init__
    self.processors[-1].start()
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\anaconda\envs\tensorflow-gpu\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe

Unsupported feed

Hi Changun,

I've been trying to get this project started for a few days now,
but I keep running into the same error: Unsupported feed type.
I've tried it with both the GPU and CPU builds of TensorFlow and on two different TF versions.
I'm using Windows and Python 3.5.2 and have numpy and all the other named dependencies installed.

Any hint you could give me would be extremely appreciated.

10403 features over tag_occurence_thres (5)
Split data into train/valid/test: 100%|################################################| 7947/7947 [00:03<00:00, 2202.89it/s]
79527/25186/31555 train/valid/test samples
2017-07-30 13:19:08.928815: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Recall on (sampled) validation set: 0.0
Optimizing...:   0%|                                                                                 | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 13
27, in _do_call
    return fn(*args)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 13
06, in _run_fn
    status, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py",
line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\CML.py", line 397, in <module>
    optimize(model, sampler, train, valid)
  File ".\CML.py", line 361, in optimize
    model.negative_samples: neg})
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 89
5, in run
    run_metadata_ptr)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 11
24, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 13
21, in _do_run
    options, run_metadata)
  File "C:\Users\Sandro\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 13
40, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unsupported feed type

how to work

Could you please tell me how to run this source code? Do I need to set up a server?

Loss wouldn't decrease when training on GPU while everything is OK on CPU

Hello,
thanks for open-sourcing your code and for it being so nicely written!

We've bumped into the following issue: everything works fine when we train the model on CPU (loss decreases very fast), however, when we run that very same code on GPU, the loss would stay approximately the same and the method wouldn't converge at all.

Batch size/learning rate modification didn't help.

Checked the issue on tensorflow 1.4.x and 1.7.0 and on different CUDA versions.

Did you have the same problems with CML? Do you have any ideas why that happens and how to fix it?

Many thanks in advance.

is the model code currently in the repository the same as the one you used to produce the paper results?

Great paper. I am really interested in replicating the results before using the model for a project involving transfer learning. Using the hyperparameters discussed here in #15 (comment) along with the model code currently in the repo, I am unfortunately not able to reproduce the results. I've attached a screenshot of the model running for nearly 1000 epochs with the accompanying loss and validation recall.

Can you please advise? Thank you very much for your time!

research_project_loss

Covariance loss is different from paper

Hi,
Thanks for the great research. However, it seems that the covariance loss is implemented differently from the description in your paper
(https://github.com/changun/CollMetric/blob/master/CML.py#L174). According to your paper, the covariance loss is defined as 1/N (||C||_f - ||diag(C)||_2^2), where C is the covariance matrix between all pairs of dimensions. But you implemented it as a summation of the off-diagonal elements of the covariance matrix, which may result in a negative value. Could you provide some more explanation of the covariance loss?
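
For reference, a minimal NumPy sketch of the loss exactly as quoted above; the (N, d) embeddings argument is a hypothetical batch of vectors, not the repository's variable.

import numpy as np

def covariance_loss(embeddings):
    n = embeddings.shape[0]
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    C = centered.T @ centered / n                   # d x d covariance matrix
    frob_norm = np.linalg.norm(C, ord="fro")        # ||C||_f
    diag_norm_sq = np.linalg.norm(np.diag(C)) ** 2  # ||diag(C)||_2^2
    return (frob_norm - diag_norm_sq) / n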

What parameters do you use to achieve slightly better performance than the number reported in the paper?

Great approach! Besides, I am wondering what parameters you used to achieve slightly better performance than the numbers reported in the paper. I changed the learning rate to 0.001 and it achieves 29% recall on the citeulike dataset, which is lower than the 33% recall reported in the paper. The parameters are as follows. Hope for your help soon.

model = CML(n_users,
            n_items,
            features=dense_features,
            embed_dim=100,
            margin=2.0,
            clip_norm=1.1,
            master_learning_rate=0.001,
            hidden_layer_dim=512,
            dropout_rate=0.3,
            feature_projection_scaling_factor=1,
            feature_l2_reg=0.1,
            use_rank_weight=True,
            use_cov_loss=True,
            cov_loss_weight=1)

Movielens 20m

I was trying to reproduce the results you had with Movielens20m.
I failed to reproduce them and assume it's because of the way the dataset is filtered.

I was, however, able to reproduce the recall mentioned in the paper for movielens100k.

Could you let us know a bit more about your filtering of the Movielens20m dataset?
If you still have the dataset, I would be extremely grateful if you could make it available so that I can better compare my results to yours.
I could then plug it into the algorithm myself.

Is it a bug not to eliminate the first element of a line in `tag-item.dat`?

tag-item.dat has the number of tags related to an article as the first element of each line,
so I think the first element should be excluded when counting the tags.
But utils.py seems to treat the first element as a tag, according to the code below.

if len(items) >= tag_occurence_thres:
    features[[int(i) for i in items], feature_index] = 1

Is it a bug, or do I have a misunderstanding?
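
If the first element really is a count rather than an item id, a hypothetical fix could skip it before thresholding and indexing; the dummy matrix and line below are illustrative only.

import numpy as np

tag_occurence_thres = 2
features = np.zeros((100, 1))   # dummy item-by-tag feature matrix
feature_index = 0

line = "3 12 57 90"             # hypothetical tag-item.dat line: count, then item ids
items = line.split()
item_ids = items[1:]            # drop the leading count
if len(item_ids) >= tag_occurence_thres:
    features[[int(i) for i in item_ids], feature_index] = 1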

Validation recall is not improving during training

I just ran the baseline code by typing
python CML.py
and I get a validation recall of 0.001952619578277473 every epoch.
Although the training loss is slightly decreasing, I cannot see any change in the validation recall.
Is the baseline code wrong?
How can I modify it to get the right result?

Ignoring the user when the number of items is less than 5

I found that in the function split_data, when the number of a user's items is less than 5, the user is simply ignored. However, in the paper you said that users who have fewer than 5 "ratings" are only included in the training set. Isn't that a problem?
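
For comparison, a hypothetical per-user split in line with the paper's description, where users with fewer than 5 ratings keep all of their items in the training set; split_user and its fractions are illustrative, not the repository's split_data.

import numpy as np

def split_user(items, min_ratings=5, valid_frac=0.1, test_frac=0.2):
    items = np.random.permutation(items)
    if len(items) < min_ratings:
        return items, [], []     # cold users contribute to training only
    n_test = int(len(items) * test_frac)
    n_valid = int(len(items) * valid_frac)
    train = items[n_test + n_valid:]
    valid = items[n_test:n_test + n_valid]
    test = items[:n_test]
    return train, valid, test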

Multiprocessing queue memory release

Hi,
Thank you for the clear implementation.

I just noticed that the Queue in the sampler is not explicitly closed after running, which may cause a memory leak and keep the memory occupied.
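
A minimal, self-contained sketch (not the sampler's actual API) of explicitly releasing a multiprocessing Queue and its feeder thread once the consumers are done.

import multiprocessing as mp

def worker(queue):
    queue.put("batch")           # stand-in for a sampling worker

if __name__ == "__main__":
    queue = mp.Queue()
    processes = [mp.Process(target=worker, args=(queue,)) for _ in range(2)]
    for p in processes:
        p.start()
    print(queue.get(), queue.get())
    for p in processes:
        p.terminate()            # stop the producer processes first
        p.join()
    queue.close()                # then release the queue ...
    queue.join_thread()          # ... and flush/join its feeder thread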
