Comments (10)

dawenl commented on May 28, 2024
  1. Competing for probability mass doesn't necessarily mean exclusivity. The multinomial can allow multiple non-zero entries: if two items tend to co-occur, the model can certainly learn to give both probability mass.

  2. In Eq (3), the purpose of c_{ui} is to down-weight the 0's (since c_{ui} != 0 even when x_{ui} = 0), which is equivalent to negative sampling. I am not sure why you think it only cares about the 1's. I didn't do logistic with negative sampling because I didn't find it much more helpful. If you can get better results with logistic and negative sampling (all the necessary code should be available to you), please let me know and I am happy to include that in an updated version of the paper on arXiv. Whatever NCF used in its public source code is what I used.
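
For concreteness, a minimal numpy sketch of that confidence weighting; the function name and the scalar confidences c1 > c0 are illustrative, not taken from the repo:

```python
import numpy as np

def weighted_gaussian_loglik(x, f, c1=1.0, c0=0.1):
    """Confidence-weighted Gaussian log-likelihood (up to an additive constant).

    x  : binary click matrix, shape (n_users, n_items)
    f  : model predictions f(z_u) for every user, same shape
    c1 : weight on observed entries (x_ui = 1)
    c0 : weight on unobserved entries (x_ui = 0); c0 != 0, so every 0
         still contributes to the loss, just with a smaller weight
    """
    c = np.where(x > 0, c1, c0)  # c_{ui}: a weight for every entry
    return -0.5 * np.sum(c * (x - f) ** 2)
```

Because c0 > 0, all the 0's enter the objective with reduced weight, which plays the same role as negative sampling, only deterministically over every zero entry.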

ConanCui commented on May 28, 2024

I understand the first point.

For the second point, I understand the case c_{ui} != 0. I took your code and replaced the loss function with a logistic likelihood with negative sampling, as below:
\log p(x_u | z_u) \approx \sum_{i \in \Omega^{+}_{u}} \log \sigma(f_{ui}) + \sum_{i \in \Omega^{-}_{u}} \log(1 - \sigma(f_{ui}))

Here \sigma is the logistic function and f_{ui} is the model's logit for user u on item i. \Omega^{+}_{u} denotes the set of all positively observed items of user u, and \Omega^{-}_{u} denotes a set of negative samples drawn at random from user u's zero entries (everything outside the positive set). I denote the negative sampling ratio as K = |\Omega^{-}_{u}| / |\Omega^{+}_{u}|.

I ran experiments with the logistic likelihood and negative sampling at different values of K (a minimal sketch of the sampled loss follows below). I find that performance improves as K grows, and the logistic likelihood performs best when all zero entries are taken as negative samples.
Even then, its best performance is still worse than the multinomial's.
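
For concreteness, a minimal numpy sketch of the sampled loss above; the function and variable names are mine, not from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loglik_sampled(x_u, f_u, K=5):
    """Logistic log-likelihood for one user with K negatives per positive.

    x_u : binary click vector for user u, shape (n_items,)
    f_u : logits f(z_u), same shape
    K   : negative sampling ratio |Omega^-_u| / |Omega^+_u|
    """
    pos = np.flatnonzero(x_u > 0)              # Omega^+_u
    zeros = np.flatnonzero(x_u == 0)
    n_neg = min(K * pos.size, zeros.size)      # cap at the number of zeros
    neg = rng.choice(zeros, size=n_neg, replace=False)  # Omega^-_u

    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    return (np.log(sigmoid(f_u[pos])).sum()
            + np.log(1.0 - sigmoid(f_u[neg])).sum())
```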

I also found a paper that applies a variational autoencoder with a logistic likelihood [1], similar to your work, and negative sampling improved their results a lot.

[1] Augmented Variational Autoencoders for Collaborative Filtering with Auxiliary Information. CIKM 2017.
http://aai.kaist.ac.kr/xe2/module=file&act=procFileDownload&file_srl=18019&sid=4be19b9d0134a4aeacb9ef1ecd81c784&module_srl=1379

dawenl commented on May 28, 2024

I am not sure I follow, but for the logistic likelihood, isn't what I did using all the 0's as negatives?

dawenl commented on May 28, 2024

I think I understand now, and maybe you misunderstood what I did -- for both the Gaussian and logistic likelihoods, I used all the 0's in training. With the Gaussian, I applied the c_{ui} weight, which in effect down-weights all the negatives. With the logistic, I simply used all the 0's, which I think corresponds to what you mean by setting K as large as possible.
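
To make the contrast concrete, a minimal numpy sketch of the two likelihoods being compared here (the multinomial follows the paper; names are illustrative):

```python
import numpy as np

def logistic_loglik_all_zeros(x, f):
    """Logistic log-likelihood using every 0 as a negative (K at its maximum)."""
    p = 1.0 / (1.0 + np.exp(-f))
    return np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))

def multinomial_loglik(x, f):
    """Multinomial log-likelihood: all items compete for one probability budget."""
    z = f - f.max(axis=1, keepdims=True)                       # numerical stability
    log_pi = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return np.sum(x * log_pi)
```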

ConanCui commented on May 28, 2024

Hi, I have a question about how to apply your data-splitting method to the WMF baseline. As I understand it, the split works as below:
[diagram: user-item interaction matrix partitioned into blue, red, and green rectangles]
Each row of the matrix represents one user's interactions with all items. The data in the blue rectangle are used for training, the data in the red rectangle are used to obtain the necessary representations for the test users, and the green rectangle is used to compute NDCG.
As far as I know, WMF needs to see all users during training. How do you use this data split with WMF as a baseline?

dawenl commented on May 28, 2024

Your diagram looks correct. (One minor detail: the split between red and green is random per test user; it is not the case that certain items fall only in red or only in green for all test users, just to make that clear.)

I think there is only one sensible way to do it. Rather than me directly feeding you the answer, maybe you can think about it first and tell me how you would do it?

ConanCui commented on May 28, 2024

You are right, the split is random; I drew the diagram that way just to keep it simple.
I tried training WMF on the data in both the blue and red rectangles, since the two together cover all users, and then predicting the green rectangle. But I wonder if something is wrong with this: under this training strategy, the test interactions in the red rectangle influence WMF's learnable parameters (the user and item latent embeddings), which means I leak the test set into training. For the VAE, although the red-rectangle data are used to obtain representations for test users, they do not influence the VAE's learnable parameters.

This is my reasoning, and I think training WMF this way has the problem above. Is anything wrong with it, and how did you do it?

dawenl commented on May 28, 2024

Yes, you are right that this would leak the validation data for WMF. A simple fix (this is how I did it) is to train WMF only on the blue box and keep only the item factors. Then, during evaluation, keep the item factors fixed, learn the validation users' factors from the red box (which corresponds to one ALS update), and make predictions for the green box. This is known as strong generalization.
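
A minimal numpy sketch of that fold-in step, assuming standard WMF confidence weights c_ui = 1 + alpha * x_ui and L2 regularization (alpha, reg, and all names are illustrative, not from the repo):

```python
import numpy as np

def fold_in_user(x_u, V, alpha=40.0, reg=0.1):
    """One ALS update for a held-out user, with item factors V held fixed.

    x_u : the user's binary clicks from the red box, shape (n_items,)
    V   : item factors trained on the blue box only, shape (n_items, dim)
    Solves theta_u = (V^T C_u V + reg*I)^{-1} V^T C_u x_u.
    """
    c = 1.0 + alpha * x_u                                  # confidences c_{ui}
    A = V.T @ (c[:, None] * V) + reg * np.eye(V.shape[1])
    b = V.T @ (c * x_u)
    theta_u = np.linalg.solve(A, b)                        # user factor
    return V @ theta_u                                     # scores for the green box
```

Items already seen in the red box would then typically be excluded from the ranking before computing NDCG on the green box.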

JoaoLages commented on May 28, 2024

I wonder why you didn't use binary cross-entropy instead of (categorical) cross-entropy, since this is a multi-label problem.
I also wonder why negative sampling or a similar technique wasn't applied, since your vocabulary is very large.

JoaoLages commented on May 28, 2024

Also, in production, how do you represent new videos with this architecture?
