
Comments (4)

RexYing commented on August 20, 2024

Hi!

Yes, we are sampling with replacement in the code for convenience. In fact, if max-pooling aggregation is used, sampling with or without replacement would make no difference. In practice, we also do not see any noticeable difference when sampling with replacement for other aggregation methods.
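
(A quick illustration of the max-pooling point, as a hypothetical numpy sketch rather than code from this repo: the pooled output depends only on which distinct neighbors appear in the sample, since the elementwise max ignores the duplicates that sampling with replacement can introduce.)

```python
import numpy as np

# Toy example (not from the repo): 5 neighbors with 3-dimensional features.
neigh_feats = np.random.rand(5, 3)

# A sample drawn with replacement may contain duplicates, e.g. neighbor 0 twice.
sample_with_dups = neigh_feats[[0, 0, 1]]
sample_no_dups = neigh_feats[[0, 1]]

# Elementwise max-pooling ignores duplicates, so both samples aggregate to the
# same vector: max over {a, a, b} == max over {a, b}.
assert np.allclose(sample_with_dups.max(axis=0), sample_no_dups.max(axis=0))
```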

Your second concern is the reason why minibatch training is more memory efficient here: in each training iteration, we compute h's only for the nodes in the minibatch and their neighbors, as opposed to computing h's for all nodes in the graph.
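
(A minimal sketch of that neighborhood expansion, with a hypothetical `sample_neighbors` helper and a toy adjacency list; this is not the repo's code, but it shows that only the minibatch plus its sampled neighborhoods need h's, not the whole graph.)

```python
import numpy as np

def sample_neighbors(adj_list, node, k, rng):
    # Hypothetical helper: sample k neighbors of `node` with replacement.
    return rng.choice(adj_list[node], size=k, replace=True).tolist()

def nodes_needed(adj_list, minibatch, fanouts, rng):
    # Collect only the nodes whose h's must be computed for this minibatch:
    # the minibatch itself plus one sampled-neighborhood expansion per layer.
    layers = [set(minibatch)]
    for k in fanouts:  # one fanout per aggregation depth
        frontier = set(layers[-1])
        for node in layers[-1]:
            frontier.update(sample_neighbors(adj_list, node, k, rng))
        layers.append(frontier)
    return layers

rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # toy graph
print(nodes_needed(adj, minibatch=[0], fanouts=[2, 2], rng=rng))
```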

Could you clarify what you mean by minibatch data? Are you referring to the sample data in this repo, or to the minibatch training procedure? Algorithm 1 in the paper is the inference algorithm for computing node embeddings given the weight parameters.


DanqingZ commented on August 20, 2024

@RexYing
Hi! Many thanks for your reply!

(1) For question 1: yes, it is straightforward to see that sampling with or without replacement will not affect the max-pooling aggregator. For the mean aggregator, though, I can understand it intuitively but not theoretically (see the sketch after point (2) below).
In the mean-aggregator part, you cited the "Discriminative Embeddings of Latent Variable Models for Structured Data" paper. In Algorithm 3 of that paper, the update is still the mean over all neighbors, since the author proved that the iterative update steps of mean field and loopy belief propagation can be viewed as function mappings of the embedded marginals. I don't see why sampling with replacement works theoretically for the mean aggregator.

(2) For questions 1 and 2: thanks, I previously had a misunderstanding. When reading your paper, I assumed that in each training iteration the weights are trained on the minibatch samples, but that all h_v^{k} should then be updated with the new weights. I understand that doing it this way would be computationally expensive, since it updates h_v^{k} for every node.
I am confused because in mean field and loopy belief propagation, every node's value is updated in each iteration. So it seemed strange to me that convergence is guaranteed when only some nodes are sampled, and only those nodes' values are updated while the other nodes' values stay fixed within a training iteration.
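
(To make my point (1) concrete, here is a toy numpy check, my own sketch rather than anything from the paper: the mean of a neighbor sample drawn with replacement is an unbiased estimator of the full-neighborhood mean, which is the intuition I can follow but have not seen proved for this setting.)

```python
import numpy as np

rng = np.random.default_rng(0)
neigh_feats = rng.random((50, 8))   # toy: 50 neighbors, 8-dim features
full_mean = neigh_feats.mean(axis=0)

# Each sample-with-replacement mean is an unbiased estimate of full_mean,
# so the average over many resamples converges to it.
sample_means = [neigh_feats[rng.choice(50, size=10, replace=True)].mean(axis=0)
                for _ in range(10000)]
print(np.abs(np.mean(sample_means, axis=0) - full_mean).max())  # close to 0
```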

I am reading the appendix of your paper; it would be great if you could point me to the theorem or corollary that answers my questions exactly. Thanks a lot!


williamleif commented on August 20, 2024

Hi,

Building on what Rex mentioned, I think a key point is that our sampling-with-replacement approach is a practical/empirical method for dealing with large datasets. Intuitively, it can also be viewed as a form of "node/edge dropout", i.e., regularization.
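
(A hypothetical sketch of that intuition: re-drawing a fixed-size neighbor sample at every iteration means each step trains on a different random multiset of edges, much like edge dropout.)

```python
import numpy as np

rng = np.random.default_rng(0)
neighbors = np.arange(20)  # toy: the 20 neighbor ids of one node

# Each training step sees a different random multiset of neighbors, so any
# individual edge is randomly "dropped" from a given step, like edge dropout.
for step in range(3):
    print(f"step {step}:", rng.choice(neighbors, size=5, replace=True))
```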

However, none of our proofs actually show that sampling with replacement converges in the sense you are discussing; in fact, our proofs assume that full neighborhood sets are used. This will be made clearer in the next version of our paper.

Does this clear things up?


DanqingZ commented on August 20, 2024

@williamleif, thanks for your reply!
I think I have the answer to my question (1).
But I still don't know the answer to question (2): why, in each iteration, do only the minibatch samples get their h_v^{k} updated?
Maybe I am not understanding it correctly. Thanks again for taking the time to answer my questions!

