
subgraph-sketching's People

Contributors

melifluos, shir994


subgraph-sketching's Issues

Missing (1,3) and (3, 1) Subgraph Features when max_hash_hops is 3

Hey,

Thanks for sharing the code!

I observed that when running with --max_hash_hops equal to 3, the counts for the (1, 3) and (3, 1) labels are 0.

Here's an example on ogbl-collab. Below are the mean subgraph features for the train/valid/test sets. The 4th, 5th, 11th, and 12th entries of each are all 0. Per the lookup table in hashing.py, the 4th and 5th indices correspond to the (3, 1) and (1, 3) labels.

tensor([   6.5106,    7.8842,   12.4787,   70.1434,    0.0000,    0.0000,
         107.7395,  164.5204,  837.0035,    3.4168,    3.3445,    0.0000,
           0.0000, 1545.4868, 2061.3445])

tensor([   3.5791,    4.6430,    8.1396,   53.2897,    0.0000,    0.0000,
          83.6231,  136.0730,  767.9641,    4.1845,    4.8364,    0.0000,
           0.0000, 1907.1871, 2633.1541])

tensor([2.3125e+00, 4.6931e+00, 7.8365e+00, 4.6649e+01, 0.0000e+00, 0.0000e+00,
        9.1180e+01, 1.4621e+02, 8.0607e+02, 6.8422e+00, 7.3138e+00, 0.0000e+00,
        0.0000e+00, 2.1475e+03, 2.8840e+03])

This seems to be caused by the function ElphHashes.get_subgraph_features() (here) when the --use_zero_one flag isn't set.

[screenshot of the relevant code in get_subgraph_features()]

When max_hops == 2 the 4th and 5th entries correspond to the (0, 1) and (1, 0) labels, but when max_hops == 3 they are the (1, 3) and (3, 1) labels. The same problem also seems to be present in the code here.

From my understanding this is a bug, and the indices should be changed to 9 and 10 when max_hops == 3?
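
If it's useful, one way I can imagine avoiding the hard-coded positions is to derive them from the ordering of the hop-label pairs. This is just an illustrative sketch, not the repo's code; subgraph_types is a hypothetical stand-in for the lookup table in hashing.py:

# illustrative sketch (not the repo's code): locate the (0, 1)/(1, 0) positions from an
# ordered list of hop-label pairs instead of hard-coding indices 4 and 5
def zero_one_indices(subgraph_types):
    # subgraph_types: ordered (d_u, d_v) hop-label pairs, e.g. the lookup table in hashing.py
    return [i for i, (du, dv) in enumerate(subgraph_types) if {du, dv} == {0, 1}]

# for max_hops == 2 this would return [4, 5] (matching the current hard-coded values);
# for max_hops == 3 it would return 9 and 10, per the lookup table discussed above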

Please let me know if I'm misunderstanding what's happening in some way.

Thanks!

Harry

Possible mistake in ElphHashes

I think the following code is incorrect:

features[:, 7] = cards1[:, 1] - features[:, 0] - torch.sum(features[:, 0:4], dim=1) - features[:, 5]  # (2, 0)

It should be:

features[:, 7] = cards1[:, 1] - torch.sum(features[:, 0:4], dim=1) - features[:, 5]  # (2, 0)

since features[:, 0] is already included in torch.sum(features[:, 0:4], dim=1) and is therefore subtracted twice in the current version.
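
A toy check of the double subtraction (the values below are made up purely for illustration):

import torch

# features[:, 0] is already part of features[:, 0:4], so the original line removes it twice
features = torch.tensor([[3., 1., 2., 1., 0., 5., 0., 0.]])
cards1 = torch.tensor([[0., 20.]])
original = cards1[:, 1] - features[:, 0] - torch.sum(features[:, 0:4], dim=1) - features[:, 5]
proposed = cards1[:, 1] - torch.sum(features[:, 0:4], dim=1) - features[:, 5]
print(original.item(), proposed.item())  # 5.0 vs 8.0 -- they differ by exactly features[:, 0]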

Reproducing the ppa results

Dear authors,

Thanks for providing the code. We are having trouble reproducing the results for the ppa dataset. I tuned the parameters myself: --lr in the range [0.01, 0.001], --label_dropout in the range [0.1, 0.3, 0.5], --feature_dropout in the range [0.1, 0.3, 0.5], --hidden_channels set to 256, and I also added --use_RA to include the RA feature. Selecting parameters based on validation performance, I got a Hits@100 of around 38, which is quite a bit lower than the 49 reported in the paper.

What parameters do I need to tune to reproduce the result? Could you please give some suggestions?
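
For reference, this is roughly how I ran the sweep. It is only an illustrative sketch: it assumes runners/run.py accepts the flags listed above, treats --use_RA as a boolean flag, and assumes the BUDDY model is the one being evaluated.

import itertools
import subprocess

# hypothetical grid over the parameters mentioned above; selection on validation Hits@100
grid = itertools.product([0.01, 0.001],     # --lr
                         [0.1, 0.3, 0.5],   # --label_dropout
                         [0.1, 0.3, 0.5])   # --feature_dropout
for lr, label_dp, feature_dp in grid:
    subprocess.run(['python', 'runners/run.py', '--dataset', 'ogbl-ppa',
                    '--lr', str(lr), '--label_dropout', str(label_dp),
                    '--feature_dropout', str(feature_dp),
                    '--hidden_channels', '256', '--use_RA',
                    '--model', 'BUDDY'], check=True)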

The results in the paper cannot be reproduced

Thank you for sharing your code! When I reproduced BUDDY's results using the parameter settings provided, I got lower results than reported in the paper, with a 3%-5% drop on almost every dataset. Is some additional setup required?

Sincerely

Is a node its own neighbour?

Hi @melifluos, I have finished an open-source implementation in Rust of the sketching features described in this paper, but after running it on a graph as sparse as directed WikiData I realized that many, if not all, features on leaf nodes are zero unless I consider each node as part of its own neighbourhood, i.e. unless I introduce self-loops.

I was wondering what your opinion on the matter is, as I am not sure whether your paper suggests including them or not.

I guess I'll just add a flag and let the user decide for himself; as you may see in my implementation, I have added the option to ingest several other features into the sketch, such as node types and edge types.
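
To make the flag idea concrete, here is a minimal sketch in Python (the function and argument names are mine, not from either implementation):

import torch
from torch_geometric.utils import add_self_loops

def edge_index_for_sketching(edge_index, num_nodes, include_self=True):
    # optionally treat every node as part of its own neighbourhood before hashing,
    # so leaf nodes of very sparse graphs still yield non-zero sketch features
    if include_self:
        edge_index, _ = add_self_loops(edge_index, num_nodes=num_nodes)
    return edge_index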

Please find attached the performance of running it on WikiData (over 1 billion nodes and 5 billion edges) on my desktop.

[Screenshot 2023-05-31 23:36: performance of the WikiData run]

Problem running the ddi dataset

Dear authors,

Thanks for providing the source code of ELPH/BUDDY. When I tried to run ddi with the command python runners/run.py --dataset ogbl-ddi --K 20 --train_node_embedding --propagate_embeddings --epochs 120 --num_negs 6 --model BUDDY, I got the error AttributeError: 'BUDDY' object has no attribute 'sign_embedding'. Could you please help?

Best,
Juanhui

torch_sparse

My versions are all correct, but I keep getting this error, "FileNotFoundError: Could not find module 'D:\Anacondas\envs\ss1\Lib\site-packages\torch_sparse_convert_cuda.pyd' (or one of its dependencies). Try using the full path with the constructor syntax."
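
For what it's worth, a minimal check to confirm the installed builds agree (purely illustrative):

# print the torch version and the CUDA toolkit it was built against; a mismatch between
# this and the torch_sparse wheel is a common cause of the .pyd failing to load on Windows
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

import torch_sparse  # importing this reproduces the FileNotFoundError if the DLL can't load
print(torch_sparse.__version__)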

Reproduction of GCN and SAGE

Hi, when I reproduce SAGE on the Planetoid datasets Citeseer and Pubmed, I get an HR@100 of 37 on Citeseer but 57 on Pubmed, so I want to know whether the results may have been mixed up; it seems a bit weird...

Suggestions for hyperparameter tuning on other non-attributed datasets

Hi,

I am trying to apply ELPH/BUDDY to other non-attributed graphs (graphs with no node features attached). There seem to be a lot of hyperparameters to tweak. Can you provide some general suggestions on how to find good hyperparameters?

I plan to play with the non-default parameters given in the README for ogbl-ddi and ogbl-ppa. More advice is greatly appreciated.

Initialize the hashing table of ELPH at every step/epoch

if self.init_hashes == None:
    self.init_hashes = self.elph_hashes.initialise_minhash(num_nodes).to(x.device)
if self.init_hll == None:
    self.init_hll = self.elph_hashes.initialise_hll(num_nodes).to(x.device)
# initialise data tensors for storing k-hop hashes
cards = torch.zeros((num_nodes, self.num_layers))
node_hashings_table = {}
for k in range(self.num_layers + 1):
    logger.info(f"Calculating hop {k} hashes")
    node_hashings_table[k] = {
        'hll': torch.zeros((num_nodes, self.hll_size), dtype=torch.int8, device=edge_index.device),
        'minhash': torch.zeros((num_nodes, self.num_perm), dtype=torch.int64, device=edge_index.device)}
    start = time()
    if k == 0:
        node_hashings_table[k]['minhash'] = self.init_hashes
        node_hashings_table[k]['hll'] = self.init_hll
        if self.feature_prop in {'residual', 'cat'}:  # need to get features to the hidden dim
            x = self._encode_features(x)
    else:
        node_hashings_table[k]['hll'] = self.elph_hashes.hll_prop(node_hashings_table[k - 1]['hll'],
                                                                  hash_edge_index)
        node_hashings_table[k]['minhash'] = self.elph_hashes.minhash_prop(node_hashings_table[k - 1]['minhash'],
                                                                          hash_edge_index)
        cards[:, k - 1] = self.elph_hashes.hll_count(node_hashings_table[k]['hll'])
        x = self.feature_conv(x, edge_index, k)

In the ELPH model, the node hashes are calculated on the fly, but the hash table is only initialized once for the entire training process. In that case, I think the propagation of the hashes also only needs to happen once, since neither the initial hashes nor the graph structure change.

Alternatively, is it possible to re-initialize the hash table at every training step/epoch? That would indeed require repeated propagation of the hashes, but it might also reduce the variance of the structural feature estimates.
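
To illustrate the first option, a rough sketch of what I mean by computing the hash tables once and reusing them. The names are purely illustrative; it only reuses the attributes and calls already shown in the snippet above.

import torch

def get_hash_tables(self, num_nodes, hash_edge_index):
    # illustrative caching: the propagated hashes can be reused across steps/epochs,
    # since neither the initial hashes nor the graph structure change during training
    if getattr(self, '_cached_hash_tables', None) is not None:
        return self._cached_hash_tables
    cards = torch.zeros((num_nodes, self.num_layers))
    node_hashings_table = {0: {'minhash': self.init_hashes, 'hll': self.init_hll}}
    for k in range(1, self.num_layers + 1):
        node_hashings_table[k] = {
            'hll': self.elph_hashes.hll_prop(node_hashings_table[k - 1]['hll'], hash_edge_index),
            'minhash': self.elph_hashes.minhash_prop(node_hashings_table[k - 1]['minhash'], hash_edge_index)}
        cards[:, k - 1] = self.elph_hashes.hll_count(node_hashings_table[k]['hll'])
    self._cached_hash_tables = (node_hashings_table, cards)
    return self._cached_hash_tables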

Does the order of src/dst node matter for structural features?

When generating the structural features, the counts for (2, 1) and (1, 2) are calculated as two separate values and both appear in the feature vector. Does this violate the permutation invariance (symmetry under swapping the endpoints) expected for an undirected graph?

Under such an implementation, the model may give different predictions for the same edge when the order of the src/dst nodes is flipped.
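
To make the concern concrete, one generic way to remove the dependence on node order is to score both orderings and average them. This is only a minimal sketch; the function names are illustrative, not the repo's API.

import torch

def symmetric_link_score(score_fn, features_uv, features_vu):
    # score_fn: any model mapping a structural-feature vector to a logit
    # features_uv / features_vu: subgraph features for (src, dst) and for (dst, src)
    # averaging the two orderings makes the prediction invariant to flipping src and dst
    return 0.5 * (score_fn(features_uv) + score_fn(features_vu))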

I have some problems when running the project

I am nearly going mad. I spent the whole afternoon trying to run the code, but it seems I can't execute it by following the given instructions. It seems that we should use "pip install torch-geometric" instead of "conda install pyg -c pyg" on GPU. I don't know why. Can someone tell me about the difference?
