yangji9181 / hne
Heterogeneous Network Embedding: Survey, Benchmark, Evaluation, and Beyond
In some cases, we want to split a smaller dataset from your preprocessed dataset, but I find that the weight information is missing. So maybe you could add the weight information to link.dat.test?
Hi,
I tried to run TransE on two different datasets. Both times it throws a segmentation fault.
### Program terminated with signal 11, Segmentation fault.
#0 _int_malloc (av=av@entry=0x7fda5cf9b760 <main_arena>, bytes=bytes@entry=800000) at malloc.c:3780
3780 set_head(remainder, remainder_size | PREV_INUSE);
It works on PubMed and on my other dataset (133k triplets).
Any pointers on how to solve this problem?
PS: The system has around 700 GB of RAM.
My dgl version is 0.5.2.
In MAGNN utils.py, line 206:
g.from_networkx(ng)
raises an error:
raise DGLError('DGLGraph.from_networkx is deprecated. Please call the following\n\n'
dgl._ffi.base.DGLError: DGLGraph.from_networkx is deprecated. Please call the following
dgl.from_networkx(nx_graph, node_attrs, edge_attrs)
, which creates a new DGLGraph from the networkx graph.
In model.eval(), it iterates through every batch to get node embeddings.
main.py line 112: for batch_num in range(batch_total)
I am wondering why, in model.train(), you don't iterate through all batches in each epoch?
R-GCN Model
dataset="PubMed"
python 3.7 pytorch 1.7.0 dgl-cu102 0.5.2
Traceback (most recent call last):
File "src/main.py", line 185, in <module>
main(args)
File "src/main.py", line 90, in main
embed, pred = model(g, node_id, edge_type, edge_norm)
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lqd/code/HNE-master/Model/R-GCN/src/model.py", line 117, in forward
output = self.rgcn.forward(g, h, r, norm)
File "/home/lqd/code/HNE-master/Model/R-GCN/src/model.py", line 52, in forward
h = layer(g, h, r, norm)
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 274, in forward
g.srcdata['h'] = feat
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/dgl/view.py", line 81, in __setitem__
self._graph._set_n_repr(self._ntid, self._nodes, {key : val})
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/dgl/heterograph.py", line 3811, in _set_n_repr
' same device.'.format(key, F.context(val), self.device))
dgl._ffi.base.DGLError: Cannot assign node feature "h" on device cuda:0 to a graph on device cpu. Call DGLGraph.to() to copy the graph to the same device.
The file "path.dat" is missing from all datasets, but it is required by MAGNN/utils.py.
When I run the MAGNN model, the dgl version it needs is 0.3, but that version is not suitable for the R-GCN model. This awesome repo contains 13 algorithms, each with its own original git repo, so some package versions may conflict. If anyone has successfully run all the algorithms, it would be very nice to share the versions of the important packages, such as dgl.
pytorch 1.4.0
Dimension out of range (expected to be in range of [-1, 0], but got -2)
dataset : freebase
model: HGT
maybe the data format is wrong
Traceback (most recent call last):
File "src/main.py", line 137, in <module>
node_rep, _ = model.forward(node_feature.to(device), node_type.to(device), edge_time.to(device), edge_type.to(device), edge_index.to(device))
File "/home/bopa/project/HNE/Model/HGT/src/model.py", line 170, in forward
meta_xs = gc(meta_xs, node_type, edge_index, edge_type, edge_time)
File "/home/bopa/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/bopa/project/HNE/Model/HGT/src/model.py", line 49, in forward
return self.propagate(edge_index, node_inp=node_inp, node_type=node_type, edge_type=edge_type, edge_time=edge_time)
File "/home/bopa/.local/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
kwargs)
File "/home/bopa/.local/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 156, in collect
self.set_size(size, dim, data)
File "/home/bopa/.local/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 119, in set_size
elif the_size != src.size(self.node_dim):
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)
Hi, can you give a few more examples or explanations of what meta means in run.sh? I noticed that you wrote a comment saying
"# Choose the meta-paths used for training. Suppose the targeting node type is 1 and link type 1 is between node type 0 and 1, then meta="1" means that we use meta-paths "101"." Do the metapaths all end with targeting node type 1? So, 1, 2, 4, 8 actually mean four meta-paths: 101, 201, 401 and 801? Thank you very much!
Hi,
I followed your steps for running DistMult and got this error:
File "src/main.py", line 182, in <module>
main(args)
File "src/main.py", line 83, in main
train_batcher = StreamBatcher(args.data, 'train', args.batch_size, randomize=True, keys=input_keys, loader_threads=args.loader_threads)
File ".....HNE/Model/DistMult/src/spodernet/spodernet/preprocessing/batching.py", line 217, in __init__
log.error('Path {0} does not exists! Have you forgotten to preprocess your dataset?', config_path)
File "....HNE/Model/DistMult/src/spodernet/spodernet/utils/logger.py", line 106, in error
raise Exception(message.format(*args))
Exception: Path ...../.data/PubMed/train/hdf5_config.pkl does not exists! Have you forgotten to preprocess your dataset?
Exception ignored in: <bound method StreamBatcher.__del__ of <spodernet.preprocessing.batching.StreamBatcher object at 0x7f4869e3b5c0>>
Traceback (most recent call last):
File ".....HNE/Model/DistMult/src/spodernet/spodernet/preprocessing/batching.py", line 268, in __del__
for worker in self.loaders:
AttributeError: 'StreamBatcher' object has no attribute 'loaders'
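The first exception above says that hdf5_config.pkl is absent from the train directory; the later AttributeError is only a side effect of the constructor failing before `loaders` was set. A small pre-flight check can confirm which preprocessed files are missing before training starts. This is a minimal sketch: `missing_preprocessed` and `data_root` are hypothetical names, and the real data root is elided in the log, so the demo uses a temporary directory as a stand-in.

```python
import os
import tempfile


def missing_preprocessed(data_root, dataset, splits=("train",)):
    """Return the expected hdf5_config.pkl paths that do not exist yet."""
    missing = []
    for split in splits:
        path = os.path.join(data_root, dataset, split, "hdf5_config.pkl")
        if not os.path.isfile(path):
            missing.append(path)
    return missing


# Demo on a throwaway directory standing in for the real data root.
with tempfile.TemporaryDirectory() as data_root:
    print(len(missing_preprocessed(data_root, "PubMed")))  # 1 (nothing preprocessed)
    train_dir = os.path.join(data_root, "PubMed", "train")
    os.makedirs(train_dir)
    open(os.path.join(train_dir, "hdf5_config.pkl"), "wb").close()
    print(missing_preprocessed(data_root, "PubMed"))       # []
```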
Dataset: Yelp
Supervised = 'True'
Traceback (most recent call last):
File "src/main.py", line 188, in <module>
main(args)
File "src/main.py", line 94, in main
if args.supervised=='True': loss = model.get_supervised_loss(pred, matched_labels, matched_index, multi)
File "/home/lqd/code/HNE-master/Model/R-GCN/src/model.py", line 142, in get_supervised_loss
predict_loss = F.binary_cross_entropy(torch.sigmoid(embed[matched_index]), matched_labels)
File "/home/lqd/software/anaconda3/envs/HNE/lib/python3.7/site-packages/torch/nn/functional.py", line 2526, in binary_cross_entropy
input, target, weight, reduction_enum)
RuntimeError: Found dtype Long but expected Float
Sorry, since issue #9 is closed, I am reopening it here.
I installed spodernet. However,
>>> import spodernet
>>> import spodernet.utils
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'spodernet.utils'
As you can see, spodernet itself imports normally, but spodernet.utils is not found.
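When a package imports but one of its submodules does not, it can help to check which location Python resolved the package to and whether the submodule is findable there. The sketch below uses only the standard library's `importlib.util.find_spec`; `describe_import` is a hypothetical helper name, and the `json` call is there only so the example runs without spodernet installed.

```python
import importlib.util


def describe_import(package, submodule):
    """Report where `package` resolves and whether `package.submodule` can be found."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {"package_found": False, "origin": None,
                "locations": [], "submodule_found": False}
    try:
        sub_spec = importlib.util.find_spec(f"{package}.{submodule}")
    except ModuleNotFoundError:
        # Raised when `package` is a plain module rather than a package.
        sub_spec = None
    return {
        "package_found": True,
        # origin is None for a namespace package (a directory without __init__.py).
        "origin": spec.origin,
        "locations": list(spec.submodule_search_locations or []),
        "submodule_found": sub_spec is not None,
    }


print(describe_import("json", "decoder"))     # stdlib package, submodule present
print(describe_import("spodernet", "utils"))  # result depends on the local install
```

If `origin` is None, or `locations` points somewhere other than the intended install, the name may be resolving to a namespace package or to a different directory that shadows the real one.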
Hi,
DistMult uses negative sampling for training.
I couldn't find a negative sampling parameter in the algorithm.
How did you generate the negative samples, and how many negative samples are used per positive triplet?
Hi,
I ran your code for DistMult on the PubMed dataset.
It saves the entity embeddings, but where does it save the relation embeddings?
I want to use them in the scoring function to calculate MRR and Hits.
Thanks.
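For reference, once both entity and relation embeddings are available, the evaluation described above can be sketched as follows. DistMult scores a triplet as the sum over dimensions of head × relation × tail; the sketch ranks the true tail against every other entity (raw ranking, with no filtering of other known true triplets) and then computes MRR and Hits@k. All function names and the toy embeddings are hypothetical, and nothing here reflects how the repository stores its embeddings.

```python
def distmult_score(head, relation, tail):
    """DistMult trilinear score: sum_i head[i] * relation[i] * tail[i]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_of_true_tail(head, relation, true_tail, entity_embeddings):
    """1-based rank of the true tail among all entities (higher score ranks first)."""
    true_score = distmult_score(head, relation, entity_embeddings[true_tail])
    better = sum(
        1
        for name, emb in entity_embeddings.items()
        if name != true_tail and distmult_score(head, relation, emb) > true_score
    )
    return better + 1


def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and the fraction of ranks that are <= k."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


# Toy two-dimensional embeddings, made up for illustration.
entities = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
relation = [1.0, 2.0]
head = entities["c"]
ranks = [
    rank_of_true_tail(head, relation, "b", entities),  # scores: a=1, b=2, c=3
    rank_of_true_tail(head, relation, "c", entities),
]
print(ranks)                     # [2, 1]
print(mrr_and_hits(ranks, k=1))  # (0.75, 0.5)
```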
Traceback (most recent call last):
File "src/main.py", line 13, in <module>
from spodernet.hooks import LossHook, ETAHook
File "/home/usr/anaconda3/lib/python3.7/site-packages/spodernet/hooks.py", line 6, in <module>
from spodernet.utils.util import Timer
ModuleNotFoundError: No module named 'spodernet.utils'
Happy new year! I ran into a problem running the HAN and MAGNN code with unsupervised training on the Yelp data: the code runs out of my 16 GB of memory. ( T^T ) I am not sure whether there are other problems, but I wonder if it is related to the number of links in the data. I would also like to ask whether there is any way to run this code within that memory, such as mini-batching.
I then tried my own data with MAGNN and hit another problem: the numbers of items in " batch_node_features " and "batch_targets" are not equal, so computing the unsupervised loss raises a KeyError on node_features. (PubMed runs fine, of course.) So I think something is wrong with my path.dat file, but I can't figure out what the problem is.
I am confused about what to put in config.dat for the HAN model. You said: config.dat: The first line specifies the targeting node type. The second line specifies the targeting link type. The third line specifies the information related to each link type, e.g., {head_node_type}\t{tail_node_type}\t{link_type}. I don't fully understand it.
HAN training input file:
link.dat: Each line is formatted as {head_node_id}\t{tail_node_id}\t{link_type}
Is it possible to add edge weights as input and train HAN on a weighted graph? If yes, please advise where I should modify the code. Thanks.
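As a starting point for experimenting with weights, the link.dat format quoted above could be read with an optional fourth tab-separated weight column. This is only a sketch: the fourth column, integer node IDs, and the `parse_links` helper are assumptions, and it does not show where HAN itself would consume the weights.

```python
from collections import defaultdict


def parse_links(lines, default_weight=1.0):
    """Group links by type as {link_type: [(head, tail, weight), ...]}.

    Expects {head_node_id}\t{tail_node_id}\t{link_type}, optionally followed by
    a weight column (hypothetical). Unweighted lines get default_weight.
    """
    by_type = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        head, tail, link_type = int(fields[0]), int(fields[1]), int(fields[2])
        weight = float(fields[3]) if len(fields) > 3 else default_weight
        by_type[link_type].append((head, tail, weight))
    return dict(by_type)


sample = ["0\t5\t1\n", "2\t5\t1\t0.5\n", "3\t7\t2\n"]
print(parse_links(sample))
# {1: [(0, 5, 1.0), (2, 5, 0.5)], 2: [(3, 7, 1.0)]}
```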
May I know how you found the best hyperparameters for the unsupervised task before using that embedding for downstream tasks? Is the same dataset (link.dat.test) used for hyperparameter tuning of both the unsupervised task and the downstream tasks? Thank you.
Hello,
First of all, thank you for your paper and your code; it is a big help for anyone interested in the HNE problem.
I have a question regarding MAGNN training. I tried fitting it to unattributed PubMed (the smallest of the 4 datasets) in an unsupervised fashion. However, I couldn't do it: after the training started, all 32 GB of RAM I have available were taken and then the script crashed.
I didn't change the MAGNN parameters after cloning the repo, and I ran the data transform stage as indicated in the readme. My question is, am I doing something wrong? I am not sure how a 118 MB dataset and a 2-layer net could do this. How much memory did you need for this task while obtaining the results for the paper?
I have seen in the closed issues that someone else has run into the same problem but there was no definite resolution there :(
Please help :)
Hi, I am curious why you refer to the source code of HAN as NLAH. Is the model in the HAN folder the HAN model?