awslabs / dgl-ke
High performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings.
Home Page: https://dglke.dgl.ai/doc/
License: Apache License 2.0
Hello DGL-KE people!
I read the 2020 arXiv paper and found it very interesting. It is great that the project is open source.
I am wondering if I can implement my own graph embedding algorithm on top of your system, which focuses only on knowledge graph embeddings. For example, can I implement DeepWalk in DGL-KE?
If yes, do you provide abstractions for defining random-walk operations in DGL-KE?
If no, is it possible to build such abstractions in your system?
Thanks in advance.
Best,
Makis
Thanks for this awesome library!
I am confused about the score function the library uses for the TransE model. The edge score is given in https://github.com/awslabs/dgl-ke/blob/master/python/dglke/models/pytorch/score_fun.py#L59 as (gamma - norm(h+r-t)). In the forward() method of the KE model, https://github.com/awslabs/dgl-ke/blob/master/python/dglke/models/general_models.py#L346, the final score is the sum of the logsigmoid of the positive score and the mean of the logsigmoid of the negative scores. Could you please point me to a paper that uses this particular loss function? It seems to be a combination of a margin-based loss and a logistic loss.
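For readers following the question, here is a minimal pure-Python sketch of the two pieces being discussed: the score gamma - norm(h+r-t) and a logsigmoid-based loss. The sign flip applied to negative scores and the final averaging are assumptions made for illustration (they resemble the negative-sampling loss used in the RotatE paper); the function names are hypothetical, and general_models.py remains the authority on what the library actually computes.

```python
import math

def transe_score(h, r, t, gamma, p=1):
    """Score quoted in the question: gamma - ||h + r - t||_p (p is 1 or 2)."""
    diffs = [abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t)]
    dist = sum(diffs) if p == 1 else math.sqrt(sum(d * d for d in diffs))
    return gamma - dist

def logsigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))

def logistic_loss(pos_score, neg_scores):
    # Assumption: negatives enter as logsigmoid(-score), averaged over samples.
    pos_term = logsigmoid(pos_score)
    neg_term = sum(logsigmoid(-s) for s in neg_scores) / len(neg_scores)
    return -(pos_term + neg_term) / 2

# A positive triple that fits perfectly gets the maximum score, gamma.
perfect = transe_score([0.0, 0.0], [1.0, 1.0], [1.0, 1.0], gamma=12.0)
```

In this form gamma acts as a margin inside the score, while the logsigmoid terms supply the logistic part, which may explain why the loss looks like a mix of the two families.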
Hi,
I found the following in /python/dglke/models/general_models.py, as referenced below:
dgl-ke/python/dglke/models/general_models.py
Line 455 in ff4da0b
The line rankings = F.sum(neg_scores >= pos_scores, dim=1) + 1 seems to count the positive sample against itself: because the positive is compared with its own score, ">=" pushes it out of the top-1 position, so the Hits@1 metric is almost always close to 0%.
I suppose the ">=" should be changed to ">", as below:
rankings = F.sum(neg_scores > pos_scores, dim=1) + 1
Is that right?
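The difference between the two comparisons can be checked with a small pure-Python sketch. It assumes the candidate list contains the positive triple's own score, which is the situation described above; whether that holds in practice depends on how the negatives are sampled. The function name is hypothetical.

```python
def rank(pos_score, candidate_scores, strict):
    """Rank of the positive among candidates; 1 is best."""
    if strict:
        higher = sum(1 for s in candidate_scores if s > pos_score)
    else:
        higher = sum(1 for s in candidate_scores if s >= pos_score)
    return higher + 1

pos = 0.9
candidates = [0.9, 0.5, 0.3]  # first entry is the positive triple itself

rank_with_ge = rank(pos, candidates, strict=False)  # positive ties with itself
rank_with_gt = rank(pos, candidates, strict=True)
```

Note the trade-off: with ">" a genuine tie against a different candidate is also resolved in the positive's favor, so the strict comparison is the more optimistic of the two.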
Hi guys, this looks like a great start for a powerful knowledge graph embedding library! Thanks for sharing it!
My question is: many practical applications involve knowledge graphs with a mixture of structured data (the knowledge graph itself) and unstructured literal data (attributes like description, name, date, etc.). Do you have any plans to support literal-enhanced embeddings like LiteralE as well? Thanks.
The other interesting development is Graph-BERT, where attention on a local subgraph is used instead of GCN. It seems to be more scalable. Will you consider supporting it?
ps. both algorithms already have their source code available right now.
Few-shot link prediction proposed by this paper seems a useful technique to support in DGL-KE.
When --valid / --test is turned on, the training process will get killed during evaluation. To solve this, --batch_size_eval and --neg_sample_size_eval should be the same value.
It's useful to plot the entities after training their embeddings. This helps us verify the training results.
Users' input data may have different delimiters. We should allow users to specify the delimiter.
I understand that when working with larger KGs, it's convenient to set a neg_sample_size_eval that's smaller than num_entities in order to speed up link prediction in evaluation.
However, having a smaller neg_sample_size_eval also means potentially inflating link prediction results (if I evaluate each positive triple against only 1000 negative triples among 200k potential negative samples, the model will likely get higher evaluation metrics). So there's a tradeoff between faster evaluation and less biased evaluation metrics.
How do you deal with this tradeoff and justify using a certain value for the hyperparameter? Do we just assume that neg_sample_size_eval of 10000 is sufficient to approximate the true metrics?
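The direction of the bias can be illustrated with a small simulation. This is a sketch on synthetic uniform scores, not on DGL-KE output: the positive's rank against a random subset of negatives can never be worse than its rank against the full set, so sampled metrics can only look as good or better. All names and the linear extrapolation are illustrative assumptions.

```python
import random

def rank_against(pos_score, neg_scores):
    """1 + number of negatives that score strictly higher than the positive."""
    return 1 + sum(1 for s in neg_scores if s > pos_score)

rng = random.Random(0)
all_negs = [rng.random() for _ in range(200_000)]  # synthetic negative scores
pos_score = 0.999

full_rank = rank_against(pos_score, all_negs)
sample = rng.sample(all_negs, 1000)                # evaluate on 1000 negatives
sampled_rank = rank_against(pos_score, sample)

# Rough extrapolation of the sampled rank back to the full candidate set.
scale = len(all_negs) / len(sample)
estimated_full_rank = 1 + (sampled_rank - 1) * scale
```

One way to pick a value under these assumptions is to run full evaluation once on a small subset of test triples and compare it with the sampled estimate at several sample sizes.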
First, I ran the following command, and it completed successfully:
dglke_train --model_name TransE_l1 --dataset FB15k ......
Since the FB15k dataset now exists locally, I want to train on it directly without downloading it again. So I changed --dataset FB15k to:
dglke_train --model_name TransE_l1 --data_path ./data/FB15k/ --data_files entities.dict relations.dict train.txt valid.txt test.txt --format udd_hrt .....
However, the terminal gives me an error:
Using backend: pytorch
Traceback (most recent call last):
File "/home/hjhuang/anaconda3/envs/dgl/bin/dglke_train", line 33, in <module>
sys.exit(load_entry_point('dglke==0.1.0.dev0', 'console_scripts', 'dglke_train')())
File "/home/hjhuang/anaconda3/envs/dgl/lib/python3.6/site-packages/dglke-0.1.0.dev0-py3.6.egg/dglke/train.py", line 81, in main
File "/home/hjhuang/anaconda3/envs/dgl/lib/python3.6/site-packages/dglke-0.1.0.dev0-py3.6.egg/dglke/dataloader/KGDataset.py", line 603, in get_dataset
AssertionError: You should provide the dataset name for raw_udd format.
Hi,
I noticed that in distributed training, when the kvstore server and client are launched, the executable search path seems to be fixed to "/usr/local/bin:/bin:/usr/bin:/sbin/" in dglke/dist_train.py, as shown below:
dgl-ke/python/dglke/dist_train.py
Line 79 in ad4be69
However, if the user's development environment is conda, this may cause problems.
I use conda as my Python environment, so the two executables dglke_server and dglke_client are automatically located in the "/root/miniconda3/bin/" directory, and they cannot be launched correctly.
I suggest that the path setup include some code to find the actual location of DGL-KE's executables, such as dglke_server.
Is that right?
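One possible shape for such a lookup, sketched with the standard library only. It assumes the executables live next to the running Python interpreter, which is typical for conda and virtualenv installs but would not by itself cover a remote machine with a different layout. The helper names are hypothetical and are not part of dist_train.py.

```python
import os
import shutil
import sys

def build_search_path(base_path="/usr/local/bin:/bin:/usr/bin:/sbin/"):
    """Prepend the running interpreter's directory to the fixed search path."""
    env_bin = os.path.dirname(os.path.abspath(sys.executable))
    return os.pathsep.join([env_bin, base_path])

def find_executable(name, search_path=None):
    """Return the full path of `name`, or None if it is not on the search path."""
    return shutil.which(name, path=search_path or build_search_path())

# In a conda environment this would look in <env>/bin before the system paths.
server_path = find_executable("dglke_server")
```

If server_path is None, the launcher could report a clear error instead of failing later inside the remote command.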
What input arguments did you use to get the results shown in /examples/README.md on the FB15K dataset?
The Python API is convenient for many use cases. It allows more customization and is very friendly for Jupyter Notebook users.
Is it possible to set this flag to all edges? I mean not by specifying the exact number, but some flags like "all_edges."
I needed to make some changes to the train script, in particular I wanted to change the head and tail samplers in training to exclude_positive=True
here and here, but as soon as I do that this assert in dgl fails:
python3: /opt/dgl/src/graph/sampler.cc:1186: dgl::NegSubgraph dgl::{anonymous}::EdgeSamplerObject::genNegEdgeSubgraph(const dgl::Subgraph&, const string&, int64_t, bool, bool): Assertion `prev_neg_offset + neg_sample_size == neg_vids.size()' failed.
Am I using the sampler incorrectly here?
Hi,
Is there any reference implementation of RGCN on knowledge graphs for multi-gpu training in DGL-KE?
Alternatively, in DGL, I can see some reference RGCN implementation for link prediction task, but I think that is not multi-gpu (may I confirm if that is correct?). Do you have some suggestions on how I can run RGCN on a multi-gpu setting?
Thanks for some pointers on this issue!
Hello! I'm running into this 'std::bad_alloc' error:
|test|: 17248443
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)
I split my dataset into training, validation, and test datasets. I first used dglke_train and passed these three files to --data_files. It finished training successfully. But when I ran dglke_eval with these three files, it yielded this error.
I'm pretty sure I have enough space on the machine. Do you know what could be the possible problem? Also, I'm confused by the command line arguments --dataset and --data_files of dglke_eval. What's the usage of --dataset when running my own dataset? Should I pass the same files to --data_files for evaluation as those for training?
Currently, the KGE models are trained with the logistic loss. However, it's desirable to train the models with a pair-wise ranking loss because in a lot of cases negative edges can potentially be missing edges. The logistic loss cannot handle this case.
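As a point of comparison, a pair-wise ranking loss only asks that the positive triple score higher than each negative by some margin, rather than pushing every negative toward a "false" label. A minimal sketch in plain Python, with a hypothetical function name and a margin chosen arbitrarily:

```python
def margin_ranking_loss(pos_score, neg_scores, margin=1.0):
    """Mean hinge loss: a negative is penalized only if it scores within
    `margin` of the positive (or above it)."""
    losses = [max(0.0, margin - pos_score + s) for s in neg_scores]
    return sum(losses) / len(losses)

well_separated = margin_ranking_loss(5.0, [1.0, 2.0])  # no penalty
tied = margin_ranking_loss(1.0, [1.0])                 # penalized by the margin
```

Because negatives that are already well below the positive contribute nothing, a sampled "negative" that is really a missing edge is penalized less aggressively than under a logistic loss.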
Although it is a minor release, we will introduce several interesting features:
I would like to see the curve of MRR during the training process so that I can get the peak MRR, instead of the MRR at the end of the training.
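A sketch of what such tracking could look like if validation ranks were available at regular intervals. The class and function names are hypothetical, and the ranks are assumed to come from a periodic validation pass.

```python
def mean_reciprocal_rank(ranks):
    """MRR over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)

class BestMetricTracker:
    """Keeps the metric history and remembers the step with the best value."""

    def __init__(self):
        self.best_value = float("-inf")
        self.best_step = None
        self.history = []

    def update(self, step, value):
        self.history.append((step, value))
        if value > self.best_value:
            self.best_value, self.best_step = value, step
        return self.best_step == step  # True when this step is a new peak

tracker = BestMetricTracker()
for step, mrr in [(1000, 0.2), (2000, 0.5), (3000, 0.4)]:
    is_new_peak = tracker.update(step, mrr)
```

The history list is what one would plot as the MRR curve, and the return value of update could be used to decide when to save a checkpoint.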
Hello,
Will triplet classification be implemented as one of the evaluation tasks?
How would you recommend one go about implementing it using dglke?
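One common way to do triplet classification on top of any scoring model is to pick a score threshold on labeled validation triples and classify test triples against it. The sketch below assumes the scores have already been computed from trained embeddings; it uses a single global threshold for brevity (per-relation thresholds are also common), and the function names are hypothetical.

```python
def best_threshold(scores, labels):
    """Pick the threshold that maximizes accuracy on labeled validation triples.

    scores: model scores (higher means more plausible)
    labels: 1 for true triples, 0 for corrupted ones
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def classify(score, threshold):
    """Predict True (valid triple) when the score reaches the threshold."""
    return score >= threshold

threshold, val_accuracy = best_threshold([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```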
What's the output of eval.log supposed to be? When I run this on my user-defined knowledge graph, it just saves an empty text file.
Here's an example of the script I run for eval:
DGLBACKEND=pytorch dglke_eval \
--data_path results_SXSW_2018 --dataset SXSW2018 \
--data_files entities.tsv relations.tsv all_ctups_40.tsv valid.tsv test.tsv --format udd_hrt \
--model_name ComplEx \
--hidden_dim 128 --gamma 128 \
--mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 \
--batch_size_eval 1024 --neg_sample_size_eval 10000 --eval_percent 20 \
--model_path /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200511/
Thanks for contributing such wonderful work! I get an error when I run RotatE using the command "DGLBAKEND=pytorch dglke_train --model_name RotatE":
File ".../dglke/models/pytorch/score_fun.py", line 423, in edge_func
re_score = re_head * re_rel - im_head * im_rel
RuntimeError: The size of tensor a (200) must match the size of tensor b (400) at non-singleton dimension 1
In the RotatE paper, only the head is rotated, and this implementation seems to differ, so I don't know how to fix it. Could you help me?
hi,
This is a great project, and our team has recently tried running some of the demo programs.
I have a question. TransE, for example, is mostly used for heterogeneous graphs that contain different relations, but could it be applied to an undirected graph? An undirected graph contains only one kind of relation, and I think it is a special case of a heterogeneous graph, so I believe TransE could be used here. But how would its performance compare with embedding algorithms designed for undirected graphs (node2vec, for instance)? TransE would still create an embedding vector for the relation even though there is only one kind of relation, and that relation embedding is effectively useless for an undirected graph.
thx.
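To try this, an undirected edge list first has to be turned into (head, relation, tail) triples. A minimal sketch, assuming a single made-up relation name and emitting both directions of every edge; the function name is hypothetical.

```python
def undirected_edges_to_triples(edges, relation="connected_to"):
    """Turn undirected edges into directed triples with one shared relation."""
    triples = []
    for u, v in edges:
        triples.append((u, relation, v))
        if u != v:  # avoid duplicating self-loops
            triples.append((v, relation, u))
    return triples

triples = undirected_edges_to_triples([("a", "b"), ("b", "c")])
```

One caveat worth keeping in mind: with a translation model, fitting both h + r ≈ t and t + r ≈ h for the same relation pushes r toward zero, so a symmetric relation is a known weak spot for TransE-style scoring.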
In the paper, section 5.3, it is mentioned that for the entire Freebase KB, you use a modified evaluation strategy in which
we use only 2000 negative triplets; 1000 sampled uniformly from the entire set of negative
samples and 1000 sampled proportionally to the degree of the corrupted entities;
Is this evaluation supported in the dgl-ke library? If so, which arguments can be used to enable this?
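For reference, the sampling scheme quoted from the paper can be sketched in a few lines of plain Python. This only illustrates the idea (half uniform, half proportional to degree); it samples with replacement for simplicity and says nothing about which command-line arguments, if any, enable it in the library. The function name and the degree dictionary are assumptions.

```python
import random

def sample_eval_negatives(entity_degrees, n_uniform=1000, n_degree=1000, seed=0):
    """Sample corrupting entities: some uniformly, some proportional to degree.

    entity_degrees: dict mapping entity ID -> degree (must have positive total).
    """
    rng = random.Random(seed)
    entities = list(entity_degrees)
    uniform = rng.choices(entities, k=n_uniform)
    weights = [entity_degrees[e] for e in entities]
    by_degree = rng.choices(entities, weights=weights, k=n_degree)
    return uniform + by_degree

degrees = {0: 1, 1: 1, 2: 98}
negatives = sample_eval_negatives(degrees, n_uniform=10, n_degree=10)
```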
I noticed that the dglke_predict command is not available because pip installing gives me the stable version that doesn't include infer_score and the dglke_predict command.
How can I get around this?
The technique described in the paper "AutoNE: Hyperparameter Optimization for Massive Network Embedding" is interesting. Similar techniques should be incorporated into DGL-KE to tune hyperparameters on large knowledge graphs effectively.
When calling model.predict_score(dt.g), I receive the following error:
/opt/conda/lib/python3.7/site-packages/dglke/models/pytorch/score_fun.py in <lambda>(edges)
306
307 def forward(self, g):
--> 308 g.apply_edges(lambda edges: self.edge_func(edges))
309
310 def create_neg(self, neg_head):
/opt/conda/lib/python3.7/site-packages/dglke/models/pytorch/score_fun.py in edge_func(self, edges)
274
275 def edge_func(self, edges):
--> 276 real_head, img_head = th.chunk(edges.src['emb'], 2, dim=-1)
277 real_tail, img_tail = th.chunk(edges.dst['emb'], 2, dim=-1)
278 real_rel, img_rel = th.chunk(edges.data['emb'], 2, dim=-1)
/opt/conda/lib/python3.7/site-packages/dgl/utils.py in __getitem__(self, key)
282 def __getitem__(self, key):
283 if key not in self._keys:
--> 284 raise KeyError(key)
285 return self._fn(key)
286
KeyError: 'emb'
It's likely that the default value of exclude_positive for the EdgeSampler is True. The design will not compromise the performance for large and sparse graphs. But for small and dense graphs, the results can be affected, because the sampled negative edges are more likely to be positive edges.
Say I trained a graph but found it didn't reach the peak MRR. Can I take the embeddings I have now and continue training on them or do I have to start over?
Environment:
Windows 10 (CPU only)
torch: 1.5.0
dgl: 0.4.3
dgl-ke: built from source code.
I encountered the error below while trying to execute a command given in the tutorial.
dglke_train --model_name TransE_l2 --dataset FB15k --batch_size 1000
--neg_sample_size 200 --hidden_dim 400 --gamma 19.9 --lr 0.25 --max_step 500 --log_interval 100
--batch_size_eval 16 -adv --regularization_coef 1.00E-09 --test --num_thread 1 --num_proc 8
Logs are being recorded at: ckpts\TransE_l2_FB15k_3\train.log
Reading train triples....
Finished. Read 483142 train triples.
Reading valid triples....
Finished. Read 50000 valid triples.
Reading test triples....
Finished. Read 59071 test triples.
|Train|: 483142
random partition 483142 edges into 8 parts
part 0 has 60393 edges
part 1 has 60393 edges
part 2 has 60393 edges
part 3 has 60393 edges
part 4 has 60393 edges
part 5 has 60393 edges
part 6 has 60393 edges
part 7 has 60391 edges
Using backend: pytorch
C:\Users\riz\Anaconda3\lib\site-packages\dgl\base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
warnings.warn(msg, warn_type)
Traceback (most recent call last):
File "C:\Users\riz\Anaconda3\Scripts\dglke_train-script.py", line 11, in <module>
load_entry_point('dglke==0.1.0.dev0', 'console_scripts', 'dglke_train')()
File "C:\Users\riz\Anaconda3\lib\site-packages\dglke-0.1.0.dev0-py3.7.egg\dglke\train.py", line 129, in main
File "C:\Users\riz\Anaconda3\lib\site-packages\dglke-0.1.0.dev0-py3.7.egg\dglke\dataloader\sampler.py", line 368, in create_sampler
File "C:\Users\riz\Anaconda3\lib\site-packages\dgl\contrib\sampling\sampler.py", line 660, in __init__
self._seed_edges = utils.toindex(self._seed_edges)
File "C:\Users\riz\Anaconda3\lib\site-packages\dgl\utils.py", line 242, in toindex
return data if isinstance(data, Index) else Index(data)
File "C:\Users\riz\Anaconda3\lib\site-packages\dgl\utils.py", line 15, in __init__
self._initialize_data(data)
File "C:\Users\riz\Anaconda3\lib\site-packages\dgl\utils.py", line 22, in _initialize_data
self._dispatch(data)
File "C:\Users\riz\Anaconda3\lib\site-packages\dgl\utils.py", line 47, in _dispatch
raise DGLError('Index data must be an int64 vector, but got: %s' % str(data))
dgl._ffi.base.DGLError: Index data must be an int64 vector, but got: tensor([ 68037, 423679, 381929, ..., 440877, 26144, 464339],
dtype=torch.int32)
I am not sure whether it's a bug or I missed something in the setup. Can someone please help me to resolve this?
There don't seem to be many options for limiting memory consumption in dgl-ke at the moment, so I was wondering if you have any suggestions for my problem. Presently, my model runs out of RAM at:
[proc 0][Train](12000/12000) average regularization: 0.00017675260825490114
[proc 0][Train] 1000 steps take 12.623 seconds
[proc 0]sample: 2.133, forward: 5.806, backward: 2.516, update: 2.070
proc 0 takes 161.118 seconds
training takes 162.84567785263062 seconds
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 77, in decorated_function
raise exception.__class__(trace)
RuntimeError: Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 65, in _queue_result
res = func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/dglke/train_pytorch.py", line 238, in test_mp
test(args, model, test_samplers, rank, mode, queue)
File "/usr/local/lib/python3.6/dist-packages/dglke/train_pytorch.py", line 214, in test
model.forward_test(pos_g, neg_g, logs, gpu_id)
File "/usr/local/lib/python3.6/dist-packages/dglke/models/general_models.py", line 321, in forward_test
neg_deg_sample=self.args.neg_deg_sample_eval)
File "/usr/local/lib/python3.6/dist-packages/dglke/models/general_models.py", line 243, in predict_neg_score
neg_head = self.entity_emb(neg_head_ids, gpu_id, trace)
File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 203, in __call__
s = self.emb[idx]
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 16751475200 bytes. Error code 12 (Cannot allocate memory)
The above was run with
DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv train.tsv valid.tsv test.tsv \
--format udd_hrt \
--model_name ComplEx \
--max_step 50000 --batch_size 1000 --neg_sample_size 200 --batch_size_eval 16 \
--hidden_dim 400 --gamma 19.9 --lr 0.25 --regularization_coef=1e-9 -adv \
--gpu 0 1 --async_update --force_sync_interval 1000 --log_interval 1000 \
--test
The dataset has the following number of canonical tuples:
Reading train triples....
Finished. Read 91802780 train triples.
Reading valid triples....
Finished. Read 10200309 valid triples.
Reading test triples....
Finished. Read 11333677 test triples.
|Train|: 91802780
random partition 91802780 edges into 2 parts
part 0 has 45901390 edges
part 1 has 45901390 edges
My machine has two 1080 Ti GPUs and 128GB of RAM. This run used up pretty much all the RAM right away, which is odd because the graphvite run on this knowledge graph finished fine (but took ~8 hours).
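A quick back-of-the-envelope check on the failed allocation may help narrow this down. Assuming embeddings are stored as float32 (an assumption) with the 400 dimensions given by --hidden_dim, the requested size divides evenly into whole embedding rows, which would be consistent with the test step gathering one embedding per candidate entity in a single tensor. The entity count is not shown in the log, so this is an inference rather than a confirmed diagnosis.

```python
BYTES_REQUESTED = 16_751_475_200  # from the error message above
HIDDEN_DIM = 400                  # --hidden_dim in the command above
BYTES_PER_FLOAT32 = 4             # assumption: embeddings are float32

bytes_per_row = HIDDEN_DIM * BYTES_PER_FLOAT32
rows = BYTES_REQUESTED // bytes_per_row       # embedding rows in one gather
remainder = BYTES_REQUESTED % bytes_per_row   # 0 means a whole number of rows
gib = BYTES_REQUESTED / 2**30

def gather_bytes(num_rows, dim=HIDDEN_DIM, bytes_per_value=BYTES_PER_FLOAT32):
    """Memory needed to gather `num_rows` embedding rows into one tensor."""
    return num_rows * dim * bytes_per_value

# For comparison: the cost of gathering 10,000 negatives per evaluation batch.
smaller = gather_bytes(10_000)
```

If that reading is right, each evaluation process requests roughly 15.6 GiB on top of the embedding table itself, and capping the number of evaluation negatives with --neg_sample_size_eval (a flag used elsewhere in this thread) would shrink the gather accordingly.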
Allow users to directly specify the relevant column indices of the input files (e.g. triplets_column_indices=[1, 0, 2]).
Currently you have to specify the format (htr, rht, etc.), which is converted internally by _parse_srd_format to [0,1,2], [1,0,2], etc.
The advantage of specifying this directly is that it would also allow input files with unused columns (such as qualifiers or sources).
It would also be great if this is possible for the id mapping files.
The dataset that I want to use has the columns: property_id, en_label, en_description.
This cannot be loaded with the code from this pull request, since the label and id are in the wrong order, and there is an unused column.
Specifying something like relations_map_column_indices=[1,0] would be very convenient.
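To make the proposal concrete, here is a small sketch of column selection by index using only the standard library. read_columns is a hypothetical helper, not an existing DGL-KE function, and the sample row is invented to match the column layout described above (property_id, en_label, en_description).

```python
import csv
import io

def read_columns(lines, column_indices, delimiter="\t"):
    """Read delimited rows and keep only the requested columns, in that order."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [tuple(row[i] for i in column_indices) for row in reader if row]

# Columns: property_id, en_label, en_description (the description is unused).
sample = io.StringIO("P31\tinstance of\tsome description\n")
label_to_id = read_columns(sample, column_indices=[1, 0])
```

The same helper would cover triple files with extra qualifier or source columns, since any column not listed in column_indices is simply ignored.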
When I tried to run predict (not from the command line), the model required entity.dict data that maps an entity name to its ID. However, there is no such data in the package. Can you please provide the entity-to-ID data? Thanks.
Neither dglke_emb_sim nor dglke_predict appears in the pip3-installed version of dglke, even after I try installing in a new environment and after I uninstall all older versions of dglke. Are these methods only available through the GitHub version? If so, when will these additions be pushed to pip3?
(dglke) amruch@wit:~/Projects/AmazonScience/graphs$ sudo pip3 install dglke
The directory '/home/amruch/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/amruch/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions andowner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting dglke
Downloading https://files.pythonhosted.org/packages/9a/59/d9571eac71ef5e63784bbf4efa75bbe6803653e04057b774ce043a1b65e3/dglke-0.1.0-py3-none-any.whl (59kB)
100% |████████████████████████████████| 61kB 885kB/s
Requirement already satisfied: setuptools in /home/amruch/.local/lib/python3.6/site-packages (from dglke)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from dglke)
Installing collected packages: dglke
Successfully installed dglke-0.1.0
(dglke) amruch@wit:~/Projects/AmazonScience/graphs$ dglke_
dglke_client dglke_convert dglke_dist_train dglke_eval dglke_partition dglke_server dglke_train
Hi, I found that your Hit@10 result for TransE is 0.8x, but the original paper reports 0.4x. I am puzzled by this.
Hello!
When using the framework, I encountered the following problem:
This process was automatically killed, and the error report did not provide any useful information, so I hope you can help.
Thanks in advance
:)
(dglke_env) YCHABOT@25f55d9cb3bd:~/WikiData$ dglke_train --model_name TransE_l2 --data_path ~/WikiData/ \
> --dataset wikidata_input_nohead.csv --delimiter , \
> --data_files ~/WikiData/wikidata_input_nohead.csv ~/WikiData/wikidata_test.csv ~/WikiData/wikidata_valid.csv \
> --format raw_udd_hrt --batch_size 512 --log_interval 1000 --neg_sample_size 25600 --batch_size_eval 25600 \
> --regularization_coef=1e-9 --hidden_dim 300 --gamma 19.9 --lr 0.25 --batch_size_eval 16 --test -adv \
> --gpu 0 1 2 3 --max_step 6000 --async_update
Reading train triples....
Finished. Read 491065553 train triples.
Reading valid triples....
Finished. Read 70000 valid triples.
Reading test triples....
Finished. Read 70000 test triples.
|Train|: 491065553
random partition 491065553 edges into 4 parts
part 0 has 122766389 edges
part 1 has 122766389 edges
part 2 has 122766389 edges
part 3 has 122766386 edges
/home/YCHABOT/anaconda3/envs/dglke_env/lib/python3.6/site-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
warnings.warn(msg, warn_type)
|valid|: 70000
|test|: 70000
Killed
Hi,
I have recently been using the out-of-the-box TransE algorithm that comes with DGL-KE. Thanks for your excellent and kind work!
However, I ran into a question about the loss function while tracing through the source code in dglke/models/general_models.py, in the forward method, between lines 370 and 399.
It does not seem consistent with the loss function described in the paper linked from dgl-ke's official GitHub homepage.
In the paper, the authors state that only two forms of the loss are used,
but neither is the same as the one implemented in dgl-ke's source code mentioned above,
so I'm wondering why the source code of general_models.py changed the form of the loss.
Does it bring any improvement compared with the original two loss functions in your paper?
Looking forward to your reply.
According to the Performance and Scalability section of the README, DGL-KE is much faster than GraphVite and PBG, even with a single GPU. Since GraphVite also uses GPUs for speedup, what are the main reasons for these improvements?
We need to explain the arguments of commands.
Hi there,
I found I couldn't reproduce the test results after training.
After training, I got:
-------------- Test result --------------
Test average MRR : 0.2607070246959733
Test average MR : 847.37625
Test average HITS@1 : 0.16041666666666668
Test average HITS@3 : 0.2991666666666667
Test average HITS@10 : 0.45958333333333334
But when I use the dglke_eval command to evaluate the saved embeddings, I get:
-------------- Test result --------------
Test average MRR: 0.09140389248502755
Test average MR: 5974.63
Test average HITS@1: 0.034583333333333334
Test average HITS@3: 0.08208333333333333
Test average HITS@10: 0.21875
The command I use is:
dglke_eval --model_name RotatE --dataset Mydata --hidden_dim 200 --gamma 12.0 --batch_size_eval 16
--gpu 0 1 2 3 4 5 6 7 --model_path ./ckpts/Mydata/RotatE_Mydata_0 --data_path ./Mydata/ --format raw_udd_hrt --data_files train.txt valid.txt test.txt
Could you please help me figure this out?
Besides, I also encountered an out-of-memory issue on a larger dataset using the dglke_eval command, although dglke_train works fine with the same number of GPUs.
Thanks.
Please refer to this for more details: #84 (comment)
I'm trying to find an optimal set of parameters by running dglke_train with various sets of parameters (randomly sampled), and on the first instance it keeps freezing at the same iteration.
I'm running the test through the exclamation (!) mode in a Jupyter notebook, so I can loop through different sampled parameters.
I'm using a custom dataset but these are the parameters:
model = TransE_l1
LOG_INTERVAL=1000
BATCH_SIZE=1000
BATCH_SIZE_EVAL=16
NEG_SAMPLE_SIZE=200
NEG_SAMPLE_SIZE_EVAL=100000
LR= 0.1
-adv= True
hidden_dim= 50
regularization_coef= 2e-08
gamma= 10
neg_deg_sample=False
Here are the last few steps before it freezes (for 10 minutes before I cancel it)
[proc 0][Train] 1000 steps take 8.256 seconds
[proc 0]sample: 1.353, forward: 4.006, backward: 1.711, update: 1.175
[proc 0][Train](35000/60000) average pos_loss: 0.19853664480149746
[proc 0][Train](35000/60000) average neg_loss: 0.2785924620358273
[proc 0][Train](35000/60000) average loss: 0.23856455320119857
[proc 0][Train](35000/60000) average regularization: 0.00012192570248589618
[proc 0][Train] 1000 steps take 8.269 seconds
[proc 0]sample: 1.278, forward: 3.986, backward: 1.712, update: 1.283
[proc 0][Train](36000/60000) average pos_loss: 0.19503579252958297
[proc 0][Train](36000/60000) average neg_loss: 0.27933850078843536
[proc 0][Train](36000/60000) average loss: 0.23718714690953493
[proc 0][Train](36000/60000) average regularization: 0.00012245436408556996
[proc 0][Train] 1000 steps take 8.305 seconds
[proc 0]sample: 1.346, forward: 4.012, backward: 1.712, update: 1.224
[proc 0][Train](37000/60000) average pos_loss: 0.19615361012518406
[proc 0][Train](37000/60000) average neg_loss: 0.27748048058338465
[proc 0][Train](37000/60000) average loss: 0.2368170451670885
[proc 0][Train](37000/60000) average regularization: 0.00012362484454206423
[proc 0][Train] 1000 steps take 8.305 seconds
[proc 0]sample: 1.270, forward: 3.999, backward: 1.733, update: 1.293
[proc 0][Train](38000/60000) average pos_loss: 0.19601027159392834
[proc 0][Train](38000/60000) average neg_loss: 0.2794102805918083
[proc 0][Train](38000/60000) average loss: 0.23771027632802724
[proc 0][Train](38000/60000) average regularization: 0.00012375975443137578
[proc 0][Train] 1000 steps take 8.283 seconds
[proc 0]sample: 1.310, forward: 3.903, backward: 1.766, update: 1.294
[proc 0][Train](39000/60000) average pos_loss: 0.19360717238485814
[proc 0][Train](39000/60000) average neg_loss: 0.2766080161612481
[proc 0][Train](39000/60000) average loss: 0.23510759409517049
[proc 0][Train](39000/60000) average regularization: 0.0001251919507922139
[proc 0][Train] 1000 steps take 8.287 seconds
[proc 0]sample: 1.269, forward: 3.998, backward: 1.742, update: 1.268
[proc 0][Train](40000/60000) average pos_loss: 0.19862385678291322
[proc 0][Train](40000/60000) average neg_loss: 0.279490821111016
[proc 0][Train](40000/60000) average loss: 0.2390573388412595
[proc 0][Train](40000/60000) average regularization: 0.00012537073031126055
[proc 0][Train] 1000 steps take 8.190 seconds
[proc 0]sample: 1.236, forward: 3.902, backward: 1.749, update: 1.293
[proc 0][Train](41000/60000) average pos_loss: 0.19015826864540578
[proc 0][Train](41000/60000) average neg_loss: 0.27666417042165997
[proc 0][Train](41000/60000) average loss: 0.23341121918708085
[proc 0][Train](41000/60000) average regularization: 0.00012650544225471094
[proc 0][Train] 1000 steps take 8.237 seconds
[proc 0]sample: 1.311, forward: 3.908, backward: 1.717, update: 1.291
[proc 0][Train](42000/60000) average pos_loss: 0.19738745559751988
[proc 0][Train](42000/60000) average neg_loss: 0.279010270354338
[proc 0][Train](42000/60000) average loss: 0.23819886273890734
[proc 0][Train](42000/60000) average regularization: 0.0001268844535225071
[proc 0][Train] 1000 steps take 8.367 seconds
[proc 0]sample: 1.301, forward: 4.038, backward: 1.755, update: 1.263
[proc 0][Train](43000/60000) average pos_loss: 0.19044273269176484
[proc 0][Train](43000/60000) average neg_loss: 0.2760635534534231
[proc 0][Train](43000/60000) average loss: 0.23325314317643642
Not sure why this is happening.
Hi!
Thanks for this awesome package! I'm wondering if there is any option to fix the random seed so I can reproduce the same results across different training runs. Currently I manually set the random seeds for PyTorch and NumPy in train_pytorch.py and dataloader/sampler.py, but the final output embeddings of multiple training attempts are still different. Is there any workaround for this?
Thanks for any help in advance.
Hi, thanks for putting this library together. I will put a feature request together in a similar format to the dgl repo:
Negative sampling with type constraints in dgl.contrib.sampling.EdgeSampler (via dataloader.sampler.TrainDataset).
When using EdgeSampler to sample negative edges in knowledge graph link prediction, it would be useful to incorporate domain-specific type constraints. For example, edges (relations) in a KG are often typed (only specific entity types can slot into the head or tail entities), so an EdgeSampler that only samples negative edges by selecting head/tail nodes from a subset of all possible entities would greatly help.
One idea I had was to create different EdgeSampler objects for relations and then batch the graph based on relations. That way when sampling a mini-batch we are guaranteed that all facts in the batch have the same relation type and can apply the same EdgeSampler object to get negative samples. But it seems doing this requires diving into the C++ sampler code.
Another alternative is a two-step sampling procedure in training where I first a) sample positive edges only without replacement and then b) based on the relation types in the positive edges, sample negative edges from the specific EdgeSampler with replacement. This seems to be cleaner but also somewhat inefficient. Are there other disadvantages to this?
Any guidance and tips on how best to implement this would be great. I'd be happy to contribute it back to the repo.
Similar functionality to how type constraints work in OpenKE.
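The constraint itself is easy to state outside the sampler. The sketch below is a plain-Python illustration of type-constrained tail corruption, not a patch to dgl.contrib.sampling.EdgeSampler or its C++ code: negatives for a triple are drawn only from the entities allowed as tails of that relation, skipping known true triples. The function name, the allowed_tails mapping, and the toy data are all assumptions.

```python
import random

def sample_typed_negative_tails(triple, allowed_tails, known_triples, k, seed=0):
    """Corrupt the tail of `triple` using only type-compatible entities.

    allowed_tails: dict mapping relation -> set of entities valid as its tail
    known_triples: set of (h, r, t) facts that must not be produced as negatives
    """
    rng = random.Random(seed)
    h, r, _ = triple
    candidates = [e for e in sorted(allowed_tails[r])
                  if (h, r, e) not in known_triples]
    if not candidates:
        return []
    # Sampling with replacement, as in step b) of the two-step idea above.
    return [(h, r, rng.choice(candidates)) for _ in range(k)]

allowed = {"born_in": {"paris", "london", "rome"}}
known = {("alice", "born_in", "paris")}
negs = sample_typed_negative_tails(("alice", "born_in", "paris"), allowed, known, k=5)
```

Doing this per triple in Python would likely be slow at scale, which matches the efficiency concern raised above; grouping a mini-batch by relation first would let the candidate list be built once per relation.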
Hi,
Thanks for sharing this library for distributed training.
Are there any plans to add GNN models such as RGCN?
What needs to be done to add GNN models and train them in a distributed environment?
Thanks
DGL-KE needs to verify the mappings and report more informative errors if the input data fails the checks.
Add more information about the requirement that entity_ids start from 0 and be contiguous,
as well as other limitations.
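A sketch of the kind of check being requested, in plain Python. check_id_mapping is a hypothetical helper, and the error wording is only a suggestion for what a more informative message could say.

```python
def check_id_mapping(ids):
    """Raise ValueError unless IDs are unique, start at 0, and have no gaps."""
    ids = list(ids)
    if not ids:
        raise ValueError("mapping is empty")
    unique = set(ids)
    if len(unique) != len(ids):
        raise ValueError("duplicate IDs found in the mapping")
    if min(unique) != 0:
        raise ValueError(f"IDs must start from 0, but the smallest ID is {min(unique)}")
    if max(unique) != len(unique) - 1:
        raise ValueError(
            f"IDs must be contiguous: expected largest ID {len(unique) - 1}, "
            f"got {max(unique)}"
        )

check_id_mapping([0, 1, 2, 3])  # passes silently
```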