
Comments (11)

AlexMRuch commented on July 21, 2024

It may be nicer to append a datetime timestamp (e.g., model_dataset_20200507_hr_mn_sec, where 20200507 is year 2020, month 05, day 07) to the output instead of an incrementing integer. Right now, for hyperparameter tuning, I have about 20 logs that look about the same, and I can't easily remember which one I ran on Monday (== my best model).
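A minimal sketch of the proposed naming scheme (not dgl-ke's actual code; the function name and layout are assumptions for illustration):

```python
from datetime import datetime

def run_dir_name(model: str, dataset: str, now: datetime = None) -> str:
    """Build a checkpoint suffix like ComplEx_SXSW2018_20200507_131542
    (YYYYMMDD_HHMMSS) instead of an incrementing integer."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{model}_{dataset}_{stamp}"

print(run_dir_name("ComplEx", "SXSW2018"))
```

Second-level resolution makes every run's directory unique and sorts chronologically.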

from dgl-ke.

classicsong commented on July 21, 2024

This is a good point.
For now, I think you can use --save_path as a workaround.



AlexMRuch commented on July 21, 2024

FYI, I also realized that this saves the entity and relation embeddings as FB15k_ComplEx_entity.npy and FB15k_ComplEx_relation.npy instead of using the name of my dataset ("SXSW2018"). I manually updated them for now, but wanted to give a heads-up that it's not just the ckpts folder name that contains FB15k.

Also, it's a little odd that the folder name is MODEL_DATASET_ITERATION while the entity/relation embeddings are named DATASET_MODEL_TYPE.npy. For consistency, shouldn't they both be either MODEL_DATASET or DATASET_MODEL (preferably the latter -- DATASET_MODEL_*)?


AlexMRuch commented on July 21, 2024

Manually changing the folder and entity/relation *.npy names still generates errors with dglke_eval:

amruch@wit:~/graphika/kg$ DGLBACKEND=pytorch dglke_eval \
> --data_path results_SXSW_2018 \
> --data_files entities.tsv relations.tsv all_ctups_10.tsv valid.tsv test.tsv --format udd_hrt \
> --model_name ComplEx \
> --hidden_dim 512 --gamma 128 \
> --mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 \
> --batch_size_eval 1024 --neg_sample_size_eval 10000 --eval_percent 20 \
> --model_path /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/
Using backend: pytorch
Reading train triples....
Finished. Read 113336766 train triples.
Reading valid triples....
Finished. Read 5383497 valid triples.
Reading test triples....
Finished. Read 5666839 test triples.
Logs are being recorded at: /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/eval.log
/usr/local/lib/python3.6/dist-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
  warnings.warn(msg, warn_type)
|valid|: 5383497|test|: 5666839
Traceback (most recent call last):
  File "/usr/local/bin/dglke_eval", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/dglke/eval.py", line 196, in main
    model = load_model_from_checkpoint(logger, args, n_entities, n_relations, ckpt_path)
  File "/usr/local/lib/python3.6/dist-packages/dglke/train_pytorch.py", line 109, in load_model_from_checkpoint
    model.load_emb(ckpt_path, args.dataset)
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/general_models.py", line 178, in load_emb
    self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 318, in load
    self.emb = th.Tensor(np.load(file_name))
  File "/home/amruch/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/FB15k_ComplEx_entity.npy'
^^^ It still expects things to be `FB15k`


classicsong commented on July 21, 2024

What is under /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/ ?
It needs the dataset name as prefix in the name of saved embedding.

def load_emb(self, path, dataset):
    """Load the model.

    Parameters
    ----------
    path : str
        Directory to load the model.
    dataset : str
        Dataset name as prefix to the saved embeddings.
    """
    self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
    self.relation_emb.load(path, dataset+'_'+self.model_name+'_relation')
    self.score_func.load(path, dataset+'_'+self.model_name)
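Based on the snippet above, the loader looks for files named `<dataset>_<model>_*.npy`. A small helper (hypothetical, just for checking a checkpoint directory by hand) makes the expected names explicit:

```python
def expected_ckpt_files(dataset: str, model_name: str) -> list:
    # Mirror the prefixes load_emb builds: <dataset>_<model_name>_entity.npy, etc.
    prefix = f"{dataset}_{model_name}"
    return [f"{prefix}_entity.npy", f"{prefix}_relation.npy"]

print(expected_ckpt_files("SXSW2018", "ComplEx"))
```

If the files in the checkpoint directory don't carry this exact prefix, loading will fail with a FileNotFoundError like the one above.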


AlexMRuch commented on July 21, 2024

The model defaulted to saving my entity and relation embeddings as FB15k_ComplEx_*.npy; however, I renamed those files to SXSW_ComplEx_*.npy. I also renamed the ckpt folder so it mentions SXSW2018 instead of FB15k. I did not pass anything to --dataset, since the instructions for user-defined knowledge graph data don't seem to say it's needed: https://aws-dglke.readthedocs.io/en/latest/train_user_data.html. My training command never included the --dataset option and the model trained fine (i.e., it loaded the correct training, validation, and testing data, not the FB15k set):

DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv all_ctups_10.tsv --format udd_hrt \
--model_name ComplEx \
--max_step 300000 --batch_size 1024 --neg_sample_size 1024 --neg_deg_sample --log_interval 1000 \
--hidden_dim 512 --gamma 128 --lr 0.085 -adv --regularization_coef 1.00E-9 \
--mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 --rel_part --async_update --force_sync_interval 1000


AlexMRuch commented on July 21, 2024

I presumed that --dataset was only used if one wished to use a built-in knowledge graph:

Users can specify one of the [pre-defined] datasets with --dataset option in their tasks.

https://aws-dglke.readthedocs.io/en/latest/train_built_in.html


classicsong commented on July 21, 2024

I presumed that --dataset was only used if one wished to use a built-in knowledge graph:

Users can specify one of the [pre-defined] datasets with --dataset option in their tasks.

https://aws-dglke.readthedocs.io/en/latest/train_built_in.html

You can also use it to name your own dataset. The embedding file name will also change.
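To illustrate why renaming the .npy files by hand is not enough: the path dglke_eval opens is built from the --dataset value, so the flag must match the new prefix. (The helper below is hypothetical; it just mirrors the path seen in the traceback above.)

```python
import os

def entity_emb_path(model_path: str, dataset: str, model_name: str) -> str:
    # The eval loader effectively joins the checkpoint dir with
    # <dataset>_<model_name>_entity.npy, so the flag must match the files.
    return os.path.join(model_path, f"{dataset}_{model_name}_entity.npy")

ckpt = "/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507"
print(entity_emb_path(ckpt, "FB15k", "ComplEx"))     # default name: file missing
print(entity_emb_path(ckpt, "SXSW2018", "ComplEx"))  # matches the renamed files
```

So passing a dataset name that matches the renamed files should let the loader find them.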



classicsong commented on July 21, 2024

For UDD and Raw_UDD, users should provide a dataset name. Fixed in PR #105.

