Comments (11)
It may be nicer to add a datetime timestamp (e.g., `model_dataset_20200507_hr_mn_sec`, where 20200507 is year 2020, month 05, day 07) at the end of the output folder name instead of an incrementing integer. Right now, for hyperparameter tuning, I have about 20 logs that look about the same, and I can't easily remember which was the one I ran on Monday (== my best model).
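To make the suggested naming concrete, here is a minimal sketch of how such a folder name could be built. The helper `timestamped_run_name` is hypothetical and not part of dgl-ke; it only follows the `model_dataset_YYYYMMDD_hr_mn_sec` pattern proposed above.

```python
from datetime import datetime
from typing import Optional


def timestamped_run_name(model: str, dataset: str,
                         now: Optional[datetime] = None) -> str:
    """Build a name like 'ComplEx_SXSW2018_20200507_143005'.

    Hypothetical helper for illustration; dgl-ke itself does not provide it.
    """
    now = now or datetime.now()
    # %Y%m%d -> date (20200507), %H%M%S -> hour, minute, second
    return f"{model}_{dataset}_{now.strftime('%Y%m%d_%H%M%S')}"


# A fixed datetime is used here so the result is reproducible.
run_name = timestamped_run_name("ComplEx", "SXSW2018",
                                datetime(2020, 5, 7, 14, 30, 5))
print(run_name)
```

A name built this way sorts chronologically within one model/dataset pair, and could be used as the last component of the directory passed to `--save_path`.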
from dgl-ke.
This could be a good point.
I think you can use `--save_path` as a workaround for now.
from dgl-ke.
FYI, I also realized that this saves the entity and relation embeddings as `FB15k_ComplEx_entity.npy` and `FB15k_ComplEx_relation.npy` instead of using the name of my dataset ("SXSW2018"). I manually updated it for now, but just wanted to give a heads up that it's not just the ckpts folder name that has FB15k.
Also, it is a little odd that the folder name is `MODEL_DATASET_ITERATION` while the entity/relation embeddings are named `DATASET_MODEL_TYPE.npy`. For consistency, shouldn't they both be either MODEL_DATASET or DATASET_MODEL (preferably the latter -- DATASET_MODEL_*)?
from dgl-ke.
Manually changing the folder and entity/relation `*.npy` names still generates errors with `dglke_eval`:
```
amruch@wit:~/graphika/kg$ DGLBACKEND=pytorch dglke_eval \
> --data_path results_SXSW_2018 \
> --data_files entities.tsv relations.tsv all_ctups_10.tsv valid.tsv test.tsv --format udd_hrt \
> --model_name ComplEx \
> --hidden_dim 512 --gamma 128 \
> --mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 \
> --batch_size_eval 1024 --neg_sample_size_eval 10000 --eval_percent 20 \
> --model_path /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/
Using backend: pytorch
Reading train triples....
Finished. Read 113336766 train triples.
Reading valid triples....
Finished. Read 5383497 valid triples.
Reading test triples....
Finished. Read 5666839 test triples.
Logs are being recorded at: /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/eval.log
/usr/local/lib/python3.6/dist-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
  warnings.warn(msg, warn_type)
|valid|: 5383497|test|: 5666839
Traceback (most recent call last):
  File "/usr/local/bin/dglke_eval", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/dglke/eval.py", line 196, in main
    model = load_model_from_checkpoint(logger, args, n_entities, n_relations, ckpt_path)
  File "/usr/local/lib/python3.6/dist-packages/dglke/train_pytorch.py", line 109, in load_model_from_checkpoint
    model.load_emb(ckpt_path, args.dataset)
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/general_models.py", line 178, in load_emb
    self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 318, in load
    self.emb = th.Tensor(np.load(file_name))
  File "/home/amruch/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/FB15k_ComplEx_entity.npy'
```
^^^ It still expects things to be `FB15k`
from dgl-ke.
What is under `/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/`?
The loader needs the dataset name as a prefix in the names of the saved embedding files.
See dgl-ke/python/dglke/models/general_models.py, lines 168 to 180 in 27c9b98.
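The lookup described here can be sketched as follows. This is a rough illustration, not dgl-ke's code: the helper names are hypothetical, the name pattern is copied from the traceback (`dataset+'_'+self.model_name+'_entity'`), and the `.npy` suffix and the `_relation` counterpart are taken from the file names reported earlier in this thread.

```python
import os


def expected_embedding_files(ckpt_dir, dataset, model_name):
    """List the embedding files the loader would look for (hypothetical helper).

    Mirrors the pattern seen in the traceback:
    <dataset>_<model_name>_entity.npy and <dataset>_<model_name>_relation.npy
    """
    return [
        os.path.join(ckpt_dir, f"{dataset}_{model_name}_{kind}.npy")
        for kind in ("entity", "relation")
    ]


def missing_embedding_files(ckpt_dir, dataset, model_name):
    """Return the expected files that do not exist in ckpt_dir."""
    return [
        path
        for path in expected_embedding_files(ckpt_dir, dataset, model_name)
        if not os.path.exists(path)
    ]


# With the dataset name left at "FB15k", the loader asks for FB15k_* files,
# so files renamed by hand to another prefix are reported as missing.
for path in expected_embedding_files("ckpts/SXSW2018_ComplEx_20200507",
                                     "FB15k", "ComplEx"):
    print(os.path.basename(path))
```

Running `missing_embedding_files` against the checkpoint folder before evaluation would show which prefix the files on disk need to carry for a given dataset name.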
from dgl-ke.
The model defaulted to saving my entity and relation embeddings as `FB15k_ComplEx_*.npy`; however, I renamed those files to `SXSW_ComplEx_*.npy`. I also changed the name of the ckpt folder from mentioning `FB15k` to mentioning `SXSW2018`. I did not pass anything for `--dataset`, as the instructions don't seem to state that you need to when you use user-defined knowledge graph data: https://aws-dglke.readthedocs.io/en/latest/train_user_data.html. In my training command, I never included the `--dataset` option and my model trained fine (i.e., it loaded the correct training, validation, and testing data, which was not the FB15k set):
DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv all_ctups_10.tsv --format udd_hrt \
--model_name ComplEx \
--max_step 300000 --batch_size 1024 --neg_sample_size 1024 --neg_deg_sample --log_interval 1000 \
--hidden_dim 512 --gamma 128 --lr 0.085 -adv --regularization_coef 1.00E-9 \
--mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 --rel_part --async_update --force_sync_interval 1000
from dgl-ke.
I presumed that `--dataset` was only used if one wished to use a built-in knowledge graph:

> Users can specify one of the [pre-defined] datasets with --dataset option in their tasks.

https://aws-dglke.readthedocs.io/en/latest/train_built_in.html
from dgl-ke.
> I presumed that `--dataset` was only used if one wished to use a built-in knowledge graph: "Users can specify one of the [pre-defined] datasets with --dataset option in their tasks."
> https://aws-dglke.readthedocs.io/en/latest/train_built_in.html

You can also use it to name your own dataset. The embedding file name will also change.
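As a sketch of what this reply suggests, the snippet below adds `--dataset SXSW2018` to an argument list like the ones used in this thread. The helper `with_dataset_name` is hypothetical, and the snippet only builds and prints the command rather than running it; it assumes `--dataset` is accepted together with user-defined data, as the reply states.

```python
import shlex


def with_dataset_name(argv, dataset):
    """Return a copy of argv that passes '--dataset <dataset>' (hypothetical helper)."""
    argv = list(argv)
    if "--dataset" not in argv:
        argv += ["--dataset", dataset]
    return argv


# Arguments copied from the evaluation command earlier in the thread (shortened).
eval_argv = [
    "dglke_eval",
    "--data_path", "results_SXSW_2018",
    "--format", "udd_hrt",
    "--model_name", "ComplEx",
    "--model_path", "/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/",
]

named_argv = with_dataset_name(eval_argv, "SXSW2018")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in named_argv))
```

Using the same name for training and evaluation should keep the saved embedding file names and the names the loader expects in agreement.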
from dgl-ke.
For the UDD and Raw_UDD formats, users should provide a dataset name. Fixed in PR #105.
from dgl-ke.