
Comments (11)

AlexMRuch commented on July 21, 2024

It may be nicer to append a datetime timestamp (e.g., model_dataset_20200507_hr_mn_sec, where 20200507 is year 2020, month 05, day 07) to the output instead of an incrementing integer. Right now, for hyperparameter tuning, I have about 20 logs that look about the same, and I can't easily remember which one I ran on Monday (== my best model).
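A minimal sketch of the proposed naming scheme (not dgl-ke's actual code; the function name and layout are assumptions for illustration):

```python
from datetime import datetime

def run_dir_name(model: str, dataset: str, now: datetime = None) -> str:
    """Build a checkpoint suffix like ComplEx_SXSW2018_20200507_131542
    (YYYYMMDD_HHMMSS) instead of an incrementing integer."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{model}_{dataset}_{stamp}"

print(run_dir_name("ComplEx", "SXSW2018"))
```

Second-level resolution makes every run's directory unique and sorts chronologically.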

from dgl-ke.

classicsong commented on July 21, 2024

This is a good point.
For now, I think you can use --save_path as a workaround.



AlexMRuch commented on July 21, 2024

FYI, I also realized that this saves the entity and relation embeddings as FB15k_ComplEx_entity.npy and FB15k_ComplEx_relation.npy instead of using the name of my dataset ("SXSW2018"). I manually updated them for now, but wanted to give a heads-up that it's not just the ckpts folder name that contains FB15k.

Also, it's a little odd that the folder name is MODEL_DATASET_ITERATION while the entity/relation embeddings are named DATASET_MODEL_TYPE.npy. For consistency, shouldn't they both be either MODEL_DATASET or DATASET_MODEL (preferably the latter -- DATASET_MODEL_*)?


AlexMRuch commented on July 21, 2024

Manually changing the folder and entity/relation *.npy names still generates errors with dglke_eval:

amruch@wit:~/graphika/kg$ DGLBACKEND=pytorch dglke_eval \
> --data_path results_SXSW_2018 \
> --data_files entities.tsv relations.tsv all_ctups_10.tsv valid.tsv test.tsv --format udd_hrt \
> --model_name ComplEx \
> --hidden_dim 512 --gamma 128 \
> --mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 \
> --batch_size_eval 1024 --neg_sample_size_eval 10000 --eval_percent 20 \
> --model_path /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/
Using backend: pytorch
Reading train triples....
Finished. Read 113336766 train triples.
Reading valid triples....
Finished. Read 5383497 valid triples.
Reading test triples....
Finished. Read 5666839 test triples.
Logs are being recorded at: /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/eval.log
/usr/local/lib/python3.6/dist-packages/dgl/base.py:25: UserWarning: multigraph will be deprecated.DGL will treat all graphs as multigraph in the future.
  warnings.warn(msg, warn_type)
|valid|: 5383497|test|: 5666839
Traceback (most recent call last):
  File "/usr/local/bin/dglke_eval", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/dglke/eval.py", line 196, in main
    model = load_model_from_checkpoint(logger, args, n_entities, n_relations, ckpt_path)
  File "/usr/local/lib/python3.6/dist-packages/dglke/train_pytorch.py", line 109, in load_model_from_checkpoint
    model.load_emb(ckpt_path, args.dataset)
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/general_models.py", line 178, in load_emb
    self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
  File "/usr/local/lib/python3.6/dist-packages/dglke/models/pytorch/tensor_models.py", line 318, in load
    self.emb = th.Tensor(np.load(file_name))
  File "/home/amruch/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/FB15k_ComplEx_entity.npy'
^^^ It still expects things to be `FB15k`


classicsong commented on July 21, 2024

What is under /home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507/ ?
It needs the dataset name as prefix in the name of saved embedding.

def load_emb(self, path, dataset):
    """Load the model.

    Parameters
    ----------
    path : str
        Directory to load the model.
    dataset : str
        Dataset name as prefix to the saved embeddings.
    """
    self.entity_emb.load(path, dataset+'_'+self.model_name+'_entity')
    self.relation_emb.load(path, dataset+'_'+self.model_name+'_relation')
    self.score_func.load(path, dataset+'_'+self.model_name)
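Based on the snippet above, the loader looks for files named `<dataset>_<model>_*.npy`. A small helper (hypothetical, just for checking a checkpoint directory by hand) makes the expected names explicit:

```python
def expected_ckpt_files(dataset: str, model_name: str) -> list:
    # Mirror the prefixes load_emb builds: <dataset>_<model_name>_entity.npy, etc.
    prefix = f"{dataset}_{model_name}"
    return [f"{prefix}_entity.npy", f"{prefix}_relation.npy"]

print(expected_ckpt_files("SXSW2018", "ComplEx"))
```

If the files in the checkpoint directory don't carry this exact prefix, loading will fail with a FileNotFoundError like the one above.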


AlexMRuch commented on July 21, 2024

The model defaulted to saving my entity and relation embeddings as FB15k_ComplEx_*.npy; however, I renamed those files to SXSW_ComplEx_*.npy. I also renamed the ckpt folder so it mentions SXSW2018 instead of FB15k. I did not pass anything to --dataset, since the instructions for user-defined knowledge graph data don't seem to say it's needed: https://aws-dglke.readthedocs.io/en/latest/train_user_data.html. My training command never included the --dataset option and the model trained fine (i.e., it loaded the correct training, validation, and testing data, not the FB15k set):

DGLBACKEND=pytorch dglke_train \
--data_path results_SXSW_2018 \
--data_files entities.tsv relations.tsv all_ctups_10.tsv --format udd_hrt \
--model_name ComplEx \
--max_step 300000 --batch_size 1024 --neg_sample_size 1024 --neg_deg_sample --log_interval 1000 \
--hidden_dim 512 --gamma 128 --lr 0.085 -adv --regularization_coef 1.00E-9 \
--mix_cpu_gpu --num_proc 6 --num_thread 5 --gpu 0 1 --rel_part --async_update --force_sync_interval 1000


AlexMRuch commented on July 21, 2024

I presumed that --dataset was only used if one wished to use a built-in knowledge graph:

Users can specify one of the [pre-defined] datasets with --dataset option in their tasks.

https://aws-dglke.readthedocs.io/en/latest/train_built_in.html


classicsong commented on July 21, 2024

I presumed that --dataset was only used if one wished to use a built-in knowledge graph:

Users can specify one of the [pre-defined] datasets with --dataset option in their tasks.

https://aws-dglke.readthedocs.io/en/latest/train_built_in.html

You can also use it to name your own dataset. The embedding file name will also change.
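To illustrate why renaming the .npy files by hand is not enough: the path dglke_eval opens is built from the --dataset value, so the flag must match the new prefix. (The helper below is hypothetical; it just mirrors the path seen in the traceback above.)

```python
import os

def entity_emb_path(model_path: str, dataset: str, model_name: str) -> str:
    # The eval loader effectively joins the checkpoint dir with
    # <dataset>_<model_name>_entity.npy, so the flag must match the files.
    return os.path.join(model_path, f"{dataset}_{model_name}_entity.npy")

ckpt = "/home/amruch/graphika/kg/ckpts/SXSW2018_ComplEx_20200507"
print(entity_emb_path(ckpt, "FB15k", "ComplEx"))     # default name: file missing
print(entity_emb_path(ckpt, "SXSW2018", "ComplEx"))  # matches the renamed files
```

So passing a dataset name that matches the renamed files should let the loader find them.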



classicsong commented on July 21, 2024

For UDD and Raw_UDD, users should provide a dataset name. Fixed in PR #105.

