Comments (6)

apoorvumang commented on August 26, 2024

It is hard to estimate training time for a dataset, since each one requires a different number of epochs to reach reasonable performance. For FB15k-237 we did not achieve good performance even when we kept training indefinitely; the best we achieved was around 0.27 MRR. So for comparison I'll give you the numbers for WN18RR:

For WN18RR, if we use definitions as part of the entity text, the median tokenized entity length is around 20 tokens. We trained a t5-small architecture model on 4 1080Ti GPUs (11 GB memory each) with a batch size of 64 per GPU (so 64*4 = 256 training examples per batch) and a max input and output sequence length of 60. In this setting, each training epoch takes roughly 4 minutes (there are about 173k examples in the train set if you count both head and tail prediction), and each epoch consists of 679 training steps.
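As a quick sanity check of these numbers, here is a back-of-the-envelope sketch (the constants come from the description above; the variable names are ours, and this is not code from the kgt5 repo):

```python
# Reproduce the batch and step arithmetic from the setup described above.
PER_GPU_BATCH = 64          # batch size per GPU
NUM_GPUS = 4                # 4 x 1080Ti (11 GB each)
TRAIN_EXAMPLES = 173_000    # ~173k train examples (head + tail prediction)

effective_batch = PER_GPU_BATCH * NUM_GPUS          # 64 * 4 = 256 examples per step
steps_per_epoch = TRAIN_EXAMPLES / effective_batch  # ~676, in line with the 679 quoted
print(f"{effective_batch} examples/step, ~{steps_per_epoch:.0f} steps/epoch")
```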

The evaluation was done at 200k training steps = ~294 epochs, which is about 20 hours of training. However, you do not need to train for this long to get reasonable results. You can also converge faster and achieve better results if you initialize with the pretrained LM weights (intfloat/SimKGC#1 (comment)), but then you can't be sure whether the model has seen the test data during pretraining or not.
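For clarity, here is a minimal sketch of the two initializations being contrasted; this is generic Hugging Face transformers usage, not the kgt5 training code:

```python
from transformers import T5Config, T5ForConditionalGeneration

# (a) From scratch: t5-small architecture with randomly initialized weights,
# so the model cannot have seen the test data during LM pretraining.
scratch = T5ForConditionalGeneration(T5Config.from_pretrained("t5-small"))

# (b) Pretrained LM weights: typically converges faster and scores higher,
# at the cost of possible test-data leakage from the pretraining corpus.
pretrained = T5ForConditionalGeneration.from_pretrained("t5-small")
```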

By comparison, our results for Wikidata5M were obtained after training for 50 epochs, and those for WikiKG90Mv2 after 1.5 epochs; however, those training runs took approximately 10 days each, since the datasets are quite large.

I hope this gives you some idea about the training time.

sm354 commented on August 26, 2024

Thank you for the detailed response! On a related note (regarding SimKGC), did you also try KGT5 with pretrained weights?

apoorvumang commented on August 26, 2024

Yes, we did. For WikiKG90Mv2 it didn't make much difference beyond faster convergence (that's for T5-small; I don't know what would happen with larger models).

sm354 commented on August 26, 2024

Okay, just out of curiosity: was there any specific reason for not using a pretrained initialization of KGT5 in all the experiments?

apoorvumang commented on August 26, 2024

Two reasons:

  1. For smaller KGs, if we use the default tokenizer, a lot of tokens are never seen in the train triples. And if you want to use a custom tokenizer, as we do with Wikidata5M, so that the whole vocabulary gets fine-tuned, then you can't use pretrained weights (as far as I know there's no easy solution to this; see the tokenizer sketch after this list).
  2. People can argue (and rightly so) that the model predicts test triples correctly because it read Wikipedia during pretraining, and Wikidata is based on Wikipedia. So to prevent any potential 'test data leakage' we trained (and reported results) without pretrained weights.
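A minimal sketch of what point 1 looks like in practice, assuming the Hugging Face tokenizers library; the file name and vocabulary size are illustrative assumptions, not values from the kgt5 repo:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Unigram model, similar in spirit to T5's SentencePiece tokenizer.
tokenizer = Tokenizer(models.Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.UnigramTrainer(
    vocab_size=32_000,  # assumption: pick so the KG's text covers the vocab
    special_tokens=["<pad>", "</s>", "<unk>"],
    unk_token="<unk>",
)

# 'kg_text.txt' (hypothetical) holds one entity/relation string per line,
# so every vocabulary item is actually seen during training.
tokenizer.train(["kg_text.txt"], trainer=trainer)
tokenizer.save("kg_tokenizer.json")

# Because this vocabulary no longer matches t5-small's, the pretrained
# embedding matrix doesn't line up -- which is why pretrained weights
# can't be reused with a custom tokenizer.
```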

sm354 commented on August 26, 2024

Thank you very much for providing the insights and clarifications! Closing the issue.
