Comments (6)

apoorvumang commented on August 26, 2024

It is hard to estimate training time for a dataset, since each one requires a different number of epochs to reach reasonable performance. For FB15k-237 we did not achieve good performance even when we kept training indefinitely; the best we achieved was around 0.27 MRR. So for comparison I'll give you the numbers for WN18RR:

For WN18RR, if we use definitions as part of the entity text, the median tokenized entity length is around 20 tokens. We trained a t5-small architecture model on 4 1080Ti GPUs (11 GB memory each) with a batch size of 64 per GPU (so 64*4 = 256 training examples per batch) and a max input and output sequence length of 60. In this setting, each training epoch takes roughly 4 minutes (there are about 173k examples in the train set if you count both head and tail prediction), and each epoch consists of 679 training steps.
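As a quick sanity check of these numbers, here is a back-of-the-envelope sketch (the constants come from the description above; the variable names are ours, and this is not code from the kgt5 repo):

```python
# Reproduce the batch and step arithmetic from the setup described above.
PER_GPU_BATCH = 64          # batch size per GPU
NUM_GPUS = 4                # 4 x 1080Ti (11 GB each)
TRAIN_EXAMPLES = 173_000    # ~173k train examples (head + tail prediction)

effective_batch = PER_GPU_BATCH * NUM_GPUS          # 64 * 4 = 256 examples per step
steps_per_epoch = TRAIN_EXAMPLES / effective_batch  # ~676, in line with the 679 quoted
print(f"{effective_batch} examples/step, ~{steps_per_epoch:.0f} steps/epoch")
```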

The evaluation was done at 200k training steps = ~294 epochs, which is about 20 hours of training. However, you do not need to train for this long to get reasonable results. You can also converge faster and achieve better results if you initialize with the pretrained LM weights (intfloat/SimKGC#1 (comment)), but then you can't be sure whether the model has seen the test data during pretraining or not.
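For clarity, here is a minimal sketch of the two initializations being contrasted; this is generic Hugging Face transformers usage, not the kgt5 training code:

```python
from transformers import T5Config, T5ForConditionalGeneration

# (a) From scratch: t5-small architecture with randomly initialized weights,
# so the model cannot have seen the test data during LM pretraining.
scratch = T5ForConditionalGeneration(T5Config.from_pretrained("t5-small"))

# (b) Pretrained LM weights: typically converges faster and scores higher,
# at the cost of possible test-data leakage from the pretraining corpus.
pretrained = T5ForConditionalGeneration.from_pretrained("t5-small")
```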

By comparison, our results for Wikidata5M were obtained after training for 50 epochs, and those for WikiKG90Mv2 after 1.5 epochs; however, those training runs took approximately 10 days each, since the datasets are quite large.

I hope this gives you some idea about the training time.

sm354 commented on August 26, 2024

Thank you for the detailed response! On a related note (regarding SimKGC), did you also try KGT5 with pretrained weights?

apoorvumang commented on August 26, 2024

Yes, we did. For WikiKG90Mv2 it didn't make much difference beyond faster convergence (that's for T5-small; I don't know what would happen with larger models).

sm354 commented on August 26, 2024

Okay, just out of curiosity: was there any specific reason for not using a pretrained initialization of KGT5 in all the experiments?

apoorvumang commented on August 26, 2024

Two reasons:

  1. For smaller KGs, if we use the default tokenizer, a lot of tokens are never seen in the train triples. And if you want to use a custom tokenizer, as we do with Wikidata5M, so that the whole vocabulary gets fine-tuned, then you can't use pretrained weights (as far as I know there's no easy solution to this; see the tokenizer sketch after this list).
  2. People can argue (and rightly so) that the model predicts test triples correctly because it read Wikipedia during pretraining, and Wikidata is based on Wikipedia. So to prevent any potential 'test data leakage' we trained (and reported results) without pretrained weights.
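A minimal sketch of what point 1 looks like in practice, assuming the Hugging Face tokenizers library; the file name and vocabulary size are illustrative assumptions, not values from the kgt5 repo:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Unigram model, similar in spirit to T5's SentencePiece tokenizer.
tokenizer = Tokenizer(models.Unigram())
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.UnigramTrainer(
    vocab_size=32_000,  # assumption: pick so the KG's text covers the vocab
    special_tokens=["<pad>", "</s>", "<unk>"],
    unk_token="<unk>",
)

# 'kg_text.txt' (hypothetical) holds one entity/relation string per line,
# so every vocabulary item is actually seen during training.
tokenizer.train(["kg_text.txt"], trainer=trainer)
tokenizer.save("kg_tokenizer.json")

# Because this vocabulary no longer matches t5-small's, the pretrained
# embedding matrix doesn't line up -- which is why pretrained weights
# can't be reused with a custom tokenizer.
```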

sm354 commented on August 26, 2024

Thank you very much for providing the insights and clarifications! Closing the issue.
