This repo uses prompt tuning to fine-tune a pretrained language model for several NLP tasks, implemented in the following scripts (a minimal setup sketch follows the list):
- `summarization.py`
- `question-answering.py`
- `machine_translation.py`
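
The scripts are not reproduced here, but a minimal sketch of a prompt-tuning setup might look like the following, assuming Hugging Face `transformers` and `peft`; the base checkpoint (`t5-small`) and `num_virtual_tokens=20` are illustrative assumptions, not values taken from this repo.

```python
# Hedged sketch: prompt tuning with Hugging Face peft.
# Prompt tuning freezes the base model and trains only a small set of
# "virtual token" embeddings prepended to every input.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

model_name = "t5-small"  # assumed base model; the actual checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # seq2seq fits summarization/QA/translation
    num_virtual_tokens=20,            # assumed; tuned per task in practice
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only the prompt embeddings are trainable
```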
Training hyperparameters:

- `batch_size`: 2, 4, or 8; CUDA runs out of memory beyond this, so gradient accumulation is used (see the sketch after this list).
- `num_epochs`: 10, with early stopping.
- `gradient_clipping`: gradients are clipped to a max norm of 1.0 to prevent exploding gradients.
- `learning_rate`: 0.01
- `optimiser`: AdamW
- `metric`: BLEU score
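
Put together, the training loop implied by these settings might look like this sketch (PyTorch assumed; `model`, `train_loader`, `val_loss`, the accumulation step count, and the early-stopping patience are illustrative, not taken from the scripts):

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
accum_steps = 8  # effective batch = batch_size * accum_steps (assumed value)
patience, best_val, bad_epochs = 3, float("inf"), 0  # assumed patience of 3

for epoch in range(10):  # num_epochs = 10
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(train_loader):
        # scale the loss so accumulated gradients match a larger batch
        loss = model(**batch).loss / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            # clip to max norm 1.0 to prevent exploding gradients
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()

    val = val_loss(model)  # assumed helper returning validation loss
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping
            break
```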
Results:

- Best loss achieved: 3.2
- Best loss achieved: 1.32
- BLEU score on test set: 0.03
- Best loss achieved: 4.56
- BLEU score on test set:
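
For reference, a test-set BLEU could be computed with the `evaluate` library (sacrebleu backend); `preds` and `refs` below are hypothetical decoded predictions and references. Note that sacrebleu reports BLEU on a 0-100 scale, while the scores above appear to use a 0-1 scale.

```python
import evaluate

bleu = evaluate.load("sacrebleu")
preds = ["the cat sat on the mat"]          # decoded model outputs (illustrative)
refs = [["the cat is sitting on the mat"]]  # one list of references per prediction
result = bleu.compute(predictions=preds, references=refs)
print(result["score"])  # corpus BLEU, 0-100 scale
```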