Comments (4)
Ahh, thank you so much @stas00 for the brief answers. I will look more into this content and share resources accordingly, since the ones I curated are already covered there and better written :)
from ml-engineering.
Everything else that applies to training applies to fine-tuning. The only difference is that instead of starting from random weights you start with non-random weights.
Some fine-tuning techniques freeze all or some of the weights, which reduces the number of gradients. This reduces the communication overhead when the gradients are reduced, and you need a lot less memory, since you no longer need to allocate optimizer states + gradients + master weights for the now-frozen weights.
By understanding what type of training/finetuning you do, as explained here https://github.com/stas00/ml-engineering/blob/master/performance/software.md#anatomy-of-models-memory, you know how much GPU memory you need to place a single model replica, and then, if you can afford it, you can multiply that by multiple replicas to speed up the training.
So if you want to train a 10B param model with the standard AdamW with mixed precision bf16, you know you need about 180GB of GPU memory for a single replica; with activations, batch size and seq_len you'd need more - so 4x 80GB GPUs (320GB) should be a good fit. If you want to train ~2x faster, use 8 GPUs. If you want to train even faster, say 4x, you'd use 2 nodes of 8 GPUs, except since inter-node communication is slower than intra-node, it won't be quite 4x faster, but a bit less than that.
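The ~180GB figure above follows from a per-parameter byte count. Here is a minimal sketch of that accounting; the breakdown below assumes fp32 gradients alongside fp32 master weights and AdamW states, which is one common mixed-precision setup (the exact split varies by framework):

```python
# Rough per-parameter memory accounting for mixed-precision bf16
# training with AdamW. These byte counts are an assumption about one
# common setup, not a universal rule.
BYTES_PER_PARAM = {
    "fp32_master_weights": 4,
    "adamw_momentum": 4,
    "adamw_variance": 4,
    "bf16_weights": 2,
    "fp32_grads": 4,  # some setups keep bf16 grads (2 bytes) instead
}

def model_replica_gb(n_params: float) -> float:
    """GB needed for weights + grads + optimizer states of one replica.

    Excludes activations, which depend on batch size and seq_len.
    """
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9

print(model_replica_gb(10e9))  # 10B params -> 180.0
```

With activations on top, 4x 80GB GPUs leaves a comfortable margin over the 180GB baseline.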
You can also speed up the training by choosing a faster GPU: if A100 is your baseline, everything else being equal, with H100 you should be able to train 2-3x faster. If you switch to fp8, you'd get another ~2x speed multiplier.
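The scaling described above can be sketched as a back-of-the-envelope estimate. The efficiency factor below is an illustrative assumption, not a benchmark; real inter-node losses depend on the interconnect and the parallelism strategy:

```python
# Back-of-the-envelope throughput estimate when scaling out GPUs.
# Assumption: near-linear scaling within a node, with a fixed
# efficiency penalty once communication crosses node boundaries.
def estimated_speedup(n_gpus: int, gpus_per_node: int = 8,
                      inter_node_efficiency: float = 0.9) -> float:
    if n_gpus <= gpus_per_node:
        return float(n_gpus)               # single node: near-linear
    return n_gpus * inter_node_efficiency  # multi-node: a bit less

print(estimated_speedup(8))   # 8.0
print(estimated_speedup(16))  # 14.4 -- "a bit less than" 2x of 8 GPUs
```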
LoRA is a different calculation: your pretrained model is frozen, so those parts consume only 2 bytes per param in half precision. So for a 10B param model you'd need only 20GB of memory for the base weights, and the LoRA part is much smaller, so here you'd easily fit onto a single 80GB GPU, and then you can speed up by adding more GPUs and/or using faster GPUs.
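The LoRA arithmetic can be sketched the same way. The adapter size below (30M params) is an illustrative assumption; the actual count depends on the chosen rank and which modules the adapters target:

```python
# Sketch of LoRA memory: the frozen base model needs only its
# half-precision weights (2 bytes/param, no grads or optimizer
# states); the full training cost applies only to the small adapter.
def lora_memory_gb(base_params: float, adapter_params: float,
                   bytes_per_trainable: int = 18) -> float:
    frozen = base_params * 2                         # bf16 weights only
    trainable = adapter_params * bytes_per_trainable # weights+grads+optim
    return (frozen + trainable) / 1e9

# 10B frozen base + a hypothetical 30M-param adapter:
print(lora_memory_gb(10e9, 30e6))  # ~20.5 GB, well within one 80GB GPU
```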
If you want to share your findings by all means don't hesitate to do so, @Anindyadeep
Research diverse GPU models, such as the NVIDIA GeForce RTX 3080 and Tesla V100. For instance, the RTX 3080 offers strong consumer-grade throughput with limited VRAM, while the Tesla V100 provides larger VRAM and excels in compute-intensive workloads.
Analyze your fine-tuning task - identify the model's memory requirements and computational intensity, which can influence GPU selection.
Experiment with configurations, adjusting batch sizes and learning rates. If the RTX 3080's smaller VRAM becomes the bottleneck, you might opt for a GPU with higher VRAM like the Tesla V100 to fully leverage available resources.
Based on the computational demands of your task, decide on the number of GPUs.
Explore GPU rental costs; for example, if using cloud services, compare prices for GPUs like RTX 3080 and Tesla V100. Calculate the estimated cost, factoring in training time and potential pricing fluctuations.
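The cost-estimation step above can be sketched as a simple calculation. The hourly rates below are placeholders, not quotes from any provider, and real cloud pricing fluctuates:

```python
# Hypothetical rental-cost comparison. Rates are made-up examples;
# check current provider pricing before deciding.
def training_cost(hourly_rate_usd: float, n_gpus: int,
                  est_hours: float) -> float:
    """Estimated total cost of a training run on rented GPUs."""
    return hourly_rate_usd * n_gpus * est_hours

# Compare two illustrative rates for the same 48-hour job on 4 GPUs;
# a faster GPU may finish sooner, offsetting its higher hourly rate.
for gpu, rate in {"RTX 3080": 0.30, "Tesla V100": 0.90}.items():
    print(f"{gpu}: ${training_cost(rate, n_gpus=4, est_hours=48):.2f}")
```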
Those are some awesome suggestions, thank you so much, I will follow them.