Comments (3)
You can set it to 1. The results will be almost exactly the same (a 1-2 MB difference at most).
from gpu_poor.
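To get a feel for why the difference is only a megabyte or two, here is a rough back-of-the-envelope check: going from 0 to 1 generated tokens adds about one token's worth of KV cache. The model shape below assumes a LLaMA-7B-like architecture in fp16; these numbers are an illustrative assumption, not taken from gpu_poor itself.

```python
# Rough check of the "1-2 MB at most" claim: with tokens_to_generate = 1
# instead of 0, the estimate grows by roughly one extra token's KV cache.
# Architecture below is an assumed LLaMA-7B-like shape, not from gpu_poor.

n_layers = 32
hidden_size = 4096
bytes_per_value = 2  # fp16

# K and V each store hidden_size values per layer for the extra token
kv_bytes_per_token = 2 * n_layers * hidden_size * bytes_per_value
print(kv_bytes_per_token / 2**20)  # prints 0.5 (MiB)
```

So the extra token costs on the order of half a megabyte for a 7B-class model, consistent with the "1-2 MB" figure above.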
You are right that in fine-tuning/training there is no concept of "prompt len" & "tokens to generate". Only max_seq_length is required. When you use the GitHub site, max_seq_length = prompt len + tokens to generate.
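The relation above means any split of the two fields with the same sum should give (almost) the same fine-tuning estimate. A minimal sketch, assuming the field names from the gpu_poor web UI; the helper function itself is illustrative, not part of the tool:

```python
# Hedged sketch: mapping the calculator's two inference fields onto the
# single sequence length used for fine-tuning. The helper is illustrative
# and not part of gpu_poor itself.

def max_seq_length(prompt_len: int, tokens_to_generate: int) -> int:
    """The site treats max_seq_length as the sum of both fields."""
    return prompt_len + tokens_to_generate

# For a fine-tuning run with sequences of 512 tokens, any split with the
# same sum maps to the same max_seq_length:
assert max_seq_length(511, 1) == max_seq_length(256, 256) == 512
```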
The "prompt len" & "tokens to generate" concepts only apply at inference time. For example, suppose you have a question made of 100 words (tokens) and you want to generate an answer of 500 tokens. The first 100 tokens are processed all at once, while the next 500 tokens are generated token by token, so a distinction is needed: the first 100 tokens are your "prompt len" & the next 500 tokens are your "tokens to generate".
Thanks for your reply! So if I want to get the memory result for fine-tuning, I should set "tokens to generate" to 0, right? However, that is forbidden (a warning says it has to be positive).