Comments (5)
Gotcha, thank you very much!
from lmflow.
Thanks for your interest in LMFlow! Currently, the optimizer states of intermediate layers are treated as discarded at the end of each LISA interval. Maintaining the m1 and m2 states for all layers would incur a large memory overhead, essentially making LISA's memory footprint the same as full-parameter training's.
The suggestion of maintaining them in a smarter way is a great one! There could be some engineering mechanisms to occasionally offload those states to CPU memory or disk, but this feature is still under implementation and not integrated yet. Hope this information helps 😄
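To make the bookkeeping concrete, here is a minimal pure-Python sketch of the idea described above: first-moment (m1) and second-moment (m2) states are kept only for the currently active layers, and dropped when a LISA interval ends. This is not LMFlow's actual code; the layer names, `run_interval` helper, and toy scalar "parameters" are all illustrative.

```python
import math

def adam_step(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam-style update for a single scalar parameter."""
    state["m1"] = b1 * state["m1"] + (1 - b1) * grad        # first moment
    state["m2"] = b2 * state["m2"] + (1 - b2) * grad ** 2   # second moment
    state["t"] += 1
    m1_hat = state["m1"] / (1 - b1 ** state["t"])           # bias correction
    m2_hat = state["m2"] / (1 - b2 ** state["t"])
    return param - lr * m1_hat / (math.sqrt(m2_hat) + eps)

layers = {f"layer{i}": 1.0 for i in range(4)}   # toy per-layer "parameters"
states = {}                                     # moment states, active layers only

def run_interval(active_layers, grads):
    for name in active_layers:
        # Newly activated layers start from fresh (zeroed) moment states.
        st = states.setdefault(name, {"m1": 0.0, "m2": 0.0, "t": 0})
        layers[name] = adam_step(layers[name], grads[name], st)
    # End of the LISA interval: discard states of layers no longer active.
    for name in list(states):
        if name not in active_layers:
            del states[name]

run_interval({"layer0", "layer2"}, {"layer0": 0.5, "layer2": -0.3})
run_interval({"layer1"}, {"layer1": 0.2})
print(sorted(states))  # ['layer1'] — only the active layer keeps m1/m2
```

Because `states` only ever holds entries for the active subset of layers, peak optimizer memory scales with the number of activated layers rather than the full model, which is the trade-off discussed above.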
Thank you for your prompt response!
I have another question regarding the implementation of AdamW in PyTorch. Specifically, does the native PyTorch implementation of AdamW accommodate dynamic adjustments, such as discarding the states of frozen layers and initializing states for newly activated layers? Or have you customized the AdamW class to support these functionalities for LISA?
Thanks for your comments! It is a great question. In our paper, we avoid this risk by running each LISA interval separately, loading and saving the model each time; this made the implementation easier for early-stage experiments.
We haven't examined our current implementation in LMFlow closely yet, but we have been monitoring the memory consumption and it is much lower than that of full-parameter training, so we conjecture this part is not a serious problem. If so, LISA's memory consumption in LMFlow could be reduced even further, which would be great 😄
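The per-interval save/load scheme mentioned above could look roughly like the sketch below. This is a hedged illustration, not LMFlow's API: `save_checkpoint`/`load_checkpoint` and the random sampling of active layers are stand-ins, and "training" is reduced to a scalar increment.

```python
import json
import os
import random
import tempfile

def save_checkpoint(path, weights):
    with open(path, "w") as f:
        json.dump(weights, f)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

n_layers, n_active, n_intervals = 8, 2, 3
ckpt = os.path.join(tempfile.mkdtemp(), "model.json")
save_checkpoint(ckpt, {f"layer{i}": 0.0 for i in range(n_layers)})

rng = random.Random(0)
for interval in range(n_intervals):
    weights = load_checkpoint(ckpt)                 # fresh run: no stale optimizer states
    active = rng.sample(sorted(weights), n_active)  # LISA: activate a few random layers
    for name in active:
        weights[name] += 1.0                        # stand-in for one interval of training
    save_checkpoint(ckpt, weights)                  # weights persist; optimizer state does not

final = load_checkpoint(ckpt)
print(sum(final.values()))  # n_active * n_intervals = 6 total updates survived
```

Since each interval starts from a checkpoint that contains only weights, the optimizer state is dropped by construction between intervals, which matches the early-stage experimental setup described above.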
Hi @research4pan,
Thanks for your great work.
I have a question about the part of the implementation where the optimizer states of frozen layers are discarded in LMFlow.
I've been trying to locate this particular section in the code, but I couldn't find any corresponding implementation in https://github.com/OptimalScale/LMFlow/blob/main/src/lmflow/pipeline/finetuner.py#L301.
Your help in figuring this out would be greatly appreciated.
Thanks for your time!
Related Issues (20)
- Question HOT 8
- [BUG] LISA Finetune: AttributeError: 'ChatGLMModel' object has no attribute 'h' HOT 3
- [BUG] LISA: same loss regardless of lisa_activated_layers HOT 17
- is support llava model ? HOT 3
- Multiple rounds of training HOT 1
- Running install.sh after git clone requires over 200GB Ram HOT 6
- [DPO is available?] HOT 2
- Unable to activate conda environment on Colab HOT 7
- Cannot open the address http://lmflow.org:5000 HOT 4
- "trust_remote_code=True" problem HOT 1
- Questions about task tuning in medical domain HOT 5
- Evaluation error on PubMedQA dataset HOT 3
- Question Regarding Optimizer Reinitialization in Lisa Implementation HOT 4
- About using multiple GPUs to do lisa fine-tuning HOT 4
- How to set learning rate decay in lisa fine-tuning HOT 2
- [New Feature] Could someone share the finetuned diffusion model which is good at 256x256 resolution?
- Memory problem of Lisa finetuning HOT 5
- Does it support llama3? HOT 5
- Causal LM finetuning HOT 3