Comments (4)
Given that they ran GPT-4 over the full 600k CC3M subset, the vast majority of the cost should come from the API calls to GPT-4 itself (somewhere between $10k and $100k). The GPU rental cost would be tiny in comparison.
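The $10k–$100k range above can be sanity-checked with back-of-envelope arithmetic. This is a minimal sketch; the per-call token count and price are illustrative assumptions, not values from the thread or from OpenAI's actual pricing at the time.

```python
# Back-of-envelope estimate of GPT-4 API cost for captioning ~600k images.
# tokens_per_call and price_per_1k_tokens are assumed, illustrative numbers.
n_samples = 600_000
tokens_per_call = 1_000          # assumed prompt + completion tokens per image
price_per_1k_tokens = 0.05       # assumed blended $/1k tokens

cost = n_samples * (tokens_per_call / 1_000) * price_per_1k_tokens
print(f"~${cost:,.0f}")          # ~$30,000 under these assumptions
```

Varying the assumed tokens per call and price by a factor of a few in either direction reproduces the quoted $10k–$100k range.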
@Richar-Du Thanks for your question and for the interest in our work.
We pretrain our model on the 595K subset on 8x A100s for around 5 hours. Fine-tuning for the initial release takes ~10 hours on the same machine. We also find that a smaller subset can achieve similar performance; we'll share the details of these experiments later.
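The numbers above translate directly into GPU-hours and rough throughput. A quick sketch, using only the figures stated in the thread (the derived throughput is approximate and ignores warm-up, evaluation, etc.):

```python
# Convert the reported wall-clock times into GPU-hours and rough throughput.
n_gpus = 8                       # 8x A100, as stated above
pretrain_hours = 5               # reported pretraining wall-clock time
finetune_hours = 10              # reported fine-tuning wall-clock time
n_pretrain_samples = 595_000

pretrain_gpu_hours = n_gpus * pretrain_hours      # 40 GPU-hours
finetune_gpu_hours = n_gpus * finetune_hours      # 80 GPU-hours
samples_per_sec = n_pretrain_samples / (pretrain_hours * 3600)
print(pretrain_gpu_hours, finetune_gpu_hours, round(samples_per_sec, 1))
```

At typical cloud A100 rates, ~120 total GPU-hours is indeed negligible next to a five-figure API bill, which is the point made in the first comment.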
Thanks @152334H for answering, but we'd like to clarify that we do not run GPT-4 on CC3M for the pretraining stage. We use the official CC3M captions directly, without BLIP synthetic captions, in the first release. Only the instruction-tuning data is generated by GPT-4. You can refer to our released LLaVA-CC3M-Pretrain-595K and LLaVA-Instruct-150K for more details.
We'll update the information in our paper as well, thanks.
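For readers who want to inspect the released data, here is a minimal sketch of parsing instruction-tuning records. It assumes the data is distributed as a JSON list of records; the field names (`image`, `conversations`, `from`, `value`) follow the commonly seen LLaVA conversation format but are an assumption here, not a guaranteed schema, and the file path is hypothetical.

```python
# Sketch: filter instruction-tuning records that reference an image.
# The record schema below is an assumption based on the LLaVA conversation
# format; check the released files for the authoritative structure.
import json

def load_instruct_records(path):
    """Load a JSON list of records and keep only those with an image field."""
    with open(path) as f:
        records = json.load(f)
    return [r for r in records if "image" in r]

# Demonstrate with a small in-memory sample instead of the real release file:
sample = [
    {"id": "0", "image": "0.jpg",
     "conversations": [{"from": "human", "value": "Describe the image."},
                       {"from": "gpt", "value": "A dog on grass."}]},
    {"id": "1", "conversations": []},   # text-only record, filtered out
]
with open("/tmp/llava_instruct_sample.json", "w") as f:
    json.dump(sample, f)
print(len(load_instruct_records("/tmp/llava_instruct_sample.json")))  # 1
```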
Oh okay, my apologies for misunderstanding!
@152334H No worries at all! Thank you for contributing to the discussion, and we look forward to hearing more feedback from you all!