baai-dcai / bunny
A family of lightweight multimodal models.
License: Apache License 2.0
Hello! In this line of code, Bunny uses bias=False on the lm_head layer:
However in the original Phi code, it uses bias:
I am trying to run Bunny-v1.0-3B through various quantization tools and faster model APIs that support Phi, but they fail because this bias is missing from the weights, and it seems the bias cannot easily be disabled in those tools. Any suggestions on how to fix this in the model?
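When will Ollama be supported? It would make deployment and testing easier.
Hello, may I ask what training set was used for the Chinese version of the model?
Hello authors. While reproducing bunny_v1_0-2B-zh, I found that the config.json saved after the pretrain stage has model_type bunny-qwen, while the open-sourced config.json has model_type bunny-qwen2. What is the reason for this? Also, what are the two files configuration_bunny_qwen2.py and modeling_bunny_qwen2.py that you published on Hugging Face used for? Do these two files need to be swapped in for training?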
Dear Developers:
Thank you to the BAAI team for open-sourcing the Bunny model. I've been actively exploring it these past few days, and I have a few questions about deploying it that I hope the BAAI technical team can answer.
The first question: what GPU resources are required to run the different versions of the model, for example the full-parameter Bunny-v1_0-3B and bunny-phi-2-siglip-lora? Could you provide a list for comparison? What are the officially recommended GPU models and VRAM sizes?
The second question: can the controller, Web-UI server, and model worker be started with a single bash command? Currently, it seems that three separate bash commands are needed to start the controller, the web UI, and model inference. This looks like a design choice aimed at a microservices or distributed-system architecture. Is my understanding correct? Also, if we deploy with Docker containers and use Kubernetes for container management, could an official post explain the standard deployment process in more detail?
by Isaac Wei Ran
Guangzhou, China, 7th March 2024
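On the question of starting everything with one command: a minimal sketch, assuming each service is an ordinary long-running command, is a small Python wrapper that starts the processes in order. The helper names `launch_all` and `wait_all` and the pause between starts are illustrative assumptions, not part of Bunny; the commands to pass in are the ones given in the README.

```python
import subprocess
import sys
import time


def launch_all(commands, delay_s=0.0):
    """Start each command in order and return the Popen handles.

    `commands` is a list of argument lists. `delay_s` is a hypothetical
    pause between starts, e.g. to let a controller come up before a worker.
    """
    procs = []
    for cmd in commands:
        procs.append(subprocess.Popen(cmd))
        time.sleep(delay_s)
    return procs


def wait_all(procs):
    """Block until every process exits and return the exit codes."""
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder commands; substitute the controller, web server and
    # model worker commands from the README.
    demo = [[sys.executable, "-c", "print('service started')"]] * 3
    print(wait_all(launch_all(demo)))
```

This only removes the need for three terminals; the services still run as three separate processes.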
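After docker pull, it would help to remind people to run the container first and then install apex and the like on top of it.
Overall it is very clear, though. Thumbs up.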
Traceback (most recent call last):
File "/research/zhangzr/Bunny/bunny/eval/model_vqa.py", line 112, in <module>
eval_model(args)
File "/research/zhangzr/Bunny/bunny/eval/model_vqa.py", line 64, in eval_model
output_ids = model.generate(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/generation/utils.py", line 1544, in generate
return self.greedy_search(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
outputs = self(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/research/zhangzr/Bunny/bunny/model/language_model/bunny_qwen2.py", line 72, in forward
return super().forward(
File "/research/zhangzr/Bunny/bunny/model/language_model/qwen2/modeling_qwen2.py", line 1174, in forward
outputs = self.model(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/research/zhangzr/Bunny/bunny/model/language_model/qwen2/modeling_qwen2.py", line 1021, in forward
attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa
expanded_4d_mask = attn_mask_converter.to_4d(
File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (862) must match the size of tensor b (1723) at non-singleton dimension 3
Hi, I got the following error. Thanks.
ModuleNotFoundError: No module named 'transformers_modules.BAAI.Bunny-v1'
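One possible cause, stated as an assumption since the post does not include the loading code: with `trust_remote_code=True`, transformers imports the model code as a dynamic module named after the repository or directory, and Python treats every `.` in a module path as a package separator. A name such as `Bunny-v1.0-3B` is then cut at `Bunny-v1`, which matches the error text. The sketch below only illustrates the splitting and a hypothetical rename helper, `safe_dirname`; it does not call transformers.

```python
def module_parts(dotted_name):
    """Split a module path the way the import system does: on every dot."""
    return dotted_name.split(".")


def safe_dirname(name):
    """Hypothetical helper: replace dots so the name is a single module part."""
    return name.replace(".", "_")


broken = "transformers_modules.BAAI.Bunny-v1.0-3B"
print(module_parts(broken))
# The model name is split into 'Bunny-v1' and '0-3B'.

fixed = "transformers_modules.BAAI." + safe_dirname("Bunny-v1.0-3B")
print(module_parts(fixed))
```

Under that assumption, loading from a repository or directory whose name has no dots, such as the `Bunny-v1_0-3B` spelling used elsewhere on this page, would avoid the problem.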
Great work! Could you tell me when you will release the training data, for both pre-training and fine-tuning?
Hello,
I noticed in the README that it mentions, "We use a better strategy to train Bunny-Llama-3-8B-V, which will be open-sourced soon!" Could you briefly describe this better strategy?
Thank you!
I am wondering whether Bunny supports Chinese fine-tuning and inference.
The Bunny-v1.0-2B-zh model sometimes answers questions in English.
def chat(image_url, prompt):
    image = read_image(image_url)
    image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
    # System prompt in Chinese: "You are a very good AI assistant who can converse with users exceptionally well."
    text = f"你是一个非常好的人工智能助手,能够非常出色的和用户交谈. USER: <image>\n{prompt} ASSISTANT:"
    text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        max_new_tokens=100,
        use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()
Is the chat template for Qwen2 1.8B different?
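For readers following the snippet above, the prompt is split at `<image>` and the placeholder id `-200` is inserted between the two tokenized halves. A minimal sketch of that step, using a toy whitespace tokenizer (`toy_tokenize` is a stand-in, not the real tokenizer) so it runs without the model:

```python
IMAGE_TOKEN_INDEX = -200  # placeholder id used in the snippet above


def toy_tokenize(text):
    """Stand-in tokenizer: one fake id per whitespace-separated word."""
    return [len(word) for word in text.split()]


def build_input_ids(text, tokenize, image_token="<image>"):
    """Tokenize the text around a single image tag and insert the placeholder."""
    chunks = text.split(image_token)
    if len(chunks) != 2:
        raise ValueError("expected exactly one image tag in the prompt")
    before, after = (tokenize(chunk) for chunk in chunks)
    return before + [IMAGE_TOKEN_INDEX] + after


ids = build_input_ids("USER: <image>\nWhat is this? ASSISTANT:", toy_tokenize)
print(ids)
```

This shows only how the image placeholder is positioned; it does not address which system prompt or chat template the model was trained with.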
Thanks for your great work! Can you support MiniCPM backbone?
Can you share the training loss curve?
Is the script for the multi-stage filtering used to select high-quality data open-sourced?
Hello,
Great work! I have several questions:
We find that LoRA empirically leads to better performance than fully tuning across all combinations of model architectures, probably because smaller models are more susceptible to catastrophic forgetting, while LoRA tuning alleviates this issue.
By "fully tuning across all combinations of model architectures", do you mean fine-tuning the SigLIP encoder + projector + phi2, or just the projector + phi2? And why can LoRA tuning alleviate catastrophic forgetting (sorry, I am not familiar with this)? Note that in this paper, using LoRA cannot avoid the model overfitting the finetuning dataset.
To avoid overfitting, it seems that researchers only train LLaVA for one epoch (both the pretrain and finetuning phase). Therefore, the loss curve may not converge to the lowest point.
For example, this is my loss curve and learning rate schedule during the pretrain phase:
And the loss curve and lr schedule in the finetune phase:
I guess the network does not converge at all, so how do you determine these hyperparameters? Do you select a set of hyperparameters that makes the network fully converge, or the set that gives the best benchmark performance?
Best,
Starcycle
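As general background on LoRA rather than the authors' answer: LoRA keeps the pretrained weight matrix frozen and trains only a low-rank update, so far fewer parameters move away from their pretrained values. A rough sketch of the parameter arithmetic for a single linear layer, with layer sizes and rank chosen purely for illustration:

```python
def full_params(d_in, d_out):
    """Trainable weights when the whole d_out x d_in matrix is tuned."""
    return d_in * d_out


def lora_params(d_in, d_out, r):
    """Trainable weights for a rank-r update B @ A (B: d_out x r, A: r x d_in)."""
    return r * (d_in + d_out)


# Illustrative sizes only; not taken from the Bunny configuration.
d_in, d_out, r = 2560, 2560, 128
print(full_params(d_in, d_out), lora_params(d_in, d_out, r))
print(lora_params(d_in, d_out, r) / full_params(d_in, d_out))
```

Whether this smaller update is what reduces forgetting in Bunny's case is a question for the authors; the sketch only shows the size difference.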
Can't wait to try it, but the Gradio page won't launch. Which version of Gradio are you using? The CLI is also slow to load model.safetensors.
With the example parameters, the generated text sometimes repeats itself. What needs to be adjusted? Thanks.
# generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]
root@ubuntu-Z690:/mnt/workspace/.cache/modelscope/BAAI/Bunny-v1___0-3B# python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /mnt/workspace/.cache/modelscope/BAAI/
2024-04-26 11:49:04.450276: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0
.
2024-04-26 11:49:04.451693: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-26 11:49:04.469047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-26 11:49:04.469064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-26 11:49:04.469078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-26 11:49:04.472914: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-26 11:49:04.473033: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-26 11:49:04.905731: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Bunny/bunny/serve/model_worker.py", line 20, in <module>
from bunny.model.builder import load_pretrained_model
File "/Bunny/bunny/model/__init__.py", line 1, in <module>
from .language_model.bunny_phi import BunnyPhiForCausalLM, BunnyPhiConfig
File "/Bunny/bunny/model/language_model/bunny_phi.py", line 11, in <module>
from ..bunny_arch import BunnyMetaModel, BunnyMetaForCausalLM
File "/Bunny/bunny/model/bunny_arch.py", line 6, in <module>
from .multimodal_projector.builder import build_vision_projector
File "/Bunny/bunny/model/multimodal_projector/builder.py", line 5, in <module>
from timm.layers.norm_act import LayerNormAct2d
ModuleNotFoundError: No module named 'timm.layers'
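The import that fails, `timm.layers`, is a package that older timm releases do not have (they expose the layers under `timm.models.layers`), so an outdated timm install is a likely cause; treat that as an assumption to verify against the repository's pinned requirements. A small stdlib-only check of which module paths can be found in the current environment:

```python
import importlib.util


def has_module(name):
    """Return True if `name` can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. `timm` itself) is missing.
        return False


for candidate in ("timm", "timm.layers", "timm.models.layers"):
    print(candidate, has_module(candidate))
```

If `timm` is found but `timm.layers` is not, upgrading timm to the version the repository expects would be the first thing to try.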
Can the bunny models be loaded in 4bit or 8bit quantised modes?
Great job! Have you tried DBSCAN? Which do you think works better, k-means or DBSCAN? I think this could be encapsulated into a general data processing program.
Hi, first of all really impressive work!
The repo states that the paper used 8 A100 GPUs for training, but can I ask how many hours training took with those GPUs?
Thank you!
Thanks for open-sourcing your great work! I'm interested in experimenting with other backbone models and vision encoders. However, I ran into a problem when trying to load merged weights from a locally saved path. I received the following error:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/BoyaWu10/bunny-phi-2-eva-lora/resolve/main/configuration_phi.py
I tried loading from Hugging Face using the following code:
model = AutoModelForCausalLM.from_pretrained(
    '/path/to/local/weights',
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    '/path/to/local/weights',
    trust_remote_code=True)
Any suggestions?
I'm trying to spin up the server so I can run this for inference as described in the README and I've hit a few issues.
First: demo_3.png and demo_4.png don't exist. This is easy to fix: lines 348 to 349 of Bunny/bunny/serve/gradio_web_server.py (at commit 516437e) should be edited to example_1.png and example_2.png.
Second: (and why this isn't just a PR) I can't get the service to start and load phi-2. I'm just trying to get the inference demo working.
If I run the model_worker service with --model-type phi-2, I get a crash, KeyError: 'BunnyPhiConfig', when it tries to load the tokenizer. It looks like you try to register this config somewhere, but it doesn't get added to the huggingface transformers list of known configs for some reason.
Are there other steps required (e.g., modifying the huggingface code)?
Third: I don't understand what set of model paths I should be passing to run the service if I don't want to fine-tune anything. Could you give an example of what model-path should be? I've downloaded bunny-phi-2-siglip-lora and I'm passing this as the path, but I can't test this because of the prior crash.
I'm pretty sure I have the correct versions of everything installed. Have you tried following the readme on a clean machine install to verify it runs as expected?
Thanks for your great work! Can you provide the script used to filter raw data from LAION-2B?
I am trying to fine-tune the model for a specific task using my own dataset. I have already formatted the dataset according to the docs. However, the subprocesses fail with the odd error "train.py: error: the following arguments are required: --output_dir", even though I pass it in my arguments. Do you have any idea what might cause this? Thanks!
This is my finetune.sh
MODEL_PATH=/image_text/models/Bunny-v1_0-3B
MODEL_TYPE=phi-2
PRETRAIN_DIR=bunny-$MODEL_TYPE-pretrain
OUTPUT_DIR=bunny-$MODEL_TYPE-test
# JSON LIST
DATA_PATH=image_text/train_list/train_single_image.json
IMAGE_FOLDER=image_text/datasets
mkdir -p ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR
deepspeed bunny/train/train.py \
--deepspeed ./script/deepspeed/zero3.json \
--model_name_or_path $MODEL_PATH \
--model_type $MODEL_TYPE \
--version bunny \
--data_path $DATA_PATH \
--image_folder $IMAGE_FOLDER \
--vision_tower google/siglip-so400m-patch14-384 \
# --pretrain_mm_mlp_adapter ./checkpoints-pretrain/$PRETRAIN_DIR/mm_projector.bin \
--mm_projector_type mlp2x_gelu \
--image_aspect_ratio pad \
--group_by_modality_length False \
--bf16 True \
--output_dir ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR \
--num_train_epochs 1 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 500 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to none | tee 2>&1 ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR/log.txt
This is the error I got.
root@sv:/image_text/Bunny# [2024-04-19 16:06:53,436] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-04-19 16:06:54,635] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-04-19 16:06:54,636] [INFO] [runner.py:568:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None bunny/train/train.py --deepspeed ./script/deepspeed/zero3.json --model_name_or_path BAAI/Bunny-v1_0-3B --model_type phi-2 --version bunny --data_path /image_text/train_list/train_impression_single_image.json --image_folder /image_text/datasets --vision_tower google/siglip-so400m-patch14-384
[2024-04-19 16:06:57,324] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
[WARNING] using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-04-19 16:06:59,716] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.16.2-1
[2024-04-19 16:06:59,716] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.16.2-1
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.16.2-1+cuda11.8
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.16.2-1+cuda11.8
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.16.2-1
[2024-04-19 16:06:59,717] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-04-19 16:06:59,717] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-04-19 16:06:59,717] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-04-19 16:06:59,717] [INFO] [launch.py:163:main] dist_world_size=8
[2024-04-19 16:06:59,717] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-04-19 16:06:59,737] [INFO] [launch.py:253:main] process 4574 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=0', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,748] [INFO] [launch.py:253:main] process 4575 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=1', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,761] [INFO] [launch.py:253:main] process 4576 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=2', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,773] [INFO] [launch.py:253:main] process 4577 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=3', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,791] [INFO] [launch.py:253:main] process 4579 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=4', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,810] [INFO] [launch.py:253:main] process 4581 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=5', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,829] [INFO] [launch.py:253:main] process 4584 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=6', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,848] [INFO] [launch.py:253:main] process 4586 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=7', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
usage: train.py [-h] [--model_name_or_path MODEL_NAME_OR_PATH] [--model_type MODEL_TYPE]
[--version VERSION] [--freeze_backbone [FREEZE_BACKBONE]]
[--tune_mm_mlp_adapter [TUNE_MM_MLP_ADAPTER]] [--vision_tower VISION_TOWER]
[--pretrain_mm_mlp_adapter PRETRAIN_MM_MLP_ADAPTER]
[--mm_projector_type MM_PROJECTOR_TYPE] [--data_path DATA_PATH]
[--lazy_preprocess [LAZY_PREPROCESS]] [--is_multimodal [IS_MULTIMODAL]]
[--no_is_multimodal] [--image_folder IMAGE_FOLDER]
[--image_aspect_ratio IMAGE_ASPECT_RATIO] --output_dir OUTPUT_DIR
[--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]] [--do_train [DO_TRAIN]]
[--do_eval [DO_EVAL]] [--do_predict [DO_PREDICT]]
[--evaluation_strategy {no,steps,epoch}]
[--prediction_loss_only [PREDICTION_LOSS_ONLY]]
[--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
[--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
[--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
[--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
[--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--eval_accumulation_steps EVAL_ACCUMULATION_STEPS] [--eval_delay EVAL_DELAY]
[--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY]
[--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
[--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM]
[--num_train_epochs NUM_TRAIN_EPOCHS] [--max_steps MAX_STEPS]
[--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup,inverse_sqrt,reduce_lr_on_plateau}]
[--lr_scheduler_kwargs LR_SCHEDULER_KWARGS] [--warmup_ratio WARMUP_RATIO]
[--warmup_steps WARMUP_STEPS]
[--log_level {detail,debug,info,warning,error,critical,passive}]
[--log_level_replica {detail,debug,info,warning,error,critical,passive}]
[--log_on_each_node [LOG_ON_EACH_NODE]] [--no_log_on_each_node]
[--logging_dir LOGGING_DIR] [--logging_strategy {no,steps,epoch}]
[--logging_first_step [LOGGING_FIRST_STEP]] [--logging_steps LOGGING_STEPS]
[--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]]
[--no_logging_nan_inf_filter] [--save_strategy {no,steps,epoch}]
[--save_steps SAVE_STEPS] [--save_total_limit SAVE_TOTAL_LIMIT]
[--save_safetensors [SAVE_SAFETENSORS]] [--no_save_safetensors]
[--save_on_each_node [SAVE_ON_EACH_NODE]]
[--save_only_model [SAVE_ONLY_MODEL]] [--no_cuda [NO_CUDA]]
[--use_cpu [USE_CPU]] [--use_mps_device [USE_MPS_DEVICE]] [--seed SEED]
[--data_seed DATA_SEED] [--jit_mode_eval [JIT_MODE_EVAL]]
[--use_ipex [USE_IPEX]] [--bf16 [BF16]] [--fp16 [FP16]]
[--fp16_opt_level FP16_OPT_LEVEL]
[--half_precision_backend {auto,apex,cpu_amp}]
[--bf16_full_eval [BF16_FULL_EVAL]] [--fp16_full_eval [FP16_FULL_EVAL]]
[--tf32 TF32] [--local_rank LOCAL_RANK]
[--ddp_backend {nccl,gloo,mpi,ccl,hccl}] [--tpu_num_cores TPU_NUM_CORES]
[--tpu_metrics_debug [TPU_METRICS_DEBUG]] [--debug DEBUG [DEBUG ...]]
[--dataloader_drop_last [DATALOADER_DROP_LAST]] [--eval_steps EVAL_STEPS]
[--dataloader_num_workers DATALOADER_NUM_WORKERS]
[--dataloader_prefetch_factor DATALOADER_PREFETCH_FACTOR]
[--past_index PAST_INDEX] [--run_name RUN_NAME] [--disable_tqdm DISABLE_TQDM]
[--remove_unused_columns [REMOVE_UNUSED_COLUMNS]]
[--label_names LABEL_NAMES [LABEL_NAMES ...]]
[--load_best_model_at_end [LOAD_BEST_MODEL_AT_END]]
[--metric_for_best_model METRIC_FOR_BEST_MODEL]
[--greater_is_better GREATER_IS_BETTER]
[--ignore_data_skip [IGNORE_DATA_SKIP]] [--fsdp FSDP]
[--fsdp_min_num_params FSDP_MIN_NUM_PARAMS] [--fsdp_config FSDP_CONFIG]
[--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
[--accelerator_config ACCELERATOR_CONFIG] [--deepspeed DEEPSPEED]
[--label_smoothing_factor LABEL_SMOOTHING_FACTOR] [--optim OPTIM]
[--optim_args OPTIM_ARGS] [--adafactor [ADAFACTOR]]
[--group_by_length [GROUP_BY_LENGTH]]
[--length_column_name LENGTH_COLUMN_NAME]
[--report_to REPORT_TO [REPORT_TO ...]]
[--ddp_find_unused_parameters DDP_FIND_UNUSED_PARAMETERS]
[--ddp_bucket_cap_mb DDP_BUCKET_CAP_MB]
[--ddp_broadcast_buffers DDP_BROADCAST_BUFFERS]
[--dataloader_pin_memory [DATALOADER_PIN_MEMORY]]
[--no_dataloader_pin_memory]
[--dataloader_persistent_workers [DATALOADER_PERSISTENT_WORKERS]]
[--skip_memory_metrics [SKIP_MEMORY_METRICS]] [--no_skip_memory_metrics]
[--use_legacy_prediction_loop [USE_LEGACY_PREDICTION_LOOP]]
[--push_to_hub [PUSH_TO_HUB]]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--hub_model_id HUB_MODEL_ID]
[--hub_strategy {end,every_save,checkpoint,all_checkpoints}]
[--hub_token HUB_TOKEN] [--hub_private_repo [HUB_PRIVATE_REPO]]
[--hub_always_push [HUB_ALWAYS_PUSH]]
[--gradient_checkpointing [GRADIENT_CHECKPOINTING]]
[--gradient_checkpointing_kwargs GRADIENT_CHECKPOINTING_KWARGS]
[--include_inputs_for_metrics [INCLUDE_INPUTS_FOR_METRICS]]
[--fp16_backend {auto,apex,cpu_amp}]
[--push_to_hub_model_id PUSH_TO_HUB_MODEL_ID]
[--push_to_hub_organization PUSH_TO_HUB_ORGANIZATION]
[--push_to_hub_token PUSH_TO_HUB_TOKEN] [--mp_parameters MP_PARAMETERS]
[--auto_find_batch_size [AUTO_FIND_BATCH_SIZE]]
[--full_determinism [FULL_DETERMINISM]] [--torchdynamo TORCHDYNAMO]
[--ray_scope RAY_SCOPE] [--ddp_timeout DDP_TIMEOUT]
[--torch_compile [TORCH_COMPILE]]
[--torch_compile_backend TORCH_COMPILE_BACKEND]
[--torch_compile_mode TORCH_COMPILE_MODE]
[--dispatch_batches DISPATCH_BATCHES] [--split_batches SPLIT_BATCHES]
[--include_tokens_per_second [INCLUDE_TOKENS_PER_SECOND]]
[--include_num_input_tokens_seen [INCLUDE_NUM_INPUT_TOKENS_SEEN]]
[--neftune_noise_alpha NEFTUNE_NOISE_ALPHA]
[--optim_target_modules OPTIM_TARGET_MODULES] [--cache_dir CACHE_DIR]
[--freeze_mm_mlp_adapter [FREEZE_MM_MLP_ADAPTER]]
[--mpt_attn_impl MPT_ATTN_IMPL] [--model_max_length MODEL_MAX_LENGTH]
[--double_quant [DOUBLE_QUANT]] [--no_double_quant] [--quant_type QUANT_TYPE]
[--bits BITS] [--lora_enable [LORA_ENABLE]] [--lora_r LORA_R]
[--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
[--lora_weight_path LORA_WEIGHT_PATH] [--lora_bias LORA_BIAS]
[--mm_projector_lr MM_PROJECTOR_LR]
[--group_by_modality_length [GROUP_BY_MODALITY_LENGTH]]
train.py: error: the following arguments are required: --output_dir
...
repeat for all 8 subprocesses
...
[2024-04-19 16:07:06,856] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4574
[2024-04-19 16:07:06,858] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4575
[2024-04-19 16:07:06,859] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4576
[2024-04-19 16:07:06,859] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4577
[2024-04-19 16:07:06,860] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4579
[2024-04-19 16:07:06,860] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4581
[2024-04-19 16:07:06,861] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4584
[2024-04-19 16:07:06,861] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4586
[2024-04-19 16:07:06,861] [ERROR] [launch.py:322:sigkill_handler] ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=7', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384'] exits with return code = 2
script/train/finetune_full_baseline.sh: 25: --mm_projector_
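A likely cause, consistent with the log above where the spawned command stops right after `--vision_tower`: the commented-out `# --pretrain_mm_mlp_adapter` line sits inside the backslash-continued command. The shell joins it onto the previous line, the `#` then turns the rest of that line (including its trailing backslash) into a comment, and the command ends there. Everything from `--mm_projector_type` on, including `--output_dir`, is run as a separate command. A minimal Python sketch of a check for this pattern (the function name and the shortened example script are illustrative):

```python
def comments_inside_continuation(script_text):
    """Return 1-based line numbers of comment lines that follow a line
    ending in a backslash, i.e. comments placed inside a continued command."""
    hits = []
    previous_continues = False
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if previous_continues and stripped.startswith("#"):
            hits.append(number)
        previous_continues = stripped.endswith("\\")
    return hits


script = """deepspeed bunny/train/train.py \\
    --vision_tower google/siglip-so400m-patch14-384 \\
    # --pretrain_mm_mlp_adapter ./mm_projector.bin \\
    --mm_projector_type mlp2x_gelu \\
    --output_dir ./out
"""
print(comments_inside_continuation(script))
```

Deleting the commented line, or moving it above the `deepspeed` command, keeps the remaining flags attached to the command.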
Is quantization for faster inference supported?
Have the authors conducted this ablation experiment?
Great work! I want to know if your pre-training used LLaMA 3 or LLaMA 3-Instruct.
Hello,
I'm struggling to get the sample code to work on my laptop with an Nvidia A2000 (8GB) card.
Does anyone have any advice?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings
import pathlib
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')
#torch.set_default_device('cuda') # or 'cuda'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch_device = 'cuda' #auto, cpu
model_name = 'BAAI/Bunny-v1_0-3B' # or 'BAAI/Bunny-v1_0-2B-zh'
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map=torch_device,
trust_remote_code=True)
#model.to(device)
tokenizer = AutoTokenizer.from_pretrained(
model_name,
trust_remote_code=True)
prompt = 'What happened in the image?'
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).to(torch_device).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=model.dtype, device=torch_device).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=model.dtype, device=torch_device).to(torch_device).unsqueeze(0)
#local image
file = pathlib.Path('C:/Users/Admin/Utils/Bunny-AI/slippery-person.jpeg')
image = Image.open(file)
image_tensor = model.process_images([image], model.config)
output_ids = model.generate(
input_ids,
#images=image_tensor
images=image_tensor.unsqueeze(0).to(dtype=model.dtype, device='cuda', non_blocking=True),
max_new_tokens=100,
use_cache=True)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
Thanks for your great work! When I use the modelscope Python API to download the training dataset, it fails:
>>> from modelscope.msdatasets import MsDataset
2024-03-20 14:56:10,539 - modelscope - INFO - PyTorch version 2.2.0+cu118 Found.
2024-03-20 14:56:10,542 - modelscope - INFO - Loading ast index from /mnt/afs1/likeqiang/.cache/modelscope/ast_indexer
2024-03-20 14:56:10,957 - modelscope - INFO - Loading done! Current index file version is 1.13.1, with md5 ac6c5f948b02361aa74e8bd58f64a6f7 and a total number of 972 components indexed
>>> ds = MsDataset.load('BoyaWu10/Bunny-v1.0-data')
2024-03-20 14:56:21,614 - modelscope - INFO - No subset_name specified, defaulting to the default
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/ms_dataset.py", line 284, in load
dataset_inst = remote_dataloader_manager.load_dataset(
File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader_manager.py", line 132, in load_dataset
oss_downloader.process()
File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 83, in process
self._prepare_and_download()
File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 132, in _prepare_and_download
raise f'meta-file: {dataset_name}.py not found on the modelscope hub.'
TypeError: exceptions must derive from BaseException
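The `TypeError` at the end of this traceback appears to come from the `raise f'...'` line itself: Python 3 only allows raising instances or subclasses of `BaseException`, so raising a plain string hides the intended "meta-file not found" message. A minimal sketch that reproduces the behavior; the function names and the choice of `FileNotFoundError` are illustrative assumptions, not modelscope code:

```python
def raise_string(dataset_name):
    # Mirrors the pattern in the traceback: a str is raised instead of an exception.
    raise f"meta-file: {dataset_name}.py not found on the modelscope hub."


def raise_exception(dataset_name):
    # Wrapping the same message in an exception class preserves it.
    raise FileNotFoundError(
        f"meta-file: {dataset_name}.py not found on the modelscope hub."
    )


def describe(func, dataset_name):
    """Call func and return the type name and message of what it raised."""
    try:
        func(dataset_name)
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None


if __name__ == "__main__":
    print(describe(raise_string, "Bunny-v1.0-data"))
    print(describe(raise_exception, "Bunny-v1.0-data"))
```

Read this way, the underlying problem is probably the message inside the f-string (no loader script for the dataset was found on the hub), not the `TypeError` that is printed.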
When I use git clone directly, it shows:
Cloning into 'Bunny-v1.0-data'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 50 (delta 17), reused 43 (delta 13), pack-reused 0
Unpacking objects: 100% (50/50), 6.23 KiB | 25.00 KiB/s, done.
Filtering content: 100% (11/11), 18.76 GiB | 5.17 MiB/s, done.
Encountered 9 files that may not have been copied correctly on Windows:
finetune/images.tar.gz.part-ad
pretrain/images.tar.gz.part-aa
finetune/images.tar.gz.part-ac
finetune/images.tar.gz.part-ab
pretrain/images.tar.gz.part-ae
pretrain/images.tar.gz.part-ac
pretrain/images.tar.gz.part-ab
pretrain/images.tar.gz.part-ad
finetune/images.tar.gz.part-aa
Could you give me some advice? Or could you upload the dataset to Hugging Face?
By the way, the model's Chinese ability is very poor, and it has no OCR ability.
Hi,
Great work on Bunny, super impressive! Would you mind adding a license for the code & weights?
Thx!
Does it support training and inference on Chinese data?
Hello,
I attempted to instruction fine-tune the Bunny model, but found that mm_projector.bin is missing.
Would you please share your pretrained mm_projector.bin?
Thank you for your assistance.
hi,
Great work! I tried the huggingface-transformers script, but found that inference is much slower than with the LLaVA series. Do you have any speed tests for comparison?
Hello, I have a question about data sampling.
According to the technical report, the second stage of sampling the pretraining data is to "sort the remaining samples by the cosine similarity between its text embedding and image embedding and keep samples ranking 40% - 60%".
Why keep the portion ranked between 40% and 60%? Shouldn't the data with higher cosine similarity between text and image embeddings be considered higher quality data?
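The selection rule quoted from the technical report can be sketched as follows. This is only an illustration of "sort by cosine similarity and keep the 40% - 60% rank band", not the authors' pipeline; the function names, the ascending sort order, and the toy 2-D embeddings are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keep_middle_band(samples, low_pct=40, high_pct=60):
    """Keep sample ids whose similarity rank falls in [low_pct, high_pct).

    samples: list of (sample_id, text_embedding, image_embedding).
    Ranks are taken after sorting by similarity in ascending order.
    """
    scored = sorted(samples, key=lambda s: cosine_similarity(s[1], s[2]))
    start = len(scored) * low_pct // 100
    end = len(scored) * high_pct // 100
    return [sample_id for sample_id, _, _ in scored[start:end]]


if __name__ == "__main__":
    # Ten toy samples: the image embedding is fixed and the text embedding
    # rotates away from it, so sample 0 is the most similar pair.
    image = (1.0, 0.0)
    samples = [
        (i, (math.cos(i * 0.15), math.sin(i * 0.15)), image) for i in range(10)
    ]
    print(keep_middle_band(samples))
```

The band drops both tails of the ranking: the least similar pairs and the most similar pairs are both discarded, which is the behavior the question is asking about.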
If possible, where can I get the weights of the MiniGPT projector?
Hi, I want to give this a go. Which format does the model expect the JSON file to be in?
Is this good?
{
  "id": "leia (1)",
  "image": "/mnt/d/quicktest/leia (1).jpg",
  "conversations": [
    {
      "from": "human",
      "value": " <image>\ndescribe the image"
    },
    {
      "from": "gpt",
      "value": "Princess Leia on Andor"
    }
  ]
},
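A quick structural check for a record like the one above can be sketched as below. It assumes the LLaVA-style layout the snippet already follows (a top-level JSON list of records with `id`, `image`, and alternating `human`/`gpt` turns in `conversations`, with an `<image>` placeholder in the first turn). The `check_record` helper and its rules are hypothetical, not part of Bunny, and the repository's data documentation remains the authority on the expected format.

```python
import json


def check_record(record):
    """Return a list of problems found in one training record (empty if none)."""
    problems = []
    for key in ("id", "image", "conversations"):
        if key not in record:
            problems.append(f"missing key: {key}")
    turns = record.get("conversations", [])
    for index, turn in enumerate(turns):
        # Assumed convention: turns alternate human, gpt, human, gpt, ...
        expected = "human" if index % 2 == 0 else "gpt"
        if turn.get("from") != expected:
            problems.append(f"turn {index}: expected 'from' == {expected!r}")
        if not isinstance(turn.get("value"), str):
            problems.append(f"turn {index}: 'value' must be a string")
    first_value = turns[0].get("value") if turns else None
    if not (isinstance(first_value, str) and "<image>" in first_value):
        problems.append("first human turn has no <image> placeholder")
    return problems


if __name__ == "__main__":
    # The record from the question, wrapped in a list so the text is valid
    # JSON (a trailing comma after the last record would not parse).
    text = """
    [
      {
        "id": "leia (1)",
        "image": "/mnt/d/quicktest/leia (1).jpg",
        "conversations": [
          {"from": "human", "value": "<image>\\ndescribe the image"},
          {"from": "gpt", "value": "Princess Leia on Andor"}
        ]
      }
    ]
    """
    for record in json.loads(text):
        print(record["id"], check_record(record))
```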
  File "xxx/bunnyllama3.py", line 36, in generate_inner
    image_tensor = self.model.process_images([image], self.model.config).to(dtype=self.model.dtype)
  File "xxx/.cache/huggingface/modules/transformers_modules/BAAI/Bunny-Llama-3-8B-V/f2df3cf03156eaba4c34815675d5aac9a9e0bec2/modeling_bunny_llama.py", line 2771, in process_images
    image = self.expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
  File "xxx/.cache/huggingface/modules/transformers_modules/BAAI/Bunny-Llama-3-8B-V/f2df3cf03156eaba4c34815675d5aac9a9e0bec2/modeling_bunny_llama.py", line 2758, in expand2square
    result = Image.new(pil_img.mode, (height, height), background_color)
  File "/root/miniconda3/envs/tr440/lib/python3.9/site-packages/PIL/Image.py", line 2941, in new
    return im._new(core.fill(mode, size, color))
TypeError: color must be int or single-element tuple
pillow == 10.2.0
transformers == 4.40.0
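Hello, in Bunny-Llama-3 image processing, this error occurs for some black-and-white photos. After I changed background_color in the expand2square function from grey (127, 127, 127) to 'white', the error no longer occurs. Is this modification acceptable?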
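The traceback suggests that `Image.new` received a three-element RGB color for an image whose mode has a single channel (for example `L`, grayscale). One way to avoid that, sketched below, is to pick a fill color whose shape matches the image mode. The helper `background_for_mode` is hypothetical and not part of Bunny or Pillow, and it only covers the two modes relevant here. Converting the image with Pillow's `Image.convert("RGB")` before calling `process_images` is another option.

```python
def background_for_mode(mode, rgb):
    """Return a fill color shaped for the given Pillow image mode.

    Only "RGB" and "L" are handled; other modes raise ValueError.
    """
    r, g, b = rgb
    if mode == "RGB":
        return (r, g, b)
    if mode == "L":
        # Single-channel image: collapse RGB to one luma value using the
        # ITU-R 601-2 weights (299, 587, 114) / 1000.
        return (r * 299 + g * 587 + b * 114) // 1000
    raise ValueError(f"unsupported mode {mode!r}; convert the image to RGB first")


if __name__ == "__main__":
    grey = (127, 127, 127)
    print(background_for_mode("RGB", grey))
    print(background_for_mode("L", grey))
```

Unlike switching the color to 'white', this keeps the padding at the same grey level for both color and grayscale inputs.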