
bunny's People

Contributors

baai-dcai · boyawu10 · isaachhh · law1223 · liqiiiii · russrobin

bunny's Issues

`lm_head.bias=False` and missing lm_head.bias weights

Hello! In this line of code, Bunny uses bias=False on the lm_head layer:

self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

However, the original Phi code uses bias=True:

self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=True)

I am trying to run Bunny-v1.0-3B through various quantization tools and faster model-serving APIs that support Phi, but they fail because this layer's bias weights are missing, and it does not seem easy to disable the bias in those tools. Any suggestions on how to fix this in the model?
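As a possible workaround (a minimal sketch, not a fix confirmed by the Bunny authors): inject an all-zero lm_head.bias into the exported state dict so tools expecting the original Phi layout (bias=True) find the key. A zero bias leaves the model's outputs unchanged, so only the checkpoint layout differs. The key names and output file name below are assumptions; verify them against your checkpoint.

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    'BAAI/Bunny-v1_0-3B',
    torch_dtype=torch.float16,
    trust_remote_code=True)

state_dict = model.state_dict()
if 'lm_head.bias' not in state_dict:
    # add an all-zero bias matching the lm_head output dimension (vocab_size)
    weight = state_dict['lm_head.weight']
    state_dict['lm_head.bias'] = torch.zeros(weight.shape[0], dtype=weight.dtype)

# hypothetical output file name; adapt to whatever format your tooling expects
torch.save(state_dict, 'bunny_v1_0_3b_with_lm_head_bias.bin')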

A question about training

Hi authors, while reproducing bunny_v1_0-2B-zh I found that the config.json saved after the pretrain stage has model_type bunny-qwen, while the config.json in the open-sourced release has model_type bunny-qwen2. What is the reason for this? Also, what are the configuration_bunny_qwen2.py and modeling_bunny_qwen2.py files you released on Hugging Face used for? Do these two files need to be swapped in during training?

Catastrophic forgetting

I used LoRA to fine-tune on my own dataset, but the resulting model only replies with the content it was trained on and no longer knows general common-sense content, whereas Bunny-v1_0-2B-zh handles it fine.
Do you have any training tricks?
My model: [screenshot]
Bunny-v1_0-2B-zh: [screenshot]

Dear Developers: a few basic questions!

Dear Developers:

Thank you to the BAAI team for open-sourcing the Bunny model. I've been actively exploring it over the past few days and have a few questions about deploying the model; I hope the BAAI technical team can answer them. I am extremely grateful in any case!

1. What GPU resources are required to run the various versions of the model, for example the full-parameter Bunny-v1_0-3B version and the bunny-phi-2-siglip-lora version? Could you provide a comparison table, including the officially recommended GPU models and VRAM sizes?

2. Can the controller, Web UI server, and model worker be started with a single bash command? Currently it seems that three separate commands are needed to start the controller, the Web UI, and model inference, presumably because of a microservices or distributed-system architecture. Is my understanding correct? If we deploy with Docker containers and use Kubernetes to manage them, could an official post explain the standard deployment process in more detail?

by Isaac Wei Ran
Guangzhou, China, 7th March 2024

After full-parameter fine-tuning on private data, inference with the bunny-qwen2 model results in a bug

Traceback (most recent call last):
  File "/research/zhangzr/Bunny/bunny/eval/model_vqa.py", line 112, in <module>
    eval_model(args)
  File "/research/zhangzr/Bunny/bunny/eval/model_vqa.py", line 64, in eval_model
    output_ids = model.generate(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/generation/utils.py", line 1544, in generate
    return self.greedy_search(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
    outputs = self(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/zhangzr/Bunny/bunny/model/language_model/bunny_qwen2.py", line 72, in forward
    return super().forward(
  File "/research/zhangzr/Bunny/bunny/model/language_model/qwen2/modeling_qwen2.py", line 1174, in forward
    outputs = self.model(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/research/zhangzr/Bunny/bunny/model/language_model/qwen2/modeling_qwen2.py", line 1021, in forward
    attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 398, in _prepare_4d_causal_attention_mask_for_sdpa
    expanded_4d_mask = attn_mask_converter.to_4d(
  File "/research/chengruogu/anaconda3/envs/bunny/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 137, in to_4d
    expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)
RuntimeError: The size of tensor a (862) must match the size of tensor b (1723) at non-singleton dimension 3

Training data

Great work! Could you please tell me when you will release the training data (both pre-training and fine-tuning)?

Bunny-v1.0-2B-zh sometimes answers in English

The Bunny-v1.0-2B-zh model sometimes answers questions in English, even with a Chinese prompt:

def chat(image_url, prompt):
    image = read_image(image_url)
    image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
    text = f"你是一个非常好的人工智能助手,能够非常出色的和用户交谈. USER: <image>\n{prompt} ASSISTANT:"
    text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        max_new_tokens=100,
        use_cache=True)[0]
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()

Is this because the conversation template for Qwen2 1.8B is different?
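One workaround to try (an assumption, not a confirmed template fix) is to state the language requirement explicitly in the prompt, using the chat helper above:

# hedged sketch: explicitly ask for a Chinese answer in the prompt
image_url = 'https://example.com/some_image.jpg'  # placeholder
prompt = "请描述这张图片,务必用中文回答。"  # "describe this image, and be sure to answer in Chinese"
print(chat(image_url, prompt))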

MiniCPM support

Thanks for your great work! Could you add support for a MiniCPM backbone?

Data processing scripts

Have the multi-stage filtering scripts used to select high-quality data been open-sourced?

Questions about the technical report

Hello,

It's great work! I have several questions:

  1. In the technical report you mentioned

We find that LoRA empirically leads to better performance than fully tuning across all combinations of model architectures, probably because smaller models are more susceptible to catastrophic forgetting, while LoRA tuning alleviates this issue.

By " fully tuning across all combinations of model architectures", do you mean finetune the SigLIP encoder + projector + phi2, or just projector + phi2? And why LoRA tuning can alleviate catastrophic forgetting (sorry I am not familiar with this...)? Note that in this paper, using LoRA cannot avoid the model overfitting the finetuning dataset.

  2. How do you select the learning rate, batch size, and cosine annealing schedule? Do you perform a hyperparameter search?

To avoid overfitting, it seems that researchers only train LLaVA for one epoch (in both the pretraining and fine-tuning phases). Therefore, the loss curve may not converge to its lowest point.

For example, this is my loss curve and learning rate schedule during the pretrain phase:

[loss curve and learning rate schedule plots]

And the loss curve and learning rate schedule in the fine-tuning phase:

[loss curve and learning rate schedule plots]

I guess the network does not converge at all... so how do you determine these hyperparameters? Do you select a set of hyperparameters with which the network fully converges, or simply the set that gives the best benchmark performance?

Best,
Starcycle

model.safetensors is slow to load

Can't wait to try it, but the Gradio page won't launch. What version of Gradio are you using? Also, the CLI is slow to load model.safetensors.

Repeat Generation

With the example parameters, the generation sometimes repeats. What do I need to adjust? Thanks.

# generate
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]
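Not an official recommendation, just a generic sketch: repetition can often be reduced with sampling plus a repetition penalty, which are standard transformers generate() options. The values below are illustrative guesses to tune, not settings from the Bunny authors.

# generate with sampling and a repetition penalty (values are illustrative)
output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    use_cache=True)[0]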

run worker error

root@ubuntu-Z690:/mnt/workspace/.cache/modelscope/BAAI/Bunny-v1___0-3B# python -m bunny.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /mnt/workspace/.cache/modelscope/BAAI/

2024-04-26 11:49:04.450276: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-04-26 11:49:04.451693: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-26 11:49:04.469047: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-26 11:49:04.469064: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-26 11:49:04.469078: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-26 11:49:04.472914: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-04-26 11:49:04.473033: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-26 11:49:04.905731: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Bunny/bunny/serve/model_worker.py", line 20, in <module>
    from bunny.model.builder import load_pretrained_model
  File "/Bunny/bunny/model/__init__.py", line 1, in <module>
    from .language_model.bunny_phi import BunnyPhiForCausalLM, BunnyPhiConfig
  File "/Bunny/bunny/model/language_model/bunny_phi.py", line 11, in <module>
    from ..bunny_arch import BunnyMetaModel, BunnyMetaForCausalLM
  File "/Bunny/bunny/model/bunny_arch.py", line 6, in <module>
    from .multimodal_projector.builder import build_vision_projector
  File "/Bunny/bunny/model/multimodal_projector/builder.py", line 5, in <module>
    from timm.layers.norm_act import LayerNormAct2d
ModuleNotFoundError: No module named 'timm.layers'
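For context (an assumption based on the error, not a confirmed fix from the maintainers): the timm.layers package that the projector builder imports from only ships with newer timm releases, so this error usually means an older timm is installed. A quick check:

# hedged sketch: verify whether the installed timm provides timm.layers
import importlib.util
import timm

print('timm version:', timm.__version__)
if importlib.util.find_spec('timm.layers') is None:
    print("timm.layers not found - upgrading timm (pip install -U timm) is one likely fix")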

Have you tried DBSCAN?

Great job! Have you tried DBSCAN? Which do you think works better here, k-means or DBSCAN? I think this could be encapsulated into a general data-processing program.

How many hours on 8 A100 GPUs?

Hi, first of all really impressive work!

The repo states that 8 A100 GPUs were used for training; may I ask how many hours the training took on those GPUs?

Thank you!

Missing configuration file when loading merged weights

Thanks for open-sourcing your great work! I'm interested in experimenting with other backbone models and vision encoders. However, I encountered an issue when attempting to load merged weights from a locally saved path. I received the following error:

requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/BoyaWu10/bunny-phi-2-eva-lora/resolve/main/configuration_phi.py.

I load the locally merged weights using the following code:

model = AutoModelForCausalLM.from_pretrained(
    '/path/to/local/weights',
    torch_dtype=torch.float16,
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    '/path/to/local/weights',
    trust_remote_code=True)

Any suggestions?
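One direction to check (a hedged sketch, not a confirmed fix): with trust_remote_code=True, transformers resolves the custom configuration_phi.py / modeling_phi.py from the repo named in the config's auto_map, and the 404 means that repo does not host them. Copying those files from the base checkpoint into the local merged-weights folder (and, if needed, editing auto_map in config.json to drop the remote repo prefix) may avoid the remote lookup. All paths and file names below are placeholders.

import shutil

base_model_dir = '/path/to/base/phi-2-checkpoint'   # placeholder: wherever the base model's custom code lives
merged_dir = '/path/to/local/weights'                # the folder used in the snippet above

for fname in ['configuration_phi.py', 'modeling_phi.py']:
    shutil.copy(f'{base_model_dir}/{fname}', merged_dir)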

A few issues running the README

I'm trying to spin up the server so I can run this for inference as described in the README, and I've hit a few issues.

First: demo_3.png and demo_4.png don't exist. This is easy to fix here:

[f"{cur_dir}/examples/demo_3.png", "What is the astronaut holding in his hand?"],
[f"{cur_dir}/examples/demo_4.png", "Why is the image funny?"],

should be edited to example_1.png and example_2.png.

Second (and this is why this isn't just a PR): I can't get the model_worker service to start with the phi-2 model. I'm just trying to get the inference demo working.

If I run the model_worker service with --model-type phi-2, it crashes with KeyError: 'BunnyPhiConfig' when it tries to load the tokenizer. It looks like this config is set up somewhere, but for some reason it doesn't get added to the Hugging Face transformers list of known configs.

Are there other steps required (e.g., modifying the huggingface code)?

Third: I don't understand which model paths I should pass to run the service if I don't want to fine-tune anything. Could you give an example of what model-path should be? I've downloaded bunny-phi-2-siglip-lora and I'm passing that as the path, but I can't test it because of the crash above.

I'm pretty sure I have the correct versions of everything installed. Have you tried following the README on a clean machine to verify that it runs as expected?
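Regarding the KeyError: 'BunnyPhiConfig' above, one thing to check (a hedged sketch, not a confirmed fix) is whether the custom config/model classes are registered with transformers' Auto classes before the tokenizer and model are resolved. The import path matches the repo's traceback in another issue; the model_type string 'bunny-phi' is an assumption.

from transformers import AutoConfig, AutoModelForCausalLM
from bunny.model.language_model.bunny_phi import BunnyPhiConfig, BunnyPhiForCausalLM

# register the custom config/model so Auto* lookups recognise them
AutoConfig.register('bunny-phi', BunnyPhiConfig)
AutoModelForCausalLM.register(BunnyPhiConfig, BunnyPhiForCausalLM)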

LoRA training question

Thanks for your great work!
My training script: [screenshot]
Why is the final LoRA config's model_type bunny-qwen? [screenshot]

Training Data

Great work! Could you please tell me when you will release the training data (both pre-training and fine-tuning)?

train.py: error: the following arguments are required: --output_dir

I am trying to fine-tune the model for a specific task using my own dataset, which I have already formatted according to the docs. I get a strange error, train.py: error: the following arguments are required: --output_dir, in the subprocesses even though I already pass it in my arguments. Do you have any idea what might cause this? Thanks!

This is my finetune.sh


MODEL_PATH=/image_text/models/Bunny-v1_0-3B
MODEL_TYPE=phi-2

PRETRAIN_DIR=bunny-$MODEL_TYPE-pretrain
OUTPUT_DIR=bunny-$MODEL_TYPE-test

# JSON LIST
DATA_PATH=image_text/train_list/train_single_image.json
IMAGE_FOLDER=image_text/datasets


mkdir -p ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR

deepspeed bunny/train/train.py \
    --deepspeed ./script/deepspeed/zero3.json \
    --model_name_or_path $MODEL_PATH \
    --model_type $MODEL_TYPE \
    --version bunny \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_FOLDER \
    --vision_tower google/siglip-so400m-patch14-384 \
    # --pretrain_mm_mlp_adapter ./checkpoints-pretrain/$PRETRAIN_DIR/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --bf16 True \
    --output_dir ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 1 \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to none | tee 2>&1 ./checkpoints-$MODEL_TYPE/$OUTPUT_DIR/log.txt

This is the error I got.

root@sv:/image_text/Bunny# [2024-04-19 16:06:53,436] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-04-19 16:06:54,635] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-04-19 16:06:54,636] [INFO] [runner.py:568:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None bunny/train/train.py --deepspeed ./script/deepspeed/zero3.json --model_name_or_path BAAI/Bunny-v1_0-3B --model_type phi-2 --version bunny --data_path /image_text/train_list/train_impression_single_image.json --image_folder /image_text/datasets --vision_tower google/siglip-so400m-patch14-384
[2024-04-19 16:06:57,324] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.2
 [WARNING]  using untested triton version (2.2.0), only 1.0.0 is known to be compatible
[2024-04-19 16:06:59,716] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.16.2-1
[2024-04-19 16:06:59,716] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.16.2-1
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.16.2-1+cuda11.8
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.16.2-1+cuda11.8
[2024-04-19 16:06:59,717] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.16.2-1
[2024-04-19 16:06:59,717] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-04-19 16:06:59,717] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-04-19 16:06:59,717] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-04-19 16:06:59,717] [INFO] [launch.py:163:main] dist_world_size=8
[2024-04-19 16:06:59,717] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-04-19 16:06:59,737] [INFO] [launch.py:253:main] process 4574 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=0', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,748] [INFO] [launch.py:253:main] process 4575 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=1', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,761] [INFO] [launch.py:253:main] process 4576 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=2', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,773] [INFO] [launch.py:253:main] process 4577 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=3', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,791] [INFO] [launch.py:253:main] process 4579 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=4', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,810] [INFO] [launch.py:253:main] process 4581 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=5', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,829] [INFO] [launch.py:253:main] process 4584 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=6', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
[2024-04-19 16:06:59,848] [INFO] [launch.py:253:main] process 4586 spawned with command: ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=7', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384']
usage: train.py [-h] [--model_name_or_path MODEL_NAME_OR_PATH] [--model_type MODEL_TYPE]
                [--version VERSION] [--freeze_backbone [FREEZE_BACKBONE]]
                [--tune_mm_mlp_adapter [TUNE_MM_MLP_ADAPTER]] [--vision_tower VISION_TOWER]
                [--pretrain_mm_mlp_adapter PRETRAIN_MM_MLP_ADAPTER]
                [--mm_projector_type MM_PROJECTOR_TYPE] [--data_path DATA_PATH]
                [--lazy_preprocess [LAZY_PREPROCESS]] [--is_multimodal [IS_MULTIMODAL]]
                [--no_is_multimodal] [--image_folder IMAGE_FOLDER]
                [--image_aspect_ratio IMAGE_ASPECT_RATIO] --output_dir OUTPUT_DIR
                [--overwrite_output_dir [OVERWRITE_OUTPUT_DIR]] [--do_train [DO_TRAIN]]
                [--do_eval [DO_EVAL]] [--do_predict [DO_PREDICT]]
                [--evaluation_strategy {no,steps,epoch}]
                [--prediction_loss_only [PREDICTION_LOSS_ONLY]]
                [--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
                [--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE]
                [--per_gpu_train_batch_size PER_GPU_TRAIN_BATCH_SIZE]
                [--per_gpu_eval_batch_size PER_GPU_EVAL_BATCH_SIZE]
                [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
                [--eval_accumulation_steps EVAL_ACCUMULATION_STEPS] [--eval_delay EVAL_DELAY]
                [--learning_rate LEARNING_RATE] [--weight_decay WEIGHT_DECAY]
                [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2]
                [--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM]
                [--num_train_epochs NUM_TRAIN_EPOCHS] [--max_steps MAX_STEPS]
                [--lr_scheduler_type {linear,cosine,cosine_with_restarts,polynomial,constant,constant_with_warmup,inverse_sqrt,reduce_lr_on_plateau}]
                [--lr_scheduler_kwargs LR_SCHEDULER_KWARGS] [--warmup_ratio WARMUP_RATIO]
                [--warmup_steps WARMUP_STEPS]
                [--log_level {detail,debug,info,warning,error,critical,passive}]
                [--log_level_replica {detail,debug,info,warning,error,critical,passive}]
                [--log_on_each_node [LOG_ON_EACH_NODE]] [--no_log_on_each_node]
                [--logging_dir LOGGING_DIR] [--logging_strategy {no,steps,epoch}]
                [--logging_first_step [LOGGING_FIRST_STEP]] [--logging_steps LOGGING_STEPS]
                [--logging_nan_inf_filter [LOGGING_NAN_INF_FILTER]]
                [--no_logging_nan_inf_filter] [--save_strategy {no,steps,epoch}]
                [--save_steps SAVE_STEPS] [--save_total_limit SAVE_TOTAL_LIMIT]
                [--save_safetensors [SAVE_SAFETENSORS]] [--no_save_safetensors]
                [--save_on_each_node [SAVE_ON_EACH_NODE]]
                [--save_only_model [SAVE_ONLY_MODEL]] [--no_cuda [NO_CUDA]]
                [--use_cpu [USE_CPU]] [--use_mps_device [USE_MPS_DEVICE]] [--seed SEED]
                [--data_seed DATA_SEED] [--jit_mode_eval [JIT_MODE_EVAL]]
                [--use_ipex [USE_IPEX]] [--bf16 [BF16]] [--fp16 [FP16]]
                [--fp16_opt_level FP16_OPT_LEVEL]
                [--half_precision_backend {auto,apex,cpu_amp}]
                [--bf16_full_eval [BF16_FULL_EVAL]] [--fp16_full_eval [FP16_FULL_EVAL]]
                [--tf32 TF32] [--local_rank LOCAL_RANK]
                [--ddp_backend {nccl,gloo,mpi,ccl,hccl}] [--tpu_num_cores TPU_NUM_CORES]
                [--tpu_metrics_debug [TPU_METRICS_DEBUG]] [--debug DEBUG [DEBUG ...]]
                [--dataloader_drop_last [DATALOADER_DROP_LAST]] [--eval_steps EVAL_STEPS]
                [--dataloader_num_workers DATALOADER_NUM_WORKERS]
                [--dataloader_prefetch_factor DATALOADER_PREFETCH_FACTOR]
                [--past_index PAST_INDEX] [--run_name RUN_NAME] [--disable_tqdm DISABLE_TQDM]
                [--remove_unused_columns [REMOVE_UNUSED_COLUMNS]]
                [--label_names LABEL_NAMES [LABEL_NAMES ...]]
                [--load_best_model_at_end [LOAD_BEST_MODEL_AT_END]]
                [--metric_for_best_model METRIC_FOR_BEST_MODEL]
                [--greater_is_better GREATER_IS_BETTER]
                [--ignore_data_skip [IGNORE_DATA_SKIP]] [--fsdp FSDP]
                [--fsdp_min_num_params FSDP_MIN_NUM_PARAMS] [--fsdp_config FSDP_CONFIG]
                [--fsdp_transformer_layer_cls_to_wrap FSDP_TRANSFORMER_LAYER_CLS_TO_WRAP]
                [--accelerator_config ACCELERATOR_CONFIG] [--deepspeed DEEPSPEED]
                [--label_smoothing_factor LABEL_SMOOTHING_FACTOR] [--optim OPTIM]
                [--optim_args OPTIM_ARGS] [--adafactor [ADAFACTOR]]
                [--group_by_length [GROUP_BY_LENGTH]]
                [--length_column_name LENGTH_COLUMN_NAME]
                [--report_to REPORT_TO [REPORT_TO ...]]
                [--ddp_find_unused_parameters DDP_FIND_UNUSED_PARAMETERS]
                [--ddp_bucket_cap_mb DDP_BUCKET_CAP_MB]
                [--ddp_broadcast_buffers DDP_BROADCAST_BUFFERS]
                [--dataloader_pin_memory [DATALOADER_PIN_MEMORY]]
                [--no_dataloader_pin_memory]
                [--dataloader_persistent_workers [DATALOADER_PERSISTENT_WORKERS]]
                [--skip_memory_metrics [SKIP_MEMORY_METRICS]] [--no_skip_memory_metrics]
                [--use_legacy_prediction_loop [USE_LEGACY_PREDICTION_LOOP]]
                [--push_to_hub [PUSH_TO_HUB]]
                [--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
                [--hub_model_id HUB_MODEL_ID]
                [--hub_strategy {end,every_save,checkpoint,all_checkpoints}]
                [--hub_token HUB_TOKEN] [--hub_private_repo [HUB_PRIVATE_REPO]]
                [--hub_always_push [HUB_ALWAYS_PUSH]]
                [--gradient_checkpointing [GRADIENT_CHECKPOINTING]]
                [--gradient_checkpointing_kwargs GRADIENT_CHECKPOINTING_KWARGS]
                [--include_inputs_for_metrics [INCLUDE_INPUTS_FOR_METRICS]]
                [--fp16_backend {auto,apex,cpu_amp}]
                [--push_to_hub_model_id PUSH_TO_HUB_MODEL_ID]
                [--push_to_hub_organization PUSH_TO_HUB_ORGANIZATION]
                [--push_to_hub_token PUSH_TO_HUB_TOKEN] [--mp_parameters MP_PARAMETERS]
                [--auto_find_batch_size [AUTO_FIND_BATCH_SIZE]]
                [--full_determinism [FULL_DETERMINISM]] [--torchdynamo TORCHDYNAMO]
                [--ray_scope RAY_SCOPE] [--ddp_timeout DDP_TIMEOUT]
                [--torch_compile [TORCH_COMPILE]]
                [--torch_compile_backend TORCH_COMPILE_BACKEND]
                [--torch_compile_mode TORCH_COMPILE_MODE]
                [--dispatch_batches DISPATCH_BATCHES] [--split_batches SPLIT_BATCHES]
                [--include_tokens_per_second [INCLUDE_TOKENS_PER_SECOND]]
                [--include_num_input_tokens_seen [INCLUDE_NUM_INPUT_TOKENS_SEEN]]
                [--neftune_noise_alpha NEFTUNE_NOISE_ALPHA]
                [--optim_target_modules OPTIM_TARGET_MODULES] [--cache_dir CACHE_DIR]
                [--freeze_mm_mlp_adapter [FREEZE_MM_MLP_ADAPTER]]
                [--mpt_attn_impl MPT_ATTN_IMPL] [--model_max_length MODEL_MAX_LENGTH]
                [--double_quant [DOUBLE_QUANT]] [--no_double_quant] [--quant_type QUANT_TYPE]
                [--bits BITS] [--lora_enable [LORA_ENABLE]] [--lora_r LORA_R]
                [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT]
                [--lora_weight_path LORA_WEIGHT_PATH] [--lora_bias LORA_BIAS]
                [--mm_projector_lr MM_PROJECTOR_LR]
                [--group_by_modality_length [GROUP_BY_MODALITY_LENGTH]]
train.py: error: the following arguments are required: --output_dir
... 
repeat for all 8 subprocesses
...
[2024-04-19 16:07:06,856] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4574
[2024-04-19 16:07:06,858] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4575
[2024-04-19 16:07:06,859] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4576
[2024-04-19 16:07:06,859] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4577
[2024-04-19 16:07:06,860] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4579
[2024-04-19 16:07:06,860] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4581
[2024-04-19 16:07:06,861] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4584
[2024-04-19 16:07:06,861] [INFO] [launch.py:316:sigkill_handler] Killing subprocess 4586
[2024-04-19 16:07:06,861] [ERROR] [launch.py:322:sigkill_handler] ['/usr/bin/python3', '-u', 'bunny/train/train.py', '--local_rank=7', '--deepspeed', './script/deepspeed/zero3.json', '--model_name_or_path', 'BAAI/Bunny-v1_0-3B', '--model_type', 'phi-2', '--version', 'bunny', '--data_path', '/image_text/train_list/train_impression_single_image.json', '--image_folder', '/image_text/datasets', '--vision_tower', 'google/siglip-so400m-patch14-384'] exits with return code = 2
script/train/finetune_full_baseline.sh: 25: --mm_projector_

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0

Dear,

I'm struggling to make the sample code work on my laptop with an Nvidia A2000 (8 GB) card.

Does anyone have any advice?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
import warnings
import pathlib

# disable some warnings

transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

# set device

#torch.set_default_device('cuda') # or 'cuda'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch_device = 'cuda' #auto, cpu

model_name = 'BAAI/Bunny-v1_0-3B' # or 'BAAI/Bunny-v1_0-2B-zh'

# create model

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map=torch_device,
    trust_remote_code=True)

#model.to(device)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True)

# text prompt

prompt = 'What happened in the image?'
text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: \n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('')]

input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).to(torch_device).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=model.dtype, device=torch_device).unsqueeze(0)
#input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=model.dtype, device=torch_device).to(torch_device).unsqueeze(0)

#local image
file = pathlib.Path('C:/Users/Admin/Utils/Bunny-AI/slippery-person.jpeg')
image = Image.open(file)
image_tensor = model.process_images([image], model.config)

# generate

output_ids = model.generate(
    input_ids,
    #images=image_tensor
    images=image_tensor.unsqueeze(0).to(dtype=model.dtype, device='cuda', non_blocking=True),
    max_new_tokens=100,
    use_cache=True)[0]

print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
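A minimal sketch of the usual cause (an assumption based on the error: the model weights ended up on CUDA while input_ids stayed on CPU): move the ids and the image tensor to the model's device before generate(). This mirrors the commented-out attempts above but uses model.device instead of a hard-coded string.

# move everything passed to generate() onto the model's device
input_ids = input_ids.to(model.device)
image_tensor = image_tensor.unsqueeze(0).to(dtype=model.dtype, device=model.device, non_blocking=True)

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    max_new_tokens=100,
    use_cache=True)[0]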

Download the training data

Thanks for your great work! When I use the ModelScope Python API to download the training dataset, it fails:

>>> from modelscope.msdatasets import MsDataset
2024-03-20 14:56:10,539 - modelscope - INFO - PyTorch version 2.2.0+cu118 Found.
2024-03-20 14:56:10,542 - modelscope - INFO - Loading ast index from /mnt/afs1/likeqiang/.cache/modelscope/ast_indexer
2024-03-20 14:56:10,957 - modelscope - INFO - Loading done! Current index file version is 1.13.1, with md5 ac6c5f948b02361aa74e8bd58f64a6f7 and a total number of 972 components indexed
>>> ds =  MsDataset.load('BoyaWu10/Bunny-v1.0-data')
2024-03-20 14:56:21,614 - modelscope - INFO - No subset_name specified, defaulting to the default
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/ms_dataset.py", line 284, in load
    dataset_inst = remote_dataloader_manager.load_dataset(
  File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader_manager.py", line 132, in load_dataset
    oss_downloader.process()
  File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 83, in process
    self._prepare_and_download()
  File "/mnt/afs1/likeqiang/miniconda3/envs/bunny/lib/python3.10/site-packages/modelscope/msdatasets/data_loader/data_loader.py", line 132, in _prepare_and_download
    raise f'meta-file: {dataset_name}.py not found on the modelscope hub.'
TypeError: exceptions must derive from BaseException

When I use git clone directly, it shows:

Cloning into 'Bunny-v1.0-data'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 50 (delta 17), reused 43 (delta 13), pack-reused 0
Unpacking objects: 100% (50/50), 6.23 KiB | 25.00 KiB/s, done.
Filtering content: 100% (11/11), 18.76 GiB | 5.17 MiB/s, done.
Encountered 9 files that may not have been copied correctly on Windows:
        finetune/images.tar.gz.part-ad
        pretrain/images.tar.gz.part-aa
        finetune/images.tar.gz.part-ac
        finetune/images.tar.gz.part-ab
        pretrain/images.tar.gz.part-ae
        pretrain/images.tar.gz.part-ac
        pretrain/images.tar.gz.part-ab
        pretrain/images.tar.gz.part-ad
        finetune/images.tar.gz.part-aa

Could you give me some advice? Or could you upload the data to Hugging Face?

License

Hi,
Great work on Bunny, super impressive! Would you mind adding a license for the code & weights?
Thx!

Could you share the pretrained mm_projector.bin?

Hello,

I attempted to instruction-tune the Bunny model, but found that mm_projector.bin is missing.

Would you please share your pretrained mm_projector.bin?

Thank you for your assistance.

A question about sampling pre-train data

Hello, I'd like to ask a question about data sampling.

According to the explanation in the technical report, during the second stage of sampling pretraining data, "sort the remaining samples by the cosine similarity between its text embedding and image embedding and keep samples ranking 40% - 60%".

Why keep only the portion ranked between 40% and 60%? Shouldn't the data with higher cosine similarity between text and image embeddings be considered higher-quality data?
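For reference, a hedged sketch of the ranking step described above, assuming the paired text and image embeddings have already been computed with some CLIP/SigLIP-style encoder; this only illustrates the 40%-60% slice, not the authors' actual pipeline.

import numpy as np

def keep_middle_band(text_emb: np.ndarray, image_emb: np.ndarray, lo=0.4, hi=0.6):
    # cosine similarity between each paired text and image embedding
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = (text_emb * image_emb).sum(axis=1)
    order = np.argsort(-sims)                      # rank from most to least similar
    start, end = int(lo * len(order)), int(hi * len(order))
    return order[start:end]                        # indices of samples ranked 40%-60%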

JSON format

Hi, I want to give this a go. What format does the model expect the JSON file to be in?

Is this good?

  {
    "id": "leia (1)",
    "image": "/mnt/d/quicktest/leia (1).jpg",
    "conversations": [
      {
        "from": "human",
        "value": " <image>\ndescribe the image"
      },
      {
        "from": "gpt",
        "value": "Princess Leia on Andor"
      }
    ]
  },
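As a quick sanity check (a hedged sketch: the field names are taken from the example above, and the file path is a placeholder), the annotation file should be a JSON list of such entries, each carrying an <image> tag in a human turn when an image is attached:

import json

with open('/mnt/d/quicktest/train.json') as f:     # placeholder path
    data = json.load(f)

for item in data:
    assert {'id', 'image', 'conversations'} <= item.keys()
    assert any('<image>' in turn['value']
               for turn in item['conversations'] if turn['from'] == 'human')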

Image processing problem in bunny-llama3

File "xxx/bunnyllama3.py", line 36, in generate_inner
    image_tensor = self.model.process_images([image], self.model.config).to(dtype=self.model.dtype)
  File "xxx/.cache/huggingface/modules/transformers_modules/BAAI/Bunny-Llama-3-8B-V/f2df3cf03156eaba4c34815675d5aac9a9e0bec2/modeling_bunny_llama.py", line 2771, in process_images
    image = self.expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
  File "xxx/.cache/huggingface/modules/transformers_modules/BAAI/Bunny-Llama-3-8B-V/f2df3cf03156eaba4c34815675d5aac9a9e0bec2/modeling_bunny_llama.py", line 2758, in expand2square
    result = Image.new(pil_img.mode, (height, height), background_color)
  File "/root/miniconda3/envs/tr440/lib/python3.9/site-packages/PIL/Image.py", line 2941, in new
    return im._new(core.fill(mode, size, color))
TypeError: color must be int or single-element tuple

pillow == 10.2.0
transformers == 4.40.0
Hello, in bunny-llama3 image processing this error occurs for some black-and-white (grayscale) photos. After I changed background_color in the expand2square function from grey (127, 127, 127) to 'white', the error disappears. Is this modification acceptable?
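A hedged alternative sketch (not a confirmed fix from the maintainers): converting grayscale images to RGB before process_images keeps the 3-tuple background_color in expand2square compatible with the image mode, and avoids changing the padding colour.

from PIL import Image

image = Image.open('photo.jpg')      # placeholder path
if image.mode != 'RGB':
    image = image.convert('RGB')     # grayscale ('L') images break the (127, 127, 127) fill
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)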
