
CoIN: A Benchmark of ContinuaL Instruction tuNing for Multimodal Large Language Model

Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song.

Abstract

Instruction tuning demonstrates impressive performance in adapting Multimodal Large Language Models (MLLMs) to follow task instructions and in improving their generalization ability. By extending tuning across diverse tasks, MLLMs can further enhance their understanding of world knowledge and instruction intent. However, continual instruction tuning has been largely overlooked, and there are no public benchmarks available. In this paper, we present CoIN, a comprehensive benchmark tailored for assessing the behavior of existing MLLMs under continual instruction tuning. CoIN comprises 10 meticulously crafted datasets spanning 8 tasks, ensuring diversity and serving as a robust evaluation framework for crucial aspects of continual instruction tuning such as task order, instruction diversity, and volume. Additionally, apart from traditional evaluation, we design an LLM-based metric to assess the knowledge preserved within MLLMs for reasoning. Following an in-depth evaluation of several MLLMs, we demonstrate that they still suffer from catastrophic forgetting, and that failure in instruction alignment, rather than forgetting of reasoning knowledge, bears the main responsibility. To this end, we introduce MoELoRA, which is effective in retaining the previous instruction alignment.

Install

  1. Clone this repository and navigate to the CoIN folder
git clone https://github.com/zackschen/CoIN.git
cd CoIN 
  2. Install packages
conda create -n coin python=3.10 -y
conda activate coin
pip install --upgrade pip
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation

This repo is based on LLaVA. If you run into a problem, you may find a solution in the existing issues.

Dataset

Please download the images from the constituent datasets: ScienceQA, VQAv2, VizWiz, TextVQA, GQA, OCR-VQA, ImageNet, RefCOCO, RefCOCO+, and RefCOCOg.

Image Source | Download Path
COCO         | train2014, test2015, val2014
RefCOCO      | annotation
RefCOCO+     | annotation
RefCOCOg     | annotation
ImageNet     | images
OCR-VQA      | images
GQA          | images
TextVQA      | train, test
ScienceQA    | images
VizWiz       | train, val, test

After downloading all of them, organize the data as follows:

├── COCO2014
│   └── train2014
├── GQA
│   └── images
├── OCR-VQA
│   └── images
├── TextVQA
│   ├── train_images
│   └── test_images

Then, download the instructions from our dataset path, CoIN_Dataset, and organize them as follows:

├── Instruction_Original
│   ├── GQA
│   │   ├── train.json
│   │   └── test.json
│   └── ScienceQA
│       ├── train.json
│       └── test.json
├── Instruction_Type2
│   └── GQA
│       ├── train.json
│       └── test.json

Instruction Tuning

First, download the pretrained projectors from the LLaVA Model Zoo.

Set pretrain_mm_mlp_adapter to the projector path. You can also modify the DeepSpeed config to suit your setup.
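
For reference, here is a minimal sketch of the relevant flags in a LLaVA-style launch command (the entry point, DeepSpeed config, and all paths below are hypothetical placeholders, not the exact values used in our scripts):

# Sketch only: flag names follow LLaVA conventions; paths are placeholders.
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path lmsys/vicuna-7b-v1.5 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-pretrain/mm_projector.bin \
    --output_dir ./checkpoints/CoIN/ScienceQA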

We provide the scripts for our training order in scripts/*/Train. Note that the output_dir of the previous script is the previous_task_model_path of the next training process, as shown in the sketch below. You can then tune these datasets in your preferred order.
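
For example, a sketch of chaining two tasks (the script names and paths here are hypothetical; see scripts/*/Train for the actual ones):

# Task 1: fine-tune on the first dataset; its script writes the model to
# output_dir, e.g. ./checkpoints/CoIN/ScienceQA.
bash scripts/LLaVA/Train/ScienceQA.sh
# Task 2: in the next script, set previous_task_model_path to that same
# directory (the output_dir of Task 1), then launch it.
bash scripts/LLaVA/Train/TextVQA.sh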

We provide scripts for training MoELoRA with LLaVA in scripts/LLaVA/Train_MOE. Additionally, you can modify the code to train MiniGPT-v2 and Qwen-VL, following the example in lines 138-152 of ETrain/Models/LLaVA/utils.py.

Evaluation

We have prepared the scripts to evaluate the trained model in scripts/*/Eval.

These scripts will evaluate the trained model and create the prompts (prompt_to_eval.json) for evaluating the general knowledge.

To evaluate the general knowledge, add the result path to scripts/Eval_GeneralKnowledge/eval_prompt_slim.sh and run it; the script will output a score indicating how much general knowledge is preserved.
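
For instance, a minimal sketch (the variable name and result path are assumptions; check the script itself for its exact interface):

# Hypothetical: in scripts/Eval_GeneralKnowledge/eval_prompt_slim.sh, point the
# result path at the prompts produced by the Eval scripts, e.g.:
RESULT_PATH=./results/CoIN/ScienceQA/Finetune/prompt_to_eval.json
# then run it; the script prints a single general-knowledge score.
bash scripts/Eval_GeneralKnowledge/eval_prompt_slim.sh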

To Do

    • Evaluating on more MLLMs: MiniGPT-4, MiniGPT-v2, InstructBLIP, Qwen-VL. MiniGPT-v2 and Qwen-VL have been merged; in addition, since MiniGPT-4 and InstructBLIP are based on LAVIS, you can modify the config to train with those models.
    • [ ] Evaluating on different sizes of MLLMs; we are conducting experiments with a larger model, the 13B LLaVA.
    • [ ] Evaluating with full fine-tuning.

Citation

@misc{chen2024coin,
    title={CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model}, 
    author={Cheng Chen and Junchen Zhu and Xu Luo and Hengtao Shen and Lianli Gao and Jingkuan Song},
    year={2024},
    eprint={2403.08350},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

LLaVA: the codebase we build upon, and our base model LLaVA-1.5-7B with its amazing vision-language capabilities!

LAVIS: the codebase MiniGPT and InstructBLIP are built upon.

MiniGPT: the codebase of MiniGPT-4 and MiniGPT-v2.


CoIN's Issues

[General Knowledge Evaluation]

Question

I am currently working on setting up the environment for evaluating general knowledge and have encountered an issue with the path configuration that I hope you can help clarify.
I have configured the path in the '1_eval_sqa.sh' script as follows:

#!/bin/bash

gpu_list="${CUDA_VISIBLE_DEVICES:-0}"
IFS=',' read -ra GPULIST <<< "$gpu_list"

CHUNKS=${#GPULIST[@]}

if [ ! -n "$1" ]; then
    STAGE='Finetune'
else
    STAGE=$1
fi

if [ ! -n "$2" ]; then
    MODELPATH='./checkpoints/Instruction/Only_Pretrain_1.5/ScienceQA/llava-1.5-7b-lora'
else
    MODELPATH=$2
fi

RESULT_DIR="llava/eval/CoIN/to_eval_prompt.txt"

for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=${GPULIST[$IDX]} python -m llava.eval.CoIN.model_vqa_science \
        --model-path $MODELPATH \
        --model-base lmsys/vicuna-7b-v1.5 \
        --question-file ./playground/Instruction_Type1/ScienceQA/test.json \
        --image-folder ./cl_dataset \
        --answers-file $RESULT_DIR/$STAGE/${CHUNKS}_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode vicuna_v1 &
done

wait

output_file=$RESULT_DIR/$STAGE/merge.jsonl

> "$output_file"

for IDX in $(seq 0 $((CHUNKS-1))); do
    cat $RESULT_DIR/$STAGE/${CHUNKS}_${IDX}.jsonl >> "$output_file"
done

python llava/eval/CoIN/eval_science_qa.py \
    --base-dir ./cl_dataset/ScienceQA \
    --result-file $output_file \
    --output-file $RESULT_DIR/$STAGE/output.jsonl \
    --output-result $RESULT_DIR/$STAGE/output_result.jsonl

python llava/eval/CoIN/create_prompt.py \
    --rule llava/eval/CoIN/rule.json \
    --questions ./playground/Instruction_Type1/ScienceQA/test.json \
    --results $output_file

python llava/eval/CoIN/evaluate_generalknowledege.py
However, when executing this script, I receive the following error:
NotADirectoryError: [Errno 20] Not a directory: 'llava/eval/CoIN/to_eval_prompt.txt/Finetune/merge.jsonl'

It seems there might be a misconfiguration with how the RESULT_DIR path is set. Could you please provide guidance on the correct way to set this path so that the script functions correctly for general knowledge evaluation?
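
For what it is worth, the error indicates that RESULT_DIR is used as a directory ($RESULT_DIR/$STAGE/merge.jsonl) while the script sets it to a .txt file; a plausible fix, with a hypothetical results path, would be:

# Point RESULT_DIR at a directory instead of at to_eval_prompt.txt,
# and make sure the per-stage subdirectory exists before writing into it.
RESULT_DIR="./results/CoIN/ScienceQA"
mkdir -p "$RESULT_DIR/$STAGE"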

About the ImageNet dataset [Question]

Question

Thank you very much for open-sourcing the code! Is the ImageNet subset you use a random sample of 100 classes? According to the Instructions_Type1/ImageNet/train.json file you provide, it contains instructions for 129,833 images, which does not seem to match the 117,715 images mentioned in the paper.

Missing file convert_gqa_for_eval.py

Describe the issue

Congratulations on your excellent work!

I am trying to reproduce the results of your work. The file convert_gqa_for_eval.py appears to be missing when running eval_gqa.sh. Could you share it?

Missing file testdev_balanced_questions.json

Describe the issue

I am in the process of replicating the results from your project. It seems that the testdev_balanced_questions.json file, which is required for the eval_gqa.py script, is not available in the repository.
Could you please provide this file or indicate where I might find it?

Some questions about the experiment tables in the paper [Question]

Thanks a lot for your contribution!
While reading your paper, I had some questions about the numbers in the experiment tables:
1. In the Finetune row of Table 2, does the first row report the result of fine-tuning each task on the original pretrained LLaVA model, or the result of fine-tuning that task after the model has been fine-tuned on the preceding tasks? For example, the result of the second task, TextVQA, is 49.99; is this obtained by fine-tuning the model that was already fine-tuned on ScienceQA, or by fine-tuning the original pretrained model directly?
2. Regarding the BWT results, the formula in the paper divides by (T-1), but the numbers in the table seem to be divided by T. Is there a discrepancy between these two?
These are my questions about the table data; I hope you can clarify them!

GPU and VRAM requirements

Question

Thank you very much for sharing your research; I found it very inspiring. Roughly how much compute is required to reproduce your work, in terms of GPUs, GPU memory, system RAM, and so on?

RuntimeError: Error(s) in loading state_dict for Sequential:

Question

Could you please advise on resolving the error I encountered while following the steps in the GitHub repository? I downloaded the liuhaotian/llava-v1.5-7b checkpoint and the projector weights for Vicuna-7B-v1.5, and specified the paths in the 'Science.sh' script. However, upon execution I encountered an error, a screenshot of which I have attached. How can I resolve this issue? Additionally, could you clarify which projector file and model checkpoint I should download from the LLaVA Model Zoo?
Screenshot

Questions about task_num?

feature

Hello, I have a few questions to ask.

  1. What is the default task_num setting in CoIN, 16 or 2? In the MOELoRA-peft paper, it is set to the number of tasks in the multi-task setting, so it is 8.
  2. Can you explain the role of task_num in CoIN?
  3. The LoRA dimension is set to 64; why 64?

Thanks!

About the number-of-experts setting

Question

Screenshot

Does task_num denote the number of expert models?

Also, I noticed that the zero_offload.json file is not provided; is it the same as the one in LLaVA?

Question Regarding per_device_train_batch_size and gradient_accumulation_steps in LLaVA Training

Hi Zacks,

Thank you for this amazing work!

I have a question about the LLaVA training configuration across the different tasks in this project. I noticed that per_device_train_batch_size is set to various values such as 6, 8, 14, and 16, while gradient_accumulation_steps is consistently set to 8. These settings differ significantly from the default configuration in LLaVA's official finetune_lora.sh.

Could you please clarify whether these values were chosen due to GPU memory constraints, or were they specifically tuned to optimize performance? Additionally, if I have sufficient GPU memory, would you recommend sticking to these settings, or should I consider different configurations?

Thank you in advance for your guidance!

Error when running pip install -e .

Question

After running pip install -e ., the machine can no longer access any websites:

Could not fetch URL https://mirrors.aliyun.com/pypi/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='mirrors.aliyun.com', port=443): Max retries exceeded with url: /pypi/simple/pip/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate (_ssl.c:1007)'))) - skipping

Code of MoELoRA

Thanks for your new code! Is the new codebase compatible with the MoELoRA scripts from the old codebase?
