VLFeedback

A GPT-4V-annotated preference dataset for large vision-language models.

[Project Page] [Datasets] [Silkie Model] [Paper]
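The preference data is available via the [Datasets] link above. As a minimal sketch, assuming the data is hosted on the Hugging Face Hub and readable with the datasets library (the repository id below is illustrative; check the [Datasets] link for the exact name):

from datasets import load_dataset

# Illustrative Hub id -- replace with the one given by the [Datasets] link.
ds = load_dataset("MMInstruction/VLFeedback", split="train")

print(ds)            # dataset summary: number of rows, column names
print(ds[0].keys())  # inspect the fields of a single preference record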

Annotation Framework

Multimodal Instruction Source

The instructions are sampled from various domains to cover different capabilities of LVLMs.

Model Pool

We construct a model pool consisting of 12 LVLMs, including:

  • GPT-4V
  • LLaVA-series
    • LLaVA-v1.5-7B
    • LLaVA-v1.5-13B
    • LLaVA-RLHF-7b-v1.5-224
    • LLaVA-RLHF-13b-v1.5-336
  • Qwen-VL-7B
  • IDEFICS-9b-Instruct
  • Fuyu-8B
  • InstructBLIP-series
    • InstructBLIP-Vicuna-7B
    • InstructBLIP-Vicuna-13B
  • VisualGLM-6B
  • MMICL-Vicuna-13B
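For each instruction, every model in the pool produces a candidate response, and GPT-4V rates the candidates. A minimal sketch of turning such ratings into pairwise preferences; the field names (response, score) are illustrative, not the dataset's actual schema:

from itertools import combinations

def build_preference_pairs(annotations, margin=0.0):
    """Turn GPT-4V-scored responses for one instruction into (chosen, rejected) pairs.

    annotations: list of dicts like {"model": str, "response": str, "score": float}
    margin: minimum score gap required to count a pair as a clear preference
    """
    pairs = []
    for a, b in combinations(annotations, 2):
        if abs(a["score"] - b["score"]) <= margin:
            continue  # skip ties and near-ties
        chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
        pairs.append({"chosen": chosen["response"], "rejected": rejected["response"]})
    return pairs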

Silkie

We select Qwen-VL-Chat as the backbone model and perform DPO on our dataset.

Silkie logo (generated by DALL·E 3)

The resulting model, Silkie, achieves comprehensive improvements across various benchmarks.
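Preference distillation here is done with DPO. As a minimal sketch of the objective (the actual training code builds on trl and Qwen-VL; this standalone function only shows the loss computed from per-sequence log-probabilities):

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a 1-D tensor of summed log-probabilities that the policy
    (or the frozen reference model) assigns to the chosen / rejected response.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    loss = -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
    # Implicit rewards, handy for logging preference accuracy during training.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps).detach()
    return loss, chosen_rewards, rejected_rewards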

Installation

To run our training scripts, create a virtual environment and install the dependencies first.

conda create -n silkie python=3.10  && conda activate silkie
pip install -r requirements.txt

Training

Our training scripts support both single-node and multi-node training through a single launch_dpo.py script. To launch a job locally, run:

python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR

To launch a job on a Slurm cluster, set GPUS_PER_NODE in launch_dpo.py and run:

python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR --gpus $NUM_GPUS
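Multi-GPU launches of this kind typically wrap torchrun under the hood; a hypothetical sketch of what the launcher might invoke (the entry-point name train_dpo.py is illustrative, and the actual launch_dpo.py may differ):

import subprocess

def launch_local(config_path, working_dir, gpus_per_node=8):
    # Single-node launch: let torchrun spawn one process per GPU.
    cmd = [
        "torchrun",
        f"--nproc_per_node={gpus_per_node}",
        "train_dpo.py",          # illustrative entry-point name
        "--config", config_path,
        "--working", working_dir,
    ]
    subprocess.run(cmd, check=True)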

Citations

@article{2023vlfeedback,
  author  = {Lei Li and Zhihui Xie and Mukai Li and Shunian Chen and Peiyi Wang and Liang Chen and Yazheng Yang and Benyou Wang and Lingpeng Kong},
  title   = {Silkie: Preference Distillation for Large Visual Language Models},
  journal = {arXiv preprint arXiv:2312.10665},
  year    = {2023}
}

Acknowledgements

We would like to thank the authors of trl and Qwen-VL for their great work.


vlfeedback's Issues

ValueError: 151859 is not in list

image = image[ : image.index(self.config.visual['image_start_id'] + 2)]
ValueError: 151859 is not in list


I'm using the original data and code and keep getting this error. Could you take a look?

Impact of Including GPT-4V in LVLM Pool?

First and foremost, thank you for writing this paper; it was very intriguing and informative. I have a question that arose during my reading.

What are the conceptual benefits when the supervisor model (GPT-4V) is included in the LVLM pool? Wouldn't this approach inherently bias the outcomes towards the decisions of GPT-4V? If so, how does the ensemble benefit in this scenario?

Cannot reproduce MM-Vet score

Hi, I tried to reproduce your results. The MME and MMHal-Bench scores I got are roughly consistent with those reported in the paper, but my MM-Vet score is 48.2, while yours is 49.9. Moreover, the MM-Vet score I got for the raw Qwen-VL-Chat baseline is also 48.2, meaning the score does not improve after DPO, whereas your baseline score is 45.7.
I'm using the latest Qwen-VL-Chat checkpoint and your raw codebase. I wonder what causes the difference in MM-Vet scores for both the baseline and the DPO model. Thanks!

DPO performance on other models

Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition in MME when used with LLaVA-1.5.
