VLFeedback

A GPT-4V-annotated preference dataset for large vision-language models.

[Project Page] [Datasets] [Silkie Model] [Paper]
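The preference data is available via the [Datasets] link above. As a minimal sketch, assuming the data is hosted on the Hugging Face Hub and readable with the datasets library (the repository id below is illustrative; check the [Datasets] link for the exact name):

from datasets import load_dataset

# Illustrative Hub id -- replace with the one given by the [Datasets] link.
ds = load_dataset("MMInstruction/VLFeedback", split="train")

print(ds)            # dataset summary: number of rows, column names
print(ds[0].keys())  # inspect the fields of a single preference record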

Annotation Framework

Multimodal Instruction Source

The instructions are sampled from various domains to cover different capabilities of LVLMs.

Model Pool

We construct a model pool consisting of 12 LVLMs, including:

  • GPT-4V
  • LLaVA-series
    • LLaVA-v1.5-7B
    • LLaVA-v1.5-13B
    • LLaVA-RLHF-7b-v1.5-224
    • LLaVA-RLHF-13b-v1.5-336
  • Qwen-VL-7B
  • IDEFICS-9b-Instruct
  • Fuyu-8B
  • InstructBLIP-series
    • InstructBLIP-Vicuna-7B
    • InstructBLIP-Vicuna-13B
  • VisualGLM-6B
  • MMICL-Vicuna-13B
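For each instruction, every model in the pool produces a candidate response, and GPT-4V rates the candidates. A minimal sketch of turning such ratings into pairwise preferences; the field names (response, score) are illustrative, not the dataset's actual schema:

from itertools import combinations

def build_preference_pairs(annotations, margin=0.0):
    """Turn GPT-4V-scored responses for one instruction into (chosen, rejected) pairs.

    annotations: list of dicts like {"model": str, "response": str, "score": float}
    margin: minimum score gap required to count a pair as a clear preference
    """
    pairs = []
    for a, b in combinations(annotations, 2):
        if abs(a["score"] - b["score"]) <= margin:
            continue  # skip ties and near-ties
        chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
        pairs.append({"chosen": chosen["response"], "rejected": rejected["response"]})
    return pairs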

Silkie

We select Qwen-VL-Chat as the backbone model and perform DPO on our dataset.

Silkie logo (generated by DALL·E 3)

The resulting model, Silkie, achieves comprehensive improvements across various benchmarks.
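Preference distillation here is done with DPO. As a minimal sketch of the objective (the actual training code builds on trl and Qwen-VL; this standalone function only shows the loss computed from per-sequence log-probabilities):

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a 1-D tensor of summed log-probabilities that the policy
    (or the frozen reference model) assigns to the chosen / rejected response.
    """
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    loss = -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()
    # Implicit rewards, handy for logging preference accuracy during training.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps).detach()
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps).detach()
    return loss, chosen_rewards, rejected_rewards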

Installation

To run our training scripts, create a virtual environment and install the dependencies first.

conda create -n silkie python=3.10  && conda activate silkie
pip install -r requirements.txt

Training

Our training scripts support both single-node and multi-node training through a single launch_dpo.py script. To launch a job locally, run:

python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR

To launch a job on a Slurm cluster, set GPUS_PER_NODE in launch_dpo.py and run:

python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR --gpus $NUM_GPUS
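Multi-GPU launches of this kind typically wrap torchrun under the hood; a hypothetical sketch of what the launcher might invoke (the entry-point name train_dpo.py is illustrative, and the actual launch_dpo.py may differ):

import subprocess

def launch_local(config_path, working_dir, gpus_per_node=8):
    # Single-node launch: let torchrun spawn one process per GPU.
    cmd = [
        "torchrun",
        f"--nproc_per_node={gpus_per_node}",
        "train_dpo.py",          # illustrative entry-point name
        "--config", config_path,
        "--working", working_dir,
    ]
    subprocess.run(cmd, check=True)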

Citations

@article{2023vlfeedback,
  author  = {Lei Li and Zhihui Xie and Mukai Li and Shunian Chen and Peiyi Wang and Liang Chen and Yazheng Yang and Benyou Wang and Lingpeng Kong},
  title   = {Silkie: Preference Distillation for Large Visual Language Models},
  journal = {arXiv preprint arXiv:2312.10665},
  year    = {2023}
}

Acknowledgements

We would like to thank the authors of trl and Qwen-VL for their great work.


vlfeedback's Issues

ValueError: 151859 is not in list

image = image[ : image.index(self.config.visual['image_start_id'] + 2)]
ValueError: 151859 is not in list


I'm using the original data and code and keep getting this error. Could you take a look?

Impact of Including GPT-4V in LVLM Pool?

First and foremost, thank you for writing this paper; it was very intriguing and informative. I have a question that arose during my reading.

What are the conceptual benefits when the supervisor model (GPT-4V) is included in the LVLM pool? Wouldn't this approach inherently bias the outcomes towards the decisions of GPT-4V? If so, how does the ensemble benefit in this scenario?

Cannot reproduce MM-Vet score

Hi, I tried to reproduce your results. The MME and MMHal-Bench scores I got are roughly consistent with those reported in the paper, but my MM-Vet score is 48.2, while yours is 49.9. Moreover, the MM-Vet score I got for the raw Qwen-VL-Chat baseline is also 48.2, meaning the score does not improve after DPO, whereas your baseline score is 45.7.
I'm using the latest Qwen-VL-Chat checkpoint and your raw codebase. I wonder what causes the difference in MM-Vet scores for both the baseline and the DPO model. Thanks!

DPO performance on other models

Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition in MME when used with LLaVA-1.5.
