
Comments (7)

TobiasLee commented on August 10, 2024

We did not explore the DPO with LLaVA models. Could you share your results and example outputs before/after DPO so we can dig into it?

from vlfeedback.

thusharakart commented on August 10, 2024

The following are the results on the MME benchmark:

| Model | Perception | Cognition | OCR |
|---|---|---|---|
| LLaVA-v1.5-7B + DPO | 1342 | 313 | 125 |
| LLaVA-v1.5-13B + DPO | 1425 | 312 | 130 |


TobiasLee commented on August 10, 2024

How many epochs did you train with DPO?


thusharakart commented on August 10, 2024

The results above are from 1 epoch of training for the 7B model and 3 epochs for the 13B model.


TobiasLee commented on August 10, 2024

I'm sorry for not getting back to you sooner. We also recently explored DPO training on the LLaVA backbone and observed degraded MME performance. However, the scores on other benchmarks improved consistently.

| Model | MM-Vet | MMHal | MMBench |
|---|---|---|---|
| LLaVA-v1.5-7B | 30.5 | 2.42 | 63.0 |
| LLaVA-v1.5-7B + DPO | 31.7 | 2.62 | 63.9 |

We suspect that after DPO training the model can no longer follow the simple answer format required by MME, and we would like to investigate this later.

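For readers exploring DPO on their own backbone, the objective being discussed can be sketched in a few lines. This is a minimal, framework-free illustration of the standard DPO loss for a single preference pair, not the training code used here; the function name and scalar interface are my own.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    logp_* are summed token log-probs of the chosen/rejected responses
    under the policy being trained; ref_logp_* are the same quantities
    under the frozen reference model (the SFT checkpoint, e.g. LLaVA-v1.5).
    """
    pi_logratio = logp_chosen - logp_rejected
    ref_logratio = ref_logp_chosen - ref_logp_rejected
    logits = beta * (pi_logratio - ref_logratio)
    # -log sigmoid(logits), computed stably as softplus(-logits)
    return math.log1p(math.exp(-logits)) if logits > -30 else -logits

# When the policy agrees exactly with the reference, the loss is log(2)
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
```

The loss drops below log(2) only when the policy prefers the chosen response more strongly than the reference does, which is what pushes the model away from its SFT answer distribution (and, plausibly, away from MME's terse answer format).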

kasoushu commented on August 10, 2024

Maybe you can add a prompt like this: query = f'<img>{img_path}</img>\n{question} You can only use "Yes" or "No" as your response, without adding any extra text or explanation.'

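The suggestion above can be packaged as a small helper. Note this is a sketch: the `<img>...</img>` tag format shown is the Qwen-VL style from the original comment, and the right image-placeholder syntax depends on your model's chat template; the function name and paths are hypothetical.

```python
def build_mme_query(img_path: str, question: str) -> str:
    """Build an MME query that constrains answers to Yes/No.

    Assumes a Qwen-VL-style <img> tag; adapt the placeholder to your
    model's chat template (e.g. LLaVA uses an <image> token instead).
    """
    return (
        f"<img>{img_path}</img>\n{question} "
        'You can only use "Yes" or "No" as your response, '
        "without adding any extra text or explanation."
    )

q = build_mme_query("mme/ocr/0001.jpg", "Is the word in the image 'exit'?")
print(q.splitlines()[0])  # <img>mme/ocr/0001.jpg</img>
```

Appending an explicit format instruction like this is a common workaround when a tuned model stops following a benchmark's expected answer format.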

TobiasLee commented on August 10, 2024

Hi all, we found a great repo with support for, and results from, many other models: https://github.com/TideDra/VL-RLHF

The performance is boosted almost consistently for LLaVA-Next series models. So my guess is that the current LLaVA-v1.5 series is too weak to serve as a starting model for DPO (possibly due to its lower input resolution, 336, vs. Qwen-VL). The LLaVA-Next series is more powerful thanks to its image tiling mechanism.

Check it out if you want to further explore the DPO/RLHF with VLFeedback!

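To make the resolution point concrete, here is an illustrative sketch (not LLaVA-Next's actual code) of the idea behind image tiling: a high-resolution image is covered with base-resolution tiles so a fixed-size vision encoder (e.g. a 336x336 ViT) still sees full detail. The function and the 336 default are assumptions for illustration.

```python
import math

def tile_grid(width: int, height: int, base: int = 336):
    """Return (rows, cols, total tiles) needed to cover an image
    with base-resolution tiles, ignoring padding details."""
    cols = math.ceil(width / base)
    rows = math.ceil(height / base)
    return rows, cols, rows * cols

print(tile_grid(672, 1008))  # (3, 2, 6): a 672x1008 image needs 6 tiles
```

A single-crop model at the same base resolution would downscale the whole image into one 336x336 view, losing exactly the fine detail (small text, OCR targets) that the tiled encoder preserves.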
