A curated list of Visual Instruction Tuning papers and datasets.
Please feel free to open a pull request or an issue to add papers.
Name | Paper | Link | Notes |
---|---|---|---|
LLaVA-Instruct-150K | Visual Instruction Tuning | Link | Multimodal instruction-following data generated with GPT-4 (see the format sketch below the table) |
OwlEval | mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | Link | Dataset for evaluation on multiple capabilities |
MIMIC-IT | Otter: A Multi-Modal Model with In-Context Instruction Tuning | Coming soon | Multimodal in-context instruction tuning |
PMC-VQA | PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering | Coming soon | Large-scale medical visual question-answering dataset |
VideoChat | VideoChat: Chat-Centric Video Understanding | Link | Video-centric multimodal instruction dataset |
cc-sbu-align | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Link | Aligned multimodal dataset for improving the model's usability and generation fluency |
X-LLM | X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages | Link | Chinese multimodal instruction dataset |
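
As a quick reference for what these instruction-tuning records look like, below is a minimal sketch of reading one sample from a LLaVA-style JSON file. The file path is a placeholder, and the field names (`image`, `conversations`, `from`, `value`) follow the format released with LLaVA-Instruct-150K; the other datasets in the table use their own formats.

```python
import json

# Minimal sketch: read one LLaVA-style visual instruction record.
# "llava_instruct_150k.json" is a placeholder path to a local copy of the data.
with open("llava_instruct_150k.json") as f:
    records = json.load(f)  # a list of instruction-following samples

sample = records[0]
print(sample["image"])  # COCO image filename the conversation is grounded in
for turn in sample["conversations"]:  # alternating "human" / "gpt" turns
    # The "<image>" placeholder in a human turn marks where the image is inserted.
    print(f'{turn["from"]}: {turn["value"]}')
```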
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | arXiv | 2023-04-19 | Github | Demo |
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace | arXiv | 2023-03-30 | Github | - |
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | arXiv | 2023-03-20 | Github | Demo |
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models | arXiv | 2023-03-08 | Github | Demo |
Title | Venue | Date | Code | Demo |
---|---|---|---|---|
Evaluating Object Hallucination in Large Vision-Language Models | arXiv | 2023-05-17 | Github | - |
Transfer Visual Prompt Generator across LLMs | arXiv | 2023-05-02 | Github | Demo |
Flamingo: a Visual Language Model for Few-Shot Learning | NeurIPS | 2022-04-29 | Github | - |
GPT-4 Technical Report | arXiv | 2023-03-15 | - | - |
PaLM-E: An Embodied Multimodal Language Model | arXiv | 2023-03-06 | - | Demo |
Language Is Not All You Need: Aligning Perception with Language Models | arXiv | 2023-02-27 | Github | - |
Multimodal Chain-of-Thought Reasoning in Language Models | arXiv | 2023-02-02 | Github | - |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | arXiv | 2023-01-30 | Github | - |