Awesome-Multimodal-Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots and conversational assistants that interact through text, speech, images, and video. These assistants are designed to help users with tasks ranging from simple information retrieval to complex multimedia reasoning.

Multimodal Instruction Tuning

  • MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

    arXiv 2022/12 [paper]

  • GPT-4

    arXiv 2023/03 [paper] [blog]

  • Visual Instruction Tuning

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models

    arXiv 2023/04 [paper] [code] [project page] [demo]

  • mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    arXiv 2023/04 [paper] [code] [demo]

  • LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

    arXiv 2023/04 [paper] [code] [demo]

  • Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding

    [code]

  • LMEye: An Interactive Perception Network for Large Language Models

    arXiv 2023/05 [paper] [code]

  • MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

    arXiv 2023/05 [paper] [code] [demo]

  • X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    arXiv 2023/05 [paper] [code] [project page]

  • Otter: A Multi-Modal Model with In-Context Instruction Tuning

    arXiv 2023/05 [paper] [code] [demo]

  • InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

    arXiv 2023/05 [paper] [code]

  • InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

    arXiv 2023/05 [paper] [code] [demo]

  • VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

    arXiv 2023/05 [paper] [code]

  • Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    arXiv 2023/05 [paper] [code] [project page]

  • EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

    arXiv 2023/05 [paper] [code] [project page]

  • DetGPT: Detect What You Need via Reasoning

    arXiv 2023/05 [paper] [code] [project page]

  • PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology

    arXiv 2023/05 [paper] [code]

  • ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

    arXiv 2023/05 [paper] [code] [project page]

  • Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

    arXiv 2023/06 [paper] [code]

  • LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

    arXiv 2023/06 [paper]

  • Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

    arXiv 2023/06 [paper] [project page]

  • Valley: Video Assistant with Large Language Model Enhanced Ability

    arXiv 2023/06 [paper] [code]

LLM-Based Modularized Frameworks

  • Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

    arXiv 2023/03 [paper] [code] [demo]

  • ViperGPT: Visual Inference via Python Execution for Reasoning

    arXiv 2023/03 [paper] [code] [project page]

  • TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

    arXiv 2023/03 [paper] [code]

  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

    arXiv 2023/03 [paper] [code]

  • MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

    arXiv 2023/03 [paper] [code] [project page] [demo]

  • HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

    arXiv 2023/03 [paper] [code] [demo]

  • VLog: Video as a Long Document

    [code] [demo]

  • Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions

    arXiv 2023/04 [paper] [code]

  • ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

    arXiv 2023/04 [paper] [project page]

  • VideoChat: Chat-Centric Video Understanding

    arXiv 2023/05 [paper] [code] [demo]

Contributors

zjr2000, feielysia, ttengwang
