Awesome Prompting Papers in Computer Vision

Introduction

A curated list of prompt-based papers in computer vision and vision-language learning.

Keywords:

  • Task tag
  • Abbreviation tag
  • Characteristic tag: a characteristic that makes the paper unique
  • Bold font: we highlight pilot work that may have contributed to the prevalence of visual prompting.

Prompt Learning

This section contains papers that design prompt modules (including adapters) for parameter-efficient adaptation of foundation models.

Vision Prompt

  • Learning to Prompt for Continual Learning [pdf] [code]

    CVPR 2022

  • Visual Prompt Tuning [pdf] [code]

    ECCV 2022

  • Exploring Visual Prompts for Adapting Large-Scale Models [pdf] [code]

    Arxiv 2022/03

  • DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning [pdf] [code]

    ECCV 2022

  • AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition [pdf] [code]

    Arxiv 2022/05

  • Vision Transformer Adapter for Dense Predictions [pdf] [code]

    Arxiv 2022/05

  • Neural Prompt Search [pdf] [code]

    Arxiv 2022/06

  • Convolutional Bypasses Are Better Vision Transformer Adapters [pdf] [code]

    Arxiv 2022/07

  • Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets [pdf]

    Arxiv 2022/08

  • Prompt Vision Transformer for Domain Generalization [pdf]

    Arxiv 2022/08

  • Visual Prompting via Image Inpainting [pdf] [code]

    Arxiv 2022/09

Vision-Language Prompt

  • Learning Transferable Visual Models From Natural Language Supervision [pdf] [code]

    ICML 2021

  • Learning to Prompt for Vision-Language Models [pdf] [code]

    IJCV 2022

  • Prompt Distribution Learning [pdf]

    CVPR 2022

  • Conditional Prompt Learning for Vision-Language Models [pdf] [code]

    CVPR 2022

  • DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting [pdf] [code]

    CVPR 2022

  • Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos [pdf] [code]

    CVPR 2022

  • PointCLIP: Point Cloud Understanding by CLIP [pdf] [code]

    CVPR 2022

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks [pdf] [code]

    CVPR 2022

  • Can Language Understand Depth? [pdf] [code]

    ACM MM 2022

  • Expanding Language-Image Pretrained Models for General Video Recognition [pdf] [code]

    ECCV 2022

  • Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [pdf] [code]

    ECCV 2022

  • Colorful Prompt Tuning for Pre-trained Vision-Language Models [pdf]

    Arxiv 2021/08

  • ActionCLIP: A New Paradigm for Video Action Recognition [pdf] [code]

    Arxiv 2021/09

  • CLIP-Adapter: Better Vision-Language Models with Feature Adapters [pdf] [code]

    Arxiv 2021/10

  • Amortized Prompt: Lightweight Fine-Tuning for CLIP in Domain Generalization [pdf]

    Arxiv 2021/11

  • Prompting Visual-Language Models for Efficient Video Understanding [pdf] [code]

    Arxiv 2021/12

  • Unsupervised Prompt Learning for Vision-Language Models [pdf] [code]

    Arxiv 2022/04

  • Prompt-aligned Gradient for Prompt Tuning [pdf] [code]

    Arxiv 2022/05

  • Parameter-Efficient Image-to-Video Transfer Learning [pdf]

    Arxiv 2022/06

  • DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [pdf]

    Arxiv 2022/06

  • Rethinking the Openness of CLIP [pdf]

    Arxiv 2022/06

  • OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression [pdf]

    Arxiv 2022/06

  • Prompt Tuning for Generative Multimodal Pretrained Models [pdf] [code]

    Arxiv 2022/06

  • Prompt Tuning with Soft Context Sharing for Vision-Language Models [pdf]

    Arxiv 2022/08

  • Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [pdf] [code]

    Arxiv 2022/09

Language-Interactable Prompt

A language-interactable prompter develops few-shot or zero-shot capabilities by prompting one or several independent foundation models (VLMs, LMs, VMs, etc.) through a language interface.

  • Multimodal Few-Shot Learning with Frozen Language Models [pdf]

    NeurIPS 2021

  • An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA [pdf] [code]

    AAAI 2022

  • A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models [pdf]

    ACL 2022

  • VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning [pdf] [code]

    CVPR 2022

  • ClipCap: CLIP Prefix for Image Captioning [pdf] [code]

    Arxiv 2021/11

  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language [pdf] [code]

    Arxiv 2022/04

  • Flamingo: a Visual Language Model for Few-Shot Learning [pdf]

    Arxiv 2022/04

  • Language Models Can See: Plugging Visual Controls in Text Generation [pdf] [code]

    Arxiv 2022/05

  • Zero-Shot Video Question Answering via Frozen Bidirectional Language Models [pdf]

    Arxiv 2022/06

  • Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning [pdf]

    Arxiv 2022/06

Application of Prompt

This section contains papers that use prompt modules as tools, for example for pretraining or for specific applications.

  • Unifying Vision-and-Language Tasks via Text Generation [pdf] [code]

    ICML 2021

  • StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [pdf] [code]

    ICCV 2021

  • Grounded Language-Image Pre-training [pdf] [code]

    CVPR 2022

  • Align and Prompt: Video-and-Language Pre-training with Entity Prompts [pdf] [code]

    CVPR 2022

  • GroupViT: Semantic Segmentation Emerges from Text Supervision [pdf] [code]

    CVPR 2022

  • Unified Multimodal Pretraining and Prompt-based Tuning for Vision-Language Understanding and Generation [pdf]

    Arxiv 2021/12

  • Discovering Bugs in Vision Models using Off-the-shelf Image Generation and Captioning [pdf]

    Arxiv 2022/08

Other Resources

  • PromptPapers: A comprehensive curated list of prompting papers (mainly in natural language processing)
  • Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [pdf] Arxiv 2021/07
