Giter Club home page Giter Club logo

dmn's Introduction

Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models

This repository provides the official PyTorch implementation of our CVPR 2024 paper:

[Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models](https://github.com/YBZh/DMN
Authors: Yabin Zhang, Wenjie Zhu, Hui Tang, Zhiyuan Ma, Kaiyang Zhou, Lei Zhang

Overview

This repository contains the implementation of DMN for image classification with a pre-trained CLIP. We consider four task settings:

  • Zero-shot classification in a test-time adaptation manner
  • Few-shot classification
  • Training-free few-shot classification
  • Out-of-distribution generalization

Results on ImageNet dataset under different task settings.

The overall framework of our DMN.

Prerequisites

Hardware

This implementation is for the single-GPU configuration. All experiments can be reproduced on a GPU with more than 10GB memory (e.g., 1080Ti)!

Environment

The code is tested on PyTorch 1.13.1.

Datasets

We suggest downloading all datasets to a root directory (${data_root}), and renaming the directory of each dataset as suggested in ${ID_to_DIRNAME} in ./data/datautils.py. This would allow you to evaluate multiple datasets within the same run.
If this is not feasible, you could evaluate different datasets separately, and change the ${data_root} accordingly in the bash script.

For zero/few-shot classification, we consider 11 datasets:

For out-of-distribution generalization, we consider 4 datasets:

Run DMN

We provide a simple bash script under ./scripts/run.sh. You can modify the paths and other args in the script. One can easily reproduce all results by:

bash ./scripts/run.sh

For simplicity, we use set_id to denote different datasets. A complete list of set_id can be found in ${ID_to_DIRNAME} in ./data/datautils.py.

Main Results

Zero-shot Classification

Few-shot Classification

Few-shot classification results on 11 datasets with a VITB/16 image encoder.

Out-of-Distribution Generalization

Method ImageNet(IN) IN-A IN-V2 IN-R IN-Sketch Average OOD Average
CLIP-RN50 58.16 21.83 51.41 56.15 33.37 44.18 40.69
Ensembled prompt 59.81 23.24 52.91 60.72 35.48 46.43 43.09
CoOp 63.33 23.06 55.40 56.60 34.67 46.61 42.43
CoCoOp 62.81 23.32 55.72 57.74 34.48 46.81 42.82
TPT 60.74 26.67 54.70 59.11 35.09 47.26 43.89
DMN-ZS 63.87 28.57 56.12 61.44 39.84 49.97 46.49

Citation

If you find our code useful or our work relevant, please consider citing:

@inproceedings{zhang2024dual,
  title={Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models},
  author={Zhang, Yabin and Zhu, Wenjie and Tang, Hui and Ma, Zhiyuan and Zhou, Kaiyang and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2024}
}

Acknowledgements

We thank the authors of CoOp/CoCoOp and TPT for their open-source implementation and instructions on data preparation.

dmn's People

Contributors

ybzh avatar tianjiao-j avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.