APE

Implementation of Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement, a few-shot framework designed for CLIP and other vision-language models.

Introduction

Adaptive Prior rEfinement (APE) is a new method for few-shot CLIP that achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants: a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, the prior cache model, and the textual representations, and only a lightweight category-residual module needs to be trained. Averaged over 11 benchmarks, both APE and APE-T attain state-of-the-art accuracy.
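To make the idea of refining features by inter-class disparity more concrete, here is a minimal NumPy sketch. It is an illustration under assumptions, not the repository's implementation: the function name `select_channels`, the weighting parameter `lam`, and the exact scoring rule (low inter-class similarity and high variance per channel) are all hypothetical. Refer to the paper and `main.py` for the actual criterion.

```python
import numpy as np

def select_channels(class_feats, num_keep, lam=0.5):
    """Rank feature channels by an assumed disparity score (hypothetical helper).

    class_feats: (C, D) array with one feature prototype per class.
    Returns the indices of the `num_keep` channels with the lowest score.
    """
    num_classes, _ = class_feats.shape
    # Sum over all ordered pairs (i, j) of f_i[k] * f_j[k], including i == j.
    total = class_feats.sum(axis=0) ** 2
    # Remove the i == j terms so that only distinct class pairs remain.
    diag = (class_feats ** 2).sum(axis=0)
    similarity = (total - diag) / (num_classes * (num_classes - 1))
    # Per-channel variance across classes.
    variance = class_feats.var(axis=0)
    # Assumed rule: discriminative channels have low similarity and high variance.
    score = lam * similarity - (1.0 - lam) * variance
    return np.argsort(score)[:num_keep]

# Toy usage: three channels, two classes.
prototypes = np.array([[1.0, 1.0, 0.0],
                       [1.0, -1.0, 0.0]])
kept = select_channels(prototypes, num_keep=1)
refined = prototypes[:, kept]  # prototypes restricted to the selected channels
```

In this toy example the first channel is identical across classes and the second flips sign, so the second is the one a disparity-based rule should prefer.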

Requirements

Installation

Create a conda environment and install dependencies:

conda create -n APE python=3.7
conda activate APE

pip install -r requirements.txt

# Install the matching versions of torch and torchvision
conda install pytorch torchvision cudatoolkit
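After installation, a quick sanity check can confirm that the packages are importable from the active environment. The snippet below is a small sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level packages from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(["torch", "torchvision"])
    print("missing:", missing if missing else "none")
```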

Dataset

Follow DATASET.md to install ImageNet and the other 10 datasets, as in CoOp.

Get Started

In this code, we separate feature extraction from model inference. The features must be extracted first; model inference is then run on them.

Extracting Features

We recommend downloading the features directly from this Google Drive.

After setting up the datasets and downloading the features, the project root folder should look like this:

APE/
|–– caches
|–––– caltech101
|–––– dtd
|–––– eurosat
|–––– ... 8 other datasets' features
|–– clip
|–– configs
|–– data
|–––– caltech101
|–––– ... 10 other datasets
|–– datasets
|–––– caltech101.yaml
|–––– ... 10 other yamls
|–– gpt3_prompts
|–––– CuPL_prompts_caltech101.json
|–––– ... 10 other dataset json files
|–– extract_features.py
|–– main.py
|–– utils.py
|–– README.md

Alternatively, you can extract the features yourself by running

CUDA_VISIBLE_DEVICES=0 python extract_features.py

to extract the features of all 11 datasets, including the few-shot training set representations, the validation and test set representations, and the textual representations.
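Before running inference, it can help to verify that the project root matches the layout shown earlier in this section. The following is a minimal sketch: the entry names come from that tree, while the helper `missing_entries` is hypothetical and only checks top-level entries, not the per-dataset subfolders.

```python
from pathlib import Path

# Top-level entries taken from the folder tree in this section.
EXPECTED = [
    "caches", "clip", "configs", "data", "datasets", "gpt3_prompts",
    "extract_features.py", "main.py", "utils.py", "README.md",
]

def missing_entries(root):
    """Return the expected top-level entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    # Assumes the script is launched from the project root.
    print("missing:", missing_entries(".") or "none")
```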

Trying APE and APE-T

By running

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/[dataset_name].yaml --shot [shot_number]

users can test the proposed APE and APE-T. Set dataset_name to one of [caltech101, dtd, eurosat, fgvc, food101, imagenet, oxford_flowers, oxford_pets, stanford_cars, sun397, ucf101] and shot_number to one of 1, 2, 4, 8, or 16.
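To evaluate every dataset and shot setting in one go, the command above can be generated in a loop. This is a sketch rather than a script shipped with the repository: `build_commands` and `run_all` are hypothetical helpers, and it assumes `main.py` accepts exactly the `--config` and `--shot` arguments shown above and is launched from the project root.

```python
import itertools
import os
import subprocess

DATASETS = [
    "caltech101", "dtd", "eurosat", "fgvc", "food101", "imagenet",
    "oxford_flowers", "oxford_pets", "stanford_cars", "sun397", "ucf101",
]
SHOTS = [1, 2, 4, 8, 16]

def build_commands(datasets=DATASETS, shots=SHOTS):
    """Build one main.py invocation per (dataset, shot) combination."""
    return [
        ["python", "main.py", "--config", f"configs/{name}.yaml", "--shot", str(shot)]
        for name, shot in itertools.product(datasets, shots)
    ]

def run_all(gpu="0"):
    """Run every combination sequentially on the given GPU; stop on the first failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands():
        subprocess.run(cmd, env=env, check=True)
```

Calling `run_all()` from the project root executes the runs one after another; `build_commands()` alone is useful for inspecting the commands or dispatching them to a job scheduler.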

Acknowledgements

We build on several well-maintained repositories, including TIP-Adapter, CLIP, CoOp, SuS-X, and CuPL. We thank the authors for providing such amazing code and for enabling further research towards better vision-language model adaptation.

Reference

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

@misc{zhu2023features,
      title={Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement}, 
      author={Xiangyang Zhu and Renrui Zhang and Bowei He and Aojun Zhou and Dong Wang and Bin Zhao and Peng Gao},
      year={2023},
      eprint={2304.01195},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

