
Prompt, Generate, then Cache

Official implementation of 'Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners'.

The paper has been accepted by CVPR 2023 🔥.

Introduction

We propose CaFo, a Cascade of Foundation models that incorporates diverse prior knowledge from various pre-training paradigms, including CLIP, DINO, DALL-E, and GPT-3, for better few-shot learning. Specifically, CaFo works by 'Prompt, Generate, then Cache'. We leverage GPT-3 to prompt CLIP with rich linguistic semantics and use DALL-E to generate synthetic images that expand the few-shot training data. We then introduce a learnable cache model to adaptively blend the predictions from CLIP and DINO. Through this collaboration, CaFo can fully unleash the potential of the different pre-training methods and unify them to achieve state-of-the-art few-shot classification performance.
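The cache step can be pictured with a small sketch. The code below follows the cache formulation of Tip-Adapter, which this repo builds on: each cached training feature contributes exp(-beta * (1 - cosine similarity)) times its one-hot label, and the weighted result is added to CLIP's zero-shot logits. The function names, the toy vectors, and the fixed blending weights (which play the role of alpha) are illustrative assumptions; CaFo's learnable, adaptive blending of CLIP and DINO is not reproduced here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit L2 norm so that dot() equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cache_logits(query, keys, values, beta):
    """Tip-Adapter-style cache lookup (illustrative, not CaFo's exact code).

    query:  L2-normalized feature of one test image.
    keys:   L2-normalized features of the cached training images.
    values: one-hot label rows aligned with keys.
    Each key contributes exp(-beta * (1 - cos(query, key))) times its label.
    """
    logits = [0.0] * len(values[0])
    for key, label in zip(keys, values):
        affinity = math.exp(-beta * (1.0 - dot(query, key)))
        for c, y in enumerate(label):
            logits[c] += affinity * y
    return logits


def blend(zero_shot, cache_outputs, weights):
    """Add weighted cache logits (e.g. one per encoder) to zero-shot logits.

    Fixed weights are a simplification; CaFo adapts the blending instead.
    """
    out = list(zero_shot)
    for w, logits in zip(weights, cache_outputs):
        out = [o + w * l for o, l in zip(out, logits)]
    return out


if __name__ == "__main__":
    # Two cached samples, one per class, with orthogonal features.
    keys = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
    values = [[1.0, 0.0], [0.0, 1.0]]
    query = l2_normalize([1.0, 0.0])
    clip_cache = cache_logits(query, keys, values, beta=1.0)
    print(blend([0.2, 0.3], [clip_cache], weights=[1.0]))
```

A larger beta makes the affinity sharper, so only very similar cached features contribute; a larger weight trusts the cache more relative to zero-shot CLIP.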

Requirements

Installation

Create a conda environment and install dependencies:

git clone https://github.com/ZrrSkywalker/CaFo.git
cd CaFo

conda create -n cafo python=3.7
conda activate cafo

pip install -r requirements.txt

# Install the versions of torch and torchvision that match your CUDA setup
conda install pytorch torchvision cudatoolkit

Dataset

Please follow DATASET.md to download the official ImageNet dataset and the other 10 datasets.

Foundation Models

  • The pre-trained CLIP weights are downloaded automatically when you run the code.
  • The prompts produced by GPT-3 have been stored at gpt_file/.
  • Please download DINO's pre-trained ResNet-50 from here and put it under dino/.
  • Please download the images generated by DALL-E from here and organize them alongside the official datasets as follows:
$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– ...
|–– dalle_imagenet/
|–– dalle_caltech-101/
|–– dalle_oxford_pets/
|–– ...
|–– sd_caltech-101/
  • For the Caltech-101 dataset, we also provide images generated by Stable Diffusion (download from here) and prompts produced by ChatGPT in gpt_file/.
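Before running, it can help to confirm that the layout above is in place. The sketch below is a hypothetical helper (missing_dirs is not part of the repo) that assumes every official dataset folder has a DALL-E counterpart with a dalle_ prefix, as in the tree above; the authoritative folder names are the ones given in DATASET.md.

```python
from pathlib import Path


def missing_dirs(data_root, datasets, prefix="dalle_"):
    """Return expected dataset folders that are absent under data_root.

    Hypothetical helper: for each name, both the official folder
    (e.g. caltech-101/) and its DALL-E counterpart (e.g. dalle_caltech-101/)
    are expected to exist as directories.
    """
    root = Path(data_root)
    missing = []
    for name in datasets:
        for folder in (name, prefix + name):
            if not (root / folder).is_dir():
                missing.append(folder)
    return missing
```

For example, missing_dirs("/path/to/DATA", ["imagenet", "caltech-101", "oxford_pets"]) returns an empty list when all six folders exist.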

Get Started

Configs

The configuration for each [dataset] and [k]-shot setting can be modified in configs/[dataset]/[k]shot.yaml, including the visual encoders and hyperparameters. We provide the configurations for reproducing the results in the paper. You can edit search_scale, search_step, init_beta, and init_alpha for fine-grained tuning and better results.
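search_scale and search_step define a grid over beta and alpha. A minimal sketch of such a search, assuming (as in Tip-Adapter's hyperparameter search) that search_scale holds the upper bounds [beta, alpha], search_step holds the number of candidates for each, and candidates start at 0.1; the exact spacing in this repo may differ. hp_grid, grid_search, and the evaluate callback are hypothetical names.

```python
def hp_grid(search_scale, search_step, start=0.1):
    """Evenly spaced candidate beta and alpha values (assumed spacing).

    search_scale: [max_beta, max_alpha]; search_step: [n_beta, n_alpha].
    Candidates start at `start` and step towards the upper bound (exclusive).
    """
    betas = [start + i * (search_scale[0] - start) / search_step[0]
             for i in range(search_step[0])]
    alphas = [start + i * (search_scale[1] - start) / search_step[1]
              for i in range(search_step[1])]
    return betas, alphas


def grid_search(evaluate, search_scale, search_step):
    """Return (best_score, best_beta, best_alpha) maximizing evaluate(beta, alpha)."""
    betas, alphas = hp_grid(search_scale, search_step)
    best = (float("-inf"), None, None)
    for beta in betas:
        for alpha in alphas:
            score = evaluate(beta, alpha)
            if score > best[0]:
                best = (score, beta, alpha)
    return best
```

In practice evaluate would return validation accuracy for a given (beta, alpha); a finer search_step gives a denser grid at the cost of more evaluations.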

Note that load_cache and load_pre_feat default to False for the first run, which stores the cache model and val/test features in configs/dataset/. For subsequent runs, set them to True to speed up hyperparameter tuning.
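The load_cache / load_pre_feat switches follow a common compute-once, reuse-later pattern. A minimal sketch of that pattern, assuming a hypothetical load_or_compute helper and pickle serialization chosen only to keep the example dependency-free (the repo's own storage format may differ):

```python
import os
import pickle


def load_or_compute(path, compute, load):
    """Return stored data from `path` if `load` is True; otherwise compute and store it.

    Hypothetical helper: the first run (load=False) calls compute() and saves
    the result; later runs (load=True) read the saved file instead.
    """
    if load:
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Setting load=True before the file exists raises FileNotFoundError, which is why the first run must use False.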

For the Caltech-101 dataset, the configs for Stable Diffusion's images and ChatGPT's prompts are in configs/sd_caltech101 and configs/chat_caltech101, respectively.

Running

For the 16-shot ImageNet setting:

CUDA_VISIBLE_DEVICES=0 python main_imagenet.py --config configs/imagenet/16shot.yaml

For the other 10 datasets (replace dataset with the corresponding config folder name):

CUDA_VISIBLE_DEVICES=0 python main.py --config configs/dataset/16shot.yaml
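To run every shot setting for one dataset, the two commands above can be generated in a loop. A minimal sketch, assuming the shot settings are 1, 2, 4, 8, and 16 and that every config follows the configs/[dataset]/[k]shot.yaml pattern; build_commands and run_all are hypothetical helpers, not part of the repo.

```python
import os
import subprocess


def build_commands(dataset, shots=(1, 2, 4, 8, 16)):
    """Return one command per shot setting (shot values are an assumption)."""
    script = "main_imagenet.py" if dataset == "imagenet" else "main.py"
    return [["python", script, "--config", f"configs/{dataset}/{k}shot.yaml"]
            for k in shots]


def run_all(dataset, gpu="0", dry_run=True):
    """Print each command; execute it on the given GPU unless dry_run is True."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in build_commands(dataset):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
```

run_all("imagenet") only prints the ImageNet commands; pass dry_run=False from the repo root to execute them. check=True stops the loop at the first failing run.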

Numerical Results

We provide CaFo's numerical results on 11 datasets from 1 to 16 shots in exp_Cafo.log. The results for Tip-Adapter and Tip-Adapter-F are in exp_Tip.log.

Acknowledgement

This repo benefits from Tip-Adapter, CLIP, DINO, DALL-E, and CuPL. Thanks for their wonderful work.

Citation

@article{zhang2023prompt,
  title={Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners},
  author={Renrui Zhang and Xiangfei Hu and Bohao Li and Siyuan Huang and Hanqiu Deng and Hongsheng Li and Yu Qiao and Peng Gao},
  journal={arXiv preprint arXiv:2303.02151},
  year={2023}
}

Contributors

Renrui Zhang, Xiangfei Hu, Bohao Li

Contact

If you have any questions about this project, please feel free to contact [email protected] and [email protected].
