DaViT: Dual Attention Vision Transformer

This repo contains the official detection and segmentation implementation of the paper "DaViT: Dual Attention Vision Transformer", by Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, and Lu Yuan.

The official implementation for image classification will be released at https://github.com/microsoft/DaViT.

Getting Started

The current codebase requires Python 3, PyTorch>=1.8.0, and torchvision>=0.7.0.

# An example on CUDA 10.2
pip install torch===1.9.0+cu102 torchvision===0.10.0+cu102 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install thop pyyaml fvcore pillow==8.3.2
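After installation, it can help to confirm that the installed packages meet the minimum versions stated above. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of this repo, and the version parser is deliberately simple (it ignores local tags such as `+cu102` and any pre-release suffix).

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.9.0+cu102' into (1, 9, 0), ignoring the local '+cu102' tag."""
    public = text.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.8' compares equal to '1.8.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Minimums taken from the requirement line above.
MINIMUMS = {"torch": "1.8.0", "torchvision": "0.7.0"}


def check_environment(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per package; it reads package metadata only and does not import PyTorch or test CUDA availability.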

Object Detection and Instance Segmentation

  • cd mmdet, then install mmcv and mmdet

    # An example on CUDA 10.2 and pytorch 1.9
    pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.9.0/index.html
    pip install -r requirements/build.txt
    pip install -v -e .  # or "python setup.py develop"
  • Create a data directory (mkdir data) and prepare the dataset in data/coco/ (layout: ROOT/mmdet/data/coco/ containing annotations, train2017, and val2017)

  • Finetune on COCO

    bash tools/dist_train.sh configs/davit_retinanet_1x_coco.py 8 \
    --cfg-options model.pretrained=PRETRAINED_MODEL_PATH
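Before launching a multi-GPU run, a quick check that the COCO directories listed in the steps above exist can save a failed start. This is a minimal sketch, not part of the repo: the function name `missing_subdirs` is hypothetical, and it only checks that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Sub-directories named in the dataset step above.
COCO_SUBDIRS = ("annotations", "train2017", "val2017")


def missing_subdirs(root, expected=COCO_SUBDIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from ROOT/mmdet so that data/coco resolves as in the step above.
    missing = missing_subdirs("data/coco")
    if missing:
        print("Missing under data/coco:", ", ".join(missing))
    else:
        print("data/coco layout looks complete")
```

The same function can be pointed at another dataset root by passing a different `expected` tuple.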

Semantic Segmentation

  • cd mmseg, then install mmcv and mmseg

    # An example on CUDA 10.2 and pytorch 1.9
    pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.9.0/index.html
    pip install -e .
  • Create a data directory (mkdir data) and prepare the dataset in data/ade/ (layout: ROOT/mmseg/data/ADEChallengeData2016)

  • Finetune on ADE

    bash tools/dist_train.sh configs/upernet_davit_512x512_160k_ade20k.py 8 \
    --options model.pretrained=PRETRAINED_MODEL_PATH
  • Multi-scale Testing

    bash tools/dist_test.sh configs/upernet_davit_512x512_160k_ade20k.py \
    TRAINED_MODEL_PATH 8 --aug-test --eval mIoU
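Both fine-tuning commands above pass `model.pretrained=PRETRAINED_MODEL_PATH`, a dotted key that overrides one nested entry of the config. The sketch below illustrates the idea on plain dictionaries. It is not the parser used by the MMDetection or MMSegmentation tools; `parse_option`, `apply_override`, and the example config contents are hypothetical.

```python
import copy


def parse_option(text):
    """Split 'a.b=c' into ('a.b', 'c'); the value stays a string."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {text!r}")
    return key, value


def apply_override(config, dotted_key, value):
    """Return a copy of config with the nested key set to value."""
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Create intermediate dicts on demand; replace non-dict values.
        if not isinstance(node.get(name), dict):
            node[name] = {}
        node = node[name]
    node[leaf] = value
    return result


# Hypothetical config contents, for illustration only.
base = {"model": {"pretrained": None, "backbone": {"type": "HypotheticalBackbone"}}}
key, value = parse_option("model.pretrained=PRETRAINED_MODEL_PATH")
updated = apply_override(base, key, value)
```

Only the addressed entry changes; sibling entries such as `model.backbone` are preserved, and the original dictionary is left untouched because the override works on a deep copy.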

Benchmarking

Image Classification on ImageNet-1K

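| Model | Pretrain | Resolution | acc@1 | acc@5 | #params | FLOPs | Checkpoint | Log |
|---|---|---|---|---|---|---|---|---|
| DaViT-T | IN-1K | 224 | 82.8 | 96.2 | 28.3M | 4.5G | download | log |
| DaViT-S | IN-1K | 224 | 84.2 | 96.9 | 49.7M | 8.8G | download | log |
| DaViT-B | IN-1K | 224 | 84.6 | 96.9 | 87.9M | 15.5G | download | log |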

Object Detection and Instance Segmentation on COCO

| Backbone | Pretrain | Lr Schd | #params | FLOPs | box mAP | mask mAP | Checkpoint | Log |
|---|---|---|---|---|---|---|---|---|
| DaViT-T | ImageNet-1K | 1x | 47.8M | 263G | 45.0 | 41.1 | download | log |
| DaViT-T | ImageNet-1K | 3x | 47.8M | 263G | 47.4 | 42.9 | download | log |
| DaViT-S | ImageNet-1K | 1x | 69.2M | 351G | 47.7 | 42.9 | download | log |
| DaViT-S | ImageNet-1K | 3x | 69.2M | 351G | 49.5 | 44.3 | download | log |
| DaViT-B | ImageNet-1K | 1x | 107.3M | 491G | 48.2 | 43.3 | download | log |
| DaViT-B | ImageNet-1K | 3x | 107.3M | 491G | 49.9 | 44.6 | download | log |

| Backbone | Pretrain | Lr Schd | #params | FLOPs | box mAP | Checkpoint | Log |
|---|---|---|---|---|---|---|---|
| DaViT-T | ImageNet-1K | 1x | 38.5M | 244G | 44.0 | download | log |
| DaViT-T | ImageNet-1K | 3x | 38.5M | 244G | 46.5 | download | log |
| DaViT-S | ImageNet-1K | 1x | 59.9M | 332G | 46.0 | download | log |
| DaViT-S | ImageNet-1K | 3x | 59.9M | 332G | 48.2 | download | log |
| DaViT-B | ImageNet-1K | 1x | 98.5M | 471G | 46.7 | download | log |
| DaViT-B | ImageNet-1K | 3x | 98.5M | 471G | 48.7 | download | log |

Semantic Segmentation on ADE20K

| Backbone | Pretrain | Method | Resolution | Iters | #params | FLOPs | mIoU | Checkpoint | Log |
|---|---|---|---|---|---|---|---|---|---|
| DaViT-T | ImageNet-1K | UPerNet | 512x512 | 160k | 60M | 940G | 46.3 | download | log |
| DaViT-S | ImageNet-1K | UPerNet | 512x512 | 160k | 81M | 1030G | 48.8 | download | log |
| DaViT-B | ImageNet-1K | UPerNet | 512x512 | 160k | 121M | 1175G | 49.4 | download | log |
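For readers unfamiliar with the mIoU metric reported above: it is the per-class intersection-over-union, averaged over classes. The sketch below computes it from a confusion matrix in plain Python. It illustrates the standard definition only and is not the evaluation code used to produce the numbers in the table; the function name `mean_iou` and the toy matrix are hypothetical, and details such as ignored labels are not modeled.

```python
def mean_iou(confusion):
    """Compute mIoU from a square confusion matrix.

    confusion[i][j] counts pixels whose true class is i and predicted class is j.
    Classes that appear in neither ground truth nor prediction are skipped.
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        union = true_pos + false_pos + false_neg
        if union == 0:
            continue
        ious.append(true_pos / union)
    if not ious:
        raise ValueError("no class has any pixels")
    return sum(ious) / len(ious)


# Toy two-class example: per-class IoU is 3/5 and 5/7.
toy = [[3, 1],
       [1, 5]]
score = mean_iou(toy)
```

The function returns a fraction between 0 and 1; benchmark tables such as the one above conventionally report it multiplied by 100.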

Citation

If you find this repo useful for your project, please consider citing it with the following BibTeX entry:

@article{ding2022davit,
    title={DaViT: Dual Attention Vision Transformer}, 
    author={Ding, Mingyu and Xiao, Bin and Codella, Noel and Luo, Ping and Wang, Jingdong and Yuan, Lu},
    journal={arXiv preprint arXiv:2204.03645},
    year={2022},
}

Acknowledgement

Our codebase is built on timm, MMDetection, and MMSegmentation. We thank the authors for the nicely organized code!
