
Bag of tricks for long-tailed visual recognition with deep convolutional neural networks

This repository is the official PyTorch implementation of Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks, which provides practical and effective tricks used in long-tailed image classification.

Development log

Previous logs
  • 2021-04-24 - Add the validation running command, which loads a trained model, then returns the validation acc and a corresponding confusion matrix figure. See Usage in this README for details.
  • 2021-04-24 - Add classifier-balancing and the corresponding experiments in Two-stage training in trick_gallery.md, including $\tau$-normalization, cRT, and LWS (a $\tau$-normalization sketch follows this list).
  • 2021-04-23 - Add CrossEntropyLabelAwareSmooth (label-aware smoothing, CVPR 2021) in trick_gallery.md.
  • 2021-04-22 - Add one option (TRAIN.APEX) in config.py, so you can set TRAIN.APEX to False for training without using apex.
  • 2021-02-19 - Test and add the results of two-stage training in trick_gallery.md.
  • 2021-01-11 - Add a mixup related method: Remix, ECCV 2020 workshop.
  • 2021-01-10 - Add CDT (class-dependent temperature), arXiv 2020, and BSCE (balanced-softmax cross-entropy), NeurIPS 2020, and support a smooth version of cost-sensitive cross-entropy (smooth CS_CE), which adds a hyper-parameter $\gamma$ to vanilla CS_CE. In smooth CS_CE, the loss weight of class $i$ is defined as $(\frac{N_{min}}{N_i})^\gamma$, where $\gamma \in [0, 1]$ and $N_i$ is the number of images in class $i$. Setting $\gamma = 0.5$ gives a square-root version of CS_CE (see the sketch after this list).
  • 2021-01-05 - Add SEQL (softmax equalization loss), CVPR 2020.
  • 2021-01-02 - Add LDAMLoss, NeurIPS 2019, and a regularization method: label smooth cross-entropy, CVPR 2016.
  • 2020-12-30 - Add code for torch.nn.parallel.DistributedDataParallel. Support apex in both torch.nn.DataParallel and torch.nn.parallel.DistributedDataParallel.
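
  As a concrete reference for the smooth CS_CE and $\tau$-normalization entries above, here is a minimal sketch of both. The function names and tensor layout are illustrative, not the repo's actual API:

    import torch

    def smooth_cs_ce_weights(samples_per_class, gamma=0.5):
        # Loss weight of class i: (N_min / N_i) ** gamma, with gamma in [0, 1].
        # gamma = 0 recovers plain cross-entropy; gamma = 1 recovers vanilla CS_CE.
        n = torch.as_tensor(samples_per_class, dtype=torch.float)
        return (n.min() / n) ** gamma

    def tau_normalize(fc_weight, tau=1.0):
        # tau-normalization: scale each class vector w_i by 1 / ||w_i||^tau.
        # fc_weight: classifier weight of shape [num_classes, feat_dim].
        norms = fc_weight.norm(p=2, dim=1, keepdim=True)
        return fc_weight / norms.pow(tau)

    # Usage (CIFAR-10-LT-100 style class counts):
    #   weights = smooth_cs_ce_weights([5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50])
    #   criterion = torch.nn.CrossEntropyLoss(weight=weights.cuda())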
    Trick gallery

    Brief introduction

    We divide the long-tailed recognition tricks into four families: re-weighting, re-sampling, mixup training, and two-stage training. For more details on these four trick families, see the original paper. As a concrete example of the mixup-training family, a minimal input-mixup sketch is shown below.
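
    A hedged, minimal sketch of input mixup; the repo's actual implementation may differ in details such as the Beta prior:

      import numpy as np
      import torch

      def mixup_batch(x, y, alpha=1.0):
          # Mix the batch with a shuffled copy of itself.
          lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
          index = torch.randperm(x.size(0), device=x.device)
          mixed_x = lam * x + (1.0 - lam) * x[index]
          return mixed_x, y, y[index], lam

      # The training step then uses both label sets:
      #   logits = model(mixed_x)
      #   loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)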

    Detailed information:

    • Trick gallery:

      Tricks, corresponding results, experimental settings, and running commands are listed in trick_gallery.md.

    Main requirements

    torch >= 1.4.0
    torchvision >= 0.5.0
    tensorboardX >= 2.1
    tensorflow >= 1.14.0 # convert long-tailed CIFAR datasets from tfrecords to jpgs
    Python 3
    apex
    • We provide the detailed requirements in requirements.txt. You can run pip install -r requirements.txt to create the same running environment as ours.
    • We recommend installing apex, as it saves GPU memory (a usage sketch follows this list):
    pip install -U pip
    git clone https://github.com/NVIDIA/apex
    cd apex
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    • If apex is not installed, distributed training with DistributedDataParallel in our codes cannot be used.
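
    As referenced above, a minimal sketch of how apex mixed precision is typically enabled. The repo's TRAIN.APEX option presumably toggles something similar; the opt_level here is an illustrative choice:

      import torch
      from apex import amp

      model = torch.nn.Linear(512, 10).cuda()
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

      # O1 patches most ops to run in fp16 while keeping fp32 master weights.
      model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

      x = torch.randn(8, 512).cuda()
      y = torch.randint(0, 10, (8,)).cuda()
      loss = torch.nn.functional.cross_entropy(model(x), y)

      # Loss scaling prevents fp16 gradient underflow.
      with amp.scale_loss(loss, optimizer) as scaled_loss:
          scaled_loss.backward()
      optimizer.step()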

    Preparing the datasets

    We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), and iNaturalist 2018 (iNat18).

    The detailed information of these datasets is shown below:

    | Datasets | CIFAR-10-LT (IF 100) | CIFAR-10-LT (IF 50) | CIFAR-100-LT (IF 100) | CIFAR-100-LT (IF 50) | ImageNet-LT | iNat18 |
    | :-- | :--: | :--: | :--: | :--: | :--: | :--: |
    | Training images | 12,406 | 13,996 | 10,847 | 12,608 | 115,846 | 437,513 |
    | Classes | 10 | 10 | 100 | 100 | 1,000 | 8,142 |
    | Max images | 5,000 | 5,000 | 500 | 500 | 1,280 | 1,000 |
    | Min images | 50 | 100 | 5 | 10 | 5 | 2 |
    | Imbalance factor | 100 | 50 | 100 | 50 | 256 | 500 |
    -  `Max images` and `Min images` represent the number of training images in the largest and smallest classes, respectively.

    -  CIFAR-10-LT-100 means the long-tailed CIFAR-10 dataset with the imbalance factor $\beta = 100$.

    -  Imbalance factor is defined as $\beta = \frac{\text{Max images}}{\text{Min images}}$.

    • Data format

    The annotation of a dataset is a dict consisting of two fields: annotations and num_classes. The field annotations is a list of dicts, each with image_id, fpath, im_height, im_width, and category_id.

    Here is an example.

    {
        'annotations': [
                        {
                            'image_id': 1,
                            'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                            'im_height': 600,
                            'im_width': 800,
                            'category_id': 7477
                        },
                        ...
                       ],
        'num_classes': 8142
    }
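
    A hedged sketch of loading and sanity-checking an annotation file in this format (the file name is illustrative):

      import json

      with open("iNat18_train.json") as f:
          anno = json.load(f)

      print("num_classes:", anno["num_classes"])
      print("num_images:", len(anno["annotations"]))

      # Every record should carry the five fields described above.
      first = anno["annotations"][0]
      assert {"image_id", "fpath", "im_height", "im_width",
              "category_id"} <= first.keys()
      assert 0 <= first["category_id"] < anno["num_classes"]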
    
    • CIFAR-LT

      Cao et al. (NeurIPS 2019) followed the method of Cui et al. (CVPR 2019) to generate CIFAR-LT randomly. They modify the CIFAR datasets provided by PyTorch, as this file shows.
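
      The sampling follows an exponential class-size profile. A hedged sketch (the linked file above is the repo's actual generation code):

        def longtail_class_sizes(n_max=5000, num_classes=10, imb_factor=100):
            # n_i = n_max * imb_factor ** (-i / (num_classes - 1)), i = 0 .. num_classes - 1
            return [int(n_max * imb_factor ** (-i / (num_classes - 1)))
                    for i in range(num_classes)]

        sizes = longtail_class_sizes()          # CIFAR-10-LT with beta = 100
        print(sizes[0], sizes[-1], sum(sizes))  # 5000 50 12406, matching the table above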

    • ImageNet-LT

      You can use the following steps to convert from the original images of ImageNet-LT.

      1. Download the original ILSVRC-2012. Suppose you have downloaded and reorganized it at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
      2. Download the train/test splitting files (ImageNet_LT_train.txt and ImageNet_LT_test.txt) in GoogleDrive or Baidu Netdisk (password: cj0g). Suppose you have downloaded them at path /downloaded/ImageNet-LT/.
      3. Run tools/convert_from_ImageNet.py, and you will get two jsons: ImageNet_LT_train.json and ImageNet_LT_val.json.
      # Convert from the original format of ImageNet-LT
      python tools/convert_from_ImageNet.py --input_path /downloaded/ImageNet-LT/ --image_path /downloaded/ImageNet/ --output_path ./
    • iNat18

      You can use the following steps to convert from the original format of iNaturalist 2018.

      1. First, download the images and annotations from iNaturalist 2018. Suppose you have downloaded them at path /downloaded/iNat18/.
      2. Run tools/convert_from_iNat.py, and use the generated iNat18_train.json and iNat18_val.json to train.
      # Convert from the original format of iNaturalist
      # See tools/convert_from_iNat.py for more details of the args
      python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/train2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_train.json
      
      python tools/convert_from_iNat.py --input_json_file /downloaded/iNat18/val2018.json --image_path /downloaded/iNat18/images --output_json_file ./iNat18_val.json 

    Usage

    In this repo:

    • The results of CIFAR-LT (ResNet-32) and ImageNet-LT (ResNet-10), which need only one GPU to train, are obtained by DataParallel training with apex.

    • The results of iNat18 (ResNet-50), which need more than one GPU to train, are obtained by DistributedDataParallel training with apex.

    • If more than one GPU is used, DistributedDataParallel training is more efficient than DataParallel training, especially when CPU compute is limited.
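
    A hedged sketch contrasting the two wrappers (process-group setup and launch flags, which the repo's scripts handle, are omitted):

      import torch

      model = torch.nn.Linear(512, 10).cuda()

      # DataParallel: one Python process drives all GPUs, so data loading and
      # gradient gathering can bottleneck on the CPU.
      dp_model = torch.nn.DataParallel(model, device_ids=[0, 1])

      # DistributedDataParallel: one process per GPU; requires
      # torch.distributed.init_process_group("nccl", ...) in each process first.
      # ddp_model = torch.nn.parallel.DistributedDataParallel(
      #     model, device_ids=[local_rank])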

    Training

    Parallel training with DataParallel

    1. To train
    # To train long-tailed CIFAR-10 with an imbalance ratio of 50.
    # `GPUs` are the GPUs you want to use, such as `0,4`.
    bash data_parallel_train.sh configs/test/data_parallel.yaml GPUs

    Distributed training with DistributedDataParallel

    1. Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to [your own socket name].
    export NCCL_SOCKET_IFNAME=[your own socket name]
    
    2. To train
    # To train long-tailed CIFAR-10 with an imbalance ratio of 50.
    # `GPUs` are the GPUs you want to use, such as `0,1,4`.
    # `NUM_GPUs` is the number of GPUs you want to use. If you set `GPUs` to `0,1,4`, then `NUM_GPUs` should be `3`.
    bash distributed_data_parallel_train.sh configs/test/distributed_data_parallel.yaml NUM_GPUs GPUs

    Validation

    You can get the validation accuracy and the corresponding confusion matrix after running the following commands.

    See main/valid.py for more details.

    1. First, change TEST.MODEL_FILE in the yaml to the path of your trained model.
    2. To run validation
    # `GPUs` are the GPUs you want to use, such as `0,1,4`.
    python main/valid.py --cfg [Your yaml] --gpus GPUs

    Comparison between the baseline results using our codes and the references [Cui, Kang, Liu]

    • We use Top-1 error rates as our evaluation metric.
    • For ImageNet-LT, we found that the color_jitter augmentation was not included in our experiments, although it is adopted by other methods. So, in this repo, we add the color_jitter augmentation on ImageNet-LT (a transform sketch follows the table below). The old baseline without color_jitter gets 64.89 top-1 error, which is 1.15 points higher than the new baseline (63.74).
    • You can click the Baseline in the table below to see the experimental settings and corresponding running commands.
    | Datasets | CIFAR-10-LT (IF 100) | CIFAR-10-LT (IF 50) | CIFAR-100-LT (IF 100) | CIFAR-100-LT (IF 50) | ImageNet-LT | iNat18 |
    | :-- | :--: | :--: | :--: | :--: | :--: | :--: |
    | Backbones | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-32 | ResNet-10 | ResNet-50 |
    | Baselines using our codes | 28.05 | 23.55 | 62.27 | 56.22 | 63.74 | 40.55 |
    | Reference [Cui, Kang, Liu] | 29.64 | 25.19 | 61.68 | 56.15 | 64.40 | 42.86 |

    For the baselines using our codes:

    1. CONFIG (from left to right):
      • configs/cao_cifar/baseline/{cifar10_im100.yaml, cifar10_im50.yaml, cifar100_im100.yaml, cifar100_im50.yaml}
      • configs/ImageNet_LT/imagenetlt_baseline.yaml
      • configs/iNat18/iNat18_baseline.yaml

    2. Running commands:
      • For CIFAR-LT and ImageNet-LT: bash data_parallel_train.sh CONFIG GPUs
      • For iNat18: bash distributed_data_parallel_train.sh configs/iNat18/iNat18_baseline.yaml NUM_GPUs GPUs
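
    As referenced in the color_jitter note above, a hedged sketch of a typical ImageNet-LT training transform; the exact jitter strengths are common values, not necessarily the repo's:

      from torchvision import transforms

      train_transform = transforms.Compose([
          transforms.RandomResizedCrop(224),
          transforms.RandomHorizontalFlip(),
          transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
          transforms.ToTensor(),
          transforms.Normalize(mean=[0.485, 0.456, 0.406],
                               std=[0.229, 0.224, 0.225]),
      ])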

    Paper collection of long-tailed visual recognition

    Awesome-of-Long-Tailed-Recognition

    Long-Tailed-Classification-Leaderboard

    Citation

    @inproceedings{zhang2021tricks,
      author    = {Yongshun Zhang and Xiu{-}Shen Wei and Boyan Zhou and Jianxin Wu},
      title     = {Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks},
      pages     = {3447--3455},
      booktitle = {AAAI},
      year      = {2021},
    }
    

    Contacts

    If you have any questions about our work, please do not hesitate to contact us via the emails provided in the paper.


    bagoftricks-lt's Issues

    about Tau-norm

    The decoupling part has been updated, nice!
    But when I reproduced tau-norm on cifar10_im50, the accuracy was only 75.98.
    I first trained the baseline for 200 epochs, then used the tau_norm.yaml provided in the repo. Is there anything missing?

    Trained models?

    Hello, will the authors release trained model files used in the paper?
    Thanks.

    about Trick combinations

    Hello! Thanks for your contribution. I have such questions:

    1. In the combination of IM & DRS with CAM-BS, at which stage is IM used, or is it used in both stages?
    2. In the fine-tuning after mixup training (Table 11), at which epoch do you remove the mixup?
    3. Are there configs for the trick combinations?

    Thanks a lot ~

    About the effect of input mixup to long-tailed learning

    Hi, thank you for providing so many tricks for long-tailed recognition. I am wondering how the baseline with only input mixup on CIFAR-100 with imbalance ratio 100 can reach an error rate of 59.66 (58.21). In my experiments, the error rate is only around 61.0 (60.2), averaged over multiple runs.

    Reported Accuracy

    Can you confirm that the following splits are used for reporting the final accuracies in the paper?

    CIFAR100-LT: Val Split
    ImageNet-LT: Test Split
    iNaturalist18: Val Split

    Do all the earlier works follow the same convention, i.e., report final accuracies for CIFAR100-LT and iNaturalist18 only on the validation splits and not the test splits?

    about Mixup finetune

    Hi author,
    Thanks for such impressive work; it really helps related areas involving imbalanced categories.
    The thing is, I did not find the code for the mixup fine-tuning details described in the paper. Could you point it out?
    Thanks

    About download link imagenet-100t

    https://image-net.org/challenges/LSVRC/2012/signup
    I can't find the download link for imagenet-100t.


    About configs

    Hi, I found there are only CIFAR config files for the trick combinations. Could you please offer the ImageNet and iNaturalist config files for the combinations? I couldn't find some important parameters, for example cfg.DATASET.CAM_NUMBER_THRES, which is needed when I train on ImageNet and iNat. Thank you!

    About DRS training

    Hello! Thanks for your contribution. I have such questions:
    The DRS strategy described in Decoupling Representation and Classifier for Long-Tailed Recognition is: first train the whole network for 90 or 200 epochs, then freeze the backbone, re-initialize the classifier, and train it. But the DRS strategy in this code just switches to a different sampler. Or have I misunderstood the code?

    > Hi, where can we find the supplemental materials? Thanks.

    Sorry for the late reply. You can find the supp at http://www.lamda.nju.edu.cn/zhangys/papers/AAAI_tricks_supp.pdf, and I will add it to the README soon.

    Thank you!

    Also, I am curious about a statement in the paper: "We can also find that combining CS_CE and CAM-based balance-sampling together cannot further improve the accuracy, since both of them try to enlarge the influence of tail classes and the joint use of the two could cause an accuracy drop due to the overfitting problem."
    Did you observe the overfitting, or is this just a hypothesis? Thanks.

    Originally posted by @jrcai in #1 (comment)
