
Asymmetric Loss For Multi-Label Classification


Paper | Pretrained models | Datasets

Official PyTorch Implementation

Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, Lihi Zelnik-Manor
DAMO Academy, Alibaba Group

Abstract

In a typical multi-label setting, a picture contains on average few positive labels, and many negative ones. This positive-negative imbalance dominates the optimization process, and can lead to under-emphasizing gradients from positive labels during training, resulting in poor accuracy. In this paper, we introduce a novel asymmetric loss ("ASL"), which operates differently on positive and negative samples. The loss dynamically down-weights and hard-thresholds easy negative samples, while also discarding possibly mislabeled samples. We demonstrate how ASL can balance the probabilities of different samples, and how this balancing is translated to better mAP scores. With ASL, we reach state-of-the-art results on multiple popular multi-label datasets: MS-COCO, Pascal-VOC, NUS-WIDE and Open Images. We also demonstrate ASL's applicability to other tasks, such as single-label classification and object detection. ASL is effective, easy to implement, and does not increase the training time or complexity.

9/1/2023 Update

Added tests auto-generated by the CodiumAI tool

29/11/2021 Update - New article released, offering new classification head with state-of-the-art results

Check out our new project, Ml-Decoder, which presents a unified classification head for multi-label, single-label and zero-shot tasks. Backbones with ML-Decoder reach SOTA results, while also improving the speed-accuracy tradeoff.

24/7/2021 Update - ASL article was accepted to ICCV 2021

A final version of the paper, with updated results for ImageNet-21K pretraining, has been released to arXiv.
Note that ASL is becoming the de-facto 'default' loss for high-performance multi-label classification, and all the top results on papers-with-code currently use it.

Training Code Now Available!

Thanks to a great collaboration with @GhostWnd, we now provide a script for fully reproducing the article results, so a modern multi-label training code is finally available to the community.

Frequently Asked Questions

Some questions are asked repeatedly in the issues section. Make sure to review them before opening a new issue:

  • Regarding combining ASL with other techniques, see link
  • Regarding implementation of asymmetric clipping, see link
  • Regarding disable_torch_grad_focal_loss option, see link
  • Regarding squish vs. crop resizing, see link
  • Regarding training tricks, see link
  • How to apply ASL to your own dataset, see link

Asymmetric Loss (ASL) Implementation

In this PyTorch file, we provide implementations of our new loss function, ASL, which can serve as a drop-in replacement for standard loss functions (Cross-Entropy and Focal Loss).

For the multi-label case (sigmoids), the two implementations are:

  • class AsymmetricLoss(nn.Module)
  • class AsymmetricLossOptimized(nn.Module)

The two losses are bit-accurate. However, AsymmetricLossOptimized() contains a more optimized (and complicated) implementation of ASL, which minimizes memory allocations and GPU uploading and favors in-place operations.

For the single-label case (softmax), the implementation is called:

  • class ASLSingleLabel(nn.Module)
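A quick usage sketch for the multi-label loss (a minimal example, not verbatim repo code; the import path assumes the repository's src/loss_functions/losses.py layout, and the paper-recommended hyper-parameters are used):

import torch
from src.loss_functions.losses import AsymmetricLoss

# Paper-recommended setting: strong focusing on negatives, mild on positives,
# plus a probability margin (clip) of 0.05 for asymmetric clipping.
criterion = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05)

logits = torch.randn(16, 80, requires_grad=True)  # raw scores; the loss applies the sigmoid itself
targets = torch.randint(0, 2, (16, 80)).float()   # multi-hot ground-truth labels, same shape
loss = criterion(logits, targets)
loss.backward()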

Pretrained Models

In this link, we provide pre-trained models on various datasets.

Validation Code

Thanks to an external contribution from @hellbell, we now provide validation code that reproduces the article results on MS-COCO:

python validate.py  \
--model_name=tresnet_l \
--model_path=./models_local/MS_COCO_TRresNet_L_448_86.6.pth

Inference Code

We provide inference code that demonstrates how to load our model, pre-process an image, and run actual inference. Example run on the MS-COCO model (after downloading the relevant model):

python infer.py  \
--dataset_type=MS-COCO \
--model_name=tresnet_l \
--model_path=./models_local/MS_COCO_TRresNet_L_448_86.6.pth \
--pic_path=./pics/000000000885.jpg \
--input_size=448

which will display the example image with its detected classes.

Example run of the OpenImages model:

python infer.py  \
--dataset_type=OpenImages \
--model_name=tresnet_l \
--model_path=./models_local/Open_ImagesV6_TRresNet_L_448.pth \
--pic_path=./pics/000000000885.jpg \
--input_size=448

Citation

@misc{benbaruch2020asymmetric,
      title={Asymmetric Loss For Multi-Label Classification},
      author={Emanuel Ben-Baruch and Tal Ridnik and Nadav Zamir and Asaf Noy and Itamar Friedman and Matan Protter and Lihi Zelnik-Manor},
      year={2020},
      eprint={2009.14119},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

Feel free to contact us with any questions or issues - Emanuel Ben-Baruch ([email protected]) or Tal Ridnik ([email protected]).

Contributors

michalwols, mrt23, t-wtnb, yqtianust


asl's Issues

What is the effect of disable_torch_grad_focal_loss in AsymmetricLoss?

Thank you for your work! When reading the code of losses.py, I found the variable disable_torch_grad_focal_loss in AsymmetricLoss, but I don't see it used anywhere else. So I wonder: is disable_torch_grad_focal_loss just used for the paper's experimental comparisons, and do we need to modify it in normal use?
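For readers with the same question, here is a sketch of how the flag behaves inside the loss (paraphrased, not verbatim repo code). When enabled, the asymmetric focusing weight is computed with autograd disabled, so backpropagation treats the weight as a constant, matching the original focal-loss formulation and saving memory:

# Inside the forward pass: xs_pos = sigmoid(x), xs_neg = 1 - xs_pos, y = targets
if self.disable_torch_grad_focal_loss:
    torch.set_grad_enabled(False)
pt = xs_pos * y + xs_neg * (1 - y)                   # probability of the ground-truth class
one_sided_gamma = self.gamma_pos * y + self.gamma_neg * (1 - y)
one_sided_w = torch.pow(1 - pt, one_sided_gamma)     # focusing weight, treated as a constant
if self.disable_torch_grad_focal_loss:
    torch.set_grad_enabled(True)
loss *= one_sided_w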

How to choose best checkpoints?

Hi! I'm using ASL+EMA for an imbalanced multi-label problem. For ordinary softmax problems, avg_checkpoints.py from timm works very well. I've tried checkpoint averaging for ASL+EMA, but the results vary a lot; they highly depend on the selected checkpoints. Are there approaches for choosing the best checkpoints to average? And should we use checkpoint averaging with EMA at all?

Adding a distributed script and trying to reproduce 86.6 mAP

Hi,

At first, thank you for sharing the training scripts. I really like your work which inspired me a lot.

  1. There may be a bug in train.py, as mentioned in #38, which can be fixed by replacing line 155 target = target with target = target.max(dim=1)[0].

  2. I tried to reproduce the results in the paper, so I added a distributed training script based on the train.py you shared; it can be found at https://github.com/SlongLiu/ASL_reproduce/blob/master/train_dist.py. With this script, I achieve 85.5 mAP on COCO (448×448), which is lower than the 86.6 mAP reported in the paper. Could you please help me find the reason for this gap?

Thanks for your help and happy new year!

Training file

Thanks for this brilliant piece of work.

Can you provide the .py file you used for training? (for reproduction purposes)

Fine-tuning with ASL on a model trained with BCE loss: the loss increases

Hello.
My dataset has 2 labels in total, and each label has three attributes: a1, a2, a3 and b1, b2, b3.
Their ratios are 1, 1.38807, 1.35329, 1.05098, 1.92411, 1.01199.

However, my samples are imbalanced. Combining the attributes yields 9 image classes: a1b1, a1b2, a1b3, a2b1, a2b2, a2b3, a3b1, a3b2, a3b3,
with count ratios of 55, 2.7, 2.5, 6.4, 27, 14.5, 1, 1, 42.

I first trained with BCEWithLogitsLoss; the loss went from 1.5 (start) to 0.11 (end), giving a micro-average of 90% and a macro-average of 70%.
Then, starting from that model, I trained with your ASL; the loss went from 0.11 up to 1, then 3, then 2, then back to 1 (end), and both the micro- and macro-averages actually dropped.

Questions:
1. Is the 20% gap between the macro- and micro-averages caused by the sample imbalance?
2. After switching to ASL, the accuracy dropped; are my parameters set incorrectly, and how should I set them (gamma_neg, gamma_pos, clip)?

hyper-parameters for reproducing results on MSCOCO

Thanks for the inspiring implementation :)

I'm having trouble reproducing the results on MSCOCO with tresnet_m as the backbone and 224 input size.
I have varied the lr (1e-4 and 2e-4), batch_size (128 and 64), and epochs (40/80 and 14/25), and only got around 79.+% mAP (lower than the reported pre-trained models).

I understand you can't share the original training code for commercial reasons; can you provide your hyper-parameters for the reproduction? These hyper-parameters seem to influence the results.

Wrong mAP from validate.py

Hello, I'm very interested in your novel work and tried to run your validate.py file, but I got a totally wrong mAP.
Specifically, I ran:

python validate.py \
--model_name=tresnet_l \
--model_path=./models_local/MS_COCO_TRresNet_L_448_86.6.pth

but it finally printed out mAP score: 3.7.
Is something wrong with what I did?

A question about pre-trained weights

Hello,

Very nice work. I am highly inspired by it and am working on training from scratch. I have a question regarding the default pre-trained weights you used to initialize training. For example, the one you used locally was, I guess, 'tresnet_m.pth'. Is it the one shared in the repo at https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md?

When I start training from scratch without any pre-trained weights, my loss is extremely high and the val mAP never crosses 50 or so, even after 80 epochs.

Thanks in advance for your time.

Best,
Inder

Questions on reproducing the reported results on MS COCO

Hi,

First, thank you for sharing the exciting work.

I was trying to reproduce the results on the MS COCO dataset with my own training framework. When I used cross-entropy loss, loss_function=AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0), to establish the baseline, I achieved ~82.5% mAP (with a ResNet101 backbone), which is quite similar to the result reported in Fig. 8 of the paper.

Then I replaced the loss function with loss_function=AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05) -- all other hyper-parameters were kept consistent. However, I only got ~82.1% mAP.

Also, the traditional focal loss, loss_function=AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0), cannot outperform the baseline (~82.5%) given the same configuration. I am curious what the issue with my training process is.

Could you also please share some training tricks? For example, a snippet of code for adjusting the learning rate, training transforms similar to those used for validation here, etc. Or do you have any suggestions?

Thank you.
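For reference, the three configurations compared in this question can be written side by side; with both gammas at zero and no clipping, ASL reduces to plain binary cross-entropy, and with equal gammas it becomes a symmetric focal loss:

bce_like   = AsymmetricLoss(gamma_neg=0, gamma_pos=0, clip=0)     # plain binary cross-entropy
focal_like = AsymmetricLoss(gamma_neg=2, gamma_pos=2, clip=0)     # symmetric focal loss
asl        = AsymmetricLoss(gamma_neg=4, gamma_pos=1, clip=0.05)  # ASL setting used above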

Questions about reproducing results on COCO

Hello, I tried to reproduce the result on COCO. I implemented my own framework, and most of my files are the same as yours; I only wrote my own train.py.
As introduced in your paper, I implemented EMA with decay 0.999, the 1cycle policy with a max learning rate of 2e-4, the Adam optimizer with weight_decay 1e-4, img_size = 448×448, and batch size = 16.

But when I train my model, the loss decreases from 120 to around 90 and then just stops decreasing, and the performance on the validation data is very bad, with an mAP around 10. At first I guessed it was because I hadn't spent much time training (I only trained for an hour), but when I train longer, the loss still doesn't decrease. Could you please tell me what I have done wrong?

My code is available at https://github.com/GhostWnd/reproducingASL. Thank you for your help.
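For reference, the recipe this question describes can be assembled with vanilla PyTorch roughly as follows (a sketch under stated assumptions, not the authors' exact code; model, train_loader, and epochs are assumed to exist):

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import OneCycleLR

optimizer = Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
scheduler = OneCycleLR(optimizer, max_lr=2e-4,
                       steps_per_epoch=len(train_loader), epochs=epochs)
# EMA of the weights with decay 0.999, as described above
ema = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, cur, n: 0.999 * avg + (1 - 0.999) * cur)
# after each optimizer.step(): scheduler.step(); ema.update_parameters(model)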

Data augmentation

Hi, thank you for releasing your training code for us to reproduce your models.

I notice the data augmentation in your code:

train_dataset = CocoDetection(data_path_train,
                              instances_path_train,
                              transforms.Compose([
                                  transforms.Resize((args.image_size, args.image_size)),
                                  CutoutPIL(cutout_factor=0.5),
                                  RandAugment(),
                                  transforms.ToTensor(),
                                  # normalize,
                              ]))

Here, does RandAugment mean the augmentation newly proposed by Google? And did you use this augmentation, besides Cutout, in your models? If so, I will use the same augmentation when reproducing your code.

Thanks in advance.

Loss is causing the program to crash

Hi,

I've tried running both the optimized and the non-optimized versions of your loss, and both caused an exception.
It seems there is an issue with self.target (it is None) in the optimized version, and a shape issue with the non-optimized version.
What is the proper shape of the target in your loss?

Thanks
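For anyone hitting the same shape issue: in the multi-label implementation, logits and targets are expected to share the same (batch_size, num_classes) shape, with multi-hot float targets. A quick sanity check (hypothetical sizes; the import path assumes the repository layout):

import torch
from src.loss_functions.losses import AsymmetricLoss

criterion = AsymmetricLoss()
logits = torch.randn(8, 80)                     # (batch, num_classes) raw logits
targets = torch.randint(0, 2, (8, 80)).float()  # multi-hot labels, same shape
loss = criterion(logits, targets)               # scalar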

question about open images dataset

Hi, thanks for your great repo.
The Open Images V6 dataset contains both human-verified labels and machine-generated labels. Did you use the machine-generated labels for training and testing?

How to calculate p∗?

In paper, "For very hard negative samples (with p > p∗, where p∗ is defined as the point where d (dp/dL)/dz=0,"
How to calculate p∗?

Binary size of TresNet L - COCO vs OpenImage

Hi @mrT23

I am looking at the pretrained models you provided. I notice that the pretrained Tresnet-L models for COCO and OpenImages have different binary sizes. Though COCO has fewer categories than OpenImages, its size is ~215MB, whereas OpenImages' model is only 120MB. I wonder why this is the case?

Thank you!

A question about the shifted probability?

Sorry, may I ask you a question?

In the paper, you define the shifted probability pm as:
pm = max(p-m, 0)
But in the code, you define it as:
xs_neg = (xs_neg + self.clip).clamp(max=1)
which I think means:
pm = min(1-p+m, 1)

Why?
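For what it's worth, the two forms are equivalent once you note that the code operates on the negative probability xs_neg = 1 - p, since min(1-p+m, 1) = 1 - max(p-m, 0); a quick numerical check:

import torch

p, m = torch.rand(5), 0.05
paper_form = 1 - torch.clamp(p - m, min=0)   # 1 - max(p-m, 0), as in the paper
code_form = torch.clamp((1 - p) + m, max=1)  # min(1-p+m, 1), as in the code
assert torch.allclose(paper_form, code_form)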

getting difficulty in importing .pyi file

File "/content/gdrive/My Drive/project/ASL/inplace_abn/inplace_abn/functions.py", line 8, in

from . import _backend

ImportError: cannot import name '_backend'

_backend.pyi file is in the same directory as functions.py

RuntimeError: Some elements marked as dirty

An error occurs in the tresnet.py file:

result = self.forward(*input, **kwargs)
RuntimeError: Some elements marked as dirty during the forward method were not returned as output. The inputs that are modified inplace must all be outputs of the Function.

I don't know why.

Dataset path

Should

ASL/train.py

Lines 51 to 52 in eb52197

data_path_val = args.data
data_path_train = args.data

be:

    data_path_val   = f'{args.data}/val2014'    # args.data
    data_path_train = f'{args.data}/train2014'  # args.data

?

some confusions about computing p_m

Hi,
I am reading your paper and your source code, and I have some confusion:

  1. In the paper, p_m = max(p-m, 0); however, in your implementation, p_m = p + 0.05. Are there any insights behind this? Thanks~

Validation of nuswide checkpoint

Hi, thank you very much for sharing the dataset and split of the NUS-WIDE dataset.

I tried to reproduce the 65.2 mAP reported in the paper on NUS-WIDE, but I can only get 64.0 mAP with the dataset and checkpoint you provided.
Both the data and the checkpoint I used were downloaded from the links you provided. The script was obtained by modifying the validate.py in this repo and can be found at https://github.com/SlongLiu/ASL_reproduce/blob/master/validate_nuswide.py. The log can be found at https://github.com/SlongLiu/ASL_reproduce/blob/master/nuswide_pt_test/log.txt. Could you give me some advice on these results?

Thanks very much again!

Applying ASL to fine-grained classification

Hello, thank you and your team very much for your contributions in this area. I plan to apply this loss function to fine-grained classification. How should I apply the loss to this task, and could you release an example train.py?

I have a question

Why is it xs_neg = (xs_neg + self.clip).clamp(max=1) instead of xs_neg = (xs_neg - self.clip).clamp(min=0)?

Network special initialization

Hi,

I have a question regarding last-layer initialization as done in the Focal Loss paper, where rare classes initially get some low prior probability (π) of 0.01.

Do you also use some special init strategy?

Thanks,
Ilya
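For context, the Focal Loss paper's trick this question refers to sets the last layer's bias so that every class starts at prior probability π; a hypothetical sketch (the feature dimension and class count are made up):

import math
import torch.nn as nn

pi = 0.01
classifier = nn.Linear(2048, 80)  # assumed: 2048-d features, 80 classes
# sigmoid(b) = pi  =>  b = -log((1 - pi) / pi)
nn.init.constant_(classifier.bias, -math.log((1 - pi) / pi))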

NaNs with fp16

Hi! I use the multi-label AsymmetricLoss with default args in a modified timm training script. When I turn on native AMP, I get partly NaNs and partly regular floats from ASL, while fp32 is fine.

From the log:

pid 4361 INFO: AsymmetricLoss x tensor([[ 73.8750, -10.7578,  64.6250, -20.7031,  81.4375,  42.9688],
        [ 62.3438,  -9.9453,  58.5000, -17.8594,  71.8750,  37.4375],
        [ 38.2500,  -4.2578,  36.7188, -12.7344,  43.1250,  25.2188],
        [ 54.4062,  -6.5781,  49.5000, -14.1250,  64.7500,  29.8281],
        [ 50.4688,  -8.9766,  44.2500, -14.5938,  56.2812,  31.2031],
        [ 59.1875,  -5.0039,  52.4375, -19.5781,  64.8750,  32.9688]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>) y tensor([[0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 1.]], device='cuda:0')
pid 4361 INFO: AsymmetricLoss loss nan <================================================== NAN
pid 4361 INFO: AsymmetricLoss x tensor([[ 43.3438,  -6.1172,  40.2500, -12.6875,  49.8750,  26.6406],
        [ 46.7188,  -6.6211,  40.7188, -12.7188,  54.7500,  27.3438],
        [ 47.8438,  -4.1523,  44.6250, -15.6172,  51.8750,  27.2344],
        [ 51.0312,  -7.9258,  49.4375, -13.4922,  56.9062,  29.8906],
        [ 50.7500,  -6.8281,  43.8125, -16.8125,  52.2500,  31.2969],
        [ 53.7500, -10.8438,  48.0312, -14.3750,  57.1875,  30.2500]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>) y tensor([[0., 0., 0., 0., 1., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 1.]], device='cuda:0')
pid 4361 INFO: AsymmetricLoss loss 54.76890563964844
...
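A plausible cause (an observation, not an answer confirmed by the authors): with logits around ±70 like those in the log, the sigmoid saturates to exactly 0 or 1 in fp16, so log(1 - p) becomes -inf and the loss turns NaN; note also that an eps as small as 1e-8 underflows to zero in half precision (fp16's smallest subnormal is ~6e-8). A hedged workaround sketch is to do the loss math in fp32 with a safe clamp:

import torch

# fp16 logits like the ones above saturate the sigmoid exactly to 0/1
logits = torch.tensor([73.875, -10.758], dtype=torch.float16)
targets = torch.tensor([1.0, 0.0])

eps = 1e-7
p = torch.sigmoid(logits.float()).clamp(eps, 1 - eps)  # compute in fp32, keep p away from 0/1
loss = -(targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).sum()
print(loss)  # finite instead of NaN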

Problem with some images (not all) while running pretrained open images model

RuntimeError: Given groups=1, weight of size [76, 48, 3, 3], expected input[1, 64, 112, 112] to have 48 channels, but got 64 channels instead

Actually, it gives this error on small PNG files; on the other hand, if we convert the same file to JPG (with the same size), it works.

Can you explain why this is so?

Thanks.

Normalization difference

Hi, I have read the appendices of your paper, in which you mentioned that "We found that the common ImageNet statistics normalization [16, 8, 28] does not improve results, and instead used a simpler normalization - scaling all the RGB channels to be between 0 and 1".

But in your validate.py, it is "normalize = transforms.Normalize(mean=[0, 0, 0], std=[1, 1, 1])"?

So if I want to train a new model using tresnet as a backbone, which of the above normalizations is better?
Thanks.

Evaluation code bug

Traceback (most recent call last):
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 85, in main
    train_multi_label_coco(model, train_loader, val_loader, args.lr)
  File "train.py", line 139, in train_multi_label_coco
    mAP_score = validate_multi(val_loader, model, ema)
  File "train.py", line 173, in validate_multi
    mAP_score_regular = mAP(targs, preds)
  File "/home/sliao1/working/_20multilabel/code.others/ASL/src/helper_functions/helper_functions.py", line 64, in mAP
    ap[k] = average_precision(scores, targets)
  File "/home/sliao1/working/_20multilabel/code.others/ASL/src/helper_functions/helper_functions.py", line 41, in average_precision
    pos_count_[np.logical_not(ind)] = 0
IndexError: too many indices for array

Can you confirm that the following is a bug and that the fix is valid?

ASL/train.py

Line 168 in eb52197

mAP_score_regular = mAP(torch.cat(targets).numpy(), torch.cat(preds_regular).numpy())

Should be:

mAP_score_regular = mAP(torch.cat(targets).numpy()[:,-1,:], torch.cat(preds_regular).numpy())

since:

targets is of shape (num_sample, 3, 80) and preds_regular is of shape (num_sample, 80).

The same issue is also on:

ASL/train.py

Line 169 in eb52197

mAP_score_ema = mAP(torch.cat(targets).numpy(), torch.cat(preds_ema).numpy())

I'm not so sure which targets to pass in:

mAP_score_ema = mAP(torch.cat(targets).numpy()[:,i,:], torch.cat(preds_ema).numpy()) # i=0,1,2 ?

By the way, what does "EMA" stand for?

Implementation of Asymmetric Clipping

Thanks for such an interesting paper 👍

In the paper's equation (4), asymmetric probability shifting is p_m = max(p-m, 0), but in the implementation, it's called asymmetric clipping and there is xs_neg = (xs_neg + self.clip).clamp(max=1) which is probably p_m = min(p+m, 1).

Is there a reason for this difference?

Is it possible to train ASL using multiple GPUs?

Thank you very much for your work.
I trained the ASL model using one GPU successfully, but when I trained it using multiple GPUs, an error occurred.
So, is it possible to train ASL using multiple GPUs?

Some confusion about classes in the pretrained model on OpenImages

Hi, thanks for your nice repo.
I am experimenting with your pretrained network on OpenImages for my thesis.
But I came across some mismatches between the names of the classes you trained your network on and the official ones from OpenImages V6.
As I understand it, you saved the class names along with the model. Then, in infer.py, we load them into the variable 'classes_list'.
When I looked into that variable, the first 50 labels have a very strange string format (e.g. """Pig's organ soup""") and, in addition, the last 10 classes also seem to be damaged (e.g. and melon family' or "pentathlon""" (this is the raw text as it appears in the list)). I attached a dump of the damaged labels as a zipped csv: classes_list.csv.zip (please look at it in a text editor, not in Excel).
I wonder what the implications of these damaged classes are. Were these really the ones used during training?

To my understanding, the correct ids to be trained on would be here: https://storage.googleapis.com/openimages/v6/oidv6-classes-trainable.txt and the corresponding class descriptions could be found here: https://storage.googleapis.com/openimages/v6/oidv6-class-descriptions.csv

Thanks in advance for your clarification and help.

In my dataset, the loss of ASL is very large, while other loss functions behave normally

Hello, thank you very much to you and your team for your contributions in this area. I intend to apply this loss function to my multi-label image classification model (labels only, no bounding-box labels):

loss_function = AsymmetricLoss()
logits = net(images.to(device))
loss = loss_function(logits, labels.to(device))

I haven't changed your ASL loss function at all. At first the loss was 156; finally it dropped to 4, with acc = 0. What's the matter? Why did the loss start above 100, remain around 4 after training, and give zero accuracy? When I use BCELoss, everything is perfectly normal.

train loss: 100%[->] 4.9414
[epoch 1] train_loss: 21.409 test_accuracy: 0.000
train loss: 100%[->] 5.7753
