
ldam-drw's Introduction

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma


This is the official PyTorch implementation of LDAM-DRW from the paper Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss.

Dependency

The code is built with the following libraries:

Dataset

  • Imbalanced CIFAR. The original data will be downloaded and converted by imbalance_cifar.py (a sketch of how the per-class counts are derived follows this list).
  • The paper also reports results on Tiny ImageNet and iNaturalist 2018. We will update the code for those datasets later.
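
For intuition, here is a minimal sketch of how the per-class sample counts can be derived for the exponential ('exp') imbalance profile used in the training commands below. This mirrors the get_img_num_per_cls logic referenced in the issues further down, but treat it as illustrative; imbalance_cifar.py is the authoritative version:

    def get_img_num_per_cls(img_max, cls_num, imb_factor):
        # img_max:    samples in the largest class (5000 for CIFAR-10)
        # cls_num:    number of classes
        # imb_factor: smallest-to-largest class ratio, e.g. 0.01 for 1:100
        return [int(img_max * imb_factor ** (i / (cls_num - 1.0)))
                for i in range(cls_num)]

    print(get_img_num_per_cls(5000, 10, 0.01))
    # -> [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]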

Training

We provide several training examples with this repo:

  • To train the ERM baseline on long-tailed imbalance with a ratio of 100:
python cifar_train.py --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type CE --train_rule None
  • To train with the LDAM loss and the DRW schedule on long-tailed imbalance with a ratio of 100:
python cifar_train.py --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type LDAM --train_rule DRW

Reference

If you find our paper and repo useful, please cite as

@inproceedings{cao2019learning,
  title={Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss},
  author={Cao, Kaidi and Wei, Colin and Gaidon, Adrien and Arechiga, Nikos and Ma, Tengyu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}


ldam-drw's Issues

Focal loss would lead to NaN?

Hi @kaidic

Thanks for your fantastic work. When I tried to reproduce the focal loss result with my own implementation, I found that with gamma=0.5 the loss becomes NaN during training, while the focal loss in this repo trains fine.

I compared the two focal loss implementations carefully: their forward passes produce the same values, but the model parameters diverge after the backward pass. I am quite confused; could you please give me some advice?

Thanks again for your contribution!

AttributeError: 'IMBALANCECIFAR10' object has no attribute 'data'

Hi, I get this AttributeError when running cifar_train.py. Could you please tell me how to fix it?

    Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./datasets/imbalance_cifar10/cifar-10-python.tar.gz
    212664376it [00:19, 38731722.29it/s]
    Traceback (most recent call last):
      File "/xinfu/code/long_tail/BBN/main/train.py", line 69, in <module>
        train_set = eval(cfg.DATASET.DATASET)("train", cfg)
      File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 25, in __init__
        img_num_list = self.get_img_num_per_cls(self.cls_num, imb_type, imb_factor)
      File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 44, in get_img_num_per_cls
        img_max = len(self.data) / cls_num
    AttributeError: 'IMBALANCECIFAR10' object has no attribute 'data'

ERM Baseline

Thanks for sharing your code.

I have run the ERM baseline but get 73.17% accuracy, which differs from the 70.36% reported in the paper. Is there some problem with my experiment?


Cannot achieve similar results for Tiny ImageNet

Thanks for your paper and your code; they are great work and helped me a lot.
I ran experiments on the Tiny ImageNet dataset following the settings described in your paper; however, I can't achieve similar results. For long-tailed 1:100 Tiny ImageNet, the top-1 validation errors I got are:
ERM SGD: 80.05
LDAM SGD: 72.8
There is a big gap with the results shown in your paper, so I wonder if there is any setting or trick I have missed?
In the paper you mention: "We perform 1-crop test with the validation images." I wonder how this is done specifically.
For ResNet-18, I use:
    backbone = models.resnet18(pretrained=True)
    backbone.avgpool = nn.AdaptiveAvgPool2d(1)
    num_ftrs = backbone.fc.in_features
    if USE_NORM:
        backbone.fc = NormedLinear(num_ftrs, 200)
    else:
        backbone.fc = nn.Linear(num_ftrs, 200)
Is it correct?
Looking forward to your reply, thank you very much!

Any Experiments on Face Recognition?

Thanks for your great work.
I've got a question here: since LDAM extends the margin softmax losses commonly used in face recognition, have you ever tried experiments on face-recognition datasets?

Questions about the hyper-parameters for LDAM loss

It was a very interesting paper to read :)

I have some questions regarding the hyper-parameters for LDAM loss.

  1. What is the value of C, the hyper-parameter to be tuned (according to the paper)? Is it (max_m / np.max(m_list)), introduced below? (See the sketch after this issue.)
    https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28

  2. Is s=30 in the LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?

  3. What was the tendency of these hyper-parameters during training? How are these hyper-parameter choices related to the imbalance level (or to different datasets)? Do the found parameters work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?

Thanks.
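
For context, here is a sketch of the margin construction around the referenced line (illustrative, with example inputs; losses.py is the authoritative version). Under this reading, the paper's constant C corresponds to the rescaling factor max_m / np.max(m_list):

    import numpy as np

    cls_num_list = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]  # example counts
    max_m = 0.5  # repo default

    # Delta_j proportional to n_j^(-1/4), rescaled so the largest margin equals max_m
    m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list))
    m_list = m_list * (max_m / np.max(m_list))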

Is there any pretrained model?

Hi,

Thank you for open-sourcing the code! The CIFAR model is currently initialized with what appear to be random parameters. I wonder if these models can be initialized from a pretrained model?
Thank you!

About the LDAM Loss

Thanks a lot for your code!
I have read your paper and code; it's really a good idea, but I have a question about the LDAM loss. It concerns the last line, where we call the basic cross_entropy function in PyTorch.

    def forward(self, x, target):
        index = torch.zeros_like(x, dtype=torch.uint8)
        index.scatter_(1, target.data.view(-1, 1), 1)

        index_float = index.type(torch.cuda.FloatTensor)
        # self.m_list[None, :] adds a batch dimension to the original m_list
        batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0, 1))
        # equivalently transpose
        batch_m = batch_m.view((-1, 1))
        x_m = x - batch_m
        # only the target label position uses x_m
        output = torch.where(index, x_m, x)
        return F.cross_entropy(self.s * output, target, weight=self.weight)

Why is the output multiplied by s (here, 30)? Is it just to make the loss larger? We don't do this for the focal loss.
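
One possible reading (not an official answer): when the USE_NORM/NormedLinear path mentioned in the Tiny ImageNet issue above is enabled, the classifier head outputs cosine similarities bounded in [-1, 1], so without a scale factor the softmax can never approach a one-hot distribution and the cross-entropy stays large even for correct predictions; s = 30 is the usual scale factor from margin-based (cosine) softmax losses. The focal loss here is paired with an ordinary linear head whose logits are unbounded, so it needs no such scaling. A hypothetical sketch of such a normalized head, for illustration only:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NormedLinear(nn.Module):
        # outputs cos(theta) between the feature and each class weight vector
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(in_features, out_features))

        def forward(self, x):
            # normalize both the features (rows) and the class weights (columns)
            return F.normalize(x, dim=1).mm(F.normalize(self.weight, dim=0))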

More details about your paper

Thanks a lot for your code!
I have read your paper and code; it's really a good idea, but I have a question about Formula (8).

[screenshot of Eq. (8) from the paper]

Why does γ_1 equal C / n_1^0.25 here?

Anyway, thanks a lot!
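
Paraphrasing the trade-off argument behind Eq. (8) (binary case, sketched from the paper's Section 3.1; see the paper for the rigorous version): minimize the margin-based generalization bound subject to a fixed margin budget,

\[
\min_{\gamma_1, \gamma_2 > 0} \; \frac{1}{\gamma_1 \sqrt{n_1}} + \frac{1}{\gamma_2 \sqrt{n_2}}
\quad \text{s.t.} \quad \gamma_1 + \gamma_2 = \beta .
\]

Setting the derivative to zero gives \(\gamma_1^2 \sqrt{n_1} = \gamma_2^2 \sqrt{n_2}\), i.e. \(\gamma_1 / \gamma_2 = (n_2 / n_1)^{1/4}\), hence

\[
\gamma_1 = \frac{C}{n_1^{1/4}}, \qquad \gamma_2 = \frac{C}{n_2^{1/4}},
\qquad C = \frac{\beta}{n_1^{-1/4} + n_2^{-1/4}} .
\]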

Wrong implementation of focal loss

Hi,

I believe you have a wrong implementation of the focal loss; I hope I have not misunderstood the code. Although the wrong implementation does not affect the method you proposed, I hope the authors will spend some time correcting it.

You should compute -(1-p)^gamma * log(p) for every sample in the batch.
However, after you use F.cross_entropy at line 21 of losses.py, the output is already a single value.
You then use this value as p to compute the focal loss, which is completely wrong.

An obvious indication of the wrong implementation is that you can remove the .mean() at line 11 of losses.py without causing any errors. It shows that you are indeed dealing with a single value, not a vector.

This might explain why your implementation is so different from https://github.com/Hsuxu/Loss_ToolBox-PyTorch/blob/master/FocalLoss/FocalLoss.py
or https://github.com/clcarwin/focal_loss_pytorch/blob/master/focalloss.py

You can also check the previous work you cited, https://github.com/vandit15/Class-balanced-loss-pytorch/blob/master/class_balanced_loss.py, where the key point is that they make sure reduction='none' when using F.binary_cross_entropy_with_logits.
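
For reference, a per-sample focal loss along the lines this issue describes (a generic sketch, not the repo's implementation; the key is reduction='none' so the probability is computed per example):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, target, gamma=0.5):
        # per-sample cross-entropy: one value per example, not a scalar
        ce = F.cross_entropy(logits, target, reduction='none')
        p = torch.exp(-ce)  # model probability of the true class, per sample
        return ((1.0 - p) ** gamma * ce).mean()  # mean of -(1-p)^gamma * log(p)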

What if both the training set and the test set are equally imbalanced?

Hi Mr. kaidic,

Firstly, thanks for sharing your code and paper.

I read your paper and used your code, and I was impressed.

As I read the paper, it comments on two cases:

    1. The training set is imbalanced, the test set is not, and their distributions are different.
    2. Both the training set and the test set are imbalanced, and their distributions are different.

But as you know, real data is unfortunately even more imbalanced and challenging.

  • My data distribution is imbalanced not only in the training set but in the test set too;
    I mean both sets are imbalanced and share the same distribution.
  • Some classes in my dataset even have only 1 instance per class (extremely few).

So here is my question: do you think the LDAM-DRW loss would also work on this dataset?
I am running experiments varying the betas, the delta_j (m_list), and so on.

I would very much appreciate an answer!
Thanks so much, kaidic :)

Points for sampler

Thanks for sharing your great work.

I have a question about the point for sampler in your code.

train_sampler is first declared at L167 in cifar_train.py; then train_loader gets the sampler in L169-171.

This seems fine in itself. But in the middle of training (train + validation) there is a part that appears to configure the sampler, in L186-L208. I understand this part is needed for LDAM and DRW, but I think this new train_sampler object does not affect train_loader (see the note after the code excerpt below).

What's your opinion?
Thanks!

LDAM-DRW/cifar_train.py

Lines 186 to 208 in 3193f05

    if args.train_rule == 'None':
        train_sampler = None
        per_cls_weights = None
    elif args.train_rule == 'Resample':
        train_sampler = ImbalancedDatasetSampler(train_dataset)
        per_cls_weights = None
    elif args.train_rule == 'Reweight':
        train_sampler = None
        beta = 0.9999
        effective_num = 1.0 - np.power(beta, cls_num_list)
        per_cls_weights = (1.0 - beta) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)
    elif args.train_rule == 'DRW':
        train_sampler = None
        idx = epoch // 160
        betas = [0, 0.9999]
        effective_num = 1.0 - np.power(betas[idx], cls_num_list)
        per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)
    else:
        warnings.warn('Sample rule is not listed')
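
A note on the observation: a PyTorch DataLoader binds its sampler at construction time, so reassigning the train_sampler variable afterwards indeed has no effect on an existing train_loader. A hypothetical fix would rebuild the loader whenever the sampler changes, mirroring the construction at L169-171:

    # hypothetical sketch: rebuild the loader so a new sampler actually takes effect
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size,
        shuffle=(train_sampler is None),
        num_workers=args.workers, pin_memory=True,
        sampler=train_sampler)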

DRW actually uses Class-Balanced weights instead of the inverse of frequency

Hello, thanks for the paper and the code. I just want to confirm something about this code snippet:

    elif args.train_rule == 'DRW':
        train_sampler = None
        idx = epoch // 160
        betas = [0, 0.9999]
        effective_num = 1.0 - np.power(betas[idx], cls_num_list)  # when epoch < 160, betas[0]=0 so all weights are 1 (no reweighting); when epoch >= 160, reweighting with beta=0.9999
        per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)

This is the Class-Balanced implementation (which differs slightly from the inverse of frequency reported in the paper). Is there any reason to select beta = 0.9999?
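
One observation on beta = 0.9999: the effective-number weight (1 - beta) / (1 - beta^n) is close to the inverse frequency 1/n whenever n is much smaller than 1/(1 - beta) = 10000, since 1 - beta^n ≈ n(1 - beta) for small n. A quick check with illustrative counts:

    import numpy as np

    cls_num_list = np.array([5000, 500, 50])  # illustrative long-tailed counts
    beta = 0.9999
    w_cb = (1.0 - beta) / (1.0 - np.power(beta, cls_num_list))
    w_inv = 1.0 / cls_num_list
    print(w_cb / w_cb.sum())    # ~ [0.0114, 0.0917, 0.8969]
    print(w_inv / w_inv.sum())  # ~ [0.0090, 0.0901, 0.9009]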

Using a general ResNet causes the loss to become NaN

Thank you for your great work!
I find that the backbones in your code aren't the general ResNets; they are very different from the standard ones.

I tried to replace the resnet32 mentioned in the paper with resnet34, but the loss cannot converge and finally turns to NaN.

This is the bash command I tried (resnet32 has been replaced with the torchvision resnet34):

python cifar_train.py --arch resnet32 --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type LDAM --train_rule DRW

Could you please provide further explanation?

CE+DRW and CE+CB

Thanks for your paper and your code; they are great work and helped me a lot.
Your article says that DRW reweights based on the number of samples, but your code reweights with the Class-Balanced (CB) weights. I want to know whether the DRW reported in your article is CE + CB or CE + 1/n?
