
ldam-drw's Introduction

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss

Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma


This is the official PyTorch implementation of LDAM-DRW from the paper Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss.

Dependency

The code is built with the following libraries:

Dataset

  • Imbalanced CIFAR. The original data will be downloaded and converted by imbalance_cifar.py (a sketch of how the per-class counts are derived follows this list).
  • The paper also reports results on Tiny ImageNet and iNaturalist 2018. We will update the code for those datasets later.
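
For intuition, here is a minimal sketch of how the per-class sample counts can be derived for the exponential ('exp') imbalance profile used in the training commands below. This mirrors the get_img_num_per_cls logic referenced in the issues further down, but treat it as illustrative; imbalance_cifar.py is the authoritative version:

    def get_img_num_per_cls(img_max, cls_num, imb_factor):
        # img_max:    samples in the largest class (5000 for CIFAR-10)
        # cls_num:    number of classes
        # imb_factor: smallest-to-largest class ratio, e.g. 0.01 for 1:100
        return [int(img_max * imb_factor ** (i / (cls_num - 1.0)))
                for i in range(cls_num)]

    print(get_img_num_per_cls(5000, 10, 0.01))
    # -> [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]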

Training

We provide several training examples with this repo:

  • To train the ERM baseline on long-tailed imbalance with a ratio of 100:
python cifar_train.py --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type CE --train_rule None
  • To train with the LDAM loss and the DRW schedule on long-tailed imbalance with a ratio of 100:
python cifar_train.py --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type LDAM --train_rule DRW

Reference

If you find our paper and repo useful, please cite as

@inproceedings{cao2019learning,
  title={Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss},
  author={Cao, Kaidi and Wei, Colin and Gaidon, Adrien and Arechiga, Nikos and Ma, Tengyu},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}


ldam-drw's Issues

Focal loss would lead to NaN?

Hi @kaidic

Thanks for your fantastic work. When I tried to reproduce the focal loss result with my own implementation, I found that with gamma=0.5 the loss becomes NaN during training, while the focal loss in this repo trains fine.

I compared the two focal loss implementations carefully: their forward passes produce the same values, but the model parameters diverge after the backward pass. I am quite confused; could you please give me some advice?

Thanks again for your contribution!

AttributeError: 'IMBALANCECIFAR10' object has no attribute 'data'

Hi, I get this AttributeError when running cifar_train.py. Could you please tell me how to fix it?

    Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./datasets/imbalance_cifar10/cifar-10-python.tar.gz
    212664376it [00:19, 38731722.29it/s]
    Traceback (most recent call last):
      File "/xinfu/code/long_tail/BBN/main/train.py", line 69, in <module>
        train_set = eval(cfg.DATASET.DATASET)("train", cfg)
      File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 25, in __init__
        img_num_list = self.get_img_num_per_cls(self.cls_num, imb_type, imb_factor)
      File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 44, in get_img_num_per_cls
        img_max = len(self.data) / cls_num
    AttributeError: 'IMBALANCECIFAR10' object has no attribute 'data'

ERM Baseline

Thanks for sharing your code.

I have run the ERM baseline but get 73.17% accuracy, which differs from the 70.36% reported in the paper. Is there some problem with my experiment?


Cannot achieve similar results for Tiny ImageNet

Thanks for your paper and your code; they are great work and helped me a lot.
I ran experiments on the Tiny ImageNet dataset following the settings described in your paper; however, I can't achieve similar results. For long-tailed 1:100 Tiny ImageNet, the top-1 validation errors I got are:
ERM SGD: 80.05
LDAM SGD: 72.8
There is a big gap with the results shown in your paper, so I wonder if there is any setting or trick I have missed?
In the paper you mention: "We perform 1-crop test with the validation images." I wonder how this is done specifically.
For ResNet-18, I use:
    backbone = models.resnet18(pretrained=True)
    backbone.avgpool = nn.AdaptiveAvgPool2d(1)
    num_ftrs = backbone.fc.in_features
    if USE_NORM:
        backbone.fc = NormedLinear(num_ftrs, 200)
    else:
        backbone.fc = nn.Linear(num_ftrs, 200)
Is it correct?
Looking forward to your reply, thank you very much!

Any Experiments on Face Recognition?

Thanks for your great work.
I've got a question here: since LDAM extends the margin softmax losses commonly used in face recognition, have you ever tried experiments on face-recognition datasets?

Questions about the hyper-parameters for LDAM loss

It was a very interesting paper to read :)

I have some questions regarding the hyper-parameters for LDAM loss.

  1. What is the value of C, the hyper-parameter to be tuned (according to the paper)? Is it (max_m / np.max(m_list)), introduced below? (See the sketch after this issue.)
    https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28

  2. Is s=30 in the LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?

  3. What was the tendency of these hyper-parameters during training? How are these hyper-parameter choices related to the imbalance level (or to different datasets)? Do the found parameters work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?

Thanks.
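
For context, here is a sketch of the margin construction around the referenced line (illustrative, with example inputs; losses.py is the authoritative version). Under this reading, the paper's constant C corresponds to the rescaling factor max_m / np.max(m_list):

    import numpy as np

    cls_num_list = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]  # example counts
    max_m = 0.5  # repo default

    # Delta_j proportional to n_j^(-1/4), rescaled so the largest margin equals max_m
    m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list))
    m_list = m_list * (max_m / np.max(m_list))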

Is there any pretrained model?

Hi,

Thank you for open-sourcing the code! The CIFAR model is currently initialized with what appear to be random parameters. I wonder if these models can be initialized from a pretrained model?
Thank you!

About the LDAM Loss

Thanks a lot for your code!
I have read your paper and code; it's really a good idea, but I have a question about the LDAM loss. It concerns the last line, where we call the basic cross_entropy function in PyTorch.

    def forward(self, x, target):
        index = torch.zeros_like(x, dtype=torch.uint8)
        index.scatter_(1, target.data.view(-1, 1), 1)

        index_float = index.type(torch.cuda.FloatTensor)
        # self.m_list[None, :] adds a batch dimension to the original m_list
        batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0, 1))
        # equivalently transpose
        batch_m = batch_m.view((-1, 1))
        x_m = x - batch_m
        # only the target label position uses x_m
        output = torch.where(index, x_m, x)
        return F.cross_entropy(self.s * output, target, weight=self.weight)

Why is the output multiplied by s (here, 30)? Is it just to make the loss larger? We don't do this for the focal loss.
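
One possible reading (not an official answer): when the USE_NORM/NormedLinear path mentioned in the Tiny ImageNet issue above is enabled, the classifier head outputs cosine similarities bounded in [-1, 1], so without a scale factor the softmax can never approach a one-hot distribution and the cross-entropy stays large even for correct predictions; s = 30 is the usual scale factor from margin-based (cosine) softmax losses. The focal loss here is paired with an ordinary linear head whose logits are unbounded, so it needs no such scaling. A hypothetical sketch of such a normalized head, for illustration only:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NormedLinear(nn.Module):
        # outputs cos(theta) between the feature and each class weight vector
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(in_features, out_features))

        def forward(self, x):
            # normalize both the features (rows) and the class weights (columns)
            return F.normalize(x, dim=1).mm(F.normalize(self.weight, dim=0))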

More details about your paper

Thanks a lot for your code!
I have read your paper and code; it's really a good idea, but I have a question about Formula (8).

[screenshot of Eq. (8) from the paper]

Why does γ_1 equal C / n_1^0.25 here?

Anyway, thanks a lot!
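
Paraphrasing the trade-off argument behind Eq. (8) (binary case, sketched from the paper's Section 3.1; see the paper for the rigorous version): minimize the margin-based generalization bound subject to a fixed margin budget,

\[
\min_{\gamma_1, \gamma_2 > 0} \; \frac{1}{\gamma_1 \sqrt{n_1}} + \frac{1}{\gamma_2 \sqrt{n_2}}
\quad \text{s.t.} \quad \gamma_1 + \gamma_2 = \beta .
\]

Setting the derivative to zero gives \(\gamma_1^2 \sqrt{n_1} = \gamma_2^2 \sqrt{n_2}\), i.e. \(\gamma_1 / \gamma_2 = (n_2 / n_1)^{1/4}\), hence

\[
\gamma_1 = \frac{C}{n_1^{1/4}}, \qquad \gamma_2 = \frac{C}{n_2^{1/4}},
\qquad C = \frac{\beta}{n_1^{-1/4} + n_2^{-1/4}} .
\]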

Wrong implementation of focal loss

Hi,

I believe you have a wrong implementation of the focal loss; I hope I have not misunderstood the code. Although the wrong implementation does not affect the method you proposed, I hope the authors will spend some time correcting it.

You should compute -(1-p)^gamma * log(p) for every sample in the batch.
However, after you use F.cross_entropy at line 21 of losses.py, the output is already a single value.
You then use this value as p to compute the focal loss, which is completely wrong.

An obvious indication of the wrong implementation is that you can remove the .mean() at line 11 of losses.py without causing any errors. It shows that you are indeed dealing with a single value, not a vector.

This might explain why your implementation is so different from https://github.com/Hsuxu/Loss_ToolBox-PyTorch/blob/master/FocalLoss/FocalLoss.py
or https://github.com/clcarwin/focal_loss_pytorch/blob/master/focalloss.py

You can also check the previous work you cited, https://github.com/vandit15/Class-balanced-loss-pytorch/blob/master/class_balanced_loss.py, where the key point is that they make sure reduction='none' when using F.binary_cross_entropy_with_logits.
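
For reference, a per-sample focal loss along the lines this issue describes (a generic sketch, not the repo's implementation; the key is reduction='none' so the probability is computed per example):

    import torch
    import torch.nn.functional as F

    def focal_loss(logits, target, gamma=0.5):
        # per-sample cross-entropy: one value per example, not a scalar
        ce = F.cross_entropy(logits, target, reduction='none')
        p = torch.exp(-ce)  # model probability of the true class, per sample
        return ((1.0 - p) ** gamma * ce).mean()  # mean of -(1-p)^gamma * log(p)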

What if both the training set and the test set are equally imbalanced?

Hi Mr. kaidic,

Firstly, thanks for sharing your code and paper.

I read your paper and used your code, and I was impressed.

As I read the paper, it comments on two cases:

    1. The training set is imbalanced, the test set is not, and their distributions are different.
    2. Both the training set and the test set are imbalanced, and their distributions are different.

But as you know, real data is unfortunately even more imbalanced and challenging.

  • My data distribution is imbalanced not only in the training set but in the test set too;
    I mean both sets are imbalanced and share the same distribution.
  • Some classes in my dataset even have only 1 instance per class (extremely few).

So here is my question: do you think the LDAM-DRW loss would also work on this dataset?
I am running experiments varying the betas, the delta_j (m_list), and so on.

I would very much appreciate an answer!
Thanks so much, kaidic :)

Points for sampler

Thanks for sharing your great work.

I have a question about the point for sampler in your code.

train_sampler is first declared at L167 in cifar_train.py; then train_loader gets the sampler in L169-171.

This seems fine in itself. But in the middle of training (train + validation) there is a part that appears to configure the sampler, in L186-L208. I understand this part is needed for LDAM and DRW, but I think this new train_sampler object does not affect train_loader (see the note after the code excerpt below).

What's your opinion?
Thanks!

LDAM-DRW/cifar_train.py

Lines 186 to 208 in 3193f05

    if args.train_rule == 'None':
        train_sampler = None
        per_cls_weights = None
    elif args.train_rule == 'Resample':
        train_sampler = ImbalancedDatasetSampler(train_dataset)
        per_cls_weights = None
    elif args.train_rule == 'Reweight':
        train_sampler = None
        beta = 0.9999
        effective_num = 1.0 - np.power(beta, cls_num_list)
        per_cls_weights = (1.0 - beta) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)
    elif args.train_rule == 'DRW':
        train_sampler = None
        idx = epoch // 160
        betas = [0, 0.9999]
        effective_num = 1.0 - np.power(betas[idx], cls_num_list)
        per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)
    else:
        warnings.warn('Sample rule is not listed')
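
A note on the observation: a PyTorch DataLoader binds its sampler at construction time, so reassigning the train_sampler variable afterwards indeed has no effect on an existing train_loader. A hypothetical fix would rebuild the loader whenever the sampler changes, mirroring the construction at L169-171:

    # hypothetical sketch: rebuild the loader so a new sampler actually takes effect
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size,
        shuffle=(train_sampler is None),
        num_workers=args.workers, pin_memory=True,
        sampler=train_sampler)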

DRW actually uses Class-Balanced weights instead of the inverse of frequency

Hello, thanks for the paper and the code. I just want to confirm something about this code snippet:

    elif args.train_rule == 'DRW':
        train_sampler = None
        idx = epoch // 160
        betas = [0, 0.9999]
        effective_num = 1.0 - np.power(betas[idx], cls_num_list)  # when epoch < 160, betas[0]=0 so all weights are 1 (no reweighting); when epoch >= 160, reweighting with beta=0.9999
        per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num)
        per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
        per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)

This is the Class-Balanced implementation (which differs slightly from the inverse of frequency reported in the paper). Is there any reason to select beta = 0.9999?
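
One observation on beta = 0.9999: the effective-number weight (1 - beta) / (1 - beta^n) is close to the inverse frequency 1/n whenever n is much smaller than 1/(1 - beta) = 10000, since 1 - beta^n ≈ n(1 - beta) for small n. A quick check with illustrative counts:

    import numpy as np

    cls_num_list = np.array([5000, 500, 50])  # illustrative long-tailed counts
    beta = 0.9999
    w_cb = (1.0 - beta) / (1.0 - np.power(beta, cls_num_list))
    w_inv = 1.0 / cls_num_list
    print(w_cb / w_cb.sum())    # ~ [0.0114, 0.0917, 0.8969]
    print(w_inv / w_inv.sum())  # ~ [0.0090, 0.0901, 0.9009]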

Using a general ResNet causes the loss to become NaN

Thank you for your great work!
I find that the backbones in your code aren't the general ResNets; they are very different from the standard ones.

I tried to replace the resnet32 mentioned in the paper with resnet34, but the loss cannot converge and finally turns to NaN.

This is the bash command I tried (resnet32 has been replaced with the torchvision resnet34):

python cifar_train.py --arch resnet32 --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type LDAM --train_rule DRW

Could you please provide further explanation?

CE+DRW and CE+CB

Thanks for your paper and your code; they are great work and helped me a lot.
Your article says that DRW reweights based on the number of samples, but your code reweights with the Class-Balanced (CB) weights. I want to know whether the DRW reported in your article is CE + CB or CE + 1/n?
