richardaecn / class-balanced-loss
Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019
License: MIT License
I use ./cifar_trainval.sh to train, but it fails with the errors below. Why?
Traceback (most recent call last):
File "/environment/python/versions/miniconda3-4.7.12/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/environment/python/versions/miniconda3-4.7.12/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/environment/python/versions/miniconda3-4.7.12/envs/tf1.15/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(128, 64), b.shape=(64, 10), m=128, n=10, k=64
[[{{node resnet/tower_0/fully_connected/dense/MatMul}}]]
[[resnet/tower_0/softmax_cross_entropy_loss/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_1549]]
(1) Internal: Blas GEMM launch failed : a.shape=(128, 64), b.shape=(64, 10), m=128, n=10, k=64
[[{{node resnet/tower_0/fully_connected/dense/MatMul}}]]
0 successful operations.
0 derived errors ignored.
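In TF 1.x, "Blas GEMM launch failed" usually points to a GPU-side problem rather than to the training script itself: most often the GPU memory is already fully reserved (by another process, or by TensorFlow's default pre-allocation), or the CUDA/cuBLAS libraries do not match the TensorFlow build. A minimal sketch of one common mitigation, assuming the script constructs its own session, is to enable on-demand GPU memory growth:

import tensorflow as tf  # TF 1.x API, matching the traceback above

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # reserve GPU memory on demand instead of all at once
sess = tf.Session(config=config)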
Hi!
First of all, thank you very much for your code; it's great work!
I would like to know where the class-balanced loss is implemented among all the files you include.
I've been looking for it and I'm a little lost.
Thank you very much in advance!
In each batch, I first computed an independent E_{n} for that batch, but it didn't work at all. According to your code, however, E_{n} is global: there is a single set of values for the entire mini-batch GD optimization process, and that evidently works. I would like to know the reason.
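For reference, a minimal sketch (assuming per-class counts for the whole training set are available up front, as the released code assumes) of computing the class-balanced weights once, globally, rather than per batch:

import numpy as np

# hypothetical global per-class sample counts for the whole training set
samples_per_cls = np.array([5000, 2000, 500, 100])
beta = 0.9999
num_classes = len(samples_per_cls)

effective_num = (1.0 - np.power(beta, samples_per_cls)) / (1.0 - beta)  # E_n per class
weights = 1.0 / effective_num
weights = weights / weights.sum() * num_classes  # normalize so the weights sum to C

# `weights` is computed once and reused for every mini-batch. Per-batch
# counts are small and noisy (a class may even be absent from a batch),
# which makes a per-batch E_n unstable.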
My implementation of CB-focal in PyTorch can only reach 77% accuracy on long-tailed CIFAR-10, so I ran your source code without any change. The baseline came out at 77.5% (my PyTorch baseline is actually 75%), which is almost 2.7% higher than the result reported in your paper.
Does this mean CB-focal is only about 2% higher than the baseline?
@richardaecn @KMnP thanks for open-sourcing the code. Is it possible to use CB loss in an object detection or segmentation architecture? Did you experiment with any standard architecture such as YOLO, RetinaNet, or DeepLab? I am planning to use it in our custom object detection architecture.
When n_y is large, it seems that the loss weights α are always equal to 1. If so, CB loss makes no sense. @richardaecn
For example, suppose 10,000 images belong to class A and only 1,000 images belong to class B. Then the CB weights are [1, 1], no matter what β is.
Please correct me if I have any misunderstanding. Thanks.
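A quick numeric check of that example (my own computation, not from the thread), using w_y proportional to (1 - beta) / (1 - beta^{n_y}) and normalizing the weights to sum to the number of classes:

import numpy as np

n = np.array([10000, 1000])  # class A vs. class B from the example above
for beta in [0.9, 0.99, 0.999, 0.9999]:
    w = (1.0 - beta) / (1.0 - np.power(beta, n))
    w = w / w.sum() * len(n)
    print(beta, w)
# 0.9    -> [1.00, 1.00]   (beta^n underflows to ~0 for both classes)
# 0.99   -> [1.00, 1.00]
# 0.999  -> [0.77, 1.23]
# 0.9999 -> [0.26, 1.74]

So the weights collapse to [1, 1] only when beta is small relative to the class sizes; as beta approaches 1, the reweighting does distinguish the two classes.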
I tried to compute E_n as a weight within each batch, but the loss quickly becomes NaN. I also tried computing it over the whole dataset, with img_per_class = [900000, 700000, 60000]; because the image counts are very large, beta^n is almost 0, and I got weights like [1e-4, 1e-4, 1e-4]. I think these weights cannot handle the imbalanced dataset. @richardaecn
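For those counts (again my own numeric check, not from the thread): with beta = 0.9999, beta^n underflows to ~0 for every class, so the raw weights (1 - beta)/(1 - beta^n) are all about 1e-4, matching the observation above. The released code rescales the weights to sum to the number of classes, which fixes the tiny absolute scale, and a beta much closer to 1 is needed before classes of this size separate:

import numpy as np

img_per_class = np.array([900000, 700000, 60000])
for beta in [0.9999, 0.999999]:
    w = (1.0 - beta) / (1.0 - np.power(beta, img_per_class))
    w = w / w.sum() * len(img_per_class)  # normalize: mean weight becomes 1
    print(beta, w)
# 0.9999   -> [1.00, 1.00, 1.00]  (all beta^n ~ 0, no separation)
# 0.999999 -> [0.24, 0.29, 2.47]  (the rare class is upweighted)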
Hello! I've found a performance issue in tpu/models/: batch() should be called before map(), which could make your program more efficient. Here is the TensorFlow documentation to support it.
A detailed description is listed below:
dataset.batch(batch_size, drop_remainder=True) (here) should be called before dataset.map(_dataset_parser, num_parallel_calls=64) (here).
.batch(batch_size) (here) should be called before .map(parser) (here).
Besides, you need to check whether the function called in map() (e.g., parser in .map(parser)) is affected, so that the changed code still works properly. For example, if parser needed data with shape (x, y, z) as its input before the fix, it will require data with shape (batch_size, x, y, z) afterwards; see the sketch below.
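A minimal sketch of the suggested reordering, assuming the parser can be vectorized to operate on a whole batch at once:

import tensorflow as tf

def build_pipeline(dataset, batch_size, parser):
    # batch first, then map: the (vectorized) parser now receives tensors
    # with a leading batch_size dimension and runs once per batch instead
    # of once per example
    dataset = dataset.batch(batch_size, drop_remainder=True)
    return dataset.map(parser, num_parallel_calls=64)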
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
Dear authors, thanks for your great effort in making your code open-source.
I re-implemented your CB-focal loss in PyTorch (both your TF version and my own version), but I can't reach the performance reported in your paper.
This is my code. Could you please check whether there is something wrong with it?
output: [batch_size, num_classes]
label: [batch_size]
catList: [num_classes] a list of sample counts for each class
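For comparison, a minimal PyTorch sketch of a class-balanced focal loss with exactly these interfaces. This is a softmax variant of my own; the released TF code uses a sigmoid formulation, so it is a reference point, not the paper's implementation:

import torch
import torch.nn.functional as F

def cb_focal_loss(output, label, cat_list, beta=0.9999, gamma=2.0):
    # output:   [batch_size, num_classes] logits
    # label:    [batch_size] integer class indices
    # cat_list: [num_classes] per-class sample counts
    num_classes = output.size(1)
    n = torch.as_tensor(cat_list, dtype=torch.float32, device=output.device)
    weights = (1.0 - beta) / (1.0 - beta ** n)       # proportional to 1 / E_n
    weights = weights / weights.sum() * num_classes  # normalize to sum to C

    log_p = F.log_softmax(output, dim=1)
    p = log_p.exp()
    one_hot = F.one_hot(label, num_classes).float()
    focal = ((1.0 - p) ** gamma) * log_p             # (1 - p_t)^gamma * log(p_t)
    per_sample = -(one_hot * focal).sum(dim=1)
    return (weights[label] * per_sample).mean()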
The weight-normalization step is not in the paper. Why does the code need to normalize the weights, and why multiply by the number of classes? Thanks.
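If I read the released code correctly, the weights are rescaled so that they sum to the number of classes, i.e. their mean is 1. That keeps the overall magnitude of the weighted loss comparable to the unweighted loss, so hyperparameters such as the learning rate do not have to be retuned. A tiny check:

import numpy as np

w = np.array([0.2, 0.5, 2.0, 4.0])  # any positive per-class weights
w = w / w.sum() * len(w)            # the normalization in question
print(w.mean())                     # -> 1.0: the average weight is exactly 1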
Respected Authors,
Firstly, thank you for releasing the code. Is code available for running inference with the provided pre-trained models? It would be really helpful if you could include it in the current repository.
Thanks in advance
I want to know: what is the difference between N and E_n in the paper?
Does N mean the number of all samples?
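As I read Section 3 of the paper, N is not the dataset size: it is the volume of the set of all possible prototypes (near-duplicates) of a class, and the effective number approaches it asymptotically:

E_n = \frac{1 - \beta^n}{1 - \beta}, \qquad \beta = \frac{N - 1}{N}, \qquad \lim_{n \to \infty} E_n = \frac{1}{1 - \beta} = N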
Hi Yin
Thanks for sharing the code! I wanted to run your code on my own or some new dataset and get the corresponding evaluation metrics. Can you help me by giving some insights on how to use your code on some new datasets from scratch?
Thanks
Hi, I am using your implementation of focal loss, and sometimes the calculated value is NaN. I realized that it is due to the normalization by the number of positive samples in the batch: I am working with 3D data, so I can't use large batches, and my dataset is very imbalanced as well. As a result, some of my batches contain only negative samples, and the normalization ends up dividing by zero. How would you recommend performing the normalization in this case?
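One common workaround (my suggestion, not from this repo) is to clamp the normalizer away from zero, so that an all-negative batch still yields a finite loss:

import tensorflow as tf

def normalized_loss(per_example_loss, positives_mask):
    # per_example_loss: [batch] focal-loss values; positives_mask: [batch] in {0, 1}
    num_positives = tf.reduce_sum(positives_mask)
    normalizer = tf.maximum(num_positives, 1.0)  # never divide by zero
    return tf.reduce_sum(per_example_loss) / normalizer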
Hi, may I ask a question about the description in the paper?
As you mention in the last paragraph of Section 3.1 of the paper:
"the stronger the data augmentation is, the smaller the N will be. The small neighboring region of a sample is a way to capture all near-duplicates and instances that can be obtained by data augmentation"
Shouldn't a stronger data augmentation technique provide more samples of the same class (S), making their volume (N) larger?
I found that you might be the organizer of the COCO Detection Challenge.
CodaLab has closed the old website, so submissions to your competition no longer work.
Could you transfer the competition to the new website? Refer to the issue here.
Should I compute weights_per_cls over the whole dataset or per batch? There may be zeros in weights_per_cls when it is computed per batch.
How is the modulator derived? The code in your repo is:
modulator = tf.exp(-gamma * labels * logits - gamma * tf.log1p(
tf.exp(-1.0 * logits)))
For the focal loss in tensorflow/models/blob/master/research/object_detection, the form is the same as in the paper:
prediction_probabilities = tf.sigmoid(prediction_tensor)
p_t = ((target_tensor * prediction_probabilities) +
((1 - target_tensor) * (1 - prediction_probabilities)))
modulating_factor = 1.0
if self._gamma:
  modulating_factor = tf.pow(1.0 - p_t, self._gamma)
Could you please tell me how to transform the paper's form into your form?
Thank you very much!
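For what it's worth, the two forms appear to be algebraically identical for labels in {0, 1}: with x = logits and p = sigmoid(x), log(1 - p_t) = -labels * x - log(1 + exp(-x)), so exp(-gamma * labels * x - gamma * log1p(exp(-x))) = (1 - p_t)^gamma. A quick numeric check:

import numpy as np

def modulator_repo(labels, logits, gamma):
    # the form used in this repo
    return np.exp(-gamma * labels * logits - gamma * np.log1p(np.exp(-logits)))

def modulator_paper(labels, logits, gamma):
    # (1 - p_t)^gamma with p_t built from the sigmoid, as in the paper
    p = 1.0 / (1.0 + np.exp(-logits))
    p_t = labels * p + (1.0 - labels) * (1.0 - p)
    return (1.0 - p_t) ** gamma

rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
labels = rng.integers(0, 2, size=1000).astype(float)
assert np.allclose(modulator_repo(labels, logits, 2.0),
                   modulator_paper(labels, logits, 2.0))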
Hi,
thanks for sharing the code and for your great work. I have a question about the loss in your paper: why do you add +1 in formula (1), i.e. the (1 - p)(E_{n-1} + 1) term?
Where does it come from? Is it the initial expected value?
Thanks
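As I read the derivation in Section 3 of the paper: a newly sampled point either overlaps the effective prototypes already seen, with probability p = E_{n-1}/N (effective number unchanged), or is new, with probability 1 - p (effective number grows by one). The +1 is that growth term:

E_n = p \, E_{n-1} + (1 - p)\,(E_{n-1} + 1) = E_{n-1} + 1 - \frac{E_{n-1}}{N} = 1 + \beta E_{n-1}, \qquad \beta = \frac{N - 1}{N}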
Hello, I found a performance issue in the definition of __call__ in tpu/models/official/retinanet/dataloader.py: dataset = dataset.map(_process_example) is called without num_parallel_calls.
I think it would increase the efficiency of your program if you added it.
The same issue also exists in dataset = dataset.map(parser) and dataset = dataset.repeat().map(parser).
Here is the TensorFlow documentation to support this.
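A minimal sketch of the suggested change (tf.data.experimental.AUTOTUNE assumes TF >= 1.14; on older versions, pass an integer instead):

import tensorflow as tf

def parallel_map(dataset, fn):
    # AUTOTUNE lets tf.data pick the parallelism level at runtime
    return dataset.map(fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)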
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
Is n_y computed from the whole dataset or from each batch? Thanks a lot.
How do I set the weights? If image1 contains classes 1, 2, and 3, is its weight (class1_num + class2_num + class3_num)?