
varifocalnet's Introduction

VarifocalNet: An IoU-aware Dense Object Detector

This repo hosts the code for implementing VarifocalNet, as presented in our CVPR 2021 oral paper, which is available at https://arxiv.org/abs/2008.13367:

@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}

Introduction

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. In this work, we propose to learn IoU-aware classification scores (IACS) that simultaneously represent the object presence confidence and localization accuracy, to produce a more accurate ranking of detections in dense object detectors. In particular, we design a new loss function, named Varifocal Loss (VFL), for training a dense object detector to predict the IACS, and a new efficient star-shaped bounding box feature representation (the features at nine yellow sampling points) for estimating the IACS and refining coarse bounding boxes. Combining these two new components and a bounding box refinement branch, we build a new IoU-aware dense object detector based on the FCOS+ATSS architecture, which we call VarifocalNet or VFNet for short. Extensive experiments on the MS COCO benchmark show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model, VFNet-X-1200 with Res2Net-101-DCN, reaches a single-model single-scale AP of 55.1 on COCO test-dev, achieving state-of-the-art performance among various object detectors.

Learning to Predict the IoU-aware Classification Score.
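For reference, the Varifocal Loss as defined in the paper, where p is the predicted IACS and q is the target score (the IoU between a foreground point's predicted box and its ground truth, and 0 for background points):

    \mathrm{VFL}(p, q) =
    \begin{cases}
      -q \left( q \log p + (1 - q) \log(1 - p) \right), & q > 0 \\
      -\alpha \, p^{\gamma} \log(1 - p), & q = 0
    \end{cases}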

Updates

  • 2021.03.05 Our VarifocalNet is accepted to CVPR 2021 as an oral presentation. Thanks to the reviewers and ACs.
  • 2021.03.04 Update to MMDetection v2.10.0, add more results and training scripts, and update the arXiv paper.
  • 2021.01.09 Add SWA training.
  • 2021.01.07 Update to MMDetection v2.8.0.
  • 2020.12.24 We release a new VFNet-X model that can achieve a single-model single-scale 55.1 AP on COCO test-dev at 4.2 FPS.
  • 2020.12.02 Update to MMDetection v2.7.0.
  • 2020.10.29 VarifocalNet has been merged into the official MMDetection repo. Many thanks to @yhcao6, @RyanXLi and @hellock!
  • 2020.10.29 This repo has been refactored so that users can pull the latest updates from the upstream official MMDetection repo. The previous one can be found in the old branch.

Installation

  • This VarifocalNet implementation is based on MMDetection. Therefore, the installation procedure is the same as for the original MMDetection.

  • Please check get_started.md for installation. Note that you should change the version of PyTorch and CUDA to yours when installing mmcv in step 3 (see the example after this list), and clone this repo instead of MMDetection in step 4.

  • If you run into problems with pycocotools, please install it by:

    pip install "git+https://github.com/open-mmlab/cocoapi.git#subdirectory=pycocotools"
    
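As an example of step 3, the mmcv-full install command has the form below (shown here assuming CUDA 10.1 and PyTorch 1.7.0; substitute the versions matching your setup as described in get_started.md):

    pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu101/torch1.7.0/index.html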

A Quick Demo

Once the installation is done, you can follow the steps below to run a quick demo.

  • Download the model and put it into one folder under the root directory of this project, say, checkpoints/.

  • Go to the root directory of this project in terminal and activate the corresponding virtual environment.

  • Run

    python demo/image_demo.py demo/demo.jpg configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth
    

    and you should see an image with detections.
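    Alternatively, you can run the same demo from Python. A minimal sketch using the MMDetection 2.x high-level API (paths as above):

    from mmdet.apis import init_detector, inference_detector, show_result_pyplot

    config_file = 'configs/vfnet/vfnet_r50_fpn_1x_coco.py'
    checkpoint_file = 'checkpoints/vfnet_r50_1x_41.6.pth'

    # Build the detector from the config and load the trained weights.
    model = init_detector(config_file, checkpoint_file, device='cuda:0')

    # Run inference on one image and plot detections above the score threshold.
    result = inference_detector(model, 'demo/demo.jpg')
    show_result_pyplot(model, 'demo/demo.jpg', result, score_thr=0.3)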

Usage of MMDetection

Please see exist_data_model.md for the basic usage of MMDetection. They also provide a Colab tutorial for beginners.

For troubleshooting, please refer to faq.md.

Results and Models

For your convenience, we provide the following trained models. These models are trained with a mini-batch size of 16 images on 8 Nvidia V100 GPUs (2 images per GPU).

| Backbone | Style | DCN | MS train | Lr schd | Inf time (fps) | box AP (val) | box AP (test-dev) | Download |
|:--------:|:-----:|:---:|:--------:|:-------:|:--------------:|:------------:|:-----------------:|:--------:|
| R-50 | pytorch | N | N | 1x | 19.4 | 41.6 | 41.6 | model \| log |
| R-50 | pytorch | N | Y | 2x | 19.3 | 44.5 | 44.8 | model \| log |
| R-50 | pytorch | Y | Y | 2x | 16.3 | 47.8 | 48.0 | model \| log |
| R-101 | pytorch | N | N | 1x | 15.5 | 43.0 | 43.6 | model \| log |
| R-101 | pytorch | N | N | 2x | 15.6 | 43.5 | 43.9 | model \| log |
| R-101 | pytorch | N | Y | 2x | 15.6 | 46.2 | 46.7 | model \| log |
| R-101 | pytorch | Y | Y | 2x | 12.6 | 49.0 | 49.2 | model \| log |
| X-101-32x4d | pytorch | N | Y | 2x | 13.1 | 47.4 | 47.6 | model \| log |
| X-101-32x4d | pytorch | Y | Y | 2x | 10.1 | 49.7 | 50.0 | model \| log |
| X-101-64x4d | pytorch | N | Y | 2x | 9.2 | 48.2 | 48.5 | model \| log |
| X-101-64x4d | pytorch | Y | Y | 2x | 6.7 | 50.4 | 50.8 | model \| log |
| R2-101 | pytorch | N | Y | 2x | 13.0 | 49.2 | 49.3 | model \| log |
| R2-101 | pytorch | Y | Y | 2x | 10.3 | 51.1 | 51.3 | model \| log |

Notes:

  • The MS-train maximum scale range is 1333x[480:960] (range mode) and the inference scale is kept at 1333x800; see the pipeline sketch after these notes.
  • The R2-101 backbone is Res2Net-101.
  • DCN means using DCNv2 in both backbone and head.
  • The inference speed is tested with an Nvidia V100 GPU on HPC (log file).
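The MS-train setting in the first note corresponds to a Resize entry in the training pipeline along these lines (a sketch of MMDetection's range-mode syntax; cf. the mstrain configs):

    dict(type='Resize',
         img_scale=[(1333, 480), (1333, 960)],
         multiscale_mode='range',
         keep_ratio=True)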

We also provide the models of RetinaNet, FoveaBox, RepPoints and ATSS trained with the Focal Loss (FL) and our Varifocal Loss (VFL).

| Method | Backbone | MS train | Lr schd | box AP (val) | Download |
|:------:|:--------:|:--------:|:-------:|:------------:|:--------:|
| RetinaNet + FL | R-50 | N | 1x | 36.5 | model \| log |
| RetinaNet + VFL | R-50 | N | 1x | 37.4 | model \| log |
| FoveaBox + FL | R-50 | N | 1x | 36.3 | model \| log |
| FoveaBox + VFL | R-50 | N | 1x | 37.2 | model \| log |
| RepPoints + FL | R-50 | N | 1x | 38.3 | model \| log |
| RepPoints + VFL | R-50 | N | 1x | 39.7 | model \| log |
| ATSS + FL | R-50 | N | 1x | 39.3 | model \| log |
| ATSS + VFL | R-50 | N | 1x | 40.2 | model \| log |

Notes:

  • We use 4 P100 GPUs to train these models (except ATSS, which uses 8x2) with a mini-batch size of 16 images (4 images per GPU), as we found 4x4 training yields slightly better results than 8x2 training.
  • You can find corresponding config files in configs/vfnet.
  • The use_vfl flag in those config files controls whether the Varifocal Loss is used in training; see the excerpt below.
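For reference, the relevant head settings look like this (an excerpt mirroring the VFNet configs in this repo):

    use_vfl=True,
    loss_cls=dict(
        type='VarifocalLoss',
        use_sigmoid=True,
        alpha=0.75,
        gamma=2.0,
        iou_weighted=True,
        loss_weight=1.0)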

VFNet-X

| Backbone | DCN | MS train | Training | Inf scale | Inf time (fps) | box AP (val) | box AP (test-dev) | Download |
|:--------:|:---:|:--------:|:--------:|:---------:|:--------------:|:------------:|:-----------------:|:--------:|
| R2-101 | Y | Y | 41e + SWA 18e | 1333x800 | 8.0 | 53.4 | 53.7 | model \| config |
| R2-101 | Y | Y | 41e + SWA 18e | 1800x1200 | 4.2 | 54.5 | 55.1 | |

Notes:

We implement several improvements to the original VFNet. This version is called VFNet-X, and the improvements include:

  • PAFPN. We replace the FPN with the PAFPNX (minor modifications are made to the original PAFPN), and apply the DCN and group normalization (GN) in it.

  • More and Wider Conv Layers. We stack 4 convolution layers in the detection head, instead of 3 layers in the original VFNet, and increase the original 256 feature channels to 384 channels.

  • RandomCrop and Cutout. We employ the random crop and cutout as additional data augmentation methods.

  • Wider MSTrain Scale Range and Longer Training. We adopt a wider MSTrain scale range, from 750x500 to 2100x1400, and initially train the VFNet-X for 41 epochs.

  • SWA. We apply the technique of Stochastic Weight Averaging (SWA) in training the VFNet-X (for another 18 epochs), which brings a 1.2 AP gain. Please see our work on SWA Object Detection for more details.

  • Soft-NMS. We apply soft-NMS in inference.

For more detailed information, please see the VFNet-X config file.
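For example, soft-NMS is switched on through the test config (an excerpt with illustrative values; see the VFNet-X config file for the exact settings):

    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='soft_nms', iou_threshold=0.5),
        max_per_img=100)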

Inference

Assuming you have put the COCO dataset into data/coco/ and downloaded the models into checkpoints/, you can evaluate the models on the COCO val2017 split:

./tools/dist_test.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py checkpoints/vfnet_r50_1x_41.6.pth 8 --eval bbox

Notes:

  • If you have fewer than 8 GPUs available on your machine, please change 8 to the number of your GPUs.
  • If you want to evaluate a different model, please change the config file (in configs/vfnet) and corresponding model weights file.
  • Test time augmentation is supported for the VarifocalNet, including multi-scale testing and flip testing. If you are interested, please refer to the example config file vfnet_r50_fpn_1x_coco_tta.py. More information about test time augmentation can be found in the official script test_time_aug.py. A pipeline sketch follows these notes.
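A sketch of how such a TTA pipeline is configured (scales and flip settings are illustrative; see vfnet_r50_fpn_1x_coco_tta.py for the actual values):

    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=[(1333, 800), (2000, 1200)],
            flip=True,
            transforms=[
                dict(type='Resize', keep_ratio=True),
                dict(type='RandomFlip'),
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]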

Training

The following command line will train vfnet_r50_fpn_1x_coco on 8 GPUs:

./tools/dist_train.sh configs/vfnet/vfnet_r50_fpn_1x_coco.py 8

Notes:

  • The models will be saved into work_dirs/vfnet_r50_fpn_1x_coco.
  • To use fewer GPUs, please change 8 to the number of your GPUs. If you want to keep the mini-batch size at 16, you need to change samples_per_gpu and workers_per_gpu accordingly, so that samples_per_gpu x number_of_gpus = 16. In general, workers_per_gpu = samples_per_gpu; see the sketch after this list.
  • If you use a different mini-batch size, please change the learning rate according to the Linear Scaling Rule, e.g., lr=0.01 for 8 GPUs x 2 img/gpu and lr=0.005 for 4 GPUs x 2 img/gpu.
  • To train the VarifocalNet with other backbones, please change the config file accordingly.
  • To train the VarifocalNet on your own dataset, please follow this instruction.
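As a concrete example of the two notes above, a hypothetical override for a 4-GPU machine that keeps the effective mini-batch size at 16 (the optimizer values mirror the SGD settings used elsewhere in this repo):

    # 4 GPUs x 4 images per GPU = 16 images per mini-batch
    data = dict(samples_per_gpu=4, workers_per_gpu=4)

    # Total batch size is unchanged, so lr stays at 0.01; for 4 GPUs x 2 img/gpu
    # (batch size 8), the Linear Scaling Rule gives lr=0.005 instead.
    optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)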

Contributing

Any pull requests or issues are welcome.

Citation

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follows:

@inproceedings{zhang2020varifocalnet,
  title={VarifocalNet: An IoU-aware Dense Object Detector},
  author={Zhang, Haoyang and Wang, Ying and Dayoub, Feras and S{\"u}nderhauf, Niko},
  booktitle={CVPR},
  year={2021}
}

Acknowledgment

We would like to thank the MMDetection team for producing this great object detection toolbox!

License

This project is released under the Apache 2.0 license.

varifocalnet's People

Contributors

aemikachow, chrisfsj2051, daavoo, erotemic, gt9505, hellock, hhaandroid, hyz-xmaster, innerlee, johnson-wang, jshilong, korabelnikov, liaopeiyuan, lindahua, melikovk, mxbonn, myownskyw7, oceanpang, runningleon, ryanxli, shinya7y, thangvubk, tianyuandu, v-qjqs, wangruohui, wswday, xvjiarui, yhcao6, yuzhj, zwwwayne

varifocalnet's Issues

Varifocal loss is increasing.

Hi, thanks for sharing this excellent work. When I add an IoU prediction branch with varifocal loss to my anchor-free detection model, the mAP does increase, but I find that the value of the varifocal loss only decreases for the first several hundred iterations and then keeps increasing throughout the rest of training. I suspect the reason is that, at the beginning, the IoUs of all predicted bboxes are very small, so the network can predict 0 for every bbox and obtain a small loss value. Then, as training proceeds, the IoUs become larger and more diverse, so the error between the predicted IoU and the ground-truth IoU may also increase. But it is still hard to understand why this loss term contributes to the final performance given that its value keeps increasing during training. Do you have any similar observations or explanations?

Inference does not work properly

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
A clear and concise description of what the bug is.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
  3. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
If applicable, paste the error traceback here.

A placeholder for the traceback.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Welcome update to OpenMMLab 2.0

I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.

Here are the OpenMMLab 2.0 repos branches:

| Repo | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
|:----:|:--------------------:|:--------------------:|
| MMEngine | | 0.x |
| MMCV | 1.x | 2.x |
| MMDetection | 0.x, 1.x, 2.x | 3.x |
| MMAction2 | 0.x | 1.x |
| MMClassification | 0.x | 1.x |
| MMSegmentation | 0.x | 1.x |
| MMDetection3D | 0.x | 1.x |
| MMEditing | 0.x | 1.x |
| MMPose | 0.x | 1.x |
| MMDeploy | 0.x | 1.x |
| MMTracking | 0.x | 1.x |
| MMOCR | 0.x | 1.x |
| MMRazor | 0.x | 1.x |
| MMSelfSup | 0.x | 1.x |
| MMRotate | 1.x | 1.x |
| MMYOLO | | 0.x |

Attention: please create a new virtual environment for OpenMMLab 2.0.

KeyError: 'ATSSVGFLHead is not in the head registry'

Hello, I ran vfnet_r50_fpn_1x_coco.py and it works fine. However, when I run vfl_atss_r50_fpn_1x_coco.py, I come across the problem named in the title. I have checked the __init__.py and found that ATSSVGFLHead is already there:
from .atss_vgfl_head import ATSSVGFLHead

__all__ = [
    'AnchorFreeHead', 'AnchorHead', 'GuidedAnchorHead', 'FeatureAdaption',
    'RPNHead', 'GARPNHead', 'RetinaHead', 'RetinaSepBNHead', 'GARetinaHead',
    'SSDHead', 'FCOSHead', 'RepPointsHead', 'FoveaHead', 'FreeAnchorRetinaHead',
    'ATSSHead', 'FSAFHead', 'NASFCOSHead', 'PISARetinaHead', 'PISASSDHead',
    'GFLHead', 'CornerHead', 'YOLACTHead', 'YOLACTSegmHead', 'YOLACTProtonet',
    'YOLOV3Head', 'PAAHead', 'SABLRetinaHead', 'CentripetalHead', 'VFNetHead',
    'TransformerHead', 'StageCascadeRPNHead', 'CascadeRPNHead',
    'EmbeddingRPNHead', 'ATSSRawHead', 'ATSSVGFLHead', 'VFNetRawHead'
]
Also, atss_vgfl_head.py registers the module via @HEADS.register_module(). So I am confused about where the problem is and would appreciate any advice. Thanks a lot!

dist_train keep waiting

My env:
cuda10.2
torch==1.6.0
mmdetection==2.8.0
mmcv==1.2.4
After some iterations the GPU utilization is 100%, but the process keeps waiting.
Could you share your environment or any advice?

dist_train.sh unexpectedly hangs after 3-5 epochs

Describe the bug
While running distributed training, the script will work fine for 3-5 epochs, then stop running. The GPUs are still active and there is no error or stacktrace provided, but there will be no more output. I cannot tell why it's happening as I've run again and again with the same configuration and environment and the script will stop at irregular intervals. It always seems to be early on, as the latest it has hung is 5 epochs.

Reproduction

./tools/dist_train.sh /home/ec2-user/vfnetx_config.py 8
(The config file is the same as the one in the repo, I just renamed it.)

  1. Did you make any modifications on the code or config? Did you understand what you have modified?
    I used this config: https://github.com/hyz-xmaster/VarifocalNet/blob/master/configs/vfnet/vfnetx_r2_101_fpn_mdconv_c3-c5_mstrain_59e_coco.py
    The only difference was the datasets I used (custom COCO datasets)

Environment

sys.platform: linux
Python: 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.8.2
OpenCV: 4.5.1
MMCV: 1.2.7
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.10.0+f459696

How to reimplement IACS?

I want to reimplement VFNet in Detectron2. Could you please describe the technical details of implementing it?

About load model

Hi, when I load the vfnet_r50_fpn_1x_coco_20201027-38db6f58.pth model, I get this error:

    RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)

My torch version:

    >>> torch.__version__
    '1.4.0+cu100'

Question about `detach`

Hi,

Thanks for the nice work. For calculating the IoU target, I think the detach should be applied to the "predicted items". Specifically, in my view, for these two lines where detach is applied, the detach should be moved to their previous lines, i.e., lines 408 and 424. Correct me if I missed something.
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L409
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L425

Could you please explain it more? Thanks.

run demo report error

Hi
I ran the demo and hit the problem shown in the attached screenshot (WX20201111-173042@2x). My environment is:
cuda = 10.1
pytorch = 1.5
mmdetection = 2.6
mmcv-full = 1.15
Do you know what the problem is?

dist_train keep waiting with multiple GPUs and samples_per_gpu = 1

First of all, thank you for your work and for your repo.

Environment:

pytorch 1.5.1
cuda 10.2
cudnn 7.6.5
mmdetection 2.3.0
4xV100 16GB

My config file is based on vfnet_r50_fpn_mstrain_2x, modified for a custom dataset with large images (2560x1440) and mainly small objects (10-60 px).

  1. Training with multiple GPUs and samples_per_gpu = 1, workers_per_gpu = 1: training hangs at the beginning with all GPU utilization at 100%.

  2. Training with multiple GPUs, samples_per_gpu = 2, workers_per_gpu = 2 (and a smaller image size): training goes well.

Somehow similar to this issue: 2193

AttributeError: 'ConfigDict' object has no attribute 'test_cfg'

Notice

There are several common situations in the reimplementation issues, as below:

  1. Reimplement a model in the model zoo using the provided configs
  2. Reimplement a model in the model zoo on another dataset (e.g., custom datasets)
  3. Reimplement a custom model but all the components are implemented in MMDetection
  4. Reimplement a custom model with new modules implemented by yourself

There are several things to do for the different cases, as below:

  • For cases 1 & 3, please follow the steps in the following sections so we can quickly identify the issue.
  • For cases 2 & 4, please understand that we are not able to help much here because we usually do not know the full code, and users should be responsible for the code they write.
  • One suggestion for cases 2 & 4 is that users should first check whether the bug lies in the self-implemented code or the original code. For example, users can first make sure that the same model runs well on supported datasets. If you still need help, please describe in the issue what you have done and what you obtained, follow the steps in the following sections, and be as clear as possible so that we can better help you.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The issue has not been fixed in the latest version.

Describe the issue

A clear and concise description of the problem you met and what you have done.

Reproduction

  1. What command or script did you run?
A placeholder for the command.
  2. What config did you run?
A placeholder for the config.
  3. Did you make any modifications on the code or config? Did you understand what you have modified?
  4. What dataset did you use?

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    1. How you installed PyTorch [e.g., pip, conda, source]
    2. Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

A placeholder for results comparison

Issue fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

Can varifocal loss be applied to softmax classifier?

Actually, I don't know why most one-stage detectors use a sigmoid classifier even though COCO is a multi-class task. Do you have a reasonable explanation? I'm really confused about it. And could an FCOS-based detector use a softmax classifier? I'm applying an FCOS-based detector (similar to VarifocalNet) to logo detection (the number of categories could be 1000), but I get many false positives at the same position (with different categories). I think this might be because of the sigmoid classifier (instead of a softmax classifier).

alfaromeo5: So I want to use a softmax classifier in FCOS, but I'm worried that the varifocal loss can only work with a sigmoid classifier. Is that so?

VFocalLoss in YOLOv5

Hi, I tested VFocalLoss in YOLOv5 but am not getting any improvement.
Have you done any tests with YOLOv5, or do you have any suggestions?
Thank you!

import torch
import torch.nn as nn


class VFocalLoss(nn.Module):

    def __init__(self, loss_fcn, gamma=2.0, alpha=0.75):
        super(VFocalLoss, self).__init__()
        # loss_fcn must be nn.BCEWithLogitsLoss()
        self.loss_fcn = loss_fcn
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = loss_fcn.reduction
        # Required to apply VFL to each element: compute the elementwise BCE,
        # weight it, and only then reduce ('none', not 'mean' as in the
        # original post, which would reduce before the weighting).
        self.loss_fcn.reduction = 'none'

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        pred_prob = torch.sigmoid(pred)  # prob from logits
        # Positives (true > 0) are weighted by the target score itself;
        # negatives are down-weighted by alpha * |p - q|^gamma.
        focal_weight = (true * (true > 0.0).float() +
                        self.alpha * (pred_prob - true).abs().pow(self.gamma) *
                        (true <= 0.0).float())
        loss *= focal_weight
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:
            return loss

Question about Varifocal loss

In the paper, the negative weight of the BCE loss is alpha * p^gamma. However, in varifocal_loss.py, the loss is implemented as:

    focal_weight = target * (target > 0.0).float() + \
        alpha * (pred_sigmoid - target).abs().pow(gamma) * \
        (target <= 0.0).float()

The negative weight here is alpha * (p - q)^gamma. Why?

GPU error

When I test my model with demo/image_demo.py, a strange CUDA error occurs. I trained the model on my server (10 GPUs) and am testing it on my computer (only 1 GPU):

    RuntimeError: Attempting to deserialize object on CUDA device 7 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
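The fix the error message points to (a minimal sketch; the checkpoint path is hypothetical):

    import torch

    # Remap storages saved on CUDA device 7 onto a device that exists locally.
    checkpoint = torch.load('checkpoints/model.pth', map_location='cpu')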

Find matching configuration file for a model

Dear authors, thank you for your work and for your repo.

Is there a simple way to find the matching configuration .py file (from the /configs folder) for a particular .pth file?

E.g., I am looking for the configuration files for the models:
vfnet_r50_dcn_ms_2x_47.8.pth
vfnet_r2_101_dcn_ms_2x_51.1.pth

For ResNet-50 I have tried vfnet_r50_fpn_1x_coco.py and vfnet_r50_fpn_mstrain_2x_coco.py, giving me mAP 0.91. For R2 I have tried vfnet_r2_101_fpn_mstrain_2x_coco.py and got the error 'pth file is not valid checkpoint file'.

Thank you for your answer in advance.

Best regards
MV

Replacing the predicted classification scores with the ground-truth gt_cls: a few questions

Regarding the comparison experiments in Table 2: is the max score of the classification branch replaced with 1? If so:

  1. For the NMS ranking, wouldn't all of these boxes then have a score of 1? How can NMS be done?
  2. Also, after NMS, when evaluating the AP, all the scores are 1. As far as I know, AP is sensitive to scores, so how can the AP still be computed?

Questions about reimplementing the experiments in your paper.

Thanks for your great contribution. I have read your paper, but have some questions about the experiments in Table 1.

  1. Is the replacement of the predicted results implemented only in the inference process, rather than in training?

  2. How do you replace the predicted classification score with its ground truth? In other words, how are labels assigned to anchors when replacing? Is it the same way as in FCOS, i.e., assigning an anchor to the gt box it lies in?

Varifocal Loss for YOLOv5

Good day,

I am experimenting with the varifocal loss for YOLOv5, but I get the error "AssertionError: Only sigmoid varifocal loss supported now."

How do I use varifocal loss for YOLOv5? Is it possible?
Thanks in advance

Some questions about inference

Thanks for your nice work! But I met some problems when using this model on a custom dataset.

I trained a model that achieves a satisfying mAP on the validation dataset and the first test dataset (A). But on the second test dataset (B), the trained VFNet performed poorly. I also found that the test results differed from run to run; for example, the output json file of the first test was 15 MB, the output json file of the second test was 20 MB, and so on. I still don't know what caused that. My guess is it may be the multi-scale test?

That's my config:

model = dict(
    type='VFNet',
    pretrained='open-mmlab://res2net101_v1d_26w_4s',
    backbone=dict(
        type='Res2Net',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True),
        plugins=[dict(cfg=dict(type='ContextBlock', ratio=1. / 4),
                      stages=(False, True, True, True),
                      position='after_conv3')]
    ),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='VFNetHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=3,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=False,
        dcn_on_last_conv=True,
        use_atss=True,
        use_vfl=True,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.5),
        loss_bbox_refine=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='soft_nms', iou_threshold=0.5),
        max_per_img=150))  # 150

img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

albu_train_transforms = [
    dict(
        type='OneOf',
        transforms=[
            dict(type='IAAAdditiveGaussianNoise', p=0.5),
            dict(type='CLAHE'),
            dict(type='IAASharpen'),
            dict(type='IAAEmboss'),
            dict(type='RandomBrightnessContrast')], p=0.5),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Mosaic', prob=0.5, img_dir='data/train/image',
         json_path='data/annotation/train2.json'),
    dict(type='Resize', img_scale=[(4096, 600), (4096, 1000)], multiscale_mode='range', keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),

    dict(type='Albu',
         transforms=albu_train_transforms,
         bbox_params=dict(type='BboxParams',
                          format='pascal_voc',
                          label_fields=['gt_labels'],
                          min_visibility=0.0,
                          filter_lost_elements=True),
         keymap={'img': 'image', 'gt_bboxes': 'bboxes'},
         update_pad_shape=False,
         skip_img_without_anno=True),

    dict(type='Normalize', **img_norm_cfg),
    dict(type='GridMask', use_h=True, use_w=True, rotate=1, offset=False, ratio=0.5, mode=1, prob=0.8),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(4096, 600), (4096, 800), (4096, 1000)],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=train_ann_file,
        img_prefix=image_dir,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=val_ann_file,
        img_prefix=image_dir,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=test_ann_file,
        img_prefix=test_image_dir,
        pipeline=test_pipeline))

optimizer = dict(lr=0.01, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))

optimizer_config = dict(grad_clip=None)

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.1,
    step=[36, 40])

log_config = dict(interval=50,
                  hooks=[
                      dict(type='TextLoggerHook'),
                      # dict(type='TensorboardLoggerHook')
                      ])

workflow = [('train', 1)]


runner = dict(type='EpochBasedRunner', max_epochs=41)

swa_optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
swa_lr_config = dict(
    policy='cyclic',
    target_ratio=(1, 0.01),
    cyclic_times=12,
    step_ratio_up=0.0)
swa_runner = dict(type='EpochBasedRunner', max_epochs=18)

load_from = 'checkpoints/vfnet_r2_101_dcn_ms_2x_51.1.pth'


Looking forward to your reply, thanks!

VarifocalLoss

Hello, when I use VarifocalLoss, I get the problem shown in the attached screenshot. I have looked at the code and the setting is True. How should this problem be solved?

How to implement this on the original MMDetection

I've installed MMDetection and I want to implement your algorithm. Compared with the original MMDetection, does your implementation only need the following added files?
1. configs/vfnet
2. models/dense_heads/vfnet_head.py
3. models/detectors/vfnet.py
4. models/losses/varifocal_loss.py

Using the MMDet version of VFNet with the latest backbones (e.g., PoolFormer S36, ConvNeXt Small), with Inf issues in the Varifocal loss

Thank you for your excellent work.
I am now experimenting with improving VFNet using the latest model backbones (e.g., PoolFormer S36, ConvNeXt Small).
The network works fine for the first 5 epochs and then suffers a significant performance drop caused by unexpected Inf values of the cls_loss (in my case, the varifocal loss).
I am hoping to get some advice for tracking down the issue.
(I have tried grad_clip to clip Inf gradients, but it does not solve the issue.)

Load weight mismatch

@hyz-xmaster Hi, thanks for your great work. I'm using VFNet-X on a custom dataset with 14 classes and get the warning shown in the attached screenshot. But when I use VFNet with an R50 backbone and no DCN, this warning disappears. Is this normal?

Question about ablation studies in paper

Hi, thanks for your work and repo. I'm very interested in the VFL, which combines classification scores and localization scores in the targets. I have some questions about the VFL:

  1. In Table 3 of the paper, the first row represents the results of the raw VFNet trained with the focal loss. What is the raw VFNet? Is it FCOS+ATSS with the centerness branch removed?
  2. If not, have you compared the performance of applying VFL to FCOS+ATSS with the centerness branch removed against applying FL to FCOS+ATSS (with the centerness branch)?

Thank you very much!

About applying Varifocal loss to the YOLOX objectness loss

Hi, I applied varifocal loss to the YOLOX objectness loss (and also tried the cls loss); both make the AP drop massively, which seems abnormal. Do you have any idea why?

It seems the obj loss decreased a little bit, but the cls loss increased (obj loss: VarifocalLoss, cls loss: BCEWithLogits).

Gradient Multiplier Question

Fantastic work!

After generating the first set of predicted bounding boxes supervised by the first GIoU loss, these predicted boxes are partially detached by the gradient multiplier term. It looks like in all experiments gradient_mul is set to 0.1, meaning 10% of the gradient propagates through the predicted boxes (reformulated as deformable-conv offsets), relative to a gradient_mul setting of 1.
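For context, this kind of partial detach is commonly implemented as a convex combination (a sketch of the pattern; cf. the offset computation in vfnet_head.py):

    import torch

    def partial_detach(bbox_pred: torch.Tensor, gradient_mul: float = 0.1) -> torch.Tensor:
        # Only `gradient_mul` (e.g., 0.1) of the gradient flows back through
        # bbox_pred; the rest of the backward signal is blocked by detach().
        return (1 - gradient_mul) * bbox_pred.detach() + gradient_mul * bbox_pred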

Is the idea to enable the Varifocal loss and the second (refinement) GIoU loss to partially contribute to the learning of the offsets, in addition to the supervision from the first GIoU loss?

Thank you!

Architecture dimension

Hello everyone,
I would like to ask about the VarifocalNet head architecture. As I understand it, the output of the feature pyramid has different levels and thus different dimensions. However, in the paper you show that the input dimension to the head is HxWx256. Is it the same for every level? The outputs of the feature pyramid with a ResNet-50 backbone are (batch, 256, 52, 52), (batch, 256, 26, 26), (batch, 256, 13, 13), (batch, 256, 7, 7), (batch, 256, 4, 4), which means I would need to upsample every feature map to HxW?
I hope my question gets a reply :D. Have a good day and good research.

cls loss is increasing

Hi @hyz-xmaster,
I noticed that the cls_loss using the varifocal loss does not decrease until roughly 1000 iterations. Could you give us some insights?

Train custom dataset

How can I train on my custom dataset? Do I need to label the images in VOC format with LabelImg? How should I prepare the dataset for VarifocalNet training? Looking forward to your reply, and I really appreciate it!
