hyz-xmaster / varifocalnet
VarifocalNet: An IoU-aware Dense Object Detector
License: Apache License 2.0
How to visualize detection results?
My env:
cuda10.2
torch==1.6.0
mmdetection==2.8.0
mmcv==1.2.4
After some iterations, GPU-Util stays at 100% but the process hangs and just keeps waiting.
Could you share your environment or offer any advice?
Hello everyone,
I would like to ask about the VarifocalNet head architecture. As I understand it, the feature pyramid outputs several levels, each with different spatial dimensions. However, the paper shows the input to the head as HxWx256. Is that the same for every level? With a ResNet-50 backbone, the feature pyramid outputs are (batch, 256, 52, 52), (batch, 256, 26, 26), (batch, 256, 13, 13), (batch, 256, 7, 7), and (batch, 256, 4, 4). Does this mean I need to upsample every level to a common HxW?
I hope my question will get a reply :D. Have a good day and research.
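On the shape question: as far as I understand, the head in FCOS-style detectors such as VFNet is fully convolutional and is applied to each pyramid level separately, so HxW simply stands for whatever size that level has and no upsampling to a common size should be needed. The sketch below only reproduces the level sizes listed above. The 416x416 input and the "halve and round up" rule for each further level are assumptions inferred from those numbers, and `fpn_level_sizes` is a hypothetical helper, not part of the repo.

```python
import math


def fpn_level_sizes(input_size, first_stride=8, num_levels=5):
    """Spatial size (H = W) of each pyramid level for a square input."""
    sizes = [math.ceil(input_size / first_stride)]
    for _ in range(num_levels - 1):
        # each further level halves the resolution, rounding up
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


sizes = fpn_level_sizes(416)  # 416 is an assumed input size
for stride, s in zip([8, 16, 32, 64, 128], sizes):
    # the same 256-channel head would see a different H x W at each level
    print(f"stride {stride:>3}: head input (batch, 256, {s}, {s})")
```

The channel count (256) is what stays fixed across levels; the spatial size is free to vary because convolutions do not depend on it.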
Actually, I don't understand why most one-stage detectors use a sigmoid classifier even though COCO is a multi-class classification task. Do you have a reasonable explanation? I'm really confused about it. Could an FCOS-based detector use a softmax classifier instead? I'm applying an FCOS-based detector (similar to VarifocalNet) to logo detection, where the number of categories can reach 1000, but I get many false positives at the same position with different categories. I think the sigmoid classifier (as opposed to a softmax classifier) might be the reason.
So I want to use a softmax classifier in FCOS, but I'm worried that the varifocal loss only works with a sigmoid classifier. Is that so?
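To make the difference concrete, here is a small sketch comparing per-class sigmoid scores with softmax scores at a single location. The logits and the 0.5 threshold are made-up values for illustration. It does not show that softmax would fix the false positives; it only shows why sigmoid scores can put several classes above the threshold at the same position, while softmax scores compete with each other.

```python
import math


def sigmoid_scores(logits):
    # each class is scored independently of the others
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits):
    # scores are normalized across classes and sum to 1
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.5, -3.0]  # hypothetical logits for 3 classes at one location
score_thr = 0.5            # hypothetical score threshold

sig = sigmoid_scores(logits)
soft = softmax_scores(logits)
sig_hits = [i for i, s in enumerate(sig) if s > score_thr]
soft_hits = [i for i, s in enumerate(soft) if s > score_thr]
print("sigmoid classes above threshold:", sig_hits)
print("softmax classes above threshold:", soft_hits)
```

Note that the varifocal loss in this repo is built on a sigmoid output; another thread on this page reports the assertion "Only sigmoid varifocal loss supported now."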
Describe the bug
While running distributed training, the script will work fine for 3-5 epochs, then stop running. The GPUs are still active and there is no error or stacktrace provided, but there will be no more output. I cannot tell why it's happening as I've run again and again with the same configuration and environment and the script will stop at irregular intervals. It always seems to be early on, as the latest it has hung is 5 epochs.
Reproduction
./tools/dist_train.sh /home/ec2-user/vfnetx_config.py 8
(The config file is the same as the one in the repo, I just renamed it.)
Environment
sys.platform: linux
Python: 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-16GB
CUDA_HOME: /usr/local/cuda-10.1
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
PyTorch: 1.7.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.1
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
- CuDNN 7.6.3
- Magma 2.5.2
- Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.8.2
OpenCV: 4.5.1
MMCV: 1.2.7
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.10.0+f459696
Hi, I tested VFocalLoss in YOLOv5 but did not get any improvement.
Have you run any tests with YOLOv5? Or do you have any suggestions?
Thank you~
import torch
import torch.nn as nn


class VFocalLoss(nn.Module):
    def __init__(self, loss_fcn, gamma=2.0, alpha=0.75):  # runs/train/exp28
        super(VFocalLoss, self).__init__()
        # loss_fcn must be nn.BCEWithLogitsLoss()
        self.loss_fcn = loss_fcn
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = loss_fcn.reduction
        # 'none' (not 'mean') is required to apply VFL to each element
        self.loss_fcn.reduction = 'none'

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        pred_prob = torch.sigmoid(pred)  # prob from logits
        # positives are weighted by the target, negatives by alpha * |p - q|^gamma
        focal_weight = true * (true > 0.0).float() + \
            self.alpha * (pred_prob - true).abs().pow(self.gamma) * (true <= 0.0).float()
        loss *= focal_weight
        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        else:
            return loss
Fantastic work!
After the first set of predicted bounding boxes is generated under supervision from the first GIoU loss, these boxes are partially detached by the gradient multiplier term. It looks like gradient_mul is set to 0.1 in all experiments, meaning that only 10% of the gradient propagates through the predicted boxes (reformulated as deformable conv offsets), compared with a gradient_mul setting of 1.
Is the idea to enable the Varifocal loss and second (refinement) giou loss to partially contribute to the learning of the offsets in addition to the supervision from the first giou loss?
Thank you!
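For readers unfamiliar with the trick, here is a minimal sketch of a gradient multiplier. It assumes the multiplier is formed as (1 - gradient_mul) * x.detach() + gradient_mul * x, which is how I read vfnet_head.py; treat that as an assumption. The `Dual` class is a hypothetical stand-in for autograd so the example runs without PyTorch.

```python
class Dual:
    """Tiny forward-mode autodiff value: `val` plus derivative `grad` w.r.t. one input."""

    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad

    def detach(self):
        # same value, but gradient flow is cut
        return Dual(self.val, 0.0)

    def __add__(self, other):
        return Dual(self.val + other.val, self.grad + other.grad)

    def scale(self, c):
        return Dual(c * self.val, c * self.grad)


def grad_mul(x, gradient_mul):
    """Keep the forward value of x but scale its gradient by gradient_mul."""
    return x.detach().scale(1.0 - gradient_mul) + x.scale(gradient_mul)


bbox_pred = Dual(3.0, grad=1.0)  # derivative of bbox_pred w.r.t. itself is 1
out = grad_mul(bbox_pred, 0.1)
print("forward value:", out.val)
print("gradient reaching bbox_pred:", out.grad)
```

The forward value is unchanged, so the deformable conv offsets are computed from the full prediction, while only a fraction of the downstream gradient flows back into the first box regression.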
Hi,
Thanks for the nice work. For calculating the IoU target, I think the detach should be applied to the predicted items. Specifically, in my view, for the two lines below where detach is applied, it should be moved to the preceding lines, i.e. lines 408 and 424. Correct me if I am missing something.
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L409
https://github.com/hyz-xmaster/VarifocalNet/blob/master/mmdet/models/dense_heads/vfnet_head.py#L425
Could you please explain it more? Thanks.
First of all, thank you for your work and for your repo.
Environment:
pytorch 1.5.1
cuda 10.2
cudnn 7.6.5
mmdetection 2.3.0
4xV100 16GB
My config file is based on vfnet_r50_fpn_mstrain_2x, modified for a custom dataset with large images (2560x1440) and mainly small objects (10-60 px).
With multiple GPUs and samples_per_gpu = 1, workers_per_gpu = 1, training hangs at the beginning with GPU-Util at 100% on all GPUs.
With multiple GPUs, samples_per_gpu = 2, workers_per_gpu = 2 (and a smaller image size), training runs fine.
This looks somewhat similar to issue 2193.
Thanks for your nice work~ I ran into some problems when using this model on a custom dataset.
I trained a model that achieved a satisfying mAP on the validation dataset and on the first test dataset (A), but on the second test dataset (B) the trained VFNet performed poorly. I also found that the test results differed between repeated runs: for example, the output JSON file was 15 MB in the first run, 20 MB in the second, and so on. I still don't know what causes this. Could it be the multi-scale test?
model = dict(
    type='VFNet',
    pretrained='open-mmlab://res2net101_v1d_26w_4s',
    backbone=dict(
        type='Res2Net',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True),
        plugins=[dict(cfg=dict(type='ContextBlock', ratio=1. / 4),
                      stages=(False, True, True, True),
                      position='after_conv3')]
    ),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='VFNetHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=3,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=False,
        dcn_on_last_conv=True,
        use_atss=True,
        use_vfl=True,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.5),
        loss_bbox_refine=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='soft_nms', iou_threshold=0.5),
        max_per_img=150))  # 150
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
albu_train_transforms = [
    dict(
        type='OneOf',
        transforms=[
            dict(type='IAAAdditiveGaussianNoise', p=0.5),
            dict(type='CLAHE'),
            dict(type='IAASharpen'),
            dict(type='IAAEmboss'),
            dict(type='RandomBrightnessContrast')], p=0.5),
]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Mosaic', prob=0.5, img_dir='data/train/image',
         json_path='data/annotation/train2.json'),
    dict(type='Resize', img_scale=[(4096, 600), (4096, 1000)], multiscale_mode='range', keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Albu',
         transforms=albu_train_transforms,
         bbox_params=dict(type='BboxParams',
                          format='pascal_voc',
                          label_fields=['gt_labels'],
                          min_visibility=0.0,
                          filter_lost_elements=True),
         keymap={'img': 'image', 'gt_bboxes': 'bboxes'},
         update_pad_shape=False,
         skip_img_without_anno=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='GridMask', use_h=True, use_w=True, rotate=1, offset=False, ratio=0.5, mode=1, prob=0.8),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(4096, 600), (4096, 800), (4096, 1000)],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=train_ann_file,
        img_prefix=image_dir,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=val_ann_file,
        img_prefix=image_dir,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=test_ann_file,
        img_prefix=test_image_dir,
        pipeline=test_pipeline))
optimizer = dict(lr=0.01, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.1,
    step=[36, 40])
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
workflow = [('train', 1)]
runner = dict(type='EpochBasedRunner', max_epochs=41)
swa_optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
swa_lr_config = dict(
    policy='cyclic',
    target_ratio=(1, 0.01),
    cyclic_times=12,
    step_ratio_up=0.0)
swa_runner = dict(type='EpochBasedRunner', max_epochs=18)
load_from = 'checkpoints/vfnet_r2_101_dcn_ms_2x_51.1.pth'
Looking forward to your reply. Thanks!
Thanks for your great contribution. I have read your paper and have some questions about the experiments in Table 1.
Is the replacement of the predicted results applied only during inference, not during training?
How is the predicted classification score replaced with its ground truth? In other words, how are labels assigned to anchors for the replacement? Is it done the same way as in FCOS, i.e. an anchor is assigned to the gt box it falls in?
Dear authors, thank you for your work and for your repo.
Is there a simple way to find the matching configuration .py file (from the /configs folder) for a particular .pth file?
E.g., I am looking for the configuration files for models:
vfnet_r50_dcn_ms_2x_47.8.pth
vfnet_r2_101_dcn_ms_2x_51.1.pth
For ResNet-50 I have tried vrefnet_r50_fpn_1x_coco.py and vrefnet_r50_fpn_mstrain_2x_coco.py, giving me mAP 0.91. For R2 I have tried vfnet_r2_101_fpn_mstrain_2x_coco.py and got the error 'pth file is not valid checkpoint file'.
Thank you for your answer in advance.
Best regards
MV
Hi, @hyz-xmaster
I noticed that the cls_loss using varifocal loss does not decrease until roughly 1000 iterations. Could you give us some insights?
Good day,
I am experimenting with the varifocal loss in YOLOv5, but I get the error "AssertionError: Only sigmoid varifocal loss supported now."
How do I use varifocal loss for YOLOv5? Is it possible?
Thanks in advance
Hello, I ran vfnet_r50_fpn_1x_coco.py and it works fine. However, when I run vfl_atss_r50_fpn_1x_coco.py, I run into the problem described in the title. I have checked __init__.py and found that ATSSVGFLHead is already there:
from .atss_vgfl_head import ATSSVGFLHead
__all__ = [ 'AnchorFreeHead', 'AnchorHead', 'GuidedAnchorHead', 'FeatureAdaption', 'RPNHead', 'GARPNHead', 'RetinaHead', 'RetinaSepBNHead', 'GARetinaHead', 'SSDHead', 'FCOSHead', 'RepPointsHead', 'FoveaHead', 'FreeAnchorRetinaHead', 'ATSSHead', 'FSAFHead', 'NASFCOSHead', 'PISARetinaHead', 'PISASSDHead', 'GFLHead', 'CornerHead', 'YOLACTHead', 'YOLACTSegmHead', 'YOLACTProtonet', 'YOLOV3Head', 'PAAHead', 'SABLRetinaHead', 'CentripetalHead', 'VFNetHead', 'TransformerHead', 'StageCascadeRPNHead', 'CascadeRPNHead', 'EmbeddingRPNHead', 'ATSSRawHead', 'ATSSVGFLHead', 'VFNetRawHead' ]
Also, atss_vgfl_head.py already registers the module with @HEADS.register_module(). So I am confused about where the problem is and would appreciate some advice. Thanks a lot~~
I've installed mmdetection and want to use your algorithm. Compared with the original mmdetection, does your implementation only require adding the following files?
1.configs/vfnet
2.models/dense_heads/vfnet_head.py
3.models/detectors/vfnet.py
4.models/detectors/varifocal_loss.py
Hi, thanks for sharing this excellent work. When I add an IoU prediction branch with varifocal loss to my anchor-free detection model, the mAP does increase, but the value of the varifocal loss only decreases for the first several hundred iterations and then keeps increasing for the rest of training. I suspect the reason is that at the beginning the IoUs of all predicted bboxes are very small, so the network can predict 0 for every bbox and still get a small loss value. As training proceeds, the IoUs become larger and more diverse, so the error between predicted IoU and gt IoU might also increase. But it is still hard to understand why the loss term contributes to the final performance given that its value keeps increasing during training. Do you have any similar observations or explanations?
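One possible contributing factor, offered as a sketch rather than as the authors' explanation: for a positive sample, the varifocal loss is a binary cross-entropy against the soft IoU target q, weighted by q. Even a perfect prediction (p = q) leaves a non-zero value, and that floor grows as q moves from small to moderate IoUs. The helper names below are hypothetical.

```python
import math


def bce(p, q):
    """Binary cross-entropy between prediction p and soft target q."""
    return -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))


def vfl_positive(p, q):
    """Varifocal loss term for a positive sample: BCE weighted by the IoU target q."""
    return q * bce(p, q)


for q in (0.1, 0.3, 0.5, 0.7):
    floor = vfl_positive(q, q)  # best achievable value: prediction equals target
    print(f"IoU target {q:.1f}: minimum positive VFL term = {floor:.3f}")
```

Under this reading, a rising loss value does not by itself mean the predictions are getting worse, because the targets themselves change as the boxes improve.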
Thank you for your excellent work.
I am now experimenting with improving VFNet using recent backbones (e.g. PoolFormer S36, ConvNeXt Small).
The network trains fine for the first 5 epochs and then suffers a significant performance drop caused by an unexpected Inf value in cls_loss (in my case, the varifocal loss).
I am hoping to get some advice on tracking down the issue.
(I have tried grad_clip to clip the Inf gradients, but it does not solve the issue.)
Hi, when I load the vfnet_r50_fpn_1x_coco_20201027-38db6f58.pth model, I get the following error:
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /pytorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /pytorch/caffe2/serialize/inline_container.cc:132)
My torch version:
torch.__version__
'1.4.0+cu100'
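The message suggests the checkpoint was written in the zip-based format that `torch.save` uses by default since PyTorch 1.6, which an older installation cannot read. Besides upgrading PyTorch, one workaround is to re-save the file in the legacy format from a newer environment. The sketch below assumes such an environment is available; `parse_version`, `is_at_least` and `resave_in_legacy_format` are hypothetical helper names.

```python
def parse_version(version):
    """Turn '1.4.0+cu100' into (1, 4, 0); the local '+cu100' suffix is ignored."""
    core = version.split("+")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def is_at_least(version, minimum=(1, 6, 0)):
    """True if the given torch version string is >= minimum."""
    return parse_version(version) >= minimum


def resave_in_legacy_format(src, dst):
    """Run this in an environment with PyTorch >= 1.6."""
    import torch  # imported lazily so the helpers above work without PyTorch

    checkpoint = torch.load(src, map_location="cpu")
    # write the pre-1.6 (non-zip) format so that older PyTorch can read it
    torch.save(checkpoint, dst, _use_new_zipfile_serialization=False)


print(is_at_least("1.4.0+cu100"))
```

The re-saved file can then be loaded with the older installation, although upgrading PyTorch to match the repo's requirements is probably the simpler route.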
In vfnet_raw_head.py, I don't see the star-shaped bounding box feature representation or the bounding box refinement. I am interested in these and would like to know how they are implemented.
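About the comparison experiments in Table 2.
Is the max score of the classification branch replaced with 1?
If so:
1. Wouldn't all these boxes have a score of 1 when they are sorted for NMS? How is NMS performed then?
2. Also, after NMS, when evaluating AP, the scores are all 1. As far as I understand, AP is sensitive to the scores, so how can AP be computed in that case?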
When I test my model with demo/image_demo.py, a strange CUDA error occurs. I trained it on my server (10 GPUs) and am testing it on my own computer (only 1 GPU).
RuntimeError: Attempting to deserialize object on CUDA device 7 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
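As the error message itself suggests, the checkpoint stores tensors tagged with the device they were saved from (here `cuda:7`), so they need to be remapped to a device that exists locally. A minimal sketch using the `map_location` argument of `torch.load`; `build_map_location` and `load_remapped` are hypothetical helpers, and the default of 10 training GPUs is taken from the report above.

```python
def build_map_location(num_training_gpus, target="cuda:0"):
    """Map every training-time device tag to one device that exists locally."""
    return {f"cuda:{i}": target for i in range(num_training_gpus)}


def load_remapped(path, num_training_gpus=10, target="cuda:0"):
    import torch  # imported lazily so the helper above works without PyTorch

    mapping = build_map_location(num_training_gpus, target)
    return torch.load(path, map_location=mapping)


print(build_map_location(10)["cuda:7"])
```

Passing `map_location="cpu"` to `torch.load` is an even simpler option. Either way, the loaded checkpoint can be written back with `torch.save` so that the demo script can read it on the single-GPU machine.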
@hyz-xmaster Hi, thanks for your great work. I'm using VFNetX with a custom dataset with 14 classes and get this warning. But when I use VFNet with an R50 backbone and no DCN, the warning disappears. Is this normal?
Hi, thanks for your work and repo. I'm very interested in the VFL, which combines classification scores and location scores in the targets. I have some questions about VFL.
In Table 3 of the paper, the first row represents the results of the raw VFNet trained with the focal loss. What is raw VFNet? FCOS+ATSS with the centerness branch removed?
[...] VFL to FCOS+ATSS with the centerness branch removed and applying FL to FCOS+ATSS (with the centerness branch)?
Thank you very much!
How can I train on my custom dataset? Do I need to label the images in VOC format with LabelImg? How should I prepare the dataset for VarifocalNet training? Looking forward to your reply, and I really appreciate it!
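Since this repo builds on MMDetection, which reads COCO-style JSON annotations, one common route is to label in VOC style (for example with LabelImg) and convert the boxes to COCO format. The sketch below shows the minimal structure of such a file; the class name, image name and sizes are made up, and `voc_box_to_coco` is a hypothetical helper.

```python
import json


def voc_box_to_coco(xmin, ymin, xmax, ymax):
    """VOC corners (xmin, ymin, xmax, ymax) -> COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


classes = ["logo"]  # hypothetical class list
coco = {
    "images": [{"id": 1, "file_name": "0001.jpg", "width": 1280, "height": 720}],
    "categories": [{"id": i + 1, "name": name} for i, name in enumerate(classes)],
    "annotations": [],
}

box = voc_box_to_coco(100, 50, 300, 200)
coco["annotations"].append({
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "bbox": box,
    "area": box[2] * box[3],
    "iscrowd": 0,
})
print(json.dumps(coco, indent=2))
```

The resulting file would then be referenced through `ann_file` in the config, together with a `classes` entry, as in the custom config quoted earlier on this page.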
I want to reimplement VFNet in detectron2. Could you please describe the technical details of the implementation?
In the paper, the negative weight of the BCE loss is alpha*p^gamma. However, in varifocal_loss.py, the weight is implemented as:
focal_weight = target * (target > 0.0).float() + \
    alpha * (pred_sigmoid - target).abs().pow(gamma) * \
    (target <= 0.0).float()
Here the negative weight is alpha*(p-q)^gamma. Why?
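A short check that may help here: the `(target <= 0.0)` mask means the second term is only active where the target q is 0, and for q = 0 the expression |p - q|^gamma is simply p^gamma. So the two forms agree on negatives. The function names below are hypothetical and work on plain floats rather than tensors.

```python
def negative_weight_paper(p, alpha=0.75, gamma=2.0):
    """Negative-sample weight as written in the paper: alpha * p^gamma."""
    return alpha * p ** gamma


def negative_weight_code(p, q, alpha=0.75, gamma=2.0):
    """Negative-sample weight as written in the code: alpha * |p - q|^gamma."""
    return alpha * abs(p - q) ** gamma


for p in (0.05, 0.3, 0.9):
    # negatives have target q = 0, so both forms give the same weight
    assert negative_weight_code(p, q=0.0) == negative_weight_paper(p)
    print(p, negative_weight_paper(p))
```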
Hi, I applied varifocal loss to the YOLOX objectness branch (and also tried it on the cls loss). Both make AP drop massively, which seems abnormal. Do you have any idea why?
It seems the objloss decreased a little bit, but the cls loss increased (objloss: Varifocalloss, clsloss: BCELogits).
I am Vansin, the technical operator of OpenMMLab. In September of last year, we announced the release of OpenMMLab 2.0 at the World Artificial Intelligence Conference in Shanghai. We invite you to upgrade your algorithm library to OpenMMLab 2.0 using MMEngine, which can be used for both research and commercial purposes. If you have any questions, please feel free to join us on the OpenMMLab Discord at https://discord.gg/amFNsyUBvm or add me on WeChat (van-sin) and I will invite you to the OpenMMLab WeChat group.
Here are the OpenMMLab 2.0 repos branches:
| | OpenMMLab 1.0 branch | OpenMMLab 2.0 branch |
|---|---|---|
| MMEngine | | 0.x |
| MMCV | 1.x | 2.x |
| MMDetection | 0.x, 1.x, 2.x | 3.x |
| MMAction2 | 0.x | 1.x |
| MMClassification | 0.x | 1.x |
| MMSegmentation | 0.x | 1.x |
| MMDetection3D | 0.x | 1.x |
| MMEditing | 0.x | 1.x |
| MMPose | 0.x | 1.x |
| MMDeploy | 0.x | 1.x |
| MMTracking | 0.x | 1.x |
| MMOCR | 0.x | 1.x |
| MMRazor | 0.x | 1.x |
| MMSelfSup | 0.x | 1.x |
| MMRotate | 1.x | 1.x |
| MMYOLO | | 0.x |
Attention: please create a new virtual environment for OpenMMLab 2.0.