
softteacher's Introduction

End-to-End Semi-Supervised Object Detection with Soft Teacher


By Mengde Xu*, Zheng Zhang*, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu.

This repo is the official implementation of ICCV2021 paper "End-to-End Semi-Supervised Object Detection with Soft Teacher".

Citation

@article{xu2021end,
  title={End-to-End Semi-Supervised Object Detection with Soft Teacher},
  author={Xu, Mengde and Zhang, Zheng and Hu, Han and Wang, Jianfeng and Wang, Lijuan and Wei, Fangyun and Bai, Xiang and Liu, Zicheng},
  journal={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}

Main Results

Partial Labeled Data

We followed STAC[1] to evaluate on 5 different data splits for each setting, and report the average performance of 5 splits. The results are shown in the following:

1% labeled data

Method            mAP     Model Weights   Config Files
Baseline          10.0    -               Config
Ours (thr=5e-2)   21.62   Drive           Config
Ours (thr=1e-3)   22.64   Drive           Config

5% labeled data

Method            mAP     Model Weights   Config Files
Baseline          20.92   -               Config
Ours (thr=5e-2)   30.42   Drive           Config
Ours (thr=1e-3)   31.7    Drive           Config

10% labeled data

Method            mAP     Model Weights   Config Files
Baseline          26.94   -               Config
Ours (thr=5e-2)   33.78   Drive           Config
Ours (thr=1e-3)   34.7    Drive           Config

Full Labeled Data

Faster R-CNN (ResNet-50)

Model              mAP     Model Weights   Config Files
Baseline           40.9    -               Config
Ours (thr=5e-2)    44.05   Drive           Config
Ours (thr=1e-3)    44.6    Drive           Config
Ours* (thr=5e-2)   44.5    -               Config
Ours* (thr=1e-3)   44.9    -               Config

Faster R-CNN (ResNet-101)

Model              mAP     Model Weights   Config Files
Baseline           43.8    -               Config
Ours* (thr=5e-2)   46.9    Drive           Config
Ours* (thr=1e-3)   47.6    Drive           Config

Notes

  • Ours* means we use a longer training schedule.
  • thr indicates model.test_cfg.rcnn.score_thr in the config files. This inference trick was first introduced by Instant-Teaching[2].
  • All models are trained on 8 V100 GPUs.

Usage

Requirements

  • Ubuntu 16.04
  • Anaconda3 with python=3.6
  • Pytorch=1.9.0
  • mmdetection=2.16.0+fe46ffe
  • mmcv=1.3.9
  • wandb=0.10.31

Notes

  • We use wandb for visualization. If you don't want to use it, comment out lines 273-284 in configs/soft_teacher/base.py.
  • The project should be compatible with the latest version of mmdetection. If you want to switch to the same mmdetection version as ours, run cd thirdparty/mmdetection && git checkout v2.16.0

Installation

make install

Data Preparation

  • Download the COCO dataset
  • Execute the following command to generate data set splits:
# YOUR_DATA should be a directory that contains the COCO dataset.
# For example:
# YOUR_DATA/
#  coco/
#     train2017/
#     val2017/
#     unlabeled2017/
#     annotations/
ln -s ${YOUR_DATA} data
bash tools/dataset/prepare_coco_data.sh conduct

For concrete instructions on what should be downloaded, please refer to lines 11-24 of tools/dataset/prepare_coco_data.sh.
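Before running the preparation script, it can help to confirm that the layout described in the comment above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_coco_subdirs` is hypothetical, and it only checks the four sub-directories listed in the layout comment.

```python
from pathlib import Path

# Sub-directories listed in the layout comment above.
EXPECTED_SUBDIRS = ("train2017", "val2017", "unlabeled2017", "annotations")


def missing_coco_subdirs(data_root="data"):
    """Return the expected COCO sub-directories that are absent under data_root/coco."""
    coco_root = Path(data_root) / "coco"
    return [name for name in EXPECTED_SUBDIRS if not (coco_root / name).is_dir()]


if __name__ == "__main__":
    # "data" is the symlink created by `ln -s ${YOUR_DATA} data`.
    missing = missing_coco_subdirs("data")
    print("missing:", missing if missing else "nothing")
```

An empty list means all four directories exist; it says nothing about whether the files inside them are complete.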

Training

  • To train a model in the partially labeled data setting:
# JOB_TYPE: 'baseline' or 'semi'; selects which kind of job to run
# PERCENT_LABELED_DATA: 1, 5 or 10; the percentage of labeled COCO data in the whole training set
# GPU_NUM: number of GPUs used to run the job
for FOLD in 1 2 3 4 5;
do
  bash tools/dist_train_partially.sh <JOB_TYPE> ${FOLD} <PERCENT_LABELED_DATA> <GPU_NUM>
done

For example, the following script trains our model on 10% labeled data with 8 GPUs:

for FOLD in 1 2 3 4 5;
do
  bash tools/dist_train_partially.sh semi ${FOLD} 10 8
done
  • To train a model in the fully labeled data setting:
bash tools/dist_train.sh <CONFIG_FILE_PATH> <NUM_GPUS>

For example, to train our R50 model with 8 GPUs:

bash tools/dist_train.sh configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py 8
  • To train a model on a new dataset:

The core idea is to convert the new dataset to COCO format. Details can be found in the "adding new dataset" documentation.

Evaluation

bash tools/dist_test.sh <CONFIG_FILE_PATH> <CHECKPOINT_PATH> <NUM_GPUS> --eval bbox --cfg-options model.test_cfg.rcnn.score_thr=<THR>
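The `--cfg-options` argument addresses a nested config entry by a dotted key. The sketch below illustrates that idea on a plain Python dict; it is an illustration under assumptions, not the merging code used by mmcv. The function name `apply_dotted_override` and the `max_per_img` entry are included only to make the example concrete.

```python
def apply_dotted_override(cfg, dotted_key, value):
    """Set cfg[a][b][c] = value for a dotted key 'a.b.c', creating missing levels."""
    node = cfg
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        # Walk down one level, creating an empty dict if the level does not exist.
        node = node.setdefault(name, {})
    node[leaf] = value
    return cfg


# Illustrative config fragment; only score_thr is named in the README.
cfg = {"model": {"test_cfg": {"rcnn": {"score_thr": 0.05, "max_per_img": 100}}}}
apply_dotted_override(cfg, "model.test_cfg.rcnn.score_thr", 1e-3)
print(cfg["model"]["test_cfg"]["rcnn"])
```

Only the addressed leaf changes; sibling entries under the same parent are left untouched.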

Inference

To run inference with a trained model and visualize the detection results:

# [IMAGE_FILE_PATH]: the path of your image file in the local file system
# [CONFIG_FILE]: the path of a config file
# [CHECKPOINT_PATH]: the path of a trained model that matches the provided config file
# [OUTPUT_PATH]: the directory in which to save the detection results
python demo/image_demo.py [IMAGE_FILE_PATH] [CONFIG_FILE] [CHECKPOINT_PATH] --output [OUTPUT_PATH]

For example:

  • Inference on a single image with the provided R50 model:
python demo/image_demo.py /tmp/tmp.png configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py work_dirs/downloaded.model --output work_dirs/

After the program completes, an image with the same name as the input will be saved to work_dirs.

  • Inference on many images with the provided R50 model:
python demo/image_demo.py '/tmp/*.jpg' configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py work_dirs/downloaded.model --output work_dirs/

[1] A Simple Semi-Supervised Learning Framework for Object Detection

[2] Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

softteacher's People

Contributors

mendelxu, microsoft-github-policy-service[bot], microsoftopensource, psvnlsaikumar, stupidzz


softteacher's Issues

Support for single GPU training

I tried to train on a single GPU (using train.py on a small custom dataset), but there is no sampler called 'DistributedSemiBalanceSampler', which caused an error.
Since the source code is written for distributed multi-GPU training, I guess that when I use only one GPU, the sampler name becomes
'DistributedSemiBalanceSampler' instead of 'DistributedGroupSemiBalanceSampler', which is the one built in your code.
I see that the variable 'group' becomes False in my case.
Can I slightly modify 'DistributedGroupSemiBalanceSampler' to create a correct 'DistributedSemiBalanceSampler'?
If so, how? Thanks for your help!
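The names quoted in this issue are consistent with a builder that prefixes the configured sampler type depending on two flags. The sketch below shows that naming scheme only; it is an assumption inferred from the names in the issue, not the repository's code, and `resolve_sampler_name` is a hypothetical helper.

```python
def resolve_sampler_name(base_type, dist, group):
    """Compose a sampler class name from a base type and two flags (assumed scheme)."""
    name = base_type
    if group:
        name = "Group" + name
    if dist:
        name = "Distributed" + name
    return name


# With group=True the composed name matches the sampler the issue says exists.
print(resolve_sampler_name("SemiBalanceSampler", dist=True, group=True))
# With group=False the composed name is the one reported as missing.
print(resolve_sampler_name("SemiBalanceSampler", dist=True, group=False))
```

Under this assumption, a lookup fails whenever the composed name has no registered class, which is what the error message in the issue suggests.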

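Soft teacher loss differs between the paper and the code

    # Background score (last column) of every box in _scores.
    bg_score = torch.cat([_score[:, -1] for _score in _scores])
    assigned_label, _, _, _ = bbox_targets
    # Boxes whose assigned label equals num_classes are treated as background.
    neg_inds = assigned_label == self.student.roi_head.bbox_head.num_classes
    # Replace the label weight of each background box with its background score.
    bbox_targets[1][neg_inds] = bg_score[neg_inds].detach()
    loss = self.student.roi_head.bbox_head.loss(
        bbox_results["cls_score"],
        bbox_results["bbox_pred"],
        rois,
        *bbox_targets,
        reduction_override="none",
    )
    # Normalize the summed classification loss by the total label weight.
    loss["loss_cls"] = loss["loss_cls"].sum() / max(bbox_targets[1].sum(), 1.0)

The soft teacher loss in the code differs from the one in the paper.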

image
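The difference raised here can be made concrete with a small numeric sketch in plain Python. The first function mirrors the normalization in the snippet above, assuming foreground boxes carry a label weight of 1.0. The second follows one reading of the paper's formulation, in which the foreground term is averaged over foreground boxes and the background weights are the background scores normalized to sum to one. Both function names and the sample numbers are illustrative, and the paper should be consulted for the exact definition.

```python
def code_style_cls_loss(per_box_loss, is_background, bg_score):
    """Weighted sum of per-box losses divided by the total weight (as in the snippet)."""
    # Assumption: foreground weight is 1.0; background weight is the background score.
    weights = [s if bg else 1.0 for bg, s in zip(is_background, bg_score)]
    weighted = sum(w * l for w, l in zip(weights, per_box_loss))
    return weighted / max(sum(weights), 1.0)


def paper_style_cls_loss(per_box_loss, is_background, bg_score):
    """Mean foreground loss plus background loss weighted by normalized scores."""
    fg = [l for l, bg in zip(per_box_loss, is_background) if not bg]
    bg = [(l, s) for l, b, s in zip(per_box_loss, is_background, bg_score) if b]
    fg_term = sum(fg) / max(len(fg), 1)
    total_score = sum(s for _, s in bg)
    bg_term = sum(l * s for l, s in bg) / total_score if total_score > 0 else 0.0
    return fg_term + bg_term


losses = [1.0, 2.0, 4.0]         # one foreground box, two background boxes
background = [False, True, True]
scores = [0.0, 0.5, 0.5]         # background scores (unused for the foreground box)
print(code_style_cls_loss(losses, background, scores))   # 2.0
print(paper_style_cls_loss(losses, background, scores))  # 4.0
```

On this toy input the two forms give different values because the code uses a single shared normalizer while the second form normalizes foreground and background separately.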

assert len(indices) == len(self), f"{indices} not equal {len(self)} while offset is: {offset}"

When I use my custom data, the following assertion fails: "assert len(indices) == len(self), f"{indices} not equal {len(self)} while offset is: {offset}""
I then printed the lengths: len(indices) is 29336, offset is 0, and len(self) is 58640.

The detailed error is below. Please help me.

Traceback (most recent call last):
  File "tools/train.py", line 198, in <module>
    main()
  File "tools/train.py", line 193, in main
    meta=meta,
  File "/home/swap/project/SoftTeacher/ssod/apis/train.py", line 205, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 117, in run
    iter_loaders = [IterLoader(x) for x in data_loaders]
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 117, in <listcomp>
    iter_loaders = [IterLoader(x) for x in data_loaders]
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 23, in __init__
    self.iter_loader = iter(self._dataloader)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 764, in __init__
    self._try_put_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 994, in _try_put_index
    index = self._next_index()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 208, in __iter__
    for idx in self.sampler:
  File "/home/swap/project/SoftTeacher/ssod/datasets/samplers/semi_sampler.py", line 186, in __iter__
    assert len(indices) == len(self), f"{indices} not equal {len(self)} while offset is: {offset}"

Using a pretrained model for the student at the beginning of training

I want to use a pretrained model (with not only the backbone but also the neck and head pretrained, using only the labeled data) for my training process. How can I do it?
It might seem strange, but that's what I want to try. I found it hard to do with the mmdetection-based code.
Thanks for your help!
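One way to think about this request is as a key-renaming problem: weights saved from a plain detector need names that match a wrapper holding two copies of it. The sketch below works on an ordinary dict and assumes the wrapper exposes its detectors under the attribute names `student` and `teacher` (the code quoted elsewhere on this page uses `self.student`; `teacher` is assumed by analogy). The helper name is hypothetical, and how the result would be loaded is left open.

```python
def duplicate_for_teacher_and_student(detector_state):
    """Copy every detector weight under both a 'teacher.' and a 'student.' prefix."""
    wrapped = {}
    for key, value in detector_state.items():
        wrapped["teacher." + key] = value
        wrapped["student." + key] = value
    return wrapped


# Toy stand-in for a detector state dict; real values would be tensors.
plain = {"backbone.conv1.weight": 0.1, "roi_head.bbox_head.fc_cls.weight": 0.2}
print(sorted(duplicate_for_teacher_and_student(plain)))
```

Whether both copies should start from the pretrained weights is a design choice; initializing only the student is an equally simple variant of the same renaming.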

HTC++ code

Thank you for your excellent work. Can you provide the HTC++ code of Soft Teacher?

KeyError: 'SemiBalanceSampler is not in the sampler registry'

File: SoftTeacher/configs/soft_teacher/base.py
data -> sampler -> train -> type="SemiBalanceSampler"
I couldn't find a SemiBalanceSampler anywhere. Not in your repo, mmcv or mmdet.
Is this a mistake? What Sampler should I try instead?

Thanks
Josh

Could you release some log files?

Hi, I'm excited about your work and I want to use this model on a custom dataset.

Could you release some training log files? It would be a great help for checking how the loss evolves in the early stages of training and whether the code I wrote works correctly.

Thank you,
Jihwan Eom

Feasible to train on fewer GPUs or a single GPU?

Hi,

in the paper it is stated that you used 8 GPUs for training. Some questions regarding that:

  • Can you specify what kind of GPUs you used (name, amount of GPU memory)?
  • How long did the training take with your 8 GPUs?
  • Is it necessary to train with this many GPUs, or would the training time with the same settings otherwise be way too long?

Best Karol

Data ratio at the end of the training

image

Hello

I read your paper and found out you have reduced the data sampling ratio to zero (no labeled images at the end of the training?) for the last 20k iterations.

  1. Is there any reason why you did this?
  2. How much does it contribute to the performance of the Fully Labeled Data setting? What is the performance using the constant data ratio?
  3. Where is this implemented in the code? I cannot find it.

Thank you.

Learning curve for Full Labeled Data experiment

Hi,

I find it hard to fit the Full Labeled Data experiment onto 8 2080 Ti GPUs. Could you provide the training logs, including the mAP, training losses, and other metrics over the training iterations?

Thanks.

Error while trying to train with 4 gpus

Congratulations on the great work. I am getting this error while trying to train with 4 GPUs. Can you please help me out?

File "/data/SoftTeacher/tools/train.py", line 198, in <module>
    main()
  File "/data/SoftTeacher/tools/train.py", line 186, in main
    train_detector(
  File "/data/SoftTeacher/ssod/apis/train.py", line 206, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/detectors/base.py", line 238, in train_step
    losses = self(**data)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/data/SoftTeacher/ssod/models/soft_teacher.py", line 44, in forward_train
    sup_loss = self.student.forward_train(**data_groups["sup"])
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/detectors/two_stage.py", line 135, in forward_train
    rpn_losses, proposal_list = self.rpn_head.forward_train(
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 59, in forward_train
    proposal_list = self.get_bboxes(*outs, img_metas, cfg=proposal_cfg)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 152, in get_bboxes
    proposals = self._get_bboxes_single(cls_score_list, bbox_pred_list,
  File "/data/SoftTeacher/thirdparty/mmdetection/mmdet/models/dense_heads/rpn_head.py", line 244, in _get_bboxes_single
    dets, keep = batched_nms(proposals, scores, ids, cfg.nms)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/ops/nms.py", line 307, in batched_nms
    dets, keep = nms_op(boxes_for_nms, scores, **nms_cfg_)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/utils/misc.py", line 330, in new_func
    output = old_func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/ops/nms.py", line 171, in nms
    inds = NMSop.apply(boxes, scores, iou_threshold, offset,
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/mmcv/ops/nms.py", line 26, in forward
    inds = ext_module.nms(
RuntimeError: CUDA error: invalid device function
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/data/SoftTeacher/thirdparty/mmdetection/mmdet/core/anchor/anchor_generator.py:324: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/data/SoftTeacher/thirdparty/mmdetection/mmdet/core/anchor/anchor_generator.py:360: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(

wandb: Waiting for W&B process to finish, PID 38162
wandb: Program failed with code 1. 
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 1 (pid: 37922) of binary: /home/ubuntu/anaconda3/envs/py39/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
    elastic_launch(
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/anaconda3/envs/py39/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
**************************************************
              tools/train.py FAILED               
==================================================
Root Cause:
[0]:
  time: 2021-09-29_15:26:15
  rank: 1 (local_rank: 1)
  exitcode: -11 (pid: 37922)
  error_file: <N/A>
  msg: "Signal 11 (SIGSEGV) received by PID 37922"
==================================================
Other Failures:
[1]:
  time: 2021-09-29_15:26:15
  rank: 3 (local_rank: 3)
  exitcode: -11 (pid: 37924)
  error_file: <N/A>
  msg: "Signal 11 (SIGSEGV) received by PID 37924"
**************************************************

how to prepare dataset?

Thanks for your excellent work. When I was preparing the data, I encountered a problem.
I downloaded the COCO dataset according to the instructions of STAC (https://github.com/google-research/ssl_detection),
but the file image_info_unlabeled2017.json is missing when I run your code (bash tools/dataset/prepare_coco_data.sh conduct).
Can you help me fix it?

The R-squared between IoU and bbox variance in the refine(jitter(bbox)) method

The scatter plot in your paper showing the relationship between IoU and the bbox variance (after jittering) is really interesting and shows a strong correlation. For that reason, I want to try another method for estimating bbox quality on a single-stage detector under your Soft Teacher architecture. I simply want to know what R-squared you achieved with Soft Teacher and Faster R-CNN + FPN on the COCO 1% labeled dataset, since I may want to make a comparison in my future projects. Of course, if I come up with a method under your architecture, I will gratefully acknowledge your work in my paper or project report. Sincere thanks for your help!

Error in training

Error in full training:

tools/train.py FAILED

Root Cause:
[0]:
time: 2021-10-04_17:02:18
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 921)
error_file: <N/A>
msg: "Process failed with exitcode 1"

Other Failures:
<NO_OTHER_FAILURES>

I am using only one GPU and get an error in full training with my own data converted to COCO format.

First, I split the data with "bash tools/dataset/prepare_coco_data.sh conduct", then trained with "bash tools/dist_train.sh configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py 1".

I also trained with the COCO data as described in the README and still get errors, in both full and semi training.
It gets stuck at:
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_mhoa0vu3/none_y1enyj09/attempt_0/0/error.json

Question about EMA

I don't understand where the EMA update happens.

After computing the sup + unsup loss, the SoftTeacher model just returns the loss.

Where does the EMA update happen?
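For reference, an exponential moving average update of teacher weights from student weights has the following generic form. This is a sketch of the technique on plain floats, not the repository's code, and the function name is hypothetical. In mmcv-based projects such per-iteration updates are often performed in a training hook rather than in the model's forward pass; whether that applies here should be checked in the repository.

```python
def ema_update(teacher, student, momentum):
    """Return new teacher parameters: momentum * teacher + (1 - momentum) * student."""
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


teacher = {"w": 0.0}
student = {"w": 1.0}
for _ in range(3):
    # A momentum of 0.5 is used only to make the numbers easy to follow.
    teacher = ema_update(teacher, student, momentum=0.5)
print(teacher)  # {'w': 0.875}
```

With a momentum close to 1 (the training log on this page reports ema_momentum: 0.9800), the teacher moves toward the student much more slowly than in this toy run.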

loading weights issue

When I run the code, I see logs such as "unexpected key in source state_dict: conv1.bias, layer1.0.conv1.bias.......". All of the unexpected keys are biases. I don't know whether this will affect performance.

Some questions about Soft-Teacher method

Hi, I'm trying to implement this training method without mmdetection; I just want to train with my own Faster R-CNN code.
I have read the paper but didn't clearly understand all of the method, so I need some help.

Q1: During unsupervised learning, is the image already transformed by the weak and strong augmentations? I mean, are the transformed images the inputs to the student and teacher models?

Q2: If they are already transformed, the bbox coordinates will differ between the weakly augmented image and the strongly augmented image whenever a flip or rotation is applied. How is the loss calculated then?

Q3: I have no idea what is in the "transform_matrix" in img_meta. What does transform_matrix contain?
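One way to picture Q2 and Q3: if each view records its geometric augmentations (resize, flip, and so on) as a single 3x3 homogeneous matrix, boxes predicted in one view can be mapped into the other view by matrix multiplication before the loss is computed. The sketch below assumes that this is what `transform_matrix` holds; it is not taken from the repository, and `apply_matrix_to_box` and `hflip_matrix` are illustrative names.

```python
def apply_matrix_to_box(box, matrix):
    """Map an axis-aligned box (x1, y1, x2, y2) through a 3x3 homogeneous matrix.

    All four corners are transformed and the enclosing axis-aligned box is returned.
    """
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    xs, ys = [], []
    for x, y in corners:
        w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
        xs.append((matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]) / w)
        ys.append((matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]) / w)
    return (min(xs), min(ys), max(xs), max(ys))


def hflip_matrix(width):
    """Homogeneous matrix of a horizontal flip for an image of the given width."""
    return [[-1.0, 0.0, float(width)], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


# A box in an unflipped 100-pixel-wide view, mapped into the flipped view.
flipped = apply_matrix_to_box((10.0, 20.0, 50.0, 80.0), hflip_matrix(100))
```

Under this assumption, mapping a teacher box into the student view would use the student's matrix multiplied by the inverse of the teacher's matrix.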

Swin transformer based codes and configs

Hi there,
thanks for the nice work and congrats on getting accepted at ICCV2021.
I was wondering if you are planning to release the Swin Transformer code, especially for the instance segmentation model?

Best,

Query regarding some equations of the paper in the code

Thank you very much for this excellent work. I am trying to understand the code properly. Can you please tell me where I can find the code that calculates w_j in the classification loss equation using the reliability score?
Screenshot (74)
Screenshot (77)

Can you please also tell me where I can find the code that calculates the box regression variance and the standard deviation used to compute it?

Screenshot (76)
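As a rough illustration of the box regression variance asked about here: the idea is to jitter a pseudo box several times, refine each jittered copy, and measure how much the refined coordinates spread, normalized by the box size. The sketch below is written under that assumption and is not the repository's implementation. `jitter_box`, `box_uncertainty`, and the `refine` callback are hypothetical, the use of a population standard deviation is a simplification, and the defaults `times=10` and `scale=0.06` are borrowed from `jitter_times` and `jitter_scale` in the config dump on this page.

```python
import random
import statistics


def jitter_box(box, scale, rng):
    """Shift each coordinate by Gaussian noise proportional to the box size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    sizes = (w, h, w, h)
    return tuple(c + rng.gauss(0.0, scale) * s for c, s in zip(box, sizes))


def box_uncertainty(box, refine, times=10, scale=0.06, seed=0):
    """Mean per-coordinate std of refined jittered boxes, normalized by box size."""
    rng = random.Random(seed)
    refined = [refine(jitter_box(box, scale, rng)) for _ in range(times)]
    x1, y1, x2, y2 = box
    norm = 0.5 * ((x2 - x1) + (y2 - y1))
    # zip(*refined) groups the four coordinates across all refined boxes.
    stds = [statistics.pstdev(coords) for coords in zip(*refined)]
    return sum(stds) / (4.0 * norm)


# A perfectly stable (hypothetical) refiner always returns the same box,
# so its uncertainty is zero; an identity refiner passes the jitter through.
box = (10.0, 10.0, 50.0, 90.0)
stable = box_uncertainty(box, lambda b: box)
noisy = box_uncertainty(box, lambda b: b)
```

A box would then be kept as a regression pseudo label only when this value falls below a threshold such as the `reg_pseudo_threshold` entry in the config.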

argument and parameter in builder.py do not match

sampler_cfg, default_sampler_cfg are transferred to build_sampler as an argument as follows.

sampler = build_sampler(sampler_cfg, default_sampler_cfg) if shuffle else None

but build_sampler has cfg, dist, group, default_args.

def build_sampler(cfg, dist=False, group=False, default_args=None):

therefore...
cfg, dist = sampler_cfg, default_sampler_cfg
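The mismatch can be reproduced with a stand-in function that has the same signature as the one quoted above. This is a minimal sketch: the body of `build_sampler` here is a placeholder that only echoes its arguments, and the two config dicts are made-up values.

```python
def build_sampler(cfg, dist=False, group=False, default_args=None):
    """Placeholder with the quoted signature; returns how the arguments were bound."""
    return {"cfg": cfg, "dist": dist, "group": group, "default_args": default_args}


sampler_cfg = {"type": "SemiBalanceSampler"}
default_sampler_cfg = {"samples_per_gpu": 8}

# Positional call, as in the quoted line: the second dict lands in `dist`.
positional = build_sampler(sampler_cfg, default_sampler_cfg)

# Keyword call: the defaults reach `default_args` and `dist` keeps its default.
keyword = build_sampler(sampler_cfg, default_args=default_sampler_cfg)
```

Passing the second argument by keyword is one possible fix; whether `dist` and `group` should also be set explicitly depends on how the caller is meant to be used.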

As the training goes on, sup_acc decreases. Is this normal?

As training goes on, sup_acc decreases. I reduced the learning rate, but this still happens.
For example, sup_acc increases from 0 to 96, then finally decreases to 90. I assume sup_acc is the training accuracy on the supervised branch; in supervised learning it should reach 99.99, but in semi-supervised learning here it is only 90. Is this normal?

Below is current training log:
2021-09-17 05:31:05,999 - mmdet.ssod - INFO - Iter [50/14400] lr: 7.828e-07, eta: 10:59:29, time: 2.757, data_time: 0.266, memory: 25005, ema_momentum: 0.9800, unsup_weight: 4, sup_loss_rpn_cls: 0.6633, sup_loss_rpn_bbox: 0.3303, sup_loss_cls: 1.8455, sup_acc: 53.1849, sup_loss_bbox: 0.1501, unsup_loss_rpn_cls: 2.6062, unsup_loss_rpn_bbox: 1.7301, unsup_loss_cls: 6.1416, unsup_acc: 52.6421, unsup_loss_bbox: 6.4742, loss: 19.9412
2021-09-17 05:33:10,237 - mmdet.ssod - INFO - Iter [100/14400] lr: 1.237e-06, eta: 10:24:42, time: 2.485, data_time: 0.095, memory: 25005, ema_momentum: 0.9900, unsup_weight: 4, sup_loss_rpn_cls: 0.6543, sup_loss_rpn_bbox: 0.3205, sup_loss_cls: 0.3482, sup_acc: 94.9533, sup_loss_bbox: 0.1464, unsup_loss_rpn_cls: 2.6226, unsup_loss_rpn_bbox: 0.6446, unsup_loss_cls: 1.0436, unsup_acc: 94.7202, unsup_loss_bbox: 3.6727, loss: 9.4529
2021-09-17 05:35:11,839 - mmdet.ssod - INFO - Iter [150/14400] lr: 1.954e-06, eta: 10:07:31, time: 2.432, data_time: 0.094, memory: 25005, ema_momentum: 0.9933, unsup_weight: 4, sup_loss_rpn_cls: 0.6318, sup_loss_rpn_bbox: 0.3099, sup_loss_cls: 0.3127, sup_acc: 96.4278, sup_loss_bbox: 0.1446, unsup_loss_rpn_cls: 2.5320, unsup_loss_rpn_bbox: 0.2281, unsup_loss_cls: 0.4462, unsup_acc: 97.9240, unsup_loss_bbox: 1.4821, loss: 6.0875
2021-09-17 05:37:15,827 - mmdet.ssod - INFO - Iter [200/14400] lr: 3.087e-06, eta: 10:00:46, time: 2.480, data_time: 0.093, memory: 25550, ema_momentum: 0.9950, unsup_weight: 4, sup_loss_rpn_cls: 0.5977, sup_loss_rpn_bbox: 0.3097, sup_loss_cls: 0.3398, sup_acc: 96.8026, sup_loss_bbox: 0.1584, unsup_loss_rpn_cls: 2.3346, unsup_loss_rpn_bbox: 0.1046, unsup_loss_cls: 0.2784, unsup_acc: 98.9751, unsup_loss_bbox: 0.7072, loss: 4.8303
2021-09-17 05:39:19,786 - mmdet.ssod - INFO - Iter [250/14400] lr: 4.877e-06, eta: 9:55:51, time: 2.479, data_time: 0.094, memory: 25550, ema_momentum: 0.9960, unsup_weight: 4, sup_loss_rpn_cls: 0.5481, sup_loss_rpn_bbox: 0.3087, sup_loss_cls: 0.4070, sup_acc: 96.1583, sup_loss_bbox: 0.1731, unsup_loss_rpn_cls: 2.0273, unsup_loss_rpn_bbox: 0.0322, unsup_loss_cls: 0.1416, unsup_acc: 99.3099, unsup_loss_bbox: 0.2865, loss: 3.9245
2021-09-17 05:41:26,675 - mmdet.ssod - INFO - Iter [300/14400] lr: 7.705e-06, eta: 9:54:11, time: 2.538, data_time: 0.094, memory: 25550, ema_momentum: 0.9967, unsup_weight: 4, sup_loss_rpn_cls: 0.4768, sup_loss_rpn_bbox: 0.2862, sup_loss_cls: 0.4715, sup_acc: 95.4132, sup_loss_bbox: 0.1895, unsup_loss_rpn_cls: 1.5952, unsup_loss_rpn_bbox: 0.0332, unsup_loss_cls: 0.1408, unsup_acc: 99.2302, unsup_loss_bbox: 0.2120, loss: 3.4052
2021-09-17 05:43:31,341 - mmdet.ssod - INFO - Iter [350/14400] lr: 1.217e-05, eta: 9:50:54, time: 2.493, data_time: 0.097, memory: 25550, ema_momentum: 0.9971, unsup_weight: 4, sup_loss_rpn_cls: 0.3821, sup_loss_rpn_bbox: 0.2393, sup_loss_cls: 0.4525, sup_acc: 95.4919, sup_loss_bbox: 0.2048, unsup_loss_rpn_cls: 1.0449, unsup_loss_rpn_bbox: 0.0105, unsup_loss_cls: 0.0925, unsup_acc: 99.6831, unsup_loss_bbox: 0.1227, loss: 2.5492
2021-09-17 05:45:35,516 - mmdet.ssod - INFO - Iter [400/14400] lr: 1.923e-05, eta: 9:47:38, time: 2.483, data_time: 0.096, memory: 25550, ema_momentum: 0.9975, unsup_weight: 4, sup_loss_rpn_cls: 0.3323, sup_loss_rpn_bbox: 0.2170, sup_loss_cls: 0.3843, sup_acc: 94.6693, sup_loss_bbox: 0.2592, unsup_loss_rpn_cls: 0.5614, unsup_loss_rpn_bbox: 0.0019, unsup_loss_cls: 0.0643, unsup_acc: 99.9102, unsup_loss_bbox: 0.0942, loss: 1.9144
2021-09-17 05:47:41,790 - mmdet.ssod - INFO - Iter [450/14400] lr: 3.038e-05, eta: 9:45:43, time: 2.525, data_time: 0.101, memory: 25550, ema_momentum: 0.9978, unsup_weight: 4, sup_loss_rpn_cls: 0.3164, sup_loss_rpn_bbox: 0.2000, sup_loss_cls: 0.3019, sup_acc: 94.1454, sup_loss_bbox: 0.2831, unsup_loss_rpn_cls: 0.3070, unsup_loss_rpn_bbox: 0.0008, unsup_loss_cls: 0.0534, unsup_acc: 99.9551, unsup_loss_bbox: 0.0819, loss: 1.5445
2021-09-17 05:49:45,712 - mmdet.ssod - INFO - Iter [500/14400] lr: 4.799e-05, eta: 9:42:40, time: 2.479, data_time: 0.098, memory: 25550, ema_momentum: 0.9980, unsup_weight: 4, sup_loss_rpn_cls: 0.2811, sup_loss_rpn_bbox: 0.1882, sup_loss_cls: 0.2738, sup_acc: 93.7461, sup_loss_bbox: 0.2917, unsup_loss_rpn_cls: 0.2128, unsup_loss_rpn_bbox: 0.0009, unsup_loss_cls: 0.0527, unsup_acc: 99.9358, unsup_loss_bbox: 0.0587, loss: 1.3600
2021-09-17 05:51:49,614 - mmdet.ssod - INFO - Iter [550/14400] lr: 7.582e-05, eta: 9:39:48, time: 2.478, data_time: 0.096, memory: 25550, ema_momentum: 0.9982, unsup_weight: 4, sup_loss_rpn_cls: 0.2607, sup_loss_rpn_bbox: 0.1927, sup_loss_cls: 0.2887, sup_acc: 92.5063, sup_loss_bbox: 0.3307, unsup_loss_rpn_cls: 0.1707, unsup_loss_rpn_bbox: 0.0006, unsup_loss_cls: 0.0539, unsup_acc: 99.9426, unsup_loss_bbox: 0.0351, loss: 1.3331
2021-09-17 05:53:56,824 - mmdet.ssod - INFO - Iter [600/14400] lr: 1.198e-04, eta: 9:38:19, time: 2.544, data_time: 0.096, memory: 25550, ema_momentum: 0.9983, unsup_weight: 4, sup_loss_rpn_cls: 0.2264, sup_loss_rpn_bbox: 0.1813, sup_loss_cls: 0.2817, sup_acc: 92.0654, sup_loss_bbox: 0.3322, unsup_loss_rpn_cls: 0.1481, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0487, unsup_acc: 99.9492, unsup_loss_bbox: 0.0178, loss: 1.2362
2021-09-17 05:56:04,359 - mmdet.ssod - INFO - Iter [650/14400] lr: 1.892e-04, eta: 9:36:52, time: 2.551, data_time: 0.099, memory: 25771, ema_momentum: 0.9985, unsup_weight: 4, sup_loss_rpn_cls: 0.2320, sup_loss_rpn_bbox: 0.1866, sup_loss_cls: 0.2628, sup_acc: 91.9623, sup_loss_bbox: 0.3182, unsup_loss_rpn_cls: 0.1389, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0415, unsup_acc: 99.9572, unsup_loss_bbox: 0.0042, loss: 1.1842
2021-09-17 05:58:10,804 - mmdet.ssod - INFO - Iter [700/14400] lr: 2.989e-04, eta: 9:34:57, time: 2.529, data_time: 0.100, memory: 25771, ema_momentum: 0.9986, unsup_weight: 4, sup_loss_rpn_cls: 0.2091, sup_loss_rpn_bbox: 0.1693, sup_loss_cls: 0.2708, sup_acc: 91.4751, sup_loss_bbox: 0.3258, unsup_loss_rpn_cls: 0.1178, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0388, unsup_acc: 99.9464, unsup_loss_bbox: 0.0028, loss: 1.1344
2021-09-17 06:00:21,075 - mmdet.ssod - INFO - Iter [750/14400] lr: 4.722e-04, eta: 9:34:11, time: 2.605, data_time: 0.098, memory: 25771, ema_momentum: 0.9987, unsup_weight: 4, sup_loss_rpn_cls: 0.1575, sup_loss_rpn_bbox: 0.1760, sup_loss_cls: 0.3080, sup_acc: 90.4690, sup_loss_bbox: 0.3467, unsup_loss_rpn_cls: 0.0788, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0440, unsup_acc: 99.9194, unsup_loss_bbox: 0.0026, loss: 1.1136
2021-09-17 06:02:27,567 - mmdet.ssod - INFO - Iter [800/14400] lr: 7.459e-04, eta: 9:32:10, time: 2.530, data_time: 0.096, memory: 25990, ema_momentum: 0.9988, unsup_weight: 4, sup_loss_rpn_cls: 0.1327, sup_loss_rpn_bbox: 0.1582, sup_loss_cls: 0.3051, sup_acc: 91.1122, sup_loss_bbox: 0.3199, unsup_loss_rpn_cls: 0.0556, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0405, unsup_acc: 99.9432, unsup_loss_bbox: 0.0052, loss: 1.0170
2021-09-17 06:04:36,759 - mmdet.ssod - INFO - Iter [850/14400] lr: 1.178e-03, eta: 9:30:51, time: 2.584, data_time: 0.101, memory: 25990, ema_momentum: 0.9988, unsup_weight: 4, sup_loss_rpn_cls: 0.1169, sup_loss_rpn_bbox: 0.1577, sup_loss_cls: 0.3170, sup_acc: 90.6921, sup_loss_bbox: 0.3281, unsup_loss_rpn_cls: 0.0559, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0476, unsup_acc: 99.9694, unsup_loss_bbox: 0.0024, loss: 1.0256
2021-09-17 06:06:44,521 - mmdet.ssod - INFO - Iter [900/14400] lr: 1.861e-03, eta: 9:29:05, time: 2.555, data_time: 0.096, memory: 25990, ema_momentum: 0.9989, unsup_weight: 4, sup_loss_rpn_cls: 0.1147, sup_loss_rpn_bbox: 0.1601, sup_loss_cls: 0.3094, sup_acc: 90.1847, sup_loss_bbox: 0.3461, unsup_loss_rpn_cls: 0.0394, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0446, unsup_acc: 99.9753, unsup_loss_bbox: 0.0029, loss: 1.0173
2021-09-17 06:08:52,724 - mmdet.ssod - INFO - Iter [950/14400] lr: 2.940e-03, eta: 9:27:23, time: 2.564, data_time: 0.098, memory: 25990, ema_momentum: 0.9989, unsup_weight: 4, sup_loss_rpn_cls: 0.1129, sup_loss_rpn_bbox: 0.1692, sup_loss_cls: 0.3062, sup_acc: 90.2303, sup_loss_bbox: 0.3335, unsup_loss_rpn_cls: 0.0469, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0439, unsup_acc: 99.9579, unsup_loss_bbox: 0.0050, loss: 1.0174
2021-09-17 06:11:00,656 - mmdet.ssod - INFO - Saving checkpoint at 1000 iterations
2021-09-17 06:11:02,006 - mmdet.ssod - INFO - Exp name: soft_teacher_faster_rcnn_r50_caffe_fpn_custom_full.py
2021-09-17 06:11:02,006 - mmdet.ssod - INFO - Iter [1000/14400] lr: 4.644e-03, eta: 9:25:53, time: 2.586, data_time: 0.103, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.1149, sup_loss_rpn_bbox: 0.1767, sup_loss_cls: 0.3226, sup_acc: 89.3570, sup_loss_bbox: 0.3590, unsup_loss_rpn_cls: 0.0448, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0449, unsup_acc: 99.9805, unsup_loss_bbox: 0.0049, loss: 1.0678
2021-09-17 06:13:09,327 - mmdet.ssod - INFO - Iter [1050/14400] lr: 4.671e-03, eta: 9:23:55, time: 2.546, data_time: 0.098, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0947, sup_loss_rpn_bbox: 0.1611, sup_loss_cls: 0.3244, sup_acc: 89.1538, sup_loss_bbox: 0.3506, unsup_loss_rpn_cls: 0.0365, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0456, unsup_acc: 99.9851, unsup_loss_bbox: 0.0125, loss: 1.0256
2021-09-17 06:15:15,928 - mmdet.ssod - INFO - Iter [1100/14400] lr: 4.655e-03, eta: 9:21:46, time: 2.532, data_time: 0.095, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0900, sup_loss_rpn_bbox: 0.1601, sup_loss_cls: 0.3162, sup_acc: 88.7713, sup_loss_bbox: 0.3548, unsup_loss_rpn_cls: 0.0318, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0414, unsup_acc: 99.9675, unsup_loss_bbox: 0.0048, loss: 0.9990
2021-09-17 06:17:22,186 - mmdet.ssod - INFO - Iter [1150/14400] lr: 4.640e-03, eta: 9:19:34, time: 2.525, data_time: 0.097, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0999, sup_loss_rpn_bbox: 0.1507, sup_loss_cls: 0.3243, sup_acc: 88.5658, sup_loss_bbox: 0.3543, unsup_loss_rpn_cls: 0.0462, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0457, unsup_acc: 99.9852, unsup_loss_bbox: 0.0050, loss: 1.0261
2021-09-17 06:19:31,772 - mmdet.ssod - INFO - Iter [1200/14400] lr: 4.624e-03, eta: 9:17:59, time: 2.592, data_time: 0.096, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0969, sup_loss_rpn_bbox: 0.1458, sup_loss_cls: 0.3263, sup_acc: 88.5438, sup_loss_bbox: 0.3582, unsup_loss_rpn_cls: 0.0503, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0507, unsup_acc: 99.9736, unsup_loss_bbox: 0.0132, loss: 1.0413
2021-09-17 06:21:41,134 - mmdet.ssod - INFO - Iter [1250/14400] lr: 4.608e-03, eta: 9:16:19, time: 2.587, data_time: 0.102, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0904, sup_loss_rpn_bbox: 0.1604, sup_loss_cls: 0.3366, sup_acc: 87.5838, sup_loss_bbox: 0.3819, unsup_loss_rpn_cls: 0.0362, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0480, unsup_acc: 99.9461, unsup_loss_bbox: 0.0234, loss: 1.0770
2021-09-17 06:23:48,864 - mmdet.ssod - INFO - Iter [1300/14400] lr: 4.592e-03, eta: 9:14:21, time: 2.555, data_time: 0.095, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0820, sup_loss_rpn_bbox: 0.1481, sup_loss_cls: 0.3262, sup_acc: 87.5965, sup_loss_bbox: 0.3754, unsup_loss_rpn_cls: 0.0333, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0410, unsup_acc: 99.9516, unsup_loss_bbox: 0.0486, loss: 1.0546
2021-09-17 06:25:57,272 - mmdet.ssod - INFO - Iter [1350/14400] lr: 4.576e-03, eta: 9:12:28, time: 2.568, data_time: 0.097, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0713, sup_loss_rpn_bbox: 0.1393, sup_loss_cls: 0.3126, sup_acc: 87.4847, sup_loss_bbox: 0.3748, unsup_loss_rpn_cls: 0.0245, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0343, unsup_acc: 99.9417, unsup_loss_bbox: 0.0523, loss: 1.0092
2021-09-17 06:28:05,727 - mmdet.ssod - INFO - Iter [1400/14400] lr: 4.561e-03, eta: 9:10:34, time: 2.569, data_time: 0.097, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0787, sup_loss_rpn_bbox: 0.1475, sup_loss_cls: 0.3314, sup_acc: 86.9955, sup_loss_bbox: 0.3917, unsup_loss_rpn_cls: 0.0270, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0443, unsup_acc: 99.9356, unsup_loss_bbox: 0.0590, loss: 1.0797

2021-09-17 06:30:15,212 - mmdet.ssod - INFO - Iter [1450/14400] lr: 4.545e-03, eta: 9:08:49, time: 2.590, data_time: 0.099, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0729, sup_loss_rpn_bbox: 0.1376, sup_loss_cls: 0.3080, sup_acc: 87.7220, sup_loss_bbox: 0.3806, unsup_loss_rpn_cls: 0.0266, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0381, unsup_acc: 99.9128, unsup_loss_bbox: 0.0915, loss: 1.0554
2021-09-17 06:32:25,458 - mmdet.ssod - INFO - Iter [1500/14400] lr: 4.529e-03, eta: 9:07:08, time: 2.605, data_time: 0.100, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0745, sup_loss_rpn_bbox: 0.1389, sup_loss_cls: 0.3179, sup_acc: 87.0793, sup_loss_bbox: 0.3934, unsup_loss_rpn_cls: 0.0250, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0333, unsup_acc: 99.9277, unsup_loss_bbox: 0.1138, loss: 1.0968
2021-09-17 06:34:35,229 - mmdet.ssod - INFO - Iter [1550/14400] lr: 4.513e-03, eta: 9:05:22, time: 2.595, data_time: 0.100, memory: 25990, ema_momentum: 0.9990, unsup_weight: 4, sup_loss_rpn_cls: 0.0700, sup_loss_rpn_bbox: 0.1389, sup_loss_cls: 0.3169, sup_acc: 87.1831, sup_loss_bbox: 0.3859, unsup_loss_rpn_cls: 0.0237, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0387, unsup_acc: 99.9144, unsup_loss_bbox: 0.1404, loss: 1.1146

Is there an official 1%, 5%, 10% MS COCO benchmark?

Hi,

You reported results for partially labeled data (first introduced by STAC). Is there an official benchmark website to which these results are also reported, or are they only reported in the paper?

Best
Karol

ICCV2021 accepted? VOC?

Good job!
I wonder whether this work was accepted to ICCV 2021?
Any experiments on PASCAL VOC?

THX.

Question about what is Full Labeled Training and Datasets

The required structure of the images is as follows:

# YOUR_DATA should be a directory contains coco dataset.
# For eg.:
# YOUR_DATA/
#  coco/
#     train2017/
#     val2017/
#     unlabeled2017/
#     annotations/
ln -s ${YOUR_DATA} data
bash tools/dataset/prepare_coco_data.sh conduct

My Questions are:

  1. If my understanding is correct, the unlabeled2017 contains all the unlabeled images, right?

  2. When you say X% labeled data (e.g. 5%, 10%, etc), does that take X% from the train2017/ training data? What happens to the 100-X% of the data in the training data? Does it get added to the unlabeled pool for training?

  3. When you say full-labeled training, does it mean it trains on all the data in train2017/ (supervised) then use the unlabeled2017/ data for unsupervised part of the semi-supervised learning? Or is it just supervised training on all training dataset?

  4. When using a custom dataset in COCO format, do I just follow the same instructions or do I need to change something more?
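To make question 2 concrete, here is a sketch of the STAC-style protocol that the results section says was followed, under the assumption that X% of the train2017 images are sampled as the labeled set and the remaining images are used as unlabeled data. `split_labeled` is a hypothetical helper, not the repository's `prepare_coco_data.sh` logic, and the image ids are placeholders.

```python
import random


def split_labeled(image_ids, percent, seed):
    """Return (labeled, unlabeled) lists; `percent` of the ids go to the labeled list."""
    ids = sorted(image_ids)
    rng = random.Random(seed)  # one seed per data split keeps the split reproducible
    rng.shuffle(ids)
    n_labeled = round(len(ids) * percent / 100.0)
    return ids[:n_labeled], ids[n_labeled:]


# Placeholder ids standing in for train2017 image ids.
labeled, unlabeled = split_labeled(range(1000), percent=10, seed=1)
```

For the full setting, the config dump on this page points `sup` at `instances_train2017.json` and `unsup` at `instances_unlabeled2017.json`, which suggests that all of train2017 is labeled and unlabeled2017 feeds the unsupervised branch.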

Meaning of epoch length

First of all thanks for the excellent work.
I am wondering about the meaning of epoch_length for the DistributedGroupSemiBalanceSampler. The default value is 7330, and it is used when constructing the sample indices.

Thanks for your reply.

question about "def unsup_rcnn_reg_loss"

with torch.no_grad():
    _, _scores = self.teacher.roi_head.simple_test_bboxes(
        teacher_feat,
        teacher_img_metas,
        aligned_proposals,
        None,
        rescale=False,
    )
    bg_score = torch.cat([_score[:, -1] for _score in _scores])
    assigned_label, _, _, _ = bbox_targets
    neg_inds = assigned_label == self.student.roi_head.bbox_head.num_classes
    bbox_targets[1][neg_inds] = bg_score[neg_inds].detach()
loss = self.student.roi_head.bbox_head.loss(
    bbox_results["cls_score"],
    bbox_results["bbox_pred"],
    rois,
    *bbox_targets,
    reduction_override="none",
)

bbox_targets[0] = labels
bbox_targets[1] = label_weights
bbox_targets[2] = bbox_targets
bbox_targets[3] = bbox_weights

Q1. Why is only bg_score from the teacher model used? What about pos_inds?
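To spell out what the quoted snippet does: it overwrites `bbox_targets[1]` (the label weights) only at `neg_inds`, that is, for proposals assigned to the background class, and leaves the weights of foreground proposals unchanged. A plain-Python sketch of that indexing follows; `soft_label_weights` is a hypothetical helper and the scores are made-up numbers.

```python
def soft_label_weights(labels, label_weights, teacher_bg_scores, num_classes):
    """Replace the weight of each background sample with the teacher's background score.

    A label equal to `num_classes` marks a background sample, as in the snippet.
    """
    return [
        bg if label == num_classes else weight
        for label, weight, bg in zip(labels, label_weights, teacher_bg_scores)
    ]


labels = [3, 80, 80]  # one foreground proposal, two background proposals
weights = soft_label_weights(labels, [1.0, 1.0, 1.0], [0.1, 0.9, 0.2], num_classes=80)
```

With these inputs the foreground weight stays at 1.0, while the two background weights become 0.9 and 0.2, so a background proposal the teacher is unsure about contributes less to the classification loss.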

For removing wandb

Thanks for the great job!

I just want to remove wandb, but removing lines 276-289 of "base.py" seems strange. Does it mean that I should remove the whole "log_config" part?

Lines 276-289 in base.py:

            project="pre_release",
            name="${cfg_name}",
            config=dict(
                work_dirs="${work_dir}",
                total_step="${runner.max_iters}",
            ),
        ),
        by_epoch=False,
    ),
],
)

The lines stop at 286, so I am not sure about the specific modification.

Best wishes.
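For illustration, one way to drop wandb while keeping console logging is to leave `log_config` in place and remove only the wandb entry from its `hooks` list. The fragment below is a sketch under that assumption; it mirrors the `log_config` shown in the config dump further down this page (interval of 50, a `TextLoggerHook` followed by a `WandbLoggerHook`) and is not a verified patch for base.py.

```python
# Keep the text logger; the WandbLoggerHook entry is simply omitted.
log_config = dict(
    interval=50,
    hooks=[
        dict(type="TextLoggerHook", by_epoch=False),
    ],
)
```

Removing the whole `log_config` block instead would also disable the text logger defined in it.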

[Resolved] Error when training (issues from config files)

Hello,

I have access to only 1 GPU and I am trying to train a SoftTeacher model on a custom dataset.
When trying to launch the training process on 100% labeled data with the following command:

bash tools/dist_train.sh configs/soft_teacher/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k.py 1

I get an error and here is the full log:

2021-10-12 10:37:40,133 - mmdet.ssod - INFO - Environment info:
sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA A100-PCIE-40GB MIG 2g.10gb
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.4.r11.4/compiler.30300941_0
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 1.9.1
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.1
OpenCV: 4.5.3
MMCV: 1.3.9
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: not available
MMDetection: 2.17.0+aacbef2

2021-10-12 10:37:42,266 - mmdet.ssod - INFO - Distributed training: True
2021-10-12 10:37:44,227 - mmdet.ssod - INFO - Config:
model = dict(
type='SoftTeacher',
model=dict(
type='FasterRCNN',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN', requires_grad=False),
norm_eval=True,
style='caffe',
init_cfg=dict(
type='Pretrained',
checkpoint='open-mmlab://detectron2/resnet50_caffe')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
num_outs=5),
rpn_head=dict(
type='RPNHead',
in_channels=256,
feat_channels=256,
anchor_generator=dict(
type='AnchorGenerator',
scales=[8],
ratios=[0.5, 1.0, 2.0],
strides=[4, 8, 16, 32, 64]),
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[1.0, 1.0, 1.0, 1.0]),
loss_cls=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
roi_head=dict(
type='StandardRoIHead',
bbox_roi_extractor=dict(
type='SingleRoIExtractor',
roi_layer=dict(
type='RoIAlign', output_size=7, sampling_ratio=0),
out_channels=256,
featmap_strides=[4, 8, 16, 32]),
bbox_head=dict(
type='Shared2FCBBoxHead',
in_channels=256,
fc_out_channels=1024,
roi_feat_size=7,
num_classes=80,
bbox_coder=dict(
type='DeltaXYWHBBoxCoder',
target_means=[0.0, 0.0, 0.0, 0.0],
target_stds=[0.1, 0.1, 0.2, 0.2]),
reg_class_agnostic=False,
loss_cls=dict(
type='CrossEntropyLoss',
use_sigmoid=False,
loss_weight=1.0),
loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
train_cfg=dict(
rpn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.7,
neg_iou_thr=0.3,
min_pos_iou=0.3,
match_low_quality=True,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=256,
pos_fraction=0.5,
neg_pos_ub=-1,
add_gt_as_proposals=False),
allowed_border=-1,
pos_weight=-1,
debug=False),
rpn_proposal=dict(
nms_pre=2000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.5,
match_low_quality=False,
ignore_iof_thr=-1),
sampler=dict(
type='RandomSampler',
num=512,
pos_fraction=0.25,
neg_pos_ub=-1,
add_gt_as_proposals=True),
pos_weight=-1,
debug=False)),
test_cfg=dict(
rpn=dict(
nms_pre=1000,
max_per_img=1000,
nms=dict(type='nms', iou_threshold=0.7),
min_bbox_size=0),
rcnn=dict(
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=100))),
train_cfg=dict(
use_teacher_proposal=False,
pseudo_label_initial_score_thr=0.5,
rpn_pseudo_threshold=0.9,
cls_pseudo_threshold=0.9,
reg_pseudo_threshold=0.01,
jitter_times=10,
jitter_scale=0.06,
min_pseduo_box_size=0,
unsup_weight=2.0),
test_cfg=dict(inference_on='student'))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
mean=[103.53, 116.28, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5),
dict(
type='OneOf',
transforms=[
dict(type='Identity'),
dict(type='AutoContrast'),
dict(type='RandEqualize'),
dict(type='RandSolarize'),
dict(type='RandColor'),
dict(type='RandContrast'),
dict(type='RandBrightness'),
dict(type='RandSharpness'),
dict(type='RandPosterize')
])
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='sup'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
'pad_shape', 'scale_factor', 'tag'))
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=8,
workers_per_gpu=8,
train=dict(
type='SemiDataset',
sup=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_train2017.json',
img_prefix='data/coco/train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5),
dict(
type='OneOf',
transforms=[
dict(type='Identity'),
dict(type='AutoContrast'),
dict(type='RandEqualize'),
dict(type='RandSolarize'),
dict(type='RandColor'),
dict(type='RandContrast'),
dict(type='RandBrightness'),
dict(type='RandSharpness'),
dict(type='RandPosterize')
])
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='sup'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape',
'img_norm_cfg', 'pad_shape', 'scale_factor',
'tag'))
]),
unsup=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_unlabeled2017.json',
img_prefix='data/coco/unlabeled2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='PseudoSamples', with_bbox=True),
dict(
type='MultiBranch',
unsup_teacher=[
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5),
dict(
type='ShuffledSequential',
transforms=[
dict(
type='OneOf',
transforms=[
dict(type='Identity'),
dict(type='AutoContrast'),
dict(type='RandEqualize'),
dict(type='RandSolarize'),
dict(type='RandColor'),
dict(type='RandContrast'),
dict(type='RandBrightness'),
dict(type='RandSharpness'),
dict(type='RandPosterize')
]),
dict(
type='OneOf',
transforms=[{
'type': 'RandTranslate',
'x': (-0.1, 0.1)
}, {
'type': 'RandTranslate',
'y': (-0.1, 0.1)
}, {
'type': 'RandRotate',
'angle': (-30, 30)
},
[{
'type':
'RandShear',
'x': (-30, 30)
}, {
'type':
'RandShear',
'y': (-30, 30)
}]])
]),
dict(
type='RandErase',
n_iterations=(1, 5),
size=[0, 0.2],
squared=True)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_student'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape',
'img_norm_cfg', 'pad_shape',
'scale_factor', 'tag',
'transform_matrix'))
],
unsup_student=[
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_teacher'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape',
'img_norm_cfg', 'pad_shape',
'scale_factor', 'tag',
'transform_matrix'))
])
],
filter_empty_gt=False)),
val=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
test=dict(
type='CocoDataset',
ann_file='data/coco/annotations/instances_val2017.json',
img_prefix='data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
sampler=dict(
train=dict(
type='SemiBalanceSampler',
sample_ratio=[1, 1],
by_prob=True,
epoch_length=7330)))
evaluation = dict(interval=4000, metric='bbox', type='SubModulesDistEvalHook')
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[480000, 640000])
runner = dict(type='IterBasedRunner', max_iters=720000)
checkpoint_config = dict(interval=4000, by_epoch=False, max_keep_ckpts=20)
log_config = dict(
interval=50,
hooks=[{
'type': 'TextLoggerHook',
'by_epoch': False
}, '\n dict(\n type="WandbLoggerHook",\n init_kwargs=dict(\n project="pre_release",\n name="soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k",\n config=dict(\n work_dirs="./work_dirs/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k",\n total_step="720000",\n ),\n ),\n by_epoch=False,\n ),\n '
])
custom_hooks = [
dict(type='NumClassCheckHook'),
dict(type='WeightSummary'),
dict(type='MeanTeacher', momentum=0.999, interval=1, warm_up=0)
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
mmdet_base = '../../thirdparty/mmdetection/configs/base'
strong_pipeline = [
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5),
dict(
type='ShuffledSequential',
transforms=[
dict(
type='OneOf',
transforms=[
dict(type='Identity'),
dict(type='AutoContrast'),
dict(type='RandEqualize'),
dict(type='RandSolarize'),
dict(type='RandColor'),
dict(type='RandContrast'),
dict(type='RandBrightness'),
dict(type='RandSharpness'),
dict(type='RandPosterize')
]),
dict(
type='OneOf',
transforms=[{
'type': 'RandTranslate',
'x': (-0.1, 0.1)
}, {
'type': 'RandTranslate',
'y': (-0.1, 0.1)
}, {
'type': 'RandRotate',
'angle': (-30, 30)
},
[{
'type': 'RandShear',
'x': (-30, 30)
}, {
'type': 'RandShear',
'y': (-30, 30)
}]])
]),
dict(
type='RandErase',
n_iterations=(1, 5),
size=[0, 0.2],
squared=True)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_student'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
'pad_shape', 'scale_factor', 'tag', 'transform_matrix'))
]
weak_pipeline = [
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_teacher'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'img_norm_cfg',
'pad_shape', 'scale_factor', 'tag', 'transform_matrix'))
]
unsup_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='PseudoSamples', with_bbox=True),
dict(
type='MultiBranch',
unsup_teacher=[
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5),
dict(
type='ShuffledSequential',
transforms=[
dict(
type='OneOf',
transforms=[
dict(type='Identity'),
dict(type='AutoContrast'),
dict(type='RandEqualize'),
dict(type='RandSolarize'),
dict(type='RandColor'),
dict(type='RandContrast'),
dict(type='RandBrightness'),
dict(type='RandSharpness'),
dict(type='RandPosterize')
]),
dict(
type='OneOf',
transforms=[{
'type': 'RandTranslate',
'x': (-0.1, 0.1)
}, {
'type': 'RandTranslate',
'y': (-0.1, 0.1)
}, {
'type': 'RandRotate',
'angle': (-30, 30)
},
[{
'type': 'RandShear',
'x': (-30, 30)
}, {
'type': 'RandShear',
'y': (-30, 30)
}]])
]),
dict(
type='RandErase',
n_iterations=(1, 5),
size=[0, 0.2],
squared=True)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_student'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape',
'img_norm_cfg', 'pad_shape', 'scale_factor', 'tag',
'transform_matrix'))
],
unsup_student=[
dict(
type='Sequential',
transforms=[
dict(
type='RandResize',
img_scale=[(1333, 400), (1333, 1200)],
multiscale_mode='range',
keep_ratio=True),
dict(type='RandFlip', flip_ratio=0.5)
],
record=True),
dict(type='Pad', size_divisor=32),
dict(
type='Normalize',
mean=[103.53, 116.28, 123.675],
std=[1.0, 1.0, 1.0],
to_rgb=False),
dict(type='ExtraAttrs', tag='unsup_teacher'),
dict(type='DefaultFormatBundle'),
dict(
type='Collect',
keys=['img', 'gt_bboxes', 'gt_labels'],
meta_keys=('filename', 'ori_shape', 'img_shape',
'img_norm_cfg', 'pad_shape', 'scale_factor', 'tag',
'transform_matrix'))
])
]
fp16 = dict(loss_scale='dynamic')
work_dir = './work_dirs/soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k'
cfg_name = 'soft_teacher_faster_rcnn_r50_caffe_fpn_coco_full_720k'
gpu_ids = range(0, 1)

/gpfs/home/rdlamol/SoftTeacher/thirdparty/mmdetection/mmdet/core/anchor/builder.py:17: UserWarning: build_anchor_generator would be deprecated soon, please use build_prior_generator
'build_anchor_generator would be deprecated soon, please use '
2021-10-12 10:37:44,800 - mmcv - INFO - load model from: open-mmlab://detectron2/resnet50_caffe
2021-10-12 10:37:44,800 - mmcv - INFO - Use load_from_openmmlab loader
2021-10-12 10:37:45,808 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.bias

2021-10-12 10:37:45,970 - mmcv - INFO - load model from: open-mmlab://detectron2/resnet50_caffe
2021-10-12 10:37:45,970 - mmcv - INFO - Use load_from_openmmlab loader
2021-10-12 10:37:46,047 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: conv1.bias

loading annotations into memory...
Done (t=1.43s)
creating index...
index created!
loading annotations into memory...
Done (t=0.30s)
creating index...
index created!
Traceback (most recent call last):
File "tools/train.py", line 198, in <module>
main()
File "tools/train.py", line 193, in main
meta=meta,
File "/gpfs/home/rdlamol/SoftTeacher/ssod/apis/train.py", line 143, in train_detector
cfg.get("momentum_config", None),
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 251, in register_training_hooks
info.setdefault('by_epoch', False)
AttributeError: 'str' object has no attribute 'setdefault'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13852) of binary: /gpfs/home/rdlamol/anaconda3/envs/st2/bin/python
/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning:


           CHILD PROCESS FAILED WITH NO ERROR_FILE

CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 13852 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

from torch.distributed.elastic.multiprocessing.errors import record

@record
def trainer_main(args):
# do train


warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/run.py", line 702, in <module>
main()
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
return f(*args, **kwargs)
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/run.py", line 698, in main
run(args)
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/run.py", line 692, in run
)(*cmd_args)
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/gpfs/home/rdlamol/anaconda3/envs/st2/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


     tools/train.py FAILED

=======================================
Root Cause:
[0]:
time: 2021-10-12_10:37:56
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 13852)
error_file: <N/A>
msg: "Process failed with exitcode 1"

Other Failures:
<NO_OTHER_FAILURES>


I am not sure I understand what is wrong with the training process. I hope you can help me with this issue.

Cheers,
Olivier
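
The traceback above ends at info.setdefault('by_epoch', False), and the dumped log_config.hooks list contains one plain string (the quoted WandbLoggerHook block) next to the TextLoggerHook dict. The following is a minimal sketch, in plain Python without mmcv, of why a string entry would fail at that call. The function register_logger_hooks is a hypothetical stand-in for the runner's loop, not mmcv's actual code.

```python
def register_logger_hooks(log_config):
    """Hypothetical stand-in for the runner's hook-registration loop."""
    registered = []
    for info in log_config["hooks"]:
        # Each entry is expected to be a dict; a str has no setdefault().
        info.setdefault("by_epoch", False)
        registered.append(info["type"])
    return registered


good = dict(interval=50, hooks=[dict(type="TextLoggerHook", by_epoch=False)])
bad = dict(
    interval=50,
    hooks=[
        dict(type="TextLoggerHook", by_epoch=False),
        'dict(type="WandbLoggerHook", by_epoch=False)',  # a string, not a dict
    ],
)

print(register_logger_hooks(good))

try:
    register_logger_hooks(bad)
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

If this reading is right, turning the quoted entry back into a real dict, or removing it from the hooks list, would be the first thing to try.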

How to train and test on custom dataset?

Thank you very much for the great work, and congratulations on the ICCV 2021 acceptance. Could you please add a description to the README of how to train and test with this framework on a custom dataset?

assert len(indices) == len(self)

Hello,
When I run training, the following assertion fails: "assert len(indices) == len(self), f"{indices} not equal {len(self)} while offset is: {offset}""
I then printed the length info: =====len of indices is 26865 - offset: 0 - len self 36650
The full error output is below. Please help me.
Traceback (most recent call last):
  File "tools/train.py", line 198, in <module>
    main()
  File "tools/train.py", line 193, in main
    meta=meta,
  File "/data6/ziqiwen/code/softteacher/ssod/apis/train.py", line 206, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 117, in run
    iter_loaders = [IterLoader(x) for x in data_loaders]
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 117, in <listcomp>
    iter_loaders = [IterLoader(x) for x in data_loaders]
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 23, in __init__
    self.iter_loader = iter(self._dataloader)
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 764, in __init__
    self._try_put_index()
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 994, in _try_put_index
    index = self._next_index()
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 208, in __iter__
    for idx in self.sampler:
  File "/data6/ziqiwen/code/softteacher/ssod/datasets/samplers/semi_sampler.py", line 189, in __iter__
    assert len(indices) == len(self)
AssertionError
Traceback (most recent call last):
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/ziqiwen/anaconda3/envs/mm/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)

Specify GPU ids in bash command

As a newcomer to this area, I want to ask a simple question that has been confusing me.
With this train.py, I can't specify GPU ids when I use the distributed training command, e.g. "bash <gpu_nums>".
I also tried setting the environment variable CUDA_VISIBLE_DEVICE=2,3 through os.environ in tools/train.py, and that did not work either.
This message was shown during training:
Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device.
I guess this is the way to solve the issue, but I don't know how to do it.
On a shared machine, GPUs 0 and 1 are often heavily used and I need to use 2 or 3, but I could not specify that, which delays my plans.
Sorry for bothering you with this question, and thanks so much for your help! I think this could be added to the tutorial, too!
(-u-)
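
A minimal sketch of restricting a launch to GPUs 2 and 3, assuming the standard CUDA environment variable CUDA_VISIBLE_DEVICES (with a trailing S) is what the launcher should see. The variable generally has to be in place before the process initializes CUDA, so passing it through the environment of the launched command tends to be more reliable than assigning it inside tools/train.py. The helper run_with_gpus is hypothetical, and the child command here only echoes the variable instead of starting training.

```python
import os
import subprocess
import sys


def run_with_gpus(gpu_ids, command):
    """Run `command` with only the given physical GPUs visible to it."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(
        command,
        env=env,
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


# Stand-in child process: print what it sees instead of starting training.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_gpus([2, 3], child)
visible = result.stdout.strip()
print(visible)
```

Inside the child process the visible GPUs are renumbered from 0, so physical GPUs 2 and 3 appear as devices 0 and 1. The shell equivalent is to prefix the launch command with CUDA_VISIBLE_DEVICES=2,3.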

Iterations in the logging

Thanks for sharing the code. I have a question about the logging for evaluation.

When training a supervised baseline with 1% data (fast_rcnn_r50_caffe_fpn_coco_partial_180k.py), evaluation runs every 4000 iterations according to the configuration. However, in the logs, the iter index is always 625:

{"mode": "train", "epoch": 28, "iter": 625, "lr": 0.01, "memory": 1985, "bbox_mAP": 0.083, "bbox_mAP_50": 0.209, "bbox_mAP_75": 0.045, "bbox_mAP_s": 0.036, "bbox_mAP_m": 0.094, "bbox_mAP_l": 0.11, "bbox_mAP_copypaste": "0.083 0.209 0.045 0.036 0.094 0.110", "data_time": 0.16451, "loss_rpn_cls": 0.04722, "loss_rpn_bbox": 0.06773, "loss_cls": 0.26717, "acc": 91.50293, "loss_bbox": 0.33872, "loss": 0.72084, "time": 0.32041}

Could you point me to where I should fix this? Thanks.

Model training stops after validation after 4000 iterations

After 4000 training iterations, validation runs, and then training stops with the following error:

raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
***************************************
         tools/train.py FAILED         
=======================================
Root Cause:
[0]:
  time: 2021-09-22_05:54:53
  rank: 1 (local_rank: 1)
  exitcode: 1 (pid: 2210236)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
=======================================
Other Failures:
  <NO_OTHER_FAILURES>
***************************************

I am training with 2 GPUs. Do you have any insight into why this error is thrown?

Some questions about box jittering

Code from the teacher feature extraction step:


proposal_list, proposal_label_list, _ = list(
    zip(
        *[
            filter_invalid(
                proposal,
                proposal_label,
                proposal[:, -1],
                thr=thr,
                min_size=self.train_cfg.min_pseduo_box_size,
            )
            for proposal, proposal_label in zip(
                proposal_list, proposal_label_list
            )
        ]
    )
)
det_bboxes = proposal_list
reg_unc = self.compute_uncertainty_with_aug(
    feat, img_metas, proposal_list, proposal_label_list
)
det_bboxes = [
    torch.cat([bbox, unc], dim=-1) for bbox, unc in zip(det_bboxes, reg_unc)
]

I have some questions about det_bboxes and reg_unc.

Q1: If each box is jittered 10 times as mentioned in the repo, do you get box proposals of shape (B, 10, 4)?
Q2: proposal_list has shape (B, 100, 4) and reg_unc has shape (B, 10, 4). How can these be concatenated?

P.S. My roi_head returns 100 proposals by default. Do I have to change this parameter to get 10 box proposals?
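
A shape-level sketch, in plain Python, of one way jitter-based uncertainty can line up with the boxes it describes. It assumes each kept box is jittered `times` times, that the spread over those copies is reduced to one 4-value vector per box, and that this vector is appended to the 5-value box (x1, y1, x2, y2, score). The helpers jitter and box_uncertainty, the default values, and the omission of any refinement step by the teacher are all illustrative assumptions, not the repository's implementation.

```python
import random
import statistics


def jitter(box, scale, rng):
    """Return one randomly shifted copy of an (x1, y1, x2, y2) box."""
    w, h = box[2] - box[0], box[3] - box[1]
    sizes = (w, h, w, h)
    return [c + rng.uniform(-scale, scale) * s for c, s in zip(box, sizes)]


def box_uncertainty(box, times=10, scale=0.06, seed=0):
    """Per-coordinate standard deviation over `times` jittered copies."""
    rng = random.Random(seed)
    copies = [jitter(box, scale, rng) for _ in range(times)]  # times x 4
    # Reduce over the jitter axis, leaving one value per coordinate.
    return [statistics.pstdev(col) for col in zip(*copies)]  # 4 values


# K = 3 kept boxes, each (x1, y1, x2, y2, score).
boxes = [
    [10.0, 20.0, 110.0, 220.0, 0.95],
    [30.0, 40.0, 90.0, 100.0, 0.92],
    [5.0, 5.0, 55.0, 45.0, 0.91],
]
with_unc = [box + box_uncertainty(box[:4]) for box in boxes]
print(len(with_unc), len(with_unc[0]))
```

Under these assumptions the jitter count never appears in the output shape: K boxes give K uncertainty rows, so concatenating along the last axis yields K rows of 9 values, and the number of proposals does not need to equal the number of jitters.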

KeyError: 'loss_cls'

After 300 training iterations, the run raises "KeyError: 'loss_cls'".

Below is my training information:

wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: wandb: WARNING Invalid choice
wandb: Enter your choice: wandb: WARNING Invalid choice
wandb: Enter your choice: 3
wandb: You chose 'Don't visualize my results'

CondaEnvException: Unable to determine environment

Please re-run this command with one of the following options:

  • Provide an environment name via --name or -n
  • Re-run this command inside an activated conda environment.

wandb: W&B syncing is set to offline in this directory. Run wandb online or set WANDB_MODE=online to enable cloud syncing.
=====group sizes is [1755 7042]
=====len of indices is 14660 - offset: 0 - len self 14660
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/home/swap/project/SoftTeacher/thirdparty/mmdetection/mmdet/core/anchor/anchor_generator.py:324: UserWarning: grid_anchors would be deprecated soon. Please use grid_priors
warnings.warn('grid_anchors would be deprecated soon. '
/home/swap/project/SoftTeacher/thirdparty/mmdetection/mmdet/core/anchor/anchor_generator.py:361: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors
'single_level_grid_anchors would be deprecated soon. '
2021-09-13 07:35:50,265 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2021-09-13 07:36:05,024 - mmdet.ssod - INFO - Iter [50/14400] lr: 9.890e-04, eta: 1:18:29, time: 0.328, data_time: 0.021, memory: 3390, ema_momentum: 0.9800, unsup_weight: 4, sup_loss_rpn_cls: 0.4607, sup_loss_rpn_bbox: 0.2644, sup_loss_cls: 1.8000, sup_acc: 74.8359, sup_loss_bbox: 0.3120, unsup_loss_rpn_cls: 1.2687, unsup_loss_rpn_bbox: 0.4132, unsup_loss_cls: 3.9225, unsup_acc: 78.9180, unsup_loss_bbox: 2.3666, loss: 10.8082
2021-09-13 07:36:20,035 - mmdet.ssod - INFO - Iter [100/14400] lr: 1.988e-03, eta: 1:14:52, time: 0.300, data_time: 0.012, memory: 3390, ema_momentum: 0.9900, unsup_weight: 4, sup_loss_rpn_cls: 0.2892, sup_loss_rpn_bbox: 0.2540, sup_loss_cls: 0.4055, sup_acc: 92.8125, sup_loss_bbox: 0.2864, unsup_loss_rpn_cls: 0.1523, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0626, unsup_acc: 99.9648, unsup_loss_bbox: 0.0114, loss: 1.4615
2021-09-13 07:36:35,185 - mmdet.ssod - INFO - Iter [150/14400] lr: 2.987e-03, eta: 1:13:43, time: 0.303, data_time: 0.012, memory: 3390, ema_momentum: 0.9933, unsup_weight: 4, sup_loss_rpn_cls: 0.2405, sup_loss_rpn_bbox: 0.2624, sup_loss_cls: 0.3531, sup_acc: 92.7383, sup_loss_bbox: 0.2667, unsup_loss_rpn_cls: 0.1803, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0657, unsup_acc: 100.0000, unsup_loss_bbox: 0.0000, loss: 1.3686
2021-09-13 07:36:51,586 - mmdet.ssod - INFO - Iter [200/14400] lr: 3.986e-03, eta: 1:14:30, time: 0.328, data_time: 0.012, memory: 3390, ema_momentum: 0.9950, unsup_weight: 4, sup_loss_rpn_cls: 0.3032, sup_loss_rpn_bbox: 0.3015, sup_loss_cls: 0.4180, sup_acc: 92.7266, sup_loss_bbox: 0.2982, unsup_loss_rpn_cls: 0.1778, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0629, unsup_acc: 100.0000, unsup_loss_bbox: 0.0000, loss: 1.5616
2021-09-13 07:37:07,086 - mmdet.ssod - INFO - Iter [250/14400] lr: 4.985e-03, eta: 1:14:01, time: 0.310, data_time: 0.012, memory: 3390, ema_momentum: 0.9960, unsup_weight: 4, sup_loss_rpn_cls: 0.3210, sup_loss_rpn_bbox: 0.4059, sup_loss_cls: 0.4333, sup_acc: 92.9453, sup_loss_bbox: 0.3436, unsup_loss_rpn_cls: 0.1687, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0740, unsup_acc: 100.0000, unsup_loss_bbox: 0.0000, loss: 1.7465
2021-09-13 07:37:22,497 - mmdet.ssod - INFO - Iter [300/14400] lr: 5.984e-03, eta: 1:13:32, time: 0.308, data_time: 0.011, memory: 3390, ema_momentum: 0.9967, unsup_weight: 4, sup_loss_rpn_cls: 0.3386, sup_loss_rpn_bbox: 0.4950, sup_loss_cls: 0.3686, sup_acc: 93.9102, sup_loss_bbox: 0.2682, unsup_loss_rpn_cls: 0.1938, unsup_loss_rpn_bbox: 0.0000, unsup_loss_cls: 0.0760, unsup_acc: 100.0000, unsup_loss_bbox: 0.0000, loss: 1.7401
Traceback (most recent call last):
File "tools/train.py", line 198, in <module>
main()
File "tools/train.py", line 193, in main
meta=meta,
File "/home/swap/project/SoftTeacher/ssod/apis/train.py", line 205, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 133, in run
iter_runner(iter_loaders[i], **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 53, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/home/swap/project/SoftTeacher/thirdparty/mmdetection/mmdet/models/detectors/base.py", line 238, in train_step
losses = self(**data)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/swap/project/SoftTeacher/thirdparty/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/home/swap/project/SoftTeacher/ssod/models/soft_teacher.py", line 50, in forward_train
data_groups["unsup_teacher"], data_groups["unsup_student"]
File "/home/swap/project/SoftTeacher/ssod/models/soft_teacher.py", line 77, in foward_unsup_train
return self.compute_pseudo_label_loss(student_info, teacher_info)
File "/home/swap/project/SoftTeacher/ssod/models/soft_teacher.py", line 120, in compute_pseudo_label_loss
student_info=student_info,
File "/home/swap/project/SoftTeacher/ssod/models/soft_teacher.py", line 244, in unsup_rcnn_cls_loss
loss["loss_cls"] = loss["loss_cls"].sum() / max(bbox_targets[1].sum(), 1.0)
KeyError: 'loss_cls'
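
The failing line indexes loss["loss_cls"] unconditionally. One possible explanation, offered as an assumption rather than a diagnosis, is that the head returned a loss dict without that key for a batch with no sampled boxes (the log above shows unsup_loss_bbox at 0.0000 from iteration 150 onward). The following is a minimal defensive sketch in plain Python; normalize_cls_loss and the list-of-floats loss are hypothetical simplifications of the tensor code.

```python
def normalize_cls_loss(loss, weight_sum):
    """Normalize loss['loss_cls'] if present; otherwise fall back to 0.0."""
    out = dict(loss)
    denom = max(weight_sum, 1.0)
    # .get avoids the KeyError when the head produced no classification loss.
    out["loss_cls"] = sum(out.get("loss_cls", [])) / denom
    return out


print(normalize_cls_loss({"loss_cls": [0.5, 1.5]}, weight_sum=4.0))
print(normalize_cls_loss({"acc": 100.0}, weight_sum=0.0))
```

In tensor code the fallback would need to be a zero tensor on the right device so that the total loss can still be summed and backpropagated.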

Mappings between labeled and unlabeled data

Hello,

First of all, thank you for the work on this very interesting project!

I am combining multiple datasets for a project and would like to use SoftTeacher for object detection.

My question is the following:

Do the annotations for the labeled data need to be consistent with those for the unlabeled data? For example, if category_id 0 corresponds to rice in the labeled data, should it also do so in the annotation file for the unlabeled data? (The same question applies to image_id and annotation_id.)

Thank you in advance!
