
maskformer's Introduction

MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation

Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

[arXiv] [Project] [BibTeX]


Mask2Former

Check out Mask2Former, a universal architecture based on the MaskFormer meta-architecture that achieves SOTA on panoptic, instance, and semantic segmentation across four popular datasets (ADE20K, Cityscapes, COCO, Mapillary Vistas).

Features

  • Better results while being more efficient.
  • Unified view of semantic- and instance-level segmentation tasks.
  • Supports major semantic segmentation datasets: ADE20K, Cityscapes, COCO-Stuff, Mapillary Vistas.
  • Supports ALL Detectron2 models.

Installation

See installation instructions.

Getting Started

See Preparing Datasets for MaskFormer.

See Getting Started with MaskFormer.
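
As a quick illustration (not a substitute for the linked guides), the sketch below shows one way to run single-image inference with detectron2's DefaultPredictor. It assumes the mask_former package is importable and that its add_mask_former_config helper, together with detectron2's add_deeplab_config, is used to register the extra config keys the way the repo's demo and training scripts do; the ADE20K config and Model Zoo checkpoint are the ones referenced elsewhere on this page and are assumed to be downloaded.

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config
from mask_former import add_mask_former_config

# Register the extra config options before loading the YAML.
cfg = get_cfg()
add_deeplab_config(cfg)
add_mask_former_config(cfg)
cfg.merge_from_file("configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
cfg.MODEL.WEIGHTS = "model_final_d8dbeb.pkl"  # checkpoint from the Model Zoo (assumed already downloaded)
cfg.freeze()

predictor = DefaultPredictor(cfg)           # handles resizing and RGB/BGR conversion per cfg.INPUT
image = cv2.imread("image.jpg")             # BGR image, as DefaultPredictor expects
sem_seg = predictor(image)["sem_seg"]       # (num_classes, H, W) per-pixel class scores
pred = sem_seg.argmax(dim=0)                # per-pixel class ids for the 150 ADE20K classes
print(pred.shape, pred.unique())

The "sem_seg" output follows detectron2's standard semantic segmentation convention, so the argmaxed map can be overlaid on the image with detectron2's Visualizer.draw_sem_seg.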

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the MaskFormer Model Zoo.

License

The majority of MaskFormer is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license.

Citing MaskFormer

If you use MaskFormer in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}

maskformer's People

Contributors

bowenc0221

maskformer's Issues

Error when demo.py is run: cannot connect to X server

Steps:

wget "https://dl.fbaipublicfiles.com/maskformer/semantic-ade20k/maskformer_R50_bs16_160k/model_final_d8dbeb.pkl"
python MaskFormer/demo/demo.py --config-file MaskFormer/configs/ade20k-150/maskformer_R50_bs16_160k.yaml --input image.jpg --opts MODEL.WEIGHTS model_final_d8dbeb.pkl

# image.jpg exists

The above returns:

[07/24 14:57:54 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='MaskFormer/configs/ade20k-150/maskformer_R50_bs16_160k.yaml', input=['image.jpg'], opts=['MODEL.WEIGHTS', 'model_final_d8dbeb.pkl'], output=None, video_input=None, webcam=False)
WARNING [07/24 14:57:54 fvcore.common.config]: Loading config MaskFormer/configs/ade20k-150/Base-ADE20K-150.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
[07/24 14:57:58 fvcore.common.checkpoint]: [Checkpointer] Loading from model_final_d8dbeb.pkl ...
[07/24 14:57:58 fvcore.common.checkpoint]: Reading a file from 'MaskFormer Model Zoo'
Weight format of MaskFormerHead have changed! Please upgrade your models. Applying automatic conversion now ...
/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
[07/24 14:57:58 detectron2]: image.jpg: finished in 0.26s
: cannot connect to X server 
/usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
  len(cache))

What am I doing wrong?
I wanted to run the demo.py script with the model specified in configs/ade20k-150/maskformer_R50_bs16_160k.yaml to obtain predictions for a single image (image.jpg).
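
A likely cause (not an official answer): "cannot connect to X server" appears when the demo tries to open an OpenCV display window on a machine without a display, which the detectron2 demo only does when no --output path is given (note output=None in the Namespace above). Writing the visualization to disk instead should avoid the error, e.g.:

python MaskFormer/demo/demo.py --config-file MaskFormer/configs/ade20k-150/maskformer_R50_bs16_160k.yaml --input image.jpg --output result.jpg --opts MODEL.WEIGHTS model_final_d8dbeb.pkl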

Some questions about FLOPs

Hi Wencheng,
In your paper, the FLOPs of Swin-UperNet are 236, 259, 471, and 647 for Swin-T, Swin-S, Swin-B, and Swin-L, respectively, but they are different from the official FLOPs reported in the Swin-Transformer repository.
Best.

CUDA out of memory under default config

Hi, dear authors,
I want to reproduce this interesting work. However, I hit CUDA out-of-memory errors when using the default config files for COCO-Stuff and COCO panoptic (batch size per GPU is 4 and 2 for COCO-Stuff and panoptic, respectively).
Our experiments encounter OOM every 10,000 iterations or even more, and we also use 32 GB V100s. So I want to know whether you used other code or a different environment.
Below is my environment info:

sys.platform            linux
Python                  3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0]
numpy                   1.21.1
detectron2              0.5 @/mnt/lustre/zhujinguo/anaconda3/envs/d2/lib/python3.7/site-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 10.1
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.1+cu101 @/mnt/lustre/zhujinguo/anaconda3/envs/d2/lib/python3.7/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0,1,2,3,4,5,6,7     Tesla V100-SXM2-32GB (arch=7.0)
CUDA_HOME               /mnt/lustre/share/cuda-10.1
Pillow                  8.3.1
torchvision             0.8.2+cu101 @/mnt/lustre/zhujinguo/anaconda3/envs/d2/lib/python3.7/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.5.post20210722
iopath                  0.1.8
cv2                     4.5.3

Thank you!
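
Not an official answer, but two config knobs that commonly lower peak GPU memory in detectron2-based training (both keys appear in the full config dump further down this page) are mixed precision and a smaller training crop. Both can be overridden on the command line, at the cost of deviating from the paper's recipe, e.g.:

./train_net.py --num-gpus 8 --config-file <your COCO config> SOLVER.AMP.ENABLED True INPUT.CROP.SIZE "[512,512]"

Lowering SOLVER.IMS_PER_BATCH (and adjusting SOLVER.BASE_LR accordingly) is a further option if OOM persists.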

The error when training

Thank you for your great work.
However, when I train with maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml using the command:
./train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml .

I got the following errors:
MaskFormer Training Script.

This script is a simplified version of the training script in detectron2/tools.
: No such file or directory
import-im6.q16: not authorized copy' @ error/constitute.c/WriteImage/1037. import-im6.q16: not authorized itertools' @ error/constitute.c/WriteImage/1037.
import-im6.q16: not authorized logging' @ error/constitute.c/WriteImage/1037. import-im6.q16: not authorized os' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/collections
from: can't read /var/mail/typing
import-im6.q16: not authorized torch' @ error/constitute.c/WriteImage/1037. import-im6.q16: not authorized comm' @ error/constitute.c/WriteImage/1037.
from: can't read /var/mail/detectron2.checkpoint
from: can't read /var/mail/detectron2.config
from: can't read /var/mail/detectron2.data
from: can't read /var/mail/detectron2.engine
./train_net.py: line 21: syntax error near unexpected token (' ./train_net.py: line 21: from detectron2.evaluation import ('

Could you please tell me what the problem is and how to solve it?
Thank you very much!
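
Not an official answer, but the "import-im6.q16: not authorized ..." and "from: can't read /var/mail/..." lines indicate that the shell executed train_net.py itself (ImageMagick's import and the mail tool from were invoked, and bash then hit a Python import statement at line 21), so this is an invocation problem rather than a MaskFormer bug. Running the script through Python explicitly should fix it:

python train_net.py --num-gpus 2 --config-file configs/ade20k-150/swin/maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml

Alternatively, make sure train_net.py is executable and starts with a #!/usr/bin/env python shebang before invoking it as ./train_net.py.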

Which version of the pre-trained model is used in the paper?

In the conversion script, it is mentioned that the torchvision models are not used in the official config, and the model released by MSRA is not used in the paper. Yet the performance of the official config matches the paper. So which version of the pre-trained model (R50) is used?

Calculation of PQ for semantic segmentation dataset

Hi, very nice work! I want to calculate the PQ for a semantic segmentation dataset, as shown in your paper. Which files should I modify? Or could you release the code for PQ calculation on semantic segmentation datasets?

The PQ value is inconsistent

In MODEL_ZOO.md, the PQ value in the COCO Panoptic Segmentation table is inconsistent with the PQ value in the corresponding metrics file. For example, for the "R50 + 6 Enc" configuration, the PQ in the table is 46.5 but 40.5 in the metrics file. For the "Swin-L" configuration, however, the PQ values in the table and the metrics file are both 52.7. Do the metrics files match the table exactly?

By the way, this work is nice!

RuntimeError: CUDA error: device-side assert triggered

I ran the following command.

python train_net.py \
  --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
  --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001

I got the CUDA error below; how can I resolve it? The environment was created with Docker.

docker@a68944098dc2:/Study-MaskFormer$ python train_net.py \
>   --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
>   --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-187ux1uq because the default path (/home/docker/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-x6qwnu56 because the default path (/tmp/matplotlib-187ux1uq) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Command Line Args: Namespace(config_file='configs/ade20k-150/maskformer_R50_bs16_160k.yaml', dist_url='tcp://127.0.0.1:50153', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['SOLVER.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.0001'], resume=False)
Loading config configs/ade20k-150/Base-ADE20K-150.yaml with yaml.unsafe_load. Your machine may be at risk if the file contains malicious content.
[08/09 16:12:59 detectron2]: Rank of current process: 0. World size: 1
/.pyenv/versions/3.8.6/lib/python3.8/site-packages/setuptools/distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
  warnings.warn(
[08/09 16:13:00 detectron2]: Environment info:
----------------------  ---------------------------------------------------------------------------
sys.platform            linux
Python                  3.8.6 (default, Aug  9 2021, 07:43:54) [GCC 7.5.0]
numpy                   1.21.1
detectron2              0.4 @/detectron2_repo/detectron2
Compiler                GCC 7.5
CUDA compiler           CUDA 10.1
detectron2 arch flags   3.5, 3.7, 5.0, 5.2, 5.3, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.8.0+cu101 @/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch
PyTorch debug build     False
GPU available           True
GPU 0,1                 GeForce RTX 2080 Ti (arch=7.5)
CUDA_HOME               /usr/local/cuda
TORCH_CUDA_ARCH_LIST    Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing
Pillow                  8.3.1
torchvision             0.9.0+cu101 @/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.3.post20210317
cv2                     4.5.3
----------------------  ---------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.1, CUDNN_VERSION=7.6.3, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

[08/09 16:13:00 detectron2]: Command line arguments: Namespace(config_file='configs/ade20k-150/maskformer_R50_bs16_160k.yaml', dist_url='tcp://127.0.0.1:50153', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['SOLVER.IMS_PER_BATCH', '2', 'SOLVER.BASE_LR', '0.0001'], resume=False)
[08/09 16:13:00 detectron2]: Contents of args.config_file=configs/ade20k-150/maskformer_R50_bs16_160k.yaml:
_BASE_: Base-ADE20K-150.yaml
MODEL:
  META_ARCHITECTURE: "MaskFormer"
  SEM_SEG_HEAD:
    NAME: "MaskFormerHead"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    IGNORE_VALUE: 255
    NUM_CLASSES: 150
    COMMON_STRIDE: 4  # not used, hard-coded
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
  MASK_FORMER:
    TRANSFORMER_IN_FEATURE: "res5"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    DICE_WEIGHT: 1.0
    MASK_WEIGHT: 20.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.1
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    DEC_LAYERS: 6
    PRE_NORM: False

[08/09 16:13:00 detectron2]: Running with full config:
CUDNN_BENCHMARK: False
DATALOADER:
  ASPECT_RATIO_GROUPING: True
  FILTER_EMPTY_ANNOTATIONS: True
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: ()
  PROPOSAL_FILES_TRAIN: ()
  TEST: ('ade20k_sem_seg_val',)
  TRAIN: ('ade20k_sem_seg_train',)
GLOBAL:
  HACK: 1.0
INPUT:
  COLOR_AUG_SSD: True
  CROP:
    ENABLED: True
    SINGLE_CATEGORY_MAX_AREA: 1.0
    SIZE: [512, 512]
    TYPE: absolute
  DATASET_MAPPER_NAME: mask_former_semantic
  FORMAT: RGB
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 2048
  MAX_SIZE_TRAIN: 2048
  MIN_SIZE_TEST: 512
  MIN_SIZE_TRAIN: (256, 307, 358, 409, 460, 512, 563, 614, 665, 716, 768, 819, 870, 921, 972, 1024)
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
  SIZE_DIVISIBILITY: 512
MODEL:
  ANCHOR_GENERATOR:
    ANGLES: [[-90, 0, 90]]
    ASPECT_RATIOS: [[0.5, 1.0, 2.0]]
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES: [[32, 64, 128, 256, 512]]
  BACKBONE:
    FREEZE_AT: 0
    NAME: build_resnet_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES: []
    NORM:
    OUT_CHANNELS: 256
  KEYPOINT_ON: False
  LOAD_PROPOSALS: False
  MASK_FORMER:
    DEC_LAYERS: 6
    DEEP_SUPERVISION: True
    DICE_WEIGHT: 1.0
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.1
    ENC_LAYERS: 0
    ENFORCE_INPUT_PROJ: False
    HIDDEN_DIM: 256
    MASK_WEIGHT: 20.0
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_OBJECT_QUERIES: 100
    PRE_NORM: False
    SIZE_DIVISIBILITY: 32
    TEST:
      OBJECT_MASK_THRESHOLD: 0.0
      OVERLAP_THRESHOLD: 0.0
      PANOPTIC_ON: False
      SEM_SEG_POSTPROCESSING_BEFORE_INFERENCE: False
    TRANSFORMER_IN_FEATURE: res5
  MASK_ON: False
  META_ARCHITECTURE: MaskFormer
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: True
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.12, 57.375]
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: False
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE: [False, False, False, False]
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES: ['res2', 'res3', 'res4', 'res5']
    RES2_OUT_CHANNELS: 256
    RES4_DILATION: 1
    RES5_DILATION: 1
    RES5_MULTI_GRID: [1, 1, 1]
    STEM_OUT_CHANNELS: 64
    STEM_TYPE: basic
    STRIDE_IN_1X1: False
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7']
    IOU_LABELS: [0, -1, 1]
    IOU_THRESHOLDS: [0.4, 0.5]
    NMS_THRESH_TEST: 0.5
    NORM:
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0))
    IOUS: (0.5, 0.6, 0.7)
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0)
    CLS_AGNOSTIC_BBOX_REG: False
    CONV_DIM: 256
    FC_DIM: 1024
    NAME:
    NORM:
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: False
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES: ['res4']
    IOU_LABELS: [0, 1]
    IOU_THRESHOLDS: [0.5]
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: True
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512)
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: False
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM:
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0)
    BOUNDARY_THRESH: -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES: ['res4']
    IOU_LABELS: [0, -1, 1]
    IOU_THRESHOLDS: [0.3, 0.7]
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    ASPP_CHANNELS: 256
    ASPP_DILATIONS: [6, 12, 18]
    ASPP_DROPOUT: 0.1
    COMMON_STRIDE: 4
    CONVS_DIM: 256
    IGNORE_VALUE: 255
    IN_FEATURES: ['res2', 'res3', 'res4', 'res5']
    LOSS_TYPE: hard_pixel_mining
    LOSS_WEIGHT: 1.0
    MASK_DIM: 256
    NAME: MaskFormerHead
    NORM: GN
    NUM_CLASSES: 150
    PIXEL_DECODER_NAME: BasePixelDecoder
    PROJECT_CHANNELS: [48]
    PROJECT_FEATURES: ['res2']
    TRANSFORMER_ENC_LAYERS: 0
    USE_DEPTHWISE_SEPARABLE_CONV: False
  SWIN:
    APE: False
    ATTN_DROP_RATE: 0.0
    DEPTHS: [2, 2, 6, 2]
    DROP_PATH_RATE: 0.3
    DROP_RATE: 0.0
    EMBED_DIM: 96
    MLP_RATIO: 4.0
    NUM_HEADS: [3, 6, 12, 24]
    OUT_FEATURES: ['res2', 'res3', 'res4', 'res5']
    PATCH_NORM: True
    PATCH_SIZE: 4
    PRETRAIN_IMG_SIZE: 224
    QKV_BIAS: True
    QK_SCALE: None
    WINDOW_SIZE: 7
  WEIGHTS: detectron2://ImageNetPretrained/torchvision/R-50.pkl
OUTPUT_DIR: ./output
SEED: -1
SOLVER:
  AMP:
    ENABLED: False
  BACKBONE_MULTIPLIER: 0.1
  BASE_LR: 0.0001
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 0.01
    ENABLED: True
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 2
  LR_SCHEDULER_NAME: WarmupPolyLR
  MAX_ITER: 160000
  MOMENTUM: 0.9
  NESTEROV: False
  OPTIMIZER: ADAMW
  POLY_LR_CONSTANT_ENDING: 0.0
  POLY_LR_POWER: 0.9
  REFERENCE_WORLD_SIZE: 0
  STEPS: (30000,)
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 0
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: 0.0001
  WEIGHT_DECAY_EMBED: 0.0
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: False
    FLIP: True
    MAX_SIZE: 3584
    MIN_SIZES: (256, 384, 512, 640, 768, 896)
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 5000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: False
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0
[08/09 16:13:00 detectron2]: Full config saved to ./output/config.yaml
[08/09 16:13:00 d2.utils.env]: Using a generated random seed 881166
[08/09 16:13:04 d2.engine.defaults]: Model:
MaskFormer(
  (backbone): ResNet(
    (stem): BasicStem(
      (conv1): Conv2d(
        3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
        (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
      )
    )
    (res2): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv1): Conv2d(
          64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv2): Conv2d(
          64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
        (conv3): Conv2d(
          64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
      )
    )
    (res3): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv1): Conv2d(
          256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv2): Conv2d(
          128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
        )
        (conv3): Conv2d(
          128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
      )
    )
    (res4): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
        (conv1): Conv2d(
          512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (3): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (4): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
      (5): BottleneckBlock(
        (conv1): Conv2d(
          1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv2): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
        )
        (conv3): Conv2d(
          256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
        )
      )
    )
    (res5): Sequential(
      (0): BottleneckBlock(
        (shortcut): Conv2d(
          1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
        (conv1): Conv2d(
          1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (1): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
      (2): BottleneckBlock(
        (conv1): Conv2d(
          2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv2): Conv2d(
          512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
        )
        (conv3): Conv2d(
          512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
          (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
        )
      )
    )
  )
  (sem_seg_head): MaskFormerHead(
    (pixel_decoder): BasePixelDecoder(
      (adapter_1): Conv2d(
        256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (layer_1): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (adapter_2): Conv2d(
        512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (layer_2): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (adapter_3): Conv2d(
        1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (layer_3): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (layer_4): Conv2d(
        2048, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
        (norm): GroupNorm(32, 256, eps=1e-05, affine=True)
      )
      (mask_features): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    )
    (predictor): TransformerPredictor(
      (pe_layer): PositionEmbeddingSine()
      (transformer): Transformer(
        (encoder): TransformerEncoder(
          (layers): ModuleList()
        )
        (decoder): TransformerDecoder(
          (layers): ModuleList(
            (0): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (1): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (2): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (3): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (4): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
            (5): TransformerDecoderLayer(
              (self_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (multihead_attn): MultiheadAttention(
                (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
              )
              (linear1): Linear(in_features=256, out_features=2048, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
              (linear2): Linear(in_features=2048, out_features=256, bias=True)
              (norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
              (dropout1): Dropout(p=0.1, inplace=False)
              (dropout2): Dropout(p=0.1, inplace=False)
              (dropout3): Dropout(p=0.1, inplace=False)
            )
          )
          (norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
        )
      )
      (query_embed): Embedding(100, 256)
      (input_proj): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
      (class_embed): Linear(in_features=256, out_features=151, bias=True)
      (mask_embed): MLP(
        (layers): ModuleList(
          (0): Linear(in_features=256, out_features=256, bias=True)
          (1): Linear(in_features=256, out_features=256, bias=True)
          (2): Linear(in_features=256, out_features=256, bias=True)
        )
      )
    )
  )
  (criterion): SetCriterion(
    (matcher): Matcher HungarianMatcher
        cost_class: 1
        cost_mask: 20.0
        cost_dice: 1.0
  )
)
[08/09 16:13:04 mask_former.data.dataset_mappers.mask_former_semantic_dataset_mapper]: [MaskFormerSemanticDatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=..., max_size=2048, sample_style='choice'), RandomCrop_CategoryAreaConstraint(crop_type='absolute', crop_size=[512, 512], single_category_max_area=1.0, ignored_category=255), <detectron2.projects.point_rend.color_augmentation.ColorAugSSDTransform object at 0x7fb8efc1b040>, RandomFlip()]
[08/09 16:13:05 d2.data.datasets.coco]: Loaded 20210 images with semantic segmentation from datasets/ADEChallengeData2016/images/training
[08/09 16:13:05 d2.data.build]: Using training sampler TrainingSampler
[08/09 16:13:05 d2.data.common]: Serializing 20210 elements to byte tensors and concatenating them all ...
[08/09 16:13:05 d2.data.common]: Serialized dataset takes 3.97 MiB
[08/09 16:13:05 fvcore.common.checkpoint]: Loading checkpoint from detectron2://ImageNetPretrained/torchvision/R-50.pkl
/home/docker/.torch/iopath_cache is not accessible! Using /tmp/iopath_cache instead!
R-50.pkl: 102MB [00:09, 11.2MB/s]
[08/09 16:13:14 fvcore.common.checkpoint]: Reading a file from 'torchvision'
[08/09 16:13:14 d2.checkpoint.c2_model_loading]: Following weights matched with submodule backbone:
| Names in Model    | Names in Checkpoint                                                               | Shapes                                          |
|:------------------|:----------------------------------------------------------------------------------|:------------------------------------------------|
| res2.0.conv1.*    | res2.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,64,1,1)             |
| res2.0.conv2.*    | res2.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| res2.0.conv3.*    | res2.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| res2.0.shortcut.* | res2.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| res2.1.conv1.*    | res2.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,256,1,1)            |
| res2.1.conv2.*    | res2.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| res2.1.conv3.*    | res2.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| res2.2.conv1.*    | res2.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,256,1,1)            |
| res2.2.conv2.*    | res2.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (64,) (64,) (64,) (64,) (64,64,3,3)             |
| res2.2.conv3.*    | res2.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,64,1,1)        |
| res3.0.conv1.*    | res3.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,256,1,1)       |
| res3.0.conv2.*    | res3.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| res3.0.conv3.*    | res3.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| res3.0.shortcut.* | res3.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (512,) (512,) (512,) (512,) (512,256,1,1)       |
| res3.1.conv1.*    | res3.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| res3.1.conv2.*    | res3.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| res3.1.conv3.*    | res3.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| res3.2.conv1.*    | res3.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| res3.2.conv2.*    | res3.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| res3.2.conv3.*    | res3.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| res3.3.conv1.*    | res3.3.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,512,1,1)       |
| res3.3.conv2.*    | res3.3.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (128,) (128,) (128,) (128,) (128,128,3,3)       |
| res3.3.conv3.*    | res3.3.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,128,1,1)       |
| res4.0.conv1.*    | res4.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,512,1,1)       |
| res4.0.conv2.*    | res4.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.0.conv3.*    | res4.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res4.0.shortcut.* | res4.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (1024,) (1024,) (1024,) (1024,) (1024,512,1,1)  |
| res4.1.conv1.*    | res4.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| res4.1.conv2.*    | res4.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.1.conv3.*    | res4.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res4.2.conv1.*    | res4.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| res4.2.conv2.*    | res4.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.2.conv3.*    | res4.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res4.3.conv1.*    | res4.3.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| res4.3.conv2.*    | res4.3.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.3.conv3.*    | res4.3.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res4.4.conv1.*    | res4.4.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| res4.4.conv2.*    | res4.4.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.4.conv3.*    | res4.4.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res4.5.conv1.*    | res4.5.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,1024,1,1)      |
| res4.5.conv2.*    | res4.5.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (256,) (256,) (256,) (256,) (256,256,3,3)       |
| res4.5.conv3.*    | res4.5.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (1024,) (1024,) (1024,) (1024,) (1024,256,1,1)  |
| res5.0.conv1.*    | res5.0.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,1024,1,1)      |
| res5.0.conv2.*    | res5.0.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| res5.0.conv3.*    | res5.0.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
| res5.0.shortcut.* | res5.0.shortcut.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight} | (2048,) (2048,) (2048,) (2048,) (2048,1024,1,1) |
| res5.1.conv1.*    | res5.1.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,2048,1,1)      |
| res5.1.conv2.*    | res5.1.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| res5.1.conv3.*    | res5.1.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
| res5.2.conv1.*    | res5.2.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,2048,1,1)      |
| res5.2.conv2.*    | res5.2.conv2.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (512,) (512,) (512,) (512,) (512,512,3,3)       |
| res5.2.conv3.*    | res5.2.conv3.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}    | (2048,) (2048,) (2048,) (2048,) (2048,512,1,1)  |
| stem.conv1.*      | stem.conv1.{norm.bias,norm.running_mean,norm.running_var,norm.weight,weight}      | (64,) (64,) (64,) (64,) (64,3,7,7)              |
[08/09 16:13:14 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
criterion.empty_weight
sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight}
sem_seg_head.pixel_decoder.adapter_1.weight
sem_seg_head.pixel_decoder.adapter_2.norm.{bias, weight}
sem_seg_head.pixel_decoder.adapter_2.weight
sem_seg_head.pixel_decoder.adapter_3.norm.{bias, weight}
sem_seg_head.pixel_decoder.adapter_3.weight
sem_seg_head.pixel_decoder.layer_1.norm.{bias, weight}
sem_seg_head.pixel_decoder.layer_1.weight
sem_seg_head.pixel_decoder.layer_2.norm.{bias, weight}
sem_seg_head.pixel_decoder.layer_2.weight
sem_seg_head.pixel_decoder.layer_3.norm.{bias, weight}
sem_seg_head.pixel_decoder.layer_3.weight
sem_seg_head.pixel_decoder.layer_4.norm.{bias, weight}
sem_seg_head.pixel_decoder.layer_4.weight
sem_seg_head.pixel_decoder.mask_features.{bias, weight}
sem_seg_head.predictor.class_embed.{bias, weight}
sem_seg_head.predictor.input_proj.{bias, weight}
sem_seg_head.predictor.mask_embed.layers.0.{bias, weight}
sem_seg_head.predictor.mask_embed.layers.1.{bias, weight}
sem_seg_head.predictor.mask_embed.layers.2.{bias, weight}
sem_seg_head.predictor.query_embed.weight
sem_seg_head.predictor.transformer.decoder.layers.0.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.0.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.0.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.1.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.1.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.1.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.2.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.2.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.2.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.3.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.3.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.3.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.4.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.4.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.4.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.5.linear1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.linear2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.multihead_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.multihead_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.layers.5.norm1.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.norm2.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.norm3.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.self_attn.out_proj.{bias, weight}
sem_seg_head.predictor.transformer.decoder.layers.5.self_attn.{in_proj_bias, in_proj_weight}
sem_seg_head.predictor.transformer.decoder.norm.{bias, weight}
[08/09 16:13:14 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  stem.fc.{bias, weight}
[08/09 16:13:14 d2.engine.train_loop]: Starting training from iteration 0
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [67,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [71,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [75,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [79,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [83,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [87,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [91,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [99,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [103,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [107,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [111,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [115,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [119,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [123,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [7,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [11,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [15,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [19,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [43,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [47,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [55,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [59,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:142: operator(): block: [0,0,0], thread: [63,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
ERROR [08/09 16:13:15 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 138, in train
    self.run_step()
  File "/detectron2_repo/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 232, in run_step
    loss_dict = self.model(data)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Study-MaskFormer/mask_former/mask_former_model.py", line 180, in forward
    losses = self.criterion(outputs, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/criterion.py", line 162, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 163, in forward
    return self.memory_efficient_forward(outputs, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 123, in memory_efficient_forward
    cost_mask = batch_sigmoid_focal_loss(out_mask, tgt_mask)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 49, in batch_sigmoid_focal_loss
    focal_pos = ((1 - prob) ** gamma) * F.binary_cross_entropy_with_logits(
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/tensor.py", line 528, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: device-side assert triggered
[08/09 16:13:15 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[08/09 16:13:15 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 1009M
Traceback (most recent call last):
  File "train_net.py", line 264, in <module>
    launch(
  File "/detectron2_repo/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "train_net.py", line 258, in main
    return trainer.train()
  File "/detectron2_repo/detectron2/engine/defaults.py", line 431, in train
    super().train(self.start_iter, self.max_iter)
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 138, in train
    self.run_step()
  File "/detectron2_repo/detectron2/engine/defaults.py", line 441, in run_step
    self._trainer.run_step()
  File "/detectron2_repo/detectron2/engine/train_loop.py", line 232, in run_step
    loss_dict = self.model(data)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Study-MaskFormer/mask_former/mask_former_model.py", line 180, in forward
    losses = self.criterion(outputs, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/criterion.py", line 162, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 163, in forward
    return self.memory_efficient_forward(outputs, targets)
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 123, in memory_efficient_forward
    cost_mask = batch_sigmoid_focal_loss(out_mask, tgt_mask)
  File "/Study-MaskFormer/mask_former/modeling/matcher.py", line 49, in batch_sigmoid_focal_loss
    focal_pos = ((1 - prob) ** gamma) * F.binary_cross_entropy_with_logits(
  File "/.pyenv/versions/3.8.6/lib/python3.8/site-packages/torch/tensor.py", line 528, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: device-side assert triggered

How to compute the FLOPs reported in the paper

Hi Bowen, thank you for such great work! I just have a small question about how the reported FLOPs in the paper were computed. Is there a script or public repo with this functionality? Thanks!
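For reference, a minimal sketch of the underlying mechanism (fvcore's FlopCountAnalysis, which detectron2's tools/analyze_model.py builds on); the toy model and the 256x256 input below are assumptions for illustration, not the paper's exact protocol:

    import torch
    from fvcore.nn import FlopCountAnalysis

    # Hypothetical toy network standing in for MaskFormer; any nn.Module that
    # accepts plain tensors can be analyzed the same way.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv2d(64, 150, kernel_size=1),
    ).eval()

    # FLOP counts depend heavily on the input resolution, so fix it explicitly.
    dummy_input = torch.zeros(1, 3, 256, 256)
    flops = FlopCountAnalysis(model, dummy_input)
    print(f"Total GFLOPs: {flops.total() / 1e9:.2f}")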

Inference time

Hi! Thanks for open-sourcing this work.

Out of curiosity, do you have any metrics on CPU inference time, for example, and could this kind of model run a few times per second if quantized on a TPU? Thanks in advance.

Is the input size for the FLOPs computation 256x256?

https://github.com/facebookresearch/detectron2/blob/main/tools/analyze_model.py

Hi Bowen. I calculated the FLOPs and params with the script above, but the result does not match your paper.
For maskformer_swin_small_bs16_160k.yaml I get 63M params and 111G FLOPs, while your paper reports 63M params and 79G FLOPs. Is there any problem with my calculation? When the input shape is resized to 256x256, the result is similar to your paper.

python3 analyze_model.py --config-file ./configs/ade20k-150/swin/maskformer_swin_small_bs16_160k.yaml --tasks flop

Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(512, 512), max_size=2048, sample_style='choice')]
[11/15 13:41:29 detectron2]: Flops table computed from only one input sample:

module #parameters or shape #flops
model 63.075M 80.909G
backbone 48.839M 49.38G
backbone.patch_embed 4.896K 83.362M
backbone.patch_embed.proj 4.704K 75.497M
backbone.patch_embed.norm 0.192K 7.864M
backbone.layers 48.831M 49.282G
backbone.layers.0 0.299M 4.394G
backbone.layers.1 1.188M 4.367G
backbone.layers.2 33.16M 35.953G
backbone.layers.3.blocks 14.184M 4.567G
backbone.norm0 0.192K 7.864M
backbone.norm0.weight (96,)
backbone.norm0.bias (96,)
backbone.norm1 0.384K 3.932M
backbone.norm1.weight (192,)
backbone.norm1.bias (192,)
backbone.norm2 0.768K 1.966M
backbone.norm2.weight (384,)
backbone.norm2.bias (384,)
backbone.norm3 1.536K 0.983M
backbone.norm3.weight (768,)
backbone.norm3.bias (768,)
sem_seg_head 14.236M 27.453G
sem_seg_head.pixel_decoder 4.305M 23.56G
sem_seg_head.pixel_decoder.adapter_1 25.088K 0.424G
sem_seg_head.pixel_decoder.layer_1 0.59M 9.685G
sem_seg_head.pixel_decoder.adapter_2 49.664K 0.207G
sem_seg_head.pixel_decoder.layer_2 0.59M 2.421G
sem_seg_head.pixel_decoder.adapter_3 98.816K 0.102G
sem_seg_head.pixel_decoder.layer_3 0.59M 0.605G
sem_seg_head.pixel_decoder.layer_4 1.77M 0.453G
sem_seg_head.pixel_decoder.mask_features 0.59M 9.664G
sem_seg_head.predictor 9.932M 3.887G
sem_seg_head.predictor.transformer.decoder 9.473M 1.179G
sem_seg_head.predictor.query_embed 25.6K
sem_seg_head.predictor.input_proj 0.197M 50.332M
sem_seg_head.predictor.class_embed 38.807K 23.194M
sem_seg_head.predictor.mask_embed.layers 0.197M 0.118G
[11/15 13:41:29 detectron2]: Average GFlops for each type of operators:
[('conv', 32.83191595008), ('layer_norm', 0.22296760319999998), ('linear', 67.07614236672), ('matmul', 1.92566500224), ('group_norm', 0.0769406976), ('upsample_nearest2d', 0.00764854272), ('bmm', 0.139984896), ('einsum', 8.959275), ('upsample_bilinear2d', 0.29302461)]
[11/15 13:41:29 detectron2]: Total GFlops: 111.5±12.8

Can MaskFormer support multi-node training?

Dear Bowen,

Thanks for your excellent paper and your open-source code, I find it very interesting.

I have a question about how to train MaskFormer on multiple nodes jointly; have you tried it yet?

For example, each node has 8 GPUs and I use two nodes. My training script is as follows, but I cannot get it to work. It seems that my nodes cannot find each other (see the launch sketch after this issue). I really need your help.

python -u train_net.py \
  --num-gpus 8 \
  --machine-rank 0 \
  --num-machines 2 \
  --config-file configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml \
  MODEL.WEIGHTS pretrain_models/R-103.pkl

python -u train_net.py \
  --num-gpus 8 \
  --machine-rank 1 \
  --num-machines 2 \
  --config-file configs/cityscapes-19/maskformer_R101c_bs16_90k.yaml \
  MODEL.WEIGHTS pretrain_models/R-103.pkl

I would appreciate it if you could help me.
Thank you in advance!

Qianyu
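For reference, a hedged sketch of what the underlying detectron2 launch call expects for multi-node training (the IP address and port are placeholders, and whether this resolves the setup above is an assumption). With train_net.py this corresponds to passing the same --dist-url on both machines, in addition to --num-machines and each node's --machine-rank:

    from detectron2.engine import launch

    def main(args=None):
        # Trainer setup would go here; omitted in this sketch.
        pass

    launch(
        main,
        num_gpus_per_machine=8,
        num_machines=2,
        machine_rank=0,                       # use 1 on the second node
        dist_url="tcp://192.168.1.10:29500",  # placeholder; same rendezvous address on every node
        args=(),
    )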

Support for custom dataset

Hi, thank you for sharing your code. I just have 2 questions.

  1. How should I prepare my custom dataset to test the model? (See the registration sketch below.)
  2. Do you know the maximum resolution achievable by this model to obtain good results? I mean, does this model work with 1024x1024 images for example?

Thank you!
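On the first question, a hedged sketch of registering a custom semantic segmentation dataset with detectron2 (the dataset name, paths, and class names are placeholders); the MaskFormer semantic dataset mapper additionally expects ignore_label to be present in the metadata:

    from detectron2.data import DatasetCatalog, MetadataCatalog
    from detectron2.data.datasets import load_sem_seg

    # Hypothetical layout: per-pixel label PNGs in one folder, RGB images in another.
    DatasetCatalog.register(
        "my_sem_seg_train",
        lambda: load_sem_seg(
            "datasets/my_data/annotations/train",  # ground-truth masks (placeholder path)
            "datasets/my_data/images/train",       # RGB images (placeholder path)
            gt_ext="png",
            image_ext="jpg",
        ),
    )
    MetadataCatalog.get("my_sem_seg_train").set(
        stuff_classes=["road", "building", "sky"],  # example class names
        ignore_label=255,          # label value excluded from the loss
        evaluator_type="sem_seg",
    )

The dataset name chosen here would then go into DATASETS.TRAIN in the config.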

Attention mask in last Swin basic layer

In the original Swin implementation, the last BasicLayer with 2 SwinTransformerBlocks does not use an attention mask:

But your SwinTransformerBlock implementation does not include such a condition, so the first SwinTransformerBlock will be computed WITH an attention mask.

Is this an error, or did you do it on purpose? Will it harm performance or boost it?

No module named 'timm'

When I run demo.py, the following error occurs:

Traceback (most recent call last):
File "demo.py", line 26, in
from mask_former import add_mask_former_config
File "/root/MaskFormer-master/MaskFormer-master/demo/../mask_former/init.py", line 3, in
from . import modeling
File "/root/MaskFormer-master/MaskFormer-master/demo/../mask_former/modeling/init.py", line 2, in
from .backbone.swin import D2SwinTransformer
File "/root/MaskFormer-master/MaskFormer-master/demo/../mask_former/modeling/backbone/swin.py", line 16, in
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
ModuleNotFoundError: No module named 'timm'

Can you tell me why?
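A minimal diagnostic sketch, assuming the cause is simply that the optional timm dependency is not installed in the environment:

    import importlib.util

    # The Swin backbone imports DropPath/to_2tuple/trunc_normal_ from timm, so the
    # package must be present; `pip install timm` is the usual fix.
    if importlib.util.find_spec("timm") is None:
        print("timm is not installed; install it with: pip install timm")
    else:
        print("timm is installed; the import error likely has another cause.")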

Failed to download https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/torchvision/R-50.pkl

When I run python train_net.py --config-file configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001, the following happens:

[Checkpointer] Loading from detectron2://ImageNetPretrained/torchvision/R-50.pkl ...
R-50.pkl: 0.00B [00:00, ?B/s]
Failed to download https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/torchvision/R-50.pkl
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/urllib/request.py", line 1319, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/opt/conda/lib/python3.7/http/client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/conda/lib/python3.7/http/client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/conda/lib/python3.7/http/client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/conda/lib/python3.7/http/client.py", line 1026, in _send_output
self.send(msg)
File "/opt/conda/lib/python3.7/http/client.py", line 966, in send
self.connect()
File "/opt/conda/lib/python3.7/http/client.py", line 1414, in connect
super().connect()
File "/opt/conda/lib/python3.7/http/client.py", line 942, in connect
self._tunnel()
File "/opt/conda/lib/python3.7/http/client.py", line 921, in _tunnel
message.strip()))
OSError: Tunnel connection failed: 403 Forbidden

What can I do?
If I download R-50.pkl from the URL manually, which path should I put it under, and which config option should I set?
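A hedged sketch of pointing the config at a manually downloaded checkpoint (the local path is a placeholder); the same override can also be passed on the command line by appending MODEL.WEIGHTS /path/to/R-50.pkl to the train_net.py call:

    from detectron2.config import get_cfg
    from detectron2.projects.deeplab import add_deeplab_config
    from mask_former import add_mask_former_config

    cfg = get_cfg()
    add_deeplab_config(cfg)       # MaskFormer configs extend the DeepLab project config
    add_mask_former_config(cfg)
    cfg.merge_from_file("configs/ade20k-full-847/maskformer_R50_bs16_200k.yaml")
    # Placeholder path to the manually downloaded ImageNet-pretrained backbone.
    cfg.MODEL.WEIGHTS = "pretrain_models/R-50.pkl"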

semantic seg training Error

C = C.reshape(num_queries, -1).cpu()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

This happens when I run: python ./train_net.py --num-gpus 4 --config-file ./configs/ade20k-150/maskformer_R50_bs16_160k_my.yaml

I hope you or anyone can help me. Thanks in advance :)
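This device-side assert is commonly triggered by ground-truth masks that contain label ids outside the valid range. A hedged sanity-check sketch (the dataset path, class count, and ignore value below are assumptions following the ADE20K convention):

    import glob

    import numpy as np
    from PIL import Image

    num_classes = 150    # MODEL.SEM_SEG_HEAD.NUM_CLASSES in the config
    ignore_label = 255   # pixels with this value are ignored by the loss

    # Scan every ground-truth PNG for label ids that are neither a valid class
    # index nor the ignore value.
    for path in glob.glob("datasets/ADEChallengeData2016/annotations_detectron2/training/*.png"):
        labels = np.unique(np.asarray(Image.open(path)))
        bad = [int(v) for v in labels if v != ignore_label and v >= num_classes]
        if bad:
            print(f"{path}: unexpected label ids {bad}")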

Convergence speed of maskformer

MaskFormer is similar to DETR in structure, so why does MaskFormer not need as many iterations as DETR? I notice that the convergence speed of MaskFormer is close to that of traditional segmentation algorithms.

The mIoU on ADE20K is not as good as on the ADE20K benchmark

Thank you for your great work.
However, the mIoU I get when testing with maskformer_swin_large_IN21k_384_bs16_160k_res640.yaml and its model is only 0.4022,
while your mIoU on the ADE20K benchmark is 0.4967, which is much better.
Could you please tell me why?
Thank you.

Question about your paper

I've read your paper. Thank you so much for such great work.

You said you didn't do "per-pixel classification", but I think you did "per-pixel, per-class, binary classification".

Is that the right way to understand your paper?

Thanks.
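For reference, a hedged sketch of how mask classification turns into per-pixel semantic scores at inference time (shapes are illustrative; this mirrors the aggregation described in the paper rather than being a verified copy of the repo's code):

    import torch

    num_queries, num_classes, H, W = 100, 150, 64, 64
    class_logits = torch.randn(num_queries, num_classes + 1)  # +1 for the "no object" class
    mask_logits = torch.randn(num_queries, H, W)              # one binary mask per query

    class_prob = class_logits.softmax(-1)[..., :-1]  # drop the "no object" column
    mask_prob = mask_logits.sigmoid()
    # Aggregate query-level (class, mask) pairs into a [num_classes, H, W] score map;
    # each pixel is then labeled by an argmax over classes rather than classified directly.
    sem_seg = torch.einsum("qc,qhw->chw", class_prob, mask_prob)
    prediction = sem_seg.argmax(dim=0)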

cannot import name 'FakeQuantizeBase' from 'torch.quantization'

When I run train_net.py to train MaskFormer, it raises the following error:
Traceback (most recent call last):
File "train_net.py", line 19, in
from detectron2.data import MetadataCatalog, build_detection_train_loader
File "/home/dennisyuan/code/detectron2/detectron2/data/init.py", line 4, in
from .build import (
File "/home/dennisyuan/code/detectron2/detectron2/data/build.py", line 13, in
from detectron2.structures import BoxMode
File "/home/dennisyuan/code/detectron2/detectron2/structures/init.py", line 7, in
from .masks import BitMasks, PolygonMasks, polygons_to_bitmask, ROIMasks
File "/home/dennisyuan/code/detectron2/detectron2/structures/masks.py", line 10, in
from detectron2.layers.roi_align import ROIAlign
File "/home/dennisyuan/code/detectron2/detectron2/layers/init.py", line 2, in
from .batch_norm import FrozenBatchNorm2d, get_norm, NaiveSyncBatchNorm
File "/home/dennisyuan/code/detectron2/detectron2/layers/batch_norm.py", line 4, in
from fvcore.nn.distributed import differentiable_all_reduce
File "/python/lib/python3.8/site-packages/fvcore/nn/init.py", line 2, in
from .activation_count import ActivationCountAnalysis, activation_count
File "/python/lib/python3.8/site-packages/fvcore/nn/activation_count.py", line 10, in
from .jit_analysis import JitModelAnalysis
File "/python/lib/python3.8/site-packages/fvcore/nn/jit_analysis.py", line 15, in
from fvcore.common.checkpoint import _named_modules_with_dup
File "/python/lib/python3.8/site-packages/fvcore/common/checkpoint.py", line 23, in
from torch.quantization import ObserverBase, FakeQuantizeBase
ImportError: cannot import name 'FakeQuantizeBase' from 'torch.quantization' (/python/lib/python3.8/site-packages/torch/quantization/init.py)
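A hedged diagnostic sketch (the exact minimum torch version is not pinned down here): fvcore's checkpoint module imports FakeQuantizeBase, which only exists in newer PyTorch releases, so a torch/fvcore version mismatch is the usual cause; upgrading PyTorch or installing an older fvcore should resolve it.

    import torch

    print("torch version:", torch.__version__)
    try:
        from torch.quantization import FakeQuantizeBase  # noqa: F401
        print("FakeQuantizeBase is available; the fvcore import should work.")
    except ImportError:
        print("FakeQuantizeBase is missing; upgrade torch or pin an older fvcore.")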

train error

Traceback (most recent call last):
File "./train_net.py", line 270, in
args=(args,),
File "/usr/local/lib64/python3.6/site-packages/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "./train_net.py", line 242, in main
cfg = setup(args)
File "./train_net.py", line 231, in setup
add_mask_former_config(cfg)
File "/MaskFormer-master/mask_former/config.py", line 79, in add_mask_former_config
cfg.MODEL.SWIN.QK_SCALE = None
File "/usr/local/lib/python3.6/site-packages/fvcore/common/config.py", line 148, in setattr
super().setattr(name, val)
File "/usr/local/lib/python3.6/site-packages/yacs/config.py", line 158, in setattr
type(value), name, _VALID_TYPES
File "/usr/local/lib/python3.6/site-packages/yacs/config.py", line 525, in _assert_with_logging
assert cond, msg
AssertionError: Invalid type <class 'NoneType'> for key QK_SCALE; valid types = {<class 'int'>, <class 'bool'>, <class 'str'>, <class 'tuple'>, <class 'float'>, <class 'list'>}

python3 ./train_net.py --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.01
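A hedged diagnostic sketch, assuming the assertion comes from an older yacs release whose _VALID_TYPES does not include NoneType (newer yacs versions accept None config values, so upgrading yacs is the commonly suggested fix):

    from yacs.config import _VALID_TYPES

    # If NoneType is not whitelisted, setting cfg.MODEL.SWIN.QK_SCALE = None will
    # raise exactly the assertion shown above.
    print("NoneType allowed by yacs:", type(None) in _VALID_TYPES)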

ADE20K prepare pan_seg: two stuff categories have the same color, possibly resulting in miscalculated results

Hi,
In MaskFormer/datasets/prepare_ade20k_pan_seg.py, the 7th PALETTE entry [140,140,140] (road:route, stuff) and the 49th entry [140,140,140] (skyscraper, stuff) have the same color.
They are both stuff, so according to the id generator they will get the same segment_id, resulting in both areas being stored with the same colour in the panoptic annotation PNG and with the same "id" in the JSON file.
line 437

            segm_info.append(
                {
                    "id": int(segment_id),   # same for both skyscrapers and road
                    "category_id": int(semantic_cat_id),  # different categories
                    "area": int(area),
                    "bbox": bbox,
                    "iscrowd": 0,
                }
            )

Meanwhile, in the pan_seg dataset mapper, it seems the mapper directly converts the colors back using rgb2id and uses the "id" for training afterwards. This will probably map the road and skyscraper areas together to either road or skyscraper. (In training, both categories will have the union of the two areas as ground truth; I am not sure what will happen during evaluation.)
line 106

    pan_seg_gt = rgb2id(pan_seg_gt)
    # some lines later ...
    for segment_info in segments_info:
        class_id = segment_info["category_id"]   # different categories
        if not segment_info["iscrowd"]:
            classes.append(class_id)
            masks.append(pan_seg_gt == segment_info["id"])  # same for both skyscrapers and road

I assume this may cause miscalculation in training and evaluation.

Could you please help me check this?
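A hedged sketch of how one might confirm the reported collision (the truncated PALETTE below is a placeholder; the real list lives in prepare_ade20k_pan_seg.py):

    from collections import defaultdict

    # Placeholder palette; substitute the full list from prepare_ade20k_pan_seg.py.
    PALETTE = [[120, 120, 120], [140, 140, 140], [6, 230, 230], [140, 140, 140]]

    users = defaultdict(list)
    for idx, color in enumerate(PALETTE):
        users[tuple(color)].append(idx)

    for color, idxs in users.items():
        if len(idxs) > 1:
            print(f"color {color} is shared by category indices {idxs}")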

errors occurred when running demo.py

When I run demo.py using the following command:
python ./demo/demo.py --config-file configs/cityscapes-19/maskformer_R101_bs16_90k.yaml --input ../videos/sanjose_street/test01000.png --opts MODEL.WEIGHTS ./model_final_38c00c.pkl

It gives me the following errors:

[09/29 23:17:36 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/cityscapes-19/maskformer_R101_bs16_90k.yaml', input=['../videos/sanjose_street/test01000.png'], opts=['MODEL.WEIGHTS', './model_final_38c00c.pkl'], output=None, video_input=None, webcam=False)
Traceback (most recent call last):
File "./demo/demo.py", line 106, in
cfg = setup_cfg(args)
File "./demo/demo.py", line 38, in setup_cfg
add_mask_former_config(cfg)
File "/home/user/4TB/fc_tmp/git/MaskFormer/demo/../mask_former/config.py", line 79, in add_mask_former_config
cfg.MODEL.SWIN.QK_SCALE = None
File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/fvcore/common/config.py", line 148, in setattr
super().setattr(name, val)
File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/yacs/config.py", line 158, in setattr
type(value), name, _VALID_TYPES
File "/home/henry.jeng/anaconda3/envs/remnav/lib/python3.7/site-packages/yacs/config.py", line 525, in _assert_with_logging
assert cond, msg
AssertionError: Invalid type <class 'NoneType'> for key QK_SCALE; valid types = {<class 'str'>, <class 'int'>, <class 'bool'>, <class 'tuple'>, <class 'list'>, <class 'float'>}

How to prepare ground truth for MaskFormer-fixed setting?

Thanks for your wonderful work! May I ask how to prepare the classification and mask ground truth for MaskFormer-fixed training? In your paper, it says "i.e. N = K and assignment is done based on category label indices identically to per-pixel classification setup". Sorry, I don't fully understand.
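For what it's worth, a hedged illustration of the "fixed matching" baseline as that sentence describes it (N = K, so query i is always responsible for class i); this is an interpretation sketch, not code from the repo:

    import torch

    num_classes = 150                       # with fixed matching, num_queries == num_classes
    gt_classes = torch.tensor([3, 17, 92])  # example: classes present in one image

    # Hungarian matching is replaced by an identity assignment on class indices:
    # the query index equals the ground-truth class index.
    query_idx = gt_classes                       # which queries receive a target
    target_idx = torch.arange(len(gt_classes))   # which ground-truth mask each of them gets
    indices = [(query_idx, target_idx)]          # per-image (prediction, target) index pairs
    print(indices)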

Question about parameter tuning

Hi bowen,

Thank you for sharing such a great work.

I have a question about the parameters for training on a new dataset for the panoptic segmentation task. In the new dataset, we have fewer objects in each image (maybe 1-5). Which parameter do you think is the most important for this adaptation? Any advice is really appreciated.

The situation is that I found the final mask and dice losses are close to 0.1, which is somewhat smaller than in COCO panoptic training (about 0.3). I wonder whether there is any normalization in the code that makes the loss small when the number of objects is small? I think not?

Another random question: do you use a large batch size of 64 because of the poor label quality of COCO? If I change it to 16, will that make a large difference? (I ask because the Panoptic-DeepLab PyTorch version also uses 16 while its paper uses 64.)

Look forward to your reply.

AttributeError: Attribute 'ignore_label' does not exist in the metadata of dataset 'ade20k_sem_seg_train'.

I ran the following command.

./train_net.py \
  --config-file configs/ade20k-150/maskformer_R50_bs16_160k.yaml \
  --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001

Then I got the following error. What should I do?

Traceback (most recent call last):
  File "train_net.py", line 264, in <module>
    launch(
  File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/engine/launch.py", line 62, in launch
    main_func(*args)
  File "train_net.py", line 256, in main
    trainer = Trainer(cfg)
  File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 312, in __init__
    data_loader = self.build_train_loader(cfg)
  File "train_net.py", line 107, in build_train_loader
    mapper = MaskFormerSemanticDatasetMapper(cfg, True)
  File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/config/config.py", line 181, in wrapped
    explicit_args = _get_args_from_config(from_config_func, *args, **kwargs)
  File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/config/config.py", line 238, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/mnt/sdb1/lost+found/clones/Study-MaskFormer/mask_former/data/dataset_mappers/mask_former_semantic_dataset_mapper.py", line 87, in from_config
    ignore_label = meta.ignore_label
  File "/home/keiichi.kuroyanagi/.pyenv/versions/3.8.6-mask-former/lib/python3.8/site-packages/detectron2/data/catalog.py", line 126, in __getattr__
    raise AttributeError(
AttributeError: Attribute 'ignore_label' does not exist in the metadata of dataset 'ade20k_sem_seg_train'. Available keys are dict_keys(['name', 'stuff_classes', 'image_root', 'sem_seg_root', 'evaluator_type', 'stuff_colors']).
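A hedged workaround sketch (255 matches the convention produced by prepare_ade20k_sem_seg.py; if the ADE20K annotations were prepared as documented, the built-in registration should already provide this): the MaskFormer semantic dataset mapper reads ignore_label from the dataset metadata, so it has to be set on the registered dataset.

    from detectron2.data import MetadataCatalog

    # Attach the missing metadata key to the already-registered dataset.
    MetadataCatalog.get("ade20k_sem_seg_train").set(ignore_label=255)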

Model cannot converge when training coco panoptic task

Hello, thanks for your contribution! I want to train the COCO panoptic model, but I ran into a convergence problem. I use maskformer_panoptic_R50_bs64_554k.yaml as my config, but I only have 4 P40 GPUs, so I adjusted the learning rate to 0.0000125. But after 400k iterations, the total loss is still around 8.0. Except for the learning rate and the number of GPUs, I did not modify other parameters. Does the COCO panoptic model have to be trained with 64 GPUs?

Question about normalization (Mean/Std) value different from swin pretrained backbone

Thanks for your great work! I have a question about the Swin Transformer backbone. For training, the original Swin Transformer imports the mean and std from timm with the following values:

[screenshot: timm's ImageNet mean/std values]

However, as shown in the config file of this work, the mean and std have been changed to the following values:

[screenshot: PIXEL_MEAN/PIXEL_STD values from the config]

I would really appreciate any suggestion on this question. Thanks a lot!
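For reference, a hedged note on the numbers themselves (assuming the values in the screenshots are the standard ImageNet statistics): detectron2 feeds images in the 0-255 range, so the config stores the same ImageNet mean/std that timm uses, just scaled by 255 instead of being normalized to [0, 1].

    # ImageNet statistics as used by timm (0-1 range) and their 0-255 equivalents.
    timm_mean = [0.485, 0.456, 0.406]
    timm_std = [0.229, 0.224, 0.225]

    d2_mean = [round(m * 255, 3) for m in timm_mean]  # -> [123.675, 116.28, 103.53]
    d2_std = [round(s * 255, 3) for s in timm_std]    # -> [58.395, 57.12, 57.375]
    print(d2_mean, d2_std)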

RuntimeError: CUDA error: device-side assert triggered

I generated the annotations_detectron2 files according to prepare_ade20k_sem_seg.py, but the error below appears when I run train_net.py. What should I do next?
[08/06 12:04:22 mask_former.data.dataset_mappers.mask_former_semantic_dataset_mapper]: [MaskFormerSemanticDatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=..., max_size=2048, sample_style='choice'), RandomCrop_CategoryAreaConstraint(crop_type='absolute', crop_size=[512, 512], single_category_max_area=1.0, ignored_category=255), <detectron2.projects.point_rend.color_augmentation.ColorAugSSDTransform object at 0x7f4d20b1d0d0>, RandomFlip()]
[08/06 12:04:23 d2.data.datasets.coco]: Loaded 20210 images with semantic segmentation from datasets/ADEChallengeData2016/images/training
[08/06 12:04:23 d2.data.common]: Serializing 20210 elements to byte tensors and concatenating them all ...
[08/06 12:04:23 d2.data.common]: Serialized dataset takes 3.97 MiB
[08/06 12:04:23 d2.data.build]: Using training sampler TrainingSampler
[08/06 12:04:23 d2.engine.train_loop]: Starting training from iteration 0
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [40,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(the same assertion is repeated for many more block/thread indices)
ERROR [08/06 12:04:24 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/train_loop.py", line 142, in train
self.run_step()
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/train_loop.py", line 235, in run_step
loss_dict = self.model(data)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/lpc/mask_former/mask_former_model.py", line 180, in forward
losses = self.criterion(outputs, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/criterion.py", line 162, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/matcher.py", line 165, in forward
return self.memory_efficient_forward(outputs, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/matcher.py", line 134, in memory_efficient_forward
C = C.reshape(num_queries, -1).cpu()
RuntimeError: CUDA error: device-side assert triggered
[08/06 12:04:24 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[08/06 12:04:24 d2.utils.events]: iter: 0 lr: N/A max_mem: 8654M
Traceback (most recent call last):
File "/home/liuchuandong/lpc/train_net.py", line 264, in
launch(
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "/home/liuchuandong/lpc/train_net.py", line 258, in main
return trainer.train()
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/defaults.py", line 410, in train
super().train(self.start_iter, self.max_iter)
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/train_loop.py", line 142, in train
self.run_step()
File "/data/d1/liuchuandong/point_prj/detectron2/detectron2/engine/train_loop.py", line 235, in run_step
loss_dict = self.model(data)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/lpc/mask_former/mask_former_model.py", line 180, in forward
losses = self.criterion(outputs, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/criterion.py", line 162, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/matcher.py", line 165, in forward
return self.memory_efficient_forward(outputs, targets)
File "/home/liuchuandong/anaconda3/envs/detectron2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/liuchuandong/lpc/mask_former/modeling/matcher.py", line 134, in memory_efficient_forward
C = C.reshape(num_queries, -1).cpu()
RuntimeError: CUDA error: device-side assert triggered

Process finished with exit code 1

Is it possible to use the Detectron2 API to load a pretrained model?

I want to run inference on an image using a pretrained model provided in this project, and I want to use the detectron2 API to load the model. Is that possible?

Here's what I tried:

  1. I tried to run the Run a pre-trained detectron2 model section in the Getting Started notebook of detectron2.
  2. When obtaining the config, I changed the code to the following:
URL = "ade20k-150/maskformer_R50_bs16_160k.yaml"
cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file(URL))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(URL)
        

It seems like the models aren't available in the model zoo:

  ---------------------------------------------------------------------------
  RuntimeError                              Traceback (most recent call last)
  <ipython-input-9-f6671c809590> in <module>()
        1 cfg = get_cfg()
        2 # add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
  ----> 3 cfg.merge_from_file(model_zoo.get_config_file(URL))
        4 cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
        5 # Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
  
  /usr/local/lib/python3.7/dist-packages/detectron2/model_zoo/model_zoo.py in get_config_file(config_path)
      117     )
      118     if not os.path.exists(cfg_file):
  --> 119         raise RuntimeError("{} not available in Model Zoo!".format(config_path))
      120     return cfg_file
      121 
  
  RuntimeError: ade20k-150/maskformer_R50_bs16_160k.yaml not available in Model Zoo!

If this isn't possible, how can we load a pretrained model in user code?
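For reference, a hedged sketch of loading a MaskFormer checkpoint without detectron2's model_zoo API (the weights path is a placeholder for a file downloaded from the Model Zoo): the configs live in this repository rather than in detectron2's built-in model zoo, so the local yaml has to be merged directly and MODEL.WEIGHTS set explicitly.

    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from detectron2.projects.deeplab import add_deeplab_config
    from mask_former import add_mask_former_config

    cfg = get_cfg()
    add_deeplab_config(cfg)      # registers keys such as MODEL.RESNETS.STEM_TYPE
    add_mask_former_config(cfg)  # registers the MaskFormer-specific keys
    cfg.merge_from_file("configs/ade20k-150/maskformer_R50_bs16_160k.yaml")
    cfg.MODEL.WEIGHTS = "/path/to/downloaded/checkpoint.pkl"  # placeholder local path

    predictor = DefaultPredictor(cfg)  # ready for inference on BGR numpy images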

config error?

mask_former/config.py", line 79, in add_mask_former_config
    cfg.MODEL.SWIN.QK_SCALE = None
  File "/fvcore/fvcore/common/config.py", line 148, in __setattr__
    super().__setattr__(name, val)
  File "/usr/local/lib/python3.6/dist-packages/yacs-0.1.6-py3.6.egg/yacs/config.py", line 158, in __setattr__
    type(value), name, _VALID_TYPES
  File "/usr/local/lib/python3.6/dist-packages/yacs-0.1.6-py3.6.egg/yacs/config.py", line 521, in _assert_with_logging
    assert cond, msg
AssertionError: Invalid type <class 'NoneType'> for key QK_SCALE; valid types = {<class 'list'>, <class 'bool'>, <class 'int'>, <class 'tuple'>, <class 'str'>, <class 'float'>}

Did I miss something? It seems cfg.MODEL.SWIN.QK_SCALE cannot be set to None.

python3 demo/demo.py \
    --config-file configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml --video-input drive_sh.mp4 \
    --opts MODEL.WEIGHTS weights/model_final_f3fc73.pkl

The pretrained model

The pretrained models with the Swin backbone under "COCO Panoptic Segmentation" cannot be accessed.

About the missing file.

Hello, I could not find the file 'prepare_ade20k_panoptic_annotations.py' for generating the ADE20K panoptic annotations; could you please provide it? By the way, can MaskFormer be used for Mapillary Vistas panoptic segmentation?
Looking forward to your reply.

Training Log of maskformer_panoptic_swin_tiny_bs64_554k.yaml

Thanks for your great work. It would be really appreciated if you could share the training log for the config maskformer_panoptic_swin_tiny_bs64_554k.yaml.

If there is no training log, could you please share your experience of roughly when the network reaches 50% of its final accuracy, or any equivalent information?

Thanks in advance!

Problem running demo on COCO Panoptic

I try to run the demo.py example using '../configs/coco-panoptic/maskformer_panoptic_R50_bs64_554k.yaml' as the configuration, but I keep getting the following error: 'Non-existent config key: MODEL.RESNETS.STEM_TYPE'.
Any idea what the problem could be?

Panoptic segmentation on Cityscapes

Thanks for your excellent work.

May I know if you have tried to do panoptic segmentation on Cityscapes dataset?

I am trying to do it but got weird results. I prepared the Cityscapes dataset by following the instructions and modified the config starting from the semantic config file "maskformer_R101_bs16_90k.yaml".

If you have tried, could you please provide the config files for panoptic segmentation on Cityscapes?

Thank you.

question about ade20k benchmark learning setting

Great work! I have some questions.

  1. Is the val data used to fine-tune a model trained on the training data, or are train and val data used together?
  2. If val is used for fine-tuning, how many epochs are used? Is there a unified, default setting for fair comparison?
  3. If train and val data are used together, is the learning setting the same (160k iterations)?

Looking forward to your reply.
