lucasjinreal / yolov7_d2 Goto Github PK

🔥🔥🔥🔥 (Earlier YOLOv7 not official one) YOLO with Transformers and Instance Segmentation, with TensorRT acceleration! 🔥🔥🔥

License: GNU General Public License v3.0

Python 96.34% CMake 0.07% C++ 3.50% Shell 0.09%

yolo yolox detection detextron2 tensorrt yolov6 yolov7 face transformers detr

yolov7_d2's Introduction

Documentation • Installation Instructions • Deployment • Contributing • Reporting Issues

this is another yolov7 implementation based on detectron2, YOLOX, YOLOv6, YOLOv5, DETR, Anchor-DETR, DINO and some other SOTA detection models also supported. The ultimate goal of yolov7-d2 is to build a powerful weapon for anyone who wants a SOTA detector and train it without pain. It's extremly easy for users to build any Multi-Head models on yolov7-d2, for example, our E2E pose estimation is build on yolov7-d2 and works very well.

Thanks for Aarohi's youtube vlog for guaidance of yolov7: https://www.youtube.com/watch?v=ag88beS_fvM , if you want a quick start, take a look at this nice introduction on yolov7 and detectron2.

New version will release!

YOLOv7 v2.0 will be released soon! We will release our Convext-tiny YOLO arch model achieves mAP 43.9 with very low latency! Feature will be included in next version:

Support EfficientFormer backbone;
Support new YOLO2Go model, more lighter, much more faster and much more accurate;
Support MobileOne backbone;

For more details, refer to read the doc.

Just fork and star!, you will be noticed once we release the new version!

🔥🔥🔥 Just another yolo variant implemented based on detectron2. But note that YOLOv7 isn't meant to be a successor of yolo family, 7 is just a magic and lucky number. Instead, YOLOv7 extends yolo into many other vision tasks, such as instance segmentation, one-stage keypoints detection etc..

The supported matrix in YOLOv7 are:

⚠️ Important note: YOLOv7 on Github not the latest version, many features are closed-source but you can get it from https://manaai.cn

Features are ready but not opensource yet:

Convnext training on YOLOX, higher accuracy than original YOLOX;
GFL loss support;
MobileVit-V2 backbone available;
CSPRep-Resnet: a repvgg style resnet used in PP-YOLOE but in pytorch rather than paddle;
VitDet support;
Simple-FPN support from VitDet;
PP-YOLOE head supported;

If you want get full version YOLOv7, either become a contributor or get from https://manaai.cn .

🆕 News!

2022.08.20: Our new lightweighted model MobileOne-S0-YOLOX-Lite achieves mAP 30! surpressed yolox-nano or other lightweighted CPU models!
2022.07.26: Now we are preparing release new pose model;
2022.06.25: Meituan's YOLOv6 training has been supported in YOLOv7!
2022.06.13: New model YOLOX-Convnext-tiny got a ~~41.3~~ 43 mAP beats yolox-s, AP-small even higher!;
2022.06.09: GFL, general focal loss supported;
2022.05.26: Added YOLOX-ConvNext config;
2022.05.18: DINO, DNDetr and DABDetr are about added, new records on coco up to 63.3 AP!
2022.05.09: Big new function added! We adopt YOLOX with Keypoints Head!, model still under train, but you can check at code already;
2022.04.23: We finished the int8 quantization on SparseInst! It works perfect! Download the onnx try it our by your self.
2022.04.15: Now, we support the SparseInst onnx expport!
2022.03.25: New instance seg supported! 40 FPS @ 37 mAP!! Which is fast;
2021.09.16: First transformer based DETR model added, will explore more DETR series models;
2021.08.02: YOLOX arch added, you can train YOLOX as well in this repo;
2021.07.25: We found YOLOv7-Res2net50 beat res50 and darknet53 at same speed level! 5% AP boost on custom dataset;
2021.07.04: Added YOLOF and we can have a anchor free support as well, YOLOF achieves a better trade off on speed and accuracy;
2021.06.25: this project first started.
more

🌹 Contribution Wanted

If you have spare time or if you have GPU card, then help YOLOv7 become more stronger! Here is the guidance of contribute:

Claim task: I have some ideas but do not have enough time to do it, if you want to implement it, claim the task, I will give u detailed advise on how to do, and you can learn a lot from it;
Test mAP: When you finished new idea implementation, create a thread to report experiment mAP, if it work, then merge into our main master branch;
Pull request: YOLOv7 is open and always tracking on SOTA and light models, if a model is useful, we will merge it and deploy it, distribute to all users want to try.

Here are some tasks need to be claimed:

Just join our in-house contributor plan, you can share our newest code with your contribution!

Quick Start

Before running yolov7-d2, make sure you have detectron2 installed, for it's installation, please refer to original facebookresearch repo.

Simple pip install -e . you can have detectron2 installed from source.

Then, just clone this repo:

git clone https://github.com/jinfagang/yolov7_d2
cd yolov7_d2
pip install -e .

Or, you can pip install yolov7-d2 for quick install from pypi.

Then following docs for first training && inference usage.

💁‍♂️ Results

YOLOv7 Instance	Face & Detection

🧑‍🦯 Installation && Quick Start

See docs/install.md

Special requirements (other version may also work, but these are tested, with best performance, including ONNX export best support):

torch 1.11 (stable version)
onnx
onnx-simplifier 0.3.7
alfred-py latest
detectron2 latest

If you using lower version torch, onnx exportation might not work as our expected.

🤔 Features

Some highlights of YOLOv7 are:

A simple and standard training framework for any detection && instance segmentation tasks, based on detectron2;
Supports DETR and many transformer based detection framework out-of-box;
Supports easy to deploy pipeline thought onnx.
This is the only framework support YOLOv4 + InstanceSegmentation in single stage style;
Easily plugin into transformers based detector;

We are strongly recommend you send PR if you have any further development on this project, the only reason for opensource it is just for using community power to make it stronger and further. It's very welcome for anyone contribute on any features!

🧙‍♂️ Pretrained Models

model	backbone	input	aug	AP^val	AP	FPS	weights
SparseInst	R-50	640	✘	32.8	-	44.3	model
SparseInst	R-50-vd	640	✘	34.1	-	42.6	model
SparseInst (G-IAM)	R-50	608	✘	33.4	-	44.6	model
SparseInst (G-IAM)	R-50	608	✓	34.2	34.7	44.6	model
SparseInst (G-IAM)	R-50-DCN	608	✓	36.4	36.8	41.6	model
SparseInst (G-IAM)	R-50-vd	608	✓	35.6	36.1	42.8	model
SparseInst (G-IAM)	R-50-vd-DCN	608	✓	37.4	37.9	40.0	model
SparseInst (G-IAM)	R-50-vd-DCN	640	✓	37.7	38.1	39.3	model
SparseInst Int8 onnx	google drive

🧙‍♂️ Models trained in YOLOv7

model	backbone	input	aug	AP	AP50	APs	FPS	weights
YoloFormer-Convnext-tiny	Convnext-tiny	800	✓	43	63.7	26.5	39.3	model
YOLOX-s	-	800	✓	40.5	-	-	39.3	model

note: We post AP-s here because we want to know how does small object performance in related model, it was notablely higher small-APs for transformer backbone based model! Some of above model might not opensourced but we provide weights.

🥰 Demo

Run a quick demo would be like:

python3 demo.py --config-file configs/wearmask/darknet53.yaml --input ./datasets/wearmask/images/val2017 --opts MODEL.WEIGHTS output/model_0009999.pth

Run a quick demo to upload and explore your YOLOv7 prediction with Weights & Biases . See here for an example

python3 demo.py --config-file configs/wearmask/darknet53.yaml --input ./datasets/wearmask/images/val2017 --wandb-entity <your-username/team> --wandb-project <project-name> --opts MODEL.WEIGHTS output/model_0009999.pth

Run SparseInst:

python demo.py --config-file configs/coco/sparseinst/sparse_inst_r50vd_giam_aug.yaml --video-input ~/Movies/Videos/86277963_nb2-1-80.flv -c 0.4 --opts MODEL.WEIGHTS weights/sparse_inst_r50vd_giam_aug_8bc5b3.pth

an update based on detectron2 newly introduced LazyConfig system, run with a LazyConfig model using:

python3 demo_lazyconfig.py --config-file configs/new_baselines/panoptic_fpn_regnetx_0.4g.py --opts train.init_checkpoint=output/model_0004999.pth

😎 Train

For training, quite simple, same as detectron2:

python train_net.py --config-file configs/coco/darknet53.yaml --num-gpus 8

If you want train YOLOX, you can using config file configs/coco/yolox_s.yaml. All support arch are:

YOLOX: anchor free yolo;
YOLOv7: traditional yolo with some explorations, mainly focus on loss experiments;
YOLOv7P: traditional yolo merged with decent arch from YOLOX;
YOLOMask: arch do detection and segmentation at the same time (tbd);
YOLOInsSeg: instance segmentation based on YOLO detection (tbd);

😎 Rules

There are some rules you must follow to if you want train on your own dataset:

Rule No.1: Always set your own anchors on your dataset, using tools/compute_anchors.py, this applys to any other anchor-based detection methods as well (EfficientDet etc.);
Rule No.2: Keep a faith on your loss will goes down eventually, if not, dig deeper to find out why (but do not post issues repeated caused I might don't know either.).
Rule No.3: No one will tells u but it's real: do not change backbone easily, whole params coupled with your backbone, dont think its simple as you think it should be, also a Deeplearning engineer is not an easy work as you think, the whole knowledge like an ocean, and your knowledge is just a tiny drop of water...
Rule No.4: must using pretrain weights for transoformer based backbone, otherwise your loss will bump;

Make sure you have read rules before ask me any questions.

🔨 Export ONNX && TensorRTT && TVM

detr:

python export.py --config-file detr/config/file

this works has been done, inference script included inside tools.

AnchorDETR:

anchorDETR also supported training and exporting to ONNX.

SparseInst: Sparsinst already supported exporting to onnx!!

python export.py --config-file configs/coco/sparseinst/sparse_inst_r50_giam_aug.yaml --video-input ~/Videos/a.flv  --opts MODEL.WEIGHTS weights/sparse_inst_r50_giam_aug_2b7d68.pth INPUT.MIN_SIZE_TEST 512

If you are on a CPU device, please using:

python export.py --config-file configs/coco/sparseinst/sparse_inst_r50_giam_aug.yaml --input images/COCO_val2014_000000002153.jpg --verbose  --opts MODEL.WEIGHTS weights/sparse_inst_r50_giam_aug_2b7d68.pth MODEL.DEVICE 'cpu'

Then you can have weights/sparse_inst_r50_giam_aug_2b7d68_sim.onnx generated, this onnx can be inference using ORT without any unsupported ops.

🤒️ Performance

Here is a dedicated performance compare with other packages.

tbd.

🪜 Some Tiny Object Datasets supported

Wearmask: support VOC, Yolo, coco 3 format. You can using coco format here. Download from: 链接: https://pan.baidu.com/s/1ozAgUFLqfTXLp-iOecddqQ 提取码: xgep . Using configs/wearmask to train this dataset.
more: to go.

👋 Detection Results

Image	Detections

😯 Dicussion Group

Wechat	QQ

if wechat expired, please contact me update via github issue. group for general discussion, not only for yolov7.
QQ群如果满了请加二群：419548605

🀄️ Some Exp Visualizations

GridMask	Mosaic

©️ License

yolov7_d2's People

Contributors

Stargazers

Watchers

Forkers

wolfworld6 daiyizheng hjffily devolfnn jianshijim henrylol dlyshare yzgrfsy shuowang-ai tmactmac1992 gaoqiangwu xinsuinizhuan zp1018 taxuezcy cvlzw niuwenju tomguluson92 hust-lidelong jie311 techthiyanes jepetolee cqray1990 shoichi-hasegawa0628 perfyperfect wuxiaolianggit rentainhe avisheknandi mjanddy hyaihjq xiao-an-qi zhangjiekui zggl cxz pudongdong imyhxy hust-wayne dandelion111 linhduongtuan tchigher yydan2022 marenan liuxiaoxiao666 akshay-gupta123 93yh desertclaw9 hxl1990 jdc08161063 askintution mbyase l-net-1992 sinead-li yanggui19891007 ashutoshiitk zhouzhubin wyukai runkun-lu zhang0557kui kingo007 wstchhwp zhangshitoday jianpingzhonggit anhuaxiang taotaoyuhust tianxin1024 trigrass2 swjtulinxi daijie1223 g7b9 cv-ip milort freewind2016 islinxu zgq91 lg12170226 wangmj6 wyf94 mei-727 shuaijun-deng helojo serissa daibin lrz55127 tanjiaxian concerttttt simon821 xuhaozhi88 gdbrianlu youtang1993 ltbig fardman69420 davecoding richardclee barneyqiao dcupqiu changgimoon yueyihua cjwqhz tianfukang haiyuwu superxxts

yolov7_d2's Issues

Inference on SparseInst onnxruntime gives bad output data

Hello!

Has anyone successfully exported the sparseinst model to onnx and run inference on it using onnxruntime?
I'm struggling getting reasonable output when running inference in C++ onnxruntime. I'm running the same frame through the network pre-export and the output looks well. But post-export the model outputs scores size (1, 50) with all zeros but a few scores ~0.04. For comparison the model pre-export output two person labels and one sports ball all over 80% confidence.

I exported the model using the provided export_onnx.py as per readme instruction.
The onnx-model looks, from what I can tell, fine when looking at it Netron.

Just wondering if anyone has successfully used the exported model and could share some examples? =)

Could you please provide speed and performance benchmark such as yolo-v5?

Could you please provide speed and performance benchmark such as yolo-v5?
Thanks.

Should I expect same output for the same sparseinst model before and after onnx export?

Hi,

After exporting sparseinst to onnx I receive similar but slightly different output if I give the two models (.pth and .onnx) the same image input. Is this expected? Does it have to do with the different warnings during export such as:

TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').

See attached images for onnx (green) vs. pth model output for the same image and how they differ between the front legs and around the tail.

.

encoder_sparseinst.py 中MyAdaptiveAvgPool2d函数疑似代码bug

(https://github.com/jinfagang/yolov7/blob/main/yolov7/modeling/transcoders/encoder_sparseinst.py)
这个文件中的30-33行缩进出问题了，elif isinstance(self.sz, list) or isinstance(self.sz, tuple):应该和elif isinstance(self.sz, list) or isinstance(self.sz, tuple):对齐。

class MyAdaptiveAvgPool2d(nn.Module):
def init(self, sz=None):
super().init()
self.sz = sz

def forward(self, x):
    inp_size = x.size()
    kernel_width, kernel_height = inp_size[2], inp_size[3]
    if self.sz is not None:
        if isinstance(self.sz, int):
            kernel_width = math.ceil(inp_size[2] / self.sz)
            kernel_height = math.ceil(inp_size[3] / self.sz)
    elif isinstance(self.sz, list) or isinstance(self.sz, tuple):
        assert len(self.sz) == 2
        kernel_width = math.ceil(inp_size[2] / self.sz[0])
        kernel_height = math.ceil(inp_size[3] / self.sz[1])
    if torch.is_tensor(kernel_width):
        kernel_width = kernel_width.item()
        kernel_height = kernel_height.item()
    return F.avg_pool2d(
        input=x, ceil_mode=False, kernel_size=(kernel_width, kernel_height)
    )

readme need improvements.

Great works. I had just experienced this lib, and running the demo. However, to be honest, the readme is not friendly to users. Concerning the installation instruction, there is too little illustration, almost no requirements illustration. And alfred-py couples with too many third party libs, where no description was provided. For detectron2, the readme for yolov7 requires v0.5, However v0.5 has the problem
"ImportError: cannot import name '_C'". that needs hacking. All in all, the installation procedure needs too much hacking, without install requirements and installation illustration. Improvements in the readme could may this project more perfect.

Error in demo.py (I think I know the solution)

Hi, Thanks for this repo :-D

When I run the demo.py I get the following error:

python3 demo.py --output /home/ws/Desktop/ --config-file configs/coco-instance/yolomask.yaml --input /home/ws/data/dataset/nets_kinneret_only24_2/images/0_2.jpg --opts MODEL.WEIGHTS output/coco_yolomask/model_final.pth
Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training.
[05/08 10:45:50 detectron2]: Arguments: Namespace(confidence_threshold=0.21, config_file='configs/coco-instance/yolomask.yaml', input='/home/ws/data/dataset/nets_kinneret_only24_2/images/0_2.jpg', nms_threshold=0.6, opts=['MODEL.WEIGHTS', 'output/coco_yolomask/model_final.pth'], output='/home/ws/Desktop/', video_input=None, webcam=False)
10:45:50 05.08 INFO yolomask.py:86]: YOLO.ANCHORS: [[142, 110], [192, 243], [459, 401], [36, 75], [76, 55], [72, 146], [12, 16], [19, 36], [40, 28]]
10:45:50 05.08 INFO yolomask.py:90]: backboneshape: [64, 128, 256, 512], size_divisibility: 32
[[142, 110], [192, 243], [459, 401], [36, 75], [76, 55], [72, 146], [12, 16], [19, 36], [40, 28]]
/home/ws/.local/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[05/08 10:45:52 fvcore.common.checkpoint]: [Checkpointer] Loading from output/coco_yolomask/model_final.pth ...
[05/08 10:45:53 d2.checkpoint.c2_model_loading]: Following weights matched with model:
| Names in Model                                    | Names in Checkpoint
                                    | Shapes                         |
|:--------------------------------------------------|:------------------------------------------------------------------------------------------------------|:-------------------------------|
| backbone.dark2.0.bn.*                             | backbone.dark2.0.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (64,) () (64,) (64,) (64,)     |
| backbone.dark2.0.conv.weight                      | backbone.dark2.0.conv.weight
                                    | (64, 32, 3, 3)                 |
| backbone.dark2.1.conv1.bn.*                       | backbone.dark2.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (32,) () (32,) (32,) (32,)     |
| backbone.dark2.1.conv1.conv.weight                | backbone.dark2.1.conv1.conv.weight
                                    | (32, 64, 1, 1)                 |
| backbone.dark2.1.conv2.bn.*                       | backbone.dark2.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (32,) () (32,) (32,) (32,)     |
| backbone.dark2.1.conv2.conv.weight                | backbone.dark2.1.conv2.conv.weight
                                    | (32, 64, 1, 1)                 |
| backbone.dark2.1.conv3.bn.*                       | backbone.dark2.1.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (64,) () (64,) (64,) (64,)     |
| backbone.dark2.1.conv3.conv.weight                | backbone.dark2.1.conv3.conv.weight
                                    | (64, 64, 1, 1)                 |
| backbone.dark2.1.m.0.conv1.bn.*                   | backbone.dark2.1.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (32,) () (32,) (32,) (32,)     |
| backbone.dark2.1.m.0.conv1.conv.weight            | backbone.dark2.1.m.0.conv1.conv.weight
                                    | (32, 32, 1, 1)                 |
| backbone.dark2.1.m.0.conv2.bn.*                   | backbone.dark2.1.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (32,) () (32,) (32,) (32,)     |
| backbone.dark2.1.m.0.conv2.conv.weight            | backbone.dark2.1.m.0.conv2.conv.weight
                                    | (32, 32, 3, 3)                 |
| backbone.dark3.0.bn.*                             | backbone.dark3.0.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| backbone.dark3.0.conv.weight                      | backbone.dark3.0.conv.weight
                                    | (128, 64, 3, 3)                |
| backbone.dark3.1.conv1.bn.*                       | backbone.dark3.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.conv1.conv.weight                | backbone.dark3.1.conv1.conv.weight
                                    | (64, 128, 1, 1)                |
| backbone.dark3.1.conv2.bn.*                       | backbone.dark3.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.conv2.conv.weight                | backbone.dark3.1.conv2.conv.weight
                                    | (64, 128, 1, 1)                |
| backbone.dark3.1.conv3.bn.*                       | backbone.dark3.1.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (128,) () (128,) (128,) (128,) |
| backbone.dark3.1.conv3.conv.weight                | backbone.dark3.1.conv3.conv.weight
                                    | (128, 128, 1, 1)               |
| backbone.dark3.1.m.0.conv1.bn.*                   | backbone.dark3.1.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.0.conv1.conv.weight            | backbone.dark3.1.m.0.conv1.conv.weight
                                    | (64, 64, 1, 1)                 |
| backbone.dark3.1.m.0.conv2.bn.*                   | backbone.dark3.1.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.0.conv2.conv.weight            | backbone.dark3.1.m.0.conv2.conv.weight
                                    | (64, 64, 3, 3)                 |
| backbone.dark3.1.m.1.conv1.bn.*                   | backbone.dark3.1.m.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.1.conv1.conv.weight            | backbone.dark3.1.m.1.conv1.conv.weight
                                    | (64, 64, 1, 1)                 |
| backbone.dark3.1.m.1.conv2.bn.*                   | backbone.dark3.1.m.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.1.conv2.conv.weight            | backbone.dark3.1.m.1.conv2.conv.weight
                                    | (64, 64, 3, 3)                 |
| backbone.dark3.1.m.2.conv1.bn.*                   | backbone.dark3.1.m.2.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.2.conv1.conv.weight            | backbone.dark3.1.m.2.conv1.conv.weight
                                    | (64, 64, 1, 1)                 |
| backbone.dark3.1.m.2.conv2.bn.*                   | backbone.dark3.1.m.2.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (64,) () (64,) (64,) (64,)     |
| backbone.dark3.1.m.2.conv2.conv.weight            | backbone.dark3.1.m.2.conv2.conv.weight
                                    | (64, 64, 3, 3)                 |
| backbone.dark4.0.bn.*                             | backbone.dark4.0.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (256,) () (256,) (256,) (256,) |
| backbone.dark4.0.conv.weight                      | backbone.dark4.0.conv.weight
                                    | (256, 128, 3, 3)               |
| backbone.dark4.1.conv1.bn.*                       | backbone.dark4.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.conv1.conv.weight                | backbone.dark4.1.conv1.conv.weight
                                    | (128, 256, 1, 1)               |
| backbone.dark4.1.conv2.bn.*                       | backbone.dark4.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.conv2.conv.weight                | backbone.dark4.1.conv2.conv.weight
                                    | (128, 256, 1, 1)               |
| backbone.dark4.1.conv3.bn.*                       | backbone.dark4.1.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (256,) () (256,) (256,) (256,) |
| backbone.dark4.1.conv3.conv.weight                | backbone.dark4.1.conv3.conv.weight
                                    | (256, 256, 1, 1)               |
| backbone.dark4.1.m.0.conv1.bn.*                   | backbone.dark4.1.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.0.conv1.conv.weight            | backbone.dark4.1.m.0.conv1.conv.weight
                                    | (128, 128, 1, 1)               |
| backbone.dark4.1.m.0.conv2.bn.*                   | backbone.dark4.1.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.0.conv2.conv.weight            | backbone.dark4.1.m.0.conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| backbone.dark4.1.m.1.conv1.bn.*                   | backbone.dark4.1.m.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.1.conv1.conv.weight            | backbone.dark4.1.m.1.conv1.conv.weight
                                    | (128, 128, 1, 1)               |
| backbone.dark4.1.m.1.conv2.bn.*                   | backbone.dark4.1.m.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.1.conv2.conv.weight            | backbone.dark4.1.m.1.conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| backbone.dark4.1.m.2.conv1.bn.*                   | backbone.dark4.1.m.2.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.2.conv1.conv.weight            | backbone.dark4.1.m.2.conv1.conv.weight
                                    | (128, 128, 1, 1)               |
| backbone.dark4.1.m.2.conv2.bn.*                   | backbone.dark4.1.m.2.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (128,) () (128,) (128,) (128,) |
| backbone.dark4.1.m.2.conv2.conv.weight            | backbone.dark4.1.m.2.conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| backbone.dark5.0.bn.*                             | backbone.dark5.0.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (512,) () (512,) (512,) (512,) |
| backbone.dark5.0.conv.weight                      | backbone.dark5.0.conv.weight
                                    | (512, 256, 3, 3)               |
| backbone.dark5.1.conv1.bn.*                       | backbone.dark5.1.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (256,) () (256,) (256,) (256,) |
| backbone.dark5.1.conv1.conv.weight                | backbone.dark5.1.conv1.conv.weight
                                    | (256, 512, 1, 1)               |
| backbone.dark5.1.conv2.bn.*                       | backbone.dark5.1.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (512,) () (512,) (512,) (512,) |
| backbone.dark5.1.conv2.conv.weight                | backbone.dark5.1.conv2.conv.weight
                                    | (512, 1024, 1, 1)              |
| backbone.dark5.2.conv1.bn.*                       | backbone.dark5.2.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (256,) () (256,) (256,) (256,) |
| backbone.dark5.2.conv1.conv.weight                | backbone.dark5.2.conv1.conv.weight
                                    | (256, 512, 1, 1)               |
| backbone.dark5.2.conv2.bn.*                       | backbone.dark5.2.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (256,) () (256,) (256,) (256,) |
| backbone.dark5.2.conv2.conv.weight                | backbone.dark5.2.conv2.conv.weight
                                    | (256, 512, 1, 1)               |
| backbone.dark5.2.conv3.bn.*                       | backbone.dark5.2.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                  | (512,) () (512,) (512,) (512,) |
| backbone.dark5.2.conv3.conv.weight                | backbone.dark5.2.conv3.conv.weight
                                    | (512, 512, 1, 1)               |
| backbone.dark5.2.m.0.conv1.bn.*                   | backbone.dark5.2.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (256,) () (256,) (256,) (256,) |
| backbone.dark5.2.m.0.conv1.conv.weight            | backbone.dark5.2.m.0.conv1.conv.weight
                                    | (256, 256, 1, 1)               |
| backbone.dark5.2.m.0.conv2.bn.*                   | backbone.dark5.2.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}              | (256,) () (256,) (256,) (256,) |
| backbone.dark5.2.m.0.conv2.conv.weight            | backbone.dark5.2.m.0.conv2.conv.weight
                                    | (256, 256, 3, 3)               |
| backbone.stem.conv.bn.*                           | backbone.stem.conv.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                      | (32,) () (32,) (32,) (32,)     |
| backbone.stem.conv.conv.weight                    | backbone.stem.conv.conv.weight
                                    | (32, 12, 3, 3)                 |
| m.0.*                                             | m.0.{bias,weight}
                                    | (255,) (255,512,1,1)           |
| m.1.*                                             | m.1.{bias,weight}
                                    | (255,) (255,256,1,1)           |
| m.2.*                                             | m.2.{bias,weight}
                                    | (255,) (255,128,1,1)           |
| neck.C3_n3.conv1.bn.*                             | neck.C3_n3.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| neck.C3_n3.conv1.conv.weight                      | neck.C3_n3.conv1.conv.weight
                                    | (128, 256, 1, 1)               |
| neck.C3_n3.conv2.bn.*                             | neck.C3_n3.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| neck.C3_n3.conv2.conv.weight                      | neck.C3_n3.conv2.conv.weight
                                    | (128, 256, 1, 1)               |
| neck.C3_n3.conv3.bn.*                             | neck.C3_n3.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (256,) () (256,) (256,) (256,) |
| neck.C3_n3.conv3.conv.weight                      | neck.C3_n3.conv3.conv.weight
                                    | (256, 256, 1, 1)               |
| neck.C3_n3.m.0.conv1.bn.*                         | neck.C3_n3.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (128,) () (128,) (128,) (128,) |
| neck.C3_n3.m.0.conv1.conv.weight                  | neck.C3_n3.m.0.conv1.conv.weight
                                    | (128, 128, 1, 1)               |
| neck.C3_n3.m.0.conv2.bn.*                         | neck.C3_n3.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (128,) () (128,) (128,) (128,) |
| neck.C3_n3.m.0.conv2.conv.weight                  | neck.C3_n3.m.0.conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| neck.C3_n4.conv1.bn.*                             | neck.C3_n4.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (256,) () (256,) (256,) (256,) |
| neck.C3_n4.conv1.conv.weight                      | neck.C3_n4.conv1.conv.weight
                                    | (256, 512, 1, 1)               |
| neck.C3_n4.conv2.bn.*                             | neck.C3_n4.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (256,) () (256,) (256,) (256,) |
| neck.C3_n4.conv2.conv.weight                      | neck.C3_n4.conv2.conv.weight
                                    | (256, 512, 1, 1)               |
| neck.C3_n4.conv3.bn.*                             | neck.C3_n4.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (512,) () (512,) (512,) (512,) |
| neck.C3_n4.conv3.conv.weight                      | neck.C3_n4.conv3.conv.weight
                                    | (512, 512, 1, 1)               |
| neck.C3_n4.m.0.conv1.bn.*                         | neck.C3_n4.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (256,) () (256,) (256,) (256,) |
| neck.C3_n4.m.0.conv1.conv.weight                  | neck.C3_n4.m.0.conv1.conv.weight
                                    | (256, 256, 1, 1)               |
| neck.C3_n4.m.0.conv2.bn.*                         | neck.C3_n4.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (256,) () (256,) (256,) (256,) |
| neck.C3_n4.m.0.conv2.conv.weight                  | neck.C3_n4.m.0.conv2.conv.weight
                                    | (256, 256, 3, 3)               |
| neck.C3_p3.conv1.bn.*                             | neck.C3_p3.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (64,) () (64,) (64,) (64,)     |
| neck.C3_p3.conv1.conv.weight                      | neck.C3_p3.conv1.conv.weight
                                    | (64, 256, 1, 1)                |
| neck.C3_p3.conv2.bn.*                             | neck.C3_p3.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (64,) () (64,) (64,) (64,)     |
| neck.C3_p3.conv2.conv.weight                      | neck.C3_p3.conv2.conv.weight
                                    | (64, 256, 1, 1)                |
| neck.C3_p3.conv3.bn.*                             | neck.C3_p3.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| neck.C3_p3.conv3.conv.weight                      | neck.C3_p3.conv3.conv.weight
                                    | (128, 128, 1, 1)               |
| neck.C3_p3.m.0.conv1.bn.*                         | neck.C3_p3.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (64,) () (64,) (64,) (64,)     |
| neck.C3_p3.m.0.conv1.conv.weight                  | neck.C3_p3.m.0.conv1.conv.weight
                                    | (64, 64, 1, 1)                 |
| neck.C3_p3.m.0.conv2.bn.*                         | neck.C3_p3.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (64,) () (64,) (64,) (64,)     |
| neck.C3_p3.m.0.conv2.conv.weight                  | neck.C3_p3.m.0.conv2.conv.weight
                                    | (64, 64, 3, 3)                 |
| neck.C3_p4.conv1.bn.*                             | neck.C3_p4.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| neck.C3_p4.conv1.conv.weight                      | neck.C3_p4.conv1.conv.weight
                                    | (128, 512, 1, 1)               |
| neck.C3_p4.conv2.bn.*                             | neck.C3_p4.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (128,) () (128,) (128,) (128,) |
| neck.C3_p4.conv2.conv.weight                      | neck.C3_p4.conv2.conv.weight
                                    | (128, 512, 1, 1)               |
| neck.C3_p4.conv3.bn.*                             | neck.C3_p4.conv3.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                        | (256,) () (256,) (256,) (256,) |
| neck.C3_p4.conv3.conv.weight                      | neck.C3_p4.conv3.conv.weight
                                    | (256, 256, 1, 1)               |
| neck.C3_p4.m.0.conv1.bn.*                         | neck.C3_p4.m.0.conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (128,) () (128,) (128,) (128,) |
| neck.C3_p4.m.0.conv1.conv.weight                  | neck.C3_p4.m.0.conv1.conv.weight
                                    | (128, 128, 1, 1)               |
| neck.C3_p4.m.0.conv2.bn.*                         | neck.C3_p4.m.0.conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                    | (128,) () (128,) (128,) (128,) |
| neck.C3_p4.m.0.conv2.conv.weight                  | neck.C3_p4.m.0.conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| neck.bu_conv1.bn.*                                | neck.bu_conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                           | (256,) () (256,) (256,) (256,) |
| neck.bu_conv1.conv.weight                         | neck.bu_conv1.conv.weight
                                    | (256, 256, 3, 3)               |
| neck.bu_conv2.bn.*                                | neck.bu_conv2.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                           | (128,) () (128,) (128,) (128,) |
| neck.bu_conv2.conv.weight                         | neck.bu_conv2.conv.weight
                                    | (128, 128, 3, 3)               |
| neck.lateral_conv0.bn.*                           | neck.lateral_conv0.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                      | (256,) () (256,) (256,) (256,) |
| neck.lateral_conv0.conv.weight                    | neck.lateral_conv0.conv.weight
                                    | (256, 512, 1, 1)               |
| neck.reduce_conv1.bn.*                            | neck.reduce_conv1.bn.{bias,num_batches_tracked,running_mean,running_var,weight}                       | (128,) () (128,) (128,) (128,) |
| neck.reduce_conv1.conv.weight                     | neck.reduce_conv1.conv.weight
                                    | (128, 256, 1, 1)               |
| orien_head.neck_orien.0.conv_block.0.weight       | orien_head.neck_orien.0.conv_block.0.weight
                                    | (128, 256, 1, 1)               |
| orien_head.neck_orien.0.conv_block.1.*            | orien_head.neck_orien.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}       | (128,) () (128,) (128,) (128,) |
| orien_head.neck_orien.1.conv_block.0.weight       | orien_head.neck_orien.1.conv_block.0.weight
                                    | (256, 128, 3, 3)               |
| orien_head.neck_orien.1.conv_block.1.*            | orien_head.neck_orien.1.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}       | (256,) () (256,) (256,) (256,) |
| orien_head.neck_orien.2.conv_block.0.weight       | orien_head.neck_orien.2.conv_block.0.weight
                                    | (128, 256, 1, 1)               |
| orien_head.neck_orien.2.conv_block.1.*            | orien_head.neck_orien.2.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}       | (128,) () (128,) (128,) (128,) |
| orien_head.neck_orien.3.conv_block.0.weight       | orien_head.neck_orien.3.conv_block.0.weight
                                    | (256, 128, 3, 3)               |
| orien_head.neck_orien.3.conv_block.1.*            | orien_head.neck_orien.3.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}       | (256,) () (256,) (256,) (256,) |
| orien_head.neck_orien.4.conv_block.0.weight       | orien_head.neck_orien.4.conv_block.0.weight
                                    | (128, 256, 1, 1)               |
| orien_head.neck_orien.4.conv_block.1.*            | orien_head.neck_orien.4.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}       | (128,) () (128,) (128,) (128,) |
| orien_head.orien_m.0.conv_block.0.weight          | orien_head.orien_m.0.conv_block.0.weight
                                    | (256, 128, 3, 3)               |
| orien_head.orien_m.0.conv_block.1.*               | orien_head.orien_m.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (256,) () (256,) (256,) (256,) |
| orien_head.orien_m.1.conv_block.0.weight          | orien_head.orien_m.1.conv_block.0.weight
                                    | (128, 256, 1, 1)               |
| orien_head.orien_m.1.conv_block.1.*               | orien_head.orien_m.1.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (128,) () (128,) (128,) (128,) |
| orien_head.orien_m.2.conv_block.0.weight          | orien_head.orien_m.2.conv_block.0.weight
                                    | (256, 128, 3, 3)               |
| orien_head.orien_m.2.conv_block.1.*               | orien_head.orien_m.2.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (256,) () (256,) (256,) (256,) |
| orien_head.orien_m.3.conv_block.0.weight          | orien_head.orien_m.3.conv_block.0.weight
                                    | (128, 256, 1, 1)               |
| orien_head.orien_m.3.conv_block.1.*               | orien_head.orien_m.3.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (128,) () (128,) (128,) (128,) |
| orien_head.orien_m.4.conv_block.0.weight          | orien_head.orien_m.4.conv_block.0.weight
                                    | (256, 128, 3, 3)               |
| orien_head.orien_m.4.conv_block.1.*               | orien_head.orien_m.4.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (256,) () (256,) (256,) (256,) |
| orien_head.orien_m.5.*                            | orien_head.orien_m.5.{bias,weight}
                                    | (18,) (18,256,1,1)             |
| orien_head.up_levels_2to5.0.conv_block.0.weight   | orien_head.up_levels_2to5.0.conv_block.0.weight
                                    | (64, 64, 1, 1)                 |
| orien_head.up_levels_2to5.0.conv_block.1.*        | orien_head.up_levels_2to5.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight}   | (64,) () (64,) (64,) (64,)     |
| orien_head.up_levels_2to5.1.0.conv_block.0.weight | orien_head.up_levels_2to5.1.0.conv_block.0.weight
                                    | (64, 128, 1, 1)                |
| orien_head.up_levels_2to5.1.0.conv_block.1.*      | orien_head.up_levels_2to5.1.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,)     |
| orien_head.up_levels_2to5.2.0.conv_block.0.weight | orien_head.up_levels_2to5.2.0.conv_block.0.weight
                                    | (64, 256, 1, 1)                |
| orien_head.up_levels_2to5.2.0.conv_block.1.*      | orien_head.up_levels_2to5.2.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,)     |
| orien_head.up_levels_2to5.3.0.conv_block.0.weight | orien_head.up_levels_2to5.3.0.conv_block.0.weight
                                    | (64, 512, 1, 1)                |
| orien_head.up_levels_2to5.3.0.conv_block.1.*      | orien_head.up_levels_2to5.3.0.conv_block.1.{bias,num_batches_tracked,running_mean,running_var,weight} | (64,) () (64,) (64,) (64,)     |
416 416 600
confidence thresh:  0.21
image after transform:  (416, 416, 3)
cost: 0.02672123908996582, fps: 37.42341425983922
Traceback (most recent call last):
  File "demo.py", line 211, in <module>
    res = vis_res_fast(res, img, class_names, colors, conf_thresh)
  File "demo.py", line 146, in vis_res_fast
    img, clss, bit_masks, force_colors=None, draw_contours=False
  File "/home/ws/.local/lib/python3.6/site-packages/alfred/vis/image/mask.py", line 302, in vis_bitmasks_with_classes
    txt = f'{class_names[classes[i]]}'
TypeError: list indices must be integers or slices, not numpy.float32

I think this is happened because this line:
scores = ins.scores.cpu().numpy() clss = ins.pred_classes.cpu().numpy()

This cuse clss to by a numpy array and not a list of int.
Am I right. How it's works for you?

Also, in line (157):
if bboxes:
bboxes is not bool
Maybe it's should by:
if bboxes is not None:

Thanks

demo.py detect will create an error

xxx:~/work/yolov7$ python3 demo.py --config-file configs/coco/sparseinst/sparse_inst_r50vd_giam_aug.yaml --input /home/jsj/work/yolov7/images/2.jpg --opts MODEL.WEIGHTS weights/sparse.pth Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training. [04/26 21:55:23 detectron2]: Arguments: Namespace(confidence_threshold=0.21, config_file='configs/coco/sparseinst/sparse_inst_r50vd_giam_aug.yaml', input='/home/jsj/work/yolov7/images/2.jpg', nms_threshold=0.6, opts=['MODEL.WEIGHTS', 'weights/sparse.pth'], output=None, video_input=None, webcam=False) [04/26 21:55:26 fvcore.common.checkpoint]: [Checkpointer] Loading from weights/sparse.pth ... WARNING [04/26 21:55:26 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint: backbone.bn1.{bias, weight} backbone.conv1.0.weight backbone.conv1.1.{bias, weight} backbone.conv1.3.weight backbone.conv1.4.{bias, weight} backbone.conv1.6.weight backbone.layer1.0.bn1.{bias, weight} backbone.layer1.0.bn2.{bias, weight} backbone.layer1.0.bn3.{bias, weight} backbone.layer1.0.conv1.weight backbone.layer1.0.conv2.weight backbone.layer1.0.conv3.weight backbone.layer1.0.downsample.1.weight backbone.layer1.0.downsample.2.{bias, weight} backbone.layer1.1.bn1.{bias, weight} backbone.layer1.1.bn2.{bias, weight} backbone.layer1.1.bn3.{bias, weight} backbone.layer1.1.conv1.weight backbone.layer1.1.conv2.weight backbone.layer1.1.conv3.weight backbone.layer1.2.bn1.{bias, weight} backbone.layer1.2.bn2.{bias, weight} backbone.layer1.2.bn3.{bias, weight} backbone.layer1.2.conv1.weight backbone.layer1.2.conv2.weight backbone.layer1.2.conv3.weight backbone.layer2.0.bn1.{bias, weight} backbone.layer2.0.bn2.{bias, weight} backbone.layer2.0.bn3.{bias, weight} backbone.layer2.0.conv1.weight backbone.layer2.0.conv2.weight backbone.layer2.0.conv3.weight backbone.layer2.0.downsample.1.weight backbone.layer2.0.downsample.2.{bias, weight} backbone.layer2.1.bn1.{bias, weight} backbone.layer2.1.bn2.{bias, weight} backbone.layer2.1.bn3.{bias, weight} backbone.layer2.1.conv1.weight backbone.layer2.1.conv2.weight backbone.layer2.1.conv3.weight backbone.layer2.2.bn1.{bias, weight} backbone.layer2.2.bn2.{bias, weight} backbone.layer2.2.bn3.{bias, weight} backbone.layer2.2.conv1.weight backbone.layer2.2.conv2.weight backbone.layer2.2.conv3.weight backbone.layer2.3.bn1.{bias, weight} backbone.layer2.3.bn2.{bias, weight} backbone.layer2.3.bn3.{bias, weight} backbone.layer2.3.conv1.weight backbone.layer2.3.conv2.weight backbone.layer2.3.conv3.weight backbone.layer3.0.bn1.{bias, weight} backbone.layer3.0.bn2.{bias, weight} backbone.layer3.0.bn3.{bias, weight} backbone.layer3.0.conv1.weight backbone.layer3.0.conv2.weight backbone.layer3.0.conv3.weight backbone.layer3.0.downsample.1.weight backbone.layer3.0.downsample.2.{bias, weight} backbone.layer3.1.bn1.{bias, weight} backbone.layer3.1.bn2.{bias, weight} backbone.layer3.1.bn3.{bias, weight} backbone.layer3.1.conv1.weight backbone.layer3.1.conv2.weight backbone.layer3.1.conv3.weight backbone.layer3.2.bn1.{bias, weight} backbone.layer3.2.bn2.{bias, weight} backbone.layer3.2.bn3.{bias, weight} backbone.layer3.2.conv1.weight backbone.layer3.2.conv2.weight backbone.layer3.2.conv3.weight backbone.layer3.3.bn1.{bias, weight} backbone.layer3.3.bn2.{bias, weight} backbone.layer3.3.bn3.{bias, weight} backbone.layer3.3.conv1.weight backbone.layer3.3.conv2.weight backbone.layer3.3.conv3.weight backbone.layer3.4.bn1.{bias, weight} backbone.layer3.4.bn2.{bias, weight} backbone.layer3.4.bn3.{bias, weight} backbone.layer3.4.conv1.weight backbone.layer3.4.conv2.weight backbone.layer3.4.conv3.weight backbone.layer3.5.bn1.{bias, weight} backbone.layer3.5.bn2.{bias, weight} backbone.layer3.5.bn3.{bias, weight} backbone.layer3.5.conv1.weight backbone.layer3.5.conv2.weight backbone.layer3.5.conv3.weight backbone.layer4.0.bn1.{bias, weight} backbone.layer4.0.bn2.{bias, weight} backbone.layer4.0.bn3.{bias, weight} backbone.layer4.0.conv1.weight backbone.layer4.0.conv2.weight backbone.layer4.0.conv3.weight backbone.layer4.0.downsample.1.weight backbone.layer4.0.downsample.2.{bias, weight} backbone.layer4.1.bn1.{bias, weight} backbone.layer4.1.bn2.{bias, weight} backbone.layer4.1.bn3.{bias, weight} backbone.layer4.1.conv1.weight backbone.layer4.1.conv2.weight backbone.layer4.1.conv3.weight backbone.layer4.2.bn1.{bias, weight} backbone.layer4.2.bn2.{bias, weight} backbone.layer4.2.bn3.{bias, weight} backbone.layer4.2.conv1.weight backbone.layer4.2.conv2.weight backbone.layer4.2.conv3.weight WARNING [04/26 21:55:26 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model: backbone.stem.conv1.weight backbone.stem.conv1.norm.{bias, running_mean, running_var, weight} backbone.res2.0.shortcut.weight backbone.res2.0.shortcut.norm.{bias, running_mean, running_var, weight} backbone.res2.0.conv1.weight backbone.res2.0.conv1.norm.{bias, running_mean, running_var, weight} backbone.res2.0.conv2.weight backbone.res2.0.conv2.norm.{bias, running_mean, running_var, weight} backbone.res2.0.conv3.weight backbone.res2.0.conv3.norm.{bias, running_mean, running_var, weight} backbone.res2.1.conv1.weight backbone.res2.1.conv1.norm.{bias, running_mean, running_var, weight} backbone.res2.1.conv2.weight backbone.res2.1.conv2.norm.{bias, running_mean, running_var, weight} backbone.res2.1.conv3.weight backbone.res2.1.conv3.norm.{bias, running_mean, running_var, weight} backbone.res2.2.conv1.weight backbone.res2.2.conv1.norm.{bias, running_mean, running_var, weight} backbone.res2.2.conv2.weight backbone.res2.2.conv2.norm.{bias, running_mean, running_var, weight} backbone.res2.2.conv3.weight backbone.res2.2.conv3.norm.{bias, running_mean, running_var, weight} backbone.res3.0.shortcut.weight backbone.res3.0.shortcut.norm.{bias, running_mean, running_var, weight} backbone.res3.0.conv1.weight backbone.res3.0.conv1.norm.{bias, running_mean, running_var, weight} backbone.res3.0.conv2.weight backbone.res3.0.conv2.norm.{bias, running_mean, running_var, weight} backbone.res3.0.conv3.weight backbone.res3.0.conv3.norm.{bias, running_mean, running_var, weight} backbone.res3.1.conv1.weight backbone.res3.1.conv1.norm.{bias, running_mean, running_var, weight} backbone.res3.1.conv2.weight backbone.res3.1.conv2.norm.{bias, running_mean, running_var, weight} backbone.res3.1.conv3.weight backbone.res3.1.conv3.norm.{bias, running_mean, running_var, weight} backbone.res3.2.conv1.weight backbone.res3.2.conv1.norm.{bias, running_mean, running_var, weight} backbone.res3.2.conv2.weight backbone.res3.2.conv2.norm.{bias, running_mean, running_var, weight} backbone.res3.2.conv3.weight backbone.res3.2.conv3.norm.{bias, running_mean, running_var, weight} backbone.res3.3.conv1.weight backbone.res3.3.conv1.norm.{bias, running_mean, running_var, weight} backbone.res3.3.conv2.weight backbone.res3.3.conv2.norm.{bias, running_mean, running_var, weight} backbone.res3.3.conv3.weight backbone.res3.3.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.0.shortcut.weight backbone.res4.0.shortcut.norm.{bias, running_mean, running_var, weight} backbone.res4.0.conv1.weight backbone.res4.0.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.0.conv2.weight backbone.res4.0.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.0.conv3.weight backbone.res4.0.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.1.conv1.weight backbone.res4.1.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.1.conv2.weight backbone.res4.1.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.1.conv3.weight backbone.res4.1.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.2.conv1.weight backbone.res4.2.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.2.conv2.weight backbone.res4.2.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.2.conv3.weight backbone.res4.2.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.3.conv1.weight backbone.res4.3.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.3.conv2.weight backbone.res4.3.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.3.conv3.weight backbone.res4.3.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.4.conv1.weight backbone.res4.4.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.4.conv2.weight backbone.res4.4.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.4.conv3.weight backbone.res4.4.conv3.norm.{bias, running_mean, running_var, weight} backbone.res4.5.conv1.weight backbone.res4.5.conv1.norm.{bias, running_mean, running_var, weight} backbone.res4.5.conv2.weight backbone.res4.5.conv2.norm.{bias, running_mean, running_var, weight} backbone.res4.5.conv3.weight backbone.res4.5.conv3.norm.{bias, running_mean, running_var, weight} backbone.res5.0.shortcut.weight backbone.res5.0.shortcut.norm.{bias, running_mean, running_var, weight} backbone.res5.0.conv1.weight backbone.res5.0.conv1.norm.{bias, running_mean, running_var, weight} backbone.res5.0.conv2.weight backbone.res5.0.conv2.norm.{bias, running_mean, running_var, weight} backbone.res5.0.conv3.weight backbone.res5.0.conv3.norm.{bias, running_mean, running_var, weight} backbone.res5.1.conv1.weight backbone.res5.1.conv1.norm.{bias, running_mean, running_var, weight} backbone.res5.1.conv2.weight backbone.res5.1.conv2.norm.{bias, running_mean, running_var, weight} backbone.res5.1.conv3.weight backbone.res5.1.conv3.norm.{bias, running_mean, running_var, weight} backbone.res5.2.conv1.weight backbone.res5.2.conv1.norm.{bias, running_mean, running_var, weight} backbone.res5.2.conv2.weight backbone.res5.2.conv2.norm.{bias, running_mean, running_var, weight} backbone.res5.2.conv3.weight backbone.res5.2.conv3.norm.{bias, running_mean, running_var, weight} 640 640 600 confidence thresh: 0.21 image after transform: (398, 600, 3) cost: 0.0565032958984375, fps: 17.698082635700782 Traceback (most recent call last): File "demo.py", line 211, in <module> res = vis_res_fast(res, img, class_names, colors, conf_thresh) File "demo.py", line 153, in vis_res_fast img = vis_bitmasks_with_classes( File "/XXX/python3.8/site-packages/alfred/vis/image/mask.py", line 272, in vis_bitmasks_with_classes cts, _ = cv2.findContours(m, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE) ValueError: too many values to unpack (expected 2)

vis_bitmasks_with_classes function has a bug？ Or I do error?

detectron2 install

torch 1.11 (stable version)
detectron2 latest

https://detectron2.readthedocs.io/en/latest/tutorials/install.html
detectron2 only support torch 1.10, there is no installation commands for torch 1.11

Is it possible to do transfer learning?

Hi,
How can I do transfer learning using one of the networks in this repo?

"Dataset 'cloth_train' is not registered

我在train_visdrone.py已经修改了路径

DATASET_ROOT = '/root/autodl-tmp/cloth_1.6'
ANN_ROOT = os.path.join(DATASET_ROOT, 'cloth_anno')
TRAIN_PATH = os.path.join(DATASET_ROOT, 'cloth_train/images')
VAL_PATH = os.path.join(DATASET_ROOT, 'cloth_val/images')
TRAIN_JSON = os.path.join(ANN_ROOT, 'cloth_train.json')
VAL_JSON = os.path.join(ANN_ROOT, 'cloth_val.json')

register_coco_instances("cloth_train", {}, TRAIN_JSON, TRAIN_PATH)
register_coco_instances("cloth_val", {}, VAL_JSON, VAL_PATH)

darknet53 也修改了：
DATASETS:
TRAIN: ("cloth_train",)
TEST: ("cloth_val",)

执行：
python train_net.py --config-file configs/coco/darknet53.yaml --num-gpus 1 --opts MODEL.WEIGHTS /root/tf-logs/yolov7/convnext_tiny_yoloformer.pth

报错："Dataset 'cloth_train' is not registered!

Demo Trouble Shooting

For new version, run demo by:

python demo.py --config-file /path/config -i   /your/videos

the -i here is your:

input image file
images dir
video file

all can automatically handle

cannot overfitt

Hi,

What is the name of the mask loss? I did not see it in the logs.

I am try to overfitt one image. but after the training finished the results are not good.
conf (It's the "yolomask.yaml" config. I changed only the dataset and batch size):

_BASE_: "../Base-YOLOv7.yaml"
MODEL:
  META_ARCHITECTURE: "YOLOMask"
  WEIGHTS: ""
  MASK_ON: True
  BACKBONE:
    NAME: "build_cspdarknetx_backbone"
  DARKNET:
    WEIGHTS: ""
    DEPTH_WISE: False
    OUT_FEATURES: ["dark2", "dark3", "dark4", "dark5"]

  YOLO:
    ANCHORS:
      # yolomask anchors slightly different than YOLOv7
      [
        [142, 110],
        [192, 243],
        [459, 401],

        [36, 75],
        [76, 55],
        [72, 146],
        
        [12, 16],
        [19, 36],
        [40, 28],
      ]
    ANCHOR_MASK: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    CLASSES: 80
    IN_FEATURES: ["dark2", "dark3", "dark4", "dark5"] # p3, p4, p5 from FPN
    WIDTH_MUL: 0.50
    DEPTH_MUL: 0.33

    CONF_THRESHOLD: 0.001
    NMS_THRESHOLD: 0.65
    IGNORE_THRESHOLD: 0.7
    VARIANT: "yolov7"
    LOSS_TYPE: "v7"
    LOSS:
      LAMBDA_IOU: 1.1
    NECK:
      TYPE: "fpn"
      WITH_SPP: true

DATASETS:
  TRAIN: ("one_image_overfit",)
  TEST:  ("one_image_overfit",)

INPUT:
  MASK_FORMAT: "bitmask"
  MIN_SIZE_TRAIN: (416, 512, 608)
  MAX_SIZE_TRAIN: 608 # force max size train to 800?
  MIN_SIZE_TEST: 416
  MAX_SIZE_TEST: 608
  # open all augmentations
  RANDOM_FLIP: "none"
  JITTER_CROP:
    ENABLED: False
  RESIZE:
    ENABLED: False
    # SHAPE: (540, 960)
  DISTORTION:
    ENABLED: False
  # MOSAIC:
  #   ENABLED: True
  #   NUM_IMAGES: 4
  #   DEBUG_VIS: True
  #   # MOSAIC_WIDTH: 960
  #   # MOSAIC_HEIGHT: 540
  MOSAIC_AND_MIXUP:
    ENABLED: False
    DEBUG_VIS: False
    ENABLE_MIXUP: False

SOLVER:
  # AMP:
  # ENABLED: true
  IMS_PER_BATCH: 8 # 1/5 bs than YOLOX
  # it can be 0.016 maybe
  BASE_LR: 0.0009
  STEPS: (60000, 80000)
  WARMUP_FACTOR: 0.00033333
  WARMUP_ITERS: 1500
  MAX_ITER: 190000
  LR_SCHEDULER_NAME: "WarmupCosineLR"

TEST:
  # EVAL_PERIOD: 10000
  EVAL_PERIOD: 0
OUTPUT_DIR: "output/coco_yolomask"

DATALOADER:
  # proposals are part of the dataset_dicts, and take a lot of RAM
  NUM_WORKERS: 1

Also I add this line to register the dataset:
`
def load_data(json_path):
with open(json_path, 'r') as file:
json_dataset = json.load(file)
return json_dataset

def get_dataset(json_path, dataset_name, classes):

# DatasetCatalog.register(dataset_name, lambda d=json_path: load_data(d))
register_coco_instances(dataset_name, {}, json_path, '')
# MetadataCatalog.get(dataset_name).set(thing_classes=classes)  # ["Dog", "Cat", "Mouse"]

metadata = MetadataCatalog.get("train")
return metadata

get_dataset('/home/ws/data/dataset/coco_one_image_overfit/train_coco.json',
'coco_one_image_overfit', ['car', 'bus', 'truck'])`

AssertionError: Invalid REFERENCE_WORLD_SIZE in config!

File "//python3.7/site-packages/detectron2/engine/defaults.py", line 674, in auto_scale_workers
), "Invalid REFERENCE_WORLD_SIZE in config!"
AssertionError: Invalid REFERENCE_WORLD_SIZE in config!

Cann't train with solov2 config

Hi, Thanks for this repo :-)
I am try to train the network (and do an overfitt) using solov2 config
When I start the training process I see the image but the masks are wrong (the image flipped, but the masks not).
When I close the image, the training crash. attached the log.
Thanks

python3 train_net.py --config-file configs/coco-instance/solov2_lite.yaml 
Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training.
Command Line Args: Namespace(config_file='configs/coco-instance/solov2_lite.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[05/09 13:50:05 detectron2]: Rank of current process: 0. World size: 1
[05/09 13:50:06 detectron2]: Environment info:
----------------------  ---------------------------------------------------------------------
sys.platform            linux
Python                  3.6.9 (default, Mar 15 2022, 13:55:28) [GCC 8.4.0]
numpy                   1.19.2
detectron2              0.6 @/home/ws/.local/lib/python3.6/site-packages/detectron2
Compiler                GCC 7.3
CUDA compiler           CUDA 10.2
detectron2 arch flags   3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.10.0+cu102 @/home/ws/.local/lib/python3.6/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   GeForce RTX 2080 Ti (arch=7.5)
Driver version          450.57
CUDA_HOME               /usr/local/cuda-10.2
Pillow                  6.2.2
torchvision             0.11.0+cu102 @/home/ws/.local/lib/python3.6/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5
fvcore                  0.1.5.post20220414
iopath                  0.1.9
cv2                     4.5.5
----------------------  ---------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

[05/09 13:50:06 detectron2]: Command line arguments: Namespace(config_file='configs/coco-instance/solov2_lite.yaml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[05/09 13:50:06 detectron2]: Contents of args.config_file=configs/coco-instance/solov2_lite.yaml:
MODEL:
  META_ARCHITECTURE: "SOLOv2"
  MASK_ON: True
  BACKBONE:
    NAME: "build_resnet_fpn_backbone"
  RESNETS:
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
  SOLOV2:
    FPN_SCALE_RANGES: ((1, 56), (28, 112), (56, 224), (112, 448), (224, 896))
    NUM_GRIDS: [40, 36, 24, 16, 12]
    NUM_INSTANCE_CONVS: 2
    NUM_KERNELS: 256
    INSTANCE_IN_CHANNELS: 256
    INSTANCE_CHANNELS: 128
    MASK_IN_CHANNELS: 256
    MASK_CHANNELS: 128
    NORM: "SyncBN"
DATASETS:
  TRAIN: ("nets_kinneret_only24",)
  TEST: ("nets_kinneret_only24",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.01
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)
  MASK_FORMAT: "bitmask"
VERSION: 2

[05/09 13:50:06 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  CLASS_NAMES: []
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - nets_kinneret_only24
  TRAIN:
  - nets_kinneret_only24
GLOBAL:
  HACK: 1.0
INPUT:
  COLOR_JITTER:
    BRIGHTNESS: false
    LIGHTING: false
    SATURATION: false
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  DISTORTION:
    ENABLED: false
    EXPOSURE: 1.5
    HUE: 0.1
    SATURATION: 1.5
  FORMAT: BGR
  GRID_MASK:
    ENABLED: false
    MODE: 1
    PROB: 0.3
    USE_HEIGHT: true
    USE_WIDTH: true
  INPUT_SIZE:
  - 640
  - 640
  JITTER_CROP:
    ENABLED: false
    JITTER_RATIO: 0.3
  MASK_FORMAT: bitmask
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  MOSAIC:
    DEBUG_VIS: false
    ENABLED: false
    MIN_OFFSET: 0.2
    MOSAIC_HEIGHT: 640
    MOSAIC_WIDTH: 640
    NUM_IMAGES: 4
    POOL_CAPACITY: 1000
  MOSAIC_AND_MIXUP:
    DEBUG_VIS: false
    DEGREES: 10.0
    DISABLE_AT_ITER: 120000
    ENABLED: false
    ENABLE_MIXUP: true
    MOSAIC_HEIGHT_RANGE:
    - 512
    - 800
    MOSAIC_WIDTH_RANGE:
    - 512
    - 800
    MSCALE:
    - 0.5
    - 1.5
    NUM_IMAGES: 4
    PERSPECTIVE: 0.0
    POOL_CAPACITY: 1000
    SCALE:
    - 0.5
    - 1.5
    SHEAR: 2.0
    TRANSLATE: 0.1
  RANDOM_FLIP: horizontal
  RESIZE:
    ENABLED: false
    SCALE_JITTER:
    - 0.8
    - 1.2
    SHAPE:
    - 640
    - 640
    TEST_SHAPE:
    - 608
    - 608
  SHIFT:
    SHIFT_PIXELS: 32
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
      - 64
      - 128
      - 256
      - 512
  BACKBONE:
    CHANNEL: 0
    FREEZE_AT: 2
    NAME: build_resnet_fpn_backbone
    SIMPLE: false
    STRIDE: 1
  BIFPN:
    NORM: GN
    NUM_BIFPN: 6
    NUM_LEVELS: 5
    OUT_CHANNELS: 160
    SEPARABLE_CONV: false
  DARKNET:
    DEPTH: 53
    DEPTH_WISE: false
    NORM: BN
    OUT_FEATURES:
    - dark3
    - dark4
    - dark5
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 32
    WEIGHTS: ''
    WITH_CSP: true
  DETR:
    ATTENTION_TYPE: DETR
    BBOX_EMBED_NUM_LAYERS: 3
    CENTERED_POSITION_ENCODIND: false
    CLS_WEIGHT: 1.0
    DECODER_BLOCK_GRAD: true
    DEC_LAYERS: 6
    DEEP_SUPERVISION: true
    DEFORMABLE: false
    DIM_FEEDFORWARD: 2048
    DROPOUT: 0.1
    ENC_LAYERS: 6
    FROZEN_WEIGHTS: ''
    GIOU_WEIGHT: 2.0
    HIDDEN_DIM: 256
    L1_WEIGHT: 5.0
    NHEADS: 8
    NO_OBJECT_WEIGHT: 0.1
    NUM_CLASSES: 80
    NUM_FEATURE_LEVELS: 1
    NUM_OBJECT_QUERIES: 100
    NUM_QUERY_PATTERN: 3
    NUM_QUERY_POSITION: 300
    PRE_NORM: false
    SPATIAL_PRIOR: learned
    TWO_STAGE: false
    USE_FOCAL_LOSS: false
    WITH_BOX_REFINE: false
  DEVICE: cuda
  EFFICIENTNET:
    FEATURE_INDICES:
    - 1
    - 4
    - 10
    - 15
    NAME: efficientnet_b0
    OUT_FEATURES:
    - stride4
    - stride8
    - stride16
    - stride32
    PRETRAINED: true
  FBNET_V2:
    ARCH: default
    ARCH_DEF: []
    NORM: bn
    NORM_ARGS: []
    OUT_FEATURES:
    - trunk3
    SCALE_FACTOR: 1.0
    STEM_IN_CHANNELS: 3
    WIDTH_DIVISOR: 1
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
    OUT_CHANNELS_LIST:
    - 256
    - 512
    - 1024
    REPEAT: 2
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: true
  META_ARCHITECTURE: SOLOv2
  NMS_TYPE: normal
  ONNX_EXPORT: false
  PADDED_VALUE: 114.0
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  REGNETS:
    OUT_FEATURES:
    - s2
    - s3
    - s4
    TYPE: x
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    R2TYPE: res2net50_v1d
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 10.0
    - 10.0
    - 5.0
    - 5.0
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS:
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  SOLOV2:
    FPN_INSTANCE_STRIDES:
    - 8
    - 8
    - 16
    - 32
    - 32
    FPN_SCALE_RANGES:
    - - 1
      - 56
    - - 28
      - 112
    - - 56
      - 224
    - - 112
      - 448
    - - 224
      - 896
    INSTANCE_CHANNELS: 128
    INSTANCE_IN_CHANNELS: 256
    INSTANCE_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    LOSS:
      DICE_WEIGHT: 3.0
      FOCAL_ALPHA: 0.25
      FOCAL_GAMMA: 2.0
      FOCAL_USE_SIGMOID: true
      FOCAL_WEIGHT: 1.0
    MASK_CHANNELS: 128
    MASK_IN_CHANNELS: 256
    MASK_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    MASK_THR: 0.5
    MAX_PER_IMG: 100
    NMS_KERNEL: gaussian
    NMS_PRE: 500
    NMS_SIGMA: 2
    NMS_TYPE: matrix
    NORM: SyncBN
    NUM_CLASSES: 80
    NUM_GRIDS:
    - 40
    - 36
    - 24
    - 16
    - 12
    NUM_INSTANCE_CONVS: 2
    NUM_KERNELS: 256
    NUM_MASKS: 256
    PRIOR_PROB: 0.01
    SCORE_THR: 0.1
    SIGMA: 0.2
    TYPE_DCN: DCN
    UPDATE_THR: 0.05
    USE_COORD_CONV: true
    USE_DCN_IN_INSTANCE: false
  SPARSE_INST:
    CLS_THRESHOLD: 0.005
    DATASET_MAPPER: SparseInstDatasetMapper
    DECODER:
      GROUPS: 4
      INST:
        CONVS: 4
        DIM: 256
      KERNEL_DIM: 128
      MASK:
        CONVS: 4
        DIM: 256
      NAME: BaseIAMDecoder
      NUM_CLASSES: 80
      NUM_MASKS: 100
      OUTPUT_IAM: false
      SCALE_FACTOR: 2.0
    ENCODER:
      IN_FEATURES:
      - res3
      - res4
      - res5
      NAME: FPNPPMEncoder
      NORM: ''
      NUM_CHANNELS: 256
    LOSS:
      CLASS_WEIGHT: 2.0
      ITEMS:
      - labels
      - masks
      MASK_DICE_WEIGHT: 2.0
      MASK_PIXEL_WEIGHT: 5.0
      NAME: SparseInstCriterion
      OBJECTNESS_WEIGHT: 1.0
    MASK_THRESHOLD: 0.45
    MATCHER:
      ALPHA: 0.8
      BETA: 0.2
      NAME: SparseInstMatcher
    MAX_DETECTIONS: 100
  SWIN:
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    OUT_FEATURES:
    - 1
    - 2
    - 3
    PATCH: 4
    TYPE: tiny
    WEIGHTS: ''
    WINDOW: 7
  VT_FPN:
    HEADS: 16
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    LAYERS: 3
    MIN_GROUP_PLANES: 64
    NORM: BN
    OUT_CHANNELS: 256
    POS_HWS: []
    POS_N_DOWNSAMPLE: []
    TOKEN_C: 1024
    TOKEN_LS:
    - 16
    - 16
    - 8
    - 8
  WEIGHTS: ''
  YOLO:
    ANCHORS:
    - - - 116
        - 90
      - - 156
        - 198
      - - 373
        - 326
    - - - 30
        - 61
      - - 62
        - 45
      - - 42
        - 119
    - - - 10
        - 13
      - - 16
        - 30
      - - 33
        - 23
    ANCHOR_MASK: []
    BRANCH_DILATIONS:
    - 1
    - 2
    - 3
    CLASSES: 80
    CONF_THRESHOLD: 0.01
    DEPTH_MUL: 1.0
    IGNORE_THRESHOLD: 0.07
    IN_FEATURES:
    - dark3
    - dark4
    - dark5
    IOU_TYPE: ciou
    LOSS:
      ANCHOR_RATIO_THRESH: 4.0
      BUILD_TARGET_TYPE: default
      LAMBDA_CLS: 1.0
      LAMBDA_CONF: 1.0
      LAMBDA_IOU: 1.1
      LAMBDA_WH: 1.0
      LAMBDA_XY: 1.0
      USE_L1: true
    LOSS_TYPE: v4
    MAX_BOXES_NUM: 100
    NECK:
      TYPE: yolov3
      WITH_SPP: false
    NMS_THRESHOLD: 0.5
    NUM_BRANCH: 3
    ORIEN_HEAD:
      UP_CHANNELS: 64
    TEST_BRANCH_IDX: 1
    VARIANT: yolov3
    WIDTH_MUL: 1.0
OUTPUT_DIR: ./output
SEED: -1
SOLVER:
  AMP:
    ENABLED: false
  AMSGRAD: false
  AUTO_SCALING_METHODS:
  - default_scale_d2_configs
  - default_scale_quantization_configs
  BACKBONE_MULTIPLIER: 0.1
  BASE_LR: 0.01
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: value
    CLIP_VALUE: 1.0
    ENABLED: false
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 8
  LR_MULTIPLIER_OVERWRITE: []
  LR_SCHEDULER:
    GAMMA: 0.1
    MAX_EPOCH: 500
    MAX_ITER: 40000
    NAME: WarmupMultiStepLR
    STEPS:
    - 30000
    WARMUP_FACTOR: 0.001
    WARMUP_ITERS: 1000
    WARMUP_METHOD: linear
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 90000
  MOMENTUM: 0.9
  NESTEROV: false
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 8
  STEPS:
  - 60000
  - 80000
  WARMUP_FACTOR: 0.01
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_EMBED: 0.0
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

[05/09 13:50:06 detectron2]: Full config saved to ./output/config.yaml
[05/09 13:50:06 d2.utils.env]: Using a generated random seed 6550842
[05/09 13:50:06 d2.engine.defaults]: Auto-scaling the config to batch_size=1, learning_rate=0.00125, max_iter=720000, warmup=8000.
13:50:06 05.09 INFO solov2.py:83]: instance_shapes: [ShapeSpec(channels=256, height=None, width=None, stride=4), ShapeSpec(channels=256, height=None, width=None, stride=8), ShapeSpec(channels=256, height=None, width=None, stride=16), ShapeSpec(channels=256, height=None, width=None, stride=32), ShapeSpec(channels=256, height=None, width=None, stride=64)]
[05/09 13:50:08 d2.data.datasets.coco]: Loaded 87 images in COCO format from /home/ws/data/dataset/nets_kinneret_only24_2/train_coco.json
[05/09 13:50:08 d2.data.build]: Removed 0 images with no usable annotations. 87 images left.
[05/09 13:50:08 d2.data.build]: Distribution of instances among all 3 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|    car     | 1305         |    bus     | 0            |   truck    | 0            |
|            |              |            |              |            |              |
|   total    | 1305         |            |              |            |              |
[05/09 13:50:08 d2.data.build]: Using training sampler TrainingSampler
[05/09 13:50:08 d2.data.common]: Serializing 87 elements to byte tensors and concatenating them all ...
[05/09 13:50:08 d2.data.common]: Serialized dataset takes 0.82 MiB
[05/09 13:50:08 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
[05/09 13:50:08 d2.engine.train_loop]: Starting training from iteration 0
(15, 768, 768)
/home/ws/.local/lib/python3.6/site-packages/detectron2/structures/image_list.py:88: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  max_size = (max_size + (stride - 1)) // stride * stride
[(768, 768)]
torch.Size([1, 3, 768, 768])
/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py:3635: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
/home/ws/.local/lib/python3.6/site-packages/torch/nn/functional.py:3680: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. 
  "The default behavior for interpolate/upsample with float scale_factor changed "
/home/ws/.local/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:300: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  (center_w / upsampled_size[1]) // (1. / num_grid))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:302: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  (center_h / upsampled_size[0]) // (1. / num_grid))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:306: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  0, int(((center_h - half_h) / upsampled_size[0]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:308: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_grid - 1, int(((center_h + half_h) / upsampled_size[0]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:310: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  0, int(((center_w - half_w) / upsampled_size[1]) // (1. / num_grid)))
/home/ws/PycharmProjects/yolov7/yolov7/modeling/meta_arch/solov2.py:312: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  num_grid - 1, int(((center_w + half_w) / upsampled_size[1]) // (1. / num_grid)))
ERROR [05/09 13:50:12 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "train_net.py", line 58, in run_step
    self._trainer.run_step()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
    losses.backward()
  File "/home/ws/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ws/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 192, 192]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
[05/09 13:50:12 d2.engine.hooks]: Total training time: 0:00:03 (0:00:00 on hooks)
[05/09 13:50:12 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 710M
Traceback (most recent call last):
  File "train_net.py", line 133, in <module>
    args=(args,),
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "train_net.py", line 121, in main
    return trainer.train()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/defaults.py", line 484, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "train_net.py", line 58, in run_step
    self._trainer.run_step()
  File "/home/ws/.local/lib/python3.6/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
    losses.backward()
  File "/home/ws/.local/lib/python3.6/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ws/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [1, 128, 192, 192]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

About YOLOv7

If you are new to YOLOv7, there are 2 things you need to know:

YOLOv7 is not a simple enhancement upon YOLO, it was meaningless, instead, we are exploring the potential power of transformers model combination with anchor or anchor free design detectors, also, we are exploring on STACK instance head or key points head or semantic segmentation head to a unfied multi-tasking network;
Contribution is very welcomed, please join our WeChat group if you want learn more!

At last, this is my vision of YOLOv7:

Building a training framework with detectron2, not just for detection, but as many tasks as possible;
Building a mature deploy chain, from training to deployment. I am keeping add TensorRT, tvm, ncnn deployment for the YOLOv7 trained model;
Let's pushing YOLOv7 detector's mAP to a higher standard!
Every single contributor I will add to README so that your work can be seen.

Error when trained YOLOX with convnext

There seems like a error when I run the command "CUDA_VISIBLE_DEVICES=1,2,3,4 python train_net.py --config-file conolox/yolox_convnext.yaml --num-gpus 4". The only modification is I changed the "IMS_PER_BATCH: 16" in the yolox_convnext.yaml file, so I want to whether I run the correct command, or this is a bug?

Hey,i found this question still exist https://github.com/jinfagang/yolov7/issues/12#issue-1153874625

Where can I find the flags list?

Hi, I am trying to use your repo to train a model with my custom data but I couldn't find what are the flags of train_visdrone.py. Can you please share documentation on it?
Also, I couldn't find how to make detection on a set of images.

Regards,

TypeError: 'method' object is not subscriptable

when I run " python train_net.py --config-file configs/coco/darknet53.yaml " , the error is as follows:

TypeError: 'method' object is not subscriptable

the error position is in " yolov7-main\yolov7\modeling\meta_arch\yolo.py ",line 64

I try a lot,but it doesn't work.Can you give me some ideas about it?thanks a lot!

RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Hi jinfagang,

Thanks for your amazing contribution.

Would you mind help with this issue below, I'm not quite familiar with detectron.

While I tried to run exp on coco2017 dataset with train_detr.py code and detr_256_6_6_regnetx_0.4g.yaml confile file, a error occurred in the init process.

ERROR [03/08 14:10:37 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 273, in run_step
    loss_dict = self.model(data)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 165, in forward
    output = self.detr(images)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 449, in forward
    features, pos = self.backbone(samples)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/backbone/detr_backbone.py", line 504, in forward
    xs = self[0](tensor_list)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/code/yolov7/yolov7/modeling/meta_arch/detr.py", line 356, in forward
    features = self.backbone(images.tensor)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/modeling/backbone/regnet.py", line 315, in forward
    x = self.stem(x)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/modeling/backbone/regnet.py", line 87, in forward
    x = layer(x)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 732, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/home/yuxin/miniconda3/envs/yolov7/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 411, in _get_default_group
    "Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Can you show some hint how to fix this issue?

Thanks,
Yuxin.

need help with training custom datase

i want to train custom dataset to train instance with detectron2,
my problem is the traing command , which training .py file i should use
i need the training line ,just that.

install.md

404?

SparseInst onnx2trt error

@jinfagang ,hello, thank you for your work! when i convert SparseInst onnx model to tensorrt model using trtexec , the error is In node -1(importRange):UNSUPPORTED_NODE:Assertion failed:inputs.at(0).isInt32() && "For range operator with dynamic inputs,this version og TensorRT only supports INT32!

tensorrt version:7.2.3
onnx version : v7
pytorch version:1.10

sparseinst pre model results bad

I test sparseinst series models. Its speed and precision are both terrible.
What reason it may be?
Bellow are some of my results.

many remote class near the hair, and overlapped.

sunlogin_20220506221250.mp4

评估太慢了

用两张v100训练，batch_size为16， NUM_WORKERS: 1，评估太慢了。。

When evaluation: Timed out waiting 1800000ms for send operation to complete

Every time the program goes into evaluation, the error occurs:

raise ProcessRaisedException(msg, error_index, failed_process.pid)

torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 1 terminated with the following error:
File "anaconda3/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2067, in all_gather
work.wait()
RuntimeError: [/opt/conda/conda-bld/pytorch_1646755897462/work/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:136] Timed out waiting 1800000ms for send operation to complete

Do u have any idea about it? Thanks a lot.

The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

i run demo
!python3 demo.py --config-file /content/yolov7/configs/coco/darknet53.yaml --input /content/yolov7/datasets/coco/test2017/ --opts MODEL.WEIGHTS /content/yolov7/output/model_final.pth

and got this error

800 800 600
confidence thresh:  0.21
image after transform:  (450, 600, 3)
cost: 0.07432031631469727, fps: 13.455271042788125
Traceback (most recent call last):
  File "demo.py", line 201, in <module>
    res = vis_res_fast(res, im, class_names, colors, conf_thresh)
  File "demo.py", line 158, in vis_res_fast
    if bboxes:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

请问loss_type="v7"的代码在哪里可以看到呢？

我看到config里有loss_type有v4,v7这几类，但是没有能够在库里找到相关的代码，请您帮忙解答一下，谢谢。

Cann't overfitt SparseInst

Hi,
I am try to overfit one image using sparse_inst_r50_giam config.
I changed the dataset to a custom dataset and only 3 labels (car, bus, truck).
Those are the lines in the log (after one day of training):

[05/11 12:29:09 d2.utils.events]:  eta: 9 days, 2:03:30  iter: 119739  total_loss: 0.873  loss_box: 0.4872  loss_obj_pos: 0.0006603  loss_obj_neg: 0.003101  loss_cls: 0.04907  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.128e-07  loss_wh: 0.02689  time: 0.5578  data_time: 0.0603  lr: 0.00011079  max_mem: 4517M
[05/11 12:29:20 d2.utils.events]:  eta: 9 days, 2:06:10  iter: 119759  total_loss: 0.8729  loss_box: 0.4872  loss_obj_pos: 0.0006603  loss_obj_neg: 0.0031  loss_cls: 0.04907  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.165e-07  loss_wh: 0.02699  time: 0.5578  data_time: 0.0607  lr: 0.00011079  max_mem: 4517M
[05/11 12:29:31 d2.utils.events]:  eta: 9 days, 2:03:08  iter: 119779  total_loss: 0.8728  loss_box: 0.4871  loss_obj_pos: 0.0006602  loss_obj_neg: 0.003099  loss_cls: 0.04906  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.095e-07  loss_wh: 0.02677  time: 0.5578  data_time: 0.0594  lr: 0.00011079  max_mem: 4517M
[05/11 12:29:42 d2.utils.events]:  eta: 9 days, 2:05:47  iter: 119799  total_loss: 0.8728  loss_box: 0.4871  loss_obj_pos: 0.0006602  loss_obj_neg: 0.003099  loss_cls: 0.04906  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.129e-07  loss_wh: 0.02687  time: 0.5578  data_time: 0.0605  lr: 0.00011078  max_mem: 4517M
[05/11 12:29:54 d2.utils.events]:  eta: 9 days, 2:08:15  iter: 119819  total_loss: 0.8727  loss_box: 0.487  loss_obj_pos: 0.0006601  loss_obj_neg: 0.003098  loss_cls: 0.04905  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.164e-07  loss_wh: 0.02697  time: 0.5578  data_time: 0.0610  lr: 0.00011078  max_mem: 4517M
[05/11 12:30:05 d2.utils.events]:  eta: 9 days, 2:02:34  iter: 119839  total_loss: 0.8727  loss_box: 0.487  loss_obj_pos: 0.00066  loss_obj_neg: 0.003097  loss_cls: 0.04905  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.093e-07  loss_wh: 0.02675  time: 0.5578  data_time: 0.0596  lr: 0.00011078  max_mem: 4517M
[05/11 12:30:16 d2.utils.events]:  eta: 9 days, 1:58:52  iter: 119859  total_loss: 0.8726  loss_box: 0.4869  loss_obj_pos: 0.00066  loss_obj_neg: 0.003097  loss_cls: 0.04904  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.128e-07  loss_wh: 0.02685  time: 0.5578  data_time: 0.0615  lr: 0.00011078  max_mem: 4517M
[05/11 12:30:27 d2.utils.events]:  eta: 9 days, 1:56:41  iter: 119879  total_loss: 0.8726  loss_box: 0.4869  loss_obj_pos: 0.0006598  loss_obj_neg: 0.003096  loss_cls: 0.04904  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.166e-07  loss_wh: 0.02696  time: 0.5578  data_time: 0.0604  lr: 0.00011078  max_mem: 4517M
[05/11 12:30:38 d2.utils.events]:  eta: 9 days, 1:52:10  iter: 119899  total_loss: 0.8725  loss_box: 0.4869  loss_obj_pos: 0.0006598  loss_obj_neg: 0.003096  loss_cls: 0.04903  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.093e-07  loss_wh: 0.02674  time: 0.5578  data_time: 0.0613  lr: 0.00011078  max_mem: 4517M
[05/11 12:30:49 d2.utils.events]:  eta: 9 days, 1:50:12  iter: 119919  total_loss: 0.8725  loss_box: 0.4868  loss_obj_pos: 0.0006597  loss_obj_neg: 0.003095  loss_cls: 0.04902  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.127e-07  loss_wh: 0.02684  time: 0.5578  data_time: 0.0609  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:00 d2.utils.events]:  eta: 9 days, 1:43:43  iter: 119939  total_loss: 0.8724  loss_box: 0.4868  loss_obj_pos: 0.0006596  loss_obj_neg: 0.003094  loss_cls: 0.04902  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.166e-07  loss_wh: 0.02694  time: 0.5578  data_time: 0.0605  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:12 d2.utils.events]:  eta: 9 days, 1:42:02  iter: 119959  total_loss: 0.8724  loss_box: 0.4867  loss_obj_pos: 0.0006595  loss_obj_neg: 0.003094  loss_cls: 0.04901  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.092e-07  loss_wh: 0.02672  time: 0.5578  data_time: 0.0610  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:23 d2.utils.events]:  eta: 9 days, 1:40:19  iter: 119979  total_loss: 0.8723  loss_box: 0.4867  loss_obj_pos: 0.0006595  loss_obj_neg: 0.003093  loss_cls: 0.04901  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.128e-07  loss_wh: 0.02682  time: 0.5578  data_time: 0.0623  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:34 fvcore.common.checkpoint]: Saving checkpoint to output/coco_yolomask/model_0119999.pth
[05/11 12:31:34 d2.utils.events]:  eta: 9 days, 1:27:01  iter: 119999  total_loss: 0.8723  loss_box: 0.4866  loss_obj_pos: 0.0006594  loss_obj_neg: 0.003092  loss_cls: 0.049  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.162e-07  loss_wh: 0.02693  time: 0.5578  data_time: 0.0613  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:45 d2.utils.events]:  eta: 9 days, 1:24:30  iter: 120019  total_loss: 0.8722  loss_box: 0.4866  loss_obj_pos: 0.0006593  loss_obj_neg: 0.003091  loss_cls: 0.049  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.089e-07  loss_wh: 0.02671  time: 0.5578  data_time: 0.0606  lr: 0.00011078  max_mem: 4517M
[05/11 12:31:57 d2.utils.events]:  eta: 9 days, 1:18:55  iter: 120039  total_loss: 0.8721  loss_box: 0.4866  loss_obj_pos: 0.0006592  loss_obj_neg: 0.00309  loss_cls: 0.04899  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.122e-07  loss_wh: 0.02681  time: 0.5578  data_time: 0.0609  lr: 0.00011078  max_mem: 4517M
[05/11 12:32:08 d2.utils.events]:  eta: 9 days, 1:22:10  iter: 120059  total_loss: 0.8721  loss_box: 0.4865  loss_obj_pos: 0.0006591  loss_obj_neg: 0.00309  loss_cls: 0.04899  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.161e-07  loss_wh: 0.02691  time: 0.5578  data_time: 0.0604  lr: 0.00011078  max_mem: 4517M
[05/11 12:32:19 d2.utils.events]:  eta: 9 days, 1:21:07  iter: 120079  total_loss: 0.872  loss_box: 0.4865  loss_obj_pos: 0.0006591  loss_obj_neg: 0.003089  loss_cls: 0.04898  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.092e-07  loss_wh: 0.02669  time: 0.5578  data_time: 0.0622  lr: 0.00011078  max_mem: 4517M
[05/11 12:32:30 d2.utils.events]:  eta: 9 days, 1:23:45  iter: 120099  total_loss: 0.872  loss_box: 0.4864  loss_obj_pos: 0.0006591  loss_obj_neg: 0.003089  loss_cls: 0.04898  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.121e-07  loss_wh: 0.02679  time: 0.5578  data_time: 0.0615  lr: 0.00011078  max_mem: 4517M
[05/11 12:32:41 d2.utils.events]:  eta: 9 days, 1:20:48  iter: 120119  total_loss: 0.8719  loss_box: 0.4864  loss_obj_pos: 0.000659  loss_obj_neg: 0.003088  loss_cls: 0.04897  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.158e-07  loss_wh: 0.02689  time: 0.5578  data_time: 0.0607  lr: 0.00011078  max_mem: 4517M
[05/11 12:32:53 d2.utils.events]:  eta: 9 days, 1:13:26  iter: 120139  total_loss: 0.8719  loss_box: 0.4863  loss_obj_pos: 0.000659  loss_obj_neg: 0.003088  loss_cls: 0.04896  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.09e-07  loss_wh: 0.02668  time: 0.5578  data_time: 0.0608  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:04 d2.utils.events]:  eta: 9 days, 1:08:34  iter: 120159  total_loss: 0.8718  loss_box: 0.4863  loss_obj_pos: 0.0006589  loss_obj_neg: 0.003087  loss_cls: 0.04896  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.123e-07  loss_wh: 0.02678  time: 0.5578  data_time: 0.0610  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:15 d2.utils.events]:  eta: 9 days, 1:10:41  iter: 120179  total_loss: 0.8718  loss_box: 0.4863  loss_obj_pos: 0.0006589  loss_obj_neg: 0.003087  loss_cls: 0.04896  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.162e-07  loss_wh: 0.02688  time: 0.5578  data_time: 0.0616  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:26 d2.utils.events]:  eta: 9 days, 1:20:52  iter: 120199  total_loss: 0.8717  loss_box: 0.4862  loss_obj_pos: 0.0006589  loss_obj_neg: 0.003086  loss_cls: 0.04895  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.086e-07  loss_wh: 0.02666  time: 0.5578  data_time: 0.0604  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:37 d2.utils.events]:  eta: 9 days, 1:17:36  iter: 120219  total_loss: 0.8716  loss_box: 0.4862  loss_obj_pos: 0.0006589  loss_obj_neg: 0.003085  loss_cls: 0.04894  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.121e-07  loss_wh: 0.02676  time: 0.5578  data_time: 0.0617  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:48 d2.utils.events]:  eta: 9 days, 1:07:31  iter: 120239  total_loss: 0.8716  loss_box: 0.4861  loss_obj_pos: 0.0006588  loss_obj_neg: 0.003085  loss_cls: 0.04894  loss_orien_pos: 0.1568  loss_orien_neg: 0.1485  loss_xy: 1.154e-07  loss_wh: 0.02686  time: 0.5578  data_time: 0.0609  lr: 0.00011077  max_mem: 4517M
[05/11 12:33:59 d2.utils.events]:  eta: 9 days, 1:11:03  iter: 120259  total_loss: 0.8715  loss_box: 0.4861  loss_obj_pos: 0.0006588  loss_obj_neg: 0.003084  loss_cls: 0.04893  loss_orien_pos: 0.1569  loss_orien_neg: 0.1485  loss_xy: 1.086e-07  loss_wh: 0.02665  time: 0.5578  data_time: 0.0598  lr: 0.00011077  max_mem: 4517M

This is the image (after one day of trainin

g):

Why the overfit did not works?
(When I used Yolact the overfit works after a 4-5 hours)

Thanks

where yolo+instance seg at same time

Is that possible to do keypoint and Segmentation same time without effecting performance? nd get

Change license?

Hi,

thanks for this project! Would it be possible to change the license to a less restrictive one? GPL is a really bad license in that sense as any project that builds upon this project is also required to be GPL, which is in many cases not desired.
Or does this framework already include code from YOLOv3-5 and therefore has to be GPL?

Best,
Karol

Training Contribution

If you have spare GPU or spare time, please help YOLOv7 training models! Currently we are planned to train models on:

YOLOX with Keypoints head;
YOLOv7 with NASVit as backbone;

If you want contribute, please leave a message below. I'll guide you how to train.

自定义coco数据集报错

我给train_coco.py加入了如下代码
DATASET_ROOT = './datasets/coco'
ANN_ROOT = os.path.join(DATASET_ROOT, 'annotations')
TRAIN_PATH = os.path.join(DATASET_ROOT, 'train2017')
VAL_PATH = os.path.join(DATASET_ROOT, 'val2017')
TRAIN_JSON = os.path.join(ANN_ROOT, 'instances_train2017.json')
VAL_JSON = os.path.join(ANN_ROOT, 'instances_val2017.json')

register_coco_instances("coco_2017_train", {}, TRAIN_JSON, TRAIN_PATH)
register_coco_instances("coco_2017_val", {}, VAL_JSON, VAL_PATH)

但最后在执行

python train_coco.py --config-file configs/coco/yolox_s.yaml --num-gpus 1

时报错
Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training.
Traceback (most recent call last):
File "train_coco.py", line 34, in
register_coco_instances("coco_2017_train", {}, TRAIN_JSON, TRAIN_PATH)
File "/home/ren/App/Miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/data/datasets/coco.py", line 500, in register_coco_instances
DatasetCatalog.register(name, lambda: load_coco_json(json_file, image_root, name))
File "/home/ren/App/Miniconda3/envs/yolov7/lib/python3.7/site-packages/detectron2/data/catalog.py", line 37, in register
assert name not in self, "Dataset '{}' is already registered!".format(name)
AssertionError: Dataset 'coco_2017_train' is already registered!

请问这是为什么

与torch原版yolox的区别

请问您的yolox相比torch原版的yolox有比较具体的改进吗，因为训练效果比原版的要好的多。谢谢回复

SparseInst Model exportation to ONNX

Thank you so much for your work @jinfagang. I have tested yolov7 and I realized that SparseInst models cannot be converted to ONNX. Is the export onnx code compatible with exporting SparseInst models?

requirements

您好，一直运行不了，感觉是环境没有搭好，请问有详细一点的环境配置吗？

sparseinst no results

I try the demo of sparseinst like this

python demo.py --config-file configs/coco/sparseinst/sparse_inst_r50vd_giam_aug.yaml --video-input ~/Movies/Videos/86277963_nb2-1-80.flv -c 0.4 --opts MODEL.WEIGHTS weights/sparse_inst_r50vd_giam_aug_8bc5b3.pth

but I get error

Traceback (most recent call last):
  File "/home/dreamdeck/Documents/MJJ/code/Seg/yolov7-main/demo.py", line 230, in <module>
    res = vis_res_fast(res, frame, class_names, colors, conf_thresh)
  File "/home/dreamdeck/Documents/MJJ/code/Seg/yolov7-main/demo.py", line 155, in vis_res_fast
    img, clss, bit_masks, force_colors=None, draw_contours=False
  File "/home/dreamdeck/anaconda3/envs/detectron2/lib/python3.6/site-packages/alfred/vis/image/mask.py", line 285, in vis_bitmasks_with_classes
    thickness=-1, lineType=cv2.LINE_AA)
cv2.error: OpenCV(4.5.5) /io/opencv/modules/imgproc/src/drawing.cpp:2599: error: (-215:Assertion failed) reader.ptr != NULL in function 'cvDrawContours'

I debuged and found that there is no masks when run predictions = self.model([inputs]). All values in pred_masks are False.

Do I need modify the config file or download other files?( I have dolwnloaded the sparse_inst_r50vd_giam_aug_8bc5b3.pth)
What is the full steps to test demo?

sparseinst onnx result issue

Hi, thank you for your nice work.

I trained sparseinst and converted onnx model successfully.

But, result is different between pth and onnx. The number of prediction mask is different.

My dataset is single class instance segmentation dataset and all images have 2 instances.
(2 instances per image)

If I inference image using pth in demo.py, I can get correct result. (there are 2 prediction masks)

But, If I inference image using onnx, I get only 1 prediction mask. (mask position is good)

What is problem for difference in the number of masks?

Thank you.

yolox_s的pth模型转onnx出错

RuntimeError: Exporting the operator silu to onnx opset version 12 is not supported.

Exporting the operator linspace to ONNX opset version 11 is not supported

您好，根据你的文档说明，我在使用export_onnx,导出模型时，出现上面问题，如何解决？

Install mish-cuda

Install mish-cuda to speed up training and inference. More importantly, replace the naive Mish with MishCuda will give a ~1.5G memory saving during training.

ImportError while running the demo_lazyconfig.py for single image inference

Hi,
I am running the YOLOv7 code with the "Swin_s transformer" as the backbone in Google Colab. While running the demo_lazyconfig.py for inference on single image, I am getting the following error:
ImportError: cannot import name 'SCMode' from 'omegaconf' (/usr/local/lib/python3.7/dist-packages/omegaconf/__init__.py)

Anchordetr exportation to onnx

Thank you so much for your work @jinfagang. I have tested to export detr to onnx, and failed, I used AnchorDETR_r50_c5 as anchordetr_origin, my torch version is 1.8.0, and my onnx version is 1.11.0, could you tell me how can i get a right model? thank you very much!

Traceback (most recent call last):
File "export_onnx.py", line 262, in
torch.onnx.export(model, inp, onnx_f, output_names={
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/init.py", line 271, in export
return utils.export(model, args, f, export_params, verbose, training,
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 88, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 691, in _export
_model_to_graph(model, args, verbose, input_names,
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 454, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args,
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 417, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/onnx/utils.py", line 377, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py", line 1139, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py", line 125, in forward
graph, out = torch._C._create_graph_by_tracing(
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py", line 116, in wrapper
outs.append(self.inner(*trace_inputs))
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 887, in _call_impl
result = self._slow_forward(*input, **kwargs)
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 860, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/xj/yolov7-main/yolov7/modeling/meta_arch/anchor_detr.py", line 183, in forward
self.detr.backbone.prepare_onnx_export()
File "/home/xj/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'MaskedBackboneTraceFriendly' object has no attribute 'prepare_onnx_export'

custom data

hi @jinfagang,
How to training with custom data?
Best regards,
PeterPham

Support for edge device deployment

Does yolov7 support target detection on jetson devices (TX2, AGX Xavier)?

Where is the yolov7 paper?

I think you need to report what kind of research the yolo version has been upgraded to v7, but is there a yolov7 paper? By the way, yolov5 doesn't have a report, so everyone is confused as shown in the link below.

where is yolov5 paper?

Coco Map Result?

How does it perform compared to other yolo models?
What are AP test? AP50 test ? AP75 test ? Fps?

onnxsimplify detr model error

@jinfagang ,when sumplify detr onnx model, the error happened,
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Reshape node. Name:'Reshape_1682' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:41 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, std::vector&, bool) gsl::narrow_cast<int64_t>(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{2108,1,256}, requested shape:{2108,800,32}
how to solve it?

SparseInst模型训练不收敛？

您好！
我们使用SparseInst模型可以正常训练，但是训练开始时total_loss是8.x，训练几个小时后，就降到了2.6，然后连续训练了好一周左右，total_loss依然是2.6，没有任何变化。相同的数据集，使用SparseInst官方代码训练，是可以收敛的。
想请问一下这是什么原因造成的？怎么解决？
注：我们使用Yolov7来训练SparseInst主要是想用这里的工程化代码，转为onnx然后转TensorRT的engine.
下图是我们训练中的log：