
FastestDet's Issues

train.py checkpoint save path

The checkpoint path written by train.py contains a colon, so after half a day of training on Windows not a single checkpoint was saved.
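For reference, a minimal sketch of a Windows-safe save, assuming the checkpoint name embeds the mAP the way the README filename (weight_AP05:..._280-epoch.pth) suggests; save_checkpoint and its arguments are hypothetical, not the repo's actual function:

import os
import torch

def save_checkpoint(model, save_dir, ap05, epoch):
    # Windows forbids ":" in filenames, so embed the mAP with "_" instead.
    name = "weight_AP05_%f_%d-epoch.pth" % (ap05, epoch)
    os.makedirs(save_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(save_dir, name))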

Maybe use the function `os.path.splitext`?

In line 89 of /utils/datasets.py there is the statement label_path = img_path.split(".")[0] + ".txt", which can break on paths of the form './xxx/xxx.jpg', since the split also matches the leading './'.
To avoid such errors, wouldn't label_path = os.path.splitext(img_path)[0] + ".txt" be better?
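A quick demonstration of the difference:

import os

img_path = "./images/0001.jpg"

# split(".") also matches the leading "./", so element 0 is empty:
print(img_path.split(".")[0] + ".txt")         # -> ".txt" (wrong)

# os.path.splitext strips only the final extension:
print(os.path.splitext(img_path)[0] + ".txt")  # -> "./images/0001.txt"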

FastestDet ignores the thread setting; the same setting works fine with YOLO-FastestV2

My project needs single-threaded inference. With YOLO-FastestV2, the single-thread setting behaves as expected on both an AMD-platform VM and an RK3588: CPU usage is typically high on only one or two cores. With the same settings, FastestDet loads all 12 cores of the VM at 30-40% and all 8 cores of the RK3588 at around 70%.
Both runs are configured single-threaded, using:
ncnn::Net net;
net.opt.num_threads = 1; // single thread here, consistent with ex.set_num_threads(1) below; YOLO-FastestV2 uses the same settings
net.load_param("FastestDet.param");
net.load_model("FastestDet.bin");
But the actual CPU load does not match YOLO-FastestV2's; on the RK3588 in particular, all 8 cores sit at around 70% even with a single thread configured.

ncnn test problem under example/ncnn

Running the example as shipped works fine:

ncnn model load sucess...
output: 85, 22, 22
Time: 14.54 ms
x1:4 y1:167 x2:304 y2:268 person:95.12%
x1:262 y1:59 x2:658 y2:1046 bicycle:94.27%
x1:-29 y1:175 x2:134 y2:267 person:87.71%
x1:210 y1:144 x2:355 y2:239 person:78.87%

But with my own generated .param/.bin files there is a problem:

ncnn model load sucess...
input: 3, 352, 352
output: 0, 0, 0
Time: 6.87 ms


The output dimensions have disappeared. Why would that be?

multi-GPU training

Do you plan to support training on multiple GPUs, as architectures like yolov5 do? That would be friendlier for large-scale datasets.

Small-object labels never take part in training

I found that when the objects in the labels differ greatly in size, filtering with f = iou > iou.mean() removes many small-object labels entirely, so they never take part in training. For example, my labels are 0 and 1, where 0 is the large class and 1 is the small class; after filtering, only label 0 participates in training and label 1 is ignored.
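A toy reproduction of the effect (the IoU values are made up): when large objects regress much better than small ones, the mean-IoU cut removes every small-object sample.

import torch

# Hypothetical per-sample IoUs: class 0 (large) matches well, class 1 (small) poorly.
iou = torch.tensor([0.82, 0.78, 0.75, 0.15, 0.12, 0.10])
cls = torch.tensor([0, 0, 0, 1, 1, 1])

f = iou > iou.mean()   # mean ~= 0.45, so every small-object sample fails the filter
print(cls[f])          # tensor([0, 0, 0]) -- class 1 never contributes to the loss

A fixed IoU threshold, or taking the mean per ground-truth box instead of globally, would presumably keep some small-object samples; that is a guess, not a tested fix.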

MemoryError

Hi, I get this error with batch size = 32 after 10 epochs:
creating index...
index created!
Traceback (most recent call last):
File "train.py", line 134, in
model.train()
File "train.py", line 126, in train
mAP05 = self.evaluation.compute_map(self.val_dataloader, self.model)
File "C:\final\dataset\FastestDet-main\utils\evaluation.py", line 98, in compute_map
mAP05 = self.coco_evaluate(gts, pts)
File "C:\final\dataset\FastestDet-main\utils\evaluation.py", line 45, in coco_evaluate
coco_pred.dataset["images"].append({"id": i})
MemoryError

The exported onnx output shape is float32[1,85,22,22]; how is it converted into the desired (x0,y0,x1,y1,class) results?

Following the instructions, I exported the onnx model and inspected it at https://netron.app/; the output is float32[1,85,22,22].
But the results printed by running the command look like:
tensor([0.0814, 0.1866, 0.3560, 0.8782, 0.9012, 0.0000])
tensor([0.5778, 0.4236, 0.9649, 0.8865, 0.8948, 0.0000])
tensor([0.3183, 0.4065, 0.5119, 0.7926, 0.8064, 0.0000])
tensor([0.4625, 0.4142, 0.6451, 0.8280, 0.7805, 0.0000])

How is this conversion done?
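The conversion happens in the repo's Python post-processing (handle_preds in utils/tool.py, called from test.py): the 85 channels of each 22x22 grid cell are split into 1 objectness score, 4 box-regression values, and 80 class scores, decoded into normalized corner boxes, then filtered by NMS. Below is a rough sketch of such a decoder; the channel order and the tanh/sigmoid activation choices are assumptions to verify against utils/tool.py, and the score is shown as a plain product although the repo may weight the two terms:

import numpy as np

def decode(pred, conf_thresh=0.65):
    # pred: (85, 22, 22) raw ONNX output -> list of (x1, y1, x2, y2, score, cls)
    _, H, W = pred.shape
    obj, reg, cls = pred[0], pred[1:5], pred[5:]   # assumed channel split
    out = []
    for gy in range(H):
        for gx in range(W):
            score = obj[gy, gx] * cls[:, gy, gx].max()
            if score < conf_thresh:
                continue
            cx = (np.tanh(reg[0, gy, gx]) + gx) / W    # tanh cell offset (assumed)
            cy = (np.tanh(reg[1, gy, gx]) + gy) / H
            w = 1.0 / (1.0 + np.exp(-reg[2, gy, gx]))  # sigmoid w/h (assumed)
            h = 1.0 / (1.0 + np.exp(-reg[3, gy, gx]))
            out.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2,
                        score, int(cls[:, gy, gx].argmax())))
    return out  # normalized coordinates, before NMS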

Model accuracy is much worse after converting to ncnn

Hello,
I trained my own 3-class detection model. Evaluated directly it performs reasonably well, but after converting it to an ncnn model and running inference with the provided example, the detection results differ a lot. Any idea what might cause this?

The two generated prediction boxes almost completely overlap

[screenshot: handpose4_issue]
The model predicts two boxes whose IoU is close to 1.0.
That doesn't seem right, does it?
Could the post-processing be modified so that only one of the two boxes is kept?
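If raising the IoU threshold of the existing post-processing does not merge them, a standard standalone NMS pass will; a minimal sketch (not the repo's exact implementation):

def iou(a, b):
    # a, b: (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def nms(dets, thresh=0.45):
    # dets: list of (x1, y1, x2, y2, score, cls); keep highest-scoring survivors
    dets = sorted(dets, key=lambda d: d[4], reverse=True)
    keep = []
    for d in dets:
        if all(iou(d[:4], k[:4]) < thresh for k in keep):
            keep.append(d)
    return keep

Two boxes with IoU near 1.0 collapse to one for any reasonable threshold.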

Problem with the ncnn example

Running the ncnn example with no modifications at all produces the situation in the screenshot below; does anyone know what the problem is?
[screenshot: result]

Real-world detection accuracy

Hello, the real-world detection accuracy is quite poor; is this a problem on my end? Has anyone tried it on an edge device or in a VM?
[screenshot omitted]

Is something wrong with this training run? I cannot find the trained model

I'm a beginner; just setting up the environment took me a long time. Training then seemed to run successfully, but I can't find where the resulting model was saved:
(fast38) PS C:\Users\Administrator\Desktop\code\python\FastestDet-main> python train.py --yaml weights/train/20230413-2-187/coco.yaml
Load yaml sucess...
<utils.tool.LoadYaml object at 0x0000026AC27A2D30>
Initialize params from:./module/shufflenetv2.pth

        Layer (type)               Output Shape         Param #
================================================================
Conv2d-1 [-1, 24, 176, 176] 648
BatchNorm2d-2 [-1, 24, 176, 176] 48
ReLU-3 [-1, 24, 176, 176] 0
MaxPool2d-4 [-1, 24, 88, 88] 0
Conv2d-5 [-1, 24, 44, 44] 216
BatchNorm2d-6 [-1, 24, 44, 44] 48
Conv2d-7 [-1, 24, 44, 44] 576
BatchNorm2d-8 [-1, 24, 44, 44] 48
ReLU-9 [-1, 24, 44, 44] 0
Conv2d-10 [-1, 24, 88, 88] 576
BatchNorm2d-11 [-1, 24, 88, 88] 48
ReLU-12 [-1, 24, 88, 88] 0
Conv2d-13 [-1, 24, 44, 44] 216
BatchNorm2d-14 [-1, 24, 44, 44] 48
Conv2d-15 [-1, 24, 44, 44] 576
BatchNorm2d-16 [-1, 24, 44, 44] 48
ReLU-17 [-1, 24, 44, 44] 0
ShuffleV2Block-18 [-1, 48, 44, 44] 0
Conv2d-19 [-1, 24, 44, 44] 576
BatchNorm2d-20 [-1, 24, 44, 44] 48
ReLU-21 [-1, 24, 44, 44] 0
Conv2d-22 [-1, 24, 44, 44] 216
BatchNorm2d-23 [-1, 24, 44, 44] 48
Conv2d-24 [-1, 24, 44, 44] 576
BatchNorm2d-25 [-1, 24, 44, 44] 48
ReLU-26 [-1, 24, 44, 44] 0
ShuffleV2Block-27 [-1, 48, 44, 44] 0
Conv2d-28 [-1, 24, 44, 44] 576
BatchNorm2d-29 [-1, 24, 44, 44] 48
ReLU-30 [-1, 24, 44, 44] 0
Conv2d-31 [-1, 24, 44, 44] 216
BatchNorm2d-32 [-1, 24, 44, 44] 48
Conv2d-33 [-1, 24, 44, 44] 576
BatchNorm2d-34 [-1, 24, 44, 44] 48
ReLU-35 [-1, 24, 44, 44] 0
ShuffleV2Block-36 [-1, 48, 44, 44] 0
Conv2d-37 [-1, 24, 44, 44] 576
BatchNorm2d-38 [-1, 24, 44, 44] 48
ReLU-39 [-1, 24, 44, 44] 0
Conv2d-40 [-1, 24, 44, 44] 216
BatchNorm2d-41 [-1, 24, 44, 44] 48
Conv2d-42 [-1, 24, 44, 44] 576
BatchNorm2d-43 [-1, 24, 44, 44] 48
ReLU-44 [-1, 24, 44, 44] 0
ShuffleV2Block-45 [-1, 48, 44, 44] 0
Conv2d-46 [-1, 48, 22, 22] 432
BatchNorm2d-47 [-1, 48, 22, 22] 96
Conv2d-48 [-1, 48, 22, 22] 2,304
BatchNorm2d-49 [-1, 48, 22, 22] 96
ReLU-50 [-1, 48, 22, 22] 0
Conv2d-51 [-1, 48, 44, 44] 2,304
BatchNorm2d-52 [-1, 48, 44, 44] 96
ReLU-53 [-1, 48, 44, 44] 0
Conv2d-54 [-1, 48, 22, 22] 432
BatchNorm2d-55 [-1, 48, 22, 22] 96
Conv2d-56 [-1, 48, 22, 22] 2,304
BatchNorm2d-57 [-1, 48, 22, 22] 96
ReLU-58 [-1, 48, 22, 22] 0
ShuffleV2Block-59 [-1, 96, 22, 22] 0
Conv2d-60 [-1, 48, 22, 22] 2,304
BatchNorm2d-61 [-1, 48, 22, 22] 96
ReLU-62 [-1, 48, 22, 22] 0
Conv2d-63 [-1, 48, 22, 22] 432
BatchNorm2d-64 [-1, 48, 22, 22] 96
Conv2d-65 [-1, 48, 22, 22] 2,304
BatchNorm2d-66 [-1, 48, 22, 22] 96
ReLU-67 [-1, 48, 22, 22] 0
ShuffleV2Block-68 [-1, 96, 22, 22] 0
Conv2d-69 [-1, 48, 22, 22] 2,304
BatchNorm2d-70 [-1, 48, 22, 22] 96
ReLU-71 [-1, 48, 22, 22] 0
Conv2d-72 [-1, 48, 22, 22] 432
BatchNorm2d-73 [-1, 48, 22, 22] 96
Conv2d-74 [-1, 48, 22, 22] 2,304
BatchNorm2d-75 [-1, 48, 22, 22] 96
ReLU-76 [-1, 48, 22, 22] 0
ShuffleV2Block-77 [-1, 96, 22, 22] 0
Conv2d-78 [-1, 48, 22, 22] 2,304
BatchNorm2d-79 [-1, 48, 22, 22] 96
ReLU-80 [-1, 48, 22, 22] 0
Conv2d-81 [-1, 48, 22, 22] 432
BatchNorm2d-82 [-1, 48, 22, 22] 96
Conv2d-83 [-1, 48, 22, 22] 2,304
BatchNorm2d-84 [-1, 48, 22, 22] 96
ReLU-85 [-1, 48, 22, 22] 0
ShuffleV2Block-86 [-1, 96, 22, 22] 0
Conv2d-87 [-1, 48, 22, 22] 2,304
BatchNorm2d-88 [-1, 48, 22, 22] 96
ReLU-89 [-1, 48, 22, 22] 0
Conv2d-90 [-1, 48, 22, 22] 432
BatchNorm2d-91 [-1, 48, 22, 22] 96
Conv2d-92 [-1, 48, 22, 22] 2,304
BatchNorm2d-93 [-1, 48, 22, 22] 96
ReLU-94 [-1, 48, 22, 22] 0
ShuffleV2Block-95 [-1, 96, 22, 22] 0
Conv2d-96 [-1, 48, 22, 22] 2,304
BatchNorm2d-97 [-1, 48, 22, 22] 96
ReLU-98 [-1, 48, 22, 22] 0
Conv2d-99 [-1, 48, 22, 22] 432
BatchNorm2d-100 [-1, 48, 22, 22] 96
Conv2d-101 [-1, 48, 22, 22] 2,304
BatchNorm2d-102 [-1, 48, 22, 22] 96
ReLU-103 [-1, 48, 22, 22] 0
ShuffleV2Block-104 [-1, 96, 22, 22] 0
Conv2d-105 [-1, 48, 22, 22] 2,304
BatchNorm2d-106 [-1, 48, 22, 22] 96
ReLU-107 [-1, 48, 22, 22] 0
Conv2d-108 [-1, 48, 22, 22] 432
BatchNorm2d-109 [-1, 48, 22, 22] 96
Conv2d-110 [-1, 48, 22, 22] 2,304
BatchNorm2d-111 [-1, 48, 22, 22] 96
ReLU-112 [-1, 48, 22, 22] 0
ShuffleV2Block-113 [-1, 96, 22, 22] 0
Conv2d-114 [-1, 48, 22, 22] 2,304
BatchNorm2d-115 [-1, 48, 22, 22] 96
ReLU-116 [-1, 48, 22, 22] 0
Conv2d-117 [-1, 48, 22, 22] 432
BatchNorm2d-118 [-1, 48, 22, 22] 96
Conv2d-119 [-1, 48, 22, 22] 2,304
BatchNorm2d-120 [-1, 48, 22, 22] 96
ReLU-121 [-1, 48, 22, 22] 0
ShuffleV2Block-122 [-1, 96, 22, 22] 0
Conv2d-123 [-1, 96, 11, 11] 864
BatchNorm2d-124 [-1, 96, 11, 11] 192
Conv2d-125 [-1, 96, 11, 11] 9,216
BatchNorm2d-126 [-1, 96, 11, 11] 192
ReLU-127 [-1, 96, 11, 11] 0
Conv2d-128 [-1, 96, 22, 22] 9,216
BatchNorm2d-129 [-1, 96, 22, 22] 192
ReLU-130 [-1, 96, 22, 22] 0
Conv2d-131 [-1, 96, 11, 11] 864
BatchNorm2d-132 [-1, 96, 11, 11] 192
Conv2d-133 [-1, 96, 11, 11] 9,216
BatchNorm2d-134 [-1, 96, 11, 11] 192
ReLU-135 [-1, 96, 11, 11] 0
ShuffleV2Block-136 [-1, 192, 11, 11] 0
Conv2d-137 [-1, 96, 11, 11] 9,216
BatchNorm2d-138 [-1, 96, 11, 11] 192
ReLU-139 [-1, 96, 11, 11] 0
Conv2d-140 [-1, 96, 11, 11] 864
BatchNorm2d-141 [-1, 96, 11, 11] 192
Conv2d-142 [-1, 96, 11, 11] 9,216
BatchNorm2d-143 [-1, 96, 11, 11] 192
ReLU-144 [-1, 96, 11, 11] 0
ShuffleV2Block-145 [-1, 192, 11, 11] 0
Conv2d-146 [-1, 96, 11, 11] 9,216
BatchNorm2d-147 [-1, 96, 11, 11] 192
ReLU-148 [-1, 96, 11, 11] 0
Conv2d-149 [-1, 96, 11, 11] 864
BatchNorm2d-150 [-1, 96, 11, 11] 192
Conv2d-151 [-1, 96, 11, 11] 9,216
BatchNorm2d-152 [-1, 96, 11, 11] 192
ReLU-153 [-1, 96, 11, 11] 0
ShuffleV2Block-154 [-1, 192, 11, 11] 0
Conv2d-155 [-1, 96, 11, 11] 9,216
BatchNorm2d-156 [-1, 96, 11, 11] 192
ReLU-157 [-1, 96, 11, 11] 0
Conv2d-158 [-1, 96, 11, 11] 864
BatchNorm2d-159 [-1, 96, 11, 11] 192
Conv2d-160 [-1, 96, 11, 11] 9,216
BatchNorm2d-161 [-1, 96, 11, 11] 192
ReLU-162 [-1, 96, 11, 11] 0
ShuffleV2Block-163 [-1, 192, 11, 11] 0
ShuffleNetV2-164 [[-1, 48, 44, 44], [-1, 96, 22, 22], [-1, 192, 11, 11]] 0
Upsample-165 [-1, 192, 22, 22] 0
AvgPool2d-166 [-1, 48, 22, 22] 0
Conv2d-167 [-1, 96, 22, 22] 32,256
BatchNorm2d-168 [-1, 96, 22, 22] 192
ReLU-169 [-1, 96, 22, 22] 0
Conv1x1-170 [-1, 96, 22, 22] 0
Conv2d-171 [-1, 96, 22, 22] 2,400
BatchNorm2d-172 [-1, 96, 22, 22] 192
ReLU-173 [-1, 96, 22, 22] 0
Conv2d-174 [-1, 96, 22, 22] 2,400
BatchNorm2d-175 [-1, 96, 22, 22] 192
ReLU-176 [-1, 96, 22, 22] 0
Conv2d-177 [-1, 96, 22, 22] 2,400
BatchNorm2d-178 [-1, 96, 22, 22] 192
ReLU-179 [-1, 96, 22, 22] 0
Conv2d-180 [-1, 96, 22, 22] 2,400
BatchNorm2d-181 [-1, 96, 22, 22] 192
ReLU-182 [-1, 96, 22, 22] 0
Conv2d-183 [-1, 96, 22, 22] 2,400
BatchNorm2d-184 [-1, 96, 22, 22] 192
ReLU-185 [-1, 96, 22, 22] 0
Conv2d-186 [-1, 96, 22, 22] 2,400
BatchNorm2d-187 [-1, 96, 22, 22] 192
ReLU-188 [-1, 96, 22, 22] 0
Conv2d-189 [-1, 96, 22, 22] 27,648
BatchNorm2d-190 [-1, 96, 22, 22] 192
ReLU-191 [-1, 96, 22, 22] 0
SPP-192 [-1, 96, 22, 22] 0
Conv2d-193 [-1, 96, 22, 22] 9,216
BatchNorm2d-194 [-1, 96, 22, 22] 192
ReLU-195 [-1, 96, 22, 22] 0
Conv1x1-196 [-1, 96, 22, 22] 0
Conv2d-197 [-1, 96, 22, 22] 2,400
BatchNorm2d-198 [-1, 96, 22, 22] 192
ReLU-199 [-1, 96, 22, 22] 0
Conv2d-200 [-1, 1, 22, 22] 96
BatchNorm2d-201 [-1, 1, 22, 22] 2
Head-202 [-1, 1, 22, 22] 0
Sigmoid-203 [-1, 1, 22, 22] 0
Conv2d-204 [-1, 96, 22, 22] 2,400
BatchNorm2d-205 [-1, 96, 22, 22] 192
ReLU-206 [-1, 96, 22, 22] 0
Conv2d-207 [-1, 4, 22, 22] 384
BatchNorm2d-208 [-1, 4, 22, 22] 8
Head-209 [-1, 4, 22, 22] 0
Conv2d-210 [-1, 96, 22, 22] 2,400
BatchNorm2d-211 [-1, 96, 22, 22] 192
ReLU-212 [-1, 96, 22, 22] 0
Conv2d-213 [-1, 4, 22, 22] 384
BatchNorm2d-214 [-1, 4, 22, 22] 8
Head-215 [-1, 4, 22, 22] 0
Softmax-216 [-1, 4, 22, 22] 0
DetectHead-217 [-1, 9, 22, 22] 0

Total params: 237,042
Trainable params: 237,042
Non-trainable params: 0

Input size (MB): 1.42
Forward/backward pass size (MB): 14982.11
Params size (MB): 0.90
Estimated Total Size (MB): 14984.44

use SGD optimizer
Starting training for 10 epochs...
Epoch:0 LR:0.000000 IOU:0.683203 Obj:0.118574 Cls:1.722579 Total:9.085380: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.00s/it]
Epoch:1 LR:0.000002 IOU:0.682660 Obj:0.117858 Cls:1.543631 Total:8.890644: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.23s/it]
Epoch:2 LR:0.000026 IOU:0.642826 Obj:0.115852 Cls:1.679555 Total:8.675802: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.13s/it]
Epoch:3 LR:0.000130 IOU:0.680697 Obj:0.117061 Cls:1.506619 Total:8.825181: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.28s/it]
Epoch:4 LR:0.000410 IOU:0.693652 Obj:0.116068 Cls:1.497365 Total:8.903667: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.25s/it]
Epoch:5 LR:0.001000 IOU:0.667514 Obj:0.118184 Cls:1.399224 Total:8.630288: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.17s/it]
Epoch:6 LR:0.001000 IOU:0.641047 Obj:0.113830 Cls:1.065889 Total:8.015538: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.14s/it]
Epoch:7 LR:0.001000 IOU:0.598921 Obj:0.112864 Cls:0.718651 Total:7.315843: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.23s/it]
Epoch:8 LR:0.001000 IOU:0.596256 Obj:0.111394 Cls:0.472929 Total:7.025274: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.25s/it]
Epoch:9 LR:0.001000 IOU:0.594581 Obj:0.108997 Cls:0.344135 Total:6.844735: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.38s/it]
Epoch:10 LR:0.001000 IOU:0.586639 Obj:0.108114 Cls:0.251320 Total:6.674245: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.29s/it]
computer mAP...
0%| | 0/1 [00:00<?, ?it/s]
D:\python\anaconda\envs\fast38\lib\site-packages\torch\functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.46s/it]
creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.03s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000

Does the SPP class in custom_layers.py mean Spatial Pyramid Pooling?

Sorry if this is a stupid question. I thought this was Spatial Pyramid Pooling (SPP) code, but it looks different from the SPP I know. Can someone enlighten me?

import torch
import torch.nn as nn

# Conv1x1 (a 1x1 Conv2d + BatchNorm + ReLU block) is defined earlier in custom_layers.py.

class SPP(nn.Module):
    def __init__(self, input_channels, output_channels):
        super(SPP, self).__init__()
        self.Conv1x1 = Conv1x1(input_channels, output_channels)

        self.S1 =  nn.Sequential(nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True)
                                 )

        self.S2 =  nn.Sequential(nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True),

                                 nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True)
                                 )

        self.S3 =  nn.Sequential(nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True),

                                 nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True),

                                 nn.Conv2d(output_channels, output_channels, 5, 1, 2, groups = output_channels, bias = False),
                                 nn.BatchNorm2d(output_channels),
                                 nn.ReLU(inplace=True)
                                 )

        self.output = nn.Sequential(nn.Conv2d(output_channels * 3, output_channels, 1, 1, 0, bias = False),
                                    nn.BatchNorm2d(output_channels),
                                   )
                                   
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):    
        x = self.Conv1x1(x)

        y1 = self.S1(x)
        y2 = self.S2(x)
        y3 = self.S3(x)

        y = torch.cat((y1, y2, y3), dim=1)
        y = self.relu(x + self.output(y))

        return y
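For comparison, here is the classic Spatial Pyramid Pooling block as used in YOLO-style models: parallel max-pools at several kernel sizes, concatenated along channels. The FastestDet block above instead stacks 5x5 depthwise convolutions in its S1/S2/S3 branches, so it grows the receptive field with convolutions rather than pooling, but the multi-scale pyramid idea is the same:

import torch
import torch.nn as nn

class ClassicSPP(nn.Module):
    # YOLOv3/v5-style SPP: identity plus three max-pool branches, concatenated.
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)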

Set custom input resolution

Hi :)
Thanks for this repo

I have a problem with setting a custom input resolution in the YAML config:

DATASET:
  TRAIN: path\to\train\txt
  VAL: path\to\test\txt 
  NAMES: path\to\names
MODEL:
  NC: 1
  INPUT_WIDTH: 240
  INPUT_HEIGHT: 240
TRAIN:
  LR: 0.001
  THRESH: 0.25
  WARMUP: true
  BATCH_SIZE: 64
  END_EPOCH: 200
  MILESTIONES:
    - 50
    - 100
    - 150

When I try to train FastestDet at a resolution other than the default 352x352 (for example, 240x240), I get a RuntimeError:

Load yaml sucess...
<utils.tool.LoadYaml object at 0x00000299198CD8B0>
Initialize params from:./module/shufflenetv2.pth
Traceback (most recent call last):
  File "E:\Repositories\FastestDet\train.py", line 134, in <module>
    model = FastestDet()
  File "E:\Repositories\FastestDet\train.py", line 42, in __init__
    summary(self.model, input_size=(3, self.cfg.input_height, self.cfg.input_width))
  File "C:\Users\Reutov\.anaconda3\envs\experimental_env\lib\site-packages\torchsummary\torchsummary.py", line 72, in summary
    model(*x)
  File "C:\Users\Reutov\.anaconda3\envs\experimental_env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Repositories\FastestDet\module\detector.py", line 25, in forward
    P = torch.cat((P1, P2, P3), dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 15 but got size 16 for tensor number 2 in the list.

Shapes of P1, P2 and P3 (240x240): torch.Size([2, 48, 30, 30]) torch.Size([2, 96, 15, 15]) torch.Size([2, 192, 8, 8])
Shapes of P1, P2 and P3 (352x352): torch.Size([2, 48, 44, 44]) torch.Size([2, 96, 22, 22]) torch.Size([2, 192, 11, 11])
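For what it's worth, the mismatch is predictable from the feature-map strides, assuming P1/P2/P3 come out at strides 8/16/32 and P3 is upsampled x2 before the concat in detector.py:

import math

for size in (240, 352):
    p2 = size // 16                   # stride-16 map (P2)
    p3_up = math.ceil(size / 32) * 2  # stride-32 map (P3), upsampled x2
    print(size, p2, p3_up)            # 240: 15 vs 16 (mismatch); 352: 22 vs 22

So any input that is a multiple of 32 (224, 256, 288, 320, ...) should concatenate cleanly without touching the backbone.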

I can guess that I need to change the backbone architecture a little, but then the pretrained weights will not load correctly.

How can I change the input resolution for training FastestDet?
Can I change the input resolution of FastestDet during the ONNX conversion?

Thanks in advance :)

Right way to convert the COCO dataset

import json
import cv2
import os
import matplotlib.pyplot as plt
import shutil
from tqdm import tqdm

def load_images_from_folder(input_path):
    file_names = []
    for filename in tqdm(os.listdir(input_path)):
        file_names.append(filename)
    return file_names

def get_img_ann(image_id):
    img_ann = []
    isFound = False
    for ann in data['annotations']:
        if ann['image_id'] == image_id:
            img_ann.append(ann)
            isFound = True
    if isFound:
        return img_ann
    else:
        return None

def get_img(filename):
    for img in data['images']:
        if img['file_name'] == filename:
            return img



if __name__ == "__main__":
    input_path = "/home/dzhang/data/coco/val2017/"
    output_path = "/home/dzhang/data/coco_fastest_net/val2017"
    annotation_path = "/home/dzhang/data/coco/annotations/instances_val2017.json"
    output_txt_path = "/home/dzhang/data/coco_fastest_net/val2017.txt"
    os.system(f"rm -rf {output_path}")
    os.makedirs(output_path, exist_ok=True)


    with open(annotation_path, 'r') as f:
        data = json.load(f)

    file_names = load_images_from_folder(input_path)
    labeled_file_names = []

    count = 0

    print("Processing labels...")
    for filename in tqdm(file_names):
        # Extracting image
        img = get_img(filename)
        img_id = img['id']
        img_w = img['width']
        img_h = img['height']

        # Get Annotations for this image
        img_ann = get_img_ann(img_id)

        if img_ann:
            # Opening file for current image
            file_object = open(f"{output_path}/img{count}.txt", "a")

            for ann in img_ann:
                current_category = ann['category_id'] - 1 # As yolo format labels start from 0 
                if current_category > 79:
                    print("Current category larger than 79")
                    continue
                current_bbox = ann['bbox']
                x = current_bbox[0]
                y = current_bbox[1]
                w = current_bbox[2]
                h = current_bbox[3]
                
                # Finding midpoints
                x_centre = (x + (x+w))/2
                y_centre = (y + (y+h))/2
                
                # Normalization
                x_centre = x_centre / img_w
                y_centre = y_centre / img_h
                w = w / img_w
                h = h / img_h
                
                # Limiting upto fix number of decimal places
                x_centre = format(x_centre, '.6f')
                y_centre = format(y_centre, '.6f')
                w = format(w, '.6f')
                h = format(h, '.6f')
                    
                # Writing current object 
                file_object.write(f"{current_category} {x_centre} {y_centre} {w} {h}\n")
        

            file_object.close()

            # copy image here if successful
            source = os.path.join(input_path, filename)
            destination = f"{output_path}/img{count}.jpg"
            labeled_file_names.append(destination)
            shutil.copy(source, destination)
            count += 1  # This should be outside the if img_ann block.
        else:
            print(f"Image {filename}'s label not found.")
    with open(output_txt_path, 'w') as f:
        for filename in labeled_file_names:
            f.write(f"{filename}\n")
    print("Labels processed successfully.")

I use the code above to convert the COCO dataset to the expected format, but the evaluation results do not look right:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.028
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.054
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.027
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.006
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.025
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.051
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.031
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.043
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.045
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.013
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.047
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.079

Do you know why?
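One thing worth checking (a guess, not a confirmed diagnosis): COCO instance category ids are not contiguous; the 80 classes use ids 1-90 with gaps, so ann['category_id'] - 1 does not yield 0-79 class indices. The usual fix is an explicit id-to-index map built from the annotation file:

def build_category_map(coco_json):
    # Map non-contiguous COCO category ids (1-90 with gaps) to 0..79.
    cat_ids = sorted(c["id"] for c in coco_json["categories"])
    return {cid: i for i, cid in enumerate(cat_ids)}

# In the loop above, instead of ann['category_id'] - 1:
#   cat_to_idx = build_category_map(data)
#   current_category = cat_to_idx[ann['category_id']]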

README weight_AP05:0.253207_280-epoch.pth colon issue

Hi developer,

I noticed that weight_AP05:0.253207_280-epoch.pth in the weights folder has been renamed to weight_AP05_0.253207_280-epoch.pth.

Would you mind fixing the colon in the README.md command accordingly? It still reads:
python3 test.py --yaml configs/coco.yaml --weight weights/weight_AP05:0.253207_280-epoch.pth --img data/3.jpg
