
efficientdet-pytorch's Introduction

Hi, nice to meet you 👋

efficientdet-pytorch's People

Contributors

bubbliiiing


efficientdet-pytorch's Issues

train.py

Fail To Load Key: ['classifier.header.pointwise_conv.conv.weight', 'classifier.header.pointwise_conv.conv.bias'] ……
Fail To Load Key num: 2

Friendly reminder: it is normal for the head weights not to load; if the Backbone weights fail to load, something is wrong.
The expanded size of the tensor (46917) must match the existing size (49104) at non-singleton dimension 1. Target sizes: [1, 46917, 4]. Tensor sizes: [1, 49104, 4]
Error occurs, No graph saved
This error occurred at runtime.
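For reference, 49104 is the anchor count for a 512x512 input, assuming the usual five pyramid levels (strides 8 to 128) and 9 anchors per location; a quick check of that arithmetic:

# Anchor count for a 512x512 input, assuming 5 pyramid levels (strides 8..128)
# and 9 anchors per location; this is where the 49104 in the message comes from.
num_anchors = 9 * sum((512 // s) ** 2 for s in (8, 16, 32, 64, 128))
print(num_anchors)  # 49104

A different count (46917 here) usually means the anchors and the network outputs were built for different input shapes, i.e. the input_shape and phi settings do not match.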

Problem with get_dr_txt.py

RuntimeError: Error(s) in loading state_dict for EfficientDetBackbone:
size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([396, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([810, 64, 1, 1]).
size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([396]) from checkpoint, the shape in current model is torch.Size([810]).
Why does the dimension mismatch above occur?
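For context, the head's pointwise conv has num_anchors * num_classes output channels, so the two shapes correspond to different class counts (assuming the usual 9 anchors per location); a quick check:

num_anchors = 9                      # 3 scales x 3 aspect ratios per location
print(396 // num_anchors)            # 44 -> number of classes the checkpoint was trained with
print(810 // num_anchors)            # 90 -> number of classes the current model is configured for
# Making the classes file used by get_dr_txt.py match the one used for training removes the mismatch.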

Bug encountered during prediction

File "predict.py", line 17, in
r_image = efficientdet.detect_image(image)
File "/home/404/efficientdet-pytorch-master/efficientdet.py", line 109, in detect_image
detection = torch.cat([regression,classification],axis=-1)
TypeError: cat() got an unexpected keyword argument 'axis'
(yp) [root@localhost efficientdet-pytorch-master]#
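Older PyTorch releases accept only the dim keyword for torch.cat; the NumPy-style axis alias was added later. A minimal sketch of the fix for the line in the traceback, with dummy tensors standing in for the model outputs:

import torch

regression = torch.randn(1, 49104, 4)        # placeholder for the box regression output
classification = torch.randn(1, 49104, 20)   # placeholder for the class scores
detection = torch.cat([regression, classification], dim=-1)   # use dim=-1 instead of axis=-1
print(detection.shape)                       # torch.Size([1, 49104, 24])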

Multi-GPU training

Does this code support multi-GPU training, and where is it configured?
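As a general PyTorch pattern (not necessarily how this repo wires it up), multi-GPU training is usually enabled by wrapping the network in nn.DataParallel after it is built; a minimal sketch with an illustrative model:

import torch
import torch.nn as nn

net = nn.Conv2d(3, 8, 3)              # stand-in for the EfficientDetBackbone instance
if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)    # splits each batch across all visible GPUs
    net = net.cuda()

Which GPUs are visible to the process can then be limited with the CUDA_VISIBLE_DEVICES environment variable.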

Question about how many times BiFPN is executed

I came here from your Bilibili course; the explanations are great! There is one thing about BiFPN I did not understand. The paper says the BiFPN block is applied multiple times, and your code also mentions that after the first BiFPN pass the outputs p3_out, p4_out, ..., p7_out become the inputs for the next pass. Could you point out where in the code the total number of BiFPN repetitions is defined?
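In the EfficientDet paper the number of BiFPN layers grows with the compound coefficient (3, 4, 5, 6, 7, 7, 8, 8 for d0 through d7), and implementations typically just stack that many BiFPN cells so each cell's p3_out ... p7_out feed the next. A minimal sketch, with a hypothetical cell class:

import torch.nn as nn

class BiFPNStack(nn.Module):
    # Illustrative only: `cell_cls` is a hypothetical BiFPN cell that takes and
    # returns the tuple (p3, p4, p5, p6, p7).
    def __init__(self, cell_cls, num_channels, repeats):
        super().__init__()
        self.cells = nn.Sequential(*[cell_cls(num_channels) for _ in range(repeats)])

    def forward(self, features):
        return self.cells(features)   # the output of one cell becomes the input of the next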

Problem when computing FPS with predict.py

Traceback (most recent call last):
File "e:/deeplearning/efficientdet-pytorch-master/efficientdet-pytorch-master/predict.py", line 120, in
tact_time = efficientdet.get_FPS(img, test_interval)
File "e:\deeplearning\efficientdet-pytorch-master\efficientdet-pytorch-master\efficientdet.py", line 237, in get_FPS
image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou)
TypeError: non_max_suppression() got multiple values for argument 'conf_thres'
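That TypeError generally means conf_thres is being filled twice: once by a positional argument and once by the keyword. A hypothetical illustration (the signature below is made up, not the repo's actual one):

def non_max_suppression(prediction, num_classes, conf_thres=0.5, nms_thres=0.4):
    pass

# If an extra positional argument (e.g. image_shape) is passed, it lands in the
# conf_thres slot, and the later keyword then collides with it:
# non_max_suppression(outputs, 20, image_shape, conf_thres=0.5)
#   -> TypeError: non_max_suppression() got multiple values for argument 'conf_thres'

So the call in efficientdet.py and the function definition it imports need to agree on the argument order.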

Problem running train.py

runfile('E:/A/efficientdet-pytorch-master/train.py', wdir='E:/A/efficientdet-pytorch-master')
Reloaded modules: nets, nets.efficientdet, utils, utils.anchors, nets.efficientnet, nets.layers, nets.efficientdet_training
Traceback (most recent call last):

File "D:\software\Anaconda\envs\torch1.2\lib\site-packages\torch\utils\tensorboard_init_.py", line 2, in
from tensorboard.summary.writer.record_writer import RecordWriter # noqa F401

ModuleNotFoundError: No module named 'tensorboard.summary'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "E:\A\efficientdet-pytorch-master\train.py", line 16, in
from utils.callbacks import LossHistory

File "E:\A\efficientdet-pytorch-master\utils\callbacks.py", line 9, in
from torch.utils.tensorboard import SummaryWriter

File "D:\software\Anaconda\envs\torch1.2\lib\site-packages\torch\utils\tensorboard_init_.py", line 4, in
raise ImportError('TensorBoard logging requires TensorBoard with Python summary writer installed. '

ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above.
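This usually just means the standalone tensorboard package is missing from (or too old in) the active environment; installing it into that environment (e.g. pip install tensorboard) and re-running is the common fix. A quick smoke test, independent of this repo:

from torch.utils.tensorboard import SummaryWriter   # fails if tensorboard is not installed

writer = SummaryWriter("runs/smoke_test")
writer.add_scalar("check", 1.0, 0)
writer.close()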

training warning

Epoch 3/25: 0%| | 0/204 [00:00<?, ?it/s<class 'dict'>]D:\Model\efficientDet\04\efficientdet-pytorch\Utils\dataloader.py:130: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
bboxes = np.array(bboxes)
D:\Model\efficientDet\04\efficientdet-pytorch\Utils\dataloader.py:130: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
bboxes = np.array(bboxes)

self._root = parser._parse_whole(source)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 129: illegal multibyte sequence

If you encounter the error above, open the annotation files with an explicit encoding, as shown below.

in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id), encoding='UTF_8')

Trouble (original code):

in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))

Fix:

in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id), encoding='UTF_8')

TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect

/project/nets/layers.py:323: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1]
/project/nets/layers.py:324: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0]
/project/nets/layers.py:357: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1]
/project/nets/layers.py:358: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0]
/project/utils/anchors.py:25: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if image_shape[1] % stride != 0:
/project/utils/anchors.py:30: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
x = np.arange(stride / 2, image_shape[1], stride)
/project/utils/anchors.py:30: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
x = np.arange(stride / 2, image_shape[1], stride)
/project/utils/anchors.py:31: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
y = np.arange(stride / 2, image_shape[0], stride)
/project/utils/anchors.py:31: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
y = np.arange(stride / 2, image_shape[0], stride)
/project/utils/anchors.py:49: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
anchor_boxes = torch.from_numpy(anchor_boxes).to(image.device)

Problem during prediction

Traceback (most recent call last):
File "E:\yk\Code\efficientdet-pytorch\predict.py", line 77, in
r_image = efficientdet.detect_image(image, crop = crop, count=count)
File "E:\yk\Code\efficientdet-pytorch\efficientdet.py", line 216, in detect_image
draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c])
File "D:\Anaconda\envs\objectbox\lib\site-packages\PIL\ImageDraw.py", line 296, in rectangle
self.draw.draw_rectangle(xy, ink, 0, width)
ValueError: x1 must be greater than or equal to x0

Hi, this error appeared when I ran prediction with the d3 pretrained weights; I am not sure what causes it.
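ImageDraw.rectangle requires x1 >= x0 and y1 >= y0, so the error means a degenerate or inverted box reached the drawing code (for example, when the thickness offset i pushes right - i below left + i). A hypothetical guard that reorders the coordinates before drawing:

from PIL import Image, ImageDraw

image = Image.new("RGB", (640, 480))
draw = ImageDraw.Draw(image)
left, top, right, bottom = 100, 120, 98, 150   # inverted box, for illustration
left, right = min(left, right), max(left, right)
top, bottom = min(top, bottom), max(top, bottom)
draw.rectangle([left, top, right, bottom], outline=(255, 0, 0))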

Problem when predicting after training

Hi, after training for 50 epochs I reloaded the model for testing and got the following error. Is it because I did not change the fc output? Thank you.
Traceback (most recent call last):
File "/home/yueyu/efficientdet-pytorch/predict.py", line 7, in
efficientdet = EfficientDet()
File "/home/yueyu/efficientdet-pytorch/efficientdet.py", line 53, in init
self.generate()
File "/home/yueyu/efficientdet-pytorch/efficientdet.py", line 74, in generate
self.net.load_state_dict(state_dict)
File "/home/yueyu/miniconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for EfficientDetBackbone:
size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([180, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([36, 64, 1, 1]).
size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([180]) from checkpoint, the shape in current model is torch.Size([36]).

Problem downloading the pretrained weights: invalid hash value

Hello, when downloading the pretrained weights I get the error below. Why does this happen?
RuntimeError: invalid hash value (expected "b0", got "73f3a3d3c70508a1dfc1fcb58f8ba0edb1a5aaf2f0aaa2ce4dcd34b18b1a97df")
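load_state_dict_from_url derives the expected hash prefix from the weight file's name (here it parsed "b0") and compares it against the real SHA256 of the downloaded file, so the check can fail either because the download was interrupted/corrupted or simply because the file name's hash segment is not the file's actual hash. One way around it is to find the cached download, or fetch the .pth manually, and load it locally; the path below is a placeholder:

import torch

print(torch.hub.get_dir())   # downloads are cached under <hub_dir>/checkpoints/
# Loading a manually downloaded checkpoint directly avoids re-running the hash check.
state_dict = torch.load("model_data/efficientnet-b0.pth", map_location="cpu")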

How to convert trained .pth weights and the model into a portable .onnx or .pb file

Many thanks for your videos and posts; I have been learning from them for a while, but I cannot convert my trained weights into an ONNX or pb model. Could you make a video or post specifically about converting trained weights and models in their various formats (.pt, .pth, .h5) into portable .onnx or .pb files, for deployment and inference on other platforms? That would help bring deep-learning vision into industrial use. Thank you very much.
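A general-purpose starting point is torch.onnx.export (the repo may or may not ship its own export script). The sketch below uses a tiny stand-in network, because the real EfficientDetBackbone and its post-processing are not shown here, and the input size and names are illustrative:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 4, 1))
model.eval()                                   # replace with the trained network in eval mode
dummy_input = torch.randn(1, 3, 512, 512)      # d0-style resolution, as an example
torch.onnx.export(model, dummy_input, "model.onnx",
                  opset_version=11,
                  input_names=["images"], output_names=["outputs"])

The exported .onnx file can then be run with ONNX Runtime or converted further for other deployment targets.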

Training on my own data gives very low mAP

Hello, and thank you for your work.
I trained efficientdet-d3 on another dataset and the results are quite poor; could you help me analyze why? Thanks!
The dataset is an aluminium-profile surface-defect detection dataset.
The backbone was frozen for the first 30 epochs and then unfrozen, and training continued until val_loss stopped decreasing.
The mAP results are as follows:
Get map.

2.78% = 不导电 AP || score_threhold=0.5 : F1=0.00 ; Recall=0.00% ; Precision=0.00%

0.00% = 喷流 AP || score_threhold=0.5 : F1=0.00 ; Recall=0.00% ; Precision=0.00%

0.00% = 擦花 AP || score_threhold=0.5 : F1=0.00% ; Recall=0.00% ; Precision=0.00%

38.01% = 杂色 AP || score_threhold=0.5 : F1=0.11 ; Recall=5.56% ; Precision=100.00%

13.00% = 桔皮 AP || score_threhold=0.5 : F1=0.00 ; Recall=0.00% ; Precision=0.00%

0.00% = 漆泡 AP || score_threhold=0.5 : F1=0.00% ; Recall=0.00% ; Precision=0.00%

0.00% = 漏底 AP || score_threhold=0.5 : F1=0.00 ; Recall=0.00% ; Precision=0.00%

5.01% = 脏点 AP || score_threhold=0.5 : F1=0.14 ; Recall=10.84% ; Precision=21.43%

0.00% = 角位漏底 AP || score_threhold=0.5 : F1=0.00% ; Recall=0.00% ; Precision=0.00%

0.00% = 起坑 AP || score_threhold=0.5 : F1=0.00 ; Recall=0.00% ; Precision=0.00%

mAP = 5.88%

Get map done.

The val_loss of each epoch is as follows:
692.5821533203125 18.020043546503242 6.0580932082551895 2.9250449375672773 1.7247823729659573 1.1773547078623916 0.8997215122887583 0.7432047751816836 0.6556733747323354 0.5967527682130987 0.5552684529261156 0.5247120744351185 0.5120401086680817 0.490136006564805 0.48692420892643207 0.4789598101016247 0.46626631509174 0.47130427938519104 0.46737760531179834 0.4521835113339352 0.4599695607568278 0.45360165202256403 0.44911260993191693 0.4493762557253693 0.4430947389566537 0.44133371464682347 0.44138025108611945 0.44033303834272153 0.4294479764772184 0.4288688725368543 0.43481780243898505 0.4259181917825742 0.4300796524691048 0.41619802549926205 0.41009346857222156 0.4145459183561268 0.4220570447618392 0.4012312131530758 0.3930108847735978 0.4246373758998825 0.38846801557758853 0.39046983073340424 0.5298791383462611 0.3835990623251271 0.424211601240199 0.40973395342702296 0.374117885035143 0.3753899414537113 0.3762250279924318 0.3619030268668239 0.4167037764147146 0.37196493557473614 0.3779459266798265 0.3777198547691996 0.3685086570235331 0.3662413968635139 0.3724715964062445 0.37364781386594276 0.3646606824068881 0.40380949271259026 0.36690836080085876 0.39359222296903384 0.35407807564001476 0.35932715515147395 0.35652801939355794 0.3581762860126015 0.3570717303777364 0.3549506452712995 0.3696301862208256 0.36184008566857273 0.35130936827566195 0.35368453527786836 0.36518700056667647 0.34808981838399794 0.3540204591326304 0.35394072493732864 0.3597586819651856 0.35117012387447394 0.35453211001829427 0.338129321863847 0.354188849065286 0.3480956403733189 0.3560672045421244 0.3494431595665528 0.3579848694489963 0.3562058041344828 0.3504434914620065 0.36181518032368437 0.3520742502730729 0.3408385811568196 0.3392267982239154 0.34833704064419463 0.3375129512330489 0.34490444361051514 0.3474811746168937 0.3642430662997623 0.3400071593029286 0.3533157893359216 0.34791290949084863 0.35537530932186256 0.3504680647000448 0.3470999870949717 0.3505480893011858 0.351605375561474 0.35297540341740224 0.3379963515743391 0.34117161613235725 0.3530546065364311 0.35188829584686615 0.35485441400322004 0.3438295838492575 0.3458844947058763 0.35485429858872247 0.3565744514674393 0.34367825498165033 0.34764408359109467 0.35074018681449676 0.3437748175225596 0.34253188953804437 0.3441715170976831 0.3461703913870142 0.34832563582084963 0.3480878883222146 0.34780260700899274 0.3481335105068648 0.3435267209172694 0.34988239888491024 0.35219536335277024 0.3490558216876503 0.34662854827162043 0.3428226285961582 0.3569837624985558 0.34416547353699134 0.34747738218796786 0.34722422989113116 0.34134776695673147 0.343578678819893 0.3511959823654659 0.3519623815012512 0.34963406944897635 0.3476591583118955 0.34318768255301374 0.3484093218819419 0.35494244101443395 0.3509057753010472 0.3456782892957997 0.3371015375118647 0.3482280739708178 0.3487955643447922 0.3454236375140165 0.35292010598663076 0.3519064793780224 0.3401252330461545 0.3439494139278558 0.34353317512171483 0.34736122029708394 0.3405051428491055 0.34928369027242734 0.34589640301332547 0.34704446495135327 0.348259352842596 0.34758604331803855 0.34305048622746964 0.3531607194428346 0.3395370377311066 0.3502034348950012 0.3446665857988062 0.3422466347605657 0.34934172028703475 0.35288102302088664 0.35932373247151056 0.3504922179255023 0.3531327823093578 0.3520925745868416 0.3523084268029501 0.34706903154503055 0.35040349913622015 0.3543376238432838 0.3538556429766007 0.34093494554842585 0.3473847135901451 0.3451451262764966 0.34536995963930195 
0.3503570426533471 0.34914408012557385 0.3532811352399303 0.34506166659629167 0.3482327019489968 0.3509918149393886 0.3524799474806928 0.35067986360570386 0.3517061372134668 0.3485593108543709 0.3451726289827432 0.34738497355424647 0.34392267045801256 0.3420782624674377 0.33875321841506817 0.347360303236255 0.3521433267186382 0.3485356454473378 0.34775876684753754 0.3512924103094126 0.3482155179632689 0.3382312229264583 0.35628125399573524 0.34736081468525215 0.3492826893369653 0.3421760888964827 0.3456490655610366 0.34405294005105747 0.34908889931862924 0.34774825335549775 0.35118296286508216 0.3519998987886443 0.3402652802474018 0.34220543765087624 0.34587571233399766

Question about filtering ground-truth boxes

Hi, does the code filter ground-truth boxes by aspect ratio when loading the labels? I find that the trained model performs poorly on long, thin targets; could it be that boxes with such extreme aspect ratios get filtered out during training?

voc map

What mAP does EfficientDet D0 reach when trained on the VOC dataset? The best I got during training was in the thirties, which does not seem good.


Training problem

Hello. During training an error suddenly appears; the output is shown below:
E:\py_file\efficientdet-pytorch-master\venv\Scripts\python.exe E:/py_file/efficientdet-pytorch-master/train_1.py
Loading weights into state dict...
Finished!
Start Train
Epoch 1/50: 26%|██▌ | 428/1675 [09:09<26:39, 1.28s/it, Conf Loss=994, Regression Loss=0.0386, lr=0.001]
Traceback (most recent call last):
File "E:/py_file/efficientdet-pytorch-master/train_1.py", line 214, in
val_loss = fit_one_epoch(net, efficient_loss, epoch, epoch_size, epoch_size_val, gen, gen_val, Freeze_Epoch, Cuda)
File "E:/py_file/efficientdet-pytorch-master/train_1.py", line 44, in fit_one_epoch
targets = [torch.from_numpy(ann).type(torch.FloatTensor).cuda() for ann in targets]
File "E:/py_file/efficientdet-pytorch-master/train_1.py", line 44, in
targets = [torch.from_numpy(ann).type(torch.FloatTensor).cuda() for ann in targets]
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.

I searched around but could not find a solution; I hope you can help me. Thank you.
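This TypeError typically shows up when the per-image annotation arrays are ragged or empty, so NumPy silently produces an object-dtype array that torch.from_numpy cannot handle. A hypothetical guard for the conversion in fit_one_epoch:

import numpy as np
import torch

targets = [np.zeros((0, 5)), np.array([[10, 20, 50, 60, 1]])]   # e.g. one image has no boxes
# Forcing float32 ensures empty annotations become a (0, 5) float array rather than
# dtype=object, which torch.from_numpy refuses to convert.
targets = [torch.from_numpy(np.asarray(ann, dtype=np.float32)) for ann in targets]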

Training logs, PR output, and specifying the training GPU

I run this program on a server, but I did not find where saving a training log is configured; I only see the weight files being saved.
When computing mAP, can recall and precision be reported at a chosen score threshold (e.g. 0.5)?
Finally, I did not find anywhere to specify which GPU to use; once the program runs, every GPU is used, which interferes with other people's work. Please advise.
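Restricting which GPUs a run can see is usually done outside the code via CUDA_VISIBLE_DEVICES (e.g. CUDA_VISIBLE_DEVICES=0 python train.py), or at the very top of train.py before CUDA is touched; a minimal sketch:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose only GPU 0 to this process

import torch
print(torch.cuda.device_count())           # now reports only the visible GPU(s)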

RuntimeError: CUDA error: device-side assert triggered

This error occurred while training on my own dataset; how should I solve it?
../aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
(the same assertion is repeated for threads [2,0,0] through [29,0,0])
Traceback (most recent call last):
File "/home/jxt/.conda/envs/efficient/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
send_bytes(obj)
File "/home/jxt/.conda/envs/efficient/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/jxt/.conda/envs/efficient/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
self._send(header + buf)
File "/home/jxt/.conda/envs/efficient/lib/python3.8/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
Traceback (most recent call last):
File "/mnt/disk1/data0/jxt/efficientdet/train.py", line 504, in
fit_one_epoch(model_train, model, focal_loss, loss_history, eval_callback, optimizer, epoch,
File "/mnt/disk1/data0/jxt/efficientdet/utils/utils_fit.py", line 37, in fit_one_epoch
loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda)
File "/home/jxt/.conda/envs/efficient/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/disk1/data0/jxt/efficientdet/nets/efficientdet_training.py", line 210, in forward
if positive_indices.sum() > 0:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
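A device-side index-out-of-bounds assert raised inside the loss almost always traces back to a class id in the annotations that falls outside [0, num_classes - 1]. A hypothetical sanity check, assuming the usual "image_path x1,y1,x2,y2,class_id ..." line format of the generated 2007_train.txt:

num_classes = 20   # must equal the number of entries in the classes file used for training

with open("2007_train.txt", encoding="utf-8") as f:
    for line in f:
        for box in line.split()[1:]:                  # boxes follow the image path
            class_id = int(box.split(",")[4])
            assert 0 <= class_id < num_classes, f"bad class id {class_id} in: {line}"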

Comparison with other detection models

I compared the b0 network with SSD and it came out about 5 points lower than SSD. Is that normal? How have the results been for others running this model?

Data conversion

All of the XML files under Annotations were converted into the training data, and the remaining txt files are all empty.

Bug when computing mAP

Hello, and many thanks for sharing. When I run get_dr_txt.py, the generated txt files under detection are all empty. Tracing the problem: during the test pass, in EfficientNet's forward (in ./nets/efficientdet.py), at line 413, x = self.model._bn0(x), i.e. self._bn0 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps), different input values from the previous step all map to the same output after bn0. Whatever the input is, the output is just the value of 'backbone_net.model._bn0.bias', even though the weight is not zero. The shapes from the previous step are:
At test time:
[16, 3, 1024, 1024] input
[16, 48, 512, 512] after x = self.model._conv_stem(x)
At training time:
[2, 3, 1024, 1024] input
[2, 48, 512, 512] after x = self.model._conv_stem(x)

The model used at test time is from the 40th training epoch (phi=4); both the training and validation losses are 0.0002.

I tried adjusting the input shape at test time, cropping the inputs so that the shapes fed into the mAP computation are exactly the same as during training, but the error above remains.

When I fed in data captured during training, the results were completely fine.

Comparing the training inputs with the mAP inputs shows no obvious differences.

Looking forward to your reply, thank you.

mAP too low

I trained d1 to detect fabric defects (image size 1280x1080, box sizes around 50x900, some smaller), and the resulting mAP is only about 0.04.

Loss question

During training, the first few validation loss values reach several hundred thousand. After 100 epochs the loss is only around 1.x, but the mAP is 0. The same dataset format works fine with your yolov3 repo, yet here the mAP stays at 0. What else could be the cause? The model is d1.

get_dr_txt.py error

RuntimeError: Error(s) in loading state_dict for EfficientDetBackbone:
size mismatch for classifier.header.pointwise_conv.conv.weight: copying a param with shape torch.Size([45, 112, 1, 1]) from checkpoint, the shape in current model is torch.Size([189, 112, 1, 1]).
size mismatch for classifier.header.pointwise_conv.conv.bias: copying a param with shape torch.Size([45]) from checkpoint, the shape in current model is torch.Size([189]).
I use phi=2 and changed the pth file accordingly; why do the dimensions still not match?

Hello, I used your EfficientDet code on two different datasets, but the results differ greatly.

Both datasets have images of roughly 1000 pixels.
On the dataset that performs poorly there are 10 object classes. My strategy was to use efficientdet-d4 as the initial weights; the resulting mAP only reaches about 75, while frcnn, ssd, retinanet, etc. all reach about 85-89. When I switched to efficientdet-d3 as the initial weights, with batch size 4, after 100 epochs every per-class AP is only in the single digits, which is very strange.
On the other dataset the results are very good: the mAP is close to, or even higher than, frcnn, ssd, retinanet, etc.
I hope you can help. Thank you very much!

Running get_map.py

Why does D0 give 95% precision but only 29% recall and a mAP of only 34.70%?

mAP on the VOC2007 test set is only about 30 (D0)

How many epochs did you train for when using VOC2007 trainval? I trained for 65 epochs (starting from the D0 weights) and the results are poor. Is my training time too short, or is something else wrong?

Why is the mAP result always 0

Hi, I am training EfficientDet on Text-COCO for text detection. The predicted bounding boxes are in the right places, and I then run get_map.py as described to compute mAP, precision and recall.
But the results are always 0, and the vast majority of detections are counted as false positives. Is this a data problem or a problem with the get_map code?

Fusion weights not updating in BiFPN's weighted feature fusion (the simple attention mechanism)

Hello, I tried porting the weighted fusion from the BiFPN in your code into another detection framework to see whether it helps. The modified code runs, but when inspecting the trained model's state dict, the fusion weight w that was initialized to [1, 1] is still [1, 1] after training. Could you help me check where my code goes wrong, such that this parameter is never updated?
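For the fusion weights to receive gradients they have to be registered as nn.Parameter on the module (so they appear in model.parameters() and get passed to the optimizer), and they must actually take part in the forward computation. A minimal sketch of the fast normalized fusion, with hypothetical names:

import torch
import torch.nn as nn

class WeightedAdd(nn.Module):
    # Fast normalized fusion of two feature maps, in the spirit of BiFPN.
    def __init__(self, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))   # a plain tensor or buffer would never update
        self.eps = eps

    def forward(self, a, b):
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return w[0] * a + w[1] * b

If the weights stay at their initial value, common causes are that they were created as plain tensors, that the module holding them was never added to the optimizer's parameter groups, or that requires_grad was turned off (for example by a backbone-freezing step).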

Input image size

For different models, e.g. d0 and d7, why can the training input size not simply be set to 640x640 for both? From the code, the input size only grows with the model; why is that?
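The input resolution is scaled jointly with the backbone, BiFPN and head as part of EfficientDet's compound scaling, which is why each compound coefficient has its own fixed size; for reference, the paper's resolutions for d0 through d7 are:

# Input resolutions per compound coefficient phi (d0..d7), as given in the EfficientDet paper.
input_sizes = [512, 640, 768, 896, 1024, 1280, 1280, 1536]
phi = 0
print(input_sizes[phi])   # 512 for d0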

Question about changing the network

I replaced the backbone with the yolov5 backbone. After training, the loss is very low but the test mAP is very poor. Could you offer some pointers on what might be wrong?

Hi, some problems when generating the training files 2007_train.txt and 2007_val.txt

Hi, when using voc_annotation.py to generate the 2007_train.txt and 2007_val.txt used for training, if the dataset contains a negative image set (images with no target objects), it raises xml.etree.ElementTree.ParseError: no element found: line 1, column 0, presumably because the image has no objects to train on. Should these images be skipped during generation?
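That ParseError is what ElementTree raises for an empty (zero-byte) XML file, which is how negative samples are often shipped. A hypothetical guard for voc_annotation.py that detects annotations with no objects so they can be skipped (or written without boxes, if background-only images are wanted):

import os
import xml.etree.ElementTree as ET

def annotation_objects(xml_path):
    # Returns the list of <object> elements, or [] for empty/unparsable annotation files.
    if not os.path.exists(xml_path) or os.path.getsize(xml_path) == 0:
        return []
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        return []
    return root.findall("object")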
