
deeplabv3plus-pytorch's People

Contributors

kant, yudewang


deeplabv3plus-pytorch's Issues

xception training time

I am training on my own data (about 200k images) with the xception model. It has been running for two days and not even one epoch has finished. Is this progress abnormal? With the same data, deeplabv3 can finish 40 epochs in two days. Is something going wrong somewhere?

resnet pretrained model

Can I do object detection with the resnet-atrous backbone directly? And how do I get the pretrained model?

cannot use xception backbone with pretrained checkpoint

Hello.

Thanks for the great code.
I have a question about using xception as a backbone.
I modified config.py to run the code with xception backbone & pretrained checkpoint

self.MODEL_BACKBONE = 'xception'
...
self.TEST_CKPT = os.path.join(self.ROOT_DIR,'lib/deeplabv3plus_xception_VOC2012_epoch46_all.pth')


But I get the following error when running the code with this configuration.

Traceback (most recent call last):
  File "test.py", line 94, in <module>
    test_net()
  File "test.py", line 27, in test_net
    net = generate_net(cfg)
  File "/home/jtkim/project/lgd_2nd/ref_code/deeplabv3plus-pytorch-master/lib/net/generateNet.py", line 15, in generate_net
    return deeplabv3plus(cfg)
  File "/home/jtkim/project/lgd_2nd/ref_code/deeplabv3plus-pytorch-master/lib/net/deeplabv3plus.py", line 51, in __init__
    self.backbone = build_backbone(cfg.MODEL_BACKBONE, os=cfg.MODEL_OUTPUT_STRIDE)
  File "/home/jtkim/project/lgd_2nd/ref_code/deeplabv3plus-pytorch-master/lib/net/backbone.py", line 25, in build_backbone
    net = xception.xception(pretrained=pretrained, os=os)
  File "/home/jtkim/project/lgd_2nd/ref_code/deeplabv3plus-pytorch-master/lib/net/xception.py", line 239, in xception
    model.load_state_dict(model_dict)
  File "/home/jtkim/.conda/envs/DeepLab/lib/python3.6/site-packages/torch/nn/modules/module.py", line 845, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Xception:
	Unexpected key(s) in state_dict: "module.aspp.branch1.0.weight", ...

It seems the network structure in the pretrained checkpoint does not match the one built by the code. Please give me some advice on this problem.
It runs perfectly when I use resnet101.
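For anyone hitting the same wall, a quick way to see where the mismatch comes from is to compare the two key sets directly. This is only a diagnostic sketch; the checkpoint path and the model variable are placeholders for whatever you actually built.

import torch

# Load the checkpoint on CPU and unwrap it if the weights are nested under a
# 'state_dict' entry (both layouts are common).
ckpt = torch.load('deeplabv3plus_xception_VOC2012_epoch46_all.pth',
                  map_location='cpu')
state_dict = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt

model_keys = set(model.state_dict().keys())   # `model` = the net you instantiated
ckpt_keys = set(state_dict.keys())

# Keys such as "module.aspp...." indicate a full, DataParallel-wrapped
# deeplabv3plus checkpoint, which cannot be loaded into the Xception backbone alone.
print('only in checkpoint:', sorted(ckpt_keys - model_keys)[:10])
print('only in model:', sorted(model_keys - ckpt_keys)[:10])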

Thanks,

different TRAIN_BN_MOM for COCO and VOC

Thank you for your decent work! I noticed that TRAIN_BN_MOM is set to 0.0003 in deeplabv3+voc/config.py, but remains at the default value 0.1 for the COCO experiment.
How do you choose the value for the BN momentum? I couldn't find any suggestion for it in the deeplabv3+ paper.
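For readers wondering what the number means: PyTorch BatchNorm (this is the library's documented behaviour, not something specific to this repo) updates its running statistics as

    running_stat = (1 - momentum) * running_stat + momentum * batch_stat

so TRAIN_BN_MOM = 0.0003 makes the running mean/variance move very slowly, while the default 0.1 tracks each batch much more aggressively.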

problem of running code

FileNotFoundError: [Errno 2] No such file or directory: '/home/wangyude/.torch/models/xception_pytorch_imagenet.pth'

This is the error, thank you.

Error with 'sync_batchnorm' when batchsize=1

Hi, I got the error below when trying to run your code, and I found it is caused by 'sync_batchnorm' when a batch contains only one sample.

    feature_aspp = self.aspp(layers[-1])
    result = self.forward(*input, **kwargs)
  File "modules/ASPP.py", line 60, in forward
    global_feature = self.branch5_bn(global_feature)
  File "/home/woshishui/anaconda3/envs/segmentation/lib/python3.6/site-packages/torch/nn/functional.py", line 1619, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
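A common workaround for this error (a sketch, not the repo's official fix; the dataset object and config names are placeholders) is to make sure no batch of size 1 ever reaches BatchNorm during training, e.g. by dropping the last incomplete batch:

from torch.utils.data import DataLoader

# Dropping the last incomplete batch guarantees every training batch has more
# than one sample, so BatchNorm never sees an input of shape [1, C, 1, 1].
train_loader = DataLoader(train_dataset,          # placeholder dataset object
                          batch_size=cfg.TRAIN_BATCHES,
                          shuffle=True,
                          drop_last=True)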

Performance on cityscapes

Thanks for your work! I am a student.
I trained your model on Cityscapes. I only modified train.py and config.py under /deeplabv3plus-pytorch/experiment/deeplabv3+voc/ and the dataset path in cityscapes.py under /datasets/, and used 2 GPUs. But the results are very bad: the test mIoU is only 19.89%, and the images I inspect during training with tensorboardX seem to be missing some classes.

Did I do something wrong? Is there anything I haven't modified? Hoping for your response. Thanks very much!


How much time do 46 epochs take on PASCAL VOC 2012?

I am sorry to bother you again. I have been training on PASCAL VOC 2012 with the resnet-101 backbone, but one epoch takes 30 minutes. I train the network with 4 GPUs and a batch size of 16. That seems a little slow compared with TensorFlow, so I want to ask how long 46 epochs normally take.

Some questions about dataset

Hello author, when downloading the VOC dataset I found that VOC2012 only has about 3k images for semantic segmentation. Did you train on the official VOC or on the augmented VOC dataset?

Why no BN in the last few conv layers?

Hi,
I have some questions about the ResNet-backbone network.

  1. Why is there no BN in the last few conv layers?
  2. Why are the dilation rates of block4 at stride 16 set to (1, 2, 1) rather than (2, 4, 2) as proposed in the DeepLabV3 paper? Also, when the output stride is 8, shouldn't the rates be (4, 8, 4) for block4?
  3. I noticed the dilation setting of your Xception middle flow is different from others. Do you have a reference for your setting?
  4. How do you normalize the input? I didn't find the normalization part in your dataset code (see the sketch after this list).
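Regarding point 4, I cannot confirm what preprocessing the repo actually applies; the ImageNet-style normalization most PyTorch segmentation pipelines use looks like this, shown only to make the question concrete:

import torchvision.transforms as T

# Standard ImageNet statistics; NOT confirmed to be what this repo uses.
normalize = T.Compose([
    T.ToTensor(),                              # HWC uint8 -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])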

multi-scale test severely damages the performance

Without multi-scale testing I get 79%, the normal performance, but with multi-scale testing I only get 64%. Have you encountered this situation before? Please help me.
Here are some plots: [screenshots: input image, rate 1.0, rate 0.5, rate 1.75]
The results are noticeably bad at all multi-scale rates other than 1.0.
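For context, a minimal sketch of how multi-scale testing is usually done (assumed procedure and names, not necessarily identical to this repo's test.py): each rescaled input goes through the network, the logits are resized back to the original resolution, and the softmax outputs are averaged before the argmax. If the predictions at scales other than 1.0 are broken on their own, the averaging step will drag the final result down.

import torch
import torch.nn.functional as F

def multi_scale_predict(net, image, scales=(0.5, 1.0, 1.75)):
    # image: [1, 3, H, W] tensor, already normalized
    _, _, h, w = image.shape
    prob_sum = 0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode='bilinear',
                               align_corners=False)
        logits = net(scaled)                               # [1, C, h', w']
        logits = F.interpolate(logits, size=(h, w), mode='bilinear',
                               align_corners=False)
        prob_sum = prob_sum + torch.softmax(logits, dim=1)
    return (prob_sum / len(scales)).argmax(dim=1)          # [1, H, W]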

Performance on Cityscapes

Hello, I followed your settings (epoch=160, lr=0.007), but the mIoU I get is only around 17%, even though I used the Xception ImageNet pretrained model you provide. Have you retrained recently? Could you share your training results and loss curve with me? Thanks!

Learning rate for BatchNorm layers

Hello, thanks for the great work!
I noticed the function "get_param", and it seems it does not set the learning rate of the BN layers. Do you think it is unnecessary, or even harmful, to make the BN layers learnable? Or have I misunderstood something?
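For what it's worth, here is a sketch of how BN parameters could be put into their own parameter group so their learning rate and weight decay are explicit. The cfg names and SGD settings are assumptions, and the isinstance check would need to include this repo's SynchronizedBatchNorm2d as well.

import torch
import torch.nn as nn

bn_params, other_params = [], []
for m in net.modules():
    # Would also need the repo's SynchronizedBatchNorm2d here.
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        bn_params.extend(m.parameters(recurse=False))
    else:
        other_params.extend(m.parameters(recurse=False))

optimizer = torch.optim.SGD([
    {'params': other_params, 'lr': cfg.TRAIN_LR},
    {'params': bn_params, 'lr': cfg.TRAIN_LR, 'weight_decay': 0},  # BN usually gets no weight decay
], momentum=0.9, weight_decay=cfg.TRAIN_WEIGHT_DECAY)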
Thanks!

voc test.py mIoU value is wrong

I reproduced the project and the training results look OK, but the tested mIoU is only 20.733%.
Training configuration information:

  1. Used train.py and test.py under the deeplabv3+voc folder
  2. Used the VOCdevkit dataset
  3. self.DATA_AUG = False
  4. self.TRAIN_BATCHES = 10
  5. self.TRAIN_EPOCHS = 300
  6. Backbone res101_atrous, with no pretrained parameters for the base model

Test configuration information:
self.DATA_AUG = False
self.TEST_MULTISCALE = [1.0]
self.TEST_BATCHES = 18

Screenshots of training and testing:
[screenshots: acc, loss, and test mIoU curves]

I want to ask what could be wrong that makes the test mIoU value so low.

question on tensor allocation

Hi, thanks for your implementation.

I modified your code for my task and ran into a GPU resource allocation problem:
in your train.py there is device = torch.device(0) followed by net.to(device),
which means GPU 0 is used to store the network.

Then for labels_batched you used .to(1), which puts those tensors on GPU 1 for the loss calculation.

Is that a problem because of the two different GPUs? I got an error saying that the tensors/weights/gradients used to compute CrossEntropyLoss are not on the same GPU and should be moved to the same one.

There is no error if I change .to(1) to .to(0).

I'm using PyTorch 0.4.1 with more than one GPU. I would appreciate it if you could take a look at my question.

One more question: with one GPU, the per-GPU memory usage for 4 images of size 512x512 is 7.6 GB, but with multiple GPUs the per-GPU memory usage for 4 inputs is around 10 GB. Is that normal? It seems data parallelization consumes more than 2 GB of extra GPU memory.
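On the device question above, a minimal sketch of the single-device variant (loader, criterion, and variable names assumed): keep the labels on the same GPU the network output lives on instead of hard-coding .to(1).

import torch

device = torch.device(0)
net.to(device)

for inputs_batched, labels_batched in train_loader:         # placeholder loader
    inputs_batched = inputs_batched.to(device)
    labels_batched = labels_batched.to(device)               # same device as the output
    outputs = net(inputs_batched)
    loss = criterion(outputs, labels_batched)                 # no cross-GPU mismatch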

The white mask regions in the VOC dataset are counted as background

Hello author, if I understand correctly,
sample['segmentation'] converts the white margin into background (label 0), and
sample['mask'] stores the mask of pixels that should be ignored.
But during training,
criterion = nn.CrossEntropyLoss(ignore_index=255) is used,
and the mask-based loss functions in loss.py are not used, so the white margin that should be excluded from the loss effectively becomes the background class.
This part of the code is easy to modify, but I would like to ask: is train.py written this way because treating the margin as background works better? Why isn't the custom loss function used?
Thanks for this repo, it has helped me a lot!
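To make the point concrete, here is a small toy illustration (my own example, not code from the repo): ignore_index=255 only excludes pixels that still carry the value 255 in the label tensor; once the white margin has been relabeled to 0, it is counted as background.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=255)
logits = torch.randn(1, 21, 4, 4)                   # 21 VOC classes
labels = torch.zeros(1, 4, 4, dtype=torch.long)     # all background
labels[0, 0, 0] = 255                               # a margin pixel kept as 255 is ignored...
loss = criterion(logits, labels)
# ...whereas a margin pixel that was converted to 0 contributes to the
# background class, exactly as described above.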

About atrous/Multi-Grid

Hi:

Nice work! But I found a place below which might be wrong.

self.layer4 = self._make_layer(block, 1024, 512, layers[3], stride=stride_list[2], atrous=[item*16//os for item in atrous])

In the DeepLab v3 paper, the author introduces the Multi-Grid, i.e. the atrous variable here.

[screenshot of the relevant paragraph from the DeepLab v3 paper]

The author gives the example that when OS=16 with Multi-Grid=(1, 2, 4), the rates should be (2, 4, 8). However, the code here gives rates of (1, 2, 4), which is wrong. So the atrous part of this line should be changed to

atrous=[item*32//os for item in atrous]

Is that right?
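A quick worked check of the numbers (my own calculation, following the paper's example):

multi_grid = [1, 2, 4]
os = 16
rates_current = [item * 16 // os for item in multi_grid]    # current code  -> [1, 2, 4]
rates_proposed = [item * 32 // os for item in multi_grid]   # proposed fix  -> [2, 4, 8]
print(rates_current, rates_proposed)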

How can I get mIoU=79.916%?

Hello, I have a problem.
1. Same hyperparameters as yours, backbone=resnet101
2. 4 GPUs, batch_size=16
3. Pretrained model: resnet101-5d3b4d8f.pth
4. Train dataset: VOC2012AUG (10582 images)
5. Test dataset: VOC2012 val (1449 images)
6. syncbatchnorm enabled
But the best mIoU I get is 77.359% after 350 epochs. How can I get 79.916% as you did?
Thanks.

VOC augmented segmentation dataset from DrSleep

Thanks for sharing. One question: the instructions say to download the VOC augmented segmentation dataset from the DrSleep link you give. The downloaded label images are single-channel grayscale, while the labels downloaded from the official VOC site are three-channel color images. What is the purpose of this?

Two questions about transforms on the dataset

Hi, I did not find any normalization of the dataset in your code. Does that operation simply have no effect? The other question: when defining the dataset class, I noticed the image reading code is:

image = cv2.imread(img_file)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
#image = np.array(io.imread(img_file), dtype=np.uint8)

Can the commented-out skimage line be used to replace the two OpenCV lines above?
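A small sanity check of the equivalence being asked about (the file path is a placeholder): skimage.io.imread already returns RGB, while cv2.imread returns BGR, so the cvtColor step is only needed in the OpenCV version. Note that the two libraries' JPEG decoders can still differ by a pixel value here and there.

import cv2
import numpy as np
from skimage import io

img_file = 'example.jpg'                                    # placeholder path
img_cv = cv2.cvtColor(cv2.imread(img_file), cv2.COLOR_BGR2RGB)
img_sk = np.array(io.imread(img_file), dtype=np.uint8)
print((img_cv == img_sk).all())                             # usually True; JPEG decoding may differ slightly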

Core Dump error

Hi, when training the network I got a Core Dump error right away. After some debugging I found it was caused by tensorboardX, which I had not installed before. Which version do you use? Could you describe the installation steps in detail? Thanks!

pretrained model

Hello author, thank you for your work. I have recently been reproducing some deeplabv3plus results and have a few questions:
1. When I train the deeplab-xception structure directly on the PASCAL dataset, it always overfits because of the limited number of samples, even when loading the xception pretrained model provided in your code. I would like to know how you managed to reproduce the results.
2. Which datasets was the deeplab-xception-46 pretrained model you provide trained on?

single GPU inference error

Hi, I trained a model with your original multi-GPU train.py. Now I want to run inference on a single GPU, so I set cfg.TEST_GPUS=1 and removed the *.to(1) calls in several places, then ran test.py, but it reports the error below. What could be the reason? Thanks!

Traceback (most recent call last):
  File "test.py", line 94, in <module>
    test_net()
  File "test.py", line 42, in test_net
    net.load_state_dict(model_dict)
  File "/home/mypc/lib/anaconda2/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for deeplabv3plus:
	Missing key(s) in state_dict: "aspp.branch1.0.weight", "aspp.branch1.0.bias", "aspp.branch1.1.weight", "aspp.branch1.1.bias", "aspp.branch1.1.running_mean", "aspp.branch1.1.running_var", "aspp.branch2.0.weight", "aspp.branch2.0.bias", "aspp.branch2.1.weight", "aspp.branch2.1.bias", "aspp.branch2.1.running_mean", "aspp.branch2.1.running_var", "aspp.branch3.0.weight", "aspp.branch3.0.bias", "aspp.branch3.1.weight", "aspp.branch3.1.bias"........................
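One workaround that usually resolves this kind of mismatch (a sketch, not the author's fix; net is the model built by test.py): the checkpoint was saved from a DataParallel-wrapped model, so its keys carry a "module." prefix that a plain single-GPU model does not have, and the prefix can simply be stripped before loading.

import torch
from collections import OrderedDict

ckpt = torch.load(cfg.TEST_CKPT, map_location='cpu')
stripped = OrderedDict()
for k, v in ckpt.items():
    # "module.aspp.branch1.0.weight" -> "aspp.branch1.0.weight"
    stripped[k[len('module.'):] if k.startswith('module.') else k] = v
net.load_state_dict(stripped)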

deeplabv3+ VOC

Hello. Could you provide your trained model so that I can test it? My training results do not reach the numbers you report for deeplabv3+voc.
Thank you.

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

Hello, when I use self.unet = DeepLabV3Plus(model_backbone='res50_atrous', num_classes=self.output_ch) to build the model, the following error happens:

Train Loss: 0.0088223, lr: params_group_0: 0.000200000000, : 100%|██████████████████████████████████████████████████████████████████████████████████████▉| 1067/1068 [17:38<00:00,  1.02it/s]Traceback (most recent call last):
  File "train_sfold.py", line 194, in <module>
    main(config)
  File "train_sfold.py", line 79, in main
    solver.train(index)
  File "/mnt/Data/mxq/project/Kaggle-Pneumothorax-Seg/solver.py", line 192, in train
    net_output = self.unet(images)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data/mxq/project/Kaggle-Pneumothorax-Seg/models/deeplabv3/deeplabv3plus.py", line 67, in forward
    feature_aspp = self.aspp(layers[-1])
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data/mxq/project/Kaggle-Pneumothorax-Seg/models/deeplabv3/ASPP.py", line 57, in forward
    global_feature = self.branch5_bn(global_feature)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/Data/mxq/project/Kaggle-Pneumothorax-Seg/models/deeplabv3/sync_batchnorm/batchnorm.py", line 53, in forward
    self.training, self.momentum, self.eps)
  File "/home/lab3/anaconda3/envs/mxq/lib/python3.5/site-packages/torch/nn/functional.py", line 1693, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

my input image size is [8, 3, 768, 768]

Single GPU?

Hi, I wonder why only multi-GPU training is currently supported. If I replace the synchronized BN with normal BN and use PyTorch's built-in DataParallel, can I train with only 1 GPU? Also, is single-GPU inference supported at the moment?
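For reference, this is what the single-GPU setup being asked about would look like (a sketch, assumed to work with this repo rather than confirmed): with only one device in device_ids, DataParallel degenerates to ordinary single-GPU execution, and a synchronized BN layer should then have nothing to synchronize across.

import torch
import torch.nn as nn

net = generate_net(cfg)                        # the repo's model factory
net = nn.DataParallel(net, device_ids=[0])     # single visible GPU
net.to(torch.device(0))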

The performance paper reported is 89% mIoU on PASCAL VOC 2012

Hi. In the ECCV 2018 paper, DeepLabv3+ achieves 89% mIoU on the PASCAL VOC dataset, but the paper number you quote is lower by about 10%; did you perhaps misread it? Also, that benchmark is far higher than what this code achieves; can the gap be closed?

A question about the ASPP structure

According to the ASPP source code you provide and the DeepLabv3+ model structure, branch5 should be a pooling operation, but in your source code branch5 is a convolution, as shown below:
self.branch5_conv = nn.Conv2d(dim_in, dim_out, 1, 1, 0,bias=True)
self.branch5_bn = SynchronizedBatchNorm2d(dim_out, momentum=bn_mom)
self.branch5_relu = nn.ReLU(inplace=True)
Should it be the following instead?
self.branch5_pool = nn.AdaptiveAvgPool2d((1,1)) # the convolution here changed to a pooling operation
self.branch5_bn = SynchronizedBatchNorm2d(dim_out, momentum=bn_mom)
self.branch5_relu = nn.ReLU(inplace=True)
Looking forward to your reply!
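For comparison, a sketch of the image-pooling branch as described in the DeepLabv3+ paper (global average pool, 1x1 conv, BN, ReLU, then upsample back to the feature map size). This is my own rendering with plain nn.BatchNorm2d, not the repo's code; the repo may instead implement the pooling as a mean over the spatial dimensions before the branch5 convolution shown above.

import torch.nn as nn
import torch.nn.functional as F

class ImagePoolingBranch(nn.Module):
    def __init__(self, dim_in, dim_out, bn_mom=0.1):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((1, 1))   # global average pooling
        self.conv = nn.Conv2d(dim_in, dim_out, 1, bias=True)
        self.bn = nn.BatchNorm2d(dim_out, momentum=bn_mom)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        h, w = x.shape[2:]
        y = self.relu(self.bn(self.conv(self.pool(x))))
        return F.interpolate(y, size=(h, w), mode='bilinear', align_corners=False)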

Performance is lower than the results you report.

Using the default config you provide, I achieved 79.206% (your result is 79.916%) with deeplabv3+res101 on the PASCAL VOC validation set. Can you tell me why this happened? By the way:

  1. I used the pretrained res-101.
  2. My environment is python3.6, cuda8.0, pytorch0.4.1.
  3. The augmented PASCAL VOC dataset was downloaded from DrSleep.
  4. I re-trained the model without any modifications to the code you provide.

How can I achieve the performance you report in the README? Thank you for your help~

Training and Testing with ADE20K Dataset

Hi! I trained DeepLabv3+ on the ADE20K dataset, but the test mIoU only reaches 36.3%, which doesn't even match UperNet (whose mIoU reaches 42%). I wonder why this happens. I would appreciate it if someone could tell me how to solve this problem. Thanks a lot.

Bad Performance

I downloaded deeplabv3plus_xception_VOC2012_epoch60_all.pth, but the val result is only 0.032. Do you know what happened?

Cityscapes classes?

Thanks for your code. I find that there are 30 classes according to the official introduction, while there are 34 classes in ~deeplabv3plus-pytorch/lib/datasets/CityscapesDataset.py. Should I set self.MODEL_NUM_CLASSES = 30 or self.MODEL_NUM_CLASSES = 30 + 1 (background)? I don't know whether to add 1 for background.
In my results the background scores 0 when I set self.MODEL_NUM_CLASSES = 34 and do not change CityscapesDataset.py. My GPU resources are limited, so could you help me with this question?
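In case it helps, Cityscapes is normally evaluated on 19 training classes, with the remaining raw label IDs mapped to an ignore value. Below is a sketch using the official cityscapesscripts label table (a separate dependency, and not necessarily how this repo's CityscapesDataset.py handles it):

import numpy as np
from cityscapesscripts.helpers.labels import labels   # official label definitions

id_to_trainid = np.full(256, 255, dtype=np.uint8)      # 255 = ignore
for label in labels:
    if 0 <= label.trainId < 19:
        id_to_trainid[label.id] = label.trainId

def encode_segmap(mask):
    # mask: HxW array of raw Cityscapes IDs (0..33) -> train IDs (0..18) or 255
    return id_to_trainid[mask]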

after test, only background is detected

@YudeWang, hello, can I ask you a question?
I ran the test operation, but only the background was segmented.
The result is as follows:
the result is as following:

38/41
P[i].value: 9437184
P[i].value: 0
P[i].value: 0
7/41
P[i].value: 9699328
P[i].value: 0
P[i].value: 0
15/41
P[i].value: 9961472
P[i].value: 0
P[i].value: 0
23/41
P[i].value: 10223616
P[i].value: 0
P[i].value: 0
31/41
P[i].value: 10485760
P[i].value: 0
P[i].value: 0
39/41
P[i].value: 10747904
P[i].value: 0
P[i].value: 0
backbound: 97.516% w04: 0.000%
w13: 0.000%

   mIoU: 32.505%

Test finished
