
TorchSeg

This project aims at providing a fast, modular reference implementation for semantic segmentation models using PyTorch.

[demo image]

Highlights

  • Modular Design: easily construct customized semantic segmentation models by combining different components.
  • Distributed Training: more than 60% faster than the multi-thread parallel method (nn.DataParallel); we use the multi-process parallel method instead.
  • Multi-GPU training and inference: supports different manners of inference.
  • Pre-trained models and implementations of different semantic segmentation models are provided.

Prerequisites

  • PyTorch 1.0
    • pip3 install torch torchvision
  • Easydict
    • pip3 install easydict
  • Apex
  • Ninja
    • sudo apt-get install ninja-build
  • tqdm
    • pip3 install tqdm

Updates

v0.1.1 (05/14/2019)

  • Release the pre-trained models and all trained models
  • Add PSANet for ADE20K
  • Add support for the CamVid and PASCAL-Context datasets
  • Support only the distributed training manner from now on

Model Zoo

Pretrained Model

Supported Model

Performance and Benchmarks

SS: single-scale evaluation; MSF: multi-scale + flip evaluation

PASCAL VOC 2012

| Methods | Backbone | TrainSet | EvalSet | Mean IoU (ss) | Mean IoU (msf) | Model |
|---|---|---|---|---|---|---|
| FCN-32s | R101_v1c | train_aug | val | 71.26 | - | |
| DFN (paper) | R101_v1c | train_aug | val | 79.67 | 80.6* | |
| DFN (ours) | R101_v1c | train_aug | val | 79.40 | 81.40 | GoogleDrive |

80.6*: the result reported in the paper was further fine-tuned on the train set.

Cityscapes

Non-real-time Methods

| Methods | Backbone | OHEM | TrainSet | EvalSet | Mean IoU (ss) | Mean IoU (msf) | Model |
|---|---|---|---|---|---|---|---|
| DFN (paper) | R101_v1c | | train_fine | val | 78.5 | 79.3 | |
| DFN (ours) | R101_v1c | | train_fine | val | 79.09 | 80.41 | GoogleDrive |
| DFN (ours) | R101_v1c | | train_fine | val | 79.16 | 80.53 | GoogleDrive |
| BiSeNet (paper) | R101_v1c | | train_fine | val | - | 80.3 | |
| BiSeNet (ours) | R101_v1c | | train_fine | val | 79.09 | 80.39 | GoogleDrive |
| BiSeNet (paper) | R18 | | train_fine | val | 76.21 | 78.57 | |
| BiSeNet (ours) | R18 | | train_fine | val | 76.28 | 78.00 | GoogleDrive |
| BiSeNet (paper) | X39 | | train_fine | val | 70.1 | 72 | |
| BiSeNet (ours)* | X39 | | train_fine | val | 70.32 | 72.06 | GoogleDrive |

Real-time Methods

| Methods | Backbone | OHEM | TrainSet | EvalSet | Mean IoU | Model |
|---|---|---|---|---|---|---|
| BiSeNet (paper) | R18 | | train_fine | val | 74.8 | |
| BiSeNet (ours) | R18 | | train_fine | val | 74.83 | GoogleDrive |
| BiSeNet (paper) | X39 | | train_fine | val | 69 | |
| BiSeNet (ours)* | X39 | | train_fine | val | 68.51 | GoogleDrive |

BiSeNet (ours)*: because we didn't pre-train the Xception39 model on ImageNet in PyTorch, this experiment is trained from scratch. We will release the PyTorch pre-trained Xception39 model and the corresponding experiment.

ADE

| Methods | Backbone | TrainSet | EvalSet | Mean IoU (ss) | Accuracy (ss) | Model |
|---|---|---|---|---|---|---|
| PSPNet (paper) | R50_v1c | train | val | 41.68 | 80.04 | |
| PSPNet (ours) | R50_v1c | train | val | 41.65 | 79.74 | GoogleDrive |
| PSPNet (paper) | R101_v1c | train | val | 41.96 | 80.64 | |
| PSPNet (ours) | R101_v1c | train | val | 42.89 | 80.55 | GoogleDrive |
| PSANet (paper) | R50_v1c | train | val | 41.92 | 80.17 | |
| PSANet (ours)* | R50_v1c | train | val | 41.67 | 80.09 | GoogleDrive |
| PSANet (paper) | R101_v1c | train | val | 42.75 | 80.71 | |
| PSANet (ours) | R101_v1c | train | val | 43.04 | 80.56 | GoogleDrive |

PSANet (ours)*: the original PSANet in the paper constructs an over-parameterized attention map, while we only predict an attention map with the same size as the feature map. The performance is nearly the same as the original.

To Do

  • Offer comprehensive documentation
  • Support more semantic segmentation models
    • Deeplab v3 / Deeplab v3+
    • DenseASPP
    • EncNet
    • OCNet

Training

  1. Create the dataset list files train.txt, val.txt, and test.txt. Each line contains the image path and the corresponding ground-truth path, separated by a tab (a generation sketch follows this list):
    path-of-the-image   path-of-the-groundtruth
  2. Modify config.py according to your requirements.
  3. Train a network as described under "Distributed Training" below.
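A minimal sketch for generating such a list file, assuming the images sit under images/ and the ground-truth maps under labels/ with matching file stems (both directory names and the .png extension are assumptions, not a convention of this repo):

    import os

    with open('train.txt', 'w') as f:
        for name in sorted(os.listdir('images')):
            stem = os.path.splitext(name)[0]
            # image path and ground-truth path, separated by a tab
            f.write('images/%s\tlabels/%s.png\n' % (name, stem))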

Distributed Training

We use the official torch.distributed.launch utility to launch multi-GPU training. This PyTorch utility spawns as many Python processes as the number of GPUs we want to use, and each Python process uses a single GPU.

For each experiment, you can just run this script:

export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py
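torch.distributed.launch hands each spawned process its GPU index via the --local_rank argument. The receiving side in train.py then typically looks like the following (a sketch of the standard PyTorch 1.0 pattern; the repo's engine code may differ in detail):

    import argparse

    import torch
    import torch.distributed as dist

    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    args = parser.parse_args()

    torch.cuda.set_device(args.local_rank)  # bind this process to one GPU
    dist.init_process_group(backend='nccl', init_method='env://')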

Inference

The evaluator implements multi-GPU inference based on multiprocessing, as sketched below. In the inference phase, it spawns as many Python processes as the number of GPUs we want to use, and each process handles a subset of the whole evaluation dataset on a single GPU.
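A minimal sketch of this scheme (evaluate_shard is a hypothetical worker, not a function from this repo):

    import torch.multiprocessing as mp

    def evaluate_shard(shard, device):
        # hypothetical worker: run the model over `shard` on `device`
        ...

    def run_eval(dataset, devices):
        procs = []
        for i, device in enumerate(devices):
            shard = dataset[i::len(devices)]  # round-robin split of the samples
            p = mp.Process(target=evaluate_shard, args=(shard, device))
            p.start()
            procs.append(p)
        for p in procs:
            p.join()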

  1. evaluate a trained network on the validation set:
    python3 eval.py
  2. input arguments:
    usage: -e epoch_idx -d device_idx [--verbose]
           [--show_image] [--save_path Pred_Save_Path]

Disclaimer

This project is under active development, so things that currently work may break in a future release. However, feel free to open an issue if you get stuck anywhere.

Citation

The following are BibTeX references. The BibTeX entry requires the url LaTeX package.

Please consider citing this project in your publications if it helps your research.

@misc{torchseg2019,
  author =       {Yu, Changqian},
  title =        {TorchSeg},
  howpublished = {\url{https://github.com/ycszen/TorchSeg}},
  year =         {2019}
}

Please consider citing the DFN in your publications if it helps your research.

@inproceedings{yu2018dfn,
  title={Learning a Discriminative Feature Network for Semantic Segmentation},
  author={Yu, Changqian and Wang, Jingbo and Peng, Chao and Gao, Changxin and Yu, Gang and Sang, Nong},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

Please consider citing the BiSeNet in your publications if it helps your research.

@inproceedings{yu2018bisenet,
  title={{BiSeNet}: Bilateral segmentation network for real-time semantic segmentation},
  author={Yu, Changqian and Wang, Jingbo and Peng, Chao and Gao, Changxin and Yu, Gang and Sang, Nong},
  booktitle={European Conference on Computer Vision},
  pages={334--349},
  year={2018},
  organization={Springer}
}

Why this name, Furnace?

Furnace refers to the alchemical furnace. We are all alchemists, so I hope everyone can have a good alchemical furnace to practice alchemy in. May you become an excellent alchemist.


TorchSeg's Issues

Question about cityscapes.bisenet.X39.speed

Hi, @ycszen thanks for your work.

I found that cityscapes.bisenet.X39 drops the upsampling operation for the final output and becomes cityscapes.bisenet.X39.speed.

        if is_training:
            heads = [BiSeNetHead(conv_channel, out_planes, 2, # 16
                                 True, norm_layer),
                     BiSeNetHead(conv_channel, out_planes, 1,  # 8 
                                 True, norm_layer),
                     BiSeNetHead(conv_channel * 2, out_planes, 1, # 8
                                 False, norm_layer)]
        else:
            heads = [None, None,
                     BiSeNetHead(conv_channel * 2, out_planes, 1, # 8
                                 False, norm_layer)]

Meanwhile, cityscapes.bisenet.X39.speed evaluates mIoU at a resolution of 192x96, which means resizing the ground truth to a low resolution. I wonder whether this operation is suitable for the mIoU metric.

In ICNet, the authors resize the predicted labels to the resolution of the ground truth (2048x1024).

It's fine to train at other resolutions, but I think rescaling the ground truth and evaluating at other resolutions disturbs fair comparison (in particular, some small objects disappear at low resolutions).
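For reference, the ICNet-style protocol described above upsamples the prediction back to the label resolution before scoring, rather than downsampling the labels. A sketch (logits and label are assumed to be tensors of shape [N, C, h, w] and [N, H, W]):

    import torch.nn.functional as F

    # upsample the low-resolution logits to the ground-truth size
    # (2048x1024 for Cityscapes) instead of shrinking the ground truth
    logits_full = F.interpolate(logits, size=label.shape[-2:],
                                mode='bilinear', align_corners=False)
    pred = logits_full.argmax(dim=1)  # compare this with the full-size label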

If I only have one GPU, how to set the config.py?

Thanks for your great project, it's really helpful.

I only have one GPU; when I try to run train.py of cityscapes.bisenet.R18, the following error occurs:

Traceback (most recent call last):
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/train.py", line 131, in <module>
    loss = model(imgs, gts)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 77, in forward
    spatial_out = self.spatial_path(data)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 133, in forward
    x = self.conv_7x7(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/seg_oprs.py", line 32, in forward
    x = self.bn(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/sync_bn/syncbn.py", line 50, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

After reading this page, I'm wondering how to run this code on a single GPU.

Can you give me some tips? Thank you so much!!
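For what it's worth, the crash comes from the synchronized BatchNorm, which expects peer GPUs to feed its slave pipe. A workaround several users report (a sketch, not the author's official fix; the constructor arguments are guesses based on the tracebacks above) is to build the network with the standard nn.BatchNorm2d when only one GPU is available:

    import torch.nn as nn

    from network import BiSeNet  # the repo's model definition

    criterion = nn.CrossEntropyLoss(ignore_index=255)
    # hypothetical call: passing the plain BatchNorm2d as norm_layer means
    # no cross-GPU synchronization pipe is created
    model = BiSeNet(19, is_training=True, criterion=criterion,
                    norm_layer=nn.BatchNorm2d)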

Get 8% mIoU when training an ADE dataset with 7 classes

Hello, first of all, thanks for sharing!
Recently I used your model 'cityscapes.bisenet.R18.speed' to train my own dataset, which is processed into 7 classes, with the pretrained 'R18' model, but I got 8% class IoU.
Could you help me find where the fault is?

The expanded size of the tensor (256000) must match the existing size (253924) at non-singleton dimension 1. Target sizes: [2, 256000]. Tensor sizes: [253924]

The shape of my images is:
C.image_height = 1596
C.image_width = 2552

I set
C.target_size = 1024
C.base_size = 832

Is the question below related to my config?

Question:
Traceback (most recent call last):
  File "train.py", line 127, in <module>
    loss = model(imgs, gts)
  File "/home/work/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/work/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/work/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/work/yhy/TorchSeg/model/bisenet/license_plate.bisenet.X39.speed/network.py", line 107, in forward
    aux_loss0 = self.ohem_criterion(self.heads[0](pred_out[0]), label)
  File "/home/work/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/work/yhy/TorchSeg/furnace/seg_opr/loss_opr.py", line 79, in forward
    prob = prob.masked_fill_(1 - valid_mask, 1)
RuntimeError: The expanded size of the tensor (256000) must match the existing size (253924) at non-singleton dimension 1. Target sizes: [2, 256000]. Tensor sizes: [253924]

BiSeNet mean IoU for R18

Hi! I am only able to get a mean IoU of 70.446% for BiSeNet when R18 is used as the backbone. I trained BiSeNet on the Cityscapes leftImg8bit folder (using the gtFine folder for ground truths) with a training input size of 1024x1024.
The achieved result is still a little below the result you report on the GitHub repository page (mean IoU 74.6). Did you use network parameters other than those in the config file uploaded to the repository? Thanks for your time.

Where do you transform the labels for the Cityscapes dataset?

Hi, the original Cityscapes dataset has 33 classes, while usually we only use 19 of them. So may I ask where you transform the original labels into the ones used for training?

I see that the cityscapes.py file defines a function named transform_label; however, I didn't find any usage of this function.
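For reference, the usual conversion follows the trainId table from the official cityscapesScripts: the 19 retained label ids are mapped to 0-18 and everything else to the ignore label. A sketch (the id list below is the standard Cityscapes training subset; whether it matches the repo's trans_labels exactly is an assumption):

    import numpy as np

    train_ids = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26,
                 27, 28, 31, 32, 33]  # the 19 classes kept for training

    def transform_label(label, ignore_label=255):
        mapped = np.full_like(label, ignore_label)
        for train_id, label_id in enumerate(train_ids):
            mapped[label == label_id] = train_id
        return mapped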

Where should I run train.py?

Should I enter model/bisenet/cityscapes.bisenet.R18 and run train.py there, or run it from the top of the root dir? I got this error:

Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from config import config
  File "/home/USER/Projects/TorchSeg-BiSeNet/model/bisenet/cityscapes.bisenet.R18/config.py", line 56, in <module>
    from utils.pyt_utils import model_urls
ModuleNotFoundError: No module named 'utils'
Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from config import config
  File "/home/USER/Projects/TorchSeg-BiSeNet/model/bisenet/cityscapes.bisenet.R18/config.py", line 56, in <module>
    from utils.pyt_utils import model_urls
ModuleNotFoundError: No module named 'utils'
Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from config import config
  File "/home/USER/Projects/TorchSeg-BiSeNet/model/bisenet/cityscapes.bisenet.R18/config.py", line 56, in <module>
    from utils.pyt_utils import model_urls
ModuleNotFoundError: No module named 'utils'
Traceback (most recent call last):
  File "train.py", line 13, in <module>
    from config import config
  File "/home/USER/Projects/TorchSeg-BiSeNet/model/bisenet/cityscapes.bisenet.R18/config.py", line 56, in <module>
    from utils.pyt_utils import model_urls
ModuleNotFoundError: No module named 'utils'

I found that utils lives under furnace/utils; maybe the directory is wrong?

RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/

hi, @ycszen

Sorry to disturb you again. After some struggling with the code, I got stuck at the Criterion part. It gave: RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

I added CUDA_LAUNCH_BLOCKING=1 before running the script to get a more precise message:

/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [766,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [767,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [11,0,0], thread: [800,0,0] Assertion `t >= 0 && t < n_classes` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
Traceback (most recent call last):

    loss = model(imgs, gts, cgts)

  File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chenp/workspace/git/TorchSeg/model/dfn/voc.dfn.R101_v1c/network.py", line 137, in forward
    loss0 = self.criterion(pred_out[0], label)
  File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/chenp/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

Do you have any experience with or advice on this?

A problem with FPS

Hi @ycszen,
The FPS reported for BiSeNet in the paper abstract, tested on a 2048x1024 input image, is 105.

But I only get 2 FPS for BiSeNet (Xception39) and 9.5 FPS for BiSeNet (ResNet-18) on a Titan Xp.

Question about the performances

I note that the labels used in eval.py of cityscapes.bisenet are downsampled, and I reproduced your 74 mIoU result for bisenet.r18.speed with the 8x-downsampled labels. If I set gt_downsample=1, the performance drops.
I want to know whether the performance you post is based on the low-resolution labels.

Replace the final 8x upsample with a transposed convolution

In the scenario of gt_down_sample == 1, the head with its 8x upsampling loses a lot of spatial information; replacing it with a deconvolution gives a better result.

On my dataset I get a 4% performance boost, FYI:
windyrobin@bd3085e#diff-516fe8f873dafdd94ef2d87e5e51428dR228

before:

----------------------------     mean_IU  75.694% mean_IU_no_back 74.636% mean_pixel_ACC 96.662%
29 09:30:08 Evaluation Elapsed Time: 11.91s

after:

----------------------------     mean_IU  79.444% mean_IU_no_back 78.544% mean_pixel_ACC 97.154%
30 20:57:20 Evaluation Elapsed Time: 13.99s

WRN Missing key(s) in state_dict

@ycszen
When I run train.py from cityscapes.bisenet.R18.speed, the following warning appears:

WRN Missing key(s) in state_dict: layer3.0.bn1.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer4.0.bn2.num_batches_tracked, bn1.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer2.1.bn1.num_batches_tracked

How should I deal with this problem?
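The num_batches_tracked buffers were added to BatchNorm in PyTorch 0.4.1, so checkpoints exported by older versions simply lack them, and the warning is usually harmless. A sketch of loading such a checkpoint non-strictly (the path is a placeholder and model is your already-built network):

    import torch

    state_dict = torch.load('resnet18_v1.pth', map_location='cpu')
    model.load_state_dict(state_dict, strict=False)  # ignore the missing BN counters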

NaN loss when training the DFN model on VOC2012

I use a single GTX 1080 Ti with batch size 4, so I replaced DataParallelModel, Reduce, and the custom BatchNorm2d with the standard nn.BatchNorm2d. Since the pre-trained model can't be downloaded, I trained the model from scratch.
But I get a NaN loss after a few iterations; is it inappropriate to modify these?

My torch version is 1.0 with CUDA 10. Thanks!

software dependence error

hi, @ycszen

Sorry to disturb you. This project is so attractive that I want to reproduce the results with it. However, when I tried to run train.py in TorchSeg/model/dfn/voc.dfn.R101_v1c, it gave several warnings and errors, all of them software dependency issues. So I wonder whether you could share the software versions of your environment.

Mine is: CentOS 7.5 + Python 3.6.8 + PyTorch 1.0 + CUDA 9.0 + GCC 4.9.4.

Error message:

/home/cat/.pyenv/versions/3.6.8/lib/python3.6/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

platform=sys.platform))
Traceback (most recent call last):
  File "train.py", line 24, in <module>
    from apex.parallel import DistributedDataParallel, SyncBatchNorm
ModuleNotFoundError: No module named 'apex'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 27, in <module>
    "Please install apex from https://www.github.com/nvidia/apex .")
ImportError: Please install apex from https://www.github.com/nvidia/apex .

When I tried to install apex with pip install apex, it gave:

    In file included from /home/chenp/.pyenv/versions/3.6.8/include/python3.6m/Python.h:39:0,
                     from cryptacular/bcrypt/_bcrypt.c:26:
    crypt_blowfish-1.2/crypt.h:17:23: fatal error: gnu-crypt.h: No such file or directory
     #include <gnu-crypt.h>
    compilation terminated.
    error: command 'gcc' failed with exit status 1

ModuleNotFoundError: No module named 'datasets.stuff10k'

Hi, I downloaded cityscapes-bisenet-R18-speed.pth and put it in the TorchSeg/model/bisenet/ directory. Then:

$ python eval.py
Traceback (most recent call last):
  File "eval.py", line 18, in <module>
    from datasets.cityscapes import Cityscapes
  File "/home/sounansu/anaconda3/TorchSeg/furnace/datasets/__init__.py", line 5, in <module>
    from .stuff10k import Stuff10K
ModuleNotFoundError: No module named 'datasets.stuff10k'

Where is .stuff10k ?

DFN performance on VOC2012

Thank you for your work!
I downloaded the voc-dfn-R101_v1c model and tested it on the VOC val set (1449 images), but got only 77.381 mIoU at single scale.
How can I reproduce the result reported in the table?

resnet50_v1c weight not match

Thanks for your great work!
I tried to run PSPNet according to your instructions. I downloaded 'resnet50_v1c' from gluon and converted it to a PyTorch model by running python gluon2pytorch.py -m resnet50_v1c. But when I tried to run PSPNet with python train.py -d 0-7, it reported that the weights of the checkpoint do not match those of the current model. The log is as follows:
RuntimeError: Error(s) in loading state_dict for ResNet:
size mismatch for conv1.0.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 3, 3, 3]).
size mismatch for conv1.1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.3.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for conv1.4.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.4.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.4.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.4.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([64]).
size mismatch for conv1.6.weight: copying a param with shape torch.Size([64, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
size mismatch for bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([128]).
size mismatch for layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
size mismatch for layer1.0.downsample.0.weight: copying a param with shape torch.Size([256, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
Could you help me find out where it goes wrong? Thanks very much!

Image normalization during evaluation mode

I want to use image normalization during evaluation. For this, I subtracted the channel-wise mean from the original image and then fed the image to the model. I am not getting the expected mean IoU. I think the model is not aware of the mean subtraction. Does anyone know where my mistake is?

Attention Refinement Module is slightly different from the on in paper

According to your paper, the feature first passes through global average pooling, a 1x1 conv, BN, and a sigmoid to create a weight vector, which is multiplied with the input feature to get the final output. However, according to the definition in
https://github.com/ycszen/TorchSeg/blob/master/furnace/seg_opr/seg_oprs.py#L155-L175
an extra conv-bn-relu block is prepended in the ARM, which is not mentioned in the paper. Do I understand it right?
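For comparison, the ARM exactly as the paper describes it would look like the following sketch (the repo's version prepends an extra 3x3 conv-bn-relu before this attention branch):

    import torch.nn as nn

    class AttentionRefinement(nn.Module):
        def __init__(self, channels, norm_layer=nn.BatchNorm2d):
            super().__init__()
            self.attention = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),           # global average pooling
                nn.Conv2d(channels, channels, 1),  # 1x1 conv
                norm_layer(channels),              # bn
                nn.Sigmoid())                      # weight vector in (0, 1)

        def forward(self, x):
            return x * self.attention(x)           # channel-wise re-weighting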

Is there a problem with the normalize function in img_utils.py?

    def normalize(img, mean, std):
        # pytorch pretrained model need the input range: 0-1
        img = img - mean
        img = img.astype(np.float32) / 255.0
        # img = img / std

        return img

Hi ycszen, I feel there is a problem with the normalize function: the img passed in lies in 0-255, while the mean found in config.py is typically 0.4-0.5. So shouldn't normalize be changed to the following?

    def normalize(img, mean, std):
        # pytorch pretrained model need the input range: 0-1
        img = img.astype(np.float32) / 255.0
        img = img - mean
        # img = img / std

        return img

Thank you very much.

Issue with Ninja and NVCC

Thank you for sharing your code. I am currently trying to reproduce your result.
However, I ran into some issues when running train.py. Below is the error message.

Traceback (most recent call last):
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 873, in verify_ninja_availability
    subprocess.check_call('ninja --version'.split(), stdout=devnull)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/subprocess.py", line 287, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ninja': 'ninja'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from seg_opr.sync_bn import DataParallelModel, Reduce, BatchNorm2d
  File "/scratch_net/biwidl212/majing/sem_proj/BiSeNet/TorchSeg/furnace/seg_opr/sync_bn/__init__.py", line 8, in <module>
    from .syncbn import *
  File "/scratch_net/biwidl212/majing/sem_proj/BiSeNet/TorchSeg/furnace/seg_opr/sync_bn/syncbn.py", line 17, in <module>
    from .functions import *
  File "/scratch_net/biwidl212/majing/sem_proj/BiSeNet/TorchSeg/furnace/seg_opr/sync_bn/functions.py", line 13, in <module>
    from .src import *
  File "/scratch_net/biwidl212/majing/sem_proj/BiSeNet/TorchSeg/furnace/seg_opr/sync_bn/src/__init__.py", line 12, in <module>
    ], build_directory=cpu_path, verbose=False)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
    with_cuda=with_cuda)
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 837, in _write_ninja_file_and_build
    verify_ninja_availability()
  File "/scratch_net/biwidl212/majing/anaconda3/envs/majing/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 875, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
But I actually do have ninja installed, and it appears in my pip list (installation was done by pip). So could you please share your versions of gcc, cudatoolkit, nvcc, and ninja?
Mine are GCC 6.3.0, cudatoolkit 9.2.0, nvcc 6.0 V6.0.1, ninja 1.9.0.

what is "min_kept" in ProbOhemCrossEntropy2d function?

Hi @ycszen, thank you for the wonderful codebase. May I ask several questions?

  1. What is "min_kept" in the ProbOhemCrossEntropy2d function? (See the sketch after this list.)
    ohem_criterion = ProbOhemCrossEntropy2d(ignore_label=255, thresh=0.7, min_kept=250000, use_weight=False)

  2. What does the 16 mean in this equation? Is it the batch size?
    min_kept=int(config.batch_size // len(engine.devices) * config.image_height * config.image_width // 16)

  3. My actual situation is: I can start training, but it triggers RuntimeError: cuda runtime error (59) : device-side assert at random epochs. It is similar to issue 10, but mine happens randomly during training. I have already checked my labels; they should be correct, with a range of 0 to 18. The only modification I made to your config file was changing the batch size to 12 because I only have 2 GPUs. Could you or anyone give me hints on how to debug this? Thank you very much.
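Regarding question 1: min_kept puts a lower bound on the number of pixels that survive hard-example mining, so the loss never collapses to zero once the model becomes confident. A simplified sketch of the idea (my own reimplementation, not the repo's exact ProbOhemCrossEntropy2d):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OhemCrossEntropy2d(nn.Module):
        """Keep pixels whose predicted probability for their true class is
        below `thresh`, raising the cut-off so at least `min_kept` survive."""

        def __init__(self, ignore_label=255, thresh=0.7, min_kept=250000):
            super().__init__()
            self.ignore_label = ignore_label
            self.thresh = thresh
            self.min_kept = min_kept

        def forward(self, pred, target):
            n, c, h, w = pred.shape
            prob = F.softmax(pred, dim=1).permute(0, 2, 3, 1).reshape(-1, c)
            target_flat = target.view(-1)
            valid = target_flat != self.ignore_label
            # probability the model assigns to each pixel's true class
            gt_prob = prob[torch.arange(prob.size(0)),
                           target_flat.clamp(0, c - 1)]
            gt_prob[~valid] = 1.0  # ignored pixels are never "hard"
            if 0 < self.min_kept < int(valid.sum()):
                sorted_prob, _ = gt_prob.sort()
                # cut-off of the min_kept-th hardest pixel, at least thresh
                threshold = max(sorted_prob[self.min_kept - 1].item(),
                                self.thresh)
            else:
                threshold = self.thresh
            keep = valid & (gt_prob <= threshold)
            masked_target = target_flat.clone()
            masked_target[~keep] = self.ignore_label
            return F.cross_entropy(pred.permute(0, 2, 3, 1).reshape(-1, c),
                                   masked_target,
                                   ignore_index=self.ignore_label)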

About the CamVid dataset

I have downloaded the CamVid dataset, but the labels are in RGB format with 32 classes. I want to convert the labels to 11 classes in index format; how can I do it?
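A sketch of the usual conversion (COLOR_TO_ID is a hypothetical table that you fill in with your own 11-class merging; any color not listed falls back to an ignore id):

    import numpy as np

    COLOR_TO_ID = {
        (128, 128, 128): 0,  # Sky
        (128, 0, 0): 1,      # Building
        # ... add the remaining colors of your 11 merged classes
    }

    def rgb_label_to_index(label_rgb, ignore_id=255):
        h, w, _ = label_rgb.shape
        index = np.full((h, w), ignore_id, dtype=np.uint8)
        for color, class_id in COLOR_TO_ID.items():
            mask = np.all(label_rgb == np.asarray(color), axis=-1)
            index[mask] = class_id
        return index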

Focal loss

I feel that your focal loss is a bit different from the paper. Did you modify it yourself, or what is the reason? Could you give me some explanation?

why is the output of loss = model(imgs, gts) a 0-d tensor?

Traceback (most recent call last):
  File "train.py", line 144, in <module>
    loss = Reduce.apply(*loss) / len(loss)
  File "/home/work/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 422, in __iter__
    raise TypeError('iteration over a 0-d tensor')
TypeError: iteration over a 0-d tensor
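The likely cause: with a single GPU, DataParallelModel returns the loss tensor itself instead of a list of per-GPU losses, so iterating over it fails. A sketch of a guard around the repo's reduction line (an assumption about the single-GPU behavior, not an official fix):

    # `loss` comes from model(imgs, gts); `Reduce` is the repo's
    # seg_opr.sync_bn.Reduce
    if not torch.is_tensor(loss):               # multi-GPU: list of losses
        loss = Reduce.apply(*loss) / len(loss)
    # else: single GPU, loss is already a 0-d scalar tensor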

Batch normalization function missed

Hi Dr. Yu,
Your source code is very valuable.
I found that in the latest version of your suite you deleted the sync_bn package, and in the BiSeNet training code you deleted "from seg_opr.sync_bn import BatchNorm2d". Does that mean we should use the sync BN from the APEX package?

But the following error occurred:

Traceback (most recent call last):
  File "/home/cgv841/litao/TorchSeg/model/bisenet/cityscapes.bisenet.R18/train.py", line 128, in <module>
    loss = model(imgs, gts)
  File "/home/cgv841/anaconda3/envs/segpt1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cgv841/litao/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 104, in forward
    aux_loss0 = self.criterion(self.heads[0](pred_out[0]), label)
  File "/home/cgv841/anaconda3/envs/segpt1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cgv841/litao/TorchSeg/furnace/seg_opr/loss_opr.py", line 84, in forward
    index = mask_prob.argsort()
  File "/home/cgv841/anaconda3/envs/segpt1.0/lib/python3.6/site-packages/torch/tensor.py", line 248, in argsort
    return torch.argsort(self, dim, descending)
  File "/home/cgv841/anaconda3/envs/segpt1.0/lib/python3.6/site-packages/torch/functional.py", line 651, in argsort
    return torch.sort(input, -1, descending)[1]
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

I use two 1080 Ti GPUs with batch size 8.

Problem with Sync_BN and CUDA

This is what I got from running the code


Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 946, in _build_extension_module
    check=True)
  File "/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from seg_opr.sync_bn import DataParallelModel, Reduce, BatchNorm2d
  File "/Data/TorchSeg/furnace/seg_opr/sync_bn/__init__.py", line 8, in <module>
    from .syncbn import *
  File "/Data/TorchSeg/furnace/seg_opr/sync_bn/syncbn.py", line 17, in <module>
    from .functions import *
  File "/Data/TorchSeg/furnace/seg_opr/sync_bn/functions.py", line 13, in <module>
    from .src import *
  File "/Data/TorchSeg/furnace/seg_opr/sync_bn/src/__init__.py", line 18, in <module>
    ], build_directory=gpu_path, verbose=False)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
    with_cuda=with_cuda)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 863, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/usr/local/lib/python3.5/dist-packages/torch/utils/cpp_extension.py", line 959, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'syncbn_gpu': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=syncbn_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/torch/csrc/api/include -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/TH -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /Data/TorchSeg/furnace/seg_opr/sync_bn/src/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
FAILED: syncbn_kernel.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=syncbn_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/torch/csrc/api/include -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/TH -isystem /usr/local/lib/python3.5/dist-packages/torch/lib/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /Data/TorchSeg/furnace/seg_opr/sync_bn/src/gpu/syncbn_kernel.cu -o syncbn_kernel.cuda.o
In file included from /Data/TorchSeg/furnace/seg_opr/sync_bn/src/gpu/syncbn_kernel.cu:4:0:
/usr/local/lib/python3.5/dist-packages/torch/lib/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.


I looked everywhere and found out that it was CUDA causing the problem. It seems to be associated with the CUDAContext.h file: the build cannot find any of the CUDA header files, e.g. cusparse.h, cublas_v2.h, and so on.

I don't know how to solve it. I've tried reinstalling PyTorch from source, but to no avail.
Can someone help me? Much thanks!

ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory

I want to know why the module loading in imp.py causes an error about libcudart.so.9.0. I have CUDA 10 installed in my conda environment, not CUDA 9.

Traceback (most recent call last):
  File "train.py", line 21, in <module>
    from seg_opr.sync_bn import DataParallelModel, Reduce, BatchNorm2d
  File "/home/USER/Projects/TorchSeg-BiSeNet/furnace/seg_opr/sync_bn/__init__.py", line 8, in <module>
    from .syncbn import *
  File "/home/USER/Projects/TorchSeg-BiSeNet/furnace/seg_opr/sync_bn/syncbn.py", line 17, in <module>
    from .functions import *
  File "/home/USER/Projects/TorchSeg-BiSeNet/furnace/seg_opr/sync_bn/functions.py", line 13, in <module>
    from .src import *
  File "/home/USER/Projects/TorchSeg-BiSeNet/furnace/seg_opr/sync_bn/src/__init__.py", line 18, in <module>
    ], build_directory=gpu_path, verbose=False)
  File "/home/USER/.conda/envs/BiSeNet-official-test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/home/USER/.conda/envs/BiSeNet-official-test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 825, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/USER/.conda/envs/BiSeNet-official-test/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 968, in _import_module_from_library
    return imp.load_module(module_name, file, path, description)
  File "/home/USER/.conda/envs/BiSeNet-official-test/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/USER/.conda/envs/BiSeNet-official-test/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory

The complete log is attached: log.txt

The number of paramters

First of all, thank you for your work.
However, I tried to get the total number of parameters of BiSeNet (Xception39-based) with 19 classes (Cityscapes) by using
pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad),
and the result is 1.54 M.
That is not the same as the 5.8 M in the paper.
How did you measure the number of parameters?

Inference code

Can you share the code for running inference with the trained network on the Cityscapes test set?

RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

Sorry to bother you; I got this problem when I ran train.py:

/opt/conda/conda-bld/pytorch_1549636813070/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [151,0,0], thread: [122,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
(the same assertion is repeated for threads [123,0,0] through [127,0,0] of block [151,0,0], threads [93,0,0] through [95,0,0] of block [207,0,0], and threads [62,0,0] and [63,0,0] of block [263,0,0])
Traceback (most recent call last):
  File "train.py", line 133, in <module>
    loss = model(imgs, gts)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/heal/TorchSeg-master/model/bisenet/cityscapes.bisenet.R18/network.py", line 105, in forward
    aux_loss0 = self.ohem_criterion(self.heads[0](pred_out[0]), label)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/heal/TorchSeg-master/furnace/seg_opr/loss_opr.py", line 85, in forward
    index = mask_prob.argsort()
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 248, in argsort
    return torch.argsort(self, dim, descending)
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/functional.py", line 648, in argsort
    return torch.sort(input, -1, descending)[1]
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 167, in <module>
    config.log_dir_link)
  File "/home/heal/TorchSeg-master/furnace/engine/engine.py", line 154, in __exit__
    torch.cuda.empty_cache()
  File "/home/heal/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 374, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered

And this is my dataset:

class Camvid(BaseDataset):
    trans_labels = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                    17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
                    31, 32]

    @classmethod
    def get_class_colors(*args):
        return [[64, 128, 64], [192, 0, 128], [0, 128, 192], [0, 128, 64],
                [128, 0, 0], [64, 0, 128], [64, 0, 192], [192, 128, 64],
                [192, 192, 128], [64, 64, 128], [128, 0, 192], [192, 0, 64],
                [128, 128, 64], [192, 0, 192], [128, 64, 64], [64, 192, 128],
                [64, 64, 0], [128, 64, 128], [128, 128, 92], [0, 0, 192],
                [192, 128, 128], [128, 128, 128], [64, 128, 192], [0, 0, 64],
                [0, 64, 64], [192, 64, 128], [128, 128, 0], [192, 128, 192],
                [64, 0, 64], [192, 192, 0], [0, 0, 0], [64, 192, 0]]

    @classmethod
    def get_class_names(*args):
        return ['Animal', 'Archway', 'Bicyclist', 'Bridge', 'Building', 'Car',
                'CartLuggagePram', 'Child', 'Column_Pole', 'Fence',
                'LaneMkgsDriv', 'LaneMkgsNonDriv', 'Misc_Text',
                'MotorcycleScooter', 'OtherMoving', 'ParkingBlock',
                'Pedestrian', 'Road', 'RoadShoulder', 'Sidewalk', 'SignSymbol',
                'Sky', 'SUVPickupTruck', 'TrafficCone', 'TrafficLight',
                'Train', 'Tree', 'Truck_Bus', 'Tunnel', 'VegetationMisc',
                'Void', 'Wall']

This is my config:

C = edict()
config = C
cfg = C

C.seed = 12345

"""please config ROOT_dir and user when u first using"""
C.repo_name = 'TorchSeg'
C.abs_dir = osp.realpath(".")
C.this_dir = C.abs_dir.split(osp.sep)[-1]
C.root_dir = C.abs_dir[:C.abs_dir.index(C.repo_name) + len(C.repo_name)]
C.log_dir = osp.abspath(osp.join(C.root_dir, 'log', C.this_dir))
C.log_dir_link = osp.join(C.abs_dir, 'log')
C.snapshot_dir = osp.abspath(osp.join(C.log_dir, "snapshot"))

exp_time = time.strftime('%Y_%m_%d_%H_%M_%S', time.localtime())
C.log_file = C.log_dir + '/log_' + exp_time + '.log'
C.link_log_file = C.log_file + '/log_last.log'
C.val_log_file = C.log_dir + '/val_' + exp_time + '.log'
C.link_val_log_file = C.log_dir + '/val_last.log'

"""Data Dir and Weight Dir"""
C.dataset_path = "/home/heal/TorchSeg-master/data/CamVid/"
C.img_root_folder = C.dataset_path
C.gt_root_folder = C.dataset_path
C.train_source = osp.join(C.dataset_path, "train.txt")
C.eval_source = osp.join(C.dataset_path, "val.txt")
C.test_source = osp.join(C.dataset_path, "test.txt")
C.is_test = False

"""Path Config"""

def add_path(path):
    if path not in sys.path:
        sys.path.insert(0, path)

add_path(osp.join(C.root_dir, 'furnace'))

# =============================================================================
# from torch.utils.pyt_utils import model_urls
# =============================================================================

"""Image Config"""
C.num_classes = 32
C.background = 0
C.image_mean = np.array([0.485, 0.456, 0.406]) # 0.485, 0.456, 0.406
C.image_std = np.array([0.229, 0.224, 0.225])
C.target_size = 512
C.image_height = 512
C.image_width = 512
C.num_train_imgs = 420
C.num_eval_imgs = 20

""" Settings for network, this would be different for each kind of model"""
C.fix_bias = True
C.fix_bn = False
C.sync_bn = True
C.bn_eps = 1e-5
C.bn_momentum = 0.1
C.pretrained_model = "/home/heal/TorchSeg-master/pytorch_model/resnet18_v1.pth"

"""Train Config"""
C.lr = 1e-2
C.lr_power = 0.9
C.momentum = 0.9
C.weight_decay = 5e-4
C.batch_size = 8 #4 * C.num_gpu
C.nepochs = 150
C.niters_per_epoch = 420
C.num_workers = 4
C.train_scale_array = [0.75, 1, 1.25, 1.5, 1.75, 2.0]

"""Eval Config"""
C.eval_iter = 30
C.eval_stride_rate = 5 / 6
C.eval_scale_array = [1, ] # 0.5, 0.75, 1, 1.25, 1.5, 1.75
C.eval_flip = False
C.eval_base_size =512
C.eval_crop_size =512

"""Display Config"""
C.snapshot_iter = 50
C.record_info_iter = 20
C.display_iter = 50

Question about the evaluation settings.

Hi, thanks for your nice work!
But a question still bothers me: does the single-scale performance average the outputs of sliding patches cropped from the image, comparing the fused full-size prediction with the same-size label, or does it just use a single whole-image output?
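For reference, this is what patch-averaged single-scale evaluation looks like (a sketch; the crop and stride values are assumptions in the spirit of config.py's eval_crop_size and eval_stride_rate, and the model is assumed to return logits at the crop's resolution):

    import torch

    def _positions(size, crop, stride):
        pos = list(range(0, max(size - crop, 0) + 1, stride))
        if pos[-1] + crop < size:  # make sure the border strip is covered
            pos.append(size - crop)
        return pos

    @torch.no_grad()
    def slide_inference(model, image, num_classes, crop=512, stride=426):
        _, _, h, w = image.shape
        logits = image.new_zeros((1, num_classes, h, w))
        count = image.new_zeros((1, 1, h, w))
        for top in _positions(h, crop, stride):
            for left in _positions(w, crop, stride):
                patch = image[:, :, top:top + crop, left:left + crop]
                logits[:, :, top:top + crop, left:left + crop] += model(patch)
                count[:, :, top:top + crop, left:left + crop] += 1
        # average overlapping patches, then take the full-size argmax
        return (logits / count).argmax(dim=1)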
