
light-weight-refinenet's Introduction

Light-Weight RefineNet (in PyTorch)



This repository provides official models from the paper Light-Weight RefineNet for Real-Time Semantic Segmentation, available here

Light-Weight RefineNet for Real-Time Semantic Segmentation
Vladimir Nekrasov, Chunhua Shen, Ian Reid
In BMVC 2018

UPDATES

14 July 2020:

  1. New weights of Light-Weight RefineNet with the ResNet-50 backbone, trained on COCO+BSD+VOC via the code in src_v2/, have been uploaded. The model shows 82.04% mean IoU on the validation set in the single-scale regime, and 83.41% mean IoU on the test set with multi-scale and horizontal flipping (per-class test results).
  2. New weights of Light-Weight RefineNet with the MobileNet-v2 backbone, trained on COCO+BSD+VOC via the code in src_v2/, have been uploaded. The model shows 78.30% mean IoU on the validation set in the single-scale regime, and 80.28% mean IoU on the test set with multi-scale and horizontal flipping (per-class test results).

5 June 2020: a new version of the code has been pushed; it currently resides in src_v2/. The code now closely interacts with densetorch and supports transformations from albumentations, as well as torchvision datasets. Three training examples are provided in train/:

  1. train_v2_nyu.sh is analogous to nyu.sh and trains Light-Weight-RefineNet-50 on NYU, achieving ~42.4% mean IoU on the validation set (no TTA).
  2. train_v2_nyu_albumentations.sh uses transformations from the albumentations package, achieving ~42.5% mean IoU on the validation set (no TTA).
  3. train_v2_sbd_voc.sh trains Light-Weight-RefineNet-50 on SBD (5623 training images) and VOC (1464 training images) datasets from torchvision with transformations from the albumentations package; achieves ~76% mean IoU on the validation set with no TTA (1449 validation images).

If you want to train the network on your own dataset, specify the relevant arguments (see the available options in src_v2/arguments.py) and provide an implementation of your dataset in src_v2/data.py if it is not already supported by either densetorch or torchvision. A rough sketch of such a dataset class is given below.
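
For illustration, here is a minimal sketch of the kind of dataset class one could plug into src_v2/data.py. It is not the repository's actual interface - the class name, the directory layout, and the dict-style return value are assumptions - so match it to whatever densetorch and the src_v2 training loop actually expect.

```python
import os
import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class CustomSegmentationDataset(Dataset):
    """Hypothetical dataset of RGB images paired with single-channel label masks."""

    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_paths = sorted(
            os.path.join(image_dir, f) for f in os.listdir(image_dir))
        self.mask_paths = sorted(
            os.path.join(mask_dir, f) for f in os.listdir(mask_dir))
        assert len(self.image_paths) == len(self.mask_paths)
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        mask = np.array(Image.open(self.mask_paths[idx]))  # one integer label per pixel
        if self.transform is not None:
            # Assumes an albumentations-style transform returning a dict.
            augmented = self.transform(image=image, mask=mask)
            image, mask = augmented["image"], augmented["mask"]
        return {"image": image, "mask": mask}
```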

Getting Started

For flawless reproduction of our results, the Ubuntu OS is recommended. The models have been tested using Python 2.7 and Python 3.6.

Dependencies

pip, pip3
torch>=0.4.0

To install the required Python packages, run pip install -r requirements.txt (Python 2) or pip3 install -r requirements3.txt (Python 3); add the --user flag for a local, per-user installation. The given examples can be run with or without a GPU.

Running examples

For ease of reproduction, we have embedded all our examples inside Jupyter notebooks. You can either download them from this repository and work with them on your local machine or server, or use the online versions hosted by the Google Colab service.

Jupyter Notebooks [Local]

If all the installation steps have been executed successfully, you can run any of the notebooks provided in the examples/notebooks folder. To start the Jupyter Notebook server, run jupyter notebook on your local machine. This will open a page in your browser; if it does not open automatically, copy the address printed in the command's output into your browser manually. After that, navigate to the repository folder and choose any of the examples given.

The number of FLOPs and the runtime are measured on 625x468 inputs using a single GTX 1080Ti; mean IoU is given on the corresponding validation sets with a single-scale input (a rough measurement sketch is given after the table).

| Models | PASCAL VOC | Person-Part | PASCAL Context | NYUv2-40 | Params, M | FLOPs, B | Runtime, ms |
|---|---|---|---|---|---|---|---|
| RF-LW-ResNet-50 | 78.5 | 64.9 | - | 41.7 | 27 | 33 | 19.56±0.29 |
| RF-LW-ResNet-101 | 80.3 | 66.7 | 45.1 | 43.6 | 46 | 52 | 27.16±0.19 |
| RF-LW-ResNet-152 | 82.1 | 67.6 | 45.8 | 44.4 | 62 | 71 | 35.82±0.23 |
| RF-LW-MobileNet-v2 | 76.2 | - | - | - | 3.3 | 9.3 | - |
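
As a rough illustration of how the Params and Runtime columns can be measured, the sketch below counts parameters and times forward passes on a 625x468 dummy input. It is a generic benchmarking recipe, not the exact protocol used for the paper; note also that, depending on the repository version, the constructor keyword may be pretrained= or imagenet=.

```python
import time
import torch
from models.resnet import rf_lw50  # Light-Weight RefineNet with a ResNet-50 backbone

has_cuda = torch.cuda.is_available()
segmenter = rf_lw50(40, pretrained=True).eval()  # 40 classes, e.g. the NYUv2-40 model
if has_cuda:
    segmenter = segmenter.cuda()

# Parameter count (millions).
print("Params, M: {:.2f}".format(
    sum(p.numel() for p in segmenter.parameters()) / 1e6))

# Time forward passes on a 625x468 (WxH assumed) dummy input.
dummy = torch.randn(1, 3, 468, 625)
if has_cuda:
    dummy = dummy.cuda()
with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        segmenter(dummy)
    if has_cuda:
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        segmenter(dummy)
    if has_cuda:
        torch.cuda.synchronize()
print("Runtime, ms: {:.2f}".format((time.time() - start) / 100 * 1000))
```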

Inside the notebooks, you can try out your own images and write loops to iterate over videos, whole datasets, or streams (e.g., from a webcam). Feel free to contribute your cool use cases of the notebooks! A minimal single-image inference sketch is given below.
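
As a starting point, here is a minimal single-image inference sketch in the spirit of the notebooks. The normalisation constants are the usual ImageNet mean/std and are assumptions - check them against the notebook you are following - and the file name example.jpg is a placeholder.

```python
import numpy as np
import torch
from PIL import Image
from models.resnet import rf_lw50

has_cuda = torch.cuda.is_available()
net = rf_lw50(40, pretrained=True).eval()  # NYUv2-40 model as an example
if has_cuda:
    net = net.cuda()

img = np.array(Image.open("example.jpg").convert("RGB")).astype(np.float32) / 255.0
img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
inp = torch.from_numpy(img.transpose(2, 0, 1)[None]).float()
if has_cuda:
    inp = inp.cuda()

with torch.no_grad():
    out = net(inp)  # 1 x num_classes x H' x W', roughly 1/4 of the input resolution
segm = out[0].argmax(dim=0).cpu().numpy()  # per-pixel class indices
print(segm.shape, np.unique(segm))
# The notebooks resize this map back to the input size and apply a colour map.
```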

Colab Notebooks [Web]

If you do not want any hassle with setting up the Jupyter Notebook server, you can run the same examples inside the Google Colab environment - with free GPUs available!

Training scripts

We provide training scripts to get you started on the NYUv2-40 dataset. The methodology slightly differs from the one described in the paper and leads to better and more stable results (at least, on NYU).

In particular, here we i) start with a lower learning rate (as we initialise the weights using PyTorch's default initialisation instead of normal(0.01)), ii) add more aggressive augmentation (random scaling between 0.5 and 2.0), and iii) pad each image inside the batch to a fixed crop size (instead of resizing all of them). The training process is divided into 3 stages; after each stage the optimisers are re-created with the learning rates halved. All the training is done using a single GTX 1080Ti GPU card. Additional experiments with this new methodology on the other datasets (and with the MobileNet-v2 backbone) are under way, and relevant scripts will be provided once available. Please also note that the training scripts were written in Python 3.6. A schematic of the padding and the staged learning-rate schedule is given below.
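
The schematic below (not the actual code in src/train.py) illustrates the two ingredients: padding each sample to a fixed crop size, with padded pixels assigned the ignore label, and re-creating the optimiser with a halved learning rate at each of the three stages. The starting learning rate, the crop size, and the stand-in model are placeholders - the real values live in src/config.py and train/nyu.sh - and the actual code keeps separate encoder and decoder optimisers.

```python
import numpy as np
import torch


def pad_to_crop(image, mask, crop_size=500, ignore_label=255):
    """Pad image/mask on the bottom/right so both reach crop_size (schematic)."""
    h, w = image.shape[:2]
    pad_h, pad_w = max(crop_size - h, 0), max(crop_size - w, 0)
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    mask = np.pad(mask, ((0, pad_h), (0, pad_w)), mode="constant",
                  constant_values=ignore_label)  # padded pixels do not contribute to the loss
    return image, mask


segmenter = torch.nn.Conv2d(3, 40, 3)  # stand-in for the real encoder + decoder
lr = 5e-3                              # placeholder starting learning rate
for stage in range(3):
    # The optimiser is re-created at the start of every stage with the halved rate.
    optimiser = torch.optim.SGD(segmenter.parameters(), lr=lr, momentum=0.9)
    # ... run this stage's training epochs with `optimiser` ...
    lr /= 2
```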

To start training on NYU:

  1. If not already done, download the dataset from here. Note that the white borders in all the images have already been cropped.
  2. Build the helper code for calculating mean IoU, written in Cython. To do so, run python src/setup.py build_ext --build-lib=./src/. (A NumPy sketch of the underlying metric is given after this list.)
  3. Make sure to provide the correct paths to the dataset images, either by modifying src/config.py or train/nyu.sh.
  4. Run ./train/nyu.sh. On a single 1080Ti, training takes around 3-6 hours (ResNet-50 to ResNet-152, respectively).
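
Conceptually, the Cython helper accumulates a confusion matrix over the validation set and reads the per-class IoU off it. A pure-NumPy sketch of that computation (not the repository's optimised code) looks as follows:

```python
import numpy as np


def fast_hist(pred, gt, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix, skipping the ignore label."""
    valid = gt < num_classes  # e.g. drops pixels labelled 255
    return np.bincount(
        num_classes * gt[valid].astype(int) + pred[valid].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)


def compute_iu(hist):
    """Per-class IoU = TP / (TP + FP + FN), plus the mean over classes present in the GT."""
    iou = np.diag(hist) / (hist.sum(0) + hist.sum(1) - np.diag(hist) + 1e-12)
    return iou, iou[hist.sum(1) > 0].mean()


# Accumulate over the validation set:
# hist = sum(fast_hist(pred.flatten(), gt.flatten(), 40) for pred, gt in pairs)
# ious, mean_iou = compute_iu(hist)
```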

If you want to train the networks on your own dataset, you will need to do the following:

  1. Add files with paths to your images and segmentation masks. The paths can be either relative or absolute - the TRAIN_DIR and VAL_DIR options in src/config.py can be used to prepend relative paths. It is up to you to decide how to encode the segmentation masks - in the NYU example, the masks are encoded without a colourmap, i.e., with a single integer label per 2-D location (see the sketch after this list);
  2. Make sure to adapt the implementation of the NYUDataset for your case in src/datasets.py: in particular, pay attention to how the images and masks are being read from the files;
  3. Modify src/config.py for your needs - do not forget about changing the number of classes (NUM_CLASSES);
  4. Finally, run your code - see train/nyu.sh for an example.
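
If your masks come as colour-coded PNGs, they first need to be converted into single-channel label maps. The sketch below uses a hypothetical colour-to-label mapping; the key point is that the saved mask should contain one integer class index per pixel, with 255 reserved for ignored regions.

```python
import numpy as np
from PIL import Image

# Hypothetical colour -> class-index mapping for a 3-class problem.
COLOUR_TO_LABEL = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # class 1
    (0, 255, 0): 2,    # class 2
}


def colour_mask_to_labels(path, ignore_label=255):
    """Convert an RGB colour mask into a single-channel label map."""
    rgb = np.array(Image.open(path).convert("RGB"))
    labels = np.full(rgb.shape[:2], ignore_label, dtype=np.uint8)  # unknown colours -> ignore
    for colour, idx in COLOUR_TO_LABEL.items():
        labels[(rgb == colour).all(axis=-1)] = idx
    return labels


# Image.fromarray(colour_mask_to_labels("mask.png")).save("mask_labels.png")
```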

More to come

Once time permits, more things will be added to this repository:

  • NASNet-Mobile
  • Cityscapes models
  • Full training pipeline example
  • Evaluation scripts (src/train.py provides the flag --evaluate)

More projects to check out

  1. Our most recent work on real-time joint semantic segmentation and depth estimation is built on top of Light-Weight RefineNet with MobileNet-v2. Check out the paper here; the models are available here!
  2. RefineNet-101 trained on PASCAL VOC is available here

License

For academic usage, this project is licensed under the 2-clause BSD License - see the LICENSE file for details. For commercial usage, please contact the authors.

Acknowledgments

  • University of Adelaide and Australian Centre for Robotic Vision (ACRV) for making this project happen
  • HPC Phoenix cluster at the University of Adelaide for making the training of the models possible
  • PyTorch developers
  • Google Colab
  • Yerba mate tea

light-weight-refinenet's People

Contributors

dependabot[bot], DrSleep

light-weight-refinenet's Issues

About RuntimeError: CUDA out of memory

Hi,
Thanks for your wonderful work and detailed tutorial.
I am new to this; when I try to retrain the model, I get a RuntimeError. I then set BATCH_SIZE in config.py to [1] * 3, but it still does not work. Have you ever run into this problem?
Could you please help me?
Thanks in advance!

INFO:main: Train epoch: 0 [0/795] Avg. Loss: 3.751 Avg. Time: 1.046
Traceback (most recent call last):
File "src/train.py", line 425, in
main()
File "src/train.py", line 409, in main
args.freeze_bn[task_idx])
File "src/train.py", line 273, in train_segmenter
output = segmenter(input_var)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/SS/light-weight-refinenet/models/resnet.py", line 237, in forward
x1 = self.mflow_conv_g4_pool(x1)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/SS/light-weight-refinenet/utils/layer_factory.py", line 72, in forward
top = self.maxpool(top)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/modules/pooling.py", line 146, in forward
self.return_indices)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/_jit_internal.py", line 133, in fn
return if_false(*args, **kwargs)
File "/home/txr/.virtualenvs/env_test/lib/python3.5/site-packages/torch/nn/functional.py", line 494, in _max_pool2d
input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 1.96 GiB total capacity; 1.14 GiB already allocated; 20.06 MiB free; 41.52 MiB cached)

PASCAL Person-Part dataset

I want to retrain on the PASCAL Person-Part dataset. Could you please send me the Person-Part dataset? Thank you.

Please help me!

./train/nyu.sh
INFO:main: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.34M
/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py:216: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see https://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details.
warnings.warn("NLLLoss2d has been deprecated. "
INFO:main: Training Process Starts
INFO:main: Created train set = 7736 examples, val set = 48 examples
Traceback (most recent call last):
File "src/train.py", line 425, in
main()
File "src/train.py", line 388, in main
return validate(segmenter, val_loader, 0, num_classes=args.num_classes[task_idx])
File "src/train.py", line 317, in validate
output = segmenter(input_var)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/data/yh/light-weight-refinenet-bak-origin/models/resnet.py", line 222, in forward
x3 = x3 + x4
RuntimeError: The size of tensor a (40) must match the size of tensor b (32) at non-singleton dimension 3

Error when trying to run train.py

Thanks for your work! I want to run your code, but I ran into an error. Could you help me? Thanks very much!

/home/robot/fangyu_pytorch/bin/python /home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py --enc 50 INFO:__main__: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.40M /home/robot/fangyu_pytorch/lib/python3.6/site-packages/torch/nn/modules/loss.py:206: UserWarning: NLLLoss2d has been deprecated. Please use NLLLoss instead as a drop-in replacement and see http://pytorch.org/docs/master/nn.html#torch.nn.NLLLoss for more details. warnings.warn("NLLLoss2d has been deprecated. " INFO:__main__: Training Process Starts INFO:__main__: Created train set = 795 examples, val set = 654 examples INFO:__main__: Training Stage 0 INFO:__main__: Enc. parameter: module.conv1.weight INFO:__main__: Enc. parameter: module.bn1.weight INFO:__main__: Enc. parameter: module.bn1.bias INFO:__main__: Enc. parameter: module.layer1.0.conv1.weight INFO:__main__: Enc. parameter: module.layer1.0.bn1.weight INFO:__main__: Enc. parameter: module.layer1.0.bn1.bias INFO:__main__: Enc. parameter: module.layer1.0.conv2.weight INFO:__main__: Enc. parameter: module.layer1.0.bn2.weight INFO:__main__: Enc. parameter: module.layer1.0.bn2.bias INFO:__main__: Enc. parameter: module.layer1.0.conv3.weight INFO:__main__: Enc. parameter: module.layer1.0.bn3.weight INFO:__main__: Enc. parameter: module.layer1.0.bn3.bias INFO:__main__: Enc. parameter: module.layer1.0.downsample.0.weight INFO:__main__: Enc. parameter: module.layer1.0.downsample.1.weight INFO:__main__: Enc. parameter: module.layer1.0.downsample.1.bias INFO:__main__: Enc. parameter: module.layer1.1.conv1.weight INFO:__main__: Enc. parameter: module.layer1.1.bn1.weight INFO:__main__: Enc. parameter: module.layer1.1.bn1.bias INFO:__main__: Enc. parameter: module.layer1.1.conv2.weight INFO:__main__: Enc. parameter: module.layer1.1.bn2.weight INFO:__main__: Enc. parameter: module.layer1.1.bn2.bias INFO:__main__: Enc. parameter: module.layer1.1.conv3.weight INFO:__main__: Enc. parameter: module.layer1.1.bn3.weight INFO:__main__: Enc. parameter: module.layer1.1.bn3.bias INFO:__main__: Enc. parameter: module.layer1.2.conv1.weight INFO:__main__: Enc. parameter: module.layer1.2.bn1.weight INFO:__main__: Enc. parameter: module.layer1.2.bn1.bias INFO:__main__: Enc. parameter: module.layer1.2.conv2.weight INFO:__main__: Enc. parameter: module.layer1.2.bn2.weight INFO:__main__: Enc. parameter: module.layer1.2.bn2.bias INFO:__main__: Enc. parameter: module.layer1.2.conv3.weight INFO:__main__: Enc. parameter: module.layer1.2.bn3.weight INFO:__main__: Enc. parameter: module.layer1.2.bn3.bias INFO:__main__: Enc. parameter: module.layer2.0.conv1.weight INFO:__main__: Enc. parameter: module.layer2.0.bn1.weight INFO:__main__: Enc. parameter: module.layer2.0.bn1.bias INFO:__main__: Enc. parameter: module.layer2.0.conv2.weight INFO:__main__: Enc. parameter: module.layer2.0.bn2.weight INFO:__main__: Enc. parameter: module.layer2.0.bn2.bias INFO:__main__: Enc. parameter: module.layer2.0.conv3.weight INFO:__main__: Enc. parameter: module.layer2.0.bn3.weight INFO:__main__: Enc. parameter: module.layer2.0.bn3.bias INFO:__main__: Enc. parameter: module.layer2.0.downsample.0.weight INFO:__main__: Enc. parameter: module.layer2.0.downsample.1.weight INFO:__main__: Enc. parameter: module.layer2.0.downsample.1.bias INFO:__main__: Enc. parameter: module.layer2.1.conv1.weight INFO:__main__: Enc. parameter: module.layer2.1.bn1.weight INFO:__main__: Enc. parameter: module.layer2.1.bn1.bias INFO:__main__: Enc. parameter: module.layer2.1.conv2.weight INFO:__main__: Enc. 
parameter: module.layer2.1.bn2.weight INFO:__main__: Enc. parameter: module.layer2.1.bn2.bias INFO:__main__: Enc. parameter: module.layer2.1.conv3.weight INFO:__main__: Enc. parameter: module.layer2.1.bn3.weight INFO:__main__: Enc. parameter: module.layer2.1.bn3.bias INFO:__main__: Enc. parameter: module.layer2.2.conv1.weight INFO:__main__: Enc. parameter: module.layer2.2.bn1.weight INFO:__main__: Enc. parameter: module.layer2.2.bn1.bias INFO:__main__: Enc. parameter: module.layer2.2.conv2.weight INFO:__main__: Enc. parameter: module.layer2.2.bn2.weight INFO:__main__: Enc. parameter: module.layer2.2.bn2.bias INFO:__main__: Enc. parameter: module.layer2.2.conv3.weight INFO:__main__: Enc. parameter: module.layer2.2.bn3.weight INFO:__main__: Enc. parameter: module.layer2.2.bn3.bias INFO:__main__: Enc. parameter: module.layer2.3.conv1.weight INFO:__main__: Enc. parameter: module.layer2.3.bn1.weight INFO:__main__: Enc. parameter: module.layer2.3.bn1.bias INFO:__main__: Enc. parameter: module.layer2.3.conv2.weight INFO:__main__: Enc. parameter: module.layer2.3.bn2.weight INFO:__main__: Enc. parameter: module.layer2.3.bn2.bias INFO:__main__: Enc. parameter: module.layer2.3.conv3.weight INFO:__main__: Enc. parameter: module.layer2.3.bn3.weight INFO:__main__: Enc. parameter: module.layer2.3.bn3.bias INFO:__main__: Enc. parameter: module.layer3.0.conv1.weight INFO:__main__: Enc. parameter: module.layer3.0.bn1.weight INFO:__main__: Enc. parameter: module.layer3.0.bn1.bias INFO:__main__: Enc. parameter: module.layer3.0.conv2.weight INFO:__main__: Enc. parameter: module.layer3.0.bn2.weight INFO:__main__: Enc. parameter: module.layer3.0.bn2.bias INFO:__main__: Enc. parameter: module.layer3.0.conv3.weight INFO:__main__: Enc. parameter: module.layer3.0.bn3.weight INFO:__main__: Enc. parameter: module.layer3.0.bn3.bias INFO:__main__: Enc. parameter: module.layer3.0.downsample.0.weight INFO:__main__: Enc. parameter: module.layer3.0.downsample.1.weight INFO:__main__: Enc. parameter: module.layer3.0.downsample.1.bias INFO:__main__: Enc. parameter: module.layer3.1.conv1.weight INFO:__main__: Enc. parameter: module.layer3.1.bn1.weight INFO:__main__: Enc. parameter: module.layer3.1.bn1.bias INFO:__main__: Enc. parameter: module.layer3.1.conv2.weight INFO:__main__: Enc. parameter: module.layer3.1.bn2.weight INFO:__main__: Enc. parameter: module.layer3.1.bn2.bias INFO:__main__: Enc. parameter: module.layer3.1.conv3.weight INFO:__main__: Enc. parameter: module.layer3.1.bn3.weight INFO:__main__: Enc. parameter: module.layer3.1.bn3.bias INFO:__main__: Enc. parameter: module.layer3.2.conv1.weight INFO:__main__: Enc. parameter: module.layer3.2.bn1.weight INFO:__main__: Enc. parameter: module.layer3.2.bn1.bias INFO:__main__: Enc. parameter: module.layer3.2.conv2.weight INFO:__main__: Enc. parameter: module.layer3.2.bn2.weight INFO:__main__: Enc. parameter: module.layer3.2.bn2.bias INFO:__main__: Enc. parameter: module.layer3.2.conv3.weight INFO:__main__: Enc. parameter: module.layer3.2.bn3.weight INFO:__main__: Enc. parameter: module.layer3.2.bn3.bias INFO:__main__: Enc. parameter: module.layer3.3.conv1.weight INFO:__main__: Enc. parameter: module.layer3.3.bn1.weight INFO:__main__: Enc. parameter: module.layer3.3.bn1.bias INFO:__main__: Enc. parameter: module.layer3.3.conv2.weight INFO:__main__: Enc. parameter: module.layer3.3.bn2.weight INFO:__main__: Enc. parameter: module.layer3.3.bn2.bias INFO:__main__: Enc. parameter: module.layer3.3.conv3.weight INFO:__main__: Enc. 
parameter: module.layer3.3.bn3.weight INFO:__main__: Enc. parameter: module.layer3.3.bn3.bias INFO:__main__: Enc. parameter: module.layer3.4.conv1.weight INFO:__main__: Enc. parameter: module.layer3.4.bn1.weight INFO:__main__: Enc. parameter: module.layer3.4.bn1.bias INFO:__main__: Enc. parameter: module.layer3.4.conv2.weight INFO:__main__: Enc. parameter: module.layer3.4.bn2.weight INFO:__main__: Enc. parameter: module.layer3.4.bn2.bias INFO:__main__: Enc. parameter: module.layer3.4.conv3.weight INFO:__main__: Enc. parameter: module.layer3.4.bn3.weight INFO:__main__: Enc. parameter: module.layer3.4.bn3.bias INFO:__main__: Enc. parameter: module.layer3.5.conv1.weight INFO:__main__: Enc. parameter: module.layer3.5.bn1.weight INFO:__main__: Enc. parameter: module.layer3.5.bn1.bias INFO:__main__: Enc. parameter: module.layer3.5.conv2.weight INFO:__main__: Enc. parameter: module.layer3.5.bn2.weight INFO:__main__: Enc. parameter: module.layer3.5.bn2.bias INFO:__main__: Enc. parameter: module.layer3.5.conv3.weight INFO:__main__: Enc. parameter: module.layer3.5.bn3.weight INFO:__main__: Enc. parameter: module.layer3.5.bn3.bias INFO:__main__: Enc. parameter: module.layer4.0.conv1.weight INFO:__main__: Enc. parameter: module.layer4.0.bn1.weight INFO:__main__: Enc. parameter: module.layer4.0.bn1.bias INFO:__main__: Enc. parameter: module.layer4.0.conv2.weight INFO:__main__: Enc. parameter: module.layer4.0.bn2.weight INFO:__main__: Enc. parameter: module.layer4.0.bn2.bias INFO:__main__: Enc. parameter: module.layer4.0.conv3.weight INFO:__main__: Enc. parameter: module.layer4.0.bn3.weight INFO:__main__: Enc. parameter: module.layer4.0.bn3.bias INFO:__main__: Enc. parameter: module.layer4.0.downsample.0.weight INFO:__main__: Enc. parameter: module.layer4.0.downsample.1.weight INFO:__main__: Enc. parameter: module.layer4.0.downsample.1.bias INFO:__main__: Enc. parameter: module.layer4.1.conv1.weight INFO:__main__: Enc. parameter: module.layer4.1.bn1.weight INFO:__main__: Enc. parameter: module.layer4.1.bn1.bias INFO:__main__: Enc. parameter: module.layer4.1.conv2.weight INFO:__main__: Enc. parameter: module.layer4.1.bn2.weight INFO:__main__: Enc. parameter: module.layer4.1.bn2.bias INFO:__main__: Enc. parameter: module.layer4.1.conv3.weight INFO:__main__: Enc. parameter: module.layer4.1.bn3.weight INFO:__main__: Enc. parameter: module.layer4.1.bn3.bias INFO:__main__: Enc. parameter: module.layer4.2.conv1.weight INFO:__main__: Enc. parameter: module.layer4.2.bn1.weight INFO:__main__: Enc. parameter: module.layer4.2.bn1.bias INFO:__main__: Enc. parameter: module.layer4.2.conv2.weight INFO:__main__: Enc. parameter: module.layer4.2.bn2.weight INFO:__main__: Enc. parameter: module.layer4.2.bn2.bias INFO:__main__: Enc. parameter: module.layer4.2.conv3.weight INFO:__main__: Enc. parameter: module.layer4.2.bn3.weight INFO:__main__: Enc. parameter: module.layer4.2.bn3.bias INFO:__main__: Dec. parameter: module.p_ims1d2_outl1_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g1_pool.0.1_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g1_pool.0.2_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g1_pool.0.3_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g1_pool.0.4_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g1_b3_joint_varout_dimred.weight INFO:__main__: Dec. parameter: module.p_ims1d2_outl2_dimred.weight INFO:__main__: Dec. parameter: module.adapt_stage2_b2_joint_varout_dimred.weight INFO:__main__: Dec. 
parameter: module.mflow_conv_g2_pool.0.1_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g2_pool.0.2_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g2_pool.0.3_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g2_pool.0.4_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g2_b3_joint_varout_dimred.weight INFO:__main__: Dec. parameter: module.p_ims1d2_outl3_dimred.weight INFO:__main__: Dec. parameter: module.adapt_stage3_b2_joint_varout_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g3_pool.0.1_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g3_pool.0.2_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g3_pool.0.3_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g3_pool.0.4_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g3_b3_joint_varout_dimred.weight INFO:__main__: Dec. parameter: module.p_ims1d2_outl4_dimred.weight INFO:__main__: Dec. parameter: module.adapt_stage4_b2_joint_varout_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g4_pool.0.1_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g4_pool.0.2_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g4_pool.0.3_outvar_dimred.weight INFO:__main__: Dec. parameter: module.mflow_conv_g4_pool.0.4_outvar_dimred.weight INFO:__main__: Dec. parameter: module.clf_conv.weight INFO:__main__: Dec. parameter: module.clf_conv.bias THCudaCheck FAIL file=/pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=266 error=59 : device-side assert triggered Traceback (most recent call last): File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 422, in <module> main() File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 406, in main epoch_start, segm_crit,args.freeze_bn[task_idx]) File "/home/robot/PycharmProjects/light-weight-refinenet_train/src/train.py", line 279, in train_segmenter optim_enc.step() File "/home/robot/fangyu_pytorch/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step d_p.add_(weight_decay, p.data) RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:266

train error.

File "/home/.pyenv/versions/anaconda3-5.2.0/envs/pytorch/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/.pyenv/versions/anaconda3-5.2.0/envs/pytorch/lib/python3.6/site-packages/torch/autograd/init.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Could you help me?

How to train on Custom Dataset?

Bravo for the nice work!

I am also interested in retraining Person-Part on a custom dataset; is there any roadmap for doing so?

Thanks

Could code work segmenting building footprints from aerials?

My real interest in semantic segmentation using RefineNet is to essentially replicate the workflow used by Microsoft to extract building footprints from aerials. Is the code you have posted suitable to be adapted to extract building footprints from aerials? I know that I would have to prepare aerial image and mask chips for the training and validation data that fit the code's input requirements, but are there other changes that would have to be made to this code or its inputs to make this work for the task of segmenting building footprints?

I realize this code does not account for several aspects of the problem I am trying to solve, such as the handling of the global coordinates of the aerial tiles, but I am hoping that this code could provide a starting point for training a RefineNet enhanced building footprint semantic segmentation model. If not, are you aware of any other sources of code that would be better suited to that task that you could share?

What is the minimum number of classes required by the code?

I tried running the model with my own dataset using images that had a code of 0 for buildings and 255 (ignore) for everything else and the model reached perfect validation after 1 epoch if I set the number of classes to 1 and 2 epochs if I set the number of classes to 2. Either way, I assume that is not how the model is supposed to work.

Do I need at least two classes that are not ignored for the model to work? Would the model work if I assigned a code of 0 for buildings and a code of 1 for everything else in the image? If that does not work and I have to properly classify a lot of objects in my images that will significantly reduce the number of images I will be able to use with the model, since I only have buildings classified at this point. I have 4000 images ready to go if I could just classify buildings and make everything else an ignored portion of the image or a single class, but I would only have 100 or less if I have to manually classify real objects in the images other than buildings. Also if I had a dataset with codes of 0, 1, and 255 (ignore) would the number of classes parameter in the configuration file need to be set to 2 classes or 3 classes?

Additionally, does the model work properly with 32 bit images or does it require 24 bit images like the images in the nyu dataset?

Confusion about task_idx

Hi! Great work with this repo!

I have a question regarding how you structured the training process. Why do you have 3-dimensional values in the config file?

I've seen that the task_idx value will iterate until it reaches the value of num_stages (which is assigned the depth of the num_classes variable, i.e. 3) and I am not sure of what it means or why it is used.

Thanks!

evaluation on NYU looks horrible

Hi Vladimir,

Thanks for your great work and generous release. Your tutorial to reproduce the results is very detailed, and I appreciate it a lot.

I am trying to use your code in my research project. First I tried to re-train the network on NYUD with the provided code, the IoU is 0.419 which is fine. Then I tried to visualize the prediction as you did in the notebook. But the result looks pretty bad, and I don't know why the predicted mask image is green.

I tried evaluation with given weight file and the file I trained myself, the difference is huge as follows. The only difference I made in the code between two result figures is loading different weight file.

Do you have an idea why this could happen?

Thank you in advance!

(screenshot attached: Screen Shot 2019-04-18 at 10 58 36 PM)

Training error

Hi @DrSleep, I appreciate your work. However, when I directly run the train/nyu.sh script on three NVIDIA 1080 Ti cards, training gets blocked when evaluating the trained model after 16 epochs. I tried changing the value of num_workers in src/config.py from 16 to 1, but it still occurred. However, when I set it to zero, it works and the training procedure is no longer blocked. Can you explain this? Thanks a lot.

How can I get the paper's supplementary material?

Hi!
I am reproducing the MobileNet-v2 performance on PASCAL VOC but am running into some difficulties (the model's performance does not improve with training, and the performance on the val set is very bad - the IoU of every class except background is zero). Could you please give more detailed info about the training strategy?
Thanks

The mobilenetV2 produced abnormal results

I adapted the function create_segmenter in train.py, by adding a branch as follow:

 elif str(net) == 'mbv2':
        from models.mobilenet import mbv2
        return mbv2(num_classes, pretrained=pretrained)

I also adapted other necessary code, such as the dataloader, and then ran the code using the provided mbv2 on NYU and Cityscapes. But I got very low mIoUs on both datasets. The results were similar to the following and were almost invariant during training:

IoUs: [0.233812, 0, 0, 0, 0, 0, 0, ...] 
Mean IoU: 0.006 

It seemed that the model classified all pixels as background.

I also ran the provided resnet101 and its results were normal. Is there something wrong with the code in mobilenet.py?

Can you share your training skills?

Thanks for your work, it is very exciting. I tried to train this network, but I only get 70 mIoU. Could you share your training skills? Thanks very much!

cpu run

Hi,
It is my pleasure to learn from your code, but I do not have a capable GPU, so could you tell me how to run the code on the CPU? I tried, but a runtime error occurred - does the CPU need CUDA?

RuntimeError: Error(s) in loading state_dict for DataParallel

I am sorry, I am new to this. I just changed NUM_CLASSES = [40] * 3 in config.py to NUM_CLASSES = [2] * 3 because my train (val) datasets need to be divided into 2 classes.

But when I run it, I get this error:

INFO:main: Loaded Segmenter 50, ImageNet-Pre-Trained=True, #PARAMS=27.31M
Traceback (most recent call last):
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 432, in
main()
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 367, in main
best_val, epoch_start = load_ckpt(args.ckpt_path, {'segmenter' : segmenter})
File "C:/Users/likun3/Documents/light-weight-refinenet-master/src/train.py", line 240, in load_ckpt
v.load_state_dict(ckpt[k])
File "D:\Soft\Anaconda_\envs\dp\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
size mismatch for module.clf_conv.weight: copying a param with shape torch.Size([40, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 256, 3, 3]).
size mismatch for module.clf_conv.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([2]).

Could someone help me, please?

About evaluation

Hi @DrSleep, I find that there is no evaluation script to test the trained model; the mean IoU is only output during the training procedure. Does that mean I don't need to test the model?

Something unclear

In the validate function, the output of the network does not go through a softmax operation. Why? It is not clear to me.

test code

Thanks for your great work!
Is there test code to get the predicted segmentation result?

_pickle.UnpicklingError: unpickling stack underflow

Hello, I have run into a problem. When I try the "To start training on NYU:" part, I get "Connected to pydev debugger (build 182.3911.33)
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1664, in
main()
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 435, in
main()
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 364, in main
create_segmenter(args.enc, args.enc_pretrained, args.num_classes[0])
File "/Users/luo/Documents/code/light-weight-refinenet/src/train.py", line 136, in create_segmenter
return rf_lw50(num_classes, imagenet=pretrained)
File "/Users/luo/Documents/code/light-weight-refinenet/models/resnet.py", line 249, in rf_lw50
model.load_state_dict(maybe_download(key, url), strict=False)
File "/Users/luo/Documents/code/light-weight-refinenet/utils/helpers.py", line 22, in maybe_download
return torch.load(cached_file, map_location=map_location)
File "/Users/luo/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 358, in load
return _load(f, map_location, pickle_module)
File "/Users/luo/anaconda3/lib/python3.7/site-packages/torch/serialization.py", line 532, in _load
magic_number = pickle_module.load(f)
_pickle.UnpicklingError: unpickling stack underflow

Process finished with exit code 1
"
Can you help me solve this? Thanks.

model load error


RuntimeError Traceback (most recent call last)
in ()
9 models = dict()
10 for key,fun in six.iteritems(model_inits):
---> 11 net = fun(n_classes, pretrained=True).eval()
12 if has_cuda:
13 net = net.cuda()

/home/lc/work/light-weight-refinenet/models/resnet.py in rf_lw50(num_classes, pretrained, **kwargs)
241 key = 'rf_lw' + bname
242 url = models_urls[bname]
--> 243 model.load_state_dict(maybe_download(key, url))
244 return model
245

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
719 if len(error_msgs) > 0:
720 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 721 self.class.name, "\n\t".join(error_msgs)))
722
723 def parameters(self):

RuntimeError: Error(s) in loading state_dict for ResNetLW:
Unexpected key(s) in state_dict: "bn1.num_batches_tracked", "layer1.0.bn1.num_batches_tracked", "layer1.0.bn2.num_batches_tracked", "layer1.0.bn3.num_batches_tracked", "layer1.0.downsample.1.num_batches_tracked", "layer1.1.bn1.num_batches_tracked", "layer1.1.bn2.num_batches_tracked", "layer1.1.bn3.num_batches_tracked", "layer1.2.bn1.num_batches_tracked", "layer1.2.bn2.num_batches_tracked", "layer1.2.bn3.num_batches_tracked", "layer2.0.bn1.num_batches_tracked", "layer2.0.bn2.num_batches_tracked", "layer2.0.bn3.num_batches_tracked", "layer2.0.downsample.1.num_batches_tracked", "layer2.1.bn1.num_batches_tracked", "layer2.1.bn2.num_batches_tracked", "layer2.1.bn3.num_batches_tracked", "layer2.2.bn1.num_batches_tracked", "layer2.2.bn2.num_batches_tracked", "layer2.2.bn3.num_batches_tracked", "layer2.3.bn1.num_batches_tracked", "layer2.3.bn2.num_batches_tracked", "layer2.3.bn3.num_batches_tracked", "layer3.0.bn1.num_batches_tracked", "layer3.0.bn2.num_batches_tracked", "layer3.0.bn3.num_batches_tracked", "layer3.0.downsample.1.num_batches_tracked", "layer3.1.bn1.num_batches_tracked", "layer3.1.bn2.num_batches_tracked", "layer3.1.bn3.num_batches_tracked", "layer3.2.bn1.num_batches_tracked", "layer3.2.bn2.num_batches_tracked", "layer3.2.bn3.num_batches_tracked", "layer3.3.bn1.num_batches_tracked", "layer3.3.bn2.num_batches_tracked", "layer3.3.bn3.num_batches_tracked", "layer3.4.bn1.num_batches_tracked", "layer3.4.bn2.num_batches_tracked", "layer3.4.bn3.num_batches_tracked", "layer3.5.bn1.num_batches_tracked", "layer3.5.bn2.num_batches_tracked", "layer3.5.bn3.num_batches_tracked", "layer4.0.bn1.num_batches_tracked", "layer4.0.bn2.num_batches_tracked", "layer4.0.bn3.num_batches_tracked", "layer4.0.downsample.1.num_batches_tracked", "layer4.1.bn1.num_batches_tracked", "layer4.1.bn2.num_batches_tracked", "layer4.1.bn3.num_batches_tracked", "layer4.2.bn1.num_batches_tracked", "layer4.2.bn2.num_batches_tracked", "layer4.2.bn3.num_batches_tracked".

Training gets stuck at each validation.

INFO:main: Val epoch: 28 [500/654] Mean IoU: 0.329
INFO:main: Val epoch: 28 [510/654] Mean IoU: 0.333
INFO:main: Val epoch: 28 [520/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [530/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [540/654] Mean IoU: 0.334
INFO:main: Val epoch: 28 [550/654] Mean IoU: 0.336
INFO:main: Val epoch: 28 [560/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [570/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [580/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [590/654] Mean IoU: 0.337
INFO:main: Val epoch: 28 [600/654] Mean IoU: 0.338
INFO:main: Val epoch: 28 [610/654] Mean IoU: 0.338
INFO:main: Val epoch: 28 [620/654] Mean IoU: 0.339
INFO:main: Val epoch: 28 [630/654] Mean IoU: 0.340
INFO:main: Val epoch: 28 [640/654] Mean IoU: 0.341
INFO:main: Val epoch: 28 [650/654] Mean IoU: 0.340
INFO:main: IoUs: [0.71555132 0.79084598 0.38018856 0.58270573 0.49350641 0.53325564
0.33868351 0.24003405 0.35049824 0.38725722 0.53552238 0.43335014
0.52524775 0.13147147 0.06738203 0.45353059 0.10944007 0.35642416
0.13853911 0.24521033 0.2119736 0.52864194 0.25071423 0.31138197
0.40138153 0.23264673 0.31995535 0.16649904 0.06017227 0.3065639
0.61248457 0.18471847 0.68977884 0.36473194 0.31989985 0.28686686
0.0092722 0.17552748 0.12222387 0.28322877]
INFO:main: Val epoch: 28 Mean IoU: 0.341
saving
INFO:main: New best value 0.3412, was 0.3211
saving done
starting *********

Reproducing cityscapes results

Hi, thanks for your work. Can you share the batch size you used, the input resolution, and the crop resolution to achieve 72% with ResNet-101?

Update: I cannot get past 33% mIoU using your training script, obviously something is wrong. I will give you some details on the training.

This is my config file:

SHORTER_SIDE = [1024] * 3
CROP_SIZE = [769] * 3
NORMALISE_PARAMS = [1./255, # SCALE
                    np.array([0.290, 0.328, 0.286]).reshape((1, 1, 3)), # MEAN
                    np.array([0.182, 0.186, 0.184]).reshape((1, 1, 3))] # STD
BATCH_SIZE = [6] * 3
NUM_WORKERS = 0
NUM_CLASSES = [19] * 3
LOW_SCALE = [0.5] * 3
HIGH_SCALE = [2.0] * 3
IGNORE_LABEL = 255

# ENCODER PARAMETERS
ENC = '101'
ENC_PRETRAINED = True  # pre-trained on ImageNet or randomly initialised

# GENERAL
EVALUATE = False
FREEZE_BN = [True] * 3
NUM_SEGM_EPOCHS = [100] * 3
PRINT_EVERY = 10
RANDOM_SEED = 42
SNAPSHOT_DIR = './ckpt/'
CKPT_PATH = './ckpt/checkpoint.pth.tar'
VAL_EVERY = [1] * 3 # how often to record validation scores

# OPTIMISERS' PARAMETERS
LR_ENC = [5e-4, 2.5e-4, 1e-4]  # TO FREEZE, PUT 0
LR_DEC = [5e-3, 2.5e-3, 1e-3]
MOM_ENC = [0.9] * 3 # TO FREEZE, PUT 0
MOM_DEC = [0.9] * 3
WD_ENC = [1e-5] * 3 # TO FREEZE, PUT 0
WD_DEC = [1e-5] * 3
OPTIM_DEC = 'sgd'

The dataset loader has not been changed. Do you have any idea on how to solve the problem?

Error when running python setup.py build_ext --build-lib=./src/

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:\Anaconda3.5\lib\site-packages\numpy\core\include -ID:\Anaconda3.5\include -ID:\Anaconda3.5\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\include" /TcF:\light-weight-refinenet-master/src/miou_utils.c /Fobuild\temp.win-amd64-3.6\Release\light-weight-refinenet-master/src/miou_utils.obj
miou_utils.c
d:\anaconda3.5\include\pyconfig.h(59): fatal error C1083: cannot open include file: 'io.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64\cl.exe' failed with exit status 2

About NYUD label

I see that you ignore label '255' in your code, and num_classes for NYUD is 40. So label '0' means 'wall' and '39' means 'otherprop', right? And the unlabeled class is '255'?

What is the model of cmap?

What is the model of cmap?
for mname, mnet in six.iteritems(models):
print(mnet(img_inp).shape)
segm = cmap[segm.argmax(axis=2).astype(np.uint8)]

torch.Size([1, 40, 117, 157]) - are 117 and 157 the height and width of the picture?
I want to draw the border of each category, but I still don't understand the output of the model rf_lw50.

Trying to convert a .pth.tar checkpoint to an ONNX file, please help!

amax@amax:/data/yh/light-weight-refinenet$ python make_onnx.py
Traceback (most recent call last):
File "make_onnx.py", line 17, in
torch.onnx.export(segmenter,dummy_input,"yh_refinenet.onnx",verbose=True)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/init.py", line 25, in export
return utils.export(*args, **kwargs)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 131, in export
strip_doc_string=strip_doc_string)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 363, in _export
_retain_param_name, do_constant_folding)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 278, in _model_to_graph
_disable_torch_constant_prop=_disable_torch_constant_prop)
File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/onnx/utils.py", line 183, in _optimize_graph
torch._C._jit_pass_lower_all_tuples(graph)
RuntimeError: tuple appears in op that does not forward tuples (VisitNode at /opt/conda/conda-bld/pytorch_1556653099582/work/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fd313353dc5 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xadf7d0 (0x7fd30e4777d0 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #2: + 0xadfa34 (0x7fd30e477a34 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::jit::LowerAllTuples(std::shared_ptrtorch::jit::Graph&) + 0x13 (0x7fd30e477a73 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: + 0x3f59a4 (0x7fd3426259a4 in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x12ce4a (0x7fd34235ce4a in /home/amax/anaconda3/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #36: __libc_start_main + 0xf0 (0x7fd349179830 in /lib/x86_64-linux-gnu/libc.so.6)

My Python script:

import torch
from models.resnet import rf_lw50
from torch.autograd import Variable

file = 'ckpt/checkpoint.pth.tar'

segmenter = rf_lw50(15, False)
segmenter = torch.nn.DataParallel(segmenter)

ckpt = torch.load(file)
segmenter.load_state_dict(ckpt['segmenter'])
segmenter.cuda()
segmenter.eval()

dummy_input = Variable(torch.randn(1,3,360,640)).cuda()

torch.onnx.export(segmenter,dummy_input,"yh_refinenet.onnx",verbose=True)

How to calculate the FLOPs of a given model?

Hi, in the paper you compare the FLOPs of different models, but I am confused about how to obtain the FLOPs of a model - do you use some script, or something else? Thank you very much.

Reproducing PASCAL VOC

Hi, I want to reproduce your results on PASCAL VOC by training on PASCAL. Could you guide me on the setup you used - basically, what was your config.py? I am assuming the config.py you are providing now is the one for NYUv2 training. Could you also add the loader for PASCAL VOC?

Thanks

train my own data

Hi DrSleep,
I want to train the model on my own dataset with just 2 classes. When I changed the parameters in config.py, an error occurred as follows:

size mismatch for module.clf_conv.bias: copying a param with shape torch.Size([40]) from checkpoint, the shape in current model is torch.Size([2]).

In addition, my label images are binarized images - is that the reason the error occurred?

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (3,2).

Hoping for your help, thanks very much.

Training error in stage 2

Your work is great!
However, when I train on my own data, I get an error in training stage 2:

`INFO:main: Val epoch: 219 Mean IoU: 1.000
INFO:main: Train epoch: 220 [0/44] Avg. Loss: 0.000 Avg. Time: 0.311
INFO:main: Train epoch: 220 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 220 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 220 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 220 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 221 [0/44] Avg. Loss: 0.000 Avg. Time: 0.296
INFO:main: Train epoch: 221 [10/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 221 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 221 [40/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 222 [0/44] Avg. Loss: 0.000 Avg. Time: 0.277
INFO:main: Train epoch: 222 [10/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [20/44] Avg. Loss: 0.000 Avg. Time: 0.264
INFO:main: Train epoch: 222 [30/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 222 [40/44] Avg. Loss: 0.000 Avg. Time: 0.263
INFO:main: Train epoch: 223 [0/44] Avg. Loss: 0.000 Avg. Time: 0.303
INFO:main: Train epoch: 223 [10/44] Avg. Loss: 0.000 Avg. Time: 0.267
INFO:main: Train epoch: 223 [20/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 223 [30/44] Avg. Loss: 0.000 Avg. Time: 0.261
INFO:main: Train epoch: 223 [40/44] Avg. Loss: 0.000 Avg. Time: 0.262
INFO:main: Train epoch: 224 [0/44] Avg. Loss: 0.000 Avg. Time: 0.288
INFO:main: Train epoch: 224 [10/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [20/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [30/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Train epoch: 224 [40/44] Avg. Loss: 0.000 Avg. Time: 0.259
INFO:main: Val epoch: 224 [0/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [10/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [20/31] Mean IoU: 1.000
INFO:main: Val epoch: 224 [30/31] Mean IoU: 1.000
INFO:main: IoUs: [1. 1.]
INFO:main: Val epoch: 224 Mean IoU: 1.000

INFO:main: Train epoch: 225 [0/44] Avg. Loss: 0.000 Avg. Time: 0.316

Traceback (most recent call last):

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 429, in
main()

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 413, in main
args.freeze_bn[task_idx])

File "/home/vetec-tf/program/light-weight-refinenet/src/train.py", line 276, in train_segmenter
output = segmenter(input_var)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])

File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
raise output
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
output = module(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 203, in forward
l1 = self.layer1(x)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)

File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)

File "/home/vetec-tf/program/light-weight-refinenet/models/resnet.py", line 135, in forward
out += residual

RuntimeError: The expanded size of the tensor (1024) must match the existing size (256) at non-singleton dimension 1`

and stage 1 finished:
`INFO:main: Val epoch: 199 [0/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [10/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [20/31] Mean IoU: 1.000

INFO:main: Val epoch: 199 [30/31] Mean IoU: 1.000

INFO:main: IoUs: [1. 1.]

INFO:main: Val epoch: 199 Mean IoU: 1.000

INFO:main:Stage 1 finished, time spent 23.135min

INFO:main: Created train set = 265 examples, val set = 31 examples

INFO:main: Training Stage 2`

Can you help me? Thank you!

Beginning with epoch 1, the loss is equal to 0

Your examples/notebooks work wonderfully, thank you.
Now I would like to use our own dataset. It ran well, but the loss was too small. Like this:
Train epoch: 0 [0/512] Avg. Loss: 0.701 Avg. Time: 1.706 INFO:__main__: Train epoch: 0 [10/512] Avg. Loss: 0.589 Avg. Time: 0.434 INFO:__main__: Train epoch: 0 [20/512] Avg. Loss: 0.407 Avg. Time: 0.374
In the end, we got nothing but red in the prediction result.
Our dataset consists of medical images, and the ground truth is white (255). We set NUM_CLASSES = 2. Is that right?

train dataset

How is the training dataset organized? The files you provided are the cropped original images and segmentation maps, without annotation files.

Can you share your Cityscapes results and FPS?

Hey! I am doing some experiments on semantic segmentation of street scenes. I found that you put the Cityscapes results in your paper's supplementary materials. Can you share your Cityscapes results and FPS?

What is the highest IoU that can be reached on PASCAL VOC with this repository?

Hello, very nice to come across this repository. I think there are some small differences between your code and the paper. I also wrote a script to reproduce the paper, but I cannot reach the val mIoU of 80.3 / test mIoU of 82.0 reported by the authors. So what is the best performance achievable with this repository?
Thank you for your answer.
