
fastfcn's Introduction

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

[Project] [Paper] [arXiv] [Home]


Official implementation of FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
A faster, stronger, and lighter framework for semantic segmentation, achieving state-of-the-art performance with more than 3x acceleration.

@inproceedings{wu2019fastfcn,
  title     = {FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation},
  author    = {Wu, Huikai and Zhang, Junge and Huang, Kaiqi and Liang, Kongming and Yu, Yizhou},
  booktitle = {arXiv preprint arXiv:1903.11816},
  year = {2019}
}

Contact: Hui-Kai Wu ([email protected])

Update

2020-04-15: Inference on a single image is now supported!

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test_single_image --dataset [pcontext|ade20k] \
    --model [encnet|deeplab|psp] --jpu [JPU|JPU_X] \
    --backbone [resnet50|resnet101] [--ms] --resume {MODEL} --input-path {INPUT} --save-path {OUTPUT}
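
For example, to segment one image with an EncNet+JPU (ResNet-50) model trained on PContext (the checkpoint and image paths below are placeholders):

CUDA_VISIBLE_DEVICES=0 python -m experiments.segmentation.test_single_image --dataset pcontext \
    --model encnet --jpu JPU --backbone resnet50 \
    --resume encnet_jpu_res50_pcontext.pth.tar --input-path input.jpg --save-path output.png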

2020-04-15: A new joint upsampling module is now available!

  • --jpu [JPU|JPU_X]: JPU is the original module in the arXiv paper; JPU_X is a pyramid version of JPU.

2020-02-20: FastFCN now runs on every OS with PyTorch >= 1.1.0 and Python 3.

  • All C/C++ extensions have been replaced with pure Python implementations.

Version

  1. Original code, producing the results reported in the arXiv paper. [branch:v1.0.0]
  2. Pure PyTorch code, with torch.nn.parallel.DistributedDataParallel and torch.nn.SyncBatchNorm (see the sketch below). [branch:latest]
  3. Pure Python code. [branch:master]
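
For branch:latest, here is a minimal sketch of the distributed setup it relies on (the helper name is illustrative, and it assumes torch.distributed.init_process_group has already been called):

import torch

def wrap_for_distributed(model, local_rank):
    # Convert every BatchNorm layer to its synchronized variant, then wrap the
    # model in DistributedDataParallel pinned to the local GPU.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])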

Overview

Framework

Joint Pyramid Upsampling (JPU)
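
Below is a minimal PyTorch sketch of the joint upsampling idea, written from the paper's description; the channel widths, the dilation rates (1, 2, 4, 8), and the class names are assumptions for illustration, not the repository's exact implementation (the real module lives in encoding/nn/customize.py):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):
    # 3x3 depthwise-separable convolution with a configurable dilation rate.
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class JPUSketch(nn.Module):
    # Fuse the last three backbone feature maps (strides 8/16/32) into a
    # single stride-8 map using parallel dilated separable convolutions.
    def __init__(self, in_channels=(512, 1024, 2048), width=512):
        super().__init__()
        self.reduce = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, width, 3, padding=1, bias=False),
                          nn.BatchNorm2d(width), nn.ReLU(inplace=True))
            for c in in_channels])
        self.dilated = nn.ModuleList([
            SeparableConv2d(3 * width, width, d) for d in (1, 2, 4, 8)])

    def forward(self, c3, c4, c5):
        feats = [conv(x) for conv, x in zip(self.reduce, (c3, c4, c5))]
        size = feats[0].shape[2:]  # stride-8 resolution
        feats = [feats[0]] + [F.interpolate(f, size, mode='bilinear',
                                            align_corners=True)
                              for f in feats[1:]]
        x = torch.cat(feats, dim=1)                        # 3*width channels
        return torch.cat([d(x) for d in self.dilated], 1)  # 4*width channels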

Install

  1. PyTorch >= 1.1.0 (Note: the code is tested with python=3.6, cuda=9.0)
  2. Download FastFCN
    git clone https://github.com/wuhuikai/FastFCN.git
    cd FastFCN
    
  3. Install Requirements
    nose
    tqdm
    scipy
    cython
    requests
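
    These can be installed with pip, for example:

    pip install nose tqdm scipy cython requests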
    

Train and Test

PContext

python -m scripts.prepare_pcontext
| Method | Backbone | mIoU | FPS | Model | Scripts |
| --- | --- | --- | --- | --- | --- |
| EncNet | ResNet-50 | 49.91 | 18.77 | | |
| EncNet+JPU (ours) | ResNet-50 | 51.05 | 37.56 | GoogleDrive | bash |
| PSP | ResNet-50 | 50.58 | 18.08 | | |
| PSP+JPU (ours) | ResNet-50 | 50.89 | 28.48 | GoogleDrive | bash |
| DeepLabV3 | ResNet-50 | 49.19 | 15.99 | | |
| DeepLabV3+JPU (ours) | ResNet-50 | 50.07 | 20.67 | GoogleDrive | bash |
| EncNet | ResNet-101 | 52.60 (MS) | 10.51 | | |
| EncNet+JPU (ours) | ResNet-101 | 54.03 (MS) | 32.02 | GoogleDrive | bash |

ADE20K

python -m scripts.prepare_ade20k

Training Set

| Method | Backbone | mIoU (MS) | Model | Scripts |
| --- | --- | --- | --- | --- |
| EncNet | ResNet-50 | 41.11 | | |
| EncNet+JPU (ours) | ResNet-50 | 42.75 | GoogleDrive | bash |
| EncNet | ResNet-101 | 44.65 | | |
| EncNet+JPU (ours) | ResNet-101 | 44.34 | GoogleDrive | bash |

Training Set + Val Set

| Method | Backbone | FinalScore (MS) | Model | Scripts |
| --- | --- | --- | --- | --- |
| EncNet+JPU (ours) | ResNet-50 | | GoogleDrive | bash |
| EncNet | ResNet-101 | 55.67 | | |
| EncNet+JPU (ours) | ResNet-101 | 55.84 | GoogleDrive | bash |

Note: EncNet (ResNet-101) is trained with crop_size=576, while EncNet+JPU (ResNet-101) is trained with crop_size=480 so that 4 images fit on a 12 GB GPU.

Visual Results

Side-by-side comparisons of Input, GT, EncNet, and Ours on PContext and ADE20K. [Figures not preserved in this mirror.]

Acknowledgement

Code borrows heavily from PyTorch-Encoding.

fastfcn's People

Contributors

rjt1990, siju-samuel, wuhuikai


fastfcn's Issues

Why can I still train the model after removing the JPU module?

Why does the code still execute without error when I delete the JPU module (/FastFCN/encoding/nn/customize.py)? I can still train the model.
This is my command (I did load the JPU module): CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --dataset pcontext --model encnet --jpu --aux --se-loss --backbone resnet101 --checkname encnet_res101_pcontext

Do I have to compile PyTorch from source?

python scripts/prepare_pcontext.py
/home/yulu/anaconda3/envs/fastfcn/lib/python3.5/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Downloading /home/yulu/.encoding/data/downloads/VOCtrainval_03-May-2010.tar from http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar...
16%|█████████████████████▊ | 215923/1313517 [02:09<04:19, 4228.91KB/s]

Does this warning matter?
My PyTorch version: 1.1.0
Python: 3.5

Pascal Context: Is the mask in train.pth and val.pth identical to the original mask?

I mean the class order in the mask. In the code:
mapping = np.sort(np.array([
0, 2, 259, 260, 415, 324, 9, 258, 144, 18, 19, 22,
23, 397, 25, 284, 158, 159, 416, 33, 162, 420, 454, 295, 296,
427, 44, 45, 46, 308, 59, 440, 445, 31, 232, 65, 354, 424,
68, 326, 72, 458, 34, 207, 80, 355, 85, 347, 220, 349, 360,
98, 187, 104, 105, 366, 189, 368, 113, 115]))
But I found the mapping in train.pth is:
mapping = [0, 2, 9, 18, 19, 22, 23, 25, 31, 33, 34, 44, 45,
46, 59, 65, 68, 72, 80, 85, 98, 104, 105, 113, 115, 144,
158, 159, 162, 187, 189, 207, 220, 232, 258, 259, 260, 284, 295,
296, 308, 324, 326, 347, 349, 354, 355, 360, 366, 368, 397, 415,
416, 420, 424, 427, 440, 445, 454, 458]
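
For what it's worth, the two lists contain exactly the same 60 values: the code wraps the hard-coded list in np.sort, so the mapping stored in train.pth is simply its sorted version. A quick standalone check (a sketch, not code from the repo):

import numpy as np

raw = np.array([0, 2, 259, 260, 415, 324, 9, 258, 144, 18, 19, 22,
                23, 397, 25, 284, 158, 159, 416, 33, 162, 420, 454, 295, 296,
                427, 44, 45, 46, 308, 59, 440, 445, 31, 232, 65, 354, 424,
                68, 326, 72, 458, 34, 207, 80, 355, 85, 347, 220, 349, 360,
                98, 187, 104, 105, 366, 189, 368, 113, 115])
train_pth = np.array([0, 2, 9, 18, 19, 22, 23, 25, 31, 33, 34, 44, 45,
                      46, 59, 65, 68, 72, 80, 85, 98, 104, 105, 113, 115, 144,
                      158, 159, 162, 187, 189, 207, 220, 232, 258, 259, 260, 284, 295,
                      296, 308, 324, 326, 347, 349, 354, 355, 360, 366, 368, 397, 415,
                      416, 420, 424, 427, 440, 445, 454, 458])
assert np.array_equal(np.sort(raw), train_pth)  # passes: same values, just sorted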

Inference on real-world images

Can you guide me on how to run inference on outside images? You used the PyTorch-Encoding repo for data loading and preprocessing, so can you explain how to use it for inference?

About s-conv

Is s-conv in the paper a dilated convolution? I see a dilation parameter in the code.

Could you release the exact network structure?

I am very interested in JPU and am trying to reproduce it. In the paper I could not find the exact number of layers (or channels) in JPU, and I could not find the implementation details in the code either. Could you share the exact network structure? That would also make it easier for others to cite the JPU method.

Should we use Ninja before running "python scripts/prepare_ade20k.py"? Can you help me? Thank you!

Traceback (most recent call last):
File "scripts/prepare_ade20k.py", line 6, in <module>
from encoding.utils import download, mkdir
File "E:\PycharmProject\venv\lib\site-packages\encoding\__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "E:\PycharmProject\venv\lib\site-packages\encoding\nn\__init__.py", line 12, in <module>
from .syncbn import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\nn\syncbn.py", line 23, in <module>
from ..functions import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\functions\__init__.py", line 2, in <module>
from .syncbn import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\functions\syncbn.py", line 13, in <module>
from .. import lib
File "E:\PycharmProject\venv\lib\site-packages\encoding\lib\__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
is_python_module)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 814, in _jit_compile
with_cuda=with_cuda)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 837, in _write_ninja_file_and_build
verify_ninja_availability()
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 875, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

Package import problem

Running train.py, import encoding.utils as utils raises ModuleNotFoundError: No module named 'encoding'. After changing it to import FastFCN-master.encoding.utils as utils, I get:
File "D:\Deep learning\FastFCN\encoding\__init__.py", line 12, in <module>
from .version import version
ModuleNotFoundError: No module named 'FastFCN.encoding.version'
What does version in __init__.py refer to, and how can I fix this?

AttributeError: 'NoneType' object has no attribute 'run_slave'

Hi, thanks for helping me solve the ninja issue. But a new issue has come up:

/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 240
0%| | 0/185 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0100, previous best = 0.0000
Traceback (most recent call last):
File "train.py", line 180, in <module>
trainer.training(epoch)
File "train.py", line 110, in training
outputs = self.model(image)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/models/deeplabv3.py", line 22, in forward
_, _, c3, c4 = self.base_forward(x)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/models/base.py", line 55, in base_forward
x = self.pretrained.conv1(x)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 58, in forward
mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

Can you tell me how to solve this?

core dumped issue

Dear authors, I ran the training script and hit a segmentation fault, as shown below. [screenshot]

What do you think causes this issue? Thanks.

performance on deeplab_jpu

Hi,
Thank you for the awesome code.
I tested deeplab+jpu without changing anything, on 4x GeForce 1080, CUDA 9.0, torch 1.0.0.
#train
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --dataset pcontext --model deeplab --jpu --aux --backbone resnet50 --checkname deeplab_res50_pcontext_deeplabv3
#test
CUDA_VISIBLE_DEVICES=4,5,6,7 python test.py --dataset pcontext --model deeplab --jpu --aux --backbone resnet50 --resume ./runs/pcontext/deeplab/deeplab_res50_pcontext_deeplabv3/model_best.pth.tar --checkname deeplab_res50_pcontext_deeplabv3 --split val --mode testval
The model_best.pth.tar is the same as checkpoint.pth.tar in my case.
The performance I get is pixAcc: 0.7868, mIoU: 0.4904. Compared to Table 1, there is a 1% drop (50.07 in Table 1).

Am I missing something, or is this normal?

Thank you

Training does not start

Hi, I tried running the train.py script for both the PContext and ADE20K datasets after following all the instructions (setup.py, prepare_ade20k.py/prepare_pcontext.py, detail-api). However, training does not start, or at least I can't see any progress. For context, I'm running this in a Horovod docker container (https://github.com/horovod/horovod/blob/master/Dockerfile). I only get the following log and nothing more:

root@ml:/path/# CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ade20k --model encnet --jpu --aux --se-loss --backbone resnet50 --checkname encnet_res50_ade20k_train
Namespace(aux=True, aux_weight=0.2, backbone='resnet50', base_size=520, batch_size=128, checkname='encnet_res50_ade20k_train', crop_size=480, cuda=True, dataset='ade20k', dilated=False, epochs=120, ft=False, jpu=True, lateral=False, lr=0.08, lr_scheduler='poly', mode='testval', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume=None, save_folder='results', se_loss=True, se_weight=0.2, seed=1, split='val', start_epoch=0, test_batch_size=128, train_split='train', weight_decay=0.0001, workers=16)
BaseDataset: base_size 520, crop_size 480
len(img_paths): 20210
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 120
0%| | 0/157 [00:00<?, ?it/s]Training at 0

=>Epoches 0, learning rate = 0.0800, previous best = 0.0000

ml:436:436 [3] misc/ibvwrap.cu:63 NCCL WARN Failed to open libibverbs.so[.1]
ml:436:436 [3] NCCL INFO Using internal Network Socket
NCCL version 2.3.7+cuda9.0
ml:436:436 [3] NCCL INFO nranks 4
ml:436:436 [0] NCCL INFO comm 0x188aaea80 rank 0 nranks 4
ml:436:436 [1] NCCL INFO comm 0x188ab0b80 rank 1 nranks 4
ml:436:436 [2] NCCL INFO comm 0x188aa8f10 rank 2 nranks 4
ml:436:436 [3] NCCL INFO comm 0x188aab010 rank 3 nranks 4
ml:436:436 [0] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ml:436:436 [0] NCCL INFO NET : Using interface eno49:10.102.140.115<0>
ml:436:436 [0] NCCL INFO NET/Socket : 2 interfaces found
ml:436:436 [0] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [0] NCCL INFO CUDA Dev 0, IP Interfaces : lo(SOC) eno49(PHB)
ml:436:436 [1] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [1] NCCL INFO CUDA Dev 1, IP Interfaces : lo(SOC) eno49(PHB)
ml:436:436 [2] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [2] NCCL INFO CUDA Dev 2, IP Interfaces : lo(SOC) eno49(SOC)
ml:436:436 [3] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [3] NCCL INFO CUDA Dev 3, IP Interfaces : lo(SOC) eno49(SOC)
ml:436:436 [3] NCCL INFO Using 256 threads
ml:436:436 [3] NCCL INFO Min Comp Cap 5
ml:436:436 [3] NCCL INFO Ring 00 : 0 1 2 3
ml:436:436 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/direct pointer
ml:436:436 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via direct shared memory
ml:436:436 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via P2P/direct pointer
ml:436:436 [3] NCCL INFO Ring 00 : 3[3] -> 0[0] via direct shared memory
ml:436:436 [0] NCCL INFO Launch mode Group/Stream

CPU and memory utilization show high usage, and GPU utilization is 100%. Any thoughts? Thanks!

DeepLabV3+JPU pre-trained model download error

The pre-trained model on Google Drive is 488 MB [screenshot], but my download comes out as 511.9 MB [screenshot].
When I extract "deeplab_jpu_res50_pcontext.pth.tar" with The Unarchiver, it reports corrupted data [screenshot].
macOS's built-in extractor instead produces a new archive named "deeplab_jpu_res50_pcontext.pth.tar.cpgz"; extracting that generates "deeplab_jpu_res50_pcontext.pth.tar" again, looping endlessly.
Hello, could you provide a Baidu Cloud link instead? Thanks.

About setup.py

Hi!
Do I have to rerun python setup.py install in the /FastFCN/ directory every time the code changes?
Thanks.

ModuleNotFoundError: No module named 'detail'

Hi, I was trying to run the test script

CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset pcontext \
    --model encnet --jpu --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode test

Then this error popped up:

  File "[...]/FastFCN/encoding/datasets/pcontext.py", line 25, in __init__
    from detail import Detail
ModuleNotFoundError: No module named 'detail'

Are we missing a detail.py module here?

How can I reproduce the EncNet results?

Hi, how can I reproduce the EncNet results?

GitHub only provides scripts for EncNet+JPU, not for plain EncNet.

Is it enough to remove --jpu --aux to reproduce EncNet?

Or is there anything else that needs to change?

Thanks!

Training does not proceed

Thanks a lot for helping me solve the 'run_slave' issue. However, a new problem has appeared:

/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 240
0%| | 0/371 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0050, previous best = 0.0000
/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/functional.py:2390: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")

After this printout, the code neither continues training nor crashes; it just stops here, and no running process shows up under 'nvidia-smi'.

How can I solve this problem?

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1

When I tried to run test.py, I got this error:

C:\Users\127051\AppData\Local\Programs\Python\Python35\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 1388 --file "D:/Artificial Intelligence/Segmentation/SRC/FastFCN-master/FastFCN-master/experiments/segmentation/test.py"
pydev debugger: process 12360 is connecting

Connected to pydev debugger (build 191.6605.12)
C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py:184: UserWarning: Error checking compiler version for c++: Command 'c++' returned non-zero exit status 1
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
INFO: Could not find files for the given pattern(s).
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Artificial Intelligence/Segmentation/SRC/FastFCN-master/FastFCN-master/experiments/segmentation/test.py", line 12, in <module>
    import encoding.utils as utils
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\__init__.py", line 13, in <module>
    from . import nn, functions, dilated, parallel, utils, models, datasets
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\nn\__init__.py", line 12, in <module>
    from .syncbn import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\nn\syncbn.py", line 23, in <module>
    from ..functions import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\functions\__init__.py", line 2, in <module>
    from .syncbn import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\functions\syncbn.py", line 13, in <module>
    from .. import lib
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\lib\__init__.py", line 14, in <module>
    ], build_directory=cpu_path, verbose=False)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
    is_python_module)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 814, in _jit_compile
    with_cuda=with_cuda)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 859, in _write_ninja_file_and_build
    with_cuda=with_cuda)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 1064, in _write_ninja_file
    'cl']).decode().split('\r\n')
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1

Process finished with exit code -1

My environment:
torch: 1.0.1
OS: Windows 10
Python: 3.5.4

Visualization

Sorry for disturbing you again. Could you give a demo showing the visualization of the ground truth or other images in PContext, such as Fig. 6 in the paper?

python scripts/prepare_ade20k.py

RuntimeError: Error building extension 'enclib_gpu': [1/3] :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/torch/csrc/api/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/TH -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/include -isystem /root/anaconda3/envs/fastFCN/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /home/fastFCN/FastFCN/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
FAILED: encoding_kernel.cuda.o
:/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/torch/csrc/api/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/TH -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/include -isystem /root/anaconda3/envs/fastFCN/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /home/fastFCN/FastFCN/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
/bin/sh: :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc: No such file or directory

How can I solve this?

What is trainval?

Thanks for your code.
You train your model on the training and val datasets. Does that mean the validation set also joins the training data? [screenshot]

I don't understand the meaning of trainval. Waiting for your reply.

Code implementation

I would like to ask which file implements the convolutional network shown in the figure in the paper.

Some questions about the paper

According to the procedure described at the start of Section 3.3.3, h should be learned from ys and ym0, but Section 3.3.3 does not explain how ys is obtained.

The paper says the feature maps at three scales are upsampled; how exactly is that upsampling done? And how is the joint upsampling through h actually learned?

From Section 3.3.3 onward, the concrete operations feel disconnected from the preceding content.

NVIDIA driver on your system is too old

Hi, can you tell me how to solve this problem:

Traceback (most recent call last):
File "train.py", line 182, in <module>
trainer.validation(epoch)
File "train.py", line 149, in validation
correct, labeled, inter, union = eval_batch(self.model, image, target)
File "train.py", line 134, in eval_batch
target = target.cuda()
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
_check_driver()
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 108, in _check_driver
of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError:
The NVIDIA driver on your system is too old (found version 9000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

About JPU

Hello, why do I see no improvement in speed or runtime after replacing the dilated conv layers of my own network with JPU?

problem in testing

I used this command for testing:
CUDA_VISIBLE_DEVICES=4,5,6,7 python test.py --dataset ade20k --model encnet --jpu --aux --se-loss --backbone resnet101 --resume 'runs/ade20k/encnet/encnet_res50_ade20k_train/model_best.pth.tar' --split test --mode test

It showed this error:

Namespace(aux=True, aux_weight=0.2, backbone='resnet101', base_size=520, batch_size=16, checkname='default', crop_size=480, cuda=True, dataset='ade20k', dilated=False, epochs=120, ft=False, jpu=True, lateral=False, lr=0.01, lr_scheduler='poly', mode='test', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume='runs/ade20k/encnet/encnet_res50_ade20k_train/model_best.pth.tar', save_folder='results', se_loss=True, se_weight=0.2, seed=1, split='test', start_epoch=0, test_batch_size=16, train_split='train', weight_decay=0.0001, workers=16)
Traceback (most recent call last):
File "test.py", line 91, in <module>
test(args)
File "test.py", line 57, in test
model.load_state_dict(checkpoint['state_dict'])
File "/home/akmmrahman/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for EncNet: Missing key(s) in state_dict: "pretrained.layer3.6.conv1.weight", "pretrained.layer3.6.bn1.weight", "pretrained.layer3.6.bn1.bias", "pretrained.layer3.6.bn1.running_mean", "pretrained.layer3.6.bn1.running_var", "pretrained.layer3.6.conv2.weight", "pretrained.layer3.6.bn2.weight", "pretrained.layer3.6.bn2.bias", "pretrained.layer3.6.bn2.running_mean", "pretrained.layer3.6.bn2.running_var", "pretrained.layer3.6.conv3.weight", "pretrained.layer3.6.bn3.weight", "pretrained.layer3.6.bn3.bias", "pretrained.layer3.6.bn3.running_mean", "pretrained.layer3.6.bn3.running_var", "pretrained.layer3.7.conv1.weight", "pretrained.layer3.7.bn1.weight", "pretrained.layer3.7.bn1.bias", "pretrained.layer3.7.bn1.running_mean", "pretrained.layer3.7.bn1.running_var", "pretrained.layer3.7.conv2.weight", "pretrained.layer3.7.bn2.weight", "pretrained.layer3.7.bn2.bias", "pretrained.layer3.7.bn2.running_mean", "pretrained.layer3.7.bn2.running_var", "pretrained.layer3.7.conv3.weight", "pretrained.layer3.7.bn3.weight", "pretrained.layer3.7.bn3.bias", "pretrained.layer3.7.bn3.running_mean", "pretrained.layer3.7.bn3.running_var", "pretrained.layer3.8.conv1.weight", "pretrained.layer3.8.bn1.weight", "pretrained.layer3.8.bn1.bias", "pretrained.layer3.8.bn1.running_mean", "pretrained.layer3.8.bn1.running_var", "pretrained.layer3.8.conv2.weight", "pretrained.layer3.8.bn2.weight", "pretrained.layer3.8.bn2.bias", "pretrained.layer3.8.bn2.running_mean", "pretrained.layer3.8.bn2.running_var", "pretrained.layer3.8.conv3.weight", "pretrained.layer3.8.bn3.weight", "pretrained.layer3.8.bn3.bias", "pretrained.layer3.8.bn3.running_mean", "pretrained.layer3.8.bn3.running_var", "pretrained.layer3.9.conv1.weight", "pretrained.layer3.9.bn1.weight", "pretrained.layer3.9.bn1.bias", "pretrained.layer3.9.bn1.running_mean", "pretrained.layer3.9.bn1.running_var", "pretrained.layer3.9.conv2.weight", "pretrained.layer3.9.bn2.weight", "pretrained.layer3.9.bn2.bias", "pretrained.layer3.9.bn2.running_mean", "pretrained.layer3.9.bn2.running_var", "pretrained.layer3.9.conv3.weight", "pretrained.layer3.9.bn3.weight", "pretrained.layer3.9.bn3.bias", "pretrained.layer3.9.bn3.running_mean", "pretrained.layer3.9.bn3.running_var", "pretrained.layer3.10.conv1.weight", "pretrained.layer3.10.bn1.weight", "pretrained.layer3.10.bn1.bias", "pretrained.layer3.10.bn1.running_mean", "pretrained.layer3.10.bn1.running_var", "pretrained.layer3.10.conv2.weight", "pretrained.layer3.10.bn2.weight", "pretrained.layer3.10.bn2.bias", "pretrained.layer3.10.bn2.running_mean", "pretrained.layer3.10.bn2.running_var",
"pretrained.layer3.10.conv3.weight", "pretrained.layer3.10.bn3.weight", "pretrained.layer3.10.bn3.bias", "pretrained.layer3.10.bn3.running_mean", "pretrained.layer3.10.bn3.running_var", "pretrained.layer3.11.conv1.weight", "pretrained.layer3.11.bn1.weight", "pretrained.layer3.11.bn1.bias", "pretrained.layer3.11.bn1.running_mean", "pretrained.layer3.11.bn1.running_var", "pretrained.layer3.11.conv2.weight", "pretrained.layer3.11.bn2.weight", "pretrained.layer3.11.bn2.bias", "pretrained.layer3.11.bn2.running_mean", "pretrained.layer3.11.bn2.running_var", "pretrained.layer3.11.conv3.weight", "pretrained.layer3.11.bn3.weight", "pretrained.layer3.11.bn3.bias", "pretrained.layer3.11.bn3.running_mean", "pretrained.layer3.11.bn3.running_var", "pretrained.layer3.12.conv1.weight", "pretrained.layer3.12.bn1.weight", "pretrained.layer3.12.bn1.bias", "pretrained.layer3.12.bn1.running_mean", "pretrained.layer3.12.bn1.running_var", "pretrained.layer3.12.conv2.weight", "pretrained.layer3.12.bn2.weight", "pretrained.layer3.12.bn2.bias", "pretrained.layer3.12.bn2.running_mean", "pretrained.layer3.12.bn2.running_var", "pretrained.layer3.12.conv3.weight", "pretrained.layer3.12.bn3.weight", "pretrained.layer3.12.bn3.bias", "pretrained.layer3.12.bn3.running_mean", "pretrained.layer3.12.bn3.running_var", "pretrained.layer3.13.conv1.weight", "pretrained.layer3.13.bn1.weight", "pretrained.layer3.13.bn1.bias", "pretrained.layer3.13.bn1.running_mean", "pretrained.layer3.13.bn1.running_var", "pretrained.layer3.13.conv2.weight", "pretrained.layer3.13.bn2.weight", "pretrained.layer3.13.bn2.bias", "pretrained.layer3.13.bn2.running_mean", "pretrained.layer3.13.bn2.running_var", "pretrained.layer3.13.conv3.weight", "pretrained.layer3.13.bn3.weight", "pretrained.layer3.13.bn3.bias", "pretrained.layer3.13.bn3.running_mean", "pretrained.layer3.13.bn3.running_var", "pretrained.layer3.14.conv1.weight", "pretrained.layer3.14.bn1.weight", "pretrained.layer3.14.bn1.bias", "pretrained.layer3.14.bn1.running_mean", "pretrained.layer3.14.bn1.running_var", "pretrained.layer3.14.conv2.weight", "pretrained.layer3.14.bn2.weight", "pretrained.layer3.14.bn2.bias", "pretrained.layer3.14.bn2.running_mean", "pretrained.layer3.14.bn2.running_var", "pretrained.layer3.14.conv3.weight", "pretrained.layer3.14.bn3.weight", "pretrained.layer3.14.bn3.bias", "pretrained.layer3.14.bn3.running_mean", "pretrained.layer3.14.bn3.running_var", "pretrained.layer3.15.conv1.weight", "pretrained.layer3.15.bn1.weight", "pretrained.layer3.15.bn1.bias", "pretrained.layer3.15.bn1.running_mean", "pretrained.layer3.15.bn1.running_var", "pretrained.layer3.15.conv2.weight", "pretrained.layer3.15.bn2.weight", "pretrained.layer3.15.bn2.bias", "pretrained.layer3.15.bn2.running_mean", "pretrained.layer3.15.bn2.running_var", "pretrained.layer3.15.conv3.weight", "pretrained.layer3.15.bn3.weight", "pretrained.layer3.15.bn3.bias", "pretrained.layer3.15.bn3.running_mean", "pretrained.layer3.15.bn3.running_var", "pretrained.layer3.16.conv1.weight", "pretrained.layer3.16.bn1.weight", "pretrained.layer3.16.bn1.bias", "pretrained.layer3.16.bn1.running_mean", "pretrained.layer3.16.bn1.running_var", "pretrained.layer3.16.conv2.weight", "pretrained.layer3.16.bn2.weight", "pretrained.layer3.16.bn2.bias", "pretrained.layer3.16.bn2.running_mean", "pretrained.layer3.16.bn2.running_var", "pretrained.layer3.16.conv3.weight", "pretrained.layer3.16.bn3.weight", "pretrained.layer3.16.bn3.bias", "pretrained.layer3.16.bn3.running_mean", "pretrained.layer3.16.bn3.running_var", 
"pretrained.layer3.17.conv1.weight", "pretrained.layer3.17.bn1.weight", "pretrained.layer3.17.bn1.bias", "pretrained.layer3.17.bn1.running_mean", "pretrained.layer3.17.bn1.running_var", "pretrained.layer3.17.conv2.weight", "pretrained.layer3.17.bn2.weight", "pretrained.layer3.17.bn2.bias", "pretrained.layer3.17.bn2.running_mean", "pretrained.layer3.17.bn2.running_var", "pretrained.layer3.17.conv3.weight", "pretrained.layer3.17.bn3.weight", "pretrained.layer3.17.bn3.bias", "pretrained.layer3.17.bn3.running_mean", "pretrained.layer3.17.bn3.running_var", "pretrained.layer3.18.conv1.weight", "pretrained.layer3.18.bn1.weight", "pretrained.layer3.18.bn1.bias", "pretrained.layer3.18.bn1.running_mean", "pretrained.layer3.18.bn1.running_var", "pretrained.layer3.18.conv2.weight", "pretrained.layer3.18.bn2.weight", "pretrained.layer3.18.bn2.bias", "pretrained.layer3.18.bn2.running_mean", "pretrained.layer3.18.bn2.running_var", "pretrained.layer3.18.conv3.weight", "pretrained.layer3.18.bn3.weight", "pretrained.layer3.18.bn3.bias", "pretrained.layer3.18.bn3.running_mean", "pretrained.layer3.18.bn3.running_var", "pretrained.layer3.19.conv1.weight", "pretrained.layer3.19.bn1.weight", "pretrained.layer3.19.bn1.bias", "pretrained.layer3.19.bn1.running_mean", "pretrained.layer3.19.bn1.running_var", "pretrained.layer3.19.conv2.weight", "pretrained.layer3.19.bn2.weight", "pretrained.layer3.19.bn2.bias", "pretrained.layer3.19.bn2.running_mean", "pretrained.layer3.19.bn2.running_var", "pretrained.layer3.19.conv3.weight", "pretrained.layer3.19.bn3.weight", "pretrained.layer3.19.bn3.bias", "pretrained.layer3.19.bn3.running_mean", "pretrained.layer3.19.bn3.running_var", "pretrained.layer3.20.conv1.weight", "pretrained.layer3.20.bn1.weight", "pretrained.layer3.20.bn1.bias", "pretrained.layer3.20.bn1.running_mean", "pretrained.layer3.20.bn1.running_var", "pretrained.layer3.20.conv2.weight", "pretrained.layer3.20.bn2.weight", "pretrained.layer3.20.bn2.bias", "pretrained.layer3.20.bn2.running_mean", "pretrained.layer3.20.bn2.running_var", "pretrained.layer3.20.conv3.weight", "pretrained.layer3.20.bn3.weight", "pretrained.layer3.20.bn3.bias", "pretrained.layer3.20.bn3.running_mean", "pretrained.layer3.20.bn3.running_var", "pretrained.layer3.21.conv1.weight", "pretrained.layer3.21.bn1.weight", "pretrained.layer3.21.bn1.bias", "pretrained.layer3.21.bn1.running_mean", "pretrained.layer3.21.bn1.running_var", "pretrained.layer3.21.conv2.weight", "pretrained.layer3.21.bn2.weight", "pretrained.layer3.21.bn2.bias", "pretrained.layer3.21.bn2.running_mean", "pretrained.layer3.21.bn2.running_var", "pretrained.layer3.21.conv3.weight", "pretrained.layer3.21.bn3.weight", "pretrained.layer3.21.bn3.bias", "pretrained.layer3.21.bn3.running_mean", "pretrained.layer3.21.bn3.running_var", "pretrained.layer3.22.conv1.weight", "pretrained.layer3.22.bn1.weight", "pretrained.layer3.22.bn1.bias", "pretrained.layer3.22.bn1.running_mean", "pretrained.layer3.22.bn1.running_var", "pretrained.layer3.22.conv2.weight", "pretrained.layer3.22.bn2.weight", "pretrained.layer3.22.bn2.bias", "pretrained.layer3.22.bn2.running_mean", "pretrained.layer3.22.bn2.running_var", "pretrained.layer3.22.conv3.weight", "pretrained.layer3.22.bn3.weight", "pretrained.layer3.22.bn3.bias", "pretrained.layer3.22.bn3.running_mean", "pretrained.layer3.22.bn3.running_var".

Test a single image

After training is complete, how do I use the trained model to segment a single image in a specific folder?

Error when I run prepare_pcontext.py

[yulu@yq01-gpu-255-126-19-00 FastFCN]$ python scripts/prepare_pcontext.py
/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Traceback (most recent call last):
File "scripts/prepare_pcontext.py", line 7, in <module>
from encoding.utils import download, mkdir
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .syncbn import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/nn/syncbn.py", line 23, in <module>
from ..functions import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .syncbn import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/functions/syncbn.py", line 13, in <module>
from .. import lib
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/lib/__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 645, in load
is_python_module)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
with_cuda=with_cuda)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 863, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 946, in _build_extension_module
check=True)
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 453, in run
with Popen(*popenargs, **kwargs) as process:
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 756, in __init__
restore_signals, start_new_session)
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 1499, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ninja': 'ninja'

ninja problem

I downloaded the citys dataset and put it in home/user/.encoding/data.
Then I ran the command 'CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset citys --model deeplab --jpu --aux --backbone resnet50 --checkname deeplab_res50_citys'
and this issue came up:

Traceback (most recent call last):
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 16, in <module>
import encoding.utils as utils
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .syncbn import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 23, in <module>
from ..functions import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .syncbn import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/functions/syncbn.py", line 13, in <module>
from .. import lib
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
_build_extension_module(name, build_directory)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_cpu': [1/2] c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/TH -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC -I/home/weizhaoxiang/anaconda3/envs/pytorch/include/python3.6m -fPIC -std=c++11 -c /home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
FAILED: syncbn_cpu.o
c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/TH -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC -I/home/weizhaoxiang/anaconda3/envs/pytorch/include/python3.6m -fPIC -std=c++11 -c /home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp:1:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.

How can I solve this problem?

Performance problem when training ADE20K

When I train ADE20K with crop_size=520, I get worse performance than with the original crop_size of 480. Do you have any idea why this happens? Should the learning rate be increased?

Segmentation fault

I think this problem is caused by my earlier PyTorch problem, so maybe I have to fix PyTorch first. Could you give me some help?
gcc: 4.8
pytorch: 1.1.0
python: 3.5
Also, how can I change the PyTorch version to 1.0.0? pip install torch==1.0?

How to set the learning rate for a single GPU, and performance when training with a small batch size

I am using a single GPU, so my batch_size == 2.

  1. Should I use the default learning rate setting shown below?
    args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size
    The lr seems to become very small.

  2. What does the 16 in the code above mean? (See the sketch after this question.)

  3. Have you ever trained with a very small batch_size?
    For me, after 80 epochs with the default lr setting and batch_size 2, mIoU is about 0.33.
    It may be that with a small batch size, 80 epochs is not enough for good convergence. But if you have experience with a single GPU (small batch size), it would be great to discuss.

Thank you for your code; any help is appreciated.
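
For reference, here is a minimal Python sketch of the linear-scaling rule that line implements, under my reading of it (the 16 is the total batch size the per-dataset default learning rates were tuned for; the default values below are illustrative, not the repository's):

# Linear LR scaling: lr grows/shrinks proportionally with the actual batch size.
BASE_BATCH_SIZE = 16
default_lrs = {'pcontext': 0.01, 'ade20k': 0.01}  # illustrative values only

def scaled_lr(dataset, batch_size):
    return default_lrs[dataset.lower()] / BASE_BATCH_SIZE * batch_size

print(scaled_lr('pcontext', 2))  # 0.00125 -- very small for batch_size=2, as observed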

Training on a custom dataset

Hello! Will 12 GB on a single GPU be enough to train your model on a small dataset? And could you publish your pretrained model (any dataset would do)? It would be great.

Size of the predicted map

In Fig. 2 of your paper, it looks to me like the final predicted map is the same size as the input rather than 8x smaller.
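
For reference, models in this family predict at 1/8 of the input resolution and bilinearly upsample the logits back to the input size for the loss and evaluation; a minimal sketch with illustrative shapes (not the repository's exact code):

import torch
import torch.nn.functional as F

logits_1_8 = torch.randn(1, 59, 60, 60)  # head output at 1/8 of a 480x480 input
logits_full = F.interpolate(logits_1_8, size=(480, 480),
                            mode='bilinear', align_corners=True)
print(logits_full.shape)  # torch.Size([1, 59, 480, 480])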

AttributeError: 'NoneType' object has no attribute 'run_slave'

Hi!
I am not sure what is causing the following error, although I worry that my CUDA version is wrong or my GPU may not be suitable.

I am running CUDA 10 on a GTX 1060.

When I run

#train
CUDA_VISIBLE_DEVICES=0 python train.py --dataset pcontext \
    --model encnet --jpu --aux --se-loss \
    --backbone resnet50 --checkname encnet_res50_pcontext

I get

Traceback (most recent call last):
  File "train.py", line 180, in <module>
    trainer.training(epoch)
  File "train.py", line 110, in training
    outputs = self.model(image)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/models/encnet.py", line 33, in forward
    features = self.base_forward(x)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/models/base.py", line 55, in base_forward
    x = self.pretrained.conv1(x)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 58, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

At first I thought this was because I am not running multi-GPU, but could it be something else?

Undefined names

flake8 testing of https://github.com/wuhuikai/FastFCN on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./encoding/models/base.py:102:34: F821 undefined name 'target_gpus'
        kwargs = scatter(kwargs, target_gpus, dim) if kwargs else []
                                 ^
./encoding/models/base.py:102:47: F821 undefined name 'dim'
        kwargs = scatter(kwargs, target_gpus, dim) if kwargs else []
                                              ^
./encoding/datasets/base.py:113:44: F821 undefined name 'batch'
    raise TypeError((error_msg.format(type(batch[0]))))
                                           ^
3     F821 undefined name 'batch'
3

E901,E999,F821,F822,F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These five are different from most other flake8 issues, which are merely "style violations" -- useful for readability, but they do not affect runtime safety.

  • F821: undefined name name
  • F822: undefined name name in __all__
  • F823: local variable name referenced before assignment
  • E901: SyntaxError or IndentationError
  • E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

prepare_cityscapes.py not working

The download URLs aren't correct:
_CITY_DOWNLOAD_URLS = [('gtFine_trainvaltest.zip', '99f532cb1af174f5fcc4c5bc8feea8c66246ddbc'), ('leftImg8bit_trainvaltest.zip', '2c0b77ce9933cc635adda307fbba5566f5d9d404')]
and the download function is never called.
