
fastfcn's Introduction

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

[Project] [Paper] [arXiv] [Home]


Official implementation of FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
A faster, stronger, and lighter framework for semantic segmentation, achieving state-of-the-art performance with more than 3x acceleration.

@inproceedings{wu2019fastfcn,
  title     = {FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation},
  author    = {Wu, Huikai and Zhang, Junge and Huang, Kaiqi and Liang, Kongming and Yu, Yizhou},
  booktitle = {arXiv preprint arXiv:1903.11816},
  year = {2019}
}

Contact: Hui-Kai Wu ([email protected])

Update

2020-04-15: Inference on a single image is now supported!

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m experiments.segmentation.test_single_image --dataset [pcontext|ade20k] \
    --model [encnet|deeplab|psp] --jpu [JPU|JPU_X] \
    --backbone [resnet50|resnet101] [--ms] --resume {MODEL} --input-path {INPUT} --save-path {OUTPUT}
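
For example, to segment one image with an EncNet+JPU (ResNet-50) model trained on PContext (the checkpoint and image paths below are placeholders):

CUDA_VISIBLE_DEVICES=0 python -m experiments.segmentation.test_single_image --dataset pcontext \
    --model encnet --jpu JPU --backbone resnet50 \
    --resume encnet_jpu_res50_pcontext.pth.tar --input-path input.jpg --save-path output.png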

2020-04-15: A new joint upsampling module is now available!

  • --jpu [JPU|JPU_X]: JPU is the original module in the arXiv paper; JPU_X is a pyramid version of JPU.

2020-02-20: FastFCN now runs on every OS with PyTorch >= 1.1.0 and Python 3.

  • All C/C++ extensions have been replaced with pure Python implementations.

Version

  1. Original code, producing the results reported in the arXiv paper. [branch:v1.0.0]
  2. Pure PyTorch code, with torch.nn.parallel.DistributedDataParallel and torch.nn.SyncBatchNorm (see the sketch below). [branch:latest]
  3. Pure Python code. [branch:master]
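
For branch:latest, here is a minimal sketch of the distributed setup it relies on (the helper name is illustrative, and it assumes torch.distributed.init_process_group has already been called):

import torch

def wrap_for_distributed(model, local_rank):
    # Convert every BatchNorm layer to its synchronized variant, then wrap the
    # model in DistributedDataParallel pinned to the local GPU.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    return torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])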

Overview

Framework

Joint Pyramid Upsampling (JPU)
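
Below is a minimal PyTorch sketch of the joint upsampling idea, written from the paper's description; the channel widths, the dilation rates (1, 2, 4, 8), and the class names are assumptions for illustration, not the repository's exact implementation (the real module lives in encoding/nn/customize.py):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SeparableConv2d(nn.Module):
    # 3x3 depthwise-separable convolution with a configurable dilation rate.
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class JPUSketch(nn.Module):
    # Fuse the last three backbone feature maps (strides 8/16/32) into a
    # single stride-8 map using parallel dilated separable convolutions.
    def __init__(self, in_channels=(512, 1024, 2048), width=512):
        super().__init__()
        self.reduce = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c, width, 3, padding=1, bias=False),
                          nn.BatchNorm2d(width), nn.ReLU(inplace=True))
            for c in in_channels])
        self.dilated = nn.ModuleList([
            SeparableConv2d(3 * width, width, d) for d in (1, 2, 4, 8)])

    def forward(self, c3, c4, c5):
        feats = [conv(x) for conv, x in zip(self.reduce, (c3, c4, c5))]
        size = feats[0].shape[2:]  # stride-8 resolution
        feats = [feats[0]] + [F.interpolate(f, size, mode='bilinear',
                                            align_corners=True)
                              for f in feats[1:]]
        x = torch.cat(feats, dim=1)                        # 3*width channels
        return torch.cat([d(x) for d in self.dilated], 1)  # 4*width channels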

Install

  1. PyTorch >= 1.1.0 (Note: the code is tested with python=3.6, cuda=9.0)
  2. Download FastFCN
    git clone https://github.com/wuhuikai/FastFCN.git
    cd FastFCN
    
  3. Install Requirements
    nose
    tqdm
    scipy
    cython
    requests
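
    These can be installed with pip, for example:

    pip install nose tqdm scipy cython requests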
    

Train and Test

PContext

python -m scripts.prepare_pcontext
| Method | Backbone | mIoU | FPS | Model | Scripts |
| --- | --- | --- | --- | --- | --- |
| EncNet | ResNet-50 | 49.91 | 18.77 | | |
| EncNet+JPU (ours) | ResNet-50 | 51.05 | 37.56 | GoogleDrive | bash |
| PSP | ResNet-50 | 50.58 | 18.08 | | |
| PSP+JPU (ours) | ResNet-50 | 50.89 | 28.48 | GoogleDrive | bash |
| DeepLabV3 | ResNet-50 | 49.19 | 15.99 | | |
| DeepLabV3+JPU (ours) | ResNet-50 | 50.07 | 20.67 | GoogleDrive | bash |
| EncNet | ResNet-101 | 52.60 (MS) | 10.51 | | |
| EncNet+JPU (ours) | ResNet-101 | 54.03 (MS) | 32.02 | GoogleDrive | bash |

ADE20K

python -m scripts.prepare_ade20k

Training Set

| Method | Backbone | mIoU (MS) | Model | Scripts |
| --- | --- | --- | --- | --- |
| EncNet | ResNet-50 | 41.11 | | |
| EncNet+JPU (ours) | ResNet-50 | 42.75 | GoogleDrive | bash |
| EncNet | ResNet-101 | 44.65 | | |
| EncNet+JPU (ours) | ResNet-101 | 44.34 | GoogleDrive | bash |

Training Set + Val Set

| Method | Backbone | FinalScore (MS) | Model | Scripts |
| --- | --- | --- | --- | --- |
| EncNet+JPU (ours) | ResNet-50 | | GoogleDrive | bash |
| EncNet | ResNet-101 | 55.67 | | |
| EncNet+JPU (ours) | ResNet-101 | 55.84 | GoogleDrive | bash |

Note: EncNet (ResNet-101) is trained with crop_size=576, while EncNet+JPU (ResNet-101) is trained with crop_size=480 so that 4 images fit on a 12 GB GPU.

Visual Results

Side-by-side comparisons of Input, GT, EncNet, and Ours on PContext and ADE20K. [Figures not preserved in this mirror.]

Acknowledgement

Code borrows heavily from PyTorch-Encoding.

fastfcn's People

Contributors

rjt1990, siju-samuel, wuhuikai


fastfcn's Issues

Why can I still train the model after removing the JPU module?

Why does the code still execute without error when I delete the JPU module (/FastFCN/encoding/nn/customize.py)? I can still train the model.
This is my command (I did load the JPU module): CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --dataset pcontext --model encnet --jpu --aux --se-loss --backbone resnet101 --checkname encnet_res101_pcontext

Do I have to compile PyTorch from source?

python scripts/prepare_pcontext.py
/home/yulu/anaconda3/envs/fastfcn/lib/python3.5/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Downloading /home/yulu/.encoding/data/downloads/VOCtrainval_03-May-2010.tar from http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar...
16%|█████████████████████▊ | 215923/1313517 [02:09<04:19, 4228.91KB/s]

Does this warning matter?
My PyTorch version: 1.1.0
Python: 3.5

Pascal Context: Is the mask in train.pth and val.pth identical to the original mask?

I mean the class order in the mask. In the code:
mapping = np.sort(np.array([
0, 2, 259, 260, 415, 324, 9, 258, 144, 18, 19, 22,
23, 397, 25, 284, 158, 159, 416, 33, 162, 420, 454, 295, 296,
427, 44, 45, 46, 308, 59, 440, 445, 31, 232, 65, 354, 424,
68, 326, 72, 458, 34, 207, 80, 355, 85, 347, 220, 349, 360,
98, 187, 104, 105, 366, 189, 368, 113, 115]))
But I found the mapping in train.pth is:
mapping = [0, 2, 9, 18, 19, 22, 23, 25, 31, 33, 34, 44, 45,
46, 59, 65, 68, 72, 80, 85, 98, 104, 105, 113, 115, 144,
158, 159, 162, 187, 189, 207, 220, 232, 258, 259, 260, 284, 295,
296, 308, 324, 326, 347, 349, 354, 355, 360, 366, 368, 397, 415,
416, 420, 424, 427, 440, 445, 454, 458]
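
For what it's worth, the two lists contain exactly the same 60 values: the code wraps the hard-coded list in np.sort, so the mapping stored in train.pth is simply its sorted version. A quick standalone check (a sketch, not code from the repo):

import numpy as np

raw = np.array([0, 2, 259, 260, 415, 324, 9, 258, 144, 18, 19, 22,
                23, 397, 25, 284, 158, 159, 416, 33, 162, 420, 454, 295, 296,
                427, 44, 45, 46, 308, 59, 440, 445, 31, 232, 65, 354, 424,
                68, 326, 72, 458, 34, 207, 80, 355, 85, 347, 220, 349, 360,
                98, 187, 104, 105, 366, 189, 368, 113, 115])
train_pth = np.array([0, 2, 9, 18, 19, 22, 23, 25, 31, 33, 34, 44, 45,
                      46, 59, 65, 68, 72, 80, 85, 98, 104, 105, 113, 115, 144,
                      158, 159, 162, 187, 189, 207, 220, 232, 258, 259, 260, 284, 295,
                      296, 308, 324, 326, 347, 349, 354, 355, 360, 366, 368, 397, 415,
                      416, 420, 424, 427, 440, 445, 454, 458])
assert np.array_equal(np.sort(raw), train_pth)  # passes: same values, just sorted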

Inference on real-world images

Can you guide me on how to run inference on outside images? You used the PyTorch-Encoding repo for data loading and preprocessing, so can you explain how to use it for inference?

About s-conv

Is s-conv in the paper a dilated convolution? I see a dilation parameter in the code.

Could you release the exact network structure?

I am very interested in JPU and am trying to reproduce it. In the paper I could not find the exact number of layers (or channels) in JPU, and I could not find the implementation details in the code either. Could you share the exact network structure? That would also make it easier for others to cite the JPU method.

Should we use Ninja before running "python scripts/prepare_ade20k.py"? Can you help me? Thank you!

Traceback (most recent call last):
File "scripts/prepare_ade20k.py", line 6, in <module>
from encoding.utils import download, mkdir
File "E:\PycharmProject\venv\lib\site-packages\encoding\__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "E:\PycharmProject\venv\lib\site-packages\encoding\nn\__init__.py", line 12, in <module>
from .syncbn import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\nn\syncbn.py", line 23, in <module>
from ..functions import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\functions\__init__.py", line 2, in <module>
from .syncbn import *
File "E:\PycharmProject\venv\lib\site-packages\encoding\functions\syncbn.py", line 13, in <module>
from .. import lib
File "E:\PycharmProject\venv\lib\site-packages\encoding\lib\__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
is_python_module)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 814, in _jit_compile
with_cuda=with_cuda)
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 837, in _write_ninja_file_and_build
verify_ninja_availability()
File "E:\PycharmProject\venv\lib\site-packages\torch\utils\cpp_extension.py", line 875, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

Package import problem

Running train.py, import encoding.utils as utils raises ModuleNotFoundError: No module named 'encoding'. After changing it to import FastFCN-master.encoding.utils as utils, I get:
File "D:\Deep learning\FastFCN\encoding\__init__.py", line 12, in <module>
from .version import version
ModuleNotFoundError: No module named 'FastFCN.encoding.version'
What does version in __init__.py refer to, and how can I fix this?

AttributeError: 'NoneType' object has no attribute 'run_slave'

Hi, thanks for helping me solve the ninja issue. But a new issue has come up:

/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 240
0%| | 0/185 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0100, previous best = 0.0000
Traceback (most recent call last):
File "train.py", line 180, in <module>
trainer.training(epoch)
File "train.py", line 110, in training
outputs = self.model(image)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/models/deeplabv3.py", line 22, in forward
_, _, c3, c4 = self.base_forward(x)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/models/base.py", line 55, in base_forward
x = self.pretrained.conv1(x)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 58, in forward
mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

Can you tell me how to solve this?

core dumped issue

Dear authors, I ran the training script and hit a segmentation fault, as shown below. [screenshot]

What do you think causes this issue? Thanks.

performance on deeplab_jpu

Hi,
Thank you for the awesome code.
I tested deeplab+jpu without changing anything, on 4x GeForce 1080, CUDA 9.0, torch 1.0.0.
#train
CUDA_VISIBLE_DEVICES=4,5,6,7 python train.py --dataset pcontext --model deeplab --jpu --aux --backbone resnet50 --checkname deeplab_res50_pcontext_deeplabv3
#test
CUDA_VISIBLE_DEVICES=4,5,6,7 python test.py --dataset pcontext --model deeplab --jpu --aux --backbone resnet50 --resume ./runs/pcontext/deeplab/deeplab_res50_pcontext_deeplabv3/model_best.pth.tar --checkname deeplab_res50_pcontext_deeplabv3 --split val --mode testval
The model_best.pth.tar is the same as checkpoint.pth.tar in my case.
The performance I get is pixAcc: 0.7868, mIoU: 0.4904. Compared to Table 1, there is a 1% drop (50.07 in Table 1).

Am I missing something, or is this normal?

Thank you

Training does not start

Hi, I tried running the train.py script for both the PContext and ADE20K datasets after following all the instructions (setup.py, prepare_ade20k.py/prepare_pcontext.py, detail-api). However, training does not start, or at least I can't see any progress. For context, I'm running this in a Horovod docker container (https://github.com/horovod/horovod/blob/master/Dockerfile). I only get the following log and nothing more:

root@ml:/path/# CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset ade20k --model encnet --jpu --aux --se-loss --backbone resnet50 --checkname encnet_res50_ade20k_train
Namespace(aux=True, aux_weight=0.2, backbone='resnet50', base_size=520, batch_size=128, checkname='encnet_res50_ade20k_train', crop_size=480, cuda=True, dataset='ade20k', dilated=False, epochs=120, ft=False, jpu=True, lateral=False, lr=0.08, lr_scheduler='poly', mode='testval', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume=None, save_folder='results', se_loss=True, se_weight=0.2, seed=1, split='val', start_epoch=0, test_batch_size=128, train_split='train', weight_decay=0.0001, workers=16)
BaseDataset: base_size 520, crop_size 480
len(img_paths): 20210
/usr/local/lib/python3.5/dist-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 120
0%| | 0/157 [00:00<?, ?it/s]Training at 0

=>Epoches 0, learning rate = 0.0800, previous best = 0.0000

ml:436:436 [3] misc/ibvwrap.cu:63 NCCL WARN Failed to open libibverbs.so[.1]
ml:436:436 [3] NCCL INFO Using internal Network Socket
NCCL version 2.3.7+cuda9.0
ml:436:436 [3] NCCL INFO nranks 4
ml:436:436 [0] NCCL INFO comm 0x188aaea80 rank 0 nranks 4
ml:436:436 [1] NCCL INFO comm 0x188ab0b80 rank 1 nranks 4
ml:436:436 [2] NCCL INFO comm 0x188aa8f10 rank 2 nranks 4
ml:436:436 [3] NCCL INFO comm 0x188aab010 rank 3 nranks 4
ml:436:436 [0] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ml:436:436 [0] NCCL INFO NET : Using interface eno49:10.102.140.115<0>
ml:436:436 [0] NCCL INFO NET/Socket : 2 interfaces found
ml:436:436 [0] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [0] NCCL INFO CUDA Dev 0, IP Interfaces : lo(SOC) eno49(PHB)
ml:436:436 [1] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [1] NCCL INFO CUDA Dev 1, IP Interfaces : lo(SOC) eno49(PHB)
ml:436:436 [2] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [2] NCCL INFO CUDA Dev 2, IP Interfaces : lo(SOC) eno49(SOC)
ml:436:436 [3] NCCL INFO Could not find real path of /sys/class/net/lo/device
ml:436:436 [3] NCCL INFO CUDA Dev 3, IP Interfaces : lo(SOC) eno49(SOC)
ml:436:436 [3] NCCL INFO Using 256 threads
ml:436:436 [3] NCCL INFO Min Comp Cap 5
ml:436:436 [3] NCCL INFO Ring 00 : 0 1 2 3
ml:436:436 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/direct pointer
ml:436:436 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via direct shared memory
ml:436:436 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via P2P/direct pointer
ml:436:436 [3] NCCL INFO Ring 00 : 3[3] -> 0[0] via direct shared memory
ml:436:436 [0] NCCL INFO Launch mode Group/Stream

CPU and memory utilization show high usage, and GPU utilization is 100%. Any thoughts? Thanks!

DeepLabV3+JPU pre-trained model download error

The pre-trained model on Google Drive is 488 MB [screenshot], but my download comes out as 511.9 MB [screenshot].
When I extract "deeplab_jpu_res50_pcontext.pth.tar" with The Unarchiver, it reports corrupted data [screenshot].
macOS's built-in extractor instead produces a new archive named "deeplab_jpu_res50_pcontext.pth.tar.cpgz"; extracting that generates "deeplab_jpu_res50_pcontext.pth.tar" again, looping endlessly.
Hello, could you provide a Baidu Cloud link instead? Thanks.

About setup.py

Hi!
Do I have to rerun python setup.py install in the /FastFCN/ directory every time the code changes?
Thanks.

ModuleNotFoundError: No module named 'detail'

Hi, I was trying to run the test script

CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset pcontext \
    --model encnet --jpu --aux --se-loss \
    --backbone resnet50 --resume {MODEL} --split val --mode test

Then this error popped up:

  File "[...]/FastFCN/encoding/datasets/pcontext.py", line 25, in __init__
    from detail import Detail
ModuleNotFoundError: No module named 'detail'

Are we missing a detail.py module here?

How can I reproduce the EncNet results?

Hi, how can I reproduce the EncNet results?

GitHub only provides scripts for EncNet+JPU, not for plain EncNet.

Is it enough to remove --jpu --aux to reproduce EncNet?

Or is there anything else that needs to change?

Thanks!

Training does not proceed

Thanks a lot for helping me solve the 'run_slave' issue. However, a new problem has appeared:

/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 240
0%| | 0/371 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0050, previous best = 0.0000
/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/nn/functional.py:2390: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")

After this printout, the code neither continues training nor crashes; it just stops here, and no running process shows up under 'nvidia-smi'.

How can I solve this problem?

subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1

When I tried to run test.py, I got this error:

C:\Users\127051\AppData\Local\Programs\Python\Python35\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 1388 --file "D:/Artificial Intelligence/Segmentation/SRC/FastFCN-master/FastFCN-master/experiments/segmentation/test.py"
pydev debugger: process 12360 is connecting

Connected to pydev debugger (build 191.6605.12)
C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py:184: UserWarning: Error checking compiler version for c++: Command 'c++' returned non-zero exit status 1
  warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
INFO: Could not find files for the given pattern(s).
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1741, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.3.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Artificial Intelligence/Segmentation/SRC/FastFCN-master/FastFCN-master/experiments/segmentation/test.py", line 12, in <module>
    import encoding.utils as utils
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\__init__.py", line 13, in <module>
    from . import nn, functions, dilated, parallel, utils, models, datasets
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\nn\__init__.py", line 12, in <module>
    from .syncbn import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\nn\syncbn.py", line 23, in <module>
    from ..functions import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\functions\__init__.py", line 2, in <module>
    from .syncbn import *
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\functions\syncbn.py", line 13, in <module>
    from .. import lib
  File "D:\Artificial Intelligence\Segmentation\SRC\FastFCN-master\FastFCN-master\experiments\segmentation\encoding\lib\__init__.py", line 14, in <module>
    ], build_directory=cpu_path, verbose=False)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
    is_python_module)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 814, in _jit_compile
    with_cuda=with_cuda)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 859, in _write_ninja_file_and_build
    with_cuda=with_cuda)
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\site-packages\torch\utils\cpp_extension.py", line 1064, in _write_ninja_file
    'cl']).decode().split('\r\n')
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "C:\Users\127051\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1

Process finished with exit code -1

My environment:
torch: 1.0.1
OS: Windows 10
Python: 3.5.4

Visualization

Sorry for disturbing you again. Could you give a demo showing the visualization of the ground truth or other images in PContext, such as Fig. 6 in the paper?

python scripts/prepare_ade20k.py

RuntimeError: Error building extension 'enclib_gpu': [1/3] :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/torch/csrc/api/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/TH -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/include -isystem /root/anaconda3/envs/fastFCN/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /home/fastFCN/FastFCN/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
FAILED: encoding_kernel.cuda.o
:/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=enclib_gpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/torch/csrc/api/include -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/TH -isystem /root/anaconda3/envs/fastFCN/lib/python3.5/site-packages/torch/lib/include/THC -isystem :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/include -isystem /root/anaconda3/envs/fastFCN/include/python3.5m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -std=c++11 -c /home/fastFCN/FastFCN/encoding/lib/gpu/encoding_kernel.cu -o encoding_kernel.cuda.o
/bin/sh: :/usr/local/cuda-8.0:/root/cuda:/usr/local/cuda-9.0/bin/nvcc: No such file or directory

How can I solve this?

What is trainval?

Thanks for your code.
You train your model on the training and val datasets. Does that mean the validation set also joins the training data? [screenshot]

I don't understand the meaning of trainval. Waiting for your reply.

Code implementation

I would like to ask which file implements the convolutional network shown in the figure in the paper.

Some questions about the paper

According to the procedure described at the start of Section 3.3.3, h should be learned from ys and ym0, but Section 3.3.3 does not explain how ys is obtained.

The paper says the feature maps at three scales are upsampled; how exactly is that upsampling done? And how is the joint upsampling through h actually learned?

From Section 3.3.3 onward, the concrete operations feel disconnected from the preceding content.

NVIDIA driver on your system is too old

Hi, can you tell me how to solve this problem:

Traceback (most recent call last):
File "train.py", line 182, in <module>
trainer.validation(epoch)
File "train.py", line 149, in validation
correct, labeled, inter, union = eval_batch(self.model, image, target)
File "train.py", line 134, in eval_batch
target = target.cuda()
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
_check_driver()
File "/home/weizhaoxiang/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 108, in _check_driver
of the CUDA driver.""".format(str(torch._C._cuda_getDriverVersion())))
AssertionError:
The NVIDIA driver on your system is too old (found version 9000).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.

About JPU

Hello, why do I see no improvement in speed or runtime after replacing the dilated conv layers of my own network with JPU?

problem in testing

I used this command for testing:
CUDA_VISIBLE_DEVICES=4,5,6,7 python test.py --dataset ade20k --model encnet --jpu --aux --se-loss --backbone resnet101 --resume 'runs/ade20k/encnet/encnet_res50_ade20k_train/model_best.pth.tar' --split test --mode test

It showed this error:

Namespace(aux=True, aux_weight=0.2, backbone='resnet101', base_size=520, batch_size=16, checkname='default', crop_size=480, cuda=True, dataset='ade20k', dilated=False, epochs=120, ft=False, jpu=True, lateral=False, lr=0.01, lr_scheduler='poly', mode='test', model='encnet', model_zoo=None, momentum=0.9, ms=False, no_cuda=False, no_val=False, resume='runs/ade20k/encnet/encnet_res50_ade20k_train/model_best.pth.tar', save_folder='results', se_loss=True, se_weight=0.2, seed=1, split='test', start_epoch=0, test_batch_size=16, train_split='train', weight_decay=0.0001, workers=16)
Traceback (most recent call last):
File "test.py", line 91, in <module>
test(args)
File "test.py", line 57, in test
model.load_state_dict(checkpoint['state_dict'])
File "/home/akmmrahman/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for EncNet: Missing key(s) in state_dict: "pretrained.layer3.6.conv1.weight", "pretrained.layer3.6.bn1.weight", "pretrained.layer3.6.bn1.bias", "pretrained.layer3.6.bn1.running_mean", "pretrained.layer3.6.bn1.running_var", "pretrained.layer3.6.conv2.weight", "pretrained.layer3.6.bn2.weight", "pretrained.layer3.6.bn2.bias", "pretrained.layer3.6.bn2.running_mean", "pretrained.layer3.6.bn2.running_var", "pretrained.layer3.6.conv3.weight", "pretrained.layer3.6.bn3.weight", "pretrained.layer3.6.bn3.bias", "pretrained.layer3.6.bn3.running_mean", "pretrained.layer3.6.bn3.running_var", "pretrained.layer3.7.conv1.weight", "pretrained.layer3.7.bn1.weight", "pretrained.layer3.7.bn1.bias", "pretrained.layer3.7.bn1.running_mean", "pretrained.layer3.7.bn1.running_var", "pretrained.layer3.7.conv2.weight", "pretrained.layer3.7.bn2.weight", "pretrained.layer3.7.bn2.bias", "pretrained.layer3.7.bn2.running_mean", "pretrained.layer3.7.bn2.running_var", "pretrained.layer3.7.conv3.weight", "pretrained.layer3.7.bn3.weight", "pretrained.layer3.7.bn3.bias", "pretrained.layer3.7.bn3.running_mean", "pretrained.layer3.7.bn3.running_var", "pretrained.layer3.8.conv1.weight", "pretrained.layer3.8.bn1.weight", "pretrained.layer3.8.bn1.bias", "pretrained.layer3.8.bn1.running_mean", "pretrained.layer3.8.bn1.running_var", "pretrained.layer3.8.conv2.weight", "pretrained.layer3.8.bn2.weight", "pretrained.layer3.8.bn2.bias", "pretrained.layer3.8.bn2.running_mean", "pretrained.layer3.8.bn2.running_var", "pretrained.layer3.8.conv3.weight", "pretrained.layer3.8.bn3.weight", "pretrained.layer3.8.bn3.bias", "pretrained.layer3.8.bn3.running_mean", "pretrained.layer3.8.bn3.running_var", "pretrained.layer3.9.conv1.weight", "pretrained.layer3.9.bn1.weight", "pretrained.layer3.9.bn1.bias", "pretrained.layer3.9.bn1.running_mean", "pretrained.layer3.9.bn1.running_var", "pretrained.layer3.9.conv2.weight", "pretrained.layer3.9.bn2.weight", "pretrained.layer3.9.bn2.bias", "pretrained.layer3.9.bn2.running_mean", "pretrained.layer3.9.bn2.running_var", "pretrained.layer3.9.conv3.weight", "pretrained.layer3.9.bn3.weight", "pretrained.layer3.9.bn3.bias", "pretrained.layer3.9.bn3.running_mean", "pretrained.layer3.9.bn3.running_var", "pretrained.layer3.10.conv1.weight", "pretrained.layer3.10.bn1.weight", "pretrained.layer3.10.bn1.bias", "pretrained.layer3.10.bn1.running_mean", "pretrained.layer3.10.bn1.running_var", "pretrained.layer3.10.conv2.weight", "pretrained.layer3.10.bn2.weight", "pretrained.layer3.10.bn2.bias", "pretrained.layer3.10.bn2.running_mean", "pretrained.layer3.10.bn2.running_var",
"pretrained.layer3.10.conv3.weight", "pretrained.layer3.10.bn3.weight", "pretrained.layer3.10.bn3.bias", "pretrained.layer3.10.bn3.running_mean", "pretrained.layer3.10.bn3.running_var", "pretrained.layer3.11.conv1.weight", "pretrained.layer3.11.bn1.weight", "pretrained.layer3.11.bn1.bias", "pretrained.layer3.11.bn1.running_mean", "pretrained.layer3.11.bn1.running_var", "pretrained.layer3.11.conv2.weight", "pretrained.layer3.11.bn2.weight", "pretrained.layer3.11.bn2.bias", "pretrained.layer3.11.bn2.running_mean", "pretrained.layer3.11.bn2.running_var", "pretrained.layer3.11.conv3.weight", "pretrained.layer3.11.bn3.weight", "pretrained.layer3.11.bn3.bias", "pretrained.layer3.11.bn3.running_mean", "pretrained.layer3.11.bn3.running_var", "pretrained.layer3.12.conv1.weight", "pretrained.layer3.12.bn1.weight", "pretrained.layer3.12.bn1.bias", "pretrained.layer3.12.bn1.running_mean", "pretrained.layer3.12.bn1.running_var", "pretrained.layer3.12.conv2.weight", "pretrained.layer3.12.bn2.weight", "pretrained.layer3.12.bn2.bias", "pretrained.layer3.12.bn2.running_mean", "pretrained.layer3.12.bn2.running_var", "pretrained.layer3.12.conv3.weight", "pretrained.layer3.12.bn3.weight", "pretrained.layer3.12.bn3.bias", "pretrained.layer3.12.bn3.running_mean", "pretrained.layer3.12.bn3.running_var", "pretrained.layer3.13.conv1.weight", "pretrained.layer3.13.bn1.weight", "pretrained.layer3.13.bn1.bias", "pretrained.layer3.13.bn1.running_mean", "pretrained.layer3.13.bn1.running_var", "pretrained.layer3.13.conv2.weight", "pretrained.layer3.13.bn2.weight", "pretrained.layer3.13.bn2.bias", "pretrained.layer3.13.bn2.running_mean", "pretrained.layer3.13.bn2.running_var", "pretrained.layer3.13.conv3.weight", "pretrained.layer3.13.bn3.weight", "pretrained.layer3.13.bn3.bias", "pretrained.layer3.13.bn3.running_mean", "pretrained.layer3.13.bn3.running_var", "pretrained.layer3.14.conv1.weight", "pretrained.layer3.14.bn1.weight", "pretrained.layer3.14.bn1.bias", "pretrained.layer3.14.bn1.running_mean", "pretrained.layer3.14.bn1.running_var", "pretrained.layer3.14.conv2.weight", "pretrained.layer3.14.bn2.weight", "pretrained.layer3.14.bn2.bias", "pretrained.layer3.14.bn2.running_mean", "pretrained.layer3.14.bn2.running_var", "pretrained.layer3.14.conv3.weight", "pretrained.layer3.14.bn3.weight", "pretrained.layer3.14.bn3.bias", "pretrained.layer3.14.bn3.running_mean", "pretrained.layer3.14.bn3.running_var", "pretrained.layer3.15.conv1.weight", "pretrained.layer3.15.bn1.weight", "pretrained.layer3.15.bn1.bias", "pretrained.layer3.15.bn1.running_mean", "pretrained.layer3.15.bn1.running_var", "pretrained.layer3.15.conv2.weight", "pretrained.layer3.15.bn2.weight", "pretrained.layer3.15.bn2.bias", "pretrained.layer3.15.bn2.running_mean", "pretrained.layer3.15.bn2.running_var", "pretrained.layer3.15.conv3.weight", "pretrained.layer3.15.bn3.weight", "pretrained.layer3.15.bn3.bias", "pretrained.layer3.15.bn3.running_mean", "pretrained.layer3.15.bn3.running_var", "pretrained.layer3.16.conv1.weight", "pretrained.layer3.16.bn1.weight", "pretrained.layer3.16.bn1.bias", "pretrained.layer3.16.bn1.running_mean", "pretrained.layer3.16.bn1.running_var", "pretrained.layer3.16.conv2.weight", "pretrained.layer3.16.bn2.weight", "pretrained.layer3.16.bn2.bias", "pretrained.layer3.16.bn2.running_mean", "pretrained.layer3.16.bn2.running_var", "pretrained.layer3.16.conv3.weight", "pretrained.layer3.16.bn3.weight", "pretrained.layer3.16.bn3.bias", "pretrained.layer3.16.bn3.running_mean", "pretrained.layer3.16.bn3.running_var", 
"pretrained.layer3.17.conv1.weight", "pretrained.layer3.17.bn1.weight", "pretrained.layer3.17.bn1.bias", "pretrained.layer3.17.bn1.running_mean", "pretrained.layer3.17.bn1.running_var", "pretrained.layer3.17.conv2.weight", "pretrained.layer3.17.bn2.weight", "pretrained.layer3.17.bn2.bias", "pretrained.layer3.17.bn2.running_mean", "pretrained.layer3.17.bn2.running_var", "pretrained.layer3.17.conv3.weight", "pretrained.layer3.17.bn3.weight", "pretrained.layer3.17.bn3.bias", "pretrained.layer3.17.bn3.running_mean", "pretrained.layer3.17.bn3.running_var", "pretrained.layer3.18.conv1.weight", "pretrained.layer3.18.bn1.weight", "pretrained.layer3.18.bn1.bias", "pretrained.layer3.18.bn1.running_mean", "pretrained.layer3.18.bn1.running_var", "pretrained.layer3.18.conv2.weight", "pretrained.layer3.18.bn2.weight", "pretrained.layer3.18.bn2.bias", "pretrained.layer3.18.bn2.running_mean", "pretrained.layer3.18.bn2.running_var", "pretrained.layer3.18.conv3.weight", "pretrained.layer3.18.bn3.weight", "pretrained.layer3.18.bn3.bias", "pretrained.layer3.18.bn3.running_mean", "pretrained.layer3.18.bn3.running_var", "pretrained.layer3.19.conv1.weight", "pretrained.layer3.19.bn1.weight", "pretrained.layer3.19.bn1.bias", "pretrained.layer3.19.bn1.running_mean", "pretrained.layer3.19.bn1.running_var", "pretrained.layer3.19.conv2.weight", "pretrained.layer3.19.bn2.weight", "pretrained.layer3.19.bn2.bias", "pretrained.layer3.19.bn2.running_mean", "pretrained.layer3.19.bn2.running_var", "pretrained.layer3.19.conv3.weight", "pretrained.layer3.19.bn3.weight", "pretrained.layer3.19.bn3.bias", "pretrained.layer3.19.bn3.running_mean", "pretrained.layer3.19.bn3.running_var", "pretrained.layer3.20.conv1.weight", "pretrained.layer3.20.bn1.weight", "pretrained.layer3.20.bn1.bias", "pretrained.layer3.20.bn1.running_mean", "pretrained.layer3.20.bn1.running_var", "pretrained.layer3.20.conv2.weight", "pretrained.layer3.20.bn2.weight", "pretrained.layer3.20.bn2.bias", "pretrained.layer3.20.bn2.running_mean", "pretrained.layer3.20.bn2.running_var", "pretrained.layer3.20.conv3.weight", "pretrained.layer3.20.bn3.weight", "pretrained.layer3.20.bn3.bias", "pretrained.layer3.20.bn3.running_mean", "pretrained.layer3.20.bn3.running_var", "pretrained.layer3.21.conv1.weight", "pretrained.layer3.21.bn1.weight", "pretrained.layer3.21.bn1.bias", "pretrained.layer3.21.bn1.running_mean", "pretrained.layer3.21.bn1.running_var", "pretrained.layer3.21.conv2.weight", "pretrained.layer3.21.bn2.weight", "pretrained.layer3.21.bn2.bias", "pretrained.layer3.21.bn2.running_mean", "pretrained.layer3.21.bn2.running_var", "pretrained.layer3.21.conv3.weight", "pretrained.layer3.21.bn3.weight", "pretrained.layer3.21.bn3.bias", "pretrained.layer3.21.bn3.running_mean", "pretrained.layer3.21.bn3.running_var", "pretrained.layer3.22.conv1.weight", "pretrained.layer3.22.bn1.weight", "pretrained.layer3.22.bn1.bias", "pretrained.layer3.22.bn1.running_mean", "pretrained.layer3.22.bn1.running_var", "pretrained.layer3.22.conv2.weight", "pretrained.layer3.22.bn2.weight", "pretrained.layer3.22.bn2.bias", "pretrained.layer3.22.bn2.running_mean", "pretrained.layer3.22.bn2.running_var", "pretrained.layer3.22.conv3.weight", "pretrained.layer3.22.bn3.weight", "pretrained.layer3.22.bn3.bias", "pretrained.layer3.22.bn3.running_mean", "pretrained.layer3.22.bn3.running_var".

Test a single image

After training is complete, how do I use the trained model to segment a single image in a specific folder?

Error when I run prepare_pcontext.py

[yulu@yq01-gpu-255-126-19-00 FastFCN]$ python scripts/prepare_pcontext.py
/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py:166: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.

See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

platform=sys.platform))
Traceback (most recent call last):
File "scripts/prepare_pcontext.py", line 7, in <module>
from encoding.utils import download, mkdir
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .syncbn import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/nn/syncbn.py", line 23, in <module>
from ..functions import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .syncbn import *
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/functions/syncbn.py", line 13, in <module>
from .. import lib
File "/home/yulu/anaconda3/lib/python3.7/site-packages/encoding/lib/__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 645, in load
is_python_module)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 814, in _jit_compile
with_cuda=with_cuda)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 863, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/yulu/anaconda3/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 946, in _build_extension_module
check=True)
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 453, in run
with Popen(*popenargs, **kwargs) as process:
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 756, in __init__
restore_signals, start_new_session)
File "/home/yulu/anaconda3/lib/python3.7/subprocess.py", line 1499, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ninja': 'ninja'

ninja problem

I downloaded the citys dataset and put it in home/user/.encoding/data.
Then I ran the command 'CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset citys --model deeplab --jpu --aux --backbone resnet50 --checkname deeplab_res50_citys'
and this issue came up:

Traceback (most recent call last):
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 576, in _build_extension_module
['ninja', '-v'], stderr=subprocess.STDOUT, cwd=build_directory)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/subprocess.py", line 336, in check_output
**kwargs).stdout
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 16, in <module>
import encoding.utils as utils
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/__init__.py", line 13, in <module>
from . import nn, functions, dilated, parallel, utils, models, datasets
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/nn/__init__.py", line 12, in <module>
from .syncbn import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 23, in <module>
from ..functions import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/functions/__init__.py", line 2, in <module>
from .syncbn import *
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/functions/syncbn.py", line 13, in <module>
from .. import lib
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/__init__.py", line 14, in <module>
], build_directory=cpu_path, verbose=False)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 501, in load
_build_extension_module(name, build_directory)
File "/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 582, in _build_extension_module
name, error.output.decode()))
RuntimeError: Error building extension 'enclib_cpu': [1/2] c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/TH -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC -I/home/weizhaoxiang/anaconda3/envs/pytorch/include/python3.6m -fPIC -std=c++11 -c /home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
FAILED: syncbn_cpu.o
c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/TH -I/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC -I/home/weizhaoxiang/anaconda3/envs/pytorch/include/python3.6m -fPIC -std=c++11 -c /home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o
/home/weizhaoxiang/anaconda3/envs/pytorch/lib/python3.6/site-packages/encoding/lib/cpu/syncbn_cpu.cpp:1:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.

How can I solve this problem?

Performance problem when training ADE20K

When I train ADE20K with crop_size=520, I get worse performance than with the original crop_size of 480. Do you have any idea why this happens? Should the learning rate be increased?

Segmentation fault

I think this problem is caused by my earlier PyTorch problem, so maybe I have to fix PyTorch first. Could you give me some help?
gcc: 4.8
pytorch: 1.1.0
python: 3.5
Also, how can I change the PyTorch version to 1.0.0? pip install torch==1.0?

How to set the learning rate for a single GPU, and performance when training with a small batch size

I am using a single GPU, so my batch_size == 2.

  1. Should I use the default learning rate setting shown below?
    args.lr = lrs[args.dataset.lower()] / 16 * args.batch_size
    The lr seems to become very small.

  2. What does the 16 in the code above mean? (See the sketch after this question.)

  3. Have you ever trained with a very small batch_size?
    For me, after 80 epochs with the default lr setting and batch_size 2, mIoU is about 0.33.
    It may be that with a small batch size, 80 epochs is not enough for good convergence. But if you have experience with a single GPU (small batch size), it would be great to discuss.

Thank you for your code; any help is appreciated.
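
For reference, here is a minimal Python sketch of the linear-scaling rule that line implements, under my reading of it (the 16 is the total batch size the per-dataset default learning rates were tuned for; the default values below are illustrative, not the repository's):

# Linear LR scaling: lr grows/shrinks proportionally with the actual batch size.
BASE_BATCH_SIZE = 16
default_lrs = {'pcontext': 0.01, 'ade20k': 0.01}  # illustrative values only

def scaled_lr(dataset, batch_size):
    return default_lrs[dataset.lower()] / BASE_BATCH_SIZE * batch_size

print(scaled_lr('pcontext', 2))  # 0.00125 -- very small for batch_size=2, as observed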

Training on a custom dataset

Hello! Will 12 GB on a single GPU be enough to train your model on a small dataset? And could you publish your pretrained model (any dataset would do)? It would be great.

Size of the predicted map

In Fig. 2 of your paper, it looks to me like the final predicted map is the same size as the input rather than 8x smaller.
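
For reference, models in this family predict at 1/8 of the input resolution and bilinearly upsample the logits back to the input size for the loss and evaluation; a minimal sketch with illustrative shapes (not the repository's exact code):

import torch
import torch.nn.functional as F

logits_1_8 = torch.randn(1, 59, 60, 60)  # head output at 1/8 of a 480x480 input
logits_full = F.interpolate(logits_1_8, size=(480, 480),
                            mode='bilinear', align_corners=True)
print(logits_full.shape)  # torch.Size([1, 59, 480, 480])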

AttributeError: 'NoneType' object has no attribute 'run_slave'

Hi!
I am not sure what is causing the following error, although I worry that my CUDA version is wrong or my GPU may not be suitable.

I am running CUDA 10 on a GTX 1060.

When I run

#train
CUDA_VISIBLE_DEVICES=0 python train.py --dataset pcontext \
    --model encnet --jpu --aux --se-loss \
    --backbone resnet50 --checkname encnet_res50_pcontext

I get

Traceback (most recent call last):
  File "train.py", line 180, in <module>
    trainer.training(epoch)
  File "train.py", line 110, in training
    outputs = self.model(image)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/models/encnet.py", line 33, in forward
    features = self.base_forward(x)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/models/base.py", line 55, in base_forward
    x = self.pretrained.conv1(x)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/emiller/WorkPlace/DataScience/Personal/venv/lib/python3.6/site-packages/encoding/nn/syncbn.py", line 58, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

At first I thought this was because I am not running multi-GPU, but could it be something else?

Undefined names

flake8 testing of https://github.com/wuhuikai/FastFCN on Python 3.7.1

$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics

./encoding/models/base.py:102:34: F821 undefined name 'target_gpus'
        kwargs = scatter(kwargs, target_gpus, dim) if kwargs else []
                                 ^
./encoding/models/base.py:102:47: F821 undefined name 'dim'
        kwargs = scatter(kwargs, target_gpus, dim) if kwargs else []
                                              ^
./encoding/datasets/base.py:113:44: F821 undefined name 'batch'
    raise TypeError((error_msg.format(type(batch[0]))))
                                           ^
3     F821 undefined name 'batch'
3

E901,E999,F821,F822,F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These five are different from most other flake8 issues, which are merely "style violations" -- useful for readability, but they do not affect runtime safety.

  • F821: undefined name name
  • F822: undefined name name in __all__
  • F823: local variable name referenced before assignment
  • E901: SyntaxError or IndentationError
  • E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree

prepare_cityscapes.py not working

The download URLs aren't correct:
_CITY_DOWNLOAD_URLS = [('gtFine_trainvaltest.zip', '99f532cb1af174f5fcc4c5bc8feea8c66246ddbc'), ('leftImg8bit_trainvaltest.zip', '2c0b77ce9933cc635adda307fbba5566f5d9d404')]
and the download function is never called.
