
erfnet_pytorch's Introduction

ERFNet (PyTorch version)

This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation.

For the original Torch version, please go HERE

NOTE: This PyTorch version achieves slightly better results than the Torch version (used in the paper): 72.1 IoU on the val set and 69.8 IoU on the test set.

Example segmentation

Publications

If you use this software in your research, please cite our publications:

"Efficient ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. [Best Student Paper Award], [pdf]

"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017. [pdf]

Packages

For instructions, please refer to the README in each folder:

  • train contains tools for training the network for semantic segmentation.
  • eval contains tools for evaluating/visualizing the network's output.
  • imagenet contains the script and model for pretraining ERFNet's encoder on ImageNet.
  • trained_models contains the trained models used in the papers. NOTE: the PyTorch models are slightly different from the Torch models.

Requirements:

  • The Cityscapes dataset: download "leftImg8bit" for the RGB images and "gtFine" for the labels. Note that for training you should use the "_labelTrainIds" images, not the "_labelIds"; you can download the cityscapes scripts and use the converter to generate trainIds from labelIds (an example is sketched after this list).
  • Python 3.6: if you don't have Python 3.6 on your system, I recommend installing it with Anaconda.
  • PyTorch: make sure to install the PyTorch version for Python 3.6 with CUDA support (the code has only been tested with CUDA 8.0).
  • Additional Python packages: numpy, matplotlib, Pillow, torchvision and visdom (optional, for the --visualize flag).
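
A minimal conversion sketch, in the same command-line style as the install instructions below (the package name and module path are assumptions based on the public cityscapesScripts project; check its README for the exact entry point):

pip install cityscapesscripts
export CITYSCAPES_DATASET=/path/to/cityscapes      # the folder that contains leftImg8bit/ and gtFine/
python -m cityscapesscripts.preparation.createTrainIdLabelImgs   # writes *_labelTrainIds.png next to each *_labelIds.png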

In Anaconda you can install with:

conda install numpy matplotlib torchvision Pillow
conda install -c conda-forge visdom

If you use pip (make sure it is configured for Python 3.6), you can install with:

pip install numpy matplotlib torchvision Pillow visdom

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/

erfnet_pytorch's Issues

How to calculate FPS/FWT?

Dr. Eduardo: Hello!
I have been puzzled about something recently and hope you can help me.
I am very interested in your real-time semantic segmentation work, ERFNet (Efficient Residual Factorized ConvNet), but I could not find a description of FPS in your paper; it seems related to FWT (forward pass time in seconds). I also ran your code, but I don't know whether FWT is the "Avg time/img: 0.042" printed during training, or the per-image forward time measured by the script eval_forwardTime.py in the eval folder. How strongly does this time depend on the batch size? Also, does FWT need to be measured separately for the training and test sets (i.e., are they the same)?
Thank you very much for taking time out of your busy schedule!
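
For reference, a minimal sketch of how forward-pass time (FWT) is commonly measured in PyTorch, in the spirit of eval_forwardTime.py but not identical to it; the ERFNet import, the number of classes and the input resolution are assumptions:

import time
import torch
from erfnet import ERFNet                   # assumes eval/erfnet.py is on the Python path

model = ERFNet(20).cuda().eval()            # 20 classes (19 + void), inference mode
x = torch.randn(1, 3, 512, 1024).cuda()     # batch size 1, so time/batch == time/img

with torch.no_grad():
    for _ in range(10):                     # warm-up iterations (GPU clocks, cuDNN autotuning)
        model(x)
    torch.cuda.synchronize()                # make sure all queued kernels have finished
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()                # wait for the last forward pass before stopping the clock
    fwt = (time.time() - start) / 100

print("FWT: %.4f s/img  ->  FPS: %.1f" % (fwt, 1.0 / fwt))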

Not able to reproduce validation set accuracy

I am currently trying out the ERFNet on the cityscapes dataset. For that I use my own training script but the exact same model implementation as yours.

The mIoU results that I achieve are at best around 62% mIoU on the Cityscapes validation set when training from scratch. Now I am wondering if I am missing something during training, since your validation set results are around 69% mIoU for training from scratch (right?).

What I do is:

  • Training Scale: 1024x512 (for testing: bilinear upsampling to 2048x1024)
  • Augmentation: Random translation x/y +-2px; rand. horizontal flipping; input normalization to [-1,1]
  • Class balancing with the weights from your script (the class train ids are the same as from the official cityscapes-scripts right? Or did you use a different train id distribution?)
  • Learning rate schedule with the same lambda function as in your script (see the sketch after this list)
  • Start learning rate: 5e-4, weight decay: 1e-4
  • Batch size: 5 (you used 6, right? But I can't imagine that makes a huge difference)
  • Trained for 150 epochs, then I picked the best-performing epoch (epoch 127 in my case) -> 62.3% mIoU on the val set (did you search for the best epoch, or did you simply take the last epoch?)
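
For context, roughly what the polynomial ("poly") learning-rate schedule in train/main.py looks like; the exact expression and constants in the repository may differ slightly, so treat this as a sketch:

import torch
from torch import optim

num_epochs = 150
model = torch.nn.Conv2d(3, 20, 1)                       # stand-in module, just for illustration
optimizer = optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
poly = lambda epoch: pow(1 - epoch / num_epochs, 0.9)   # decays from 1.0 towards 0
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly)

for epoch in range(num_epochs):
    # ... train one epoch, calling optimizer.step() per batch ...
    scheduler.step()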

So do you know if I am missing something that could explain this poor performance? Any help would be appreciated!

About the trained model

Thanks for the amazing work. I have a question about the pretrained weights: is the model "erfnet_pretrained.pth" trained on ImageNet, on Cityscapes val, or on Cityscapes trainval?

Results not reproducible

I have run the training at 1/4 resolution.

Two different runs give wildly different IoUs on the val set.

1st time: 65.43
2nd time: 60.28

Setting torch.manual_seed also doesn't work.
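
For reference, torch.manual_seed alone does not control every source of randomness; a sketch of the usual full set of settings for (more) repeatable runs, though some CUDA ops remain non-deterministic:

import random
import numpy as np
import torch

seed = 0
random.seed(seed)                            # Python RNG (e.g. augmentation flips)
np.random.seed(seed)                         # NumPy RNG
torch.manual_seed(seed)                      # CPU RNG
torch.cuda.manual_seed_all(seed)             # all GPU RNGs
torch.backends.cudnn.deterministic = True    # force deterministic cuDNN algorithms
torch.backends.cudnn.benchmark = False       # autotuning breaks determinism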

Confusion about the paper

Hi Eromera, thanks for your great work!

I'm confused about the 'width' increase concept you mentioned in your paper (see the picture below).

image

Why do you think the non-bottleneck-1D block directly increases the layer width?

Another question: did you test the accuracy and inference time of ERFNet with the blocks replaced by bottleneck-1D blocks?

Looking forward to your reply, thanks :-)

Dropout when evaluating the model

Hi, did you use dropout when evaluating the model? I used Caffe to reproduce ERFNet, but a forward pass costs about 100 ms. Could you give me some suggestions?
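
For reference, dropout in PyTorch is only active in training mode, so calling model.eval() before inference disables it (and switches BatchNorm to its running statistics). A tiny illustration:

import torch
import torch.nn as nn

layer = nn.Dropout2d(p=0.3)
x = torch.ones(1, 4, 8, 8)

layer.train()
print(bool(layer(x).eq(0).any()))     # usually True: whole channels are zeroed in training mode

layer.eval()
print(torch.equal(layer(x), x))       # True: dropout is a no-op in eval mode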

Implementation on PyTorch for Windows 10

Hi Eromera,
first of all, thanks for your inspiring work! I wanted to recreate your project with a conda package of PyTorch for Windows 10 x64, Anaconda3 (Python 3.6), CUDA 8.0 and pytorch-0.3.0. When I try to start the training I receive an IndexError because a list index is out of range. The same error also appears when I try to evaluate the trained model on the validation set.

[screenshot: training error]

[screenshot: evaluation error]

Do you think these kinds of errors are quick to fix, or do they appear because I am trying to deploy it on Windows?

All the best,
Max

CARLA Simulator - Semantic Segmentation

Hello,

Firstly, thanks for this amazing work.

Secondly, I want to train the network on my own dataset from the CARLA Simulator. Are there any tips on how to adapt your implementation to my own dataset (with only 12 semantic classes)?

Why do you have 2 consecutive batch norm layers?

I'm just curious why you have 2 consecutive batch norm layers here. Also, is the encoder for ImageNet and for Cityscapes exactly the same? At least they seem to differ in this batch norm detail in the code.

iouEval

Hi, I don't understand "fpmult = x_onehot * (1-y_onehot-ignores)". Why isn't it "fpmult = x_onehot * (1-y_onehot)"?
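
One way to read that line (based on how the ignore mask appears to be built in iouEval.py): a pixel whose ground truth is the ignore label should not be counted as a false positive, no matter which class was predicted there. A tiny hypothetical example:

import torch

# 1 pixel, 2 evaluated classes; the ground truth is the "ignore" label, so its
# one-hot over the evaluated classes is all zeros while the ignore mask is 1.
y_onehot = torch.tensor([0., 0.])
ignores  = torch.tensor([1., 1.])
x_onehot = torch.tensor([1., 0.])                  # the network predicted class 0

fp_naive   = x_onehot * (1 - y_onehot)             # -> [1, 0]: ignored pixel wrongly counted as FP
fp_correct = x_onehot * (1 - y_onehot - ignores)   # -> [0, 0]: ignored pixel not penalized
print(fp_naive, fp_correct)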

Can the demo be run on different sizes of images?

I've been trying to run eval/eval_cityscape_color.py on a demoSequence with different image sizes and I'm getting an error. Only 1024 x 2048 works. Is there any way to run it on different sizes?

File "/datadrive/pytorch/erfnet/erfnet_pytorch/eval/erfnet.py", line 21, in forward
    output = torch.cat([self.conv(input), self.pool(input)], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:87
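
The cat fails because the conv and pool branches of the DownsamplerBlock end up with different spatial sizes when a dimension is not divisible by 2 at each of the three downsampling stages. A common workaround (a sketch, not part of the repository) is to pad the input so height and width are multiples of 8 and crop the output back:

import torch
import torch.nn.functional as F

def segment_any_size(model, img):                  # img: (1, 3, H, W) float tensor
    _, _, h, w = img.shape
    pad_h = (8 - h % 8) % 8
    pad_w = (8 - w % 8) % 8
    padded = F.pad(img, (0, pad_w, 0, pad_h), mode="reflect")
    with torch.no_grad():
        out = model(padded)                        # the full ERFNet outputs at input resolution
    return out[:, :, :h, :w]                       # crop back to the original size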

Do you report the top5 error for the encoder network?

Hi,

While I was reading your paper, the paper mentioned that the encoder was trained using two strategies: "from scratch", and "pretrained".

I was wondering what the top-5 error was for the encoder when trained on ImageNet. Is it comparable to (or better than) other efficient architectures like MobileNet or Xception?

Add License

Hi,

Would you be able to add a license to your code so others may use your work?

Thanks

Transfer learning with ERFNet

hi all,

I would like to do a transfer learning project using ERFNet, but I have some questions about the training process.

I have collected my own data (training : val : test = 7k : 1.5k : 1.5k images) with 15 classes. If I want to train the model without pretrained ImageNet weights, how do I decide when to stop the encoder training?

Thank you very much :)

How to test a picture?

When I use the code in the eval folder to test pictures from VOC2012, I get wrong results, and the results are different every time. I wonder if there is some trick that needs to be uncommented. It would help if you could provide a demo in this GitHub repo. Thanks.

How to properly resume decoder training?

Hi,

I am trying to retrain the model on my own on the Cityscapes dataset, but using only 2 classes. The encoder training works fine, but I have problems with decoder training, which I think are caused by not attaching the trained encoder to the model properly.

As far as I understand, there are two possibilities for decoder training.

1. Use the encoder pretrained on ImageNet

For this I used the following commands from the documentation

python main_binary.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --pretrainedEncoder "../trained_models/erfnet_encoder_pretrained.pth.tar"
and
python main_binary.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --pretrainedEncoder "../trained_models/erfnet_encoder_pretrained.pth.tar" --resume

I have not trained the model for all epochs, but the intermediate result looks fine (best Val-IoU after 85 epochs: 0.9495)

2. Use an encoder trained on Cityscapes

The encoder training worked fine and resulted in a Val-IoU of 0.9471.
However, I couldn't find any documentation on how to attach the pretrained encoder for decoder training.
From the code, I gathered that the --pretrainedEncoder flag is only for the ImageNet encoder and that I should use --state, so I used

python main_binary.py --savedir erfnet_training2 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --state "../save/erfnet_training2/model_best_enc.pth.tar"
and since the code says to only use --state for initializing:
python main_binary.py --savedir erfnet_training2 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --resume

However, this didn't work out and finished with a best Val-IoU of only 0.9441, so the whole network performs worse than just the encoder alone.
For comparison, I also tested training the decoder without initializing an encoder, which resulted in Val-IoU= 0.9461.

So what is the correct way of training the decoder after the encoder training has finished?
Especially the arguments in combination with the --resume flag, since I cannot train the model in one go due to hardware availability.

Thank you for your answer.

DownsamplerBlock

Why is there "self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)"
(noutput-ninput instead of noutput)?
As a result, the first layer is (conv): Conv2d(3, 13, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)), which looks different from the paper.
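
For context, a paraphrase of the DownsamplerBlock as it appears in the repository (details such as the BatchNorm arguments may differ): the strided conv produces only noutput-ninput channels because its output is concatenated with a max-pooled copy of the input (ninput channels), so the block still outputs noutput channels in total, e.g. 13 + 3 = 16 for the first block:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsamplerBlock(nn.Module):
    def __init__(self, ninput, noutput):
        super().__init__()
        self.conv = nn.Conv2d(ninput, noutput - ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput)

    def forward(self, x):
        out = torch.cat([self.conv(x), self.pool(x)], 1)   # (noutput-ninput) + ninput channels
        return F.relu(self.bn(out))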

How to calculate weights by processing dataset histogram?

Hi @Eromera, thanks for sharing.
In main.py for training, there is a TODO for calculating the weights by processing the dataset histogram. Can you say something more about this? For example, how do you create the weight array using class balancing (counting the total pixels per class, or the number of polygons in the json files)? What is the difference between the encoder and decoder weights? Is it something like torch.utils.data.DataLoader() with the sampler or batch_sampler option?

Thanks!
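
For reference, a sketch of the class-balancing scheme described in the papers, w_class = 1 / ln(c + p_class), where p_class is the class's pixel frequency in the training set; the constant c and the histogram loop below are assumptions, not the repository's missing TODO code:

import numpy as np
from PIL import Image

def class_weights(label_paths, num_classes, c=1.10):
    hist = np.zeros(num_classes, dtype=np.float64)
    for path in label_paths:                                 # *_labelTrainIds.png files
        ids = np.array(Image.open(path))
        hist += np.bincount(ids[ids < num_classes], minlength=num_classes)
    freq = hist / hist.sum()                                 # p_class: per-class pixel frequency
    return 1.0 / np.log(c + freq)                            # rarer classes get larger weights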

Difference between the calculation of encoder weights and the decoder weights

Hi @Eromera,
I want to train your model with a different dataset and different classes.

What is the difference between the calculation of encoder weights and the decoder weights?

I read in your article that you are using the following formula to determine the weights:

image

I saw that you have different values for the encoder and decoder weights:

image

If I use this formula to calculate the encoder weights, what should I do in order to calculate the decoder weights? Is there a connection between the encoder weights and the decoder weights? A multiplication factor or something?

DownsamplerBlock: torch.cat inconsistent tensor size

Hi Eromera, I'm using my own dataset to train the model, but something seems wrong with the torch.cat operation in DownsamplerBlock:

[screenshot of the error]

It seems that the results from conv and pool do not match for odd image sizes.
Also, due to the repeated use of this DownsamplerBlock, every "cat" operation input must have an even size.
Finally, I'm curious about this "cat" operation: does it contribute to the final performance? I mean, why not just use the conv or the pool operation directly?

Computation of mIoU

Hi. Can your evaluation code iouEval.py calculate the mIoU between 'pred' and 'gt' if 'pred' has not yet been reduced with the max (argmax) operation? Thanks.
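
For reference: confusion-matrix-based IoU code such as iouEval.py expects hard per-pixel labels, so a score map should be reduced with argmax first (a sketch with hypothetical shapes):

import torch

pred = torch.randn(1, 20, 512, 1024)        # network output: (N, C, H, W) scores/logits
pred_labels = pred.argmax(dim=1)            # (N, H, W) class ids, comparable with the gt map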

Test speed

Hello, could you help me with how to calculate the segmentation speed? I don't know how to measure it.

Error when I train my own dataset

Hi, thanks for sharing.
I trained the model on the Cityscapes dataset and got the results the paper reports. Now I want to train the model on my own dataset, but I have met some issues.
When I train on 2 classes (including background) and keep NUM_CLASSES=20 (same as the original code), the training process works fine but the prediction looks strange:
[prediction example image]

When I keep NUM_CLASSES=20 and set def __init__(self, nClasses, ignoreIndex=0) in iouEval.py (because my background is 0), in the encoder validation stage I get:

----- VALIDATING - EPOCH 1 -----
Traceback (most recent call last):
  File "main.py", line 545, in <module>
    main(parser.parse_args())
  File "main.py", line 499, in main
    model = train(args, model, True)  #Train encoder
  File "main.py", line 334, in train
    iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
  File "/media/holly/Code/Segmentation/ERFNet/erfnet_pytorch/train/iouEval.py", line 41, in addBatch
    x_onehot = x_onehot[:, :self.ignoreIndex]
ValueError: result of slicing is an empty tensor

When I change NUM_CLASSES=2 and use def __init__(self, nClasses, ignoreIndex=19), in the decoder stage I get:

========== DECODER TRAINING ===========
/DataSet/DSHolly/DataAll/SegmentationLikeCityScapes_room/leftImg8bit/train
/DataSet/DSHolly/DataAll/SegmentationLikeCityScapes_room/leftImg8bit/val
<class 'criterion.CrossEntropyLoss2d'>
----- TRAINING - EPOCH 1 -----
LEARNING RATE: 0.0005
THCudaCheck FAIL file=/pytorch/torch/lib/THCUNN/generic/Threshold.cu line=66 error=59 : device-side assert triggered
THCudaCheck FAIL file=/pytorch/torch/lib/THCUNN/generic/Threshold.cu line=66 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 541, in <module>
    main(parser.parse_args())
  File "main.py", line 514, in main
    model = train(args, model, False)  #Train decoder
  File "main.py", line 260, in train
    loss.backward()
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 187, in backward
    return (backward_cls.apply(input, grad_output, ctx.additional_args, ctx._backend, ctx.buffers, *tensor_params) +
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 219, in backward_cls_forward
    update_grad_input_fn(ctx._backend.library_state, input, grad_output, grad_input, *gi_args)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THCUNN/generic/Threshold.cu:66

Are there any tips for training the model on my own dataset?
Thanks!!

TypeError: forward() missing 1 required positional argument: 'input' when training on the Cityscapes dataset

Hello, thanks for sharing.
I want to train on the Cityscapes dataset using /train/main.py, but I usually get an error like the following in the encoder stage during training or validation:

Traceback (most recent call last):
  File "main.py", line 538, in <module>
    main(parser.parse_args())
  File "main.py", line 492, in main
    model = train(args, model, True)  #Train encoder
  File "main.py", line 251, in train
    outputs = model(inputs, only_encode=enc)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'input'

I debugged in PyCharm and found that the images and labels were loaded correctly, but at inputs = Variable(images) I got the error "cannot call .data on torch.Tensor". Did I really load the data correctly, or did I make a mistake somewhere else?

Besides, NUM_CLASSES = 20 for the Cityscapes dataset, but during validation I also get an error:

----- VALIDATING - EPOCH 1 -----
VAL loss: 0.6922 (epoch: 1, step: 0) // Avg time/img: 0.2710 s
ERROR: Unknown label with id 19

So, do the labels range from 0 to 19, or should I use the trainIds from labels.py?

I use Ubuntu 16.04, Python 3.6.3 and CUDA 9.0.
Thanks!

How to use cityscape conversor for erfnet_pytorch?

In the README you say that "you can download the cityscapes scripts and use the conversor to generate trainIds from labelIds". How do I use this script to convert the data? Sorry, I am new to this and unsure of how to run it.

An issue with the Cityscapes dataset

Hi, thanks for your code, I am very interested in your work.
But I don't know exactly which dataset I need; there are so many label packages in the Cityscapes dataset.
I guess I need leftImg8bit_trainvaltest.zip (11 GB) and the corresponding ground truth gtFine_trainvaltest.zip (241 MB).
Also, my computer does not have a GPU, so its performance is poor.
It would really help me to have a pretrained net, including any precomputed files (txt or xml). Thanks a lot!

Stuck running the ERFNet model on a Jetson TX2

Hi, I want to run the ERFNet model on a Jetson TX2.

  • I installed PyTorch without problems, and CUDA 9.0 with cuDNN 7.0
    (checked with 'import torch' and 'nvcc --version')

But when I try to run the ERFNet code, I get stuck with:

"RuntimeError : cuda runtime error(7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THCUNN/im2col.h"

Please help me!

Image Normalization in Pre-processing for Cityscapes

Hi,

I was trying to train the network on the Cityscapes dataset, and while going through the main training script, I could not find any image normalization code in the pre-processing part (here).

Does that mean the network is not trained on normalized images, or am I missing something?

Thanks in advance!
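
For reference, if one wanted to add mean/std normalization on top of the [0, 1] scaling that ToTensor already performs, a standard torchvision transform would look like this (a hypothetical addition, not the repository's preprocessing; the statistics shown are the usual ImageNet ones):

from torchvision import transforms

input_transform = transforms.Compose([
    transforms.ToTensor(),                               # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])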

Reimplementation help

Our work is almost done, and we are ready to cite your paper, but we have run into difficulties reproducing the results.
May I ask: how long did you spend training the model?

There is no model_best.pth after training finishes

Hi, Eromera. There is an --epochs-save option to save the model every X epochs, but when I set it to non-zero, it doesn't save the model as I expected. Another issue is that when training finishes, sometimes it saves model_best.pth and sometimes only model_encoder_best.pth.
Thanks!

A test example

Could you give an example (say, a command line) of how to use it to do segmentation on a single input image? (I see there are examples for the Cityscapes dataset.) Thanks!
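
For reference, a minimal single-image sketch (it assumes eval/erfnet.py is importable, that the checkpoint stores a DataParallel state_dict with a "module." prefix, and uses hypothetical file names; the eval scripts in the repository remain the authoritative examples):

import torch
from PIL import Image
from torchvision import transforms
from erfnet import ERFNet                     # eval/erfnet.py

model = ERFNet(20)                            # 20 Cityscapes train classes (incl. void)
state = torch.load("../trained_models/erfnet_pretrained.pth", map_location="cpu")
model.load_state_dict({k.replace("module.", ""): v for k, v in state.items()})
model.eval()

img = Image.open("example.png").convert("RGB").resize((1024, 512))
x = transforms.ToTensor()(img).unsqueeze(0)   # (1, 3, 512, 1024) in [0, 1]
with torch.no_grad():
    labels = model(x).argmax(1)[0]            # (512, 1024) per-pixel trainIds
Image.fromarray(labels.byte().numpy()).save("example_trainIds.png")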

How to reproduce 72.2 IoU on validation set?

Hi, I tried training ERFNet with the default configuration (single GPU):

python main.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6

But, the best validation mean IoU that I could get was just ~69.0% (instead of 72.2 achieved by your pretrained model). Could you point out what could be causing this disparity?

Another quick question: why is the evaluation done with images and labels both resized to a height of 512 pixels, rather than the original resolution of 1024 pixels?

Thanks!

Data with Cityscapes does not work

Hi, thanks for the code. But it seems the code does not currently work with the Cityscapes dataset. I'm using Python 3.6 with PyTorch version 0.2.0_4, and it crashes after several training steps with:

THCudaCheck FAIL: device-side assert triggered, in update_grad_input_fn()
RuntimeError: cuda runtime error (59): device-side assert triggered at /opt/conda/conda-bld/pytorch/work/torch/lib/THCUNN/generic/Threshold.cu:66

Result

Has anyone run this code? How well does it work?
