
erfnet_pytorch's Introduction

ERFNet (PyTorch version)

This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation.

For the original Torch version, please go HERE

NOTE: This PyTorch version achieves slightly better results than the Torch version (used in the paper): 72.1 IoU on the val set and 69.8 IoU on the test set.

Example segmentation

Publications

If you use this software in your research, please cite our publications:

"Efficient ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, IEEE Intelligent Vehicles Symposium (IV), pp. 1789-1794, Redondo Beach (California, USA), June 2017. [Best Student Paper Award], [pdf]

"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017. [pdf]

Packages

For instructions, please refer to the README in each folder:

  • train contains tools for training the network for semantic segmentation.
  • eval contains tools for evaluating/visualizing the network's output.
  • imagenet contains the script and model for pretraining ERFNet's encoder on ImageNet.
  • trained_models contains the trained models used in the papers. NOTE: the PyTorch models are slightly different from the Torch models.

Requirements:

  • The Cityscapes dataset: download "leftImg8bit" for the RGB images and "gtFine" for the labels. Note that for training you should use the "_labelTrainIds" images, not the "_labelIds"; you can download the cityscapes scripts and use the converter to generate trainIds from labelIds (an example is sketched after this list).
  • Python 3.6: if you don't have Python 3.6 on your system, I recommend installing it with Anaconda.
  • PyTorch: make sure to install the PyTorch version for Python 3.6 with CUDA support (the code has only been tested with CUDA 8.0).
  • Additional Python packages: numpy, matplotlib, Pillow, torchvision and visdom (optional, for the --visualize flag).
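
A minimal conversion sketch, in the same command-line style as the install instructions below (the package name and module path are assumptions based on the public cityscapesScripts project; check its README for the exact entry point):

pip install cityscapesscripts
export CITYSCAPES_DATASET=/path/to/cityscapes      # the folder that contains leftImg8bit/ and gtFine/
python -m cityscapesscripts.preparation.createTrainIdLabelImgs   # writes *_labelTrainIds.png next to each *_labelIds.png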

In Anaconda you can install with:

conda install numpy matplotlib torchvision Pillow
conda install -c conda-forge visdom

If you use pip (make sure it is configured for Python 3.6), you can install with:

pip install numpy matplotlib torchvision Pillow visdom

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/

erfnet_pytorch's Issues

How to calculate FPS/FWT?

Dr. Eduardo: Hello!
I have been puzzled about something recently and hope you can help me.
I am very interested in your real-time semantic segmentation work, ERFNet (Efficient Residual Factorized ConvNet), but I could not find a description of FPS in your paper; it seems related to FWT (forward pass time in seconds). I also ran your code, but I don't know whether FWT is the "Avg time/img: 0.042" printed during training, or the per-image forward time measured by the script eval_forwardTime.py in the eval folder. How strongly does this time depend on the batch size? Also, does FWT need to be measured separately for the training and test sets (i.e., are they the same)?
Thank you very much for taking time out of your busy schedule!
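
For reference, a minimal sketch of how forward-pass time (FWT) is commonly measured in PyTorch, in the spirit of eval_forwardTime.py but not identical to it; the ERFNet import, the number of classes and the input resolution are assumptions:

import time
import torch
from erfnet import ERFNet                   # assumes eval/erfnet.py is on the Python path

model = ERFNet(20).cuda().eval()            # 20 classes (19 + void), inference mode
x = torch.randn(1, 3, 512, 1024).cuda()     # batch size 1, so time/batch == time/img

with torch.no_grad():
    for _ in range(10):                     # warm-up iterations (GPU clocks, cuDNN autotuning)
        model(x)
    torch.cuda.synchronize()                # make sure all queued kernels have finished
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()                # wait for the last forward pass before stopping the clock
    fwt = (time.time() - start) / 100

print("FWT: %.4f s/img  ->  FPS: %.1f" % (fwt, 1.0 / fwt))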

Not able to reproduce validation set accuracy

I am currently trying out the ERFNet on the cityscapes dataset. For that I use my own training script but the exact same model implementation as yours.

The mIoU results that I achieve are at best around 62% mIoU on the Cityscapes validation set when training from scratch. Now I am wondering if I am missing something during training, since your validation set results are around 69% mIoU for training from scratch (right?).

What I do is:

  • Training Scale: 1024x512 (for testing: bilinear upsampling to 2048x1024)
  • Augmentation: Random translation x/y +-2px; rand. horizontal flipping; input normalization to [-1,1]
  • Class balancing with the weights from your script (the class train ids are the same as from the official cityscapes-scripts right? Or did you use a different train id distribution?)
  • Learning rate schedule with the same lambda function as in your script (see the sketch after this list)
  • Start learning rate: 5e-4, weight decay: 1e-4
  • Batch size: 5 (you used 6, right? But I can't imagine that makes a huge difference)
  • Trained for 150 epochs, then I picked the best-performing epoch (epoch 127 in my case) -> 62.3% mIoU on the val set (did you search for the best epoch, or did you simply take the last epoch?)
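
For context, roughly what the polynomial ("poly") learning-rate schedule in train/main.py looks like; the exact expression and constants in the repository may differ slightly, so treat this as a sketch:

import torch
from torch import optim

num_epochs = 150
model = torch.nn.Conv2d(3, 20, 1)                       # stand-in module, just for illustration
optimizer = optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-4)
poly = lambda epoch: pow(1 - epoch / num_epochs, 0.9)   # decays from 1.0 towards 0
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=poly)

for epoch in range(num_epochs):
    # ... train one epoch, calling optimizer.step() per batch ...
    scheduler.step()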

So do you know if I am missing something that could explain this poor performance? Any help would be appreciated!

About the trained model

Thanks for the amazing work. I have a question about the pretrained weights: is the model "erfnet_pretrained.pth" trained on ImageNet, on Cityscapes val, or on Cityscapes trainval?

Results not reproducible

I have run the training at 1/4 resolution.

Two different runs give wildly different IoUs on the val set.

1st time: 65.43
2nd time: 60.28

Setting torch.manual_seed also doesn't work.
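
For reference, torch.manual_seed alone does not control every source of randomness; a sketch of the usual full set of settings for (more) repeatable runs, though some CUDA ops remain non-deterministic:

import random
import numpy as np
import torch

seed = 0
random.seed(seed)                            # Python RNG (e.g. augmentation flips)
np.random.seed(seed)                         # NumPy RNG
torch.manual_seed(seed)                      # CPU RNG
torch.cuda.manual_seed_all(seed)             # all GPU RNGs
torch.backends.cudnn.deterministic = True    # force deterministic cuDNN algorithms
torch.backends.cudnn.benchmark = False       # autotuning breaks determinism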

Confusion about the paper

Hi Eromera, thanks for your great work!

I'm confused about the 'width' increase concept you mentioned in your paper (see the picture below).

image

Why do you think the non-bottleneck-1D block directly increases the layer width?

Another question: did you test the accuracy and inference time of ERFNet with the blocks replaced by bottleneck-1D blocks?

Looking forward to your reply, thanks :-)

Dropout when evaluating the model

Hi, did you use dropout when evaluating the model? I used Caffe to reproduce ERFNet, but a forward pass costs about 100 ms. Could you give me some suggestions?
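
For reference, dropout in PyTorch is only active in training mode, so calling model.eval() before inference disables it (and switches BatchNorm to its running statistics). A tiny illustration:

import torch
import torch.nn as nn

layer = nn.Dropout2d(p=0.3)
x = torch.ones(1, 4, 8, 8)

layer.train()
print(bool(layer(x).eq(0).any()))     # usually True: whole channels are zeroed in training mode

layer.eval()
print(torch.equal(layer(x), x))       # True: dropout is a no-op in eval mode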

Implementation on PyTorch for Windows 10

Hi Eromera,
first of all, thanks for your inspiring work! I wanted to recreate your project with a conda package of PyTorch for Windows 10 x64, Anaconda3 (Python 3.6), CUDA 8.0 and pytorch-0.3.0. When I try to start the training I receive an IndexError because a list index is out of range. The same error also appears when I try to evaluate the trained model on the validation set.

[screenshot: training error]

[screenshot: evaluation error]

Do you think these kinds of errors are quick to fix, or do they appear because I am trying to deploy it on Windows?

All the best,
Max

CARLA Simulator - Semantic Segmentation

Hello,

Firstly, thanks for this amazing work.

Secondly, I want to train the network on my own dataset from the CARLA Simulator. Are there any tips on how to adapt your implementation to my own dataset (with only 12 semantic classes)?

Why do you have 2 consecutive batch norm layers?

I'm just curious why you have 2 consecutive batch norm layers here. Also, is the encoder for ImageNet and for Cityscapes exactly the same? At least they seem to differ in this batch norm detail in the code.

iouEval

Hi, I don't understand "fpmult = x_onehot * (1-y_onehot-ignores)". Why isn't it "fpmult = x_onehot * (1-y_onehot)"?
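
One way to read that line (based on how the ignore mask appears to be built in iouEval.py): a pixel whose ground truth is the ignore label should not be counted as a false positive, no matter which class was predicted there. A tiny hypothetical example:

import torch

# 1 pixel, 2 evaluated classes; the ground truth is the "ignore" label, so its
# one-hot over the evaluated classes is all zeros while the ignore mask is 1.
y_onehot = torch.tensor([0., 0.])
ignores  = torch.tensor([1., 1.])
x_onehot = torch.tensor([1., 0.])                  # the network predicted class 0

fp_naive   = x_onehot * (1 - y_onehot)             # -> [1, 0]: ignored pixel wrongly counted as FP
fp_correct = x_onehot * (1 - y_onehot - ignores)   # -> [0, 0]: ignored pixel not penalized
print(fp_naive, fp_correct)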

Can the demo be run on different sizes of images?

I've been trying to run eval/eval_cityscape_color.py on a demoSequence with different image sizes and I'm getting an error. Only 1024 x 2048 works. Is there any way to run it on different sizes?

File "/datadrive/pytorch/erfnet/erfnet_pytorch/eval/erfnet.py", line 21, in forward
    output = torch.cat([self.conv(input), self.pool(input)], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:87
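
The cat fails because the conv and pool branches of the DownsamplerBlock end up with different spatial sizes when a dimension is not divisible by 2 at each of the three downsampling stages. A common workaround (a sketch, not part of the repository) is to pad the input so height and width are multiples of 8 and crop the output back:

import torch
import torch.nn.functional as F

def segment_any_size(model, img):                  # img: (1, 3, H, W) float tensor
    _, _, h, w = img.shape
    pad_h = (8 - h % 8) % 8
    pad_w = (8 - w % 8) % 8
    padded = F.pad(img, (0, pad_w, 0, pad_h), mode="reflect")
    with torch.no_grad():
        out = model(padded)                        # the full ERFNet outputs at input resolution
    return out[:, :, :h, :w]                       # crop back to the original size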

Do you report the top5 error for the encoder network?

Hi,

While I was reading your paper, the paper mentioned that the encoder was trained using two strategies: "from scratch", and "pretrained".

I was wondering what the top-5 error was for the encoder when trained on ImageNet. Is it comparable to (or better than) other efficient architectures like MobileNet or Xception?

Add License

Hi,

Would you be able to add a license to your code so others may use your work?

Thanks

Transfer learning with ERFNet

hi all,

I would like to do a transfer learning project using ERFNet, but I have some questions about the training process.

I have collected my own data (training : val : test = 7k : 1.5k : 1.5k images) with 15 classes. If I want to train the model without pretrained ImageNet weights, how do I decide when to stop the encoder training?

Thank you very much :)

How to test a picture?

When I use the code in the eval folder to test pictures from VOC2012, I get wrong results, and the results are different every time. I wonder if there is some trick that needs to be uncommented. It would help if you could provide a demo in this GitHub repo. Thanks.

How to properly resume decoder training?

Hi,

I am trying to retrain the model on my own on the Cityscapes dataset, but using only 2 classes. The encoder training works fine, but I have problems with decoder training, which I think are caused by not attaching the trained encoder to the model properly.

As far as I understand, there are two possibilities for decoder training.

1. Use the encoder pretrained on ImageNet

For this I used the following commands from the documentation

python main_binary.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --pretrainedEncoder "../trained_models/erfnet_encoder_pretrained.pth.tar"
and
python main_binary.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --pretrainedEncoder "../trained_models/erfnet_encoder_pretrained.pth.tar" --resume

I have not trained the model for all epochs, but the intermediate result looks fine (best Val-IoU after 85 epochs: 0.9495)

2. Use an encoder trained on Cityscapes

The encoder training worked fine and resulted in a Val-IoU of 0.9471.
However, I couldn't find any documentation on how to attach the pretrained encoder for decoder training.
From the code, I gathered that the --pretrainedEncoder flag is only for the ImageNet encoder and that I should use --state, so I used

python main_binary.py --savedir erfnet_training2 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --state "../save/erfnet_training2/model_best_enc.pth.tar"
and since the code says to only use --state for initializing:
python main_binary.py --savedir erfnet_training2 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6 --decoder --resume

However, this didn't work out and finished with a best Val-IoU of only 0.9441, so the whole network performs worse than just the encoder alone.
For comparison, I also tested training the decoder without initializing an encoder, which resulted in Val-IoU= 0.9461.

So what is the correct way of training the decoder after the encoder training has finished?
Especially the arguments in combination with the --resume flag, since I cannot train the model in one go due to hardware availability.

Thank you for your answer.

DownsamplerBlock

Why is there "self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)"
(noutput-ninput instead of noutput)?
As a result, the first layer is (conv): Conv2d(3, 13, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)), which looks different from the paper.
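
For context, a paraphrase of the DownsamplerBlock as it appears in the repository (details such as the BatchNorm arguments may differ): the strided conv produces only noutput-ninput channels because its output is concatenated with a max-pooled copy of the input (ninput channels), so the block still outputs noutput channels in total, e.g. 13 + 3 = 16 for the first block:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsamplerBlock(nn.Module):
    def __init__(self, ninput, noutput):
        super().__init__()
        self.conv = nn.Conv2d(ninput, noutput - ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput)

    def forward(self, x):
        out = torch.cat([self.conv(x), self.pool(x)], 1)   # (noutput-ninput) + ninput channels
        return F.relu(self.bn(out))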

How to calculate weights by processing dataset histogram?

Hi @Eromera, thanks for sharing.
In main.py for training, there is a TODO for calculating the weights by processing the dataset histogram. Can you say something more about this? For example, how do you create the weight array using class balancing (counting the total pixels per class, or the number of polygons in the json files)? What is the difference between the encoder and decoder weights? Is it something like torch.utils.data.DataLoader() with the sampler or batch_sampler option?

Thanks!
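
For reference, a sketch of the class-balancing scheme described in the papers, w_class = 1 / ln(c + p_class), where p_class is the class's pixel frequency in the training set; the constant c and the histogram loop below are assumptions, not the repository's missing TODO code:

import numpy as np
from PIL import Image

def class_weights(label_paths, num_classes, c=1.10):
    hist = np.zeros(num_classes, dtype=np.float64)
    for path in label_paths:                                 # *_labelTrainIds.png files
        ids = np.array(Image.open(path))
        hist += np.bincount(ids[ids < num_classes], minlength=num_classes)
    freq = hist / hist.sum()                                 # p_class: per-class pixel frequency
    return 1.0 / np.log(c + freq)                            # rarer classes get larger weights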

Difference between the calculation of encoder weights and the decoder weights

Hi @Eromera,
I want to train your model with a different dataset and different classes.

What is the difference between the calculation of encoder weights and the decoder weights?

I read in your article that you are using the following formula to determine the weights:

image

I saw that you have different values for the encoder and decoder weights:

image

If I use this formula to calculate the encoder weights, what should I do in order to calculate the decoder weights? Is there a connection between the encoder weights and the decoder weights? A multiplication factor or something?

DownsamplerBlock: torch.cat inconsistent tensor size

Hi Eromera, I'm using my own dataset to train the model, but something seems wrong with the torch.cat operation in DownsamplerBlock:

[screenshot of the error]

It seems that the results from conv and pool do not match for odd image sizes.
Also, due to the repeated use of this DownsamplerBlock, every "cat" operation input must have an even size.
Finally, I'm curious about this "cat" operation: does it contribute to the final performance? I mean, why not just use the conv or the pool operation directly?

Computation of mIoU

Hi. Can your evaluation code iouEval.py calculate the mIoU between 'pred' and 'gt' if 'pred' has not yet been reduced with the max (argmax) operation? Thanks.
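
For reference: confusion-matrix-based IoU code such as iouEval.py expects hard per-pixel labels, so a score map should be reduced with argmax first (a sketch with hypothetical shapes):

import torch

pred = torch.randn(1, 20, 512, 1024)        # network output: (N, C, H, W) scores/logits
pred_labels = pred.argmax(dim=1)            # (N, H, W) class ids, comparable with the gt map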

Test speed

Hello, could you help me with how to calculate the segmentation speed? I don't know how to measure it.

Error when I train my own dataset

Hi, thanks for sharing.
I trained the model on the Cityscapes dataset and got the results the paper reports. Now I want to train the model on my own dataset, but I have met some issues.
When I train on 2 classes (including background) and keep NUM_CLASSES=20 (same as the original code), the training process works fine but the prediction looks strange:
[prediction example image]

When I keep NUM_CLASSES=20 and set def __init__(self, nClasses, ignoreIndex=0) in iouEval.py (because my background is 0), in the encoder validation stage I get:

----- VALIDATING - EPOCH 1 -----
Traceback (most recent call last):
  File "main.py", line 545, in <module>
    main(parser.parse_args())
  File "main.py", line 499, in main
    model = train(args, model, True)  #Train encoder
  File "main.py", line 334, in train
    iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
  File "/media/holly/Code/Segmentation/ERFNet/erfnet_pytorch/train/iouEval.py", line 41, in addBatch
    x_onehot = x_onehot[:, :self.ignoreIndex]
ValueError: result of slicing is an empty tensor

When I change NUM_CLASSES=2 and use def __init__(self, nClasses, ignoreIndex=19), in the decoder stage I get:

========== DECODER TRAINING ===========
/DataSet/DSHolly/DataAll/SegmentationLikeCityScapes_room/leftImg8bit/train
/DataSet/DSHolly/DataAll/SegmentationLikeCityScapes_room/leftImg8bit/val
<class 'criterion.CrossEntropyLoss2d'>
----- TRAINING - EPOCH 1 -----
LEARNING RATE: 0.0005
THCudaCheck FAIL file=/pytorch/torch/lib/THCUNN/generic/Threshold.cu line=66 error=59 : device-side assert triggered
THCudaCheck FAIL file=/pytorch/torch/lib/THCUNN/generic/Threshold.cu line=66 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 541, in <module>
    main(parser.parse_args())
  File "main.py", line 514, in main
    model = train(args, model, False)  #Train decoder
  File "main.py", line 260, in train
    loss.backward()
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 187, in backward
    return (backward_cls.apply(input, grad_output, ctx.additional_args, ctx._backend, ctx.buffers, *tensor_params) +
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 219, in backward_cls_forward
    update_grad_input_fn(ctx._backend.library_state, input, grad_output, grad_input, *gi_args)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THCUNN/generic/Threshold.cu:66

Are there any tips for training the model on my own dataset?
Thanks!!

TypeError: forward() missing 1 required positional argument: 'input' when training on the Cityscapes dataset

Hello, thanks for sharing.
I want to train on the Cityscapes dataset using /train/main.py, but I usually get an error like the following in the encoder stage during training or validation:

Traceback (most recent call last):
  File "main.py", line 538, in <module>
    main(parser.parse_args())
  File "main.py", line 492, in main
    model = train(args, model, True)  #Train encoder
  File "main.py", line 251, in train
    outputs = model(inputs, only_encode=enc)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 68, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 78, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/media/holly/Code/.pyenv/versions/Python3.6.3ERFNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'input'

I debugged in PyCharm and found that the images and labels were loaded correctly, but at inputs = Variable(images) I got the error "cannot call .data on torch.Tensor". Did I really load the data correctly, or did I make a mistake somewhere else?

Besides, NUM_CLASSES = 20 for the Cityscapes dataset, but during validation I also get an error:

----- VALIDATING - EPOCH 1 -----
VAL loss: 0.6922 (epoch: 1, step: 0) // Avg time/img: 0.2710 s
ERROR: Unknown label with id 19

So, do the labels range from 0 to 19, or should I use the trainIds from labels.py?

I use Ubuntu 16.04, Python 3.6.3 and CUDA 9.0.
Thanks!

How to use cityscape conversor for erfnet_pytorch?

In the README you say that "you can download the cityscapes scripts and use the conversor to generate trainIds from labelIds". How do I use this script to convert the data? Sorry, I am new to this and unsure of how to run it.

An issue with the Cityscapes dataset

Hi, thanks for your code, I am very interested in your work.
But I don't know exactly which dataset I need; there are so many label packages in the Cityscapes dataset.
I guess I need leftImg8bit_trainvaltest.zip (11 GB) and the corresponding ground truth gtFine_trainvaltest.zip (241 MB).
Also, my computer does not have a GPU, so its performance is poor.
It would really help me to have a pretrained net, including any precomputed files (txt or xml). Thanks a lot!

Stuck running the ERFNet model on a Jetson TX2

Hi, I want to run the ERFNet model on a Jetson TX2.

  • I installed PyTorch without problems, and CUDA 9.0 with cuDNN 7.0
    (checked with 'import torch' and 'nvcc --version')

But when I try to run the ERFNet code, I get stuck with:

"RuntimeError : cuda runtime error(7) : too many resources requested for launch at /home/nvidia/pytorch/aten/src/THCUNN/im2col.h"

Please help me!

Image Normalization in Pre-processing for Cityscapes

Hi,

I was trying to train the network on the Cityscapes dataset, and while going through the main training script, I could not find any image normalization code in the pre-processing part (here).

Does that mean the network is not trained on normalized images, or am I missing something?

Thanks in advance!
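
For reference, if one wanted to add mean/std normalization on top of the [0, 1] scaling that ToTensor already performs, a standard torchvision transform would look like this (a hypothetical addition, not the repository's preprocessing; the statistics shown are the usual ImageNet ones):

from torchvision import transforms

input_transform = transforms.Compose([
    transforms.ToTensor(),                               # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])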

Reimplementation help

Our work is almost done, and we are ready to cite your paper, but we have run into difficulties reproducing the results.
May I ask: how long did you spend training the model?

There is no model_best.pth after training finishes

Hi, Eromera. There is an --epochs-save option to save the model every X epochs, but when I set it to non-zero, it doesn't save the model as I expected. Another issue is that when training finishes, sometimes it saves model_best.pth and sometimes only model_encoder_best.pth.
Thanks!

A test example

Could you give an example (say, a command line) of how to use it to do segmentation on a single input image? (I see there are examples for the Cityscapes dataset.) Thanks!
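
For reference, a minimal single-image sketch (it assumes eval/erfnet.py is importable, that the checkpoint stores a DataParallel state_dict with a "module." prefix, and uses hypothetical file names; the eval scripts in the repository remain the authoritative examples):

import torch
from PIL import Image
from torchvision import transforms
from erfnet import ERFNet                     # eval/erfnet.py

model = ERFNet(20)                            # 20 Cityscapes train classes (incl. void)
state = torch.load("../trained_models/erfnet_pretrained.pth", map_location="cpu")
model.load_state_dict({k.replace("module.", ""): v for k, v in state.items()})
model.eval()

img = Image.open("example.png").convert("RGB").resize((1024, 512))
x = transforms.ToTensor()(img).unsqueeze(0)   # (1, 3, 512, 1024) in [0, 1]
with torch.no_grad():
    labels = model(x).argmax(1)[0]            # (512, 1024) per-pixel trainIds
Image.fromarray(labels.byte().numpy()).save("example_trainIds.png")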

How to reproduce 72.2 IoU on validation set?

Hi, I tried training ERFNet with the default configuration (single GPU):

python main.py --savedir erfnet_training1 --datadir /home/datasets/cityscapes/ --num-epochs 150 --batch-size 6

But, the best validation mean IoU that I could get was just ~69.0% (instead of 72.2 achieved by your pretrained model). Could you point out what could be causing this disparity?

Another quick question: why is the evaluation done with images and labels both resized to a height of 512 pixels, rather than the original resolution of 1024 pixels?

Thanks!

Data with Cityscapes does not work

Hi, thanks for the code. But it seems the code does not currently work with the Cityscapes dataset. I'm using Python 3.6 with PyTorch version 0.2.0_4, and it crashes after several training steps with:

THCudaCheck FAIL: device-side assert triggered, in update_grad_input_fn()
RuntimeError: cuda runtime error (59): device-side assert triggered at /opt/conda/conda-bld/pytorch/work/torch/lib/THCUNN/generic/Threshold.cu:66

Result

Has anyone run this code? How well does it work?
