
enet-keras's People

Contributors

ahundt, dependabot[bot], pavlosmelissinos


enet-keras's Issues

src/models/from_torch.py needs to be updated

I noticed that from_torch.py was not working on my machine: it seems to look for the pretrained network in the wrong directory. I made the following changes to get it working with the current state of the repo.

Starting from line 63 of src/models/from_torch.py:

if __name__ == "__main__":
    DIR_PATH = os.path.dirname(os.path.realpath(__file__))
    torch_model = os.path.join(DIR_PATH, os.pardir, os.pardir, 'pretrained', 'model-best.net')
    weights = from_torch(torch_model=torch_model)
    # weights = [module['weight'] for module in all_enet_modules]
    with open('./pretrained/torch_enet.pkl', 'wb') as fout:
        pkl.dump(obj=weights, file=fout)

Fix data loading from disk

Now that the project uses the MSCOCO class to load the dataset from the annotation json file, the old way of loading files from disk has become more or less obsolete and unnecessary. It should be converted to a dataset class, like MSCOCO currently is ('json' mode works fine), so that they can share functionality.
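
Something along these lines is what I have in mind; BaseDataset and the method names are placeholders for illustration, not the actual code:

import numpy as np

class BaseDataset(object):
    """Shared interface for datasets (MSCOCO, files on disk, ...)."""

    def __init__(self, data_dir, data_type, target_h, target_w):
        self.data_dir = data_dir
        self.data_type = data_type
        self.target_h = target_h
        self.target_w = target_w

    def sample_generator(self):
        """Yield (image, label) pairs; each subclass decides how to read them."""
        raise NotImplementedError

    def flow(self, batch_size):
        """Common batching/resizing logic would live here, shared by all subclasses."""
        raise NotImplementedError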

Objections and/or suggestions are welcome.

Replace conda with poetry

I recently found out that conda does not always cooperate well when there are two or more people working on the same repo, especially over time (sometimes the environment creation is not reproducible).

My go-to environment manager nowadays is poetry, which seems to solve (some of) these problems.

Switch this project to poetry as well to make setup more consistent.

OpenCV dependency removal

I'm interested in throwing away every import cv2 line because OpenCV has mostly been a pain for no reason. The attempt can be seen in the no_opencv branch.

The strongest contenders are pillow and scikit-image.

I will be documenting my experiences here mostly as notes to myself and possibly to fuel a discussion.

Library   Wraps NumPy   Channel order   Dimension order
OpenCV    yes           BGR             (width, height)
Pillow    no            RGB             (width, height)
skimage   yes           RGB             (height, width)
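
For my own reference, here is how the same image comes back from each library (example.jpg is a placeholder; this just illustrates the table above):

import cv2
import numpy as np
from PIL import Image
from skimage import io, transform

path = 'example.jpg'  # placeholder path

# OpenCV: numpy array in BGR; note that cv2.resize takes its target size as (width, height)
bgr = cv2.imread(path)
resized_cv = cv2.resize(bgr, (512, 256))           # -> shape (256, 512, 3)

# Pillow: not numpy-backed; .size is (width, height) and channels come back as RGB
pil_img = Image.open(path).resize((512, 256))
rgb_pil = np.asarray(pil_img)                      # -> shape (256, 512, 3)

# scikit-image: numpy array in RGB; transform.resize takes (height, width)
rgb_sk = io.imread(path)
resized_sk = transform.resize(rgb_sk, (256, 512))  # -> shape (256, 512, 3), float in [0, 1]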

Any suggestions are welcome!

Where to download the pretrained/torch_enet.pkl file?

 ./train.sh 
Using TensorFlow backend.
solver json: /home/rvl/code/enet-keras/config/solver.json
Preparing to train on mscoco data...
ENet has found no compatible pretrained weights! Skipping weight transfer...
Traceback (most recent call last):
  File "src/train.py", line 141, in <module>
    train(solver=solver)
  File "src/train.py", line 82, in train
    autoencoder = model.transfer_weights(autoencoder)
  File "/home/rvl/code/enet-keras/src/models/enet_unpooling/model.py", line 47, in transfer_weights
    with open(weights, 'rb') as fin:
IOError: [Errno 2] No such file or directory: '/home/rvl/code/enet-keras/src/models/enet_unpooling/../../../models/pretrained/torch_enet.pkl'

COCO labels

Do you represent each label as separate channels in the dataset loader?

I ask because there is a lot of class overlap in COCO and the z-order isn't always correct. For example, the table category often blocks out all the objects on top of the table if you put everything into a single categorical channel, rather than a one-hot (multi-hot?) encoding.
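
To make the question concrete, this is roughly what I mean by a multi-hot target, assuming per-category binary masks from the COCO API (names are illustrative, not your code):

import numpy as np

def multi_hot_target(binary_masks, num_classes):
    """Stack per-category binary masks into an (H, W, num_classes) target.

    binary_masks: dict mapping class index -> (H, W) uint8/bool mask.
    Overlapping objects each keep their own channel instead of fighting for one label.
    """
    h, w = next(iter(binary_masks.values())).shape
    target = np.zeros((h, w, num_classes), dtype=np.uint8)
    for class_idx, mask in binary_masks.items():
        target[..., class_idx] = np.maximum(target[..., class_idx], mask.astype(np.uint8))
    return target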

Speed up inference

I'm back again. I have some pretty decent results on the Camvid dataset now, thanks to your help. I have a question you might be able to answer: I'm not able to reproduce the fast inference time. In the article they state that:

"For inference we merge batch normalization and dropout layers into the convolutional
filters, to speed up all networks."

Do you know where I can find any related literature on how to do this, or perhaps you know how they do it?
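
From what I understand so far, the batch-norm part of the merge boils down to rescaling the conv weights; a rough numpy sketch of my interpretation (assuming a conv kernel W of shape (kh, kw, in, out) with bias b, followed by a BN layer with gamma, beta, moving_mean, moving_var), not the paper's actual code:

import numpy as np

def fold_batchnorm(W, b, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Merge y = BN(conv(x)) into a single conv with adjusted weights/bias.

    BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta, so each output channel
    of the conv just gets rescaled and shifted.
    """
    scale = gamma / np.sqrt(moving_var + eps)   # one factor per output channel
    W_folded = W * scale.reshape(1, 1, 1, -1)   # scale each output channel's filters
    b_folded = (b - moving_mean) * scale + beta
    return W_folded, b_folded

Dropout, as far as I know, is already a no-op at inference time in Keras (inverted dropout scales during training), so only the BN merge should affect speed.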

about the label format

I have my own dataset: 34774 png input images and 34774 png label images. Input images have shape (576, 576, 3) and label images have shape (576, 576), where every pixel holds a class number (there are 6 classes).

I don't quite understand how you deal with the MSCOCO annotations and what you do in the flow() function in datasets.py. What should I do in the flow() function? Thanks!
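
In case it helps explain my setup, this is roughly what I imagine flow() would need to do for my data (placeholder names, not your actual code):

import numpy as np
from PIL import Image

NUM_CLASSES = 6

def load_pair(image_path, label_path):
    img = np.asarray(Image.open(image_path), dtype=np.float32)   # (576, 576, 3)
    lbl = np.asarray(Image.open(label_path), dtype=np.int64)     # (576, 576), values 0..5
    onehot = np.eye(NUM_CLASSES, dtype=np.float32)[lbl]          # (576, 576, 6)
    return img, onehot

def simple_flow(image_paths, label_paths, batch_size):
    """Yield (images, labels) batches forever, roughly what a flow() generator does."""
    while True:
        for start in range(0, len(image_paths), batch_size):
            pairs = [load_pair(ip, lp) for ip, lp in
                     zip(image_paths[start:start + batch_size],
                         label_paths[start:start + batch_size])]
            imgs, lbls = zip(*pairs)
            yield np.stack(imgs), np.stack(lbls)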

Fix dependencies

Pip throws an error on the line containing gsutil=4.46, which seems to be a typo (pip expects == for exact pins) in requirements.txt. The same issue appears in environment.yml when using conda to install the dependencies.
Other than that, many of these dependencies seem to be pinned to outdated versions. Is there any recommended way to get the installation to work with more recent packages?

MaxUnpooling in the decoder

Have you tried to implement the MaxUnpooling operation that the original ENet uses instead of using the UpSampling Layer?
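
Something like this is what I have in mind (a rough sketch built on tf.nn.max_pool_with_argmax and tf.scatter_nd; it assumes a recent TF with include_batch_in_index and is illustrative only, not what the repo currently does):

import tensorflow as tf
from tensorflow.keras import layers

class MaxPoolingWithArgmax2D(layers.Layer):
    """2x2 max pooling that also returns the flat indices of each maximum."""
    def call(self, inputs):
        ksize = [1, 2, 2, 1]
        pooled, argmax = tf.nn.max_pool_with_argmax(
            inputs, ksize=ksize, strides=ksize, padding='SAME',
            include_batch_in_index=True)
        return pooled, argmax

class MaxUnpooling2D(layers.Layer):
    """Scatter pooled values back to the positions recorded by argmax."""
    def call(self, inputs):
        pooled, argmax = inputs
        in_shape = tf.shape(pooled, out_type=tf.int64)
        out_shape = [in_shape[0], in_shape[1] * 2, in_shape[2] * 2, in_shape[3]]
        flat = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                             tf.reshape(pooled, [-1]),
                             [tf.reduce_prod(out_shape)])
        return tf.reshape(flat, out_shape)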

usage

It seems the coco script requires files that don't exist in the repository and for which there are no generators?

in load_data():

    img_txt = os.path.join(data_dir, data_type, 'images.txt')
    lbl_txt = os.path.join(data_dir, data_type, 'labels.txt')

Also, note that if the class values are serialized into a single image then data will be lost; categorical classes are most appropriate, since a single image can contain multiple classes.

End-to-end training

I see you are training the model end-to-end style, while, in the original paper, they train the encoder first in order to categorize downsampled regions and then append the decoder afterwards. What are your thoughts on this? Do you have any intuition why it might be better to train it encoder-decoder style rather than end-to-end?

Output shows no segmentation on a test image

I have the following function get_model() which returns the enet model with weights loaded from torch_enet.pkl. The functions build() and transfer_weights() are from src/test.py.

import os
import random

import cv2
import numpy as np

def get_model(num_class):
    nc = num_class    # number of classes
    dw = 256
    dh = 256

    autoencoder, model_name = build(nc=nc, w=dw, h=dh)

    weights_fname = "trained_segmenter_weights.hdf5"

    if os.path.exists(weights_fname):
        autoencoder.load_weights(weights_fname)
    else:
        autoencoder = transfer_weights(model=autoencoder)
        autoencoder.save_weights(weights_fname)

    return autoencoder

I created a model with 11 classes by calling get_model(11). I fed it the image 2015-11-08T13.52.54.655-0000011482.jpg from the SUNRGBD dataset. The model gave a prediction tensor, which I reshaped to (256, 256, 11). To visualize the predictions, I used the following function to save that tensor as an image:

def save_output(pred):
    h, w , nc = pred.shape
    print(h, w, nc)  # Prints: 256 256 11

    colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
              for i in range(nc)
             ]
    output = np.zeros((h, w, 3), dtype=np.uint8)  # uint8 so cv2.imwrite saves it correctly

    for i in range(h):
        for j in range(w):
            vals = pred[i, j, :].ravel().tolist()
            pos = vals.index(max(vals))
            output[i, j] = colors[pos]

    out_f = "pred_output.jpg"
    ret = cv2.imwrite(out_f, output)

The output shows an almost random assignment of colors and there's no visible segmentation at all.

The input and the corresponding segmented output can be found below:
[attached: 2015-11-08t13 52 54 655-0000011482 (input) and out_2015-11-08t13 52 54 655-0000011482 (output)]

How to save checkpoint during training?

Thanks for sharing code!

When I ran train.py, I found it took a lot of time. I interrupted it during training and found that no checkpoint had been saved.
I noticed that there is a callbacks() function in train.py. I guess this is meant for saving checkpoints, but I didn't see it being called during training.
So how can I save checkpoints periodically during training? For example, I want to save a checkpoint after every 1000 training images.

Looking forward to your response. Thank you so much.
@PavlosMelissinos
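
Is something like the following what callbacks() is supposed to set up? A rough sketch of the standard Keras mechanism, ModelCheckpoint, with placeholder names for the model and generator:

from keras.callbacks import ModelCheckpoint

# Save the weights at the end of every epoch; period controls how many
# epochs pass between saves in older Keras versions.
checkpointer = ModelCheckpoint(
    filepath='checkpoints/enet_weights_epoch{epoch:02d}.h5',
    save_weights_only=True,
    period=1)

# model and train_generator are placeholders for whatever train.py builds.
model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    callbacks=[checkpointer])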

datasets.py standardization

  1. The properties of the dataset should be coupled together better. Conversions between IDS <-> CIDS <-> CATEGORIES <-> PALETTE should be done in a different way.

  2. The only difference between MSCOCO and MSCOCOReduced lies in the above. Besides that, every operation is the same, so there should be no need to override the constructor (or any other function, for that matter). Maybe I could modify the MSCOCO constructor to accept a dictionary and do any necessary pruning (like removing categories from the dataset) in there instead; a rough sketch follows below.
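
Something along these lines (names are illustrative; keep_category_ids is not an existing parameter):

class MSCOCO(object):
    def __init__(self, config):
        # config is a plain dict, e.g. {'data_dir': ..., 'keep_category_ids': [1, 2, 3]}
        self.data_dir = config['data_dir']
        keep = config.get('keep_category_ids')  # None means keep every category

        self.category_ids = self._load_all_category_ids()
        if keep is not None:
            # prune once here instead of overriding the constructor in MSCOCOReduced
            keep = set(keep)
            self.category_ids = [cid for cid in self.category_ids if cid in keep]

    def _load_all_category_ids(self):
        # would come from the annotation json; left abstract in this sketch
        raise NotImplementedError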

Suggestions are welcome as always.

Can't train on MSCOCOReduced

MSCOCOReduced has a bug that crashes the program during training. Reproduce by running train.py using the following solver.json:

{
  "model_name": "enet",
  "epochs": 100,
  "batch_size": 8,
  "completed_epochs": 0,
  "dh": 256,
  "dw": 256,
  "skip": 0,
  "resize_mode": "stretch",
  "instance_mode": true,
  "dataset_name": "mscoco_reduced"
}

The problem is probably related to MSCOCO assuming that the actual classes are exactly the ones present in self._coco. Directly removing the extra classes from self._coco should have worked, but for some reason it doesn't.

MIT license & submit to Keras?

Thanks for putting this up, it looks very well written!

Could you consider the MIT license for this? It is the same license Keras itself uses, and it would let people use the code as they like. Here it is:

The MIT License (MIT)

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Also, a pull request of this code to the official keras-contrib repository as described in the Keras CONTRIBUTING.md, particularly the coco loader, would certainly be welcome if you are interested.

If so, I'd also be happy to add a couple of elements from my own coco script which handles downloading the dataset, plus extending it with a cocostuff option.

Bad results - Investigate reason

Metric              IoU        area    maxDets  Result
Average Precision   0.50:0.95  all     100      0.001
Average Precision   0.50       all     100      0.004
Average Precision   0.75       all     100      0.000
Average Precision   0.50:0.95  small   100      0.000
Average Precision   0.50:0.95  medium  100      0.000
Average Precision   0.50:0.95  large   100      0.004
Average Recall      0.50:0.95  all     1        0.005
Average Recall      0.50:0.95  all     10       0.005
Average Recall      0.50:0.95  all     100      0.005
Average Recall      0.50:0.95  small   100      0.000
Average Recall      0.50:0.95  medium  100      0.001
Average Recall      0.50:0.95  large   100      0.019

This is using the official mscoco script.

Setup: the full image is the input; each pixel gets classified using a one-hot vector of size 81 (indices 0 to 80 inclusive), which correspond to the actual category ids in MS-COCO. More specifically, index 0 is background, ..., index 12 corresponds to class id 13 (stop sign), ..., and index 80 is in fact class 90 (toothbrush). The output is the full image, not a crop. A script is then used to separate the pixels of each detected object. No classes were used in the evalCOCO.py script (useCats = False).
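
For context, the index <-> category id mapping can be read straight off the COCO API; a rough sketch to illustrate the numbering (the annotation path is a placeholder, and this is not the repo's actual code):

from pycocotools.coco import COCO

coco = COCO('annotations/instances_val2017.json')  # placeholder annotation file

# The 80 "thing" category ids are non-contiguous (1..90 with gaps), so
# one-hot index i+1 maps to cat_ids[i], with index 0 reserved for background.
cat_ids = coco.getCatIds()
index_to_cat = {i + 1: cid for i, cid in enumerate(cat_ids)}
cat_to_index = {cid: i + 1 for i, cid in enumerate(cat_ids)}
# e.g. index_to_cat[12] == 13 (stop sign), index_to_cat[80] == 90 (toothbrush)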

These are really bad scores, and at the moment I have no idea why it's like that. I'll push the changes soon.

Which script do you use for evaluation, @athundt? If you have a working version, maybe I should just replace mine with it. Does this work for mscoco?

UpSampling vs MaxUnpooling

Thanks for your work on getting ENet in Keras! Have you found any increases in accuracy from using MaxUnpooling instead of just naive UpSampling?

pretrained file and enet_unpooling_best.h5 missing?

Hi,
when I run predict.py and train.sh, they say they can't find "enet_unpooling_best.h5" and "pretrained/torch_enet.pkl", and I can't find the two files in the repository either. So what's wrong?
Thanks!
