
enet-keras's People

Contributors

ahundt, dependabot[bot], pavlosmelissinos


enet-keras's Issues

src/models/from_torch.py needs to be updated

I noticed that from_torch.py was not working on my machine: it seems to look for the pretrained network in the wrong directory. I made the following changes to get it working with the current state of the repo.

Starting from line 63 of src/models/from_torch.py:

if __name__ == "__main__":
    DIR_PATH = os.path.dirname(os.path.realpath(__file__))
    torch_model = os.path.join(DIR_PATH, os.pardir, os.pardir, 'pretrained', 'model-best.net')
    weights = from_torch(torch_model=torch_model)
    # weights = [module['weight'] for module in all_enet_modules]
    with open('./pretrained/torch_enet.pkl', 'wb') as fout:
        pkl.dump(obj=weights, file=fout)

Fix data loading from disk

Now that the project uses the MSCOCO class to load the dataset from the annotation json file, the old way of loading files from disk has become more or less obsolete and unnecessary. It should be converted to a dataset class, like MSCOCO currently is ('json' mode works fine), so that they can share functionality.
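
Something along these lines is what I have in mind; BaseDataset and the method names are placeholders for illustration, not the actual code:

import numpy as np

class BaseDataset(object):
    """Shared interface for datasets (MSCOCO, files on disk, ...)."""

    def __init__(self, data_dir, data_type, target_h, target_w):
        self.data_dir = data_dir
        self.data_type = data_type
        self.target_h = target_h
        self.target_w = target_w

    def sample_generator(self):
        """Yield (image, label) pairs; each subclass decides how to read them."""
        raise NotImplementedError

    def flow(self, batch_size):
        """Common batching/resizing logic would live here, shared by all subclasses."""
        raise NotImplementedError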

Objections and/or suggestions are welcome.

Replace conda with poetry

I recently found out that conda does not always cooperate well when there are two or more people working on the same repo, especially over time (sometimes the environment creation is not reproducible).

My go-to environment manager nowadays is poetry, which seems to solve (some of) these problems.

Switch this project to poetry as well to make setup more consistent.

OpenCV dependency removal

I'm interested in throwing away every import cv2 line because OpenCV has mostly been a pain for no reason. The attempt can be seen in the no_opencv branch.

The strongest contenders are pillow and scikit-image.

I will be documenting my experiences here mostly as notes to myself and possibly to fuel a discussion.

Library   Wraps NumPy   Channel order   Dimension order
OpenCV    yes           BGR             (width, height)
Pillow    no            RGB             (width, height)
skimage   yes           RGB             (height, width)
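
For my own reference, here is how the same image comes back from each library (example.jpg is a placeholder; this just illustrates the table above):

import cv2
import numpy as np
from PIL import Image
from skimage import io, transform

path = 'example.jpg'  # placeholder path

# OpenCV: numpy array in BGR; note that cv2.resize takes its target size as (width, height)
bgr = cv2.imread(path)
resized_cv = cv2.resize(bgr, (512, 256))           # -> shape (256, 512, 3)

# Pillow: not numpy-backed; .size is (width, height) and channels come back as RGB
pil_img = Image.open(path).resize((512, 256))
rgb_pil = np.asarray(pil_img)                      # -> shape (256, 512, 3)

# scikit-image: numpy array in RGB; transform.resize takes (height, width)
rgb_sk = io.imread(path)
resized_sk = transform.resize(rgb_sk, (256, 512))  # -> shape (256, 512, 3), float in [0, 1]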

Any suggestions are welcome!

Where to download the pretrained/torch_enet.pkl file?

 ./train.sh 
Using TensorFlow backend.
solver json: /home/rvl/code/enet-keras/config/solver.json
Preparing to train on mscoco data...
ENet has found no compatible pretrained weights! Skipping weight transfer...
Traceback (most recent call last):
  File "src/train.py", line 141, in <module>
    train(solver=solver)
  File "src/train.py", line 82, in train
    autoencoder = model.transfer_weights(autoencoder)
  File "/home/rvl/code/enet-keras/src/models/enet_unpooling/model.py", line 47, in transfer_weights
    with open(weights, 'rb') as fin:
IOError: [Errno 2] No such file or directory: '/home/rvl/code/enet-keras/src/models/enet_unpooling/../../../models/pretrained/torch_enet.pkl'

COCO labels

Do you represent each label as separate channels in the dataset loader?

I ask because there is a lot of class overlap in COCO and the z-order isn't always correct. For example, the table category often blocks out all the objects on top of the table if you put everything into a single categorical channel, rather than a one-hot (multi-hot?) encoding.
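
To make the question concrete, this is roughly what I mean by a multi-hot target, assuming per-category binary masks from the COCO API (names are illustrative, not your code):

import numpy as np

def multi_hot_target(binary_masks, num_classes):
    """Stack per-category binary masks into an (H, W, num_classes) target.

    binary_masks: dict mapping class index -> (H, W) uint8/bool mask.
    Overlapping objects each keep their own channel instead of fighting for one label.
    """
    h, w = next(iter(binary_masks.values())).shape
    target = np.zeros((h, w, num_classes), dtype=np.uint8)
    for class_idx, mask in binary_masks.items():
        target[..., class_idx] = np.maximum(target[..., class_idx], mask.astype(np.uint8))
    return target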

Speed up inference

I'm back again. I have some pretty decent results on the Camvid dataset now, thanks to your help. I have a question you might be able to answer: I'm not able to reproduce the fast inference time. In the article they state that:

"For inference we merge batch normalization and dropout layers into the convolutional
filters, to speed up all networks."

Do you know where I can find any related literature on how to do this, or perhaps you know how they do it?
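
From what I understand so far, the batch-norm part of the merge boils down to rescaling the conv weights; a rough numpy sketch of my interpretation (assuming a conv kernel W of shape (kh, kw, in, out) with bias b, followed by a BN layer with gamma, beta, moving_mean, moving_var), not the paper's actual code:

import numpy as np

def fold_batchnorm(W, b, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Merge y = BN(conv(x)) into a single conv with adjusted weights/bias.

    BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta, so each output channel
    of the conv just gets rescaled and shifted.
    """
    scale = gamma / np.sqrt(moving_var + eps)   # one factor per output channel
    W_folded = W * scale.reshape(1, 1, 1, -1)   # scale each output channel's filters
    b_folded = (b - moving_mean) * scale + beta
    return W_folded, b_folded

Dropout, as far as I know, is already a no-op at inference time in Keras (inverted dropout scales during training), so only the BN merge should affect speed.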

about the label format

I have my own dataset: 34774 png input images and 34774 png label images. Input images have shape (576, 576, 3) and label images have shape (576, 576), where every pixel holds a class number (there are 6 classes).

I don't quite understand how you deal with the MSCOCO annotations and what you do in the flow() function in datasets.py. What should I do in the flow() function? Thanks!
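
In case it helps explain my setup, this is roughly what I imagine flow() would need to do for my data (placeholder names, not your actual code):

import numpy as np
from PIL import Image

NUM_CLASSES = 6

def load_pair(image_path, label_path):
    img = np.asarray(Image.open(image_path), dtype=np.float32)   # (576, 576, 3)
    lbl = np.asarray(Image.open(label_path), dtype=np.int64)     # (576, 576), values 0..5
    onehot = np.eye(NUM_CLASSES, dtype=np.float32)[lbl]          # (576, 576, 6)
    return img, onehot

def simple_flow(image_paths, label_paths, batch_size):
    """Yield (images, labels) batches forever, roughly what a flow() generator does."""
    while True:
        for start in range(0, len(image_paths), batch_size):
            pairs = [load_pair(ip, lp) for ip, lp in
                     zip(image_paths[start:start + batch_size],
                         label_paths[start:start + batch_size])]
            imgs, lbls = zip(*pairs)
            yield np.stack(imgs), np.stack(lbls)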

Fix dependencies

Pip throws an error on the line containing gsutil=4.46, which seems to be a typo (pip expects == for exact pins) in requirements.txt. The same issue appears in environment.yml when using conda to install the dependencies.
Other than that, many of these dependencies seem to be pinned to outdated versions. Is there any recommended way to get the installation to work with more recent packages?

MaxUnpooling in the decoder

Have you tried to implement the MaxUnpooling operation that the original ENet uses instead of using the UpSampling Layer?
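
Something like this is what I have in mind (a rough sketch built on tf.nn.max_pool_with_argmax and tf.scatter_nd; it assumes a recent TF with include_batch_in_index and is illustrative only, not what the repo currently does):

import tensorflow as tf
from tensorflow.keras import layers

class MaxPoolingWithArgmax2D(layers.Layer):
    """2x2 max pooling that also returns the flat indices of each maximum."""
    def call(self, inputs):
        ksize = [1, 2, 2, 1]
        pooled, argmax = tf.nn.max_pool_with_argmax(
            inputs, ksize=ksize, strides=ksize, padding='SAME',
            include_batch_in_index=True)
        return pooled, argmax

class MaxUnpooling2D(layers.Layer):
    """Scatter pooled values back to the positions recorded by argmax."""
    def call(self, inputs):
        pooled, argmax = inputs
        in_shape = tf.shape(pooled, out_type=tf.int64)
        out_shape = [in_shape[0], in_shape[1] * 2, in_shape[2] * 2, in_shape[3]]
        flat = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                             tf.reshape(pooled, [-1]),
                             [tf.reduce_prod(out_shape)])
        return tf.reshape(flat, out_shape)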

usage

It seems the coco script requires files that don't exist in the repository and for which there are no generators?

in load_data():

    img_txt = os.path.join(data_dir, data_type, 'images.txt')
    lbl_txt = os.path.join(data_dir, data_type, 'labels.txt')

Also, note that if the class values are serialized into a single image then data will be lost; categorical classes are most appropriate, since a single image can contain multiple classes.

End-to-end training

I see you are training the model end-to-end style, while, in the original paper, they train the encoder first in order to categorize downsampled regions and then append the decoder afterwards. What are your thoughts on this? Do you have any intuition why it might be better to train it encoder-decoder style rather than end-to-end?

Output shows no segmentation on a test image

I have the following function get_model() which returns the enet model with weights loaded from torch_enet.pkl. The functions build() and transfer_weights() are from src/test.py.

import os
import random

import cv2
import numpy as np

def get_model(num_class):
    nc = num_class    # number of classes
    dw = 256
    dh = 256

    autoencoder, model_name = build(nc=nc, w=dw, h=dh)

    weights_fname = "trained_segmenter_weights.hdf5"

    if os.path.exists(weights_fname):
        autoencoder.load_weights(weights_fname)
    else:
        autoencoder = transfer_weights(model=autoencoder)
        autoencoder.save_weights(weights_fname)

    return autoencoder

I created a model with 11 classes by calling get_model(11). I fed it the image 2015-11-08T13.52.54.655-0000011482.jpg from the SUNRGBD dataset. The model gave a prediction tensor, which I reshaped to (256, 256, 11). To visualize the predictions, I used the following function to save that tensor as an image:

def save_output(pred):
    h, w , nc = pred.shape
    print(h, w, nc)  # Prints: 256 256 11

    colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
              for i in range(nc)
             ]
    output = np.zeros((h, w, 3), dtype=np.uint8)  # uint8 so cv2.imwrite saves it correctly

    for i in range(h):
        for j in range(w):
            vals = pred[i, j, :].ravel().tolist()
            pos = vals.index(max(vals))
            output[i, j] = colors[pos]

    out_f = "pred_output.jpg"
    ret = cv2.imwrite(out_f, output)

The output shows an almost random assignment of colors and there's no visible segmentation at all.

The input and the corresponding segmented output can be found below:
[attached: 2015-11-08t13 52 54 655-0000011482 (input) and out_2015-11-08t13 52 54 655-0000011482 (output)]

How to save checkpoint during training?

Thanks for sharing code!

When I ran train.py, I found it took a lot of time. I interrupted it during training and found that no checkpoint had been saved.
I noticed that there is a callbacks() function in train.py. I guess this is meant for saving checkpoints, but I didn't see it being called during training.
So how can I save checkpoints periodically during training? For example, I want to save a checkpoint after every 1000 training images.

Looking forward to your response. Thank you so much.
@PavlosMelissinos
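
Is something like the following what callbacks() is supposed to set up? A rough sketch of the standard Keras mechanism, ModelCheckpoint, with placeholder names for the model and generator:

from keras.callbacks import ModelCheckpoint

# Save the weights at the end of every epoch; period controls how many
# epochs pass between saves in older Keras versions.
checkpointer = ModelCheckpoint(
    filepath='checkpoints/enet_weights_epoch{epoch:02d}.h5',
    save_weights_only=True,
    period=1)

# model and train_generator are placeholders for whatever train.py builds.
model.fit_generator(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=epochs,
                    callbacks=[checkpointer])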

datasets.py standardization

  1. The properties of the dataset should be coupled together better. Conversions between IDS <-> CIDS <-> CATEGORIES <-> PALETTE should be done in a different way.

  2. The only difference between MSCOCO and MSCOCOReduced lies in the above. Besides that, every operation is the same, so there should be no need to override the constructor (or any other function, for that matter). Maybe I could modify the MSCOCO constructor to accept a dictionary and do any necessary pruning (like removing categories from the dataset) in there instead; a rough sketch follows below.
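
Something along these lines (names are illustrative; keep_category_ids is not an existing parameter):

class MSCOCO(object):
    def __init__(self, config):
        # config is a plain dict, e.g. {'data_dir': ..., 'keep_category_ids': [1, 2, 3]}
        self.data_dir = config['data_dir']
        keep = config.get('keep_category_ids')  # None means keep every category

        self.category_ids = self._load_all_category_ids()
        if keep is not None:
            # prune once here instead of overriding the constructor in MSCOCOReduced
            keep = set(keep)
            self.category_ids = [cid for cid in self.category_ids if cid in keep]

    def _load_all_category_ids(self):
        # would come from the annotation json; left abstract in this sketch
        raise NotImplementedError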

Suggestions are welcome as always.

Can't train on MSCOCOReduced

MSCOCOReduced has a bug that crashes the program during training. Reproduce by running train.py using the following solver.json:

{
  "model_name": "enet",
  "epochs": 100,
  "batch_size": 8,
  "completed_epochs": 0,
  "dh": 256,
  "dw": 256,
  "skip": 0,
  "resize_mode": "stretch",
  "instance_mode": true,
  "dataset_name": "mscoco_reduced"
}

The problem is probably related to MSCOCO assuming that the actual classes are exactly the ones present in self._coco. Directly removing the extra classes from self._coco should have worked, but for some reason it doesn't.

MIT license & submit to Keras?

Thanks for putting this up, it looks very well written!

Could you consider the MIT license for this? It is the same license Keras itself uses, and it would let people use the code as they like. Here it is:

The MIT License (MIT)

Copyright (c) <year> <copyright holders>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Also, a pull request of this code to the official keras-contrib repository as described in the Keras CONTRIBUTING.md, particularly the coco loader, would certainly be welcome if you are interested.

If so, I'd also be happy to add a couple of elements from my own coco script which handles downloading the dataset, plus extending it with a cocostuff option.

Bad results - Investigate reason

Metric              IoU        area    maxDets  Result
Average Precision   0.50:0.95  all     100      0.001
Average Precision   0.50       all     100      0.004
Average Precision   0.75       all     100      0.000
Average Precision   0.50:0.95  small   100      0.000
Average Precision   0.50:0.95  medium  100      0.000
Average Precision   0.50:0.95  large   100      0.004
Average Recall      0.50:0.95  all     1        0.005
Average Recall      0.50:0.95  all     10       0.005
Average Recall      0.50:0.95  all     100      0.005
Average Recall      0.50:0.95  small   100      0.000
Average Recall      0.50:0.95  medium  100      0.001
Average Recall      0.50:0.95  large   100      0.019

This is using the official mscoco script.

Setup: the full image is the input; each pixel gets classified using a one-hot vector of size 81 (indices 0 to 80 inclusive), which correspond to the actual category ids in MS-COCO. More specifically, index 0 is background, ..., index 12 corresponds to class id 13 (stop sign), ..., and index 80 is in fact class 90 (toothbrush). The output is the full image, not a crop. A script is then used to separate the pixels of each detected object. No classes were used in the evalCOCO.py script (useCats = False).
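
For context, the index <-> category id mapping can be read straight off the COCO API; a rough sketch to illustrate the numbering (the annotation path is a placeholder, and this is not the repo's actual code):

from pycocotools.coco import COCO

coco = COCO('annotations/instances_val2017.json')  # placeholder annotation file

# The 80 "thing" category ids are non-contiguous (1..90 with gaps), so
# one-hot index i+1 maps to cat_ids[i], with index 0 reserved for background.
cat_ids = coco.getCatIds()
index_to_cat = {i + 1: cid for i, cid in enumerate(cat_ids)}
cat_to_index = {cid: i + 1 for i, cid in enumerate(cat_ids)}
# e.g. index_to_cat[12] == 13 (stop sign), index_to_cat[80] == 90 (toothbrush)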

These are really bad scores, and at the moment I have no idea why it's like that. I'll push the changes soon.

Which script do you use for evaluation, @athundt? If you have a working version, maybe I should just replace mine with it. Does this work for mscoco?

UpSampling vs MaxUnpooling

Thanks for your work on getting ENet in Keras! Have you found any increases in accuracy from using MaxUnpooling instead of just naive UpSampling?

pretrained file and enet_unpooling_best.h5 missing?

Hi,
when I run predict.py and train.sh, they say they can't find "enet_unpooling_best.h5" and "pretrained/torch_enet.pkl", and I can't find the two files in the repository either. So what's wrong?
Thanks!
