
object-localization's Introduction

object-localization

This project shows how to localize objects in images by using simple convolutional neural networks.

Dataset

Before getting started, we have to download a dataset and generate a csv file containing the annotations (boxes).

  1. Download The Oxford-IIIT Pet Dataset
  2. Download The Oxford-IIIT Pet Dataset Annotations
  3. tar xf images.tar.gz
  4. tar xf annotations.tar.gz
  5. mv annotations/xmls/* images/
  6. python3 generate_dataset.py
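Step 6 turns the XML annotations into CSV rows. As a rough sketch of that idea (not the actual contents of generate_dataset.py), the snippet below parses one annotation in the Pascal VOC layout, which is the layout the Oxford-IIIT Pet XML files follow, and returns the image size and the bounding box. The function name `parse_annotation`, the sample XML, and the column order are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


# Hypothetical helper: the real generate_dataset.py may use other names and columns.
def parse_annotation(xml_text):
    """Return (filename, width, height, xmin, ymin, xmax, ymax) for the first object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    box = root.find("object/bndbox")
    xmin, ymin, xmax, ymax = (
        int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
    )
    return filename, width, height, xmin, ymin, xmax, ymax


# Made-up annotation used only to exercise the helper.
SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""

row = parse_annotation(SAMPLE)
print(",".join(str(value) for value in row))
```

Keeping the image width and height next to the box makes it possible to rescale the coordinates later, when the image is resized for the network.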

Single-object detection

Example 1: Finding dogs/cats

Architecture

First, let's look at YOLOv2's approach:

  1. Pretrain Darknet-19 on ImageNet (feature extractor)
  2. Remove the last convolutional layer
  3. Add three 3 x 3 convolutional layers with 1024 filters
  4. Add a 1 x 1 convolutional layer with the number of outputs needed for detection

We proceed in the same way to build the object detector:

  1. Choose a model from Keras Applications as the feature extractor
  2. Remove the dense layer
  3. Freeze some, all, or none of the layers
  4. Add one, several, or no convolution blocks (or _inverted_res_block for MobileNetv2)
  5. Add a convolution layer for the coordinates

The code in this repository uses MobileNetv2 because it is faster than other models and its speed/accuracy trade-off can be tuned. For example, if alpha = 0.35 with 96x96 inputs is not good enough, one can simply increase both values (see here for a comparison). If you use another architecture, change preprocess_input accordingly.
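The five steps above can be sketched with tf.keras as follows. This is a minimal illustration, not the repository's train.py: the constant values, the `trainable` flag, the layer name "coords", and the choice of `weights=None` (which only avoids a download here; a pretrained extractor would use `weights="imagenet"`) are assumptions.

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Conv2D, Reshape
from tensorflow.keras.models import Model

IMAGE_SIZE = 96  # assumed input resolution, taken from the 96x96 example in the text
ALPHA = 0.35     # assumed width multiplier, taken from the same example


def create_model(size=IMAGE_SIZE, alpha=ALPHA, trainable=False):
    # Steps 1 and 2: feature extractor without the dense classification head.
    base = MobileNetV2(
        input_shape=(size, size, 3), include_top=False, alpha=alpha, weights=None
    )

    # Step 3: freeze (or unfreeze) every layer of the feature extractor.
    for layer in base.layers:
        layer.trainable = trainable

    # Step 4 is skipped in this sketch (no extra convolution block).
    # Step 5: one convolution whose kernel covers the whole feature map,
    # so the output collapses to 1 x 1 x 4 (the four box coordinates).
    kernel = base.output.shape[1]
    x = Conv2D(4, kernel_size=kernel, name="coords")(base.output)
    x = Reshape((4,))(x)
    return Model(inputs=base.input, outputs=x)


model = create_model()
```

Increasing `alpha` or `size` makes the feature extractor wider or gives it more pixels to work with, at the cost of speed.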

  1. python3 example_1/train.py
  2. Set WEIGHTS_FILE in example_1/test.py to the weights file produced by the previous script
  3. python3 example_1/test.py

Result

In the following images red is the predicted box, green is the ground truth:

Image 1

Image 2

Example 2: Finding dogs/cats and distinguishing classes

This time we have to run the scripts example_2/train.py and example_2/test.py.

Changes

In order to distinguish between classes, we have to modify the loss function. Here I use w_1*log((y_hat - y)^2 + 1) + w_2*FL(p_hat, p), where w_1 = w_2 = 1 are two weights and FL(p_hat, p) = -(0.9*(1 - p_hat)^2*p*log(p_hat) + 0.1*p_hat^2*(1 - p)*log(1 - p_hat)) is the focal loss.
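To make the formula concrete, here is a scalar, framework-free sketch that evaluates it for a single coordinate and a single class probability. The function names and the small `eps` that keeps the logarithms finite are assumptions for illustration; the training script itself operates on tensors.

```python
import math


def focal_loss(p_hat, p, eps=1e-7):
    """FL(p_hat, p) from the text, for one predicted probability and one label."""
    p_hat = min(max(p_hat, eps), 1.0 - eps)  # keep log() away from 0
    positive = 0.9 * (1.0 - p_hat) ** 2 * p * math.log(p_hat)
    negative = 0.1 * p_hat ** 2 * (1.0 - p) * math.log(1.0 - p_hat)
    return -(positive + negative)


def total_loss(y_hat, y, p_hat, p, w_1=1.0, w_2=1.0):
    """w_1 * log((y_hat - y)^2 + 1) + w_2 * FL(p_hat, p)."""
    box_term = math.log((y_hat - y) ** 2 + 1.0)
    return w_1 * box_term + w_2 * focal_loss(p_hat, p)


# A confident correct prediction is penalised far less than a confident wrong one.
print(focal_loss(0.9, 1.0), focal_loss(0.1, 1.0))
```

The factors (1 - p_hat)^2 and p_hat^2 shrink the contribution of examples that are already classified well, so training concentrates on the hard ones.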

Instead of using all 37 classes, the code outputs only two labels: class 0 (the original class 0) or class 1 (the original classes 1 to 36 merged). However, it is easy to extend this to more classes (use categorical cross entropy instead of focal loss and try out different weights).

Multi-object detection

Example 3: Segmentation-like detection

Architecture

In this example, we use a skip-net architecture similar to U-Net. For an in-depth explanation see my blog post.

Architecture

Result

Dog

Example 4: YOLO-like detection

Architecture

This example is based on the three YOLO papers. For an in-depth explanation see this blog post.

Result

Multiple dogs

Guidelines

Improve accuracy (IoU)

  • enable augmentations: see example_4; the same code can be added to the other examples
  • better augmentations: try out different values (flips, rotation, etc.)
  • for MobileNetv1/2: increase ALPHA and IMAGE_SIZE in train_model.py
  • other architectures: increase IMAGE_SIZE
  • add more layers
  • try out other loss functions (MAE, smooth L1 loss, etc.)
  • try another optimizer: SGD with momentum 0.9, and adjust the learning rate
  • use a feature pyramid
  • read keras-team/keras#9965

Increase training speed

  • increase BATCH_SIZE
  • use fewer layers and decrease IMAGE_SIZE and ALPHA

Overfitting

  • If the new dataset is small and similar to ImageNet, freeze all layers.
  • If the new dataset is small and not similar to ImageNet, freeze some layers.
  • If the new dataset is large, freeze no layers.
  • read http://cs231n.github.io/transfer-learning/


object-localization's Issues

Unable to open file (unable to open file: name = 'model-0.29.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Hi,

I've run the test.py script, but I got an error:

Unable to open file (unable to open file: name = 'model-0.29.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I looked in my directory, but there is no file with that name. I do have the files 'model-0.44.h5' through 'model-0.51.h5'. Even when I changed 'model-0.29.h5' to 'model-0.44.h5' in WEIGHTS_FILE in test.py and ran it again, it didn't work. Why isn't this working, and how can I fix it? Should I have the file 'model-0.29.h5' in my directory? Should I change save_best_only=TRUE to FALSE in the train.py script?

image

some images are corrupted

I see these two image files are corrupted each time I download them from the webpage:
beagle_116.jpg
chihuahua_121.jpg

Infer trained model on OpenVino

Is it possible to run inference with the trained model in OpenVINO, or even in the OpenCV dnn module? OpenVINO supports standard object detection models (MobileNet with SSD, YOLO, Mask R-CNN, etc.) and simple classification models with their respective architectures (e.g. Inception, MobileNet). Is it possible to convert the Keras h5 model into a native TensorFlow frozen graph and run it on OpenVINO?

IndexError: index 7 is out of bounds for axis 1 with size 7

Hi, I prepared the dataset, and when starting the training I get this error. I am trying to understand where it may come from, but I'm having difficulties.

File "train.py", line 175, in __getitem__
batch_boxes[i, floor_y, floor_x, 0] = (y1 - y0) / image_height
IndexError: index 7 is out of bounds for axis 1 with size 1
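One plausible cause, offered here as an assumption rather than a diagnosis, is a box whose centre lies exactly on the right or bottom image border: scaling such a centre to grid units gives an index equal to the grid size, which is one past the last valid cell. The sketch below shows the effect and a simple clamp. `GRID_SIZE` and `cell_index` are hypothetical names, not identifiers from train.py.

```python
import math

GRID_SIZE = 7  # assumed grid size, matching "size 7" in the error message


def cell_index(center, image_extent, grid_size=GRID_SIZE):
    """Map a pixel coordinate to a grid cell, clamped to the valid range."""
    index = math.floor(center / image_extent * grid_size)
    return min(max(index, 0), grid_size - 1)


# Without the clamp, a centre on the border of a 224 pixel wide image maps to 7.
print(math.floor(224 / 224 * GRID_SIZE), cell_index(224, 224))
```

Annotations that extend beyond the image would trigger the same error, so checking the CSV for coordinates outside the image is also worthwhile.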

What does STD and MEAN do?

Hi, I'm using your code as a jumping off point for learning about localiser development, and was just a bit confused as to what STD and MEAN are explicitly used for? I think it looks like some kind of normalisation operation, but not 100% sure. Any info you can give would be greatly appreciated.
Cheers.

Dataset problem

Now that the xml files have been generated, can they be passed directly to the neural network for training? Or do they need to be converted into an array or csv file beforehand?

loss start too high

Do you have any solution for this problem? The loss starts at 17 and does not decrease.

Empty train.csv and validation.csv files

Hello Lars, I would consider myself a beginner in CNNs and would be grateful if you could assist with the problem I have. I have downloaded the dataset and put the .jpg and .xml files in the dataset folder. I have decided to use 4 instances for now to test the localization, but it didn't work. The train.csv and validation.csv files that are generated at the beginning when running generate_dataset.py are empty. Do you have any suggestions on why it won't work for me?

Kind regards, Arseniy

confusion in iou calculation

def iou(y_true, y_pred):
    xA = K.maximum(y_true[...,0], y_pred[...,0])
    yA = K.maximum(y_true[...,1], y_pred[...,1])
    xB = K.minimum(y_true[...,2], y_pred[...,2])
    yB = K.minimum(y_true[...,3], y_pred[...,3])

    interArea = (xB - xA) * (yB - yA)

    boxAArea = (y_true[...,2] - y_true[...,0]) * (y_true[...,3] - y_true[...,1])
    boxBArea = (y_pred[...,2] - y_pred[...,0]) * (y_pred[...,3] - y_pred[...,1])
    return K.clip(interArea / (boxAArea + boxBArea - interArea + K.epsilon()), 0, 1)

In the above function, y_true and y_pred come directly from the training data and the model respectively, and both are scaled coordinates. Shouldn't the iou function convert them back to xmin, ymin, xmax, ymax before calculating the IoU?
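As a side note on the question: IoU is a ratio of areas, so it does not change when both boxes are scaled by the same per-axis factors. The plain-Python sketch below checks this numerically. It assumes that the four values are corner coordinates (x0, y0, x1, y1), as the indices in the quoted function suggest, and the `scale` helper and the example boxes are made up for illustration. Unlike the quoted function, it clamps the overlap width and height at zero; without the clamp, two boxes that are disjoint along both axes yield a positive product.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x0, y0, x1, y1) corner coordinates."""
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes get an intersection of 0.
    inter = max(x_right - x_left, 0.0) * max(y_bottom - y_top, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def scale(box, sx, sy):
    """Scale x coordinates by sx and y coordinates by sy."""
    return (box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy)


a = (10.0, 20.0, 110.0, 120.0)
b = (60.0, 70.0, 160.0, 170.0)
original = iou(a, b)
scaled = iou(scale(a, 96 / 500, 96 / 375), scale(b, 96 / 500, 96 / 375))
print(original, scaled)
```

Both values agree up to floating-point rounding, so converting back to pixel coordinates is not needed for the metric itself, provided the ground truth and the prediction were scaled in the same way.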

train_model.py

Reloaded modules: train_model
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/huixu/Desktop/object-localization-0812/train_model.py', wdir='C:/Users/hhh/Desktop/object-localization-0812')

File "C:\Localdata\Python3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "C:\Localdata\Python3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/hhh/Desktop/object-localization-0812/train_model.py", line 122, in
main()

File "C:/Users/hhh/Desktop/object-localization-0812/train_model.py", line 118, in main
train(model, EPOCHS, BATCH_SIZE, PATIENCE, TRAIN_CSV, VALIDATION_CSV, MEAN, STD)

File "C:/Users/hhh/Desktop/object-localization-0812/train_model.py", line 102, in train
train_datagen = DataSequence(train_csv, batch_size, mean, std)

File "C:/Users/hhh/Desktop/object-localization-0812/train_model.py", line 47, in __init__
for index, (path, x0, y0, x1, y1) in enumerate(reader):

ValueError: not enough values to unpack (expected 5, got 0)
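"Expected 5, got 0" means that the reader produced an empty row, which is what Python's csv.reader yields for a blank line. A defensive reading loop could look like the sketch below. The function name `read_boxes` and the sample data are hypothetical; only the five-column layout (path, x0, y0, x1, y1) is taken from the traceback.

```python
import csv
import io


def read_boxes(csv_file):
    """Read (path, x0, y0, x1, y1) rows, skipping blank lines."""
    rows = []
    for line_number, row in enumerate(csv.reader(csv_file), start=1):
        if not row:  # csv.reader returns [] for a blank line
            continue
        if len(row) != 5:
            raise ValueError(f"line {line_number}: expected 5 columns, got {len(row)}")
        path, x0, y0, x1, y1 = row
        rows.append((path, float(x0), float(y0), float(x1), float(y1)))
    return rows


# Made-up CSV content with a blank line in the middle.
sample = io.StringIO("images/a.jpg,0.1,0.2,0.6,0.7\n\nimages/b.jpg,0.0,0.0,1.0,1.0\n")
boxes = read_boxes(sample)
print(len(boxes))
```

If the CSV contains nothing but blank lines, the list comes back empty, which points to a problem in the dataset generation step rather than in training.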

Modifying the model for multi object localization in an image

Hi, thanks for your amazing work. This model works very nicely, but I have a question. When I feed the model an image with two or three cats, I get only one box and its accuracy is very low. How can I modify the model for multi-object detection? Which parts of train_model.py or evaluate_performance.py should be changed for that? Your response will be highly appreciated.

Model?

Which model did you use: Faster_RCNN or something else? I was trying to do object localization using Faster_RCNN and have only prepared datasets with annotations, but I don't know how to extract features from the annotation file as input to feed to the region proposal layer. Can you help, please?

TypeError: unsupported operand type(s) for -: 'tuple' and 'int'

When I run generate_dataset.py, I get this error. How can I fix this?

File "generate_dataset.py", line 115, in
main()
File "generate_dataset.py", line 62, in main
print("class {}: {} images".format(output[j-1][-2], i))
TypeError: unsupported operand type(s) for -: 'tuple' and 'int'

Program getting terminated after a particular error

Hi Lars,

I have tried running the code on a dataset for 100 epochs. After around 50 epochs, the error had dropped from 2000 (initially) to 25, but training stopped suddenly. However many times I try, I end up with the same result. I couldn't figure out why it stops partway through.

Could you help me in resolving this issue?

The exact error is shown in the screenshot. It says that val_iou did not improve from 0.

screenshot from 2019-02-09 20-08-47

Also, how can I use the GPU to run the Python script? Currently it has multi-threading options that use the CPU, but my system has a supported GPU. How do I switch?

Thanks in advance lars
Srinath

Error message from evaluate_performance.py

runfile('/Users/huinaxu/Desktop/object-localization-master/evaluate_performance.py', wdir='/Users/huinaxu/Desktop/object-localization-master')
Reloaded modules: generate_dataset, train_model
IoU on training data
(progress counter output up to 2949/2949 omitted)
Avg IoU: 0.5732865147951192
Highest IoU: 0.9722222222222222
Lowest IoU: 0.0

IoU on validation data
737/737103/737
Avg IoU: 0.5246796910795323
Highest IoU: 0.9227614490772386
Lowest IoU: 0.0

Trying out unscaled image
images/Egyptian_Mau_167.jpg
Traceback (most recent call last):

File "", line 1, in
runfile('/Users/---/Desktop/object-localization-master/evaluate_performance.py', wdir='/Users/---/Desktop/object-localization-master')

File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/---/Desktop/object-localization-master/evaluate_performance.py", line 119, in
main()

File "/Users/h---/Desktop/object-localization-master/evaluate_performance.py", line 108, in main
pred = predict_image(path, model)

File "/Users/---/Desktop/object-localization-master/evaluate_performance.py", line 42, in predict_image
if im.shape[0] != IMAGE_SIZE:

AttributeError: 'NoneType' object has no attribute 'shape'

Issue found from all the commits. Might be my tensorflow version issue?

HHH-MBP:object-localization-master1 huinaxu$ python3 example_1/train_model.py
/Users/hhh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
File "example_1/train_model.py", line 130, in
main()
File "example_1/train_model.py", line 125, in main
model = create_model(IMAGE_SIZE, ALPHA)
File "example_1/train_model.py", line 75, in create_model
model = MobileNetV2(input_shape=(size, size, 3), include_top=False, alpha=alpha)
File "/Users/hhh/anaconda3/lib/python3.6/site-packages/keras_applications/mobilenet_v2.py", line 355, in MobileNetV2
expansion=1, block_id=0)
File "/Users/hhh/anaconda3/lib/python3.6/site-packages/keras_applications/mobilenet_v2.py", line 461, in _inverted_res_block
in_channels = inputs._keras_shape[-1]
AttributeError: 'Tensor' object has no attribute '_keras_shape'
