
efficientdet's People

Contributors

dependabot[bot], fsx950223, noahtren, xuannianz


efficientdet's Issues

TypeError: 'NoneType' object is not an iterator

File "train.py", line 373, in
main()
File "train.py", line 368, in main
validation_data=validation_generator
File "/home/cui/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
steps_name='steps_per_epoch')
File "/home/cui/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 331, in model_iteration
callbacks.on_epoch_end(epoch, epoch_logs)
File "/home/cui/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/callbacks.py", line 311, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "/home/cui/soft/detection/EfficientDet-master/eval/pascal.py", line 75, in on_epoch_end
visualize=False
File "/home/cui/soft/detection/EfficientDet-master/eval/common.py", line 194, in evaluate
visualize=visualize)
File "/home/cui/soft/detection/EfficientDet-master/eval/common.py", line 81, in _get_detections
for i in progressbar.ProgressBar(range(generator.size()), prefix='Running network: '):
File "/home/cui/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/progressbar/bar.py", line 429, in next
value = next(self._iterable)
TypeError: 'NoneType' object is not an iterator
terminate called without an active exception

A puzzle about features returned by EfficientNet in efficientnet.py

According to the function EfficientNet defined in efficientnet.py, the features it returns have five elements corresponding to five different stages of EfficientNet. Why is the first element not the output of the first convolution followed by BN and swish?

error when loading model with tf.keras.models.load_model

When loading the model like this, I get the following error.

keras.models.load_model(filepath='.h5', custom_objects=custom_objects)

with

custom_objects = {
    'BatchNormalization': layers_new.BatchNormalization,
    'swish': efficientnet.get_swish(),
    'FixedDropout': efficientnet.get_dropout(backend=keras.backend, layers=keras.layers, models=keras.models, utils=keras.utils),
    'wBiFPNAdd': layers_new.wBiFPNAdd,
}


TypeError Traceback (most recent call last)
in ()
1 keras.models.load_model(filepath='ckpts/ckpts_B1/freeze-backbone-false-1e-6-bs2-rotate-flip-new/csv_06_0.6883_0.7987.h5',
----> 2 custom_objects=custom_objects)

9 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
141 if (h5py is not None and (
142 isinstance(filepath, h5py.File) or h5py.is_hdf5(filepath))):
--> 143 return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
144
145 if isinstance(filepath, six.string_types):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/hdf5_format.py in load_model_from_hdf5(filepath, custom_objects, compile)
160 model_config = json.loads(model_config.decode('utf-8'))
161 model = model_config_lib.model_from_config(model_config,
--> 162 custom_objects=custom_objects)
163
164 # set weights

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/model_config.py in model_from_config(config, custom_objects)
53 'Sequential.from_config(config)?')
54 from tensorflow.python.keras.layers import deserialize # pylint: disable=g-import-not-at-top
---> 55 return deserialize(config, custom_objects=custom_objects)
56
57

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/serialization.py in deserialize(config, custom_objects)
103 module_objects=globs,
104 custom_objects=custom_objects,
--> 105 printable_module_name='layer')

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
189 custom_objects=dict(
190 list(_GLOBAL_CUSTOM_OBJECTS.items()) +
--> 191 list(custom_objects.items())))
192 with CustomObjectScope(custom_objects):
193 return cls.from_config(cls_config)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py in from_config(cls, config, custom_objects)
1069 # First, we create all layers and enqueue nodes to be processed
1070 for layer_data in config['layers']:
-> 1071 process_layer(layer_data)
1072 # Then we process nodes in order of layer depth.
1073 # Nodes that cannot yet be processed (if the inbound node

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py in process_layer(layer_data)
1053 from tensorflow.python.keras.layers import deserialize as deserialize_layer # pylint: disable=g-import-not-at-top
1054
-> 1055 layer = deserialize_layer(layer_data, custom_objects=custom_objects)
1056 created_layers[layer_name] = layer
1057

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/layers/serialization.py in deserialize(config, custom_objects)
103 module_objects=globs,
104 custom_objects=custom_objects,
--> 105 printable_module_name='layer')

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/generic_utils.py in deserialize_keras_object(identifier, module_objects, custom_objects, printable_module_name)
191 list(custom_objects.items())))
192 with CustomObjectScope(custom_objects):
--> 193 return cls.from_config(cls_config)
194 else:
195 # Then cls may be a function returning a class.

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py in from_config(cls, config)
599 A layer instance.
600 """
--> 601 return cls(**config)
602
603 def compute_output_shape(self, input_shape):

TypeError: type object argument after ** must be a mapping, not NoneType

How can I fix this error, or otherwise load the model successfully with tf.keras.models.load_model?
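A workaround that sidesteps full-model deserialization is to rebuild the architecture in code and load only the weights. The sketch below is hedged: it assumes the repo's model.efficientdet builder is importable and that phi, num_classes and weighted_bifpn match the values used when the checkpoint was trained; the checkpoint path is a placeholder, not the path from the traceback above.

# Hedged sketch: rebuild the graph, then load weights by layer name instead of
# deserializing the whole model from the h5 config.
from model import efficientdet  # the repo's builder (assumption)

phi = 1
num_classes = 20  # adjust to your dataset
model, prediction_model = efficientdet(phi=phi, num_classes=num_classes,
                                       weighted_bifpn=True)
model.load_weights('ckpts/your_checkpoint.h5', by_name=True)  # placeholder path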

CSV dataset loading error

Thank you for your great work.

I constructed my own dataset with a CSV file like the one below:


filename, xmin, ymin, xmax, ymax, class_name
...


Training appears to be working well, but there are many warning messages like the one below:

UserWarning: Image with id 745 (shape (896, 896, 3)) contains no valid boxes after transform image.shape

The b0 backbone shows this message far more often, while the b3 backbone shows it only a few times.
Can you tell me what it means?

Documentation required for the engineering environment

Hello! When I tried to reproduce your project, I encountered the following problem:

File "/media/zhangwentao/DATA1/Documents2/Project-DL/EfficientDet-master/model.py", line 162, in efficientdet
    class_head = build_class_head(w_head, d_head, num_classes=num_classes)
File "/media/zhangwentao/DATA1/Documents2/Project-DL/EfficientDet-master/model.py", line 138, in build_class_head
    )(outputs)
TypeError: __call__() got an unexpected keyword argument 'partition_info'

My preliminary judgement is that this is a TensorFlow version problem, so I would like to ask: what is the basic environment required for your project?

Could you please provide your requirements.txt?

thanks.

AttributeError: module 'tensorflow.keras.layers' has no attribute 'ReLU'

Thanks for your code. I tried to train on the VOC data and got this error.
Traceback (most recent call last):
File "/home/bigdata/chenfangxiong/EfficientDet/model.py", line 189, in
model, prediction_model = efficientdet(phi=6, num_classes=20)
File "/home/bigdata/chenfangxiong/EfficientDet/model.py", line 160, in efficientdet
features = build_BiFPN(features, w_bifpn, i)
File "/home/bigdata/chenfangxiong/EfficientDet/model.py", line 43, in build_BiFPN
P3_in = ConvBlock(num_channels, kernel_size=1, strides=1, name='BiFPN_{}_P3'.format(id))(C3)
File "/home/bigdata/chenfangxiong/EfficientDet/model.py", line 36, in ConvBlock
f3 = layers.ReLU(name='{}_relu'.format(name))
AttributeError: module 'tensorflow.keras.layers' has no attribute 'ReLU'
Can you tell me which Keras and TensorFlow versions you are using?
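For what it's worth, tf.keras builds that predate layers.ReLU can usually fall back to an Activation layer. The sketch below is an assumed workaround, not the repo's code; the layer name is illustrative.

from tensorflow.keras import layers

# Hedged fallback: Activation('relu') behaves like ReLU() with default settings
# on tf.keras versions whose layers module lacks the ReLU class.
def make_relu(name):
    if hasattr(layers, 'ReLU'):
        return layers.ReLU(name=name)
    return layers.Activation('relu', name=name)

relu = make_relu('BiFPN_1_P3_relu')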

Retrain w/ own Dataset:

hi, @xuannianz :

I'm trying to train on my own dataset; I have modified the voc_classes.txt file (e.g. person, dog, cat).
When running train.py I get:
str(weight_values[i].shape) + '.')
ValueError: Layer #495 (named "class_head"), weight <tf.Variable 'pyramid_classification/kernel:0' shape=(3, 3, 88, 27) dtype=float32> has shape (3, 3, 88, 27), but the saved weight has shape (180, 88, 3, 3).

Please help me out

Thanks very much!
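As a hedged suggestion (untested here, and it assumes a Keras/tf.keras version whose load_weights supports skip_mismatch), shape mismatches in the class head can sometimes be worked around by loading the snapshot by layer name and letting mismatched layers start from scratch:

from model import efficientdet  # the repo's builder (assumption)

model, _ = efficientdet(phi=0, num_classes=3)  # e.g. person, dog, cat
# Hedged: layers whose saved shapes do not match (such as the class head,
# whose channel count depends on num_classes) are skipped and re-initialized.
model.load_weights('checkpoints/pascal_xx.h5',  # placeholder path
                   by_name=True, skip_mismatch=True)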

Retrain model: num_fp=0, num_tp=0

@xuannianz
I retrained the model and the mAP is 0.000. Please help me.
Running network: 100% (1279 of 1279) |##########################################################################################| Elapsed Time: 0:02:15 Time: 0:02:15
Parsing annotations: 100% (1279 of 1279) |######################################################################################| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=0, num_tp=0
3076 instances of class person with average precision: 0.0000
937 instances of class person_1 with average precision: 0.0000
4007 instances of class body with average precision: 0.0000

mAP=0 in step2

Thanks for your code first. I have some problems when training with PASCAL VOC.
In Step 1 I can train the network smoothly and reach a best mAP of 0.546 at epoch 47.
Then I start Step 2, training from the epoch-47 h5, but the mAP stays at 0.000 for the first 3 epochs, so I have to stop the training. Do you know what happened? Thanks very much.
step1:
python3 train.py --snapshot imagenet --phi 1 --gpu 0 --random-transform --compute-val-loss --freeze-backbone --batch-size 2 --steps 1000 pascal data/VOC2012
step2:
python3 train.py --snapshot checkpoints/2019-12-25/pascal_47_1.4936_1.5473.h5 --phi 1 --gpu 0 --random-transform --compute-val-loss --freeze-bn --batch-size 4 --steps 10000 pascal data/VOC2012

How to train EfficientDet-D6

When I set the hyperparameter phi to 5 or 6, I found that I have to decrease the batch size to a low value (my GPU has only 11 GB of memory).
But after reducing the batch size to 2 or 4, the model cannot be trained normally: the loss does not decline and the eval AP stays close to zero the whole time.
Can you give me some suggestions?

Error in inference.py

from utils import preprocess_image

I tried to use inference.py but ran into the following error:

ImportError: cannot import name 'preprocess_image'

How can I fix this error?
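If the import keeps failing, a stand-in preprocessing function can be defined locally. The sketch below is an assumption about what such a function typically does for this family of detectors (resize the longer side to the input size, normalize with ImageNet statistics, pad to a square); it is not guaranteed to match the repo's exact preprocess_image.

import cv2
import numpy as np

def preprocess_image(image, image_size=512):
    # resize so the longer side equals image_size, keeping the aspect ratio
    h, w = image.shape[:2]
    if h > w:
        scale = image_size / h
        resized_h, resized_w = image_size, int(w * scale)
    else:
        scale = image_size / w
        resized_h, resized_w = int(h * scale), image_size
    image = cv2.resize(image, (resized_w, resized_h))
    # ImageNet normalization
    image = image.astype(np.float32) / 255.
    image -= [0.485, 0.456, 0.406]
    image /= [0.229, 0.224, 0.225]
    # pad to a square image_size x image_size canvas
    pad_h, pad_w = image_size - resized_h, image_size - resized_w
    image = np.pad(image, [(0, pad_h), (0, pad_w), (0, 0)], mode='constant')
    return image, scale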

Bad results!

This is my command:

(base) E:\EfficientDet>python train.py --snapshot imagenet --phi 1 --gpu 0 --random-transform --compute-val-loss --freeze-backbone --batch-size 2 --steps 50000 pascal datasets\VOC2007

My dataset has 40 classes and 110,000+ images, and after 45 epochs I get:

[training results screenshot]

The mAP is 0.18, which is quite bad. What could the problem be?

About the eval of pretrained weights

I used your eval code ("python3 eval/common.py to evaluate by specifying model path there"), but got an error:

ValueError: Layer #1 (named "stem_conv"), weight <tf.Variable 'stem_conv/kernel:0' shape=(3, 3, 3, 32) dtype=float32> has shape (3, 3, 3, 32), but the saved weight has shape (64, 3, 3, 3).

Can you tell me which weights (***.h5) you used in eval? Thank you!

Loss increases rapidly during Step 2 training. Is EfficientDet unstable?

I have used my own dataset to train EfficientDet.
My params:
phi=0
gpu=0
workers=1
Other params are the defaults, following Step 1 and Step 2.
During Step 1 the loss decreases steadily, but during Step 2 the loss increases rapidly, from 0.5 to 57!
So is EfficientDet unstable? Or are there some training tricks?

a puzzle about P6_in and P7_in

P7_in = ConvBlock(num_channels, kernel_size=3, strides=2, freeze_bn=freeze_bn, name='BiFPN_{}_P7'.format(id))(P6_in)

def ConvBlock(num_channels, kernel_size, strides, name, freeze_bn=False):
    f1 = layers.Conv2D(num_channels, kernel_size=kernel_size, strides=strides, padding='same',
                       use_bias=False, name='{}_conv'.format(name))
    f2 = BatchNormalization(freeze=freeze_bn, name='{}_bn'.format(name))
    f3 = layers.ReLU(name='{}_relu'.format(name))
    return reduce(lambda f, g: lambda *args, **kwargs: g(f(*args, **kwargs)), (f1, f2, f3))

According to these lines of code in model.py, P6_in is obtained by applying a 3x3 stride-2 conv followed by ReLU to C5, and P7_in is obtained by applying a 3x3 stride-2 conv followed by ReLU to P6_in. In RetinaNet, however, P6 is obtained by a 3x3 stride-2 conv on C5, and P7 is computed by applying ReLU followed by a 3x3 stride-2 conv to P6.
Why do they differ in where the ReLU is applied?

What is the "name" in layers.py

File "/home/he/projects/EfficientDet-master/layers.py", line 43, in build
dtype=tf.float32)
TypeError: add_weight() missing 1 required positional argument: 'name'

I want to run the "./eval/coco.py",but there have be a error
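This error usually means add_weight was called without a name on a tf.keras version that requires one. Below is a hedged, self-contained sketch of a custom layer that passes name explicitly; it is modelled loosely on a weighted-add layer and is an illustration, not a copy of the repo's layers.py.

import tensorflow as tf

class WeightedAdd(tf.keras.layers.Layer):
    # Hypothetical layer illustrating add_weight(name=...), which older
    # tf.keras versions require as an explicit argument.
    def build(self, input_shape):
        num_in = len(input_shape)
        self.w = self.add_weight(name='{}_w'.format(self.name),
                                 shape=(num_in,),
                                 initializer=tf.keras.initializers.Constant(1.0 / num_in),
                                 trainable=True,
                                 dtype=tf.float32)
        super(WeightedAdd, self).build(input_shape)

    def call(self, inputs):
        w = tf.nn.relu(self.w)
        x = tf.add_n([w[i] * inputs[i] for i in range(len(inputs))])
        return x / (tf.reduce_sum(w) + 1e-4)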

Why does the loss increase and the mAP drop to zero suddenly at epoch 33?

I train on my CSV dataset using Step 1 and Step 2. At epoch 33 in Step 2, the loss increases and the mAP drops to zero suddenly. What should I do to fix this issue?

Epoch 00030: saving model to checkpoints/2019-12-07/csv_30_0.7954_0.7353.h5
68/68 [==============================] - 129s 2s/step - loss: 0.7954 - regression_loss: 0.6418 - classification_loss: 0.1536 - val_loss: 0.7353 - val_regression_loss: 0.5805 - val_classification_loss: 0.1548
Epoch 31/50
67/68 [============================>.] - ETA: 1s - loss: 0.7850 - regression_loss: 0.6313 - classification_loss: 0.1537Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:21 Time: 0:00:21
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=14329.0, num_tp=1571.0
157 instances of class livingroom with average precision: 0.9348
90 instances of class bedroom with average precision: 0.9619
297 instances of class room_big with average precision: 0.9398
152 instances of class room_small_long with average precision: 0.7897
849 instances of class door with average precision: 0.9209
36 instances of class other with average precision: 0.1931
mAP: 0.7900

Epoch 00031: saving model to checkpoints/2019-12-07/csv_31_0.7831_0.7654.h5
68/68 [==============================] - 127s 2s/step - loss: 0.7831 - regression_loss: 0.6298 - classification_loss: 0.1533 - val_loss: 0.7654 - val_regression_loss: 0.6131 - val_classification_loss: 0.1523
Epoch 32/50
67/68 [============================>.] - ETA: 1s - loss: 0.7620 - regression_loss: 0.6117 - classification_loss: 0.1503Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:21 Time: 0:00:21
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=14326.0, num_tp=1574.0
157 instances of class livingroom with average precision: 0.9583
90 instances of class bedroom with average precision: 0.9623
297 instances of class room_big with average precision: 0.9409
152 instances of class room_small_long with average precision: 0.8183
849 instances of class door with average precision: 0.9256
36 instances of class other with average precision: 0.2731
mAP: 0.8131

Epoch 00032: saving model to checkpoints/2019-12-07/csv_32_0.7617_0.7202.h5
68/68 [==============================] - 126s 2s/step - loss: 0.7617 - regression_loss: 0.6105 - classification_loss: 0.1512 - val_loss: 0.7202 - val_regression_loss: 0.5779 - val_classification_loss: 0.1423
Epoch 33/50
67/68 [============================>.] - ETA: 1s - loss: 2.7521 - regression_loss: 1.3013 - classification_loss: 1.4508Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:19 Time: 0:00:19
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=0, num_tp=0
157 instances of class livingroom with average precision: 0.0000
90 instances of class bedroom with average precision: 0.0000
297 instances of class room_big with average precision: 0.0000
152 instances of class room_small_long with average precision: 0.0000
849 instances of class door with average precision: 0.0000
36 instances of class other with average precision: 0.0000
mAP: 0.0000

Epoch 00033: saving model to checkpoints/2019-12-07/csv_33_2.8166_6.8559.h5
68/68 [==============================] - 124s 2s/step - loss: 2.8166 - regression_loss: 1.3304 - classification_loss: 1.4862 - val_loss: 6.8559 - val_regression_loss: 2.9996 - val_classification_loss: 3.8562
Epoch 34/50
67/68 [============================>.] - ETA: 1s - loss: 2393.7484 - regression_loss: 2389.8921 - classification_loss: 3.8562Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:19 Time: 0:00:19
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=0, num_tp=0
157 instances of class livingroom with average precision: 0.0000
90 instances of class bedroom with average precision: 0.0000
297 instances of class room_big with average precision: 0.0000
152 instances of class room_small_long with average precision: 0.0000
849 instances of class door with average precision: 0.0000
36 instances of class other with average precision: 0.0000
mAP: 0.0000

Epoch 00034: saving model to checkpoints/2019-12-07/csv_34_2358.6450_6.6235.h5
68/68 [==============================] - 124s 2s/step - loss: 2358.6450 - regression_loss: 2354.7886 - classification_loss: 3.8562 - val_loss: 6.6235 - val_regression_loss: 2.7673 - val_classification_loss: 3.8562
Epoch 35/50
67/68 [============================>.] - ETA: 1s - loss: 7.0168 - regression_loss: 3.1606 - classification_loss: 3.8562Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:19 Time: 0:00:19
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=0, num_tp=0
157 instances of class livingroom with average precision: 0.0000
90 instances of class bedroom with average precision: 0.0000
297 instances of class room_big with average precision: 0.0000
152 instances of class room_small_long with average precision: 0.0000
849 instances of class door with average precision: 0.0000
36 instances of class other with average precision: 0.0000
mAP: 0.0000

Epoch 00035: saving model to checkpoints/2019-12-07/csv_35_7.0110_6.6163.h5
68/68 [==============================] - 124s 2s/step - loss: 7.0110 - regression_loss: 3.1548 - classification_loss: 3.8562 - val_loss: 6.6163 - val_regression_loss: 2.7601 - val_classification_loss: 3.8562
Epoch 36/50
67/68 [============================>.] - ETA: 1s - loss: 7.7442 - regression_loss: 3.8879 - classification_loss: 3.8562Epoch 1/50
Running network: 100% (159 of 159) |######| Elapsed Time: 0:00:19 Time: 0:00:19
Parsing annotations: 100% (159 of 159) |##| Elapsed Time: 0:00:00 Time: 0:00:00
num_fp=0, num_tp=0
157 instances of class livingroom with average precision: 0.0000
90 instances of class bedroom with average precision: 0.0000
297 instances of class room_big with average precision: 0.0000
152 instances of class room_small_long with average precision: 0.0000
849 instances of class door with average precision: 0.0000
36 instances of class other with average precision: 0.0000
mAP: 0.0000

How can I train the model with the COCO dataset?

How can I train the model with the COCO dataset? I am a rookie, and I want to train the model on COCO. I see this project has coco.py; I want to know the steps for training the model with it. Please help me, thanks a lot.

TypeError: __call__() got an unexpected keyword argument 'partition_info'

I started to train on Pascal VOC and got this error:
Traceback (most recent call last):
  File "train.py", line 355, in <module>
    main()
  File "train.py", line 297, in main
    freeze_bn=args.freeze_bn)
  File "E:\EfficientDet\model.py", line 232, in efficientdet
    class_head = build_class_head(w_head, d_head, num_classes=num_classes)
  File "E:\EfficientDet\model.py", line 204, in build_class_head
    )(outputs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 538, in __call__
    self._maybe_build(inputs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 1603, in _maybe_build
    self.build(input_shapes)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\layers\convolutional.py", line 174, in build
    dtype=self.dtype)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 349, in add_weight
    aggregation=aggregation)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\training\checkpointable\base.py", line 607, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer_utils.py", line 145, in make_variable
    aggregation=aggregation)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 213, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 176, in _variable_v1_call
    aggregation=aggregation)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 155, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2488, in default_variable_creator
    import_scope=import_scope)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\variables.py", line 217, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 294, in __init__
    constraint=constraint)
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 406, in _init_from_args
    initial_value() if init_from_fn else initial_value,
  File "C:\ProgramData\Miniconda3\lib\site-packages\tensorflow\python\keras\engine\base_layer_utils.py", line 127, in <lambda>
    shape, dtype=dtype, partition_info=partition_info)
TypeError: __call__() got an unexpected keyword argument 'partition_info'
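This error typically appears when a custom initializer's __call__ does not accept the extra keyword arguments some TensorFlow versions pass in (such as partition_info). Below is a hedged sketch of an initializer written to tolerate them; it is modelled on a focal-loss prior-probability initializer and is an illustration, not the repo's exact initializers.py.

import numpy as np
import tensorflow as tf

class PriorProbability(tf.keras.initializers.Initializer):
    # Hypothetical initializer; **kwargs swallows partition_info and any other
    # version-specific arguments, avoiding the TypeError above.
    def __init__(self, probability=0.01):
        self.probability = probability

    def get_config(self):
        return {'probability': self.probability}

    def __call__(self, shape, dtype=None, **kwargs):
        # bias prior of -log((1 - p) / p), as in the focal loss paper
        return tf.ones(shape, dtype=dtype or tf.float32) * -np.log((1 - self.probability) / self.probability)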

Detecting oriented objects

@noahtren @xuannianz Thanks for sharing the code and the wonderful work. With the current detector, can we also add orientation? If so, which part of the code would you advise changing to do this? Please see the example below; I want to get bounding boxes like the ones shown on the right side.

[example image of rotated bounding boxes]

OOM issue

Hi. I try to train a Pascal model with phi=6, but OOM happens.
First, for Step 1, there is always an OOM when batch_size is bigger than 1. I tried batch_size in {16, 8, 4, 2}; they all hit the same issue. Finally, when I set batch_size = 1, training goes well.
Second, for Step 2, it always OOMs even with batch_size = 1. Following the README, I chose the model at epoch 20 as the snapshot, and the OOM issue appears again.
P.S. I use a Tesla V100 to train the model.
Any advice?

2019-12-06 03:05:27.140466: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 15651091200 memory_limit_: 15651091252 available bytes: 52 curr_region_allocation_bytes_: 17179869184
2019-12-06 03:05:27.140496: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats: 
Limit:                 15651091252
InUse:                 15641373952
MaxInUse:              15646383616
NumAllocs:                    6516
MaxAllocSize:            380633088

2019-12-06 03:05:27.141000: W tensorflow/core/common_runtime/bfc_allocator.cc:319] ****************************************************************************************************
2019-12-06 03:05:27.141056: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[1,88,88,864] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "train.py", line 355, in <module>
    main()
  File "train.py", line 350, in main
    validation_data=validation_generator
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training.py", line 1433, in fit_generator
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training_generator.py", line 264, in model_iteration
    batch_outs = batch_function(*batch_data)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[1,864,88,88] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node block5a_expand_bn/FusedBatchNorm}}]]

No module named 'utils.compute_overlap'

Hello, firstly thank you for the great work.

I am having trouble starting training. I get this error:

-- No module named 'utils.compute_overlap' --

I have keras-retinanet running perfectly on my machine.

What could be done?

Regards!

A puzzle about the regression targets of anchors which have no intersection with any ground-truth boxes.

According to the logic for computing overlaps in compute_overlap.pyx, if an anchor has no intersection with any ground-truth box, its overlap with every ground-truth box is zero and it is assigned state 0 (background). However, in the function bbox_transform its regression targets are still computed as the offsets to one ground-truth box, because np.argmax(overlaps, axis=1) returns the index of some ground-truth box. As a result, the anchor has background state 0, yet its box regression targets are the offsets to a ground-truth box. That seems odd. Do you know why?
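For context, codebases that follow the fizyr/keras-retinanet convention carry an anchor state in the last column of the regression targets and let the loss select only positive anchors, so offsets computed for background anchors are never used. The numpy sketch below illustrates that masking; the column layout and state values (-1 ignore, 0 background, 1 positive) are assumptions about that convention, not quotations from this repo.

import numpy as np

# targets: [dx1, dy1, dx2, dy2, anchor_state]
regression_target = np.array([
    [0.10, -0.20, 0.05, 0.30,  1],   # positive anchor: contributes to the loss
    [0.70,  0.90, 1.10, 1.20,  0],   # background: offsets exist but are masked out
    [0.00,  0.00, 0.00, 0.00, -1],   # ignored anchor
])
regression_pred = np.zeros((3, 4))

positive = regression_target[:, -1] == 1
diff = np.abs(regression_pred[positive] - regression_target[positive, :4])
sigma_sq = 9.0  # smooth L1 with sigma = 3
loss = np.where(diff < 1.0 / sigma_sq, 0.5 * sigma_sq * diff ** 2, diff - 0.5 / sigma_sq)
print(loss.mean())  # only the positive anchor's offsets are penalized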

A few questions about freeze-backbone, freeze-bn, and lr

Thanks for your great work. After walking through your code, I have a few questions about freeze-backbone, freeze-bn, lr and the anchors.
Train
STEP1: python3 train.py --snapshot imagenet --phi {0, 1, 2, 3, 4, 5, 6} --gpu 0 --random-transform --compute-val-loss --freeze-backbone --batch-size 32 --steps 1000 pascal datasets/VOC2012 to start training. The init lr is 1e-3.
STEP2: python3 train.py --snapshot xxx.h5 --phi {0, 1, 2, 3, 4, 5, 6} --gpu 0 --random-transform --compute-val-loss --freeze-bn --batch-size 4 --steps 10000 pascal datasets/VOC2012 to start training when val mAP can not increase during STEP1. The init lr is 1e-4 and decays to 1e-5 when val mAP keeps dropping down.
The training has two steps.

  1. freeze-backbone and freeze-bn are applied in Step 1 and Step 2 respectively. Why?
  2. How should I set the learning rate?
  3. In bbox_transform, you compute network targets from anchors and gt_boxes in the same way as fizyr/keras-retinanet. However, the official EfficientDet source code has not yet been made public, and the paper gives little description of how network targets are computed.
  4. Can you give some idea of how the image is preprocessed in preprocess_image(self, image)?
  5. You use focal(alpha=0.25, gamma=2.0); the EfficientDet paper, however, employs focal loss with alpha=0.25 and gamma=1.5 (a sketch with both settings follows below).
  Thanks!
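Regarding point 5, here is a minimal numpy sketch of the focal loss with both knobs exposed, so the implementation's (alpha=0.25, gamma=2.0) and the paper's (alpha=0.25, gamma=1.5) settings can be compared side by side; it illustrates the formula only and is not the repo's losses.py.

import numpy as np

def focal(y_true, y_pred, alpha=0.25, gamma=2.0):
    # binary focal loss on sigmoid outputs: FL = -alpha_t * (1 - p_t)^gamma * log(p_t)
    y_pred = np.clip(y_pred, 1e-7, 1 - 1e-7)
    alpha_factor = np.where(y_true == 1, alpha, 1 - alpha)
    focal_weight = np.where(y_true == 1, 1 - y_pred, y_pred) ** gamma
    ce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.sum(alpha_factor * focal_weight * ce)

labels = np.array([1.0, 0.0, 1.0])
probs = np.array([0.9, 0.2, 0.3])
print(focal(labels, probs, alpha=0.25, gamma=2.0))
print(focal(labels, probs, alpha=0.25, gamma=1.5))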

Very high CPU load while GPU load is normal

Thanks.
python3 train.py --snapshot imagenet --phi 0 --gpu 0 --random-transform --compute-val-loss --freeze-backbone --batch-size 8 --steps 1000 pascal datasets/VOC2012
When I run this command on tensorflow-gpu 1.15, I see a strange problem:

[screenshots of CPU and GPU utilization]

Why is the CPU load so high while the GPU load is normal?

What is the environment of the code

I ran into an issue:
TypeError: __call__() got an unexpected keyword argument 'partition_info'

I think it is caused by a difference in environments. I am using tensorflow-gpu 1.13.1 and Keras 1.0.7.

How to test "custum csv model"?

I understand that the "inference.py" file is a test file for PASCAL VOC data.

How can I use inference.py for my csv data?

Similarly, in /eval/commom.py it is difficult to evaluate csv data.

Thanks.

Bad results for EfficientDetB0

Hi.
When I downloaded your EfficientDetB0 pretrained model and used it to test my own data, I got very poor results. The boxes in the output form a 300-element list, and the score for each box is greater than 0.9. Even if I set a very large score_threshold when filtering, most boxes are still left. After drawing these boxes, I found they are useless for detection.
I don't know what to do... Please help me.

Per FPN level filtering

RetinaNet uses per-FPN-level filtering (top 1k per level after confidence thresholding at 0.05). Perhaps this scheme works better with an anchor-based detector?
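For reference, a small numpy sketch of that per-level scheme (threshold at 0.05, keep the top 1000 per level, then merge before a single NMS pass); the function and variable names are illustrative, not from this repo.

import numpy as np

def filter_level(scores, score_threshold=0.05, top_k=1000):
    # indices of detections kept for one FPN level
    keep = np.where(scores > score_threshold)[0]
    order = np.argsort(-scores[keep])[:top_k]
    return keep[order]

# per-level score vectors (dummy data); the kept indices from all levels would
# then be concatenated and passed to a single NMS step
level_scores = {'P3': np.random.rand(10000), 'P4': np.random.rand(2500)}
kept = {level: filter_level(s) for level, s in level_scores.items()}
print({level: idx.size for level, idx in kept.items()})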

How to add rotation to the model?

I have a dataset I would like to train on, but my requirements are a bit different from plain axis-aligned boxes: some objects need to be detected together with their rotation. What changes can I make to add this functionality to the architecture?

Parameter counts are slightly off

Hi, thanks so much for creating this. EfficientDet looks really promising!

I ran your code and noticed that the parameter counts are slightly off compared to the official paper. Here's a table below comparing what they reported in the paper and what this implementation produced. Any idea what could be causing this slight difference?

Model  Implementation (M params)  Official (M params)
B0     4.268788                   3.9
B1     7.225352                   6.6
B2     8.93297                    8.1
B3     13.828736                  12
B4     23.838512                  20.7
B5     39.019592                  33.7
B6     62.830688                  51.9

Also, I noticed the authors added P6 and P7 layers that didn't exist in the original EfficientNet backbone. I'm thinking that this might have something to do with it.
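For reproducibility, the "Implementation" column can presumably be regenerated with Keras' count_params. The sketch below assumes the repo's efficientdet builder is importable and uses 90 COCO classes as in the paper; the exact count depends on num_classes and on whether the weighted BiFPN is used.

from model import efficientdet  # the repo's builder (assumption)

for phi in range(7):
    model, _ = efficientdet(phi=phi, num_classes=90)
    print('D{}: {:.3f}M parameters'.format(phi, model.count_params() / 1e6))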

Evaluation bug: need to set freeze_bn to True when testing

In eval/common.py, model, prediction_model = efficientdet(phi=phi, num_classes=num_classes, weighted_bifpn=weighted_bifpn) leaves freeze_bn at its default of False. We need to set freeze_bn=True when testing the model; otherwise the mAP is only about 0.01.

no module named utils.compute_overlap

Has anyone run into the error "no module named utils.compute_overlap"?

There is a utils directory with a compute_overlap.pyx file, but train.py does not seem to find it even if I set PYTHONPATH to the enclosing directory.

I tried installing keras-retinanet and efficientnet into the environment, but neither of them provides the required module.
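compute_overlap is a Cython extension, so the .pyx file has to be compiled before utils.compute_overlap can be imported. Below is a minimal build-script sketch, assuming the source sits at utils/compute_overlap.pyx; the repo may already provide an equivalent setup.py to run with "python setup.py build_ext --inplace".

# setup.py (hedged sketch) -- run: python setup.py build_ext --inplace
import numpy as np
from setuptools import setup
from Cython.Build import cythonize

setup(
    name='compute_overlap',
    ext_modules=cythonize('utils/compute_overlap.pyx'),
    include_dirs=[np.get_include()],
)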

How to convert the h5 model to a pb model?

Thanks for your work.
One question: how do I convert the h5 model into a pb model?
Can you provide a script for this job? I ran into a problem with weight shapes while trying to do it myself.
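In case it helps, here is a hedged TF 1.x sketch for freezing a Keras model into a .pb file. It assumes a session-based TF 1.x workflow, that the repo's efficientdet builder is importable, and that the checkpoint path and phi/num_classes values are placeholders to be adjusted.

import tensorflow as tf
from tensorflow.python.framework import graph_io, graph_util
from model import efficientdet  # the repo's builder (assumption)

model, prediction_model = efficientdet(phi=0, num_classes=20)
prediction_model.load_weights('checkpoints/your_model.h5', by_name=True)  # placeholder

# freeze variables into constants and write the inference graph to disk
sess = tf.keras.backend.get_session()
output_names = [out.op.name for out in prediction_model.outputs]
frozen = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)
graph_io.write_graph(frozen, '.', 'efficientdet.pb', as_text=False)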
