zzh8829 / yolov3-tf2
YoloV3 Implemented in Tensorflow 2.0
License: MIT License
python detect.py --weights ./checkpoints/yolov3-tiny.tf --tiny --image ./data/girl.png
W0706 03:01:45.838256 139645809293184 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-07-06 03:01:47.077830: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-07-06 03:01:47.078112: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x16d4a00 executing computations on platform Host. Devices:
2019-07-06 03:01:47.078146: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-07-06 03:01:47.158210: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0706 03:01:47.481413 139645809293184 detect.py:29] weights loaded
I0706 03:01:47.481828 139645809293184 detect.py:32] classes loaded
I0706 03:01:47.794163 139645809293184 detect.py:41] time: 0.30501627922058105
I0706 03:01:47.794389 139645809293184 detect.py:43] detections:
Traceback (most recent call last):
File "detect.py", line 56, in <module>
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect.py", line 44, in main
for i in range(nums[0]):
TypeError: 'Tensor' object cannot be interpreted as an integer
I was trying to run detect.py eagerly, but when I set a breakpoint the debugger never stops there (I used PyCharm).
For example, I set a breakpoint at the beginning of the YoloV3 function. It stops when the model is constructed:
if FLAGS.tiny:
    yolo = YoloV3Tiny()
else:
    yolo = YoloV3()
But it does not stop at the prediction call:
boxes, scores, classes, nums = yolo(img)
I am probably missing something here.
Hi, following the dataset preprocessing, I generated TFRecords for COCO and normalized the labeled boxes by the original image width and height. But the transformed label looks like this:
<tf.Tensor: id=1600, shape=(3, 26, 26, 3, 6), dtype=float32, numpy=
array([[[[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]],
...,
[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]]]]], dtype=float32)>
It's all zeros. Any suggestions on this?
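One thing worth checking before concluding that the labels are lost: a tensor of shape (3, 26, 26, 3, 6) holds 36,504 values, and the printed summary only shows the corners of each grid. The sketch below is plain Python, not the repository's code. It assumes a hypothetical batch in which each box is written into exactly one grid cell and anchor slot (the box positions and values are invented), and it shows how small the non-zero fraction would then be.

```python
# Hypothetical illustration of how sparse a YOLO target grid is.
# Dimensions are taken from the reported shape=(3, 26, 26, 3, 6).
BATCH, GRID, ANCHORS, FIELDS = 3, 26, 3, 6


def make_target(boxes_per_image):
    """Build an all-zero nested-list target, then fill one slot per box.

    boxes_per_image holds, for each image, a list of
    (row, col, anchor, values) tuples. These names are assumptions
    made for this sketch only.
    """
    target = [[[[[0.0] * FIELDS for _ in range(ANCHORS)]
                for _ in range(GRID)]
               for _ in range(GRID)]
              for _ in range(BATCH)]
    for image_index, boxes in enumerate(boxes_per_image):
        for row, col, anchor, values in boxes:
            target[image_index][row][col][anchor] = list(values)
    return target


def count_nonzero(target):
    """Count every scalar in the nested lists that is not zero."""
    return sum(
        1
        for image in target
        for row in image
        for cell in row
        for slot in cell
        for value in slot
        if value != 0.0
    )


# Three invented boxes spread over a batch of three images.
boxes = [
    [(12, 7, 1, [0.2, 0.3, 0.5, 0.6, 1.0, 17.0])],
    [(3, 20, 0, [0.1, 0.1, 0.2, 0.3, 1.0, 2.0]),
     (15, 15, 2, [0.4, 0.4, 0.9, 0.8, 1.0, 5.0])],
    [],
]
target = make_target(boxes)
total = BATCH * GRID * GRID * ANCHORS * FIELDS
nonzero = count_nonzero(target)
print(f"{nonzero} of {total} values are non-zero ({nonzero / total:.4%})")
```

On the real tensor, a count such as `tf.math.count_nonzero(y)` answers the same question more reliably than reading the printed summary.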
Which cuDNN version is required? I tried cuDNN 7.5.1, but it doesn't work.
~/yolov3-tf2$ python convert.py
/home/dhh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
2019-07-09 17:46:52.049771: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-09 17:46:52.054104: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-09 17:46:52.116723: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1009] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-09 17:46:52.117718: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x564ac0e64a40 executing computations on platform CUDA. Devices:
2019-07-09 17:46:52.117733: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-07-09 17:46:52.119343: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-07-09 17:46:52.119700: I tensorflow/compiler/xla/service/service.cc:162] XLA service 0x564ac0ed0e50 executing computations on platform Host. Devices:
2019-07-09 17:46:52.119714: I tensorflow/compiler/xla/service/service.cc:169] StreamExecutor device (0): <undefined>, <undefined>
2019-07-09 17:46:52.120057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1467] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.67GiB
2019-07-09 17:46:52.120088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1546] Adding visible gpu devices: 0
2019-07-09 17:46:52.120143: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-09 17:46:52.121039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1015] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-09 17:46:52.121049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] 0
2019-07-09 17:46:52.121069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] 0: N
2019-07-09 17:46:52.121172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1149] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3462 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Model: "yolov3"
Layer (type)            Output Shape           Param #    Connected to
input_1 (InputLayer)    [(None, None, None,    0
yolo_darknet (Model)    ((None, None, None,    40620640   input_1[0][0]
yolo_conv_0 (Model)     (None, None, None, 5   11024384   yolo_darknet[1][2]
yolo_conv_1 (Model)     (None, None, None, 2   2957312    yolo_conv_0[1][0]
                                                          yolo_darknet[1][1]
yolo_conv_2 (Model)     (None, None, None, 1   741376     yolo_conv_1[1][0]
                                                          yolo_darknet[1][0]
yolo_output_0 (Model)   (None, None, None, 3   4984063    yolo_conv_0[1][0]
yolo_output_1 (Model)   (None, None, None, 3   1312511    yolo_conv_1[1][0]
yolo_output_2 (Model)   (None, None, None, 3   361471     yolo_conv_2[1][0]
yolo_boxes_0 (Lambda)   ((None, None, None,    0          yolo_output_0[1][0]
yolo_boxes_1 (Lambda)   ((None, None, None,    0          yolo_output_1[1][0]
yolo_boxes_2 (Lambda)   ((None, None, None,    0          yolo_output_2[1][0]
Total params: 62,001,757
Trainable params: 61,949,149
Non-trainable params: 52,608
I0709 17:46:57.439600 139777120954112 convert.py:18] model created
I0709 17:46:57.441039 139777120954112 utils.py:45] yolo_darknet/conv2d bn
I0709 17:46:57.443896 139777120954112 utils.py:45] yolo_darknet/conv2d_1 bn
I0709 17:46:57.446532 139777120954112 utils.py:45] yolo_darknet/conv2d_2 bn
I0709 17:46:57.448883 139777120954112 utils.py:45] yolo_darknet/conv2d_3 bn
I0709 17:46:57.451326 139777120954112 utils.py:45] yolo_darknet/conv2d_4 bn
I0709 17:46:57.454463 139777120954112 utils.py:45] yolo_darknet/conv2d_5 bn
I0709 17:46:57.456880 139777120954112 utils.py:45] yolo_darknet/conv2d_6 bn
I0709 17:46:57.459455 139777120954112 utils.py:45] yolo_darknet/conv2d_7 bn
I0709 17:46:57.461632 139777120954112 utils.py:45] yolo_darknet/conv2d_8 bn
I0709 17:46:57.464024 139777120954112 utils.py:45] yolo_darknet/conv2d_9 bn
I0709 17:46:57.468281 139777120954112 utils.py:45] yolo_darknet/conv2d_10 bn
I0709 17:46:57.470547 139777120954112 utils.py:45] yolo_darknet/conv2d_11 bn
I0709 17:46:57.473851 139777120954112 utils.py:45] yolo_darknet/conv2d_12 bn
I0709 17:46:57.476211 139777120954112 utils.py:45] yolo_darknet/conv2d_13 bn
I0709 17:46:57.479484 139777120954112 utils.py:45] yolo_darknet/conv2d_14 bn
I0709 17:46:57.481873 139777120954112 utils.py:45] yolo_darknet/conv2d_15 bn
I0709 17:46:57.485709 139777120954112 utils.py:45] yolo_darknet/conv2d_16 bn
I0709 17:46:57.488569 139777120954112 utils.py:45] yolo_darknet/conv2d_17 bn
I0709 17:46:57.492237 139777120954112 utils.py:45] yolo_darknet/conv2d_18 bn
I0709 17:46:57.494879 139777120954112 utils.py:45] yolo_darknet/conv2d_19 bn
I0709 17:46:57.498184 139777120954112 utils.py:45] yolo_darknet/conv2d_20 bn
I0709 17:46:57.500384 139777120954112 utils.py:45] yolo_darknet/conv2d_21 bn
I0709 17:46:57.503392 139777120954112 utils.py:45] yolo_darknet/conv2d_22 bn
I0709 17:46:57.505593 139777120954112 utils.py:45] yolo_darknet/conv2d_23 bn
I0709 17:46:57.508599 139777120954112 utils.py:45] yolo_darknet/conv2d_24 bn
I0709 17:46:57.510802 139777120954112 utils.py:45] yolo_darknet/conv2d_25 bn
I0709 17:46:57.513751 139777120954112 utils.py:45] yolo_darknet/conv2d_26 bn
I0709 17:46:57.525273 139777120954112 utils.py:45] yolo_darknet/conv2d_27 bn
I0709 17:46:57.528098 139777120954112 utils.py:45] yolo_darknet/conv2d_28 bn
I0709 17:46:57.534902 139777120954112 utils.py:45] yolo_darknet/conv2d_29 bn
I0709 17:46:57.538571 139777120954112 utils.py:45] yolo_darknet/conv2d_30 bn
I0709 17:46:57.550390 139777120954112 utils.py:45] yolo_darknet/conv2d_31 bn
I0709 17:46:57.554516 139777120954112 utils.py:45] yolo_darknet/conv2d_32 bn
I0709 17:46:57.565870 139777120954112 utils.py:45] yolo_darknet/conv2d_33 bn
I0709 17:46:57.569745 139777120954112 utils.py:45] yolo_darknet/conv2d_34 bn
I0709 17:46:57.581679 139777120954112 utils.py:45] yolo_darknet/conv2d_35 bn
I0709 17:46:57.585679 139777120954112 utils.py:45] yolo_darknet/conv2d_36 bn
I0709 17:46:57.597076 139777120954112 utils.py:45] yolo_darknet/conv2d_37 bn
I0709 17:46:57.601133 139777120954112 utils.py:45] yolo_darknet/conv2d_38 bn
I0709 17:46:57.612637 139777120954112 utils.py:45] yolo_darknet/conv2d_39 bn
I0709 17:46:57.616534 139777120954112 utils.py:45] yolo_darknet/conv2d_40 bn
I0709 17:46:57.627830 139777120954112 utils.py:45] yolo_darknet/conv2d_41 bn
I0709 17:46:57.631635 139777120954112 utils.py:45] yolo_darknet/conv2d_42 bn
I0709 17:46:57.642864 139777120954112 utils.py:45] yolo_darknet/conv2d_43 bn
I0709 17:46:57.699196 139777120954112 utils.py:45] yolo_darknet/conv2d_44 bn
I0709 17:46:57.705252 139777120954112 utils.py:45] yolo_darknet/conv2d_45 bn
I0709 17:46:57.757189 139777120954112 utils.py:45] yolo_darknet/conv2d_46 bn
I0709 17:46:57.761758 139777120954112 utils.py:45] yolo_darknet/conv2d_47 bn
I0709 17:46:57.804775 139777120954112 utils.py:45] yolo_darknet/conv2d_48 bn
I0709 17:46:57.809182 139777120954112 utils.py:45] yolo_darknet/conv2d_49 bn
I0709 17:46:57.859200 139777120954112 utils.py:45] yolo_darknet/conv2d_50 bn
I0709 17:46:57.863812 139777120954112 utils.py:45] yolo_darknet/conv2d_51 bn
I0709 17:46:57.906277 139777120954112 utils.py:45] yolo_conv_0/conv2d_52 bn
I0709 17:46:57.909931 139777120954112 utils.py:45] yolo_conv_0/conv2d_53 bn
I0709 17:46:57.959341 139777120954112 utils.py:45] yolo_conv_0/conv2d_54 bn
I0709 17:46:57.963100 139777120954112 utils.py:45] yolo_conv_0/conv2d_55 bn
I0709 17:46:58.012643 139777120954112 utils.py:45] yolo_conv_0/conv2d_56 bn
I0709 17:46:58.016298 139777120954112 utils.py:45] yolo_output_0/conv2d_57 bn
I0709 17:46:58.065901 139777120954112 utils.py:45] yolo_output_0/conv2d_58 bias
I0709 17:46:58.067934 139777120954112 utils.py:45] yolo_conv_1/conv2d_59 bn
I0709 17:46:58.070146 139777120954112 utils.py:45] yolo_conv_1/conv2d_60 bn
I0709 17:46:58.072342 139777120954112 utils.py:45] yolo_conv_1/conv2d_61 bn
I0709 17:46:58.077865 139777120954112 utils.py:45] yolo_conv_1/conv2d_62 bn
I0709 17:46:58.080121 139777120954112 utils.py:45] yolo_conv_1/conv2d_63 bn
I0709 17:46:58.086580 139777120954112 utils.py:45] yolo_conv_1/conv2d_64 bn
I0709 17:46:58.088860 139777120954112 utils.py:45] yolo_output_1/conv2d_65 bn
I0709 17:46:58.094257 139777120954112 utils.py:45] yolo_output_1/conv2d_66 bias
I0709 17:46:58.095541 139777120954112 utils.py:45] yolo_conv_2/conv2d_67 bn
I0709 17:46:58.097150 139777120954112 utils.py:45] yolo_conv_2/conv2d_68 bn
I0709 17:46:58.098598 139777120954112 utils.py:45] yolo_conv_2/conv2d_69 bn
I0709 17:46:58.100960 139777120954112 utils.py:45] yolo_conv_2/conv2d_70 bn
I0709 17:46:58.102352 139777120954112 utils.py:45] yolo_conv_2/conv2d_71 bn
I0709 17:46:58.104810 139777120954112 utils.py:45] yolo_conv_2/conv2d_72 bn
I0709 17:46:58.106407 139777120954112 utils.py:45] yolo_output_2/conv2d_73 bn
I0709 17:46:58.108677 139777120954112 utils.py:45] yolo_output_2/conv2d_74 bias
I0709 17:46:58.109469 139777120954112 convert.py:21] weights loaded
2019-07-09 17:46:58.121147: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-09 17:46:58.759909: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Loaded runtime CuDNN library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2019-07-09 17:46:58.761417: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Loaded runtime CuDNN library: 7.3.1 but source was compiled with: 7.4.2. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
File "convert.py", line 33, in <module>
app.run(main)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "convert.py", line 24, in main
output = yolo(img)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 870, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1011, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 870, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1011, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 660, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1078, in call
return self.conv_op(inp, filter)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 634, in call
return self.call(inp, filter)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 233, in call
name=self.name)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1951, in conv2d
name=name)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1031, in conv2d
data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1130, in conv2d_eager_fallback
ctx=_ctx, name=name)
File "/home/dhh/anaconda3/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 66, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
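The error message in the log spells out the compatibility rule: the loaded cuDNN must have the same major version as the one TensorFlow was compiled against and, for cuDNN 7.0 or later, an equal or higher minor version. The sketch below only encodes that sentence in plain Python; the function names are made up for illustration and are not part of TensorFlow.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule quoted in the log message.

    Major versions must match. From cuDNN 7.0 on, the loaded minor
    version may be equal or higher; before that it must be equal.
    """
    loaded_major, loaded_minor, _ = parse_version(loaded)
    compiled_major, compiled_minor, _ = parse_version(compiled)
    if loaded_major != compiled_major:
        return False
    if loaded_major >= 7:
        return loaded_minor >= compiled_minor
    return loaded_minor == compiled_minor


# Versions from the log: runtime 7.3.1, compiled against 7.4.2.
print(cudnn_runtime_ok("7.3.1", "7.4.2"))  # the combination that failed
# The version the reporter says they installed.
print(cudnn_runtime_ok("7.5.1", "7.4.2"))
```

By this rule 7.5.1 would satisfy a build compiled against 7.4.2, yet the log reports 7.3.1 as the library actually loaded. That suggests, as an assumption rather than a diagnosis, that an older cuDNN is still the one found on the library path.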
Hi, I want to convert weights to tflite using tflite_convert --keras_model_file=yolov3-tiny.h5 --output_file=yolov3-tiny.tflite
It fails with the following traceback:
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0513 14:27:21.003883 140717950494528 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0513 14:27:21.004970 140717950494528 deprecation.py:506] From /usr/local/lib/python3.7/site-packages/tensorflow/python/ops/init_ops.py:97: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py:820: UserWarning: yolov3_tf2.models is not loaded, but a Lambda layer uses it. It may cause errors.
, UserWarning)
Traceback (most recent call last):
File "/usr/local/bin/tflite_convert", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 448, in main
app.run(main=run_main, argv=sys.argv[:1])
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 444, in run_main
_convert_model(tflite_flags)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 123, in _convert_model
converter = _get_toco_converter(flags)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 110, in _get_toco_converter
return converter_fn(**converter_kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 627, in from_keras_model_file
keras_model = _keras.models.load_model(model_file)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 215, in load_model
custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 95, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1231, in from_config
process_layer(layer_data)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1215, in process_layer
layer = deserialize_layer(layer_data, custom_objects=custom_objects)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/serialization.py", line 95, in deserialize
printable_module_name='layer')
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1241, in from_config
process_node(layer, node_data)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 1197, in process_node
layer(flat_input_tensors[0], **kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 612, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/layers/core.py", line 768, in call
return self.function(inputs, **arguments)
File "/home/mba/GitHub/yolov3-tf2/yolov3_tf2/models.py", line 139, in <lambda>
x = Lambda(lambda x: tf.reshape(x, (-1, tf.shape(x)[1], tf.shape(x)[2], anchors, classes + 5)))(x)
NameError: name 'tf' is not defined
I found a similar problem here: https://stackoverflow.com/questions/54347963/tf-is-not-defined-on-load-model-using-lambda
How can I solve it?
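A likely mechanism, consistent with the warning above that `yolov3_tf2.models` is not loaded: as far as I understand, a saved Lambda layer stores only the lambda's bytecode, and module-level names such as `tf` are looked up again in whatever namespace the function is rebuilt in. The sketch below reproduces that effect with the standard library alone. It uses `math` as a stand-in for `tf`, and it is not Keras's actual loading code.

```python
import marshal
import math
import types

# A lambda that relies on a module-level name, like the `tf` used in models.py.
scale = lambda x: math.sqrt(x) * 2

# Serialize only the bytecode, then restore it.
payload = marshal.dumps(scale.__code__)
code = marshal.loads(payload)

# Rebuilt with an empty global namespace: the module name is gone.
rebuilt_without = types.FunctionType(code, {})
try:
    rebuilt_without(9.0)
    error_message = None
except NameError as exc:
    error_message = str(exc)

# Rebuilt with the module supplied again: the call works.
rebuilt_with = types.FunctionType(code, {"math": math})
result = rebuilt_with(9.0)

print(error_message)
print(result)
```

This is why the name has to be made available again when the model is loaded, which is the direction the linked Stack Overflow question also takes.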
Hello @zzh8829. Thanks for your code!
How can I convert your checkpoint to a frozen graph (.pb file)?
Thanks for the great work!
I wonder if you have any plan to implement C++ inference with tensorflow for the yolov3?
Hi, why did you write the loss this way, as a function nested inside another function?
def YoloLoss(anchors, classes=80, ignore_thresh=0.5):
    def yolo_loss(y_true, y_pred):
        ...
    return yolo_loss
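For context on the pattern: Keras calls a loss with exactly two arguments, `(y_true, y_pred)`, so any extra settings have to be bound beforehand, and a closure is a common way to do that. The sketch below shows the same pattern in plain Python. The name `make_weighted_loss` and its parameters are invented for illustration, and the arithmetic has nothing to do with the real YOLO loss.

```python
def make_weighted_loss(weight=1.0, threshold=0.5):
    """Return a two-argument loss with the extra settings captured.

    weight and threshold are hypothetical parameters; the inner function
    keeps access to them after make_weighted_loss has returned.
    """
    def loss(y_true, y_pred):
        total = 0.0
        for true_value, predicted in zip(y_true, y_pred):
            error = abs(true_value - predicted)
            # Errors below the threshold are ignored in this toy loss.
            if error >= threshold:
                total += weight * error ** 2
        return total

    return loss


# Two differently configured losses that share one (y_true, y_pred) signature.
strict_loss = make_weighted_loss(weight=2.0, threshold=0.5)
lenient_loss = make_weighted_loss(weight=1.0, threshold=2.0)

print(strict_loss([1.0, 0.0], [0.0, 0.1]))
print(lenient_loss([1.0, 0.0], [0.0, 0.1]))
```

The outer call fixes the configuration once, and the returned function can then be passed anywhere that expects a plain two-argument loss.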
yolov3-tf2/yolov3_tf2/dataset.py
Line 42 in 27a96bc
Does this line of code really work? All my labeled data comes out as zeros, and I think this line is the cause. You return y_true_out, but after the zeros initialization I cannot find anywhere that assigns values to it.
I'm trying to convert my darknet weights to tensorflow weights using the command
python convert.py --weights /path/to/weights --output ./checkpoints/yolo-obj.tf
And what I get is this error message:
File "convert.py", line 33, in <module>
app.run(main)
File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/raulberari/.conda/envs/yolov3-tf2/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "convert.py", line 20, in main
load_darknet_weights(yolo, FLAGS.weights, FLAGS.tiny)
File "/home/raulberari/yolov3-tf2/yolov3_tf2/utils.py", line 66, in load_darknet_weights
conv_shape).transpose([2, 3, 1, 0])
ValueError: cannot reshape array of size 42732 into shape (256,128,3,3)
This happens after
I0801 10:46:50.183817 139702532433664 utils.py:45] yolo_output_2/conv2d_73 bn
Does anyone have an explanation for this? I'm running this in the given env, yolov3-tf2, on an Ubuntu machine.
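The numbers in the error can be checked by hand. A kernel of shape (256, 128, 3, 3) needs far more values than the 42,732 that were left in the file, so the reader ran out of data before the last layers. One possible cause, stated here as an assumption and not a diagnosis, is that the weights were trained for a class count other than the 80 the model was built with: the output layers use `anchors * (classes + 5)` filters, so a different class count shifts every later read. The helper names below are invented for this sketch.

```python
def conv_weight_count(filters, in_channels, kernel_size):
    """Number of scalar weights in one square convolution kernel."""
    return filters * in_channels * kernel_size * kernel_size


def output_filters(num_classes, anchors_per_scale=3):
    """Filters in a YOLO output layer: anchors * (classes + 5)."""
    return anchors_per_scale * (num_classes + 5)


# Values taken from the error message.
needed = conv_weight_count(256, 128, 3)
available = 42732
shortfall = needed - available

print(f"needed={needed} available={available} shortfall={shortfall}")

# The default 80-class model versus a hypothetical single-class model.
print(output_filters(80))
print(output_filters(1))
```

If the class counts do differ, the model has to be built with the class count the weights were trained for before they are loaded.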
Right now non-maximum suppression is performed over all bounding boxes of all classes together.
yolov3-tf2/yolov3_tf2/models.py
Line 175 in f38bb5a
Would it be possible to add a flag that allows you to carry out the non maximum class separately for different classes?
I have a use case where a large object can sometimes have a smaller object attached to it. The problem is that the bounding box of the smaller object (if it is present) is always suppressed by the bounding object of the larger object.
I have found a hacky solution that works for me, but I think that it would be useful to a have a general solution.
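For illustration, here is a minimal sketch of class-wise suppression in plain Python. It is not the code from models.py and does not use the TensorFlow NMS ops; the function names (iou, nms, per_class_nms) and the (box, score, class_id) tuple layout are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets, iou_threshold):
    """Greedy NMS over (box, score, class_id) tuples, ignoring the class."""
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


def per_class_nms(dets, iou_threshold):
    """Run greedy NMS separately for each class id, then merge the results."""
    by_class = {}
    for det in dets:
        by_class.setdefault(det[2], []).append(det)
    kept = []
    for group in by_class.values():
        kept.extend(nms(group, iou_threshold))
    return sorted(kept, key=lambda d: d[1], reverse=True)


# A large object (class 0) with a smaller attached object (class 1), IoU 0.6.
detections = [
    ((0.0, 0.0, 10.0, 10.0), 0.9, 0),
    ((0.0, 0.0, 10.0, 6.0), 0.7, 1),
]
class_agnostic = nms(detections, iou_threshold=0.5)
class_wise = per_class_nms(detections, iou_threshold=0.5)
```

With a threshold of 0.5 the class-agnostic pass drops the smaller box, while the class-wise pass keeps both because the boxes never compete across classes.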
How to implement export_inference_graph?
Go to TensorFlow Serving
I just wanna try your Keras version model in TF1.12 and added two lines into 'yolov3_tf2/models.py'
if __name__ == '__main__':
    model = YoloV3(training=True, size=418)
Things go well in TF2.0, however, when I run it in TF1.12, the following error occurred:
Traceback (most recent call last):
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 312, in <module>
model = YoloV3(training=True, size=418)
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 207, in YoloV3
x = YoloConv(256, name='yolo_conv_1')((x, x_61))
File "/Users/xxx/Code/GitOA/xxx/YoloV3/src/yolov3_tf2/models.py", line 112, in yolo_conv
return Model(inputs, x, name=name)(x_in)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call
outputs = self.call(inputs, *args, **kwargs)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 815, in call
mask=masks)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1002, in _run_internal_graph
output_tensors = layer.call(computed_tensor, **kwargs)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in call
return self.conv_op(inp, filter)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in call
return self.call(inp, filter)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in call
name=self.name)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/Users/xxx/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 529, in _apply_op_helper
(input_name, err))
ValueError: Tried to convert 'input' to a tensor and failed. Error: Dimension 1 in both shapes must be equal, but are 13 and 26. Shapes are [?,13,13,512] and [?,26,26,512].
From merging shape 0 with other shapes. for 'yolo_conv_1/conv2d_59/Conv2D/packed' (op: 'Pack') with input shapes: [?,13,13,512], [?,26,26,512].
I think it's because of the TensorFlow version, but I'm not sure which specific part causes this. Does anybody know?
Hello, I wanted to know how you track model accuracy.
I only see loss tracking. Is it possible to track accuracy on the validation dataset?
It gives an error when making a request to the server; it says the binary and the op are different. Does this model work when the server is loaded?
Hi man, I think you have done a very nice job here, congrats. I would just add the option to fine-tune the network on another dataset, because as it is, you assume people want to fine-tune on the same COCO dataset.
I made some minor modifications to make this happen:
I will try to open a PR with those changes so more people can benefit from it. Best regards!
Hey @zzh8829, thanks for your code, works great. I was thinking it would be a good idea to have a detection threshold flag --thresh for the detection.py or conversion.py. Do I get it right that iou_threshold and score_threshold in lines 190-192 of models.py are the only way to change the threshold for prediction? Let me know, this flag along with a num_classes flag would be a great boost to the repo.
Nice job on this project! Works great in python on a model we custom trained.
I am in the process of using the TF c_api to translate to C/C++ for deployment. I understand the input tensor (image, 416x416x3x1), but I am having a little trouble trying to figure out the format of the output tensor (for either YoloV3 or TinyYoloV3). Referencing the last lines in those functions:
outputs = Lambda(lambda x: yolo_nms(x, anchors, masks, classes),
name='yolo_nms')((boxes_0[:3], boxes_1[:3]))
Pre-NMS the tensor shape would be similar to:
batch_size x 10647 x (num_classes + 5 bounding box attrs)
The number 10647 is equal to the sum 507 +2028 + 8112, which are the numbers of possible objects detected on each scale (for full YoloV3). The bbox values describing bounding box attributes stand for center_x, center_y, width, height, confidence.
So, looking for confirmation: if I used YoloV3Tiny with say 12 classes and two scales (not 3 like full YoloV3), the output tensor would look like the following:
(507 + 2028) * 3 * (12 + 5) = 2535 * 3 * 60 = 456,300
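To make the arithmetic in this question easier to check, here is a small sketch of the pre-NMS box count. It assumes a 416x416 input, 3 anchors per scale, strides of 32, 16 and 8 for full YoloV3, and strides of 32 and 16 for the tiny variant; the function name num_boxes is made up for this example. Under those assumptions 507, 2028 and 8112 already include the factor of 3 anchors, and each box carries 5 + num_classes values.

```python
def num_boxes(input_size, strides, anchors_per_scale=3):
    """Boxes per scale: grid cells at each stride times anchors per cell."""
    return [(input_size // s) ** 2 * anchors_per_scale for s in strides]


full_scales = num_boxes(416, (32, 16, 8))   # three scales (assumed strides)
tiny_scales = num_boxes(416, (32, 16))      # two scales (assumed strides)

num_classes = 12
attrs_per_box = 5 + num_classes             # x, y, w, h, confidence + classes

# Total float values per image before NMS for the tiny variant.
tiny_values = sum(tiny_scales) * attrs_per_box

print(full_scales, sum(full_scales))
print(tiny_scales, sum(tiny_scales), attrs_per_box, tiny_values)
```

Note that this only describes the pre-NMS layout discussed above; after NMS the shapes follow the combined_non_max_suppression return values quoted in the post.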
Note: I am also consulting the combined_non_max_suppression API, which is the last step in the pb file I am loading:
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/image/combined_non_max_suppression
which defines the output as the following:
Returns:
'nmsed_boxes': A [batch_size, max_detections, 4] float32 tensor containing the non-max suppressed boxes.
'nmsed_scores': A [batch_size, max_detections] float32 tensor containing the scores for the boxes.
'nmsed_classes': A [batch_size, max_detections] float32 tensor containing the class for boxes.
'valid_detections': A [batch_size] int32 tensor indicating the number of valid detections per batch item. Only the top valid_detections[i] entries in nms_boxes[i], nms_scores[i] and nms_class[i] are valid. The rest of the entries are zero paddings.
Is this correct? Any help would be appreciated.
Thanks!
Rob
Thanks for great work!
Can I obtain multiple (e.g., top-10) outputs for each bbox?
Current implementation returns only the class with the highest probability (e.g., dog 0.8 coordinates), but I wonder if I can obtain the results like:
dog 0.8 coordinatesA
cat 0.1 coordinatesA
dog 0.5 coordinatesB
cat 0.3 coordinatesB
horse 0.1 coordinatesB
...
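A rough sketch of what such a top-k listing could look like, assuming you can get at the per-class scores of a box before the final class selection. The function name top_k_classes and the example scores are hypothetical and only mirror the numbers in the question.

```python
def top_k_classes(class_scores, class_names, k=3):
    """Return the k (name, score) pairs with the highest scores for one box."""
    ranked = sorted(zip(class_names, class_scores),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


names = ["dog", "cat", "horse"]

# Hypothetical per-class scores for two boxes (A and B).
box_a = top_k_classes([0.8, 0.1, 0.0], names, k=2)
box_b = top_k_classes([0.5, 0.3, 0.1], names, k=3)

for label, score in box_b:
    print(label, score, "coordinatesB")
```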
Is there anyone who added some accuracy to the detection or training?
(yolov3-tf2) C:\Users\roman\ml\yolov3-tf2>python convert.py --weights ./data/yolov3-tiny.weights --output ./checkpoints/yolov3-tiny.tf --tiny
Traceback (most recent call last):
File "convert.py", line 4, in <module>
from yolov3_tf2.models import YoloV3, YoloV3Tiny
File "C:\Users\roman\ml\yolov3-tf2\yolov3_tf2\models.py", line 2, in <module>
import tensorflow as tf
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow\__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\__init__.py", line 52, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\tensorflow\core\framework\graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
File "C:\Users\roman\AppData\Roaming\Python\Python36\site-packages\google\protobuf\descriptor.py", line 47, in <module>
from google.protobuf.pyext import _message
ImportError: DLL load failed: Procedure not found
When running the commands in your readme, I get an error when I call python convert.py:
$ python convert.py
Traceback (most recent call last):
File "convert.py", line 4, in <module>
from yolov3_tf2.models import YoloV3, YoloV3Tiny
File "/current/working/directory/models.py", line 21, in <module>
from .utils import broadcast_iou
File "/current/working/directory/utils.py", line 4, in <module>
import cv2
ImportError: /usr/lib/x86_64-linux-gnu/libcairo.so.2: undefined symbol: FT_Get_Var_Design_Coordinates
It also seems strange to me that the symbol actually does appear in the libcairo file:
$ grep FT_Get_Var_Design_Coordinates /usr/lib/x86_64-linux-gnu/libcairo.so.2
Binary file /usr/lib/x86_64-linux-gnu/libcairo.so.2 matches
The error does not originate in your yolov3-tf2 code, but it might be related to the dependencies. Could you please check the versions of the dependency packages you have installed? It might help for me to downgrade some of them.
I am working on Ubuntu 18.04.2 LTS, with python 3.6.0, pip3 version 9.0.1 and conda 4.6.11
Edit: I should add that I am testing this on a computer without a GPU before migrating to one with a GPU. In the meantime, I have substituted the python package tensorflow-gpu-2.0.0a0 for tensorflow-2.0.0a0
Edit: after some additional investigation, I suspect I have somehow messed up a combination of things installed with apt, pip3, python3 -m pip and conda. It might be helpful if you could share the output of your python3 -m pip freeze and conda list for a working installation. Then I can compare it with my system.
I don't understand whether the model should output 0 for everything even if training diverges only a little. I get output only for the 2 epochs where train and val loss are almost the same; just 1 epoch later I get all 0. Is this expected behaviour, or am I doing something wrong?
Does someone happen to know why this is the case?
Could you share your training speed? When I use another framework, training is too slow.
Thank you very much.
Hi, I tried training this model on COCO, but the loss did not converge. I saw that you trained successfully on VOC, so I tried VOC.
The loss is not as large as on COCO. After training, it looks like this:
1906/1906 [==============================] - 481s 252ms/step - loss: 33.4387 - yolo_output_0_loss: 12.5275 - yolo_output_1_loss: 10.1783 - yolo_output_2_loss: 5.0296 - val_loss: 48.1587 - val_yolo_output_0_loss: 6.8133 - val_yolo_output_1_loss: 35.6657 - val_yolo_output_2_loss: 1.8716
Epoch 3/100
1905/1906 [============================>.] - ETA: 0s - loss: 31.7364 - yolo_output_0_loss: 12.3622 - yolo_output_1_loss: 10.4751 - yolo_output_2_loss: 4.9999
Epoch 00003: saving model to checkpoints/yolov3_voc-3.tf
1906/1906 [==============================] - 633s 332ms/step - loss: 31.7288 - yolo_output_0_loss: 12.3598 - yolo_output_1_loss: 10.4729 - yolo_output_2_loss: 4.9976 - val_loss: 51.0017 - val_yolo_output_0_loss: 10.2503 - val_yolo_output_1_loss: 37.1377 - val_yolo_output_2_loss: 1.0919
Epoch 4/100
205/1906 [==>...........................] - ETA: 7:58 - loss: 31.8537 - yolo_output_0_loss: 12.5202 - yolo_output_1_loss: 11.6964 - yolo_output_2_loss: 5.5369^CTraceback (most recent call last):
I ran detection, but got no results:
I0730 14:28:41.549190 139918602266368 demo_voc.py:31] weights loaded from ./checkpoints/yolov3_voc-3
I0730 14:28:42.425215 139918602266368 demo_voc.py:41] time: 0.8550496101379395
I0730 14:28:42.425319 139918602266368 demo_voc.py:43] detections:
box num: tf.Tensor(0, shape=(), dtype=int32)
Do you have any idea for why?
Most of the images in my dataset are rectangular, with a width-to-height ratio of 16:9. How should I modify 'transform_targets_for_output' or other functions if I want to train and predict on rectangular images? Thanks.
I have a train.txt, like this:
# imagepath xmin,ymin,xmax,ymax,label ...
path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
path/to/img2.jpg 120,300,250,600,2
...
Then I convert this txt file to a tfrecord with this code:
import os
import random
import tensorflow as tf
train_txt = './train.txt'
images_path = './JPEGImages'
def get_example(line):
    class_text = []
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    line = line.split(' ')
    # Read the image
    image_path = line[0]
    with tf.io.gfile.GFile(image_path, 'rb') as fib:
        image_encoded = fib.read()
    # Read the coordinates and classes
    for item in line[1:]:
        item = item.split(',')
        xmin.append(float(item[0]))
        ymin.append(float(item[1]))
        xmax.append(float(item[2]))
        ymax.append(float(item[3]))
        # Note: item[4] is a str, so comparing it with the int 0 is always
        # False here and every box ends up with class "1".
        if item[4] == 0:
            class_text.append("0".encode('utf8'))
        else:
            class_text.append("1".encode('utf8'))
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_encoded])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmin)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmax)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymin)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymax)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=class_text))
    }))
    return example
train_writer = tf.io.TFRecordWriter('./train.tfrecord')
val_writer = tf.io.TFRecordWriter('./val.tfrecord')
with open(train_txt, 'r') as f:
    lines = f.read().split('\n')
random.shuffle(lines)
# Training data
for line in lines[:5000]:
    if len(line) > 0:
        example = get_example(line)
        train_writer.write(example.SerializeToString())
# Validation data
for line in lines[5000:]:
    if len(line) > 0:
        example = get_example(line)
        val_writer.write(example.SerializeToString())
train_writer.close()
val_writer.close()
print("finish!")
and in yolov3_tf2/dataset.py, I have revised a little (line 79 ~ 97):
# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/using_your_own_dataset.md#conversion-script-outline-conversion-script-outline
IMAGE_FEATURE_MAP = {
    # 'image/width': tf.io.FixedLenFeature([], tf.int64),
    # 'image/height': tf.io.FixedLenFeature([], tf.int64),
    # 'image/filename': tf.io.FixedLenFeature([], tf.string),
    # 'image/source_id': tf.io.FixedLenFeature([], tf.string),
    # 'image/key/sha256': tf.io.FixedLenFeature([], tf.string),
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    # 'image/format': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/text': tf.io.VarLenFeature(tf.string),
    # 'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    # 'image/object/difficult': tf.io.VarLenFeature(tf.int64),
    # 'image/object/truncated': tf.io.VarLenFeature(tf.int64),
    # 'image/object/view': tf.io.VarLenFeature(tf.string),
}
When I begin to train, I get this error:
2019-07-26 09:28:12.864438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7134 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-26 09:28:27.838122: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at iterator_ops.cc:1055 : Invalid argument: Paddings must be non-negative: 0 -16
[[{{node Pad}}]]
Traceback (most recent call last):
File "train.py", line 177, in <module>
app.run(main)
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 116, in main
for batch, (images, labels) in enumerate(train_dataset):
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 586, in __next__
return self.next()
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 623, in next
return self._next_internal()
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 615, in _next_internal
output_shapes=self._flat_output_shapes)
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2120, in iterator_get_next_sync
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -16
[[{{node Pad}}]] [Op:IteratorGetNextSync]
Exception ignored in: <bound method _CheckpointRestoreCoordinator.__del__ of <tensorflow.python.training.tracking.util._CheckpointRestoreCoordinator object at 0x7f0c8052abe0>>
Traceback (most recent call last):
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 244, in __del__
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/util.py", line 93, in node_names
File "/home/rainweic/anaconda3/envs/python35/lib/python3.5/site-packages/tensorflow/python/training/tracking/object_identity.py", line 76, in __getitem__
KeyError: (<tensorflow.python.training.tracking.object_identity._ObjectIdentityWrapper object at 0x7f0c8049a048>,)
Can anyone help me? Thanks!
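A possible way to narrow down the "Paddings must be non-negative: 0 -16" message, sketched under a loud assumption: that the loader pads each image's box list up to a fixed maximum of 100 rows, so an image with 116 boxes would ask for a padding of -16. The constant MAX_BOXES and the helper names are hypothetical; the line format is the train.txt format shown in the post.

```python
MAX_BOXES = 100  # assumption: fixed number of box rows per image after padding


def count_boxes(line):
    """Count annotations in a 'path box box ...' line (boxes are space-separated)."""
    return max(len(line.split()) - 1, 0)


def padding_rows(line, max_boxes=MAX_BOXES):
    """Rows of padding needed; a negative value means too many boxes."""
    return max_boxes - count_boxes(line)


def lines_over_limit(lines, max_boxes=MAX_BOXES):
    """Return (line_number, box_count) for every line exceeding max_boxes."""
    return [(i, count_boxes(line))
            for i, line in enumerate(lines, start=1)
            if line.strip() and count_boxes(line) > max_boxes]


# Synthetic example: one normal line and one line with 116 boxes.
crowded = "img2.jpg " + " ".join(["1,2,3,4,0"] * 116)
sample = ["img1.jpg 50,100,150,200,0 30,50,200,120,3", crowded]
print(lines_over_limit(sample), padding_rows(crowded))
```

If a check like this flags any lines in train.txt, dropping or splitting those images would be one thing to try; if it flags nothing, the assumption about the padding limit does not explain the error.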
Hello and thanks for your work !
I wanted to know whether it is possible to use a custom learning rate schedule. I usually use CLR (cyclical learning rates) for image classification, and I wanted to know whether it is possible to implement it in your code.
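For reference, a minimal sketch of the triangular cyclical learning rate policy as a plain Python function. How it would be wired into train.py is not shown here and would need to be worked out against the actual training loop; the function name and the example values are assumptions.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# Example values (hypothetical): one full cycle takes 200 iterations.
schedule = [triangular_clr(i, 1e-4, 1e-3, 100) for i in (0, 50, 100, 150, 200)]
print(schedule)
```

In a custom training loop, a value like this could be assigned to the optimizer's learning rate before each step; with model.fit, a callback would be needed instead.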
YoloV3-Tiny graph that runs using TF 2.0/CPU-only crashes when attempting to run TF2.0/GPU with the following error:
File "C:\Users\Rob\AppData\Local\conda\conda\envs\tf2-gpu\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect_video.py", line 26, in main
tf.config.gpu.set_per_process_memory_fraction(FLAGS.gpu_fraction)
AttributeError: module 'tensorflow._api.v2.config' has no attribute 'gpu'
Any idea?
Thanks,
Rob
Traceback (most recent call last):
File "train.py", line 175, in <module>
app.run(main)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 49, in main
model = YoloV3(FLAGS.size, training=True)
File "/github.com/zzh8829/yolov3-tf2/yolov3_tf2/models.py", line 210, in YoloV3
x = YoloConv(128, name='yolo_conv_2')((x, x_36))
File "/github.com/zzh8829/yolov3-tf2/yolov3_tf2/models.py", line 103, in yolo_conv
x = Concatenate()([x, x_skip])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 594, in __call__
self._maybe_build(inputs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1713, in _maybe_build
self.build(input_shapes)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/utils/tf_utils.py", line 290, in wrapper
output_shape = fn(instance, input_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/merge.py", line 392, in build
'Got inputs shapes: %s' % (input_shape))
ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 36, 36, 128), (None, 37, 37, 256)]
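One plausible reading of the shapes in this trace, sketched under two assumptions: the backbone halves the spatial size (rounding down) at each of its five stride-2 stages, and the head upsamples by 2 before each concatenation. Under those assumptions an input size such as 296 would reproduce the (36, 37) mismatch, while any multiple of 32 lines up. The helper names are hypothetical.

```python
def concat_shapes(size):
    """Return (upsampled_size, skip_size) for the two concatenation points."""
    s8, s16, s32 = size // 8, size // 16, size // 32
    # First concat: the stride-32 map upsampled by 2 meets the stride-16 skip.
    # Second concat: the stride-16 map upsampled by 2 meets the stride-8 skip.
    return [(2 * s32, s16), (2 * s16, s8)]


def is_valid_input_size(size):
    """True if both concatenations see matching spatial sizes."""
    return all(up == skip for up, skip in concat_shapes(size))


for candidate in (296, 320, 416):
    print(candidate, concat_shapes(candidate), is_valid_input_size(candidate))
```

If this reading is right, passing a size that is a multiple of 32 (for example 320 or 416) would avoid the error.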
I have found that tf.pad in dataset.py returns the same shape as before (None,5).
Thanks for a very clean implementation! I'm getting 0.6 sec for predict() on 1080ti and puzzled why this should be so slow. With a similar implementation, I am able to get > 30 fps.
Any idea, please?
I get the following error. I am not sure what I need to do to fix my tf record files.
2019-06-10 21:27:25.351566: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
2019-06-10 21:27:25.351650: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]]
Traceback (most recent call last):
File "train.py", line 178, in <module>
app.run(main)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 300, in run
2019-06-10 21:27:25.351867: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Feature: image/key/sha256 (data type: string) is required but could not be found.
_run_main(main, args)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 173, in main
validation_data=val_dataset)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 791, in fit
initial_epoch=initial_epoch)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1515, in fit_generator
steps_name='steps_per_epoch')
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 213, in model_iteration
batch_data = _get_next_batch(generator, mode)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 355, in _get_next_batch
generator_output = next(generator)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 556, in __next__
return self.next()
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 585, in next
return self._next_internal()
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 577, in _next_internal
output_shapes=self._flat_output_shapes)
File "/home/shaun/tf2/env/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 1954, in iterator_get_next_sync
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Feature: image/key/sha256 (data type: string) is required but could not be found.
[[{{node ParseSingleExample/ParseSingleExample}}]] [Op:IteratorGetNextSync]
Getting nan when training COCO dataset. Generated tf records using object detection's create_coco_tf_record script.
From your repo, followed the instructions to download weights and convert them. Ran training with the following command line:
python train.py --batch_size 8 --dataset
This is python3.0, tensorflow 2.0 gpu version.
How can we do any sort of transfer learning on our own dataset with number of classes other than 80? In my case training from scratch doesn't give that great result. Transfer darknet seems to transfer even the yolo layer where classes have been taken into consideration.
I've fixed the bug and (lightly) tested, but do not have push permissions!
It comes down to COCO names having different spellings than matching VOC names.
If you make the change, don't forget to keep the old names file and point to it in train.py ;-)
Thanks for the beautiful work!
I wonder do you have any plan to implement Mobilenet as an alternative of Darknet?
I am having trouble running the detect.py script. When I run it after training and loading weights from that training, I get this error:
Traceback (most recent call last):
File "detect.py", line 65, in <module>
app.run(main)
File "C:\Users\venkav1\AppData\Local\Continuum\anaconda3\envs\tf-n\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "C:\Users\venkav1\AppData\Local\Continuum\anaconda3\envs\tf-n\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect.py", line 53, in main
for i in range(nums[0]):
TypeError: 'Tensor' object cannot be interpreted as an integer
When I print out nums, it gives this <tf.Tensor 'yolov3/yolo_nms/combined_non_max_suppression/CombinedNonMaxSuppression:3' shape=(1,) dtype=int32>)
Any idea on how to fix this?
Hi there,
I am currently trying to train on my custom dataset. I have created a .tfrecord-file which looks reasonable to me. However, when I run train.py the following error message occurs directly after having printed Epoch 1/100:
2019-05-20 22:19:39.735116: W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at iterator_ops.cc:988 : Invalid argument: Expected image (JPEG, PNG, or GIF), got unknown format starting with '/9j/4AAQSkZJRgAB'
[[{{node DecodeJpeg}}]]
Traceback (most recent call last):
File "C:/Users/Marcel/.../yolov3-tf2/train.py", line 184, in <module>
app.run(main)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "C:/Users/Marcel/Desktop/Uni/6.Semester/Projektarbeit/YOLO/asia_repo/yolov3-tf2/train.py", line 176, in main
validation_data=val_dataset)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 791, in fit
initial_epoch=initial_epoch)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1515, in fit_generator
steps_name='steps_per_epoch')
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 213, in model_iteration
batch_data = _get_next_batch(generator, mode)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 355, in _get_next_batch
generator_output = next(generator)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 556, in __next__
return self.next()
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 585, in next
return self._next_internal()
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 577, in _next_internal
output_shapes=self._flat_output_shapes)
File "C:\Users\Marcel\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 1983, in iterator_get_next_sync
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with '/9j/4AAQSkZJRgAB'
[[{{node DecodeJpeg}}]] [Op:IteratorGetNextSync]
Comparing the given "start of the unknown data" (/9j/4AAQSkZJRgAB) with the .tfrecord-file it becomes clear, that it is the start of the encoded image:
features { feature { key: "image/encoded" value { bytes_list { value: "/9j/4AAQSkZJRgABAQAAAQABAAD..." } } ... }
So I think my .tfrecord-file is not the problem in this case; rather, I am missing a decoding step for the encoded image somewhere. I also already checked whether my files are corrupted in some way, but I am pretty sure that they are fine. I searched Google and Stack Overflow, but these did not reveal the answer to my problem either. Thus, I am stuck and cannot think of another reason for this error.
Did anyone else experience the same problem and can help me find the source of this error?
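One observation that may help here: the reported prefix '/9j/4AAQSkZJRgAB' is what the first bytes of a JPEG file look like after base64 encoding. The sketch below checks that with the standard library only. The helper maybe_decode_base64_jpeg is hypothetical; whether the images were base64-encoded when the record was written is an assumption the error message suggests but does not prove.

```python
import base64

# The prefix quoted in the error message.
prefix = "/9j/4AAQSkZJRgAB"

# Decoding it yields the JPEG start-of-image marker followed by 'JFIF'.
decoded = base64.b64decode(prefix)
is_jpeg = decoded.startswith(b"\xff\xd8\xff")
print(decoded, is_jpeg)


def maybe_decode_base64_jpeg(data: bytes) -> bytes:
    """Return raw JPEG bytes, undoing base64 if the payload looks encoded.

    Hypothetical helper: it only checks for the '/9j/' prefix that
    base64-encoded JPEG data starts with.
    """
    if data.startswith(b"/9j/"):
        return base64.b64decode(data)
    return data
```

If that is what happened, writing the raw file bytes into 'image/encoded' (without base64) when building the record would be the thing to try.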
python train.py --batch_size 8 --dataset=C:\...\platt.record --val_dataset=C:\...\platt_val.record --epochs 10 --mode eager_fit --transfer fine_tune --weights ./checkpoints/yolov3-tiny.tf --tiny
results in this output:
Epoch 1/10
2019-06-20 02:13:00.680170: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profile Session started.
2019-06-20 02:13:00.685371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library cupti64_100.dll
1/Unknown - 4s 4s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanW0620 02:13:01.387073 9828 callbacks.py:236] Method (on_train_batch_end) is slow compared to the batch update (0.256449). Check your callbacks.
7/Unknown - 6s 807ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nanC:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:1467: RuntimeWarning: invalid value encountered in less
self.monitor_op = lambda a, b: np.less(a, b - self.min_delta)
C:\...\Anaconda3\envs\yolov3-tf2\lib\site-packages\tensorflow\python\keras\callbacks.py:979: RuntimeWarning: invalid value encountered in less
if self.monitor_op(current - self.min_delta, self.best):
Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
7/7 [==============================] - 7s 1s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 2/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00002: saving model to checkpoints/yolov3_train_2.tf
7/7 [==============================] - 3s 394ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 3/10
6/7 [========================>.....] - ETA: 0s - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan
Epoch 00003: saving model to checkpoints/yolov3_train_3.tf
7/7 [==============================] - 3s 396ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - val_loss: nan - val_yolo_output_0_loss: nan - val_yolo_output_1_loss: nan
Epoch 00003: early stopping
What might be the cause of that? There are also other open issues regarding training, and I'm wondering if anyone was successful.
Thanks for sharing this work! It's definitely interesting to see an example of what TF 2.0 is going to be to work with.
I have one question: in your readme you mention:
From my limited testing, GradientTape is definitely a bit slower than the normal graph mode.
Could you expand on this? Maybe share some numbers?
I'm interested because the point of eager execution has always been to allow imperative programming (for an easier workflow) without losing too much performance. If it turns out, however, that for practical purposes it's not feasible to train in eager mode, one would have to maintain separate training loops, like you've done in train.py. It seems to me this would be detrimental to the maintainability of TF2 repositories. Do you have a view on this?
YoloV3 runs detect_video with COCO detections! Nice.
YoloV3Tiny runs, but nothing detected. Is a detection threshold set too high somewhere?
Thanks,
Rob
I have successfully used the code to train from scratch and also using the options darknet and no_output on several custom data sets. I have written a script that produces the tf records for training and validation.
When I try to train on a new dataset, I get nan values for the loss at the very beginning of training and also an error message that some tensor shapes do not match. I am not sure what is wrong. I think that I correctly generate the tf records for training and validation (eagle_train.record and eagle_test.record in the run below) because everything worked fine for the other data sets that I tried previously.
I have also noticed that the indices that do not index into shape [8, 13, 13, 3, 6] change each time I rerun the command for training. In the run below, the offending index is [2, 13, 5, 2], but in other runs I got indices such as [1, 15, 5, 2] and [7, 13, 5, 2] even though I have not changed the code at all.
Does anybody have an idea what could be the cause for this behavior? Thanks a lot for your help!
95/Unknown - 57s 600ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan
2019-08-12 14:01:02.722807: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at scatter_nd_op.cc:217 : Invalid argument: indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
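A possible explanation to check, sketched under assumptions: if the index layout is [batch, grid_y, grid_x, anchor] and the grid cell is the floor of the normalized box center times the grid size, then a grid index of 13 or 15 on a 13x13 grid would mean a box center at or beyond 1.0, in other words a box that is not normalized to [0, 1) or that extends past the image. The function names below are hypothetical and only illustrate the arithmetic.

```python
import math


def grid_cell(xmin, ymin, xmax, ymax, grid_size):
    """Grid cell (row, col) of a box center, for normalized coordinates."""
    cx = (xmin + xmax) / 2
    cy = (ymin + ymax) / 2
    return math.floor(cy * grid_size), math.floor(cx * grid_size)


def in_grid(cell, grid_size):
    """True if both cell indices fall inside a grid_size x grid_size grid."""
    return all(0 <= c < grid_size for c in cell)


# A box whose bottom edge extends past the image (ymax > 1.0).
bad_cell = grid_cell(0.3, 0.95, 0.5, 1.1, grid_size=13)
# A box fully inside the image.
good_cell = grid_cell(0.1, 0.1, 0.3, 0.3, grid_size=13)

print(bad_cell, in_grid(bad_cell, 13))
print(good_cell, in_grid(good_cell, 13))
```

Running a check like this over the boxes in the new dataset before writing the tf records could show whether any annotations fall outside the image. Because the records are shuffled into batches, the batch position in the reported index would differ between runs, which might explain why the reported index changes.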
Here is the complete trace:
(yolov3-tf2-master) C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master>python train.py --dataset ./data/eagle_train.record --val_dataset ./data/eagle_test.record --transfer darknet --mode fit --epochs 2
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\framework\dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
2019-08-12 13:59:48.546377: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-08-12 13:59:48.626458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2019-08-12 13:59:48.629067: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-12 13:59:48.630822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-12 13:59:48.633876: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-08-12 13:59:48.636530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
2019-08-12 13:59:48.638655: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-12 13:59:48.640399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-12 13:59:49.208639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-12 13:59:49.211078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-12 13:59:49.212671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-12 13:59:49.214527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6280 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
W0812 14:00:03.033751 7376 deprecation.py:323] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\array_ops.py:1340: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 1/2
2019-08-12 14:00:34.698461: I tensorflow/core/profiler/lib/profiler_session.cc:174] Profiler session started.
2019-08-12 14:00:34.700999: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'cupti64_100.dll'; dlerror: cupti64_100.dll not found
2019-08-12 14:00:34.703512: W tensorflow/core/profiler/lib/profiler_session.cc:182] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
1/Unknown - 29s 29s/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:00:34.991461: I tensorflow/core/platform/default/device_tracer.cc:641] Collecting 0 kernel records, 0 memcpy records.
2019-08-12 14:00:35.036669: E tensorflow/core/platform/default/device_tracer.cc:68] CUPTI error: CUPTI could not be loaded or symbol could not be found.
95/Unknown - 57s 600ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:01:02.722807: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at scatter_nd_op.cc:217 : Invalid argument: indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
96/Unknown - 57s 597ms/step - loss: nan - yolo_output_0_loss: nan - yolo_output_1_loss: nan - yolo_output_2_loss: nan2019-08-12 14:01:02.768297: W tensorflow/core/framework/op_kernel.cc:1546] OP_REQUIRES failed at iterator_ops.cc:1055 : Invalid argument: [_Derived_]{{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} {{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
[[{{node TensorScatterUpdate}}]]
[[StatefulPartitionedCall]]
Traceback (most recent call last):
File "train.py", line 193, in <module>
app.run(main)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 188, in main
validation_data=val_dataset)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training.py", line 643, in fit
use_multiprocessing=use_multiprocessing)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 694, in fit
steps_name='steps_per_epoch')
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 220, in model_iteration
batch_data = _get_next_batch(generator, mode)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\keras\engine\training_generator.py", line 362, in _get_next_batch
generator_output = next(generator)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 586, in __next__
return self.next()
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 623, in next
return self._next_internal()
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 615, in _next_internal
output_shapes=self._flat_output_shapes)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2150, in iterator_get_next_sync
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: [_Derived_]{{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} {{function_node __inference_transform_targets_for_output_14319_specialized_for_StatefulPartitionedCall_at___inference_Dataset_map_<lambda>_15243}} indices[2] = [2, 13, 5, 2] does not index into shape [8,13,13,3,6]
[[{{node TensorScatterUpdate}}]]
[[StatefulPartitionedCall]] [Op:IteratorGetNextSync]
Exception ignored in: <bound method _CheckpointRestoreCoordinator.__del__ of <tensorflow.python.training.tracking.util._CheckpointRestoreCoordinator object at 0x00000284600F3E10>>
Traceback (most recent call last):
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\util.py", line 244, in __del__
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\util.py", line 93, in node_names
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\training\tracking\object_identity.py", line 76, in __getitem__
KeyError: (<tensorflow.python.training.tracking.object_identity._ObjectIdentityWrapper object at 0x00000282E5DA0F60>,)
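A note on the scatter_nd failure in the trace above: the target tensor has shape [8, 13, 13, 3, 6], so both grid positions in an index must lie in 0..12. The reported indices [2, 13, 5, 2] and [1, 15, 5, 2] carry 13 and 15 in the second slot, which would happen if a box centre sits at or beyond the image edge (a normalized coordinate of 1.0 or more). This is only a guess at the cause and is not confirmed anywhere in this thread. Below is a minimal sketch for scanning annotations before writing the tf records. It assumes boxes are normalized (x1, y1, x2, y2) tuples and that the index layout is [batch, grid row, grid column, anchor]; grid_cell and find_bad_boxes are hypothetical helper names, not part of the repository.

```python
import math


def grid_cell(box, grid_size):
    """Return (grid_y, grid_x) of the centre of a normalized (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    return math.floor(cy * grid_size), math.floor(cx * grid_size)


def find_bad_boxes(boxes, grid_size=13):
    """Return (index, box, reason) for every box that cannot be placed on the grid."""
    bad = []
    for i, box in enumerate(boxes):
        # NaN coordinates would also be a plausible source of a nan loss.
        if any(math.isnan(v) for v in box):
            bad.append((i, box, "nan coordinate"))
            continue
        x1, y1, x2, y2 = box
        if x2 <= x1 or y2 <= y1:
            bad.append((i, box, "non-positive width or height"))
            continue
        gy, gx = grid_cell(box, grid_size)
        if not (0 <= gy < grid_size and 0 <= gx < grid_size):
            reason = f"grid cell ({gy}, {gx}) outside {grid_size}x{grid_size}"
            bad.append((i, box, reason))
    return bad


if __name__ == "__main__":
    boxes = [
        (0.10, 0.10, 0.30, 0.30),  # fine
        (0.40, 0.95, 0.50, 1.10),  # ymax exceeds the image height
        (0.20, 0.20, 0.20, 0.40),  # zero width
    ]
    for index, box, reason in find_bad_boxes(boxes):
        print(index, box, reason)
```

With a 13x13 grid, the second box in the usage example lands in grid cell (13, 5), the same pair that appears in the error message, so a check like this on the raw annotations may be a quick way to confirm or rule out this explanation.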
It seems that in https://github.com/zzh8829/yolov3-tf2/blob/eb30bd48ac1354a329a0763b2a8fe57364c5a272/yolov3_tf2/dataset.py you simply resize the original image to (416, 416):
def transform_images(x_train, size):
    x_train = tf.image.resize(x_train, (size, size))
    x_train = x_train / 255
    return x_train
However, I think this might distort the image and make the anchors meaningless. In darknet, the author uses a letterbox() method to keep the image aspect ratio by padding.
Have you compared the results of the different resizing methods?
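For reference, the letterbox idea mentioned above can be sketched in plain Python: scale the image by the smaller of the two ratios so it fits inside the square, then pad the remainder evenly. This is an illustrative sketch only, not darknet's actual letterbox() code; letterbox_params and letterbox_box are hypothetical names, and centring the padding is an assumption.

```python
def letterbox_params(width, height, size):
    """Return (scale, new_w, new_h, pad_x, pad_y) for fitting an image into a square."""
    # Use the smaller ratio so the whole image fits without distortion.
    scale = min(size / width, size / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Split the leftover space evenly (assumption: padding is centred).
    pad_x = (size - new_w) // 2
    pad_y = (size - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def letterbox_box(box, width, height, size):
    """Map a pixel-space (x1, y1, x2, y2) box into the padded square image."""
    scale, _, _, pad_x, pad_y = letterbox_params(width, height, size)
    x1, y1, x2, y2 = box
    return (
        x1 * scale + pad_x,
        y1 * scale + pad_y,
        x2 * scale + pad_x,
        y2 * scale + pad_y,
    )


if __name__ == "__main__":
    # A 640x480 image placed into a 416x416 input.
    print(letterbox_params(640, 480, 416))
    print(letterbox_box((0, 0, 640, 480), 640, 480, 416))
```

Note that the ground-truth boxes have to be shifted and scaled along with the image, which the plain resize in transform_images does not need because normalized coordinates survive a stretch unchanged.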
Hi,
You may have a bug in dataset.py, line 36.
Instead of the line:
idx, [box[0], box[1], box[2], box[3], 1, y_true[i][j][4]])
I think it should be:
[box[0], box[1], box[2]-box[0], box[3]-box[1], 1, y_true[i][j][4]])
Can you please confirm?
Thanks
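The two lines in this report differ only in what the third and fourth slots mean: the bottom-right corner (x2, y2) versus the width and height. Which one is correct depends on what the loss function expects downstream, and the report does not show that. As a neutral illustration of the two encodings, assuming normalized (x1, y1, x2, y2) input, here is a small sketch; the function names are hypothetical.

```python
def corners_to_xywh(box):
    """Convert (x1, y1, x2, y2) to (x1, y1, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def corners_to_center(box):
    """Convert (x1, y1, x2, y2) to (centre_x, centre_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


if __name__ == "__main__":
    box = (0.25, 0.25, 0.75, 0.5)
    print(corners_to_xywh(box))    # (0.25, 0.25, 0.5, 0.25)
    print(corners_to_center(box))  # (0.5, 0.375, 0.5, 0.25)
```

If the loss already converts corner coordinates to centre and size on its own, storing corners in the target tensor is consistent and the suggested change would apply the conversion twice.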
What specific flags should I pass to train.py to resume training from my checkpoint? I'm not sure whether --transfer none, fine_tune, freeze, or darknet fits the bill, as they all involve freezing or discarding some of the weights.
Detection works without any problems. When I try running the first command from the training section of the README (applied to my train and test sets), I get the following error message.
(yolov3-tf2-master) C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master>python train.py --batch_size 8 --dataset "C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master\data\racoon_dataset\train.record" --val_dataset "C:\Users\Pawel Wocjan\Documents\ML\yolov3-tf2-master\data\racoon_dataset\test.record" --epochs 100 --mode eager_tf --transfer fine_tune
W0724 12:49:11.198809 5172 deprecation.py:506] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\ops\init_ops.py:1251: calling VarianceScaling.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2019-07-24 12:49:13.840923: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
W0724 12:49:14.279574 5172 deprecation.py:323] From C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\tensorflow\python\autograph\impl\api.py:255: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Traceback (most recent call last):
File "train.py", line 175, in
app.run(main)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 300, in run
_run_main(main, args)
File "C:\Users\Pawel Wocjan\Documents\ML\Environments\yolov3-tf2-master\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 129, in main
epoch, batch, total_loss.numpy(),
AttributeError: 'Tensor' object has no attribute 'numpy'