endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection

Home Page: https://arxiv.org/pdf/1702.02138.pdf

License: MIT License

Shell 3.02% Makefile 0.04% MATLAB 0.85% Python 90.71% Cuda 2.36% Roff 0.56% C++ 0.07% Cython 2.38%

tensorflow object-detection faster-rcnn coco voc resnet mobilenet tensorboard

tf-faster-rcnn's Issues

Purpose of subtracting 1 from bounding boxes

What is the purpose of subtracting 1 from the bounding box corners in pascal_voc.py (lines 178, 215, 216, 217, and 218)? The comments say the purpose is to "Make pixel indexes 0-based", but aren't they already 0-based? It seems this would cause problems for annotations already located at the 0th x or y coordinate.
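
As far as I know, Pascal VOC annotations use 1-based pixel coordinates, which is why the loader subtracts 1. For a dataset that is already 0-based, the subtraction would produce -1 at the image border, and that value would wrap around if the boxes are stored in an unsigned integer array. A minimal sketch of a guarded conversion follows; `to_zero_based` and its `one_based` flag are hypothetical names, not part of the repository.

```python
def to_zero_based(box, one_based=True):
    """Convert an (xmin, ymin, xmax, ymax) box to 0-based pixel indexes.

    `one_based` is a hypothetical flag, not an option in the repository.
    """
    if not one_based:
        # Already 0-based: leave the coordinates untouched.
        return tuple(box)
    shifted = tuple(v - 1 for v in box)
    if min(shifted) < 0:
        # A 0 coordinate means the annotation was not 1-based after all.
        raise ValueError("box %r is not 1-based" % (box,))
    return shifted


print(to_zero_based((1, 1, 100, 50)))                   # (0, 0, 99, 49)
print(to_zero_based((0, 0, 99, 49), one_based=False))   # (0, 0, 99, 49)
```

For custom annotations that start at 0, skipping the subtraction (or clamping at 0) is probably the safer choice.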

4-times slower than caffe, any way to improve the speed?

Hi,

I noticed that im_detect() is about 4x slower than Caffe's implementation of im_detect(). Is there any way I can improve the speed?

Thanks.

training with custom data

Hi!

I would like to train the network with my custom data. Where can I find documentation on how to do that (data format, configuration, etc.)? Or could you explain it to me directly?

Thank you very much!

Regards,

Train on new dataset

Hi,

Great project. I have successfully run the training and testing. What do I need to change to train on a new dataset?

Thanks!
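
Regarding the custom-dataset questions above: one common route, offered here as an assumption rather than the maintainer's recommendation, is to store annotations in Pascal VOC style XML and reuse the existing VOC loader with a new class list. The sketch below only shows how such a file could be read; the sample XML, the `widget` class, and `parse_voc_annotation` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in Pascal VOC layout.
SAMPLE = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>widget</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text, classes):
    """Return (filename, objects) where each object has a label index and a box."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        name = obj.find("name").text.strip()
        if name not in classes:
            continue  # skip classes the detector is not trained on
        bb = obj.find("bndbox")
        box = tuple(int(float(bb.find(tag).text))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({"label": classes.index(name), "name": name, "box": box})
    return root.find("filename").text, objects


# Index 0 is reserved for the background class.
CLASSES = ("__background__", "widget")
print(parse_voc_annotation(SAMPLE, CLASSES))
```

The number of classes passed to the network would then be `len(CLASSES)`, background included.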

ResNext

Do you plan to experiment with ResNext?

Self-defined anchor ratios

@endernewton The anchor ratios are hardcoded as [0.5, 1, 2] in the current version, and the field "num_anchors" is computed as scales.shape[0] * 3.
I suppose there should be some kind of interface that exposes the anchor_ratio field, so that users can define their own ratios for various detection tasks.
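
To illustrate the request, here is a simplified sketch in which the anchor count follows from both lists instead of a hardcoded 3. `make_anchors` is a hypothetical helper; it skips the rounding and box centering that a real anchor generator performs and only returns width/height pairs.

```python
import math


def make_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (width, height) pairs, one per (ratio, scale) combination.

    ratio is height / width; the area stays at (base_size * scale) ** 2.
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale
            width = side / math.sqrt(ratio)
            height = side * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors


anchors = make_anchors(ratios=(0.25, 0.5, 1.0, 2.0))
num_anchors = len(anchors)  # len(scales) * len(ratios), not len(scales) * 3
print(num_anchors)  # 12
```

Any layer whose channel count depends on the number of anchors would need to use the same product.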

Random results for custom image

Hi,

I took the code from test_net() in lib/model/test.py, ran it on a custom image multiple times, and printed boxes.shape[0] after im_detect.
I get random results on each run: sometimes 2 features detected, sometimes 4-6, or even 300.
Is this expected? Is there something I can do to make it more predictable?

Thanks

I cannot reproduce AP=0.295 following your advice. Am I missing something important?

I read your code and tried to reproduce the AP with the Caffe version, py-faster-rcnn. There I set RPN_MIN_SIZE=0 to keep the small proposals. My anchor parameters are ratios=[0.5, 1, 2] and scales=[4, 8, 16, 32]. My learning rate settings are:
base_lr:0.001
lr_policy: "multistep"
stepvalue: 350000
stepvalue: 600000
stepvalue: 900000
gamma:0.1
weight_decay: 0.0005
momentum: 0.9
iter_size:2

I set ASPECT_GROUPING: True and train with multiple scales.
I train with proposal_layer and test with proposal_layer (not proposal_top_layer).

I stopped training at 1000K iterations because the validation AP stopped improving. This setting improves results on small objects, but the overall AP is only 0.261 (at 1000K). I trained VGG16 on trainval-minival and tested on minival. Am I missing something?
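
For reference, the multistep schedule quoted above can be written out as a small function, which makes it easy to check the learning rate at any iteration. This is a sketch of the usual Caffe "multistep" rule (multiply by gamma each time a step value is reached), with the values from the post as defaults; `multistep_lr` is a name chosen here.

```python
def multistep_lr(iteration, base_lr=0.001, gamma=0.1,
                 stepvalues=(350000, 600000, 900000)):
    """Learning rate after applying gamma once per step value already reached."""
    steps_passed = sum(1 for step in stepvalues if iteration >= step)
    return base_lr * gamma ** steps_passed


for it in (0, 350000, 600000, 900000):
    print(it, multistep_lr(it))
```

With these settings the rate in the last 100K iterations is about 1e-6, so little change in validation AP is to be expected there.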

Problem loading model

I am trying to run the training script but when it tries to load the vgg16.ckpt file I get the following error:

F ./tensorflow/core/util/tensor_slice_reader.h:168] Check failed: sss_[idx]->Get(key, &value) Failed to seek to the record for tensor vgg_16/fc6/weights, slice -:-:-:-: computed key =
./experiments/scripts/train_faster_rcnn.sh: line 75:  5508 Aborted                 (core dumped) CUDA_VISIBLE_DEVICES=${GPU_ID} python ./tools/trainval_net.py --weight data/imagenet_weights/${NET}.ckpt --imdb ${TRAIN_IMDB} --imdbval ${TEST_IMDB} --iters ${ITERS} --cfg experiments/cfgs/${NET}.yml --net ${NET} --set TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}

I am running Tensorflow 1.0 with CUDA 8.0 on a GTX 1070.

utils/bbox.c:346:31: fatal error: numpy/arrayobject.h: No such file or directory #include "numpy/arrayobject.h"

I am not sure how to fix this.

mona@pascal:~/computer_vision/tf-faster-rcnn/lib$ sudo pip2 install numpy
The directory '/home/mona/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/mona/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting numpy
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading numpy-1.12.0-cp27-cp27mu-manylinux1_x86_64.whl (16.5MB)
    100% |████████████████████████████████| 16.5MB 64kB/s 
Installing collected packages: numpy
Successfully installed numpy-1.12.0
mona@pascal:~/computer_vision/tf-faster-rcnn/lib$ make
python setup.py build_ext --inplace
running build_ext
skipping 'utils/bbox.c' Cython extension (up-to-date)
building 'utils.cython_bbox' extension
{'gcc': ['-Wno-cpp', '-Wno-unused-function']}
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c utils/bbox.c -o build/temp.linux-x86_64-2.7/utils/bbox.o -Wno-cpp -Wno-unused-function
utils/bbox.c:346:31: fatal error: numpy/arrayobject.h: No such file or directory
 #include "numpy/arrayobject.h"
                               ^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
make: *** [all] Error 1
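
The gcc command in the log above passes only -I/usr/include/python2.7, so NumPy's header directory does not appear to be on the include path even though NumPy itself is installed. A common remedy, sketched here as an assumption about this setup.py, is to hand `numpy.get_include()` to the extension's `include_dirs`. The helper name `numpy_include_dirs` is hypothetical.

```python
import os


def numpy_include_dirs():
    """Return the directory that holds numpy/arrayobject.h, or [] if unavailable."""
    try:
        import numpy
    except ImportError:
        return []  # NumPy is not importable by this interpreter
    include_dir = numpy.get_include()
    header = os.path.join(include_dir, "numpy", "arrayobject.h")
    return [include_dir] if os.path.exists(header) else []


# In setup.py this list would be passed to the extension, for example:
#   Extension("utils.cython_bbox", sources=[...], include_dirs=numpy_include_dirs())
print(numpy_include_dirs())
```

If the list comes back empty, the interpreter running `make` is probably not the one that pip2 installed NumPy into.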

Trouble downloading models

I'm using your script to fetch faster-rcnn models and it seems to be hanging. I'm not sure if this is due to heavy traffic or an error. Regardless, I thought I would ask if you had any suggestions. Thanks!

Is it applicable on CPU

I get an "out of memory" error when I run demo.py on my GPU, which has 1 GB of memory.
Can this code run on a CPU? If so, how can I set it up to run on the CPU instead of the GPU?
Thank you!
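
One general TensorFlow technique, not something specific to this repository, is to hide the GPUs from the process by clearing CUDA_VISIBLE_DEVICES before TensorFlow is imported. The sketch below shows only that step; the repository also ships CUDA sources, so its compiled GPU operations may still need CPU alternatives. `hide_gpus` is a hypothetical helper name.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from this process; call before importing tensorflow."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return os.environ["CUDA_VISIBLE_DEVICES"]


hide_gpus()
# import tensorflow as tf  # import only after the variable has been set
```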

How to use demo.py to test my model?

I got my model by fine-tuning the vgg16 net with 26 classes, and I want to test it using demo.py, but I get some errors.
If I use net.create_architecture(sess, "TEST", 27, tag='default', anchor_scales=[8, 16, 32]), it doesn't work.
If I use net.create_architecture(sess, "TEST", 27, caffe_weight_path='../data/imagenet_weights/vgg16.weights', tag='default', anchor_scales=[8, 16, 32]), it prints:

Loading caffe weights...
Done!
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_2/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_2/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv1_2/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_2/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_1/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv1_2/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_1/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_1/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv1_1/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_1/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv1_1/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_3/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_3/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_2/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_2/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_1/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv3_1/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv2_2/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv2_2/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv2_1/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv2_1/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_2/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_3/biases not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv5_3/weights not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
W tensorflow/core/framework/op_kernel.cc:993] Not found: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
Traceback (most recent call last):
  File "/home/weiguang/tf-faster-rcnn/tools/demo-show.py", line 147, in <module>
    saver.restore(sess, tfmodel)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1428, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
	 [[Node: save/RestoreV2_16/_113 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_175_save/RestoreV2_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op u'save/RestoreV2_23', defined at:
  File "/home/weiguang/tf-faster-rcnn/tools/demo-show.py", line 146, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1040, in __init__
    self.build()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1070, in build
    restore_sequentially=self._restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 675, in build
    restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 402, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 242, in restore_op
    [spec.tensor.dtype])[0])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 668, in restore_v2
    dtypes=dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Key vgg_16/conv4_3/weights not found in checkpoint
	 [[Node: save/RestoreV2_23 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_23/tensor_names, save/RestoreV2_23/shape_and_slices)]]
	 [[Node: save/RestoreV2_16/_113 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_175_save/RestoreV2_16", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]


Process finished with exit code 1

Then I used saver = tf.train.import_meta_graph(tfmodel + '.meta') instead of saver = tf.train.Saver(), and got:

Loading caffe weights...
Done!
Loaded network ../output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Demo for data/demo/000456.jpg
W tensorflow/core/framework/op_kernel.cc:993] Failed precondition: Attempting to use uninitialized value vgg_16/rpn_conv/3x3/weights
	 [[Node: vgg_16/rpn_conv/3x3/weights/read = Identity[T=DT_FLOAT, _class=["loc:@vgg_16/rpn_conv/3x3/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/rpn_conv/3x3/weights)]]
W tensorflow/core/framework/op_kernel.cc:993] Failed precondition: Attempting to use uninitialized value vgg_16/rpn_conv/3x3/weights
	 [[Node: vgg_16/rpn_conv/3x3/weights/read = Identity[T=DT_FLOAT, _class=["loc:@vgg_16/rpn_conv/3x3/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/rpn_conv/3x3/weights)]]
Traceback (most recent call last):
  File "/home/weiguang/tf-faster-rcnn/tools/demo-show.py", line 156, in <module>
    demo(sess, net, im_name)
  File "/home/weiguang/tf-faster-rcnn/tools/demo-show.py", line 84, in demo
    scores, boxes = im_detect(sess, net, im)
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/model/test_vgg16.py", line 90, in im_detect
    _, scores, bbox_pred, rois = net.test_image(sess, blobs['data'], blobs['im_info'])
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 557, in test_image
    feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value vgg_16/rpn_conv/3x3/weights
	 [[Node: vgg_16/rpn_conv/3x3/weights/read = Identity[T=DT_FLOAT, _class=["loc:@vgg_16/rpn_conv/3x3/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/rpn_conv/3x3/weights)]]
	 [[Node: vgg_16/rpn_cls_prob/transpose_1/_215 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_210_vgg_16/rpn_cls_prob/transpose_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'vgg_16/rpn_conv/3x3/weights/read', defined at:
  File "/home/weiguang/tf-faster-rcnn/tools/demo-show.py", line 142, in <module>
    tag='default', anchor_scales=[8, 16, 32])
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 486, in create_architecture
    rois, cls_prob, bbox_pred = self._vgg16_from_imagenet(sess, training)
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 351, in _vgg16_from_imagenet
    rpn = self._conv_layer_shape(net, [3,3], 512, "rpn_conv/3x3", initializer, train)
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 144, in _conv_layer_shape
    filt=tf.get_variable('weights', size, initializer=initializer, trainable=trainable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 988, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 890, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 348, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 333, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 684, in _get_single_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 197, in __init__
    expected_shape=expected_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 315, in _init_from_args
    self._snapshot = array_ops.identity(self._variable, name="read")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1490, in identity
    result = _op_def_lib.apply_op("Identity", input=input, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value vgg_16/rpn_conv/3x3/weights
	 [[Node: vgg_16/rpn_conv/3x3/weights/read = Identity[T=DT_FLOAT, _class=["loc:@vgg_16/rpn_conv/3x3/weights"], _device="/job:localhost/replica:0/task:0/gpu:0"](vgg_16/rpn_conv/3x3/weights)]]
	 [[Node: vgg_16/rpn_cls_prob/transpose_1/_215 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_210_vgg_16/rpn_cls_prob/transpose_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]


Process finished with exit code 1
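
Both errors above suggest that the variable names in the graph built by the demo do not match the keys stored in the checkpoint. A quick way to confirm this is to compare the two name sets. The sketch below keeps the comparison in plain Python; the TensorFlow 1.x calls for collecting the names are shown in comments, and the checkpoint names in the example are made up.

```python
def diff_checkpoint_keys(graph_names, checkpoint_names):
    """Return (missing_in_checkpoint, unused_in_graph) as sorted lists."""
    graph, ckpt = set(graph_names), set(checkpoint_names)
    return sorted(graph - ckpt), sorted(ckpt - graph)


# With TensorFlow 1.x the two lists could be gathered roughly like this:
#   graph_names = [v.op.name for v in tf.global_variables()]
#   reader = tf.train.NewCheckpointReader(ckpt_path)
#   checkpoint_names = list(reader.get_variable_to_shape_map())
graph_names = ["vgg_16/conv4_3/weights", "vgg_16/rpn_conv/3x3/weights"]
# Hypothetical keys, only to show what a scope mismatch looks like.
checkpoint_names = ["vgg16_default/conv4_3/weights",
                    "vgg16_default/rpn_conv/3x3/weights"]

missing, unused = diff_checkpoint_keys(graph_names, checkpoint_names)
print(missing)
print(unused)
```

If every graph name is missing and every checkpoint key is unused, the model was most likely saved by a different network definition than the one the demo builds.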

early stopping

Has anyone considered building early stopping into the training process?
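
As a starting point for discussion, a patience-based rule is one simple form of early stopping: stop once the validation loss has failed to improve for a given number of consecutive checks. The class below is a minimal sketch with hypothetical names and is not tied to the repository's training loop.

```python
class EarlyStopper(object):
    """Signal a stop after `patience` consecutive checks without improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_checks = 0

    def should_stop(self, val_loss):
        if self.best is None or val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
history = [0.90, 0.80, 0.81, 0.82, 0.70]
stopped_at = next(
    (i for i, loss in enumerate(history) if stopper.should_stop(loss)), None)
print(stopped_at)  # 3
```

In a detection setting the monitored value could just as well be validation AP, with the comparison reversed.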

Res101 demo

After the latest ResNet accuracy improvements, do you plan to adapt the demo for the res101 model?

generate_anchors_pre() argument mismatch

While running the demo, I hit the following issue with generate_anchors_pre(). I am using TensorFlow 1.0.1.

Demo for data/demo/000456.jpg
Traceback (most recent call last):
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
ret = func(*args)
TypeError: generate_anchors_pre() takes exactly 5 arguments (4 given)
W tensorflow/core/framework/op_kernel.cc:993] Internal: Failed to run py callback pyfunc_0: see error log.
Traceback (most recent call last):
File "./tools/demo_depre.py", line 152, in <module>
demo(sess, net, im_name)
File "./tools/demo_depre.py", line 83, in demo
scores, boxes = im_detect(sess, net, im)
File "/home/aakhochare/VisualAlgorithms/tf-faster-rcnn/tools/../lib/model/test_vgg16.py", line 90, in im_detect
_, scores, bbox_pred, rois = net.test_image(sess, blobs['data'], blobs['im_info'])
File "/home/aakhochare/VisualAlgorithms/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 532, in test_image
feed_dict=feed_dict)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: vgg16_default/ANCHOR_default/generate_anchors = PyFunc[Tin=[DT_INT32, DT_INT32, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_INT32], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](vgg16_default/ANCHOR_default/ToInt32, vgg16_default/ANCHOR_default/ToInt32_1, vgg16_default/ANCHOR_default/generate_anchors/input_2, vgg16_default/ANCHOR_default/generate_anchors/input_3)]]

Caused by op u'vgg16_default/ANCHOR_default/generate_anchors', defined at:
File "./tools/demo_depre.py", line 141, in <module>
tag='default', anchor_scales=[8, 16, 32])
File "/home/aakhochare/VisualAlgorithms/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 484, in create_architecture
rois, cls_prob, bbox_pred = self._vgg16_from_imagenet(sess, training)
File "/home/aakhochare/VisualAlgorithms/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 346, in _vgg16_from_imagenet
self._anchor_component()
File "/home/aakhochare/VisualAlgorithms/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 293, in _anchor_component
[tf.float32, tf.int32], name="generate_anchors")
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/aakhochare/tensorflowr10/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: vgg16_default/ANCHOR_default/generate_anchors = PyFunc[Tin=[DT_INT32, DT_INT32, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_INT32], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](vgg16_default/ANCHOR_default/ToInt32, vgg16_default/ANCHOR_default/ToInt32_1, vgg16_default/ANCHOR_default/generate_anchors/input_2, vgg16_default/ANCHOR_default/generate_anchors/input_3)]]
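
The log shows the py_func node feeding four inputs (Tin has four entries) into a function that requires five, which would be consistent with the deprecated demo and network files calling a newer generate_anchors_pre(). The sketch below reproduces that kind of mismatch with a stand-in function and shows one way to adapt the call. The parameter names, in particular `anchor_ratios` as the fifth argument, are assumptions and not taken from the repository.

```python
import functools


def generate_anchors_stub(height, width, feat_stride, anchor_scales, anchor_ratios):
    """Stand-in with five required parameters; names are assumptions."""
    # Number of anchors over a height x width feature map.
    return len(anchor_scales) * len(anchor_ratios) * height * width


# Calling with four arguments raises the same kind of TypeError as in the log.
try:
    generate_anchors_stub(38, 50, 16, (8, 16, 32))
    error = None
except TypeError as exc:
    error = str(exc)

# Binding the fifth argument yields a callable that accepts four arguments.
four_arg = functools.partial(generate_anchors_stub, anchor_ratios=(0.5, 1, 2))
total = four_arg(38, 50, 16, (8, 16, 32))
print(total)  # 17100
```

Keeping the demo script and the library code on the same revision is presumably the cleaner fix.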

ValueError: Dimension 0 in both shapes must be equal, but are 2 and 1

I was going to train on my own data, but I got this reshape error when calling the function below,
which is in lib/nets/vgg16.py:

rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, "rpn_cls_score_reshape")

def _reshape_layer(self, bottom, num_dim, name):
    input_shape = tf.shape(bottom)
    with tf.variable_scope(name) as scope:
        # change the channel to the caffe format
        to_caffe = tf.transpose(bottom, [0,3,1,2])
        # then force it to have channel 2
        reshaped = tf.reshape(to_caffe, tf.concat(0, [[self._batch_size], [num_dim, -1], [input_shape[2]]]))
        # then swap the channel back
        to_tf = tf.transpose(reshaped, [0,2,3,1])
        return to_tf

Something goes wrong with the dimensions in the tf.concat call on this line:

reshaped = tf.reshape(to_caffe, tf.concat(0, [[self._batch_size], [num_dim, -1], [input_shape[2]]]))

I'm not familiar with it. Do you have any idea how to solve this problem?
Below is the traceback:

Traceback (most recent call last):
  File "./tools/trainval_vgg16_net.py", line 125, in <module>
    max_iters=args.max_iters)
  File "/tmp/caffe_update/tf-faster-rcnn/tools/../lib/model/train_val.py", line 323, in train_net
    sw.train_model(sess, max_iters)
  File "/tmp/caffe_update/tf-faster-rcnn/tools/../lib/model/train_val.py", line 110, in train_model
    tag='default', anchor_scales=anchors)
  File "/tmp/caffe_update/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 499, in create_architecture
    rois, cls_prob, bbox_pred = self._vgg16_from_imagenet(sess, training)
  File "/tmp/caffe_update/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 352, in _vgg16_from_imagenet
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, "rpn_cls_score_reshape")
  File "/tmp/caffe_update/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 182, in _reshape_layer
    reshaped = tf.reshape(to_caffe, tf.concat(0, [[self._batch_size], [num_dim, -1], [input_shape[2]]]))
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1047, in concat
    dtype=dtypes.int32).get_shape(
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 651, in convert_to_tensor
    as_ref=False)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 716, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 923, in _autopacking_conversion_function
    return _autopacking_helper(v, inferred_dtype, name or "packed")
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 886, in _autopacking_helper
    return gen_array_ops._pack(elems_as_tensors, name=scope)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2041, in _pack
    result = _op_def_lib.apply_op("Pack", values=values, axis=axis, name=name)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2397, in create_op
    set_shapes_for_outputs(ret)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1757, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1707, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/home/.local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension 0 in both shapes must be equal, but are 2 and 1
         From merging shape 1 with other shapes. for 'vgg16_default/rpn_cls_score_reshape/concat/concat_dim' (op: 'Pack') with input shapes: [1], [2], [1].
Command exited with non-zero status 1
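
A possible explanation, offered as a sketch rather than a confirmed diagnosis: TensorFlow 1.0 changed the signature of tf.concat from (concat_dim, values) to (values, axis). With the old positional order, the list of shape pieces ends up in the axis slot and TensorFlow tries to pack [1], [2] and [1] into one tensor, which matches the "Pack" error above. The pure-Python snippet below only shows the 4-element shape vector the reshape is meant to receive; the helper name and the width value 38 are made up for illustration.

```python
from itertools import chain


def caffe_reshape_target(batch_size, num_dim, width):
    """Return the shape pieces and the flat 4-element target shape.

    Hypothetical helper: it mirrors the list passed to tf.concat in
    _reshape_layer, but uses plain Python lists instead of tensors.
    """
    pieces = [[batch_size], [num_dim, -1], [width]]
    # Concatenating along axis 0 just joins the pieces end to end.
    flat = list(chain.from_iterable(pieces))
    return pieces, flat


pieces, flat = caffe_reshape_target(batch_size=1, num_dim=2, width=38)
print(flat)  # [1, 2, -1, 38]

# With TensorFlow >= 1.0, the same pieces would be joined with keyword
# arguments, which avoids depending on the positional order:
#   tf.concat(values=pieces, axis=0)
```

If this is indeed the cause, the call in lib/nets/vgg16.py would need its two arguments swapped (or passed by keyword) to run on TensorFlow 1.0 or later.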

online hard example mining?

Very solid work!!!

Have you ever tried OHEM?
I've tried it here, but it did not help at all.

Typo in README.md

Please correct 'python-opencv' dependency to 'opencv-python' in the Prerequisite section.

Reason:

Running
pip install python-opencv
fails with:
No matching distribution found for python-opencv

while
pip install opencv-python
succeeds:
Installing collected packages: numpy, opencv-python
Successfully installed numpy-1.12.1 opencv-python-3.2.0.7

Does this code support multi-gpu training?

InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight

I followed your instructions and got this error. Could you suggest a solution?

mona@pascal:~/computer_vision/tf-faster-rcnn$ GPU_ID=0
mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE=50000
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_vgg16_net.py --weight data/imagenet_weights/vgg16.weights --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --set TRAIN.STEPSIZE 50000
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, set_cfgs=['TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'vgg16',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'crop',
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
           'STEPSIZE': 50000,
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0005},
 'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to `/home/mona/computer_vision/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default`
TensorFlow summaries will be saved to `/home/mona/computer_vision/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default`
Loaded dataset `voc_2007_test` for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.85GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Solving...
Loading caffe weights...
Done!
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading initial model weights from data/imagenet_weights/vgg16.weights
Loaded.
iter: 20 / 70000, total loss: 0.443026
 >>> rpn_loss_cls: 0.345992
 >>> rpn_loss_box: 0.097034
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.749s / iter
iter: 40 / 70000, total loss: 0.516920
 >>> rpn_loss_cls: 0.399234
 >>> rpn_loss_box: 0.117686
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.760s / iter
iter: 60 / 70000, total loss: 0.393830
 >>> rpn_loss_cls: 0.353334
 >>> rpn_loss_box: 0.040496
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.668s / iter
iter: 80 / 70000, total loss: 0.217178
 >>> rpn_loss_cls: 0.146591
 >>> rpn_loss_box: 0.070533
 >>> loss_cls: 0.000053
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.536s / iter
iter: 100 / 70000, total loss: 0.390607
 >>> rpn_loss_cls: 0.277706
 >>> rpn_loss_box: 0.030601
 >>> loss_cls: 0.075361
 >>> loss_box: 0.006940
 >>> lr: 0.001000
speed: 1.495s / iter
iter: 120 / 70000, total loss: 0.882707
 >>> rpn_loss_cls: 0.566185
 >>> rpn_loss_box: 0.227990
 >>> loss_cls: 0.083081
 >>> loss_box: 0.005452
 >>> lr: 0.001000
speed: 1.570s / iter
iter: 140 / 70000, total loss: 0.223789
 >>> rpn_loss_cls: 0.113045
 >>> rpn_loss_box: 0.049687
 >>> loss_cls: 0.052417
 >>> loss_box: 0.008640
 >>> lr: 0.001000
speed: 1.510s / iter
iter: 160 / 70000, total loss: 0.219555
 >>> rpn_loss_cls: 0.187197
 >>> rpn_loss_box: 0.032358
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.494s / iter
iter: 180 / 70000, total loss: 2.256282
 >>> rpn_loss_cls: 1.965876
 >>> rpn_loss_box: 0.290406
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.475s / iter
iter: 200 / 70000, total loss: 1.727870
 >>> rpn_loss_cls: 1.226427
 >>> rpn_loss_box: 0.501443
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.463s / iter
iter: 220 / 70000, total loss: 0.353863
 >>> rpn_loss_cls: 0.298823
 >>> rpn_loss_box: 0.055040
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.461s / iter
iter: 240 / 70000, total loss: 0.147688
 >>> rpn_loss_cls: 0.039554
 >>> rpn_loss_box: 0.108122
 >>> loss_cls: 0.000012
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.450s / iter
iter: 260 / 70000, total loss: 0.485889
 >>> rpn_loss_cls: 0.416970
 >>> rpn_loss_box: 0.068911
 >>> loss_cls: 0.000009
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.428s / iter
iter: 280 / 70000, total loss: 0.153297
 >>> rpn_loss_cls: 0.108915
 >>> rpn_loss_box: 0.044243
 >>> loss_cls: 0.000139
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.440s / iter
iter: 300 / 70000, total loss: 0.374053
 >>> rpn_loss_cls: 0.310106
 >>> rpn_loss_box: 0.063945
 >>> loss_cls: 0.000001
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.397s / iter
iter: 320 / 70000, total loss: 1.169239
 >>> rpn_loss_cls: 1.099040
 >>> rpn_loss_box: 0.070199
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.385s / iter
iter: 340 / 70000, total loss: 0.243177
 >>> rpn_loss_cls: 0.193078
 >>> rpn_loss_box: 0.049057
 >>> loss_cls: 0.001042
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.370s / iter
iter: 360 / 70000, total loss: 0.387752
 >>> rpn_loss_cls: 0.375503
 >>> rpn_loss_box: 0.012084
 >>> loss_cls: 0.000166
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.353s / iter
iter: 380 / 70000, total loss: 0.494936
 >>> rpn_loss_cls: 0.312221
 >>> rpn_loss_box: 0.045870
 >>> loss_cls: 0.136845
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.336s / iter
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
  pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
  pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:55: RuntimeWarning: invalid value encountered in subtract
  pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
iter: 400 / 70000, total loss: nan
 >>> rpn_loss_cls: nan
 >>> rpn_loss_box: nan
 >>> loss_cls: 3.037189
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.321s / iter
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Traceback (most recent call last):
  File "./tools/trainval_vgg16_net.py", line 117, in <module>
    max_iters=args.max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
    sw.train_model(sess, max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 197, in train_model
    self.net.train_step_with_summary(sess, blobs, train_op)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 561, in train_step_with_summary
    feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]

Caused by op u'TRAIN/vgg16_default/conv3_1/weight', defined at:
  File "./tools/trainval_vgg16_net.py", line 117, in <module>
    max_iters=args.max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
    sw.train_model(sess, max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 91, in train_model
    tag='default', anchor_scales=anchors)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 507, in create_architecture
    self._add_train_summary(var)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 48, in _add_train_summary
    tf.summary.histogram('TRAIN/' + var.op.name, var)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
    tag=scope.rstrip('/'), values=values, name=scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]

E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:652] Deallocating stream with pending work
Command exited with non-zero status 1
435.97user 110.56system 9:22.01elapsed 97%CPU (0avgtext+0avgdata 2976644maxresident)k
60224inputs+2752outputs (4major+2126190minor)pagefaults 0swaps
mona@pascal:~/computer_vision/tf-faster-rcnn$
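
In the log above, the first sign of trouble is the overflow in np.exp(dw) at bbox_transform.py line 48, just before the losses turn into nan. One mitigation that is sometimes used, and which may or may not address the root cause here, is to cap the predicted log-scale deltas before exponentiating them. The sketch below uses only the standard library; the function name and the cap log(1000 / 16) are assumptions borrowed from other detection codebases, not values defined in this repository.

```python
import math

# Assumed cap on the log-scale delta. log(1000 / 16) is a value seen in
# other detection codebases; it is not taken from this repository.
MAX_DELTA = math.log(1000.0 / 16.0)


def decode_size(delta, anchor_size, max_delta=MAX_DELTA):
    """Decode a predicted log-scale delta into a box width or height.

    The delta is clipped from above so that exp() cannot overflow.
    """
    clipped = min(delta, max_delta)
    return math.exp(clipped) * anchor_size


print(decode_size(0.0, 16.0))     # 16.0
print(decode_size(1000.0, 16.0))  # finite, close to 1000 instead of overflowing
```

Clipping only keeps the decoded boxes finite. If the weights themselves already contain nan, as the histogram error suggests at iteration 400, a lower TRAIN.LEARNING_RATE or a check of the annotations may still be needed.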

Could you share your experience to train VGG which has ap=28.3 on coco_2014_minival?

Could you share your experience training the VGG model that reaches AP = 28.3 on coco_2014_minival? Thanks!
1. "For coco, we find the performance improving with more iterations (790k), and potentially better performance can be achieved with even more iterations." What learning rate was used?

Why does (RPN's result reshape to [-1,4]) work?

https://github.com/endernewton/tf-faster-rcnn/blob/master/lib/layer_utils/proposal_layer.py#L30

Since rpn_bbox_pred is just the output of a convolution, this confuses me.
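
A sketch of why the reshape can work, assuming the conv output is laid out channels-last as (1, H, W, A*4), which is TensorFlow's default NHWC layout: the last axis holds the 4 deltas of each of the A anchors back to back, so a row-major reshape to (-1, 4) cuts that axis into one row per anchor. The snippet emulates this with nested lists instead of tensors; both helper names are made up for illustration.

```python
def make_conv_output(height, width, num_anchors):
    """Build a fake (1, H, W, A*4) blob; each entry records where it came from."""
    return [[[[(h, w, a, k) for a in range(num_anchors) for k in range(4)]
              for w in range(width)]
             for h in range(height)]]


def reshape_to_rows_of_four(blob):
    """Row-major flatten, then chunk into rows of 4, like reshape((-1, 4))."""
    flat = [v for image in blob for row in image for cell in row for v in cell]
    return [flat[i:i + 4] for i in range(0, len(flat), 4)]


rows = reshape_to_rows_of_four(make_conv_output(height=2, width=3, num_anchors=9))
print(len(rows))  # 54

# Every row holds the 4 deltas (k = 0..3) of exactly one (h, w, anchor) triple.
for row in rows:
    assert len({entry[:3] for entry in row}) == 1
    assert [entry[3] for entry in row] == [0, 1, 2, 3]
```

The rows come out ordered by (h, w, anchor), so the reshape only pairs deltas with the right anchors if the anchors are enumerated in that same order.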

no CUDA-capable device is detected

When executing the command

CUDA_VISIBLE_DEVICES=$0 ./tools/demo.py

I get "no CUDA-capable device is detected". However, I have a TITAN X on Ubuntu 16.04 with CUDA installed, and nvidia-smi reports device ID 0. What could be the problem here?
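
One thing worth checking, as a guess rather than a confirmed diagnosis: in a shell, $0 expands to the name of the shell or script (for example "bash"), not to the number 0, so the command above may be setting CUDA_VISIBLE_DEVICES to a non-numeric value that hides every GPU. The sketch below is a hypothetical validator for integer device lists; it deliberately ignores GPU UUIDs, which CUDA also accepts.

```python
def parse_visible_devices(value):
    """Return the GPU indices in a CUDA_VISIBLE_DEVICES-style string.

    Raises ValueError for anything that is not a comma-separated list of
    non-negative integers. GPU UUIDs are not handled in this sketch.
    """
    indices = []
    for part in value.split(","):
        part = part.strip()
        if not part.isdigit():
            raise ValueError("not a GPU index: %r" % part)
        indices.append(int(part))
    return indices


print(parse_visible_devices("0"))  # [0]

try:
    # "bash" stands in for what $0 typically expands to in an interactive shell.
    parse_visible_devices("bash")
except ValueError as err:
    print(err)
```

If that is the issue, writing the index literally (CUDA_VISIBLE_DEVICES=0 ./tools/demo.py) or setting GPU_ID=0 first and using ${GPU_ID} should make the device visible again.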

stuck at "Filtered 0 roidb entries: 4952 -> 4952 ..." using tensorflow 1.0

With the TensorFlow 1.0 version, the training call gets stuck forever at the point shown below. Do you know what the problem could be?

$ ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ NET=vgg16
+ array=($@)
+ len=3
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE=50000
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-03-07_14-02-43
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-03-07_14-02-43
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-03-07_14-02-43: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-03-07_14-02-43
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-03-07_14-02-43
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_net.py --weight data/imagenet_weights/vgg16.ckpt --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --net vgg16 --set TRAIN.STEPSIZE 50000
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

** (trainval_net.py:14625): WARNING **: Couldn't connect to accessibility bus: Failed to connect to socket /tmp/dbus-khoZhY13uf: Connection refused
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, net='vgg16', set_cfgs=['TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.ckpt')
Using config:
{'DATA_DIR': '/home/wenxi/repos/tf-faster-rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'vgg16',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'crop',
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/wenxi/repos/tf-faster-rcnn',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
           'STEPSIZE': 50000,
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_ALL_GT': True,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0005},
 'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/wenxi/repos/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to `/home/wenxi/repos/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default`
TensorFlow summaries will be saved to `/home/wenxi/repos/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default`
Loaded dataset `voc_2007_test` for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/wenxi/repos/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.

Thanks!

vgg weights file

Do the training and testing processes use different VGG weights files?
From the source code, training loads a vgg16.ckpt file, but demo.py needs a vgg16 weights file from the Caffe model.
Why not use the ckpt weights directly in demo.py?

Does this code support multi-gpu training?

Hi, does this code support multi-gpu training?

vgg16.weights?

I am trying to test out the demo code using
./experiments/scripts/test_vgg16.sh 0 pascal_voc

I followed the tutorial but only have a vgg16.ckpt file rather than a .weights file.

Error when running the demo via ./tools/demo_depre.py

Hi, I followed the installation guide to install Faster R-CNN on Ubuntu 16.04. I use a GTX 1080 and did not change the arch from sm_52 to sm_61. On line 49 of setup.py I replaced "pjoin(home, 'lib64')}" with "pjoin(home, 'lib')}", so there is no error when I run the make command. However, because I installed TensorFlow with Python 3, I am not sure whether other changes are needed.
The make completes. However, when I run step 3 in "Test with (old) pre-trained models (Demo Included)":
GPU_ID=1
CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py
after making some changes from Python 2 to Python 3, I get the following error:

(tensorflow) fei@fei:~/tf-faster-rcnn$ CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo_depre.py

Traceback (most recent call last):
  File "./tools/demo_depre.py", line 20, in <module>
    from model.test_vgg16 import im_detect
  File "/home/fei/tf-faster-rcnn/tools/../lib/model/test_vgg16.py", line 15, in <module>
    from utils.cython_nms import nms, nms_new
ImportError: /home/fei/tf-faster-rcnn/tools/../lib/utils/cython_nms.so: undefined symbol: _Py_ZeroStruct

How should I solve this problem? I have tried many suggestions found on the Internet.
Thanks a lot if anyone can give me a hint.
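
A note on the symbol in the traceback: _Py_ZeroStruct is exported by the Python 2 runtime (it backs Py_False there) and does not exist in Python 3, so this error usually means cython_nms.so was compiled against Python 2 headers and is now being imported by Python 3. The sketch below only reports which interpreter is running and which filename suffix its extension modules are expected to carry; rebuilding the extensions with that same interpreter is the usual remedy, though whether that is sufficient for this setup is an assumption.

```python
import sys
import sysconfig


def extension_build_info():
    """Report the running interpreter and the suffix it expects for C extensions."""
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # On Python 3 this typically carries an ABI tag (e.g. a 'cpython-...'
        # part); a bare '.so' built by another interpreter may not be compatible.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


if __name__ == "__main__":
    for key, value in extension_build_info().items():
        print("%-10s %s" % (key, value))
```

Running this with the same python that launches ./tools/demo_depre.py shows which interpreter the extension has to be built for.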

Is there a quick way to use distributed tensorflow instead of GPU?

Is it possible to run the code in a multi-core platform where there is no GPU? I can see that the setup.py requires locating CUDA.

Lower accuracy than expected on VOC 2007 model & data

Thank you for providing your code. I have installed and run the test provided but unfortunately I am seeing lower accuracy on the VOC 2007 benchmark than I expected.

On the Readme I see that the model achieves 71.2 but when I run ./experiments/scripts/test_vgg16.sh 0 pascal_voc with VOC 2007 data and your model I see a result of Mean AP = 0.4955. If I am right this is meant to be interpreted as an mAP of 49.55. Should I be using a different testing script or different model than the one downloaded by ./data/scripts/fetch_faster_rcnn_models.sh ? Here are the full results:

AP for aeroplane = 0.5898
AP for bicycle = 0.5308
AP for bird = 0.4317
AP for boat = 0.3876
AP for bottle = 0.2347
AP for bus = 0.6052
AP for car = 0.5414
AP for cat = 0.6908
AP for chair = 0.2789
AP for cow = 0.5222
AP for diningtable = 0.5555
AP for dog = 0.6149
AP for horse = 0.7065
AP for motorbike = 0.5160
AP for person = 0.4421
AP for pottedplant = 0.2304
AP for sheep = 0.4441
AP for sofa = 0.5538
AP for train = 0.6770
AP for tvmonitor = 0.3559
Mean AP = 0.4955
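
As a check on the interpretation, the reported Mean AP is the unweighted mean of the 20 per-class APs, so 0.4955 does correspond to 49.55 on the 0-100 scale used in the Readme. A minimal sketch using only the values from the output above:

```python
# Per-class APs copied from the test output above.
ap = {
    "aeroplane": 0.5898, "bicycle": 0.5308, "bird": 0.4317, "boat": 0.3876,
    "bottle": 0.2347, "bus": 0.6052, "car": 0.5414, "cat": 0.6908,
    "chair": 0.2789, "cow": 0.5222, "diningtable": 0.5555, "dog": 0.6149,
    "horse": 0.7065, "motorbike": 0.5160, "person": 0.4421,
    "pottedplant": 0.2304, "sheep": 0.4441, "sofa": 0.5538,
    "train": 0.6770, "tvmonitor": 0.3559,
}

# Unweighted mean over classes.
mean_ap = sum(ap.values()) / len(ap)
print("Mean AP = %.4f (%.2f on the 0-100 scale)" % (mean_ap, 100 * mean_ap))
```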

GPU easydict

We still consume an easydict for GPU at https://github.com/endernewton/tf-faster-rcnn/blob/master/lib/model/nms_wrapper.py#L21

ImportError: No module named pycocotools.coco

When I run the test and train steps, the terminal shows an ImportError for this line:
from pycocotools.coco import COCO
I cannot find a folder or file named pycocotools. Can you give me some suggestions? Thank you so much for your help; I hope to hear from you.

Running without CUDA (CPU only)

Hi,

I am just wondering if there is a way of running this model on a machine without a GPU? Running make attempts to find CUDA.

Thanks!

bug @ demo.py

~/DL/tf-faster-rcnn$ CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py

Traceback (most recent call last):
  File "./tools/demo.py", line 118, in <module>
    dataset = dataset
NameError: name 'dataset' is not defined

About OHEM

Do you plan to add OHEM?

tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 392.00MiB. See logs for memory state.

Can you please help me fix this? I don't have another TensorFlow process running or anything else using the GPU:

mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/test_vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-13_21-19-04
+ exec
++ tee -a experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-13_21-19-04
tee: experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-13_21-19-04: No such file or directory
+ echo Logging output to experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-13_21-19-04
Logging output to experiments/logs/test_vgg16_voc_2007_trainval_.txt.2017-02-13_21-19-04
+ set +x
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/test_vgg16_net.py --imdb voc_2007_test --weight data/imagenet_weights/vgg16.weights --model output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt --cfg experiments/cfgs/vgg16.yml --set
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so.5.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so.8.0 locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', comp_mode=False, imdb_name='voc_2007_test', max_per_image=100, model='output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt', set_cfgs=[], tag='', weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'vgg16',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'crop',
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
           'STEPSIZE': 30000,
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0005},
 'USE_GPU_NMS': True}
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties: 
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.8755
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 572.75MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Loading caffe weights...
Done!
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (256): 	Total Chunks: 1, Chunks in use: 0 256B allocated for chunks. 256B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (512): 	Total Chunks: 1, Chunks in use: 0 512B allocated for chunks. 512B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1024): 	Total Chunks: 1, Chunks in use: 0 1.0KiB allocated for chunks. 1.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2048): 	Total Chunks: 1, Chunks in use: 0 2.0KiB allocated for chunks. 2.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4096): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8192): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16384): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (32768): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (65536): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (131072): 	Total Chunks: 1, Chunks in use: 0 130.0KiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (262144): 	Total Chunks: 1, Chunks in use: 0 288.0KiB allocated for chunks. 288.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (524288): 	Total Chunks: 2, Chunks in use: 0 1.44MiB allocated for chunks. 576.0KiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (1048576): 	Total Chunks: 2, Chunks in use: 0 2.88MiB allocated for chunks. 1.12MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (2097152): 	Total Chunks: 1, Chunks in use: 0 2.25MiB allocated for chunks. 2.25MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (4194304): 	Total Chunks: 2, Chunks in use: 0 11.50MiB allocated for chunks. 4.50MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (8388608): 	Total Chunks: 1, Chunks in use: 0 9.00MiB allocated for chunks. 9.00MiB client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (33554432): 	Total Chunks: 1, Chunks in use: 0 37.00MiB allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:643] Bin (268435456): 	Total Chunks: 0, Chunks in use: 0 0B allocated for chunks. 0B client-requested for chunks. 0B in use in bin. 0B client-requested in use in bin.
I tensorflow/core/common_runtime/bfc_allocator.cc:660] Bin for 392.00MiB was 256.00MiB, Chunk State: 
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba0000 of size 1280
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba0600 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba0700 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba0a00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba0c00 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba1200 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba1600 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba1a00 of size 1536
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba2000 of size 6912
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba4300 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba4b00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba5300 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba5b00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba6300 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305ba6b00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305bc7b00 of size 147456
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1305c33b00 of size 443648
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305ba0500 of size 256
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305ba0800 of size 512
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305ba0e00 of size 1024
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305ba3b00 of size 2048
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305ba7300 of size 133120
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1305bebb00 of size 294912
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1306930000 of size 589824
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x13068a0000 of size 589824
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x13069c0000 of size 917504
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1306bc0000 of size 1179648
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1306aa0000 of size 1179648
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1306ce0000 of size 1835008
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13070e0000 of size 2359296
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1307320000 of size 3670016
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1306ea0000 of size 2359296
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1307b20000 of size 4718592
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x13076a0000 of size 4718592
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x1307fa0000 of size 7340032
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x1308fa0000 of size 9437184
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x13098a0000 of size 14680064
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x13086a0000 of size 9437184
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130a6a0000 of size 9437184
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130afa0000 of size 9437184
I tensorflow/core/common_runtime/bfc_allocator.cc:678] Chunk at 0x130b8a0000 of size 9437184
I tensorflow/core/common_runtime/bfc_allocator.cc:687] Free at 0x130c1a0000 of size 38797312
I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size: 
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 256 totalling 512B
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 512 totalling 1.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 2 Chunks of size 1024 totalling 2.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1536 totalling 1.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 6 Chunks of size 2048 totalling 12.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 6912 totalling 6.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 147456 totalling 144.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 443648 totalling 433.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 589824 totalling 576.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1179648 totalling 1.12MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 2359296 totalling 2.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 3670016 totalling 3.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 4718592 totalling 4.50MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 9437184 totalling 36.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 14680064 totalling 14.00MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 62.53MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats: 
Limit:                   390856704
InUse:                    65562880
MaxInUse:                 74998016
NumAllocs:                      53
MaxAllocSize:             14680064

W tensorflow/core/common_runtime/bfc_allocator.cc:274] **_**__****x___****____________***************xxx**********************_____________________________
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 392.00MiB.  See logs for memory state.
Traceback (most recent call last):
  File "./tools/test_vgg16_net.py", line 89, in <module>
    tag='default', anchor_scales=anchors)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 481, in create_architecture
    rois, cls_prob, bbox_pred = self._vgg16_from_imagenet(sess, training)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 375, in _vgg16_from_imagenet
    fc6 = self._fc_layer(sess, pool5, "fc6", train)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 129, in _fc_layer
    weight = self._get_fc_weight(sess, name, trainable=trainable)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 104, in _get_fc_weight
    sess.run(weight.initializer, feed_dict={phcw: cw})
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 717, in run
    run_metadata_ptr)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 915, in _run
    feed_dict_string, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 965, in _do_run
    target_list, options, run_metadata)
  File "/home/mona/tensorflow/_python_build/tensorflow/python/client/session.py", line 985, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InternalError: Dst tensor is not initialized.
	 [[Node: _recv_vgg16_default/fc6/Placeholder_0/_53 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4__recv_vgg16_default/fc6/Placeholder_0", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Command exited with non-zero status 1
5.24user 4.06system 0:16.86elapsed 55%CPU (0avgtext+0avgdata 2066436maxresident)k
0inputs+32outputs (0major+320384minor)pagefaults 0swaps
mona@pascal:~/computer_vision/tf-faster-rcnn$ ps
  PID TTY          TIME CMD
21495 pts/15   00:00:01 bash
24534 pts/15   00:00:00 ps
mona@pascal:~/computer_vision/tf-faster-rcnn$
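
Reading the numbers in the log above: the K40c reports 11.92GiB total but only 572.75MiB free at start-up, and the allocator's Limit of 390856704 bytes is smaller than the single 392.00MiB allocation requested for fc6. A short check of that arithmetic, using only values copied from the log:

```python
MIB = 1024.0 ** 2
GIB = 1024.0 ** 3

total_bytes = 11.92 * GIB      # "Total memory: 11.92GiB"
free_bytes = 572.75 * MIB      # "Free memory: 572.75MiB"
limit_bytes = 390856704        # "Limit:" in the allocator stats
request_bytes = 392.00 * MIB   # "Ran out of memory trying to allocate 392.00MiB"

limit_mib = limit_bytes / MIB
free_fraction = free_bytes / total_bytes

print("allocator limit: %.2f MiB" % limit_mib)
print("free at start-up: %.1f%% of the card" % (100 * free_fraction))
print("request fits under limit:", request_bytes <= limit_bytes)
```

ps with no arguments lists only processes attached to the current terminal, so the ps output above does not rule out another process holding GPU memory; nvidia-smi reports per-GPU memory use.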

distributed version

Is there a distributed version?

Lower AP on voc2007.

I trained the model on VOC 2007, but the test mAP is only 0.693, compared with your result of 71.2.
With ResNet-101, my result is 0.7394, compared with your result of 0.75.
Is this just random variation?
Thanks!

Can I change the anchor size?

How can I change the anchor size of the RPN to 16^2, 40^2, 100^2 instead of 128^2, 256^2, 512^2? Also, how can I attach the anchor boxes to the feature maps from conv4_3 instead of conv5_3?
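
For the first question, a sketch of the arithmetic, assuming the usual Faster R-CNN convention (not confirmed for this repository's configuration) that an anchor's side length is the base size times the scale, with the base size equal to the feature stride: 16 for VGG16 conv5_3, 8 for conv4_3. Under that assumption the default sides 128, 256, 512 correspond to scales 8, 16, 32, and the requested sides follow from the same division. scales_for_sides is a hypothetical helper.

```python
def scales_for_sides(sides, feature_stride):
    """Scales such that feature_stride * scale equals each requested anchor side."""
    return [side / float(feature_stride) for side in sides]


# Defaults at conv5_3 (stride 16): sides 128, 256, 512.
print(scales_for_sides([128, 256, 512], 16))
# Requested sides at conv5_3 (stride 16).
print(scales_for_sides([16, 40, 100], 16))
# Same sides if the anchors were attached to conv4_3 (stride 8).
print(scales_for_sides([16, 40, 100], 8))
```

The log excerpts on this page show the test script passing anchor_scales=anchors to create_architecture, which suggests the scales enter there; moving the RPN to conv4_3 would additionally require changing the network definition and the feature stride.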

Train with fewer classes?

Hi! I tried to train it with fewer classes, let's say 18 classes, without 'sofa', 'train', and 'tvmonitor'. This is what I changed:

pascal_voc.py: I deleted some classes:

    self._classes = ('background', # always index 0
                     'aeroplane', 'bicycle', 'bird', 'boat',
                     'bottle', 'bus', 'car', 'cat', 'chair',
                     'cow', 'diningtable', 'dog', 'horse',
                     'motorbike', 'person', 'pottedplant',
                     'sheep')#, 'sofa', 'tvmonitor')#, 'train')

proposal_target_layer.py: I added the following in the _get_bbox_regression_labels(bbox_target_data, num_classes) function:

    for ind in inds:
        cls = clss[ind]
        if cls >= num_classes:
            cls = 0

        start = int(4 * cls)
        end = start + 4

At some point, the cross_entropy values all become [nan, nan, nan, ..., nan], the cls_scores are [nan, nan, nan, ..., nan], and the fc7 layer outputs [0, 0, 0, 0, ...].
Has anyone encountered similar problems, or does anyone have ideas about it?
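
For reference, a sketch of the indexing that the added lines touch, under the assumption that the targets array has 4 * num_classes columns, with the four regression targets of class c in columns 4c to 4c+3. With the 18-entry tuple above (background plus 17 object classes), valid labels are 0 to 17. A label of 18 or more, for example one left over in a roidb cache built with the original 21 classes, would index past the end, and remapping it to 0 stores a foreground box in the background slot rather than dropping it. target_columns is a hypothetical helper, and whether this is related to the NaN values is not established.

```python
def target_columns(cls, num_classes):
    """Column range [start, end) that holds the 4 box targets of class cls."""
    if not 0 <= cls < num_classes:
        raise IndexError("class %d is outside 0..%d" % (cls, num_classes - 1))
    start = int(4 * cls)
    return start, start + 4


NUM_CLASSES = 18  # background + 17 object classes, as in the tuple above

print(target_columns(17, NUM_CLASSES))  # last valid class ('sheep')
print(target_columns(0, NUM_CLASSES))   # background slot, where remapped labels land
```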

How to fine-tune?

I don't know how to fine-tune. I changed pascal_voc.py, but I got an error when running ./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16

Loaded dataset voc_2007_trainval for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/weiguang/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
Traceback (most recent call last):
  File "./tools/trainval_net.py", line 105, in <module>
    imdb, roidb = combined_roidb(args.imdb_name)
  File "./tools/trainval_net.py", line 76, in combined_roidb
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
  File "./tools/trainval_net.py", line 73, in get_roidb
    roidb = get_training_roidb(imdb)
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/model/train_val.py", line 303, in get_training_roidb
    imdb.append_flipped_images()
  File "/home/weiguang/tf-faster-rcnn/tools/../lib/datasets/imdb.py", line 120, in append_flipped_images
    assert (boxes[:, 2] >= boxes[:, 0]).all()
AssertionError
Command exited with non-zero status 1
1.81user 0.51system 0:02.35elapsed 98%CPU (0avgtext+0avgdata 303112maxresident)k
168inputs+56outputs (0major+56787minor)pagefaults 0swaps
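
The failing assertion checks that x2 >= x1 still holds after horizontal flipping. A sketch of one way it can fail, assuming the usual Faster R-CNN flip (new x1 = width - old x2 - 1, new x2 = width - old x1 - 1) and boxes stored as unsigned 16-bit integers; both are assumptions, since the post shows only the assertion line. If an annotation's x2 is not strictly less than the image width, the new x1 becomes negative and wraps around to a huge value. flip_box_x is a hypothetical helper that imitates the wraparound with a modulo.

```python
UINT16_RANGE = 1 << 16  # number of values an unsigned 16-bit integer can hold


def flip_box_x(x1, x2, width):
    """Mirror the x-extent of a box, wrapping results like uint16 storage would."""
    new_x1 = (width - x2 - 1) % UINT16_RANGE
    new_x2 = (width - x1 - 1) % UINT16_RANGE
    return new_x1, new_x2


good = flip_box_x(10, 99, 500)   # box fully inside a 500-pixel-wide image
bad = flip_box_x(10, 500, 500)   # x2 equal to the width, i.e. one past the last column

for name, (fx1, fx2) in (("good", good), ("bad", bad)):
    print(name, (fx1, fx2), "assertion holds:", fx2 >= fx1)
```

Checking custom annotations for x1 <= x2 < width, and deleting the cached roidb after changing them (the log above shows it being loaded from data/cache), is a reasonable first step.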

input shapes not equal?

I am trying the demo code on pascal_voc
./experiments/scripts/test_vgg16.sh $GPU_ID pascal_voc

but I get:

Traceback (most recent call last):
  File "./tools/test_vgg16_net.py", line 89, in <module>
    tag='default', anchor_scales=anchors)
  File "/home/brian/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 481, in create_architecture
    rois, cls_prob, bbox_pred = self._vgg16_from_imagenet(sess, training)
  File "/home/brian/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 350, in _vgg16_from_imagenet
    rpn_cls_score_reshape = self._reshape_layer(rpn_cls_score, 2, "rpn_cls_score_reshape")
  File "/home/brian/tf-faster-rcnn/tools/../lib/nets/vgg16_depre.py", line 181, in _reshape_layer
    reshaped = tf.reshape(to_caffe, tf.concat([[self._batch_size], [num_dim, -1], [input_shape[2]]], 0))
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1075, in concat
    dtype=dtypes.int32).get_shape(
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 669, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 836, in _autopacking_conversion_function
    return _autopacking_helper(v, inferred_dtype, name or "packed")
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 799, in _autopacking_helper
    return gen_array_ops._pack(elems_as_tensors, name=scope)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1975, in _pack
    result = _op_def_lib.apply_op("Pack", values=values, axis=axis, name=name)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2242, in create_op
    set_shapes_for_outputs(ret)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1617, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1568, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/home/brian/.local/lib/python3.5/site-packages/tensorflow/python/framework/common_shapes.py", line 675, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimension 0 in both shapes must be equal, but are 2 and 1 From merging shape 1 with other shapes. for 'vgg16_default/rpn_cls_score_reshape/concat/concat_dim' (op: 'Pack') with input shapes: [1], [2], [1].

any ideas?
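
One possible reading of this error: the op that fails is named .../concat/concat_dim and is a Pack over inputs of shapes [1], [2], [1], which matches the list of values passed as the first argument at line 181. That is consistent with a TensorFlow release older than 1.0, where the signature was tf.concat(concat_dim, values), while the code uses the 1.0 order tf.concat(values, axis). The sketch below is a hypothetical helper (not part of the repository, and it does not import TensorFlow) that picks the argument order from a version string such as tf.__version__.

```python
def concat_args(tf_version, values, axis):
    """Order (values, axis) for TF >= 1.0, or (axis, values) for older releases."""
    major = int(tf_version.split(".")[0])
    if major >= 1:
        return (values, axis)
    return (axis, values)


shape_parts = [[1], [2, -1], [7]]  # made-up stand-ins for the three shape pieces

print(concat_args("1.0.1", shape_parts, 0))   # values first
print(concat_args("0.12.1", shape_parts, 0))  # concat dimension first
```

Intended use would be tf.concat(*concat_args(tf.__version__, parts, 0)); upgrading TensorFlow to the version the repository targets is probably the simpler fix.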

res101 test error

After setup, I got the following error (0712 version of the model files):
"Loading model check point from output/res101/voc_2007_trainval/default/res101_faster_rcnn_iter_70000.ckpt
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [401408,84] rhs shape= [100352,84]"

I had no issues with vgg16 previously (TF 0.12). Environment: Ubuntu 16.04, CUDA 8.0 / cuDNN 5.1, and an Nvidia 1070.
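A quick arithmetic check on the two shapes in the error, as a sketch rather than a diagnosis. Assuming the first dimension is a flattened square feature map with 2048 channels (the usual channel count of the last ResNet-101 block, which is an assumption here since the report does not state it), 100352 = 7 × 7 × 2048 and 401408 = 14 × 14 × 2048. The second dimension, 84, equals 4 × 21, which matches four box coordinates for each of the 21 VOC classes including background. The helper name `spatial_side` is hypothetical.

```python
import math


def spatial_side(flat_size, channels):
    """Return the side of a square feature map whose flattened size is
    side * side * channels, or None if no such integer side exists."""
    if flat_size % channels != 0:
        return None
    area = flat_size // channels
    side = math.isqrt(area)
    return side if side * side == area else None


CHANNELS = 2048  # assumed channel count, not taken from the report

for flat_size in (401408, 100352):
    print(flat_size, "->", spatial_side(flat_size, CHANNELS))
```

Under that assumption the two tensors differ only in the spatial size of the pooled features (14 versus 7), so it may be worth comparing the pooling-related settings used at test time with those used when the checkpoint was trained.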

Is R-FCN code included in this repository?

The README shows the performance on VOC2007 when R-FCN is adopted. But I have not found the related code in this repo.

Can't download pretrained models from ladoga.graphics.cs.cmu.edu

# ./data/scripts/fetch_faster_rcnn_models.sh
Downloading Faster R-CNN models (2G)...
--2017-03-12 07:23:49--  http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/faster_rcnn_models.tgz
Resolving ladoga.graphics.cs.cmu.edu (ladoga.graphics.cs.cmu.edu)... 128.2.220.68
Connecting to ladoga.graphics.cs.cmu.edu (ladoga.graphics.cs.cmu.edu)|128.2.220.68|:80... connected.
HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.

# ping ladoga.graphics.cs.cmu.edu
PING ladoga.graphics.cs.cmu.edu (128.2.220.68) 56(84) bytes of data.
64 bytes from ladoga.graphics.cs.cmu.edu (128.2.220.68): icmp_seq=1 ttl=45 time=149 ms
64 bytes from ladoga.graphics.cs.cmu.edu (128.2.220.68): icmp_seq=2 ttl=45 time=149 ms
64 bytes from ladoga.graphics.cs.cmu.edu (128.2.220.68): icmp_seq=3 ttl=45 time=149 ms
^C
--- ladoga.graphics.cs.cmu.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 149.578/149.782/149.998/0.478 ms

make build error

I get:

fatal error: numpy/arrayobject.h: No such file or directory

when I try to build the project as per the instructions.
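This compiler error usually means the NumPy C headers are not on the include path seen by the build. A minimal check, assuming NumPy is installed for the same Python interpreter that runs the build: `numpy.get_include()` returns the directory that should contain `numpy/arrayobject.h`. The helper name `find_numpy_header` is hypothetical.

```python
import os


def find_numpy_header():
    """Return the path to numpy/arrayobject.h, or None if it cannot be located."""
    try:
        import numpy
    except ImportError:
        # NumPy is not installed for this interpreter.
        return None
    header = os.path.join(numpy.get_include(), "numpy", "arrayobject.h")
    return header if os.path.isfile(header) else None


print(find_numpy_header() or "numpy/arrayobject.h not found for this interpreter")
```

If the header is found here but the build still fails, the build is probably running under a different Python installation from the one used for this check.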

./lib/datasets/pascal_voc.py
self._classes has been changed.

What should I do after that?
What do I need to do to change the model?
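As a rough sketch of why the class list matters for the model: in Faster R-CNN style detectors the final classification layer has one output per class (background included) and the box regression layer has four outputs per class. The 84 that appears in the res101 error above is consistent with this, since 4 × 21 = 84 for the 21 VOC classes. The function `head_sizes`, the dictionary keys, and the example class names below are illustrative assumptions, not code from this repository.

```python
def head_sizes(classes):
    """Return the output sizes implied by a class tuple that includes background."""
    num_classes = len(classes)
    return {
        "cls_score": num_classes,      # one score per class
        "bbox_pred": 4 * num_classes,  # four box coordinates per class
    }


# 21 entries stand in for the VOC classes (background plus 20 objects).
voc_like = ("__background__",) + tuple("class_%d" % i for i in range(20))

# A hypothetical custom dataset with two object classes.
custom = ("__background__", "cat", "dog")

print(head_sizes(voc_like))
print(head_sizes(custom))
```

Because these sizes change with the class list, a checkpoint trained on a different number of classes would be expected to have final layers that no longer match, so those layers would need to be retrained.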

Now, I have this error:

out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:656] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3788] stream 0x54b8220 did not memcpy host-to-device; source: 0x7fd50fe37000
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:656] failed to record completion event; therefore, failed to create inter-stream dependency
E tensorflow/stream_executor/stream.cc:272] Error recording event in stream: error recording CUDA event on stream 0x54b7400: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:656] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3788] stream 0x54b8220 did not memcpy host-to-device; source: 0x7fd50fe46000
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:656] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3788] stream 0x54b8220 did not memcpy host-to-device; source: 0x7fd50fe80000
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:656] failed to record completion event; therefore, failed to create inter-stream dependency
I tensorflow/stream_executor/stream.cc:3788] stream 0x54b8220 did not memcpy host-to-device; source: 0x7fd50fe63000
I tensorflow/stream_executor/stream.cc:3788] stream 0x54b8220 did not memcpy host-to-device; source: 0x7fd50fe0e400
E tensorflow/stream_executor/cuda/cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:198] Unexpected Event status: 1
Command terminated by signal 6
9.98user 2.68system 0:11.87elapsed 106%CPU (0avgtext+0avgdata 3055576maxresident)k
0inputs+1568outputs (0major+683838minor)pagefaults 0swaps
