
3d-deepbox's Introduction

3D Bounding Box Estimation Using Deep Learning and Geometry

A TensorFlow implementation by Fu-Hsiang Chan of the paper: Mousavian, Arsalan, et al., "3D Bounding Box Estimation Using Deep Learning and Geometry".

The aim of this project is to predict the size and orientation of an object's 3D bounding box from a single two-dimensional image.

Prerequisites

  1. TensorFlow
  2. Numpy
  3. OpenCV
  4. tqdm

Installation

  1. Clone the repository
    git clone https://github.com/smallcorgi/3D-Deepbox.git
  2. Download the KITTI object detection dataset, calib and label (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).
  3. Download the weights file (vgg_16.ckpt).
    cd $3D-Deepbox_ROOT
    wget http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
    tar zxvf vgg_16_2016_08_28.tar.gz
  4. Compile evaluation code
    g++ -O3 -DNDEBUG -o ./kitti_eval/evaluate_object_3d_offline ./kitti_eval/evaluate_object_3d_offline.cpp
  5. KITTI train/val split used in 3DOP/Mono3D/MV3D
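Before training, it can help to verify that the dataset layout matches what the commands below expect. The following is a minimal sanity-check sketch; the ./training/image_2, ./training/label_2 and ./training/calib paths are assumptions based on the standard KITTI object-detection download, so adjust them to your own setup.

import os

# Assumed KITTI object-detection layout; adjust kitti_root to your own path.
kitti_root = "./training"
for name in ["image_2", "label_2", "calib"]:
    path = os.path.join(kitti_root, name)
    print("%-8s %s" % (name, "OK" if os.path.isdir(path) else "MISSING"))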

Usage

Train model

python main.py --mode train --gpu [gpu_id] --image [train_image_path] --label [train_label_path] --box2d [train_2d_boxes]

Test model

python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output [output_file_path]
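For reference, a concrete pair of invocations on the KITTI training data (these paths mirror the ones quoted in the issues further down this page and assume the standard training/image_2 and training/label_2 layout; note that the ground-truth label directory is reused as the --box2d argument):

python main.py --mode train --gpu 0 --image training/image_2/ --label training/label_2/ --box2d training/label_2/

python main.py --mode test --gpu 0 --image training/image_2/ --box2d training/label_2/ --model model/model-10 --output out/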

Evaluation on KITTI

./kitti_eval/evaluate_object_3d_offline [ground_truth_path] [predict_path]

Results (AP in %, reported for the easy / moderate / hard KITTI difficulty levels)

car_detection AP: 100.000000 100.000000 100.000000

car_orientation AP: 98.482552 95.809959 91.975380

pedestrian_detection AP: 100.000000 100.000000 100.000000

pedestrian_orientation AP: 76.835083 74.943863 71.997620

cyclist_detection AP: 100.000000 100.000000 100.000000

cyclist_orientation AP: 89.908524 81.029915 79.436340

car_detection_ground AP: 90.743927 85.268692 76.673523

pedestrian_detection_ground AP: 97.148033 98.034355 98.376617

cyclist_detection_ground AP: 82.906242 82.897720 75.573006

Evaluation of 3D bounding boxes

car_detection_3d AP: 84.500374 84.358612 75.764938

pedestrian_detection_3d AP: 96.662766 97.702209 89.280357

cyclist_detection_3d AP: 80.711548 81.337944 74.269547

Visualization

mv "output_file" ./validation/result_2
cd ./3D-Deepbox/visualization
Then, in MATLAB, run run_demo.m

References

  1. https://github.com/shashwat14/Multibin
  2. https://github.com/experiencor/didi-starter/tree/master/simple_solution
  3. https://github.com/experiencor/image-to-3d-bbox
  4. https://github.com/prclibo/kitti_eval


3d-deepbox's Issues

Question about loss

Hi @smallcorgi

What loss value did you arrive at after 50 epochs of training?
I got about 16.0, which seems higher than I expected (starting from 19.3). The orientation values regressed at test time seem reasonable; I am just wondering about your case.
Thank you in advance for your answer.

demo model download error

Hi, thanks for your amazing work. But when I download the demo model, there are still some problems.
$ git lfs pull
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/smallcorgi/3D-Deepbox.git/info/lfs'

So, could you offer another link (e.g. Google Drive) to download the demo model? Thanks!

What does --box2d mean?

the "test" command is
"python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output "
I got the model by training it, and now I want to test on new data, but the 2D box labels are unknown for the new data.
Are the image data and the model alone enough for testing?

What is box2d?

In the training process, a box2d file should be provided. But what is it, and where should I obtain it?

The thing is, the label is also a box provided by KITTI. You can of course use the labeled data directly, but what is the difference between box2d and label?

Are the object coordinates and camera coordinates aligned?

From the code, the object-frame x_corners span the length dimension, y_corners the height, and z_corners the width. Is it right that in the object frame the positive X axis points forward, the positive Y axis points down, and the positive Z axis points left (the right-hand rule)? Does the camera frame have the same orientation as the object frame? I am assuming the object frame must be aligned with the camera frame, right? But if that is right, the KITTI paper says the camera frame has positive X to the right, positive Y down, and positive Z forward...
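For readers with the same question, below is a minimal sketch of the conventional KITTI-devkit-style corner construction that this issue is describing: x_corners span the length l, y_corners the height h, z_corners the width w, all in the object frame, which is then rotated about the camera Y axis by the yaw ry and translated to the object location t in camera coordinates. The function and variable names are illustrative, not taken from this repository.

import numpy as np

def corners_3d(h, w, l, ry, t):
    # Object-frame corner offsets: x along the length, y along the height (down), z along the width.
    x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    # Yaw rotation about the camera Y axis, as in the KITTI devkit.
    R = np.array([[ np.cos(ry), 0.0, np.sin(ry)],
                  [        0.0, 1.0,        0.0],
                  [-np.sin(ry), 0.0, np.cos(ry)]])
    # Rotate into the camera frame and translate by the object location t = (tx, ty, tz).
    return R @ np.vstack([x, y, z]) + np.asarray(t, dtype=float).reshape(3, 1)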

Training Data

Hello, I have a problem deciding which data we should download, since you mentioned the image, calibration, and label data. With the current data, can I visualize the 3D bounding boxes (just load the dataset and make sure our data loader is correct)?

Add license

Can you add a license please for people who would like to use your code?

bad result

Thank you so much for your code!
However, I wrote some code to visualize the results, and the orientation result is bad. I tested my visualization code on the KITTI labels, and there it looks good.
So, if you have time, would you please explain the anchors part of your code?

train error

Hi,
when I execute "python main.py --model train --gpu 0 --image training/image_2/ --label training/label_2/", the command errors with:
W tensorflow/core/framework/op_kernel.cc:1158] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for train

My environment is TensorFlow 1.2 with GPU, Python from Anaconda 2.

Difference between your implementation and the actual paper's?

Hi @smallcorgi,

Thanks for providing us with your code. As I look through your code and the issues section, I see that there are some differences between the paper's actual method and your code. Would it be possible for you to list the differences, as I feel that would be really helpful for those who use your code for their implementation? Thanks.

Some differences that I observed are:

  1. Instead of proposing several 2D boxes for 3D box estimation, you use the 2D boxes from the ground truth dataset.
  2. Instead of estimating the 3D position, you use the ground truth 3D position from the dataset.
  3. During testing, you seem to be giving only the image patch containing the object as input. Shouldn't we give the entire image as input and expect the network to regress the 3D bounding boxes from the entire image?

I am currently trying to understand the paper, so I apologise if things that I suggested turn out to be incorrect.

Thanks again.

Anchors Computing

I have a hard time understanding the anchors and the way they are computed.
Can you give a clear illustration of that?

How is the translation solved?

Hi,

Thanks for sharing your implementation!
I was wondering if you know how to solve for the translation vector (from the object center to the camera center) after predicting the object dimensions and yaw angle. In your current visualization, the ground-truth translation vectors are used, but these should also be solved for as described in the paper.

Best,
Pan
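As an editorial aside for readers asking the same thing: solving for the translation amounts to a small least-squares problem once a side-to-corner assignment is fixed. Below is a rough sketch under stated assumptions (a 3x4 projection matrix P, a yaw rotation R, object-frame corner offsets, and a chosen assignment of the 2D box sides xmin/ymin/xmax/ymax to corner indices); the full method in the paper repeats this over the candidate configurations and keeps the lowest-error solution. This is illustrative code, not the repository's implementation.

import numpy as np

def solve_translation(P, R, corners_obj, box2d, assignment):
    # P: 3x4 camera projection matrix, R: 3x3 yaw rotation,
    # corners_obj: (8, 3) object-frame corner offsets,
    # box2d: (xmin, ymin, xmax, ymax),
    # assignment: indices of the corners assumed to touch xmin, ymin, xmax, ymax.
    M, p4 = P[:, :3], P[:, 3]
    rows = [0, 1, 0, 1]              # xmin/xmax constrain u (row 0), ymin/ymax constrain v (row 1)
    A, b = [], []
    for side, corner_idx, row in zip(box2d, assignment, rows):
        q = M @ (R @ corners_obj[corner_idx]) + p4     # projection terms with T = 0
        # side = (q[row] + M[row] @ T) / (q[2] + M[2] @ T) rearranges to a linear constraint on T:
        A.append(M[row] - side * M[2])
        b.append(side * q[2] - q[row])
    T, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return T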

GPU usage is 0

Hi, I am running the training and it seems very slow. It takes more than one hour to complete one epoch. Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 TITAN V Off | 00000000:65:00.0 Off | N/A |
| 32% 46C P8 28W / 250W | 320MiB / 12066MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro K620 Off | 00000000:B3:00.0 On | N/A |
| 35% 51C P0 3W / 30W | 373MiB / 1977MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1205 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 1806 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 21419 C python3 305MiB |
| 1 N/A N/A 1205 G /usr/lib/xorg/Xorg 56MiB |
| 1 N/A N/A 1806 G /usr/lib/xorg/Xorg 133MiB |
| 1 N/A N/A 2005 G /usr/bin/gnome-shell 131MiB |
| 1 N/A N/A 3991 G /usr/lib/firefox/firefox 0MiB |
| 1 N/A N/A 4606 G /usr/lib/firefox/firefox 0MiB |
| 1 N/A N/A 4976 G /usr/lib/firefox/firefox 37MiB |
+-----------------------------------------------------------------------------+

What is the definition of yaw angle?

I am a little confused about the yaw angle of the objects. I thought the orientation of the object is defined by the fourth element in each row of the label file, and I am not sure what the 15th element stands for. I also saw that you are using the following formula to compute the yaw angle.

objects(o).ry = C{4}(o) + atan(C{12} (o)/C{14}(o));

Could you please explain how you came up with it, and do you use this formula during training? @smallcorgi

Thanks!
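Editorial note for readers hitting the same confusion: in the KITTI label format the 4th field is the observation angle alpha and the 15th field is the global yaw rotation_y, and the two differ by the angle of the viewing ray to the object, which is what the atan(C{12}/C{14}) term (location x over location z) encodes. A minimal sketch of that relation, with illustrative names and arctan2 used as the quadrant-safe form of atan(x/z):

import numpy as np

def rotation_y_from_alpha(alpha, x, z):
    # alpha: observation angle relative to the ray from the camera to the object;
    # (x, z): object location in camera coordinates; returns the global yaw.
    return alpha + np.arctan2(x, z)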

the result is bad

Hi, I ran the demo with:
python main.py --mode train --gpu 0 --image training/image_2/ --label training/label_2/ --box2d training/label_2/
python main.py --mode test --gpu 0 --image training/image_2/ --box2d training/label_2/ --model model/model-10 --output out/
run run_demo.m

And I got a bad result. Why?

test 2D box?

Hello, when I use test mode, I don't know how to get my 2D box data... I thought the DNN should predict the 2D box data in test mode from the test images... and the KITTI test images have no label_2 folder... So how do you get the 2D box data for a test image, if not from test mode?

Mistake in reading the orientation label

Check the parse_annotation function in data_processing.
The trained orientation is incorrect. In the KITTI dataset, alpha is the orientation of the object relative to the camera.
The one you need to train is rotation_y, which is the last value in each label line.
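To make the two fields concrete, here is a minimal sketch of reading one line of a KITTI object label file (this follows the standard KITTI label format and is not code from this repository):

def parse_kitti_label_line(line):
    f = line.split()
    return {
        "type": f[0],
        "alpha": float(f[3]),                          # observation angle relative to the camera ray
        "bbox2d": [float(v) for v in f[4:8]],          # left, top, right, bottom
        "dimensions": [float(v) for v in f[8:11]],     # height, width, length
        "location": [float(v) for v in f[11:14]],      # x, y, z in camera coordinates
        "rotation_y": float(f[14]),                    # global yaw, the last value of the line
    }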

A Bad Result

I'm sorry to report that I followed your instructions and got a bad result on the train/val data.

Pre-trained model

Can anyone share a pre-trained model I can run the test with? I do not have a GPU, so I cannot train my own model...

3D Box orientation Regression problem

Hi,
in your code, the global object orientation is regressed by the CNN from only the contents of the 2D bounding boxes.
However, I found that the author of the paper [https://arxiv.org/abs/1612.00496] suggested that the local orientation is regressed, not the global object orientation.
Did I get it wrong?

Is there only one solution that makes the 4 sides of the 2D box correspond to 4 of the 3D box points?

Hi,
I am a little confused about this powerful method.
Please give me your instructions and advice.

In the paper, you said that the 2D box sides have 64 possible configurations against the 3D box points.
So is there only one solution that makes the 4 sides of the 2D box correspond to 4 of the 3D box points?

If yes: how are the 64 configurations narrowed down to only one possible corresponding solution?

If no: by solving the linear system, there are many "best" solutions for the translation parameters T = [Tx Ty Tz 1]. How is the final translation T decided?

Thank you.

what is test_2d_boxes_path and output_file_path?

I have to run the command below in test mode:
python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output [output_file_path]

But I don't know what values to pass for test_2d_boxes_path and output_file_path.
Could you help me with that?

Regards
Abhi

Can't understand the angle trained in network

In data_processing.py, lines 43-45 show how to get the theta that is trained, but I don't understand the equation in the file. I think theta_train should be rotation_ray - angle_obj; could you explain your method?

The problem of the devkit_object

Hi,
I appreciate your work, but I have some questions.
drawBox3D(h,object,corners,face_idx,orientation);
In this function, what do these input variables mean?
Could you show me an example of this function?
Thanks

quota exceeded

git clone https://github.com/smallcorgi/3D-Deepbox.git
Cloning into '3D-Deepbox'...
remote: Enumerating objects: 3855, done.
remote: Total 3855 (delta 0), reused 0 (delta 0), pack-reused 3855
Receiving objects: 100% (3855/3855), 2.29 MiB | 5.71 MiB/s, done.
Resolving deltas: 100% (27/27), done.
Downloading demo_model/demo_model.data-00000-of-00001 (162 MB)
Error downloading object: demo_model/demo_model.data-00000-of-00001 (40bed66): Smudge error: Error downloading demo_model/demo_model.data-00000-of-00001 (40bed66ec5109d2620a390eb16ad04cad335406a5f33fca209dee2417381aca8): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /home/user/3D-Deepbox/.git/lfs/objects/logs/20201019T220032.252927311.log
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: demo_model/demo_model.data-00000-of-00001: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Question for the box2d part!

we use python3 main.py --mode train --gpu 2 --image /data1/KITTI/object/training/image_2/ --label /data1/KITTI/object/training/label_2/ --box2d[]

What is the box2d part? Where can I get it?

tried to run main.py

  File "main.py", line 178
    print "Epoch:", epoch+1, " done. Loss:", np.mean(epoch_loss)

Also, could you point out exactly which paths I should specify in the following example if I just downloaded the code from GitHub and built it?

./kitti_eval/evaluate_object_3d_offline [ground_truth_path] [predict_path]

What do I have to use as the first parameter? As the second parameter? Do I have to download these files? From where?

labels and 2d bounding boxes

I downloaded the KITTI dataset, so I have calib, image_2, and label_2. Where can I find the 2D bounding boxes?
--box2d [train_2d_boxes]

How many epochs does it need to get a fair result?

I have trained for about 50 epochs with a batch size of 8 on KITTI, and the loss is approximately 15.0, which still seems a little high.

model restored from ./model/model-50.
Epoch 1 : Loss:0.0: 100%|###############################################################################| 1978/1978 [05:00<00:00,  3.67it/s]
Epoch: 1  done. Loss: 15.919499275778374
Epoch Time Cost: 300.38 s
Epoch 2 : Loss:0.0: 100%|###############################################################################| 1978/1978 [04:59<00:00,  6.76it/s]
Epoch: 2  done. Loss: 15.914487312247466
Epoch Time Cost: 299.55 s
Epoch 3 : Loss:0.0: 100%|###############################################################################| 1978/1978 [05:02<00:00,  7.03it/s]
Epoch: 3  done. Loss: 15.917136192803918
Epoch Time Cost: 302.93 s
Epoch 4 : Loss:0.0: 100%|###############################################################################| 1978/1978 [04:56<00:00,  6.75it/s]
Epoch: 4  done. Loss: 15.917050718417423

It seems the loss is very hard to converge.

AttributeError: 'generator' object has no attribute 'next'

Epoch 1 : Loss:0.0: 0%| | 0/1978 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1664, in
main()
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/wrc/yuyijie/3d_detec/3D-Deepbox/main.py", line 282, in
train(args.image, args.box2d, args.label)
File "/home/wrc/yuyijie/3d_detec/3D-Deepbox/main.py", line 168, in train
train_img, train_label = train_gen.next()
AttributeError: 'generator' object has no attribute 'next'

I passed --mode train --gpu 0 --image /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/image_2/ --label /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/label_2/ --box2d /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/label_2/ as parameters, but this error happened. What should I do?
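Editorial note: this error occurs because generator objects in Python 3 no longer expose a .next() method; the built-in next() works in both Python 2 and 3. A minimal fix at the line named in the traceback would be:

train_img, train_label = next(train_gen)   # rather than train_gen.next()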

run_demo

Hi, when I run run_demo, the error I get is:
Index exceeds matrix dimensions.

Error in visualization (line 12)
img = imread(sprintf('%s/%s',image_dir,img_list(3+0).name));

Error in run_demo (line 45)
h = visualization('init',image_dir);

How can I solve it?

next method

Epoch 1 : Loss:0.0: 0%| | 0/1977 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 287, in
train(args.image, args.box2d, args.label)
File "main.py", line 172, in train
train_img, train_label = train_gen.next()
AttributeError: 'generator' object has no attribute 'next'

Cannot test the output. The model is not trained properly, I guess.

I used this code to test the 3D bounding boxes. It's showing a TensorSliceReader error.
python main.py --mode test --gpu 00:1e.0 --image training/image_2/ --box2d training/label_2/ --model model/ --output out/

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model/
	 [[Node: save/RestoreV2_37 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_37/tensor_names, save/RestoreV2_37/shape_and_slices)]]
	 [[Node: save/RestoreV2_17/_43 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_122_save/RestoreV2_17", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Here is my model folder:
checkpoint model-5.data-00000-of-00001
model-10.data-00000-of-00001 model-5.index
model-10.index model-5.meta
model-10.meta

I should say that I changed main.py to reduce the number of epochs to 10.

Question regarding the purpose of this repo

Hi @smallcorgi, thanks a lot for your code.

I wanted to confirm that this repo only covers the 2D-to-3D regression part of the paper, correct? It doesn't do the 2D detection (Faster R-CNN / MS-CNN) from the paper, since even at test time the 2D bounding boxes are given as input. This means the 2D input is fed in to regress 3D, and that's it.

Where does the location value (x, y, z) come from?

Hi,

I am wondering how you get the location values (the 3D object location x, y, z), which are necessary for the 3D box calculation.
Do you assume that location_x, location_y, and location_z are already regressed by the 2D detector? According to your source code, it seems the 3D location values are just read from the 2D label txt file.
Please correct me if I misunderstood.

replace vgg with resnet

Hi
Thanks for your great work. I am now trying to replace the backbone of the second stage with ResNet, since ResNet usually performs better on the same task. However, after I replace the VGG with ResNet, the result is terrible. I wonder if you have done the same thing; could you please tell me whether this idea is useful? Thank you!

Best regards

loading demo_model

Hi,

I was trying to test the demo_model but when I was loading the model, I was getting the following error messages:
W tensorflow/core/framework/op_kernel.cc:1192] Out of range: Read less bytes than requested

Has anyone here successfully loaded the demo_model?

Thanks!
