
3d-deepbox's Introduction

3D Bounding Box Estimation Using Deep Learning and Geometry

A TensorFlow implementation by Fu-Hsiang Chan of the paper: Mousavian, Arsalan, et al., "3D Bounding Box Estimation Using Deep Learning and Geometry".

The aim of this project is to predict the size and orientation of an object's 3D bounding box from a single two-dimensional image.

Prerequisites

  1. TensorFlow
  2. Numpy
  3. OpenCV
  4. tqdm

Installation

  1. Clone the repository
    git clone https://github.com/smallcorgi/3D-Deepbox.git
  2. Download the KITTI object detection dataset, calib and label (http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d).
  3. Download the weights file (vgg_16.ckpt).
    cd $3D-Deepbox_ROOT
    wget http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz
    tar zxvf vgg_16_2016_08_28.tar.gz
  4. Compile evaluation code
    g++ -O3 -DNDEBUG -o ./kitti_eval/evaluate_object_3d_offline ./kitti_eval/evaluate_object_3d_offline.cpp
  5. KITTI train/val split used in 3DOP/Mono3D/MV3D
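Before training, it can help to verify that the dataset layout matches what the commands below expect. The following is a minimal sanity-check sketch; the ./training/image_2, ./training/label_2 and ./training/calib paths are assumptions based on the standard KITTI object-detection download, so adjust them to your own setup.

import os

# Assumed KITTI object-detection layout; adjust kitti_root to your own path.
kitti_root = "./training"
for name in ["image_2", "label_2", "calib"]:
    path = os.path.join(kitti_root, name)
    print("%-8s %s" % (name, "OK" if os.path.isdir(path) else "MISSING"))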

Usage

Train model

python main.py --mode train --gpu [gpu_id] --image [train_image_path] --label [train_label_path] --box2d [train_2d_boxes]

Test model

python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output [output_file_path]
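For reference, a concrete pair of invocations on the KITTI training data (these paths mirror the ones quoted in the issues further down this page and assume the standard training/image_2 and training/label_2 layout; note that the ground-truth label directory is reused as the --box2d argument):

python main.py --mode train --gpu 0 --image training/image_2/ --label training/label_2/ --box2d training/label_2/

python main.py --mode test --gpu 0 --image training/image_2/ --box2d training/label_2/ --model model/model-10 --output out/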

Evaluation on KITTI

./kitti_eval/evaluate_object_3d_offline [ground_truth_path] [predict_path]

Results (AP in %, reported for the easy / moderate / hard KITTI difficulty levels)

car_detection AP: 100.000000 100.000000 100.000000

car_orientation AP: 98.482552 95.809959 91.975380

pedestrian_detection AP: 100.000000 100.000000 100.000000

pedestrian_orientation AP: 76.835083 74.943863 71.997620

cyclist_detection AP: 100.000000 100.000000 100.000000

cyclist_orientation AP: 89.908524 81.029915 79.436340

car_detection_ground AP: 90.743927 85.268692 76.673523

pedestrian_detection_ground AP: 97.148033 98.034355 98.376617

cyclist_detection_ground AP: 82.906242 82.897720 75.573006

Evaluation of 3D bounding boxes

car_detection_3d AP: 84.500374 84.358612 75.764938

pedestrian_detection_3d AP: 96.662766 97.702209 89.280357

cyclist_detection_3d AP: 80.711548 81.337944 74.269547

Visualization

mv "output_file" ./validation/result_2
cd ./3D-Deepbox/visualization
Then, in MATLAB, run run_demo.m

References

  1. https://github.com/shashwat14/Multibin
  2. https://github.com/experiencor/didi-starter/tree/master/simple_solution
  3. https://github.com/experiencor/image-to-3d-bbox
  4. https://github.com/prclibo/kitti_eval


3d-deepbox's Issues

Question about loss

Hi @smallcorgi

What loss value did you arrive at after 50 epochs of training?
I got about 16.0, which seems higher than I expected (starting from 19.3). The orientation values regressed at test time seem reasonable; I am just wondering about your case.
Thank you in advance for your answer.

demo model download error

Hi, thanks for your amazing work. But when I download the demo model, there are still some problems.
$ git lfs pull
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/smallcorgi/3D-Deepbox.git/info/lfs'

So, could you offer another link (e.g. Google Drive) to download the demo model? Thanks!

What does --box2d mean?

the "test" command is
"python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output "
I got the model by training it, and now I want to test on new data, but the 2D box labels are unknown for the new data.
Are the image data and the model alone enough for testing?

What is box2d?

In the training process, a box2d file should be provided. But what is it, and where should I obtain it?

The thing is, the label is also a box provided by KITTI. You can of course use the labeled data directly, but what is the difference between box2d and label?

Are the object coordinates and camera coordinates aligned?

From the code, the object-frame x_corners span the length dimension, y_corners the height, and z_corners the width. Is it right that in the object frame the positive X axis points forward, the positive Y axis points down, and the positive Z axis points left (the right-hand rule)? Does the camera frame have the same orientation as the object frame? I am assuming the object frame must be aligned with the camera frame, right? But if that is right, the KITTI paper says the camera frame has positive X to the right, positive Y down, and positive Z forward...
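For readers with the same question, below is a minimal sketch of the conventional KITTI-devkit-style corner construction that this issue is describing: x_corners span the length l, y_corners the height h, z_corners the width w, all in the object frame, which is then rotated about the camera Y axis by the yaw ry and translated to the object location t in camera coordinates. The function and variable names are illustrative, not taken from this repository.

import numpy as np

def corners_3d(h, w, l, ry, t):
    # Object-frame corner offsets: x along the length, y along the height (down), z along the width.
    x = np.array([ l/2,  l/2, -l/2, -l/2,  l/2,  l/2, -l/2, -l/2])
    y = np.array([ 0.0,  0.0,  0.0,  0.0,   -h,   -h,   -h,   -h])
    z = np.array([ w/2, -w/2, -w/2,  w/2,  w/2, -w/2, -w/2,  w/2])
    # Yaw rotation about the camera Y axis, as in the KITTI devkit.
    R = np.array([[ np.cos(ry), 0.0, np.sin(ry)],
                  [        0.0, 1.0,        0.0],
                  [-np.sin(ry), 0.0, np.cos(ry)]])
    # Rotate into the camera frame and translate by the object location t = (tx, ty, tz).
    return R @ np.vstack([x, y, z]) + np.asarray(t, dtype=float).reshape(3, 1)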

Training Data

Hello, I have a problem deciding which data we should download, since you mentioned the image, calibration, and label data. With the current data, can I visualize the 3D bounding boxes (just load the dataset and make sure our data loader is correct)?

Add license

Can you add a license please for people who would like to use your code?

bad result

Thank you so much for your code!
However, I wrote some code to visualize the results, and the orientation result is bad. I tested my visualization code on the KITTI labels, and there it looks good.
So, if you have time, would you please explain the anchors part of your code?

train error

Hi,
when I execute "python main.py --model train --gpu 0 --image training/image_2/ --label training/label_2/", the command errors with:
W tensorflow/core/framework/op_kernel.cc:1158] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for train

My environment is TensorFlow 1.2 with GPU, Python from Anaconda 2.

Difference between your implementation and the actual paper's?

Hi @smallcorgi,

Thanks for providing us with your code. As I look through your code and the issues section, I see that there are some differences between the paper's actual method and your code. Would it be possible for you to list the differences, as I feel that would be really helpful for those who use your code for their implementation? Thanks.

Some differences that I observed are:

  1. Instead of proposing several 2D boxes for 3D box estimation, you use the 2D boxes from the ground truth dataset.
  2. Instead of estimating the 3D position, you use the ground truth 3D position from the dataset.
  3. During testing, you seem to be giving only the image patch containing the object as input. Shouldn't we give the entire image as input and expect the network to regress the 3D bounding boxes from the entire image?

I am currently trying to understand the paper, so I apologise if things that I suggested turn out to be incorrect.

Thanks again.

Anchors Computing

I have a hard time understanding the anchors and the way they are computed.
Can you give a clear illustration of that?

How is the translation solved?

Hi,

Thanks for sharing your implementation!
I was wondering if you know how to solve for the translation vector (from the object center to the camera center) after predicting the object dimensions and yaw angle. In your current visualization, the ground-truth translation vectors are used, but these should also be solved for as described in the paper.

Best,
Pan
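As an editorial aside for readers asking the same thing: solving for the translation amounts to a small least-squares problem once a side-to-corner assignment is fixed. Below is a rough sketch under stated assumptions (a 3x4 projection matrix P, a yaw rotation R, object-frame corner offsets, and a chosen assignment of the 2D box sides xmin/ymin/xmax/ymax to corner indices); the full method in the paper repeats this over the candidate configurations and keeps the lowest-error solution. This is illustrative code, not the repository's implementation.

import numpy as np

def solve_translation(P, R, corners_obj, box2d, assignment):
    # P: 3x4 camera projection matrix, R: 3x3 yaw rotation,
    # corners_obj: (8, 3) object-frame corner offsets,
    # box2d: (xmin, ymin, xmax, ymax),
    # assignment: indices of the corners assumed to touch xmin, ymin, xmax, ymax.
    M, p4 = P[:, :3], P[:, 3]
    rows = [0, 1, 0, 1]              # xmin/xmax constrain u (row 0), ymin/ymax constrain v (row 1)
    A, b = [], []
    for side, corner_idx, row in zip(box2d, assignment, rows):
        q = M @ (R @ corners_obj[corner_idx]) + p4     # projection terms with T = 0
        # side = (q[row] + M[row] @ T) / (q[2] + M[2] @ T) rearranges to a linear constraint on T:
        A.append(M[row] - side * M[2])
        b.append(side * q[2] - q[row])
    T, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return T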

GPU usage is 0

Hi, I am running the training and it seems very slow. It takes more than one hour to complete one epoch. Here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 TITAN V Off | 00000000:65:00.0 Off | N/A |
| 32% 46C P8 28W / 250W | 320MiB / 12066MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro K620 Off | 00000000:B3:00.0 On | N/A |
| 35% 51C P0 3W / 30W | 373MiB / 1977MiB | 2% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1205 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 1806 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 21419 C python3 305MiB |
| 1 N/A N/A 1205 G /usr/lib/xorg/Xorg 56MiB |
| 1 N/A N/A 1806 G /usr/lib/xorg/Xorg 133MiB |
| 1 N/A N/A 2005 G /usr/bin/gnome-shell 131MiB |
| 1 N/A N/A 3991 G /usr/lib/firefox/firefox 0MiB |
| 1 N/A N/A 4606 G /usr/lib/firefox/firefox 0MiB |
| 1 N/A N/A 4976 G /usr/lib/firefox/firefox 37MiB |
+-----------------------------------------------------------------------------+

What is the definition of yaw angle?

I am a little confused about the yaw angle of the objects. I thought the orientation of the object is defined by the fourth element in each row of the label file, and I am not sure what the 15th element stands for. I also saw that you are using the following formula to compute the yaw angle.

objects(o).ry = C{4}(o) + atan(C{12} (o)/C{14}(o));

Could you please explain how you came up with it, and do you use this formula during training? @smallcorgi

Thanks!
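Editorial note for readers hitting the same confusion: in the KITTI label format the 4th field is the observation angle alpha and the 15th field is the global yaw rotation_y, and the two differ by the angle of the viewing ray to the object, which is what the atan(C{12}/C{14}) term (location x over location z) encodes. A minimal sketch of that relation, with illustrative names and arctan2 used as the quadrant-safe form of atan(x/z):

import numpy as np

def rotation_y_from_alpha(alpha, x, z):
    # alpha: observation angle relative to the ray from the camera to the object;
    # (x, z): object location in camera coordinates; returns the global yaw.
    return alpha + np.arctan2(x, z)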

the result is bad

Hi, I ran the demo with:
python main.py --mode train --gpu 0 --image training/image_2/ --label training/label_2/ --box2d training/label_2/
python main.py --mode test --gpu 0 --image training/image_2/ --box2d training/label_2/ --model model/model-10 --output out/
run run_demo.m

And I got a bad result. Why?

test 2D box?

Hello, when I use test mode, I don't know how to get my 2D box data... I thought the DNN should predict the 2D box data in test mode from the test images... and the KITTI test images have no label_2 folder... So how do you get the 2D box data for a test image, if not from test mode?

Mistake in reading the orientation label

Check the parse_annotation function in data_processing.
The trained orientation is incorrect. In the KITTI dataset, alpha is the orientation of the object relative to the camera.
The one you need to train is rotation_y, which is the last value in each label line.
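To make the two fields concrete, here is a minimal sketch of reading one line of a KITTI object label file (this follows the standard KITTI label format and is not code from this repository):

def parse_kitti_label_line(line):
    f = line.split()
    return {
        "type": f[0],
        "alpha": float(f[3]),                          # observation angle relative to the camera ray
        "bbox2d": [float(v) for v in f[4:8]],          # left, top, right, bottom
        "dimensions": [float(v) for v in f[8:11]],     # height, width, length
        "location": [float(v) for v in f[11:14]],      # x, y, z in camera coordinates
        "rotation_y": float(f[14]),                    # global yaw, the last value of the line
    }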

A Bad Result

I'm sorry to report that I followed your instructions and got a bad result on the train/val data.

Pre-trained model

Can anyone share a pre-trained model I can run the test with? I do not have a GPU, so I cannot train my own model...

3D Box orientation Regression problem

Hi,
in your code, the global object orientation is regressed by the CNN from only the contents of the 2D bounding boxes.
However, I found that the author of the paper [https://arxiv.org/abs/1612.00496] suggested that the local orientation is regressed, not the global object orientation.
Did I get it wrong?

Is there only one solution that makes the 4 sides of the 2D box correspond to 4 of the 3D box points?

Hi,
I am a little confused about this powerful method.
Please give me your instructions and advice.

In the paper, you said that the 2D box sides have 64 possible configurations against the 3D box points.
So is there only one solution that makes the 4 sides of the 2D box correspond to 4 of the 3D box points?

If yes: how are the 64 configurations narrowed down to only one possible corresponding solution?

If no: by solving the linear system, there are many "best" solutions for the translation parameters T = [Tx Ty Tz 1]. How is the final translation T decided?

Thank you.

what is test_2d_boxes_path and output_file_path?

I have to run the command below in test mode:
python main.py --mode test --gpu [gpu_id] --image [test_image_path] --box2d [test_2d_boxes_path] --model [model_path] --output [output_file_path]

But I don't know what values to pass for test_2d_boxes_path and output_file_path.
Could you help me with that?

Regards
Abhi

Can't understand the angle trained in network

In data_processing.py, lines 43-45 show how to get the theta that is trained, but I don't understand the equation in the file. I think theta_train should be rotation_ray - angle_obj; could you explain your method?

The problem of the devkit_object

Hi,
I appreciate your work, but I have some questions.
drawBox3D(h,object,corners,face_idx,orientation);
In this function, what do these input variables mean?
Could you show me an example of this function?
Thanks

quota exceeded

git clone https://github.com/smallcorgi/3D-Deepbox.git
Cloning into '3D-Deepbox'...
remote: Enumerating objects: 3855, done.
remote: Total 3855 (delta 0), reused 0 (delta 0), pack-reused 3855
Receiving objects: 100% (3855/3855), 2.29 MiB | 5.71 MiB/s, done.
Resolving deltas: 100% (27/27), done.
Downloading demo_model/demo_model.data-00000-of-00001 (162 MB)
Error downloading object: demo_model/demo_model.data-00000-of-00001 (40bed66): Smudge error: Error downloading demo_model/demo_model.data-00000-of-00001 (40bed66ec5109d2620a390eb16ad04cad335406a5f33fca209dee2417381aca8): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to /home/user/3D-Deepbox/.git/lfs/objects/logs/20201019T220032.252927311.log
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: demo_model/demo_model.data-00000-of-00001: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Question for the box2d part!

we use python3 main.py --mode train --gpu 2 --image /data1/KITTI/object/training/image_2/ --label /data1/KITTI/object/training/label_2/ --box2d[]

What is the box2d part? Where can I get it?

tried to run main.py

  File "main.py", line 178
    print "Epoch:", epoch+1, " done. Loss:", np.mean(epoch_loss)

Also, could you point out exactly which paths I should specify in the following example if I just downloaded the code from GitHub and built it?

./kitti_eval/evaluate_object_3d_offline [ground_truth_path] [predict_path]

What do I have to use as the first parameter? As the second parameter? Do I have to download these files? From where?

labels and 2d bounding boxes

I downloaded the KITTI dataset, so I have calib, image_2, and label_2. Where can I find the 2D bounding boxes?
--box2d [train_2d_boxes]

How many epochs does it need to get a fair result?

I have trained for about 50 epochs with a batch size of 8 on KITTI, and the loss is approximately 15.0, which still seems a little high.

model restored from ./model/model-50.
Epoch 1 : Loss:0.0: 100%|###############################################################################| 1978/1978 [05:00<00:00,  3.67it/s]
Epoch: 1  done. Loss: 15.919499275778374
Epoch Time Cost: 300.38 s
Epoch 2 : Loss:0.0: 100%|###############################################################################| 1978/1978 [04:59<00:00,  6.76it/s]
Epoch: 2  done. Loss: 15.914487312247466
Epoch Time Cost: 299.55 s
Epoch 3 : Loss:0.0: 100%|###############################################################################| 1978/1978 [05:02<00:00,  7.03it/s]
Epoch: 3  done. Loss: 15.917136192803918
Epoch Time Cost: 302.93 s
Epoch 4 : Loss:0.0: 100%|###############################################################################| 1978/1978 [04:56<00:00,  6.75it/s]
Epoch: 4  done. Loss: 15.917050718417423

It seems the loss is very hard to converge.

AttributeError: 'generator' object has no attribute 'next'

Epoch 1 : Loss:0.0: 0%| | 0/1978 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1664, in
main()
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/wrc/下载/pycharm-2018.2.5/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/wrc/yuyijie/3d_detec/3D-Deepbox/main.py", line 282, in
train(args.image, args.box2d, args.label)
File "/home/wrc/yuyijie/3d_detec/3D-Deepbox/main.py", line 168, in train
train_img, train_label = train_gen.next()
AttributeError: 'generator' object has no attribute 'next'

I passed --mode train --gpu 0 --image /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/image_2/ --label /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/label_2/ --box2d /home/wrc/yuyijie/3d_detec/3D-Deepbox/training/label_2/ as parameters, but this error happened. What should I do?
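Editorial note: this error occurs because generator objects in Python 3 no longer expose a .next() method; the built-in next() works in both Python 2 and 3. A minimal fix at the line named in the traceback would be:

train_img, train_label = next(train_gen)   # rather than train_gen.next()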

run_demo

Hi, when I run run_demo, the error I get is:
Index exceeds matrix dimensions.

Error in visualization (line 12)
img = imread(sprintf('%s/%s',image_dir,img_list(3+0).name));

Error in run_demo (line 45)
h = visualization('init',image_dir);

How can I solve it?

next method

Epoch 1 : Loss:0.0: 0%| | 0/1977 [00:00<?, ?it/s]
Traceback (most recent call last):
File "main.py", line 287, in
train(args.image, args.box2d, args.label)
File "main.py", line 172, in train
train_img, train_label = train_gen.next()
AttributeError: 'generator' object has no attribute 'next'

Cannot test the output. The model is not trained properly, I guess.

I used this code to test the 3D bounding boxes. It's showing a TensorSliceReader error.
python main.py --mode test --gpu 00:1e.0 --image training/image_2/ --box2d training/label_2/ --model model/ --output out/

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for model/
	 [[Node: save/RestoreV2_37 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_37/tensor_names, save/RestoreV2_37/shape_and_slices)]]
	 [[Node: save/RestoreV2_17/_43 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_122_save/RestoreV2_17", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Here is my model folder:
checkpoint model-5.data-00000-of-00001
model-10.data-00000-of-00001 model-5.index
model-10.index model-5.meta
model-10.meta

I should say that I changed main.py to reduce the number of epochs to 10.

Question regarding the purpose of this repo

Hi @smallcorgi, thanks a lot for your code.

I wanted to confirm that this repo only covers the 2D-to-3D regression part of the paper, correct? It doesn't do the 2D detection (Faster R-CNN / MS-CNN) from the paper, since even at test time the 2D bounding boxes are given as input. This means the 2D input is fed in to regress 3D, and that's it.

Where does the location value (x, y, z) come from?

Hi,

I am wondering how you get the location values (the 3D object location x, y, z), which are necessary for the 3D box calculation.
Do you assume that location_x, location_y, and location_z are already regressed by the 2D detector? According to your source code, it seems the 3D location values are just read from the 2D label txt file.
Please correct me if I misunderstood.

replace vgg with resnet

Hi
Thanks for your great work. I am now trying to replace the backbone of the second stage with ResNet, since ResNet usually performs better on the same task. However, after I replace the VGG with ResNet, the result is terrible. I wonder if you have done the same thing; could you please tell me whether this idea is useful? Thank you!

Best regards

loading demo_model

Hi,

I was trying to test the demo_model but when I was loading the model, I was getting the following error messages:
W tensorflow/core/framework/op_kernel.cc:1192] Out of range: Read less bytes than requested

Has anyone here successfully loaded the demo_model?

Thanks!
