Monocular Vehicle Localization (In Progress)

Introduction

This project estimates the orientation and size of a vehicle from a single 2D image. The output is a 3D bounding box around the detected vehicle.

The model is trained on the KITTI dataset for 3D object detection. Prediction is made somewhat more efficient by first detecting 2D bounding boxes with a separate neural network and then running pose estimation only on the region inside each box.
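The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the repository's code: `crop_box`, `run_two_stage`, and the `detect_2d` / `estimate_3d` callables are hypothetical names, the image is a plain list of pixel rows standing in for a real array, and the box is assumed to be (left, top, right, bottom) in 0-based pixel coordinates with inclusive right and bottom edges.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple

Image = List[List[int]]  # hypothetical stand-in: rows of pixel values
Box = Sequence[float]    # (left, top, right, bottom), assumed inclusive


def crop_box(image: Image, bbox: Box) -> Image:
    """Return the part of `image` inside `bbox`, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = bbox
    # Round outward so the whole object stays inside the crop.
    x0 = max(0, math.floor(left))
    y0 = max(0, math.floor(top))
    x1 = min(width, math.ceil(right) + 1)
    y1 = min(height, math.ceil(bottom) + 1)
    return [row[x0:x1] for row in image[y0:y1]]


def run_two_stage(
    image: Image,
    detect_2d: Callable[[Image], List[Box]],
    estimate_3d: Callable[[Image], Any],
) -> List[Tuple[Box, Any]]:
    """Run the 3D estimator only on the crops proposed by the 2D detector."""
    return [(bbox, estimate_3d(crop_box(image, bbox))) for bbox in detect_2d(image)]
```

Keeping the two models behind plain callables means the 2D detector can be swapped out without touching the 3D stage.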

For now, 2D detection uses a simple pretrained SSD model from the TensorFlow object detection library; it could be replaced with a standalone vehicle detection model. The 3D prediction model is built from CNN layers and outputs the dimensions and the orientation of the detected object. The traditional approach uses a model that regresses the 3D coordinates of the detected object. Here, a monodepth model is used instead to determine the depth of the detected object, and the corresponding x, y position in camera coordinates is calculated by stereo vision triangulation. This helps in perceiving the environment more precisely.
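As a rough illustration of the last step, the sketch below converts a disparity to depth with the standard stereo relation z = f * B / d and then recovers x and y with a pinhole camera model. It is written under assumptions and may differ from what the repository does: the function names are hypothetical, the disparity is assumed to be in pixels, and the focal lengths (fx, fy), principal point (cx, cy), and baseline are assumed to come from the camera calibration.

```python
from typing import Tuple


def disparity_to_depth(disparity_px: float, fx: float, baseline_m: float) -> float:
    """Depth in meters from a disparity in pixels: z = fx * baseline / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return fx * baseline_m / disparity_px


def backproject(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Map pixel (u, v) at the given depth to camera coordinates.

    Uses the same axes as the KITTI labels: x right, y down, z forward.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Illustrative numbers only, not real calibration values.
z = disparity_to_depth(disparity_px=10.0, fx=700.0, baseline_m=0.5)
position = backproject(u=700.0, v=200.0, depth=z, fx=700.0, fy=700.0, cx=600.0, cy=180.0)
```

A pixel to the right of the principal point maps to a positive x, and a pixel below it maps to a positive y.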

Architecture

Prerequisites

Usage

Edit the necessary configurations and paths in config.py.

# Add the TensorFlow utilities needed by the 2D detection model to PYTHONPATH
cd tensorflow/models/research
export PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/slim

# Add the current working directory to PYTHONPATH
cd working_directory
export PYTHONPATH=$PYTHONPATH:$(pwd)

# Train the model
python main.py --mode=train

# Evaluate example images
python main.py --mode=predict

Dataset

For training, the KITTI 3D detection dataset was used. Note that the location of each object is given in camera coordinates, defined as x: right, y: down, z: forward. The table below, taken from the KITTI object development kit, describes the data in the label files.

#Values    Name      Description
----------------------------------------------------------------------------
   1    type         Describes the type of object: 'Car', 'Van', 'Truck',
                     'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram',
                     'Misc' or 'DontCare'
   1    truncated    Float from 0 (non-truncated) to 1 (truncated), where
                     truncated refers to the object leaving image boundaries
   1    occluded     Integer (0,1,2,3) indicating occlusion state:
                     0 = fully visible, 1 = partly occluded
                     2 = largely occluded, 3 = unknown
   1    alpha        Observation angle of object, ranging [-pi..pi]
   4    bbox         2D bounding box of object in the image (0-based index):
                     contains left, top, right, bottom pixel coordinates
   3    dimensions   3D object dimensions: height, width, length (in meters)
   3    location     3D object location x,y,z in camera coordinates (in meters)
   1    rotation_y   Rotation ry around Y-axis in camera coordinates [-pi..pi]
   1    score        Only for results: Float, indicating confidence in
                     detection, needed for p/r curves, higher is better.
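A label line can be read into a small record by following the field counts in the table above. This is a minimal sketch: the `KittiObject` class and `parse_label_line` function are hypothetical names, not part of the repository or the development kit, and the sample line uses made-up values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KittiObject:
    type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: Tuple[float, ...]        # left, top, right, bottom (pixels)
    dimensions: Tuple[float, ...]  # height, width, length (meters)
    location: Tuple[float, ...]    # x, y, z in camera coordinates (meters)
    rotation_y: float
    score: Optional[float] = None  # present only in result files


def parse_label_line(line: str) -> KittiObject:
    """Parse one whitespace-separated label line (15 values, or 16 with score)."""
    fields = line.split()
    if len(fields) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(fields)}")
    values = [float(f) for f in fields[1:]]
    return KittiObject(
        type=fields[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
        score=values[14] if len(values) == 15 else None,
    )


# Made-up example line in the layout described by the table.
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
obj = parse_label_line(sample)
```

Ground-truth labels have 15 values per line, so `score` stays `None` unless the file is a results file.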

For the final prediction, the bbox data is provided by the 2D object detection model in ssd_detection.py, which uses a pretrained TensorFlow COCO MobileNet model to output a 2D bounding box.
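The dimensions, location, and rotation_y fields from the label table are enough to build the eight corners of a 3D bounding box. The sketch below follows the KITTI convention that location is the center of the box's bottom face and that rotation_y rotates around the camera Y-axis; `box3d_corners` is a hypothetical helper name, not a function from the repository.

```python
import math
from typing import List, Sequence, Tuple


def box3d_corners(
    dimensions: Sequence[float],
    location: Sequence[float],
    rotation_y: float,
) -> List[Tuple[float, float, float]]:
    """Return the 8 box corners in camera coordinates (x right, y down, z forward)."""
    h, w, l = dimensions
    tx, ty, tz = location
    # Corners in the object frame, origin at the center of the bottom face.
    # y points down, so the top face sits at y = -h.
    xs = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
    ys = [0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h]
    zs = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]
    c, s = math.cos(rotation_y), math.sin(rotation_y)
    corners = []
    for x, y, z in zip(xs, ys, zs):
        # Rotate around Y with [[c, 0, s], [0, 1, 0], [-s, 0, c]], then translate.
        corners.append((c * x + s * z + tx, y + ty, -s * x + c * z + tz))
    return corners


# Illustrative values: a 1.5 m high, 1.6 m wide, 4.0 m long box, 10 m ahead.
corners = box3d_corners((1.5, 1.6, 4.0), (0.0, 1.5, 10.0), 0.0)
```

Projecting these corners through the camera matrix gives the box outline that can be drawn on the image.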

Output

The images below are the output of a partially trained model, trained on a machine with 8 vCPUs and 52 GB of memory (no GPU). The output should become more precise after complete training (the model has so far been trained for only 1 of 500 epochs).

References
