
LICENSE python3 Build Status

Comma.ai's Calib-Challenge

Goal

The goal is to predict the direction of travel (in camera frame) from the provided dashcam video: yaw and pitch (fortunately, no roll).

Comma.ai's repo provides 10 videos. Every video is 1 min long at 20 fps.
5 videos are labelled with a 2D array describing the direction of travel at every frame of the video with a pitch and yaw angle in radians.
5 videos are unlabeled. It is your task to generate the labels for them.
The example labels are generated using a Neural Network, and the labels were confirmed with a SLAM algorithm.
You can estimate the focal length to be 910 pixels.
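Given that focal length, mapping an image point (say, the estimated vanishing point of travel) to yaw and pitch is just a pinhole-model arctangent. A minimal sketch, assuming a standard pinhole camera with the principal point at the image center; the function name and the example frame size are my own, not from the repo:

```python
import numpy as np

def pixel_to_angles(x, y, cx, cy, focal=910.0):
    """Convert a pixel location (e.g. the estimated point the car is
    driving toward) into yaw and pitch angles in the camera frame.

    x, y   -- pixel coordinates of the direction-of-travel point
    cx, cy -- principal point (assumed to be the image center)
    focal  -- focal length in pixels (910 per the challenge README)
    """
    yaw = np.arctan2(x - cx, focal)    # positive yaw: point right of center
    pitch = np.arctan2(cy - y, focal)  # positive pitch: point above center
    return yaw, pitch

# a hypothetical 1164x874 frame: the exact image center maps to (0, 0)
yaw, pitch = pixel_to_angles(582, 437, 582, 437)
print(yaw, pitch)  # 0.0 0.0
```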


Extending the goals: making myself a functional device that can make my car drive itself.
What are the things I have to consider when writing the code? (Still thinking.)
There are a lot of classes, so I have to think of some clever way to get that number down.

Evaluation

They will evaluate our mean squared error against their ground-truth labels. Errors for frames where the car speed is less than 4 m/s will be ignored; those frames are also labelled as NaN in the example labels.

Comma.ai's repo includes an eval script that gives an error score (lower is better). You can use it to test your solution against the labelled examples; they will use the same script to evaluate your submission.
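The masking logic is easy to reproduce locally. A sketch of the scoring idea, assuming comma's script does plain MSE over [pitch, yaw] rows with the NaN (slow-speed) frames dropped; their actual script may normalize differently:

```python
import numpy as np

def masked_mse(pred, gt):
    """Mean squared error over [pitch, yaw] pairs, skipping frames where
    the ground truth is NaN (car slower than 4 m/s)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mask = ~np.isnan(gt).any(axis=1)  # keep only rows with valid labels
    return float(np.mean((pred[mask] - gt[mask]) ** 2))

gt = np.array([[0.01, 0.02], [np.nan, np.nan], [0.00, 0.01]])
pred = np.array([[0.01, 0.02], [0.50, 0.50], [0.00, 0.01]])
print(masked_mse(pred, gt))  # 0.0 -- the NaN row is ignored
```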

Architecture

I am thinking of using some kind of optical-flow model; rather than doing image stabilization or the like, I'll make it yield the yaw and pitch of the moving vehicle.

Adding details about the architecture soon!

1. FlowNetCorr

I'm gonna keep it short, sweet, and to the point.
The architecture was taken from this research paper. It's ConvNets again!! Predicting things like optical flow is not easy, and you surely cannot do it with a single input image.

A straightforward step is to create two separate, yet identical processing streams for the two adjacent frames and to combine them at a later stage (after 3 convs in this case).

In the research paper, to combine the outputs of the two conv streams, they used a "CORRelation layer", but I don't think it makes a lot of difference.
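For reference, the correlation layer boils down to dot products between frame-1 feature vectors and displaced frame-2 feature vectors within a search window. A naive NumPy sketch of just the math (the paper's version is a batched, strided GPU op, not a Python loop):

```python
import numpy as np

def correlation(f1, f2, max_disp=3):
    """Naive correlation layer in the spirit of FlowNetCorr.

    f1, f2 -- feature maps of shape (c, h, w) from the two conv streams.
    Returns ((2*max_disp+1)**2, h, w): one response map per displacement.
    """
    c, h, w = f1.shape
    f2p = np.pad(f2, ((0, 0), (max_disp, max_disp), (max_disp, max_disp)))
    maps = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2p[:, dy:dy + h, dx:dx + w]
            # dot product over channels at every spatial position
            maps.append((f1 * shifted).sum(axis=0) / c)
    return np.stack(maps)

feat = np.random.randn(16, 8, 8)
print(correlation(feat, feat).shape)  # (49, 8, 8)
```

With a 3-pixel search radius you get 49 displacement channels, which is why the correlation output is so much wider than the input feature map.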



After a bunch of conv layers, it goes through a refinement stage; the output of the above architecture is the input to the refinement layer!



This pretty much summarizes the architecture. At the end, rather than implementing the last layer, I pass the feature matrix through a Linear layer and predict yaw and pitch with a ONE-HOT-vector kind of thing. If you have a better idea for the ONE-HOT vector alternative, just let me know!!
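One way that "ONE HOT kinda thing" could look: predict each angle as a distribution over discrete bins and take the expectation to get a continuous value back. This is a sketch of the idea only; the class name, bin count, and angle range below are my assumptions, not values from this repo:

```python
import torch
import torch.nn as nn

class AngleHead(nn.Module):
    """Hypothetical head: flatten flow features, predict yaw and pitch as
    softmax distributions over angle bins (a soft one-hot), then take the
    expectation to recover continuous angles in radians."""

    def __init__(self, in_features, n_bins=101, max_angle=0.1):
        super().__init__()
        self.fc = nn.Linear(in_features, 2 * n_bins)  # yaw bins + pitch bins
        self.register_buffer(
            "bins", torch.linspace(-max_angle, max_angle, n_bins))
        self.n_bins = n_bins

    def forward(self, x):
        logits = self.fc(x.flatten(1)).view(-1, 2, self.n_bins)
        probs = logits.softmax(dim=-1)
        return (probs * self.bins).sum(dim=-1)  # (B, 2): yaw, pitch

head = AngleHead(in_features=128)
out = head(torch.randn(4, 128))
print(out.shape)  # torch.Size([4, 2])
```

The expectation keeps the output differentiable and bounded to the bin range, which a hard argmax over a one-hot target would not.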

2. Global Motion Aggregation

3. MaskFlowNet --> no implementation

4. FlowNet2.0 --> no implementation

Navigation

Labelled dataset [by comma]
Unlabeled test dataset [by comma]
Eval script [by comma]
Models and training script
Setup
Pretrained weights
What the user sees (software)
What the user sees (webpage)
Segmentation

ToDo

  • Visualizing the data
  • MaskFlowNet
  • FlowNet corr (not as good as I thought)
  • Gma
  • Training the model (on Azure because I have a .edu email :) ). PS: Azure is useless!
  • SLAM
  • Write utility functions (done for FlowNet, working on GMA)
  • Build and Deploy with QT5 in the pedal repo.
  • Update README
  • Segment comma 10k dataset
  • Pilotnet
  • Implement the research papers from George and do some visualization on them; use future images for predicted lines (yaw and pitch)
  • Depth_net

I have to deploy it, retrain it on new data, and keep on doing that!
For now I'm not doing it in real time; with time I will make this thing work with CARLA.

I'm too lazy to complete the code. If there is anyone who wants to complete it for me, go on!!

How to tinker/use the code?

  • you can monitor the training process with tensorboard:
tensorboard --port=PORT --logdir=pretrained
  • the pretrained model is a little too heavy for GitHub, so it's uploaded to Google Drive:
https://drive.google.com/file/d/1kxpD8DmL-CQIB02zxah_-BIoM6spcBJF/view?usp=sharing
  • training script for flownetCorr is here
  - python train_flownetcorr --help (for all the arguments and folder locations)
  - the training loop is in the 'train' function.
  - the validation loop is in the 'validation' function.
  - there are relevant comments before every piece of code, so it is not that tough to identify and change stuff.
  - it uses MSE loss, i.e. the mean of the squared errors over the batch.

  • FlownetCorr model is here
  -

Adding soon, be patient!

An example of how open source is changing the world!!

comma ai

calib-challenge's People

Contributors

shauray8

