This project studies image understanding with deep learning. As the name IUDLM suggests, it involves Image Understanding (object detection, localization, recognition, segmentation, understanding), Deep Learning (CNN, RNN, RL), and Mathematics (optimization, statistics).
Our goal is to combine deep learning and object detection. For an overview of the framework, we refer to Object Detection with Deep Learning: A Review -- Zhong-Qiu Zhao. We will reuse our machine learning algorithms and the Cameo architecture.
We extract the core idea from the review Recent Advances in Deep Learning for Object Detection -- Xiongwei Wu. We then summarize the design guidelines as a manual, based on the reference and our experience. We can build the computational model by assembling these blocks.
The whole system is listed in the folder DL4CV. It consists of:
- Deep Learning (keras/tensorflow)
1.1 iudlm -> dataloader
1.2 iudlm -> IO
1.3 iudlm -> model
1.4 iudlm -> preprocessor
1.5 iudlm -> utils
- Computer Vision (OpenCV)
2.1 videoanalysis -> CaptureManager
2.2 videoanalysis -> WindowManager
2.3 ...
- Real-Time Application
3.1 Cameo
3.2 ...
- Neural Network
- Probabilistic Graphical Model
- Solver
- Object Detection
- Object Recognition
- Segmentation
- Localization
- Optimization
- Statistics
- Region proposal based (R-CNN)
- Regression/Classification based (YOLO)
- Prototype
- Optimization
- Established
- Standard
- Class
- Abstraction
- Python/PyCharm
- Tensorflow
- OpenCV
- Numpy
- Pandas
- Matplotlib
- Sklearn
- Scipy
- MATLAB
- C++
class ClassName(object):
    def __init__(self):
        # variables
        pass

    def method(self):
        # operations
        pass
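As a concrete instance of the class template above, here is a minimal sketch of a preprocessor. The name `ScalePreprocessor` and its `factor` parameter are hypothetical and chosen for illustration only; they are not taken from the iudlm package.

```python
class ScalePreprocessor(object):
    """Hypothetical preprocessor: rescales pixel values from [0, 255] to [0, 1]."""

    def __init__(self, factor=255.0):
        # variables: state shared by all method calls
        self.factor = float(factor)

    def preprocess(self, image):
        # operations: image is a nested list (rows of pixel intensities)
        return [[pixel / self.factor for pixel in row] for row in image]
```

Keeping the state in `__init__` and the work in one method means several such objects can be applied to an image in sequence.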
Reference: Selective Search for Object Recognition -- J.R.R. Uijlings
Problem: Generating possible object locations for use in object recognition
Solution: Selective Search
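The hierarchical grouping at the heart of Selective Search can be sketched as follows. This is a simplified illustration, not the reference implementation: it assumes each region is a dict with hypothetical keys `box`, `size`, and `hist` (a normalized colour histogram), it uses only colour and size similarity (the paper also uses texture and fill), and it approximates adjacency by touching bounding boxes. In the paper, the initial regions come from graph-based segmentation.

```python
def boxes_touch(a, b):
    """True if two (x0, y0, x1, y1) boxes (inclusive pixels) overlap or share a border."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1 and
            a[1] <= b[3] + 1 and b[1] <= a[3] + 1)


def similarity(r, s, image_size):
    """Colour similarity (histogram intersection) plus size similarity."""
    colour = sum(min(p, q) for p, q in zip(r["hist"], s["hist"]))
    size = 1.0 - (r["size"] + s["size"]) / float(image_size)
    return colour + size


def merge(r, s):
    """Merge two regions; the new histogram is the size-weighted average."""
    total = r["size"] + s["size"]
    hist = [(p * r["size"] + q * s["size"]) / total
            for p, q in zip(r["hist"], s["hist"])]
    box = (min(r["box"][0], s["box"][0]), min(r["box"][1], s["box"][1]),
           max(r["box"][2], s["box"][2]), max(r["box"][3], s["box"][3]))
    return {"box": box, "size": total, "hist": hist}


def hierarchical_grouping(regions, image_size):
    """Greedily merge the most similar neighbouring regions.

    Returns the boxes of all initial and merged regions as proposals.
    """
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if boxes_touch(regions[i]["box"], regions[j]["box"]):
                    score = similarity(regions[i], regions[j], image_size)
                    if best is None or score > best[0]:
                        best = (score, i, j)
        if best is None:  # no neighbouring pair left to merge
            break
        _, i, j = best
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        proposals.append(merged["box"])
    return proposals
```

Because every intermediate merge contributes a box, the proposals cover objects at several scales, which is the main point of the grouping step.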
Reference: Efficient Graph-Based Image Segmentation -- Pedro F. Felzenszwalb
Problem: segmenting an image into regions
Solution: Graph-Based Image Segmentation
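The merging rule of graph-based segmentation can be sketched in pure Python as below. This is a minimal sketch under stated assumptions: the image is a grayscale nested list, the graph is 4-connected, and the Gaussian smoothing and minimum-size post-processing of the published code are omitted. The function names `segment_graph` and `segment_image` are illustrative choices, and `k` is the scale parameter (larger `k` favours larger regions).

```python
def segment_graph(num_vertices, edges, k):
    """Merge components in order of increasing edge weight.

    edges: list of (weight, a, b) tuples.
    Returns the component representative of each vertex.
    """
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    # threshold[r] = Int(C) + k / |C| for the component rooted at r
    threshold = [float(k)] * num_vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for weight, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb and weight <= threshold[ra] and weight <= threshold[rb]:
            parent[rb] = ra
            size[ra] += size[rb]
            # Edges arrive in increasing order, so this weight is the
            # internal difference of the merged component.
            threshold[ra] = weight + float(k) / size[ra]
    return [find(v) for v in range(num_vertices)]


def segment_image(image, k):
    """Segment a grayscale image (list of rows) on a 4-connected grid graph."""
    height, width = len(image), len(image[0])
    edges = []
    for y in range(height):
        for x in range(width):
            v = y * width + x
            if x + 1 < width:
                edges.append((abs(image[y][x] - image[y][x + 1]), v, v + 1))
            if y + 1 < height:
                edges.append((abs(image[y][x] - image[y + 1][x]), v, v + width))
    labels = segment_graph(height * width, edges, k)
    return [labels[y * width:(y + 1) * width] for y in range(height)]
```

For example, `segment_image([[10, 11, 200, 201], [10, 12, 199, 200]], k=5)` separates the dark left half from the bright right half, because the edges crossing the boundary are far heavier than either region's merge threshold.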
Reference: Rich feature hierarchies for accurate object detection and semantic segmentation -- Ross Girshick
Framework: R-CNN: Regions with CNN features
Modules
- Region proposals
- Feature extraction
- Classification
Here we focus on region proposals; the other modules are already built. Once the region proposals module is finished, we can build R-CNN and its variants.
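How the three modules fit together can be sketched as a single function that takes each module as a callable. The function name `rcnn_detect`, its parameters, and the `score_threshold` default are hypothetical; region warping, non-maximum suppression, and bounding-box regression from the R-CNN paper are left out of this sketch.

```python
def rcnn_detect(image, propose_regions, extract_features, classify,
                score_threshold=0.5):
    """Run the three R-CNN modules in sequence.

    propose_regions(image) -> list of (x0, y0, x1, y1) boxes
    extract_features(image, box) -> feature vector for the region
    classify(features) -> (label, score)
    """
    detections = []
    for box in propose_regions(image):
        features = extract_features(image, box)
        label, score = classify(features)
        if score >= score_threshold:
            detections.append((box, label, score))
    return detections
```

Passing the modules in as arguments lets the region proposal step be swapped (for example, Selective Search now and a learned proposal network later) without touching feature extraction or classification.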
We refer to the source code mentioned in Efficient Graph-Based Image Segmentation. We will write the prototype in Python.
We build a rough prototype using Python.
We build the prototype using object-oriented programming.
Reference: Lifelong Machine Learning Systems: Beyond Learning Algorithms -- Daniel L. Silver
The goal is to sequentially retain learned knowledge and to selectively transfer that knowledge when learning a new task so as to develop more accurate hypotheses or policies.
We went over the Deep learning notes cmu -- Deep learning. They introduce deep learning with neural networks.
Rich feature hierarchies for accurate object detection and semantic segmentation -- Ross Girshick proposes R-CNN -- Regions with CNN features.
Reference: Matching Networks for One Shot Learning -- Oriol Vinyals. It addresses one-shot learning, which consists of learning a class from a single labelled example.
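The prediction rule of a matching network can be sketched as an attention-weighted vote over the labelled support examples. This is a minimal sketch that assumes the query and support examples are already embedded as non-zero vectors (the learned embedding networks are omitted); the function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matching_predict(query, support, num_classes):
    """Return class probabilities for the query.

    support: list of (embedding, class_index) pairs.
    The attention weights are a softmax over cosine similarities.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]  # numerically stable softmax
    total = sum(exps)
    probs = [0.0] * num_classes
    for weight, (_, label) in zip(exps, support):
        probs[label] += weight / total
    return probs
```

With one support example per class, a query is assigned to the class whose single example it most resembles, which is the one-shot setting described above.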
We review Faster R-CNN and write down a summary. We focus on two modules: the Region Proposal Network (RPN) and Region of Interest (ROI) pooling. We use Keras to build a MiniVGGNet as the base network.
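Two building blocks of these modules can be sketched without any deep learning framework: anchor generation for the RPN and max pooling over a region of interest. This is an illustrative sketch, not our Keras implementation; the function names, the `stride` argument, and the bin-boundary rule are assumptions, and the ROI is assumed to lie inside the feature map.

```python
import math


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Anchors as (x0, y0, x1, y1), centred on each feature-map cell.

    Each anchor has area scale**2 and aspect ratio = width / height.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool an ROI (x0, y0, x1, y1; inclusive cell indices) to out_h x out_w."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for i in range(out_h):
        ys = y0 + (i * roi_h) // out_h                      # floor
        ye = y0 + ((i + 1) * roi_h + out_h - 1) // out_h    # ceil, exclusive
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the ROI size, the pooled output has a fixed shape, so regions of different sizes can feed the same fully connected layers.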
Reference: SCARLET-NAS: Bridging the gap Between Scalability and Fairness in Neural Architecture Search -- Xiangxiang Chu. It proposes an architecture search approach that bridges the gap between scalability and fairness using a linear transformation. The search can be cast as a multi-objective optimization problem, so the optimal architecture is found by solving that problem.
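To illustrate what a multi-objective formulation means here, the sketch below keeps only the Pareto-optimal candidates when accuracy is maximized and cost is minimized. It does not reproduce the search procedure of the paper; the `(name, accuracy, cost)` tuple layout and the function names are assumptions made for this example.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one.

    Each candidate is (name, accuracy, cost): accuracy is maximized,
    cost (for example FLOPs) is minimized.
    """
    no_worse = a[1] >= b[1] and a[2] <= b[2]
    better = a[1] > b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

With several objectives there is usually no single best architecture, only a set of trade-offs; the final choice is made among the candidates on this front.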
Deep Learning for Computer Vision, Layered Software Design, Object Detection Framework.