microsoft / computervision-recipes

Best Practices, code samples, and documentation for Computer Vision.

License: MIT License

Jupyter Notebook 97.46% Python 2.26% Shell 0.01% Dockerfile 0.01% HTML 0.06% C++ 0.03% Cuda 0.11% C 0.01% JavaScript 0.05% CSS 0.01%

machine-learning computer-vision deep-learning python jupyter-notebook operationalization kubernetes azure microsoft data-science

computervision-recipes's Introduction

Update July: Added support for action recognition and tracking in the new release v1.2.

Computer Vision

In recent years, we've seen extraordinary growth in Computer Vision, with applications in face recognition, image understanding, search, drones, mapping, and semi-autonomous and autonomous vehicles. Key to many of these applications are visual recognition tasks such as image classification, object detection and image similarity.

This repository provides examples and best practice guidelines for building computer vision systems. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in Computer Vision algorithms, neural architectures, and operationalizing such systems. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utility around loading image data, optimizing and evaluating models, and scaling up to the cloud. In addition, having worked in this space for many years, we aim to answer common questions, point out frequently observed pitfalls, and show how to use the cloud for training and deployment.

We hope that these examples and utilities can reduce the “time to market” by orders of magnitude by simplifying the path from defining the business problem to developing a solution. In addition, the example notebooks serve as guidelines and showcase best practices and usage of the tools in a wide variety of languages.

These examples are provided as Jupyter notebooks and common utility functions. All examples use PyTorch as the underlying deep learning library.

Examples

This repository supports various Computer Vision scenarios which either operate on a single image or, as with action recognition, take a video sequence as input.

Target Audience

Our target audience for this repository includes data scientists and machine learning engineers with varying levels of Computer Vision knowledge as our content is source-only and targets custom machine learning modelling. The utilities and examples provided are intended to be solution accelerators for real-world vision problems.

Getting Started

To get started, navigate to the Setup Guide, which lists instructions on how to set up the compute environment and dependencies needed to run the notebooks in this repo. Once your environment is set up, navigate to the Scenarios folder and start exploring the notebooks. We recommend starting with the image classification notebooks, since these introduce concepts which are also used by the other scenarios (e.g. pre-training on ImageNet).

Alternatively, we support Binder, which makes it easy to try one of our notebooks in a web browser simply by following this link. However, Binder is free and, as a result, comes with only limited CPU compute power and no GPU support. Expect the notebooks to run very slowly (reducing image resolution to e.g. 60 pixels helps somewhat, but at the cost of low accuracy).

Scenarios

The following is a summary of commonly used Computer Vision scenarios that are covered in this repository. For each of the main scenarios ("base"), we provide the tools to effectively build your own model. These range from simple tasks, such as fine-tuning a model on your own data, to more complex tasks such as hard-negative mining and even model deployment.

Scenario	Support	Description
Classification	Base	Image Classification is a supervised machine learning technique to learn and predict the category of a given image.
Similarity	Base	Image Similarity is a way to compute a similarity score given a pair of images. Given an image, it allows you to identify the most similar image in a given dataset.
Detection	Base	Object Detection is a technique that allows you to detect the bounding box of an object within an image.
Keypoints	Base	Keypoint detection can be used to detect specific points on an object. A pre-trained model is provided to detect body joints for human pose estimation.
Segmentation	Base	Image Segmentation assigns a category to each pixel in an image.
Action recognition	Base	Action recognition identifies which actions are performed in video/webcam footage (e.g. "running", "opening a bottle") and at what respective start/end times. An i3d implementation of action recognition is also available under contrib.
Tracking	Base	Tracking allows you to detect and track multiple objects in a video sequence over time.
Crowd counting	Contrib	Counting the number of people in low-crowd-density (e.g. fewer than 10 people) and high-crowd-density (e.g. thousands of people) scenarios.
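
As a rough illustration of the Similarity scenario in the table above, the sketch below ranks a few reference images by their similarity to a query image. It assumes every image has already been converted to a feature vector (for example by a DNN). The vectors, the file names, and the `rank_by_similarity` helper are hypothetical and are not part of this repository's API, and cosine similarity is only one possible choice of score.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, references):
    """Return (name, score) pairs, most similar reference first."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional features; real DNN features have hundreds of dimensions.
references = {
    "can.jpg": [0.9, 0.1, 0.0],
    "bottle.jpg": [0.1, 0.9, 0.2],
    "carton.jpg": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(query, references)
print(ranking[0][0])  # name of the most similar reference image
```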

We separate the supported CV scenarios into two locations: (i) base: code and notebooks within the "utils_cv" and "scenarios" folders which follow strict coding guidelines, are well tested and maintained; (ii) contrib: code and other assets within the "contrib" folder, mainly covering less common CV scenarios using bleeding edge state-of-the-art approaches. Code in "contrib" is not regularly tested or maintained.

Computer Vision on Azure

Note that for certain computer vision problems, you may not need to build your own models. Instead, pre-built or easily customizable solutions exist on Azure which do not require any custom coding or machine learning expertise. We strongly recommend evaluating if these can sufficiently solve your problem. If these solutions are not applicable, or the accuracy of these solutions is not sufficient, then resorting to more complex and time-consuming custom approaches may be necessary.

The following Microsoft services offer simple solutions to address common computer vision tasks:

Vision Services are a set of pre-trained REST APIs which can be called for image tagging, face recognition, OCR, video analytics, and more. These APIs work out of the box and require minimal expertise in machine learning, but have limited customization capabilities. See the various demos available to get a feel for the functionality (e.g. Computer Vision). The service can be used through API calls or through SDKs (available in .NET, Python, Java, Node and Go languages).
Custom Vision is a SaaS service to train and deploy a model as a REST API given a user-provided training set. All steps including image upload, annotation, and model deployment can be performed using an intuitive UI or through SDKs (available in .NET, Python, Java, Node and Go languages). Training image classification or object detection models can be achieved with minimal machine learning expertise. Custom Vision offers more flexibility than the pre-trained cognitive services APIs, but requires the user to bring and annotate their own data.

If you need to train your own model, the following services and links provide additional information that is likely useful.

Azure Machine Learning service (AzureML) is a service that helps users accelerate the training and deployment of machine learning models. While not specific to computer vision workloads, the AzureML Python SDK can be used for scalable and reliable training and deployment of machine learning solutions to the cloud. We leverage Azure Machine Learning in several of the notebooks within this repository (e.g. deployment to Azure Kubernetes Service).
Azure AI Reference architectures provide a set of examples (backed by code) of how to build common AI-oriented workloads that leverage multiple cloud components. While not computer vision specific, these reference architectures cover several machine learning workloads such as model deployment or batch scoring.

Build Status

AzureML Testing

Build Type	Branch	Branch
Linux GPU	master	staging
Linux CPU	master	staging
Notebook unit GPU	master	staging

Contributing

This project welcomes contributions and suggestions. Please see our contribution guidelines.

computervision-recipes's Issues

Feature ISeg: show how to create and use a custom labelled segmentation dataset

As the title says, the scope is:

find a custom unlabelled dataset, or find a labelled dataset and show how to create segmentation labels yourself. The dataset has to have continuity with the datasets used in #49 and #47.
show how to label this dataset (create semantic segmentation labels for each pixel) with a custom smart labelling tool such as http://www.cs.toronto.edu/~amlan/demo/ (polygon RNN)
write a notebook detailing this work and training a simple DNN model for semantic segmentation.

The scope of the work includes using a pre-trained semantic segmentation tool (it does not include showing how to train such a tool).

Feature ISeg: Build basic 2D image segmentation notebook

Similar to IC, build a notebook called "01_training_introduction.ipynb" which introduces the problem, without going into details.
This notebook can be heavily based on fast.ai's 2019 lesson3-camvid.ipynb and lesson3-camvid-tiramisu.ipynb notebooks:
https://github.com/fastai/course-v3/blob/master/nbs/dl1/

Feature ISeg: Find good default parameters for all segmentation use cases

Similar to IC, find good default parameters for (i) high accuracy (e.g. deep model, high resolution); and for (ii) fast inference (e.g. low resolution) using ~5 image segmentation datasets.

Feature IC: write 01b notebook to support multi-class image classification.

Feature: automatic code quality monitoring

Using e.g. flake8.

Feature IC: write user installation guide

Feature ISeg: 2.5D semantic segmentation notebook

TODO: placeholder for 2.5D semantic segmentation regarding how to work with 2.5D data in general - nothing to do with oil and gas.

Feature IC: Support self-trained model in webcam demo

Feature IC: give examples of different verticals.

Feature IC: create toy dataset for single-class image classification

E.g. using cans, bottles, and milk boxes.

Single-label image classification: single object per image. Maybe also some "negative" images without a single object-of-interest in them.
Multi-label image classification: 0 or more objects in image.

Maybe 50 images for each of the two datasets?

Feature IC: add more content to initial training notebook.

Feature IC: hyperdrive parameter estimation.

Important to show if/what the value is of parameter optimization using hyperdrive versus using default parameters.

Feature IC: Accuracy comparison CVBP vs CVS and provide feedback

Feature IC: ACS deployment

[BUG] Test errors in test_default_sweeper_*

tests/unit/test_experiments.py:26:

>           mkdir(name, mode)
E           FileExistsError: [Errno 17] File exists: 'tmp_data'

tests/unit/test_experiments.py:71: AssertionError

>       assert df.mean(level=(2)).loc["fridgeObjects", "accuracy"] > 0.85
E       assert 0.7954545617103577 > 0.85
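
The `FileExistsError` in the first failure is what `mkdir` raises when the target directory already exists, for example when an earlier test run left `tmp_data` behind. One possible way to make such a setup step idempotent is sketched below. The `ensure_dir` helper is hypothetical, and this is not necessarily how the test suite was actually fixed; creating a fresh temporary directory per test is another option.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and parents) if needed; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


# Use a throwaway root so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "tmp_data")
    ensure_dir(data_dir)
    # A second call would raise FileExistsError without exist_ok=True.
    ensure_dir(data_dir)
    created = os.path.isdir(data_dir)

print(created)
```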

Feature ISeg: point back to classification and say how it relates to segmentation

Create a notebook which compares the fast.ai U-net model with the pixel-level classification model at https://github.com/waldeland/CNN-for-ASI and explains how classification compares to segmentation (semantic 2D segmentation in this case).

Feature IC: Fastai Learner callback_fn to record training accuracy

Description

The default fastai record callback is useful but limited.
Add custom callback function so that we can track more information while training.

Feature IC: create webcam demo notebook using a pre-trained model.

Feature ISeg: push baseline classification model for segmentation to fast.ai

Implement the model from https://github.com/waldeland/CNN-for-ASI for fast.ai in PyTorch, push it to fast.ai and expose it as a baseline learner (show the pixel-level classification model from this link compared to, say, the UNet model in #47). Could be added to the #47 notebook to show how much slower classification models are when used for segmentation.

Feature IC: additional gpu guidance in webcam notebook

In the webcam notebook, we print out the name of the device that will be used to train your model. Would it be possible to provide some guidance, for the users, on:

how to know whether their machine has a GPU installed or not
what they need to do to get the model to train on the GPU instead of a CPU, if the former is present
what to expect if their machine doesn't have a GPU.
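
A minimal sketch of what such guidance could build on, assuming PyTorch as the underlying library (as stated in the introduction): `torch.cuda.is_available()` reports whether a CUDA-capable GPU can be used. The `pick_device` helper is hypothetical, and it falls back to the CPU when PyTorch itself is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch can use a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on the GPU from here.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
if device == "cuda":
    print("GPU found: training will run on the GPU.")
else:
    print("No usable GPU found: training will run on the CPU and will be much slower.")
```

In plain PyTorch, a model or tensor is then moved to the chosen device with `.to(device)`.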

Feature IC: Conversion and inference using ONNX

Feature IC: Visualization of training and test loss/accuracy during training.

Feature: Automatic testing win+linux GPU

Might want to support both Windows and Linux.

Feature ISeg: Compile multiple 2D image segmentation datasets

Compile ~5 image segmentation datasets. Similar to IC, these will be used to find good default parameters, for integration tests, etc.
The datasets should be small since (i) otherwise running parameter tuning takes a long time; and (ii) most customers only have 100s of images, even for image classification.

Feature ISeg: 3D semantic segmentation use case for oil and gas

TODO: placeholder for 3D semantic segmentation specific for oil and gas based on existing MSRA work (Beijing).

Feature IC: Kubernetes deployment

Feature IC: Find good default parameters

Find default parameters (e.g. learning rate) which work well for many IC problems, for shallow/deep models, for low/high resolution images, etc. See CVTK's default parameters table. Uses the internal datasets for single-class and multi-class image classification.
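
One way to frame this search, sketched under assumptions: evaluate every parameter combination on every dataset and keep the combination with the best mean accuracy. The `train_and_evaluate` function below is a placeholder that returns synthetic scores so that the sketch runs; in practice it would train and evaluate a model. The dataset names and the parameter grid are hypothetical.

```python
from itertools import product
from statistics import mean


def train_and_evaluate(dataset, learning_rate, image_size):
    """Placeholder: returns a synthetic score, NOT a real accuracy."""
    # Synthetic rule so the example is deterministic:
    # the score peaks at learning_rate=1e-4 and image_size=300.
    penalty = abs(learning_rate - 1e-4) * 1000 + abs(image_size - 300) / 1000
    offset = 0.05 if dataset == "dataset_b" else 0.0
    return 0.9 - penalty - offset


def find_default_parameters(datasets, grid):
    """Return the grid point with the highest mean score across all datasets."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(train_and_evaluate(d, **params) for d in datasets)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"learning_rate": [1e-3, 1e-4, 1e-5], "image_size": [200, 300, 500]}
best_params, best_score = find_default_parameters(["dataset_a", "dataset_b"], grid)
print(best_params)
```

Averaging over datasets favors parameters that are robust across problems rather than tuned to a single one.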

Feature ISeg: write user installation guide

Automatic testing Windows + Linux CPU

Feature IC: document existing MS products

Feature IC: Compile diverse evaluation datasets for single-class problems

Remark:

we already have 4-5 datasets for single-class problems which were used to evaluate the CVTK.
datasets should be diverse to cover large spectrum of use cases
datasets should be small (500-5000 images) so that parameter sweeps can be run efficiently

Feature ISeg: Build oil & gas use case notebook

Use the basic notebook in combination with an oil & gas dataset. Tweak parameters to improve accuracy, etc. As a second step, evaluate accuracy improvements using spatial consistency smoothing.

Feature ISeg: push DeepLab to fast.ai and expose in 2D segmentation benchmarks

Push DeepLab model https://github.com/lexfridman/mit-deep-learning/blob/master/tutorial_driving_scene_segmentation/tutorial_driving_scene_segmentation.ipynb to fast.ai and show how it compares to other models which we already wrote about in #47

Set up DevOps org for Azure pipelines and infrastructure to run tests on

Set up the https://dev.azure.com/best-practices org and the https://dev.azure.com/best-practices/computervision project for CV.
Set up DSVMs (Linux and Windows, GPU supported) to use as private agents.

Feature: human code quality monitoring

Nominate a code champion who will monitor and drive code quality.

Feature IC: AML training.

Widgets may not be rendered correctly on some browsers.

Widgets may not be rendered correctly on some browsers. This should be mentioned in the notebooks (mainly in the webcam notebook).

Feature IC: hard-negative mining

Ideally show (or at the very least explain) an example where hard-negative mining makes a big difference. E.g. training a deer classifier, which also fires on trees.
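
The idea can be sketched as follows, under assumptions: score a pool of images known to be negatives with the current classifier, then add the highest-scoring ones (the mistakes the model is most confident about) to the training set and retrain. The scores below are illustrative placeholder values, not results, and the file names and the `mine_hard_negatives` helper are hypothetical.

```python
def mine_hard_negatives(negative_scores, threshold=0.5, max_count=None):
    """Return the negatives wrongly scored at or above `threshold`,
    most confident mistakes first."""
    hard = [(name, score) for name, score in negative_scores.items() if score >= threshold]
    hard.sort(key=lambda pair: pair[1], reverse=True)
    if max_count is not None:
        hard = hard[:max_count]
    return [name for name, _ in hard]


# Placeholder "deer" scores for images that contain no deer.
negative_scores = {
    "tree_01.jpg": 0.92,
    "tree_02.jpg": 0.71,
    "road_01.jpg": 0.08,
    "lake_01.jpg": 0.15,
    "tree_03.jpg": 0.55,
}

hard_negatives = mine_hard_negatives(negative_scores, threshold=0.5, max_count=2)
print(hard_negatives)  # these would be added to the training set before retraining
```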

Feature IC: Support and describe different image augmentations

In addition to code, this should also include a description of when to use and when not to use various image augmentations.

Feature IC: quantify difference in training time when using optimized Pillow.

Measure how/if training time decreases when using an optimized Pillow package.

Fastai's conda (test) channel has an experimental pillow package built against a custom build of libjpeg-turbo. See: https://docs.fast.ai/performance.html
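
A simple way to take such a measurement, sketched with the standard library only: time the same workload several times in each environment and compare the medians. The `workload` function below is a stand-in; in practice it would be one training epoch (or just the image-loading loop), run once with the stock Pillow package and once with the optimized build.

```python
import statistics
import time


def time_workload(workload, repeats=5):
    """Run `workload` `repeats` times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def workload():
    # Stand-in for "decode and augment a batch of images".
    sum(i * i for i in range(100_000))


median_seconds = time_workload(workload, repeats=3)
print(f"median: {median_seconds:.4f} s")
```

The median is used rather than the mean so that a single slow run (e.g. a cold disk cache) does not dominate the comparison.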

Feature ISeg: AML hyperparameter tuning of segmentation models

TODO: this is a placeholder for hyperparameter tuning. Pending on 2D semantic segmentation and 2D Oil and Gas scenarios, we will see what dataset and model we want to pick for this notebook - but the accent should be on scalable hyperparameter tuning of segmentation models.

Feature IC: Support different DNN architectures

Should at the very least support a high-accuracy model (e.g. ResNet) and a high-speed model (e.g. MobileNet)

Feature IC: support changing image resolution during inference (without re-training).

Add documentation for code quality, notebook conversion and running hooks.

Added documentation for working with git hooks and pre-commit
how to use black and flake8
How to work with notebooks.

Feature IC: Visualizations using Jupyter Widgets

Feature IC: Show how to extract DNN features for a given image(s)

Ideally this should use batching to speed up feature computation. Also, could include a toy example, e.g. k-means clustering of an image dataset
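
A toy version of the clustering step, assuming the DNN features have already been extracted into plain lists of floats. This is a bare-bones Lloyd's k-means written for illustration (the initial centers are simply the first `k` points); a library implementation would normally be preferable. The feature values are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; returns (centers, cluster index per point)."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, assignments


# Made-up 2-dimensional "features" forming two obvious groups.
features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
centers, assignments = kmeans(features, k=2)
print(assignments)
```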

Feature IC: document pitfalls and guidelines

Examples which are, and which are not suited for image classification and object detection
- For example:
  - Image classification: area of interest is tiny in the image (e.g. logo far in the background)
  - Object detection: object to be found is hard to define where it starts/ends
Annotation
- How to annotate / how to not annotate
Common pitfalls and how to avoid
Trade-off speed versus accuracy in training and inference
etc.