
TinyML and Efficient Deep Learning Computing

Course topic: TinyML and Efficient Deep Learning Computing
Instructor: Song Han (Associate Professor, MIT EECS)
[schedule(2023 Fall)] | [schedule(2022 Fall)] | [youtube]

💡 Goals

  • Study efficient inference methods

    Study algorithms that improve the efficiency of deep learning computation.

  • Build deep learning models under resource constraints

    Construct efficient deep learning models that fit the constraints of the target device.


🚩 List of Summarized Documents

📖 Basics of Deep Learning

  • Efficiency Metrics

    latency, storage, energy

    Memory-Related(#parameters, model size, #activations), Computation(MACs, FLOPs)
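
    The following is a minimal sketch of how the computation- and memory-related metrics above are typically counted for a standard convolution layer; the layer dimensions in the example call are illustrative.

```python
# Minimal sketch: parameter count and MACs of a standard convolution layer.
# Assumes square kernels; the example dimensions below are illustrative.

def conv_params(c_in, c_out, k, bias=True):
    """#parameters = c_out * c_in * k * k (+ c_out biases)."""
    return c_out * c_in * k * k + (c_out if bias else 0)

def conv_macs(c_in, c_out, k, h_out, w_out):
    """MACs = c_out * h_out * w_out * c_in * k * k; FLOPs ≈ 2 * MACs."""
    return c_out * h_out * w_out * c_in * k * k

# Example: 3x3 conv, 64 -> 128 channels, 56x56 output feature map
print(conv_params(64, 128, 3))          # 73,856 parameters
print(conv_macs(64, 128, 3, 56, 56))    # ~231M MACs
```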

📔 Efficient Inference

  • Pruning Granularity, Pruning Criterion

    unstructured/structured pruning

    magnitude-based pruning(L1-norm), second-order-based pruning, percentage-of-zero-based pruning, regression-based pruning
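
    Below is a small sketch of the magnitude-based (L1-norm) criterion, in both an unstructured (per-weight) and a structured (per-output-channel) form; the tensor shape and sparsity target are illustrative.

```python
import numpy as np

def magnitude_prune_unstructured(w, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)

def magnitude_prune_channels(w, sparsity):
    """Structured variant: zero whole output channels with the smallest L1 norm.
    w has shape (c_out, c_in, kh, kw)."""
    c_out = w.shape[0]
    n_prune = int(c_out * sparsity)
    l1 = np.abs(w).reshape(c_out, -1).sum(axis=1)
    prune_idx = np.argsort(l1)[:n_prune]
    w_pruned = w.copy()
    w_pruned[prune_idx] = 0.0
    return w_pruned

w = np.random.randn(8, 4, 3, 3)
print((magnitude_prune_unstructured(w, 0.5) == 0).mean())  # ≈ 0.5
```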

  • Automatic Pruning, Lottery Ticket Hypothesis

    Pruning Ratio, Sensitivity Analysis, Automatic Pruning(AMC, NetAdapt)

    Lottery Ticket Hypothesis(Winning Ticket, Iterative Magnitude Pruning, Scaling Limitation), Pruning with Regularization

    Pruning at Initialization(Connection Sensitivity)
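
    A rough sketch of Lottery-Ticket-style iterative magnitude pruning follows; `train_fn` is a hypothetical stand-in for full training (the toy example below merely nudges the weights), and the schedule of per-round pruning is one reasonable choice, not the paper's exact recipe.

```python
import numpy as np

def iterative_magnitude_pruning(w_init, train_fn, target_sparsity, n_rounds):
    """Lottery-ticket-style sketch: repeatedly train, prune a fraction of the
    remaining weights by magnitude, then rewind to the original initialization.
    `train_fn(w, mask)` is a stand-in for full training; it returns trained weights."""
    mask = np.ones_like(w_init)
    prune_per_round = 1.0 - (1.0 - target_sparsity) ** (1.0 / n_rounds)
    for _ in range(n_rounds):
        w_trained = train_fn(w_init * mask, mask)
        remaining = np.abs(w_trained[mask == 1])
        k = int(remaining.size * prune_per_round)
        if k > 0:
            threshold = np.sort(remaining)[k - 1]
            mask = np.where((np.abs(w_trained) > threshold) & (mask == 1), 1.0, 0.0)
    return w_init * mask, mask   # the "winning ticket": original init + final mask

# Toy stand-in for training: just nudge the weights.
dummy_train = lambda w, m: (w + 0.01 * np.random.randn(*w.shape)) * m
w0 = np.random.randn(256)
ticket, mask = iterative_magnitude_pruning(w0, dummy_train, target_sparsity=0.9, n_rounds=5)
print(mask.mean())   # ≈ 0.1 of the weights remain
```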

  • System & Hardware Support for Sparsity

    EIE(CSC format: relative index, column pointer)

    M:N Sparsity
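
    The sketch below illustrates M:N sparsity for the common 2:4 case: within every group of four consecutive weights, only the two largest magnitudes are kept. The input shape is illustrative.

```python
import numpy as np

def prune_2_to_4(w):
    """2:4 (M:N) sparsity sketch: in every group of 4 consecutive weights,
    keep the 2 largest magnitudes and zero the rest."""
    flat = w.reshape(-1, 4)
    # indices of the 2 smallest |w| in each group of 4
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    mask = np.ones_like(flat)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (flat * mask).reshape(w.shape)

w = np.random.randn(8, 16)          # last dimension must be divisible by 4
w_sparse = prune_2_to_4(w)
print((w_sparse.reshape(-1, 4) != 0).sum(axis=1))   # every group has exactly 2 nonzeros
```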


  • Basic Concepts of Quantization

    Numeric Data Types: Integer, Fixed-Point, Floating-Point(IEEE FP32/FP16, BF16, NVIDIA FP8), INT4 and FP4

    Uniform vs Non-uniform quantization, Symmetric vs Asymmetric quantization

  • Vector Quantization, Linear Quantization

    Vector Quantization(VQ): Deep Compression(iterative pruning, retrain codebook, Huffman encoding), Product Quantization(PQ): And the Bit Goes Down

    Linear Quantization: Zero point, Scaling Factor, Quantization Error(clip error, round error), Linear Quantized Matrix Multiplication(FC layer, Conv layer)
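
    A minimal sketch of asymmetric linear quantization with a scaling factor and zero point, assuming a min-max range estimate; the dequantized tensor differs from the original by the round/clip error.

```python
import numpy as np

def linear_quantize(x, n_bits=8):
    """Asymmetric linear quantization sketch: q = round(x / S) + Z, clipped to the
    integer range. Returns the int tensor plus (S, Z) needed for dequantization."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    """x_hat = S * (q - Z); the difference from x is the quantization error."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = linear_quantize(x)
print(np.abs(x - linear_dequantize(q, s, z)).max())   # small quantization error
```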

  • Post Training Quantization

    Weight Quantization: Per-Tensor vs Per-Channel Quantization, Group Quantization(Per-Vector, MX), Weight Equalization, Adaptive Rounding

    Activation Quantization: During training(EMA), Calibration(Min-Max, KL-divergence, Mean Squared Error)

    Bias Correction, Zero-Shot Quantization(ZeroQ)
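
    Below is a small sketch of Min-Max calibration for post-training activation quantization; the calibration batches here are random stand-ins, and KL-divergence or MSE calibration would instead search for a clipping range rather than take the raw extremes.

```python
import numpy as np

class MinMaxCalibrator:
    """Min-Max calibration sketch: run a few calibration batches, track the observed
    activation range, then fix the scale and zero point for inference."""
    def __init__(self):
        self.min_val, self.max_val = np.inf, -np.inf

    def observe(self, activation):
        self.min_val = min(self.min_val, float(activation.min()))
        self.max_val = max(self.max_val, float(activation.max()))

    def quant_params(self, n_bits=8):
        qmax = 2 ** n_bits - 1
        scale = (self.max_val - self.min_val) / qmax
        zero_point = int(round(-self.min_val / scale))
        return scale, zero_point

calib = MinMaxCalibrator()
for _ in range(16):                      # calibration batches (random stand-ins here)
    calib.observe(np.random.randn(32, 64))
print(calib.quant_params())
```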

  • Quantization-Aware Training, Low bit-width quantization

    Fake quantization, Straight-Through Estimator

    Binary Quantization(Deterministic, Stochastic, XNOR-Net), Ternary Quantization
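
    The following sketch shows fake quantization with the Straight-Through Estimator as it is commonly written in PyTorch; the scale and zero point are fixed illustrative values rather than learned or calibrated ones.

```python
import torch

def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Fake quantization for QAT: quantize then immediately dequantize, so the forward
    pass sees quantization error while the tensors stay in floating point."""
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_hat = (q - zero_point) * scale
    # Straight-Through Estimator: gradients flow through as if fake-quant were identity.
    return x + (x_hat - x).detach()

w = torch.randn(4, 4, requires_grad=True)
out = fake_quantize(w, scale=0.05, zero_point=128).sum()
out.backward()
print(w.grad)    # all ones: round()/clamp are bypassed by the STE
```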


  • Neural Architecture Search: basic concepts & manually-designed neural networks

    input stem, stage, head

    AlexNet, VGGNet, SqueezeNet(global average pooling, fire module, pointwise convolution), ResNet50(bottleneck block, residual learning), ResNeXt(grouped convolution)

    MobileNet(depthwise-separable convolution, width/resolution multiplier), MobileNetV2(inverted bottleneck block), ShuffleNet(channel shuffle), SENet(squeeze-and-excitation block), MobileNetV3(redesigning expensive layers, h-swish)
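
    A quick sketch comparing the MAC count of a standard convolution with a depthwise-separable one; the layer dimensions are illustrative, and the cost ratio roughly follows 1/k² + 1/c_out.

```python
def standard_conv_macs(c_in, c_out, k, h, w):
    return c_out * h * w * c_in * k * k

def depthwise_separable_macs(c_in, c_out, k, h, w):
    """Depthwise-separable = depthwise (k x k per channel) + pointwise (1x1) convolution."""
    depthwise = c_in * h * w * k * k
    pointwise = c_out * h * w * c_in
    return depthwise + pointwise

# Illustrative MobileNet-style setting: 3x3 kernels, 256 -> 256 channels, 14x14 map
std = standard_conv_macs(256, 256, 3, 14, 14)
sep = depthwise_separable_macs(256, 256, 3, 14, 14)
print(std / sep)   # ≈ 8-9x fewer MACs (roughly 1/k^2 + 1/c_out of the standard cost)
```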

  • Neural Architecture Search: RNN controller & search strategy

    cell-level search space, network-level search space

    design the search space: Cumulative Error Distribution, FLOPs distribution

    Search Strategy: grid search, random search, reinforcement learning, bayesian optimization, gradient-based search, evolutionary search

    EfficientNet(compound scaling), DARTS
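
    As a toy illustration of one of the search strategies above, the sketch below runs random search over a hypothetical cell-level search space; both the space and the `evaluate` proxy are illustrative stand-ins, not the course's definitions.

```python
import random

# Toy NAS sketch: random search over a small, hypothetical cell-level search space.
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "expand_ratio": [3, 4, 6],
    "depth": [2, 3, 4],
}

def sample_architecture():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for accuracy estimation (training, or a proxy such as Zen-NAS/GradSign)."""
    return random.random() - 0.01 * arch["kernel_size"] * arch["depth"]  # dummy score

best = max((sample_architecture() for _ in range(100)), key=evaluate)
print(best)
```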

  • Neural Architecture Search: Performance Estimation & Hardware-Aware NAS

    Weight Inheritance, HyperNetwork, Weight Sharing(super-network, sub-network)

    Performance Estimation Heuristics: Zen-NAS, GradSign

    Hardware-Aware NAS(ProxylessNAS, HAT), One-Shot NAS(Once-for-All)


  • Knowledge Distillation

    Knowledge Distillation(distillation loss, temperature)

    KD: matching intermediate weights/features/attention maps/sparsity pattern/relational information(layers, samples)
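
    A minimal sketch of the classic distillation loss with temperature, written in PyTorch; the temperature and mixing weight below are illustrative values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """KD sketch: soften teacher/student distributions with temperature T and combine
    the KL term with the ordinary cross-entropy on hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```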

  • Self Distillation, Online Distillation, Applications

    Self Distillation, Online Distillation, Combining Online and Self-Distillation, Network Augmentation

    Applications: Object Detection, Semantic Segmentation, GAN, NLP


  • MCUNet

    microcontroller, flash/SRAM usage, peak SRAM usage, MCUNet: TinyNAS, TinyEngine

    TinyNAS: automated search space optimization(weight/resolution multiplier), resource-constrained model specialization(Once-for-All)

    MCUNetV2: patch-based inference, network redistribution, joint automated search for optimization, MCUNetV2 architecture(VWW dataset inference)

    RNNPool, MicroNets(MOPs & latency/energy consumption relationship)
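
    The sketch below illustrates why peak SRAM usage, rather than total model size, is the binding constraint on microcontrollers: the layer whose input plus output activations are largest dominates. The layer shapes are illustrative, assuming int8 activations (1 byte each).

```python
# Sketch: peak SRAM estimate for a chain of layers, following the idea that the MCU
# must hold a layer's input and output activations in SRAM at the same time.

layers = [
    (3, 96, 96),     # input image (channels, height, width)
    (16, 48, 48),
    (24, 24, 24),
    (40, 12, 12),
]

def activation_bytes(shape):
    c, h, w = shape
    return c * h * w          # 1 byte per int8 activation

peak = max(activation_bytes(a) + activation_bytes(b) for a, b in zip(layers, layers[1:]))
print(f"peak SRAM ≈ {peak / 1024:.1f} KiB")   # the bottleneck layer pair dominates
```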

⚙️ Efficient Training and System Support

  • TinyEngine

    memory hierarchy of MCU, data layout(NCHW, NHWC, CHWN)

    TinyEngine: Loop Unrolling, Loop Reordering, Loop Tiling, SIMD programming, Im2col, In-place depthwise convolution, appropriate data layout(pointwise, depthwise convolution), Winograd convolution
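
    Below is a reference sketch of Im2col: unrolling k×k patches so a convolution becomes one matrix multiply. This naive version trades extra memory for a simple GEMM, which is exactly the overhead that tiled or in-place kernel strategies try to avoid.

```python
import numpy as np

def im2col(x, k, stride=1):
    """im2col sketch: unroll k x k patches of a (C, H, W) input into columns so that
    convolution reduces to a single matrix multiply."""
    c, h, w = x.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    cols = np.empty((c * k * k, h_out * w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, i * w_out + j] = patch.reshape(-1)
    return cols

x = np.random.randn(3, 8, 8).astype(np.float32)
weight = np.random.randn(16, 3 * 3 * 3).astype(np.float32)   # (c_out, c_in*k*k)
out = weight @ im2col(x, k=3)          # conv as GEMM: shape (16, 36)
print(out.shape)
```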


🔍 Schedule

Lecture 1: Introduction

[ slides ]

Lecture 2: Basics of Deep Learning

[ slides | video ]


Efficient Inference


Lecture 3: Pruning and Sparsity (Part I)

[ slides | video ]

Lecture 4: Pruning and Sparsity (Part II)

[ slides | video ]

Lecture 5: Quantization (Part I)

[ slides | video ]

Lecture 6: Quantization (Part II)

[ slides | video ]

Lecture 7: Neural Architecture Search (Part I)

[ slides | video ]

Lecture 8: Neural Architecture Search (Part II)

[ slides | video ]

Lecture 9: Neural Architecture Search (Part III)

[ slides | video ]

Lecture 10: Knowledge Distillation

[ slides | video ]

Lecture 11: MCUNet - Tiny Neural Network Design for Microcontrollers

[ slides | video ]

Lecture 12: Paper Reading Presentation


Efficient Training and System Support


Lecture 13: Distributed Training and Gradient Compression (Part I)

[ slides | video ]

Lecture 14: Distributed Training and Gradient Compression (Part II)

[ slides | video ]

Lecture 15: On-Device Training and Transfer Learning (Part I)

[ slides | video ]

Lecture 16: On-Device Training and Transfer Learning (Part II)

[ slides | video ]

Lecture 17: TinyEngine - Efficient Training and Inference on Microcontrollers

[ slides | video ]


Application-Specific Optimizations


Lecture 18: Efficient Point Cloud Recognition

[ slides | video ]

Lecture 19: Efficient Video Understanding and GANs

[ slides | video ]

Lecture 20: Efficient Transformers

[ slides | video ]


Quantum ML


Lecture 21: Basics of Quantum Computing

[ slides | video ]

Lecture 22: Quantum Machine Learning

[ slides | video ]

Lecture 23: Noise Robust Quantum ML

[ slides | video ]

Lecture 24: Final Project Presentation

Lecture 25: Final Project Presentation

Lecture 26: Course Summary & Guest Lecture

[ slides | video ]
