
SJTU_microe

This is a repository for FPGA-based neural network inference. The design delivers 70.3% Intersection over Union (IoU) and 249.380 fps with 4.95 W power consumption on the Ultra96v2, and won third place in the 58th IEEE/ACM Design Automation Conference System Design Contest (DAC-SDC). Designed by:

SJTU_microe group, School of Electronic, Information and Electrical Engineering (SEIEE), Shanghai Jiao Tong University (SJTU)

Peidong Du, Guochao Deng, Yaoyao Kong, Qin Wang, Naifeng Jing, Jianfei Jiang

The final rankings are published at https://dac-sdc-2021.groups.et.byu.net/doku.php?id=results


Our design is based on UltraNet, the champion design of the 57th IEEE/ACM Design Automation Conference System Design Contest (DAC-SDC). Our contributions are as follows:

  1. DNN model optimization
  • Since most objects in the dataset occupy only 1-2% of the captured image area[1], we add a bypass module to the original DNN model, which merges the feature maps of a shallow layer and a deep layer to improve the detection accuracy of small objects (see the sketch after this list). This optimization improves the accuracy from 65.6% to 70.3%.
  2. Accelerator optimization
  • Improve DSP utilization: The DSPs on the FPGA platform support 27-bit × 18-bit multiplication, and we quantize the weights and activations to 4 bits, so we map four 4-bit multiplications onto one DSP to improve DSP utilization (see the packing sketch after this list).
  • Optimize the dataflow: We adapt the dataflow to the new DNN model and to the DSP optimization above.
  • Reduce the bandwidth: We reduce the bandwidth between layers to the minimum requirement.
  3. System optimization
  • Balance PS and PL: Since the throughput of the PL is higher than that of the PS, we set the number of image-loading threads to 8 and the accelerator batch size to 64 to speed up image loading, and we reduce the PL frequency to 125 MHz to match the PS (see the loader sketch after this list).
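
The bypass in (1) can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming nearest-neighbor upsampling and channel concatenation as the fusion scheme; the actual layers in ./training may differ:

    import torch
    import torch.nn as nn

    class Bypass(nn.Module):
        """Merge a shallow (high-resolution) and a deep (low-resolution) feature map."""
        def __init__(self):
            super().__init__()
            # Bring the deep map back to the shallow map's spatial size.
            self.up = nn.Upsample(scale_factor=2, mode="nearest")

        def forward(self, shallow, deep):
            # Fuse by channel-wise concatenation: small objects keep the fine
            # detail of the shallow map plus the semantics of the deep map.
            return torch.cat([shallow, self.up(deep)], dim=1)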
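
The DSP packing in (2) works by placing independent products in non-overlapping bit fields of a single wide multiplication. Below is a plain-Python model of the arithmetic, using unsigned 4-bit values for clarity; a real signed implementation also needs sign-extension corrections:

    F = 11  # field width: an 8-bit product plus 3 guard bits

    def pack4_multiply(w0, w1, a0, a1):
        """Compute w0*a0, w0*a1, w1*a0 and w1*a1 with one wide multiply."""
        packed_w = w0 | (w1 << (2 * F))   # 26 bits, fits the 27-bit DSP port
        packed_a = a0 | (a1 << F)         # 15 bits, fits the 18-bit DSP port
        p = packed_w * packed_a           # the single DSP-style multiplication
        mask = (1 << F) - 1
        # The four products land at bit offsets 0, 11, 22 and 33.
        return p & mask, (p >> F) & mask, (p >> 2 * F) & mask, p >> 3 * F

    assert pack4_multiply(7, 3, 15, 9) == (7 * 15, 7 * 9, 3 * 15, 3 * 9)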
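
The PS/PL balancing in (3) amounts to a host-side producer pool. A hypothetical sketch, where load_image and run_accelerator stand in for the actual host functions:

    from concurrent.futures import ThreadPoolExecutor

    N_THREADS = 8  # parallel image-loading threads on the PS
    BATCH = 64     # images handed to the accelerator per invocation

    def detect_all(image_paths, load_image, run_accelerator):
        with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
            for i in range(0, len(image_paths), BATCH):
                # Decode up to 64 images in parallel on the ARM cores...
                batch = list(pool.map(load_image, image_paths[i:i + BATCH]))
                # ...then let the PL process the whole batch at once.
                run_accelerator(batch)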

Repo organization

  • training: The DNN model definition and training scripts.
  • quantization: The scripts that process the trained model to generate the weights and configuration files used in Vivado HLS.
  • hls: The DNN model implemented using Vivado High Level Synthesis (HLS), written in C++.
  • script: The script to generate a Vivado block design.
  • deploy: The bitstream for FPGA configuration and the Jupyter notebook showing how to use the DNN model to perform object detection on FPGA.

Platform

Xilinx Ultra96 V2

Software prerequisites

  • Vivado Design Suite - HLx Editions
  • PyTorch

Build the bitstream from scratch

In this work, the DNN model implementation is written in C++. To deploy the model on the FPGA, we go through four major steps:

1. Training and weights file generation

  1. Download the DAC-SDC 2021 object detection dataset.

  2. Start the training:

    cd training/
    python3 train.py --multi-scale --img-size 320 --batch-size 32
  3. Generate the weights and configuration to be used in Vivado HLS:

    cd quantization/
    python3 torch_export.py
    python3 ultranet_param_gen.py
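
For reference, the kind of symmetric 4-bit quantization this export step performs can be sketched as follows (the exact scheme in torch_export.py may differ; the per-tensor scale here is illustrative):

    import torch

    def quantize_4bit(w: torch.Tensor, scale: float) -> torch.Tensor:
        # Round to the nearest quantization step and clamp to the
        # signed 4-bit range [-8, 7].
        return torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)

    w = torch.randn(16, 16)
    scale = w.abs().max().item() / 7  # one scale per tensor
    wq = quantize_4bit(w, scale)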

2. Vivado HLS implementation

In this step, the C++ source code is synthesized by the Vivado High Level Synthesis (HLS) tool to generate RTL (Verilog) code, which is then exported as an HLS IP.

The source code of the DNN model can be found in the ./hls/ folder. The flow typically has four steps:

  1. C code simulation
  2. C code synthesis
  3. C and Verilog co-simulation
  4. Export RTL (Verilog/VHDL)

3. Integration and bitstream generation

In this step we integrate the generated HLS IP into the whole system, and generate the bitstream (.bit) and the hardware configuration file (.hwh) in Vivado. The system configuration script can be found in ./script/.

4. Deployment on the Ultra96

In this step, we upload the bitstream file (.bit) and the hardware description file (.hwh) generated by Vivado to the FPGA board, and run the Python host code on the embedded ARM core.

First, find the following two files to upload to the board (default names and paths):

  1. $Project_Name_wrapper.bit ($Path_To_Your_RTL_Project/$Project_Name/$Project_Name.runs/impl_1)
  2. $Project_Name.hwh ($Path_To_Your_RTL_Project/$Project_Name/$Project_Name.srcs/sources_1/bd/$Project_Name/hw_handoff)

Remember to rename the .bit and .hwh files to dac_sdc.bit and dac_sdc.hwh (any name works, as long as the two base names match).
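
With PYNQ this naming matters because Overlay() looks for a .hwh file with the same base name next to the bitstream. A minimal sketch:

    from pynq import Overlay

    # Loads dac_sdc.bit and parses the matching dac_sdc.hwh found
    # alongside it to discover the IPs in the design.
    overlay = Overlay("dac_sdc.bit")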

Second, you may refer to SJTU_microe.ipynb in ./deploy to see how to use the DNN on the FPGA to perform object detection.
