Giter Club home page Giter Club logo

deepfloat's Introduction

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in "Rethinking floating point for deep learning" [1].

There are two types of floating point implemented:

  • N-bit (N, l, alpha, beta, gamma) log with ELMA [1]
  • N-bit (N, s) (linear) posit [2]

with partial implementation of IEEE-style (e, s) floating point (likely quite buggy) and non-posit tapered log.

8-bit (8, 1, 5, 5, 7) log is the format described in "Rethinking floating point for deep learning", shown within to be more energy efficient than int8/32 integer multiply-add at 28 nm and an effective drop-in replacement for IEEE 754 binary32 single precision floating point via round to nearest even for CNN inference on ResNet-50 on ImageNet.

[1] Johnson, J. "Rethinking floating point for deep learning." (2018). https://arxiv.org/abs/1811.01721

[2] Gustafson, J. and Yonemoto, I. "Beating floating point at its own game: Posit arithmetic." Supercomputing Frontiers and Innovations 4.2 (2017): 71-86.

Requirements

You will need:

  • a PyTorch CPU installation
  • a C++11-compatible compiler to use to generate a PyTorch C++ extension module
  • the ImageNet ILSVRC12 image validation set
  • an Intel OpenCL for FPGA compatible board
  • a Quartus Prime Pro installation with the Intel OpenCL for FPGA compiler

rtl contains the SystemVerilog modules needed for the design.

bitstream contains the OpenCL that wraps the RTL modules.

cpp contains host CPU-side code for interacting with the FPGA OpenCL design.

py contains the top-level functionality to compile the CPU code and run networks.

Flow

In bitstream, run

./build_lib.sh <design>

followed by

./build_afu.sh <design> (this will take several hours to synthesize the FPGA design)

where <design> is one of loglib or positlib. The aoc/aocl tools, Quartus, Quartus license file, OpenCL BSP etc. must be in your path/environment. loglib is configured to generate a design with 8-bit (8, 1, 5, 5, 7) log arithmetic, and positlib is configured to generate a design with 8-bit (8, 1) posit arithmetic by default.

The aoc build seems to require a Python 2.x interpreter in the path, otherwise it will fail.

Update the aocx_file in py/run_fpga_resnet.py to your choice of design.

Update valdir towards the end of py/validate.py to point to a Torch dataset loader compatible installation of the ImageNet validation set.

Using a python environment with PyTorch available, in py run:

python run_fpga_resnet.py

If successful, this will run the complete validation set against the FPGA design. This requires a Python 3.x interpreter.

RTL comments

The modules used by the OpenCL design reside in rtl/log/operators and rtl/posit/operators. You can see how they are assembled here.

rtl/paper_syn contains the modules used in the paper's 28 nm synthesis results (Paper*Top.sv are the top-level modules). Waves_*.sv are the testbench programs used to generate switching activity for power analysis output.

You will have to provide your own Synopsys Design Compiler scripts/flow/cell libraries/PDK/etc. for synthesis, as we are not allowed to share details on which 28 nm semiconductor process was used or our Design Compiler synthesis scripts.

Other comments

The posit encoding implemented herein implements negative values with a sign bit rather than two's complement encoding. It is a TODO to change it, but the cost either way is largely dwarfed by other concerns in my opinion.

The FPGA design itself is not super flexible yet to support different bit widths than 8. loglib is restricted to N <= 8 bits at the moment, while positlib should be ok for N <= 16 bits, though some of the larger designs may run into FPGA resource issues if synthesized for the FPGA.

Contributions

This repo currently exists as a proof of concept. Contributions may be considered, but the design is mostly that which is needed to reproduce the results from the paper.

License

This code is licensed under CC-BY-NC 4.0.

This code also includes and uses the Single Python Fixed-Point Module for LUT SystemVerilog log-to-linear and linear-to-log mapping module generation in rtl/log/luts, which is licensed by the Python-2.4.2 license.

deepfloat's People

Contributors

wickedfoo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

deepfloat's Issues

Simulation problem in RTL LogNumberUnpackedToLogCompact.sv

Hi,

I used paper_syn/Wave_PaperLogPETop.sv to simulate PaperLogPETop. All correlated file were included in Wave_PaperLogPETop.sv. I used ncverilog to simulate and met "Multiple drivers to always_comb output variable preRound.data.isZero detected." in LogNumberUnpackedToLogCompact.sv. I used verdi to check corrections of connections (around "preRound") but I can't find them. Can you help me to solve problem.

Best Regards,
Andy Chang

PaperPositPETop.sv output unknown state (xxxxx)

PaperPositPETop.sv output unknown state (xxxxx). This is because of the FF in line 27 to line 30 has no reset value, which results makes the input of PaperPositPE to be xx. As a result, qPostAdd become unknown state in the first clock after rst=0, consequently qReg.data. Hence forth the output will be stuck at unknown state.

This is the FF which has no reset value in file PaperPositPETop.sv
always_ff @(posedge clock) begin
aInReg.data <= aIn;
bInReg.data <= bIn;
end

No support for ES values bigger than the number of bits of the posit number. Bug in multiply module.

Hello,

I have used your Posit implementation of addition and multiplication as a reference model in my master thesis. During the verification of my code I have found some issues and a possible bug in your implementation.

The first problem is that it is not possible to select ES values bigger than POSIT_WIDTH-3. And looking at the documentation of the numbers there is no restrictions for the ES value. In order to support it in your code I have changed the PositDecode and PositEncode files. But maybe it is also needed to change other files.

The second problem is that there is a bug in the multiplication rounding. The value of the normalTrailingBits is not correct so I have added a line in the PositMultiply module assigning the correct value to the variable. With this change the multiplication results were correct.

Please find attached the three files I have modified. I would like to know if these changes make sense for you or I misunderstood something.

Best regards,

Laura.
PositDecode.txt
PositEncode.txt
PositMultiply.txt

Problem in FC layers

Hello @wickedfoo,
Readme specifies to use python3.x for validating the model. However, there is a memory mapping error while programming the FPGA. Using python2.x solves that issue. However, the fully connected layer has a problem.

Traceback (most recent call last):
File "run_fpga_resnet.py", line 103, in
reference_model=cpu_model)
File "/home/manoj/git/deepfloat/py/validate.py", line 112, in validate
output = fpga_h.forward(input)
File "run_fpga_resnet.py", line 37, in forward
self.output_p = self.model.forward(*dev + (input_p, ))
File "/home/manoj/git/deepfloat/py/fpga_resnet.py", line 209, in forward
x = self.fc.forward(context, program, queue, x)
RuntimeError: Assert failed at /home/manoj/git/deepfloat/cpp/ops/TensorMath.cpp line 217

line 217: CL_ASSERT(a.getSize(1) == b.getSize(0));

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.