
fptuner's Introduction


Overview

FPTuner is a rigorous tool for automatic precision tuning of real-valued expressions. Given an input domain and an error threshold, FPTuner generates a mixed-precision allocation (single, double, or quadruple precision) that is guaranteed to keep the error below that threshold over the domain.

In addition to precision tuning, FPTuner allows users to control the precision allocation in ways that help optimize code. Two examples:

  • It allows users to cap the number of type casts introduced during precision allocation. Limiting the number of type casts can reduce the associated overhead.

  • It allows users to group ("gang") expressions (typically similar ones) and force the principal operators of these expressions to share the same precision allocation. Doing so encourages the compiler to vectorize the code.

For further details of FPTuner, please consult our paper. The rest of this file guides you through FPTuner's installation. A more comprehensive reference manual is available in Reference.md; it describes FPTuner's flags in detail, including basic flags (allowed error threshold, available precision choices) and allocation-controlling flags (fixing the number of type casts, ganging expressions, etc.).

Requirements

FPTuner has been tested on Ubuntu 12.04, 14.04, 16.04 on x86_64; we recommend version 16.04. It depends on the following free projects:

  • git
  • python3 (FPTuner currently supports python3 only)
  • PLY for python3
  • bison
  • flex
  • ocaml
  • g++
  • make

On Ubuntu these can all be installed with

sudo apt-get install -y git python3-ply bison flex ocaml g++ make

Apart from these, FPTuner also depends on Gurobi v6.5. Note that FPTuner's installation script does not install Gurobi automatically. Follow the steps below to install Gurobi and obtain a free academic license.

  1. Installation.
  • On the Gurobi website (tab "DOWNLOADS"), select "Download Center."
  • Select "Gurobi Optimizer." You need to register for an account to obtain the academic licenses.
  • Download gurobi6.5.2_linux64.tar.gz and unpack it with tar -xvf gurobi6.5.2_linux64.tar.gz.
  • Execute cd gurobi652/linux64 and ./setup.py build.
  2. Set the required environment variables as follows:

    export GUROBI_HOME=your-path/gurobi652/linux64
    export PATH=$GUROBI_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$GUROBI_HOME/lib:$LD_LIBRARY_PATH
    
  3. Obtain an academic license.

  • Go to https://user.gurobi.com/download/licenses/free-academic.
  • Read the User License Agreement and the conditions, then click "Request License."
  • Copy the command grbgetkey your-activation-code shown on the screen.
  • Under the bin directory of your Gurobi installation, run the grbgetkey command you just copied. It asks for a path in which to store the license key file, and then tells you to set the environment variable GRB_LICENSE_FILE to the license file path.
  4. After the installation, add the path of Gurobi's Python module to the environment variable PYTHONPATH.
  • Assuming Gurobi is installed under GUROBI_HOME, you should have a directory similar to $GUROBI_HOME/lib/python3.4_utf32. Note: we assume Gurobi version 6.5.2, so your Gurobi path may differ. Also, run python3 --version to find the Python version on your system; if it is Python 3.5, use $GUROBI_HOME/lib/python3.5_utf32 instead.
  • Add this to your environment with export PYTHONPATH=$GUROBI_HOME/lib/python3.4_utf32:$PYTHONPATH
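A quick way to catch common setup mistakes is to check these variables from Python. This is a minimal sketch, assuming the Gurobi 6.5.x directory layout described above; gurobi_python_dir and missing_gurobi_settings are hypothetical helpers, not part of FPTuner or Gurobi. The check compares paths literally, so a trailing slash or a symlinked path may be reported as missing even when it would work.

```python
import os
import sys

def gurobi_python_dir(gurobi_home, version_info=sys.version_info):
    """Gurobi Python-module directory for an interpreter version.

    Assumes the layout described above: $GUROBI_HOME/lib/python<major>.<minor>_utf32
    """
    major, minor = version_info[0], version_info[1]
    return os.path.join(gurobi_home, "lib", f"python{major}.{minor}_utf32")

def missing_gurobi_settings(env=os.environ):
    """Return a list of the environment settings above that look missing."""
    home = env.get("GUROBI_HOME")
    if not home:
        return ["GUROBI_HOME is not set"]

    def has(var, path):
        # True if `path` is one of the entries of the colon-separated variable.
        return path in env.get(var, "").split(os.pathsep)

    problems = []
    if not has("PATH", os.path.join(home, "bin")):
        problems.append("PATH lacks $GUROBI_HOME/bin")
    if not has("LD_LIBRARY_PATH", os.path.join(home, "lib")):
        problems.append("LD_LIBRARY_PATH lacks $GUROBI_HOME/lib")
    if not has("PYTHONPATH", gurobi_python_dir(home)):
        problems.append("PYTHONPATH lacks " + gurobi_python_dir(home))
    if not env.get("GRB_LICENSE_FILE"):
        problems.append("GRB_LICENSE_FILE is not set")
    return problems

if __name__ == "__main__":
    for problem in missing_gurobi_settings():
        print("missing:", problem)
```

An empty result only means the variables look plausible; actually importing gurobipy in python3 is the definitive test.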

For more installation details, please refer to the Gurobi user manual.

Installation

  1. Download FPTuner from our GitHub repository: git clone https://github.com/soarlab/FPTuner

  2. Go to the root directory of FPTuner, for example: cd ./FPTuner

  3. Run the setup script at the root directory of FPTuner: python3 setup.py install

  4. Set up the required environment variables. The installation script creates a file fptuner_vars that sets them; load it with source fptuner_vars.

To uninstall, run python3 setup.py uninstall.

Running FPTuner

To test the installation, please try out the hello-world example through the following steps:

  1. Go to directory bin under the root of FPTuner.

  2. Run command python3 ./fptuner.py -e 0.001 ../examples/helloworld0.py

The console output of FPTuner should be the following:

==== error bound : 0.001 ====
Total # of operators: 5
# of 32-bit operators: 2
# of 64-bit operators: 3

---- alloc. ----
Group 0 : 32-bit
Group 1 : 32-bit
Group 2 : 64-bit
Group 3 : 64-bit
Group 4 : 64-bit
----------------

# L2H castings: 2
# H2L castings: 0
# Castings: 2

Expression:
(* (+ (A) (B)) (C))

In addition, a .cpp file helloworld0.0.001.cpp will be generated. Now we describe how to use FPTuner with this hello-world example.
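For scripting, the console summary can be captured and parsed. This is a minimal sketch, assuming the output format is exactly as in the sample above (a different FPTuner version could break the regular expressions); run_fptuner and parse_summary are hypothetical helper names, not part of FPTuner. Captured output is not a terminal, so it should be plain text, but the parser strips ANSI color codes anyway.

```python
import re
import subprocess

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # terminal color codes, if any

def run_fptuner(spec_path, error="0.001", bin_dir="."):
    """Run FPTuner from its bin directory and return the console output."""
    proc = subprocess.run(
        ["python3", "./fptuner.py", "-e", error, spec_path],
        cwd=bin_dir, capture_output=True, text=True, check=True,
    )
    return proc.stdout

def parse_summary(text):
    """Extract the error bound, per-group bit-widths and cast counts."""
    text = ANSI_ESCAPE.sub("", text)
    summary = {"groups": {}}
    m = re.search(r"==== error bound : (\S+) ====", text)
    if m:
        summary["error_bound"] = float(m.group(1))
    for gid, bits in re.findall(r"^Group (\d+) : (\d+)-bit", text, re.MULTILINE):
        summary["groups"][int(gid)] = int(bits)
    for key, label in (("l2h", "L2H castings"),
                       ("h2l", "H2L castings"),
                       ("casts", "Castings")):
        m = re.search(r"^# " + re.escape(label) + r": (\d+)", text, re.MULTILINE)
        if m:
            summary[key] = int(m.group(1))
    return summary
```

For example, parse_summary(run_fptuner("../examples/helloworld0.py")) run from bin should report groups 0 and 1 at 32 bits, groups 2 to 4 at 64 bits, and 2 casts for the sample above.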

Input

FPTuner takes an expression specification and a user-specified error threshold and generates the optimal allocation. In the command python3 ./fptuner.py -e 0.001 ../examples/helloworld0.py, file helloworld0.py is the expression specification and -e 0.001 specifies 1e-03 as the error threshold.

The later section "Example of Expression Specification" describes how to specify the expression through the python-based interface.

Output

FPTuner summarizes the number of 32- and 64-bit operators and prints the allocation on the console. In the example output, Group 0 : 32-bit denotes that the group 0 (gang 0) operators are assigned 32-bit precision. # L2H castings (resp., # H2L castings) is the number of low-to-high (resp., high-to-low) type casts in this allocation, and # Castings is the sum of the two. In addition to the console output, FPTuner synthesizes a .cpp file that implements the allocation.
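The cast counts can be understood with a small model, assuming a cast is needed on every operand edge whose bit-width differs from that of the operator consuming it. This is a hypothetical sketch (Node and count_casts are not FPTuner internals); the group IDs follow the hello-world specification described later in this file (A=0, B=1, C=2, +=3, *=4). With the allocation from the sample output it reproduces the 2 L2H and 0 H2L casts shown above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    label: str
    gid: int                                   # group (gang) ID
    children: List["Node"] = field(default_factory=list)

def count_casts(root: Node, alloc: Dict[int, int]) -> Tuple[int, int]:
    """Count (low-to-high, high-to-low) casts over all operand edges.

    Assumption: each edge whose operand width differs from the
    consuming operator's width costs exactly one cast.
    """
    l2h = h2l = 0
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            src, dst = alloc[child.gid], alloc[node.gid]
            if src < dst:
                l2h += 1
            elif src > dst:
                h2l += 1
            stack.append(child)
    return l2h, h2l

A, B, C = Node("A", 0), Node("B", 1), Node("C", 2)
expr = Node("*", 4, [Node("+", 3, [A, B]), C])   # (A + B) * C
alloc = {0: 32, 1: 32, 2: 64, 3: 64, 4: 64}      # from the sample output
print(count_casts(expr, alloc))                  # (2, 0)
```

Capping casts, as mentioned in the overview, amounts to restricting which allocations are allowed so that such a count stays under a limit.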

When output goes to a terminal, FPTuner also emits a colorized s-expression indicating the allocations of variables and operations. For example:

(Figure: colorized s-expression output)

Variables A and B are allocated 32-bit precision, as indicated by the green text. Blue text indicates that both operations and variable C are allocated 64-bit precision. Notably, the blue parentheses around A and B mean that both are cast to 64-bit.
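To see what this allocation means numerically, the following sketch simulates it in pure Python: A and B are rounded to binary32 with the struct module, everything else runs in binary64 (ordinary Python floats), and the result is compared with the exact value computed with fractions.Fraction. The helper names are hypothetical, and random sampling only gives a lower estimate of the worst-case error; FPTuner's guarantee comes from its rigorous analysis, not from testing. With the ranges [0, 100] used in the hello-world specification, the sampled error stays below the 0.001 threshold.

```python
import random
import struct
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mixed(a: float, b: float, c: float) -> float:
    # A and B are held in 32-bit and cast up to 64-bit (the two L2H casts);
    # +, * and C are evaluated in 64-bit.
    return (to_f32(a) + to_f32(b)) * c

def abs_error(a: float, b: float, c: float) -> float:
    """|computed - exact| for (A + B) * C, using exact rational arithmetic."""
    exact = (Fraction(a) + Fraction(b)) * Fraction(c)
    return float(abs(Fraction(mixed(a, b, c)) - exact))

def sampled_max_error(n: int = 10_000, seed: int = 0) -> float:
    """Largest error seen on n random points of [0, 100]^3 (lower estimate)."""
    rng = random.Random(seed)
    return max(
        abs_error(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 100))
        for _ in range(n)
    )

print(abs_error(1.5, 2.25, 4.0))  # 0.0: all inputs are exact in binary32
print(sampled_max_error())
```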

To POPL Artifact Evaluation Reviewers

Reproduce the tuning results of Table 5.1 and Table 5.2

The tuning results of Table 5.1 are shown under column "# of double-ops forced by Es" and the results of Table 5.2 are shown under column "# of single-ops forced by Es." With a correct installation of FPTuner (e.g., the above hello-world example works), the fastest way to reproduce the two tables is to use the scripts under directory bin.

For Table 5.1, please run (under directory bin)

./test-table-5.1

For Table 5.2, please run (under directory bin)

./test-table-5.2

Performance and energy measurements

We do not currently provide scripts that automatically measure performance and energy. However, as demonstrated by the hello-world example, FPTuner generates a .cpp file for each mixed-precision allocation, so you can measure performance and energy with those files on your own platforms.

Tuning results and tuning performance may be affected by global optimization

The tuning results and the tuning performance of FPTuner are affected by the underlying global optimization. The global optimization may compute tight (resp., loose) bounds on the first derivatives, resulting in more (resp., fewer) low-precision operators. In addition, FPTuner's running time is currently dominated by global optimization. Consequently, some tuning results may not exactly match those shown in the paper.

Individually running the Benchmarks

Similar to the hello-world example, we can run each of the benchmarks with the following command (under directory bin):

python3 ./fptuner.py -e "0.001 0.0001" -b "32 64" path-to-the-benchmark

(The desired error thresholds and the bit-width candidates are specified with options -e and -b respectively.) The following table offers the benchmark names and their relative paths to the root directory of FPTuner.

| Benchmark Name | Relative Path to the Root of FPTuner |
| --- | --- |
| sine | examples/primitives/sine.py |
| sqroot | examples/primitives/sqroot.py |
| sineOrder3 | examples/primitives/sineOrder3.py |
| predatorPrey | examples/primitives/predatorPrey.py |
| verhulst | examples/primitives/verhulst.py |
| rigidBody 1 | examples/primitives/rigidBody-1.py |
| rigidBody 2 | examples/primitives/rigidBody-2.py |
| turbine 1 | examples/primitives/turbine-1.py |
| turbine 2 | examples/primitives/turbine-2.py |
| turbine 3 | examples/primitives/turbine-3.py |
| doppler 1 | examples/primitives/doppler-1.py |
| doppler 2 | examples/primitives/doppler-2.py |
| doppler 3 | examples/primitives/doppler-3.py |
| carbonGas | examples/primitives/carbonGas.py |
| jet | examples/primitives/jet.py |
| cone-area | examples/math/cone-area.py |
| Gaussian | examples/math/gaussian.py |
| Maxwell-Boltzmann | examples/math/maxwell-boltzmann.py |
| reduction | examples/micro/reduction.py |
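To run every benchmark in one go, a small driver can loop over the table. This sketch makes two assumptions: it is started from the bin directory (so table paths get a ../ prefix), and every run uses the same -e and -b values as the command above. BENCHMARKS, fptuner_command and run_all are hypothetical names; the script only shells out to the documented command line.

```python
import shlex
import subprocess
from pathlib import Path

# Benchmark name -> path relative to the FPTuner root (from the table above).
BENCHMARKS = {
    "sine": "examples/primitives/sine.py",
    "sqroot": "examples/primitives/sqroot.py",
    "sineOrder3": "examples/primitives/sineOrder3.py",
    "predatorPrey": "examples/primitives/predatorPrey.py",
    "verhulst": "examples/primitives/verhulst.py",
    "rigidBody 1": "examples/primitives/rigidBody-1.py",
    "rigidBody 2": "examples/primitives/rigidBody-2.py",
    "turbine 1": "examples/primitives/turbine-1.py",
    "turbine 2": "examples/primitives/turbine-2.py",
    "turbine 3": "examples/primitives/turbine-3.py",
    "doppler 1": "examples/primitives/doppler-1.py",
    "doppler 2": "examples/primitives/doppler-2.py",
    "doppler 3": "examples/primitives/doppler-3.py",
    "carbonGas": "examples/primitives/carbonGas.py",
    "jet": "examples/primitives/jet.py",
    "cone-area": "examples/math/cone-area.py",
    "Gaussian": "examples/math/gaussian.py",
    "Maxwell-Boltzmann": "examples/math/maxwell-boltzmann.py",
    "reduction": "examples/micro/reduction.py",
}

def fptuner_command(rel_path, errors="0.001 0.0001", bits="32 64"):
    """The command line shown above, with the path made relative to bin."""
    return ["python3", "./fptuner.py", "-e", errors, "-b", bits,
            str(Path("..") / rel_path)]

def run_all(bin_dir="."):
    """Run each benchmark in turn, continuing past failures."""
    for name, rel_path in BENCHMARKS.items():
        cmd = fptuner_command(rel_path)
        print(f"==> {name}: " + " ".join(shlex.quote(c) for c in cmd))
        subprocess.run(cmd, cwd=bin_dir, check=False)
```

Calling run_all() from bin runs the benchmarks sequentially; since global optimization dominates FPTuner's running time, the whole set may take a while.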

Reference

The complete reference of FPTuner is given in Reference.md.

Here we introduce some more tuning options provided by FPTuner.

Candidate bit-widths

FPTuner tunes for mixed 32- and 64-bit by default. Tuning for mixed 64- and 128-bit can be done with option

-b "64 128"

FPTuner currently supports tuning for the following three bit-width candidate sets:

  • 32- and 64-bit (specified with -b "32 64")
  • 64- and 128-bit (specified with -b "64 128")
  • 32-, 64-, and 128-bit (specified with -b "32 64 128")
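A small helper can reject unsupported -b values before a long tuning run starts. This is a sketch assuming the argument is written exactly as in the list above (space-separated, ascending); parse_bitwidths is a hypothetical name, not an FPTuner function. The table of unit roundoffs (2^-24, 2^-53 and 2^-113 for IEEE 754 binary32, binary64 and binary128) is included only to show how much precision each step up adds.

```python
# The three candidate sets listed above, as they are written after -b.
SUPPORTED_SETS = {(32, 64), (64, 128), (32, 64, 128)}

# Unit roundoff u = 2**-p of IEEE 754 binary32/64/128, where p is the
# significand precision (24, 53 and 113 bits, respectively).
UNIT_ROUNDOFF = {32: 2.0 ** -24, 64: 2.0 ** -53, 128: 2.0 ** -113}

def parse_bitwidths(arg: str) -> tuple:
    """Turn a -b argument such as "32 64" into a tuple, rejecting other sets."""
    widths = tuple(int(tok) for tok in arg.split())
    if widths not in SUPPORTED_SETS:
        choices = ", ".join(
            '"' + " ".join(map(str, s)) + '"' for s in sorted(SUPPORTED_SETS)
        )
        raise ValueError(f"unsupported -b value {arg!r}; use one of {choices}")
    return widths

for w in parse_bitwidths("32 64 128"):
    print(w, UNIT_ROUNDOFF[w])
```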

Multiple error thresholds

FPTuner can take multiple error thresholds and generate an optimal allocation for each one. For example, the following option results in two allocations, one for each of the two error thresholds (0.001 and 0.0001):

-e "0.001 0.0001"
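With several thresholds, the console output contains one summary per threshold. A sketch for separating them, assuming each summary starts with a header in the same format as the hello-world output (==== error bound : X ====); parse_thresholds and split_by_error_bound are hypothetical helpers, not FPTuner functions.

```python
import re

def parse_thresholds(arg: str) -> list:
    """Turn a -e argument such as "0.001 0.0001" into a list of floats."""
    return [float(tok) for tok in arg.split()]

HEADER = re.compile(r"^==== error bound : (\S+) ====$", re.MULTILINE)

def split_by_error_bound(text: str) -> dict:
    """Map each error bound to the console text printed under its header."""
    parts = HEADER.split(text)
    # With one capturing group, split() yields [prefix, bound1, body1, bound2, body2, ...].
    return {float(bound): body for bound, body in zip(parts[1::2], parts[2::2])}
```

Each body can then be inspected on its own, for instance to compare how many operators drop to low precision as the threshold is loosened.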

Example of Expression Specification

FPTuner decides the optimal bit-widths of the operators in the floating-point implementations of real-number computations.

At this point, FPTuner provides a Python interface that allows users to specify their real-number computations. In this section, we introduce the Python interface through a simple example:

(A + B) * C

which is the hello-world example (examples/helloworld0.py).

Invoke the interface module

  • In a Python (.py) file, import the interface module with the following line:

    import tft_ir_api as IR
    
  • Note that the src directory under the FPTuner root directory should be added to the environment variable PYTHONPATH.
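If you prefer not to export PYTHONPATH, a script of your own can extend sys.path before importing the module. A minimal sketch; add_fptuner_src_to_path is a hypothetical helper, and the root you pass in is wherever you cloned FPTuner.

```python
import os
import sys

def add_fptuner_src_to_path(fptuner_root):
    """Put <fptuner_root>/src at the front of sys.path (once) and return it."""
    src = os.path.join(os.path.abspath(fptuner_root), "src")
    if src not in sys.path:
        sys.path.insert(0, src)
    return src
```

For example, call add_fptuner_src_to_path("/path/to/FPTuner") and then import tft_ir_api as IR. Whether fptuner.py itself needs this when it loads a specification file is not covered here; exporting PYTHONPATH remains the documented route.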

Declare bounded variables

FPTuner currently supports variables which have bounded and contiguous ranges. For example, we want to declare three variables, A, B, and C, and assign [0.0, 100.0] as their ranges. This can be achieved with function IR.RealVE as shown in the following lines:

A = IR.RealVE("A", 0, 0.0, 100.0) 
B = IR.RealVE("B", 1, 0.0, 100.0) 
C = IR.RealVE("C", 2, 0.0, 100.0) 

Function IR.RealVE takes four arguments and returns a variable (variable expression):

  1. The label of the variable.

  2. The group ID of the variable. Expressions with the same group (gang) ID are assigned the same bit-width. In this example, we want to allow different bit-widths for the variables, so the three variables have different IDs: A has 0, B has 1, and C has 2.

  3. The lower bound of the value range.

  4. The upper bound of the value range.
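Bounded ranges matter because the size of round-off errors depends on how large values can get. As a rough illustration, the declared ranges bound (A + B) * C as shown below. This is plain interval arithmetic, not FPTuner's internal analysis (which relies on global optimization); iadd and imul are hypothetical helpers, and a rigorous version would also round the endpoints outward.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def iadd(x: Interval, y: Interval) -> Interval:
    """Range of x + y for x, y in the given intervals."""
    return (x[0] + y[0], x[1] + y[1])

def imul(x: Interval, y: Interval) -> Interval:
    """Range of x * y: the extremes are among the four endpoint products."""
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))

A = B = C = (0.0, 100.0)       # the ranges passed to IR.RealVE above
print(imul(iadd(A, B), C))     # (0.0, 20000.0)
```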

Specify binary expressions

There are two binary expressions in our example, and they can be specified with function IR.BE as shown in the following line:

rel = IR.BE("*", 4, IR.BE("+", 3, A, B), C) 

The application

IR.BE("+", 3, A, B)

results in a binary expression (A + B). The four arguments are explained as follows:

  1. The first argument is a string which specifies the binary operator. In this case, "+" specifies the addition.

  2. The second argument is an integer that gives the group ID. Expressions with the same group ID are assigned the same bit-width.

  3. The third argument is the left-hand-side operand. In this case, it is variable A.

  4. The fourth argument is the right-hand-side operand. In this case, it is variable B.

Similarly,

IR.BE("*", 4, IR.BE("+", 3, A, B), C) 

returns expression

(A + B) * C
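To make the tree structure concrete, here is a tiny stand-alone model of what the calls above build. Var and BinExpr are hypothetical classes that only mirror the arguments of IR.RealVE and IR.BE as described; they are not FPTuner's classes and cannot be passed to IR.TuneExpr. Rendering the tree in prefix form gives the same s-expression that FPTuner prints for the hello-world example, and collecting the group IDs gives the five groups (0 to 4) of that allocation.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Var:
    """Hypothetical mirror of IR.RealVE's arguments: label, group ID, range."""
    label: str
    gid: int
    lo: float
    hi: float

@dataclass
class BinExpr:
    """Hypothetical mirror of IR.BE's arguments: operator, group ID, operands."""
    op: str
    gid: int
    lhs: "Expr"
    rhs: "Expr"

Expr = Union[Var, BinExpr]

def to_sexpr(e: Expr) -> str:
    """Prefix form in the style of FPTuner's 'Expression:' output line."""
    if isinstance(e, Var):
        return f"({e.label})"
    return f"({e.op} {to_sexpr(e.lhs)} {to_sexpr(e.rhs)})"

def group_ids(e: Expr) -> List[int]:
    """All group IDs used in the tree, sorted."""
    if isinstance(e, Var):
        return [e.gid]
    return sorted({e.gid, *group_ids(e.lhs), *group_ids(e.rhs)})

A = Var("A", 0, 0.0, 100.0)
B = Var("B", 1, 0.0, 100.0)
C = Var("C", 2, 0.0, 100.0)
rel = BinExpr("*", 4, BinExpr("+", 3, A, B), C)
print(to_sexpr(rel))    # (* (+ (A) (B)) (C))
print(group_ids(rel))   # [0, 1, 2, 3, 4]
```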

Tune for expression (A + B) * C

To assign (A + B) * C to FPTuner as the tuning target, we use the following line:

IR.TuneExpr(rel)

Here rel refers to the target expression, and function IR.TuneExpr specifies it as the expression to tune.

Acknowledgements

Supported in part by NSF grants 1643056, 1421726, and 1642958.

fptuner's People

Contributors

ianbriggs, keram88, wfchiang, zvonimir


fptuner's Issues

Gelpia compilation errors

Hello,

First of all, many thanks for the excellent paper and sharing the work through this repo.

I'm trying to build FPTuner on Ubuntu 16.04 and get the following errors related to building Gelpia.

  1. "make requirements" fails: the url (http://lipforge.ens-lyon.fr/frs/download.php/162/crlibm-1.0beta4.tar.gz) to download CRLibM is broken

  2. I bypassed problem 1 by using the crlibm.tar.gz from the develop branch of Gelpia and successfully built the requirements. However, compiling Gelpia ("make") fails with an error related to Rust and the getopts package:

error[E0301]: cannot mutably borrow in a pattern guard
--> /home/ggeorgak/.cargo/registry/src/github.com-1ecc6299db9ec823/getopts-0.2.17/src/lib.rs:405:73
|
405 | } else if was_long || name_pos < names.len() || args.peek().map_or(true, |n| is_arg(&n)) {
| ^^^^ borrowed mutably in pattern guard

error: aborting due to previous error

error: Could not compile getopts.

Compilation error related to Gelpia

I am trying to build and use FPTuner in an Ubuntu 16.04 x86_64 virtual machine. I got Gurobi installed, but the FPTuner build fails while trying to compile Gelpia in what looks like Rust code. I'm not familiar with Rust so I'm not sure where to start debugging this one. Is this an FPTuner issue or a Gelpia issue?

Downloading Gelpia git repository at branch ArtifactEvaluation
Building Gelpia requirements
Installing Rust
Installing CRLibM
Installing GAOL
Building Gelpia
/home/lam/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.53/src/lib.rs:338:23: 338:42 error: the `?` operator is not stable (see issue #31436)
/home/lam/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.53/src/lib.rs:338         let out_dir = self.get_out_dir()?;
                                                                                                          ^~~~~~~~~~~~~~~~~~~
/home/lam/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.53/src/lib.rs:338:23: 338:42 help: add #![feature(question_mark)] to the crate attributes to enable
/home/lam/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.53/src/lib.rs:341:25: 341:48 error: the `?` operator is not stable (see issue #31436)
/home/lam/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.53/src/lib.rs:341             let mut f = fs::File::create(&src)?;
                                                                                                            ^~~~~~~~~~~~~~~~~~~~~~~
[many more similar lines snipped]

Build failed

Hi,
I am trying to build and use FPTuner in an Ubuntu 16.04 x86_64 machine. I installed the latest Gurobi, but the FPTuner build fails in InstallFPTaylor. What should I do to fix this?

Downloading FPTaylor git repository at branch develop
Checking out FPTaylor at commit bb773cb0e9e1b13db8845623e80186e1a343bb11
Building FPTaylor
File "fpu.ml", line 24, characters 0-52:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
File "fpu.ml", line 25, characters 0-54:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
File "fpu.ml", line 26, characters 0-60:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
File "fpu.ml", line 24, characters 0-52:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
File "fpu.ml", line 25, characters 0-54:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
File "fpu.ml", line 26, characters 0-60:
Warning 3: deprecated: [@@noalloc] should be used instead of "noalloc"
/usr/bin/ld: chcw.o: relocation R_X86_64_32S against `.bss' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
File "none", line 1:
Error: Error while building custom runtime system
Makefile:23: recipe for target 'ocamlfpu' failed
make[1]: *** [ocamlfpu] Error 2
Makefile:71: recipe for target 'compile-interval' failed
make: *** [compile-interval] Error 2
Traceback (most recent call last):
File "setup.py", line 258, in <module>
main(sys.argv)
File "setup.py", line 199, in main
InstallFPTaylor("develop", "bb773cb0e9e1b13db8845623e80186e1a343bb11")
File "setup.py", line 83, in InstallFPTaylor
assert(os.path.isfile("./fptaylor"))
AssertionError

Thank you very much.

Failed build

Dear FPTuner Team,

When building the code:

/usr/bin/ld: chcw.o: relocation R_X86_64_32S against `.bss' can not be used when making a shared object; recompile with -fPIC

My system is Ubuntu 17.10 with Python 3.6.

Could you help me?
Best,

Use FPTuner without the INTERVAL library?

Hi,

I am trying to install FPTuner and ran into several issues when building dependencies, particularly the INTERVAL library of FPTaylor. This seems to be a known issue, so I applied the suggested solution and used make fptaylor-simple-interval to compile FPTaylor. All dependencies are now installed, but it seems that the later version of FPTaylor is incompatible with FPTuner.

When running the command:
$ python3 ./fptuner.py -e 0.001 ../examples/helloworld0.py

I am getting the following error:

$~/FPTuner/bin/./fptuner.py:4: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "/root/FPTuner/bin/./fptuner.py", line 259, in <module>
    main()
  File "/root/FPTuner/bin/./fptuner.py", line 196, in main
    eforms, alloc = tft_tuning.TFTRun(EXPRS_NAME)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/FPTuner/src/tft_tuning.py", line 181, in TFTRun
    eforms, alloc = tft_sol_exprs.SolveExprs(fname_input, OPTIMIZERS)  
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/FPTuner/src/tft_sol_exprs.py", line 587, in SolveExprs
    ef = GenerateErrorFormFromExpr(te, ERROR_TYPE, E_UPPER_BOUND, M2, EQ_GIDS, CONSTRAINT_EXPRS) 
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/FPTuner/src/tft_sol_exprs.py", line 133, in GenerateErrorFormFromExpr
    text_terms = tft_get_first_derivations.GetFirstDerivations(expr)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/FPTuner/src/tft_get_first_derivations.py", line 201, in GetFirstDerivations
    ParseFPTaylorResults(cstr_expr, vs, fpt_outputs) 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/FPTuner/src/tft_get_first_derivations.py", line 143, in ParseFPTaylorResults
    assert(id_comment in raw_expr_2_comment.keys()) 
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Any advice on how to proceed? Is it possible to use FPTuner without the INTERVAL library?
