danielschemmel / fp-benchmarks-aachen Goto Github PK

CMake 0.05% Makefile 8.02% C 41.17% Shell 10.47% GDB 0.01% M4 3.22% C++ 4.75% Yacc 0.21% Lex 0.06% XS 0.55% Perl 1.12% HTML 0.57% TeX 4.86% Assembly 23.60% Fortran 0.02% Emacs Lisp 0.10% Batchfile 0.04% Python 0.07% Roff 0.95% Perl 6 0.18%

fp-benchmarks-aachen's Issues

`ld-unnormal_default.x86_64` is mis-compiled by Clang when optimizations are enabled

When ld-unnormal_default.x86_64 is built with optimizations enabled the benchmark is aggressively constant folded so many of the checks are not performed at run time. A bug in LLVM means this constant folding is done incorrectly which results in the benchmark always failing an assert when compiled with Clang and optimizations enabled.

The solution for now is to always build this benchmark without optimizations. The bug in LLVM will not be simple to fix so this will do for now.

I'll be submitting a fix soon.

Ninja & GMP

The GMP test will not build with ninja. Since it does seem to build with make, further analysis is currently deferred.

syn/atof_default.x86_64 is incorrect despite the spec saying it should be correct

Our tool just found a bug in this benchmark

Error: ASSERTION FAIL: out < 1
File: /home/user/fp-bench/benchmarks/c/aachen/syn/atof/test.c
Line: 14
assembly.ll line: 87

The generated test for this makes the "input" be

object    0: name: b'input'
object    0: size: 8
object    0:
object   0[0]: 0x00
object   0[1]: 0x00
object   0[2]: 0x32
object   0[3]: 0x65
object   0[4]: 0x31
object   0[5]: 0x00
object   0[6]: 0x00
object   0[7]: 0x00

effectively this means we pass the following C string to atof() during execution

"0.2e1"

This is 2.0 which means assert(out < 1) fails. This is a genuine bug which I have reproduced natively.

We never found this bug before because we previously had never let our tool execute for so long (~ 1hour).

sqr_ family of benchmarks incorrect

Aachen's fork of KLEE detected a problem with the aachen/syn/sqr_float-flow.x86_64 benchmark. This problem suggests that there is very likely to be a problem with all the variants of this benchmark. The assertion that fails is

assert(b == INFINITY || b == 0 || (a * 0.99 <= b && a >= b * 0.99) && "this should pass...");

Sidenote: Because the variables here are float the constants should be 0.99f. However the bug is still present even if this minor problem is removed.

Our (Imperial's) version of KLEE did not detect an error here due to the way we set up our solver timeout.

The problematic value generated for f is -0x1.8100f8p-75 (C99 hex float format). This is approximately -3.9808458254123446e-23.

Native replay of the generated KTest file confirms that the bug is reproducible.

The a *0.99 <= b condition is false for the value of f in the generated test case.

This assertion appears in the noflow (it's expected to fail here though) and long double and double variants too so I would expect those variants to fail too.

gmp_klee_inv_arg.x86_64 crashes KLEE by calling `raise()`

@danielschemmel @afd @ccadar

The gmp_klee_inv_arg.x86_64 benchmark crashes our fork of KLEE because libgmp's error handler tries to call raise(SIGFPE). Our version of KLEE just calls this as an external which causes KLEE is effectively call kill() on itself.

The spec file currently contains

  klee_inv_arg:
    defines:
      KLEE: null
      INVALID_ARGUMENTS: null
    dependencies:
      klee_runtime: {}
    description: >
      Tests if gmp correctly rejects invalid arguments. >
      This should never reach the assert in the main function.
    verification_tasks:
      no_assert_fail:
        correct: true
      no_reach_error_function:
        correct: false
        counter_examples:
          -
            description: 'Expected Error'
            locations:
              -
                description: 'first method of abortion'
                file: 'gmp-6.1.1/invalid.c'
                line: 84

The expected failure is on line 84 which is a call to abort(). Notice before that call is raise(SIGFPE).

void
__gmp_invalid_operation (void)
{
  raise (SIGFPE);
  abort ();
}

@danielschemmel I'm not sure how you to chose to handle raise() but the fact you put the abort() as the failure sounds like you are treated it as a no-op which isn't correct. KLEE doesn't currently have proper support for modelling signals so handling raise() properly is a bit out of scope.

How do you want to handle this? The simplest thing is to remove the raise(SIGFPE)

klee_assume(f == f) is not a good way to ignore paths where symbolics are NaN

I've noticed that the syn/halve/test.c benchmark does this

float f = 131, g;
#if defined(USE_KLEE)
klee_make_symbolic(&f, sizeof(f), "f");
#endif
klee_assume(f == f);

I presume that the use of klee_assume(f == f) is avoid considering paths where f is a nan. Although this will probably work I don't think this is a good idea for two reasons

It is not very clear what is being done here. klee_assume(isnan(f)) is much clearer.
Very aggressive compilers might optimize this to klee_assume(true). I've not seen this happen but apparently is can under -ffast-math. Although this is not a scenario we are checking I see no reason to avoid using the isnan() here.

Mistakes in `aachen/syn/sqr/spec.yml`

The spec for this file and the corresponding source file have some problems.

For the aachen/syn/sqr_double-flow.x86_64.bc benchmark we observe

KLEE: ERROR: /home/user/fp-bench/benchmarks/c/aachen/syn/sqr/test.c:45: ASSERTION FAIL: a * 0.99 <= b && a >= b * 0.99 && "this should have an error path..."

According to the spec we should only see an error on line 47.

Taking a closer look at ``aachen/syn/sqr/spec.yml` it suggests to me that.

For double-flow the ENABLE_FLOW define should be called ENABLE_INFINITY in the spec
For longdouble-flow the ENABLE_FLOW define should be called ENABLE_INFINITY in the spec

in order for the provided counter examples to make sense. Of course you could rename the macro in the source file instead. I don't mind which.

`ld-unnormal` requires more investigation

This benchmark fails depending on which compiler I use when I try to run it natively.

# Clang 3.4
~/dev/klee/llvm/build/Release+Debug+Asserts/bin/clang -O0 -lm syn/ld-unnormal/test.c  && ./a.out

No failures

# Clang 3.9
clang -O0 -lm syn/ld-unnormal/test.c  && ./a.out

No failures

# gcc (GCC) 6.2.1 20160830
gcc -O0 -lm syn/ld-unnormal/test.c  && ./a.out
a.out: syn/ld-unnormal/test.c:12: main: Assertion `!isnan(l)' failed.
Aborted (core dumped)

assert fails.

The specification for `real/sorting_doubles` is incorrect.

The spec for this benchmark claims it is correct w.r.t to all verification tasks. This is not correct.

Our tool reports that an assertion can fail.

real/sorting/main.c:10: ASSERTION FAIL: arr[i] >= arr[i - 1]

I've checked the generated test case. Which gives these concrete values

ktest file : 'test000018.ktest'
args       : ['/tmp/sorting_doubles.x86_64.bc']
num objects: 2
object    0: name: b'a'
object    0: size: 8
object    0: data: -inf
object    1: name: b'b'
object    1: size: 8
object    1: data: inf

I've taken the ktest file and replayed it on the native binary and confirmed. The problem here I think is that when a is -inf and b is inf then you will have an element a + b in the array being sorted which evaluates to NaN

fp-benchmarks-aachen/real/sorting/main.c

Line 20 in 4f8338a

double arr[] = {10., a, b, .3, .2, 1., 3., a + b, 1., 8.};

after doing a sort the NaN will still be in the array and will not be comparable to any other element.

danielschemmel / fp-benchmarks-aachen Goto Github PK

fp-benchmarks-aachen's Issues

`ld-unnormal_default.x86_64` is mis-compiled by Clang when optimizations are enabled

Ninja & GMP

syn/atof_default.x86_64 is incorrect despite the spec saying it should be correct

sqr_ family of benchmarks incorrect

gmp_klee_inv_arg.x86_64 crashes KLEE by calling `raise()`

klee_assume(f == f) is not a good way to ignore paths where symbolics are NaN

Mistakes in `aachen/syn/sqr/spec.yml`

`ld-unnormal` requires more investigation

The specification for `real/sorting_doubles` is incorrect.

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent