sosy-lab / sv-benchmarks Goto Github PK

View Code? Open in Web Editor NEW

188.0 34.0 169.0 3.53 GB

Collection of Verification Tasks (MOVED, please follow the link)

Home Page: https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks

verification-task sv-comp software-verification c java witness competition benchmark verification

sv-benchmarks's Introduction

Moved to https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks

Collection of Verification Tasks

Repository Description

Purpose

This collection of verification tasks is constructed and maintained as a common benchmark for evaluating the effectiveness and efficiency of state-of-the-art verification technology.

This repository is used by many research groups to evaluate the effectiveness and efficiency of verification algorithms for software. The category structure was developed for the International Competition on Software Verification SV-COMP.

The verification tasks were contributed by several research and development groups. After the submission of verification tasks, a group of people (mainly SV-COMP organizer and participants) are working on improving the quality of the verification tasks. This means that after the sets were made public, some programs were removed (no property encoded, unknown architecture), and some programs got technically improved (compiler warnings, memory model). These changes have improved the overall quality of the final set of programs for the competition SV-COMP, and have not changed the intended verification result; all changes are tracked in the public repository.

This repository is open for submission of new verification tasks! Please refer to our contribution guidelines to see how to submit verification tasks to this repository.

Thanks to all contributors of programs, patches, and discussion comments.

Structure

The collection consists of three directories, which contain verification tasks written in different languages:

c/ (programming language C, follows the GNU C standard, many programs even adhere to ANSI C)
java/ (programming language Java)
causes/ (systems from the other directories translated to Horn clauses and stored in SMT format)

License

The programs are under different licenses, which are specified either via a file LICENSE*.txt in the same directory, or via a comment in the program header. Most of the programs are under an open-source license such as Apache 2.0 or GPL.

Origin, Description, and Attribution

The subdirectories that contain the programs contain files README.txt, which give further information about the programs, in particular, this is the place to trace the origin and to attribute the programs to their contributors. For some programs, this information is given in the header of the program as comment.

Definitions

The following definitions are taken from the SV-COMP report 2016 (and previous years).

A verification task consists of

a program,
a specification (set of properties), and
parameters.

A category is a set of verification tasks.

A sub-category is a set of verification tasks that consist of the same specification.

A verification run is

a non-interactive execution
of one verification tool
on one verification task
under specific resource constraints in order to check whether the following statement is correct: "The program satisfies the specification."

A verification result is a triple (ANSWER, WITNESS, TIME) with

ANSWER is an element from {TRUE, FALSE, UNKNOWN},
WITNESS is a violation witness or correctness witness in the common witness format that supports validation of the (untrusted) answer, and
TIME is the CPU time that the verification run has consumed (in practice, also other resource measurement values are reported).

Programs

The program files in this repository are named as follows:

the original file name or short title of the program is given at the beginning,
for each property against which the program is to be verified, the string _false-<property> or _true-<property> is included, according to the expected verification answer, and
the filename ends with ending .c for not preprocessed files and .i for preprocessed files (for C files).

For example, the program minepump_spec5_product61_true-unreach-call_false-termination.cil.c is expected to satisfy property unreach-call and to violate property termination.

There are some old programs that have ending .c although they are preprocessed.

Behavioral Specifications

There are several 'default' specifications that many people use:

unreach-call: A certain function call must not be reachable in the program.
valid-memsafety, valid-deref, valid-free, valid-memtrack: A certain memory safety property must hold in the program. "memsafety" is the conjunction the other three properties.
valid-memcleanup: All allocated memory must be deallocated before the program terminates (note that this is stronger then avoiding memory leaks).
no-overflow: A certain kind of undefined behavior (overflows of signed ints) must not be present in the program.
termination: The program must terminate on all execution paths.

The above specifications are used, e.g., by SV-COMP, and the competition reports explains those specifications.

Test Specifications

The following are some 'default' specifications that many people use for test-case generation:

coverage-branches: The generated test suite should cover all branches of the program.
coverage-error-call: The generated test suite should contain (at least) one test case that covers the call of a certain function.

The above test specifications are used, e.g., by Test-Comp, and the competition reports define those specifications.

Parameters

The parameters of a verification task are needed to make additional information about the verification task available to the verification run. The most prominent parameter is the machine model; currently, there are verification tasks for the ILP32 (32-bit) and the LP64 (64-bit) architecture (cf. https://www.unix.org/whitepapers/64bit.html).

Task Definitions

In order to obtain verification tasks from the programs and specifications in the repository, a simple task-definition mechanism is used. We use version 2.0 of this format with some additional requirements. For each program, the repository contains a .yml file that specifies the following items:

format_version: the version of the format (the version string 2.0)
input_files: the subject program files or directories (a file or directory name, or a list of files or directory names, that the program consists of)
properties: the properties that constitute the specification of the program, each consisting of the following items:
- property_file: file that contains a property definition (cf. common property files for C and for Java)
- expected_verdict: the intended verification result (true or false, only for non-coverage properties)
- subproperty (optional): a subproperty of the property that is violated in cases where the property is a conjunction of subproperties (for verdict false)
options: parameters that are relevant for verification or give extra information:
- language: programming language that the program is written in (C or Java)
- data_model data model of the computer architecture (ILP32, LP64, see https://www.unix.org/whitepapers/64bit.html, only for C programs)

Optional items are explicitly marked as optional, all other items are mandatory. The dictionary options can contain additional data that are not mentioned above.

The SV-COMP 2019 report has documented the first version of the repository's task-definition format 1.0, and contains a description of the format with an example in Sect. 4 and Fig. 3. Format 2.0 adds the options dictionary. Here as example an extract of the task-definition file c/list-properties/list-1.yml:

format_version: '2.0'

input_files: 'list-1.i'

properties:
  - property_file: ../properties/unreach-call.prp
    expected_verdict: true
  - property_file: ../properties/valid-memsafety.prp
    expected_verdict: false
    subproperty: valid-memtrack

options:
  language: C
  data_model: ILP32

sv-benchmarks's People

Contributors

Stargazers

Watchers

Forkers

tautschnig mutilin lgpbentes cryingshadow seahorn zmanchun sutd delcypher gildotomasco symbooglix inkytonik blexim cheshire vlstill versokova zvonimir staticafi smackers oliverstth mitchellgerrard niyasc aziem msaxena2 quiknlucky graydon ottseb aprove-developers lequangloc yihaoyan martinhruska mikhailramalho driemey trucnguyenlam mmuesly anushrijana peterschrammel ziqingluo wenhaowu0udel jhensel jiten-thakkar lucasccordeiro druidos heizmann arthaud algobardo crowdhorn serkern acidburn0zzz vmordan ultimate-pa lembergerth boehmseb jeremygu ksuenaga pavelandrianov dipali-bhutada damonliuthu martinspiessl guluchen aogrcs marek-trtik farian h-lin kratl ufwt emankcin paradise-fi praveenkulkarni1996 1933131 shivayu007 dopelsunce karkhaz avritichauhan xuechaofelix pruemmer verifierintegerassignment xvitovs1 bitcalc michkot yinliangze omainv aadityanaik ccadar oraluben fanhy2114 jin-hui donggeliu moritzbuhl cyrille-artho gennaro-parlato gernst reversingwithme lost222 numairmansur eboyu yanxx297 maxwies kishorekumar1505028 glebpro kcsl

sv-benchmarks's Issues

Incorrect labeling of two ldv-regression benchmarks

Hi,

as far as I can say these two benchmarks are incorrectly labeled:

ldv-regression/test29_true-unreach-call.c
ldv-regression/test24_false-unreach-call.c

I put the reasons as comments to the sources:

/* ldv-regression/test29_true-unreach-call.c */
union dummy {
  int a;
  char b;
};

int main()
{
  union dummy d1, d2;
  int n = __VERIFIER_nondet_int();
  /* let's say n == 0 */
  union dummy *pd = n ? &d1 : &d2;
  /* pd = &d2 */
  if (pd == &d1) {
    pd->a = 0;
  } else {
    /* we take this branch, setting d2.b to 0,
       that is, set the _first_ byte of d2.a to 0 */
    pd->b = 0;
  }
  /* pd == &d2, d2.b == 0, but not whole d2.a == 0 on systems
     where sizeof(int) > sizeof(char), because we set only the
     first byte. That is, d2.a may not be 0 and we jump to the error label */
  if (pd == &d2 && d2.a != 0) {
    goto ERROR;
  }

  return 0;

  ERROR: __VERIFIER_error();
  return 1;
}

If you want a real experience, then change the first two lines to this:

  union dummy d1 = {0xffff}, d2 = {0xffff};
  int n = 0;

compile & run & see :)

For the other benchmark:

/* ldv-regression/test24_false-unreach-call.c */
int check(struct dummy *ad1, int b)
{
  return ad1[b].a == b;
}

int main()
{
  struct dummy ad1[10], *ad2;
  int i, *pa;
  if (i >= 0 && i < 10) {
    ad2 = ad1;
    ad1[i].a = i;
    pa = &ad1[i].a;
    /* before this assignment, i is in the range [0,9],
       after it, it is [10, 19] since ad2[i].a == i (ad2 == ad1) */
    i =  ad2[i].a + 10;
    /* this condition is never true since *pa is
        the old value of i (i.e it is the same as i + 10 < i) */
    while (i < *pa) {
      ++i;
    }
    /* here, the value of i is in the range [10, 19],
       which means that we make an out-of-bound
       access in the following check - which is the only error
       in this benchmark. The error here is thus that the benchmark
       contains undefined behaviour */
    if (!check(ad1, i)) {
      goto ERROR;
    }
  }

  return 0;

  ERROR: __VERIFIER_error();
  return 1;
}

Can somebody, please, confirm this?

Thanks,
Marek

BusyBox "false" actually "true"?

The tasks

sv-benchmarks/c/busybox-1.22.0/hostid_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/logname_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/readlink_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/sync_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/tty_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/uniq_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/usleep_false-unreach-call.c
sv-benchmarks/c/busybox-1.22.0/whoami-incomplete_false-unreach-call.c

supposedly contain specification violations, but the only assertion I can see is the NULL-check on argv, which is malloc'ed by the wrapper. According to the SV-COMP rules, malloc does never return NULL, so argv should not be NULL and the error should be unreachable.

Problem with ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--media--video--videobuf-vmalloc.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c

4 years ago in 8344f7e there was a report that the file ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--media--video--videobuf-vmalloc.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c might contain a counterexample:

  We find a counter-example because this example assumes a specific
  memory model. In my opinion, it is not a good example for the
  competition.

  void main(void)
  {
     // -- assumed to be uninitialized disjoint memory region
     struct vm_area_struct *var_group1;

     ...
     videobuf_vm_close (var_group1);
     ...
  }

  static void videobuf_vm_close(struct vm_area_struct *vma )
  {
     q = ((struct videobuf_mapping *)vma->vm_private_data)->q;
     ...
     videobuf_queue_lock(q);
     ... // code that writes memory through pointers
     videobuf_queue_lock(q);
  }

  __inline static void videobuf_queue_lock(struct videobuf_queue *q )
  {
     if (!q->ext_lock)
       mutex_lock (&(q->vb_lock))
  }

  __inline static void videobuf_queue_unlock(struct videobuf_queue *q )
  {
    if (!q->ext_lock)
      mutex_unlock (&(q->vb_lock))
  }

  A counterexample here is an execution in which exactly one of
  mutex_lock()/mutex_unlock() is called.

  The problem with this example is that var_group1 is not
  allocated. So, our semantics is to assume that it occupies a
  non-deterministically chosen memory regions (i.e., chosen by
  __VERIFIER_nondet_pointer()). However, under this assumption, it is
  possible that the code between videobuf_queue_lock() and
  videobuf_queue_unlock() writes into q->ext_lock field.

What should we do with this? @dbeyer who sent this problem report?

Repository metadata redesign should be considered

This is reposting of a comment I made in #194 to make it a separate issue.

I'm not happy with the Category.cfg file (or any of the other files that describe the verification tasks for that matter). When organising benchmarks we have a choice, have a single file describe everything about a benchmark (which is what I went for when I made the sv-comp mock up) or distribute this information across multiple files (what the repository currently does). Both choices have their advantages/disadvantages.

The single file describing everything about a benchmark (the spec.yml file in my mock up) has the following advantages:

Easy for the build system to build the benchmark because it does not need to traverse and process lots of files to work out what to do. This not only faster but also much simpler.
The same (as above) applies to verifiers wanting to verify the benchmark.
It makes it very easy to specify multiple properties and their correctness for a benchmark.
It makes the repository more modular (i.e. you can take any sub-directory and that on its own is a valid benchmark repository that can be built and verified).

It has the following disadvantages:

It introduces some wasted space. For example is my mock-up every spec.yml file has to specify what categories a benchmark belongs to. This uses up more disk space than the *.set files do.
It is harder to gain aggregate statistics (e.g. "How many benchmarks are in category X?", "How many benchmarks as expected to be correct?") because every spec.yml file has to be loaded into to memory.

The multiple file approach has the following advantages:

Easier to gain aggregate statistics because there are less files to parse
This representation of benchmark information is more compact.

It has the following disadvantages:

Harder for the build system to work out how to build the benchmarks because multiple files need to be traversed and loaded and parsed.
The same (as above) applies to verifiers.
Not modular.

There is a trade off here but I consider the single file (per benchmark) approach to be better. The weakness of gathering statistics can be solved by having additional scripts to traverse the repository and gather the needed information. The modular nature of this organisation is also quite nice because you can pass the script a sub directory of the repository and it will gather statistics only for that sub directory.

Whatever choice is made it is important that metadata be machine readable. Although the Category.cfg file kind of looks like YAML, there's no schema that documents the format and allows it to be easily read by a machine.

This is why the mock up I had provided a versioned (for doing upgrades) schema for the spec.yml files.

Continuous integration for "compilable check"

@dbeyer @tautschnig

I was looking other some stuff in the repository and I notice you seem to be having some problems with benchmarks that don't compile properly with warnings enabled. I have a suggestion to try to prevent this from getting any worse.

Create an actual set of Makefiles that will actually make it easy to compile the benchmarks individually or all together. Some of the makefiles would mark the benchmarks that are blacklisted.
Add TravisCI (free continuous integration service for open source repositories) to this repository to run this makefile for every PR or commit to prevent additional benchmarks being introduced that don't conform.

I'm currently trying to see if I can get KLEE to symbolically execute some of the benchmarks some I going to be doing step 1 anyway and I have lots of experience using TravisCI so I'm happy to set that up as well if it is desired.

Type errors in pthread-lit/fk2012_true-unreach-call

The "producer" function returns integers without casting them to (void*) first and the "consumer" function is missing a return statement. Also, the return values of the "producer" function aren't used anywhere.

Cannot compile ldv-races/race-4_2-thread_local_vars_false-unreach-call.c

This benchmark cannot be compiled using clang, since thread_ath9k function with returns type void* contains return;.

race-4_2-thread_local_vars_false-unreach-call.c:34:5: error: non-void function 'thread_ath9k' should return a value [-Wreturn-type]
return;

seq-pthread benchmarks have misleading indentation

cs_queue_true-unreach-call.i: In function ‘empty’:
cs_queue_true-unreach-call.i:1048:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_true-unreach-call.i:1049:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;
              ^~
cs_queue_true-unreach-call.i: In function ‘full’:
cs_queue_true-unreach-call.i:1064:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_true-unreach-call.i:1065:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;
              ^~
cs_queue_true-unreach-call.i: In function ‘dequeue’:
cs_queue_true-unreach-call.i:1106:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_true-unreach-call.i:1107:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;
              ^~
.............cs_queue_false-unreach-call.i: In function ‘empty’:
cs_queue_false-unreach-call.i:1028:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_false-unreach-call.i:1029:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;
              ^~
.cs_queue_false-unreach-call.i: In function ‘full’:
cs_queue_false-unreach-call.i:1044:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_false-unreach-call.i:1045:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;
              ^~
cs_queue_false-unreach-call.i: In function ‘dequeue’:
cs_queue_false-unreach-call.i:1086:2: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  else
  ^~~~
cs_queue_false-unreach-call.i:1087:14: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the ‘else’
   __CS_cs(); if (__CS_ret != 0) return 0;

New category for undefined behavior?

Currently, we have this structure for SV-COMP 2017:
https://sv-comp.sosy-lab.org/2017/categories.svg
The main categories Arrays, BitVectors, and HeapDataStructures
contain each a sub-category for memsafety or overflows,
which in turn represent undefined behavior (invalid free, invalid deref, and invalid overflows).

Would it be better to concentrate undefined behavior in an own property class or category?

I.e., wouldn't it be better to move invalid free, invalid deref, and invalid overflows to a category
for undefined behavior, and leave mem-leak in the Heaps category?

Overflows: Were put to BitVectors because it was assumed that bit-vector theory is good to find those bugs. But other techniques work as well. So the connection to bit-vectors is not obvious anymore.

Arrays-MemSafety: Are those verification tasks really interesting properties for a theory of arrays,
or are we simply looking for undefined stuff?

I am wondering if a property/category for "Defined Behavior" makes more sense in terms of
a more logical or intuitive structure,
and is a good idea to invite definedness checkers to the competition in that category.

Remerge #139 ("preproc")

It looks like #139 which was recently merged has been reverted which is a bit of a pain. If there was something wrong with the PR it should have not been merged.

Anyway I'd like to find out what you want ( @dbeyer ) changed so the code can be remerged as a new PR.

After speaking to @PhilippWendler I'm happy to re-license the SVCB code under Apache 2.0 but I want to know:

Are there any other changes needed before I submit a new PR?

c/loops/veris.c_OpenSER__cases1_stripFullBoth_arr_true-unreach-call has no assertion

There is no call to assert or __VERIFIER_assert in the program c/loops/veris.c_OpenSER__cases1_stripFullBoth_arr_true-unreach-call. We presume this is an inadvertent omission.

Define and check license requirements

We should discuss what licenses are acceptable for benchmarks, i.e., what kind of rights to we need to be granted in the license.

I would list at least the following requirements:

use the files
change the files (e.g., preprocessing, when we find a bug, or change some __VERIFIER_* specifics)
redistribute, both original and changed
do all of this commercially (we don't want to prevent authors of commercial tools to do this)

When this is answered, it should also be added to the documentation.

Furthermore, we should check for all existing directories whether their license grants the necessary rights:

loop-industry contains declaration that apparently doesn't declare anything

gcc 6.2.1 emits these warnings

make[1]: Entering directory './loop-industry-pattern/'
.....aiob_2_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
aiob_3_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
ofuf_2_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
ofuf_4_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
...aiob_4_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
ofuf_5_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
..ofuf_3_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
.ofuf_1_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;
                       ^
aiob_1_true-unreach-call.c:1473:23: warning: declaration does not declare anything
 nozomi5nozomi5_4290_M ;

Here's a snippet of the offending code

typedef union {
int  Id_MCDC_600;
__pthread_slist_t Id_MCDC_601;
} T2T2_10370_M;
typedef T2T2_10370_M nozomi5nozomi5_4290_M;
struct __pthread_mutex_s{
int  Id_MCDC_4;
unsigned int  Id_MCDC_410;
int  Id_MCDC_602;
int  Id_MCDC_603;
unsigned int  Id_MCDC_604;
nozomi5nozomi5_4290_M ;
} ;

Note nozomi5nozomi5_4290_M is just a type and in the __pthread_mutex_s struct it is used but no member name is given so I don't think anything is declared.

Uninitialized variables in floats-esbmc-regression

@mikhailramalho
The *nondet* files contain uninitialized variables.
Instead, they should use e.g. __VERIFIER_nondet_float() or __VERIFIER_nondet_double().

Problematic assembly in ldv-linux-3.0 benchmark

In ldv-linux-3.0/usb_urb-drivers-media-video-c-qcam.ko_true-unreach-call.cil.out.i.pp.cil.c some assembly on line 6009

            case_2: 
#line 222
            __asm__  ("mov"
                      "w "
                      "%%"
                      "gs"
                      ":"
                      "%P"
                      "1"
                      ",%0": "=r" (pfo_ret__): "p" (& kernel_stack));
#line 222
            goto switch_break;

seems to cause gcc 5.2 problems when building doing

$ make SYNTAX_ONLY=0

/anthill/stuff/tacas-comp/inst/current/envs/linux-3.0.1/linux-3.0.1/arch/x86/include/asm/thread_info.h:222: Error: incorrect register `%rax' used with `w' suffix

A lot of the surrounding assembly is commented out. What should be done here?

Categories with inconsistent architectures

The category HeapReach is a 32-bit category, but it contains benchmarks from the directory ldv-regression, which are claimed to be 64 bit.

The reverse is true for the 64-bit category Termination, which recently got added the 32-bit files in product-lines (db70dcd).

Add -Wno-builtin-requires-header globally?

For some directories, Clang complains about declaration of built-in function '%0' requires inclusion of the header ....h. It seems this is not something we can avoid as long as we have preprocessed benchmarks. Thus I suggest to add -Wno-builtin-requires-header to DEFAULT_CLANG_WARNINGS in Makefile.config to disable this warning globally.

@delcypher Did I understand this situation correctly? Do you know another solution that does not involve adding #include to .i files or removing the .i files?

Improve compilation checks

It would be good to improve the compilation checks that are executed for all benchmarks:

Read architecture of files from official ~~Category.cfg~~ task-definition files instead of Makefile to avoid the need to specify it twice.
New files and directories should be picked up automatically, without need to explicitly list them in another file.
Do not abort on first error, but show all errors.
Enable all warnings and treat them as errors by default, but allow to silence certain warnings per directory.

I keep wondering whether the existing Makefile-based system is a good fit for this. The Makefile syntax is very hard to read (especially Makefile.rules) and I would have no idea how to implement these features. @delcypher how could this be done?

Enable -Werror=return-type and -Werror=main-return-type

I'd like to add -Werror=return-type to COMMON_WARNINGS_AS_ERRORS and -Werror=main-return-type to CLANG_ONLY_WARNINGS_AS_ERRORS.

Unfortunately several benchmarks trigger these warnings and so the build fails when treated as errors.

Three c/loop-lit programs use assert instead of __VERIFIER_assert

The following programs in the c/loop-lit category use assert instead of __VERIFIER_assert. For consistency with the other programs they should probably call __VERIFIER_assert (although it is straight-forawrd to support both in a tool):

bhmr2007_true-unreach-call.c
19: assert(a + b == 3*n);

gcnr2008_false-unreach-call.c
24: assert(x >= 4 && y <= 2);

hhk2008_true-unreach-call.c
17: assert(res == a + b);

Pre-processed files should be removed

The pre-processed files are causing lots of problems

It's really easy to make an architecture mistake when pre-processing
a source file (i.e. what I reported on in my first e-mail in this
thread). See #121
It is a huge waste of space. We are effectively doubling (at least)
the size of the repository by having these pre-processed files under
source control
It doubles the maintenance effort when fixing benchmarks files
because both the *.i and corresponding *.c files need to be
fixed by hand! I dare not regenerate the *.i files automatically
because I have no way of knowing what build environment the original
creator of the benchmark used (this is why we need to define a
"canonical" build environment). It also makes code reviews harder
The line pragmas in the *.i files make debugging benchmarks
extremely difficult. When I found compilation issues I had to remove
all the line pragmas to get a readable error message from the
compiler.

Several floats-cdfpl benchmarks will not compile without defines!

The build is currently passing because the build system sets

CC.Flags += -DNR=1 -DITERATIONS=2

for the floats-cdfpl benchmarks but this bad because the benchmarks won't build unless you know the necessary defines.

Merge some subcategories of "Integers and Control Flow"?

From the description of the subcategories ControlFlow and Loops, it is not clear why these two categories are separate. After all, at least the programs in ssh-simplified and locks also contain a crucial loop.

So maybe we should either merge the two subcategories or improve their description, because in #160 it is not really clear where to put the new files. @cheshire you have worked a lot with ´Loops` I guess, do you have some info on this?

The subcategories Simple and ControlFlow should be merged if both have the same memory model as discussed on the mailing list.

Increase Traceability and Reduce Maintenance Effort for Verification Tasks

How to generate the .i files automatically from the .c files?
How to remember architecture dependencies?
How to better reflect what the intentions of a verification tasks are and were it comes from?

Fix BusyBox benchmarks to make them usable

The files in the busybox-1.22.0 directory should be brought into a shape that makes it possible to use them in some category.

Unfortunately, I do not know what exactly needs to be done.
@mdangl, if I remember correctly you once looked at them? Can you repeat your findings here, please?

Ping original submitters to fix errors spotted by continuous integration checks

Hi,
In #4, #5, #6 I have provided patches for those benchmarks that I have submitted or feel somewhat responsible for. check-blacklist nevertheless still lists 3560 broken benchmarks, which ought to be fixed before the next edition of the competition commences. It would be great if the original submitters of the benchmarks could be pinged to review and fix their submissions.

To help people understand what is going wrong with their specific files, they should run

cd c ; ./check <directory-of-interest>/*.[ci]

Thanks a lot,
Michael

loops/nec40_true-unreach-call seems to be incorrect

If x contains one != zero element, then when k is zero the program reaches the always failing assertion since y contains { non-zero, zero, ... } and i is one.

Should this one be reclassified as a "false' benchmark? OR are some other assumptions made about it's execution?

question on array_ptr_partial_init_true-unreach-call

This file is marked as true, but according to the standard If the space cannot be allocated, a null pointer is returned. (section 7.22.3).

typedef long unsigned int size_t;
extern void *malloc (size_t __size) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__malloc__)) ;
extern void __VERIFIER_error() __attribute__ ((__noreturn__));
void __VERIFIER_assert(int cond) { if(!(cond)) { ERROR: __VERIFIER_error(); } }
#define NULL 0
#define SIZE 100000

int *a[SIZE];
int i;
int main ()
{
	for(i = 0; i < SIZE; i++)
	{
		a[i] = NULL;
	}

	for(i = 0; i < SIZE / 2; i++)
	{
		a[i] = malloc(sizeof(int)) ;
	}


	for(i = 0; i < SIZE / 2; i++)
	{
		__VERIFIER_assert(a[i] != NULL);
	}

return 0;
}

Explicit null-pointer access in memsafety-ext/tree_dsw_true-valid-memsafety.c

In task memsafety-ext/tree_dsw_true-valid-memsafety.c , line 69, one can find the following statement:
((struct TreeNode*)NULL)->left = NULL;

To me, this seems like an undefined null-pointer access.
Can anyone help me make sense of the meaning of this statement?

The surrounding code in the file is the following:

 26  struct TreeNode* root = malloc(sizeof(*root)), *n;
 27   root->left = NULL;
 28   root->right = NULL;
 29 
 30   while (__VERIFIER_nondet_int()) {
 31     n = root;
 32     while (n->left && n->right) {
 33       if (__VERIFIER_nondet_int())
 34         n = n->left;
 35       else
 36         n = n->right;
 37     }
 38     if (!n->left && __VERIFIER_nondet_int()) {
 39       n->left = malloc(sizeof(*n));
 40       n->left->left = NULL;
 41       n->left->right = NULL;
 42     }
 43     if (!n->right && __VERIFIER_nondet_int()) {
 44       n->right = malloc(sizeof(*n));
 45       n->right->left = NULL;
 46       n->right->right = NULL;
 47     }
 48   }
 49 
 50   struct TreeNode sentinel;
 51 
 52   n = root;
 53   struct TreeNode* pred = &sentinel;
 54   struct TreeNode* succ = NULL;
 55 
 56   while (n != &sentinel) {
 57     succ = n->left;
 58     n->left = n->right;
 59     n->right = pred;
 60     pred = n;
 61     n = succ;
 62     if (!n) {
 63       n = pred;
 64       pred = NULL;
 65     }
 66   }
 67 
 68   if (pred != root)
 69     ((struct TreeNode*)NULL)->left = NULL;
 70

seq-mthread/pals_STARTPALS_ActiveStandby* benchmarks have suspicious self comparision

Clang is emitting a lot of warnings of the same type for some very suspicious looking code in the

seq-mthread/pals_STARTPALS_ActiveStandby*.c benchmarks.

The warnings look similar to this:

pals_STARTPALS_ActiveStandby_false-unreach-call.4_1.ufo.BOUNDED-10.pals.c:320:22: warning: self-comparison always evaluates to false [-Wtautological-compare]
    s1s1_new = nomsg != nomsg && s1s1_new == nomsg ? nomsg : s1s1_new;

The author of these benchmarks should fix this.

``regression`` benchmarks contain GCC compiler built-ins which Clang cannot handle

Clang cannot build some of the benchmarks in the regression folder due to the files containing compiler built-ins that are incompatible with Clang. Passing -fno-builtins does not fix the issues here.

The first issue I saw was

clang building regression/drivers--usb--serial--spcp8x5.ko_001.0ba4034.32_7a.cil_true-unreach-call.i
/home/ldvuser/ldv/inst/kernel-rules/verifier/rcv.h:49:6: error: definition of builtin function '__builtin_expect'
long __builtin_expect(long exp , long c ) 
     ^
1 error generated.

Even if rename that function so it's builtin_expect() I run into more issues.

void __builtin_va_start(__builtin_va_list  ) ;
     ^
<compiler builtins>:1:6: note: '__builtin_va_start' is a builtin with type 'void (__va_list_tag *, ...)'
/work/ldvuser/novikov/work/current--X--drivers/isdn/mISDN/mISDN_core.ko--X--defaultlinux--X--39_7a--X--cpachecker/linux/csd_deg_dscv/30/dscv_tempdir/dscv/ri/39_7a/drivers/isdn/mISDN/layer1.c.prepared:194:45: error: 
      too few arguments to function call, expected at least 2, have 1
  __builtin_va_start((__va_list_tag *)(& va));
                                            ^
/work/ldvuser/novikov/work/current--X--drivers/isdn/mISDN/mISDN_core.ko--X--defaultlinux--X--39_7a--X--cpachecker/linux/csd_deg_dscv/30/dscv_tempdir/dscv/ri/39_7a/drivers/isdn/mISDN/layer2.c.prepared:193:45: error: 
      too few arguments to function call, expected at least 2, have 1
  __builtin_va_start((__va_list_tag *)(& va));
                                            ^
/work/ldvuser/novikov/work/current--X--drivers/isdn/mISDN/mISDN_core.ko--X--defaultlinux--X--39_7a--X--cpachecker/linux/csd_deg_dscv/30/dscv_tempdir/dscv/ri/39_7a/drivers/isdn/mISDN/tei.c.prepared:176:45: error: 
      too few arguments to function call, expected at least 2, have 1
  __builtin_va_start((__va_list_tag *)(& va));
                                            ^
/work/ldvuser/novikov/work/current--X--drivers/isdn/mISDN/mISDN_core.ko--X--defaultlinux--X--39_7a--X--cpachecker/linux/csd_deg_dscv/30/dscv_tempdir/dscv/ri/39_7a/drivers/isdn/mISDN/tei.c.prepared:320:45: error: 
      too few arguments to function call, expected at least 2, have 1
  __builtin_va_start((__va_list_tag *)(& va));
                                            ^

I'm not sure if it's possible to fix this. Personally I don't think compiler builtins should be appearing in benchmarks.

@zvonimir @dbeyer @PhilippWendler Any ideas here?

Enable WARNINGS_AS_ERRORS=1

One day I'd like to enable WARNIGS_AS_ERRORS=1 by default but right now there are so many warnings that I think this is going to have to be a long term goal. This issue will track this.

In order for this to be possible we would also need to pick our reference compilers (e.g. gcc 4.8 and clang 3.7) because the warnings reported vary between compiler versions.

C source files with ``.i`` extension

Why are there C source files with an .i extension rather than .c?

Currently the Makefile build system won't build the *.i files. It technically could be taught to but it really over complicates matters (I have to worry about name clashes when generating the *.o and *.d files).

Enable -Werror=implicit

I'd like to add -Werror=implicit to COMMON_WARNINGS_AS_ERRORS (see PR #105) to improve the quality of benchmarks that are in the suite. The -Werror=implicit flag is equivalent to passing -Werror=implicit-function-declaration and -Werror=implicit-int. Read the gcc man page for a description of these warnings. Note the -Werror=* means treat the warning * as an error.

Unfortunately there are some issues blocking this

The inclusion of decls.h needs to be removed. I'd like to keep around some file that documents what the expected function signatures are for the VERIFIER_*() functions but if we want the benchmarks to be truly standalone then we need to stop passing -include decls.h to the compiler. This is trivial to fix.

But then we have some much bigger problems...

The suite is littered with violations of -Werror=implicit-function-declaration.
The suite is littered with violations of -Werror=implicit-int.

veris.c_sendmail__tTflag_arr_one_loop_true-unreach-call.i has tautological compare

clang building /loops/veris.c_sendmail__tTflag_arr_one_loop_true-unreach-call.i
veris.c_sendmail__tTflag_arr_one_loop_true-unreach-call.i:29:24: warning: comparison of unsigned expression >= 0 is always true [-Wtautological-compare]
  __VERIFIER_assert (i >= 0);
                     ~ ^  ~
1 warning generated.

Various type checking errors reported in the busybox folder

The checking script proposed in #2 reports a number of errors, among them those in the busybox folder. My fork of the repository has all the necessary changes, and I will submit a pull request once #2 is closed.

Best,
Michael

Invalid memory access in DeviceDriversLinux64

Hi guys,

I found several problematic benchmarks that contain invalid memory access in DeviceDriversLinux64 category. For each benchmark, I will briefly describe the problem and its witness file can be found in a github repo (https://github.com/Guoanshisb/BenchmarksWithInvalidDerefs).

ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--media--dvb--dvb-usb--dvb-usb-az6007.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c
main calls az6007_tuner_attach with an local uninitialized pointer var_group4 as the argument. At line 8295, __cil_tmp27 is dereferenced which is var_group4 plus an offset.

ldv-consumption/32_7a_cilled_true-unreach-call_linux-3.8-rc1-32_7a-drivers--gpu--drm--ttm--ttm.ko-ldv_main5_sequence_infinite_withcheck_stateful.cil.out.c
main calls ttm_bo_vm_fault with two local uninitialized pointer var_group1 and var_group2. The first argument is derefenced multiple times in this function. For example, at line 9887, an uninitialized function pointer is dereferenced which can alias to any function with identical signature.

ldv-consumption/32_7a_cilled_true-unreach-call_linux-3.8-rc1-32_7a-drivers--pcmcia--pcmcia.ko-ldv_main2_sequence_infinite_withcheck_stateful.cil.out.c
pccard_show_cis is called by main with two local uninitialized pointers var_group1 and var_group2 as arugments. pccard_validate_cis is called by pccard_show_cis with a argument which is essentially var_group2 plus an offset. This argument is passed around and dereferenced directly or indirectly.

ldv-linux-3.16-rc1/205_9a_array_safes_linux-3.16-rc1.tar.xz-205_9a-drivers--net--usb--rtl8150.ko-entry_point_true-unreach-call.cil.out.c
main calls rtl8150_probe with global pointer rtl8150_driver_group1 as an argument, which is not assigned to any memory location. rtl8150_probe then calls interface_to_usbdev which dereferences rtl8150_driver_group1.

@zvonimir @MontyCarter @mutilin

Some files in termination-memory-alloca not included in Termination category

1514524 from #19 added some files to the termination-memory-alloca directory, but these files were never added to some category so they are not used. @letonchanh, was the intention to add them to the Termination category?

In any case, it might be better to move these files to a separate directory, because the readme in this directory does not fit these files (it claims a different contributor). There is also the question what license these files have? @letonchanh?

termination-crafted/Nyala-2lex_false-valid-deref.c is misclassified

This benchmark does not even contain any dereferencing operator (or arrays with index access), and is thus trivially true-valid-deref.

See the discussion in 85a695e.

Many files in float-benchs have no known verdict

It seems they were submitted by Antoine Mine (cf. 54069e2), maybe we should contact him.

Document requirements for benchmark submission

The following list of things is my check list for assessing pull request with new benchmarks.
Each of these items should probably be mentioned in the documentation.

counterexample for array-industry-pattern/check_removal_from_set_after_insertion_true-unreach-call.i

The file array-industry-pattern/check_removal_from_set_after_insertion_true-unreach-call.i contains a bug.
The error can be reached with the following assignments: SIZE=2; values={0,0}; element=0;.
Error found with CPAchecker :-)

Suggestion: rename the file (.c and .i file)

Should pull requests add new benchmarks to categories?

When a pull request is made with new benchmarks, should the pull request also add these benchmarks to some category?

Currently this is handled inconsistently.

If new benchmarks are added to an existing directory, the existing file patterns in the category files also often match the new files, so they get implicitly added to the category.

For pull requests that add a new directory, some of them contain changes to .set files, some of them not.

The answer to this question should also be documented in the repository readme.

Not all tasks from the termination set are preprocessed

The following files contain #include statements:

termination-crafted/LexIndexValue-Array_true-termination.c
termination-crafted/SyntaxSupportPointer01_true-termination.c
termination-crafted/4BitCounterPointer_true-termination.c
termination-crafted/LexIndexValue-Pointer_true-termination.c
termination-crafted-lit/cstrcmp_true-termination.c
termination-crafted-lit/cstrncmp_true-termination.c
termination-crafted-lit/cstrspn_true-termination.c
termination-crafted-lit/cstrlen_true-termination.c
termination-crafted-lit/cstrpbrk_true-termination.c
termination-crafted-lit/strchr_true-termination.c
termination-crafted-lit/cstrcspn_true-termination.c

Unused files in ldv-regression

The directory ldv-regression contains files test[01..30]*.c that are not matched by any of the patterns in the category definitions, thus they are unused. It seems this is the case for at least 3 years. What was the intention for these files? Should we include them?

Some ntdrivers-simplified have tautological comparisions

........cdaudio_simpl1_false-unreach-call_true-termination.cil.c: In function ‘CdAudioPnp’:
cdaudio_simpl1_false-unreach-call_true-termination.cil.c:269:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
           if (status == status) {
                      ^~
..cdaudio_simpl1_true-unreach-call_true-termination.cil.c: In function ‘CdAudioPnp’:
cdaudio_simpl1_true-unreach-call_true-termination.cil.c:269:22: warning: self-comparison always evaluates to true [-Wtautological-compare]
           if (status == status) {

Finish decls.h and make them the documented function signatures

The special __VERIFIER_* functions are not precisely specified by the current documentation which appears to have lead to some divergence in the expected function signatures for the various functions.

The new decls.h file has started making this explicit but...

All function signatures need implementing
We should decide on how we handle _Bool types. For example, the current implementation has

void __VERIFIER_assume(int);

but this probably should be

void __VERIFIER_assume(_Bool);

This file needs to be documented as the official function signatures.
Existing benchmarks should not declare these as they will now come from decls.h.

Benchmarks depending on undefined functions in DeviceDriversLinux64

Hi,

in those benchmarks there are calls to functions (or dereference of an external variable in one case) that are not defined and then a decision is made based on the return value (I linked exact lines):

https://github.com/sosy-lab/sv-benchmarks/blob/master/c/ldv-linux-4.2-rc1/linux-4.2-rc1.tar.xz-32_7a-drivers--net--usb--cdc_subset.ko-entry_point_true-unreach-call.cil.out.c#L6096

https://github.com/sosy-lab/sv-benchmarks/blob/master/c/ldv-commit-tester/m0_true-unreach-call_drivers-usb-gadget-g_printer-ko--106_1a--2b9ec6c.c#L3827

https://github.com/sosy-lab/sv-benchmarks/blob/master/c/ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--staging--speakup--speakup_ltlk.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c#L588

Could somebody from ldv team check it? I have more 'candidates' on my list that I haven't checked yet, so maybe I'll add some more later.

Thanks,
Marek

Incorrect pre-processing of source files leads to incompatible C library redeclarations

I've been investigating an issue that Clang has been warning about all
over the place (-Wincompatible-library-redeclaration) and I'd like to
draw attention to it because it implies that people who have been
preprocessing benchmarks have been a bit careless (or I have incorrect
knowledge of which architecture a benchmark is intended for).

Here's an example (with -Werror=incompatible-library-redeclaration
passed to Clang)

$ cd heap-manipulation
$ make CC=clang bubble_sort_linux_false-unreach-call.oi
clang building heap-manipulation/bubble_sort_linux_false-unreach-call.i
bubble_sort_linux_false-unreach-call.i:478:14: error: incompatible
redeclaration of library function 'malloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *malloc (size_t __size) __attribute__ ((__nothrow__ ,
__leaf__)) __attribute__ ((__malloc__)) ;
             ^
bubble_sort_linux_false-unreach-call.i:478:14: note: 'malloc' is a
builtin with type 'void *(unsigned int)'
bubble_sort_linux_false-unreach-call.i:480:14: error: incompatible
redeclaration of library function 'calloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *calloc (size_t __nmemb, size_t __size)
             ^
bubble_sort_linux_false-unreach-call.i:480:14: note: 'calloc' is a
builtin with type 'void *(unsigned int, unsigned int)'
bubble_sort_linux_false-unreach-call.i:492:14: error: incompatible
redeclaration of library function 'realloc'
[-Werror,-Wincompatible-library-redeclaration]
extern void *realloc (void *__ptr, size_t __size)
             ^
bubble_sort_linux_false-unreach-call.i:492:14: note: 'realloc' is a
builtin with type 'void *(void *, unsigned int)'
bubble_sort_linux_false-unreach-call.i:787:12: error: incompatible
redeclaration of library function 'snprintf'
[-Werror,-Wincompatible-library-redeclaration]
extern int snprintf (char *__restrict __s, size_t __maxlen,
           ^
bubble_sort_linux_false-unreach-call.i:787:12: note: 'snprintf' is a
builtin with type 'int (char *, unsigned int, const char *, ...)'
bubble_sort_linux_false-unreach-call.i:790:12: error: incompatible
redeclaration of library function 'vsnprintf'
[-Werror,-Wincompatible-library-redeclaration]
extern int vsnprintf (char *__restrict __s, size_t __maxlen,
           ^
bubble_sort_linux_false-unreach-call.i:790:12: note: 'vsnprintf' is a
builtin with type 'int (char *, unsigned int, const char *,
__builtin_va_list)'
5 errors generated.

AFAIK this benchmark (bubble_sort_linux_false-unreach-call.c) is
supposed to be a 32-bit (I look at the Makefile in the directory for
this, if CC.Arch is not set I assume 32-bit) program but the
warnings Clang show indicate that the pre-processed file
(bubble_sort_linux_false-unreach-call.i) was preprocessed for 64-bit
compilation.

For this particular benchmark the problem is that the typedef for
size_t is different depending on the architecture, as demonstrated
below

$ clang -m32 -xc <(echo "#include <stdio.h>") -E -o - | grep -E
"^typedef.+ size_t;$"
typedef unsigned int size_t;
$ clang -m64 -xc <(echo "#include <stdio.h>") -E -o - | grep -E
"^typedef.+ size_t;$"
typedef long unsigned int size_t;

In bubble_sort_linux_false-unreach-call.i near the top you can see

typedef long unsigned int size_t;

which implies that the benchmark was preprocessed with -m64
(possibly implicitly) even though the benchmark is supposed to 32-bit.
If I change the typedef to the one that should be used with
-m32 then the Clang warnings go away.

In this particular case on 32-bit unsigned int and long unsigned int are the same size but I worry that people incorrectly
pre-processing source files will lead to more problems that I haven't
thought about.

Clang compilation without errors

Currently compilation of many benchmarks fails with clang 3.7. The benchmarks need to be fixed.