dataracebench's Introduction

DataRaceBench 1.4.0

DataRaceBench is a benchmark suite designed to systematically and quantitatively evaluate the effectiveness of data race detection tools. It includes a set of microbenchmarks with and without data races. Parallelism is represented by OpenMP directives. OpenMP is a popular parallel programming model for multi-threaded applications.

Note that some microbenchmarks use OpenMP 4.5 and 5.0 features. You may need a recent OpenMP compiler to compile them.

DataRaceBench also comes with an evaluation script (check-data-races.sh). The script can be used to evaluate the tools Helgrind, Archer, ThreadSanitizer, Intel Inspector, and Coderrect Scanner. In addition, a parameterized test harness (scripts/test-harness.sh) is available, which accepts a number of different parameters for the evaluation. The evaluation script uses the test harness with some pre-defined values.

Quick Start

./check-data-races.sh

Usage: ./check-data-races.sh [--run] [--help] language

--help      : this option
--small     : compile and test all benchmarks using small parameters with Helgrind, ThreadSanitizer, Archer, Intel Inspector.
--run       : compile and run all benchmarks with gcc (no evaluation)
--run-intel : compile and run all benchmarks with Intel compilers (no evaluation)
--helgrind  : compile and test all benchmarks with Helgrind
--tsan-clang: compile and test all benchmarks with clang ThreadSanitizer
--tsan-gcc  : compile and test all benchmarks with gcc ThreadSanitizer
--archer    : compile and test all benchmarks with Archer
--coderrect : compile and test all benchmarks with Coderrect Scanner
--inspector : compile and test all benchmarks with Intel Inspector
--romp      : compile and test all benchmarks with Romp
--customize : compile and test customized test list and tools

More information: User Guide

Latest Tool Evaluation Results

Data race detection tool dashboard

List of Benchmarks

Benchmark labels and lists - C/C++

Benchmark labels and lists - Fortran

Authors

DataRaceBench was created by Chunhua Liao, Pei-Hung Lin, Gaurav Verma, Yaying Shi, Joshua Asplund, Markus Schordan, and Ian Karlin.

Release

DataRaceBench is released under a BSD license. For more details, see the file LICENSE.txt. The microbenchmarks marked 'Polyhedral' in the table above were generated as optimization variants of benchmarks from the PolyOpt benchmark suite. For those benchmarks, see the license file LICENSE.OSU.txt.

LLNL-CODE-732144

How to Cite DataRaceBench in a Publication

If you are referring to DataRaceBench in a publication, please cite the following paper:

If you use DataRaceBench v.1.4.0 or later, please additionally cite the following paper:

  • Pei-Hung Lin and Chunhua Liao, High-Precision Evaluation of Both Static and Dynamic Tools using DataRaceBench, International Workshop on Software Correctness for HPC Applications (Correctness) SC21, 2021 pdf

Other papers

  • Chunhua Liao, Pei-Hung Lin, Markus Schordan and Ian Karlin, A Semantics-Driven Approach to Improving DataRaceBench's OpenMP Standard Coverage, IWOMP 2018: 14th International Workshop on OpenMP, Barcelona, Spain, September 26-28, 2018, pdf, Version 1.2
  • Pei-Hung Lin, Chunhua Liao, Markus Schordan, Ian Karlin. Runtime and Memory Evaluation of Data Race Detection Tools. ISoLA (2) 2018: 179-196. pdf
  • Pei-Hung Lin, Chunhua Liao, Markus Schordan, Ian Karlin. Exploring Regression of Data Race Detection Tools Using DataRaceBench. 2019 IEEE/ACM 3rd International Workshop on Software Correctness for HPC Applications (Correctness), Denver, CO, USA, 2019, pp. 11-18. pdf, Presentation
  • Gleison Souza Diniz Mendonça, Chunhua Liao, and Fernando Magno Quintão Pereira. AutoParBench: a unified test framework for OpenMP-based parallelizers. In Proceedings of the 34th ACM International Conference on Supercomputing (ICS '20). Association for Computing Machinery, New York, NY, USA, Article 28, 1–10. pdf
  • Gaurav Verma, Yaying Shi, Chunhua Liao, Barbara Chapman, and Yonghong Yan, Enhancing DataRaceBench for Evaluating Data Race Detection Tools, International Workshop on Software Correctness for HPC Applications (Correctness) SC20, 2020 pdf, Version 1.3
  • Yaying Shi, Anjia Wang, Yonghong Yan and Chunhua Liao, RDS: A Cloud-Based Metaservice for Detecting Data Races in Parallel Programs, 14th IEEE/ACM International Conference on Utility and Cloud Computing, University of Leicester, Leicester, UK, December 6-9, 2021 pdf

dataracebench's People

Contributors

chunhualiao, hassansalehe, jprotze, lchengit, mschordan, peihunglin, xintin, yaying-llnl-summer, zygyz

dataracebench's Issues

pireduction

Is pireduction-orig-no really supposed to run 2 trillion times? Left the script running overnight and it's still working on it.

GPU data races

The next big thing is to evaluate tools that catch data races on GPUs. Even when the OpenMP CPU versions are perfectly fine, there are cases where OpenMP compiler implementations introduce data races.

issues in DRB171-default-orig-no.c

int argh --> int argc

Please specify in the comments which variables are reported to cause data races.
This is a general need for all the NPB kernels you added. Please fix this for all of them.

The code is very strange. Why are r1 and r2 private?

cascaded data races

miniAMR data races reported by OMPRacer

 1 #pragma omp parallel default(shared) \
 2     private(i, j, k, bp)
 3 {
 4   for (in = 0; in < sorted_index[num_refine+1]; in++)
 5   {
 6     bp = &blocks[sorted_list[in].n];
 7     for (i = 1; i <= x_block_size; i++)
 8       for (j = 1; j <= y_block_size; j++)
 9         for (k = 1; k <= z_block_size; k++)
10           work[i][j][k] =
11             (bp->array[var][i-1][j  ][k  ] +
12              bp->array[var][i  ][j-1][k  ] +
13              bp->array[var][i  ][j  ][k-1] +
14              bp->array[var][i  ][j  ][k  ] +
15              bp->array[var][i  ][j  ][k+1] +
16              bp->array[var][i  ][j+1][k  ] +
17              bp->array[var][i+1][j  ][k  ])/7.0;
18     for (i = 1; i <= x_block_size; i++)
19       for (j = 1; j <= y_block_size; j++)
20         for (k = 1; k <= z_block_size; k++)
21           bp->array[var][i][j][k] = work[i][j][k];
22   }
23 }

data races

  • on the loop index variable "in" at line 4, which is shared
  • bp->array: bp is initialized using the variable "in" as an index. Because of the data race on "in", bp may obtain the same blocks entry, so bp->array may end up being the same array for different iterations.

another example showing thread sensitivity

Iterations 0 and 1 can have conflicting writes to A[0]. But if they are scheduled to run on the same thread, dynamic tools may miss this.

#pragma omp parallel for shared(A)
for (int i = 0; i < 10; i++) {
  A[i] = i;
  if (i == 1) { A[0] = 1; }
}

Credit: OMPRacer.
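
To see why the thread count matters here, the following Python sketch models which thread runs each iteration. It assumes a static schedule that splits the 10 iterations into contiguous blocks, which is a common default but is implementation defined. The function names are made up for this illustration and are not part of DataRaceBench or OMPRacer.

```python
def static_block_owner(i, n_iters, n_threads):
    """Thread that runs iteration i under an assumed contiguous block partition."""
    base, extra = divmod(n_iters, n_threads)
    # Assumption: the first `extra` threads get base + 1 iterations, the rest get base.
    boundary = extra * (base + 1)
    if i < boundary:
        return i // (base + 1)
    return extra + (i - boundary) // base


def iterations_share_thread(n_threads, n_iters=10):
    """True if iterations 0 and 1 (the two writers of A[0]) run on the same thread."""
    return static_block_owner(0, n_iters, n_threads) == static_block_owner(1, n_iters, n_threads)


if __name__ == "__main__":
    for t in (1, 2, 4, 8, 10, 16):
        print(t, "threads: same thread for iterations 0 and 1?", iterations_share_thread(t))
```

Under this assumed model, iterations 0 and 1 only land on different threads when at least 10 threads are used, so a run with fewer threads never performs the two writes to A[0] on different threads.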

DRB014-outofbounds-orig-yes.f95

Unlike the C version, this code actually performs an out-of-allocation access. I think the indexes in the matrix assignment should be changed.

Adding CI into this repo

We need to set up a CI workflow for this repository so that contributors can continuously build and test the code to ensure that a commit doesn't introduce errors.

At the very least, we should run some smoke tests first.

double-check DRB172-parallel-orig-no.c

This code is too complex. Please reduce it further. You don't need the same statements to show up 15+ times in the loop. One or two may be sufficient.

Which variables cause the tools to report data races?

Loop variables j and k are shared in this code; they may cause true positive data races.

Only the loop variable i immediately after "omp for" is made private by an implicit rule.

I am not sure this code is really data race free.

DRB127-tasking-threadprivate-orig-yes.c is not a data race

You can argue that the code has a race condition, but clearly no data race.

The code should also have a taskwait before the if; otherwise all tasks might execute after the printf.
I suggest using #pragma omp taskyield in place of the empty task.

./check-data-races.sh hangs on specific options

./check-data-races.sh --run runs to completion, albeit with an error: "jacobikernel-orig-no.c:130: undefined reference to `sqrt'".

But ./check-data-races.sh --helgrind just sits there forever. The file results/helgrind.csv seems to have all the benchmarks (as far as I can tell), but the script hangs with no response or output after the line "Saving results to results/helgrind.csv".

Running with --archer has the same effect.

This is on Ubuntu Linux 16.04.

make a new release by the end of summer

The new release should contain:

  • Fortran tests
  • OpenMP 5.0 tests
  • New tests from benchmarks and the literature
  • support for customized file lists to define which programs to run
  • updated dashboard showing the state of the art of available tools

an example showing strength of static analysis

From OMPRacer paper:

int *A; int N;
load_from_input(A, &N);
#pragma omp parallel for shared(A)
for (int i = 0; i < N; i++) {
  A[i] = i;
  if (N > 10000) { A[0] = 1; }
}

A data race on A[0] happens only when N is larger than a threshold.
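
The input dependence can be illustrated with a small Python model of the loop's write sets. This is only a sketch: it records which iterations write each index of A and reports indices written by more than one iteration, which is where a race can occur if those iterations run on different threads. The function name and the configurable threshold are assumptions made for this illustration.

```python
from collections import defaultdict


def conflicting_indices(n, threshold=10000):
    """Indices of A written by more than one iteration of the loop above."""
    writers = defaultdict(set)
    for i in range(n):
        writers[i].add(i)          # models A[i] = i
        if n > threshold:
            writers[0].add(i)      # models A[0] = 1
    return {idx for idx, iters in writers.items() if len(iters) > 1}


if __name__ == "__main__":
    # A small threshold keeps the demonstration fast.
    print(conflicting_indices(5, threshold=10))    # no conflicting index
    print(conflicting_indices(12, threshold=10))   # A[0] is written by several iterations
```

A dynamic tool only sees the inputs it is run with, so a test input below the threshold never exercises the conflicting write, while a static tool can reason about both branches.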

Error with Tsan

I tried to use the command

./check-data-races.sh --tsan

and got the error message "Invalid tool name tsan".

Correction in Ground Truth

Please correct the ground truth for the following benchmarks. Remember to correct it in the Excel sheets of all four tools; otherwise we will report wrong results.

  • Archer Fortran
  • Tsan Fortran
  • ROMP Fortran
  • Inspector

DRB131-taskdep4-orig-yes-omp45.f95
DRB134-taskdep5-orig-yes-omp45.f95
DRB142-acquirerelease-orig-yes-omp50.f95
DRB165-taskdep4-orig-yes-omp50.f95
DRB168-taskdep5-orig-yes-omp50.f95

DRB126-firstprivatesections-orig-yes.c is not a data race

Although it is not specified which section executes first and prints 1, the single thread clearly will print 1 first and then 2.

If the sections execute on different threads, they don't access the same memory, so there is no data race either.

DRB050-functionparameter-orig-no.f95 has a data race

The variable volnew_o8 is shared and clang-tsan/11 correctly reports a data race:

==================
WARNING: ThreadSanitizer: data race (pid=236697)
  Write of size 8 at 0x7ffdfda476b0 by thread T7:
    #0 __drb050_MOD_foo1._omp_fn.0 DRB050-functionparameter-orig-no.f95:25 (DRB050-functionparameter-orig-no.f95.tsan-clang.out+0x4b3b5a)
    #1 __kmp_GOMP_microtask_wrapper(int*, int*, void (*)(void*), void*) <null> (libomp.so+0x87362)
    #2 __drb050_MOD_foo1 DRB050-functionparameter-orig-no.f95:26 (DRB050-functionparameter-orig-no.f95.tsan-clang.out+0x4b36ce)

  Previous write of size 8 at 0x7ffdfda476b0 by main thread:
    #0 __drb050_MOD_foo1._omp_fn.0 micro-benchmarks-fortran/DRB050-functionparameter-orig-no.f95:25 (DRB050-functionparameter-orig-no.f95.tsan-clang.out+0x4b3b5a)
    #1 __kmp_api_GOMP_parallel_40_alias <null> (libomp.so+0x8bb82)
    #2 __drb050_MOD_foo1 micro-benchmarks-fortran/DRB050-functionparameter-orig-no.f95:26 (DRB050-functionparameter-orig-no.f95.tsan-clang.out+0x4b36ce)

  Location is stack of main thread.

  Location is global '??' at 0x7ffdfda28000 ([stack]+0x00000001f6b0)

  Thread T7 (tid=236705, running) created by main thread at:
    #0 pthread_create tsan_interceptors_posix.cpp:966:199 (DRB050-functionparameter-orig-no.f95.tsan-clang.out+0x444deb)
    #1 __kmp_create_worker <null> (libomp.so+0x83086)

SUMMARY: ThreadSanitizer: data race DRB050-functionparameter-orig-no.f95:25 in __drb050_MOD_foo1._omp_fn.0
==================
ThreadSanitizer: reported 1 warnings

After making the variable private, the report is gone.

DRB094-doall2-ordered

Currently, this application has no memory access which would need any synchronization.
Further, this application has the additional issue of accessing uninitialized memory.

My suggestion is to change the code as follows:

#include <stdio.h>

int a[100][100]; /* assumed global array; the declaration is not shown in the original snippet */

int main()
{
  int i, j;
#pragma omp parallel for ordered(2)
  for (i = 0; i < 100; i++)
    for (j = 0; j < 100; j++)
    {
      a[i][j] = i + j;
#pragma omp ordered depend(sink:i-1,j) depend (sink:i,j-1)
      if (i>0 && j>0)
        a[i][j] = (a[i-1][j] + a[i][j-1] + a[i][j]) / 3;
      printf ("test i=%d j=%d\n",i,j);
#pragma omp ordered depend(source)
    }
  printf ("test a[99][99]=%d\n",a[99][99]);
  return 0;
}

The array is properly initialized. The assignment in the ordered region accesses array elements that are synchronized by the depend clauses. If the OpenMP implementation respects the depend clause on the ordered construct, the result should always be 196.
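
The expected value can be cross-checked with a sequential reference. The Python sketch below mirrors the loop body in plain row-major order, which satisfies both sink dependences; it uses // for the integer division that C performs on these non-negative values. The function name is hypothetical.

```python
def sequential_reference(n=100):
    """Run the suggested loop body sequentially and return the resulting array."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a[i][j] = i + j
            if i > 0 and j > 0:
                # Same update as in the ordered region; // matches C integer division here.
                a[i][j] = (a[i - 1][j] + a[i][j - 1] + a[i][j]) // 3
    return a


if __name__ == "__main__":
    print("a[99][99] =", sequential_reference()[99][99])
```

Any parallel run that prints a different value for a[99][99] would indicate that the dependences were not honored.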

metric.py misses first benchmark

Even without any failure, the tools fail rate is not 1.0 (by the way, why is this called a fail rate and not a success rate?):

total test case is  172
compiler segmentation fault is  0
runtime segmentation fault is  0
compiler time out is  0
runtime time out is  0
tools fail rate is  0.9941860465116279
false positive is  0
true positive is  67
true negative is  89
false negative is  15

Also, the last four numbers don't add up to 172.

When I change the reported data race in the CSV file for the first benchmark, the calculated metrics do not change.
I think for i in range(1,len(Ntruth)): in line 60 should be for i in range(len(Ntruth)):

Another minor issue:
Precision and recall might be 0/0, which results in a Python runtime error.
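
The numbers in the log are consistent with one skipped row: 0 + 67 + 89 + 15 = 171, and 171/172 ≈ 0.994186, which matches the printed rate. A minimal sketch of counting over all rows and guarding the ratios is shown below. It is not the actual metric.py; the function names and the boolean list inputs are assumptions made for this illustration.

```python
def confusion_counts(truth, reported):
    """Count TP, FP, TN, FN over every benchmark, including index 0."""
    tp = fp = tn = fn = 0
    for has_race, flagged in zip(truth, reported):
        if has_race and flagged:
            tp += 1
        elif has_race:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined (0/0)."""
    return numerator / denominator if denominator else None


def precision_recall(tp, fp, fn):
    return safe_ratio(tp, tp + fp), safe_ratio(tp, tp + fn)


if __name__ == "__main__":
    truth = [True, False, True, False]
    reported = [True, False, False, False]
    tp, fp, tn, fn = confusion_counts(truth, reported)
    print(tp, fp, tn, fn, precision_recall(tp, fp, fn))
```

Returning None for an undefined ratio is one possible design choice; a caller could also report "n/a" or skip the metric.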

DRB122-taskundeferred-orig-no.c has a data race

Firstly, if the intention is to have undeferred tasks, the clause needs to be if(0).

In any case, you can simply remove the whole task region and you will immediately see the data race on var.

Remove unused variables and code in NPB kernels

There are quite a few unused variables in these new NPB kernels; please remove them.

One example: https://github.com/LLNL/dataracebench/edit/master/micro-benchmarks/DRB173-threadprivate3-orig-no.c
m and n are not used.

Please try to extract high-quality code: minimal code that reproduces the issues without compiler warnings (about unused variables) or errors.

Minimal means no unnecessary statements that do not touch the variables reported to trigger data races.
