
xsbench's People

Contributors

acepacens, bd4, heshpdx, jtramm, lin-mao, mtbc, pranav-sivaraman, rezass, stephan-rohr


xsbench's Issues

Segmentation fault in openmp-threading with XL size

Running the v19 release of openmp-threading with -s XL results in a SIGSEGV indexing into SD.index_grid at GridInit.c:109.

I will also note the spelling typo in "Intializing" (instead of "Initializing").

Thanks.

Intializing nuclide grids...
Intializing unionized grid...

Program received signal SIGSEGV, Segmentation fault.
grid_init_do_not_profile (in=..., mype=0) at GridInit.c:109
109						SD.index_grid[e * in.n_isotopes + i] = idx_low[i];
(gdb) bt
#0  grid_init_do_not_profile (in=..., mype=0) at GridInit.c:109
#1  0x0000000000401219 in main (argc=3, argv=0x7fffffffd2d8) at Main.c:47
(gdb) print e
$7 = 3445050
(gdb) print in.n_isotopes 
$8 = 355
(gdb) print i
$9 = 142
(gdb) print e * in.n_isotopes + i
$10 = 1222992892
(gdb) print SD.index_grid
$11 = (int *) 0x7ffed375c010
(gdb) print SD.index_grid
$12 = (int *) 0x7ffed375c010

Compilation error

Hello,

I am one of the maintainers of the SMPI simulator, a tool that allows running real MPI applications on top of a simulated infrastructure. More info here, in particular on slides 45 and 46.

We run nightly tests against most of the applications listed as proxy apps by the Exascale Computing Project, and as you certainly know, XSBench is one of them. For more info, head to our compilation farm.

For approximately two days now, your code has failed to compile with the following error:

smpicc -std=gnu99 -Wall -fopenmp -flto -O3 -DMPI -c CalculateXS.c -o CalculateXS.o
/home/ci/workspace/SMPI-proxy-apps/build_mode/SMPI/label/proxy-apps/Benchmarks/ECP/XSBench/src/Main.c: In function ‘main’:
/home/ci/workspace/SMPI-proxy-apps/build_mode/SMPI/label/proxy-apps/Benchmarks/ECP/XSBench/src/Main.c:40:6: error: ‘thread’ undeclared (first use in this function)
  if( thread == 0 )
      ^~~~~~
/home/ci/workspace/SMPI-proxy-apps/build_mode/SMPI/label/proxy-apps/Benchmarks/ECP/XSBench/src/Main.c:40:6: note: each undeclared identifier is reported only once for each function it appears in
/home/ci/workspace/SMPI-proxy-apps/build_mode/SMPI/label/proxy-apps/Benchmarks/ECP/XSBench/src/Main.c:71:8: warning: unused variable ‘index_data’ [-Wunused-variable]
  int * index_data = NULL;
        ^~~~~~~~~~
/home/ci/workspace/SMPI-proxy-apps/build_mode/SMPI/label/proxy-apps/Benchmarks/ECP/XSBench/src/Main.c:19:13: warning: unused variable ‘stat’ [-Wunused-variable]
  MPI_Status stat;
             ^~~~
Makefile:129: recipe for target 'Main.o' failed

I suspect this is related to the changes you introduced recently; is that right?
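
For reference, a minimal sketch of the kind of declaration that would make the if( thread == 0 ) guard compile, assuming the intent was a per-thread id inside an OpenMP parallel region (a guess on my side, not the actual fix):

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        int thread = omp_get_thread_num(); /* the missing 'thread' declaration */
        if (thread == 0)
            printf("only thread 0 prints this\n");
    }
    return 0;
}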

Thanks a lot for maintaining this proxy app; it is really valuable for tool makers.
Martin.

Verification checksum is invalid when compiling with gnu/8.3.1

Hi, I am compiling the offload version of XSBench with gcc/8.3.1. The compilation finishes with no warnings or errors, but when executing the application I get the following output:

Simulation complete.

Runtime: 3.067 seconds
Lookups: 17,000,000
Lookups/s: 5,543,286
Verification checksum: 0 (WARNING - INAVALID CHECKSUM!)

The verification checksum is invalid.
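
A checksum of 0 often indicates that the target region never produced results on the device, or that the results were never mapped back. A minimal, standalone probe (not part of XSBench) for checking whether gcc/8.3.1's offload actually reaches a device rather than silently falling back to the host:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int ran_on_host = -1;
    #pragma omp target map(from: ran_on_host)
    {
        ran_on_host = omp_is_initial_device();
    }
    printf("devices visible: %d, target region fell back to host: %s\n",
           omp_get_num_devices(), ran_on_host ? "yes" : "no");
    return 0;
}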

code returns 1 even after success

Hello,
It looks like since a23df67, only the first process validates its results and returns 0 on success; all of the others return 1, which could be misinterpreted by some wrappers.
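
A minimal sketch of one way to make the exit code consistent across ranks (hypothetical names, assuming an MPI build):

#include <mpi.h>
#include <stdlib.h>

/* Rank 0 decides whether verification passed; every rank adopts that
   verdict so wrappers see a consistent exit code. Function and variable
   names are illustrative, not the actual XSBench code. */
int shared_exit_status(int rank0_says_valid) {
    int status = rank0_says_valid ? EXIT_SUCCESS : EXIT_FAILURE;
    MPI_Bcast(&status, 1, MPI_INT, 0, MPI_COMM_WORLD);
    return status; /* return this from main on every rank */
}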

request change to openmp-offload/Makefile for -march=

commit c7a9218bda7c31238726bda6c96c2066cbf97478 (HEAD -> amd-omp-arch)
Author: Ron Lieberman [email protected]
Date: Thu Nov 25 02:49:16 2021 +0000

provide evar to override -march for amd

diff --git a/openmp-offload/Makefile b/openmp-offload/Makefile
index f3aae5c..dac93b7 100644
--- a/openmp-offload/Makefile
+++ b/openmp-offload/Makefile
@@ -67,7 +67,8 @@ endif
 # AOMP Targeting MI100 -- Change march to Target Other GPUs
 ifeq ($(COMPILER),amd)
   CC = clang
-  CFLAGS += -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx908
+  AOMP_GPU ?= gfx908
+  CFLAGS += -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=$(AOMP_GPU)
 endif

 # Debug Flags
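
With a change along these lines, the target GPU could be selected at build time without editing the Makefile, for example (hypothetical invocation, assuming the AOMP_GPU variable above):

make COMPILER=amd AOMP_GPU=gfx90a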

Invalid checksum in openmp-threading with the -p and -g flags

When I ran the ./XSBench command in openmp-threading, its checksum was valid.
But if I changed the gridpoints or histories with the -p or -g flag, the checksum became invalid.

The values I used for gridpoints and histories are as below:
./XSBench -g 130000 -p 30000000

Even when I tested other values for -g or -p, the checksum was still invalid.
Is there any rule for setting the -g or -p values? Or are there any prerequisites before setting gridpoints or histories?

Verification checksum is invalid for offloading with Cray clang 11 targeting AMD GFX908

I am using XSBench to give toolchains on Spock a try. Building openmp-offload with Cray clang version 12.0.1 and MPICH 8.1.4, with PE_MPICH_GTL_LIBS_amd_gfx908="-lmpi_gtl_hsa", XSBench seems to run fine except for,

================================================================================
                                  INPUT SUMMARY
================================================================================
Programming Model:            OpenMP Target Offloading
Simulation Method:            Event Based
Grid Type:                    Unionized Grid
Materials:                    12
H-M Benchmark Size:           large
Total Nuclides:               355
Gridpoints (per Nuclide):     11,303
Unionized Energy Gridpoints:  4,012,565
Total XS Lookups:             17,000,000
MPI Ranks:                    1
Mem Usage per MPI Rank (MB):  5,649
Binary File Mode:             Off

...

================================================================================
                                     RESULTS
================================================================================
MPI ranks:   1
Total Lookups/s:            8,895,331
Avg Lookups/s per MPI rank: 8,895,331
Verification checksum: 904279 (WARNING - INAVALID CHECKSUM!)

The executable was linked by,

cc -std=gnu99 -Wall -I/sw/spock/spack-envs/views/rocm-4.1.0/include -fopenmp -D__STRICT_ANSI__  -g -pg -O3 -DMPI Main.o io.o Simulation.o GridInit.o XSutils.o Materials.o -o XSBench -lm -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64 -g -pg

As well as the expected output, it also prints a,

srun: error: spock31: task 0: Exited with exit code 1

(with the "31" varying) but I guess this simply reflects the checksum invalidity.

On a different Cray AMD GFX908 system, the offload version of XSBench works fine with ROCm.

On the system with the failing checksum, https://github.com/jtramm/omp_target_issues/tree/master/simple built with,

cc -fopenmp -I/sw/spock/spack-envs/views/rocm-4.1.0/include -L/sw/spock/spack-envs/views/rocm-4.1.0/lib -lamdhip64 -lhsa-runtime64 main.cpp -o test

repeats a few, "Hello world from accelerator."

Is there anything I can provide to help you diagnose the issue? Would it be better for me to reach out to the Spock admins?
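
One possibly useful diagnostic: Cray's offload runtime can trace data transfers and kernel launches via an environment variable (assuming CCE's runtime here; worth confirming against the cce man pages), e.g.:

CRAY_ACC_DEBUG=2 srun -n1 ./XSBench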

Result of computation is never checked -> optimising compilers skew results

I manually added the LTO options to CFLAGS / LDFLAGS on GCC and the compiler is smart enough to throw away the computation.

The issue is that the return value of calculate_macro_xs in Main.c (written into macro_xs_vector) is never used in the main function. If I force the generation of the results through asm volatile (""::"m"(macro_xs_vector[0]),"m"(macro_xs_vector[1]), ...), the results change significantly.
Comparing lookups/s on a three-core machine:

$ while true; do res1=$(./XSBench -s small | awk '/Lookups.s:/ {print $2}'); res2=$(./XSBench.force_use -s small | awk '/Lookups.s:/ {print $2}'); echo $res1 $res2; done
6,142,927 920,383
5,513,363 983,074
5,243,478 991,507

This shows an over 6x apparent speed increase due to the computation being optimised out of the existing code. The numbers on the right are very similar to what I get if I disable LTO.
Please fix the usage of the results, either through this mechanism (an empty asm volatile consuming the data) or by keeping a running sum of the results, etc.
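
A minimal sketch of the two suggested fixes, using a stand-in for calculate_macro_xs and assuming macro_xs_vector holds 5 doubles (the names come from the issue text; the rest is illustrative):

#include <stdio.h>

/* Illustrative stand-in for calculate_macro_xs() filling 5 doubles. */
static void fake_lookup(double *xs) { for (int j = 0; j < 5; j++) xs[j] = j * 0.5; }

int main(void) {
    double macro_xs_vector[5];
    double verification = 0.0;

    for (int i = 0; i < 1000; i++) {
        fake_lookup(macro_xs_vector);

        /* Option 1: empty asm "consuming" the results, so the compiler
           cannot prove the lookup is dead code. */
        asm volatile("" :: "m"(macro_xs_vector[0]), "m"(macro_xs_vector[1]),
                           "m"(macro_xs_vector[2]), "m"(macro_xs_vector[3]),
                           "m"(macro_xs_vector[4]));

        /* Option 2: keep a running sum that is reported at the end. */
        for (int j = 0; j < 5; j++)
            verification += macro_xs_vector[j];
    }
    printf("verification sum: %f\n", verification);
    return 0;
}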

UEG Alignment - BG/Q - OpenMP Allocation Fail

When running with -p 3 or -p 4 on BG/Q, XSBench crashes. The following message shows up in the error log:

libgomp: Out of memory allocating 8512 bytes

Or some similar number.

This doesn't show up when the executable is qsub'd directly. Only happens in script mode.

Seems to be fixed if I just do a serial UEG alignment (commenting out the #pragmas for the UEG portion).

Tried increasing omp stacksize by passing the additional environment variable OMP_STACKSIZE=2G, but this didn't help anything.

Need to figure out what the deal is here, as I'm wasting a lot of proc time.

Unchecked malloc in GridInit causes segfaults

There are multiple unchecked mallocs in generate_energy_grid that can return NULL and then cause a segfault when initialising the grid. The one I stumbled upon is on line 98:

int * full = (int *) malloc( n_isotopes ...);

Could you please check the return values of the mallocs and bail?
Thanks!
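
A minimal sketch of the kind of wrapper that would address this (a hypothetical helper, not from the XSBench sources), which generate_energy_grid could call instead of raw malloc:

#include <stdio.h>
#include <stdlib.h>

/* Checked allocation: bail loudly instead of segfaulting later. */
static void *xmalloc(size_t bytes) {
    void *p = malloc(bytes);
    if (p == NULL) {
        fprintf(stderr, "fatal: malloc of %zu bytes failed\n", bytes);
        exit(EXIT_FAILURE);
    }
    return p;
}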

Running hip on Crusher up to 2.2B lookups

Looking for guidance on running different workloads on Crusher.
The hip variant runs fine below 2.2B lookups but fails at 2.2B.

To reproduce, use the srun command within a batch script:
srun -n1 --ntasks-per-node=1 --gpus-per-task=1 --gpu-bind=closest $XSBENCH_HIP -l 2200000000 -m event

where $XSBENCH_HIP is the hip executable built with the rocm-hip/4.3.0 module.
It only prints the usage instructions. XSBench runs fine for smaller numbers (e.g., 2B lookups).

My question is: is this supposed to be a "real" science-case run? Am I hitting a memory limit in event mode?
Any help is appreciated.
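
One hypothesis worth checking (an assumption on my part, not verified against the source): 2,200,000,000 does not fit in a 32-bit int, while 2,000,000,000 does. If the -l argument is parsed into an int, the parse can fail or wrap exactly between those two values, and a rejected argument would explain seeing only the usage instructions:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("INT_MAX: %d\n", INT_MAX); /* 2147483647 < 2200000000 */

    /* atoi() has undefined behavior on overflow; strtol into a long (or
       strtoll into a long long) is the safer pattern for a lookup count. */
    long lookups = strtol("2200000000", NULL, 10);
    printf("parsed as long: %ld\n", lookups);
    return 0;
}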

ppc64le: Random "killed" during Aligning Unionized Grid

Hi, I'm on a ppc64le system using MPI compiled with xlc/xlC/xlf. My Makefile configuration is as follows:

COMPILER    = gnu
OPTIMIZE    = yes
DEBUG       = yes
PROFILE     = no
MPI         = yes
PAPI        = no
VEC_INFO    = no
VERIFY      = no
BENCHMARK   = no
BINARY_DUMP = no
BINARY_READ = no

(leaving "gnu" as the compiler doesn't seem to matter, since "MPI = yes" causes "CC = mpicc")

It compiles successfully, but when I run, it will sometimes print "killed" during "Aligning Unionized Grid" at random percentages, typically above 50%.

This seems not to happen when the gcc compiler is used (no MPI).

Running gdb has not offered much insight. Here is the output of a failure while running gdb:

[u0017592@sys-85165 src]$ gdb ./XSBench
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.ael7b
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/u0017592/projects/ase-3.13.0b1/XSBench/src/XSBench...done.
(gdb) run
Starting program: /home/u0017592/projects/ase-3.13.0b1/XSBench/src/./XSBench
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/power8/libthread_db.so.1".
Detaching after fork from child process 24746.
[New Thread 0x3fffb72cf0d0 (LWP 24750)]
[New Thread 0x3fffb696f0d0 (LWP 24751)]
================================================================================
                   __   __ ___________                 _
                   \ \ / //  ___| ___ \               | |
                    \ V / \ `--.| |_/ / ___ _ __   ___| |__
                    /   \  `--. \ ___ \/ _ \ '_ \ / __| '_ \
                   / /^\ \/\__/ / |_/ /  __/ | | | (__| | | |
                   \/   \/\____/\____/ \___|_| |_|\___|_| |_|

================================================================================
                    Developed at Argonne National Laboratory
                                   Version: 13
================================================================================
                                  INPUT SUMMARY
================================================================================
Materials:                    12
H-M Benchmark Size:           large
Total Nuclides:               355
Gridpoints (per Nuclide):     11,303
Unionized Energy Gridpoints:  4,012,565
XS Lookups:                   15,000,000
MPI Ranks:                    1
OMP Threads per MPI Rank:     2
Mem Usage per MPI Rank (MB):  5,678
================================================================================
                                 INITIALIZATION
================================================================================
Generating Nuclide Energy Grids...
Sorting Nuclide Energy Grids...
Generating Unionized Energy Grid...
Copying and Sorting all nuclide grids...
Assigning energies to unionized grid...
Assigning pointers to Unionized Energy Grid...
[New Thread 0x3fffb591f0d0 (LWP 24752)]
Aligning Unionized Grid[Thread 0x3fffb7ff79d0 (LWP 24743) exited]
[Thread 0x3fffb591f0d0 (LWP 24752) exited]
[Thread 0x3fffb696f0d0 (LWP 24751) exited]
[Inferior 1 (process 24743) exited with code 0315]
Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.ael7b.ppc64le libgcc-4.8.3-9.ael7b.ppc64le libstdc++-4.8.3-9.ael7b.ppc64le

The above run offered no backtrace. I ran it again, though, and got slightly different results as well as a backtrace:

...

Assigning pointers to Unionized Energy Grid...
[New Thread 0x3fffb591f0d0 (LWP 24899)]
Alig^Cng Unionized Grid...(12% complete)
Program received signal SIGINT, Interrupt.
binary_search (A=0x0, quarry=11.743114876173937, n=11303) at XSutils.c:53
53              if( A[0].energy > quarry )
Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.ael7b.ppc64le libgcc-4.8.3-9.ael7b.ppc64le libstdc++-4.8.3-9.ael7b.ppc64le
(gdb) bt
#0  binary_search (A=0x0, quarry=11.743114876173937, n=11303) at XSutils.c:53
#1  0x000000001000480c in set_grid_ptrs$_$OL$_$1$_$OL$_$2 () at GridInit.c:145
#2  0x0000000010004644 in set_grid_ptrs$_$OL$_$1 () at GridInit.c:134
#3  0x00003fffb7bec52c in _lomp_Parallel_StartDefault () from /opt/ibm/lib/libxlsmp.so.1
#4  0x0000000010004554 in set_grid_ptrs (energy_grid=0x3fffac2c0010, nuclide_grids=0x101377a0, n_isotopes=355, n_gridpoints=11303)
    at GridInit.c:131
#5  0x0000000010001858 in main (argc=1, argv=0x3ffffffff4b8) at Main.c:85

One more run:

Assigning pointers to Unionized Energy Grid...
[New Thread 0x3fffb591f0d0 (LWP 24928)]
Alignin[Thread 0x3fffb591f0d0 (LWP 24928) exited]
[Thread 0x3fffb696f0d0 (LWP 24927) exited]
[Thread 0x3fffb72cf0d0 (LWP 24926) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
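
Both failure modes above (a random "killed" and binary_search being handed A=0x0) would be consistent with the system running out of memory mid-initialization, with the OOM killer taking the process in one case and an unchecked allocation returning NULL in the other. That is an assumption, not a confirmed diagnosis; a sketch of a guard that would at least fail loudly (simplified stand-in types, not the actual XSutils.c code):

#include <stdio.h>
#include <stdlib.h>

typedef struct { double energy; } NuclideGridPoint; /* simplified stand-in */

long binary_search_checked(NuclideGridPoint *A, double quarry, long n) {
    if (A == NULL) {
        fprintf(stderr, "binary_search: grid pointer is NULL (allocation failed?)\n");
        exit(EXIT_FAILURE);
    }
    long lower = 0, upper = n - 1;
    while (upper - lower > 1) {
        long mid = lower + (upper - lower) / 2;
        if (A[mid].energy < quarry) lower = mid;
        else upper = mid;
    }
    return lower;
}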

Readme History/Event-based documentation is unclear

Currently, the history- and event-based documentation in the README is very terse. It describes the differences between history- and event-based transport in a full MC application, but does not describe how this is represented in XSBench. We need to add pseudocode for event-based XSBench itself and explain how it relates to the full application, as in the sketch below.
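
As a starting point, a pseudocode sketch of the distinction as it might be written up (illustrative names, to be checked against the actual source):

/* History-based: each "particle" carries RNG state through a chain of
   dependent lookups, mimicking a particle history in a full MC app. */
for (int p = 0; p < n_particles; p++) {
    uint64_t seed = seed_for(p);
    for (int l = 0; l < lookups_per_history; l++) {
        double energy   = random_energy(&seed);   /* state flows lookup-to-lookup */
        int    material = random_material(&seed);
        lookup_macro_xs(energy, material);
    }
}

/* Event-based: one flat loop of fully independent lookups, each with its
   own seed, so iterations can run in any order or be offloaded wholesale. */
for (long i = 0; i < n_lookups; i++) {
    uint64_t seed = seed_for(i);
    lookup_macro_xs(random_energy(&seed), random_material(&seed));
}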

SYCL "simulation only" runtime statistics misleading

For the SYCL version, we are currently reporting runtime statistics covering both the kernel initialization / JIT compilation and the actual execution. This may result in some issues on certain systems, e.g.:

Total Time Statistics (SYCL+OpenCL Init / JIT Compilation + Simulation Kernel)
Runtime:                XXXXXXX seconds
Lookups:               XXXXXXXXXX
Lookups/s:            XXXXXXXXXX
Simulation Kernel Only Statistics
Runtime:               0.00001 seconds
Lookups/s:             1,000,000,000,000,000
Verification checksum: (Valid)

Timing things as we do now bakes in assumptions about the asynchronous behavior of SYCL that do not appear to hold in all cases, with all compilers, on all machines. Instead, we should just time only the total runtime.
