
testsuite's Issues

superlu fails on JLSE

The superlu test fails at the compile stage with the following error:

/usr/bin/ld: cannot find -lopenblas

I believe this occurs because the test assumes that superlu is built with openblas:

${TEST_CC} -g ./c_sample.c -I${SUPERLU_ROOT}/include/ -L${SUPERLU_ROOT}/lib64 -L${SUPERLU_ROOT}/lib -L${OPENBLAS_ROOT}/lib -lsuperlu -lopenblas -lm -o c_sample

So if superlu is built against any other BLAS provider, the test fails.
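
One possible direction for a fix, sketched below. TEST_CC, SUPERLU_ROOT, and OPENBLAS_ROOT already come from the test's settings scripts; BLAS_LIBS is a hypothetical variable that a site's settings script would export for its actual BLAS provider, defaulting to today's behavior:

# Sketch: let the settings script choose the BLAS link flags instead of
# hard-coding openblas; fall back to the current openblas line if unset.
BLAS_LIBS="${BLAS_LIBS:--L${OPENBLAS_ROOT}/lib -lopenblas}"
${TEST_CC} -g ./c_sample.c -I${SUPERLU_ROOT}/include/ \
    -L${SUPERLU_ROOT}/lib64 -L${SUPERLU_ROOT}/lib \
    -lsuperlu ${BLAS_LIBS} -lm -o c_sample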

Test issue: legion - Illegal instruction (core dumped) when running run.sh validation test

Summary:
After building the spec according to spack.yaml, legion's validation test fails at run time when executing run.sh with an "Illegal instruction" error.

Steps to reproduce:

$ ./setup.sh
$ ./clean.sh
$ ./compile.sh
$ ./run.sh

Error output:

sbosch@instinct:~/instinctE4S-22.05/testsuite/validation_tests/legion$ ./run.sh
4sjbscc
g3tu5cd
++ mktemp ./tmp.XXXXXXX
+ TMPFILE=./tmp.3Paqy6j
+ cd build
+ ./legion
./run.sh: line 10: 3983381 Illegal instruction     (core dumped) ./legion > ${TMPFILE}

spack debug report:

  • Spack: 0.18.0 (c09bf37ff690c29779a342670cf8a171ad1b9233)
  • Python: 3.8.10
  • Platform: linux-ubuntu20.04-broadwell
  • Concretizer: clingo

Helpful files:
(Changed to .txt extension for upload)
run.sh.txt
spack.yaml.txt

Additional Info:

  • Test matches the test from the legion source in legion/tutorial/08_multiple_partitions/multiple_partitions.cc

Package Maintainers and Interested Parties:

Veloc test fails on perlmutter

@gonsie
@vsoch

The veloc standalone test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/veloc fails when run against the veloc installation that is part of the e4s 22.11 deployment on perlmutter, using these variants:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
44htwoe veloc@1.5~ipo build_system=cmake build_type=RelWithDebInfo

With this console output:

REDSET 0.1.0 ABORT: rank 0 on nid001901: XOR requires at least 2 ranks per set, but found 1 rank(s) in set @ /tmp/lpeyrala/spack-stage/spack-stage-redset-0.2.0-4rss7cokqukwuvzvzlymxazgem3a6gim/spack-src/src/redset_xor.c:157
MPICH ERROR [Rank 0] [job id 3921177.7] [Tue Dec  6 14:52:36 2022] [nid001901] - Abort(-1) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0

aborting job:
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
srun: error: nid001901: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3921177.7
slurmstepd: error: *** STEP 3921177.7 ON nid001901 CANCELLED AT 2022-12-06T22:52:36 ***
srun: error: nid001901: task 1: Terminated
srun: Force Terminated StepId=3921177.7

Updating to the latest heatdis_mem.c included with Veloc 1.5 resulted in this runtime error output:

[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
[FATAL 0] [/tmp/lpeyrala/spack-stage/spack-stage-veloc-1.5-44htwoezm4qmelqjkr52pc3r3e4bqm4j/spack-src/src/lib/client.cpp:57:client_impl_t] MPI threaded mode requested but not available, please use MPI_Init_thread
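
The FATAL message points at the likely fix: the updated test initializes MPI with MPI_Init, while the veloc client requires a threaded initialization. A minimal sketch of the change (the rest of heatdis_mem.c is elided, and MPI_THREAD_MULTIPLE is an assumption; veloc may only require a lower level):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* Request a threaded MPI instead of calling MPI_Init(), which is what
     * the "please use MPI_Init_thread" FATAL above complains about. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "requested thread level not available (got %d)\n", provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... VELOC initialization and the rest of the test ... */
    MPI_Finalize();
    return 0;
}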

superlu test failed compilation

CDASH: https://my.cdash.org/test/63278701

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/superlu.yml

superlu : b6f4pqc
Cleaning /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/superlu/superlu_e4s_testsuite_22.05/a16c8446/stage/testsuite/validation_tests/superlu
---CLEANUP LOG---
Compiling /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/superlu/superlu_e4s_testsuite_22.05/a16c8446/stage/testsuite/validation_tests/superlu
---COMPILE LOG---
Skipping load: Environment already setup
+ cc -g ./c_sample.c -I/global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/include/ -L/global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib64 -L/global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib -L/lib -lsuperlu -lopenblas -lm -o c_sample
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(util.c.o): in function `ifill':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/util.c:392: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/util.c:392: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(memory.c.o): in function `intCalloc':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/memory.c:161: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(csnode_dfs.c.o): in function `csnode_dfs':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/csnode_dfs.c:111: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(cmemory.c.o): in function `complexCalloc':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/cmemory.c:695: undefined reference to `__cray_dset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(cutil.c.o): in function `cfill':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/cutil.c:398: undefined reference to `__cray_dset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(get_perm_c.c.o): in function `getata':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:106: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:140: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:173: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(get_perm_c.c.o): in function `at_plus_a':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:240: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:269: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(get_perm_c.c.o):/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/get_perm_c.c:306: more undefined references to `__cray_sset_detect' follow
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `colamd_set_defaults':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:960: undefined reference to `__cray_dset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `symamd':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:1022: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `colamd_set_defaults':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:960: undefined reference to `__cray_dset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `symamd':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:1100: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:1226: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `colamd':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:1357: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `colamd_set_defaults':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:960: undefined reference to `__cray_dset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(colamd.c.o): in function `init_scoring':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/colamd.c:1940: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(csp_blas2.c.o): in function `sp_cgemv':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/csp_blas2.c:518: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(sp_coletree.c.o): in function `mxCallocInt':
/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/sp_coletree.c:69: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/sp_coletree.c:69: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/sp_coletree.c:69: undefined reference to `__cray_sset_detect'
/usr/bin/ld: /global/common/software/spackecp/perlmutter/e4s-22.05/software/cray-sles15-zen3/cce-13.0.2/superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/lib/libsuperlu.a(sp_coletree.c.o):/global/common/software/spackecp/perlmutter/e4s-22.05/spack/var/spack/stage/spack-stage-superlu-5.3.0-b6f4pqcphndaspnzvlpjog6wxjhib4ff/spack-src/SRC/sp_coletree.c:190: more undefined references to `__cray_sset_detect' follow
collect2: error: ld returned 1 exit status
Compile failed
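
For what it's worth, the __cray_*set_detect symbols are helpers emitted by the Cray compiler (the library path above shows this superlu was built with cce-13.0.2), so the link only resolves them when driven by CCE. A quick check would be (a sketch; module names are the standard Cray ones and may differ per site):

module load PrgEnv-cray   # make `cc` the CCE wrapper, matching how libsuperlu.a was built
cc -g ./c_sample.c -I${SUPERLU_ROOT}/include \
   -L${SUPERLU_ROOT}/lib64 -L${SUPERLU_ROOT}/lib \
   -L${OPENBLAS_ROOT}/lib -lsuperlu -lopenblas -lm -o c_sample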

omega-h test fails on perlmutter

@cwsmith

The test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/omega-h

Fails against the variant of omega-h installed by e4s 22.11 on perlmutter:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
omega-h@9.34.13~cuda~examples~ipo+mpi+optimize+shared+symbols~throw+trilinos~warnings+zlib build_system=cmake build_type=RelWithDebInfo
==> 1 installed package

With the following console output:


++ srun -n 2 ./castle
[0] number of vertices 8
[0]: Local vertex number 0 has global index 0
[0]: Local vertex number 1 has global index 2
[0]: Local vertex number 2 has global index 3
[0]: Local vertex number 3 has global index 6
[0]: Local vertex number 4 has global index 7
[0]: Local vertex number 5 has global index 8
[0]: Local vertex number 6 has global index 9
[0]: Local vertex number 7 has global index 10
[0]: local  0 4 0 6
[0]: global 1 7 0 9
[0]: local  1 1 3 7
[0]: global 3 2 6 10
[0]: local  2 3 2 5
[0]: global 6 6 3 8
[0]: local  3 2 4 5
[0]: global 7 3 7 8
[0]: local  4 5 4 6
[0]: global 9 8 7 9
[0]: local  5 5 6 7
[0]: global 10 8 9 10
[0]: local  6 3 5 7
[0]: global 13 6 8 10
[1] number of vertices 8
[1]: Local vertex number 0 has global index 0
[1]: Local vertex number 1 has global index 1
[1]: Local vertex number 2 has global index 2
[1]: Local vertex number 3 has global index 4
[1]: Local vertex number 4 has global index 5
[1]: Local vertex number 5 has global index 9
[1]: Local vertex number 6 has global index 10
[1]: Local vertex number 7 has global index 11
[1]: local  0 4 2 6
[1]: global 0 5 2 10
[1]: local  1 0 3 5
[1]: global 2 0 4 9
[1]: local  2 3 1 7
[1]: global 4 4 1 11
[1]: local  3 1 4 7
[1]: global 5 1 5 11
[1]: local  4 4 6 7
[1]: global 8 5 10 11
[1]: local  5 5 3 7
[1]: global 11 9 4 11
[1]: local  6 6 5 7
[1]: global 12 10 9 11
assertion array.size() == nents_[ent_dim] * ncomps failed at /tmp/lpeyrala/spack-stage/spack-stage-omega-h-9.34.13-rihdjj4qcpstotswnpa5ulldhcgseej2/spack-src/src/Omega_h_mesh.cpp +146
assertion array.size() == nents_[ent_dim] * ncomps failed at /tmp/lpeyrala/spack-stage/spack-stage-omega-h-9.34.13-rihdjj4qcpstotswnpa5ulldhcgseej2/spack-src/src/Omega_h_mesh.cpp +146
srun: error: nid001032: task 0: Aborted
srun: launch/slurm: _step_signal: Terminating StepId=3727136.74
slurmstepd: error: *** STEP 3727136.74 ON nid001032 CANCELLED AT 2022-11-21T20:23:15 ***
srun: error: nid001032: task 1: Aborted

Scr test fails on perlmutter

@CamStan @gonsie

The scr test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/scr fails for the e4s 22.11 install of scr on perlmutter with this variant:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
crfo7h3 scr@…+bbapi~bbapi_fallback~dw+examples+fortran~ipo+libyogrt+pdsh+shared+tests build_system=cmake build_type=RelWithDebInfo cache_base=/dev/shm cntl_base=/dev/shm copy_config=none file_lock=FLOCK resource_manager=SLURM scr_config=scr.conf

With this console output:

+ cd ./build
++ whoami
+ export SCR_USER_NAME=wspear
+ SCR_USER_NAME=wspear
+ CLEANTMP='rm -rf /tmp/wspear/scr.defjobid; rm -rf ./output* ./rank*  ./ckpt.*;rm -rf ./*.ckpt; rm -rf ./timestep.*; rm -rf ./.scr'
+ eval rm -rf '/tmp/wspear/scr.defjobid;' rm -rf './output*' './rank*' './ckpt.*;rm' -rf './*.ckpt;' rm -rf './timestep.*;' rm -rf ./.scr
++ rm -rf /tmp/wspear/scr.defjobid
++ rm -rf './output*' './rank*' './ckpt.*'
++ rm -rf './*.ckpt'
++ rm -rf './timestep.*'
++ rm -rf ./.scr
+ eval srun -n 8 ./test_api
++ srun -n 8 ./test_api
MPICH ERROR [Rank 0] [job id 3727136.92] [Mon Nov 21 12:27:30 2022] [nid001032] - Abort(1616271) (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)

aborting job:
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)
srun: error: nid001032: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3727136.92
slurmstepd: error: *** STEP 3727136.92 ON nid001032 CANCELLED AT 2022-11-21T20:27:30 ***
srun: error: nid001032: tasks 1-7: Terminated
srun: Force Terminated StepId=3727136.92

Tasmanian test fails on perlmutter

@mkstoyanov

The package-provided spack test for tasmanian fails for the e4s 22.11 install on perlmutter for this variant:

4yms6vd tasmanian@7.9~blas~cuda~fortran~ipo~magma+mpi~openmp~python~rocm~xsdkflags build_system=cmake build_type=Release

With this console output:

==> Error: TestFailure: 1 tests failed.


Command exited with status 1:
    '/global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo7nxzhks523bcarof/bin/cmake' '/pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/cache/tasmanian/testing'
-- The CXX compiler identification is GNU 11.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/cray/pe/gcc/11.2.0/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Tasmanian post-installation testing
-- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS) 
-- Configuring incomplete, errors occurred!
See also "/pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/CMakeFiles/CMakeOutput.log".
See also "/pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/CMakeFiles/CMakeError.log".
CMake Error at /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo7nxzhks523bcarof/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND)
Call Stack (most recent call first):
  /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo7nxzhks523bcarof/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo7nxzhks523bcarof/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
  /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/tasmanian-7.9-4yms6vdlfdzpaezhmzx3ivril6paz6qw/lib/Tasmanian/TasmanianConfig.cmake:69 (find_package)
  CMakeLists.txt:9 (find_package)





1 error found in test log:
     32    -- Detecting CXX compile features - done
     33    -- Tasmanian post-installation testing
     34    -- Could NOT find MPI_CXX (missing: MPI_CXX_WORKS)
     35    -- Configuring incomplete, errors occurred!
     36    See also "/pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/CMakeFiles/CMakeOu
           tput.log".
     37    See also "/pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/CMakeFiles/CMakeEr
           ror.log".
  >> 38    CMake Error at /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4al
           laay72zaqalfo7nxzhks523bcarof/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
     39      Could NOT find MPI (missing: MPI_CXX_FOUND)
     40    Call Stack (most recent call first):
     41      /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo
           7nxzhks523bcarof/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
     42      /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/cmake-3.24.2-4allaay72zaqalfo
           7nxzhks523bcarof/share/cmake-3.24/Modules/FindMPI.cmake:1835 (find_package_handle_standard_args)
     43      /global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/tasmanian-7.9-4yms6vdlfdzpaez
           hmzx3ivril6paz6qw/lib/Tasmanian/TasmanianConfig.cmake:69 (find_package)
     44      CMakeLists.txt:9 (find_package)



/global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/lib/spack/spack/build_environment.py:1086, in _setup_pkg_and_run:
       1083        tb_string = traceback.format_exc()
       1084
       1085        # build up some context from the offending package so we can
  >>   1086        # show that, too.
       1087        package_context = get_package_context(tb)
       1088
       1089        logfile = None

See test log for details:
  /pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd-test-out.txt

==> Error: 1 test(s) in the suite failed.
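
A note on the failure itself: the configure log above shows the raw /opt/cray/pe/gcc/11.2.0/bin/c++ being used, which knows nothing about cray-mpich, so FindMPI's MPI_CXX_WORKS probe fails. When re-running the post-install test by hand, something like this might get further (a sketch; CC is the Cray compiler wrapper, and the testing path is copied from the log above):

cmake -DCMAKE_CXX_COMPILER=$(which CC) \
    /pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/cihea7qmik53325lrao6xlwwmesl42s7/tasmanian-7.9-4yms6vd/cache/tasmanian/testing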

Slate+rocm test fails on crusher

@G-Ragghianti @mgates3

The slate standalone test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/slate-rocm fails when run against the slate build installed as part of the e4s 22.11 deployment on crusher, using these variants, with the console output below:

-- linux-sles15-zen3 / gcc@… -------------------------------
edojdwe slate@…~cuda~ipo+mpi+openmp+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo
7ej4aoh     blaspp@…~cuda~ipo+openmp+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo
c6gpjyk         cmake@…~doc+ncurses+ownlibs~qt build_system=generic build_type=Release
igbrz2c             ncurses@…~symlinks+termlib abi=none build_system=autotools
savxweu                 pkgconf@… build_system=autotools
kq7i44v             openssl@…~docs~shared build_system=generic certs=mozilla
6ki4n47                 ca-certificates-mozilla@2022-10-11 build_system=generic
ucjrwtm                 perl@…+cpanm+shared+threads build_system=generic
gqdvawb                     berkeley-db@…+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc
g2bpsoz                     bzip2@…~debug~pic+shared build_system=generic
rnafwos                         diffutils@… build_system=autotools
xfogkcu                             libiconv@… build_system=autotools libs=shared,static
otqsxvg                     gdbm@… build_system=autotools
6mvf2em                         readline@… build_system=autotools
76b2zrq                     zlib@…+optimize+pic+shared build_system=makefile
bzm57qy         hip@…~ipo build_system=cmake build_type=Release patches=959d1fe
e5ldtkh         hsa-rocr-dev@…+image~ipo+shared build_system=cmake build_type=Release patches=71e6851
mm6mnhr         llvm-amdgpu@…~ipo~link_llvm_dylib~llvm_dylib~openmp+rocm-device-libs build_system=cmake build_type=Release patches=a08bbe1
bgpvt5g         openblas@…~bignuma~consistent_fpcsr+fortran~ilp64+locking+pic+shared build_system=makefile patches=d3d9b15 symbol_suffix=none threads=openmp
g2sf37k         rocblas@…~ipo+tensile amdgpu_target=auto build_system=cmake build_type=Release patches=81591d9
oaykapp     cray-mpich@…+wrappers build_system=generic
izppu2z     lapackpp@…~cuda~ipo+rocm+shared amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo
orsl6og         rocsolver@…~ipo+optimal amdgpu_target=auto build_system=cmake build_type=Release
slate+rocm %gcc: edojdwe
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
terminate called after throwing an instance of 'std::out_of_range'
  what():  map::at
srun: error: crusher124: tasks 1-3: Aborted
srun: launch/slurm: _step_signal: Terminating StepId=230307.0
slurmstepd: error: *** STEP 230307.0 ON crusher124 CANCELLED AT 2022-12-14T18:10:38 ***
srun: error: crusher124: task 0: Terminated
srun: Force Terminated StepId=230307.0

amrex-cuda downloads source during test

The amrex-cuda test attempts to git clone a repository to obtain test code. This isn't allowed, since some systems (e.g. crusher) do not have external network access on the compute nodes where the testsuite will be run. @PlatinumCD
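
One possible direction (a sketch only; the repository URL is a hypothetical stand-in for whatever the test actually clones, and the phase split is an assumption about how the testsuite is driven) is to fetch the sources in a phase that runs where network access exists, or to vendor them into this repository:

# Hypothetical: pre-fetch once during setup on a login node; later phases
# on network-less compute nodes then find the sources already present.
if [ ! -d test-src ]; then
    git clone --depth 1 https://github.com/AMReX-Codes/amrex-tutorials.git test-src
fi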

how to distinguish spack tests in validation_tests directory

@wspear

The validation_tests directory contains a lot of tests, and it is not clear which ones invoke spack test and which ones do not. It would be better to remove all tests that invoke spack test from this repo; if there is a need to keep them around, I would suggest putting them in a separate directory so it's easier to keep track of them.

One option would be to have two top-level directories, validation_tests and spack_tests. I suspect there will be migration as tests move from validation_tests --> spack_tests when a package gains spack test support. It is something to consider; I am trying to distinguish which tests should run from the E4S Testsuite and which ones run from spack test, and I don't want to write redundant tests in two different test suites.
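
A sketch of that layout, using the names proposed above:

testsuite/
    validation_tests/    # standalone compile-and-run tests (setup.sh, compile.sh, run.sh)
    spack_tests/         # thin wrappers that only invoke `spack test run <package>`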

trilinos test failed

CDASH: https://my.cdash.org/test/63278708

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/trilinos.yml

I suspect this test is failing because we have this set in our startup modulefile gpu, which is loaded by default:

e4s:login34> ml show gpu
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   /global/common/software/nersc/pm-2022.08.4/extra_modulefiles/gpu/1.0.lua:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
family("hardware")
load("cudatoolkit")
load("craype-accel-nvidia80")
setenv("MPICH_GPU_SUPPORT_ENABLED","1")

We can unload this module by loading the cpu module instead. Anyhow, I wanted to bring this up. (A possible workaround is sketched after the error output below.)

Error:

+ cd -
/global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Skipping load: Environment already setup
+ cd ./build
+ export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ export OMP_NUM_THREADS=4
+ OMP_NUM_THREADS=4
+ srun -n 8 ./Zoltan
MPICH ERROR [Rank 0] [job id 3289011.0] [Wed Sep 28 19:56:45 2022] [nid003233] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

srun: error: nid003233: tasks 0-7: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=3289011.0
Run failed
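
Given the module contents above, a possible workaround in the test's run script (a sketch; the srun line is copied from the trace):

# Disable GPU-aware MPI for this CPU-only test so cray-mpich stops
# requiring the GTL library that MPICH_GPU_SUPPORT_ENABLED=1 implies.
unset MPICH_GPU_SUPPORT_ENABLED
srun -n 8 ./Zoltan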

slepc test fails on perlmutter

@balay @joseeroman

The slepc test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/slepc

Fails for the e4s 22.11 deployment of slepc on perlmutter with this variant:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
5puydjf slepc@…+arpack~blopex~cuda~rocm build_system=generic

With this console output:

MPICH ERROR [Rank 0] [job id 3727136.93] [Mon Nov 21 12:28:07 2022] [nid001032] - Abort(1616271) (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)

aborting job:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)
srun: error: nid001032: task 0: Exited with exit code 255
srun: launch/slurm: _step_signal: Terminating StepId=3727136.93
slurmstepd: error: *** STEP 3727136.93 ON nid001032 CANCELLED AT 2022-11-21T20:28:08 ***
srun: error: nid001032: tasks 1-7: Terminated
srun: Force Terminated StepId=3727136.93

Pumi test fails on perlmutter

@cwsmith

The spack test for this variant of pumi installed on perlmutter by e4s 22.11:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
pumi@2.2.7~fortran~int64~ipo~shared+simmodsuite_version_check~testing~zoltan build_system=cmake build_type=RelWithDebInfo simmodsuite=none

Fails with the following console output:

==> Error: TestFailure: 2 tests failed.


Command exited with status 127:
    './uniform' '../testdata/pipe.dmg' '../testdata/pipe.smb' 'pipe_unif.smb'
./uniform: error while loading shared libraries: libfabric.so.1: cannot open shared object file: No such file or directory



1 error found in test log:
     1    ==> Testing package pumi-2.2.7-llomhac
     2    ==> [2022-11-21-12:25:41.390040] testing pumi uniform mesh refinement
     3    ==> [2022-11-21-12:25:41.390483] './uniform' '../testdata/pipe.dmg' '../testdata/pipe.smb' 'pipe_unif.smb'
     4    ./uniform: error while loading shared libraries: libfabric.so.1: cannot open shared object file: No such file or directory
     5    FAILED: Command exited with status 127:
     6        './uniform' '../testdata/pipe.dmg' '../testdata/pipe.smb' 'pipe_unif.smb'
  >> 7    ./uniform: error while loading shared libraries: libfabric.so.1: cannot open shared object file: No such file or directory
     8    
     9      File "/global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/bin/spack", line 100, in <module>



Failed to find executable '/opt/cray/pe/mpich/8.1.17/ofi/gnu/9.1/bin/mpiexec'

/global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/lib/spack/spack/package_base.py:2023, in run_test:
       2020                    # We're below the package context, so get context from
       2021                    # stack instead of from traceback.
       2022                    # The traceback is truncated here, so we can't use it to
  >>   2023                    # traverse the stack.
       2024                    m = "\n".join(spack.build_environment.get_package_context(tb))
       2025
       2026                exc = e  # e is deleted after this block


/global/cfs/cdirs/m3896/shared/ParaTools/E4S/22.11/PrgEnv-gnu/spack/lib/spack/spack/build_environment.py:1086, in _setup_pkg_and_run:
       1083        tb_string = traceback.format_exc()
       1084
       1085        # build up some context from the offending package so we can
  >>   1086        # show that, too.
       1087        package_context = get_package_context(tb)
       1088
       1089        logfile = None

See test log for details:
  /pscratch/sd/w/wspear/perlmutter/spack_user_cache/test/hyjemf5zrzgrnmfmzuddmljbsxwuyat4/pumi-2.2.7-llomhac-test-out.txt

==> Error: 1 test(s) in the suite failed.

--- ==> Spack test hyjemf5zrzgrnmfmzuddmljbsxwuyat4
==> Testing package pumi-2.2.7-llomhac
======================== 1 failed, 0 passed of 1 specs ========================= ---
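
The uniform failure itself is a loader problem rather than a pumi problem: the binary cannot find libfabric.so.1. A possible workaround before re-running (a sketch; the install location is an assumption based on standard Cray layouts, with <version> to be filled in for the system):

# Hypothetical: make the system libfabric visible to the test binaries.
export LD_LIBRARY_PATH=/opt/cray/libfabric/<version>/lib64:$LD_LIBRARY_PATH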

add license

@sameershende @wspear @eugeneswalker

The E4S documentation states:

Distribution
E4S is open source software published under the MIT License. E4S can be redistributed and modified under the terms of this license. E4S packages each have their own open source license.

Please consider adding a LICENSE file. If it's MIT for all the projects, including the E4S Testsuite, then you can use the MIT template found here: https://choosealicense.com/licenses/mit/

netcdf-fortran test fails

The netcdf-fortran test has a typo.

The compile line is $TEST_FTN_MPI simple_xy_wr.f90 -o simple_xy_wr -I${NETCDF_FORTRAN_ROOT}/include -L${NETCDF_FORTRAN_ROOT}/lib -Wl,-rpath,${NETCDF_FORTRAN_ROOT}/lib -lnetcdff

emphasis on -o simple_xy_wr.

The run line is $TEST_RUN ./simple_xy_nc4_wr, so the test always fails since simple_xy_nc4_wr doesn't exist.
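
The one-line fix matching the compile line above would be:

$TEST_RUN ./simple_xy_wr

(Renaming the output in the compile line to simple_xy_nc4_wr would work equally well.)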

Test Header-only Library (ECP SICM/Metall)

Hello,

I'm one of the developers of the ECP Metall library and trying to add Metall's test to this Testsuite project.

Our ECP Metall is a header-only library.
Because of that, its Spack package installs/depends on a compiler only when Spack's build-time test is enabled.

Could I have the Testsuite install a proper compiler during the setup step?

Or can we assume that Spack's build-time test is always enabled when E4S installs Metall?
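
(For reference, Spack's build-time tests are enabled per install with the --test flag, e.g.:

spack install --test=root metall

so the answer likely depends on whether the E4S deployments pass that flag.)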

Thank you,
Keita

umap test fails

@egreen77 Recent versions of umap have the umap test failing with the text "Test failed." and no other information.

strumpack test fails on perlmutter

@pghysels

The strumpack test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/strumpack fails for the perlmutter install under e4s 22.11 with the following variants:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
jslue4d strumpack@…+butterflypack+c_interface~count_flops~cuda~ipo+mpi+openmp+parmetis~rocm~scotch+shared~slate~task_timers+zfp build_system=cmake build_type=RelWithDebInfo

With the following console output. (It seems to run to near completion and segfault at the end.)

strumpack~rocm %gcc: jslue4d
+ export OMP_NUM_THREADS=1
+ OMP_NUM_THREADS=1
+ ./testPoisson2d 100 --sp_disable_gpu
solving 2D 100x100 Poisson problem with 1 right hand sides
# Initializing STRUMPACK
# using 1 OpenMP thread(s)
# matrix equilibration, r_cond = 1 , c_cond = 1 , type = N
# initial matrix:
#   - number of unknowns = 10,000
#   - number of nonzeros = 49,600
# nested dissection reordering:
#   - Geometric reordering
#   - strategy parameter = 8
#   - number of separators = 1,967
#   - number of levels = 12
#   - nd time = 0.000863288
#   - symmetrization time = 2.67616e-06
# symbolic factorization:
#   - nr of dense Frontal matrices = 1,967
#   - symb-factor time = 0.000889237
# multifrontal factorization:
#   - estimated memory usage (exact solver) = 4.50818 MB
#   - minimum pivot, sqrt(eps)*|A|_1 = 8.42937e-08
#   - replacing of small pivots is not enabled
#   - factor time = 0.0113813
#   - factor nonzeros = 563,522
#   - factor memory = 4.50818 MB
REFINEMENT it. 0	res =      442.368	rel.res =            1	bw.error =            1
REFINEMENT it. 1	res =  7.89021e-14	rel.res =  1.78363e-16	bw.error =   5.8175e-16
# DIRECT/GMRES solve:
#   - abs_tol = 1e-10, rel_tol = 1e-06, restart = 30, maxit = 5000
#   - number of Krylov iterations = 1
#   - solve time = 0.00290866
# COMPONENTWISE SCALED RESIDUAL = 5.68529e-16
# relative error = ||x-x_exact||_F/||x_exact||_F = 3.96446e-15
./run.sh: line 7: 97192 Segmentation fault      ./testPoisson2d 100 --sp_disable_gpu

Precice test fails on perlmutter

@MakisH @fsimonis

The precice test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/precice

Fails on perlmutter for this variant installed with e4s 22.11:

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
precice@2.5.0~ipo+mpi+petsc~python+shared build_system=cmake build_type=RelWithDebInfo

With the following console output:

DUMMY: Running solver dummy with preCICE config file "precice-config.xml", participant name "SolverOne", and mesh name "MeshOne".
preCICE: This is preCICE version 2.5.0
preCICE: Revision info: no-info [git failed to run]
preCICE: Build type: Release (without debug log)
preCICE: Configuring preCICE with configuration "precice-config.xml"
preCICE: I am participant "SolverOne"
preCICE: Setting up primary communication to coupling partner/s
MPICH ERROR [Rank 0] [job id ] [Mon Nov 21 12:25:26 2022] [nid001032] - Abort(1616271) (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)

DUMMY: Running solver dummy with preCICE config file "precice-config.xml", participant name "SolverTwo", and mesh name "MeshTwo".
aborting job:
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(171).......:
MPID_Init(495)..............:
MPIDI_OFI_mpi_init_hook(816):
create_endpoint(1353).......: OFI EP enable failed (ofi_init.c:1353:create_endpoint:Address already in use)

libcatalyst downloads source during test

@mathstuf In its compile phase the libcatalyst test attempts to git clone a repository to obtain test code. This isn't allowed, since some systems (e.g. crusher) do not have external network access on the compute nodes where the testsuite will be run.

strumpack test failed on E4S 22.05

CDASH Result: https://my.cdash.org/test/63278714

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/strumpack.yml

The test just segfaults:

REFINEMENT it. 0	res =      442.368	rel.res =            1	bw.error =            1
REFINEMENT it. 1	res =  7.89021e-14	rel.res =  1.78363e-16	bw.error =   5.8175e-16
# DIRECT/GMRES solve:
#   - abs_tol = 1e-10, rel_tol = 1e-06, restart = 30, maxit = 5000
#   - number of Krylov iterations = 1
#   - solve time = 0.0597916
# COMPONENTWISE SCALED RESIDUAL = 5.68529e-16
# relative error = ||x-x_exact||_F/||x_exact||_F = 3.96446e-15
./run.sh: line 7: 87374 Segmentation fault      ./testPoisson2d 100 --sp_disable_gpu
Run failed

kokkos test failing

CDASH: https://my.cdash.org/test/63278698

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/kokkos.yml

+ make
[ 14%] Building CXX object CMakeFiles/example_no_kokkos.dir/bar.cpp.o
[ 28%] Linking CXX executable example_no_kokkos
[ 28%] Built target example_no_kokkos
[ 42%] Building CXX object CMakeFiles/example_with_kokkos.dir/foo.cpp.o
[ 57%] Linking CXX executable example_with_kokkos
[ 57%] Built target example_with_kokkos
Scanning dependencies of target example_cmake
[ 71%] Building CXX object CMakeFiles/example_cmake.dir/cmake_example.cpp.o
[ 85%] Building Fortran object CMakeFiles/example_cmake.dir/foo.f.o
[100%] Linking CXX executable example_cmake
[100%] Built target example_cmake
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/kokkos/kokkos_e4s_testsuite_22.05/9afbd51c/stage/testsuite/validation_tests/kokkos
Skipping load: Environment already setup
+ export OMP_PROC_BIND=spread
+ OMP_PROC_BIND=spread
+ export OMP_PLACES=threads
+ OMP_PLACES=threads
+ export OMP_NUM_THREADS=8
+ OMP_NUM_THREADS=8
+ ./build/example_with_kokkos 500000000
WARNING: Requested total thread count and/or thread affinity may result in
oversubscription of available CPU resources!  Performance may be degraded.
Explicitly set OMP_WAIT_POLICY=PASSIVE or ACTIVE to suppress this message.
Set CRAY_OMP_CHECK_AFFINITY=TRUE to print detailed thread-affinity messages.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Kokkos::Impl::HostThreadTeamData::organize_pool ERROR pool already exists
./run.sh: line 7: 61519 Aborted                 ./build/example_with_kokkos 500000000
Run failed

update README with the settings.sh or other scripts

@wspear

At Cori we needed to link settings.sh to settings.cori.sh to use the appropriate compilers. It would be worth documenting when this link needs to be made for each site, or whether settings.sh can be left alone, which I assume will use the default settings.
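
For the record, the Cori step was a single symlink (file names as described above):

ln -sf settings.cori.sh settings.sh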

adios2 test is slow on Cori

I tested adios2 on Cori and it takes way too long to build and run. According to the output it is just running a Hello World, but it is clearly doing something else that requires retooling the test.

siddiq90@nid00284> time ./test-all.sh validation_tests/adios2/ --print-logs --settings settings.cori.sh
===
validation_tests/adios2/
Cleaning /global/cscratch1/sd/siddiq90/testsuite/validation_tests/adios2
---CLEANUP LOG---
rm -f *.o hello-world
Compiling /global/cscratch1/sd/siddiq90/testsuite/validation_tests/adios2
---COMPILE LOG---
+ export ADIOS2_LIB_PATH=/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib
+ ADIOS2_LIB_PATH=/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib
+ [[ ! -d /global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib ]]
+ export ADIOS2_LIB_PATH=/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib64
+ ADIOS2_LIB_PATH=/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib64
+ make
CC  -I/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/include -Wall   -c -o hello-world.o hello-world.cpp
CC  -o hello-world hello-world.o -L/global/common/software/spackecp/e4s-20.10/software/cray-cnl7-haswell/intel-19.1.2.254/adios2-2.6.0-n4dtk4qstwl4v7oc5ue62fbwicooac6q/lib64 -ladios2_cxx11
Running /global/cscratch1/sd/siddiq90/testsuite/validation_tests/adios2
---RUN LOG---
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Hello World from ADIOS2
Success
real	11m10.137s
user	2m11.663s
sys	0m51.527s

hdf5 test failing

CDASH: https://my.cdash.org/test/63278692

buildspec: https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/hdf5.yml

Cleaning /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/hdf5/hdf5_e4s_testsuite_22.05/b54379cd/stage/testsuite/validation_tests/hdf5
---CLEANUP LOG---
Internal Spack test. No clean step required.
Compiling /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/hdf5/hdf5_e4s_testsuite_22.05/b54379cd/stage/testsuite/validation_tests/hdf5
---COMPILE LOG---
Internal Spack test. No build step required.
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/hdf5/hdf5_e4s_testsuite_22.05/b54379cd/stage/testsuite/validation_tests/hdf5
Skipping load: Environment already setup
==> Error: TestFailure: 1 tests failed.


Executable 'h5format_convert' expected in prefix, found in /global/common/software/nersc/pm-2022q3/sw/python/3.9-anaconda-2021.11/bin/h5format_convert instead

/global/common/software/spackecp/perlmutter/e4s-22.05/spack/lib/spack/spack/package.py:2109, in _run_test_helper:
       2106
       2107        if installed:
       2108            msg = "Executable '{0}' expected in prefix".format(runner.name)
  >>   2109            msg += ", found in {0} instead".format(runner.path)
       2110            assert runner.path.startswith(self.spec.prefix), msg
       2111
       2112        try:


/global/common/software/spackecp/perlmutter/e4s-22.05/spack/lib/spack/spack/build_environment.py:1076, in _setup_pkg_and_run:
       1073        tb_string = traceback.format_exc()
       1074
       1075        # build up some context from the offending package so we can
  >>   1076        # show that, too.
       1077        package_context = get_package_context(tb)
       1078
       1079        logfile = None

See test log for details:
  /global/homes/e/e4s/.spack/test/puxhe6geynlfin7dhv6c7wi4d7ykx2zu/hdf5-1.8.22-hsb3pcd-test-out.txt

==> Error: 1 test(s) in the suite failed.

--- ==> Spack test puxhe6geynlfin7dhv6c7wi4d7ykx2zu
==> Testing package hdf5-1.8.22-hsb3pcd
======================== 1 failed, 0 passed of 1 specs ========================= ---
Run failed

visit test fails on Crusher

@cyrush @brugger1

The visit test defined here: https://github.com/E4S-Project/testsuite/tree/master/validation_tests/visit fails with:

Version 3.2.2 of cli does not exist for the architecture linux-x86_64.

For our e4s 22.11 build on crusher with this spec:

-- linux-sles15-zen3 / gcc@… -------------------------------
rayqtpj visit@3.2.2+adios2+conduit~gui+hdf5~ipo+mfem+mpi+osmesa+plugins+python+silo~vtkm build_system=cmake build_type=RelWithDebInfo patches=2c4c27f,70b2f94,9ae2769,f362758

sundials+rocm spack test fails with no output

@balos1 @cswoodward @gardner48

On Crusher the internal spack test for sundials works for the cpu build, but the +rocm build (variant below) fails with empty log output. Assuming this variant is valid, is it possible the sundials test could be updated to support gpu variants?

-- linux-sles15-zen3 / gcc@… -------------------------------
e5jbpyc sundials@…+ARKODE+CVODE+CVODES+IDA+IDAS+KINSOL~cuda+examples+examples-install~f2003~fcmix+generic-math~ginkgo~hypre~int64~ipo~klu~kokkos~kokkos-kernels~lapack~magma~monitoring+mpi~openmp~petsc~profiling~pthread~raja+rocm+shared+static~superlu-dist~superlu-mt~sycl~trilinos amdgpu_target=gfx90a build_system=cmake build_type=RelWithDebInfo cstd=99 cxxstd=14 logging-level=0 logging-mpi=OFF precision=double

Why a separate kokkos-rocm Test

I get that you were apparently testing separable compilation there. That's fine, but it's absolutely not ROCm-specific.

Heffte test fails on crusher

@mkstoyanov

Using the e4s 22.11 test install on crusher with the spec below, running spack test heffte fails with the error output below.

kd7aodn heffte@2.3.0~cuda+fftw~fortran~ipo~magma~mkl~python~rocm+shared build_system=cmake build_type=RelWithDebInfo
c6gpjyk     cmake@…~doc+ncurses+ownlibs~qt build_system=generic build_type=Release
igbrz2c         ncurses@…~symlinks+termlib abi=none build_system=autotools
savxweu             pkgconf@… build_system=autotools
kq7i44v         openssl@…~docs~shared build_system=generic certs=mozilla
6ki4n47             ca-certificates-mozilla@2022-10-11 build_system=generic
ucjrwtm             perl@…+cpanm+shared+threads build_system=generic
gqdvawb                 berkeley-db@…+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc
g2bpsoz                 bzip2@…~debug~pic+shared build_system=generic
rnafwos                     diffutils@… build_system=autotools
xfogkcu                         libiconv@… build_system=autotools libs=shared,static
otqsxvg                 gdbm@… build_system=autotools
6mvf2em                     readline@… build_system=autotools
76b2zrq                 zlib@…+optimize+pic+shared build_system=makefile
oaykapp     cray-mpich@…+wrappers build_system=generic
vw5qjrw     fftw@…+mpi~openmp~pfft_patches build_system=autotools precision=double,float,long_double
==> Error: TestFailure: 1 tests failed.


Command exited with status 2:
    '/usr/bin/make' 'test'
Running tests...
Test project /autofs/nccs-svm1_home1/wspear/.spack/test/zo67aakzvhymclohwiv7tojxl3abljfc/heffte-2.3.0-kd7aodn
    Start 1: example_fftw
1/5 Test #1: example_fftw .....................***Failed    0.13 sec
    Start 2: example_r2r
2/5 Test #2: example_r2r ......................***Failed    0.07 sec
    Start 3: example_options
3/5 Test #3: example_options ..................***Failed    0.07 sec
    Start 4: example_vectors
4/5 Test #4: example_vectors ..................***Failed    0.07 sec
    Start 5: example_r2c
5/5 Test #5: example_r2c ......................***Failed    0.07 sec

0% tests passed, 5 tests failed out of 5

Total Test time (real) =   0.45 sec

The following tests FAILED:
	  1 - example_fftw (Failed)
	  2 - example_r2r (Failed)
	  3 - example_options (Failed)
	  4 - example_vectors (Failed)
	  5 - example_r2c (Failed)
Errors while running CTest
Output from these tests are in: /autofs/nccs-svm1_home1/wspear/.spack/test/zo67aakzvhymclohwiv7tojxl3abljfc/heffte-2.3.0-kd7aodn/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [Makefile:71: test] Error 8



1 error found in test log:
     89    	  3 - example_options (Failed)
     90    	  4 - example_vectors (Failed)
     91    	  5 - example_r2c (Failed)
     92    Errors while running CTest
     93    Output from these tests are in: /autofs/nccs-svm1_home1/wspear/.spack/test/zo67aakzvhymclohwiv7tojxl3abljfc/heffte-2.3.0-kd7aodn/Testing/Temporary/LastTest.log
     94    Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
  >> 95    make: *** [Makefile:71: test] Error 8
     96    
     97      File "/gpfs/alpine/csc439/world-shared/E4S/ParaTools/22.11/PrgEnv-gnu/spack/bin/spack", line 100, in <module>



/gpfs/alpine/csc439/world-shared/E4S/ParaTools/22.11/PrgEnv-gnu/spack/lib/spack/spack/build_environment.py:1086, in _setup_pkg_and_run:
       1083        tb_string = traceback.format_exc()
       1084
       1085        # build up some context from the offending package so we can
  >>   1086        # show that, too.
       1087        package_context = get_package_context(tb)
       1088
       1089        logfile = None

kokkos-legacy test

That test, as copied in here, requires UVM memory. I think that is pretty bad, since it won't work on AMD right now.
