
su2hmc: Two Colour Hybrid Monte Carlo with Wilson Fermions

Introduction

Hybrid Monte Carlo algorithm for Two Color QCD with Wilson-Gor'kov fermions, based on the algorithm of Duane et al., Phys. Lett. B195 (1987) 216.

There is "up/down partitioning": each update requires one application of congradq to complex vectors to determine $$ \left(M^\dagger M\right)^{-1}\Phi $$ where $\Phi$ has dimension 4 * kvol * nc * Nf. The matrix M is the Wilson matrix for a single flavor, and hence there is no extra species doubling as a result.

Matrix multiplies are done using the routines hdslash and hdslashd.

Hence, the number of lattice flavors Nf is related to the number of continuum flavors N_f by $$N_f = 2 \text{Nf}$$

Fermion expectation values are measured using a noisy estimator on the Wilson-Gor'kov matrix, which has dimension 8 * kvol * nc * Nf. Inversions are done using congradp, and matrix multiplies with dslash and dslashd.

The trajectory length is random with mean dt * stepl. The code runs for a fixed number ntraj of trajectories.

The main variables are:

  • Phi: pseudofermion field
  • bmass: bare fermion mass
  • fmu: chemical potential
  • actiona: running average of total action

The code produces the following outputs:

| File Name | Data type |
| --- | --- |
| config.bβββkκκκmuμμμμjJJJsNXtNT.XXXXXX | Lattice configuration for the given parameters. The last digits are the configuration number |
| Output.bβββkκκκmuμμμμjJJJsNXtNT | Number of conjugate gradient steps for each trajectory. Also contains general simulation details upon completion |
| bose.bβββkκκκmuμμμμjJJJsNXtNT | Spatial plaquette, temporal plaquette, Polyakov line |
| fermi.bβββkκκκmuμμμμjJJJsNXtNT | psibarpsi, energy density, baryon density |
| diq.bβββkκκκmuμμμμjJJJsNXtNT | real |

SJH March 2005

Hybrid code, P.Giudice, May 2013

Converted from Fortran to C by D. Lawlor March 2021

Conversion notes

This two colour implementation was originally written in FORTRAN for: S. Hands, S. Kim and J.-I. Skullerud, Deconfinement in dense 2-color QCD, Eur. Phys. J. C48, 193 (2006), hep-lat/0604004

It has since been rewritten in C and is in the process of being adapted for CUDA. We have successfully run on 7000+ Zen 2 cores, as well as on A100 GPUs.

Some adaptions from the original are:

  • Mixed precision conjugate gradient
  • Implementation of BLAS routines for vector operations
  • Removal of excess halo exchanges
  • #pragma omp simd instructions
  • Makefiles for Intel, GCC and AMD compilers with flags set for latest machines
  • GSL ranlux support
  • CUDA implementation.

Other works in progress include:

  • Improved action
  • SYCL implementation.
  • Multi-GPU support
  • CMake build system
  • yaml input file
  • Set lattice volume and CPU grid at runtime
  • Higher order integrators. An 11-stage 4th-order non-gradient integrator has been implemented, but with no speedup yet

Getting started

This code is written for MPI on Linux, and thus has a few caveats to get up and running:

  1. In sizes.h, set the lattice size. By default we assume the spatial components to be equal

  2. Also in sizes.h set the processor grid size by setting the values of

    npx npy npz npt

    These MUST be divisors of

    nx ny nz nt

    set in step one.

  3. Compile the code using the desired Makefile. Please note that the paths given in the Makefiles for BLAS libraries etc. are based on my own system. You may need to adjust these manually.

  4. Run the code. This may differ from system to system, especially if a task scheduler like SLURM is being used. On my desktop it can be run locally using the following command

    mpirun -n<nproc> ./su2hmc <input_file>
  • nproc is the number of processors, given by the product of npx npy npz npt
  • If no input file is given, the programme defaults to midout. The default name is a historical one which goes back generations to the early days of Lattice QCD.
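A minimal sketch of the sizes.h settings from steps 1 and 2, with illustrative lattice extents and grid values; the _Static_assert guards enforcing the divisibility requirement are an addition of this sketch, not necessarily present in the code:

```c
/* Illustrative sizes.h fragment. The macro names nx..nt and npx..npt
 * follow the README; the values and the compile-time divisibility
 * checks are this sketch's own. */
#define nx 16
#define ny 16
#define nz 16
#define nt 32

#define npx 2
#define npy 2
#define npz 2
#define npt 4

/* npx..npt MUST divide nx..nt */
_Static_assert(nx % npx == 0, "npx must divide nx");
_Static_assert(ny % npy == 0, "npy must divide ny");
_Static_assert(nz % npz == 0, "npz must divide nz");
_Static_assert(nt % npt == 0, "npt must divide nt");

/* nproc for mpirun -n is the product of the grid dimensions */
#define nproc (npx * npy * npz * npt)
```

With these values the processor grid has nproc = 2 * 2 * 2 * 4 = 32, so the run command would be `mpirun -n 32 ./su2hmc <input_file>`.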

Input parameters

A sample input file looks like

0.00200	1.7	0.1780	0.00	0.000	0.0	0.0	500	20	1	1	100
dt	beta	akappa	jqq	thetaq	fmu	aNf	stepl	ntraj	istart	icheck	iread

where

  • dt is the step size for the update
  • beta is β, given up to three significant figures
  • akappa is hopping parameter, given up to four significant figures
  • jqq is the diquark source, given up to three significant figures
  • thetaq is the diquark mixing angle
  • fmu is the chemical potential
  • aNf is ignored. It is a holdover from the Cornell group (when Ken Wilson was still there), based on the idea that molecular dynamics time-discretisation artifacts can be absorbed into a renormalisation of the bare parameters of the lattice action
  • stepl is the average number of steps per trajectory. For unit-length trajectories, stepl times dt should equal 1
  • ntraj is the number of trajectories
  • istart signals a hot start (>=1) or cold start (<=0)
  • icheck is how often to print out a configuration. We typically use 5 and tune for an 80% acceptance rate
  • iread is the starting configuration for continuation runs. If zero, start without reading

The bottom line of the input file is ignored by the programme and is just there to make your life easier. Blank space does not matter, so long as there is some gap between the input parameters and they are all on a single line.


Issues

CUDA Linkage

CUDA Code passes the compilation phase just fine, but does not link.
In particular, it is the C modules that seem to have trouble linking.

Type conversion kernel for GPU

One of the main reasons we send stuff to the CPU at the moment is to convert it from single to double precision, or vice versa. It's then sent straight back to the GPU, so why not do the conversion there?

CUDA Force Terms

Next on the endless list of CUDA conversion bugs is the force routines. Both Gauge_Force and Force are currently giving incorrect results.

DP matrix multiplication streams

hdslash_f and hdslashd_f use 8 streams for matrix multiplication to keep the GPU as busy as possible. It would be nice to have the DP and dslash routines do the same. Not a priority, as they are not major bottlenecks.

Move Lattice Spacing/Nproc definitions to makefiles

Instead of defining nx etc. in sizes.h; define them as variables in the Makefile that get passed to the compiler.

This has the advantage of being able to name the output file after the number of processors/lattice dimensions, and of not having to edit the include file (a two-second inconvenience).

Windows Support

Adjusting the code so that it runs on Windows machines, to make it more accessible to undergrads doing projects (like SPUR in Maynooth) who may not have Linux machines at home.

CUDA Bosonic Observables

Now that the CUDA code runs, we need it to run correctly.

First focus is cubosonic.cu

If we can get that working then similar solutions will hopefully apply to other files.

Non-MPI performance

Even if a particular dimension of the lattice isn't divided across MPI ranks, it still does halo exchanges with itself. This increases the (admittedly small) memory footprint of the programme and wastes time on memory-bound tasks. This will be even more critical in a GPU environment, where we may not be parallelising across all lattice dimensions.

Multiple Integrators

At the moment, only leapfrog is implemented. But it would be cool to get OMF2 and OMF4 like openqcd 1.6

While we're at it, we can also make the integrator modular so it can be easily changed in the future if the need arises.

Prepare Plaquette for clover

SU2plaq currently only calculates the "forward" plaquette, with μ and ν positive. For the clover we need all combinations of positive and negative directions implemented. This can be done as three new functions or as a switch statement.

Set seed

An option to set the seed would be nice

ICX Compilation Fails

Pretty sure this is an issue with the compiler rather than the code, but it fails with

 "/usr/bin/ld" -z relro --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o su2hmc_OneAPI /lib/x86_64-linux-gnu/Scrt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/12/crtbeginS.o -L/opt/intel/oneapi/mkl/2024.0/lib/intel64 -L/opt/intel/oneapi/compiler/2024.0/bin/compiler/../../lib -L/opt/intel/oneapi/compiler/2024.0/bin/compiler/../../lib -L/opt/intel/oneapi/mkl/2024.0/lib -L/opt/intel/oneapi/compiler/2024.0/lib/clang/17/lib/x86_64-unknown-linux-gnu -L/opt/intel/oneapi/compiler/2024.0/bin/compiler/../../lib -L/usr/lib/gcc/x86_64-linux-gnu/12 -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib64 -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../lib64 -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib -L/usr/lib/gcc/x86_64-linux-gnu/12/../../.. -L/opt/intel/oneapi/compiler/2024.0/bin/compiler/../../lib -L/opt/intel/oneapi/compiler/2024.0/bin/compiler/../../opt/compiler/lib -L/lib -L/usr/lib -plugin /opt/intel/oneapi/compiler/2024.0/bin/compiler/../../lib/icx-lto.so -plugin-opt=mcpu=skylake -plugin-opt=O3 -plugin-opt=-vector-library=SVML -plugin-opt=fintel-libirc-allowed -plugin-opt=fintel-advanced-optim -plugin-opt=-disable-hir-generate-mkl-call -plugin-opt=-enable-npm-multiversioning -plugin-opt=-loopopt -plugin-opt=-intel-abi-compatible=true -plugin-opt=-x86-enable-unaligned-vector-move=true -L/opt/intel/oneapi/tbb/2021.11/env/../lib/intel64/gcc4.8 -L/opt/intel/oneapi/mpi/2021.11/lib -L/opt/intel/oneapi/mkl/2024.0/lib/ -L/opt/intel/oneapi/ippcp/2021.9/lib/ -L/opt/intel/oneapi/ipp/2021.10/lib -L/opt/intel/oneapi/dpl/2022.3/lib -L/opt/intel/oneapi/dnnl/2024.0/lib -L/opt/intel/oneapi/dal/2024.0/lib -L/opt/intel/oneapi/compiler/2024.0/lib -L/opt/intel/oneapi/ccl/2021.11/lib/ -L/opt/intel/oneapi/compiler/2023.2.2/linux/compiler/lib/intel64_lin -L/opt/intel/oneapi/compiler/2023.2.2/linux/lib 
-L/opt/AMD/aocl/aocl-linux-aocc-4.1.0/aocc/lib -L/opt/AMD/aocc-compiler-4.1.0/lib -L/opt/AMD/aocc-compiler-4.1.0/lib32 -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/usr/lib32 -L/usr/lib -L. coord.o random.o matrices.o congrad.o bosonic.o fermionic.o force.o par_mpi.o su2hmc.o -limf -lgsl main.o --start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core --end-group -Bstatic -lsvml -Bdynamic -Bstatic -lirng -Bdynamic -Bstatic -limf -Bdynamic -lm -lgcc --as-needed -lgcc_s --no-as-needed -Bstatic -lirc -Bdynamic -ldl -liomp5 -L/opt/intel/oneapi/compiler/2024.0/bin/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed -Bstatic -lirc_s -Bdynamic /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o /lib/x86_64-linux-gnu/crtn.o
icx: error: unable to execute command: Segmentation fault (core dumped)
icx: error: linker command failed due to signal (use -v to see invocation)
make: *** [Makefile_OneAPI:62: su2hmc_OneAPI] Error 1

when I try to compile using OneAPI mpiicx compiler. I've tried with both 2023.2.2 and 2024.0.0 and gotten the same result.

However, when I tried to compile without -flto or -ipo I got


 "/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang" -cc1 -triple x86_64-unknown-linux-gnu -S -save-temps=cwd -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name fermionic.c -mrelocation-model static -fveclib=SVML -mframe-pointer=none -menable-no-infs -menable-no-nans -fapprox-func -funsafe-math-optimizations -fno-signed-zeros -mreassociate -freciprocal-math -fdenormal-fp-math=preserve-sign,preserve-sign -ffp-contract=fast -fno-rounding-math -ffast-math -ffinite-math-only -mconstructor-aliases -funwind-tables=2 -target-cpu skylake -mllvm -x86-enable-unaligned-vector-move=true -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/postgrad/dalel487/Code/su2/hybrid_code/production_c -resource-dir /opt/intel/oneapi/compiler/2023.2.2/linux/lib/clang/17 -O3 -std=gnu11 -fdebug-compilation-dir=/home/postgrad/dalel487/Code/su2/hybrid_code/production_c -ferror-limit 19 -fheinous-gnu-extensions -fopenmp-late-outline -fopenmp-threadprivate-legacy -fopenmp -fgnuc-version=4.2.1 -finline-functions -mllvm -enable-gvn-hoist -fcolor-diagnostics -vectorize-loops -vectorize-slp -mprefer-vector-width=512 -mllvm -paropt=31 -fopenmp-typed-clauses -ax=skylake-avx512,cascadelake,tigerlake,skylake,alderlake,raptorlake -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -fintel-libirc-allowed -fintel-advanced-optim -mllvm -disable-hir-generate-mkl-call -mllvm -loopopt -floopopt-pipeline=full -mllvm -intel-abi-compatible=true -o fermionic.s -x cpp-output fermionic.i
clang -cc1 version 17.0.0 based upon LLVM 17.0.0git default target x86_64-unknown-linux-gnu
#include "..." search starts here:
End of search list.
PLEASE append the compiler options "-save-temps -v", rebuild the application to to get the full command which is failing and submit a bug report to https://software.intel.com/en-us/support/priority-support which includes the failing command, input files for the command and the crash backtrace (if any).
Stack dump:
0.	Program arguments: /opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang -cc1 -triple x86_64-unknown-linux-gnu -S -save-temps=cwd -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name fermionic.c -mrelocation-model static -fveclib=SVML -mframe-pointer=none -menable-no-infs -menable-no-nans -fapprox-func -funsafe-math-optimizations -fno-signed-zeros -mreassociate -freciprocal-math -fdenormal-fp-math=preserve-sign,preserve-sign -ffp-contract=fast -fno-rounding-math -ffast-math -ffinite-math-only -mconstructor-aliases -funwind-tables=2 -target-cpu skylake -mllvm -x86-enable-unaligned-vector-move=true -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/postgrad/dalel487/Code/su2/hybrid_code/production_c -resource-dir /opt/intel/oneapi/compiler/2023.2.2/linux/lib/clang/17 -O3 -std=gnu11 -fdebug-compilation-dir=/home/postgrad/dalel487/Code/su2/hybrid_code/production_c -ferror-limit 19 -fheinous-gnu-extensions -fopenmp-late-outline -fopenmp-threadprivate-legacy -fopenmp -fgnuc-version=4.2.1 -finline-functions -mllvm -enable-gvn-hoist -fcolor-diagnostics -vectorize-loops -vectorize-slp -mprefer-vector-width=512 -mllvm -paropt=31 -fopenmp-typed-clauses -ax=skylake-avx512,cascadelake,tigerlake,skylake,alderlake,raptorlake -D__GCC_HAVE_DWARF2_CFI_ASM=1 -fintel-compatibility -fintel-libirc-allowed -fintel-advanced-optim -mllvm -disable-hir-generate-mkl-call -mllvm -loopopt -floopopt-pipeline=full -mllvm -intel-abi-compatible=true -o fermionic.s -x cpp-output fermionic.i
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x000055d6d6336303 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x4f85303)
 #1 0x000055d6d6334780 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x4f83780)
 #2 0x000055d6d633694f SignalHandler(int) Signals.cpp:0:0
 #3 0x00007f15d5e42520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x000055d6d665c507 llvm::loopopt::RegDDRef::getTempBaseValue() const (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x52ab507)
 #5 0x000055d6d66acb93 llvm::loopopt::DDTest::depends(llvm::loopopt::DDRef const*, llvm::loopopt::DDRef const*, llvm::loopopt::DirectionVector const&, bool, bool) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x52fbb93)
 #6 0x000055d6d66b00e3 llvm::loopopt::DDTest::findDependencies(llvm::loopopt::DDRef*, llvm::loopopt::DDRef*, llvm::loopopt::DirectionVector const&, llvm::loopopt::DirectionVectorInfo&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x52ff0e3)
 #7 0x000055d6d669d3bb llvm::loopopt::HIRDDAnalysis::buildGraph(llvm::loopopt::HIRGraph<llvm::loopopt::DDRef, llvm::loopopt::DDEdge>&, llvm::loopopt::HLNode const*) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x52ec3bb)
 #8 0x000055d6d669c8e0 llvm::loopopt::HIRDDAnalysis::getGraphImpl(llvm::loopopt::HLRegion const*, llvm::loopopt::HLNode const*) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x52eb8e0)
 #9 0x000055d6d6766642 llvm::loopopt::scalarreplarray::MemRefGroup::isLegal() const (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x53b5642)
#10 0x000055d6d6767719 llvm::loopopt::scalarreplarray::HIRScalarReplArray::doAnalysis(llvm::loopopt::HLLoop*) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x53b6719)
#11 0x000055d6d676750b llvm::loopopt::scalarreplarray::HIRScalarReplArray::run() (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x53b650b)
#12 0x000055d6d6768282 llvm::loopopt::HIRScalarReplArrayPass::runImpl(llvm::Function&, llvm::AnalysisManager<llvm::Function>&, llvm::loopopt::HIRFramework&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x53b7282)
#13 0x000055d6d769fdb0 llvm::detail::PassModel<llvm::Function, llvm::loopopt::HIRScalarReplArrayPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#14 0x000055d6d5e84340 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x4ad3340)
#15 0x000055d6d53c843d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) NVPTXTargetMachine.cpp:0:0
#16 0x000055d6d5e8b0e9 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x4ada0e9)
#17 0x000055d6d53c81ad llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) NVPTXTargetMachine.cpp:0:0
#18 0x000055d6d5e83260 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x4ad2260)
#19 0x000055d6d6d5dda6 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>&, std::__1::unique_ptr<llvm::ToolOutputFile, std::__1::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#20 0x000055d6d6d582d6 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x59a72d6)
#21 0x000055d6d70e4dac clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#22 0x000055d6d824790b clang::ParseAST(clang::Sema&, bool, bool) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x6e9690b)
#23 0x000055d6d70e39aa clang::CodeGenAction::ExecuteAction() (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x5d329aa)
#24 0x000055d6d7053faa clang::FrontendAction::Execute() (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x5ca2faa)
#25 0x000055d6d6fe08f5 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x5c2f8f5)
#26 0x000055d6d70e0561 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x5d2f561)
#27 0x000055d6d509cc5d cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x3cebc5d)
#28 0x000055d6d5098bb0 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#29 0x000055d6d5098542 clang_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x3ce7542)
#30 0x000055d6d50a5ace main (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x3cf4ace)
#31 0x00007f15d5e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#32 0x00007f15d5e29e40 call_init ./csu/../csu/libc-start.c:128:20
#33 0x00007f15d5e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5
#34 0x000055d6d5095da9 _start (/opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/clang+0x3ce4da9)
icx: error: unable to execute command: Segmentation fault (core dumped)
icx: error: clang frontend command failed due to signal (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.2 (2023.2.2.20230908)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm
Configuration file: /opt/intel/oneapi/compiler/2023.2.2/linux/bin-llvm/../bin/icx.cfg
icx: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
icx: note: diagnostic msg: /tmp/icx-e01265/fermionic-40cf65.c
icx: note: diagnostic msg: /tmp/icx-e01265/fermionic-40cf65.sh
icx: note: diagnostic msg: 

********************
make: *** [Makefile_OneAPI:65: fermionic.o] Error 1

Compilation works flawlessly with the Classic Compiler (mpiicc).

MacOS Support

While the chances of running this code on a mac cluster are rather slim, it may yet be run on a Macbook for teaching purposes or a Mac Pro/Studio for small production runs.

Having a Makefile tuned for those machines would make sense.

CUDA Race Condition

We get the right answer if CUDA_DEVICE_BLOCKING is set to 1, so the GPU code is doing the correct logic. The answer also differs on each run, indicating a race condition.

Clover action

Implement the clover fermion action to eliminate the $\mathcal{O}(a)$ discretisation error.

Hipification of GPU code

Convert the GPU code so that it uses HIP instead of CUDA.

That'll cause issues on machines that only support CUDA, so we'll probably need to keep both around.

Things to keep in mind

  1. My strong personal dislike of American spellings has led me to correct cudaDeviceSynchronize() to cudaDeviceSynchronise(). We'll need a manual translation of these (sed?)
  2. What file extensions are standard for HIP?
  3. Do we rename every function (e.g. cuDslash) to a hip name (e.g. hipDslash) for consistency?

Doxygen

Would be good if the comments were in doxygen format to make documentation easier.

ARM Optimisations

With more ARM-based clusters coming online, having the code able to make use of them seems like sensible futureproofing.

SU2plaq return value

SU2plaq currently returns only the "scalar" component (creal(u11t)) of the plaquette, as that's all we need for the average plaquette. We should generalise it (again, as it was in FORTRAN) to return the full gauge field, and call creal in the average-plaquette routine. This will cause only a minute increase in runtime, but better sets us up for the clover term.

Optimisation of CUDA

So the CUDA code finally works (yay!); now we need to make it performant. This will involve profiling the kernels and potentially a rather large rewrite of how the lattice is passed to the various routines.

CUDA Matrix Multiplication

Now we need the multiplication routines in CUDA to produce the correct results. The good news is that they're rather similar so fixing for one function should allow us to fix the rest for the same float size.

The first thing to check is that gamval_d is used instead of gamval.

Convert Conjugate gradient to single precision

We need to copy the device float array x2_f into the managed double array x2. Because this is CUDA, a simple x2[i]=(double)x2_f[i] won't cut it. One option is to change x2_f to be managed too. We need to see where the managed array is used, though, and how that'll affect memory performance.

CUDA to SYCL

The DiRAC/Codeplay workshop on SYCL is a good chance to try making the code more portable, so we'll try that now.
