Comments (7)
With cudatoolkit/11.5, running cuda-gdb gives an error:
(impactx) mgarten@nid001512:/pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG> cuda-gdb
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
Python path configuration:
PYTHONHOME = (not set)
PYTHONPATH = '/opt/cray/pe/python/3.9.7.1'
program name = 'python3'
isolated = 0
environment = 1
user site = 1
import site = 1
sys._base_executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
sys.base_prefix = '/opt/cray/pe/python/3.9.7.1'
sys.base_exec_prefix = '/opt/cray/pe/python/3.9.7.1'
sys.platlibdir = 'lib'
sys.executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
sys.prefix = '/opt/cray/pe/python/3.9.7.1'
sys.exec_prefix = '/opt/cray/pe/python/3.9.7.1'
sys.path = [
'/opt/cray/pe/python/3.9.7.1',
'/opt/cray/pe/python/3.9.7.1/lib/python39.zip',
'/opt/cray/pe/python/3.9.7.1/lib/python3.9',
'/opt/cray/pe/python/3.9.7.1/lib/python3.9/lib-dynload',
]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
File "<frozen importlib._bootstrap_external>", line 846, in exec_module
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
File "<frozen importlib._bootstrap_external>", line 951, in get_code
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
SystemError: <class 'memoryview'> returned NULL without setting an error
But swapping it out for cudatoolkit/11.0 lets me run the debugger.
cuda-gdb run
(cuda-gdb) file impactx
Reading symbols from impactx...done.
(cuda-gdb) run input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
Starting program: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG/impactx input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
add-auto-load-safe-path /opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[New Thread 0x7fffe5ed7000 (LWP 67128)]
Initializing CUDA...
[Detaching after fork from child process 67129]
[New Thread 0x7fffdbbb0000 (LWP 67141)]
[New Thread 0x7fffdb3af000 (LWP 67142)]
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
CUDA initialized with 1 GPU per MPI rank; 1 GPU(s) used in total
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
AMReX (22.06-39-g2d931f63cb4d) initialized
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
boxArray(0) (BoxArray maxbox(1)
m_ref->m_hash_sig(0)
((0,0,0) (7,7,7) (0,0,0)) )
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
Beam kinetic energy (MeV): 2000
Bunch charge (C): 0
Particle type: electron
Number of particles: 10000
Beam distribution type: waterbag
Static units
Initialized beam distribution parameters
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
# of particles: 10000
Initialized element list
++++ Starting step=0
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)
CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x6f5c240 (Drift.H:69)
Thread 1 "impactx" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 61, block (0,0,0), thread (128,0,0), device 0, sm 0, warp 4, lane 0]
0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=<optimized out>, py=<optimized out>, pt=<optimized out>, refpart=...)
at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
69 p.pos(0) = x + m_ds * px;
Backtrace
(cuda-gdb) backtrace
#0 0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=<optimized out>, py=<optimized out>, pt=<optimized out>, refpart=...)
at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
#1 impactx::detail::PushSingleParticle<impactx::Drift const&>::operator() (this=0x7fffddfffbf8, i=<optimized out>) at /global/homes/m/mgarten/src/impactx/src/particles/Push.cpp:81
#2 amrex::detail::call_f<impactx::detail::PushSingleParticle<impactx::Drift const&>, int> (f=..., i=<optimized out>)
at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:752
#3 0x00000000070cd460 in _ZZN5amrex11ParallelForIiRKN7impactx6detail18PushSingleParticleIRKNS1_5DriftEEEvEENSt9enable_ifIXsr5amrex19MaybeDeviceRunnableIT0_vEE5valueEvE4typeERKNS_3Gpu10KernelInfoET_OSB_ENKUlvE_clEv (this=<optimized out>) at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:802
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
I could not see any values for variables because the compiler optimizes them out in Drift.H.
69 p.pos(0) = x + m_ds * px;
(cuda-gdb) print px
$1 = <optimized out>
(cuda-gdb) print x
$2 = <optimized out>
(cuda-gdb) print p
$3 = (@local _ZN7impactx5Drift5PTypeE & @local) <error reading variable>
(cuda-gdb) break Drift.H:67
So I built again with the options -g -O0 and hopefully I will see more.
Edit:
... I actually tried to build it again without optimization but it still shows <optimized out>. Should I have deleted the build directory completely before?
from impactx.
The object p is a complicated struct, so I think the final line makes sense. I'm not sure if gdb will allow a print p.pos(0), etc.
In the end, the current AMReX particle AoS object p is really just a
struct {
amrex::ParticleReal r[n];
int i[m];
};
You could check in cuda-gdb whether the object p itself is valid memory (on the device) by printing its address and checking its range, and then printing its first member (which we interpret as position x).
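As a rough illustration of the layout described above, here is a minimal host-side C++ sketch. The names ParticleSketch and first_real_at, and the sizes n = 4 and m = 2, are hypothetical stand-ins, and ParticleReal is assumed to be double; the real AMReX type is templated and may differ. The sketch only shows why reading the first ParticleReal at the address of p amounts to reading position x.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Assumption: ParticleReal is double in this build.
using ParticleReal = double;

// Hypothetical stand-in for the AoS particle; n = 4 and m = 2 are made up.
struct ParticleSketch {
    ParticleReal r[4];  // real components; r[0] is interpreted as position x
    int i[2];           // integer components
};

// The real components start at offset 0, so the object's address is also
// the address of its first real component.
static_assert(offsetof(ParticleSketch, r) == 0,
              "r[0] sits at the start of the object");

// Read the first real component given only the object's address,
// i.e. what inspecting the raw memory at &p would show.
inline ParticleReal first_real_at(void const* address) {
    assert(address != nullptr);
    ParticleReal x;
    std::memcpy(&x, address, sizeof x);  // byte copy avoids aliasing issues
    return x;
}
```

The analogous check in the debugger is to print the address of p and then dereference it as a ParticleReal; the exact cast syntax depends on the memory space that cuda-gdb reports for p.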
> ... I actually tried to build it again without optimization but it still shows <optimized out>. Should I have deleted the build directory completely before?
Yes, you need to redo the configure step with a fresh build dir. CXXFLAGS are only added at the first configure in a build directory (they change defaults for the configure step).
That should work in general... doing it with a single configure is the safest bet if you are unsure, though.
You can configure with -DCMAKE_VERBOSE_MAKEFILE=ON if you are unsure what's ending up on the compiler line and want to see.
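Besides reading the verbose compiler line, a translation unit can report how it was compiled. The following is a minimal sketch, assuming a GCC-compatible host compiler: __OPTIMIZE__ is predefined by GCC and Clang whenever optimization is enabled, and build_flavor is a hypothetical helper name. It only reflects the host compilation; device code under nvcc is controlled separately (for example by -G), which this check does not detect.

```cpp
#include <cassert>
#include <string>

// Describe the flags this translation unit was built with.
// Assumption: a GCC-compatible host compiler, which defines __OPTIMIZE__
// for any optimizing compilation (-O1 and above) and leaves it undefined at -O0.
inline std::string build_flavor() {
    std::string s;
#if defined(__OPTIMIZE__)
    s += "optimized";
#else
    s += "unoptimized";
#endif
#if defined(NDEBUG)
    s += ", NDEBUG";      // assert() is compiled out
#else
    s += ", asserts on";  // assert() is active
#endif
    assert(!s.empty());
    return s;
}
```

Printing build_flavor() once at startup would show whether a stale build directory is still compiling with the old optimization flags.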
Memo from our discussion:
- Debug workflow: https://warpx.readthedocs.io/en/latest/usage/workflows/debugging.html
- cuda-gdb with AMReX runtime options amrex.throw_exception = 1 amrex.signal_handling = 0
> yes, you need to redo the configure step with a fresh build dir. CXXFLAGS are only added at the first configure in a build directory (they change defaults for the configure step).
But deleting build, running cmake -S . -B build and then doing ccmake build, editing stuff, hitting c to configure and g to generate should work, no?
cc @WeiqunZhang @atmyers @kngott turns out this is in part a bug in AMReX init with GPU-aware MPI on Perlmutter.
If I set export MPICH_GPU_SUPPORT_ENABLED=0, the issue Cuda API error detected: cuPointerGetAttribute returned (0x1) vanishes. Backtrace:
- Src/Base/AMReX_ParallelDescriptor.H:907
- Src/Base/AMReX_GpuDevice.cpp:299
- Src/Base/AMReX.cpp:432
- src/initialization/InitAMReX.cpp:32
- impactx/src/main.cpp:27
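Since the warning depends on an environment variable, it can help to log its state at startup when comparing runs. A minimal sketch follows; both helper names are hypothetical and not part of AMReX or ImpactX, and treating only the exact value 1 as "enabled" is an assumption about how the MPI library reads the variable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Interpret the raw value of MPICH_GPU_SUPPORT_ENABLED.
// Assumption: only the exact string "1" means enabled; unset means disabled.
inline bool gpu_support_flag_enabled(char const* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Query the current process environment (hypothetical helper).
inline bool gpu_aware_mpi_requested() {
    return gpu_support_flag_enabled(std::getenv("MPICH_GPU_SUPPORT_ENABLED"));
}
```

Printing the result of gpu_aware_mpi_requested() next to the AMReX init messages would make it easy to see which runs had GPU-aware MPI switched on.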
The other issue above occurs when we try to access fundamental types (not even pointers) of lattice elements on device, e.g., the amrex::ParticleReal m_ds member: CUDA Exception: Warp Illegal Address. The problem is so weird that I am starting to think it's a compiler bug... and it probably is: #174