n01r commented on July 16, 2024

With cudatoolkit/11.5, running cuda-gdb gives an error

(impactx) mgarten@nid001512:/pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG> cuda-gdb
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = '/opt/cray/pe/python/3.9.7.1'
  program name = 'python3'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
  sys.base_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.base_exec_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.platlibdir = 'lib'
  sys.executable = '/global/homes/m/mgarten/sw/perlmutter/venvs/impactx/bin/python3'
  sys.prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.exec_prefix = '/opt/cray/pe/python/3.9.7.1'
  sys.path = [
    '/opt/cray/pe/python/3.9.7.1',
    '/opt/cray/pe/python/3.9.7.1/lib/python39.zip',
    '/opt/cray/pe/python/3.9.7.1/lib/python3.9',
    '/opt/cray/pe/python/3.9.7.1/lib/python3.9/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "<frozen importlib._bootstrap_external>", line 846, in exec_module
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
  File "<frozen importlib._bootstrap_external>", line 951, in get_code
cuda-gdb: warning: PyMemoryView_FromObject: called while Python is not available!
SystemError: <class 'memoryview'> returned NULL without setting an error

But swapping it out for cudatoolkit/11.0 lets me run the debugger.

cuda-gdb run
(cuda-gdb) file impactx
Reading symbols from impactx...done.
(cuda-gdb) run input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
Starting program: /pscratch/sd/m/mgarten/impactx/001_FODO_single-GPU_DEBUG/impactx input_fodo.in amrex.throw_exception=1 amrex.signal_handling=0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
	add-auto-load-safe-path /opt/cray/pe/gcc/11.2.0/snos/lib64/libstdc++.so.6.0.29-gdb.py
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/global/homes/m/mgarten/.cuda-gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
[New Thread 0x7fffe5ed7000 (LWP 67128)]
Initializing CUDA...
[Detaching after fork from child process 67129]
[New Thread 0x7fffdbbb0000 (LWP 67141)]
[New Thread 0x7fffdb3af000 (LWP 67142)]
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

CUDA initialized with 1 GPU per MPI rank; 1 GPU(s) used in total
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
AMReX (22.06-39-g2d931f63cb4d) initialized
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

boxArray(0) (BoxArray maxbox(1)
       m_ref->m_hash_sig(0)
       ((0,0,0) (7,7,7) (0,0,0)) )

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

Beam kinetic energy (MeV): 2000
Bunch charge (C): 0
Particle type: electron
Number of particles: 10000
Beam distribution type: waterbag
Static units
Initialized beam distribution parameters
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

# of particles: 10000
Initialized element list
 ++++ Starting step=0
warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)

warning: Cuda API error detected: cuPointerGetAttribute returned (0x1)


CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x6f5c240 (Drift.H:69)

Thread 1 "impactx" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 61, block (0,0,0), thread (128,0,0), device 0, sm 0, warp 4, lane 0]
0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=<optimized out>, py=<optimized out>, pt=<optimized out>, refpart=...)
    at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
69	            p.pos(0) = x + m_ds * px;

Backtrace

(cuda-gdb) backtrace
#0  0x0000000006f5c250 in impactx::Drift::operator() (this=0x131f9ad0, p=..., px=<optimized out>, py=<optimized out>, pt=<optimized out>, refpart=...)
    at /global/homes/m/mgarten/src/impactx/src/particles/elements/Drift.H:69
#1  impactx::detail::PushSingleParticle<impactx::Drift const&>::operator() (this=0x7fffddfffbf8, i=<optimized out>) at /global/homes/m/mgarten/src/impactx/src/particles/Push.cpp:81
#2  amrex::detail::call_f<impactx::detail::PushSingleParticle<impactx::Drift const&>, int> (f=..., i=<optimized out>)
    at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:752
#3  0x00000000070cd460 in _ZZN5amrex11ParallelForIiRKN7impactx6detail18PushSingleParticleIRKNS1_5DriftEEEvEENSt9enable_ifIXsr5amrex19MaybeDeviceRunnableIT0_vEE5valueEvE4typeERKNS_3Gpu10KernelInfoET_OSB_ENKUlvE_clEv (this=<optimized out>) at /global/u1/m/mgarten/src/impactx/build/_deps/fetchedamrex-src/Src/Base/AMReX_GpuLaunchFunctsG.H:802
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I could not see the values of any variables in Drift.H because the compiler optimized them out:

69	            p.pos(0) = x + m_ds * px;
(cuda-gdb) print px
$1 = <optimized out>
(cuda-gdb) print x
$2 = <optimized out>
(cuda-gdb) print p
$3 = (@local _ZN7impactx5Drift5PTypeE & @local) <error reading variable>
(cuda-gdb) break Drift.H:67

So I rebuilt with -g -O0 and hope to see more.

Edit:
... I actually tried to build it again without optimization, but it still shows <optimized out>. Should I have deleted the build directory completely first?

from impactx.

cemitch99 commented on July 16, 2024

The object p is a complicated struct, so I think the last output (<error reading variable>) makes sense. I'm not sure whether gdb will allow print p.pos(0), etc.

ax3l commented on July 16, 2024

In the end, the current AMReX particle AoS object p is really just a

struct {
   amrex::ParticleReal r[n];
   int i[m];
};
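
A slightly more concrete sketch of that layout, assuming a double-precision build and using hypothetical names (ParticleSketch, m_rdata, m_idata, address_is_x) rather than the real AMReX identifiers. It checks at compile time that the reals come first, which is why the address of p is also the address of position x:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: double-precision build (amrex::ParticleReal may also be float).
using ParticleReal = double;

// Hypothetical stand-in for an AMReX AoS particle: NReal reals followed by
// NInt ints. Names are illustrative, not the actual AMReX members.
template <int NReal, int NInt>
struct ParticleSketch {
    ParticleReal m_rdata[NReal];
    int m_idata[NInt];

    // Accessor in the spirit of p.pos(0) in Drift.H: component 0 is x.
    ParticleReal& pos(int d) { return m_rdata[d]; }
};

// Alias avoids the comma inside offsetof's macro argument list.
using Particle3R2I = ParticleSketch<3, 2>;

// The reals come first, so the object starts with position x.
static_assert(offsetof(Particle3R2I, m_rdata) == 0, "x must sit at offset 0");

// True if the object's address coincides with the address of pos(0).
inline bool address_is_x(Particle3R2I& p) {
    return static_cast<void*>(&p) == static_cast<void*>(&p.pos(0));
}
```

In the debugger the same idea means reading the first ParticleReal at the address of p directly, rather than calling an accessor such as pos(), which gdb may be unable to call if it was only ever inlined.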

You could check in cuda-gdb whether the object p itself is valid (device) memory by printing its address and checking its range, and then printing its first member (which we interpret as position x).
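
The range check described above is plain address arithmetic. A minimal sketch in C++, where base and n stand for the start and length of the particle array as you would read them in the debugger (the helper name points_into is hypothetical). It only says whether p lies inside that array on an element boundary, not whether the memory is actually accessible from the device:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Is p inside [base, base + n) and aligned to a whole element?
// Hypothetical helper mirroring the manual check done in cuda-gdb.
template <typename T>
bool points_into(const T* p, const T* base, std::size_t n) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto lo = reinterpret_cast<std::uintptr_t>(base);
    const auto hi = lo + n * sizeof(T);
    return addr >= lo && addr < hi && (addr - lo) % sizeof(T) == 0;
}
```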

> ... I actually tried to build it again without optimization but it still shows <optimized out>. Should I have deleted the build directory completely before?

yes, you need to redo the configure step with a fresh build dir. CXXFLAGS are only added at the first configure in a build directory (they change defaults for the configure step).
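
A toy model (not CMake's actual code) of why a re-configure does not pick up new CXXFLAGS: the environment variable only seeds the CMAKE_CXX_FLAGS cache entry when that entry does not exist yet. std::map::emplace has the same insert-only-if-absent behavior; configure and Cache are hypothetical names:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a build directory's CMakeCache.txt.
using Cache = std::map<std::string, std::string>;

// Models one configure run with the given CXXFLAGS environment value.
// emplace does nothing if the key is already cached, like a re-configure
// in an existing build directory.
void configure(Cache& cache, const std::string& env_cxxflags) {
    cache.emplace("CMAKE_CXX_FLAGS", env_cxxflags);
}
```

Re-running configure on an existing cache keeps the old flags; only a fresh cache (a new build directory) takes the new CXXFLAGS.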

ax3l commented on July 16, 2024

That should work in general, though doing it with a single configure is the safest bet if you are unsure.
If you want to see what ends up on the compiler command line, configure with -DCMAKE_VERBOSE_MAKEFILE=ON.
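
When reading the verbose compile lines, one thing worth checking is which -O option wins: GCC documents that when several -O options are given, the last one is effective, so a -O0 from CXXFLAGS can be overridden by a later -O2 or -O3. A small sketch that extracts the last -O token from such a line; it assumes a plain whitespace-separated command line (no quoted arguments), and effective_opt_flag is a hypothetical name:

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Returns the last -O option on a compiler command line, if any.
std::optional<std::string> effective_opt_flag(const std::string& cmdline) {
    std::istringstream tokens(cmdline);
    std::optional<std::string> last;
    for (std::string tok; tokens >> tok;) {
        if (tok.rfind("-O", 0) == 0) {  // token starts with "-O"
            last = tok;
        }
    }
    return last;
}
```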

ax3l commented on July 16, 2024

Memo from our discussion:

n01r commented on July 16, 2024

> yes, you need to redo the configure step with a fresh build dir. CXXFLAGS are only added at the first configure in a build directory (they change defaults for the configure step).

But deleting the build directory, running cmake -S . -B build, then running ccmake build, editing settings, and pressing c to configure and g to generate should work, no?

ax3l commented on July 16, 2024

cc @WeiqunZhang @atmyers @kngott: it turns out this is partly a bug in AMReX initialization with GPU-aware MPI on Perlmutter.

If I run export MPICH_GPU_SUPPORT_ENABLED=0, the warning "Cuda API error detected: cuPointerGetAttribute returned (0x1)" vanishes. Backtrace:

The other issue above occurs when we try to access fundamental-type members (not even pointers) of lattice elements on the device, e.g., the amrex::ParticleReal m_ds member: CUDA Exception: Warp Illegal Address. The problem is so weird that I am starting to think it's a compiler bug... and it probably is: #174
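
For context on the backtrace: frame #1 instantiates PushSingleParticle<impactx::Drift const&>, i.e. the pusher holds a reference to the element rather than a copy. A reference is just an address, so reading m_ds goes through that address; if it pointed to memory the GPU cannot access, the read would fault. This thread does not establish that as the cause (the suspicion above is a compiler bug, #174). The host-only sketch below merely contrasts the two storage modes; Drift here is a minimal mock and PushSingleParticleSketch is a hypothetical name:

```cpp
#include <cassert>

// Minimal mock of a lattice element; the real impactx::Drift has more members.
struct Drift {
    double m_ds;  // drift length (amrex::ParticleReal in ImpactX)
    // Same update as Drift.H:69, reduced to the x coordinate.
    double push_x(double x, double px) const { return x + m_ds * px; }
};

// Element = Drift stores a copy of the element inside the pusher;
// Element = const Drift& stores only the address of the original.
template <typename Element>
struct PushSingleParticleSketch {
    Element elem;
    double operator()(double x, double px) const { return elem.push_x(x, px); }
};
```

Changing the original element after building both pushers shows the difference: the by-value pusher keeps its own m_ds, while the by-reference one reads whatever is at the referenced address.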
