Comments (39)
I will look into this later.
from cupy.
Can you paste code to reproduce?
Sorry, the code causing the problem is part of my research, so I can't paste it.
I have no idea which part causes this problem.
So, if there is any way to debug this, please tell me and I will try to solve this.
In Cython, import pdb; pdb.set_trace() is not available, so just do print debugging. You can build cupy like:
pip uninstall cupy # just to assure another cupy is not installed
python setup.py install
and run your program.
Hmm, I cannot reproduce with:
import cupy
cupy.cuda.set_allocator(cupy.cuda.MemoryPool().malloc)
x = cupy.array([1,2,3,4], dtype=cupy.float64)
y = cupy.array([1,2,3,4], dtype=cupy.float64)
z = float(cupy.sum((x - y)**2, dtype=cupy.float64))
Let me take a note: cupy.cuda.runtime.free should be invoked only when free_all_blocks is invoked (or when Memory.__dealloc__ is called).
Line 471 in 5062f61
Thank you for your rapid replies.
I will try to create PoC code from my framework and do print debugging tomorrow on a PC in our lab (please wait ...).
I realized that:
- Normal forwarding steps in training do not seem to trigger the problem.
- Permuting the rows of a design matrix (calculating a permutation index array with numpy.random.permutation and indexing a cupy array) seems to trigger it (causing cudaErrorIllegalAddress?).
- Adam updates in backward steps seem to trigger it (not causing cudaErrorIllegalAddress).
The traceback line
File "cupy\cuda\memory.pyx", line 358, in cupy.cuda.memory.PooledMemory.free
TypeError: 'NoneType' object is not callable
tells that the pool object, which is a weakref.ref to a SingleDeviceMemoryPool, was None.
Line 358 in 5062f61
347 def __dealloc__(self):
348 if self.ptr != 0:
349 self.free()
350
351 cpdef free(self):
352 """Frees the memory buffer and returns it to the memory pool.
353
354 This function actually does not free the buffer. It just returns the
355 buffer to the memory pool for reuse.
356
357 """
358 pool = self.pool()
359 if pool and self.ptr != 0:
360 pool.free(self.ptr, self.size)
361 self.ptr = 0
362 self.size = 0
363 self.device = None
You need to be careful what you do in a __dealloc__() method. By the time your __dealloc__() method is called, the object may already have been partially destroyed and may not be in a valid state as far as Python is concerned, so you should avoid invoking any Python operations which might touch the object. In particular, don't call any other methods of the object or do anything which might cause the object to be resurrected. It's best if you stick to just deallocating C data.
It looks like the pool object can potentially be None in __dealloc__().
This looks like a potential bug which also exists in cupy 1.0.1.
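The weakref behavior behind that TypeError is easy to demonstrate in plain Python. A minimal sketch (Pool here is a hypothetical stand-in for SingleDeviceMemoryPool, not CuPy's class):

```python
import weakref

class Pool:
    """Hypothetical stand-in for SingleDeviceMemoryPool."""
    def free(self, ptr, size):
        pass

pool = Pool()
ref = weakref.ref(pool)   # PooledMemory.pool holds a weakref like this
assert ref() is pool      # referent alive: calling ref returns the pool

del pool                  # e.g. the pool dies first during interpreter teardown
assert ref() is None      # referent gone: calling ref returns None
```

If the pool is collected before the PooledMemory objects that point to it (as can happen during interpreter shutdown), every pool lookup in free() yields None.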
BTW: When I put raise RuntimeError
like below:
351 cpdef free(self):
352 """Frees the memory buffer and returns it to the memory pool.
353
354 This function actually does not free the buffer. It just returns the
355 buffer to the memory pool for reuse.
356
357 """
358 raise RuntimeError('yay!')
359 pool = self.pool()
I got the following error messages, which are similar to the reported ones:
Exception ignored in: 'cupy.cuda.memory.PooledMemory.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/memory.pyx", line 358, in cupy.cuda.memory.PooledMemory.free (cupy/cuda/memory.cpp:7482)
RuntimeError: yay!
Exception ignored in: 'cupy.cuda.memory.Memory.__dealloc__'
Traceback (most recent call last):
File "cupy/cuda/runtime.pyx", line 222, in cupy.cuda.runtime.free (cupy/cuda/runtime.cpp:3776)
File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2262)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Now, I am wondering how to fix this problem, because Cython's cdef classes do not have __del__().
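For reference, one pattern that sidesteps this limitation is weakref.finalize (available since Python 3.4): register a callback that runs when the object is collected, without touching the half-destroyed object itself. A sketch with hypothetical plain-Python classes, not the actual CuPy patch:

```python
import weakref

released = []

class Pool:
    """Hypothetical pool; records what gets returned to it."""
    def free(self, ptr, size):
        released.append((ptr, size))

class PooledMemory:
    def __init__(self, ptr, size, pool):
        self.ptr = ptr
        # The finalizer captures only plain values (ptr, size) and the
        # pool's bound method -- never `self` -- so it runs safely even
        # after the object itself has been destroyed.
        self._finalizer = weakref.finalize(self, pool.free, ptr, size)

pool = Pool()
mem = PooledMemory(0x100, 512, pool)
del mem                            # finalizer fires on collection
assert released == [(0x100, 512)]  # the buffer was returned to the pool
```

The key design point is that the cleanup never calls methods on the dying object, which is exactly what the Cython documentation quoted above warns against.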
@hiro4bbh I wrote an experimental patch master...sonots:fix_317. This patch is to use __del__()
instead of __dealloc__()
to release an object. Could you try this?
Build as:
git remote add sonots https://github.com/sonots/cupy
git remote update
git checkout -b fix_317 sonots/fix_317
git clean -fdx
python setup.py install
Thank you for your patch.
I applied the patch as you described, and then I got the following error message many times:
Traceback (most recent call last):
File "cupy\cuda\memory.pyx", line 349, in cupy.cuda.memory.PooledMemory.__del__
File "cupy\cuda\memory.pyx", line 360, in cupy.cuda.memory.PooledMemory.free
File "cupy\cuda\memory.pyx", line 485, in cupy.cuda.memory.SingleDeviceMemoryPool.free
File "cupy\cuda\memory.pyx", line 501, in cupy.cuda.memory.SingleDeviceMemoryPool.free
ValueError: list.remove(x): x not in list
Exception ignored in: <bound method PooledMemory.__del__ of <cupy.cuda.memory.PooledMemory object at 0x0000020C17A87A20>>
The number of times this error happens changes at each run... Are some free lists destroyed in some chunk operations? I think there are no multithreaded operations...
If cupy.cuda.set_allocator
is called, some calculations fail (NaN appears, or losses don't decrease in some cases when the random permutations change). cupy.cuda.set_pinned_memory_allocator
accelerates calculations, and some error messages don't appear. Does the pinned memory allocator affect this problem?
I will inspect the details while preparing PoC code.
Thank you for trying. Hmm, I will investigate.
cupy.cuda.set_pinned_memory_allocator
is used to cache pinned host (CPU) memory, not GPU memory. cupy.cuda.memory
is not a module for pinned memory, so the pinned memory allocator is probably not related to this problem.
It looks like __del__() is working anyway, although __dealloc__() was not working well. That is progress, although we still have another strange behavior.
When I implement Adam with cupy.ElementwiseKernel, the error messages mostly stop, but still rarely happen... Furthermore, when I extract the code from my framework, no error happens at all...
Maybe this error depends on the free-list manipulation performed during memory allocations/deallocations, so it would be difficult to write stable PoC code succinctly (some parts of my framework may matter).
I couldn't create PoC code yet, but I will continue trying to create it and inspecting the implementation.
While trying to create stable PoC code, I realized that cudaErrorIllegalAddress doesn't happen anymore. I think that cupy should use __dealloc__ only for pure-C data structures, so @sonots's patch is helpful. Thank you.
In some cases, when cupy.ElementwiseKernel is used, no error happens, so I suspect that some small allocations may corrupt the free list (x * y causes one allocation for the result, etc.).
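To make the split/merge bookkeeping concrete, here is a toy single-arena best-fit pool modeled loosely on what the fix_317 logs show (pop best-fit, split, push in_use; pop in_use, merge neighbors, push free_list). All names and sizes are illustrative, not CuPy's actual implementation:

```python
# Toy best-fit pool over one arena; illustrative only, not CuPy's code.
class ToyPool:
    def __init__(self, base, size):
        self.free_list = [(base, size)]   # (ptr, size) chunks
        self.in_use = {}                  # ptr -> size

    def malloc(self, size):
        # best fit: smallest free chunk large enough for the request
        fit = min((c for c in self.free_list if c[1] >= size),
                  key=lambda c: c[1])
        self.free_list.remove(fit)
        ptr, csize = fit
        if csize > size:                  # split: push the remainder back
            self.free_list.append((ptr + size, csize - size))
        self.in_use[ptr] = size
        return ptr

    def free(self, ptr):
        size = self.in_use.pop(ptr)
        for nbr in list(self.free_list):  # merge adjacent free chunks
            if nbr[0] == ptr + size:          # next chunk is free
                self.free_list.remove(nbr)
                size += nbr[1]
            elif nbr[0] + nbr[1] == ptr:      # previous chunk is free
                self.free_list.remove(nbr)
                ptr, size = nbr[0], size + nbr[1]
        self.free_list.append((ptr, size))

pool = ToyPool(0, 4096)
a = pool.malloc(512)   # e.g. the temporary produced by x * y
b = pool.malloc(512)
pool.free(a)
pool.free(b)           # merges everything back into one 4096-byte chunk
assert pool.free_list == [(0, 4096)] and pool.in_use == {}
```

The invariant is that every chunk is in exactly one of in_use or free_list; the ValueError above means a free() expected a neighbor in free_list that had already been taken out.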
Let me make sure: do you mean you still get ValueError: list.remove(x): x not in list although you do not get cudaErrorInvalidDevicePointer?
Yes. I got ValueError: list.remove(x): x not in list
sometimes (I can't figure out the pattern...). However, I didn't get cudaErrorInvalidDevicePointer
anymore.
Okay, thanks.
@hiro4bbh could you do me a favor?
I added debug print > sonots@6a6732a (this commit is pushed in sonots/fix_317
branch)
Could you run your program with this and paste the result? Please note that the result could become huge; pasting it in a separate gist would be better. If the log is too huge to paste, it is okay to filter to only the "malloc" and "free" lines.
Thanks for your patch.
I will try your patch next week, because I can't use the CUDA PC until then.
Sorry for my late reply...
I tried @sonots's patch on a CUDA machine.
I got the exceptions (fix_317_failed_stdout.txt) and fix_317 logs (about 1 MB, fix_317_failed_malloc_free.txt). The links are in my gist.
I think that we can ignore RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>. Thus the problem is only ValueError: list.remove(x): x not in list.
Thanks! But it seems the last line of fix_317_failed_malloc_free.txt is truncated:
fix_317 free(ptr=81726034432, si
Were you able to paste the entire log up to the last line where the error occurred?
Hmm, Gist may not be a good place to paste.
Can you send the logs via email? My email address is available here: https://github.com/sonots.
One more thing. I changed the log line of malloc as:
fix_317 malloc(size=512) ptr=38930 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x7f88482002e8>
Could you pull fix_317
branch again? Thank you for your cooperation.
Thank you for your email!!
In the logs you sent via email, I could not see ValueError: list.remove(x): x not in list:
C:\Program Files\Python36\lib\importlib\_bootstrap.py:205: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__
return f(*args, **kwds)
Exception ignored in: <bound method PooledMemory.__del__ of <cupy.cuda.memory.PooledMemory object at 0x00000204360EF2E8>>
Traceback (most recent call last):
File "cupy\cuda\memory.pyx", line 349, in cupy.cuda.memory.PooledMemory.__del__
File "cupy\cuda\memory.pyx", line 360, in cupy.cuda.memory.PooledMemory.free
File "cupy\cuda\memory.pyx", line 491, in cupy.cuda.memory.SingleDeviceMemoryPool.free
File "cupy\cuda\memory.pyx", line 492, in cupy.cuda.memory.SingleDeviceMemoryPool.free
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>
Exception ignored in: <bound method PooledMemory.__del__ of <cupy.cuda.memory.PooledMemory object at 0x0000020436696F28>>
Traceback (most recent call last):
File "cupy\cuda\memory.pyx", line 349, in cupy.cuda.memory.PooledMemory.__del__
File "cupy\cuda\memory.pyx", line 360, in cupy.cuda.memory.PooledMemory.free
File "cupy\cuda\memory.pyx", line 491, in cupy.cuda.memory.SingleDeviceMemoryPool.free
File "cupy\cuda\memory.pyx", line 500, in cupy.cuda.memory.SingleDeviceMemoryPool.free
RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>
Did you actually get the ValueError?
This is just a progress report. I tried to reproduce it by generating Python code like below from the logs:
import re
print('import cupy')
print('pool = cupy.cuda.MemoryPool()')
for line in open('fix_317_failed_all.txt', 'r'):
    # fix_317 malloc(size=512) ptr=81719733760 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x0000020435305BE0>
    if line.startswith('fix_317 malloc'):
        line = line.replace('fix_317 malloc(', '')
        line = line.replace(')', '')
        line = re.sub(r'PooledMemory object.*$', '', line)
        items = line.split(' ')
        d = {}
        for item in items:
            k, v = item.split('=')
            d[k] = v
        print('m{} = pool.malloc({})'.format(d['ptr'], d['size']))
    # fix_317 free(ptr=81723916288, size=24064)
    elif line.startswith('fix_317 free'):
        line = line.replace('fix_317 free(', '')
        line = line.replace(')', '')
        items = line.split(', ')
        d = {}
        for item in items:
            k, v = item.split('=')
            d[k] = v
        print('del m{}'.format(d['ptr']))
The generated code is:
import cupy
pool = cupy.cuda.MemoryPool()
m81719721984 = pool.malloc(512)
m81719722496 = pool.malloc(512)
m81719723008 = pool.malloc(512)
m81719723520 = pool.malloc(512)
m81719724032 = pool.malloc(512)
m81721819136 = pool.malloc(12288)
m81719724544 = pool.malloc(512)
m81721831424 = pool.malloc(12288)
m81723916288 = pool.malloc(24064)
m81719725056 = pool.malloc(512)
del m81723916288
m81723916288 = pool.malloc(512)
del m81719725056
m81719725056 = pool.malloc(512)
m81723916800 = pool.malloc(512)
[omitted]
But I still cannot reproduce it.
Sorry, I think I extracted the log of the non-problematic code. I will extract the log of the problematic one.
Please wait a moment...
Thank you for new logs.
Hmm, unfortunately, I could not reproduce it from the replay. I will investigate more.
$ grep -C 2 -n '81721835008' ~/fix_317_failed_all.txt
1349:fix_317 [pop best-fit free_list] ptr=81721835008 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7BC10>
1350:fix_317 [split size=512] ptr=81721835008 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7BC10>
1351:fix_317 [push in_use] ptr=81721835008 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7BC10>
1352-fix_317 malloc(size=512) ptr=81721835008 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x00000240B0A44A20>
--
2121-fix_317 free(ptr=81721834496, size=512)
2122-fix_317 [pop in_use] ptr=81721834496 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7B3F0>
2123:fix_317 [remove next free_list] ptr=81721835008 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7BC10>
2124-fix_317 free(ptr=81721833984, size=512)
This log tells us that 81721835008 was in_use, but free(ptr=81721834496, size=512) tried to remove 81721835008 from the free_list, and the error occurred.
I tried to reproduce this in my environment, but I still cannot.
@hiro4bbh could you tell me which Python version and Cython version you used?
I think there is no multithreaded operations...
Do you actually run in multiple threads? I found weird logs as below:
1684 fix_317 malloc(size=512) ptr=81723928576 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x00000240B0A44D68>
1685 fix_317 [pop best-fit free_list] ptr=81723929088 size=11264 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31798>
1686 fix_317 [split size=512] ptr=81723929088 size=11264 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31798>fix_317 free(ptr=81721839616, size=512)
1687 fix_317 [pop in_use] ptr=81721839616 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7A4C0>
1688 fix_317 [remove prev free_list] ptr=81721835520 size=4096 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31A08>
1689 fix_317 [merged] ptr=81721835520 size=4608 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31CE0>
1690 fix_317 [push free_list] ptr=81721835520 size=4608 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31CE0>
1691 fix_317 free(ptr=81721840128, size=512)
1692 fix_317 [pop in_use] ptr=81721840128 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7A938>
1693 fix_317 [remove prev free_list] ptr=81721835520 size=4608 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31CE0>
1694 fix_317 [merged] ptr=81721835520 size=5120 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31D48>
1695 fix_317 [push free_list] ptr=81721835520 size=5120 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31D48>
1696 fix_317 free(ptr=81721840640, size=512)
1697 fix_317 [pop in_use] ptr=81721840640 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31118>
1698 fix_317 [remove prev free_list] ptr=81721835520 size=5120 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31D48>
1699 fix_317 [merged] ptr=81721835520 size=5632 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31DB0>
1700 fix_317 [push free_list] ptr=81721835520 size=5632 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31DB0>
1701
1702 fix_317 [push remaining free_list] ptr=81723929600 size=10752 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7AEE8>
1703 fix_317 [push in_use] ptr=81723929088 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7AD48>
1704 fix_317 malloc(size=512) ptr=81723929088 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x00000240AE7B56A0>
where
1685 fix_317 [pop best-fit free_list] ptr=81723929088 size=11264 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31798>
1686 fix_317 [split size=512] ptr=81723929088 size=11264 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0C31798>fix_317 free(ptr=81721839616, size=512)
1702 fix_317 [push remaining free_list] ptr=81723929600 size=10752 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7AEE8>
1703 fix_317 [push in_use] ptr=81723929088 size=512 chunk=<cupy.cuda.memory.Chunk object at 0x00000240B0A7AD48>
1704 fix_317 malloc(size=512) ptr=81723929088 PooledMemory=<cupy.cuda.memory.PooledMemory object at 0x00000240AE7B56A0>
are consecutive logs.
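The list.remove failure that such an interleaving would produce can be replayed deterministically without threads, by acting out the race by hand (illustrative values only):

```python
# The single copy of the neighboring chunk, as seen by two free() calls
free_list = ["chunk@81721835008"]

def merge_neighbor():
    # each free() believes the neighbor is still in free_list and removes
    # it so the chunks can be merged
    free_list.remove("chunk@81721835008")

merge_neighbor()            # first free(): removes the neighbor, merges
msg = None
try:
    merge_neighbor()        # interleaved second free(): neighbor already gone
except ValueError as e:
    msg = str(e)
assert msg is not None and "not in list" in msg
```

With two real threads the same two removals simply race instead of running back-to-back, which would match the interleaved log lines above.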
I added code to print thread IDs on the fix_317 branch.
hiro4bbh says:
Python on Windows 10 x64 is version 3.6.2, and Cython is version 0.26.
Also, I got logs from hiro4bbh-san, and it seemed only one thread was running when the error occurred; the latter part of the logs uses a different thread, but that is probably not related to the error.
I tried with the same Python and Cython versions, but I could not reproduce. I now suspect the Windows environment, but I do not have a Windows machine...
I am not sure whether this helps, but I made a thread-safe implementation at master...sonots:fix_317. Can you try it?
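For illustration, the thread-safety idea can be sketched as serializing every free_list mutation behind a single lock (an assumption about the patch's intent, not its actual code):

```python
import threading

class LockedFreeList:
    """Sketch: all free_list mutations go through one lock."""
    def __init__(self, chunks):
        self._lock = threading.Lock()
        self._free = list(chunks)

    def pop_chunk(self):
        with self._lock:           # no other malloc/free can interleave here
            return self._free.pop() if self._free else None

    def push_chunk(self, chunk):
        with self._lock:
            self._free.append(chunk)

fl = LockedFreeList(range(1000))
popped = []

def worker():
    while True:
        c = fl.pop_chunk()
        if c is None:
            break
        popped.append(c)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# with the lock, every chunk is popped exactly once, by exactly one thread
assert sorted(popped) == list(range(1000))
```

Without the lock, two threads could both observe the same chunk and one of them would hit exactly the list.remove / double-pop inconsistency seen in the logs.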
I tried several cases to reproduce the bug in your previous patch, but I couldn't.
I will try your latest patch. If there is no problem, I will use that version, and I will report how it works.
Thank you for your patch!
I confirmed that your latest patch didn't fail; I couldn't reproduce the bug.
Thank you!