Comments (12)
I'm currently away on an internship and thus not working on accelerate right now, but if I have a spare moment I'll try to look into it. Sorry about that!
from accelerate.
Hi tmcdonell, thank you for your response. I hope your internship goes well.
By the way, I've managed to write a working simulator by using Cell (Acc (Array DIM2 Real)) instead of Acc (Array DIM2 (Cell Real)) as the representation of the state, where type Cell a = ((a,a,a), (a,a,a), (a,a,a), a).
However, the Accelerate implementation was about 500 times slower than its CUDA counterpart, sadly. This is probably due to my awkwardness in using Accelerate.
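For concreteness, the two representations above can be sketched like this (a sketch only; Real' stands in for the unspecified Real, and the type synonym names StateSoA/StateAoS are mine):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM2)

type Real' = Float  -- standing in for the unspecified `Real`

-- Nine components plus one extra scalar per lattice site.
type Cell a = ((a, a, a), (a, a, a), (a, a, a), a)

-- Works: a tuple of arrays, one array per cell component
-- (a "structure of arrays" layout).
type StateSoA = Cell (Acc (Array DIM2 Real'))

-- Fails: a single array of tuples ("array of structures" layout).
type StateAoS = Acc (Array DIM2 (Cell Real'))
```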
Here I have prepared some source code to illustrate the problem:
https://github.com/nushio3/accelerate-test/blob/fa6e7b3b92e8d2ab357dad26d133c591d5756ef1/step05/OptTest.hs
You can run it like this:
> ./OptTest.hs 1 /dev/null
. . . .
success.
Note that in lines 72-73 I have:
instance Num AWR where
a+b = A.use $ run $ A.zipWith (+) a b
a-b = A.use $ run $ A.zipWith (-) a b
Since A.use . run is semantically equal to id, we should be able to remove those calls. But when I do so:
instance Num AWR where
a+b = A.zipWith (+) a b
a-b = A.use $ run $ A.zipWith (-) a b
I get this:
> ./OptTest.hs 1 /dev/null
. . . .
OptTest.hs:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:886 ((+++)): Precondition violated
Or by doing this:
instance Num AWR where
a+b = A.use $ run $ A.zipWith (+) a b
a-b = A.zipWith (-) a b
I get this:
> ./OptTest.hs 1 /dev/null
. . . .
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:321 (convertSharingAcc (prjIdx)): inconsistent valuation; sa = 51; env = [57]
Compared to id, the effect of A.use . run is to force a smaller AST, which hinders the optimizations. I guess there are some bugs in the optimization routines?
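To make that identity explicit (a sketch assuming the CUDA backend's run; the name roundTrip is mine):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Arrays)
import Data.Array.Accelerate.CUDA (run)

-- Semantically the identity, but operationally a barrier: `run` executes
-- the computation on the device and returns a host-side array, and `use`
-- embeds that array back into a fresh Acc term. The AST is therefore cut
-- at this point, so each arithmetic operation is compiled and launched
-- in isolation instead of being fused into one kernel.
roundTrip :: Arrays a => Acc a -> Acc a
roundTrip = A.use . run
```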
Trevor,
What do you think might be the problem here?
Manuel
On 14/07/2011 at 22:05, nushio3 wrote:
Hello,
I'm trying to use Accelerate for hydrodynamic simulations.
As a training exercise, I'm writing a Lattice-Boltzmann solver with Accelerate. The program is still under construction. I have also expressed what I want to write in C++ and CUDA; they are main-omp.cpp and main-cuda.cu in the same folder.
To begin with, I wrote a function to initialize the array in Accelerate (it corresponds to the function 'initialize()' in fluid.h), but it fails with a 'submit a bug report' error. It says 'too many resources requested,' so I looked at the printout of Accelerate's kernel, but to me it looks normal.
Am I doing something wrong, so that I'm wasting resources? Or should I decrease e.g. the resolution?
./MainAcc.hs 0
... some warnings omitted ...
map
(\x0 -> (+) ((+) ((+) ((+) ((+) ((+) ((+) ((+) (2 (3 x0),
1 (3 x0)),
0 (3 x0)),
2 (2 x0)),
1 (2 x0)),
0 (2 x0)),
2 (1 x0)),
1 (1 x0)),
0 (1 x0)))
(generate
  (Z :. 1024 :. 768)
  (\x0 -> ((0.0,0.0,0.0),
           (0.1,
            0.7,
            (+) (0.2,
                 (*) (1.0e-3,
                      (/) ((*) (12.0, fromIntegral (indexHead x0)), 768.0)))),
           (0.0,0.0,0.0),
           ((<) ((+) ((*) (64.0,
                           (*) ((-) (fromIntegral (indexHead (indexTail x0)),
                                     (/) (768.0, 6.0)),
                                (-) (fromIntegral (indexHead (indexTail x0)),
                                     (/) (768.0, 6.0)))),
                      (*) ((-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)),
                           (-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)))),
                 (*) ((/) (768.0, 24.0), (/) (768.0, 24.0)))) ?
            (1.0, 0.0))))
MainAcc.hs:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/CUDA.hs:59 (unhandled): CUDA Exception: too many resources requested for launch
Reply to this email directly or view it on GitHub:
https://github.com/mchakravarty/accelerate/issues/25
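For reference, the interesting parts of the dumped generate above appear to correspond to source expressions along these lines (a hedged reconstruction: the helper names velocityZ and bubble are mine, Real' stands in for the unspecified Real, and the (*) occurrences are inferred from the dump, whose operator symbols were stripped):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp)

type Real' = Float  -- assumed; the original `Real` is not shown

-- Third velocity component; x is the fastest-varying (column) index.
velocityZ :: Exp Real' -> Exp Real'
velocityZ x = 0.2 + 1.0e-3 * (12.0 * x / 768.0)

-- Indicator for an elliptical bubble: 1.0 inside, 0.0 outside.
bubble :: Exp Real' -> Exp Real' -> Exp Real'
bubble x y =
  let dy = y - 768.0 / 6.0
      dx = x - 768.0 / 2.0
      r  = 768.0 / 24.0
  in (64.0 * (dy * dy) + dx * dx A.< r * r) A.? (1.0, 0.0)
```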
I cannot reproduce the first bug report, unfortunately. Specs for my test machine follow; as you can see, it is not one of the high-end CUDA cards. Which version of the CUDA toolkit are you using? I haven't tried with the 4.x series yet, so maybe that has something to do with it (for example, if the way device capabilities are reported has changed). I'll test that next...
Prelude Foreign.CUDA.Driver> props =<< device 0
DeviceProperties {deviceName = "GeForce GT 120", computeCapability = 1.1, totalGlobalMem = 268107776, totalConstMem = 65536, sharedMemPerBlock = 16384, regsPerBlock = 8192, warpSize = 32, maxThreadsPerBlock = 512, maxBlockSize = (512,512,64), maxGridSize = (65535,65535,1), maxTextureDim1D = 8192, maxTextureDim2D = (65536,32768), maxTextureDim3D = (2048,2048,2048), clockRate = 1250000, multiProcessorCount = 4, memPitch = 2147483647, textureAlignment = 256, computeMode = Default, deviceOverlap = True, concurrentKernels = False, eccEnabled = False, kernelExecTimeoutEnabled = True, integrated = False, canMapHostMemory = True}
For the second, the program runs without the use . run statements if using @sseefried's patch for issue #22, although I'm not sure of the status of that patch relative to your own changes to sharing recovery.
Thank you, tmcdonell, for your effort. Let me try the ghci trick:
Prelude> :m +Foreign.CUDA.Driver
Prelude Foreign.CUDA.Driver> props =<< device 0
Loading package extensible-exceptions-0.1.1.2 ... linking ... done.
Loading package bytestring-0.9.1.10 ... linking ... done.
Loading package cuda-0.3.2.2 ... linking ... done.
*** Exception: CUDA Exception: driver not initialised
... didn't work for me. I'm using a Tesla M2050 (compute capability 2.0) with CUDA 3.2. I'll upload the result of deviceQuery if you need it. I have also tried a CUDA 4.0 environment, but I couldn't install cuda-0.3.2.2 from Hackage (the latest version) into the CUDA 4.0 environment.
I'm trying the patch 5c24257 now...
Ah, first you need to run initialise []; sorry for the omission. No matter --- the model number and driver version tell me everything I was interested in. I am also using nvcc version 3.2. Do you happen to be running a 64-bit version of GHC?
I have only done light testing on compute-2.0 devices since I only briefly had access to one. I recall there being some problems when the 2.0 series devices were released; maybe this is why the first example works on my 1.x series card but not your own...
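In full, the GHCi trick above amounts to something like this (a sketch, using the cuda package's Foreign.CUDA.Driver module):

```haskell
import Foreign.CUDA.Driver (initialise, device, props)

-- Query the properties of device 0; `initialise` must precede
-- any other Driver API call, else "driver not initialised" is raised.
main :: IO ()
main = do
  initialise []
  p <- props =<< device 0
  print p
```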
Thanks, tmcdonell; with initialise I could query the device with props =<< device 0.
With patch 5c24257, I could compile the code without use . run. Now benchmarking.
Any progress on this problem?
Nice to hear from you again! I haven't tried accelerate since ICFP 2011, where I was able to compute what I wanted in accelerate (but it was slow). Maybe it's a good time for me to try the latest accelerate again!
Good to hear from you as well. There have been many changes to Accelerate in the last few months. So, it may indeed be worthwhile to have another look.
I'm going to go ahead and close this issue, as both of the example programs work now (it is still slow, but that's a different issue).
Congratulations on your recent release of Paraiso!
Thank you for your congratulations!
I've seen that Ryan Newton has come on board and that accelerate has recently been making rapid progress. I'd really like to try it again, but I have something else to do first...
Please keep up the good work!
2012/6/20 Trevor L. McDonell
[email protected]:
I'm going to go ahead and close this issue, as both of the example programs work now (it is still slow, but that's a different issue).
Congratulations on your recent release of Paraiso!
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html