Comments (12)
I'm currently away on an internship and thus not working on accelerate right now, but if I have a spare moment I'll try to look into it. Sorry about that!
from accelerate.
Hi tmcdonell, thank you for your response. I hope your internship goes well.
By the way, I've managed to write a working simulator by using Cell (Acc (Array DIM2 Real)) instead of Acc (Array DIM2 (Cell Real)) as the representation of the state, where type Cell a = ((a,a,a), (a,a,a), (a,a,a), a).
However, the Accelerate implementation was about 500 times slower than its CUDA counterpart, sadly. This is probably due to my awkwardness in using Accelerate.
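For concreteness, the two representations above can be sketched like this (a sketch only; Real' stands in for the unspecified Real, and the type synonym names StateSoA/StateAoS are mine):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Array, DIM2)

type Real' = Float  -- standing in for the unspecified `Real`

-- Nine components plus one extra scalar per lattice site.
type Cell a = ((a, a, a), (a, a, a), (a, a, a), a)

-- Works: a tuple of arrays, one array per cell component
-- (a "structure of arrays" layout).
type StateSoA = Cell (Acc (Array DIM2 Real'))

-- Fails: a single array of tuples ("array of structures" layout).
type StateAoS = Acc (Array DIM2 (Cell Real'))
```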
Here I have prepared some source code to illustrate the problem:
https://github.com/nushio3/accelerate-test/blob/fa6e7b3b92e8d2ab357dad26d133c591d5756ef1/step05/OptTest.hs
You can run it like this:
> ./OptTest.hs 1 /dev/null
. . . .
success.
Note that in lines 72-73 I have:
instance Num AWR where
a+b = A.use $ run $ A.zipWith (+) a b
a-b = A.use $ run $ A.zipWith (-) a b
Since A.use . run is semantically equal to id, we should be able to remove those calls. But when I do so:
instance Num AWR where
a+b = A.zipWith (+) a b
a-b = A.use $ run $ A.zipWith (-) a b
I get this:
> ./OptTest.hs 1 /dev/null
. . . .
OptTest.hs:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:886 ((+++)): Precondition violated
Or by doing this:
instance Num AWR where
a+b = A.use $ run $ A.zipWith (+) a b
a-b = A.zipWith (-) a b
I get this:
> ./OptTest.hs 1 /dev/null
. . . .
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/Smart.hs:321 (convertSharingAcc (prjIdx)): inconsistent valuation; sa = 51; env = [57]
Compared to id, the effect of A.use . run is to force a smaller AST, which hinders the optimizations. I guess there are some bugs in the optimization routines?
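To make that identity explicit (a sketch assuming the CUDA backend's run; the name roundTrip is mine):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Acc, Arrays)
import Data.Array.Accelerate.CUDA (run)

-- Semantically the identity, but operationally a barrier: `run` executes
-- the computation on the device and returns a host-side array, and `use`
-- embeds that array back into a fresh Acc term. The AST is therefore cut
-- at this point, so each arithmetic operation is compiled and launched
-- in isolation instead of being fused into one kernel.
roundTrip :: Arrays a => Acc a -> Acc a
roundTrip = A.use . run
```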
Trevor,
What do you think might be the problem here?
Manuel
On 14/07/2011 at 22:05, nushio3 wrote:
Hello,
I'm trying to use Accelerate for hydrodynamic simulations.
As a training exercise, I'm writing a Lattice-Boltzmann solver with Accelerate. The program is still under construction. I have also expressed what I want to write in C++ and CUDA; they are main-omp.cpp and main-cuda.cu in the same folder.
To begin with, I wrote a function to initialize the array in Accelerate (it corresponds to the function 'initialize()' in fluid.h), but it fails with a 'submit a bug report' error. It says 'too many resources requested,' so I looked at the printout of Accelerate's kernel, but to me it looks normal.
Am I doing something wrong, so that I'm wasting resources? Or should I decrease e.g. the resolution?
./MainAcc.hs 0
... some warnings omitted ...
map
(\x0 -> (+) ((+) ((+) ((+) ((+) ((+) ((+) ((+) (2 (3 x0),
1 (3 x0)),
0 (3 x0)),
2 (2 x0)),
1 (2 x0)),
0 (2 x0)),
2 (1 x0)),
1 (1 x0)),
0 (1 x0)))
(generate
  (Z :. 1024 :. 768)
  (\x0 -> ((0.0,0.0,0.0),
           (0.1,
            0.7,
            (+) (0.2,
                 (*) (1.0e-3,
                      (/) ((*) (12.0, fromIntegral (indexHead x0)), 768.0)))),
           (0.0,0.0,0.0),
           ((<) ((+) ((*) (64.0,
                           (*) ((-) (fromIntegral (indexHead (indexTail x0)),
                                     (/) (768.0, 6.0)),
                                (-) (fromIntegral (indexHead (indexTail x0)),
                                     (/) (768.0, 6.0)))),
                      (*) ((-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)),
                           (-) (fromIntegral (indexHead x0), (/) (768.0, 2.0)))),
                 (*) ((/) (768.0, 24.0), (/) (768.0, 24.0)))) ?
            (1.0, 0.0))))
MainAcc.hs:
*** Internal error in package accelerate ***
*** Please submit a bug report at https://github.com/mchakravarty/accelerate/issues
./Data/Array/Accelerate/CUDA.hs:59 (unhandled): CUDA Exception: too many resources requested for launch
Reply to this email directly or view it on GitHub:
https://github.com/mchakravarty/accelerate/issues/25
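For reference, the interesting parts of the dumped generate above appear to correspond to source expressions along these lines (a hedged reconstruction: the helper names velocityZ and bubble are mine, Real' stands in for the unspecified Real, and the (*) occurrences are inferred from the dump, whose operator symbols were stripped):

```haskell
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp)

type Real' = Float  -- assumed; the original `Real` is not shown

-- Third velocity component; x is the fastest-varying (column) index.
velocityZ :: Exp Real' -> Exp Real'
velocityZ x = 0.2 + 1.0e-3 * (12.0 * x / 768.0)

-- Indicator for an elliptical bubble: 1.0 inside, 0.0 outside.
bubble :: Exp Real' -> Exp Real' -> Exp Real'
bubble x y =
  let dy = y - 768.0 / 6.0
      dx = x - 768.0 / 2.0
      r  = 768.0 / 24.0
  in (64.0 * (dy * dy) + dx * dx A.< r * r) A.? (1.0, 0.0)
```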
I cannot reproduce the first bug report, unfortunately. Specs for my test machine follow; as you can see, it is not one of the high-end CUDA cards. Which version of the CUDA toolkit are you using? I haven't tried with the 4.x series yet, so maybe that has something to do with it (for example, if the way device capabilities are reported has changed). I'll test that next...
Prelude Foreign.CUDA.Driver> props =<< device 0
DeviceProperties {deviceName = "GeForce GT 120", computeCapability = 1.1, totalGlobalMem = 268107776, totalConstMem = 65536, sharedMemPerBlock = 16384, regsPerBlock = 8192, warpSize = 32, maxThreadsPerBlock = 512, maxBlockSize = (512,512,64), maxGridSize = (65535,65535,1), maxTextureDim1D = 8192, maxTextureDim2D = (65536,32768), maxTextureDim3D = (2048,2048,2048), clockRate = 1250000, multiProcessorCount = 4, memPitch = 2147483647, textureAlignment = 256, computeMode = Default, deviceOverlap = True, concurrentKernels = False, eccEnabled = False, kernelExecTimeoutEnabled = True, integrated = False, canMapHostMemory = True}
For the second, the program runs without the use . run statements if using @sseefried's patch for issue #22, although I'm not sure of the status of that patch relative to your own changes to sharing recovery.
Thank you, tmcdonell, for your effort. Let me try the ghci trick:
Prelude> :m +Foreign.CUDA.Driver
Prelude Foreign.CUDA.Driver> props =<< device 0
Loading package extensible-exceptions-0.1.1.2 ... linking ... done.
Loading package bytestring-0.9.1.10 ... linking ... done.
Loading package cuda-0.3.2.2 ... linking ... done.
*** Exception: CUDA Exception: driver not initialised
... didn't work for me. I'm using a Tesla M2050 (compute capability 2.0) with CUDA 3.2. I'll upload the result of deviceQuery if you need it. I have also tried a CUDA 4.0 environment, but I couldn't install cuda-0.3.2.2 from Hackage (the latest version) into the CUDA 4.0 environment.
I'm trying the patch 5c24257 now...
Ah, first you need to run initialise []; sorry for the omission. No matter --- the model number and driver version tell me everything I was interested in. I am also using nvcc version 3.2. Do you happen to be running a 64-bit version of GHC?
I have only done light testing on compute-2.0 devices since I only briefly had access to one. I recall there being some problems when the 2.0 series devices were released; maybe this is why the first example works on my 1.x series card but not your own...
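In full, the GHCi trick above amounts to something like this (a sketch, using the cuda package's Foreign.CUDA.Driver module):

```haskell
import Foreign.CUDA.Driver (initialise, device, props)

-- Query the properties of device 0; `initialise` must precede
-- any other Driver API call, else "driver not initialised" is raised.
main :: IO ()
main = do
  initialise []
  p <- props =<< device 0
  print p
```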
Thanks, tmcdonell; with initialise I could query the device with props =<< device 0.
With patch 5c24257, I could compile the code without use . run. Now benchmarking.
Any progress on this problem?
Nice to hear from you again! I haven't tried accelerate since ICFP 2011, where I was able to compute what I wanted in accelerate (but it was slow). Maybe it's a good time for me to try the latest accelerate again!
Good to hear from you as well. There have been many changes to Accelerate in the last few months. So, it may indeed be worthwhile to have another look.
I'm going to go ahead and close this issue, as both of the example programs work now (it is still slow, but that's a different issue).
Congratulations on your recent release of Paraiso!
Thank you for your congratulations!
I've seen that Ryan Newton has come on board and that accelerate has recently been making rapid progress. I'd really like to try it again, but I have something else to do first...
Please keep up the good work!
2012/6/20 Trevor L. McDonell
[email protected]:
I'm going to go ahead and close this issue, as both of the example programs work now (it is still slow, but that's a different issue).
Congratulations on your recent release of Paraiso!
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html