Comments (13)
About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.
<snip>
In case anyone runs into this in the future, keep in mind that does not actually compute convolution. Instead it computes cross correlation. Since the kernel is symmetrical those two will yield the same result, but you need to change the {kerx,kery} - n `A.div` 2
into n `A.div` 2 - {kerx,kery}
to get convolution.
from accelerate.
First let me explain the error. Accelerate has a very strict separation between array computations (Acc) and scalar-level computations (Exp). You cannot perform array computations within an Exp context. This is also explained under "Avoiding nested parallelism" in the documentation. In your program, you're entering the Exp context by writing generate (A.constant (Z :. h :. w))
; the operation that you then perform in that context is compute_re . compute_plat
, which does array operations. This is disallowed, hence the error.
As for how to implement convolution: if the kernel size (m,n
in your code) is known at Haskell compile time, then a 2-dimensional stencil (for usage, see the docs for 1-dimensional stencils) is without doubt the most runtime-efficient way of implementing this. Stencils in general (see I guess this wikipedia article, but not a super clear source) are literally made for this. However, with Accelerate's built-in stencil operations, the size of the kernel influences the type of the stencil operation, meaning that you cannot just have a size in two Int
values and do a stencil of that size. Arbitrary-size built-in stencil operations have been asked for earlier (#426); they just haven't been implemented yet.
from accelerate.
About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.
EDIT: as correctly observed by robbert-vdh below, if the kernel array is non-symmetric, this is technically cross-correlation, not convolution. (This is the same behaviour as the poster's original code.) Either reverse the kernel array in both dimensions, or replace ker{x,y} - {m,n} `A.div` 2
by {m,n} `A.div` 2 - ker{x,y}
, to get convolution even for non-symmetric kernel arrays.
module Main where
import qualified Data.Array.Accelerate as A
import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..))
import qualified Data.Array.Accelerate.Interpreter as I
-- Note that I've removed the additional Int parameters; you can use `shape` to
-- request the size of an array.
gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double)
gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image)
-- Add two dimensions on the bottom. This function replaces every cell in the
-- input image with a 2-dimensional array the size of gauss_op, thereby making
-- a 4-dimensional array.
compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double)
compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun
where
-- Size of the gaussian kernel. Use the pattern synonyms instead of `lift`
-- and `unlift`; they are much nicer. Alternative notation here would be
-- `Z_ ::. n ::. m`.
A.I2 n m = A.shape gauss_op
-- Size of the full image
A.I2 h w = A.shape image
-- Now this function is going to be called for the whole 4-dimensional
-- array, which means that it gets the index in the image as the first two
-- components and the index in the kernel as the last two components.
--
-- About yuan{x,y}: `floor ((fromIntegral x :: Double) / 2)` is precisely
-- what `div` does. ;)
--
-- kery and kerx are what you called y1 and x1.
myfun :: Exp DIM4 -> Exp Double
myfun (A.I4 y x kery kerx) =
image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2)))
(A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2)))
-- Fold the 2-dimensional subarrays into a scalar again, bringing the
-- 4-dimensional array down to 2 dimensions again. The first argument is the
-- gaussian kernel.
compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double)
compute_re gauss_op pla =
let A.I4 h w n m = A.shape pla
-- Replicate the gaussian kernel over all positions in the image
replicated_gauss_op :: Acc (Array DIM4 Double)
replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op
-- Multiply the kernel elementwise with each neighbourhood in the image
multiplied :: Acc (Array DIM4 Double)
multiplied = A.zipWith (A.*) replicated_gauss_op pla
-- Contract each neighbourhood (2-dimensional subarray) down to a
-- 1-dimensional vector, so that we can fold it away in one go
contracted :: Acc (Array DIM3 Double)
contracted = A.reshape (A.I3 h w (n * m)) multiplied
in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image.
A.fold (A.+) 0 contracted
main :: IO ()
main = do
let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int))
[1, 0, 0, 0, 0, 0, 0, 1
,0, 1, 0, 0, 0, 0, 1, 0
,0, 0, 1, 0, 0, 1, 0, 0
,0, 0, 0, 1, 1, 0, 0, 0
,0, 0, 1, 0, 0, 1, 0, 0
,0, 1, 0, 0, 0, 0, 1, 0
,1, 0, 0, 0, 0, 0, 0, 1]
gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int))
(map (/ 17)
[0, 0, 2, 0, 0
,1, 3, 5, 3, 1
,0, 0, 2, 0, 0])
print $ I.runN gauss_convolution gauss_op image
from accelerate.
the
First let me explain the error. Accelerate has a very strict separation between array computations (Acc) and scalar-level computations (Exp). You cannot perform array computations within an Exp context. This is also explained under "Avoiding nested parallelism" in the documentation. In your program, you're entering the Exp context by writing
generate (A.constant (Z :. h :. w))
; the operation that you then perform in that context iscompute_re . compute_plat
, which does array operations. This is disallowed, hence the error.As for how to implement convolution: if the kernel size (
m,n
in your code) is known at Haskell compile time, then a 2-dimensional stencil (for usage, see the docs for 1-dimensional stencils) is without doubt the most runtime-efficient way of implementing this. Stencils in general (see I guess this wikipedia article, but not a super clear source) are literally made for this. However, with Accelerate's built-in stencil operations, the size of the kernel influences the type of the stencil operation, meaning that you cannot just have a size in twoInt
values and do a stencil of that size. Arbitrary-size built-in stencil operations have been asked for earlier (#426); they just haven't been implemented yet.
Thanks, It's effective!
from accelerate.
About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.
module Main where import qualified Data.Array.Accelerate as A import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..)) import qualified Data.Array.Accelerate.Interpreter as I -- Note that I've removed the additional Int parameters; you can use `shape` to -- request the size of an array. gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image) -- Add two dimensions on the bottom. This function replaces every cell in the -- input image with a 2-dimensional array the size of gauss_op, thereby making -- a 4-dimensional array. compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun where -- Size of the gaussian kernel. Use the pattern synonyms instead of `lift` -- and `unlift`; they are much nicer. Alternative notation here would be -- `Z_ ::. n ::. m`. A.I2 n m = A.shape gauss_op -- Size of the full image A.I2 h w = A.shape image -- Now this function is going to be called for the whole 4-dimensional -- array, which means that it gets the index in the image as the first two -- components and the index in the kernel as the last two components. -- -- About yuan{x,y}: `floor ((fromIntegral x :: Double) / 2)` is precisely -- what `div` does. ;) -- -- kery and kerx are what you called y1 and x1. myfun :: Exp DIM4 -> Exp Double myfun (A.I4 y x kery kerx) = image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2))) (A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2))) -- Fold the 2-dimensional subarrays into a scalar again, bringing the -- 4-dimensional array down to 2 dimensions again. The first argument is the -- gaussian kernel. compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double) compute_re gauss_op pla = let A.I4 h w n m = A.shape pla -- Replicate the gaussian kernel over all positions in the image replicated_gauss_op :: Acc (Array DIM4 Double) replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op -- Multiply the kernel elementwise with each neighbourhood in the image multiplied :: Acc (Array DIM4 Double) multiplied = A.zipWith (A.*) replicated_gauss_op pla -- Contract each neighbourhood (2-dimensional subarray) down to a -- 1-dimensional vector, so that we can fold it away in one go contracted :: Acc (Array DIM3 Double) contracted = A.reshape (A.I3 h w (n * m)) multiplied in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image. A.fold (A.+) 0 contracted main :: IO () main = do let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int)) [1, 0, 0, 0, 0, 0, 0, 1 ,0, 1, 0, 0, 0, 0, 1, 0 ,0, 0, 1, 0, 0, 1, 0, 0 ,0, 0, 0, 1, 1, 0, 0, 0 ,0, 0, 1, 0, 0, 1, 0, 0 ,0, 1, 0, 0, 0, 0, 1, 0 ,1, 0, 0, 0, 0, 0, 0, 1] gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int)) (map (/ 17) [0, 0, 2, 0, 0 ,1, 3, 5, 3, 1 ,0, 0, 2, 0, 0]) print $ I.runN gauss_convolution gauss_op image
WOW! Beautiful job! I need to retry it.
from accelerate.
About how to implement a convolution: I rewrote your code to the following, and this seems to work, but I don't really know much about Gaussian convolutions, so please test if this indeed does what you intended.
module Main where import qualified Data.Array.Accelerate as A import Data.Array.Accelerate (Exp, Acc, Array, DIM2, DIM3, DIM4, Z(..), (:.)(..)) import qualified Data.Array.Accelerate.Interpreter as I -- Note that I've removed the additional Int parameters; you can use `shape` to -- request the size of an array. gauss_convolution :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) gauss_convolution gauss_op image = compute_re gauss_op (compute_plat gauss_op image) -- Add two dimensions on the bottom. This function replaces every cell in the -- input image with a 2-dimensional array the size of gauss_op, thereby making -- a 4-dimensional array. compute_plat :: Acc (Array DIM2 Double) -> Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) compute_plat gauss_op image = A.generate (A.I4 h w n m) myfun where -- Size of the gaussian kernel. Use the pattern synonyms instead of `lift` -- and `unlift`; they are much nicer. Alternative notation here would be -- `Z_ ::. n ::. m`. A.I2 n m = A.shape gauss_op -- Size of the full image A.I2 h w = A.shape image -- Now this function is going to be called for the whole 4-dimensional -- array, which means that it gets the index in the image as the first two -- components and the index in the kernel as the last two components. -- -- About yuan{x,y}: `floor ((fromIntegral x :: Double) / 2)` is precisely -- what `div` does. ;) -- -- kery and kerx are what you called y1 and x1. myfun :: Exp DIM4 -> Exp Double myfun (A.I4 y x kery kerx) = image A.! A.I2 (A.min (h - 1) (A.max 0 (y + kery - n `A.div` 2))) (A.min (w - 1) (A.max 0 (x + kerx - m `A.div` 2))) -- Fold the 2-dimensional subarrays into a scalar again, bringing the -- 4-dimensional array down to 2 dimensions again. The first argument is the -- gaussian kernel. compute_re :: Acc (Array DIM2 Double) -> Acc (Array DIM4 Double) -> Acc (Array DIM2 Double) compute_re gauss_op pla = let A.I4 h w n m = A.shape pla -- Replicate the gaussian kernel over all positions in the image replicated_gauss_op :: Acc (Array DIM4 Double) replicated_gauss_op = A.replicate (A.I4 h w A.All_ A.All_) gauss_op -- Multiply the kernel elementwise with each neighbourhood in the image multiplied :: Acc (Array DIM4 Double) multiplied = A.zipWith (A.*) replicated_gauss_op pla -- Contract each neighbourhood (2-dimensional subarray) down to a -- 1-dimensional vector, so that we can fold it away in one go contracted :: Acc (Array DIM3 Double) contracted = A.reshape (A.I3 h w (n * m)) multiplied in -- Finally, sum this contracted vector to get the 2-dimensional result, the size of the original image. A.fold (A.+) 0 contracted main :: IO () main = do let image = A.fromList (Z :. (7 :: Int) :. (8 :: Int)) [1, 0, 0, 0, 0, 0, 0, 1 ,0, 1, 0, 0, 0, 0, 1, 0 ,0, 0, 1, 0, 0, 1, 0, 0 ,0, 0, 0, 1, 1, 0, 0, 0 ,0, 0, 1, 0, 0, 1, 0, 0 ,0, 1, 0, 0, 0, 0, 1, 0 ,1, 0, 0, 0, 0, 0, 0, 1] gauss_op = A.fromList (Z :. (3 :: Int) :. (5 :: Int)) (map (/ 17) [0, 0, 2, 0, 0 ,1, 3, 5, 3, 1 ,0, 0, 2, 0, 0]) print $ I.runN gauss_convolution gauss_op image
I read your code, it's good job. But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images
from accelerate.
But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images
This is an understandable worry. However, have you tried it? Does it indeed consume much more memory at runtime than expected?
The reason I'm asking is that Accelerate will not actually create ("manifest") this intermediate 4-dimensional array in memory. (At least, if all is right.) The reason is array fusion: see the "Fusion" subheading in the documentation for Acc
.
See the image (I hope it's readable...) for a diagram showing how this works for the program that I wrote above. In black is the original program, in red some of the fusion rules, and in blue the fused program. Note that Accelerate can actually give you the fused program: show gauss_convolution
is a String
that shows what it will actually compute. Notice that there is just one fold
over a delayed
array, which contains an expression computing each element of the three-dimensional array. (That expression is kind of unreadable, which is unfortunate, but at least you can see that there are no other array primitives left.)
from accelerate.
But the process of creating a four-dimensional array consumes a lot of memory, especially when I work with large images
This is an understandable worry. However, have you tried it? Does it indeed consume much more memory at runtime than expected?
The reason I'm asking is that Accelerate will not actually create ("manifest") this intermediate 4-dimensional array in memory. (At least, if all is right.) The reason is array fusion: see the "Fusion" subheading in the documentation for
Acc
.See the image (I hope it's readable...) for a diagram showing how this works for the program that I wrote above. In black is the original program, in red some of the fusion rules, and in blue the fused program. Note that Accelerate can actually give you the fused program:
show gauss_convolution
is aString
that shows what it will actually compute. Notice that there is just onefold
over adelayed
array, which contains an expression computing each element of the three-dimensional array. (That expression is kind of unreadable, which is unfortunate, but at least you can see that there are no other array primitives left.)
Thank you for your professional help.I solved it.
By the way , what functions should I use to extract some rows and columns of the matrix?
Do I need to call slice function repeatedly?
from accelerate.
You can indeed use calls to slice
to get a few rows and columns of a matrix; does that not work?
from accelerate.
The problem is that I need to call it repeatly, I just wan to sample every other row and col .
from accelerate.
if you want, for example, the 0th, 2nd, 4th, 6th, row etc. of a matrix m :: Acc (Matrix a)
, why not use:
let I2 h w = shape m
in backpermute (I2 ((h + 1) `div` 2) w) -- new shape: height goes 0->0, 1->1, 2->1, 3->2, 4->2, 5->3, etc.
(\(I2 y x) -> m ! I2 (2 * y) x) -- given index in new array, produce index in old array
m -- array to read values from
Getting the 1st, 3rd, 5th, etc. rows is an exercise to the reader. :)
from accelerate.
backpermute
Thank you!
This question seems to go beyond the subject, I think it's time to finish it perfectly.
Thank you once again. I will study it.
from accelerate.
Hope this was useful :)
from accelerate.
Related Issues (20)
- Examples of incorrect output from pretty printer HOT 11
- [BUG] Imperfect shape query propagation in the presence of dead code HOT 2
- [BUG] Build fails HOT 6
- [BUG] Imperfect dead code elimination
- [BUG] Unexpectedly long phases when training a neural network HOT 1
- Support CUDA 11 HOT 1
- [BUG] CUDA-10 library doesn't support the Turing-based RTX 2060? HOT 8
- `inconsistent valuation @ shared 'Acc'` when trying to lift non-`Acc` function to `Acc` HOT 6
- `Foreign` instance for reference interpreter
- Is there a way to force accelerate operations to be sequentially evaluated? HOT 10
- [BUG] doc bugs
- Could not enable debugging options HOT 5
- Support GHCJS compilation HOT 7
- [BUG] Function hashes have incorrect length causing internal errors HOT 2
- [BUG] undefined symbol: _ZTIN4llvm10CallbackVHE HOT 4
- [BUG] Value 'sm_30' is not defined for option 'gpu-name' HOT 4
- [BUG] typo in Semigroup instance of Exp (Maybe a) HOT 1
- [Tracking Issue] Implementing (Segmented) Single-Pass Look-Back Scans
- [BUG] Internal error in package accelerate and LLVM.PTX backend: CUDA Exception - misaligned address HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from accelerate.