Comments (3)
Oh, thanks for pointing that out 😅, my assumption was that copyto! would be the optimized implementation. I'll try it in a bit!
from fasttranspose.jl.
For me:
julia> B = rand(4093, 4093); A = rand(4093, 4093);
julia> @benchmark transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 40.022 ms (0.00% GC)
median time: 40.160 ms (0.00% GC)
mean time: 40.199 ms (0.00% GC)
maximum time: 42.061 ms (0.00% GC)
--------------
samples: 125
evals/sample: 1
julia> @benchmark recursive_transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 576 bytes
allocs estimate: 12
--------------
minimum time: 25.982 ms (0.00% GC)
median time: 26.238 ms (0.00% GC)
mean time: 26.248 ms (0.00% GC)
maximum time: 27.209 ms (0.00% GC)
--------------
samples: 191
evals/sample: 1
Note that on my desktop PC I had to pick n = 4093 rather than 4096, because the latter is a perfect way to destroy the L1 cache; I think it's called the critical stride, where successive strided loads map to the same cache set and keep evicting each other. With n = 4096:
julia> @benchmark transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 37.015 ms (0.00% GC)
median time: 37.740 ms (0.00% GC)
mean time: 37.740 ms (0.00% GC)
maximum time: 38.543 ms (0.00% GC)
--------------
samples: 133
evals/sample: 1
julia> @benchmark recursive_transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 576 bytes
allocs estimate: 12
--------------
minimum time: 114.412 ms (0.00% GC)
median time: 131.820 ms (0.00% GC)
mean time: 131.358 ms (0.00% GC)
maximum time: 132.169 ms (0.00% GC)
--------------
samples: 39
evals/sample: 1
Apparently transpose! does not suffer from this!
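To make the critical-stride point concrete, here is a rough sketch that counts how many L1 cache sets a strided walk along one matrix row would touch. The cache geometry (32 KiB, 8-way, 64-byte lines) is an assumption for illustration, not something measured on my machine, and `cache_set` and `sets_touched` are hypothetical helper names, not part of fasttranspose.jl.

```julia
# Assumed (hypothetical) L1 data cache geometry: 32 KiB, 8-way, 64-byte lines.
const LINE_BYTES = 64
const WAYS = 8
const CACHE_BYTES = 32 * 1024
const NUM_SETS = div(CACHE_BYTES, LINE_BYTES * WAYS)  # 64 sets under these assumptions

# Cache set that a byte offset maps to under the geometry above.
cache_set(offset) = div(offset, LINE_BYTES) % NUM_SETS

# Number of distinct sets touched when reading `count` consecutive elements of
# one row of a column-major n×n Float64 matrix, i.e. loads that are
# n * sizeof(Float64) bytes apart.
function sets_touched(n; count = 64)
    stride = n * sizeof(Float64)
    return length(unique(cache_set(j * stride) for j in 0:count-1))
end

println("n = 4096: ", sets_touched(4096), " set(s)")
println("n = 4093: ", sets_touched(4093), " set(s)")
```

Under these assumed parameters every load for n = 4096 lands in the same set, so more than 8 such loads cannot be resident at once, while n = 4093 spreads the loads over many sets. Real hardware adds prefetching, L2/L3 and TLB effects, so treat this only as a model of the idea.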
For me it's the other way around:
julia> B = rand(4093, 4093); A = rand(4093, 4093);
julia> @benchmark transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 31.469 ms (0.00% GC)
median time: 57.118 ms (0.00% GC)
mean time: 56.974 ms (0.00% GC)
maximum time: 67.906 ms (0.00% GC)
--------------
samples: 88
evals/sample: 1
julia> @benchmark recursive_transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 576 bytes
allocs estimate: 12
--------------
minimum time: 58.961 ms (0.00% GC)
median time: 60.631 ms (0.00% GC)
mean time: 60.993 ms (0.00% GC)
maximum time: 66.589 ms (0.00% GC)
--------------
samples: 82
evals/sample: 1
julia> B = rand(4096, 4096); A = rand(4096, 4096);
julia> @benchmark transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 32.347 ms (0.00% GC)
median time: 35.922 ms (0.00% GC)
mean time: 37.950 ms (0.00% GC)
maximum time: 54.238 ms (0.00% GC)
--------------
samples: 132
evals/sample: 1
julia> @benchmark recursive_transpose!($B, $A)
BenchmarkTools.Trial:
memory estimate: 576 bytes
allocs estimate: 12
--------------
minimum time: 34.468 ms (0.00% GC)
median time: 35.153 ms (0.00% GC)
mean time: 35.302 ms (0.00% GC)
maximum time: 39.143 ms (0.00% GC)
--------------
samples: 142
evals/sample: 1
But maybe I am doing something wrong; your timings seem much more stable (i.e. fewer fluctuations). Also, I forget the size, but your implementation was clearly beating transpose! by a large margin when I timed it yesterday. I might have too many things running on my computer currently.