The following is my simple template and benchmark comparing prefetching data in a parallel CPU worker against the normal version without prefetching. With prefetching, the training loop does not need to wait for data. However, this is only about twice as fast, not 10 times. So I guess the GPU could run 10 trainings in parallel once all the data is on the GPU; perhaps I need 10 prefetching workers. Could you give me some comments? Many thanks.
macro swap(x, y)
    quote
        local tmp = $(esc(x))
        $(esc(x)) = $(esc(y))
        $(esc(y)) = tmp
    end
end

# some slow function, executed on worker 2
@everywhere function get_data(i)
    sleep(0.6)
    println("get_data $i")
    i
end

function slow_train(x)
    sleep(0.6)
    println("slow_train $x")
end

function prefetch(rng)
    @assert length(rng) > 1
    rng = collect(rng)
    a = b = nothing
    function _iter()
        for i ∈ 1:length(rng)
            if a === nothing
                # first iteration: request the current and the next batch
                a = remotecall(get_data, 2, rng[i])
                b = remotecall(get_data, 2, rng[i+1])
            else
                # request one batch ahead, then swap so `a` holds the
                # already-in-flight future for the current batch
                if i < length(rng)
                    a = remotecall(get_data, 2, rng[i+1])
                end
                @swap(a, b)
            end
            d = fetch(a)
            produce(d)
        end
    end
    return Task(_iter)
end

@time for x ∈ prefetch(1:10)
    slow_train(x)
end
% julia -p 2 test-task.jl
6.957115 seconds (153.23 k allocations: 6.454 MB)
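(As an aside, the `produce`/`Task` iteration protocol used above was removed after Julia 0.5. On current Julia the same double-buffered prefetch can be sketched with a buffered `Channel`; this is a minimal single-process sketch with stand-ins for `get_data`/`slow_train` and an async producer task in place of `remotecall`:)

```julia
# Sketch of the same prefetch idea on Julia 1.x, where produce/consume no
# longer exist. A buffered Channel plays the role of the two futures: its
# backing task keeps fetching ahead while the consumer trains.
get_data(i)   = (sleep(0.6); println("get_data $i"); i)   # stand-in loader
slow_train(x) = (sleep(0.6); println("slow_train $x"))    # stand-in trainer

function prefetch_channel(rng; buffer = 2)
    Channel{eltype(rng)}(buffer) do ch
        for i in rng
            put!(ch, get_data(i))   # blocks once `buffer` items are queued
        end
    end
end

@time for x in prefetch_channel(1:10)
    slow_train(x)
end
```

Because both `sleep` calls yield to the scheduler, the producer task fills the buffer while training runs, giving the same overlap as the two-future version.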
macro swap(x, y)
    quote
        local tmp = $(esc(x))
        $(esc(x)) = $(esc(y))
        $(esc(y)) = tmp
    end
end

# some slow function
@everywhere function get_data(i)
    sleep(0.6)
    println("get_data $i")
    i
end

function slow_train(x)
    sleep(0.6)
    println("slow_train $x")
end

function fetch(rng)   # note: shadows Base.fetch, harmless in this standalone script
    rng = collect(rng)
    function _iter()
        for i ∈ 1:length(rng)
            d = get_data(rng[i])
            produce(d)
        end
    end
    return Task(_iter)
end

@time for x ∈ fetch(1:10)
    slow_train(x)
end

% julia test-task.jl
12.146958 seconds (84.82 k allocations: 3.528 MB)
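The factor of two is exactly what these timings predict: with `get_data` and `slow_train` each taking 0.6 s, the serial version pays both costs per batch, while the prefetching version hides every data fetch except the first behind training. A back-of-the-envelope check:

```julia
# Expected wall-clock time for the two benchmarks above, assuming
# get_data and slow_train each take t = 0.6 s per batch.
n, t = 10, 0.6
serial    = n * (t + t)   # every batch waits for its data:            12.0 s
pipelined = t + n * t     # only the first fetch is on the critical
                          # path; the rest overlap with training:       6.6 s
println((serial, pipelined))
```

These match the measured 12.15 s and 6.96 s, so prefetching can never do better than 2× here; more workers would only help if `get_data` were slower than `slow_train`.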
from knet.jl.
I am getting 0.1ms for 1000 batches of 64x1000 Float32. Did I misunderstand the problem? Here is my code:
using Knet

function togpu(a)
    b = Array(Any, length(a))
    @inbounds for i = 1:length(a)
        b[i] = KnetArray(a[i])
    end
    return b
end

a = [ rand(Float32, 64, 1000) for i = 1:1000 ]
@time a1 = togpu(a);
@time a2 = togpu(a);
@time a3 = togpu(a);
To clarify: I meant 0.1ms per transfer.
I tried your test and got 0.27 s for the first @time, and 0.07 s for the second and third, which is 70 times slower than yours. Is this abnormal? My PC configuration is CPU i7-5820K + GPU GTX 1080.
julia> using Knet
INFO: Knet using GPU 0

julia> function togpu(a)
           b = Array(Any, length(a))
           @inbounds for i = 1:length(a)
               b[i] = KnetArray(a[i])
           end
           return b
       end
togpu (generic function with 1 method)

julia> a = [ rand(Float32, 64, 1000) for i = 1:1000 ];

julia> @time a1 = togpu(a);
  0.276411 seconds (243.87 k allocations: 10.282 MB)

julia> @time a2 = togpu(a);
  0.073134 seconds (10.01 k allocations: 289.391 KB)

julia> @time a3 = togpu(a);
  0.073607 seconds (10.01 k allocations: 289.391 KB)
I think this is consistent with my results, not slower. Ignore the first result, it includes compilation time. You are transferring 1000 arrays in 0.073 seconds. This means the per-transfer cost is 0.073 ms or 73 μs, which is better than my setup. One call to cudaMalloc takes about 10 μs (GPU allocation is slow, which is why I had to write a custom memory manager for Knet). So it seems roughly 63 μs is the cost of the RAM→GPU transfer itself.
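Those numbers are in the right ballpark for PCIe; a quick sanity check, assuming the 63 μs transfer estimate above:

```julia
# Sanity-check the ~63 μs per-transfer estimate against PCIe bandwidth.
# Each array is 64x1000 Float32 values.
bytes_per_array = 64 * 1000 * sizeof(Float32)   # 256_000 bytes
bandwidth = bytes_per_array / 63e-6              # bytes per second
println(bandwidth / 1e9, " GB/s")                # ≈ 4 GB/s
```

Around 4 GB/s is plausible for host-to-device copies from pageable memory; pinned (page-locked) buffers typically get closer to the PCIe 3.0 x16 peak of roughly 16 GB/s.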
For another data point, here is what I get on an AWS instance:
[ec2-user@ip-172-31-23-146 ~]$ julia foo.jl
INFO: Knet using GPU 0
0.488581 seconds (243.72 k allocations: 10.312 MB)
0.184520 seconds (10.01 k allocations: 289.391 KB)
0.190965 seconds (10.01 k allocations: 289.391 KB)
Thanks for clarifying the CPU-to-GPU transfer time. It is a common problem and independent of the framework, then. As your benchmarks show, it is fast enough though.