cunn's Introduction

# CUDA backend for the Neural Network Package #

This package provides a CUDA implementation for many of the modules in the base nn package.

  • Modules: There are also additional GPU-related modules not found in the nn package.

To use

Simply convert your network model to CUDA by calling :cuda():

require 'cunn'

local model = nn.Sequential()
model:add(nn.Linear(2,2))
model:add(nn.LogSoftMax())

model:cuda()  -- convert model to CUDA

... and similarly for your tensors:

local input = torch.Tensor(32,2):uniform()
input = input:cuda()
local output = model:forward(input)

... or create them directly as CudaTensors:

local input = torch.CudaTensor(32,2):uniform()
local output = model:forward(input)
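
Putting the steps above together, a minimal end-to-end sketch might look like the following. It assumes a CUDA-capable GPU with cunn installed. The final `:double()` call, which copies the result back to main memory, is not described above and is shown only as one plausible way to inspect the output on the CPU.

```lua
require 'cunn'  -- also loads nn and cutorch

-- Build the model on the CPU, then convert its parameters to CUDA.
local model = nn.Sequential()
model:add(nn.Linear(2, 2))
model:add(nn.LogSoftMax())
model:cuda()

-- A mini-batch of 32 two-dimensional inputs, created directly on the GPU.
local input = torch.CudaTensor(32, 2):uniform()
local output = model:forward(input)

-- The output lives on the GPU; copy it to a CPU DoubleTensor to inspect it.
local cpuOutput = output:double()
print(torch.type(output))
print(cpuOutput:size())
```

Keeping the forward pass on the GPU and converting only the final result limits the number of transfers between main memory and the GPU.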

To run unit-tests

luajit -l cunn -e 'cunn.test()'

GPU Training Concepts

Performance

  • data should be transferred between main memory and the GPU in batches; otherwise the transfer time will be dominated by the fixed latency and execution overhead of each transfer, rather than by bandwidth
  • therefore, train and predict using mini-batches
  • allocating GPU memory causes a sync-point, which will noticeably affect performance
    • therefore try to allocate any CudaTensors once, at the start of the program, and then simply copy data back and forth between main memory and the existing CudaTensors
  • similarly, try to avoid any operations that implicitly allocate new tensors. For example, if you write:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  local b = torch.add(a, 1)
end

... this will allocate one thousand new CudaTensors, one for each call to torch.add(a, 1).

Use instead this form:

require 'cutorch'

local a = torch.CudaTensor(1000):uniform()
local b = torch.CudaTensor(1000):uniform()
for it=1,1000 do
  b:add(a, 1)
end

In this form, b is allocated only once, before the loop. Then the b:add(a,1) operation will perform the add inside the GPU kernel, and store the result into the original b CudaTensor. This will run noticeably faster, in general. It's also a lot less likely to eat up arbitrary amounts of memory, and less likely to need frequent calls to collectgarbage(); collectgarbage().
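
The advice above about allocating CudaTensors once and copying mini-batches into them can be sketched as follows. This is only an illustration: the sizes, the `data` tensor, and the buffer names `gpuBatch` and `gpuResult` are hypothetical and do not come from the package.

```lua
require 'cutorch'

local batchSize, dim, nBatches = 32, 2, 10   -- hypothetical sizes

-- Hypothetical dataset held in main memory.
local data = torch.FloatTensor(batchSize * nBatches, dim):uniform()

-- Allocate the GPU buffers once, before the loop.
local gpuBatch = torch.CudaTensor(batchSize, dim)
local gpuResult = torch.CudaTensor(batchSize, dim)

for i = 1, nBatches do
  -- narrow() returns a view of the CPU data; no new storage is allocated.
  local cpuBatch = data:narrow(1, (i - 1) * batchSize + 1, batchSize)
  -- copy() transfers the whole mini-batch into the existing GPU buffer.
  gpuBatch:copy(cpuBatch)
  -- In-place form: the result is written into the preallocated gpuResult.
  gpuResult:add(gpuBatch, 1)
end
```

Each iteration performs one transfer for the whole mini-batch and reuses the same two GPU buffers, so no GPU memory is allocated inside the loop.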

Benchmarking

  • GPU operations are issued asynchronously: they will typically still be running after the Lua call that issued them has returned
  • e.g., if you do:

require 'cutorch'
local a = torch.CudaTensor(1000,1000):uniform()
a:add(1)

... the GPU kernel to add 1 will only be scheduled for launch by a:add(1). It might not have completed yet, or even have reached the GPU, by the time the a:add(1) instruction has returned

  • therefore, for wall-clock timings, you should call cutorch.synchronize() before each timing point:

require 'cutorch'
require 'sys'

local a = torch.CudaTensor(1000,1000):uniform()
cutorch.synchronize()
sys.tic()  -- start the timer; sys.toc() returns the elapsed seconds
a:add(1)
cutorch.synchronize()
print(sys.toc())
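
The same pattern can be wrapped in a small helper that averages over several calls. This is a sketch only: `timeGpu` is a hypothetical name and not part of cutorch or sys, and the warm-up call is an added assumption meant to keep one-time setup costs out of the measurement.

```lua
require 'cutorch'
require 'sys'

-- Hypothetical helper: average wall-clock seconds per call of fn.
local function timeGpu(fn, iterations)
  fn()                    -- warm-up call, excluded from the timing
  cutorch.synchronize()   -- make sure no earlier GPU work is still pending
  sys.tic()
  for i = 1, iterations do
    fn()
  end
  cutorch.synchronize()   -- wait for all queued kernels to finish
  return sys.toc() / iterations
end

local a = torch.CudaTensor(1000, 1000):uniform()
print(timeGpu(function() a:add(1) end, 100))
```

Without the second cutorch.synchronize(), the measurement would mostly reflect how long it takes to queue the kernels rather than how long they take to run.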
