
Comments (13)

soumith avatar soumith commented on September 25, 2024

Goddamnit! How do I check this? Should I run char-rnn with some flags?

from char-rnn.

soumith avatar soumith commented on September 25, 2024

Can't reproduce with the default command-line options:

th train.lua -data_dir data/tinyshakespeare -gpuid -1
vocab.t7 and data.t7 do not exist. Running preprocessing...
one-time setup: preprocessing input text file data/tinyshakespeare/input.txt...
loading text file...
creating vocabulary mapping...
putting data into tensor...
saving data/tinyshakespeare/vocab.t7
saving data/tinyshakespeare/data.t7
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
creating an LSTM with 2 layers
number of parameters in the model: 240321
cloning rnn
cloning criterion
1/21150 (epoch 0.002), train_loss = 4.19766416, grad/param norm = 4.5006e-01, time/batch = 0.38s
2/21150 (epoch 0.005), train_loss = 4.10134056, grad/param norm = 6.3375e-01, time/batch = 0.26s
3/21150 (epoch 0.007), train_loss = 3.44502399, grad/param norm = 9.4798e-01, time/batch = 0.27s
4/21150 (epoch 0.009), train_loss = 3.45054399, grad/param norm = 1.1340e+00, time/batch = 0.27s
5/21150 (epoch 0.012), train_loss = 3.33238818, grad/param norm = 7.8976e-01, time/batch = 0.27s
6/21150 (epoch 0.014), train_loss = 3.37363688, grad/param norm = 7.0334e-01, time/batch = 0.27s
7/21150 (epoch 0.017), train_loss = 3.36438210, grad/param norm = 6.5300e-01, time/batch = 0.27s
8/21150 (epoch 0.019), train_loss = 3.33342581, grad/param norm = 7.6950e-01, time/batch = 0.27s
9/21150 (epoch 0.021), train_loss = 3.29173263, grad/param norm = 6.1282e-01, time/batch = 0.27s
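As a quick sanity check on those first iterations: with a 65-character vocabulary, an untrained model predicting roughly uniformly should start near a cross-entropy of ln(65), which matches the initial train_loss above. A minimal check (Python used here purely for illustration):

```python
import math

# Expected initial loss for a uniform predictor over the 65-class vocabulary.
vocab_size = 65
initial_loss = math.log(vocab_size)
print(round(initial_loss, 2))  # 4.17
```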


soumith avatar soumith commented on September 25, 2024

if you updated nn, you also probably want to update cunn. an equivalent PR was landed in cunn at the same time: torch/cunn#120


wbertelsen avatar wbertelsen commented on September 25, 2024

I get this with th train.lua -data_dir data/tinyshakespeare -gpuid 0 (-1 is CPU mode)


soumith avatar soumith commented on September 25, 2024

Just tried with GPU ID 0, still can't reproduce:

th train.lua -data_dir data/tinyshakespeare -gpuid 0
using CUDA on GPU 0...
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
creating an LSTM with 2 layers
number of parameters in the model: 240321
cloning rnn
cloning criterion
1/21150 (epoch 0.002), train_loss = 4.16315975, grad/param norm = 4.5507e-01, time/batch = 0.28s
2/21150 (epoch 0.005), train_loss = 4.06560737, grad/param norm = 6.1593e-01, time/batch = 0.11s
3/21150 (epoch 0.007), train_loss = 3.50594769, grad/param norm = 1.2221e+00, time/batch = 0.11s
4/21150 (epoch 0.009), train_loss = 3.45355825, grad/param norm = 1.3675e+00, time/batch = 0.11s
5/21150 (epoch 0.012), train_loss = 3.35222242, grad/param norm = 1.2052e+00, time/batch = 0.11s
6/21150 (epoch 0.014), train_loss = 3.37636928, grad/param norm = 8.7048e-01, time/batch = 0.11s
7/21150 (epoch 0.017), train_loss = 3.36737326, grad/param norm = 6.1815e-01, time/batch = 0.10s
8/21150 (epoch 0.019), train_loss = 3.32496874, grad/param norm = 4.2533e-01, time/batch = 0.10s
9/21150 (epoch 0.021), train_loss = 3.29095509, grad/param norm = 4.5369e-01, time/batch = 0.11s
10/21150 (epoch 0.024), train_loss = 3.38070163, grad/param norm = 4.3267e-01, time/batch = 0.11s
11/21150 (epoch 0.026), train_loss = 3.30103775, grad/param norm = 4.4517e-01, time/batch = 0.11s
12/21150 (epoch 0.028), train_loss = 3.32078692, grad/param norm = 3.6975e-01, time/batch = 0.11s
13/21150 (epoch 0.031), train_loss = 3.30807559, grad/param norm = 2.9326e-01, time/batch = 0.11s


wbertelsen avatar wbertelsen commented on September 25, 2024

This is what I get. Are you sure you're running against current code?

$ th train.lua -data_dir data/tinyshakespeare -gpuid 0
using CUDA on GPU 0...
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of data batches in train: 423, val: 23, test: 0
vocab size: 65
creating an LSTM with 2 layers
number of parameters in the model: 240321
cloning rnn
cloning criterion
/Users/wbertelsen/torch/install/bin/luajit: ...sen/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:34: bad argument #1 (field weights does not exist)
stack traceback:
    [C]: in function 'ClassNLLCriterion_updateOutput'
    ...sen/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:34: in function 'forward'
    train.lua:213: in function 'opfunc'
    ...wbertelsen/torch/install/share/lua/5.1/optim/rmsprop.lua:32: in function 'rmsprop'
    train.lua:252: in main chunk
    [C]: in function 'dofile'
    ...lsen/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010f20b320
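For context on the error: ClassNLLCriterion computes the negative log-likelihood of the target class from per-class log-probabilities, with an optional per-class `weights` field — the field the updated C binding complains is missing. A minimal sketch of the computation it performs (Python for illustration, not the Torch code):

```python
import math

def class_nll(log_probs, target, weights=None):
    """Negative log-likelihood of `target` under per-class log-probabilities.

    `weights` mirrors the optional per-class weights field whose absence
    triggered the "field weights does not exist" error above when the Lua
    and C sides of nn were out of sync.
    """
    w = 1.0 if weights is None else weights[target]
    return -w * log_probs[target]

# Uniform log-probabilities over 65 classes give a loss of ln(65).
uniform = [math.log(1 / 65)] * 65
loss = class_nll(uniform, 0)
```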


wbertelsen avatar wbertelsen commented on September 25, 2024

Actually, reading the source of ClassNLLCriterion, maybe I'm the one with old code.


soumith avatar soumith commented on September 25, 2024

luarocks install nn
luarocks install cunn

these two should fix it for you.


wbertelsen avatar wbertelsen commented on September 25, 2024

Thanks! Looks like my nn and cunn versions got mismatched.


hughperkins avatar hughperkins commented on September 25, 2024

@soumith, just out of curiosity, what GPU model were you using above?


soumith avatar soumith commented on September 25, 2024

@hughperkins Whatever the default was in char-rnn.


soumith avatar soumith commented on September 25, 2024

GPU model: NVIDIA K40m


hughperkins avatar hughperkins commented on September 25, 2024

Interesting. Thanks!

