Hi, I am new to Torch and am wondering if it's possible to easily im

LSTM forget gate bias about char-rnn HOT 6 CLOSED

karpathy commented on September 25, 2024

LSTM forget gate bias

from char-rnn.

Comments (6)

karpathy commented on September 25, 2024

Yes, this would be relatively easy. You have to annotate the nngraph node (see the nngraph docs) of the nn.Linear layer that holds the parameters, so that you can query for it from the nngraph, and then you meddle with its .bias field. You'd have to be careful because I compute all 4 vectors i, f, o, g in one go as a single vector, so you'd want to raise only the portion of the bias vector that belongs to f.

I already did this in one fork of char-rnn but didn't find noticeable improvements in training time. But maybe I did it wrong ;) Fun exercise to try for yourself.
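As a rough illustration of the "correct portion of the bias vector" point above, here is a minimal plain-Python sketch of the index arithmetic, not the char-rnn code itself. It assumes the four gate blocks sit back to back in the order i, f, o, g, each rnn_size long; the helper names `gate_slice` and `init_forget_bias` are hypothetical, and a Python list stands in for the Torch bias tensor.

```python
def gate_slice(gate, rnn_size, order=("i", "f", "o", "g")):
    """Return the 0-based, half-open (start, stop) range of one gate's block.

    Assumes the gates are packed contiguously in `order`, rnn_size entries each.
    """
    k = order.index(gate)
    return k * rnn_size, (k + 1) * rnn_size


def init_forget_bias(bias, rnn_size, value=1.0):
    """Set only the forget-gate block of a packed 4 * rnn_size bias list."""
    start, stop = gate_slice("f", rnn_size)
    for j in range(start, stop):
        bias[j] = value
    return bias


rnn_size = 3
bias = [0.0] * (4 * rnn_size)   # stand-in for the Linear layer's bias
init_forget_bias(bias, rnn_size)
print(bias)                     # only the second block of 3 entries is now 1.0
```

In 1-based Torch indexing the same block would be rnn_size+1 through 2*rnn_size.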

rfru commented on September 25, 2024

Cool! Thanks for the info. Is that fork available somewhere? Would be great to take a look at it to start.

faradox commented on September 25, 2024

In my humble experience there are noticeable improvements in many cases. In fact, I couldn't get a single LSTM network with more than 2 layers to learn anything unless the forget gate biases were initialized to 1 (but that wasn't in the char-rnn code, so maybe I did it wrong, too). To implement it here, I did:

In model/LSTM.lua:

local in_gate = nn.Sigmoid()(n1)
-- annotate the forget gate so we can manipulate it directly later
local forget_gate = nn.Sigmoid()(n2):annotate{
    name = 'forget', description = 'Forget gate',
}
local out_gate = nn.Sigmoid()(n3)

And in train.lua:

-- initialization
if do_random_init then
    params:uniform(-0.08, 0.08) -- small numbers uniform
    for _,node in ipairs(protos.rnn.forwardnodes) do
        if node:graphNodeName() == "forget" then
            node.bias:fill(1) -- initialize forget gates to 1
        end
    end
end

ffmpbgrnn commented on September 25, 2024

Hi @faradox, I think you should annotate the Linear layer instead, since that is where the bias lives. Like:

local i2h = nn.Linear(input_size_L, 4 * rnn_size)(x)
i2h:annotate{name='i2h_'..L}

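-- and then (note: char-rnn names its layer-count option opt.num_layers)
for layer_idx = 1, opt.num_layers do
    for _,node in ipairs(protos.rnn.forwardnodes) do
        if node.data.annotations.name == "i2h_"..layer_idx then
            -- gates are packed as i, f, o, g, so f is the second rnn_size-long block
            node.data.module.bias[{{1*opt.rnn_size+1, 2*opt.rnn_size}}]:fill(1)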
        end
    end
end

Correct me if I am wrong.

karpathy commented on September 25, 2024

OK, I ran a small experiment and I'm now seeing improvements from initializing with 1.0. I'm adding this feature to char-rnn since there is enough evidence that this probably helps, and usually doesn't hurt.
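For intuition on why a bias of 1.0 might help, here is a small back-of-the-envelope sketch in plain Python. It is a simplification, not a result from this thread: it assumes that at initialization the forget gate's pre-activation is roughly equal to its bias (small weights contribute about 0), and it ignores whatever the input gate writes into the cell. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def retained_fraction(forget_bias, steps):
    """Fraction of the initial cell state left after `steps` time steps.

    Simplifying assumption: the forget gate value stays at sigmoid(forget_bias).
    """
    return sigmoid(forget_bias) ** steps


for b in (0.0, 1.0):
    print("bias", b, "gate", round(sigmoid(b), 3),
          "retained after 10 steps", retained_fraction(b, 10))
```

Under these assumptions a zero bias halves the cell state at every step, while a bias of 1.0 keeps roughly 73% of it per step, so much more of the early signal survives across a sequence.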

karpathy commented on September 25, 2024

(And thank you @rfru, @faradox and @ffmpbgrnn for the discussion surrounding this.)
