adaptive-inertia-adai's People

Contributors

zeke-xie
adaptive-inertia-adai's Issues

Linear layer state['step'] increment is 2

If I define a simple linear model like this:

class TinyModel(torch.nn.Module):

    def __init__(self):
        super(TinyModel, self).__init__()

        self.layer1 = torch.nn.Linear(1000, 100, bias=True)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        return x

A Linear layer has two parameters (the weight A and the bias b), since y = xAᵀ + b.
In your adai.py code, state['step'] will be 1 for A and 2 for b. If we unroll the for loop:

# First iteration: the weight A, with state['step'] == 1
param_size = param_size + sizeA
grad = p.grad.data  # A's gradient
bias_correction2 = 1 - beta2  # i.e. 1 - beta2 ** 1
exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
exp_avg_sq_hat_sum += exp_avg_sq.sum() / bias_correction2

# Second iteration: the bias b, with state['step'] == 2
param_size = param_size + sizeb
grad = p.grad.data  # b's gradient
bias_correction2 = 1 - beta2 ** 2
exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
exp_avg_sq_hat_sum += exp_avg_sq.sum() / bias_correction2

Is this a bug for a layer with multiple parameters?
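
For reference, one quick way to check the per-parameter step counters is to run a single update and print each parameter's state. The import path, learning rate, and constructor arguments below are assumptions (based on the repository layout and README), not verified values:

import torch
from adai_optim.adai import Adai  # import path assumed; adjust to your local copy

model = TinyModel()
# Hyperparameters assumed; lr chosen arbitrarily for this check.
optimizer = Adai(model.parameters(), lr=0.5, betas=(0.1, 0.99), eps=1e-3, weight_decay=0)

x = torch.randn(4, 1000)
model(x).sum().backward()
optimizer.step()

# After a single optimizer.step(), the weight and the bias should report
# the same state['step'] if the bias correction is applied consistently.
for name, p in model.named_parameters():
    print(name, optimizer.state[p]['step'])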

Fitting the Bortfeld function with Adai and Adam: Adai cannot converge at the same learning rate

I tested some simple models in PyTorch, and Adai does perform better than Adam, so I tried using Adai to fit the Bortfeld function. I implemented two MATLAB functions to compare the performance of Adai and Adam, and I found that Adai fails to converge while Adam converges. Is my implementation wrong?

The Bortfeld function is used to fit the proton Bragg peak; see "An analytical approximation of the Bragg curve for therapeutic proton beams".

function [theta_best,loss] = adai(depth,para,idd_i,lb,ub,lr)
    % Adai (adaptive inertia)
    T = 2000;
    beta0 = 0.1;
    beta1_cum_prod = 1;
    beta2 = 0.99;
    epsilon = 1e-3;
    loss = zeros(T,1);
    m_tm1 = 0;
    v_tm1 = 0;
    theta_tm1 = para;
    v_t_mean = 0;
    
    theta_best = para;
    loss_best = 1e9;
    loss(1) = norm((bf_mex(depth,theta_tm1,'idd') - idd_i),'fro');
    for t = 2:T
        % get gradient = jacobian*error
        g_t = 2*bf_mex(depth,theta_tm1,'jacobian')'*(bf_mex(depth,theta_tm1,'idd') - idd_i);
        % Update biased second raw moment estimate
        v_t = beta2*v_tm1 + (1-beta2)*g_t.^2;
        % Compute bias-corrected second raw moment estimate
        v_t_hat = v_t / (1-beta2^(t-1));
        v_t_mean = mean(v_t_hat);
        beta1t = max(min(1-(v_t_hat./v_t_mean).*beta0, 1-epsilon),0);
        % Update biased first moment estimate
        m_t = beta1t.*m_tm1 + (1-beta1t).*g_t;
        beta1_cum_prod = beta1_cum_prod.*beta1t;
        % Compute bias-corrected first moment estimate
        m_t_hat = m_t ./ (1-beta1_cum_prod);
        % Update parameters
        theta_t = theta_tm1 - lr*m_t_hat;
        % constrain
        theta_t(theta_t < lb) = lb(theta_t < lb);
        theta_t(theta_t > ub) = ub(theta_t > ub);
        
        theta_tm1 = theta_t;
        m_tm1 = m_t;
        v_tm1 = v_t;
            
        idd_pred = bf_mex(depth,theta_t,'idd');
        loss(t) = norm((idd_pred - idd_i),'fro');
        
        if loss(t) < loss_best
           loss_best = loss(t);
           theta_best = theta_t;
        end
        if (abs(loss(t) - loss(t-1)) < 1e-6)
            break;
        end
    end
    
end
function [theta_best,loss] = adam(depth,para,idd_i,lb,ub,lr)
    T = 2000;
    beta1 = 0.9;
    beta2 = 0.999;
    epsilon = 1e-8;
    loss = zeros(T,1);
    m_tm1 = 0;
    v_tm1 = 0;
    theta_tm1 = para;
    
    theta_best = para;
    loss_best = 1e9;
    loss(1) = norm((bf_mex(depth,theta_tm1,'idd') - idd_i),'fro');
    for t = 2:T
        % get gradient = jacobian*error
        g_t = 2*bf_mex(depth,theta_tm1,'jacobian')'*(bf_mex(depth,theta_tm1,'idd') - idd_i);
        % Update biased first moment estimate
        m_t = beta1*m_tm1 + (1-beta1)*g_t;
        % Update biased second raw moment estimate
        v_t = beta2*v_tm1 + (1-beta2)*g_t.^2;
        % Compute bias-corrected first moment estimate
        m_t_hat = m_t / (1-beta1^(t-1));
        % Compute bias-corrected second raw moment estimate
        v_t_hat = v_t / (1-beta2^(t-1));
        % Update parameters
        theta_t = theta_tm1 - lr*m_t_hat./(sqrt(v_t_hat)+epsilon);
        
        % constrain
        theta_t(theta_t < lb) = lb(theta_t < lb);
        theta_t(theta_t > ub) = ub(theta_t > ub);
        
        theta_tm1 = theta_t;
        m_tm1 = m_t;
        v_tm1 = v_t;
            
        idd_pred = bf_mex(depth,theta_t,'idd');
        loss(t) = norm((idd_pred - idd_i),'fro');
        
        if loss(t) < loss_best
           loss_best = loss(t);
           theta_best = theta_t;
        end
        if (abs(loss(t) - loss(t-1)) < 1e-6)
            break;
        end
    end
    
end
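
As a cross-check of the MATLAB port, here is the same Adai update written as a small NumPy sketch mirroring the adai() function above. grad_fn, loss_fn, lb, and ub are hypothetical placeholders standing in for the bf_mex 'jacobian'/'idd' calls and the parameter bounds; the hyperparameters follow the MATLAB code:

import numpy as np

def adai_fit(theta, grad_fn, loss_fn, lr, lb, ub, T=2000,
             beta0=0.1, beta2=0.99, eps=1e-3):
    # Mirrors the MATLAB adai() above; grad_fn(theta) and loss_fn(theta)
    # are placeholders for the bf_mex calls, not part of the repository.
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    beta1_prod = np.ones_like(theta)
    theta_best, loss_best = theta.copy(), np.inf

    for t in range(1, T):
        g = grad_fn(theta)
        # Second raw moment estimate with bias correction, as in Adam.
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = v / (1.0 - beta2 ** t)
        # Element-wise adaptive momentum coefficient, clipped to [0, 1 - eps].
        beta1_t = np.clip(1.0 - beta0 * v_hat / v_hat.mean(), 0.0, 1.0 - eps)
        m = beta1_t * m + (1.0 - beta1_t) * g
        beta1_prod *= beta1_t
        m_hat = m / (1.0 - beta1_prod)
        # Unlike Adam, the step is lr * m_hat with no 1/sqrt(v_hat) factor,
        # so a learning rate tuned for Adam is generally not directly comparable.
        theta = np.clip(theta - lr * m_hat, lb, ub)

        loss = loss_fn(theta)
        if loss < loss_best:
            loss_best, theta_best = loss, theta.copy()
    return theta_best, loss_best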

About combining with PNM

Hi,

Firstly, thanks for the great work.

I was wondering whether Adai is suitable for combining with positive-negative momentum (PNM), another work of yours. From my understanding of the two methods, it should be possible to replace the original momentum in Adai with PNM. Do you have any suggestions or experience with this?

Thanks in advance for your reply.
