
MATLAB/Octave library for stochastic optimization algorithms: Version 1.0.20

License: MIT License


SGDLibrary : Stochastic Optimization Algorithm Library in MATLAB/Octave


Authors: Hiroyuki Kasai

Last page update: November 20, 2020

Latest library version: 1.0.20 (see Release notes for more info)


Announcement

We warmly welcome your contributions. Please send us

  • stochastic optimization solvers written in MATLAB, and
  • your comments and suggestions.

Introduction

The SGDLibrary is a pure-MATLAB library (toolbox) containing a collection of stochastic optimization algorithms. It solves unconstrained minimization problems of the form min f(x) = sum_i f_i(x). The SGDLibrary also runs on GNU Octave (free software compatible with many MATLAB scripts). Note that the SGDLibrary internally contains the GDLibrary.
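
As a concrete illustration of this finite-sum form, here is a minimal self-contained sketch (not code from the library), where each f_i(x) is a per-sample squared error as in linear regression; the data A and b are purely hypothetical:

% Minimal sketch of a finite-sum objective f(x) = sum_i f_i(x) (illustrative only).
A = randn(300, 3);                                     % 300 hypothetical samples, 3 dimensions
b = randn(300, 1);                                     % hypothetical targets
f_i = @(x, i) (A(i, :) * x - b(i))^2;                  % i-th component function f_i(x)
f = @(x) sum(arrayfun(@(i) f_i(x, i), 1:size(A, 1)));  % full objective f(x)
x0 = zeros(3, 1);
fprintf('f(x0) = %.4f\n', f(x0));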


Document

The documentation of the SGDLibrary can be obtained below:



Algorithm configurations

| Algorithm name in example codes | function | options.sub_mode | other options |
|---|---|---|---|
| SGD | sgd | | |
| SGD-CM | sgd_cm | 'CM' | |
| SGD-CM-NAG | sgd_cm | 'CM-NAG' | |
| AdaGrad | adagrad | 'AdaGrad' | |
| RMSProp | adagrad | 'RMSProp' | |
| AdaDelta | adagrad | 'AdaDelta' | |
| Adam | adam | 'Adam' | |
| AdaMax | adam | 'AdaMax' | |
| SVRG | svrg | | |
| SAG | sag | 'SAG' | |
| SAGA | sag | 'SAGA' | |
| SARAH | sarah | | |
| SQN | slbfgs | 'SQN' | |
| SVRG-SQN | slbfgs | 'SVRG-SQN' | |
| SVRG-LBFGS | slbfgs | 'SVRG-LBFGS' | |
| SS-SVRG | subsamp_svrg | | |
| oBFGS-Inf | obfgs | 'Inf-mem' | |
| oLBFGS-Lim | obfgs | 'Lim-mem' | |
| Reg-oBFGS-Inf | obfgs | 'Inf-mem' | regularized=true |
| Damp-oBFGS-Inf | obfgs | 'Inf-mem' | regularized=true & damped=true |
| IQN | iqn | | |
| SCR | scr | | gradient_sampling=1 & Hessian_sampling=1 |
| Subsampled-TR | subsamp_tr | | gradient_sampling=1 & Hessian_sampling=1 |
| SVRG-BB | svrg_bb | | |
  • Note that other algorithms can be configured by selecting other combinations of sub_mode and options, as sketched below.
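
For example, a minimal sketch of selecting RMSProp via the adagrad solver, following the table above (w_init and step_init are the same options used in demo.m):

%% select the RMSProp variant through options.sub_mode (sketch, per the table above)
options.w_init = data.w_init;
options.step_init = 0.01;
options.sub_mode = 'RMSProp';
[w_rmsprop, info_rmsprop] = adagrad(problem, options);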

Supported problems

  • L2-norm regularized multidimensional linear regression
  • L2-norm regularized linear SVM
  • L2-norm regularized logistic regression
  • Softmax classification (multinomial logistic regression)
    • Note that the softmax classification problem does not support Hessian-vector-product-type algorithms, i.e., SQN, SVRG-SQN and SVRG-LBFGS.
  • L1-norm regularized multidimensional linear regression
  • L1-norm regularized logistic regression
  • Sum quadratic problem

Additionally, the following problems are provided for gradient descent algorithms.


Folders and files

./                      - Top directory.
./README.md             - This readme file.
./run_me_first.m        - The script that you need to run first.
./demo.m                - Demonstration script to check and understand this package easily. 
|plotter/               - Contains plotting tools to show convergence results and various plots.
|tool/                  - Some auxiliary tools for this project.
|problem/               - Problem definition files to be solved.
|sgd_solver/            - Contains various stochastic optimization algorithms.
|sgd_test/              - Some helpful test scripts to use this package.
|gd_solver/             - Contains various gradient descent optimization algorithms.
|gd_test/               - Some helpful test scripts using gradient descent algorithms to use this package.

First to do

Run run_me_first for path configurations.

%% First run the setup script
run_me_first; 

Simplest usage example: 4 steps!

Just execute demo for the simplest demonstration of this package. This is the case of a logistic regression problem.

%% Execute the demonstration script
demo; 

The "demo.m" file contains below.

%% generate synthetic data        
% set number of dimensions
d = 3;
% set number of samples    
n = 300;
% generate data
data = logistic_regression_data_generator(n, d);


%% define problem definitions
problem = logistic_regression(data.x_train, data.y_train, data.x_test, data.y_test); 


%% perform algorithms SGD and SVRG 
options.w_init = data.w_init;
options.step_init = 0.01;       
[w_sgd, info_sgd] = sgd(problem, options);  
[w_svrg, info_svrg] = svrg(problem, options);


%% display cost/optimality gap vs number of gradient evaluations
display_graph('grad_calc_count','cost', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});

Let us take a closer look at the code above, bit by bit. The procedure has only 4 steps!

Step 1: Generate data

First, we generate a dataset, including a train set and a test set, using the data generator function logistic_regression_data_generator(). The output includes the train set, the test set, and an initial value of the solution w.

d = 3;
n = 300;
data = logistic_regression_data_generator(n, d);
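
The returned struct can be inspected directly. A small sketch using only the fields referenced in this README (the exact matrix layout is an assumption):

% Inspect the generated data (field names as used throughout this README)
size(data.x_train)    % training features (layout d-by-n is an assumption)
size(data.y_train)    % training labels in {1,-1}
size(data.x_test)     % test features
size(data.y_test)     % test labels
data.w_init           % initial value of the solution w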

Step 2: Define problem

The problem to be solved should be defined properly from the supported problems. logistic_regression() provides the comprehensive functions for a logistic regression problem: it returns the cost value via cost(w), the gradient via grad(w), and the Hessian via hess(w) for a given w. These are essential for any gradient descent algorithm.

problem = logistic_regression(data.x_train, data.y_train, data.x_test, data.y_test); 
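
As a quick sanity check, here is a sketch that evaluates the cost at the initial point; the single-argument calling convention below follows the description above, and stochastic solvers may pass additional sample indices internally (assumption):

% Evaluate the problem functions at the initial point (sketch)
w0 = data.w_init;
fprintf('initial cost: %.4f\n', problem.cost(w0));   % cost(w) as described above
% problem.grad(w0) and problem.hess(w0) are available analogously per the text.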

Step 3: Perform solver

Now you can run the optimization solvers, here SGD and SVRG, by calling the corresponding solver functions sgd() and svrg() after setting some optimization options.

options.w_init = data.w_init;
options.step_init = 0.01;  
[w_sgd, info_sgd] = sgd(problem, options);  
[w_svrg, info_svrg] = svrg(problem, options);

They return the final solution w and statistics information that includes the histories of epoch numbers, cost values, gradient norms, the number of gradient evaluations, and so on.
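
For instance, the recorded histories can be read off directly. This sketch assumes the statistics fields are named after the labels passed to display_graph(), i.e., cost and grad_calc_count:

% Read the recorded statistics (field names assumed from the display_graph labels)
final_cost = info_sgd.cost(end);               % last recorded cost value
total_grads = info_sgd.grad_calc_count(end);   % total number of gradient evaluations
fprintf('SGD: final cost = %.4f after %d gradient evaluations\n', final_cost, total_grads);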

Step 4: Show result

Finally, display_graph() plots the decrease of the cost value against the number of gradient evaluations. Note that each algorithm requires a different number of sample evaluations per epoch; therefore, it is common to evaluate stochastic optimization algorithms against this number rather than against the number of iterations.

display_graph('grad_calc_count','cost', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});

That's it!


More plots

"demo_ext.m" gives you more plots.

  • Demonstration of "optimality gap"

For the calculation of the "optimality gap", you need the optimal solution w_opt beforehand, obtained by calling the calc_solution() function of the problem definition.

%% calculate optimal solution for optimality gap
w_opt = problem.calc_solution(1000);
options.f_opt = problem.cost(w_opt);

This internally uses the full gradient descent solver gd() to obtain an optimal solution with a maximum of 1000 iterations and a very tight stopping tolerance.
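
The optimality gap itself is simply f(w_k) - f(w_opt). A hedged sketch that computes it from the cost history (the cost field name is the same assumption as above):

% Optimality gap history: f(w_k) - f(w_opt)  (sketch; cost field name assumed)
optgap_sgd = info_sgd.cost - options.f_opt;
fprintf('final optimality gap (SGD): %.4e\n', optgap_sgd(end));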

Then, the optimality gap is plotted with display_graph().

display_graph('grad_calc_count','optimality_gap', {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {info_sgd, info_svrg});    
  • Demonstration of "classification accuracy"

Additionally, in this logistic regression case, the classification accuracies are calculated using the corresponding prediction() and accuracy() functions of the problem definition function logistic_regression(). Furthermore, the classification accuracies are illustrated by the display_classification_result() function, which is called in "demo.m" as below:

%% calculate classification accuracy
% for SGD
% predict
y_pred_sgd = problem.prediction(w_sgd);
% calculate accuracy
accuracy_sgd = problem.accuracy(y_pred_sgd); 
fprintf('Classification accuracy: %s: %.4f\n', 'SGD', accuracy_sgd);
% convert from {1,-1} to {1,2}
y_pred_sgd(y_pred_sgd==-1) = 2;
y_pred_sgd(y_pred_sgd==1) = 1; 

% for SVRG
% predict    
y_pred_svrg = problem.prediction(w_svrg);
% calculate accuracy
accuracy_svrg = problem.accuracy(y_pred_svrg); 
fprintf('Classification accuracy: %s: %.4f\n', 'SVRG', accuracy_svrg);
% convert from {1,-1} to {1,2}
y_pred_svrg(y_pred_svrg==-1) = 2;
y_pred_svrg(y_pred_svrg==1) = 1;


%% display classification results 
% convert from {1,-1} to {1,2}
data.y_train(data.y_train==-1) = 2;
data.y_train(data.y_train==1) = 1;
data.y_test(data.y_test==-1) = 2;
data.y_test(data.y_test==1) = 1;  
% display results
display_classification_result(problem, {'SGD', 'SVRG'}, {w_sgd, w_svrg}, {y_pred_sgd, y_pred_svrg}, {accuracy_sgd, accuracy_svrg}, data.x_train, data.y_train, data.x_test, data.y_test);    

Output results:



  • Demonstration of "convergence animation"

You need to specify additional options before executing the solvers.

%% set options for convergence animation
options.max_epoch = 100;    
options.store_w = true;

Then, draw_convergence_animation() draws a convergence animation. Note that draw_convergence_animation() is executable only when the dimension of the parameters is 2.

%% display convergence animation
draw_convergence_animation(problem, {'SGD', 'SVRG'}, {info_sgd.w, info_svrg.w}, options.max_epoch);   
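
Since the animation requires two-dimensional parameters, a complete run would regenerate the data with d = 2 before calling the solvers. Below is a sketch assembled only from the calls already shown in this README:

%% sketch: rerun the demo with d = 2 so that the animation can be drawn
d = 2;
n = 300;
data = logistic_regression_data_generator(n, d);
problem = logistic_regression(data.x_train, data.y_train, data.x_test, data.y_test);

options.w_init = data.w_init;
options.step_init = 0.01;
options.max_epoch = 100;
options.store_w = true;      % store the iterates so they can be animated

[w_sgd, info_sgd] = sgd(problem, options);
[w_svrg, info_svrg] = svrg(problem, options);

draw_convergence_animation(problem, {'SGD', 'SVRG'}, {info_sgd.w, info_svrg.w}, options.max_epoch);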

Example results of other problems

  • Linear regression problem

  • Softmax classifier problem

  • Linear SVM problem




Convergence behavior animation example (Linear regression problem)

"test_convergence_animation_demo.m" provides you an animation of convergence behaviors of algorithms. Please click the image below to see its animation.




License

  • The SGDLibrary is free and open source.
  • The code provided in the SGDLibrary should only be used for academic/research purposes.
  • The codes provided by original papers are included. (Big thanks !!!)
  • The codes ported from original python codes are included. (Big thanks !!!)
    • scr.m, cr_subsolver.m, subsamp_tr.m, tr_subsolver.m: the Python codes were originally created by J. M. Kohler and A. Lucchi. These MATLAB codes were ported with the original authors' generous help.
  • Third party files are included.
    • subsamp_newton.m: originally created by Peng Xu and Jiyan Yang in Subsampled-Newton. It has been modified to handle other problems such as linear regression.
    • Proximal Solver from FOM

Notes

  • As always, parameters such as the step size should be configured properly for each algorithm and each problem (see the short sketch after this list).
  • Softmax classification problem does not support "Hessian-vector product" type algorithms, i.e., SQN, SVRG-SQN and SVRG-LBFGS.
  • This SGDLibrary internally contains the GDLibrary.
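
As a concrete example, here is a sketch of sweeping the initial step size for SGD on the demo problem (step_init is the option used in demo.m; the cost field name is an assumption, as noted earlier):

%% sketch: sweep the initial step size for SGD on the demo problem
for step = [0.1, 0.01, 0.001]
    opts = options;               % reuse the options from the demo
    opts.step_init = step;
    [w_tmp, info_tmp] = sgd(problem, opts);
    fprintf('step_init = %g: final cost = %.4f\n', step, info_tmp.cost(end));
end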

Problems or questions

If you have any problems or questions, please contact the author: Hiroyuki Kasai (email: hiroyuki dot kasai at waseda dot jp)


Release Notes

  • Version 1.0.20 (Nov. 10, 2020)
    • Bug fixes, and some files are added.
  • Version 1.0.19 (Oct. 27, 2020)
    • Bug fixes, and some files are added.
  • Version 1.0.17 (Apr. 17, 2018)
    • Sub-sampled CR (including ARC) and Sub-sampled TR are newly added.
  • Version 1.0.16 (Apr. 01, 2018)
    • GNU Octave is supported.
    • Change the functions of problem into class-based definitions.
  • Version 1.0.12 (Sep. 29, 2017)
    • SARAH is newly added.
  • Version 1.0.11 (Sep. 28, 2017)
    • SGD-CM and SGD-CM-NAG are newly added.
  • Version 1.0.10 (Sep. 26, 2017)
    • Options parameter in solvers is re-organized.
    • Separate the function to store statistics information from the solver.
  • Version 1.0.9 (Sep. 25, 2017)
    • Proximal operator is newly added.
    • Some new problems are added.
    • User-defined stepsize algorithm is supported. See test_stepsize_alg_demo.m.
  • Version 1.0.8 (Mar. 28, 2017)
    • IQN (incremental Quasi-Newton method, iqn.m) is newly included.
  • Version 1.0.7 (Mar. 17, 2017)
    • Add some functions and modify items.
  • Version 1.0.6 (Mar. 13, 2017)
    • Add some functions and modify items. Sum quadratic problem is added.
  • Version 1.0.5 (Jan. 12, 2017)
    • Add some functions and modify items.
  • Version 1.0.4 (Nov. 04, 2016)
    • Integrate GDLibrary with SGDLibrary.
  • Version 1.0.3 (Nov. 04, 2016)
    • Modify many items.
  • Version 1.0.2 (Nov. 01, 2016)
    • SVRG-BB (SVRG with Barzilai-Borwein) is added.
  • Version 1.0.1 (Oct. 28, 2016)
    • SS-SVRG (Subsampled Hessian algorithm followed by SVT) is added.
    • Convergence behavior animation function is added.
  • Version 1.0.0 (Oct. 27, 2016)
    • Initial version.


sgdlibrary's Issues

test_l1_linear_regression.m Problem

Thanks for developing such a library. I am trying to learn SGD through the demos, and it helps me a lot. But when I try to run this demo (test_l1_linear_regression.m), the command line throws an error:
The method, attribute or field 'prox_flag' of class 'l1_linear_regression' is not recognized.

demo.m code problem

Thanks for developing such a library. When trying to use it, I found that when I change the values of "d" and "n" in the file "demo.m", there are some errors in the convergence results. For instance, if I choose "d = 105" and "n = 30000", the returned results are very strange, as shown below:

SVRG: Epoch = 000, cost = Inf, optgap = Inf
SVRG: Epoch = 001, cost = NaN, optgap = NaN

Is there a problem with the code's reusability?

linear_regression_data_generator.m

Hello,

@hiroyuki-kasai Thank you for creating this project with a wide variety of algorithms.

I went through the code in linear_regression_data_generator.m and am not quite clear on how the data is being generated. Also, on running the code, I find that all my rows have the same number. Can you explain how the dataset is generated for linear regression?

% set number of dimensions
d = 50;
% set number of samples
n = 7000;
% generate data
std = 0.25
data = linear_regression_data_generator(n, d, std);

Attached below are the generated data (x_train) and labels (y_train):

data.xlsx
label.xlsx

Thank you!

Binary 1s and 0s

Greetings Hiroyuki,

I'm so excited to have come across such a rich library in SGD; thank you for this great project.

I'm currently learning Python, MATLAB and machine learning for data training and prediction.

I have collected data as bitmap files (400 files, each containing ~1000 1s and 0s). I wanted to do data training and prediction on a single file first, and later on a couple of files (5 files, 10 files, 20 files... and so on). These bitmap files are in .txt (text file) format.

Kindly advise how I can use these bitmap files as input.

Lawrence.

damping scheme in obfgs

Dear Prof. Hiroyuki KASAI,

I have some confusion about the damping scheme in obfgs.
You gave a citation of Wang et al.'s "Stochastic Quasi-Newton Methods for Nonconvex Stochastic Optimization" when mentioning the damping, but I found that the code is more likely to be the classical Powell damping. In the limited-memory setting, Wang's damping only requires one two-loop recursion, but in your implementation you call it twice. In addition, the threshold value indicating when to carry out the damping is set to 0.2 in your code, which is the same as in Powell's method but different from Wang's. I also found a probable bug, if it is indeed Powell's damping scheme. In Line 144, the probably correct code is "lbfgs_two_loop_recursion(s, s_array, y_array)" (otherwise you may directly reuse the variable calculated before). Is this a typo, or is something wrong with my understanding?
Thanks!

Python porting questions

Greetings,

First, I would like to thank you for creating such a nice project with so many optimization algorithms!

I'm currently trying to port the online limited-memory BFGS algorithm to Python (yes, scipy has many algorithms coded in haskell, but it's lacking some that I would like to test, namely this one).

I've got most of it done but I'm confused about the gradient descent part using the weight on the input.

I'm not familiar with MATLAB, and I want to know: when calling .cost, .grad or anything else on the input (the variable problem), are these methods integrated in the problem code itself, or are they MATLAB built-ins?

P.S.:Thanks for all your hard work!

SGD Library

Hello, this library is really good. I am now learning machine learning algorithms; it helps me a lot, but I have some questions. When I was studying this library, I saw the data in the data files. I want to know what the data means, where it was collected, and what each dataset represents. Thank you very much.

The 'NAG' submode in sgd_cm.m

Hi @hiroyuki-kasai ,

Thanks for publishing this project. It looks great, and I would like to do some research with this toolbox.

My trouble is with sgd_cm.m. It looks like it contains two momentum schemes, the classic one ('CM') and Nesterov's ('NAG'), but in the current implementation they seem to differ only in the setting of the momentum coefficient. See lines 78, 80, and 82.

My impression is that NAG should 'look one step ahead' before the gradient calculation, but in the code the gradient is evaluated at the current point. This seems inconsistent with the original paper. See equations 3 and 4 in Ilya Sutskever, James Martens, George Dahl and Geoffrey Hinton, "On the importance of initialization and momentum in deep learning," ICML, 2013.

Thank you!

Experiment Mismatch

It seems to me that the experiment in demo_paper.m has a mismatch between the result and the setting.


Should the variable be info_sag instead? Also, I am wondering whether the same issue occurs in the paper, or whether the plots in the paper are actually correct.
