
mlbd2022fall-minibatch-sgd

Machine Learning & Big Data, 2022 Fall, Homework 1: mini-batch SGD

https://github.com/keyork/mlbd2022fall-minibatch-sgd

Usage

```
pip install numpy pandas matplotlib colorlog
python train.py -h
python train.py --args ...
```

Task

  • Use mini-batch gradient descent on the example from slides 31-33

  • Test performance with different batch sizes

Model

Four main parts: Dataloader, Linear Model, SGD, Backtracking Line Search

Dataloader

Implemented as a Python iterator: all data are randomly shuffled, then batch_size samples are yielded at a time.
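A minimal sketch of such a loader, assuming illustrative class and argument names rather than the repo's actual ones:

```python
import numpy as np

class DataLoader:
    """Shuffles all data, then yields (X, y) mini-batches of batch_size samples."""

    def __init__(self, X, y, batch_size):
        self.X, self.y, self.batch_size = X, y, batch_size

    def __iter__(self):
        idx = np.random.permutation(len(self.X))          # randomly rearrange all data
        for start in range(0, len(idx), self.batch_size):
            take = idx[start:start + self.batch_size]
            yield self.X[take], self.y[take]              # batch_size samples per step
```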

Linear Model

Uses vectorized NumPy array operations directly instead of an explicit Python loop; the parameters are also a single np.array, $\beta$.
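For example, one vectorized product predicts a whole batch at once (the leading column of ones for the intercept is an assumed data layout, not confirmed by the repo):

```python
import numpy as np

def predict(X, beta):
    # f(x) = X @ beta: one vectorized product, no explicit Python loop.
    return X @ beta

X_raw = np.array([[2.0, 3.0, 4.0],
                  [5.0, 6.0, 7.0]])                 # illustrative samples
X = np.hstack([np.ones((len(X_raw), 1)), X_raw])    # prepend intercept column
beta = np.zeros(X.shape[1])                         # beta is a single np.array
print(predict(X, beta))
```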

SGD

$$ \beta \leftarrow \beta - \mathrm{lr} \cdot \frac{1}{\mathrm{batch\_size}} \sum_{i=1}^{\mathrm{batch\_size}} \big(f(x_i) - y_i\big)\, x_i $$
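A sketch of this averaged update in NumPy, under the same assumed data layout as above:

```python
import numpy as np

def sgd_step(beta, X_batch, y_batch, lr):
    # beta <- beta - lr * mean over the batch of (f(x_i) - y_i) * x_i
    residual = X_batch @ beta - y_batch          # f(x) - y, shape (batch_size,)
    grad = X_batch.T @ residual / len(y_batch)   # averaged gradient
    return beta - lr * grad
```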

Backtracking Line Search

These are the two Wolfe conditions (sufficient decrease and curvature) on the step size, written here for the descent direction $-\nabla loss(x)$:

$$ loss\big(x - lr\cdot \nabla loss(x)\big) \leq loss(x) - c_1\cdot lr\cdot \lVert \nabla loss(x) \rVert^2 $$

$$ \nabla loss\big(x - lr\cdot \nabla loss(x)\big)^{\top}\, \nabla loss(x) \leq c_2\, \lVert \nabla loss(x) \rVert^2 $$

$$ 0 \lt c_1 \lt c_2 \lt 1 $$

Starting from an initial value, search downward (shrinking the learning rate) until the sufficient-decrease condition holds, giving a temporary $lr_{max}$; then search upward to find the true $lr_{max}$ (in case the initial value was below it); then search downward again to find $lr_{min}$ from the curvature condition.

Set $lr = \sqrt{lr_{min} \cdot lr_{max}}$, the geometric mean of the two bounds.
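One possible reading of this procedure, sketched with a multiplicative step factor; c1, c2, factor, and the function names are illustrative, and no termination safeguards are included:

```python
import numpy as np

def sufficient_decrease(loss, grad, x, lr, c1=0.1):
    # First condition: the loss must drop enough along -grad(x).
    g = grad(x)
    return loss(x - lr * g) <= loss(x) - c1 * lr * (g @ g)

def curvature(loss, grad, x, lr, c2=0.9):
    # Second condition: the slope along the step must have flattened enough.
    g = grad(x)
    return grad(x - lr * g) @ g <= c2 * (g @ g)

def search_lr(loss, grad, x, lr0=1.0, factor=2.0):
    lr_max = lr0
    while not sufficient_decrease(loss, grad, x, lr_max):
        lr_max /= factor                            # shrink to a temporary lr_max
    while sufficient_decrease(loss, grad, x, lr_max * factor):
        lr_max *= factor                            # grow to the true lr_max
    lr_min = lr_max
    while curvature(loss, grad, x, lr_min / factor):
        lr_min /= factor                            # shrink to lr_min
    return np.sqrt(lr_min * lr_max)                 # geometric mean of the bounds

# Toy check on loss(x) = x @ x, whose gradient is 2x:
print(search_lr(lambda x: x @ x, lambda x: 2 * x, np.array([1.0])))
```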

Experiments

Loss Curve & Batch Size

Set iteration = {1, 3, 20, 50}, use backtracking line search to set the learning rate, set batch_size = {1, 10, 50, 100, 500, 1000, 4000}, and record the result and the loss curve for every combination.

(Figures: loss curves comparing batch sizes for iter = 1, 3, 20, 50, with backtracking line search, lr = 0.02.)
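As a rough, self-contained stand-in for this sweep (synthetic data and a fixed lr = 0.02 here replace the slides' example and the line search; run is a hypothetical helper, not the repo's API):

```python
import numpy as np

def run(n_iter, batch_size, lr=0.02, seed=0):
    # Synthetic stand-in data (4000 samples, 3 features plus intercept);
    # the homework trains on the example from the slides instead.
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((4000, 1)), rng.normal(size=(4000, 3))])
    y = X @ np.array([87.3, 8.87, 0.42, -1.79]) + rng.normal(size=4000)
    beta, losses = np.zeros(4), []
    for _ in range(n_iter):
        for take in np.array_split(rng.permutation(4000), 4000 // batch_size):
            beta -= lr * X[take].T @ (X[take] @ beta - y[take]) / len(take)
            losses.append(np.mean((X @ beta - y) ** 2))   # one loss-curve point
    return losses

for n_iter in (1, 3, 20, 50):
    for bs in (1, 10, 50, 100, 500, 1000, 4000):
        print(n_iter, bs, run(n_iter, bs)[-1])            # final loss per setting
```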

Loss Curve & Backtracking Line Search

Repeat the experiments without backtracking line search.

(Figures: loss curves comparing batch sizes for iter = 1, 3, 20, 50, without backtracking line search, lr = 0.02.)

Ablation Experiment

Two ablations: remove backtracking line search; remove mini-batching (use the full batch).

(Figures: batch_size = 50, iter = 1, with vs. without backtracking line search at lr = 0.001; and batch-size comparison at iter = 1 with backtracking line search, lr = 0.001.)

Analysis

With all other settings fixed, a larger batch size makes the model converge more slowly, since each pass over the data yields fewer parameter updates. Backtracking line search keeps the learning rate in an appropriate range, which avoids divergence and lets the model converge quickly.

Result

We take the run with {iter = 50, batch_size = 50, backtracking line search = True} as a representative good outcome: $\beta = [87.31551772,\ 8.87405893,\ 0.4220265,\ -1.78599689]$

$$ y=87.3+8.87x_1+0.42x_2-1.79x_3 $$
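For example, evaluating the fitted model at an illustrative input $(x_1, x_2, x_3) = (2, 3, 4)$:

```python
import numpy as np

beta = np.array([87.31551772, 8.87405893, 0.4220265, -1.78599689])
x = np.array([1.0, 2.0, 3.0, 4.0])  # [1, x1, x2, x3]; values are illustrative
print(x @ beta)                     # 87.32 + 8.87*2 + 0.42*3 - 1.79*4 ≈ 99.19
```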
