SLTrain

A repository containing a beta implementation of SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining. The preprint is available at http://arxiv.org/abs/2406.02214.

Modeling for pretraining

The main idea is to reparameterize each linear layer with low-rank and sparse factors for improved parameter and memory efficiency:

W = BA + S,

where B, A model the low-rank component and S models the sparse component. S has a random sparsity pattern.
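The reparameterization above can be sketched in a few lines of NumPy. This is only an illustrative, dense sketch and not the repository's implementation (which builds C++ extensions for the sparse part). The function names `init_factors` and `compose_weight`, the initialization scales, and the choice to sample the support once and keep it fixed are assumptions made for the example.

```python
import numpy as np

def init_factors(d_out, d_in, rank, sp_ratio, seed=0):
    """Sample low-rank factors B, A and a sparse component with a random support.

    Initialization scales are illustrative assumptions, not taken from the repository.
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d_out, rank)) / np.sqrt(rank)
    A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
    # Number of nonzeros in S: a fraction sp_ratio of all d_out * d_in entries.
    nnz = int(round(sp_ratio * d_out * d_in))
    # Random support, stored as flat indices (assumed sampled once, without replacement).
    idx = rng.choice(d_out * d_in, size=nnz, replace=False)
    vals = 0.01 * rng.standard_normal(nnz)
    return B, A, idx, vals

def compose_weight(B, A, idx, vals):
    """Form W = BA + S, where S is given by (flat indices, values)."""
    d_out, d_in = B.shape[0], A.shape[1]
    W = (B @ A).reshape(-1)   # flat view of the low-rank product
    W[idx] += vals            # add the sparse entries on their support
    return W.reshape(d_out, d_in)

B, A, idx, vals = init_factors(d_out=64, d_in=32, rank=4, sp_ratio=0.03)
W = compose_weight(B, A, idx, vals)
x = np.ones(32)
y = W @ x  # forward pass of the reparameterized linear layer
```

Only B, A, and the values of S would be trainable in this picture; the indices of S are stored but not learned.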

Motivation

Below, we show how the learned weights L + S enlarge the spectrum, where L = BA denotes the low-rank component. In particular, the L component primarily learns the head of the singular value spectrum, and the S component primarily learns the tail.

Contribution of L and S components in the singular values of learned W

Zoomed view

Results

Result Comparisons

SLTrain Memory

Installation

Build the C++ extensions via:

cd ./sparse-lora
pip install .

Usage

Run the scripts placed in scripts/llm_pretrain/. Typical usage:

# --sp_ratio sets the sparsity ratio (delta) of the sparse component S
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --lora_alpha 32
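
To get a rough feel for what `--rank 128` and `--sp_ratio 0.03` mean for parameter count, the arithmetic for a single linear layer can be sketched as follows. The 512 x 512 layer shape is a hypothetical choice for illustration (it is not read from `configs/llama_60m.json`), and `sltrain_param_count` is a helper made up for this example. The count ignores the integer indices needed to store the sparsity pattern.

```python
def sltrain_param_count(d_out, d_in, rank, sp_ratio):
    """Trainable values in W = BA + S for one d_out x d_in linear layer."""
    low_rank = rank * (d_out + d_in)          # entries of B (d_out x rank) and A (rank x d_in)
    sparse = round(sp_ratio * d_out * d_in)   # nonzero values of S
    return low_rank, sparse

d_out = d_in = 512  # hypothetical layer shape, chosen only for illustration
low_rank, sparse = sltrain_param_count(d_out, d_in, rank=128, sp_ratio=0.03)
dense = d_out * d_in  # parameters of the ordinary full-rank layer

print(f"dense: {dense}, low-rank: {low_rank}, sparse: {sparse}")
print(f"fraction of dense parameters: {(low_rank + sparse) / dense:.2f}")
```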

Citation

@article{han2024sltrain,
  title={{SLTrain}: a sparse plus low-rank approach for parameter and memory efficient pretraining},
  author={Han, Andi and Li, Jiaxiang and Huang, Wei and Hong, Mingyi and Takeda, Akiko and Jawanpuria, Pratik and Mishra, Bamdev},
  journal={arXiv preprint arXiv:2406.02214},
  year={2024}
}
