
Comments (10)

Godofnothing commented on August 18, 2024

@18140663659 this is a research repository for benchmarking and evaluating the efficacy of the pruning method.
We could, in principle, add a demo Colab (for smaller models) with generations from the sparse model.


18140663659 commented on August 18, 2024

@18140663659 this is a research repository for benchmarking and evaluating the efficacy of the pruning method.
We could, in principle, add a demo Colab (for smaller models) with generations from the sparse model.

Thank you for your reply. If you can add this demo Colab, it would be very helpful for me!
I would like to ask: if the weights are only set to 0 and the storage format is not changed, the model size should not decrease. Do you recommend any tools that support sparse inference in deployment? For example, DeepSparse?


Godofnothing commented on August 18, 2024

@18140663659 OK, I'll try to add a demo. Vanilla PyTorch cannot exploit the sparsity; as you said, the memory footprint and compute stay the same.

DeepSparse is a great tool for model compression and acceleration on CPU. In a recent blog post they claim to show some speedups with the OPT-2.7b model.
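For illustration, a minimal sketch of running an already-exported sparse model through DeepSparse's pipeline API. The model path is a placeholder, and the task name and output format are assumptions that depend on the installed deepsparse version:

```python
from deepsparse import Pipeline

# Placeholder path: assumes the pruned model was already exported to ONNX.
# The "text-generation" task is only available in recent deepsparse releases.
pipe = Pipeline.create(task="text-generation", model_path="./opt-2.7b-50sparse")
print(pipe("The quick brown fox"))
```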


18140663659 commented on August 18, 2024

@18140663659 OK, I'll try to add a demo. Vanilla PyTorch cannot exploit the sparsity; as you said, the memory footprint and compute stay the same.

DeepSparse is a great tool for model compression and acceleration on CPU. In a recent blog post they claim to show some speedups with the OPT-2.7b model.

Thank you for your answer. I would also like to ask: if I want to save the SparseGPT pruned-and-quantized model with its size actually reduced (e.g. 14 GB (7B) -> ~7 GB (7B + 50% sparse)) and run inference on it, what should I do? Is there a recommended toolchain for this?


Godofnothing commented on August 18, 2024

@18140663659 I've added a demo with a use case.
Concerning saving the SparseGPT model: we do not provide an option for saving the pruned + quantized model.
For quantization, one can use the code from the GPTQ repository.
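(For reference, the GPTQ repository exposes entry points analogous to this repo's; an invocation along the lines of python opt.py facebook/opt-1.3b c4 --wbits 4 --save opt-1.3b-4bit.pt should produce a quantized checkpoint, though the exact flags there may differ by version.)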


efrantar commented on August 18, 2024

See also my comment here for references to some other libraries for actually exploiting sparse models in practice.


xiao1228 commented on August 18, 2024

Hi @Godofnothing, I ran SparseGPT and saved the sparse model, then quantized the sparse model with GPTQ, and afterwards all the sparsity was gone. Is there another way of doing it? Thank you!


Godofnothing commented on August 18, 2024

Hi, @xiao1228. Note that when GPTQ quantizes a weight, it updates the remaining (not-yet-quantized) weights along the input dimension of the same row to compensate for the quantization error.
Unless you explicitly prevent the pruned weights from changing (the GPTQ implementation is not aware of the sparse weights), they can be overwritten by these updates. I would propose two solutions to prevent such an outcome:

  • You can merge the SparseGPT and GPTQ implementations and prune a fraction of the weights (say 50%) in the inner loop via SparseGPT, then process the remaining weights via GPTQ.
  • You can run the SparseGPT procedure first, save the masks (the locations of the zero weights), and then explicitly impose the sparsity in GPTQ (prevent these weights from being updated); a minimal sketch of this follows below.
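A sketch of the second option, assuming a simplified (unblocked) GPTQ column loop. W, Hinv, mask, and quantize are placeholders rather than names from either codebase; the real implementation works block-wise with a Cholesky factor of the inverse Hessian:

```python
import torch

def gptq_quantize_masked(W, Hinv, mask, quantize):
    # W        : (rows, cols) weight matrix, already zeroed by SparseGPT
    # Hinv     : (cols, cols) inverse Hessian used for error compensation
    # mask     : (rows, cols) bool, True where SparseGPT pruned a weight
    # quantize : callable rounding a column of weights to the quantization grid
    rows, cols = W.shape
    for j in range(cols):
        w = W[:, j].clone()
        q = quantize(w)
        q[mask[:, j]] = 0.0  # pruned weights quantize to exact zero
        err = (w - q) / Hinv[j, j]
        W[:, j] = q
        if j + 1 < cols:
            # propagate the quantization error to not-yet-quantized columns
            W[:, j + 1:] -= err.unsqueeze(1) * Hinv[j, j + 1:].unsqueeze(0)
            # re-impose the mask so compensation never revives pruned weights
            W[:, j + 1:][mask[:, j + 1:]] = 0.0
    return W
```

Re-imposing the mask after each update step simply discards any compensation that lands on pruned coordinates, which is why merging the two loops (the first option) can preserve accuracy better.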


efrantar commented on August 18, 2024

Note that sparse + quant, as discussed in the paper, is actually implemented in this repository as well (see gptq.py). You can test it via the --wbits option of opt.py. However, there is currently no code for exporting or running such a sparse + quantized model in compressed form (only in simulated sparse + quantized mode via FP16 weights).
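For example, mirroring the llama.py command quoted later in this thread, something like python opt.py facebook/opt-125m c4 --sparsity 0.5 --wbits 4 should run joint 50% pruning + 4-bit quantization in this simulated mode (exact defaults may differ).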


xiao1228 commented on August 18, 2024

Thank you @efrantar, I tried that option but didn't get a very good PPL for 50% sparse + 4-bit:

  • for GPTQ 4-bit on its own, PPL is 5.78 on wikitext2 (baseline 5.68)
  • for SparseGPT 50% sparse on its own, PPL is 7.21 on wikitext2

For 50% sparse + 4-bit, PPL is 14.54 on wikitext2 after using python llama.py ./llama-7b/ c4 --sparsity 0.5 --wbits 4 --save ./llama_pth_7B_50sparse_4bits
The saved model is then in simulated quantized mode via FP16 weights, with 50% sparsity in it, right?

Following option 2 suggested by @Godofnothing, I generated a 50% sparse model with SparseGPT and then tried to apply the mask in this function in GPTQ-for-LLaMa and run GPTQ: https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/quant/quantizer.py#L28

Before evaluation I did a check for sparsity: there is sparsity in the layers, but not every layer is 50% sparse, so maybe there are some other operations inside GPTQ that I missed.
However, when I try to export the model, it calls llama_pack (https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/llama.py#L265), and after loading the packed model back, all the zeros are gone...
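For reference, a sparsity check of this kind can be as simple as the sketch below (assuming a standard PyTorch model with nn.Linear layers; packed quantized layers would need their weights dequantized first):

```python
import torch

def report_layer_sparsity(model):
    # Fraction of exactly-zero weights per linear layer, e.g. to verify
    # that the 50% SparseGPT mask survived quantization and packing.
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.data
            print(f"{name}: {(w == 0).float().mean().item():.1%} zeros")
```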

Thank you for the help!

