Comments (6)
It needs some major refactoring on the hqq lib side to support serialization with transformers. This is because it's not possible to store the meta-data directly as a safetensor. Since hqq supports different backends and various settings, that meta-data becomes cumbersome. Currently, I am trying to simplify things to make the refactoring easier. For the moment, you can use the approach shared above.
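For context, a safetensors file can only store tensors plus a flat string-to-string metadata header, which is why nested HQQ meta-data (backend, bit-width, group size, and so on) cannot be dumped into it directly. A minimal sketch of that constraint; the meta-data keys below are illustrative, not the actual HQQ layout:

import torch
from safetensors.torch import save_file

# Quantized weights are tensors, so they fit the safetensors format just fine
tensors = {"W_q": torch.zeros(128, 128, dtype=torch.uint8)}

# Illustrative HQQ-style meta-data (keys made up for this example)
meta = {"nbits": 4, "group_size": 64, "axis": 1, "backend": "pytorch"}

# safetensors only accepts a flat Dict[str, str] as metadata, so anything richer
# has to be flattened and stringified by hand before it can ride along in the file
save_file(tensors, "layer.safetensors", metadata={k: str(v) for k, v in meta.items()})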
Got it. Thanks.
Hi @mxjmtxrm, this is indeed the case. You cannot serialize the model when using the transformers integration because the serialization logic was not compatible with the transformers code. However, you can use the hqq library directly if you want to save and reload a quantized model: https://github.com/mobiusml/hqq cc @mobicham
@mxjmtxrm save/load was initially included in the pull request, but we decided to remove it because the logic was too different from the rest of the transformers models. For the moment, you can do something like this:
from hqq.models.hf.base import AutoHQQHFModel

# Save the quantized model
AutoHQQHFModel.save_quantized(model, save_path)

# Load it back
model = AutoHQQHFModel.from_quantized(save_path)
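For reference, a rough end-to-end sketch of that workaround, assuming the model was quantized on the fly through the transformers integration with HqqConfig (the model id and quantization settings below are placeholders):

from transformers import AutoModelForCausalLM, HqqConfig
from hqq.models.hf.base import AutoHQQHFModel

model_id = "meta-llama/Llama-2-7b-hf"             # placeholder model
quant_config = HqqConfig(nbits=4, group_size=64)  # example settings

# Quantize while loading via the transformers integration
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="cuda",
    quantization_config=quant_config,
)

# model.save_pretrained() is not supported for HQQ, so save/reload with the hqq lib instead
AutoHQQHFModel.save_quantized(model, "llama2-7b-hqq-4bit")
model = AutoHQQHFModel.from_quantized("llama2-7b-hqq-4bit")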
I'm wondering why is_serializable is True for AWQ/EETQ and the other quantizers but not for HQQ. Can it be set to True directly in the HQQ quantizer?
AWQ quantizer:
@property
def is_serializable(self):
    # AWQ through auto-awq has been always serializable, except if the model is fused.
    if self.quantization_config.do_fuse:
        logger.warning("You cannot save an AWQ model that uses fused modules!")
        return False
    if self.quantization_config.version == AWQLinearVersion.EXLLAMA:
        logger.warning("You cannot save an AWQ model that uses Exllama backend!")
        return False
    return True
We set it to True when it is indeed possible to save the model with the transformers logic. EETQ and AWQ models can indeed be serialized, but this is not the case for HQQ, hence we set it to False.
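So the HQQ quantizer presumably just mirrors the AWQ pattern above with a hard False until the hqq-side refactoring lands; roughly along these lines (a sketch, not the exact transformers source):

@property
def is_serializable(self):
    # HQQ meta-data (backend and per-layer settings) cannot be stored as safetensors yet,
    # so saving through the transformers logic is disabled for now
    logger.warning("You cannot save an HQQ-quantized model with transformers yet; use the hqq library instead.")
    return False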
Related Issues (20)
- Off-by-one error in strided perplexity calculation
- RuntimeError: unique_by_key: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
- Autotokenizer."from_pretrained" read wrong config file. not "tokenizer_config.json", but "config.json"
- ViTLayer.forward() needs to be in "eager" mode when `output_attentions=True`
- Fix for hardcoded `final_labels` to enable loss calculation in PaliGemma
- Sentence Transformers Gets Stuck loading
- Paligemma causal attention still not causal ?
- Add Nomic Embed Code to Transformers
- loss calculation for PaliGemmaForConditionalGeneration potentially not cast to correct device
- Trainer should throw a warning if max_sequence_length < number of tokens in dataset sample record.
- Missing "config.json" when loading Llama-2-7b-chat-hf
- OSError due to huggingface-hub FutureWarning about resume_download
- Is there a source code installation method available? For example: from Test_transformers import AutoModelForCausalLM, AutoTokenizer
- Mistral compile not working on T4 GPU (torch 2.3 + cu121)
- model.generate() able to accept past_key_values=None
- RuntimeError: unable to open file when calling from_pretrained on multiple processes after upgrading hugginface_hub to 0.23.1
- Models with Phi3Config fail due to missing attention_bias
- Arm-based (M1, M2, M3) Mac Install Issues for `pip install transformers` (<4.23.0)
- `SeamlessM4Tv2ConformerEncoder` does not behaves as expected if gradient checkpointing is enabled
- Llama tokenizer inconsistency for the newline character for convert_tokens_to_ids