Comments (4)
cc @SunMarc
from transformers.
Hi @jojje, thanks for the detailed report! For the gguf tests, we are using the transformers-quantization-latest-gpu Dockerfile, since we are trying to run quantization-related tests. You will see that we indeed have pip install gguf in the Dockerfile. Could you try with that instead?
From the traceback, I see that the qwen2 test (tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_qwen2_q4_0) is not passing, and the reason is that qwen2 is not in GGUF_SUPPORTED_ARCHITECTURES, which is very strange. Could you try to debug from there? I'm not able to reproduce this, unfortunately.
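For anyone hitting the same failure, the diagnosis above can be confirmed by reproducing the gate in isolation. A minimal sketch, assuming a supported-architecture list that is simply missing qwen2; the list contents, function name, and error message below are illustrative stand-ins, not transformers' actual code:

```python
# Minimal sketch of the architecture gate that makes the qwen2 test fail.
# GGUF_SUPPORTED_ARCHITECTURES here is a hypothetical snapshot; the real
# list lives in transformers' ggml integration code.
GGUF_SUPPORTED_ARCHITECTURES = ["llama", "mistral"]  # missing "qwen2"

def check_architecture(architecture: str) -> None:
    # Raise the kind of error the test run surfaces when a GGUF file
    # declares an architecture the integration does not recognize.
    if architecture not in GGUF_SUPPORTED_ARCHITECTURES:
        raise ValueError(f"Architecture {architecture} not supported")

# "llama" passes the gate; "qwen2" does not, mirroring the reported failure.
check_architecture("llama")
try:
    check_architecture("qwen2")
except ValueError as err:
    print(err)
```

If the installed transformers revision really does include qwen2 in that list, the test should pass, which is why a stale or mismatched install is the first thing worth ruling out.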
In the current HEAD (72fb02c):
- The Qwen2 problem seems fixed.
- The Llama test still fails, but for a different reason, one that is trivially fixed: the generated tokens have changed from the expected "Hello, I am interested in [The Park]\nThe" to "Hello, I am new to this forum. I am".
So I consider the reported issue resolved.
The latest test results with more details are in the attached log, as before.
72fb02c47 Fixed `log messages` that are resulting in TypeError due to too many arguments (#32017)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.4.4, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/user/transformers/.hypothesis/examples'))
rootdir: /home/user/transformers
configfile: pyproject.toml
plugins: dash-2.17.1, rich-0.1.1, hypothesis-6.108.2, xdist-3.6.1, timeout-2.3.1
collecting ... collected 14 items
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_llama3_q4_0 FAILED [ 7%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_llama3_q4_0_tokenizer PASSED [ 14%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_mistral_q4_0 PASSED [ 21%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q2_k PASSED [ 28%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q2_k_serialization PASSED [ 35%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q3_k PASSED [ 42%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q4_0 PASSED [ 50%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q4_k_m PASSED [ 57%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q5_k PASSED [ 64%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q6_k PASSED [ 71%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q6_k_fp16 PASSED [ 78%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_q8_0 PASSED [ 85%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_qwen2_q4_0 PASSED [ 92%]
tests/quantization/ggml/test_ggml.py::GgufIntegrationTests::test_tokenization_xnli PASSED [100%]
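The trivial fix for the remaining Llama failure is just updating the test's expected completion string. A hedged sketch of that comparison; the constant and helper names are illustrative, not the actual test code, though both strings are taken from the output quoted above:

```python
# Sketch of the one-line test fix: the expected completion string is
# updated to match what the model currently generates.
OLD_EXPECTED_TEXT = "Hello, I am interested in [The Park]\nThe"
NEW_EXPECTED_TEXT = "Hello, I am new to this forum. I am"

def check_generation(generated_text: str, expected: str) -> bool:
    # The real test decodes the model output and asserts equality like this.
    return generated_text == expected

# The current model output matches the new string, not the old one.
print(check_generation("Hello, I am new to this forum. I am", NEW_EXPECTED_TEXT))
```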
There is one other problem I found that would need fixing: the latest version (0.9.1) of the gguf package on PyPI is broken. This makes me wonder which specific version you've baked into your transformers-quantization-latest-gpu/Dockerfile CI image. You're not pinning the gguf package, so whatever version was the latest when you built the CI container image is what's baked in.
I worked around the gguf version problem by installing the gguf package directly from the llama.cpp source tree. For some reason that results in a working gguf package, even for 0.9.1. So you may want to consider pinning all your Python packages going forward to avoid external dependency drift.
The exact package versions in my container are provided in the accompanying pytest log for completeness' sake, to ensure that the reason for our different test outcomes isn't different (unpinned) Python dependencies.
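A lightweight guard against that kind of drift is to assert the versions of critical dependencies at image-build or test-collection time. A minimal sketch using only the standard library; the pin list is illustrative, not the project's actual requirements:

```python
# Sketch: verify that installed packages match pinned versions.
from importlib.metadata import version, PackageNotFoundError

PINS = {"gguf": "0.9.0"}  # illustrative pin, not an actual recommendation

def check_pins(pins: dict) -> list:
    """Return human-readable descriptions of any pin mismatches."""
    problems = []
    for name, wanted in pins.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if have != wanted:
            problems.append(f"{name}: have {have}, want {wanted}")
    return problems

# An empty list means every pinned dependency is present at the pinned version.
print(check_pins(PINS))
```

Running such a check early in CI turns a silent dependency drift into an explicit, actionable failure message.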
PS. If you want me to send a PR with the fixed test, let me know. Otherwise, I trust you know how to fix it yourself.
Nice investigation on the gguf package issue! It will help me a lot in debugging this! As for the PR, I will fix it after the gguf package issue is solved. Thanks a lot again for the detailed report!