CohereForAI/c4ai-command-r-plus-4bit deployment fails on Inference Endpoint

h4gen commented on May 25, 2024

CohereForAI/c4ai-command-r-plus-4bit deployment fails on Inference Endpoint

from text-generation-inference.

Comments (2)

backroom-coder commented on May 25, 2024

same issue
2024-05-03T17:11:35.945462Z INFO text_generation_launcher: Unknown quantization method bitsandbytes

davhin commented on May 25, 2024

Fails the same way for me on a GCP VM with:

docker run --gpus all --shm-size 1g -p 8888:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:2.0 --model-id $model --speculate 3 --num-shard 2

File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 240, in serve
    asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 201, in serve_inner
    model = get_model(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 375, in get_model
    return FlashCohere(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_cohere.py", line 61, in __init__
    model = FlashCohereForCausalLM(config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 482, in __init__
    self.model = FlashCohereModel(config, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 420, in __init__
    [
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 421, in <listcomp>
    FlashCohereLayer(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 360, in __init__
    self.self_attn = FlashCohereAttention(
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 217, in __init__
    self.query_key_value = load_attention(config, prefix, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 140, in load_attention
    return _load_gqa(config, prefix, weights)
File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_cohere_modeling.py", line 167, in _load_gqa
    assert list(weight.shape) == [
AssertionError: [44040192, 1] != [8192, 12288]
rank=1
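
For anyone trying to read the assertion: here is a rough sanity check of the two shapes it reports. It assumes, and this is only an assumption on my part, that the `[44040192, 1]` tensor is a packed 4-bit weight storing two values per byte. The `numel` helper and the `WEIGHTS_PER_BYTE` constant are names made up for this sketch; they are not part of text-generation-inference or bitsandbytes.

```python
# Shapes copied from the AssertionError in the traceback above.
actual_shape = [44040192, 1]
expected_shape = [8192, 12288]


def numel(shape):
    """Return the total number of elements for a given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


# Assumption (not confirmed by the thread): the checkpoint stores
# two 4-bit weights in each byte of the flattened tensor.
WEIGHTS_PER_BYTE = 2

# Number of weights the packed tensor would represent once unpacked.
unpacked = numel(actual_shape) * WEIGHTS_PER_BYTE

# See whether that count divides evenly by the expected second dimension.
hidden = expected_shape[1]
rows, remainder = divmod(unpacked, hidden)

print("unpacked weights:", unpacked)
print("rows at width", hidden, ":", rows, "remainder", remainder)
print("expected weights:", numel(expected_shape))
```

Under that assumption the packed tensor unpacks to a whole number of rows at width 12288, which would be consistent with the loader receiving a still-packed tensor where it asserts a regular 2-D weight matrix. The row count still differs from the expected 8192, so this does not fully explain the failure on its own.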
