I am seeing the following failures around bitsandbytes with

Issues Running Quantization on A10, about huggingface/text-generation-inference

Comments (20)

OlivierDehaene commented on May 12, 2024

This might have been an issue with the docker image.
Can you try with the latest one? ghcr.io/huggingface/text-generation-inference:sha-6837b2e

from text-generation-inference.

sam-h-bean commented on May 12, 2024

yeah one sec

sam-h-bean commented on May 12, 2024

No dice, @OlivierDehaene. It seems something has changed, since I was able to get past this point yesterday.

OlivierDehaene commented on May 12, 2024

I was able to run the following command on an A10-powered VM:

docker run --gpus "device=0" -p 8080:80 -v $PWD/data:/data  ghcr.io/huggingface/text-generation-inference:sha-6837b2e --model-id decapoda-research/llama-7b-hf --quantize

Are you sure you don't have another issue with your setup?

sam-h-bean commented on May 12, 2024

I'm not sure. I still just see:

{"timestamp":"2023-04-19T20:17:40.488749Z","level":"ERROR","fields":{"message":"Shard 0 failed to start:\n/opt/conda/lib/python3.9/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.\n  warn(\"The installed version of bitsandbytes was compiled without GPU support. \"\nTraceback (most recent call last):\n\n  File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n    sys.exit(app())\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py\", line 58, in serve\n    server.serve(model_id, revision, sharded, quantize, uds_path)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 135, in serve\n    asyncio.run(serve_inner(model_id, revision, sharded, quantize))\n\n  File \"/opt/conda/lib/python3.9/asyncio/runners.py\", line 44, in run\n    return loop.run_until_complete(main)\n\n  File \"/opt/conda/lib/python3.9/asyncio/base_events.py\", line 647, in run_until_complete\n    return future.result()\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py\", line 104, in serve_inner\n    model = get_model(model_id, revision, sharded, quantize)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py\", line 133, in get_model\n    return llama_cls(model_id, revision, quantize=quantize)\n\n  File \"/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py\", line 306, in __init__\n    raise ValueError(\"quantization is not available on CPU\")\n\nValueError: quantization is not available on CPU\n\n"},"target":"text_generation_launcher"}

However, I'm using bitsandbytes on a different pod on the same node and AMI.

sam-h-bean commented on May 12, 2024

Do you have any idea what else would be horribly wrong here?

I went back to 0.5.0 and am seeing:

{"timestamp":"2023-04-19T20:30:38.758543Z","level":"INFO","fields":{"message":"Args { model_id: \"decapoda-research/llama-7b-hf\", revision: None, sharded: None, num_shard: Some(1), quantize: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: 32, max_waiting_tokens: 20, port: 6018, shard_uds_path: \"/tmp/text-generation-server\", master_addr: \"localhost\", master_port: 29500, huggingface_hub_cache: Some(\"/data\"), weights_cache_override: None, disable_custom_kernels: false, json_output: true, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None }"},"target":"text_generation_launcher"}
{"timestamp":"2023-04-19T20:30:38.758781Z","level":"INFO","fields":{"message":"Starting shard 0"},"target":"text_generation_launcher"}
{"timestamp":"2023-04-19T20:30:40.794214Z","level":"ERROR","fields":{"message":"\"Error when initializing model\nTraceback (most recent call last):\n  File \\\"/opt/miniconda/envs/text-generation/bin/text-generation-server\\\", line 8, in <module>\n    sys.exit(app())\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/main.py\\\", line 311, in __call__\n    return get_command(self)(*args, **kwargs)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py\\\", line 1130, in __call__\n    return self.main(*args, **kwargs)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/core.py\\\", line 778, in main\n    return _main(\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/core.py\\\", line 216, in _main\n    rv = self.invoke(ctx)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py\\\", line 1657, in invoke\n    return _process_result(sub_ctx.command.invoke(sub_ctx))\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py\\\", line 1404, in invoke\n    return ctx.invoke(self.callback, **ctx.params)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/click/core.py\\\", line 760, in invoke\n    return __callback(*args, **kwargs)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/typer/main.py\\\", line 683, in wrapper\n    return callback(**use_params)  # type: ignore\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/cli.py\\\", line 55, in serve\n    server.serve(model_id, revision, sharded, quantize, uds_path)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/server.py\\\", line 135, in serve\n    asyncio.run(serve_inner(model_id, revision, sharded, quantize))\n  File 
\\\"/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/runners.py\\\", line 44, in run\n    return loop.run_until_complete(main)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py\\\", line 634, in run_until_complete\n    self.run_forever()\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py\\\", line 601, in run_forever\n    self._run_once()\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/base_events.py\\\", line 1905, in _run_once\n    handle._run()\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/asyncio/events.py\\\", line 80, in _run\n    self._context.run(self._callback, *self._args)\n> File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/server.py\\\", line 104, in serve_inner\n    model = get_model(model_id, revision, sharded, quantize)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/models/__init__.py\\\", line 112, in get_model\n    return llama_cls(model_id, revision, quantize=quantize)\n  File \\\"/opt/miniconda/envs/text-generation/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py\\\", line 39, in __init__\n    raise NotImplementedError(\\\"FlashLlama does not support quantization\\\")\nNotImplementedError: FlashLlama does not support quantization\n\""},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}

Which means that we got past the point where the failure above happens, @OlivierDehaene.

Let me know if there is anything I can check to help here!

darth-veitcher commented on May 12, 2024

The installed version of bitsandbytes was compiled without GPU support

Looks like a CUDA issue based on that error.

sam-h-bean commented on May 12, 2024

@darth-veitcher I agree, but when I check the pod it has a GPU allocated to it from K8s, and I am running bitsandbytes in a different pod on the same node just fine.

I put more details in this issue: #197

OlivierDehaene commented on May 12, 2024

Can you override the entrypoint with /bin/bash and args with sleep 10000, shell into the pod and:

run nvidia-smi
run python:

import torch 

print(torch.cuda.is_available())

import bitsandbytes
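
The manual checks above can also be collected in one script. This is only a sketch: the function name `gpu_diagnostics` and the report keys are hypothetical, and it treats a missing torch installation as "unknown" rather than as an error.

```python
import importlib.util
import os
import shutil


def gpu_diagnostics():
    """Collect the facts the manual steps look at (hypothetical helper)."""
    report = {
        # Is the nvidia-smi binary visible inside the container?
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # None means the variable is unset, which is not necessarily an error.
        "ld_library_path": os.environ.get("LD_LIBRARY_PATH"),
    }
    if importlib.util.find_spec("torch") is None:
        # torch is not installed in this environment, so nothing to report.
        report["torch_cuda_available"] = None
    else:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

Running it inside the pod prints one line per check, which is easier to paste into an issue than an interactive session.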

sam-h-bean commented on May 12, 2024

@OlivierDehaene as requested

# nvidia-smi 
Fri Apr 21 14:53:21 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   26C    P8    16W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# python 
Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
False
>>> import bitsandbytes

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Loading binary /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...

So it seems that the CUDA version is N/A? I have never seen that one before.

On my other pod, I ran the same command and see:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   32C    P0    58W / 300W |  19098MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

This pod is using a base image of nvcr.io/nvidia/pytorch:22.12-py3 and an AMI of ami-021248e8228eba010 which are both pretty basic...
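
The two nvidia-smi headers above report the same driver and differ only in the CUDA Version field. A small sketch for pulling those two fields out of a header line, so that pods can be compared in a script, could look like this. The function name `parse_smi_header` is hypothetical, and the regular expression only assumes the header layout shown in this thread.

```python
import re

# Matches the header layout printed in this thread; other nvidia-smi
# versions may format the line differently (assumption).
HEADER_RE = re.compile(r"Driver Version:\s*(\S+)\s+CUDA Version:\s*(\S+)")


def parse_smi_header(line):
    """Return (driver_version, cuda_version); cuda_version is None for N/A."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    driver, cuda = match.groups()
    return driver, (None if cuda == "N/A" else cuda)


failing = "| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: N/A      |"
working = "| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.8     |"

print(parse_smi_header(failing))  # ('470.161.03', None)
print(parse_smi_header(working))  # ('470.161.03', '11.8')
```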

darth-veitcher commented on May 12, 2024

It is likely that bitsandbytes cannot find the linked CUDA libraries when it tries to resolve them. The output of python -m bitsandbytes usually contains some useful information, but I imagine the use of Conda complicates things.

I would try running the above command to get the more verbose error messages it generates.

sam-h-bean commented on May 12, 2024

@darth-veitcher as requested

# python -m bitsandbytes

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Loading binary /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
/opt/conda/lib/libicudata.so
/opt/conda/lib/libomptarget.rtl.cuda.nextgen.so
/opt/conda/lib/libomptarget.rtl.cuda.so
/opt/conda/lib/python3.9/site-packages/torch/lib/libc10_cuda.so
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so
/opt/conda/lib/python3.9/site-packages/torch/lib/libtorch_cuda_linalg.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda102_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda110_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda111.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda111_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda112.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda113.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda115.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda115_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda120.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda120_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121_nocublaslt.so
/opt/conda/lib/python3.9/site-packages/flash_attn_cuda.cpython-39-x86_64-linux-gnu.so
/opt/conda/pkgs/icu-72.1-hcb278e6_0/lib/libicudata.so
/opt/conda/pkgs/llvm-openmp-16.0.1-h417c0b6_0/lib/libomptarget.rtl.cuda.nextgen.so
/opt/conda/pkgs/llvm-openmp-16.0.1-h417c0b6_0/lib/libomptarget.rtl.cuda.so
/opt/conda/pkgs/pytorch-2.0.0-py3.9_cuda11.8_cudnn8.7.0_0/lib/python3.9/site-packages/torch/lib/libc10_cuda.so
/opt/conda/pkgs/pytorch-2.0.0-py3.9_cuda11.8_cudnn8.7.0_0/lib/python3.9/site-packages/torch/lib/libtorch_cuda.so
/opt/conda/pkgs/pytorch-2.0.0-py3.9_cuda11.8_cudnn8.7.0_0/lib/python3.9/site-packages/torch/lib/libtorch_cuda_linalg.so

++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++


+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++
/usr/src/transformers/build/lib.linux-x86_64-cpython-39/transformers/models/bloom/custom_kernels/fused_bloom_attention_cuda.cpython-39-x86_64-linux-gnu.so
/usr/src/transformers/build/lib.linux-x86_64-cpython-39/transformers/models/gpt_neox/custom_kernels/fused_attention_cuda.cpython-39-x86_64-linux-gnu.so
/usr/src/transformers/src/transformers/models/bloom/custom_kernels/fused_bloom_attention_cuda.cpython-39-x86_64-linux-gnu.so
/usr/src/transformers/src/transformers/models/gpt_neox/custom_kernels/fused_attention_cuda.cpython-39-x86_64-linux-gnu.so

++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/__main__.py", line 95, in <module>
    generate_bug_report_information()
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/__main__.py", line 66, in generate_bug_report_information
    lib_path = os.environ['LD_LIBRARY_PATH'].strip()
  File "/opt/conda/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'LD_LIBRARY_PATH'
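
The traceback ends in a KeyError because the bug-report script indexes os.environ directly and the variable is not set in this container. A minimal sketch of the difference between indexing and a lookup with a default is shown below; `read_ld_library_path` is a hypothetical helper, not part of bitsandbytes.

```python
import os


def read_ld_library_path(environ=os.environ):
    """Return LD_LIBRARY_PATH entries, or an empty list if it is unset."""
    raw = environ.get("LD_LIBRARY_PATH", "").strip()
    return [entry for entry in raw.split(":") if entry]


# Indexing a mapping that lacks the key raises the same kind of error
# as the one in the traceback.
try:
    {}["LD_LIBRARY_PATH"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'LD_LIBRARY_PATH'

print(read_ld_library_path({}))  # []
print(read_ld_library_path({"LD_LIBRARY_PATH": "/usr/lib:/opt/conda/lib"}))
# ['/usr/lib', '/opt/conda/lib']
```

Note that this only explains why the report script crashed. It does not by itself show that an unset LD_LIBRARY_PATH is why the GPU is not detected.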

darth-veitcher commented on May 12, 2024

Is this running on a Windows host by any chance? I came across a similar edge case the other day where I was running containers on WSL. I needed to add the following command.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib/wsl/lib

Your solution will likely need to be quite similar given the last segment of the error message. It can't find this environment variable.

sam-h-bean commented on May 12, 2024

It is a Linux machine. I linked the AMI above. It's a standard Amazon Deep Learning machine image so I imagine this will be a common issue.

OlivierDehaene commented on May 12, 2024

From what you reported it seems that you have a configuration issue with your node (torch does not detect the GPU):

Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
False <- torch does not seem to detect the GPU

Your driver seems really old. Maybe it is the root cause?

darth-veitcher commented on May 12, 2024

I believe this is the issue:

++++++++++++++++++ LD_LIBRARY CUDA PATHS +++++++++++++++++++
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/__main__.py", line 95, in <module>
    generate_bug_report_information()
  File "/opt/conda/lib/python3.9/site-packages/bitsandbytes/__main__.py", line 66, in generate_bug_report_information
    lib_path = os.environ['LD_LIBRARY_PATH'].strip()
  File "/opt/conda/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'LD_LIBRARY_PATH'

It's looking for LD_LIBRARY_PATH in the environment and can't find it. You usually get an output from the tool that says something along the lines of the below.

CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: CUDA runtime path found: /home/user/miniconda3/envs/bits/lib/libcudart.so
CUDA SETUP: Loading binary /home/user/miniconda3/envs/bits/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
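
Solution 2a in the output above searches the filesystem with find. Where a Python prompt is the only tool at hand, a rough equivalent could be sketched as follows. The helper `find_library` and the search roots are assumptions, and unlike `find -name libcuda.so` it matches by prefix, so versioned names such as libcuda.so.1 are reported too.

```python
import os


def find_library(name, roots):
    """Yield paths under the given roots whose file name starts with name."""
    for root in roots:
        # os.walk silently yields nothing for roots that do not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if filename.startswith(name):
                    yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # These search roots are assumptions; adjust them for your image.
    roots = ["/usr/lib", "/usr/local", "/opt/conda/lib"]
    for path in find_library("libcuda.so", roots):
        print(path)
```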

A solution with Conda that might work is something along these lines:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/lib/
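
When LD_LIBRARY_PATH is unset, as it is in the traceback above, that export expands to a value with an empty leading entry (":/opt/conda/lib/"). A small sketch of building the value without the empty entry is shown below. `append_library_dir` is a hypothetical helper, and assigning the result to os.environ only affects the current Python process and the children it starts.

```python
def append_library_dir(new_dir, current=None):
    """Return an LD_LIBRARY_PATH value with new_dir appended at most once."""
    # Drop empty entries so an unset or blank variable adds no stray colon.
    entries = [entry for entry in (current or "").split(":") if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return ":".join(entries)


print(append_library_dir("/opt/conda/lib/"))              # /opt/conda/lib/
print(append_library_dir("/opt/conda/lib/", "/usr/lib"))  # /usr/lib:/opt/conda/lib/
```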

I was fighting this issue the other day on a Windows host. The above process resolved it for me, followed by uninstalling bitsandbytes and reinstalling it via python -m pip install git+https://github.com/TimDettmers/bitsandbytes.git.

The issue thread bitsandbytes#112 was helpful.

OlivierDehaene commented on May 12, 2024

I don't think it's related. The container works fine on our production environments with and without quantization. If the issue was with LD_LIBRARY_PATH, it simply would not work.
Also, in conda envs, bitsandbytes uses the CONDA_PREFIX env var, which is correctly set in the docker image.

OlivierDehaene commented on May 12, 2024

I started a g5.12xlarge just to make sure.
Here is what I get:

ubuntu:~$ docker run --gpus all -it --entrypoint /bin/bash ghcr.io/huggingface/text-generation-inference:0.6.0
root@d47b97b21497:/usr/src# nvidia-smi
Fri Apr 21 19:37:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1B.0 Off |                    0 |
|  0%   27C    P8     9W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A10G         On   | 00000000:00:1C.0 Off |                    0 |
|  0%   26C    P8     8W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A10G         On   | 00000000:00:1D.0 Off |                    0 |
|  0%   26C    P8     9W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   27C    P8    10W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@d47b97b21497:/usr/src# python
Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> import bitsandbytes

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /opt/conda/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /opt/conda/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
>>>

As you can see, torch.cuda.is_available() AND bitsandbytes work.

I'm almost 100% positive this is a driver issue.
Either your driver is too old, or you are missing one of datacenter-gpu-manager or cuda-drivers-fabricmanager.
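
The quickest signal in the two sessions in this thread is which binary bitsandbytes loads: libbitsandbytes_cpu.so on the failing pod and libbitsandbytes_cuda118.so here. A sketch for extracting that from captured output is shown below. `loaded_backend` is a hypothetical helper, and the pattern only assumes the "CUDA SETUP: Loading binary" line format shown in this thread.

```python
import re

# Line format taken from the bitsandbytes output quoted in this thread;
# other bitsandbytes versions may log differently (assumption).
LOADING_RE = re.compile(r"CUDA SETUP: Loading binary \S*/libbitsandbytes_(\w+)\.so")


def loaded_backend(log_text):
    """Return the suffix of the loaded bitsandbytes binary, or None."""
    match = LOADING_RE.search(log_text)
    return match.group(1) if match else None


failing_log = (
    "CUDA SETUP: Loading binary /opt/conda/lib/python3.9/site-packages/"
    "bitsandbytes/libbitsandbytes_cpu.so..."
)
working_log = (
    "CUDA SETUP: Loading binary /opt/conda/lib/python3.9/site-packages/"
    "bitsandbytes/libbitsandbytes_cuda118.so..."
)

print(loaded_backend(failing_log))
print(loaded_backend(working_log))
```

A result of cpu means the GPU build was not selected, which matches the "compiled without GPU support" warning earlier in the thread.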

sam-h-bean commented on May 12, 2024

@OlivierDehaene what AMI did you use?

OlivierDehaene commented on May 12, 2024

amazon-eks-gpu-node-1.25-v20230217 (ami-02eaf06c708b0fad1) works, for example.

Issues Running Quantization on A10 about text-generation-inference HOT 20 CLOSED
