python : 3.7 cuda : 11.1 pytorch : 1.8 I am trying to compil

cuda does not install about extension-cpp HOT 17 OPEN

dorooddorood606 commented on August 11, 2024 4

cuda does not install

from extension-cpp.

Comments (17)

apatsekin commented on August 11, 2024 33

If you are building in an NVIDIA Docker container without an actual GPU, you can use something like this:

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi
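
For build scripts written in Python, the same lookup can be sketched as below. This is only an illustration of the shell snippet above: the names `ARCH_LISTS`, `parse_cuda_release` and `arch_list_for` are made up for this example, and the table simply copies the values from the snippet.

```python
import re

# Same table as the shell snippet: CUDA release prefix -> TORCH_CUDA_ARCH_LIST.
# Order matters: "11.0" has to be tested before the broader "11." prefix.
ARCH_LISTS = [
    ("9.0", "3.5;5.0;6.0;7.0+PTX"),
    ("9.2", "3.5;5.0;6.0;6.1;7.0+PTX"),
    ("10.", "3.5;5.0;6.0;6.1;7.0;7.5+PTX"),
    ("11.0", "3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"),
    ("11.", "3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"),
]


def parse_cuda_release(nvcc_output):
    """Extract the 'X.Y' release number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def arch_list_for(cuda_version):
    """Return the arch list for a CUDA version string such as '11.1'."""
    for prefix, arch_list in ARCH_LISTS:
        if cuda_version.startswith(prefix):
            return arch_list
    raise ValueError("unsupported cuda version: " + cuda_version)
```

The result can then be placed in `os.environ["TORCH_CUDA_ARCH_LIST"]` before the extension build is started.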


gaetan-landreau commented on August 11, 2024 24

I tried to investigate this issue a bit since I've faced the same problem in one of my Docker containers.

If you're currently running your code through a setup.py, you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to the command you run:

TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install

(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile, for instance)

Additional information can be found here: https://pytorch.org/docs/stable/cpp_extension.html
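
If the build is launched from another Python script rather than a shell, the variable can be passed through the child process environment. A minimal sketch, assuming a hypothetical helper named `build_env` (not part of PyTorch) and a placeholder value of "6.1+PTX":

```python
import os


def build_env(arch_list, base_env=None):
    """Copy an environment and set TORCH_CUDA_ARCH_LIST if it is not set yet."""
    env = dict(os.environ if base_env is None else base_env)
    # setdefault keeps a value the user already exported in their shell.
    env.setdefault("TORCH_CUDA_ARCH_LIST", arch_list)
    return env


# Example use (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "install"],
#                  env=build_env("6.1+PTX"), check=True)
```

Using `setdefault` is a design choice for this sketch: an explicitly exported value wins over the one hard-coded in the script.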


MalteEbner commented on August 11, 2024 18

The solution that worked for me on Linux:
The docker requires access to the cuda library during build time. To ensure this, make sure that
your /etc/docker/daemon.json file looks as follows:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

If not, you need to change it and then restart docker with

sudo systemctl restart docker
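
To check an existing daemon.json before editing it, something like the following could be used. It is only a sketch: `uses_nvidia_default_runtime` is a hypothetical helper name, and it tests nothing beyond the two keys shown in the example above.

```python
import json


def uses_nvidia_default_runtime(daemon_json_text):
    """Return True if the config registers an 'nvidia' runtime as the default."""
    config = json.loads(daemon_json_text)
    return (
        config.get("default-runtime") == "nvidia"
        and "nvidia" in config.get("runtimes", {})
    )


# Example use (not executed here):
#   with open("/etc/docker/daemon.json") as f:
#       print(uses_nvidia_default_runtime(f.read()))
```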


ClementPinard commented on August 11, 2024 12

Hello, for anyone visiting this issue, the problem is caused here: https://github.com/pytorch/pytorch/blob/master/torch/utils/cpp_extension.py#L1694

Basically, the arch_list is supposed to be built from the architectures discovered with torch.cuda.get_device_capability(i).

The thing is, when no CUDA card is detected, torch.cuda.device_count() returns 0 and thus no architecture is added to that list.

This leads to the last line, which essentially says "add '+PTX' to the name of the last architecture", which obviously fails when the arch_list is empty.

As such, this problem essentially occurs because no CUDA hardware was found by torch. Possible reasons and solutions:

Driver / CUDA mismatch, probably due to a CUDA update. Reboot and the driver will be updated.
Docker context. See comments above ( #71 (comment) )

If there is no way to detect the GPU at build time, but you know what architecture it should run on, you can explicitly set it with the environment variable, as said in this comment ( #71 (comment) )
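
The failure mode described above can be reproduced with a simplified model. This is not the PyTorch source, only a sketch of the logic as explained in this comment; `build_arch_flags` is a made-up name, and the list of capabilities stands in for what torch.cuda.get_device_capability(i) would return for each detected device.

```python
def build_arch_flags(capabilities):
    """Simplified model of arch auto-detection (not the real PyTorch code)."""
    arch_list = []
    for major, minor in capabilities:
        arch = f"{major}.{minor}"
        if arch not in arch_list:
            arch_list.append(arch)
    # With zero detected devices the list is empty, so indexing the
    # last element raises IndexError.
    arch_list[-1] += "+PTX"
    return arch_list
```

With one detected device of capability (6, 1) the result is ["6.1+PTX"]; with no devices the last line raises IndexError, which matches the explanation above.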


earor-R commented on August 11, 2024 9

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

If the gpu driver is loaded correctly, execute the following statement in the python console

>>> torch.cuda.get_device_capability(0)
(6, 1)

That means TORCH_CUDA_ARCH_LIST="6.1". However, in most cases CUDA is unavailable because the GPU was specified incorrectly. For example, have you set CUDA_VISIBLE_DEVICES to a GPU that is not available?
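
Turning that tuple into the string expected by TORCH_CUDA_ARCH_LIST can be sketched as follows. `capability_to_arch` is a hypothetical helper, not a PyTorch function; it only formats a (major, minor) pair.

```python
def capability_to_arch(capability, ptx=True):
    """Format a (major, minor) tuple as an arch string such as '6.1+PTX'."""
    major, minor = capability
    arch = f"{major}.{minor}"
    return arch + "+PTX" if ptx else arch


# Example use on a machine where the GPU is visible (not executed here):
#   import torch
#   print(capability_to_arch(torch.cuda.get_device_capability(0)))
```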


kuzand commented on August 11, 2024 5

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.

If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:

TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install

(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )

Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?


Leask commented on August 11, 2024 5

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi

I updated this workaround to support CUDA v12:

CUDA_VERSION=$(/usr/local/cuda/bin/nvcc --version | sed -n 's/^.*release \([0-9]\+\.[0-9]\+\).*$/\1/p')
if [[ ${CUDA_VERSION} == 9.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;7.0+PTX"
elif [[ ${CUDA_VERSION} == 9.2* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0+PTX"
elif [[ ${CUDA_VERSION} == 10.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5+PTX"
elif [[ ${CUDA_VERSION} == 11.0* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0+PTX"
elif [[ ${CUDA_VERSION} == 11.* ]]; then
    export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX"
elif [[ ${CUDA_VERSION} == 12.* ]]; then
    export TORCH_CUDA_ARCH_LIST="5.0;5.2;5.3;6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6;8.7;8.9;9.0+PTX"
else
    echo "unsupported cuda version."
    exit 1
fi


darkdevahm commented on August 11, 2024 4

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

Have you solved this issue?


oliver-batchelor commented on August 11, 2024 2

It came to my attention last night when I was trying to compile for 1.8.2 - and I realized this was because torch.cuda.is_available() was False. Once I fixed my cuda this compile error was also gone.

…

On Mon, Oct 18, 2021 at 10:12 AM Ahmed Ahmed wrote:
Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda.
Which cuda and pytorch version did you use?


imzeroan commented on August 11, 2024 1

I solved this by running:

TORCH_CUDA_ARCH_LIST="6.1+PTX" python setup.py install

for my GTX 1080 Ti. The compute capability 6.1 is the value listed for the 1080 Ti at https://developer.nvidia.com/cuda-gpus


gaetan-landreau commented on August 11, 2024

You should find everything you need on this link (go to section CUDA-Enabled NVIDIA Quadro and NVIDIA RTX)


oliver-batchelor commented on August 11, 2024

Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda.


darkdevahm commented on August 11, 2024

Is torch.cuda.is_available() False? I have had this only when I try to compile with a broken install of pytorch or cuda.

Which cuda and pytorch version did you use?


andriworld commented on August 11, 2024

I had the same error running in WSL on Windows. The above solutions of setting the TORCH_CUDA_ARCH_LIST environment variable fixed the issue.


XiangFeng66 commented on August 11, 2024

how to solve this problem on windows platform @gaetan-landreau @ClementPinard


mnpenner commented on August 11, 2024

I got cuda working inside of docker on Windows 10 thanks to the instructions here and a little help from ChatGPT.

The issue is as @earor-R said, you can figure out the TORCH_CUDA_ARCH_LIST but the GPU still isn't available during docker build. You can, however, make it available during docker run by adding --gpus=all.

So you can set up half the Dockerfile automated like

FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

WORKDIR /srv

RUN apt update && apt install -y curl build-essential git

RUN curl -sL "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" > /tmp/miniconda.sh

RUN bash /tmp/miniconda.sh -b -p /opt/miniconda

ENV PATH="/opt/miniconda/bin:$PATH"

RUN pip install torch torchvision torchaudio

RUN git clone https://github.com/oobabooga/text-generation-webui .

RUN mkdir /srv/repositories
RUN cd /srv/repositories && git clone https://github.com/oobabooga/GPTQ-for-LLaMa.git -b cuda

Then build it:

docker build . -t oobabooga --progress=plain

Then run it, give the container a name, add --gpus all, and don't add --rm:

docker run --gpus all -it --name temp-container oobabooga /bin/bash

Then once inside you can get the GPU's compute capability like @earor-R said and finish the install:

python -c 'import torch; print(".".join(map(str, torch.cuda.get_device_capability(0))))'
export TORCH_CUDA_ARCH_LIST="8.6+PTX"
cd /srv/repositories/GPTQ-for-LLaMa && python setup_cuda.py install

Then exit the container and commit it back into an image:

 docker commit temp-container oobabooga-run

And then finally you can run it:

docker run -it --gpus=all --rm -p 7860:7860 --mount "type=bind,src=$(wslpath -w text-generation-webui/models),dst=/srv/models,readonly" oobabooga-run python server.py --auto-devices --chat --model=gpt4-x-alpaca-13b-native-4bit-128g --wbits=4 --groupsize=128 --gpu-memory=18 --listen

I wish I could automate the build easier so this is maintainable but that's the best I've got right now.


alexmeri98 commented on August 11, 2024

Tried to investigate a bit this issue since I've faced the same problem in one of my Docker container.
If you're currently running your code through a setup.py , you should first add TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" to run:
TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" python setup.py install
(or an ARG TORCH_CUDA_ARCH_LIST="YOUR_GPUs_CC+PTX" in your Dockerfile for instance )
Additional infos. can be found here: https://pytorch.org/docs/stable/cpp_extension.html

How to find the "YOUR_GPUs_CC+PTX" of my gpu?

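You can use the following script to list CUDA architectures:
import torch
print(torch.cuda.get_arch_list())

You will get something like ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86'] and you will have to convert this into "3.7 5.0 6.0 7.0 7.5 8.0 8.6+PTX". Note that this is the list of architectures your PyTorch build was compiled for, which is not necessarily the architecture of your own GPU.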
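
The conversion can be sketched like this. `arch_names_to_list` is a hypothetical helper, and it assumes plain names of the form "sm_" followed by digits, where the last digit is the minor version.

```python
def arch_names_to_list(names):
    """Convert names like 'sm_86' into a TORCH_CUDA_ARCH_LIST string."""
    if not names:
        raise ValueError("no architectures given")
    versions = []
    for name in names:
        digits = name.split("_", 1)[1]  # "sm_86" -> "86"
        # Last digit is the minor version, the rest is the major version.
        versions.append(f"{digits[:-1]}.{digits[-1]}")
    return " ".join(versions) + "+PTX"


archs = ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
print(arch_names_to_list(archs))  # 3.7 5.0 6.0 7.0 7.5 8.0 8.6+PTX
```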
