Comments (9)

MayDomine commented on August 22, 2024

Please make sure that you have tried pip install bmtrain --no-cache-dir.

from cpm-bee.

MayDomine commented on August 22, 2024

To ensure that the CUDA version used to compile your Torch C++ plugin matches the runtime version of your current CUDA Toolkit, you can use the following Python command:

import torch
print(torch.version.cuda)

This command will print the CUDA version that was used to compile PyTorch. Please ensure that this version matches the version of your installed CUDA Toolkit.

In addition, please note that PyTorch versions 2.0.0 and above are not yet supported, so your installed PyTorch version must be below 2.0.0. You can check the PyTorch version with the following Python command:

import torch
print(torch.__version__)

If your PyTorch version is not compatible, please downgrade PyTorch to a compatible version using pip or conda, depending on how you initially installed PyTorch.
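
The two checks above can be combined into one small script. This is only a sketch under stated assumptions: the helper names (leading_ints, parse_nvcc_release, check_environment) are made up for illustration and are not part of bmtrain or PyTorch, and the comparison looks only at the major.minor CUDA release printed by nvcc --version.

```python
import re


def leading_ints(version):
    """Turn '1.13.1+cu117' or '11.7' into a tuple of its leading integers."""
    core = version.split("+", 1)[0]  # drop a local suffix such as '+cu117'
    parts = []
    for piece in core.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    return tuple(parts)


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' text of nvcc output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_environment(torch_version, torch_cuda, nvcc_output):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: the thread states that PyTorch 2.0.0 and above are not supported.
    version = leading_ints(torch_version)
    if version and version[0] >= 2:
        problems.append(f"PyTorch {torch_version} is 2.0.0 or newer")

    # Check 2: the CUDA version PyTorch was built with should match the toolkit.
    toolkit = parse_nvcc_release(nvcc_output)
    if torch_cuda is None:
        problems.append("this PyTorch build reports no CUDA version")
    elif toolkit is None:
        problems.append("no 'release X.Y' text found in the nvcc output")
    elif leading_ints(torch_cuda)[:2] != toolkit:
        problems.append(
            f"PyTorch was built with CUDA {torch_cuda}, "
            f"but nvcc reports {toolkit[0]}.{toolkit[1]}"
        )
    return problems
```

On a real machine, torch_version would be torch.__version__, torch_cuda would be torch.version.cuda, and nvcc_output the text printed by nvcc --version. An empty result only means these two checks passed, not that the installation is otherwise correct.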

yyq commented on August 22, 2024

I tried downgrading to torch.version.cuda=11.7 and torch.__version__=1.13.1+cu117, but I still get the same error.

MayDomine commented on August 22, 2024

torch.version.cuda=11.7 and torch.__version__=1.13.1+cu117 only mean that the CUDA version used to compile torch is 11.7. You also need to make sure that the installed CUDA Toolkit version matches the version used to compile torch.
You can use nvcc --version to check the CUDA Toolkit version. nvidia-smi also prints a CUDA version, but that is the highest version the driver supports and can differ from the installed toolkit.

MathamPollard commented on August 22, 2024

cuda version: 11.3
torch version: 1.12.1
print(torch.version.cuda): 11.3
print(torch.cuda.is_available()): True
!python -c "import torch;print(torch.cuda.nccl.version())" returns (2, 10, 3)

still the same error
#26

diaojunxian commented on August 22, 2024

@MayDomine Hi, my server environment also produces this error.

torch == 1.13.1+cu117

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

LLMChild commented on August 22, 2024

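I have tested this environment and it does not produce the error. Please check the CUDA runtime path, whether pip used its cache during installation, whether the local NCCL version conflicts, and so on.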

diaojunxian commented on August 22, 2024

python -c "import torch;print(torch.cuda.nccl.version())"
Running this returns: (2, 14, 3)

locate nccl| grep "libnccl.so" | tail -n1 | sed -r 's/^.*\.so\.//'
Running this returns: 2
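
The shell pipeline above keeps whatever follows .so. in the last libnccl.so path that locate finds. A rough Python equivalent, which also compares that suffix with the tuple from torch.cuda.nccl.version(), might look like the sketch below. The function names and the example paths are hypothetical, and unlike the sed command it skips paths that carry no version suffix.

```python
import re


def nccl_so_suffixes(paths):
    """Return the version suffix after '.so.' for each libnccl shared-object path."""
    suffixes = []
    for path in paths:
        match = re.search(r"libnccl\.so\.(\d+(?:\.\d+)*)$", path)
        if match:
            suffixes.append(match.group(1))
    return suffixes


def suffix_matches(suffix, torch_nccl):
    """Check that a suffix such as '2' or '2.14.3' is a prefix of torch's NCCL tuple."""
    parts = tuple(int(p) for p in suffix.split("."))
    return tuple(torch_nccl[: len(parts)]) == parts


# Hypothetical paths, for illustration only; real ones depend on the system.
example_paths = [
    "/usr/lib/x86_64-linux-gnu/libnccl.so",
    "/usr/lib/x86_64-linux-gnu/libnccl.so.2",
]
found = nccl_so_suffixes(example_paths)
```

A suffix of 2 is only the major version, so it is consistent with (2, 14, 3) but cannot by itself rule out a conflict between several installed NCCL builds.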

When I train with transformers, I see:

CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/.conda/envs/3.9/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...

@Fword4u Hello, this is the environment I checked on my side. I really cannot see where the configuration conflicts.

diaojunxian commented on August 22, 2024

pip install bmtrain --no-cache-dir

After I ran pip install bmtrain --no-cache-dir, the error no longer occurs. I would like to know why.
