Benchmarks on 2xA6000 Ada vs 2xA100 80GB (roughly same speed) [h2ogpt issue, closed]

h2oai commented on July 17, 2024

Benchmarks on 2xA6000 Ada vs 2xA100 80GB (roughly same speed)

from h2ogpt.

Comments (5)

arnocandel commented on July 17, 2024

2x A100 80GB (fluidstack.io):

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES="0,1" torchrun --nproc_per_node=2 --nnodes=1 finetune.py --data_path=ShareGPT_unfiltered_cleaned_split.json.generate_human_bot.train_plain.json --num_epochs=1 --base_model=togethercomputer/GPT-NeoXT-Chat-Base-20B --prompt_type=plain --data_mix_in_path=None --micro_batch_size=4 --batch_size=16 --cutoff_len=1024 --run_id=4

Traceback (most recent call last):
  File "/home/fsuser/h2o-llm.clean/finetune.py", line 874, in <module>
    fire.Fire(train)
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/fsuser/h2o-llm.clean/finetune.py", line 234, in train
    model = model_loader.from_pretrained(
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2736, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3064, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 700, in _load_state_dict_into_meta_model
    set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 76, in set_module_8bit_tensor_to_device
    new_value = value.to(device)
  File "/home/fsuser/miniconda3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

Need to debug the conda env that comes pre-shipped on their machines.

arnocandel commented on July 17, 2024

1 GPU A100 80GB

CUDA_VISIBLE_DEVICES=0 python finetune.py --data_path=ShareGPT_unfiltered_cleaned_split.json.generate_human_bot.train_plain.json --num_epochs=1 --base_model=togethercomputer/GPT-NeoXT-Chat-Base-20B --prompt_type=plain --data_mix_in_path=None --micro_batch_size=4 --batch_size=16 --cutoff_len=1024 --run_id=4
0%| | 2/5254 [01:45<77:04:12, 52.83s/it]

arnocandel commented on July 17, 2024

The failure above was with CUDA 11.8:

>>> import torch
>>> torch.cuda.is_available()
/home/fsuser/miniconda3/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

Installing CUDA 12.1:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
sudo apt-get install libcudnn8 libcudnn8-dev libcudnn8-samples
pip uninstall bitsandbytes
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=121 make cuda12x
CUDA_VERSION=121 python setup.py install
cd ..

Now I get this:

/home/fsuser/miniconda3/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0

So rebooting.

arnocandel commented on July 17, 2024

Tue Apr  4 22:47:01 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB           On | 00000000:05:00.0 Off |                    0 |
| N/A   21C    P0               49W / 400W|      0MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB           On | 00000000:06:00.0 Off |                   On |
| N/A   20C    P0               50W / 400W|      0MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG|
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  No MIG devices found                                                                 |
+---------------------------------------------------------------------------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Still the same after the reboot.
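One detail in the `nvidia-smi` output above is that GPU 1 reports MIG mode `Enabled` while no MIG devices exist. The thread does not establish whether this is related to the failure, so the following is only a minimal sketch of how that state could be checked programmatically. It assumes `nvidia-smi` supports the `--query-gpu` fields `index`, `name`, `mig.mode.current` and `driver_version`; the helper names and the sample rows (transcribed from the table above) are hypothetical.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in CSV column order.
QUERY = "index,name,mig.mode.current,driver_version"


def parse_gpu_query(text):
    """Parse rows from `nvidia-smi --query-gpu=... --format=csv,noheader`."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, mig_mode, driver = [f.strip() for f in line.split(",")]
        gpus.append(
            {"index": int(index), "name": name, "mig_mode": mig_mode, "driver": driver}
        )
    return gpus


def gpus_with_mig_enabled(gpus):
    """Return the indices of GPUs whose current MIG mode is 'Enabled'."""
    return [g["index"] for g in gpus if g["mig_mode"] == "Enabled"]


def query_gpus():
    """Query the local machine; returns [] when nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_query(out)


# Hypothetical sample rows mirroring the two GPUs in the table above.
SAMPLE = """0, NVIDIA A100-SXM4-80GB, Disabled, 530.30.02
1, NVIDIA A100-SXM4-80GB, Enabled, 530.30.02"""

print(gpus_with_mig_enabled(parse_gpu_query(SAMPLE)))
```

On a real machine, `gpus_with_mig_enabled(query_gpus())` would list the GPUs to look at first; GPUs without MIG support report a placeholder value instead of `Enabled` or `Disabled`, which this sketch simply ignores.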

arnocandel commented on July 17, 2024

8x A100 80GB

Commit 538113d
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 WORLD_SIZE=8 torchrun --nproc_per_node=8 --nnodes=1 finetune.py --data_path=alpaca_data_cleaned.json --run_id=1
1%|▍ | 16/2433 [03:20<8:11:43, 12.21s/it]

2x A6000 Ada 48GB

Commit 538113d
CUDA_VISIBLE_DEVICES=0,1 WORLD_SIZE=2 torchrun --nproc_per_node=2 --nnodes=1 finetune.py --data_path=alpaca_data_cleaned.json --run_id=1
0%| | 2/2433 [01:05<22:18:07, 33.03s/it]
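To put the two progress lines above side by side, here is a rough sketch that parses the tqdm output and derives seconds per iteration, GPU-seconds per iteration, and a projected total runtime. It assumes each iteration processes the same global batch on both machines, which the identical step count (2433) suggests but the thread does not state. The 2x A6000 Ada figure comes from only 2 iterations, so treat it as noisy. The function and variable names are hypothetical.

```python
import re

# Matches the tail of a tqdm line, e.g. "16/2433 [03:20<8:11:43, 12.21s/it]".
TQDM_RE = re.compile(r"(\d+)/(\d+) \[[^<]+<[^,]+,\s*([\d.]+)s/it\]")


def parse_progress(line):
    """Return (steps_done, steps_total, seconds_per_iteration)."""
    match = TQDM_RE.search(line)
    if match is None:
        raise ValueError(f"not a recognized tqdm progress line: {line!r}")
    done, total, sec_per_it = match.groups()
    return int(done), int(total), float(sec_per_it)


# GPU count and progress line for each run, copied from the comment above.
RUNS = {
    "8x A100 80GB": (8, "1%|▍ | 16/2433 [03:20<8:11:43, 12.21s/it]"),
    "2x A6000 Ada 48GB": (2, "0%| | 2/2433 [01:05<22:18:07, 33.03s/it]"),
}


def summarize(runs):
    """Compute per-run timing figures from the parsed progress lines."""
    rows = {}
    for name, (n_gpus, line) in runs.items():
        _, total, sec_per_it = parse_progress(line)
        rows[name] = {
            "sec_per_it": sec_per_it,
            # Cost of one step summed over all GPUs in the run.
            "gpu_sec_per_it": n_gpus * sec_per_it,
            # Projected wall-clock time for the full run at this rate.
            "est_hours": total * sec_per_it / 3600,
        }
    return rows


for name, row in summarize(RUNS).items():
    print(
        f"{name}: {row['sec_per_it']:.2f} s/it, "
        f"{row['gpu_sec_per_it']:.2f} GPU-s/it, "
        f"~{row['est_hours']:.1f} h total"
    )
```

GPU-seconds per iteration is used here because the two runs differ in GPU count, so wall-clock seconds per iteration alone does not compare the cards themselves.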
