
Comments (8)

chenruipu commented on September 28, 2024

I tried to fine-tune on another platform without CUDA or a GPU. However, there is another error like this:

dnabert2-cpu/lib/python3.8/site-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 314, in <module>
    train()
  File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 227, in train
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/hf_argparser.py", line 346, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 117, in __init__
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/training_args.py", line 1337, in __post_init__
    raise ValueError(
ValueError: FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices.


Zhihan1996 commented on September 28, 2024

If you want to fine-tune the model on CPU, please remove the --fp16 flag; it only applies to GPUs. We have never tested model fine-tuning on CPUs, so please share more here if you run into other types of errors.
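
For reference, a minimal sketch of what a CPU launch could look like (the paths and hyperparameters here are placeholders; everything except dropping --fp16 follows the repo's usual example arguments):

export DATA_PATH=./sample_data   # placeholder; point this at your own data folder
export MAX_LENGTH=128
export LR=3e-5

# Same train.py invocation as the GPU example, but without --fp16, so no AMP is requested
python train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ${DATA_PATH} \
    --kmer -1 \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 4 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --output_dir output/dnabert2_cpu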


chenruipu commented on September 28, 2024
  warnings.warn("Can't initialize NVML")
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
WARNING:root:Perform single sequence classification...
<__main__.SupervisedDataset object at 0x7fc2e0271d00>
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
Some weights of the model checkpoint at /data5/chenruipu/data/wangchao/model/DNABERT-2-117M_model were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /data5/chenruipu/data/wangchao/model/DNABERT-2-117M_model and are newly initialized: ['classifier.weight', 'classifier.bias', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
***** Running training *****
  Num examples = 46,499
  Num Epochs = 5
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 58,125
  Number of trainable parameters = 117,069,313
  0%|                                                                                                                                              | 0/58125 [00:00<?, ?it/s]/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py:433: UserWarning: Increasing alibi size from 512 to 1501
  warnings.warn(
Traceback (most recent call last):
  File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 314, in <module>
    train()
  File "/data5/chenruipu/software/DNABERT_2-main/finetune/train.py", line 296, in train
    trainer.train()
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 1940, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/transformers/trainer.py", line 2767, in compute_loss
    outputs = model(**inputs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 859, in forward
    outputs = self.bert(
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 609, in forward
    encoder_outputs = self.encoder(
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 447, in forward
    hidden_states = layer_module(hidden_states,
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 328, in forward
    attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 241, in forward
    self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/bert_layers.py", line 182, in forward
    attention = flash_attn_qkvpacked_func(qkv, bias)
  File "/data5/chenruipu/miniconda3/envs/dnabert2-cpu/lib/python3.8/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/flash_attn_triton.py", line 1021, in forward
    o, lse, ctx.softmax_scale = _flash_attn_forward(
  File "/data5/chenruipu/.cache/huggingface/modules/transformers_modules/DNABERT-2-117M_model/flash_attn_triton.py", line 781, in _flash_attn_forward
    assert q.is_cuda and k.is_cuda and v.is_cuda
AssertionError
  0%|      

Then I got a more complex error, like the one shown above.


Zhihan1996 commented on September 28, 2024

Can you try pip uninstall triton?
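
A sketch of that step (the -y flag and the import check are just conveniences, not from the maintainers; as I understand it, the model code falls back to a non-Triton attention implementation when the triton package cannot be imported):

# Remove triton so the Triton flash-attention path is disabled
pip uninstall -y triton

# Sanity check: this should now fail with ModuleNotFoundError
python -c "import triton"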


xliaoyi commented on September 28, 2024

I also encountered a CUDA out of memory error. I am fine-tuning with 3x A100 GPUs. I first tried fine-tuning with sample_data and it works fine; then I switched to my own data, which raised the CUDA out of memory error (I also uninstalled triton).

Here is the code for fine-tuning:

cd finetune

export DATA_PATH=../data  # e.g., ./sample_data
export MAX_LENGTH=128 # Please set the number as 0.25 * your sequence length,
                      # e.g., set it to 250 if your DNA sequences have 1000 nucleotide bases.
                      # This is because tokenization reduces the sequence length by about 5 times.
export LR=3e-5

# Training use DataParallel
python train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path  ${DATA_PATH} \
    --kmer -1 \
    --run_name DNABERT2_${DATA_PATH} \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --fp16 \
    --save_steps 200 \
    --output_dir output/dnabert2 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --warmup_steps 50 \
    --logging_steps 100 \
    --overwrite_output_dir True \
    --log_level info \
    --find_unused_parameters False

Here is the error I got:


  File "train.py", line 303, in <module>
    train()
  File "train.py", line 285, in train
    trainer.train()
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 1664, in train
    return inner_training_loop(
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2019, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 2300, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3029, in evaluate
    output = eval_loop(
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer.py", line 3235, in evaluation_loop
    preds_host = logits if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 114, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 114, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 116, in nested_concat
    return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
  File "/work/09059/xliaoyi/ls6/software/miniconda/envs/dna/lib/python3.8/site-packages/transformers/trainer_pt_utils.py", line 75, in torch_pad_and_concatenate
    return torch.cat((tensor1, tensor2), dim=0)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 18.09 GiB. GPU
  0%|          | 200/758330 [06:32<413:24:47,  1.96s/it]

Is that because the dataset used for fine-tuning is too large (I have 3 million sequences for fine-tuning)?
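
For context, the traceback above shows the allocation failing inside the evaluation loop, where nested_concat concatenates all evaluation logits on the GPU. One possible mitigation, not something suggested in this thread but using standard transformers TrainingArguments options, is a smaller eval batch plus --eval_accumulation_steps, which moves accumulated predictions to the CPU every N prediction steps:

# Same script as above, with two eval-related changes
python train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ${DATA_PATH} \
    --kmer -1 \
    --run_name DNABERT2_${DATA_PATH} \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --eval_accumulation_steps 50 \
    --gradient_accumulation_steps 1 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --fp16 \
    --save_steps 200 \
    --output_dir output/dnabert2 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --warmup_steps 50 \
    --logging_steps 100 \
    --overwrite_output_dir True \
    --log_level info \
    --find_unused_parameters False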


Zhihan1996 commented on September 28, 2024

The size of the dataset should not impact memory usage. Can you try to launch the experiment with distributed data parallel? Basically, you can achieve this by replacing python with torchrun --nproc_per_node=3 in your script.
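
A sketch of that launch on a single node with 3 GPUs (the arguments after train.py are unchanged from the original script above):

torchrun --nproc_per_node=3 train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ${DATA_PATH} \
    --kmer -1 \
    --run_name DNABERT2_${DATA_PATH} \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --fp16 \
    --save_steps 200 \
    --output_dir output/dnabert2 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --warmup_steps 50 \
    --logging_steps 100 \
    --overwrite_output_dir True \
    --log_level info \
    --find_unused_parameters False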


chenruipu commented on September 28, 2024

Thanks for your response; my fine-tuning task now runs correctly. But it seems to use only one CPU core, which will take too much time (about 250 hours) to finish. I would like to know whether I can fine-tune with multiple CPU cores?


Zhihan1996 commented on September 28, 2024

Sorry, I have no idea and no experience with multi-CPU training. You may need to investigate this by yourself. Good luck!
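
One thing that may be worth trying, though it is only a sketch and was not tested in this thread: PyTorch's CPU intra-op parallelism is controlled by the OpenMP/MKL thread settings, so exporting these before launching the script lets it use more cores.

# Let PyTorch (via OpenMP/MKL) use more CPU threads for intra-op parallelism,
# then launch train.py with the same arguments as before (no --fp16 on CPU)
export OMP_NUM_THREADS=16
export MKL_NUM_THREADS=16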

