
Comments (14)

affjljoo3581 commented on June 2, 2024

In my case, GPU utilization was over 80% with my 2x V100s. Although my Dataset class does not spawn worker threads to fetch data from the corpus, that does not hurt performance on a properly provisioned system (sufficient CPUs and RAM) with a suitable vocabulary size. How about testing whether my Dataset loader is the bottleneck? Change the

def _fetch_one(self) -> Dict[str, List[int]]:
    while True:
        # Read a subword-tokenized sequence from the corpus.
        line = self.corpus_fp.readline()
        if not line:
            # Raise an error when all sequences have been fetched.
            if not self.repeat:
                raise StopIteration()
            # Otherwise, move back to the start of the corpus.
            self.corpus_fp.seek(0)
            continue

        # Use token indices rather than the token names directly.
        indices = [self.vocab[t] for t in line.split()]
        if len(indices) + 2 > self.seq_len:
            continue

        # Decorate the sequence with additional tokens and pad to length.
        indices = [self.vocab.bos_idx] + indices + [self.vocab.eos_idx]
        indices += [self.vocab.pad_idx] * (self.seq_len - len(indices) + 1)

        return {'input': indices[:-1], 'output': indices[1:]}

function code to the following:

    def _fetch_one(self) -> Dict[str, List[int]]:
        # Return a constant dummy sequence so that no file reading or
        # vocabulary lookup happens at all.
        indices = [0] * (self.seq_len + 1)
        return {'input': indices[:-1], 'output': indices[1:]}
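
If GPU utilization rises after this change, the data loader was the bottleneck. You can also time the original _fetch_one directly with a rough loop (a minimal sketch; dataset stands for an instance of the Dataset class, which is an assumption about how it is constructed):

    import time

    # Rough benchmark: how many sequences per second does the original
    # _fetch_one produce? `dataset` is a hypothetical Dataset instance.
    start = time.time()
    for _ in range(1000):
        dataset._fetch_one()
    print(f'{1000 / (time.time() - start):.1f} sequences/sec')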


affjljoo3581 commented on June 2, 2024

First of all, you did not append a backslash (\) to the end of the --gpus 4 parameter line. Because of that, the arguments after the --gpus 4 line may be ignored. I don't think this is the root cause, but please show me the result after fixing that first.


liygzting commented on June 2, 2024

Sorry, this is the actual format of the command:

    python -m gpt2 train --train_corpus ../build/corpus.train.txt \
        --eval_corpus ../build/corpus.test.txt \
        --vocab_path ../build/vocab.txt \
        --dims 1024 \
        --batch_train 128 \
        --batch_eval 128 \
        --seq_len 64 \
        --total_steps 3000 \
        --eval_steps 500 \
        --save_steps 3000 \
        --gpus 4 \
        --save_checkpoint_path ckpt-gpt2.pth \
        --save_model_path gpt2-pretrained.pth

Then I press ENTER to run it, but it is still stuck, as follows:

Train GPT-2 model: 0%| | 0/3000 [00:00<?, ?it/s]

When I run nvidia-smi, the multiple GPUs seem to be up, but training is stuck:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   3542447      C   /root/anaconda3/bin/python                 2315MiB  |
|    1   3542448      C   /root/anaconda3/bin/python                 2320MiB  |
|    2   3542449      C   /root/anaconda3/bin/python                 2315MiB  |
|    3   3542450      C   /root/anaconda3/bin/python                 2320MiB  |
+-----------------------------------------------------------------------------+


liygzting commented on June 2, 2024

[screenshot: GPU status]


affjljoo3581 commented on June 2, 2024

How long did you wait after it froze? Due to the distributed training environment, it usually takes a few minutes before training starts. In my case, 2x V100s required about 2 to 3 minutes.


liygzting commented on June 2, 2024

It had been running for hours, so I canceled it. However, a single GPU runs at a speed of about 1.5 it/s.


affjljoo3581 commented on June 2, 2024

What about two GPUs? Can you show me the results with 2 and 3 GPUs?


liygzting commented on June 2, 2024

I've tested --gpus from 2 to 4, and there was no improvement. Maybe the dataloader doesn't allow multithreading.

[screenshots: results with 2 and 3 GPUs]


affjljoo3581 commented on June 2, 2024

I ran this model on 2x V100s. I think the distributed reduction might be the problem. Can you check whether the GPU memory usage increases with the batch size?
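
One way to check is to print the per-GPU memory counters from inside the training process (a minimal sketch using PyTorch's built-in counters; run it right after a training step so the numbers reflect the current batch size):

    import torch

    # Report allocated and reserved memory for every visible GPU.
    for i in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(i) / 2 ** 20
        reserved = torch.cuda.memory_reserved(i) / 2 ** 20
        print(f'cuda:{i}: allocated={allocated:.0f} MiB, reserved={reserved:.0f} MiB')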


affjljoo3581 commented on June 2, 2024

And check whether TCP port 8000 is available as well.
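
The simplest test is to try binding the port (a minimal sketch; if the bind raises OSError, another process already holds port 8000):

    import socket

    # Try to bind TCP port 8000 on the local interface.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(('127.0.0.1', 8000))
            print('port 8000 is free')
        except OSError:
            print('port 8000 is already in use')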


liygzting commented on June 2, 2024

The GPU memory usage looks OK when I set the batch size to 64, and port 8000 is available.


liygzting commented on June 2, 2024

I think it is a communication problem between the GPUs. When I set CUDA_VISIBLE_DEVICES=0,1 and --gpus 2, it doesn't work, but when I set CUDA_VISIBLE_DEVICES=0,2 and --gpus 2, it works. Maybe only GPUs 0 and 2, or 1 and 3, are able to communicate.
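
Peer-to-peer access between device pairs can be probed from PyTorch (a minimal sketch; pairs that report no P2P support fall back to copies staged through host memory, which can expose exactly this kind of topology problem):

    import torch

    # Check which GPU pairs support direct peer-to-peer access.
    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i != j:
                ok = torch.cuda.can_device_access_peer(i, j)
                print(f'GPU {i} -> GPU {j}: P2P {"yes" if ok else "no"}')

Running nvidia-smi topo -m shows the same connectivity matrix at the system level.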


liygzting commented on June 2, 2024

I find that the Volatile GPU-Util is too low; most of the time it is 10% or even 0% (it cycles 0% -> 10% -> 99%). How can I make the loader keep the GPUs busy, like setting num_workers on a DataLoader?
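
For reference, on a standard PyTorch DataLoader, worker processes are enabled like this (a minimal sketch with a stand-in map-style dataset; whether it applies here depends on how this project feeds data, since its Dataset is a custom streaming class):

    from torch.utils.data import DataLoader, Dataset

    class ToyDataset(Dataset):
        # Stand-in map-style dataset; replace with the real one.
        def __len__(self):
            return 1024

        def __getitem__(self, idx):
            return idx

    # num_workers > 0 spawns subprocesses that prefetch batches so the
    # GPUs do not sit idle waiting for data.
    loader = DataLoader(ToyDataset(), batch_size=64, num_workers=4,
                        pin_memory=True)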


liygzting commented on June 2, 2024

Thank you very much. I think this is another point I need to learn about. At the moment, my GPUs are busy running; meanwhile, I need to understand GPT-2 more deeply.

