DefaultCPUAllocator: can't allocate memory: you tried to allocate 4009910272 bytes. Error code 12 (Cannot allocate memory)
[2023-04-03 22:11:36,967] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-03 22:11:38,845] [INFO] [runner.py:550:main] cmd = /home/dm/.miniconda3/envs/LLM/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path decapoda-research/llama-7b-hf --lora_model_path /home/dm/projects/LMFlow/output_models/llama7b-lora-380k --dataset_path /home/dm/projects/LMFlow/data/example_dataset/train --output_dir /home/dm/projects/LMFlow/output_models/finetune --overwrite_output_dir --local_rank=4 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 256 --use_ram_optimized_load False --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 22:11:40,811] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-04-03 22:11:40,811] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-04-03 22:11:40,811] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-04-03 22:11:40,811] [INFO] [launch.py:162:main] dist_world_size=4
[2023-04-03 22:11:40,811] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/dm/.miniconda3/envs/LLM did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
CUDA SETUP: Loading binary /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
[2023-04-03 22:11:56,238] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/03/2023 22:11:57 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - datasets.builder - Found cached dataset json (/home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
ModelArguments(model_name_or_path='decapoda-research/llama-7b-hf', lora_model_path='/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', model_type=None, config_overrides=None, config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, use_lora=False, lora_r=8, lora_alpha=32, lora_dropout=0.1, use_ram_optimized_load=False)
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, 16-bits training: False
[2023-04-03 22:12:40,880] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 6.74B parameters
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:36<00:00, 2.92s/it]
04/03/2023 22:14:21 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f8e405aeca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-440a0d2293dee367.arrow
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/dm/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.1844513416290283 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2648406028747559 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2672991752624512 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2600975036621094 seconds
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /home/dm/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.5386121273040771 seconds
Loading extension module utils...
Time to load utils op: 0.3074460029602051 seconds
Loading extension module utils...
Time to load utils op: 0.6129136085510254 seconds
Loading extension module utils...
Time to load utils op: 0.6200845241546631 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/dm/projects/LMFlow/examples/finetune.py:71 in <module> │
│ │
│ 68 │
│ 69 │
│ 70 if __name__ == '__main__': │
│ ❱ 71 │ main() │
│ 72 │
│ │
│ /home/dm/projects/LMFlow/examples/finetune.py:67 in main │
│ │
│ 64 │ │ ) │
│ 65 │ │
│ 66 │ # Finetuning │
│ ❱ 67 │ tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset) │
│ 68 │
│ 69 │
│ 70 if __name__ == '__main__': │
│ │
│ /home/dm/projects/LMFlow/src/lmflow/pipeline/finetuner.py:232 in tune │
│ │
│ 229 │ │ │ │ checkpoint = training_args.resume_from_checkpoint │
│ 230 │ │ │ elif last_checkpoint is not None: │
│ 231 │ │ │ │ checkpoint = last_checkpoint │
│ ❱ 232 │ │ │ train_result = trainer.train(resume_from_checkpoint=checkpoint) │
│ 233 │ │ │ │
│ 234 │ │ │ if not model_args.use_lora: │
│ 235 │ │ │ │ trainer.save_model() # Saves the tokenizer too for easy upload │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/trainer.py:1639 in train │
│ │
│ 1636 │ │ inner_training_loop = find_executable_batch_size( │
│ 1637 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1638 │ │ ) │
│ ❱ 1639 │ │ return inner_training_loop( │
│ 1640 │ │ │ args=args, │
│ 1641 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1642 │ │ │ trial=trial, │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/trainer.py:1708 in _inner_training_loop │
│ │
│ 1705 │ │ │ or self.fsdp is not None │
│ 1706 │ │ ) │
│ 1707 │ │ if args.deepspeed: │
│ ❱ 1708 │ │ │ deepspeed_engine, optimizer, lr_scheduler = deepspeed_init( │
│ 1709 │ │ │ │ self, num_training_steps=max_steps, resume_from_checkpoint=resume_from_c │
│ 1710 │ │ │ ) │
│ 1711 │ │ │ self.model = deepspeed_engine.module │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/deepspeed.py:378 in deepspeed_init │
│ │
│ 375 │ │ "lr_scheduler": lr_scheduler, │
│ 376 │ } │
│ 377 │ │
│ ❱ 378 │ deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs) │
│ 379 │ │
│ 380 │ if resume_from_checkpoint is not None: │
│ 381 │ │ # it's possible that the user is trying to resume from model_path, which doesn't │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/__init__.py:125 in │
│ initialize │
│ │
│ 122 │ assert model is not None, "deepspeed.initialize requires a model" │
│ 123 │ │
│ 124 │ if not isinstance(model, PipelineModule): │
│ ❱ 125 │ │ engine = DeepSpeedEngine(args=args, │
│ 126 │ │ │ │ │ │ │ │ model=model, │
│ 127 │ │ │ │ │ │ │ │ optimizer=optimizer, │
│ 128 │ │ │ │ │ │ │ │ model_parameters=model_parameters, │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:340 in │
│ __init__ │
│ │
│ 337 │ │ │ model_parameters = list(model_parameters) │
│ 338 │ │ │
│ 339 │ │ if has_optimizer: │
│ ❱ 340 │ │ │ self._configure_optimizer(optimizer, model_parameters) │
│ 341 │ │ │ self._configure_lr_scheduler(lr_scheduler) │
│ 342 │ │ │ self._report_progress(0) │
│ 343 │ │ elif self.zero_optimization(): │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1298 in │
│ _configure_optimizer │
│ │
│ 1295 │ │ optimizer_wrapper = self._do_optimizer_sanity_check(basic_optimizer) │
│ 1296 │ │ │
│ 1297 │ │ if optimizer_wrapper == ZERO_OPTIMIZATION: │
│ ❱ 1298 │ │ │ self.optimizer = self._configure_zero_optimizer(basic_optimizer) │
│ 1299 │ │ elif optimizer_wrapper == AMP: │
│ 1300 │ │ │ amp_params = self.amp_params() │
│ 1301 │ │ │ log_dist(f"Initializing AMP with these params: {amp_params}", ranks=[0]) │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1599 in │
│ _configure_zero_optimizer │
│ │
│ 1596 │ │ │ │ log_dist(f'Creating {model_dtype} ZeRO stage {zero_stage} optimizer', │
│ 1597 │ │ │ │ │ │ ranks=[0]) │
│ 1598 │ │ │ │ from deepspeed.runtime.zero.stage3 import DeepSpeedZeroOptimizer_Stage3 │
│ ❱ 1599 │ │ │ │ optimizer = DeepSpeedZeroOptimizer_Stage3( │
│ 1600 │ │ │ │ │ self.module, │
│ 1601 │ │ │ │ │ optimizer, │
│ 1602 │ │ │ │ │ timers=timers, │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:312 │
│ in __init__ │
│ │
│ 309 │ │ │ f'Largest partitioned param numel = {largest_partitioned_param_numel}', │
│ 310 │ │ │ force=False) │
│ 311 │ │ │
│ ❱ 312 │ │ self._setup_for_real_optimizer() │
│ 313 │ │ self.grad_position = {} │
│ 314 │ │ self.set_grad_positions() │
│ 315 │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:371 │
│ in _setup_for_real_optimizer │
│ │
│ 368 │ │ │
│ 369 │ │ see_memory_usage("Before initializing optimizer states", force=True) │
│ 370 │ │ │
│ ❱ 371 │ │ self.initialize_optimizer_states() │
│ 372 │ │ see_memory_usage("After initializing optimizer states", force=True) │
│ 373 │ │ dist.barrier() │
│ 374 │
│ │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:924 │
│ in initialize_optimizer_states │
│ │
│ 921 │ │ │ │ self._optimizer_states_and_gradient_swap_in(i, timer_names) │
│ 922 │ │ │ │
│ 923 │ │ │ if self.offload_optimizer and not swappable_optimizer_subgroup: │
│ ❱ 924 │ │ │ │ subgroup_gradient_buffer = torch.zeros(num_elements, │
│ 925 │ │ │ │ │ │ │ │ │ │ │ │ │ dtype=gradient_dtype, │
│ 926 │ │ │ │ │ │ │ │ │ │ │ │ │ device=self.device) │
│ 927 │ │ │ │ if self.offload_optimizer_pin_memory: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 4009910272 bytes. Error code 12 (Cannot allocate memory)
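The traceback shows the failure in DeepSpeed's ZeRO stage 3 optimizer setup: with `offload_optimizer` pointing at the CPU, `initialize_optimizer_states` allocates a host-side fp32 gradient buffer of roughly `sub_group_size` elements per rank (DeepSpeed's documented default for `sub_group_size` is 1e9). The requested 4009910272 bytes lines up with that default; the rough arithmetic below is a sanity check, not output from the run:

```python
# Rough sanity check of the failed CPU allocation reported above.
# ZeRO-3 with CPU optimizer offload allocates a gradient buffer of about
# sub_group_size fp32 elements per rank (sub_group_size defaults to 1e9).
FP32_BYTES = 4
requested_bytes = 4009910272              # from the RuntimeError
elements = requested_bytes // FP32_BYTES
print(f"{elements:,} fp32 elements")      # ~1.0e9, close to the default sub_group_size

# Four local ranks each try to allocate this buffer on the host at once:
total_gib = 4 * requested_bytes / 2**30
print(f"~{total_gib:.1f} GiB of host RAM for the gradient buffers alone")
```

With four local ranks, the buffers alone need roughly 15 GiB of host RAM, on top of the offloaded Adam states, so exhausting CPU memory (errno 12) is plausible on a modest machine.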
[2023-04-03 22:15:03,293] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126329
[2023-04-03 22:15:03,739] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126330
[2023-04-03 22:15:05,078] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126331
[2023-04-03 22:15:06,340] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126332
[2023-04-03 22:15:06,341] [ERROR] [launch.py:324:sigkill_handler] ['/home/dm/.miniconda3/envs/LLM/bin/python', '-u', 'examples/finetune.py', '--local_rank=3', '--model_name_or_path', 'decapoda-research/llama-7b-hf', '--lora_model_path', '/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', '--dataset_path', '/home/dm/projects/LMFlow/data/example_dataset/train', '--output_dir', '/home/dm/projects/LMFlow/output_models/finetune', '--overwrite_output_dir', '--local_rank=4', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '256', '--use_ram_optimized_load', 'False', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1
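One way to shrink the per-rank CPU allocation is to lower `sub_group_size` in the DeepSpeed config. The contents of `configs/ds_config_zero3.json` are not shown in this log, so the fragment below is only a sketch based on DeepSpeed's documented ZeRO-3 options, not the project's actual file:

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "sub_group_size": 1e8
  }
}
```

Setting `sub_group_size` to 1e8 cuts each temporary fp32 buffer to roughly 0.4 GB per rank; alternatively, removing the `offload_optimizer` block keeps optimizer states on the GPUs, trading host RAM pressure for GPU memory. Launching with fewer local ranks also reduces total host demand proportionally.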