
Comments (4)

arnocandel commented on July 17, 2024

Even xturing, which uses DeepSpeed, fails out of the box: https://github.com/stochasticai/xturing/blob/main/examples/gptj/gptj_lora.py
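
For reference, the script being run is roughly the following (a sketch reconstructed from the traceback and the linked example; the dataset path is a placeholder, not the repo's actual path):

```python
# Approximate reconstruction of xturing's examples/gptj/gptj_lora.py.
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

instruction_dataset = InstructionDataset("./alpaca_data")  # placeholder path
model = BaseModel.create("gptj_lora")        # GPT-J 6B with LoRA adapters
model.finetune(dataset=instruction_dataset)  # fails with CUDA OOM as shown below
```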

(env) arno@rippa:/nfs4/llm/xturing/examples/gptj(main)$ python gptj_lora.py 

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda121.so...
trainable params: 3670016 || all params: 6054552800 || trainable%: 0.060615806339982696
2023-03-28 09:37:14,171 | DEBUG | xturing.models.causal 34 | Finetuning parameters: {'learning_rate': '1e-4', 'gradient_accumulation_steps': 1, 'batch_size': 4, 'weight_decay': 0.01, 'warmup_steps': 50, 'eval_steps': 5000, 'save_steps': 5000, 'max_length': 512, 'num_train_epochs': 3, 'logging_steps': 10, 'max_grad_norm': 2.0, 'save_total_limit': 4, 'optimizer_name': 'adamw', 'output_dir': 'saved_model'}
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/lightning_fabric/connector.py:562: UserWarning: 16 is supported for historical reasons but its usage is discouraged. Please set your precision to 16-mixed instead!
  rank_zero_warn(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
  warning_cache.warn(
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/configuration_validator.py:72: PossibleUserWarning: You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.
  rank_zero_warn(
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/2

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda121.so...
trainable params: 3670016 || all params: 6054552800 || trainable%: 0.060615806339982696
2023-03-28 09:39:13,881 | DEBUG | xturing.models.causal 34 | Finetuning parameters: {'learning_rate': '1e-4', 'gradient_accumulation_steps': 1, 'batch_size': 4, 'weight_decay': 0.01, 'warmup_steps': 50, 'eval_steps': 5000, 'save_steps': 5000, 'max_length': 512, 'num_train_epochs': 3, 'logging_steps': 10, 'max_grad_norm': 2.0, 'save_total_limit': 4, 'optimizer_name': 'adamw', 'output_dir': 'saved_model'}
initializing deepspeed distributed: GLOBAL_RANK: 1, MEMBER: 2/2
Enabling DeepSpeed FP16.
You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
Using /home/arno/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /home/arno/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Emitting ninja build file /home/arno/.cache/torch_extensions/py310_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.053357839584350586 seconds
Loading extension module utils...
Time to load utils op: 0.10169410705566406 seconds
Rank: 1 partition count [2] and sizes[(1835008, False)] 
Rank: 0 partition count [2] and sizes[(1835008, False)] 
Using /home/arno/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00026106834411621094 seconds
Using /home/arno/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0002510547637939453 seconds

  | Name          | Type                 | Params
-------------------------------------------------------
0 | pytorch_model | PeftModelForCausalLM | 6.1 B 
-------------------------------------------------------
3.7 M     Trainable params
6.1 B     Non-trainable params
6.1 B     Total params
24,218.211 Total estimated model params size (MB)
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2365: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2365: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:430: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Epoch 0:   0%|                                                                           | 0/6501 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2365: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2365: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
  warnings.warn(
Epoch 0:   0%|                                              | 7/6501 [00:02<46:06,  2.35it/s, v_num=0, loss=6.350]Traceback (most recent call last):
  File "/nfs4/llm/xturing/examples/gptj/gptj_lora.py", line 8, in <module>
    model.finetune(dataset=instruction_dataset)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/xturing/models/causal.py", line 62, in finetune
    trainer.fit()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/xturing/trainers/lightning_trainer.py", line 179, in fit
    self.trainer.fit(self.lightning_model)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
    return function(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 935, in _run
    results = self._run_stage()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 978, in _run_stage
    self.fit_loop.run()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.advance()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 261, in _optimizer_step
    call._call_lightning_module_hook(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 142, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 1266, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py", line 158, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 257, in optimizer_step
    optimizer_output = super().optimizer_step(optimizer, closure, model, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 224, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/deepspeed.py", line 92, in optimizer_step
    closure_result = closure()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure
    step_output = self._step_fn()
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 308, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 288, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/strategies/ddp.py", line 329, in training_step
    return self.model(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
    loss = self.module(*inputs, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/pytorch_lightning/overrides/base.py", line 90, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/xturing/trainers/lightning_trainer.py", line 73, in training_step
    loss = self.model_engine.training_step(batch)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/xturing/engines/causal.py", line 48, in training_step
    outputs = self.model(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/peft/peft_model.py", line 529, in forward
    return self.base_model(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 308, in forward
    attn_outputs = self.attn(
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/transformers/models/gptj/modeling_gptj.py", line 236, in forward
    query = torch.cat([q_rot, q_pass], dim=-1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 1; 23.69 GiB total capacity; 22.86 GiB already allocated; 13.56 MiB free; 23.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTOR
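
The allocator hint at the end of that error points at `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of trying it (the 128 MiB split size is illustrative, not a tested recommendation):

```python
# Illustrative only: cap the allocator's split size to reduce fragmentation.
# Must run before the first CUDA allocation, e.g. at the very top of gptj_lora.py.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```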


arnocandel commented on July 17, 2024

https://nn.labml.ai/neox/samples/finetune.html


arnocandel commented on July 17, 2024

https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/neox/samples/finetune.py


arnocandel commented on July 17, 2024

Should be able to just use FSDP in PyTorch (see the sketch below):
https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/#auto-wrapping
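
A minimal FSDP auto-wrap sketch, assuming a torchrun launch so that RANK/LOCAL_RANK are set; the 1M-parameter wrap threshold is illustrative, not tuned:

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from transformers import AutoModelForCausalLM

# Process group and device come from the torchrun-provided environment.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Shard any submodule above ~1M parameters across ranks instead of replicating it.
wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
model = FSDP(model, auto_wrap_policy=wrap_policy, device_id=local_rank)
```

Launched with e.g. `torchrun --nproc_per_node=2 finetune_fsdp.py`, this shards parameters across both 24 GB cards instead of keeping a full replica per rank, which is the point of the auto-wrapping section linked above.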

