building-with-instruction-tuned-llms-a-step-by-step-guide's Issues

CUDA error: device-side assert triggered; CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I'm re-running the Supervised_Instruct_tuning_OpenLLaMA_... notebook on Colab Pro with an A100 GPU. I got the following error at the supervised_finetuning_trainer.train() step (a quick Stack Overflow search suggests a shape mismatch as the likely cause, but I did not change anything in the code):

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 1>:1 │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1696 in train │
│ │
│ 1693 │ │ inner_training_loop = find_executable_batch_size( │
│ 1694 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1695 │ │ ) │
│ ❱ 1696 │ │ return inner_training_loop( │
│ 1697 │ │ │ args=args, │
│ 1698 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1699 │ │ │ trial=trial, │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1973 in _inner_training_loop │
│ │
│ 1970 │ │ │ │ │ with model.no_sync(): │
│ 1971 │ │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1972 │ │ │ │ else: │
│ ❱ 1973 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1974 │ │ │ │ │
│ 1975 │ │ │ │ if ( │
│ 1976 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2787 in training_step │
│ │
│ 2784 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2785 │ │ │
│ 2786 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2787 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2788 │ │ │
│ 2789 │ │ if self.args.n_gpu > 1: │
│ 2790 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2819 in compute_loss │
│ │
│ 2816 │ │ │ labels = inputs.pop("labels") │
│ 2817 │ │ else: │
│ 2818 │ │ │ labels = None │
│ ❱ 2819 │ │ outputs = model(**inputs) │
│ 2820 │ │ # Save past state if it exists │
│ 2821 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2822 │ │ if self.args.past_index >= 0: │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /usr/local/lib/python3.10/dist-packages/peft/peft_model.py:827 in forward │
│ │
│ 824 │ │ │ │ │ **kwargs, │
│ 825 │ │ │ │ ) │
│ 826 │ │ │ │
│ ❱ 827 │ │ │ return self.base_model( │
│ 828 │ │ │ │ input_ids=input_ids, │
│ 829 │ │ │ │ attention_mask=attention_mask, │
│ 830 │ │ │ │ inputs_embeds=inputs_embeds, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:687 in │
│ forward │
│ │
│ 684 │ │ return_dict = return_dict if return_dict is not None else self.config.use_return │
│ 685 │ │ │
│ 686 │ │ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn) │
│ ❱ 687 │ │ outputs = self.model( │
│ 688 │ │ │ input_ids=input_ids, │
│ 689 │ │ │ attention_mask=attention_mask, │
│ 690 │ │ │ position_ids=position_ids, │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/hooks.py:165 in new_forward │
│ │
│ 162 │ │ │ with torch.no_grad(): │
│ 163 │ │ │ │ output = old_forward(*args, **kwargs) │
│ 164 │ │ else: │
│ ❱ 165 │ │ │ output = old_forward(*args, **kwargs) │
│ 166 │ │ return module._hf_hook.post_forward(module, output) │
│ 167 │ │
│ 168 │ module.forward = new_forward │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:536 in │
│ forward │
│ │
│ 533 │ │ │ attention_mask = torch.ones( │
│ 534 │ │ │ │ (batch_size, seq_length_with_past), dtype=torch.bool, device=inputs_embe │
│ 535 │ │ │ ) │
│ ❱ 536 │ │ attention_mask = self._prepare_decoder_attention_mask( │
│ 537 │ │ │ attention_mask, (batch_size, seq_length), inputs_embeds, past_key_values_len │
│ 538 │ │ ) │
│ 539 │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:464 in │
│ _prepare_decoder_attention_mask │
│ │
│ 461 │ │ # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] │
│ 462 │ │ combined_attention_mask = None │
│ 463 │ │ if input_shape[-1] > 1: │
│ ❱ 464 │ │ │ combined_attention_mask = _make_causal_mask( │
│ 465 │ │ │ │ input_shape, │
│ 466 │ │ │ │ inputs_embeds.dtype, │
│ 467 │ │ │ │ device=inputs_embeds.device, │
│ │
│ /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py:49 in │
│ _make_causal_mask │
│ │
│ 46 │ Make causal mask used for bi-directional self-attention. │
│ 47 │ """ │
│ 48 │ bsz, tgt_len = input_ids_shape │
│ ❱ 49 │ mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=de │
│ 50 │ mask_cond = torch.arange(mask.size(-1), device=device) │
│ 51 │ mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0) │
│ 52 │ mask = mask.to(dtype) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be
incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
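A debugging sketch for anyone hitting this (the `model` and `tokenizer` names below are assumed to be the objects from the notebook): since device-side asserts are reported asynchronously, first force synchronous kernel launches to get a trustworthy stack trace, then check the most common fine-tuning trigger, token ids that fall outside the model's embedding table.

import os

# Set before CUDA is initialized (e.g. in the first cell, before importing
# torch) so the error is reported at the call that actually caused it.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# A frequent cause of this assert is an out-of-range token id, for example
# after adding a pad token to the tokenizer without resizing the embeddings.
# `model` and `tokenizer` here are assumed to be the notebook's objects.
embedding_rows = model.get_input_embeddings().num_embeddings
if len(tokenizer) > embedding_rows:
    model.resize_token_embeddings(len(tokenizer))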

Bug report: uploading model to Hugging Face failed

Hi, after I finished training, I tried to upload the model to HF using the command

base_model.push_to_hub("zbruceli/openLLaMA_QLora", private=True)

The following error occurred:

NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported

Is there something I need to change when I create the model/repo on HF? Thank you!

Uploading the tokenizer works fine, so it is not an HF token issue.
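A possible workaround, sketched under the assumption that training used a PEFT/LoRA wrapper around the 4-bit base model (as in the QLoRA notebook; variable names may differ from yours): push the PEFT-wrapped model instead of base_model. That serializes only the small adapter files and sidesteps save_pretrained on the 4-bit weights.

# Sketch, assuming `model` is the PeftModel returned by get_peft_model(...)
# in the notebook. Pushing the PEFT wrapper uploads only the LoRA adapter
# (adapter_config.json plus adapter weights), not the quantized base model.
model.push_to_hub("zbruceli/openLLaMA_QLora", private=True)

# To use it later, load the original base model and apply the adapter on top:
# from peft import PeftModel
# base = AutoModelForCausalLM.from_pretrained(base_model_name)  # same base as training
# model = PeftModel.from_pretrained(base, "zbruceli/openLLaMA_QLora")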

BloomForSequenceClassification does not have an lm_head ... can this technique still apply?

Re: the notebook ✉️ MarketMail AI ✉️ Fine tuning BLOOMZ (Completed Version).ipynb

https://colab.research.google.com/drive/1ARmlaZZaKyAg6HTi57psFLPeh0hDRcPX?usp=sharing

I tried to modify the example to use BloomForSequenceClassification instead of AutoModelForCausalLM, but the "Post-processing on the model" step:
model.lm_head = CastOutputToFloat(model.lm_head)
fails because BloomForSequenceClassification does not have an lm_head attribute.

This is true, so I changed the code to try to wrap the last layer of BloomForSequenceClassification instead:
model.ln_f = CastOutputToFloat(model.ln_f)
This also fails: AttributeError: 'BloomForSequenceClassification' object has no attribute 'ln_f'

This leaves me wondering: can this technique still work for BloomForSequenceClassification, or only for AutoModelForCausalLM? Alternatively, does anyone know whether AutoModelForCausalLM can be used for fine-tuning a classification task equally well as BloomForSequenceClassification?
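For anyone hitting the same AttributeError: in the transformers implementation, BloomForSequenceClassification nests the decoder stack under model.transformer and replaces lm_head with a linear head named score. A minimal sketch of the analogous post-processing follows, assuming the CastOutputToFloat helper from the notebook; whether this alone gives stable low-precision training of the classification head is untested.

# Sketch, not a verified fix: the transformer stack lives under
# `model.transformer`, so its final layer norm is reachable as
# model.transformer.ln_f rather than model.ln_f.
model.transformer.ln_f = CastOutputToFloat(model.transformer.ln_f)

# The analogue of lm_head in the sequence-classification wrapper is the
# linear head `model.score`; casting its output to float mirrors what the
# notebook does with lm_head.
model.score = CastOutputToFloat(model.score)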
