Comments (14)

zucchini-nlp commented on June 16, 2024

@amyeroberts I'm not sure what the correct format of the cache objects returned by language models should be, since we currently don't have consistency, so I wanted @gante to take a look.

There are two options for this:

  1. The language model always returns a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic.
  2. The language model returns the same cache type it received in forward. In that case Idefics2 has to call cache.to_legacy_cache() at the end to ensure it returns a tuple type, which is consistent with how caching works in most current language models.

Also, I believe we are going to get rid of the tuple-type cache at some point in the future, so the cache + Trainer interaction is something to keep in mind for then.
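For reference, a minimal sketch of the conversion mentioned in option 2, using the public DynamicCache helpers (the variable names here are illustrative, not the actual Idefics2 code):

```python
from transformers import DynamicCache

# `past_key_values` is assumed to be the legacy format: a tuple with one
# (key_states, value_states) pair of tensors per decoder layer.
cache = DynamicCache.from_legacy_cache(past_key_values)

# ... run the forward pass, letting the model update `cache` in place ...

# Convert back to the legacy tuple-of-tuples before returning, so callers
# that expect plain tensors (e.g. the Trainer) keep working.
legacy_cache = cache.to_legacy_cache()
```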

NielsRogge commented on June 16, 2024

I had the same error and fixed it by setting model.config.use_cache = False during training. But @VictorSanh might know a better option.
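For anyone hitting the same thing, that is a one-line change before building the Trainer (assuming `model` is the loaded Idefics2 model):

```python
# Disable the KV cache during training/evaluation, so the model outputs
# contain only plain tensors that the Trainer's eval loop can gather.
model.config.use_cache = False
```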

NielsRogge commented on June 16, 2024

Yes, this is due to batches having different lengths of input_ids (in the code snippet of your first message you set padding=True, which means dynamic padding: each batch may have a different length). If your eval batch size is smaller than or equal to your training batch size, then it's fine.

It can be fixed either by padding all examples to the same length (e.g. using padding="max_length", max_length=200, truncation=True), or by passing the flag eval_do_concat_batches=False to TrainingArguments. In the latter case you'll get a list of predictions/labels in the compute_metrics function rather than stacked tensors, so you'll need to adapt your compute_metrics function accordingly.
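A sketch of both fixes (the processor call and variable names are illustrative):

```python
from transformers import TrainingArguments

# Option 1: pad every example to the same fixed length, so all eval batches
# have identical shapes and can be concatenated.
inputs = processor(
    text=texts,
    images=images,
    padding="max_length",
    max_length=200,
    truncation=True,
    return_tensors="pt",
)

# Option 2: keep dynamic padding, but tell the Trainer not to concatenate
# eval batches; compute_metrics then receives lists instead of stacked tensors.
args = TrainingArguments(
    output_dir="idefics2-finetuned",  # illustrative
    eval_do_concat_batches=False,
)
```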

amyeroberts commented on June 16, 2024

@zucchini-nlp OK, great, thanks for explaining. Let's leave it as-is, and once the cache format is standardized we can propagate this to Idefics2 + other models.

EloiEynard commented on June 16, 2024

I had the same error and fixed it by using model.config.use_cache=False during training

That fixes this issue, as the past_key_values are now full tensors, but it leads to a new error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line 1
----> 1 trainer.evaluate()

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3714         logits = self.preprocess_logits_for_metrics(logits, labels)
   3715     logits = self.gather_function((logits))
-> 3716     all_preds.add(logits)
   3717 if labels is not None:
   3718     labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326, in EvalLoopContainer.add(self, tensors)
    324     self.tensors = tensors if self.do_nested_concat else [tensors]
    325 elif self.do_nested_concat:
--> 326     self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
    327 else:
    328     self.tensors.append(tensors)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

[... this nested_concat frame alternates with a matching <genexpr> frame at trainer_pt_utils.py:138 three times in total, as the recursion descends into the nested output tuples ...]

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
    138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
--> 140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
    141 elif isinstance(tensors, Mapping):
    142     return type(tensors)(
    143         {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
    144     )

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
     96 tensor2 = atleast_1d(tensor2)
     98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99     return torch.cat((tensor1, tensor2), dim=0)
    101 # Let's figure out the new shape
    102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.

VictorSanh commented on June 16, 2024

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option

I don't have a better fix!

zucchini-nlp commented on June 16, 2024

I think the cache problem should be fixed by converting the DynamicCache back to a legacy cache in Idefics2's backbone language model, as is already done in Llama.
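A minimal sketch of the Llama-style pattern being referred to (simplified from modeling_llama.py; not the exact Idefics2 code):

```python
from transformers import Cache, DynamicCache

# Accept either cache format on input...
use_legacy_cache = not isinstance(past_key_values, Cache)
if use_legacy_cache:
    past_key_values = DynamicCache.from_legacy_cache(past_key_values)

# ... decoder layers read from and update `past_key_values` here ...

# ...and convert back to the legacy tuple-of-tuples on output, so downstream
# code such as the Trainer only ever sees plain tensors.
next_cache = past_key_values.to_legacy_cache() if use_legacy_cache else past_key_values
```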

These changes are partially related to the work on making language models torch.compile-compatible, and should be available soon 🤗

amyeroberts commented on June 16, 2024

Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?

zucchini-nlp commented on June 16, 2024

We discussed the cache input/output format with @gante yesterday. Maybe a Llama-format cache is not what we need, but in any case @gante will take care of it 😄

amyeroberts commented on June 16, 2024

@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in Idefics2. Does the work @gante is doing address this?

NielsRogge commented on June 16, 2024

Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb

EloiEynard commented on June 16, 2024

Thanks @NielsRogge, I got it all figured out with the Trainer and am currently fine-tuning with my custom eval. I wish I had known about Lightning earlier, though; it seems more explicit.

By the way, if you don't mind me asking, I've noticed that in your notebooks you use
model.add_adapter(lora_config)
model.enable_adapters()
whereas I've mostly seen model = get_peft_model(model, lora_config). Is there any difference between the two? Thanks

NielsRogge commented on June 16, 2024

I had the same question; it turns out both are equivalent. The get_peft_model API is recommended, as it returns a PeftModel which has additional utility methods such as save_adapter(), with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with add_adapter. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.
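For readers comparing the two, here are both approaches side by side (the LoRA hyperparameters are just example values, and in practice you would pick one approach, not run both):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])  # example values

# Approach 1: transformers' built-in adapter API, which attaches the adapter
# to the existing model in place.
model.add_adapter(lora_config)
model.enable_adapters()

# Approach 2: the PEFT API, which wraps the model in a PeftModel and exposes
# extra utilities (e.g. saving adapters, printing trainable parameters).
model = get_peft_model(model, lora_config)
```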

I'm currently looking into creating a similar notebook that leverages the Trainer API with get_peft_model. The reason I used PyTorch Lightning is that it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.

EloiEynard commented on June 16, 2024

I see, thanks for the details!
