Comments (14)

zucchini-nlp commented on June 16, 2024

@amyeroberts I'm not sure what the correct format of the cache objects returned by language models should be, since we currently don't have consistency, so I wanted @gante to take a look.

There are two options for this:

  1. The language model always returns a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic.
  2. The language model returns the same cache type it received in forward. In that case Idefics2 has to call cache.to_legacy_cache() at the end to ensure it returns a tuple type, which is consistent with how caching works in most current language models.

Also, I believe we are going to get rid of the tuple-type cache at some point in the future, so the cache + Trainer interaction is something to keep in mind for then.
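For reference, a minimal sketch of the conversion mentioned in option 2, using the public DynamicCache helpers (the variable names here are illustrative, not the actual Idefics2 code):

```python
from transformers import DynamicCache

# `past_key_values` is assumed to be the legacy format: a tuple with one
# (key_states, value_states) pair of tensors per decoder layer.
cache = DynamicCache.from_legacy_cache(past_key_values)

# ... run the forward pass, letting the model update `cache` in place ...

# Convert back to the legacy tuple-of-tuples before returning, so callers
# that expect plain tensors (e.g. the Trainer) keep working.
legacy_cache = cache.to_legacy_cache()
```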

NielsRogge commented on June 16, 2024

I had the same error and fixed it by setting model.config.use_cache = False during training. But @VictorSanh might know a better option.
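For anyone hitting the same thing, that is a one-line change before building the Trainer (assuming `model` is the loaded Idefics2 model):

```python
# Disable the KV cache during training/evaluation, so the model outputs
# contain only plain tensors that the Trainer's eval loop can gather.
model.config.use_cache = False
```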

NielsRogge commented on June 16, 2024

Yes, this is due to batches having different lengths of input_ids (in the code snippet of your first message you set padding=True, which means dynamic padding: each batch may have a different length). If your eval batch size is smaller than or equal to your training batch size, then it's fine.

It can be fixed either by padding all examples to the same length (e.g. using padding="max_length", max_length=200, truncation=True), or by passing the flag eval_do_concat_batches=False to TrainingArguments. In the latter case you'll get a list of predictions/labels in the compute_metrics function rather than stacked tensors, so you'll need to adapt your compute_metrics function accordingly.
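A sketch of both fixes (the processor call and variable names are illustrative):

```python
from transformers import TrainingArguments

# Option 1: pad every example to the same fixed length, so all eval batches
# have identical shapes and can be concatenated.
inputs = processor(
    text=texts,
    images=images,
    padding="max_length",
    max_length=200,
    truncation=True,
    return_tensors="pt",
)

# Option 2: keep dynamic padding, but tell the Trainer not to concatenate
# eval batches; compute_metrics then receives lists instead of stacked tensors.
args = TrainingArguments(
    output_dir="idefics2-finetuned",  # illustrative
    eval_do_concat_batches=False,
)
```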

amyeroberts commented on June 16, 2024

@zucchini-nlp OK, great, thanks for explaining. Let's leave it as-is, and once the cache format is standardized we can propagate this to Idefics2 + other models.

EloiEynard commented on June 16, 2024

I had the same error and fixed it by using model.config.use_cache=False during training

That fixes this issue, as the past_key_values are now full tensors, but it leads to a new error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line 1
----> 1 trainer.evaluate()

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3714         logits = self.preprocess_logits_for_metrics(logits, labels)
   3715     logits = self.gather_function((logits))
-> 3716     all_preds.add(logits)
   3717 if labels is not None:
   3718     labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326, in EvalLoopContainer.add(self, tensors)
    324     self.tensors = tensors if self.do_nested_concat else [tensors]
    325 elif self.do_nested_concat:
--> 326     self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
    327 else:
    328     self.tensors.append(tensors)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

[... this nested_concat frame alternates with a matching <genexpr> frame at trainer_pt_utils.py:138 three times in total, as the recursion descends into the nested output tuples ...]

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
    138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
--> 140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
    141 elif isinstance(tensors, Mapping):
    142     return type(tensors)(
    143         {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
    144     )

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
     96 tensor2 = atleast_1d(tensor2)
     98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99     return torch.cat((tensor1, tensor2), dim=0)
    101 # Let's figure out the new shape
    102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.

VictorSanh commented on June 16, 2024

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option

I don't have a better fix!

zucchini-nlp commented on June 16, 2024

I think the cache problem should be fixed by converting the DynamicCache back to a legacy cache in Idefics2's backbone language model, as is already done in Llama.
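A minimal sketch of the Llama-style pattern being referred to (simplified from modeling_llama.py; not the exact Idefics2 code):

```python
from transformers import Cache, DynamicCache

# Accept either cache format on input...
use_legacy_cache = not isinstance(past_key_values, Cache)
if use_legacy_cache:
    past_key_values = DynamicCache.from_legacy_cache(past_key_values)

# ... decoder layers read from and update `past_key_values` here ...

# ...and convert back to the legacy tuple-of-tuples on output, so downstream
# code such as the Trainer only ever sees plain tensors.
next_cache = past_key_values.to_legacy_cache() if use_legacy_cache else past_key_values
```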

These changes are partially related to the work on making language models torch.compile-compatible, and should be available soon 🤗

amyeroberts commented on June 16, 2024

Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?

zucchini-nlp commented on June 16, 2024

We discussed the cache input/output format with @gante yesterday. Maybe a Llama-format cache is not what we need, but in any case @gante will take care of it 😄

amyeroberts commented on June 16, 2024

@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in Idefics2. Does the work @gante is doing address this?

NielsRogge commented on June 16, 2024

Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb

EloiEynard commented on June 16, 2024

Thanks @NielsRogge, I got it all figured out with the Trainer and am currently fine-tuning with my custom eval. I wish I had known about Lightning earlier, though; it seems more explicit.

By the way, if you don't mind me asking, I've noticed that in your notebooks you use
model.add_adapter(lora_config)
model.enable_adapters()
whereas I've mostly seen model = get_peft_model(model, lora_config). Is there any difference between the two? Thanks

NielsRogge commented on June 16, 2024

I had the same question; it turns out both are equivalent. The get_peft_model API is recommended, as it returns a PeftModel which has additional utility methods such as save_adapter(), with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with add_adapter. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.
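For readers comparing the two, here are both approaches side by side (the LoRA hyperparameters are just example values, and in practice you would pick one approach, not run both):

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])  # example values

# Approach 1: transformers' built-in adapter API, which attaches the adapter
# to the existing model in place.
model.add_adapter(lora_config)
model.enable_adapters()

# Approach 2: the PEFT API, which wraps the model in a PeftModel and exposes
# extra utilities (e.g. saving adapters, printing trainable parameters).
model = get_peft_model(model, lora_config)
```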

I'm currently looking into creating a similar notebook that leverages the Trainer API with get_peft_model. The reason I used PyTorch Lightning is that it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.

EloiEynard commented on June 16, 2024

I see, thanks for the details!
