Comments (14)
@amyeroberts I am not sure what the correct format is for the cache objects that language models return, since we currently have no consistency, so I wanted @gante to look at it.
There are two options for this:
- The language model should always return a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic
- The language model should return the same type of cache as it received in forward. In that case Idefics2 has to add `cache.to_legacy_cache()` at the end to ensure it returns a tuple type, which will be consistent with how caching works for most current language models.
Also, I believe we are going to get rid of the tuple-type cache sometime in the future, so cache+Trainer is something to keep in mind for then.
from transformers.
I had the same error and fixed it by using `model.config.use_cache=False` during training. But @VictorSanh might know a better option.
from transformers.
Yes, this is due to batches having different lengths of `input_ids` (in the code snippet of your first message, you set `padding=True`, which means dynamic padding, so each batch may have a different length). If your eval batch size is smaller than or equal to your training batch size, then it's fine.
It can be fixed either by padding all examples to the same length (e.g. using `padding="max_length", max_length=200, truncation=True`), or by passing the flag `eval_do_concat_batches=False` to the `TrainingArguments`. In the latter case, you'll get a list of predictions/labels in the `compute_metrics` function rather than stacked tensors, so you would need to adapt your `compute_metrics` function accordingly.
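To illustrate the second option, here is a minimal sketch of a `compute_metrics` that accepts per-batch lists instead of stacked tensors. It assumes each list element is a NumPy array (logits of shape `(batch, seq_len, vocab)`, labels of shape `(batch, seq_len)`), that `-100` marks ignored label positions, and that a plain token accuracy is what you want to report. The metric name is made up for the example, and the usual one-position shift between logits and labels for causal LMs is left out for brevity.

```python
import numpy as np

def compute_metrics(eval_pred):
    # Assumption: with batch concatenation disabled, predictions and labels
    # arrive as lists with one array per eval batch (lengths may differ).
    predictions, labels = eval_pred
    correct, total = 0, 0
    for logits, batch_labels in zip(predictions, labels):
        preds = np.argmax(logits, axis=-1)   # (batch, seq_len)
        mask = batch_labels != -100          # skip ignored positions
        correct += int((preds[mask] == batch_labels[mask]).sum())
        total += int(mask.sum())
    # "token_accuracy" is a hypothetical metric name chosen for this sketch.
    return {"token_accuracy": correct / max(total, 1)}
```

You would then set `eval_do_concat_batches=False` in `TrainingArguments` and pass this function to the `Trainer` as `compute_metrics`.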
from transformers.
@zucchini-nlp OK, great, thanks for explaining. Let's leave as-is and then once the cache format is standardized we can propagate this to idefics2 + other models.
from transformers.
> I had the same error and fixed it by using `model.config.use_cache=False` during training

That fixes this issue, as the past_key_values are now full tensors.
But it leads to a new error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line 1
----> 1 trainer.evaluate()
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3510 start_time = time.time()
3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
3514 eval_dataloader,
3515 description="Evaluation",
3516 # No point gathering the predictions if there are no metrics, otherwise we defer to
3517 # self.args.prediction_loss_only
3518 prediction_loss_only=True if self.compute_metrics is None else None,
3519 ignore_keys=ignore_keys,
3520 metric_key_prefix=metric_key_prefix,
3521 )
3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3714 logits = self.preprocess_logits_for_metrics(logits, labels)
3715 logits = self.gather_function((logits))
-> 3716 all_preds.add(logits)
3717 if labels is not None:
3718 labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326, in EvalLoopContainer.add(self, tensors)
324 self.tensors = tensors if self.do_nested_concat else [tensors]
325 elif self.do_nested_concat:
--> 326 self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
327 else:
328 self.tensors.append(tensors)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
134 assert type(tensors) == type(
135 new_tensors
136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
137 if isinstance(tensors, (list, tuple)):
--> 138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
138 return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
139 elif isinstance(tensors, torch.Tensor):
--> 140 return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
141 elif isinstance(tensors, Mapping):
142 return type(tensors)(
143 {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
144 )
File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
96 tensor2 = atleast_1d(tensor2)
98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99 return torch.cat((tensor1, tensor2), dim=0)
101 # Let's figure out the new shape
102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.
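
For anyone wondering why this still fails: the traceback shows that `torch_pad_and_concatenate` only compares dimension 1 before calling `torch.cat`, so a mismatch in any later dimension (here 119 vs 99) is never padded. Below is a rough NumPy re-creation of that logic, not the library code itself. The function name and the example shapes are made up for illustration, and the padding branch is filled in from what the visible lines imply.

```python
import numpy as np

def pad_and_concatenate(a, b, padding_index=-100):
    """Hypothetical NumPy stand-in for the logic visible in the traceback."""
    a, b = np.atleast_1d(a), np.atleast_1d(b)
    # Same check as in the traceback: only dimension 1 is compared.
    if a.ndim == 1 or a.shape[1] == b.shape[1]:
        return np.concatenate((a, b), axis=0)
    # Otherwise pad dimension 1 up to the longer of the two inputs.
    new_shape = (a.shape[0] + b.shape[0], max(a.shape[1], b.shape[1])) + a.shape[2:]
    result = np.full(new_shape, padding_index, dtype=a.dtype)
    result[: a.shape[0], : a.shape[1]] = a
    result[a.shape[0]:, : b.shape[1]] = b
    return result

# Mismatch in dimension 1, e.g. (batch, seq_len, vocab): padded, so it works.
padded = pad_and_concatenate(np.zeros((2, 119, 5)), np.zeros((2, 99, 5)))

# Equal dimension 1 but mismatch in dimension 2 (made-up 4D shapes):
# the direct concatenation branch is taken and fails.
try:
    pad_and_concatenate(np.zeros((2, 4, 119, 8)), np.zeros((2, 4, 99, 8)))
    failed = False
except ValueError:
    failed = True
```

Padding every example to a fixed length, or disabling batch concatenation, avoids hitting the failing branch.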
from transformers.
> I had the same error and fixed it by using `model.config.use_cache=False` during training. But @VictorSanh might know a better option

I don't have a better fix!
from transformers.
I think the cache problem should be fixed by converting `DynamicCache` back to legacy_cache in Idefics2's backbone language model, like it's already done in llama.
These changes are partially related to the issue of making language models "compile" compatible, and should be available soon 🤗
from transformers.
Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?
from transformers.
We discussed the cache input-output format with @gante yesterday. Maybe the llama-format cache is not what we need, but anyway @gante will take care of it 😄
from transformers.
@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in idefics2. Is what @gante is doing addressing this?
from transformers.
Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb
from transformers.
Thanks @NielsRogge, I got it all figured out with the Trainer and am currently finetuning with my custom eval. Wish I knew about lightning earlier though, seems more explicit.
By the way, if you don't mind me asking, I've noticed that in your notebooks you use
model.add_adapter(lora_config)
model.enable_adapters()
whereas I mostly used to see `model = get_peft_model(model, lora_config)`.
Is there any difference between the two? Thanks
from transformers.
I had the same question; it turns out both are equivalent. The `get_peft_model` API is recommended as it returns a `PeftModel`, which has additional utility methods such as `save_adapter()` with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with `add_adapter`. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.
I'm currently looking into creating a similar notebook that leverages the Trainer API with `get_peft_model`. The reason I used PyTorch Lightning is that it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.
from transformers.
I see, thanks for the details!
from transformers.