Comments (11)
That is exactly the problem. As you can read in the second issue I linked, even if the size IS exact, when specifying drop_last=True, PyTorch Lightning seems to skip the validation.
Also, the warning is still raised by PyTorch Lightning.
I'll try to provide an MWE for it when I get some time to spare.
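For context, the stock PyTorch behaviour under discussion can be shown in a few lines (this is a generic sketch of drop_last, not the LitData MWE):

```python
# With drop_last=True, a partial final batch is silently discarded,
# so the loader reports fewer batches than ceil(len(ds) / batch_size).
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))                             # 10 samples
assert len(DataLoader(ds, batch_size=4, drop_last=False)) == 3   # ceil(10 / 4)
assert len(DataLoader(ds, batch_size=4, drop_last=True)) == 2    # floor(10 / 4)
```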
from litdata.
@enrico-stauss I think I have a fix. Could you try this branch: #139? This will only work with the StreamingDataLoader.
Example of the issue: there are 300 samples, 2 workers, and a batch size of 4. That is 300 / (4 * 2) = 37.5 batches. Because the last batch is incomplete, StopIteration is triggered while fetching it and the validation is skipped.
My PR extends the StreamingDataLoader to pass the number of workers and the batch size to the dataset, so the shuffler can drop the extra 0.5 batch causing the issue.
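The truncation described above can be sketched roughly as follows (function name and structure are my own illustration, not the actual PR code):

```python
# Hypothetical sketch: drop the trailing samples that would form a
# partial global batch, so every worker sees whole batches only.
def usable_samples(num_samples: int, num_workers: int, batch_size: int) -> int:
    """Largest sample count divisible into full batches across all workers."""
    global_batch = num_workers * batch_size        # samples consumed per step
    full_batches = num_samples // global_batch     # whole global batches
    return full_batches * global_batch

# 300 samples, 2 workers, batch size 4 -> 37 full global batches
assert usable_samples(300, 2, 4) == 296            # the 4 extra samples are dropped
```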
Hey @enrico-stauss, can you confirm the PR works for you?
Hey @enrico-stauss, can you share a reproducible script for the problem? I'm not sure I fully follow it. The size of the StreamingDataset should be exact; if not, there is a bug.
@tchaton
Please have a look at the modified original post. You can change DROP_LAST_TRAIN_SAMPLE=False to see that it then does run the validation epoch.
Maybe switching to the standard Dataset base class could also help with this one: #135 (comment).
Hey @enrico-stauss, changing the base type is a very large task and not something I am planning to do.
I understand. Do you have any idea how to proceed, though, as it does severely break compatibility? I might have a look at it but can't promise anything.
In all honesty, I think the change should be made on the PyTorch Lightning side, but as mentioned here, it seems that is just not possible at the moment.
Sorry @tchaton, I did not find time to test it earlier. My MWE, however, still shows that no validation is performed, even with the updates that are not yet merged into main. I don't think it's possible to resolve this on the LitData side without either removing the __len__ method or switching to the standard Dataset base class.
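A toy illustration of the __len__ mismatch being described (the class and numbers are made up for illustration; this is not LitData code). When an IterableDataset advertises a length larger than it actually yields, the loader's reported batch count overshoots and iteration ends early:

```python
from torch.utils.data import DataLoader, IterableDataset

class OverpromisingDataset(IterableDataset):
    """An IterableDataset whose __len__ promises more items than it yields."""
    def __init__(self, real: int, claimed: int):
        self.real, self.claimed = real, claimed
    def __iter__(self):
        return iter(range(self.real))      # actually yields `real` items
    def __len__(self):
        return self.claimed                # consumers trusting this over-count

loader = DataLoader(OverpromisingDataset(real=6, claimed=9), batch_size=4)
assert len(loader) == 3                    # ceil(9 / 4), computed from __len__
assert sum(1 for _ in loader) == 2         # only 2 batches are actually produced
```

A training loop that trusts `len(loader)` waits for a third batch that never comes, which matches the skipped-validation symptom discussed above.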
Hey @enrico-stauss, trust me, we are going to figure this out. And I am one of the core devs of PyTorch Lightning, so we will find a way. But I think this is a LitData problem.
Would you be available to pair-debug this with me sometime next week?
Also, would you be interested in joining the core team of LitData?
Hi @tchaton
The reason I believe it's not a LitData problem is that the second issue I linked in the original post already reported the problem when using IterableDataset as the base class.
But with you being a core dev of PyTorch Lightning too, I'm confident we can figure it out.
I think we can schedule a meeting for next week; let's get in touch on Discord. Then we can also talk about what you proposed. :)
Related Issues (20)
- Explore about integrating homomorphic encryption
- Bug: Inconsistent Behavior with StreamingDataloader loading states (specific to StreamingDataset)
- Add support for multi sample item in optimize and yielding from the `__getitem__` of the StreamingDataset
- Expose max_pre_download in StreamingDataset
- Use different batch sizes in CombinedStreamingDataset
- Bug: Issues with Dataloader Batching Resulting in Uneven number of Batches and Streamed Items
- Bug: Inconsistent Behavior with StreamingDataloader loading states (specific to CombinedStreamingDataset)
- StreamingDataset intermittently fails due to lack of index.json
- Lazyload subsamples if subsample=1.0
- CombinedStreamingDataset causes NCCL timeout when using multiple nodes
- Bug: Loading compressed data fails silently (no error message, the application simply hangs up)
- Tests related to torchaudio fail
- Error Should Indicate Missing Folder Instead of Missing index.json File
- When using DDP, processes see truncated cached index.json when data is loaded from a mounted network filesystem
- A contributing.md for the project
- Failed to Resume Training w/ CombinedStreamingDataset
- Large number of chunks causes `OSError: [Errno 24] Too many open files`
- RuntimeError: All the chunks should have been deleted. Found ['chunk-0-0.bin']
- How can I shut down automatically distributing data when using StreamingDataset?
- The config isn't consistent between chunks