Comments (11)

enrico-stauss commented on September 27, 2024

That is exactly the problem. As you can read in the second issue I linked, even if the size IS exact, PyTorch Lightning seems to skip the validation when drop_last=True is specified.
PyTorch Lightning also still raises the warning.

I'll try to provide an MWE for it when I have some time to spare.

tchaton commented on September 27, 2024

@enrico-stauss I think I have a fix. Could you try this branch: #139? It will only work with the StreamingDataLoader.

Example of the issue: there are 300 samples, 2 workers, and a batch size of 4. That gives 300 / (4 * 2) = 37.5 batches per worker. Because the last batch is incomplete, StopIteration is triggered while fetching it and the validation is skipped.
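A minimal pure-Python sketch of the arithmetic above. It assumes the samples are split as evenly as possible across workers and that each worker batches its own shard independently, which is how PyTorch's DataLoader handles an IterableDataset with num_workers > 0. `declared_len` follows how the DataLoader derives its length from `len(dataset)` (floor with drop_last, ceil without); the function names are hypothetical.

```python
import math


def declared_len(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batch count a DataLoader reports for an IterableDataset that defines __len__."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def actual_batches(num_samples: int, num_workers: int, batch_size: int, drop_last: bool) -> int:
    """Batches actually produced when each worker batches its own shard.

    Assumption: samples are split across workers as evenly as possible.
    """
    total = 0
    for worker_id in range(num_workers):
        # Shard sizes differ by at most one sample.
        shard = num_samples // num_workers + (1 if worker_id < num_samples % num_workers else 0)
        if drop_last:
            total += shard // batch_size  # each worker drops its own trailing partial batch
        else:
            total += math.ceil(shard / batch_size)  # each worker may emit its own partial batch
    return total


# The example from the thread: 300 samples, 2 workers, batch size 4.
print(declared_len(300, 4, drop_last=True))        # 75
print(actual_batches(300, 2, 4, drop_last=True))   # 74
print(actual_batches(300, 2, 4, drop_last=False))  # 76
```

Under these assumptions the loader announces 75 batches, but with drop_last=True the two workers yield only 37 full batches each (74 in total), so iteration stops one batch earlier than the reported length, which is consistent with the skipped validation described above.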

My PR extends the StreamingDataLoader to pass the number of workers and the batch size to the dataset, so the shuffler can drop the extra half batch that causes the issue.
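The idea of dropping the extra partial batch once the dataset knows the worker count and batch size can be sketched as follows. This is a hypothetical illustration, not litdata's actual shuffler: `shard_full_batches` is an invented helper that trims the sample list to a multiple of `num_workers * batch_size` and assumes each worker receives one contiguous, equal-sized shard.

```python
from typing import List


def shard_full_batches(indices: List[int], num_workers: int, batch_size: int) -> List[List[int]]:
    """Hypothetical helper: trim the samples so every worker yields only full batches."""
    samples_per_round = num_workers * batch_size
    # Drop the trailing samples that cannot fill one batch on every worker.
    keep = (len(indices) // samples_per_round) * samples_per_round
    trimmed = indices[:keep]
    per_worker = keep // num_workers  # always a multiple of batch_size
    return [trimmed[w * per_worker:(w + 1) * per_worker] for w in range(num_workers)]


shards = shard_full_batches(list(range(300)), num_workers=2, batch_size=4)
print([len(s) for s in shards])         # [148, 148]
print(sum(len(s) for s in shards) // 4)  # 74
```

With 300 samples, 4 are dropped, each worker gets 148 samples (37 full batches), and a dataset reporting a length of 296 would produce a declared length of 296 // 4 = 74 batches, matching what the workers actually yield.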

tchaton commented on September 27, 2024

Hey @enrico-stauss, can you confirm it works for you with the PR?

tchaton commented on September 27, 2024

Hey @enrico-stauss, can you share a script that reproduces the problem? I'm not sure I fully follow. The size of the StreamingDataset should be exact; if it isn't, there is a bug.

enrico-stauss commented on September 27, 2024

@tchaton
Please have a look at the modified original post. You can set DROP_LAST_TRAIN_SAMPLE=False to see that the validation epoch then runs.

enrico-stauss commented on September 27, 2024

Maybe changing to the standard Dataset type could also help with this one #135 (comment).

tchaton commented on September 27, 2024

Hey @enrico-stauss, changing the base type is a very large task and not something I am planning to do.

enrico-stauss commented on September 27, 2024

I understand. Do you have any idea how to proceed, though, since this severely breaks compatibility? I might have a look at it, but I can't promise anything.
In all honesty, I think the change should be made on the PyTorch Lightning side, but as mentioned here, that just doesn't seem possible at the moment.

enrico-stauss commented on September 27, 2024

Sorry @tchaton, I did not find time to test it earlier. However, my MWE still shows that no validation is performed, even with the updates that are not merged into main. I don't think it's possible to resolve this on the LitData side without either removing the __len__ method or switching to the standard Dataset base class.

tchaton commented on September 27, 2024

Hey @enrico-stauss, trust me, we are going to figure this out. I am one of the core devs of PyTorch Lightning, so we will find a way. But I think this is a litdata problem.

Would you be available to pair-debug this with me sometime next week?

Also, would you be interested in joining the core team of litdata?

enrico-stauss commented on September 27, 2024

Hi @tchaton
The reason I believe it's not a LitData problem is that the second issue I linked in the original post already reported the issue using 'IterableDataset' as the base class.

But with you also being a core dev of PyTorch Lightning, I'm confident that we can figure it out.

I think we can schedule a meeting for next week; let's get in touch on Discord. Then we can also talk about what you proposed. :)
