
Comments (4)

jackcyc commented on September 26, 2024

I also encountered a similar issue while training an ImageNet classification model on a resource-limited PC. Specifically, I found that the prefetch_factor of StreamingDataLoader defaults to 10 when num_workers > 0:

prefetch_factor=(10 if num_workers > 0 else None) if prefetch_factor is None else prefetch_factor,

This default differs from what the StreamingDataLoader docstring describes and from PyTorch's DataLoader default.

Manually setting the prefetch_factor to a smaller number, like 2, significantly reduced the RAM usage.
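
For anyone hitting the same problem, here is a minimal sketch of that workaround (assuming the usual StreamingDataset/StreamingDataLoader constructor arguments; the input path, batch size, and worker count are placeholders, not values from this thread):

from litdata import StreamingDataset, StreamingDataLoader

dataset = StreamingDataset(input_dir="s3://my-bucket/imagenet-train")  # placeholder path

# Passing prefetch_factor explicitly avoids falling back to litdata's default of 10.
dataloader = StreamingDataLoader(
    dataset,
    batch_size=256,
    num_workers=8,
    prefetch_factor=2,  # PyTorch DataLoader's usual default when num_workers > 0
)

for batch in dataloader:
    ...  # training_step(batch)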


deependujha commented on September 26, 2024

Thanks for pointing out the wrong default value in the docstring; it'll be fixed soon.

https://github.com/Lightning-AI/litdata/blob/d5eff393cd17ba4f789fa846788f40b5ca4d0779/src/litdata/streaming/dataloader.py#L533C1-L538C1

Did you try values in between, like 5 or 6? If so, please share your experience.

That would help in deciding whether something needs to change here.


jackcyc commented on September 26, 2024

I think the behavior of StreamingDataLoader is expected.

prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default value depends on the set value for num_workers. If value of num_workers=0 default is None. Otherwise, if value of num_workers > 0 default is 2).

If the training_step consumes batches faster than the dataloader prepares them, the prefetch queue stays small and so consumes little RAM. But if the training_step slows down, the dataloader gradually fills the prefetch queue up to its limit (prefetch_factor * num_workers batches), and host memory usage rises accordingly. So I think the only real problem is that the default prefetch_factor is too high and out of sync with the well-known defaults, which makes this easy to overlook.
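
To put the difference in scale (illustrative back-of-the-envelope numbers, not measurements from this thread), the worst-case host RAM held by the prefetch queues is roughly batch_bytes * prefetch_factor * num_workers:

# Rough upper bound on host RAM held by prefetched batches.
# Numbers are illustrative; real usage depends on the data, collate copies, and the allocator.
batch_size = 256
bytes_per_image = 224 * 224 * 3 * 4          # float32 ImageNet-sized tensor
batch_bytes = batch_size * bytes_per_image   # ~147 MiB per batch

num_workers = 8
for prefetch_factor in (10, 2):
    queued = prefetch_factor * num_workers * batch_bytes
    print(f"prefetch_factor={prefetch_factor}: up to {queued / 2**30:.1f} GiB queued")

# prefetch_factor=10 -> up to ~11.5 GiB queued; prefetch_factor=2 -> up to ~2.3 GiB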


tchaton commented on September 26, 2024

Hey @jackcyc. Yes, it seems to be fine with LLM pre-training and ImageNet, but as you rightly stated, it is data specific. This was a tuning I made to accelerate training because I noticed lower GPU utilization.

If you feel the default should be set back to something lower, feel free to make a PR and we will merge it.
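
For reference, the change under discussion would amount to a one-line edit of the default quoted above (a sketch of a possible patch, not something that has been merged):

# Hypothetical change in src/litdata/streaming/dataloader.py: fall back to 2, matching PyTorch
prefetch_factor=(2 if num_workers > 0 else None) if prefetch_factor is None else prefetch_factor,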

