Comments (13)

titu1994 commented on May 29, 2024

While this is an interesting application of device placement for larger models, the cost is in training time.

Your moving-average weights are on the CPU, whereas the gradients of every parameter are on the GPU. With your device blocks, you are effectively shuttling GPU gradients to the CPU, performing the op there, and then shuttling the result back onto the GPU.

This has several issues:

  1. Gradients are shuttled between GPU and CPU on every batch; for large models with millions of parameters, this costs too much time.
  2. The CPU already performs multiple tasks: multiprocess data loading (if the images come from ImageNet or, in general, an external source), batching, and shuffling. Now it must also incur the cost of synchronizing gradients and performing CPU ops on large matrices. This bottlenecks the IO pipeline, and since IO is generally the major bottleneck anyway, it is highly inefficient.

This is fine when one is willing to pay the price in compute time for larger models, but it is not feasible in the general case.
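The cost in point 1 can be made concrete with some back-of-envelope arithmetic. All numbers below are hypothetical (a 25M-parameter float32 model and ~12 GB/s of effective PCIe 3.0 x16 bandwidth), chosen only to show the order of magnitude:

```python
# Back-of-envelope cost of shuttling gradients between GPU and CPU every batch.
# Hypothetical numbers: 25M-parameter float32 model, ~12 GB/s effective PCIe bandwidth.
n_params = 25_000_000
bytes_per_param = 4                  # float32
pcie_bandwidth = 12e9                # bytes per second, effective

# Gradients travel GPU -> CPU, and updated values travel CPU -> GPU.
bytes_per_batch = n_params * bytes_per_param * 2
transfer_seconds = bytes_per_batch / pcie_bandwidth

print(f"{bytes_per_batch / 1e6:.0f} MB moved per batch")            # 200 MB
print(f"{transfer_seconds * 1e3:.1f} ms of pure transfer per batch")  # ~16.7 ms
```

At hundreds of batches per minute, ~17 ms of pure transfer time per batch (before any CPU compute) adds up quickly, which is the crux of the objection above.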

EDIT:
Why not then just force the entire optimizer onto the CPU device, since you incur the cost of device shuttling anyway? That way, at least the CPU ops can be streamlined.

from keras-adabound.

titu1994 commented on May 29, 2024

In addition, Gradient Checkpointing already follows a similar line of thought, recomputing intermediate results on demand rather than preserving them in GPU RAM. You could look into that if memory is the bottleneck and time is not a consideration.
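The trade-off checkpointing makes can be sketched with toy arithmetic. The numbers here (100 layers, 50 MB of activations per layer) are hypothetical, and the sqrt(n) checkpoint spacing is the classic scheme:

```python
import math

# Toy arithmetic for gradient checkpointing's memory/compute trade-off.
# Hypothetical numbers: 100 layers, 50 MB of activations per layer.
n_layers = 100
act_mb_per_layer = 50

# Storing every activation for the backward pass:
store_all_mb = n_layers * act_mb_per_layer

# Checkpointing every sqrt(n)-th layer: keep only the checkpoints, plus
# one segment's activations recomputed at a time during backprop.
k = int(math.sqrt(n_layers))
checkpointed_mb = k * act_mb_per_layer + (n_layers // k) * act_mb_per_layer

print(store_all_mb, "MB vs", checkpointed_mb, "MB")  # 5000 MB vs 1000 MB
```

Memory drops roughly 5x in this toy case, at the cost of approximately one extra forward pass per batch, which is why it only makes sense when time is not the constraint.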

iperov commented on May 29, 2024

I already tested it and applied it in my DeepFaceLab project (deepfakes). On my 6 GB card:

  • batch size 8, 128x128 maximum face model (~500 MB model files) with tf_cpu_mode=0
  • batch size 4, 256x256 (~1000 MB model files) with tf_cpu_mode=1, about 10% slower, but that is because the model is bigger
  • batch size 8, 256x256 (~1000 MB model files) with tf_cpu_mode=2, about 30% slower

So this approach brings deepfakes into a new era.

iperov commented on May 29, 2024

If you don't like it, just close it :)
I just wanted to share the find.

iperov commented on May 29, 2024

Trying AdaBound right after Adam: same lr, but final_lr = lr * 100.

History of the last 5k iters:
NSFW pic

interesting :)

titu1994 commented on May 29, 2024

I believe you will find similar results by simply marking the entire optimizer to lie on the CPU, but I'm glad you found a good alternative. I'll probably review the PR on keras-contrib sometime if it gets merged.

I must ask you to remove the image though.

iperov commented on May 29, 2024

> This is fine when one is willing to pay the price in compute time for larger models, but it is not feasible in the general case.

Batch size is a very important parameter for GAN networks. By moving the optimizer's weights out of VRAM, we can train with a higher batch size, sacrificing 10-20% of time per iteration.
Also, I cannot feel a noticeable performance loss on my Coffee Lake machine with 32 GB of 2400 MHz RAM.

titu1994 commented on May 29, 2024

Sure, if one can disregard the additional training time, then your approach is fine. I won't be merging it into this repo, since I keep 1:1 equivalence with Keras proper.

titu1994 commented on May 29, 2024

Btw, a slight question: why not place the

            if self.amsbound:
                denom = (K.sqrt(vhat_t) + self.epsilon)
            else:
                denom = (K.sqrt(v_t) + self.epsilon)                        

            # Compute the bounds
            step_size_p = step_size * K.ones_like(denom)

inside the CPU block as well? That would offer even more memory savings, since you don't need a K.ones_like() on the GPU then.
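For context, the quoted snippet is the denominator/step-size part of the AdaBound update. A self-contained NumPy sketch of that update (variable names and hyperparameter defaults here are illustrative, not the exact keras-adabound implementation) looks like this:

```python
import numpy as np

# NumPy sketch of an AdaBound step; the `denom` / step-size lines mirror
# the quoted Keras snippet. Hyperparameter defaults are illustrative.
def adabound_step(p, g, m, v, t, lr=1e-3, final_lr=0.1,
                  beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias-corrected base step size, as in Adam.
    step_size = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    denom = np.sqrt(v) + eps
    # AdaBound clips the per-parameter learning rate between bounds that
    # both converge to final_lr as t grows, blending Adam into SGD.
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    lr_t = np.clip(step_size / denom, lower, upper)
    return p - lr_t * m, m, v
```

The device-placement question above amounts to asking how many of these intermediate tensors (denom, the bounds, the ones_like broadcast) live on the GPU versus the CPU.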

iperov commented on May 29, 2024

It should be tested, thanks for the tip.

iperov commented on May 29, 2024

> I must ask you to remove the image though.

And why?
Does your religion not allow looking at the faces of women? :0

titu1994 commented on May 29, 2024

Informed consent. If someone is casually browsing, they should not be shown random NSFW content unless it is behind a link that clearly states the content is NSFW, so that they are implicitly responsible for viewing it at their own discretion.

iperov commented on May 29, 2024

I did not know that a bunch of ordinary women's faces is not safe for work.
