
Comments (20)

bhack commented on August 15, 2024

Yes, that is what I meant: that PR is more than a year old, but the sort-based fallback code used when determinism is required is still there.

So if you see the perf regression only in deterministic mode, it could be that forcing upsample to float32 significantly impacts only the sort fallback.
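To make that hypothesis concrete, here is a minimal sketch (not the benchmark's actual harness; the shapes are illustrative assumptions, not the pytorch_unet config) of timing the upsample backward with and without deterministic algorithms under amp:

```python
import torch
import torch.nn.functional as F

def time_upsample_backward(deterministic: bool) -> float:
    # Toggle the global switch that selects the deterministic (sort-based)
    # backward instead of the atomic-add one.
    torch.use_deterministic_algorithms(deterministic)
    x = torch.randn(8, 64, 128, 128, device="cuda", requires_grad=True)
    with torch.autocast("cuda"):
        # Since pytorch/pytorch#121324 this runs in float32 even under amp.
        y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    y.sum().backward()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end)  # milliseconds

print("atomic-add backward:   ", time_upsample_backward(False))
print("deterministic backward:", time_upsample_backward(True))
```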


xuzhao9 commented on August 15, 2024

@bhack pytorch/pytorch#121324 slows down pytorch_unet (45ms -> 51ms) with higher GPU memory usage. This model uses amp precision.
Is this expected?


bhack commented on August 15, 2024

/cc @albanD
It is expected, as the upsampling is now float32 with amp. Is there a way for you to test the torch.compile perf with amp?
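To illustrate what "upsampling is now float32 with amp" means in practice, a minimal sketch, assuming a CUDA build that includes pytorch/pytorch#121324:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64, device="cuda")
conv = nn.Conv2d(3, 3, kernel_size=1).cuda()

with torch.autocast("cuda"):
    h = conv(x)                                             # amp runs convs in float16
    up = F.interpolate(h, scale_factor=2, mode="bilinear")  # now upcast to float32

print(h.dtype)   # torch.float16
print(up.dtype)  # torch.float32 after the PR (it followed the input dtype before)
```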


albanD commented on August 15, 2024

Not sure. @xuzhao9, are we running any compiled perf measurements with amp?


bhack commented on August 15, 2024

This was the big topic on the PR thread, as we didn't see the same gradient limit with the compiled path.
So I suppose the compiled path doesn't need to be forced to float32 with amp, at least with the inputs we tested at:
pytorch/pytorch#121324 (comment)
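A hedged sketch of the kind of check this would need (shapes are illustrative, and I have not verified what Inductor actually emits for this op):

```python
import torch
import torch.nn.functional as F

def upsample(t):
    return F.interpolate(t, scale_factor=2, mode="bilinear")

compiled_upsample = torch.compile(upsample)

x = torch.randn(8, 64, 128, 128, device="cuda", requires_grad=True)
with torch.autocast("cuda"):
    y_eager = upsample(x.half())            # eager: upcast to float32 by autocast
    y_compiled = compiled_upsample(x.half())

# If the compiled path lowers the upsample without the float32 upcast,
# the dtypes (and the perf profile) would differ here.
print(y_eager.dtype, y_compiled.dtype)
```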


xuzhao9 commented on August 15, 2024

> Not sure. @xuzhao9, are we running any compiled perf measurements with amp?

In this CI workflow we do not run compile; the PR affects the eager-mode (non-compiled) path.
For the compiled workflow results we need to check the HUD: https://hud.pytorch.org/benchmark/compilers


bhack commented on August 15, 2024

> For the compiled workflow results we need to check the HUD: https://hud.pytorch.org/benchmark/compilers

Is there a way to isolate pytorch_unet on that HUD page?


xuzhao9 commented on August 15, 2024

@bhack Check: https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri,%2019%20Apr%202024%2019:18:11%20GMT&stopTime=Fri,%2026%20Apr%202024%2019:18:11%20GMT&granularity=hour&mode=training&model=pytorch_unet&dtype=amp&lBranch=main&lCommit=59a1f1f308545e3ac1d81940a51f8dc0db3d82d4&rBranch=main&rCommit=b2f6cfd9c061a212cde8c8768fda41cc75a3110c


bhack commented on August 15, 2024

It seems we cannot isolate before and after the PR on that page, but on a coarse daily timescale I don't see the perf drop on the compiled version, so I think it could be OK.
Do we care about the final effect on a trained network in eager mode of:
pytorch/pytorch#121072

How is this increased precision going to impact accuracy in eager mode?
Because the problem is that we now skip the accuracy check, as it is not deterministic in eager mode.
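To make the last point concrete, a minimal sketch of why the eager accuracy check is hard here (illustrative shapes; the point is that the default atomic-add backward is not bitwise reproducible):

```python
import torch
import torch.nn.functional as F

def upsample_grad() -> torch.Tensor:
    torch.manual_seed(0)
    x = torch.randn(4, 16, 64, 64, device="cuda", requires_grad=True)
    y = F.interpolate(x, scale_factor=2, mode="bilinear")
    y.sum().backward()
    return x.grad.clone()

g1, g2 = upsample_grad(), upsample_grad()
# With the atomic-add backward the accumulation order varies between runs,
# so identical inputs can yield bitwise-different gradients.
print(torch.equal(g1, g2))  # not guaranteed to be True
```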


xuzhao9 commented on August 15, 2024

@bhack One thing I noticed: in eager mode, the regression happens only on amp+inference. However, Inductor CI does not test this combination: https://hud.pytorch.org/benchmark/torchbench/inductor_with_cudagraphs?startTime=Fri%2C+19+Apr+2024+19%3A18%3A11+GMT&stopTime=Fri%2C+26+Apr+2024+19%3A18%3A11+GMT&granularity=hour&mode=training&model=pytorch_unet&dtype=amp&lBranch=main&lCommit=59a1f1f308545e3ac1d81940a51f8dc0db3d82d4&rBranch=main&rCommit=b2f6cfd9c061a212cde8c8768fda41cc75a3110c

It does not regress on train+amp or inference+bf16, but I am not sure about inference+amp since there is no data.


bhack commented on August 15, 2024

> in eager mode, the regression happens only on amp+inference.

That is strange. Are you sure eager + amp training is also tested?


bhack commented on August 15, 2024

Is the ops list in the PR complete enough to cover the backward pass, or do we need to add something else?


xuzhao9 commented on August 15, 2024

@bhack Yes, we test both train and eval:

Left: 20240424
Right: 20240426

[two screenshots: benchmark results for the two dates]

Train has a much smaller regression than eval.

Is this because we have a different torch.backends.cudnn.deterministic setup for the train and eval tests? https://github.com/pytorch/benchmark/blob/main/torchbenchmark/models/pytorch_unet/__init__.py#L91C9-L91C43

I think the regression happens only when torch.backends.cudnn.deterministic = True.
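For reference, these are the two determinism switches that could be in play (a sketch; I have not verified which of them the benchmark's eval path actually relies on):

```python
import torch

# What the linked torchbench line sets for the eval test. This flag only
# affects cuDNN-backed ops such as convolutions.
torch.backends.cudnn.deterministic = True

# The global switch: this is what routes ops like the upsample backward onto
# their deterministic (sort-based) fallbacks instead of atomic adds.
torch.use_deterministic_algorithms(True)
```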


bhack commented on August 15, 2024

I think it is expected. See @lezcano's comment at pytorch/pytorch#121769 (comment).

But I don't know how it could be connected to my PR. If you git blame the sort fallback, can you check when it was introduced?


bhack commented on August 15, 2024

The deterministic fallback was merged on 29 March 2023:
pytorch/pytorch#96898


bhack commented on August 15, 2024

So I suppose that in pure eager we have a similar sort fallback when we ask for determinism (I have not checked the source code, but I suppose it is there).

Is the sort fallback more heavily impacted by working at float32 than the non-fallback atomic_add version?

This is the only explanation I have in mind for your regression appearing only in deterministic mode.
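A hedged analogue of that contrast (not the actual upsample kernels): torch.Tensor.index_put_ with accumulate=True uses atomic adds on CUDA by default and switches to a sort-based deterministic implementation under torch.use_deterministic_algorithms(True), which is the kind of fallback a float32 upcast would hit hardest:

```python
import torch

idx = (torch.randint(0, 1000, (1_000_000,), device="cuda"),)
src = torch.randn(1_000_000, device="cuda")

# Default path: accumulation via atomic adds -- fast, order nondeterministic.
torch.use_deterministic_algorithms(False)
fast = torch.zeros(1000, device="cuda")
fast.index_put_(idx, src, accumulate=True)

# Deterministic path: a sort-based implementation -- reproducible but slower,
# with extra sort/gather work that grows with element size (half vs float32).
torch.use_deterministic_algorithms(True)
det = torch.zeros(1000, device="cuda")
det.index_put_(idx, src, accumulate=True)
```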


xuzhao9 commented on August 15, 2024

@bhack The regression happened on 20240425, so it can't be pytorch/pytorch#96898.


xuzhao9 commented on August 15, 2024

cc @malfet I am wondering what we should do about this regression. It looks expected, since we now upsample float32 tensors in amp, and it only affects deterministic mode?


xuzhao9 commented on August 15, 2024

Closing this issue, since it is an expected result of the upsampling change.


albanD commented on August 15, 2024

Marked the PR bc-breaking to make sure we properly warn in the release notes.

