csyonghe commented on July 21, 2024

The UE5 integration creates a new slangGlobalSession every time it compiles an HLSL file.
Creating a global session is a very expensive operation. It is meant to be done once by the application, with the resulting global session shared for the application's entire lifetime. Doing this should cut a lot of compile time from the benchmark.
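
This is a minimal C++ sketch of the create-once pattern. It is not the UE5 integration code. GlobalSession is a hypothetical stand-in for the expensive Slang global session. In a real integration it would wrap something like slang::createGlobalSession, and a lighter per-compile session would be created from it. The names GlobalSession, getGlobalSession, compileShaderFile and g_sessionConstructions are illustrative only.

```cpp
#include <cassert>

// Counts how many times the expensive session is built (illustration only).
int g_sessionConstructions = 0;

// Hypothetical stand-in for the expensive, application-wide compiler session.
struct GlobalSession {
    GlobalSession() { ++g_sessionConstructions; }
};

// Returns the single process-wide session. A function-local static is
// constructed on first use only, and since C++11 that initialization is
// thread-safe, so parallel shader compiles all share one instance.
GlobalSession& getGlobalSession() {
    static GlobalSession session;
    return session;
}

// Hypothetical per-file compile entry point: it reuses the shared session
// instead of creating a new one for every HLSL file.
int compileShaderFile(const char* path) {
    assert(path != nullptr);
    GlobalSession& session = getGlobalSession();
    (void)session; // real code would create cheap per-compile state from it here
    return 0;
}
```

However many files are compiled, the expensive object is built exactly once. Only per-compile state needs to be created per file.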

We still need to investigate the increased DXC time, though.

from slang.

jkwak-work commented on July 21, 2024

There are 10 shaders for UE TSR, with 40 permutations in total.
The compile time for most of the shaders is unchanged, and each takes under 1 second.

I found that the following six permutations take longer when DXC compiles the HLSL output from Slang:

 TSRRejectShading.usf permutation 0: compile time increased from 6s to 17s
 TSRRejectShading.usf permutation 1: compile time increased from 15s to 123s
 TSRRejectShading.usf permutation 2: compile time increased from 2s to 3s
 TSRRejectShading.usf permutation 3: compile time increased from 8s to 20s
 TSRRejectShading.usf permutation 4: compile time increased from 19s to 119s
 TSRRejectShading.usf permutation 5: compile time increased from 2s to 4s

Note that all of them come from the same shader, TSRRejectShading.usf, which is the one that uses the "enum" workaround.

Since we now have a fix for the "AnyValue16" issue, I am going to replace the "enum" workaround with the existential-type approach.
Because the "enum" workaround wasn't efficient, I expect the new approach to improve compile time.

csyonghe commented on July 21, 2024

Unfortunately, Slang also generates a lot of boilerplate for the existential-type approach, so we are likely to keep suffering from the compile-time issue.

csyonghe commented on July 21, 2024

This is because, given the way the code is written, we fall onto the path that generates dynamic-dispatch logic from the existential-type code. After inlining everything, DXC will eventually figure out that all the RTTI data we generate is statically resolvable. Even so, reaching that conclusion costs a lot of time in the optimization passes.
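
For illustration, here is a rough C++ analogue of the problem. It is a hypothetical sketch, not Slang's actual generated code. When the concrete type travels as a runtime tag, every operation becomes a switch on that tag. The optimizer has to inline and constant-propagate the tag before it can delete those branches. OpTag, AnyOp, dynInit, dynAdd and dynReduce are made-up names.

```cpp
#include <cassert>
#include <algorithm>
#include <limits>

// Runtime type tag standing in for the RTTI/witness data of an existential value.
enum class OpTag { Sum, Max };

struct AnyOp {
    OpTag tag;
};

// Each "interface call" dispatches on the tag at runtime.
float dynInit(AnyOp op) {
    switch (op.tag) {
        case OpTag::Sum: return 0.0f;
        case OpTag::Max: return std::numeric_limits<float>::lowest();
    }
    return 0.0f; // unreachable for valid tags
}

float dynAdd(AnyOp op, float state, float val) {
    switch (op.tag) {
        case OpTag::Sum: return state + val;
        case OpTag::Max: return std::max(state, val);
    }
    return state; // unreachable for valid tags
}

// The tag is constant for the whole loop, but a compiler only sees that
// after inlining dynInit/dynAdd and propagating op.tag into each switch.
float dynReduce(AnyOp op, const float* vals, int n) {
    assert(n >= 0);
    float state = dynInit(op);
    for (int i = 0; i < n; ++i)
        state = dynAdd(op, state, vals[i]);
    return state;
}
```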

csyonghe commented on July 21, 2024

We can try something like:

interface IOperation
{
      static T init<T>(inout T state);
      static T add<T>(inout T state, T val);
      static T finalize<T>(T state);
}

csyonghe commented on July 21, 2024

If we stay away from generic interfaces, we can just use ordinary interfaces with ordinary generic functions, which should be much easier on the compiler.
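
A C++ analogue of that idea follows. It is hypothetical and not Slang syntax. Each operation is a plain type with static generic functions, roughly mirroring the IOperation interface suggested above. The signatures are simplified to pass the state by value. The reduction is an ordinary generic function over the operation type. Because the operation is a compile-time parameter, each instantiation calls exactly one concrete function, and there is no tag to switch on. SumOp, MaxOp and reduce are illustrative names.

```cpp
#include <cassert>
#include <algorithm>
#include <limits>

// Each operation is a non-generic type whose static methods are generic in T.
struct SumOp {
    template <typename T> static T init() { return T(0); }
    template <typename T> static T add(T state, T val) { return state + val; }
    template <typename T> static T finalize(T state) { return state; }
};

struct MaxOp {
    template <typename T> static T init() { return std::numeric_limits<T>::lowest(); }
    template <typename T> static T add(T state, T val) { return std::max(state, val); }
    template <typename T> static T finalize(T state) { return state; }
};

// Op is fixed at compile time, so every call below resolves statically:
// nothing has to be proven constant by the optimizer afterwards.
template <typename Op, typename T>
T reduce(const T* vals, int n) {
    assert(n >= 0);
    T state = Op::template init<T>();
    for (int i = 0; i < n; ++i)
        state = Op::template add<T>(state, vals[i]);
    return Op::template finalize<T>(state);
}
```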

jkwak-work commented on July 21, 2024

> We can try something like:
>
> interface IOperation
> {
>       static T init<T>(inout T state);
>       static T add<T>(inout T state, T val);
>       static T finalize<T>(T state);
> }

You read my mind. That is what I was going to try next. I will share an update after making the changes and trying them out.

jkwak-work commented on July 21, 2024

I have made some improvements to my enum WAR (workaround) in TSRRejectShading.usf.
https://gitlab-master.nvidia.com/jkwak/UnrealEngine_ngx/-/commit/9e8f1025a4d2123ffd0d78e8b2bf0bed6faef146

The change removes some of the unnecessary dynamic branching logic.

DXC still takes longer to compile the shaders, but the gap is smaller than before.

 TSRRejectShading.usf permutation 0: compile time increased from 6s to 9s (down from 17s)
 TSRRejectShading.usf permutation 1: compile time increased from 15s to 48s (down from 123s)
 TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (down from 3s)
 TSRRejectShading.usf permutation 3: compile time increased from 8s to 11s (down from 20s)
 TSRRejectShading.usf permutation 4: compile time increased from 19s to 55s (down from 119s)
 TSRRejectShading.usf permutation 5: compile time increased from 2s to 3s (down from 4s)

jkwak-work commented on July 21, 2024

With the improved WAR, the runtime has also improved.
Previously, the runtime of TSR went up by 30% when Slang generated the HLSL.
With the improved WAR, it is still a little slower, but only by about 4%.

Without Slang, TSR takes 0.79ms on my current setup.
With Slang, TSR takes 0.82ms on the same setup.
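
As a quick check of the 4% figure, here is a small C++ helper that computes the relative overhead from the two timings above. The name overheadPercent is made up.

```cpp
#include <cassert>

// Relative overhead of measuredMs over baselineMs, in percent.
double overheadPercent(double baselineMs, double measuredMs) {
    assert(baselineMs > 0.0);
    return (measuredMs - baselineMs) / baselineMs * 100.0;
}
```

overheadPercent(0.79, 0.82) comes out at about 3.8, which rounds to the 4% quoted above.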

jkwak-work commented on July 21, 2024

Thanks to Yong, I made some changes based on his suggestions, and the compile time went down a little more.
https://gitlab-master.nvidia.com/jkwak/UnrealEngine_ngx/-/commit/ec3aab179c60c41918457a016bd3339bb9529d69
It was only a few lines of changes, but the improvement was bigger than expected.

TSRRejectShading.usf permutation 0: compile time increased from 6s to 8s (down from 9s)
TSRRejectShading.usf permutation 1: compile time increased from 15s to 36s (down from 48s)
TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (same as 2s)
TSRRejectShading.usf permutation 3: compile time increased from 8s to 8s (down from 11s)
TSRRejectShading.usf permutation 4: compile time increased from 19s to 32s (down from 55s)
TSRRejectShading.usf permutation 5: compile time increased from 2s to 3s (same as 3s)
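
To make the remaining gap concrete, here is a small C++ sketch that tabulates the latest numbers above. The arrays are copied from the list; the function names are made up. It computes the extra compile time per permutation.

```cpp
#include <cassert>

// Compile times in seconds for TSRRejectShading.usf permutations 0-5,
// copied from the latest measurements above.
const int kBaselineSec[6]  = {6, 15, 2, 8, 19, 2};  // DXC on the original HLSL
const int kWithSlangSec[6] = {8, 36, 2, 8, 32, 3};  // DXC on Slang-generated HLSL

// Extra compile time caused by going through Slang for one permutation.
int extraSeconds(int perm) {
    assert(perm >= 0 && perm < 6);
    return kWithSlangSec[perm] - kBaselineSec[perm];
}

// Total extra compile time across all six permutations.
int totalExtraSeconds() {
    int total = 0;
    for (int p = 0; p < 6; ++p) total += extraSeconds(p);
    return total;
}
```

With these numbers, extraSeconds(1) is 21 and extraSeconds(4) is 13, out of 37 extra seconds in total. The other four permutations add at most 2s each.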

It really comes down to permutations 1 and 4.
The permutations differ as follows:

0 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=64
1 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=32
2 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=0
3 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=64
4 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=32
5 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=0
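
The table is consistent with a simple row-major encoding: DIM_WAVE_SIZE varies fastest over {64, 32, 0}, and DIM_FLICKERING_DETECTION varies slowest. Below is a hypothetical C++ decoder that reproduces it. It is inferred only from the six ids listed, and UE's actual permutation-domain code may order things differently.

```cpp
#include <cassert>

// Decoded dimensions of one TSRRejectShading.usf permutation.
struct TsrRejectPermutation {
    int flickeringDetection; // DIM_FLICKERING_DETECTION
    int waveSize;            // DIM_WAVE_SIZE (0 = no explicit wave size)
};

// Hypothetical decoding inferred from the table: wave size varies fastest.
TsrRejectPermutation decodePermutation(int id) {
    assert(id >= 0 && id < 6);
    static const int kWaveSizes[3] = {64, 32, 0};
    return TsrRejectPermutation{id / 3, kWaveSizes[id % 3]};
}
```

Decoding ids 1 and 4 gives DIM_WAVE_SIZE=32 in both cases, so the two slow permutations are exactly the wave-size-32 ones.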

When I compare permutations 0 and 1, I don't see any obvious difference other than that permutation 1 unrolls a little more.

csyonghe commented on July 21, 2024

The runtime performance difference might be due to our not supporting WaveSize. We need to support it.

csyonghe commented on July 21, 2024

There is already an issue for this: #3385

csyonghe commented on July 21, 2024

Note that when COMPILER_SUPPORTS_WAVE_SIZE is defined, the shader compiles different code, and that code is likely simpler to compile and faster to run.

jkwak-work commented on July 21, 2024

I tried to check whether the runtime performance has also improved, but it is hard to tell over RDP. The numbers fluctuate a lot more, and I cannot get a reliable reading.
I will try again on Monday.

As for COMPILER_SUPPORTS_WAVE_SIZE, I am not sure it is a factor in the increased compile time or runtime, because it is not defined either with or without Slang.

csyonghe commented on July 21, 2024

My understanding is that without Slang, DXC sees HLSL source that uses WaveSize and compiles the other branch. With Slang, we no longer generate code that uses WaveSize, so the code DXC receives can be more complicated and run slower.

jkwak-work commented on July 21, 2024

The only difference in the macro defines when Slang is used is that "COMPILER_SLANG" is set to 1:

	if (Input.Environment.CompilerFlags.Contains(CFLAG_UseSlang))
	{
		AdditionalDefines.SetDefine(TEXT("COMPILER_SLANG"), 1);
	}

https://gitlab-master.nvidia.com/jkwak/UnrealEngine_ngx/-/commit/5bb9bcd16e984a26d678b895c611c569015bf67e

The rest of the macro defines are exactly the same with and without Slang.

csyonghe commented on July 21, 2024

But I also see this:
#ifdef COMPILER_SLANG
#define COMPILER_SUPPORTS_WAVE_SIZE 0
...
#endif

That may have more profound consequences.

jkwak-work commented on July 21, 2024

Oh! You are right.
I had almost forgotten about this.

#if SM6_PROFILE && !COMPILER_SLANG
        #define COMPILER_SUPPORTS_WAVE_SIZE 1
        #define WAVESIZE(N) [WaveSize(N)]
#endif

It definitely looks related.
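
The C++ preprocessor sketch below shows why the single COMPILER_SLANG=1 define is enough to switch code paths. HLSL goes through the same style of C preprocessing. SM6_PROFILE and COMPILER_SLANG are hard-coded here as stand-ins for what the shader compiler would pass in. The #ifndef fallback to 0 is an assumption modeled on the #ifdef COMPILER_SLANG block quoted earlier. Note that inside #if an undefined identifier evaluates to 0, so in a non-Slang build !COMPILER_SLANG is true even though COMPILER_SLANG is never defined.

```cpp
#include <cassert>

// Stand-ins for the defines the shader compiler passes in.
// Setting COMPILER_SLANG to 0 (or not defining it) selects the WaveSize path.
#define SM6_PROFILE 1
#define COMPILER_SLANG 1

// Same guard as in the shader source: WaveSize support is only
// advertised when the compiler is not Slang.
#if SM6_PROFILE && !COMPILER_SLANG
    #define COMPILER_SUPPORTS_WAVE_SIZE 1
#endif

// Assumed fallback: no explicit support means the non-WaveSize path.
#ifndef COMPILER_SUPPORTS_WAVE_SIZE
    #define COMPILER_SUPPORTS_WAVE_SIZE 0
#endif

// Which branch of the shader would be compiled.
constexpr bool kUsesWaveSizePath = (COMPILER_SUPPORTS_WAVE_SIZE != 0);
```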

I guess we should revisit the compile-time issue once issue #3385 is resolved.
All of the other compile times look fine.

jkwak-work commented on July 21, 2024

I have a local change for Slang that uses [WaveSize(N)].
With the change, the compile time went down, but only by a little.

TSRRejectShading.usf permutation 0: compile time increased from 6s to 8s (same as before)
TSRRejectShading.usf permutation 1: compile time increased from 15s to 33s (down from 36s)
TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (same as before)
TSRRejectShading.usf permutation 3: compile time increased from 8s to 8s (same as before)
TSRRejectShading.usf permutation 4: compile time increased from 19s to 30s (down from 32s)
TSRRejectShading.usf permutation 5: compile time increased from 2s to 2s (down from 3s)

jkwak-work commented on July 21, 2024

It appears that the slower runtime has the same cause as this.
In other words, the WaveSize=32 permutations of TSRRejectShading.usf have both longer compile times and slower runtime.

csyonghe commented on July 21, 2024

OK. We are unlikely to make further progress by speculating; we need to bisect into the functions and figure out what is causing the difference.

jkwak-work commented on July 21, 2024

I am going to close this issue and open a new issue for the specific compile permutation setting.
