Comments (23)
The UE5 integration creates a slangGlobalSession every time it compiles an HLSL file.
Creating a global session is a very expensive operation; it is intended to be done once by the application, with the resulting global session shared for the application's entire lifetime. Caching it should cut a lot of compile time from the benchmark.
We still need to investigate the increased DXC time, though.
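As a sketch of the fix, using the Slang C++ API (the actual UE5 integration point is not shown here, and a real integration would need thread-safety guarantees this sketch omits):

```cpp
#include <slang.h>
#include <slang-com-ptr.h>

// Create the global session once and share it for the application's
// lifetime, instead of creating one per HLSL compile.
// Note: the lazy-init check below is not thread-safe; a real
// integration would guard it appropriately.
slang::IGlobalSession* getSharedGlobalSession()
{
    static Slang::ComPtr<slang::IGlobalSession> gGlobalSession;
    if (!gGlobalSession)
        slang::createGlobalSession(gGlobalSession.writeRef());
    return gGlobalSession.get();
}
```

Per-compile sessions can then be created from this shared global session, which is the cheap part of the setup.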
from slang.
There are 10 shaders for UE TSR, with 40 permutations in total.
The compile time for most of the shaders is unchanged, at under 1 second each.
I found that the following permutations take longer with DXC when compiling the HLSL output from Slang:
TSRRejectShading.usf permutation 0: compile time increased from 6s to 17s
TSRRejectShading.usf permutation 1: compile time increased from 15s to 123s
TSRRejectShading.usf permutation 2: compile time increased from 2s to 3s
TSRRejectShading.usf permutation 3: compile time increased from 8s to 20s
TSRRejectShading.usf permutation 4: compile time increased from 19s to 119s
TSRRejectShading.usf permutation 5: compile time increased from 2s to 4s
Note that all of them are from the same shader, TSRRejectShading.usf, which is the one that uses the "enum" workaround.
Since we now have a fix for the "AnyValue16" issue, I am going to replace the "enum" workaround with the existential type approach.
Because the "enum" workaround wasn't efficient, I expect the new approach to improve compile time.
Unfortunately, Slang generates a lot of boilerplate with the existential type approach too, so we are likely still going to suffer from the compile time issue.
This is because, with the way the code is written, we fall into the path that generates dynamic dispatch logic from the existential type code. DXC will eventually find out that all the RTTI machinery we generate is actually statically resolvable after inlining everything, but that still results in a lot of time spent in the optimization passes.
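The difference can be illustrated with a minimal sketch (hypothetical interface and function names, not the actual TSR code):

```slang
interface IAccum
{
    float accumulate(float state, float v);
}

// Existential-typed parameter: Slang emits dynamic-dispatch/RTTI code,
// which DXC must inline and fold away during optimization.
float runDynamic(IAccum op, float state, float v)
{
    return op.accumulate(state, v);
}

// Generic parameter: each concrete Op is specialized statically,
// so no dispatch machinery is generated in the first place.
float runStatic<Op : IAccum>(Op op, float state, float v)
{
    return op.accumulate(state, v);
}
```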
We can try something like:

interface IOperation
{
    static T init<T>(inout T state);
    static T add<T>(inout T state, T val);
    static T finalize<T>(T state);
}
If we stay away from generic interfaces, then we can just use ordinary interfaces with ordinary generic functions, and things should be easier on the compiler.
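A simplified sketch of that shape (hypothetical names, specialized to float for brevity): a non-generic interface with static-method requirements, implemented by ordinary structs and consumed by ordinary generic functions, so every call resolves statically:

```slang
interface IReduceOp
{
    static float combine(float a, float b);
}

struct AddOp : IReduceOp
{
    static float combine(float a, float b) { return a + b; }
}

// Genericity lives only in the free function; the compiler specializes
// it for each concrete Op, with no dynamic dispatch involved.
float reduce<Op : IReduceOp>(float vals[4])
{
    float acc = vals[0];
    [unroll]
    for (int i = 1; i < 4; ++i)
        acc = Op.combine(acc, vals[i]);
    return acc;
}

// Usage: reduce<AddOp>(vals) resolves Op.combine at compile time.
```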
We can try something like:

interface IOperation
{
    static T init<T>(inout T state);
    static T add<T>(inout T state, T val);
    static T finalize<T>(T state);
}
You read my mind. That is exactly what I was going to try next. I will share an update after making the changes and trying them out.
I have made some improvements to my enum WAR in TSRRejectShading.usf.
https://gitlab-master.nvidia.com/jkwak/UnrealEngine_ngx/-/commit/9e8f1025a4d2123ffd0d78e8b2bf0bed6faef146
The improvement removes some unnecessary dynamic branching logic.
DXC still takes longer to compile the shaders, but not as much as before.
TSRRejectShading.usf permutation 0: compile time increased from 6s to 9s (down from 17s)
TSRRejectShading.usf permutation 1: compile time increased from 15s to 48s (down from 123s)
TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (down from 3s)
TSRRejectShading.usf permutation 3: compile time increased from 8s to 11s (down from 20s)
TSRRejectShading.usf permutation 4: compile time increased from 19s to 55s (down from 119s)
TSRRejectShading.usf permutation 5: compile time increased from 2s to 3s (down from 4s)
With the improved WAR, the runtime is also improved.
The previous observation was that the runtime of TSR went up by 30% when Slang generated the HLSL.
With the improved WAR, the runtime is still slightly slower, but only by about 4%:
Without Slang, TSR takes 0.79ms on my current setup.
With Slang, TSR takes 0.82ms on the same setup.
Thanks to Yong, I made some changes based on his suggestions, and the compile time went down a little more.
https://gitlab-master.nvidia.com/jkwak/UnrealEngine_ngx/-/commit/ec3aab179c60c41918457a016bd3339bb9529d69
It was only a few lines of change, but the improvement was larger than expected.
TSRRejectShading.usf permutation 0: compile time increased from 6s to 8s (down from 9s)
TSRRejectShading.usf permutation 1: compile time increased from 15s to 36s (down from 48s)
TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (same as 2s)
TSRRejectShading.usf permutation 3: compile time increased from 8s to 8s (down from 11s)
TSRRejectShading.usf permutation 4: compile time increased from 19s to 32s (down from 55s)
TSRRejectShading.usf permutation 5: compile time increased from 2s to 3s (same as 3s)
It really comes down to permutations 1 and 4.
The differences between the permutations are as follows:
0 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=64
1 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=32
2 : DIM_FLICKERING_DETECTION=0 DIM_WAVE_SIZE=0
3 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=64
4 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=32
5 : DIM_FLICKERING_DETECTION=1 DIM_WAVE_SIZE=0
Comparing permutations 0 and 1, I don't see anything obviously different other than that permutation 1 will unroll a little more.
The runtime perf difference might be due to us not supporting WaveSize. We need to support it.
There is already an issue here: #3385
Note that when COMPILER_SUPPORTS_WAVE_SIZE is defined, different code is compiled, and that code is likely simpler to compile and faster to run.
I tried to see if the runtime performance is also improved, but it is hard to tell over RDP: the numbers fluctuate a lot more and I cannot get a reliable reading.
I will try again on Monday.
As for COMPILER_SUPPORTS_WAVE_SIZE, I am not sure it is a factor in the increased compile time/runtime, because it is not defined either with or without Slang.
My understanding is that when not using Slang, DXC sees the HLSL source with WaveSize and compiles that branch. When using Slang, we no longer generate code that uses WaveSize; the fallback code can be more complicated and runs slower.
The only difference in the macro defines when Slang is used is that "COMPILER_SLANG" is set to 1:

if (Input.Environment.CompilerFlags.Contains(CFLAG_UseSlang))
{
    AdditionalDefines.SetDefine(TEXT("COMPILER_SLANG"), 1);
}

The rest of the macro defines are exactly the same with and without Slang.
But I also see that there is a

#ifdef COMPILER_SLANG
#define COMPILER_SUPPORTS_WAVE_SIZE 0
...
#endif

and that may have more profound consequences.
Oh! You are right. I almost forgot about this:

#if SM6_PROFILE && !COMPILER_SLANG
#define COMPILER_SUPPORTS_WAVE_SIZE 1
#define WAVESIZE(N) [WaveSize(N)]
#endif
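For reference, a hypothetical compute entry point using these macros might look like this (thread counts are made up):

```hlsl
// With COMPILER_SUPPORTS_WAVE_SIZE == 1, WAVESIZE(32) expands to
// [WaveSize(32)], pinning the wave size so DXC does not have to emit
// wave-size-agnostic code. Under Slang it presumably expands to nothing.
WAVESIZE(32)
[numthreads(64, 1, 1)]
void MainCS(uint3 DispatchThreadId : SV_DispatchThreadID)
{
    // ... shader body ...
}
```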
It definitely looks related.
I guess when issue #3385 is resolved, we should revisit the compile-time issue.
I think all of the other compile times look fine.
I have a local change for Slang that emits [WaveSize(N)].
With the change, the compile time went down, but only by a little.
TSRRejectShading.usf permutation 0: compile time increased from 6s to 8s (same as before)
TSRRejectShading.usf permutation 1: compile time increased from 15s to 33s (down from 36s)
TSRRejectShading.usf permutation 2: compile time increased from 2s to 2s (same as before)
TSRRejectShading.usf permutation 3: compile time increased from 8s to 8s (same as before)
TSRRejectShading.usf permutation 4: compile time increased from 19s to 30s (down from 32s)
TSRRejectShading.usf permutation 5: compile time increased from 2s to 2s (down from 3s)
It appears that the cause of the slower runtime is the same.
In other words, TSRRejectShading.usf with WaveSize=32 increases both compile time and runtime.
OK. We are unlikely to make any further speculative progress without bisecting into the functions and figuring out what is causing the difference.
I am going to close this issue and open a new issue for the specific permutation setting.