Comments (15)
Yes, I don't see much of a slowdown between bloom on/off. Just to check: are you only toggling bloom? Are you leaving tonemapping and Camera::hdr the same between runs?
from bevy.
I've also noticed this with simple scenes. It should be possible to rewrite bloom to use a compute shader for down- and upscaling like SPD.
from bevy.
RenderDoc is not really a great GPU profiling tool. You'll want to use NSight/RGP/Xcode/GPA/etc (Nvidia, AMD, Apple, Intel) depending on your GPU manufacturer.
CPU metrics are also important for rendering - maybe recording so many render passes is expensive. Tracy will show you that.
I can take a look later this week and figure out why it's expensive.
from bevy.
Thanks for the tip. Will definitely check them out and even give Tracy a go next time I do some profiling.
The performance hit happens without profiling and can be seen in the first two bloom on/off pictures. Those pictures are without profiling. There I am just using bevy frame diagnostics to track the frames per second. Sorry for not clarifying that when using those pictures to explain the issue.
In this case - all the profiling is doing is confirming the >50% reduction is in fact coming from the bloom pass. The bloom pass taking up just over 50% of the render time in the profiling lines up 1:1 with the framerate reduction seen when not profiling.
Currently I'm looking into the first two downsampling and last two upsampling passes in the code to see if I can get more information or optimize anything. Please also ignore my opening/closing of the issue. Misclick!
from bevy.
I reduced bloom.wgsl to its simplest form and only noticed a negligible increase in performance with bevy frame diagnostics and the profiler (somewhere between 5-10 fps improvement?). Here is the reduced code:
struct BloomUniforms {
    threshold_precomputations: vec4<f32>,
    viewport: vec4<f32>,
    aspect: f32,
};

@group(0) @binding(0) var input_texture: texture_2d<f32>;
@group(0) @binding(1) var s: sampler;
@group(0) @binding(2) var<uniform> uniforms: BloomUniforms;

fn rgb_to_srgb_simple(color: vec3<f32>) -> vec3<f32> {
    return pow(color, vec3<f32>(1.0 / 2.2));
}

fn sample_input_4_tap(uv: vec2<f32>) -> vec3<f32> {
    let j = textureSample(input_texture, s, uv, vec2<i32>(-1, 1)).rgb;
    let k = textureSample(input_texture, s, uv, vec2<i32>(1, 1)).rgb;
    let l = textureSample(input_texture, s, uv, vec2<i32>(-1, -1)).rgb;
    let m = textureSample(input_texture, s, uv, vec2<i32>(1, -1)).rgb;
    var sample = (j + k + l + m) * 0.125;
    return sample;
}

fn sample_input_mini_tent(uv: vec2<f32>) -> vec3<f32> {
    let x = 0.004 / uniforms.aspect;
    let y = 0.004;
    let e = textureSample(input_texture, s, vec2<f32>(uv.x, uv.y)).rgb;
    let a = textureSample(input_texture, s, vec2<f32>(uv.x - x, uv.y + y)).rgb;
    let c = textureSample(input_texture, s, vec2<f32>(uv.x + x, uv.y + y)).rgb;
    let g = textureSample(input_texture, s, vec2<f32>(uv.x - x, uv.y - y)).rgb;
    var sample = (a + c + g + e) * 0.25;
    return sample;
}

@fragment
fn downsample_first(@location(0) output_uv: vec2<f32>) -> @location(0) vec4<f32> {
    //let sample_uv = output_uv;
    var sample = sample_input_4_tap(output_uv);
    return vec4<f32>(sample, 1.0);
}

@fragment
fn downsample(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    return vec4<f32>(sample_input_4_tap(uv), 1.0);
}

@fragment
fn upsample(@location(0) uv: vec2<f32>) -> @location(0) vec4<f32> {
    return vec4<f32>(sample_input_mini_tent(uv), 1.0);
}
from bevy.
After moving on to the render code, my only clue so far has been this:
When I divide the mip dimensions in bloom/mod.rs to reduce them, I get a good portion of the frames back.
Obviously this isn't a solution or anything. Just sharing what I found before throwing my hands in the air for the day.
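To illustrate why shrinking the mip dimensions could give frames back, here is a minimal standalone sketch (not Bevy's code). It assumes each mip level halves the previous one, and the starting size of 960x540 and the `min_side` cutoff are hypothetical values picked for illustration. `mip_chain` and `total_pixels` are made-up helper names.

```rust
/// Sizes of a mip chain where each level halves the previous one,
/// stopping once either side would drop below `min_side`.
fn mip_chain(width: u32, height: u32, min_side: u32) -> Vec<(u32, u32)> {
    let mut levels = Vec::new();
    let (mut w, mut h) = (width, height);
    while w >= min_side && h >= min_side {
        levels.push((w, h));
        w /= 2;
        h /= 2;
    }
    levels
}

/// Total number of pixels shaded if every level is drawn once.
fn total_pixels(levels: &[(u32, u32)]) -> u64 {
    levels.iter().map(|&(w, h)| u64::from(w) * u64::from(h)).sum()
}

fn main() {
    // Hypothetical starting size: half of a 1920x1080 view.
    let full = mip_chain(960, 540, 4);
    // The same chain with the starting dimensions divided by two.
    let reduced = mip_chain(480, 270, 4);
    for (i, (w, h)) in full.iter().enumerate() {
        println!("mip {i}: {w}x{h}");
    }
    println!(
        "full: {} px, reduced: {} px",
        total_pixels(&full),
        total_pixels(&reduced)
    );
}
```

Under these assumptions the first level dominates the pixel count, so dividing the starting dimensions by two cuts the shaded pixels to roughly a quarter. Whether frame time follows the pixel count depends on per-pass overhead, which this sketch does not model.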
from bevy.
On my system, at 1080p, it takes 355 microseconds on the CPU (according to Tracy) to encode rendering commands for bloom, and 0.20ms of GPU time (according to NSight) to execute those commands.
2024-04-27T06:53:32.154332Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "Windows 11 Home", kernel: "22631", cpu: "AMD Ryzen 5 2600 Six-Core Processor", core_count: "6", memory: "15.9 GiB" }
2024-04-27T06:53:32.710031Z INFO bevy_render::renderer: AdapterInfo { name: "NVIDIA GeForce RTX 3080", vendor: 4318, device: 8710, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "551.61", backend: Vulkan }
from bevy.
@JMS55 I'm probably a bit newer to graphics programming than some of you. Can you go into detail about what that means for you? Are you not seeing a huge hit to your framerate using that Nvidia card with bloom enabled? Here are my AMD profiler results (1200p, 16:10):
Bloom/hdr/tonemapping on (250fps):
Here is off (700fps):
If you aren't seeing the same issue perhaps it's just another AMD "feature" and can be marked as a driver bug for now?
from bevy.
Yes, I'm turning all three on, but tonemapping itself can be on or off without making a difference. I'll provide the code just to fully clarify.
250fps (hdr on w/ bloom on, tonemapping optional):
fn setup(mut commands: Commands) {
    commands.spawn((
        Camera2dBundle {
            camera: Camera {
                hdr: true,
                ..Default::default()
            },
            //tonemapping: Tonemapping::AcesFitted
            ..default()
        },
        Name::new("light_camera"),
        BloomSettings::default(),
    ));
}
700-750+fps (nothing on):
fn setup(mut commands: Commands) {
    commands.spawn((
        Camera2dBundle {
            camera: Camera {
                ..Default::default()
            },
            ..default()
        },
        Name::new("light_camera"),
    ));
}
500fps (hdr on w/ bloom off):
fn setup(mut commands: Commands) {
    commands.spawn((
        Camera2dBundle {
            camera: Camera {
                hdr: true,
                ..Default::default()
            },
            ..default()
        },
        Name::new("light_camera"),
    ));
}
And this happens in every case I've tried: different versions, fresh local Bevy deps, Bevy's own examples, etc.
from bevy.
Could you use frame times instead of frame rates?
700 fps: 1.43ms
500 fps: 2.00ms (so, 0.57ms slower)
250 fps: 4.00ms (an additional 2ms slower)
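The conversion above is just 1000 divided by the frame rate. As a minimal sketch using the frame rates reported in this thread (`frame_time_ms` is a hypothetical helper name, not a Bevy API):

```rust
/// Convert a frame rate in frames per second to a frame time in milliseconds.
fn frame_time_ms(fps: f64) -> f64 {
    1000.0 / fps
}

fn main() {
    // Frame rates reported in this thread.
    let baseline = frame_time_ms(700.0);
    let hdr_only = frame_time_ms(500.0);
    let hdr_bloom = frame_time_ms(250.0);
    println!("baseline:    {baseline:.2} ms");
    println!("hdr only:    {hdr_only:.2} ms (+{:.2} ms)", hdr_only - baseline);
    println!("hdr + bloom: {hdr_bloom:.2} ms (+{:.2} ms)", hdr_bloom - hdr_only);
}
```

Frame times add up linearly, so the cost of each feature can be read off as a difference. Frame rates cannot be compared that way.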
from bevy.
@superdump sure np i'll make sure to convert to ms whenever I can moving forward. Sorry about that.
from bevy.
It might also help to note that this occurs even without any actual bloom in the scene. Blank screens get the same hit when changing the camera settings.
from bevy.
Just to confirm what @JMS55 was saying, I tested it on an older Nvidia machine with a different display (1050 ti, 1080p display) also using Vulkan and the performance loss was a tad bit less, but for me it was still around 25-30% on average. I'd be interested in seeing benchmarks on different resolutions. For now I'm just going to take the node out, as I don't really need it in my graphics pipeline atm anyways. If there's anything else I can share that will help, just let me know.
from bevy.
I also get this 50% performance hit on bloom_2d.
SystemInfo { os: "Linux 23.10 Ubuntu", kernel: "6.5.0-28-generic", cpu: "Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz", core_count: "4", memory: "23.3 GiB" }
AdapterInfo { name: "AMD Radeon RX 580 Series (RADV POLARIS10)", vendor: 4098, device: 26591, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 23.2.1-1ubuntu3.1", backend: Vulkan }
from bevy.
It's possible this is due to the large number of separate render passes that do very little to no work.
Especially on lower mip levels the GPU does mostly nothing and is stalled by barriers (previous passes) and other overhead:
[Image: snapshot of a frame in the bloom_3d example]
It also looks like there's a clear pass for each level of the mip chain, which I feel like isn't needed.
Edit: I forgot to point out what I think is most important:
This scene is extremely simple, hence it takes only ~0.25 milliseconds of actual GPU time per frame on my machine. Bloom itself is a "fixed" cost that depends only on the resolution, so comparing the ratio of bloom on vs. off doesn't tell you much, especially in such a simple case.
Because of that, I'd expect the bloom cost to mostly vanish when there is an actual workload that draws more than just a handful of meshes. While I think there's definitely room for improvement in the bloom implementation, I don't think its performance is a major concern unless users see a GPU time for bloom that is significantly larger (more than 0.5-1ms).
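A rough sketch of that argument, with made-up numbers: the 0.5 ms bloom cost and the amounts of other work below are hypothetical, not measurements, and `fixed_cost_share` is an illustrative helper name.

```rust
/// Fraction of the total frame time spent on a fixed-cost pass.
fn fixed_cost_share(fixed_ms: f64, other_ms: f64) -> f64 {
    fixed_ms / (fixed_ms + other_ms)
}

fn main() {
    // Hypothetical fixed bloom cost; not a measured value.
    let bloom_ms = 0.5;
    // Hypothetical amounts of other GPU work per frame.
    for other_ms in [0.25, 2.0, 8.0, 16.0] {
        let share = fixed_cost_share(bloom_ms, other_ms);
        println!(
            "other work {other_ms:>5.2} ms -> bloom is {:.0}% of the frame",
            share * 100.0
        );
    }
}
```

The same fixed cost that dominates a near-empty frame becomes a small fraction once the rest of the frame does real work, which is why the on/off ratio in a trivial scene can be misleading.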
from bevy.