AZDO OpenGL techniques including visibility culling and LOD selection inside the GPU
For more information about AZDO, check out this video.
We implement three techniques on the GPU:
- Hardware instancing: Fast rendering of repeated geometries
  - Single draw call for all instances
    - Avoids API overhead
  - Access per-instance data
    - Use gl_InstanceID inside vertex shader
    - 3x4 transformation matrix
  - Combined with culling
    - gl_InstanceID -> visible buffer -> instance data
- Visibility culling: Discard geometries that are outside the frustum or occluded
  - Compute shader
    - Test AABB vs frustum in clip space
    - Compute screen-space 2D AABB and its min-z
    - Fetch max depth from hierarchical-z 2D texture
    - If AABB min-z < max depth, store 1 else 0
  - How to build hi-z map
    - Draw occluders and use depth with mipmapping
    - log N rendering passes, each gathering the max of 4 texels
- Level of detail: Choose discrete levels during rendering
  - Compute shader
    - Gather instances with visible == 1
    - Use distance to camera to choose LOD
    - Write to different output buffer per LOD
    - Keep track of output index with atomic counters
  - Render all LODs at once
    - glMultiDrawElementsIndirect
    - drawCmds[i].baseInstance = i
    - LOD level as a vertex attribute with divisor = number of instances + 1
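
The per-instance logic in the list above can be sketched on the CPU. This is only an illustrative C++ reference, not the project's shader code: the names `insideFrustum`, `occlusionFlag`, `selectLod`, `gatherByLod`, `buildDrawCmds`, `Instance` and `LodMesh` are hypothetical, the LOD switch distances are made-up inputs, and the hi-z fetch is replaced by a plain `hizMaxDepth` argument. Only the `DrawElementsIndirectCommand` layout is taken from OpenGL itself.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// One AABB corner in clip space (x, y, z, w), i.e. already multiplied by the
// view-projection matrix.
using Vec4 = std::array<float, 4>;

// Frustum test: reject only when all eight corners lie outside the same clip
// plane (OpenGL convention: -w <= x, y, z <= w). Conservative by design.
bool insideFrustum(const std::array<Vec4, 8>& corners) {
    for (int axis = 0; axis < 3; ++axis) {
        int below = 0, above = 0;
        for (const Vec4& c : corners) {
            if (c[axis] < -c[3]) ++below;
            if (c[axis] > c[3]) ++above;
        }
        if (below == 8 || above == 8) return false;
    }
    return true;
}

// Occlusion test: store 1 when the nearest depth of the box is in front of
// the farthest occluder depth (here passed in instead of fetched from hi-z).
uint32_t occlusionFlag(float aabbMinZ, float hizMaxDepth) {
    return aabbMinZ < hizMaxDepth ? 1u : 0u;
}

// LOD 0 is the most detailed. lodDistances holds hypothetical switch
// distances in ascending order; N thresholds give N + 1 levels.
uint32_t selectLod(float distance, const std::vector<float>& lodDistances) {
    uint32_t lod = 0;
    while (lod < lodDistances.size() && distance >= lodDistances[lod]) ++lod;
    return lod;
}

struct Instance { float distanceToCamera; uint32_t visible; };

// Gather visible instance ids into one bucket per LOD. push_back stands in
// for "increment an atomic counter, then write at that index" on the GPU.
std::vector<std::vector<uint32_t>> gatherByLod(
        const std::vector<Instance>& instances,
        const std::vector<float>& lodDistances) {
    std::vector<std::vector<uint32_t>> buckets(lodDistances.size() + 1);
    for (uint32_t id = 0; id < instances.size(); ++id) {
        if (instances[id].visible != 1) continue;
        buckets[selectLod(instances[id].distanceToCamera, lodDistances)].push_back(id);
    }
    return buckets;
}

// Memory layout consumed by glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
};

// Hypothetical description of where each LOD mesh lives in shared buffers.
struct LodMesh { uint32_t indexCount; uint32_t firstIndex; int32_t baseVertex; };

// One draw command per LOD; baseInstance = i identifies the LOD.
std::vector<DrawElementsIndirectCommand> buildDrawCmds(
        const std::vector<LodMesh>& meshes,
        const std::vector<std::vector<uint32_t>>& buckets) {
    assert(meshes.size() == buckets.size());
    std::vector<DrawElementsIndirectCommand> cmds;
    for (uint32_t i = 0; i < meshes.size(); ++i) {
        cmds.push_back({meshes[i].indexCount,
                        static_cast<uint32_t>(buckets[i].size()),
                        meshes[i].firstIndex, meshes[i].baseVertex, i});
    }
    return cmds;
}
```

The reason the divisor is set above the instance count: an instanced attribute is fetched at element `floor(instance / divisor) + baseInstance`, so with a divisor larger than any instance index the first term stays 0 and every instance of draw `i` reads element `i`, which can hold the LOD level.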
The general rendering algorithm consists of the following steps:
- Render previously visible
- Update hierarchical-z mipmaps
- Test AABBs for visibility and store in current
- Gather visible in current but not in last frame
- Render newly visible
- Gather visible in current for next frame
- Swap current with last
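
The bookkeeping behind these steps can be sketched on the CPU as follows. This is a minimal C++ illustration under stated assumptions, not the project's code: `FrameVisibility`, `FrameDraws` and `runFrame` are hypothetical names, the actual draws and the hi-z update are reduced to comments, and `testResults` stands in for the output of the AABB visibility test.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Per-instance visibility flags (1 = visible), double-buffered across frames.
struct FrameVisibility {
    std::vector<uint8_t> last;     // result of the previous frame's test
    std::vector<uint8_t> current;  // result of this frame's test
};

// Instance ids that would be submitted in each of the two render passes.
struct FrameDraws {
    std::vector<uint32_t> firstPass;   // previously visible
    std::vector<uint32_t> secondPass;  // newly visible
};

FrameDraws runFrame(FrameVisibility& v, const std::vector<uint8_t>& testResults) {
    assert(testResults.size() == v.last.size());
    FrameDraws draws;

    // Render previously visible: these instances also act as occluders.
    for (uint32_t i = 0; i < v.last.size(); ++i)
        if (v.last[i]) draws.firstPass.push_back(i);

    // Update hierarchical-z mipmaps from the depth buffer (GPU only, omitted).

    // Test AABBs for visibility and store in current.
    v.current = testResults;

    // Gather visible in current but not in last, then render newly visible.
    for (uint32_t i = 0; i < v.current.size(); ++i)
        if (v.current[i] && !v.last[i]) draws.secondPass.push_back(i);

    // The flags in current are what the next frame's first pass gathers;
    // swapping makes them the new "last".
    std::swap(v.current, v.last);
    return draws;
}
```

Note that an instance that just became hidden is still drawn once in the next first pass, because that pass relies on the previous frame's flags; it drops out one frame later.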
Here are some images of a test scene consisting solely of cylinders and corresponding performance results: