computeshaderfun's Issues
Global scan compute shader
Radix count compute shader
Allow scan shader to run only on warp
#7 needs to scan the counts, but there are always exactly 16 counts, so there is no need to sum across thread groups. A global sync is still needed, however, so it is probably best to do this in a separate dispatch rather than stuffing it in at the end of the count shader.
A different shader program using a variant of the same compute shader is probably the way to go.
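To make the idea concrete, here is a minimal CPU sketch in Python of scanning 16 counts the way a single warp might, with every lane stepping in lockstep. The function name `warp_exclusive_scan` and the Hillis-Steele formulation are assumptions for illustration, not the shader in this repository.

```python
def warp_exclusive_scan(counts):
    """Exclusive prefix sum of a small array, simulating lockstep lanes.

    Hypothetical helper: each loop iteration models one step in which
    every lane reads its neighbour's value before any lane writes.
    """
    n = len(counts)
    if n == 0:
        return []
    values = list(counts)
    offset = 1
    while offset < n:
        previous = list(values)  # snapshot = all reads happen before writes
        for lane in range(offset, n):
            values[lane] = previous[lane] + previous[lane - offset]
        offset *= 2
    # values now holds the inclusive scan; shift right for the exclusive one.
    return [0] + values[:-1]


counts = [3, 0, 2, 1, 0, 0, 4, 1, 0, 2, 0, 0, 1, 0, 0, 2]
offsets = warp_exclusive_scan(counts)
```

With 16 counts the loop runs 4 steps and never needs data from another thread group, which is why no cross-group sum is required.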
Radix re-order compute shader
Radix sort shader program
Depends on #2, #3, #4, #5, #8, #9.
Steps needed:
- Generate histogram (#5) and scan (#6)
- Count bitwise combinations and scan (#9)
- Re-order input using the results of steps 1 and 2 to sort input by radix (#8)
- Repeat 8 times to sort the entire input
All of these should be possible to dispatch right away.
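The steps above can be sketched as a CPU reference in Python. It assumes 32-bit unsigned keys and 4-bit digits (which matches the 16 counts and 8 repetitions mentioned in these issues). All function names are hypothetical, and the sequential loops stand in for what the shaders would do in parallel.

```python
RADIX_BITS = 4                 # assumed digit width
RADIX = 1 << RADIX_BITS        # 16 possible digit values
PASSES = 32 // RADIX_BITS      # 8 passes for 32-bit keys (assumed key width)


def exclusive_scan(values):
    """Sequential exclusive prefix sum."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


def radix_pass(keys, shift):
    """One stable pass that orders keys by the digit at the given bit shift."""
    digits = [(k >> shift) & (RADIX - 1) for k in keys]

    # Step 1: rank of each key among earlier keys with the same digit
    # (what the scanned histogram provides).
    rank = [0] * len(keys)
    counts = [0] * RADIX
    for i, d in enumerate(digits):
        rank[i] = counts[d]
        counts[d] += 1

    # Step 2: scan the 16 counts to find where each digit's bucket starts.
    bucket_start = exclusive_scan(counts)

    # Step 3: scatter every key to bucket start + rank within bucket.
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        out[bucket_start[digits[i]] + rank[i]] = k
    return out


def radix_sort(keys):
    """Least-significant-digit radix sort: repeat the pass for every digit."""
    for p in range(PASSES):
        keys = radix_pass(keys, p * RADIX_BITS)
    return keys


sample = [0xDEADBEEF, 42, 7, 0, 0xFFFFFFFF, 1 << 20, 42, 123456789]
result = radix_sort(sample)
```

Each pass is stable, so sorting from the least significant digit upward leaves the whole input sorted after the last pass.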
Make scan and histogram shaders compatible
Options:
- Store histogram as AOS and make version of scan that operates on this structure.
- Store histogram as SOA and run a scan for each array using an offset per dispatch.
Option 1 has some advantages for smaller input sizes, as it can do all the scans at once. On the other hand, it increases register usage, which is bad for performance. Option 2 should work just fine for larger input sizes, and could then be made to operate on multiple inputs from the same sequence, thus not increasing register usage as much.
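A small Python sketch of the two layouts, assuming 16 digit values and one 0/1 flag per element per digit. The index formulas `i * 16 + d` (AOS) and `d * n + i` (SOA) are assumptions about how the layouts would be addressed; the function names are hypothetical.

```python
RADIX = 16  # assumed number of digit values


def exclusive_scan(values):
    """Sequential exclusive prefix sum."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


def scan_aos(digits):
    """Option 1: one scan over 16-wide elements, result at out[i * RADIX + d].

    The 16 running totals are carried together, which mirrors the extra
    register pressure mentioned above.
    """
    running = [0] * RADIX
    out = []
    for d in digits:
        out.extend(running)   # exclusive: write totals before adding element i
        running[d] += 1
    return out


def scan_soa(digits):
    """Option 2: 16 independent scans, result at out[d * n + i]."""
    n = len(digits)
    flags = [0] * (RADIX * n)
    for i, d in enumerate(digits):
        flags[d * n + i] = 1
    out = [0] * (RADIX * n)
    for d in range(RADIX):    # one "dispatch" per digit, using an offset
        offset = d * n
        out[offset:offset + n] = exclusive_scan(flags[offset:offset + n])
    return out


digits = [3, 15, 3, 0, 7, 3, 15, 0]
aos = scan_aos(digits)
soa = scan_soa(digits)
```

Both layouts hold the same numbers; only the addressing and the number of scans differ.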
Radix tree constructor
Create a compute shader for constructing a radix tree from a sorted integer buffer according to Karras2012.
Thread group scan compute shader
Modify radix sort to use separate value/key
Use unity camera matrix in basic ray tracer
There are weird issues right now, which could be resolved much more easily by just using the CameraToWorld matrix from Unity.
Distance field combination
It might make sense to combine a BVH with a distance field in some way to lessen backtracking. E.g. put cells all around the bounding volume (so maybe 4x4 cells for each face) containing the distance to either the closest direct child bounding volume, or maybe closest distance to actual geometry. This could avoid situations where a box is a really bad approximation of the geometry. Might also be that it's just too impractical to be useful.
BVH fitting
Calculate bounding boxes
BVH construction
Process multiple items per thread in scan
At least 4 items should be processed per thread, as that makes each read 128 bits, which is the stride recommended by NVIDIA, although the exact number should be found through experimentation.
In the spirit of the NVIDIA paper on GPU scan, this adds another level to the hierarchy (they also state that CUDPP does this): intra-thread. This brings each thread group to 4096 items processed, meaning that 4096^2 = 16,777,216 items can be scanned using the usual 3 dispatches (scan, scan, group add). I believe that a good method for scaling to larger numbers will be to simply increase the number of items processed by each thread, rather than applying more scans. Doubling the number of items processed will quadruple the number of items that can be scanned in 3 dispatches.
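The arithmetic and the 3-dispatch structure can be checked with a short Python sketch. The 1024 threads per group is an assumption inferred from 4096 items at 4 items per thread, and the function names are hypothetical. The scan itself is simulated sequentially on the CPU.

```python
def max_scannable(items_per_thread, threads_per_group=1024):
    """Items that fit in 3 dispatches: (items per group) squared.

    threads_per_group=1024 is an assumed group size.
    """
    items_per_group = items_per_thread * threads_per_group
    return items_per_group ** 2


def exclusive_scan(values):
    """Sequential exclusive prefix sum."""
    out, total = [], 0
    for v in values:
        out.append(total)
        total += v
    return out


def three_dispatch_scan(values, group_items):
    """Simulate scan, scan, group add with group_items items per group."""
    assert len(values) <= group_items ** 2, "needs more than 3 dispatches"
    groups = [values[i:i + group_items]
              for i in range(0, len(values), group_items)]

    # Dispatch 1: scan inside every group and record each group's total.
    scanned = [exclusive_scan(g) for g in groups]
    totals = [sum(g) for g in groups]

    # Dispatch 2: scan the group totals (they fit in a single group).
    group_offsets = exclusive_scan(totals)

    # Dispatch 3: add each group's offset to all of its items.
    return [x + offset
            for group, offset in zip(scanned, group_offsets)
            for x in group]


capacity_4 = max_scannable(4)
capacity_8 = max_scannable(8)
values = list(range(1, 38))
scanned = three_dispatch_scan(values, group_items=8)
```

The capacity is quadratic in the group's item count because the second dispatch must fit all group totals into one group.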
Collect buffers into a single buffer using "buffer views"
A buffer view should contain a buffer reference, an offset and a limit.
Using this, it should be possible to manually manage everything in 1 buffer. It might not make sense to do so, however; this must be looked into.
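A minimal sketch of such a view in Python, using a plain list as a stand-in for the GPU buffer. The class name `BufferView` and the reading of "limit" as an exclusive end index are assumptions; the issue does not say whether the limit is an end index or a length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferView:
    """A window into a shared buffer (hypothetical illustration).

    buffer: the backing storage (a list here, standing in for a GPU buffer)
    offset: index of the first element of the view
    limit:  exclusive end index of the view (assumed interpretation)
    """
    buffer: list
    offset: int
    limit: int

    def __post_init__(self):
        if not 0 <= self.offset <= self.limit <= len(self.buffer):
            raise ValueError("view does not fit inside the buffer")

    def __len__(self):
        return self.limit - self.offset

    def _index(self, i):
        # Translate a view-relative index into a buffer index.
        if not 0 <= i < len(self):
            raise IndexError("index outside of view")
        return self.offset + i

    def __getitem__(self, i):
        return self.buffer[self._index(i)]

    def __setitem__(self, i, value):
        self.buffer[self._index(i)] = value


backing = [0] * 8
keys = BufferView(backing, 0, 4)     # first half of the shared buffer
counts = BufferView(backing, 4, 8)   # second half of the shared buffer
counts[0] = 7
```

Here `counts[0] = 7` writes to `backing[4]`, so two logical buffers share one allocation without overlapping.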
Space optimization for histogram
Currently all histograms are generated at once. If n elements are processed, then the histogram has a size of n*16, which is not nice. Also, every thread writing into 16 separate parts of an array is not nice.
The thing is that the 16 histograms can be stored in the space of 1. The parts of a scanned histogram that will be read from are the parts where there was a 1 in the original histogram. Since 2 histograms can never have a 1 in the same place, we only need n entries to store the final scanned histogram.
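The claim can be checked on the CPU with a small Python sketch. It builds the 16 full scanned histograms (16 * n entries) and a compact array of n entries, then compares the entries that would actually be read. The function names are hypothetical, and the sequential loop only shows that the compact form holds the same information, not how a shader would compute it in parallel.

```python
RADIX = 16  # assumed number of digit values


def full_scanned_histograms(digits):
    """16 exclusive scans of 0/1 flags, one per digit value (16 * n entries)."""
    result = []
    for d in range(RADIX):
        running, row = 0, []
        for digit in digits:
            row.append(running)
            running += 1 if digit == d else 0
        result.append(row)
    return result


def compact_scanned_histogram(digits):
    """Only the entries that get read: element i keeps the scan value
    of its own digit's histogram (n entries)."""
    running = [0] * RADIX
    compact = []
    for digit in digits:
        compact.append(running[digit])
        running[digit] += 1
    return compact


digits = [3, 15, 3, 0, 7, 3, 15, 0, 0, 7]
full = full_scanned_histograms(digits)
compact = compact_scanned_histogram(digits)
```

Element i only ever reads `full[digits[i]][i]`, and that is exactly what `compact[i]` stores.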
Turn ray tracer into image effect
Currently implemented as an editor window. To profile more precisely we need to export as a game. Also, this will make it easier to support looking around. I imagine there being a hotkey for turning off rasterization and rendering using ray tracing only. It should also show fairly huge performance increases for BVH, kinda proving that it is not terrible.
Radix histogram generation compute shader