The AMD FidelityFX SDK v1.1 introduced Brixelizer and Brixelizer GI. In this blog, we discuss a few practical use cases and share some tips for getting the most performance out of Brixelizer in your application.
Soft shadows
With ray-traced shadows becoming more and more prevalent, having an alternative to hardware-accelerated ray-tracing is ideal for lower-end GPUs.
In this example we have set up the Flying World scene with 1spp ray-traced soft shadows using both Brixelizer and DXR. The following is a comparison of the raw 1spp ray tracing output between DXR and Brixelizer.
left: DXR, right: Brixelizer
As you can see, the output from Brixelizer is near-identical to DXR. Inaccuracies are expected due to the lower-frequency nature of the SDF scene representation, but as the light source becomes larger, these discrepancies become less noticeable because the shadows become softer.
When it comes to tracing performance, Brixelizer is faster than DXR in this use case most of the time. The following is a performance comparison on an AMD Radeon™ RX 7900 XTX.
However, the performance gap narrows in smaller scenes such as Sponza.
left: DXR, right: Brixelizer
Brixelizer and DXR trade blows in this scene as they both perform very similarly.
General tips
Brixelizer tracing performance can be improved at the cost of quality by increasing the voxel size. This makes hard shadows look less accurate, but it can be an acceptable trade-off for soft shadows with large penumbrae.
Similar to DXR, Brixelizer performance also suffers as rays diverge, so in the case of soft shadows, be wary of the size of your light sources.
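To make the link between light size and ray divergence concrete, here is a minimal sketch of how 1spp soft-shadow ray directions could be generated. It is not taken from the Brixelizer sample: it assumes a spherical light, and the helper names (`cone_half_angle`, `sample_cone`) are hypothetical. The larger the light, the wider the cone that shadow rays are drawn from, and so the less coherent neighbouring rays become.

```python
import math
import random

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cone_half_angle(light_radius, light_distance):
    """Half-angle (radians) of the cone subtended by a spherical light (assumed light model)."""
    return math.asin(min(1.0, light_radius / light_distance))

def sample_cone(axis, half_angle, rng):
    """Sample a unit direction uniformly (in solid angle) inside a cone around the unit vector `axis`."""
    cos_max = math.cos(half_angle)
    cos_t = 1.0 - rng.random() * (1.0 - cos_max)  # uniform in (cos_max, 1]
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * rng.random()
    # Build an orthonormal basis (tangent, bitangent, axis).
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    tangent = normalize(cross(helper, axis))
    bitangent = cross(axis, tangent)
    return tuple(sin_t * math.cos(phi) * t + sin_t * math.sin(phi) * b + cos_t * a
                 for t, b, a in zip(tangent, bitangent, axis))

rng = random.Random(1)
to_light = normalize((0.0, 1.0, 0.0))
for radius in (0.1, 1.0, 3.0):
    angle = cone_half_angle(radius, 10.0)
    direction = sample_cone(to_light, angle, rng)
    print(f"light radius {radius}: cone half-angle {math.degrees(angle):.1f} deg, sample {direction}")
```

Each sampled direction would then be traced as a shadow ray, with Brixelizer or DXR. Shrinking the light radius narrows the cone and keeps the rays more coherent.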
Ambient Occlusion
Ambient Occlusion is a great way to give your scene some nice indirect shadowing without going all out with Global Illumination. Using Brixelizer you can add even more accurate AO without the limitations of screen space techniques.
Similar to soft shadows, we set up the same Flying World scene with 1spp ray-traced Ambient Occlusion using Brixelizer and DXR. We randomly sample the hemisphere around the pixel normal and fire off a visibility ray. Here is a comparison of the raw 1spp output between DXR and Brixelizer.
left: DXR, right: Brixelizer
Again, any mismatches are due to the low-frequency nature of the SDF. Since the surface represented by the SDF might be slightly above the actual geometry, you will need to adjust the ray bias to offset the ray origin along the normal in order to match the output from DXR.
Brixelizer performs faster than the DXR implementation in the Flying World scene when using relatively short rays and a voxel size of 3.0.
However, in Sponza, rays of the same length perform similarly to DXR with a voxel size of 0.2, which provides a similar level of detail at this scale.
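The ray setup described above can be sketched as follows. This is an illustration under assumptions rather than the sample's actual code: it samples the hemisphere uniformly (the weighting used in the sample is not stated here), and `trace` is a hypothetical callback standing in for whichever tracer is used, returning `True` when the ray hits something within `t_max`.

```python
import math
import random

def sample_hemisphere(normal, rng):
    """Sample a unit direction uniformly on the hemisphere around the unit vector `normal`."""
    while True:
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        length = math.sqrt(sum(c * c for c in v))
        if length > 1e-8:
            break
    d = tuple(c / length for c in v)
    # Flip directions that point below the surface.
    if sum(a * b for a, b in zip(d, normal)) < 0.0:
        d = tuple(-c for c in d)
    return d

def make_ao_ray(position, normal, ray_bias, rng):
    """Offset the origin along the normal by `ray_bias` and pick a random direction."""
    origin = tuple(p + n * ray_bias for p, n in zip(position, normal))
    return origin, sample_hemisphere(normal, rng)

def ambient_occlusion(position, normal, trace, ray_bias, ray_length, samples, rng):
    """Fraction of unoccluded rays; `trace(origin, direction, t_max)` is a hypothetical tracer."""
    visible = 0
    for _ in range(samples):
        origin, direction = make_ao_ray(position, normal, ray_bias, rng)
        if not trace(origin, direction, ray_length):
            visible += 1
    return visible / samples
```

With `samples=1` this corresponds to the raw 1spp output, which is noisy and would normally be denoised or accumulated over time.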
left: DXR, right: Brixelizer
We could also use longer rays in order to get large-scale occlusion. Again, we can match the DXR result rather closely when tracing with Brixelizer.
left: DXR, right: Brixelizer
With a distance field of this quality, things become much worse for Brixelizer with longer rays. However, we can claw back performance by adjusting the voxel size to give us a coarser representation of the scene that we can trace through much faster.
Using a voxel size of 2.0 puts Brixelizer back in the lead. As you can see below, this larger voxel size results in a much coarser SDF.
left: Voxel size = 0.2, right: Voxel size = 2.0
Despite this, the final ambient occlusion outputs are visually similar with some added ray bias to bump the ray up to the surface.
left: Voxel size = 0.2, right: Voxel size = 2.0
General tips
Longer rays will perform worse than shorter ones, but you can trade accuracy for more performance by increasing the voxel size.
Increase ray bias and T-min values to prevent self-occlusion due to the coarseness of the SDF.
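The effect of ray bias and T-min can be illustrated with a small sphere-tracing sketch against an analytic distance field. This is not Brixelizer's traversal: the scene, the `inflate` term (a crude stand-in for a coarse SDF whose surface sits slightly above the real geometry), and all function names are assumptions made for illustration.

```python
import math

def scene_sdf(p, inflate):
    """Ground plane y = 0 plus a sphere; `inflate` pushes every surface outward."""
    plane = p[1]
    sphere = math.dist(p, (0.0, 2.0, 0.0)) - 0.5
    return min(plane, sphere) - inflate

def offset_origin(position, normal, ray_bias):
    """Move the ray origin along the surface normal by `ray_bias`."""
    return tuple(p + n * ray_bias for p, n in zip(position, normal))

def sphere_trace(sdf, origin, direction, t_min, t_max, hit_eps=1e-3, max_steps=128):
    """Return True if the ray hits the SDF surface between t_min and t_max."""
    t = t_min
    for _ in range(max_steps):
        if t >= t_max:
            return False
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < hit_eps:
            return True
        t += dist  # safe step: nothing is closer than `dist`
    return False  # step budget exhausted, treated as a miss in this sketch

coarse = lambda p: scene_sdf(p, inflate=0.05)
surface_point = (0.0, 0.0, 0.0)   # a point on the real ground plane
normal = (0.0, 1.0, 0.0)
direction = (0.6, 0.8, 0.0)       # unit vector that passes beside the sphere

unbiased = sphere_trace(coarse, surface_point, direction, t_min=0.0, t_max=10.0)
biased = sphere_trace(coarse, offset_origin(surface_point, normal, 0.1), direction,
                      t_min=0.0, t_max=10.0)
with_t_min = sphere_trace(coarse, surface_point, direction, t_min=0.2, t_max=10.0)
print(unbiased, biased, with_t_min)
```

In this setup the unbiased ray starts inside the inflated surface and reports a hit straight away, which is the self-occlusion the tip refers to. Either offsetting the origin or raising T-min lets the ray escape the surface it started on. The bias needs to grow with the voxel size, since a coarser SDF deviates further from the real geometry.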
Cascade updates
In addition to tracing performance, we also need to take special care to make sure the SDF generation is as fast as it can be. There are a few factors that contribute to this and we shall go over them in more detail (no pun intended).
Level of Detail (LOD)
Mesh LOD plays an important part when it comes to the update cost of Brixelizer as that directly corresponds to the number of triangles that are required to be voxelized per-frame. While this isn’t much of an issue for static geometry, it poses a huge problem for moving or animated objects as these can severely slow down the voxelization stage of Brixelizer due to dynamic objects being voxelized every frame. This issue is amplified when updating larger cascades as you have to process even more dynamic objects within a single frame.
We have taken the above Toy Shop scene and forced a full rebuild of the SDF during each frame to exaggerate the performance cost of LODs. At LOD 0 the scene contains roughly 40 million triangles. The following graph shows the cost of a full rebuild with LOD 0 vs LOD 1.
As mentioned before, these numbers are with a forced full rebuild of the scene. However, when only rebuilding the dynamic portions of the scene we still see a performance gain, although it is not as significant as in the previous example, simply because this scene does not have many dynamic elements.
Voxel size
Another factor that contributes to update performance is the voxel size. The larger the voxel size, the coarser the resulting distance field is. This improves update performance as it can lead to fewer voxel-triangle references being generated.
The following graph shows performance results for the update time for cascade 0 with various voxel sizes in the Toy Shop scene with a full cascade rebuild every frame.
Here we can see that doubling the voxel size from 0.3 to 0.6 gives us a significant reduction in the update cost for cascade 0. However, going from 0.6 to 0.9 barely gives us any improvement. This is due to each cascade having a fixed voxel grid resolution of 64x64x64 voxels. This means that increasing the voxel size also increases the overall footprint of the cascade, as visualized below.
Therefore, we will have to process more triangles in the cascade with the larger voxel size. This could lead to lower performance in certain cases as well. So it requires careful tweaking in order to strike the best balance.
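The footprint growth is easy to quantify. The sketch below assumes the cascade's extent along each axis is simply the grid resolution multiplied by the voxel size, which follows from the fixed 64x64x64 grid described above. The function names are illustrative only.

```python
CASCADE_RESOLUTION = 64  # voxels per axis, as stated in the text

def cascade_extent(voxel_size):
    """Side length of the cascade in scene units (assumes extent = resolution * voxel size)."""
    return CASCADE_RESOLUTION * voxel_size

def cascade_volume(voxel_size):
    """Volume covered by the cascade in cubic scene units."""
    return cascade_extent(voxel_size) ** 3

base = cascade_volume(0.3)
for voxel_size in (0.3, 0.6, 0.9):
    extent = cascade_extent(voxel_size)
    ratio = cascade_volume(voxel_size) / base
    print(f"voxel size {voxel_size}: extent {extent:.1f} units, volume x{ratio:.0f}")
```

Under this assumption the three voxel sizes give cascades of 19.2, 38.4 and 57.6 units per side, so the volume covered grows by factors of 8 and 27 relative to a voxel size of 0.3. The triangle count inside the cascade depends on the scene, but a larger covered volume generally pulls in more geometry, which works against the savings from the coarser grid.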
Conclusions
AMD FidelityFX Brixelizer offers a variety of options for reaching the quality and performance balance needed for your application. So let’s recap the key takeaways from this blog:
Increase voxel size to improve tracing performance.
Divergent rays are more expensive to trace than coherent rays.
Apply a larger ray bias to compensate for the mismatch between the depth buffer and SDF surface.
Use a coarser level of detail for your mesh assets to improve SDF update performance.
Increase voxel size to an extent if update performance is still not ideal.
Have fun tracing rays with Brixelizer!
Useful links
AMD FidelityFX SDK
Learn more about AMD FidelityFX SDK v1.1 (blog post)
AMD FidelityFX SDK on GPUOpen
AMD FidelityFX Brixelizer
AMD FidelityFX Brixelizer on GPUOpen
AMD FidelityFX Brixelizer documentation
AMD FidelityFX Brixelizer GI sample (same sample for both)
AMD FidelityFX Brixelizer GI
AMD FidelityFX Brixelizer GI on GPUOpen
AMD FidelityFX Brixelizer GI documentation
Performance testing done on the following system: AMD Ryzen™ Threadripper™ PRO 3975WX, AMD Radeon RX 7900 XTX (AMD Software: Adrenalin Edition 24.2.1), 128GB DDR4-3600 memory, ASUS Pro WS WRX80E-SAGE SE WIFI Motherboard, 1TB M.2 NVME SSD, Windows® 10 Pro 22H2