Hi, I have a question about Adreno 540 texkill/discard performance. I am trying to decide whether to modify shaders that use texkill to use alpha blending instead. My traditional understanding of texkill performance is that it is OK as long as depth write is disabled, and potentially better than blending if the texkilled areas are large enough that entire thread groups can terminate early and skip raster operations.
The Adreno SDK has this to say on the subject, which sounds like it is in keeping with my understanding:
9.1.14 Avoid discarding pixels in the fragment shader
Some developers believe that manually discarding, also known as killing, pixels in the fragment shader boosts performance. The rules are not that simple for two reasons:
If some pixels in a thread are killed, and others are not, the shader still executes
It depends on how the shader compiler generates microcode
In theory, if all pixels in a thread are killed, the GPU will stop processing that thread as soon as possible. In practice, discard operations can disable hardware optimizations. If a shader cannot avoid discard operations, attempt to render geometry, which depends on them after opaque draw calls.
--
However, I have also heard from other sources that texkill on the Adreno 540 forces the GPU to flush and block, and that the geometry has to be rebinned when texkill is used, which sounds bad. The documentation doesn't mention this explicitly, but perhaps it is what it means when it says discard can disable hardware optimizations. I would have expected that statement to mean something more along the lines of disabling a feature like Early-Z when depth write is enabled.
I am familiar with ImgTec's publicly available PowerVR documentation, which strongly recommends preferring blending to texkill because texkill breaks their hidden surface removal algorithms. I assume that this is only true for PowerVR, but you don't give as much detail about how your hardware works, which makes me nervous, even though it also reinforces my assumption.
Can you be more explicit about what optimizations texkill can disable, and whether disabling them depends on whether depth write or other features are enabled?
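To make the comparison concrete, here is a minimal sketch of the two variants I am weighing, written as GLSL ES 1.00 fragment shaders held in C++ strings. The names kDiscardFrag and kBlendFrag, the u_tex and v_uv identifiers, and the 0.5 alpha cutoff are illustrative assumptions, not my actual shaders.

```cpp
#include <string>

// Variant A: alpha test via texkill. Fragments below the cutoff are killed.
// Assumed render state: depth write disabled, blending disabled.
const std::string kDiscardFrag = R"glsl(
precision mediump float;
uniform sampler2D u_tex;   // hypothetical sampler name
varying vec2 v_uv;         // hypothetical varying name
void main() {
    vec4 c = texture2D(u_tex, v_uv);
    if (c.a < 0.5) {       // illustrative cutoff
        discard;
    }
    gl_FragColor = c;
}
)glsl";

// Variant B: no texkill; transparent texels are handled by the blender.
// Assumed render state on the API side:
//   glEnable(GL_BLEND);
//   glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
//   glDepthMask(GL_FALSE);
const std::string kBlendFrag = R"glsl(
precision mediump float;
uniform sampler2D u_tex;
varying vec2 v_uv;
void main() {
    gl_FragColor = texture2D(u_tex, v_uv);
}
)glsl";
```

Variant A can in principle skip raster operations for fully killed thread groups, while Variant B always runs the blender but never issues a kill, so the question is which of these costs is larger on this hardware.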