“Reducing shading on GPUs using quad-fragment merging” by Fatahalian, Boulos, Hegarty, Akeley, Mark, et al. …

  • ©Kayvon Fatahalian, Solomon Boulos, James Hegarty, Kurt Akeley, William R. Mark, Henry Moreton, and Patrick (Pat) Hanrahan




    Reducing shading on GPUs using quad-fragment merging



    Current GPUs perform a significant amount of redundant shading when surfaces are tessellated into small triangles. We address this inefficiency by augmenting the GPU pipeline to gather and merge rasterized fragments from adjacent triangles in a mesh. This approach has minimal impact on output image quality, is amenable to implementation in fixed-function hardware, and, when rendering pixel-sized triangles, requires only a small amount of buffering to reduce overall pipeline shading work by a factor of eight. We find that a fragment-shading pipeline with this optimization is competitive with the REYES pipeline approach of shading at micropolygon vertices and, in cases of complex occlusion, can perform up to two times less shading work.
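    The redundancy described above can be illustrated with a toy count, assuming (as the title suggests) that the GPU shades in 2x2-pixel quad fragments, so a triangle covering a single pixel still causes four fragments to be shaded. This is a minimal sketch, not the authors' implementation: the names `quad_of` and `shaded_fragments` are hypothetical, and the merge step is idealized, ignoring the adjacency tests and buffering limits a real pipeline would need.

```python
from typing import Iterable, Set, Tuple

Pixel = Tuple[int, int]


def quad_of(pixel: Pixel) -> Tuple[int, int]:
    """Return the index of the 2x2 pixel quad that contains `pixel`."""
    x, y = pixel
    return (x // 2, y // 2)


def shaded_fragments(triangles: Iterable[Set[Pixel]], merge: bool) -> int:
    """Count fragments shaded for triangles given as sets of covered pixels.

    Assumption: every quad a triangle touches is shaded in full (4 fragments).
    With merge=True, quads from different triangles that fall on the same
    screen quad are shaded once (idealized merging).
    """
    triangles = list(triangles)
    if merge:
        quads = {quad_of(p) for tri in triangles for p in tri}
        return 4 * len(quads)
    return 4 * sum(len({quad_of(p) for p in tri}) for tri in triangles)


# Toy scene: a 4x4 pixel region where each pixel is covered by its own triangle.
triangles = [{(x, y)} for y in range(4) for x in range(4)]
unmerged = shaded_fragments(triangles, merge=False)
merged = shaded_fragments(triangles, merge=True)
```

    In this toy scene the unmerged count is 64 fragments and the merged count is 16, a reduction by a factor of four. The factor of eight quoted in the abstract comes from the paper's own measurements and is not reproduced by this simplified model.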


    1. Akeley, K. 1993. RealityEngine graphics. In Proceedings of SIGGRAPH 93, ACM Press / ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM, 109–116.
    2. Apodaca, A. A., and Gritz, L. 2000. Advanced RenderMan: Creating CGI for Motion Pictures. Morgan Kaufmann.
    3. Blythe, D. 2006. The Direct3D 10 system. ACM Transactions on Graphics 25, 3 (Aug), 724–734.
    4. Cook, R., Carpenter, L., and Catmull, E. 1987. The Reyes image rendering architecture. In Computer Graphics (Proceedings of SIGGRAPH 87), ACM, vol. 21, 95–102.
    5. Deering, M., Winner, S., Schediwy, B., Duffy, C., and Hunt, N. 1988. The triangle processor and normal vector shader: a VLSI system for high performance graphics. In Computer Graphics (Proceedings of SIGGRAPH 88), ACM, vol. 22, 21–30.
    6. Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark, W. R., and Hanrahan, P. 2009. Data-parallel rasterization of micropolygons with defocus and motion blur. In HPG ’09: Proceedings of the Conference on High Performance Graphics 2009, ACM, 59–68.
    7. Fisher, M., Fatahalian, K., Boulos, S., Akeley, K., Mark, W. R., and Hanrahan, P. 2009. DiagSplit: parallel, crack-free, adaptive tessellation for micropolygon rendering. ACM Transactions on Graphics 28, 5, 1–10.
    8. Greene, N., Kass, M., and Miller, G. 1993. Hierarchical z-buffer visibility. In Proceedings of SIGGRAPH 93, ACM Press / ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM, 231–238.
    9. Kessenich, J. 2009. The OpenGL Shading Language Specification, language version 1.5.
    10. Microsoft. 2010. Windows DirectX graphics documentation. http://msdn.microsoft.com/en-us/library/ee663301 [Online; accessed 27-April-2010].
    11. Molnar, S., Eyles, J., and Poulton, J. 1992. PixelFlow: high-speed rendering using image composition. In Computer Graphics (Proceedings of SIGGRAPH 92), ACM, vol. 26, 231–240.
    12. Patney, A., and Owens, J. D. 2008. Real-time Reyes-style adaptive surface subdivision. ACM Transactions on Graphics 27, 5, 1–8.
    13. Ragan-Kelley, J., Lehtinen, J., Chen, J., Doggett, M., and Durand, F. 2010. Decoupled sampling for real-time graphics pipelines. MIT Computer Science and Artificial Intelligence Laboratory Technical Report Series, MIT-CSAIL-TR-2010-015.
    14. Wexler, D., Gritz, L., Enderton, E., and Rice, J. 2005. GPU-accelerated high-quality hidden surface removal. In HWWS ’05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, ACM, 7–14.
    15. Zhou, K., Hou, Q., Ren, Z., Gong, M., Sun, X., and Guo, B. 2009. RenderAnts: interactive Reyes rendering on GPUs. ACM Transactions on Graphics 28, 5, 1–11.

ACM Digital Library Publication: