“A sort-based deferred shading architecture for decoupled sampling” by Clarberg, Toth and Munkberg

  • ©Petrik Clarberg, Robert Toth, and Jacob Munkberg





Session/Category Title: Hardware Rendering




    Stochastic sampling in time and over the lens is essential to produce photo-realistic images, and it has the potential to revolutionize real-time graphics. In this paper, we take an architectural view of the problem and propose a novel hardware architecture for efficient shading in the context of stochastic rendering. We replace previous caching mechanisms with a sorting step to extract coherence, thereby ensuring that only non-occluded samples are shaded. The memory bandwidth is kept to a minimum by operating on tiles and using new buffer compression methods. Our architecture has several unique benefits not traditionally associated with deferred shading. First, shading is performed in primitive order, which enables late shading of vertex attributes and avoids the need to generate a G-buffer of pre-interpolated vertex attributes. Second, we support state changes, such as changes of shaders and resources, in the deferred shading pass, avoiding the need for a single über-shader. We perform an extensive architectural simulation to quantify the benefits of our algorithm on real workloads.
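The general idea of sorting shading requests instead of caching them can be illustrated with a small software sketch. This is not the authors' hardware design: the fragment layout, the field names (`pixel`, `sample`, `depth`, `prim`, `shading_point`), and the functions `resolve_visibility` and `shade_tile` are all hypothetical, chosen only to show how resolving visibility first and then sorting by primitive could let each unique shading point be shaded once, in primitive order, with occluded samples never reaching the shader.

```python
from itertools import groupby
from operator import itemgetter


def resolve_visibility(fragments):
    """Keep only the nearest fragment for each (pixel, sample) slot of a tile."""
    nearest = {}
    for frag in fragments:
        slot = (frag["pixel"], frag["sample"])
        if slot not in nearest or frag["depth"] < nearest[slot]["depth"]:
            nearest[slot] = frag
    return nearest


def shade_tile(fragments, shade):
    """Shade one tile after visibility is known (illustrative assumption).

    `shade(prim, shading_point)` is a caller-supplied stand-in for a shader.
    Returns the color per visibility slot and the number of shader calls.
    """
    visible = resolve_visibility(fragments)

    # Sort surviving samples by (primitive id, shading point). Samples that
    # map to the same shading point become adjacent, and shading requests
    # come out grouped by primitive.
    keyed = sorted(
        (frag["prim"], frag["shading_point"], slot)
        for slot, frag in visible.items()
    )

    colors = {}
    shader_calls = 0
    for (prim, point), group in groupby(keyed, key=itemgetter(0, 1)):
        color = shade(prim, point)  # evaluated once per unique shading point
        shader_calls += 1
        for _, _, slot in group:
            colors[slot] = color  # reused by every sample sharing the point
    return colors, shader_calls
```

Because the requests are grouped by primitive after sorting, a real pipeline could in principle switch shader state between groups; the sketch does not model that, nor tiling hardware or buffer compression.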


