“A high-performance software graphics pipeline architecture for the GPU” by Kenzel, Kerbl, Schmalstieg and Steinberger

  • ©Michael Kenzel, Bernhard Kerbl, Dieter Schmalstieg, and Markus Steinberger



Entry Number: 140



Session/Category Title: Pipelines and Languages for the GPU




    In this paper, we present a real-time graphics pipeline implemented entirely in software on a modern GPU. Unlike previous work, our approach features a fully concurrent, multi-stage, streaming design with dynamic load balancing, capable of operating efficiently within bounded memory. We address issues such as primitive order, vertex reuse, and screen-space derivatives of dependent variables, which are essential to real-world applications but have largely been ignored by comparable work in the past. The power of a software approach lies in the ability to tailor the graphics pipeline to any given application. To explore this potential, we design and implement four novel pipeline modifications. Evaluation of the performance of our approach on more than 100 real-world scenes collected from video games shows rendering speeds within one order of magnitude of the hardware graphics pipeline as well as significant improvements over previous work, not only in terms of capabilities and performance, but also robustness.


