“Piko: a framework for authoring programmable graphics pipelines” by Patney, Tzeng, Seitz and Owens

  • ©Anjul Patney, Stanley Tzeng, Kerry A. Seitz, and John D. Owens




    Piko: a framework for authoring programmable graphics pipelines



    We present Piko, a framework for designing, optimizing, and retargeting implementations of graphics pipelines on multiple architectures. Piko programmers express a graphics pipeline by organizing the computation within each stage into spatial bins and specifying a scheduling preference for these bins. Our compiler, Pikoc, compiles this input into an optimized implementation targeted to a massively-parallel GPU or a multicore CPU. Piko manages work granularity in a programmable and flexible manner, allowing programmers to build load-balanced parallel pipeline implementations, to exploit spatial and producer-consumer locality in a pipeline implementation, and to explore tradeoffs between these considerations. We demonstrate that Piko can implement a wide range of pipelines, including rasterization, Reyes, ray tracing, rasterization/ray tracing hybrid, and deferred rendering. Piko allows us to implement efficient graphics pipelines with relative ease and to quickly explore design alternatives by modifying the spatial binning configurations and scheduling preferences for individual stages, all while delivering real-time performance that is within a factor of six of state-of-the-art rendering systems.
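    The idea of organizing a stage's work into spatial bins and then choosing a scheduling preference can be illustrated with a small sketch. This is not Piko's interface or Pikoc's output; the function names (assign_to_bins, schedule_load_balanced), the tile size, and the greedy load-balancing policy are all assumptions made for illustration, and coordinates are assumed to be non-negative screen-space values.

```python
from collections import defaultdict

def assign_to_bins(boxes, tile_size):
    """Map each primitive's screen-space bounding box (x0, y0, x1, y1)
    to every square tile (bin) it overlaps. Hypothetical helper."""
    bins = defaultdict(list)
    for prim_id, (x0, y0, x1, y1) in enumerate(boxes):
        for ty in range(int(y0) // tile_size, int(y1) // tile_size + 1):
            for tx in range(int(x0) // tile_size, int(x1) // tile_size + 1):
                bins[(tx, ty)].append(prim_id)
    return dict(bins)

def schedule_load_balanced(bins, num_cores):
    """One possible scheduling preference (an assumption): hand the largest
    remaining bin to the currently least-loaded core."""
    loads = [0] * num_cores
    plan = [[] for _ in range(num_cores)]
    # Largest bins first; ties broken by tile coordinate for determinism.
    ordered = sorted(bins.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for tile, prims in ordered:
        core = loads.index(min(loads))
        plan[core].append(tile)
        loads[core] += len(prims)
    return plan

# Three primitives binned into 16x16-pixel tiles, then spread over two cores.
boxes = [(0, 0, 10, 10), (5, 5, 20, 5), (40, 40, 45, 45)]
bins = assign_to_bins(boxes, tile_size=16)
plan = schedule_load_balanced(bins, num_cores=2)
```

    Changing tile_size trades bin count against per-bin work, and swapping the greedy policy for one that keeps neighboring tiles on the same core would favor locality over load balance. These are the kinds of tradeoffs the abstract refers to.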


