“PCU: the programmable culling unit” by Hasselgren and Akenine-Möller

  • © Jon Hasselgren and Tomas Akenine-Möller




    Culling techniques have always been a central part of computer graphics, but graphics hardware still lacks efficient and flexible support for culling. To improve the situation, we introduce the programmable culling unit (PCU), which is as flexible as the fragment program unit and capable of quickly culling entire blocks of fragments. Furthermore, the PCU is very easy for the developer to use, as culling programs can be automatically derived from fragment programs containing a discard instruction. Our PCU can be integrated into an existing fragment program unit with a modest hardware overhead of only about 10%. Using the PCU, we have observed shader speedups between 1.4 and 2.1 for relevant scenes.


