“Dynamic ray stream traversal” by Barringer and Akenine-Moller

  • ©Rasmus Barringer and Tomas Akenine-Moller



Session Title:

    Fast Rendering


    Dynamic ray stream traversal




    While each new generation of processors gets larger caches and more compute power, external memory bandwidth capabilities increase at a much lower pace. Additionally, processors are equipped with wide vector units that require low instruction level divergence to be efficiently utilized. In order to exploit these trends for ray tracing, we present an alternative to traditional depth-first ray traversal that takes advantage of the available cache hierarchy, and provides high SIMD efficiency, while keeping memory bus traffic low. Our main contribution is an efficient algorithm for traversing large packets of rays against a bounding volume hierarchy in a way that groups coherent rays during traversal. In contrast to previous large packet traversal methods, our algorithm allows for individual traversal order for each ray, which is essential for efficient ray tracing. Ray tracing algorithms is a mature research field in computer graphics, and despite this, our new technique increases traversal performance by 36–53%, and is applicable to most ray tracers.


    1. Áfra, A. T., and Szirmay-Kalos, L. 2013. Stackless Multi-BVH Traversal for CPU, MIC and GPU Ray Tracing. Computer Graphics Forum, 33, 1, 129–140.Google ScholarDigital Library
    2. Barringer, R., and Akenine-Möller, T. 2013. Dynamic Stackless Binary Tree Traversal. Journal of Computer Graphics Techniques, 2, 1 (March), 38–49.Google Scholar
    3. Benthin, C., Wald, I., Woop, S., Ernst, M., and Mark, W. 2012. Combining Single and Packet-Ray Tracing for Arbitrary Ray Distributions on the Intel MIC Architecture. IEEE Transactions on Visualization and Computer Graphics, 18, 9, 1438–1448. Google ScholarDigital Library
    4. Boulos, S., Wald, I., and Benthin, C. 2008. Adaptive Ray Packet Reordering. In Symposium on Interactive Ray Tracing, 131–138.Google Scholar
    5. Dammertz, H., Hanika, J., and Keller, A. 2008. Shallow Bounding Volume Hierarchies for Fast SIMD Ray Tracing of Incoherent Rays. Computer Graphics Forum, 27, 4, 1225–1233. Google ScholarDigital Library
    6. Eisenacher, C., Nichols, G., Selle, A., and Burley, B. 2013. Sorted Deferred Shading for Production Path Tracing. Computer Graphics Forum, 32, 4, 125–132. Google ScholarDigital Library
    7. Ernst, M., and Greiner, G. 2008. Multi Bounding Volume Hierarchies. In IEEE Interactive Ray Tracing, 35–40.Google Scholar
    8. Garanzha, K., and Loop, C. 2010. Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing. Computer Graphics Forum, 29, 2, 289–298.Google ScholarCross Ref
    9. Gribble, C. P., and Ramani, K. 2008. Coherent Ray Tracing via Stream Filtering. In Symposium on Interactive Ray Tracing, 59–66.Google Scholar
    10. Hanrahan, P. 1986. Using Caches and Breadth-First Search to Speed Up Ray Tracing. In Graphics Interface, 56–61. Google ScholarDigital Library
    11. Hapala, M., Davidovic, T., Wald, I., Havran, V., and Slusallek, P. 2011. Efficient Stackless BVH Traversal for Ray Tracing. In 27th Spring Conference of Computer Graphics, 29–34. Google ScholarDigital Library
    12. Hennessey, J. L., and Pattersson, D. A. 2011. Computer Architecture: A Quantitative Approach, 5th ed. Morgan Kaufmann.Google Scholar
    13. Hughes, D. M., and Lim, I. S. 2009. Kd-Jump: a Path-Preserving Stackless Traversal for Faster Isosurface Raytracing on GPUs. IEEE Transactions on Visualization and Computer Graphics, 15, 6, 1555–1562. Google ScholarDigital Library
    14. Ize, T., and Hansen, C. 2011. RTSAH Traversal Order for Occlusion Rays. Computer Graphics Forum, 30, 2, 297–305.Google ScholarCross Ref
    15. Kajiya, J. T. 1986. The Rendering Equation. Computer Graphics (Proceedings of ACM SIGGRAPH 86), 20, 4, 143–150. Google ScholarDigital Library
    16. Laine, S., Karras, T., and Aila, T. 2013. Megakernels Considered Harmful: Wavefront Path Tracing on GPUs. In High Performance Graphics, 137–143. Google ScholarDigital Library
    17. Laine, S. 2010. Restart Trail for Stackless BVH Traversal. In High Performance Graphics, 107–111. Google ScholarDigital Library
    18. Möller, T., and Trumbore, B. 1997. Fast, Minimum Storage Ray-triangle Intersection. Journal of Graphics Tools, 2, 1, 21–28. Google ScholarDigital Library
    19. Mora, B. 2011. Naive Ray-tracing: A Divide-and-conquer Approach. ACM Transactions on Graphics, 30, 5, 117:1–117:12. Google ScholarDigital Library
    20. Pharr, M., and Mark, W. 2012. ispc: A SPMD Compiler for High-Performance CPU Programming. In Innovative Parallel Computing (InPar), 1–13.Google Scholar
    21. Ramani, K., Gribble, C. P., and Davis, A. 2009. Stream-Ray: A Stream Filtering Architecture for Coherent Ray Tracing. In 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 325–336. Google ScholarDigital Library
    22. Tsakok, J. A. 2009. Faster Incoherent Rays: Multi-BVH Ray Stream Tracing. In High Performance Graphics, 151–158. Google ScholarDigital Library
    23. Wald, I., Slusallek, P., Benthin, C., and Wagner, M. 2001. Interactive Rendering with Coherent Ray Tracing. Computer Graphics Forum, 20, 3, 153–164. Google ScholarDigital Library
    24. Wald, I., Gribble, C. P., Boulos, S., and Kensler, A. 2007. SIMD Ray Stream Tracing – SIMD Ray Traversal with Generalized Ray Packets and On-the-fly Re-Ordering. Tech. Rep. UUSCI-2007-012.Google Scholar
    25. Wald, I., Benthin, C., and Boulos, S. 2008. Getting Rid of Packets – Efficient SIMD Single-Ray Traversal using Multi-Branching BVHs. In IEEE Symposium on Interactive Ray Tracing, 49–57.Google Scholar
    26. Wald, I. 2007. On Fast Construction of SAH-based Bounding Volume Hierarchies. In IEEE Symposium on Interactive Ray Tracing, 33–40. Google ScholarDigital Library
    27. Whitted, T. 1980. An Improved Illumination Model for Shaded Display. Communications of the ACM, 23, 6, 343–349. Google ScholarDigital Library
    28. Woop, S., Feng, L., Wald, I., and Benthin, C. 2013. Embree Ray Tracing Kernels for CPUs and the Xeon Phi Architecture. In ACM SIGGRAPH 2013 Talks. Google ScholarDigital Library

ACM Digital Library Publication: