“R2E2: low-latency path tracing of terabyte-scale scenes using thousands of cloud CPUs” by Fouladi, Shacklett, Poms, Arora, Ozdemir, et al. …

  • ©Sadjad Fouladi, Brennan Shacklett, Fait Poms, Arjun Arora, Alex Ozdemir, Deepti Raghavan, Patrick (Pat) Hanrahan, and Kayvon Fatahalian




    R2E2: low-latency path tracing of terabyte-scale scenes using thousands of cloud CPUs



    In this paper we explore the viability of path tracing massive scenes using a “supercomputer” constructed on-the-fly from thousands of small, serverless cloud computing nodes. We present R2E2 (Really Elastic Ray Engine), a scene-decomposition-based parallel renderer that rapidly acquires thousands of cloud CPU cores, loads scene geometry from a pre-built scene BVH into the aggregate memory of these nodes in parallel, and performs full path-traced global illumination using an inter-node messaging service designed for communicating ray data. To balance ray tracing work across many nodes, R2E2 adopts a service-oriented design that statically replicates geometry and texture data from frequently traversed scene regions onto multiple nodes based on estimates of load, and dynamically assigns ray tracing work to lightly loaded nodes holding the required data. We port pbrt’s ray-scene intersection components to the R2E2 architecture and demonstrate that scenes with up to a terabyte of geometry and texture data (where as little as 1/250th of the scene can fit on any one node) can be path traced at 4K resolution in tens of seconds using thousands of tiny serverless nodes on the AWS Lambda platform.
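The load-balancing idea in the abstract (replicate heavily traversed scene regions up front, then route ray work to a lightly loaded node that holds the needed data) can be illustrated with a small sketch. This is not R2E2's implementation: the names `plan_replicas` and `Dispatcher`, the proportional replication rule, and the "fewest queued rays" policy are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def plan_replicas(load_estimates, total_nodes):
    """Static step (hypothetical rule): every scene chunk gets one node,
    and the spare nodes are shared in proportion to estimated load."""
    num_chunks = len(load_estimates)
    if total_nodes < num_chunks:
        raise ValueError("need at least one node per scene chunk")
    total_load = sum(load_estimates.values())
    spare = total_nodes - num_chunks
    replicas = {}
    for chunk, load in load_estimates.items():
        share = load / total_load if total_load > 0 else 0.0
        # Flooring may leave a few spare nodes unused; that is fine here.
        replicas[chunk] = 1 + math.floor(spare * share)
    return replicas


class Dispatcher:
    """Dynamic step (hypothetical policy): send each batch of rays to the
    replica of the required chunk with the fewest rays already queued."""

    def __init__(self, replicas):
        self.holders = {}               # chunk -> node ids holding it
        self.queued = defaultdict(int)  # node id -> rays queued so far
        next_node = 0
        for chunk, count in replicas.items():
            self.holders[chunk] = list(range(next_node, next_node + count))
            next_node += count

    def assign(self, chunk, num_rays):
        node = min(self.holders[chunk], key=lambda n: self.queued[n])
        self.queued[node] += num_rays
        return node


if __name__ == "__main__":
    # Made-up load estimates: "island" is traversed 7x as often as "ocean".
    plan = plan_replicas({"ocean": 1.0, "island": 7.0}, total_nodes=10)
    dispatcher = Dispatcher(plan)
    for batch in (100, 50, 10):
        print("island batch of", batch, "rays -> node",
              dispatcher.assign("island", batch))
```

In this toy model each node holds exactly one chunk. A real system would also have to account for per-node memory limits and for the cost of moving ray data between nodes, which the sketch ignores.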


