“Taichi: a language for high-performance computation on spatially sparse data structures” by Hu, Li, Anderson, Ragan-Kelley and Durand
Conference:
Type(s):
Title:
- Taichi: a language for high-performance computation on spatially sparse data structures
Session/Category Title: Watch Your Language
Presenter(s)/Author(s):
Moderator(s):
Abstract:
3D visual computing data are often spatially sparse. To exploit such sparsity, people have developed hierarchical sparse data structures, such as multi-level sparse voxel grids, particles, and 3D hash tables. However, developing and using these high-performance sparse data structures is challenging, due to their intrinsic complexity and overhead. We propose Taichi, a new data-oriented programming language for efficiently authoring, accessing, and maintaining such data structures. The language offers a high-level, data structure-agnostic interface for writing computation code. The user independently specifies the data structure. We provide several elementary components with different sparsity properties that can be arbitrarily composed to create a wide range of multi-level sparse data structures. This decoupling of data structures from computation makes it easy to experiment with different data structures without changing computation code, and allows users to write computation as if they are working with a dense array. Our compiler then uses the semantics of the data structure and index analysis to automatically optimize for locality, remove redundant operations for coherent accesses, maintain sparsity and memory allocations, and generate efficient parallel and vectorized instructions for CPUs and GPUs.Our approach yields competitive performance on common computational kernels such as stencil applications, neighbor lookups, and particle scattering. We demonstrate our language by implementing simulation, rendering, and vision tasks including a material point method simulation, finite element analysis, a multigrid Poisson solver for pressure projection, volumetric path tracing, and 3D convolution on sparse grids. Our computation-data structure decoupling allows us to quickly experiment with different data arrangements, and to develop high-performance data structures tailored for specific computational tasks. With 110 th as many lines of code, we achieve 4.55× higher performance on average, compared to hand-optimized reference implementations.
References:
1. Mike Acton. 2014. Data-oriented design and C++. (2014). https://www.youtube.com/watch?v=rX0ItVEVjHcGoogle Scholar
2. Aseem Agarwala. 2007. Efficient Gradient-domain Compositing Using Quadtrees. ACM Trans. Graph. (Proc. SIGGRAPH) 26, 3 (2007), 94.Google ScholarDigital Library
3. Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, Michael Kruse, Chandan Reddy, Sven Verdoolaege, Adam Betts, Alastair F Donaldson, Jeroen Ketema, et al. 2015. PENCIL: A platform-neutral compute intermediate language for accelerator programming. In Parallel Architecture and Compilation. IEEE, 138–149.Google Scholar
4. Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. 2019. Tiramisu: A polyhedral compiler for expressing fast and portable code. Code Generation and Optimization (2019), 193–205.Google Scholar
5. Dan Bailey, Ian Masters, Matt Warner, and Harry Biddle. 2013. Simulating fluids using a coupled voxel-particle data model. In ACM SIGGRAPH 2013 Talks. ACM, 15.Google ScholarDigital Library
6. Gilbert Louis Bernstein, Chinmayee Shah, Crystal Lemire, Zachary Devito, Matthew Fisher, Philip Levis, and Pat Hanrahan. 2016. Ebb: A DSL for physical simulation on CPUs and GPUs. ACM Trans. Graph. 35, 2 (2016), 21:1–21:12.Google ScholarDigital Library
7. Robert Edward Bridson. 2003. Computational Aspects of Dynamic Surfaces. Ph.D. Dissertation. Stanford University, Stanford, CA, USA. Advisor(s) Fedkiw, Ronald.Google Scholar
8. Jiawen Chen, Dennis Bautembach, and Shahram Izadi. 2013. Scalable Real-time Volumetric Surface Reconstruction. ACM Trans. Graph. (Proc. SIGGRAPH) 32, 4 (2013), 113:1–113:16.Google ScholarDigital Library
9. Kazem Cheshmi, Shoaib Kamil, Michelle Mills Strout, and Maryam Mehri Dehnavi. 2017. Sympiler: Transforming sparse matrix codes by decoupling symbolic analysis. In International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, 13:1–13:13.Google Scholar
10. Kazem Cheshmi, Shoaib Kamil, Michelle Mills Strout, and Maryam Mehri Dehnavi. 2018. ParSy: Inspection and transformation of sparse matrix computations for parallelism. In International Conference for High Performance Computing, Networking, Storage, and Analysis. 62.Google ScholarDigital Library
11. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. Format abstraction for sparse tensor algebra compilers. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 123.Google ScholarDigital Library
12. Frederica Darema, David A George, V Alan Norton, and Gregory F Pfister. 1988. A single-program-multiple-data computational model for EPEX/FORTRAN. Parallel Comput. 7, 1 (1988), 11–24.Google ScholarCross Ref
13. Zachary DeVito, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham, Alex Aiken, Karthik Duraisamy, et al. 2011. Liszt: A domain specific language for building portable mesh-based PDE solvers. In International Conference for High Performance Computing, Networking, Storage and Analysis. 9.Google ScholarDigital Library
14. Ming Gao, Xinlei Wang, Kui Wu, Andre Pradhana-Tampubolon, Eftychios Sifakis, Yuksel Cem, and Chenfanfu Jiang. 2018. GPU Optimization of Material Point Methods. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 32, 4 (2018), 102.Google Scholar
15. Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 2018. 3D semantic segmentation with submanifold sparse convolutional networks. In Computer Vision and Pattern Recognition. 9224–9232.Google Scholar
16. Rama Karl Hoetzlein. 2016. GVDB: Raytracing sparse voxel database structures on the GPU. In Proceedings of High Performance Graphics. Eurographics Association, 109–117.Google Scholar
17. Ben Houston, Michael B. Nielsen, Christopher Batty, Ola Nilsson, and Ken Museth. 2006. Hierarchical RLE level set: A compact and versatile deformable surface representation. ACM Trans. Graph. 25, 1 (2006), 151–175.Google ScholarDigital Library
18. Yuanming Hu. 2018. Taichi: An open-source computer graphics library. arXiv preprint arXiv:1804.09293 (2018).Google Scholar
19. Yuanming Hu, Yu Fang, Ziheng Ge, Ziyin Qu, Yixin Zhu, Andre Pradhana, and Chenfanfu Jiang. 2018. A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 37, 4 (2018), 150.Google ScholarDigital Library
20. Wenzel Jakob. 2010. Mitsuba renderer. (2010). http://www.mitsuba-renderer.org.Google Scholar
21. Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. 2006. Poisson surface reconstruction. In Eurographics Symposium on Geometry Processing, Vol. 7.Google Scholar
22. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The tensor algebra compiler. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 77.Google ScholarDigital Library
23. Fredrik Kjolstad, Shoaib Kamil, Jonathan Ragan-Kelley, David I. W. Levin, Shinjiro Sueda, Desai Chen, Etienne Vouga, Danny M. Kaufman, Gurtej Kanwar, Wojciech Matusik, and Saman Amarasinghe. 2016. Simit: A language for physical simulation. ACM Trans. Graph. 35, 2 (2016), 20:1–20:21.Google ScholarDigital Library
24. Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization.Google Scholar
25. Mark Lee, Brian Green, Feng Xie, and Eric Tabellion. 2017. Vectorized Production Path Tracing. In High Performance Graphics.Google Scholar
26. Roland Leißa, Sebastian Hack, and Ingo Wald. 2012. Extending a C-like Language for Portable SIMD Programming. SIGPLAN Not. 47, 8 (2012), 65–74.Google ScholarDigital Library
27. Haixiang Liu, Yuanming Hu, Bo Zhu, Wojciech Matusik, and Eftychios Sifakis. 2018. Narrow-band Topology Optimization on a Sparsely Populated Grid. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 37, 6 (2018), 251:1–251:14.Google Scholar
28. Frank Losasso, Frédéric Gibou, and Ron Fedkiw. 2004. Simulating water and smoke with an octree data structure. In ACM Trans. Graph. (Proc. SIGGRAPH), Vol. 23. ACM, 457–462.Google ScholarDigital Library
29. Aleka McAdams, Eftychios Sifakis, and Joseph Teran. 2010. A parallel multigrid Poisson solver for fluids simulation on large grids. In Symposium on Computer Animation. ACM/Eurographics Association, 65–74.Google Scholar
30. Ravi Teja Mullapudi, Vinay Vasista, and Uday Bondhugula. 2015. PolyMage: Automatic optimization for image processing pipelines. SIGARCH Comput. Archit. News 43, 1 (2015), 429–443.Google ScholarDigital Library
31. Ken Museth. 2013. VDB: High-resolution sparse volumes with dynamic topology. ACM Trans. Graph. 32, 3 (2013), 27.Google ScholarDigital Library
32. Ken Museth. 2014. Hierarchical digital differential analyzer for efficient ray-marching in OpenVDB. (2014).Google Scholar
33. Michael B Nielsen and Robert Bridson. 2016. Spatially adaptive FLIP fluid simulations in bifrost. In ACM SIGGRAPH 2016 Talks. ACM, 41.Google ScholarDigital Library
34. Michael B. Nielsen and Ken Museth. 2006. Dynamic Tubular Grid: An Efficient Data Structure and Algorithms for High Resolution Level Sets. J. Sci. Comput. 26, 3 (2006), 261–299.Google ScholarDigital Library
35. Stanley Osher and James A. Sethian. 1988. Fronts Propagating with Curvature-dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations. J. Comput. Phys. 79, 1 (1988), 12–49.Google ScholarDigital Library
36. Matt Pharr, Craig Kolb, Reid Gershbein, and Pat Hanrahan. 1997. Rendering Complex Scenes with Memory-coherent Ray Tracing. In SIGGRAPH. ACM, 101–108.Google Scholar
37. Matt Pharr and William R Mark. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Innovative Parallel Computing. 1–13.Google Scholar
38. Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2012. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. ACM Trans. Graph. (Proc. SIGGRAPH) 31, 4 (2012), 32:1–32:12.Google ScholarDigital Library
39. Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. SIGPLAN Not. 48, 6 (jun 2013), 519–530.Google ScholarDigital Library
40. Gernot Riegler, Ali Osman Ulusoy, and Andreas Geiger. 2017. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3577–3586.Google ScholarCross Ref
41. Rajsekhar Setaluri, Mridul Aanjaneya, Sean Bauer, and Eftychios Sifakis. 2014. SPGrid: A sparse paged grid structure applied to adaptive smoke simulation. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 33, 6 (2014), 205.Google ScholarDigital Library
42. Alexey Stomakhin, Craig Schroeder, Lawrence Chai, Joseph Teran, and Andrew Selle. 2013. A material point method for snow simulation. ACM Transactions on Graphics (TOG) 32, 4 (2013), 102.Google ScholarDigital Library
43. John E Stone, David Gohara, and Guochun Shi. 2010. OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering 12, 3 (2010), 66–73.Google ScholarDigital Library
44. Deborah Sulsky, Shi-Jian Zhou, and Howard L Schreyer. 1995. Application of a particlein-cell method to solid mechanics. Computer physics communications 87, 1–2 (1995), 236–252.Google Scholar
45. Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv:1802.04730 (2018).Google Scholar
46. Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. 2017. OCNN: Octree-based convolutional neural networks for 3D shape analysis. ACM Transactions on Graphics (SIGGRAPH) 36, 4 (2017).Google ScholarDigital Library
47. Peng-Shuai Wang, Chun-Yu Sun, Yang Liu, and Xin Tong. 2018. Adaptive O-CNN: A patch-based deep representation of 3D shapes. ACM Transactions on Graphics (SIGGRAPH Asia) 37, 6 (2018).Google Scholar
48. Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. SIGPLAN Not. 51, 8 (2016), 11:1–11:12.Google Scholar
49. E Woodcock, T Murphy, P Hemmings, and S Longworth. 1965. Techniques used in the GEM code for Monte Carlo neutronics calculations in reactors and other systems of complex geometry. In Applications of Computing Methods to Reactor Problems, Vol. 557.Google Scholar
50. Kui Wu, Nghia Truong, Cem Yuksel, and Rama Hoetzlein. 2018. Fast fluid simulations with sparse volumes on the GPU. In Computer Graphics Forum (Proc. Eurographics), Vol. 37. Wiley Online Library, 157–167.Google Scholar
51. Yunming Zhang, Mengjiao Yang, Riyadh Baghdadi, Shoaib Kamil, Julian Shun, and Saman Amarasinghe. 2018. GraphIt: A high-performance graph DSL. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 121.Google ScholarDigital Library
52. Yongning Zhu and Robert Bridson. 2005. Animating sand as a fluid. ACM Transactions on Graphics (TOG) 24, 3 (2005), 965–972.Google ScholarDigital Library


