“Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud” by Mashayekhi, Shah, Qu, Lim and Levis

  • ©Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, and Philip Levis





Session/Category Title: Fluids 2: Vortex Boogaloo




    Distributing a simulation across many machines can drastically speed up computations and increase detail. The computing cloud provides tremendous computing resources, but weak service guarantees force programs to manage significant system complexity: nodes, networks, and storage occasionally perform poorly or fail.

    We describe Nimbus, a system that automatically distributes grid-based and hybrid simulations across cloud computing nodes. The main simulation loop is sequential code that launches distributed computations across many cores. The simulation on each core runs as if it were stand-alone: Nimbus automatically stitches these per-core simulations into a single, larger one. To do this efficiently, Nimbus introduces a four-layer data model that translates between the contiguous, geometric objects used by simulation libraries and the replicated, fine-grained objects managed by its underlying cloud computing runtime.
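    To make the idea concrete, here is a hypothetical sketch (not the actual Nimbus API) of the execution model the abstract describes: the driver's main loop is plain sequential code, each step fans out over grid partitions that each advance as if they were stand-alone simulations, and a runtime stitches them together by exchanging ghost (boundary) values. It uses 1-D heat diffusion, echoing the open source heat-diffusion example mentioned below; all function names here are illustrative.

```python
def diffuse(u, left_ghost, right_ghost, alpha=0.1):
    """Advance one partition as a stand-alone simulation; the boundary
    values it cannot see locally are supplied as ghost values."""
    padded = [left_ghost] + u + [right_ghost]
    return [u[i] + alpha * (padded[i] - 2.0 * u[i] + padded[i + 2])
            for i in range(len(u))]

def step_distributed(parts):
    """What a runtime would do each step: exchange ghost values between
    neighboring partitions, then advance every partition independently."""
    out = []
    for i, u in enumerate(parts):
        left = parts[i - 1][-1] if i > 0 else u[0]                # reflective end
        right = parts[i + 1][0] if i < len(parts) - 1 else u[-1]  # reflective end
        out.append(diffuse(u, left, right))
    return out

# Sequential-looking main loop; each iteration fans out over partitions.
u0 = [0.0] * 16
u0[8] = 1.0
parts = [u0[:8], u0[8:]]
for _ in range(50):
    parts = step_distributed(parts)

# The stitched partitions match a single, undistributed simulation.
full = list(u0)
for _ in range(50):
    full = diffuse(full, full[0], full[-1])
stitched = parts[0] + parts[1]
assert all(abs(a - b) < 1e-12 for a, b in zip(stitched, full))
```

    The point of the sketch is the division of labor: the per-partition solver is ordinary local simulation code, while ghost exchange and scheduling live entirely in the runtime, which is what lets the driver loop stay sequential.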

    Using PhysBAM particle level set fluid simulations, we demonstrate that Nimbus can run higher-detail simulations faster, distribute simulations on up to 512 cores, and run enormous simulations (1024³ cells). Nimbus automatically manages these distributed simulations, balancing load across nodes and recovering from failures. Implementations of PhysBAM water and smoke simulations as well as an open source heat-diffusion simulation show that Nimbus is general and can support complex simulations.

    Nimbus can be downloaded from https://nimbus.stanford.edu.


