“Decoupling algorithms from schedules for easy optimization of image processing pipelines” by Ragan-Kelley, Adams, Paris, Levoy, Amarasinghe, et al. …

  • ©Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand




    Decoupling algorithms from schedules for easy optimization of image processing pipelines



    Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.


    1. Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The Frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1–29:12. Google ScholarDigital Library
    2. Aubry, M., Paris, S., Hasinoff, S. W., Kautz, J., and Durand, F. 2011. Fast and robust pyramid-based image processing. Tech. Rep. MIT-CSAIL-TR-2011-049, Massachusetts Institute of Technology.Google Scholar
    3. Buck, I. 2007. GPU computing: Programming a massively parallel processor. In CGO ’07: Proceedings of the International Symposium on Code Generation and Optimization, IEEE Computer Society, 17. Google ScholarDigital Library
    4. Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics 26, 3 (July), 103:1–103:9. Google ScholarDigital Library
    5. CoreImage. Apple CoreImage programming guide. http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/CoreImaging.Google Scholar
    6. Elliott, C., Finne, S., and de Moor, O. 2003. Compiling embedded languages. Journal of Functional Programming 13, 2. Updated version of paper by the same name that appeared in SAIG ’00 proceedings. Google ScholarDigital Library
    7. Elliott, C. 2001. Functional image synthesis. In Proceedings of Bridges.Google Scholar
    8. Fatahalian, K., Horn, D. R., Knight, T. J., Leem, L., Houston, M., Park, J. Y., Erez, M., Ren, M., Aiken, A., Dally, W. J., and Hanrahan, P. 2006. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, SC ’06. Google ScholarDigital Library
    9. Feautrier, P. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20.Google Scholar
    10. Gordon, M. I., Thies, W., Karczmarek, M., Lin, J., Meli, A. S., Leger, C., Lamb, A. A., Wong, J., Hoffman, H., Maze, D. Z., and Amarasinghe, S. 2002. A stream compiler for communication-exposed architectures. In International Conference on Architectural Support for Programming Languages and Operating Systems. Google ScholarDigital Library
    11. Guenter, B., and Nehab, D. 2010. The neon image processing language. Tech. Rep. MSR-TR-2010-175, Microsoft Research.Google Scholar
    12. IPP. Intel Integrated Performance Primitives. http://software.intel.com/en-us/articles/intel-ipp/.Google Scholar
    13. Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., and Towles, B. 2002. Stream scheduling. Concurrent VLSI Architecture Tech Report 122, Stanford University, March.Google Scholar
    14. Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 4.Google ScholarCross Ref
    15. Levoy, M. 1994. Spreadsheets for images. In Proceedings of SIGGRAPH 94, Computer Graphics Proceedings, Annual Conference Series, 139–146. Google ScholarDigital Library
    16. Li, C., Xu, C., Gui, C., and Fox, M. D. 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Transactions on Image Processing 19, 12 (December), 3243–3254. Google ScholarDigital Library
    17. LLVM. The LLVM compiler infrastructure. http://llvm.org.Google Scholar
    18. McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Graphics Hardware 2002, 57–68. Google ScholarDigital Library
    19. Newburn, C. J., So, B., Liu, Z., McCool, M., Ghuloum, A., Toit, S. D., Wang, Z. G., Du, Z. H., Chen, Y., Wu, G., Guo, P., Liu, Z., and Zhang, D. 2011. Intel’s array building blocks: A retargetable, dynamic compiler and embedded language. In Proceedings of the 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, CGO ’11, 224–235. Google ScholarDigital Library
    20. OpenCL, 2011. The OpenCL specification, version 1.2. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.Google Scholar
    21. OpenMP. OpenMP. http://openmp.org/.Google Scholar
    22. Paris, S., and Durand, F. 2009. A fast approximation of the bilateral filter using a signal processing approach. International Journal of Computer Vision 81, 1, 24–52. Google ScholarDigital Library
    23. Paris, S., Kornprobst, P., Tumblin, J., and Durand, F. 2009. Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision. Google ScholarDigital Library
    24. Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. ACM Transactions on Graphics 30, 4. Google ScholarDigital Library
    25. Pharr, M., and Mark, W. R. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Proceedings of Innovative Parallel Computing (InPar).Google Scholar
    26. PixelBender. Adobe PixelBender reference. http://www.adobe.com/content/dam/Adobe/en/devnet/pixelbender/pdfs/pixelbender_reference.pdf.Google Scholar
    27. Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. 2005. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation” 93, 2, 232–275.Google Scholar
    28. Shantzis, M. A. 1994. A model for efficient and flexible image computing. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, ACM, SIGGRAPH ’94, 147–154. Google ScholarDigital Library

ACM Digital Library Publication:

Overview Page: