Decoupling algorithms from schedules for easy optimization of image processing pipelines

Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high-performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.

References:

1. Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M., and Levoy, M. 2010. The Frankencamera: An experimental platform for computational photography. ACM Transactions on Graphics 29, 4 (July), 29:1–29:12. Google ScholarDigital Library
2. Aubry, M., Paris, S., Hasinoff, S. W., Kautz, J., and Durand, F. 2011. Fast and robust pyramid-based image processing. Tech. Rep. MIT-CSAIL-TR-2011-049, Massachusetts Institute of Technology.Google Scholar
3. Buck, I. 2007. GPU computing: Programming a massively parallel processor. In CGO ’07: Proceedings of the International Symposium on Code Generation and Optimization, IEEE Computer Society, 17. Google ScholarDigital Library
4. Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics 26, 3 (July), 103:1–103:9. Google ScholarDigital Library
5. CoreImage. Apple CoreImage programming guide. http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/CoreImaging.Google Scholar
6. Elliott, C., Finne, S., and de Moor, O. 2003. Compiling embedded languages. Journal of Functional Programming 13, 2. Updated version of paper by the same name that appeared in SAIG ’00 proceedings. Google ScholarDigital Library
7. Elliott, C. 2001. Functional image synthesis. In Proceedings of Bridges.Google Scholar
8. Fatahalian, K., Horn, D. R., Knight, T. J., Leem, L., Houston, M., Park, J. Y., Erez, M., Ren, M., Aiken, A., Dally, W. J., and Hanrahan, P. 2006. Sequoia: programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE conference on Supercomputing, ACM, SC ’06. Google ScholarDigital Library
9. Feautrier, P. 1991. Dataflow analysis of array and scalar references. International Journal of Parallel Programming 20.Google Scholar
10. Gordon, M. I., Thies, W., Karczmarek, M., Lin, J., Meli, A. S., Leger, C., Lamb, A. A., Wong, J., Hoffman, H., Maze, D. Z., and Amarasinghe, S. 2002. A stream compiler for communication-exposed architectures. In International Conference on Architectural Support for Programming Languages and Operating Systems. Google ScholarDigital Library
11. Guenter, B., and Nehab, D. 2010. The neon image processing language. Tech. Rep. MSR-TR-2010-175, Microsoft Research.Google Scholar
12. IPP. Intel Integrated Performance Primitives. http://software.intel.com/en-us/articles/intel-ipp/.Google Scholar
13. Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., and Towles, B. 2002. Stream scheduling. Concurrent VLSI Architecture Tech Report 122, Stanford University, March.Google Scholar
14. Kass, M., Witkin, A., and Terzopoulos, D. 1988. Snakes: Active contour models. International Journal of Computer Vision 1, 4.Google ScholarCross Ref
15. Levoy, M. 1994. Spreadsheets for images. In Proceedings of SIGGRAPH 94, Computer Graphics Proceedings, Annual Conference Series, 139–146. Google ScholarDigital Library
16. Li, C., Xu, C., Gui, C., and Fox, M. D. 2010. Distance regularized level set evolution and its application to image segmentation. IEEE Transactions on Image Processing 19, 12 (December), 3243–3254. Google ScholarDigital Library
17. LLVM. The LLVM compiler infrastructure. http://llvm.org.Google Scholar
18. McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Graphics Hardware 2002, 57–68. Google ScholarDigital Library
19. Newburn, C. J., So, B., Liu, Z., McCool, M., Ghuloum, A., Toit, S. D., Wang, Z. G., Du, Z. H., Chen, Y., Wu, G., Guo, P., Liu, Z., and Zhang, D. 2011. Intel’s array building blocks: A retargetable, dynamic compiler and embedded language. In Proceedings of the 2011 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, IEEE Computer Society, CGO ’11, 224–235. Google ScholarDigital Library
20. OpenCL, 2011. The OpenCL specification, version 1.2. http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.Google Scholar
21. OpenMP. OpenMP. http://openmp.org/.Google Scholar
22. Paris, S., and Durand, F. 2009. A fast approximation of the bilateral filter using a signal processing approach. International Journal of Computer Vision 81, 1, 24–52. Google ScholarDigital Library
23. Paris, S., Kornprobst, P., Tumblin, J., and Durand, F. 2009. Bilateral filtering: Theory and applications. Foundations and Trends in Computer Graphics and Vision. Google ScholarDigital Library
24. Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local Laplacian filters: Edge-aware image processing with a Laplacian pyramid. ACM Transactions on Graphics 30, 4. Google ScholarDigital Library
25. Pharr, M., and Mark, W. R. 2012. ispc: A SPMD compiler for high-performance CPU programming. In Proceedings of Innovative Parallel Computing (InPar).Google Scholar
26. PixelBender. Adobe PixelBender reference. http://www.adobe.com/content/dam/Adobe/en/devnet/pixelbender/pdfs/pixelbender_reference.pdf.Google Scholar
27. Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo, N. 2005. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation” 93, 2, 232–275.Google Scholar
28. Shantzis, M. A. 1994. A model for efficient and flexible image computing. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, ACM, SIGGRAPH ’94, 147–154. Google ScholarDigital Library

ACM Digital Library Publication:

Overview Page:

SIGGRAPH 2012: Technical Papers

“Decoupling algorithms from schedules for easy optimization of image processing pipelines” by Ragan-Kelley, Adams, Paris, Levoy, Amarasinghe, et al. …

Conference:

Type(s):

Title:

Presenter(s)/Author(s):

Abstract:

References:

ACM Digital Library Publication:

Overview Page:

Sponsored by: