“VizGen: accelerating visual computing prototypes in dynamic languages”
Conference:
Type(s):
Title:
- VizGen: accelerating visual computing prototypes in dynamic languages
Session/Category Title: Filtering Images
Presenter(s)/Author(s):
Abstract:
This paper introduces a novel domain-specific compiler, which translates visual computing programs written in dynamic languages to highly efficient code. We define “dynamic” languages as those such as Python and MATLAB, which feature dynamic typing and flexible array operations. Such language features can be useful for rapid prototyping, however, the dynamic computation model introduces significant overheads in program execution time. We introduce a compiler framework for accelerating visual computing programs, such as graphics and vision programs, written in generalpurpose dynamic languages. Our compiler allows substantial performance gains (frequently orders of magnitude) over general compilers for dynamic languages by specializing the compiler for visual computation. Specifically, our compiler takes advantage of three key properties of visual computing programs, which permit optimizations: (1) many array data structures have small, constant, or bounded size, (2) many operations on visual data are supported in hardware or are embarrassingly parallel, and (3) humans are not sensitive to small numerical errors in visual outputs due to changing floating-point precisions. Our compiler integrates program transformations that have been described previously, and improves existing transformations to handle visual programs that perform complicated array computations. In particular, we show that dependent type analysis can be used to infer sizes and guide optimizations for many small-sized array operations that arise in visual programs. Programmers who are not experts on visual computation can use our compiler to produce more efficient Python programs than if they write manually parallelized C, with fewer lines of application logic.
References:
1. Adams, A., Talvala, E.-V., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., et al. 2010. The frankencamera: an experimental platform for computational photography. In ACM Transactions on Graphics (TOG), vol. 29, ACM, 29.
2. Akeret, J., Gamper, L., Amara, A., and Refregier, A. 2015. Hope: A python just-in-time compiler for astrophysical computations. Astronomy and Computing 10, 1–8. Cross Ref
3. Ammons, G., and Larus, J. R. 1998. Improving data-flow analysis with path profiles. In ACM PLDI.
4. Ansel, J., Kamil, S., Veeramachaneni, K., Ragan-Kelley, J., Bosboom, J., O’Reilly, U.-M., and Amarasinghe, S. 2014. Opentuner: An extensible framework for program autotuning. In Proceedings of the 23rd international conference on Parallel architectures and compilation, ACM.
5. Augustsson, L. 1998. Cayennea language with dependent types. In ACM SIGPLAN Notices, vol. 34, ACM, 239–250.
6. Ball, T., Mataga, P., and Sagiv, S. 1998. Edge profiling versus path profiling: The showdown. In POPL ’98, 134–148.
7. Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. 2009. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics 28, 3, 24:1–24:10.
8. Behnel, S., Bradshaw, R., Citro, C., Dalcin, L., Seljebotn, D. S., and Smith, K. 2011. Cython: The best of both worlds. Computing in Science & Engineering 13, 2, 31–39.
9. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. 2010. Theano: A cpu and gpu math compiler in python. In Proc. 9th Python in Science Conf.
10. Bezanson, J., Karpinski, S., Shah, V. B., and Edelman, A. 2012. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145.
11. Blume, W., Doallo, R., Rauchwerger, L., Tu, P., Eigenmann, R., Grout, J., Hoeflinger, J., Lawrence, T., Lee, J., Padua, D., et al. 1996. Parallel programming with polaris. Computer, 12, 78–82.
12. Bolz, C. F., Cuni, A., Fijalkowski, M., and Rigo, A. 2009. Tracing the meta-level: Pypy’s tracing jit compiler. In Workshop on Implementation, Compilation, Optimization.
13. Buse, R. P. L., and Weimer, W. 2009. The road not taken: Estimating path execution frequency statically. In 31st International Conference on Software Engineering, ICSE 2009.
14. Cann, D. C., and Evripidou, P. 1995. Advanced array optimizations for high performance functional languages. Parallel and Distributed Systems, IEEE Transactions on 6, 3, 229–239.
15. Chen, J., Paris, S., and Durand, F. 2007. Real-time edge-aware image processing with the bilateral grid. In ACM Transactions on Graphics (TOG), vol. 26, ACM, 103.
16. Condit, J., Harren, M., Anderson, Z., Gay, D., and Necula, G. C. 2007. Dependent types for low-level programming. In Programming Languages and Systems. Springer, 520–535.
17. Cornwall, J. L., Howes, L., Kelly, P. H., Parsonage, P., and Nicoletti, B. 2009. High-performance simt code generation in an active visual effects library. In Proceedings of the 6th ACM conference on Computing frontiers, ACM, 175–184.
18. De Moura, L., and Bjørner, N. 2008. Z3: An efficient smt solver. In Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340.
19. DeVito, Z., Hegarty, J., Aiken, A., Hanrahan, P., and Vitek, J. 2013. Terra: a multi-stage language for high-performance computing. In ACM SIGPLAN Notices, vol. 48.
20. Dufour, M., and Coughlan, J., 2013. Shedskin: an experimental (restricted-python)-to-c++ compiler. https://code.google.com/p/shedskin/wiki/docs.
21. Elliott, C. Functional image synthesis. In Proceedings Bridges 2001, Mathematical Connections in Art, Music, and Science.
22. Frigo, M., and Johnson, S. G. 1998. Fftw: An adaptive software architecture for the fft. In 1998 IEEE Acoustics, Speech and Signal Processing, vol. 3.
23. Frigo, M., and Strumpen, V. 2005. Cache oblivious stencil computations. In Proc. 19th conf. Supercomputing, ACM.
24. Gao, G., Olsen, R., Sarkar, V., and Thekkath, R. 1993. Collective loop fusion for array contraction. In Languages and Compilers for Parallel Computing. Springer, 281–295.
25. Garg, R., and Amaral, J. N. 2010. Compiling python to a hybrid execution environment. In Proceedings of the 3rd Workshop on GPGPU, ACM.
26. Graham, S. L., Kessler, P. B., and McKusick, M. K. 1982. gprof: a call graph execution profiler (with retrospective). In ACM SIGPLAN PLDI.
27. Grosser, T., Cohen, A., Holewinski, J., Sadayappan, P., and Verdoolaege, S. 2014. Hybrid hexagonal/classical tiling for gpus. In Proc. IEEE/ACM Symposium on Code Generation and Optimization.
28. Guelton, S., Brunet, P., Amini, M., Merlini, A., Corbillon, X., and Raynaud, A. 2015. Pythran: Enabling static optimization of scientific python programs. Computational Science & Discovery 8, 1, 014001. Cross Ref
29. Harman, M., and Jones, B. F. 2001. Search-based software engineering. Information and Software Technology 43, 14. Cross Ref
30. Harris, C., and Stephens, M. 1988. A combined corner and edge detector. In Alvey vision conference, vol. 15, Citeseer, 50.
31. Holewinski, J., Pouchet, L.-N., and Sadayappan, P. 2012. High-performance code generation for stencil computations on gpu architectures. In Proceedings of the 26th ACM international conference on Supercomputing, ACM, 311–320.
32. Karcher, T., and Pankratius, V. 2011. Run-time automatic performance tuning for multicore applications. In Euro-Par 2011 Parallel Processing. Springer, 3–14.
33. Kennedy, K., and McKinley, K. S. 1994. Maximizing loop parallelism and improving data locality via loop fusion and distribution. Springer.
34. Krishnamoorthy, S., Baskaran, M., Bondhugula, U., Ramanujam, J., Rountev, A., and Sadayappan, P. 2007. Effective automatic parallelization of stencil computations. In ACM Sigplan Notices, vol. 42, ACM, 235–244.
35. Lam, S. K., Pitrou, A., and Seibert, S. 2015. Numba: a llvm-based python jit compiler. In Proceedings of Workshop on LLVM Compiler Infrastructure in HPC, ACM.
36. Morajko, A., Margalef, T., and Luque, E. 2007. Design and implementation of a dynamic tuning environment. Journal of Parallel and Distributed Computing 67, 4, 474–490.
37. Mullapudi, R. T., Vasista, V., and Bondhugula, U. 2015. Polymage: Automatic optimization for image processing pipelines. In Proc. Conference on Architectural Support for Programming Languages and Operating Systems, ACM.
38. Mullapudi, R. T., Adams, A., Sharlet, D., Ragan-Kelley, J., and Fatahalian, K. 2016. Automatically scheduling halide image processing pipelines. In Proc. ACM SIGGRAPH.
39. Paris, S., Hasinoff, S. W., and Kautz, J. 2011. Local laplacian filters: edge-aware image processing with a laplacian pyramid. ACM Trans. Graph. 30, 4, 68.
40. Pharr, M., and Mark, W. R. 2012. ispc: A spmd compiler for high-performance cpu programming. In Innovative Parallel Computing (InPar), 2012, IEEE, 1–13.
41. Quora: How many photos are uploaded to facebook every day? https://www.quora.com/How-many-photos-are-uploaded-to-Facebook-each-day.
42. Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., and Amarasinghe, S. 2013. Halide: A language and compiler for optimizing parallelism, locality and recomputation in image processing pipelines. In ACM SIGPLAN Conference on Programming Language Design and Implementation.
43. Salib, M. 2004. Starkiller: A static type inferencer and compiler for Python. PhD thesis, Citeseer.
44. Shantzis, M. A. 1994. A model for efficient and flexible image computing. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, ACM, 147–154.
45. Syed-Abdul, S., Fernandez-Luque, L., Jian, W.-S., Li, Y.-C., Crain, S., Hsu, M.-H., Wang, Y.-C., Khandregzen, D., Chuluunbaatar, E., and Nguyen, P. A. 2013. Misleading health-related information promoted through video-based social media: anorexia on youtube. Journal of medical Internet research 15, 2, e30.
46. Talbot, J., DeVito, Z., and Hanrahan, P. 2012. Riposte: a trace-driven compiler and parallel vm for vector code in r. In Proceedings of the 21st international conference on Parallel architectures and compilation techniques, ACM, 43–52.
47. Tang, Y., Chowdhury, R. A., Kuszmaul, B. C., Luk, C.-K., and Leiserson, C. E. 2011. The pochoir stencil compiler. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, ACM, 117–128.
48. Whaley, R. C., Petitet, A., and Dongarra, J. J. 2001. Automated empirical optimizations of software and the atlas project. Parallel Computing 27, 1, 3–35. Cross Ref
49. Xi, H., and Pfenning, F. 1998. Eliminating array bound checking through dependent types. ACM SIGPLAN Notices 33, 5, 249–257.
50. Xi, H., and Pfenning, F. 1999. Dependent types in practical programming. In Proc. 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, ACM.


