“BSGP: bulk-synchronous GPU programming” by Hou, Zhou and Guo
Conference:
Type(s):
Title:
- BSGP: bulk-synchronous GPU programming
Presenter(s)/Author(s):
Abstract:
We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP’s code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.
References:
1. ARTIS, 2004. X3DToolKit homepage. http://artis.imag.fr/Software/X3D/.Google Scholar
2. ATI, 2006. Researcher CTM documentation. http://ati.amd.com/companyinfo/researcher/documents.html.Google Scholar
3. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777–786. Google ScholarDigital Library
4. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4, 451–490. Google ScholarDigital Library
5. Foley, T., and Sugerman, J. 2005. Kd-tree acceleration structures for a gpu raytracer. In Proceedings of Graphics Hardware, 15–22. Google ScholarDigital Library
6. Ford, L. R., and Fulkerson, D. R. 1962. Flows in Networks. Princeton University Press.Google Scholar
7. Fortune, S., and Wyllie, J. 1978. Parallelism in random access machines. In STOC ’78: Proceedings of the tenth annual ACM symposium on Theory of computing, 114–118. Google ScholarDigital Library
8. Gress, A., and Zachmann, G. 2006. GPU-ABiSort: optimal parallel sorting on stream architectures. In Parallel and Distributed Processing Symposium, 45–54. Google ScholarDigital Library
9. Harris, M., Owens, J., Sengupta, S., Zhang, Y., and Davidson, A., 2007. CUDPP homepage. http://www.gpgpu.org/developer/cudpp/.Google Scholar
10. Hill, J. M. D., McColl, B., Stefanescu, D. C., Goudreau, M. W., Lang, K., Rao, S. B., Suel, T., Tsantilas, T., and Bisseling, R. H. 1998. Bsplib: The bsp programming library. Parallel Comput. 24, 14, 1947–1980. Google ScholarDigital Library
11. Horn, D. R., Sugerman, J., Houston, M., and Hanrahan, P. 2007. Interactive k-d tree gpu raytracing. In I3D’07, 167–174. Google ScholarDigital Library
12. Kale, L. V., and Krishnan, S. 1993. Charm++: a portable concurrent object oriented system based on c++. SIGPLAN Notices 28, 10, 91–108. Google ScholarDigital Library
13. Mark, W. R., Glanville, R. S., Akeley, K., and Kilgard, M. J. 2003. Cg: a system for programming graphics hardware in a c-like language. In Proceedings of SIGGRAPH ’03, 896–907. Google ScholarDigital Library
14. McCool, M., and Du Toit, S. 2004. Metaprogramming GPUs with Sh. AK Peters Ltd. Google ScholarDigital Library
15. McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Proceedings of Graphics hardware, 57–68. Google ScholarDigital Library
16. McCool, M., Du Toit, S., Popa, T., Chan, B., and Moule, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787–795. Google ScholarDigital Library
17. Moreton, H. 2001. Watertight tessellation using forward differencing. In Proceedings of Graphics Hardware, 25–32. Google ScholarDigital Library
18. NVIDIA, 2007. CUDA homepage. http://developer.nvidia.com/object/cuda.html.Google Scholar
19. Olano, M., and Lastra, A. 1998. A shading language on graphics hardware: the pixelflow shading system. In Proceedings of SIGGRAPH’98, 159–168. Google ScholarDigital Library
20. Parisi, T. 2003. Flux: lightweight, standards-based web graphics in xml. In ACM SIGGRAPH 2003 Web Graphics, 1–1. Google ScholarDigital Library
21. Peercy, M. S., Olano, M., Airey, J., and Ungar, P. J. 2000. Interactive multi-pass programmable shading. In Proceedings of SIGGRAPH’00, 425–432. Google ScholarDigital Library
22. Popov, S., Günther, J., Seidel, H.-P., and Slusallek, P. 2007. Stackless kd-tree traversal for high performance gpu ray tracing. In Proceedings of Eurographics’07, 415–424.Google Scholar
23. Proudfoot, K., Mark, W. R., Tzvetkov, S., and Hanrahan, P. 2001. A real-time procedural shading system for programmable graphics hardware. In Proceedings of SIGGRAPH’01, 159–170. Google ScholarDigital Library
24. Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for gpu computing. In Graphics Hardware, 97–106. Google ScholarDigital Library
25. Skillicorn, D. B., Hill, J. M. D., and McColl, W. F. 1997. Questions and Answers about BSP. Scientific Programming 6, 3, 249–274.Google ScholarCross Ref
26. Tarditi, D., Puri, S., and Oglesby, J. 2006. Accelerator: using data parallelism to program gpus for general-purpose uses. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 325–335. Google ScholarDigital Library
27. Valiant, L. G. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103–111. Google ScholarDigital Library
28. Wegman, M. N., and Zadeck, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2, 181–210. Google ScholarDigital Library