“BSGP: bulk-synchronous GPU programming” by Hou, Zhou and Guo

  • ©Qiming Hou, Kun Zhou, and Baining Guo




    BSGP: bulk-synchronous GPU programming



    We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP’s code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.


    1. ARTIS, 2004. X3DToolKit homepage. http://artis.imag.fr/Software/X3D/.Google Scholar
    2. ATI, 2006. Researcher CTM documentation. http://ati.amd.com/companyinfo/researcher/documents.html.Google Scholar
    3. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777–786. Google ScholarDigital Library
    4. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4, 451–490. Google ScholarDigital Library
    5. Foley, T., and Sugerman, J. 2005. Kd-tree acceleration structures for a gpu raytracer. In Proceedings of Graphics Hardware, 15–22. Google ScholarDigital Library
    6. Ford, L. R., and Fulkerson, D. R. 1962. Flows in Networks. Princeton University Press.Google Scholar
    7. Fortune, S., and Wyllie, J. 1978. Parallelism in random access machines. In STOC ’78: Proceedings of the tenth annual ACM symposium on Theory of computing, 114–118. Google ScholarDigital Library
    8. Gress, A., and Zachmann, G. 2006. GPU-ABiSort: optimal parallel sorting on stream architectures. In Parallel and Distributed Processing Symposium, 45–54. Google ScholarDigital Library
    9. Harris, M., Owens, J., Sengupta, S., Zhang, Y., and Davidson, A., 2007. CUDPP homepage. http://www.gpgpu.org/developer/cudpp/.Google Scholar
    10. Hill, J. M. D., McColl, B., Stefanescu, D. C., Goudreau, M. W., Lang, K., Rao, S. B., Suel, T., Tsantilas, T., and Bisseling, R. H. 1998. Bsplib: The bsp programming library. Parallel Comput. 24, 14, 1947–1980. Google ScholarDigital Library
    11. Horn, D. R., Sugerman, J., Houston, M., and Hanrahan, P. 2007. Interactive k-d tree gpu raytracing. In I3D’07, 167–174. Google ScholarDigital Library
    12. Kale, L. V., and Krishnan, S. 1993. Charm++: a portable concurrent object oriented system based on c++. SIGPLAN Notices 28, 10, 91–108. Google ScholarDigital Library
    13. Mark, W. R., Glanville, R. S., Akeley, K., and Kilgard, M. J. 2003. Cg: a system for programming graphics hardware in a c-like language. In Proceedings of SIGGRAPH ’03, 896–907. Google ScholarDigital Library
    14. McCool, M., and Du Toit, S. 2004. Metaprogramming GPUs with Sh. AK Peters Ltd. Google ScholarDigital Library
    15. McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Proceedings of Graphics hardware, 57–68. Google ScholarDigital Library
    16. McCool, M., Du Toit, S., Popa, T., Chan, B., and Moule, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787–795. Google ScholarDigital Library
    17. Moreton, H. 2001. Watertight tessellation using forward differencing. In Proceedings of Graphics Hardware, 25–32. Google ScholarDigital Library
    18. NVIDIA, 2007. CUDA homepage. http://developer.nvidia.com/object/cuda.html.Google Scholar
    19. Olano, M., and Lastra, A. 1998. A shading language on graphics hardware: the pixelflow shading system. In Proceedings of SIGGRAPH’98, 159–168. Google ScholarDigital Library
    20. Parisi, T. 2003. Flux: lightweight, standards-based web graphics in xml. In ACM SIGGRAPH 2003 Web Graphics, 1–1. Google ScholarDigital Library
    21. Peercy, M. S., Olano, M., Airey, J., and Ungar, P. J. 2000. Interactive multi-pass programmable shading. In Proceedings of SIGGRAPH’00, 425–432. Google ScholarDigital Library
    22. Popov, S., Günther, J., Seidel, H.-P., and Slusallek, P. 2007. Stackless kd-tree traversal for high performance gpu ray tracing. In Proceedings of Eurographics’07, 415–424.Google Scholar
    23. Proudfoot, K., Mark, W. R., Tzvetkov, S., and Hanrahan, P. 2001. A real-time procedural shading system for programmable graphics hardware. In Proceedings of SIGGRAPH’01, 159–170. Google ScholarDigital Library
    24. Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for gpu computing. In Graphics Hardware, 97–106. Google ScholarDigital Library
    25. Skillicorn, D. B., Hill, J. M. D., and McColl, W. F. 1997. Questions and Answers about BSP. Scientific Programming 6, 3, 249–274.Google ScholarCross Ref
    26. Tarditi, D., Puri, S., and Oglesby, J. 2006. Accelerator: using data parallelism to program gpus for general-purpose uses. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 325–335. Google ScholarDigital Library
    27. Valiant, L. G. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103–111. Google ScholarDigital Library
    28. Wegman, M. N., and Zadeck, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2, 181–210. Google ScholarDigital Library

ACM Digital Library Publication:

Overview Page: