“Unstructured video-based rendering: interactive exploration of casually captured videos” by Ballan, Brostow, Puwein and Pollefeys

  • ©Luca Ballan, Gabriel J. Brostow, Jens Puwein, and Marc Pollefeys




    Unstructured video-based rendering: interactive exploration of casually captured videos



    We present an algorithm designed for navigating around a performance that was filmed as a “casual” multi-view video collection: real-world footage captured on hand held cameras by a few audience members. The objective is to easily navigate in 3D, generating a video-based rendering (VBR) of a performance filmed with widely separated cameras. Casually filmed events are especially challenging because they yield footage with complicated backgrounds and camera motion. Such challenging conditions preclude the use of most algorithms that depend on correlation-based stereo or 3D shape-from-silhouettes.Our algorithm builds on the concepts developed for the exploration of photo-collections of empty scenes. Interactive performer-specific view-interpolation is now possible through innovations in interactive rendering and offline-matting relating to i) modeling the foreground subject as video-sprites on billboards, ii) modeling the background geometry with adaptive view-dependent textures, and iii) view interpolation that follows a performer. The billboards are embedded in a simple but realistic reconstruction of the environment. The reconstructed environment provides very effective visual cues for spatial navigation as the user transitions between viewpoints. The prototype is tested on footage from several challenging events, and demonstrates the editorial utility of the whole system and the particular value of our new inter-billboard optimization.


    1. Arulampalam, M. S., Maskell, S., and Gordon, N. 2002. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Processing 50, 174–188. Google ScholarDigital Library
    2. Bai, X., Wang, J., Simons, D., and Sapiro, G. 2009. Video snapcut: robust video object cutout using localized classifiers. ACM Trans. Graph. 28, 3. Google ScholarDigital Library
    3. Ballan, L., and Cortelazzo, G. M. 2008. Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes. In 3DPVT.Google Scholar
    4. Boykov, Y., and Kolmogorov, V. 2004. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26, 9, 1124–1137. Google ScholarDigital Library
    5. Buehler, C., Bosse, M., McMillan, L., Gortler, S. J., and Cohen, M. F. 2001. Unstructured lumigraph rendering. In SIGGRAPH, 425–432. Google ScholarDigital Library
    6. Campbell, N. D., Vogiatzis, G., Hernández, C., and Cipolla, R. 2007. Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In 18th British Machine Vision Conference, vol. 1, 530–539.Google Scholar
    7. Carranza, J., Theobalt, C., Magnor, M. A., and peter Seidel, H. 2003. Free-viewpoint video of human actors. In ACM Transactions on Graphics, 569–577. Google ScholarDigital Library
    8. Chen, S. E., and Williams, L. 1993. View interpolation for image synthesis. In SIGGRAPH ’93: Proceedings of the 20th annual conference on Computer graphics and interactive techniques, 279–288. Google ScholarDigital Library
    9. Chuang, Y.-Y., Curless, B., Salesin, D. H., and Szeliski, R. 2001. A bayesian approach to digital matting. In Proceedings of IEEE CVPR 2001, vol. 2, 264–271.Google Scholar
    10. Chuang, Y.-Y., Agarwala, A., Curless, B., Salesin, D. H., and Szeliski, R. 2002. Video matting of complex scenes. ACM Transactions on Graphics 21, 3 (July), 243–248. Google ScholarDigital Library
    11. de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H. P., and Thrun, S. 2008. Performance capture from sparse multi-view video. ACM Trans. Graph. 27, 3, 1–10. Google ScholarDigital Library
    12. Debevec, P. E., Taylor, C. J., and Malik, J. 1996. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, 11–20. Google ScholarDigital Library
    13. Debevec, P., Borshukov, G., and Yu, Y. 1998. Efficient view-dependent image-based rendering with projective texture-mapping. In 9th Eurographics Workshop on Rendering.Google Scholar
    14. Dragicevic, P., Ramos, G., Bibliowitcz, J., Nowrouzezahrai, D., Balakrishnan, R., and Singh, K. 2008. Video browsing by direct manipulation. In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, 237–246. Google ScholarDigital Library
    15. Eisemann, M., Decker, B. D., Magnor, M., Bekaert, P., de Aguiar, E., Ahmed, N., Theobalt, C., and Sellent, A. 2008. Floating Textures. Computer Graphics Forum (Proc. Eurographics EG’08) 27, 2 (4), 409–418.Google Scholar
    16. Franco, J.-S., and Boyer, E. 2005. Fusion of multi-view silhouette cues using a space occupancy grid. In ICCV, 1747–1753. Google ScholarDigital Library
    17. Goesele, M., Snavely, N., Curless, B., Hoppe, H., and Seitz, S. M. 2007. Multi-view stereo for community photo collections. In ICCV, 1–8.Google Scholar
    18. Goldman, D. B., Gonterman, C., Curless, B., Salesin, D., and Seitz, S. M. 2008. Video object annotation, navigation, and composition. In UIST ’08: Proceedings of the 21st annual ACM symposium on User interface software and technology, 3–12. Google ScholarDigital Library
    19. Goldman, D. B. 2007. A framework for video annotation, visualization, and interaction. PhD thesis. Google ScholarDigital Library
    20. Gortler, S. J., Grzeszczuk, R., Szeliski, R., and Cohen, M. F. 1996. The lumigraph. In SIGGRAPH, 43–54. Google ScholarDigital Library
    21. Grundland, M., Vohra, R., Williams, G. P., and Dodgson, N. A. 2006. Cross dissolve without cross fade: Preserving contrast, color and salience in image compositing. In Proceedings of EUROGRAPHICS, Computer Graphics Forum, 577–586.Google Scholar
    22. Guillemaut, J.-Y., Hilton, A., Starck, J., Kilner, J., and Grau, O. 2007. A bayesian framework for simultaneous matting and 3d reconstruction. In 3DIM ’07: Proceedings of the Sixth International Conference on 3-D Digital Imaging and Modeling, 167–176. Google ScholarDigital Library
    23. Guillemaut, J.-Y., Kilner, J., and Hilton, A. 2009. Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes. In Proc. International Conference on Computer Vision (ICCV 2009).Google Scholar
    24. Hartley, R. I., and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049. Google ScholarDigital Library
    25. Hasler, N., Rosenhahn, B., Thormählen, T., Wand, M., Gall, J., and Seidel, H.-P. 2009. Markerless motion capture with unsynchronized moving cameras. In CVPR, 224–231.Google Scholar
    26. Hayashi, K., and Saito, H. 2006. Synthesizing free-viewpoint images from multiple view videos in soccer stadium. In CGIV ’06: Proceedings of the International Conference on Computer Graphics, Imaging and Visualisation, 220–225. Google ScholarDigital Library
    27. Hays, J., and Efros, A. A. 2007. Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007) 26, 3. Google ScholarDigital Library
    28. Heigl, B., Koch, R., Pollefeys, M., Denzler, J., and Van Gool, L. 1999. Plenoptic modeling and rendering from image sequences taken by hand-held camera. In Patter Recognition 1999, 21. DAGM-Symposium, 94–101. Google ScholarDigital Library
    29. Kanade, T., 2001. Carnegie mellon goes to the superbowl. http://www.ri.cmu.edu/events/sb35/tksuperbowl.html.Google Scholar
    30. Karrer, T., Weiss, M., Lee, E., and Borchers, J. 2008. Dragon: a direct manipulation interface for frame-accurate in-scene video navigation. In CHI ’08, 247–250. Google ScholarDigital Library
    31. Kilner, J., Starck, J., and Hilton, A. 2006. A comparative study of free-viewpoint video techniques for sports events. European Conference on Visual Media Production (CVMP).Google Scholar
    32. Kilner, J., Starck, J., Hilton, A., and Grau, O. 2007. Dual-mode deformable models for free-viewpoint video of sports events. In 3DIM07, 177–184. Google ScholarDigital Library
    33. Kopf, J., Neubert, B., Chen, B., Cohen, M., Cohen-Or, D., Deussen, O., Uyttendaele, M., and Lischinski, D. 2008. Deep photo: model-based photograph enhancement and viewing. ACM Trans. Graph. 27, 5, 116. Google ScholarDigital Library
    34. Levoy, M., and Hanrahan, P. 1996. Light field rendering. In SIGGRAPH, 31–42. Google ScholarDigital Library
    35. Lhuillier, M., and Quan, L. 2005. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. 27, 3, 418–433. Google ScholarDigital Library
    36. Liu, F., Gleicher, M., Jin, H., and Agarwala, A. 2009. Content-preserving warps for 3d video stabilization. In ACM SIGGRAPH 2009, 1–9. Google ScholarDigital Library
    37. Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2, 91–110. Google ScholarDigital Library
    38. Matusik, W., Buehler, C., Raskar, R., Gortler, S. J., and McMillan, L. 2000. Image-based visual hulls. In Proceedings of ACM SIGGRAPH, 369–374. Google ScholarDigital Library
    39. Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., and Koch, R. 2004. Visual modeling with a hand-held camera. IJCV 59, 3, 207–232. Google ScholarDigital Library
    40. Rav-Acha, A., Kohli, P., Rother, C., and Fitzgibbon, A. 2008. Unwrap mosaics: A new representation for video editing. ACM Transactions on Graphics (SIGGRAPH 2008) (August). Google ScholarDigital Library
    41. Rong, G., and Tan, T.-S. 2006. Jump flooding in gpu with applications to voronoi diagram and distance transform. In ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), ACM, 109–116. Google ScholarDigital Library
    42. Schindler, G., and Dellaert, F. 2010. Probabilistic temporal inference on reconstructed 3D scenes. In CVPR, 1–8.Google Scholar
    43. Schödl, A., Szeliski, R., Salesin, D. H., and Essa, I. 2000. Video textures. In SIGGRAPH ’00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 489–498. Google ScholarDigital Library
    44. Schönemann, P. 1966. A generalized solution of the orthogonal procrustes problem. Psychometrika 31, 1 (March), 1–10.Google ScholarCross Ref
    45. Seitz, S. M., and Dyer, C. R. 1996. View morphing. In Proceedings of ACM SIGGRAPH, 21–30. Google ScholarDigital Library
    46. Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R. 2006. A comparison and evaluation of multiview stereo reconstruction algorithms. In 2006 Conference on Computer Vision and Pattern Recognition (CVPR 2006), 519–528. Google ScholarDigital Library
    47. Sinha, S. N., and Pollefeys, M. 2004. Synchronization and calibration of camera networks from silhouettes. In ICPR ’04: Proceedings of the Pattern Recognition, 17th International Conference on (ICPR’04) Volume 1, 116–119. Google ScholarDigital Library
    48. Sinha, S. N., Steedly, D., Szeliski, R., Agrawala, M., and Pollefeys, M. 2008. Interactive 3d architectural modeling from unordered photo collections. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2008) 27, 5, 159. Google ScholarDigital Library
    49. Sivic, J., and Zisserman, A. 2003. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, vol. 2, 1470–1477. Google ScholarDigital Library
    50. Snavely, N., Seitz, S. M., and Szeliski, R. 2006. Photo tourism: Exploring photo collections in 3d. In SIGGRAPH Conference Proceedings, 835–846. Google ScholarDigital Library
    51. Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. 2008. Finding paths through the world’s photos. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2008) 27, 3, 11–21. Google ScholarDigital Library
    52. Starck, J., and Hilton, A. 2007. Surface capture for performance based animation. IEEE Computer Graphics and Applications 27(3), 21–31. Google ScholarDigital Library
    53. Stich, T., Linz, C., Albuquerque, G., and Magnor, M. 2008. View and time interpolation in image space. Computer Graphics Forum (Proc. Pacific Graphics) 27, 7.Google Scholar
    54. Sun, J., Zhang, W., Tang, X., and Shum, H.-Y. 2006. Background cut. In ECCV (2), 628–641. Google ScholarDigital Library
    55. Tuytelaars, T., and Van Gool, L. 2004. Synchronizing video sequences. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on 1, 762–768.Google Scholar
    56. van den Hengel, A., Dick, A., Thormählen, T., Ward, B., and Torr, P. H. S. 2007. Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics 26, 3 (July), 86:1–86:5. Google ScholarDigital Library
    57. Vedula, S., Baker, S., and Kanade, T. 2005. Image-based spatio-temporal modeling and view interpolation of dynamic events. ACM Transactions on Graphics 24, 2 (Apr.), 240–261. Google ScholarDigital Library
    58. Vlasic, D., Baran, I., Matusik, W., and Popović, J. 2008. Articulated mesh animation from multi-view silhouettes. ACM Transactions on Graphics 27, 3, 97:1–97:9. Google ScholarDigital Library
    59. Wang, J., and Bodenheimer, B. 2008. Synthesis and evaluation of linear motion transitions. ACM Trans. Graph. 27, 1, 1–15. Google ScholarDigital Library
    60. Wang, J., Bhat, P., Colburn, R. A., Agrawala, M., and Cohen, M. F. 2005. Interactive video cutout. ACM Trans. Graph. 24, 3, 585–594. Google ScholarDigital Library
    61. Waschbüsch, M., Würmlin, S., and Gross, M. H. 2007. 3d video billboard clouds. Computer Graphics Forum (Proc. Eurographics EG’07) 26, 3, 561–569.Google Scholar
    62. Würmlin, S., and Niederberger, C., 2010. Realistic virtual replays for sports broadcasts. http://www.liberovision.com/.Google Scholar
    63. Zach, C., Pock, T., and Bischof, H. 2007. A globally optimal algorithm for robust tv-11 range image integration. In IEEE International Conference on Computer Vision (ICCV).Google Scholar
    64. Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S., and Szeliski, R. 2004. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics 23, 3 (Aug.), 600–608. Google ScholarDigital Library

ACM Digital Library Publication: