Unstructured video-based rendering: interactive exploration of casually captured videos

Luca Ballan; Gabriel J. Brostow; Jens Puwein; Marc Pollefeys

“Unstructured video-based rendering: interactive exploration of casually captured videos” by Ballan, Brostow, Puwein and Pollefeys

Next: “Unsupervised Cell Identification on... »

« Previous: “Unstructured lumigraph rendering” by...

Conference:

SIGGRAPH 2010

Type(s):

Technical Papers

Title:

Unstructured video-based rendering: interactive exploration of casually captured videos

Presenter(s)/Author(s):

Luca Ballan

Gabriel J. Brostow

Jens Puwein

Marc Pollefeys

Abstract:

We present an algorithm designed for navigating around a performance that was filmed as a “casual” multi-view video collection: real-world footage captured on hand held cameras by a few audience members. The objective is to easily navigate in 3D, generating a video-based rendering (VBR) of a performance filmed with widely separated cameras. Casually filmed events are especially challenging because they yield footage with complicated backgrounds and camera motion. Such challenging conditions preclude the use of most algorithms that depend on correlation-based stereo or 3D shape-from-silhouettes.Our algorithm builds on the concepts developed for the exploration of photo-collections of empty scenes. Interactive performer-specific view-interpolation is now possible through innovations in interactive rendering and offline-matting relating to i) modeling the foreground subject as video-sprites on billboards, ii) modeling the background geometry with adaptive view-dependent textures, and iii) view interpolation that follows a performer. The billboards are embedded in a simple but realistic reconstruction of the environment. The reconstructed environment provides very effective visual cues for spatial navigation as the user transitions between viewpoints. The prototype is tested on footage from several challenging events, and demonstrates the editorial utility of the whole system and the particular value of our new inter-billboard optimization.

References:

1. Arulampalam, M. S., Maskell, S., and Gordon, N. 2002. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Processing 50, 174–188. Google ScholarDigital Library
2. Bai, X., Wang, J., Simons, D., and Sapiro, G. 2009. Video snapcut: robust video object cutout using localized classifiers. ACM Trans. Graph. 28, 3. Google ScholarDigital Library
3. Ballan, L., and Cortelazzo, G. M. 2008. Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes. In 3DPVT.Google Scholar
4. Boykov, Y., and Kolmogorov, V. 2004. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26, 9, 1124–1137. Google ScholarDigital Library
5. Buehler, C., Bosse, M., McMillan, L., Gortler, S. J., and Cohen, M. F. 2001. Unstructured lumigraph rendering. In SIGGRAPH, 425–432. Google ScholarDigital Library
6. Campbell, N. D., Vogiatzis, G., Hernández, C., and Cipolla, R. 2007. Automatic 3d object segmentation in multiple views using volumetric graph-cuts. In 18th British Machine Vision Conference, vol. 1, 530–539.Google Scholar
7. Carranza, J., Theobalt, C., Magnor, M. A., and peter Seidel, H. 2003. Free-viewpoint video of human actors. In ACM Transactions on Graphics, 569–577. Google ScholarDigital Library
8. Chen, S. E., and Williams, L. 1993. View interpolation for image synthesis. In SIGGRAPH ’93: Proceedings of the 20th annual conference on Computer graphics and interactive techniques, 279–288. Google ScholarDigital Library
9. Chuang, Y.-Y., Curless, B., Salesin, D. H., and Szeliski, R. 2001. A bayesian approach to digital matting. In Proceedings of IEEE CVPR 2001, vol. 2, 264–271.Google Scholar
10. Chuang, Y.-Y., Agarwala, A., Curless, B., Salesin, D. H., and Szeliski, R. 2002. Video matting of complex scenes. ACM Transactions on Graphics 21, 3 (July), 243–248. Google ScholarDigital Library
11. de Aguiar, E., Stoll, C., Theobalt, C., Ahmed, N., Seidel, H. P., and Thrun, S. 2008. Performance capture from sparse multi-view video. ACM Trans. Graph. 27, 3, 1–10. Google ScholarDigital Library
12. Debevec, P. E., Taylor, C. J., and Malik, J. 1996. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, 11–20. Google ScholarDigital Library
13. Debevec, P., Borshukov, G., and Yu, Y. 1998. Efficient view-dependent image-based rendering with projective texture-mapping. In 9th Eurographics Workshop on Rendering.Google Scholar
14. Dragicevic, P., Ramos, G., Bibliowitcz, J., Nowrouzezahrai, D., Balakrishnan, R., and Singh, K. 2008. Video browsing by direct manipulation. In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, 237–246. Google ScholarDigital Library
15. Eisemann, M., Decker, B. D., Magnor, M., Bekaert, P., de Aguiar, E., Ahmed, N., Theobalt, C., and Sellent, A. 2008. Floating Textures. Computer Graphics Forum (Proc. Eurographics EG’08) 27, 2 (4), 409–418.Google Scholar
16. Franco, J.-S., and Boyer, E. 2005. Fusion of multi-view silhouette cues using a space occupancy grid. In ICCV, 1747–1753. Google ScholarDigital Library
17. Goesele, M., Snavely, N., Curless, B., Hoppe, H., and Seitz, S. M. 2007. Multi-view stereo for community photo collections. In ICCV, 1–8.Google Scholar
18. Goldman, D. B., Gonterman, C., Curless, B., Salesin, D., and Seitz, S. M. 2008. Video object annotation, navigation, and composition. In UIST ’08: Proceedings of the 21st annual ACM symposium on User interface software and technology, 3–12. Google ScholarDigital Library
19. Goldman, D. B. 2007. A framework for video annotation, visualization, and interaction. PhD thesis. Google ScholarDigital Library
20. Gortler, S. J., Grzeszczuk, R., Szeliski, R., and Cohen, M. F. 1996. The lumigraph. In SIGGRAPH, 43–54. Google ScholarDigital Library
21. Grundland, M., Vohra, R., Williams, G. P., and Dodgson, N. A. 2006. Cross dissolve without cross fade: Preserving contrast, color and salience in image compositing. In Proceedings of EUROGRAPHICS, Computer Graphics Forum, 577–586.Google Scholar
22. Guillemaut, J.-Y., Hilton, A., Starck, J., Kilner, J., and Grau, O. 2007. A bayesian framework for simultaneous matting and 3d reconstruction. In 3DIM ’07: Proceedings of the Sixth International Conference on 3-D Digital Imaging and Modeling, 167–176. Google ScholarDigital Library
23. Guillemaut, J.-Y., Kilner, J., and Hilton, A. 2009. Robust graph-cut scene segmentation and reconstruction for free-viewpoint video of complex dynamic scenes. In Proc. International Conference on Computer Vision (ICCV 2009).Google Scholar
24. Hartley, R. I., and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049. Google ScholarDigital Library
25. Hasler, N., Rosenhahn, B., Thormählen, T., Wand, M., Gall, J., and Seidel, H.-P. 2009. Markerless motion capture with unsynchronized moving cameras. In CVPR, 224–231.Google Scholar
26. Hayashi, K., and Saito, H. 2006. Synthesizing free-viewpoint images from multiple view videos in soccer stadium. In CGIV ’06: Proceedings of the International Conference on Computer Graphics, Imaging and Visualisation, 220–225. Google ScholarDigital Library
27. Hays, J., and Efros, A. A. 2007. Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH 2007) 26, 3. Google ScholarDigital Library
28. Heigl, B., Koch, R., Pollefeys, M., Denzler, J., and Van Gool, L. 1999. Plenoptic modeling and rendering from image sequences taken by hand-held camera. In Patter Recognition 1999, 21. DAGM-Symposium, 94–101. Google ScholarDigital Library
29. Kanade, T., 2001. Carnegie mellon goes to the superbowl. http://www.ri.cmu.edu/events/sb35/tksuperbowl.html.Google Scholar
30. Karrer, T., Weiss, M., Lee, E., and Borchers, J. 2008. Dragon: a direct manipulation interface for frame-accurate in-scene video navigation. In CHI ’08, 247–250. Google ScholarDigital Library
31. Kilner, J., Starck, J., and Hilton, A. 2006. A comparative study of free-viewpoint video techniques for sports events. European Conference on Visual Media Production (CVMP).Google Scholar
32. Kilner, J., Starck, J., Hilton, A., and Grau, O. 2007. Dual-mode deformable models for free-viewpoint video of sports events. In 3DIM07, 177–184. Google ScholarDigital Library
33. Kopf, J., Neubert, B., Chen, B., Cohen, M., Cohen-Or, D., Deussen, O., Uyttendaele, M., and Lischinski, D. 2008. Deep photo: model-based photograph enhancement and viewing. ACM Trans. Graph. 27, 5, 116. Google ScholarDigital Library
34. Levoy, M., and Hanrahan, P. 1996. Light field rendering. In SIGGRAPH, 31–42. Google ScholarDigital Library
35. Lhuillier, M., and Quan, L. 2005. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans. Pattern Anal. Mach. Intell. 27, 3, 418–433. Google ScholarDigital Library
36. Liu, F., Gleicher, M., Jin, H., and Agarwala, A. 2009. Content-preserving warps for 3d video stabilization. In ACM SIGGRAPH 2009, 1–9. Google ScholarDigital Library
37. Lowe, D. G. 2004. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2, 91–110. Google ScholarDigital Library
38. Matusik, W., Buehler, C., Raskar, R., Gortler, S. J., and McMillan, L. 2000. Image-based visual hulls. In Proceedings of ACM SIGGRAPH, 369–374. Google ScholarDigital Library
39. Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., and Koch, R. 2004. Visual modeling with a hand-held camera. IJCV 59, 3, 207–232. Google ScholarDigital Library
40. Rav-Acha, A., Kohli, P., Rother, C., and Fitzgibbon, A. 2008. Unwrap mosaics: A new representation for video editing. ACM Transactions on Graphics (SIGGRAPH 2008) (August). Google ScholarDigital Library
41. Rong, G., and Tan, T.-S. 2006. Jump flooding in gpu with applications to voronoi diagram and distance transform. In ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), ACM, 109–116. Google ScholarDigital Library
42. Schindler, G., and Dellaert, F. 2010. Probabilistic temporal inference on reconstructed 3D scenes. In CVPR, 1–8.Google Scholar
43. Schödl, A., Szeliski, R., Salesin, D. H., and Essa, I. 2000. Video textures. In SIGGRAPH ’00: Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 489–498. Google ScholarDigital Library
44. Schönemann, P. 1966. A generalized solution of the orthogonal procrustes problem. Psychometrika 31, 1 (March), 1–10.Google ScholarCross Ref
45. Seitz, S. M., and Dyer, C. R. 1996. View morphing. In Proceedings of ACM SIGGRAPH, 21–30. Google ScholarDigital Library
46. Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., and Szeliski, R. 2006. A comparison and evaluation of multiview stereo reconstruction algorithms. In 2006 Conference on Computer Vision and Pattern Recognition (CVPR 2006), 519–528. Google ScholarDigital Library
47. Sinha, S. N., and Pollefeys, M. 2004. Synchronization and calibration of camera networks from silhouettes. In ICPR ’04: Proceedings of the Pattern Recognition, 17th International Conference on (ICPR’04) Volume 1, 116–119. Google ScholarDigital Library
48. Sinha, S. N., Steedly, D., Szeliski, R., Agrawala, M., and Pollefeys, M. 2008. Interactive 3d architectural modeling from unordered photo collections. ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2008) 27, 5, 159. Google ScholarDigital Library
49. Sivic, J., and Zisserman, A. 2003. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the International Conference on Computer Vision, vol. 2, 1470–1477. Google ScholarDigital Library
50. Snavely, N., Seitz, S. M., and Szeliski, R. 2006. Photo tourism: Exploring photo collections in 3d. In SIGGRAPH Conference Proceedings, 835–846. Google ScholarDigital Library
51. Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. 2008. Finding paths through the world’s photos. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2008) 27, 3, 11–21. Google ScholarDigital Library
52. Starck, J., and Hilton, A. 2007. Surface capture for performance based animation. IEEE Computer Graphics and Applications 27(3), 21–31. Google ScholarDigital Library
53. Stich, T., Linz, C., Albuquerque, G., and Magnor, M. 2008. View and time interpolation in image space. Computer Graphics Forum (Proc. Pacific Graphics) 27, 7.Google Scholar
54. Sun, J., Zhang, W., Tang, X., and Shum, H.-Y. 2006. Background cut. In ECCV (2), 628–641. Google ScholarDigital Library
55. Tuytelaars, T., and Van Gool, L. 2004. Synchronizing video sequences. Computer Vision and Pattern Recognition, IEEE Computer Society Conference on 1, 762–768.Google Scholar
56. van den Hengel, A., Dick, A., Thormählen, T., Ward, B., and Torr, P. H. S. 2007. Videotrace: Rapid interactive scene modelling from video. ACM Transactions on Graphics 26, 3 (July), 86:1–86:5. Google ScholarDigital Library
57. Vedula, S., Baker, S., and Kanade, T. 2005. Image-based spatio-temporal modeling and view interpolation of dynamic events. ACM Transactions on Graphics 24, 2 (Apr.), 240–261. Google ScholarDigital Library
58. Vlasic, D., Baran, I., Matusik, W., and Popović, J. 2008. Articulated mesh animation from multi-view silhouettes. ACM Transactions on Graphics 27, 3, 97:1–97:9. Google ScholarDigital Library
59. Wang, J., and Bodenheimer, B. 2008. Synthesis and evaluation of linear motion transitions. ACM Trans. Graph. 27, 1, 1–15. Google ScholarDigital Library
60. Wang, J., Bhat, P., Colburn, R. A., Agrawala, M., and Cohen, M. F. 2005. Interactive video cutout. ACM Trans. Graph. 24, 3, 585–594. Google ScholarDigital Library
61. Waschbüsch, M., Würmlin, S., and Gross, M. H. 2007. 3d video billboard clouds. Computer Graphics Forum (Proc. Eurographics EG’07) 26, 3, 561–569.Google Scholar
62. Würmlin, S., and Niederberger, C., 2010. Realistic virtual replays for sports broadcasts. http://www.liberovision.com/.Google Scholar
63. Zach, C., Pock, T., and Bischof, H. 2007. A globally optimal algorithm for robust tv-11 range image integration. In IEEE International Conference on Computer Vision (ICCV).Google Scholar
64. Zitnick, C. L., Kang, S. B., Uyttendaele, M., Winder, S., and Szeliski, R. 2004. High-quality video view interpolation using a layered representation. ACM Transactions on Graphics 23, 3 (Aug.), 600–608. Google ScholarDigital Library

ACM Digital Library Publication: