“High-quality streamable free-viewpoint video” by Collet, Chuang, Sweeney, Gillett, Evseev, et al. …

  • © Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, and Steve Sullivan




    High-quality streamable free-viewpoint video

Session/Category Title: Video Processing




    We present the first end-to-end solution to create high-quality free-viewpoint video encoded as a compact data stream. Our system records performances using a dense set of RGB and IR video cameras, generates dynamic textured surfaces, and compresses these to a streamable 3D video format. Four technical advances contribute to high fidelity and robustness: multimodal multi-view stereo fusing RGB, IR, and silhouette information; adaptive meshing guided by automatic detection of perceptually salient areas; mesh tracking to create temporally coherent subsequences; and encoding of tracked textured meshes as an MPEG video stream. Quantitative experiments demonstrate geometric accuracy, texture fidelity, and encoding efficiency. We release several datasets with calibrated inputs and processed results to foster future research.


