Depth Synthesis and Local Warps for Plausible Image‐Based Navigation

Modern camera calibration and multiview stereo techniques enable users to smoothly navigate between different views of a scene captured using standard cameras. The underlying automatic 3D reconstruction methods work well for buildings and regular structures but often fail on vegetation, vehicles, and other complex geometry present in everyday urban scenes. Consequently, missing depth information makes Image-Based Rendering (IBR) for such scenes very challenging. Our goal is to provide plausible free-viewpoint navigation for such datasets. To do this, we introduce a new IBR algorithm that is robust to missing or unreliable geometry, providing plausible novel views even in regions quite far from the input camera positions. We first oversegment the input images, creating superpixels of homogeneous color content, which tends to preserve depth discontinuities. We then introduce a depth synthesis approach for poorly reconstructed regions, based on a graph structure over the oversegmentation and an appropriate traversal of that graph. The superpixels augmented with synthesized depth allow us to define a local shape-preserving warp that compensates for inaccurate depth. Our rendering algorithm blends the warped images and generates plausible image-based novel views for our challenging target scenes. Our results demonstrate novel view synthesis in real time for multiple challenging scenes with significant depth complexity, providing a convincing immersive navigation experience.
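
The abstract does not spell out how depth is synthesized, so the following is only an illustrative sketch of the general idea it describes: a graph over superpixels, traversed to borrow depth from well-reconstructed regions. Everything specific here is an assumption rather than the authors' method: the function name `synthesize_depth`, the use of mean-color distance as the edge weight, Dijkstra traversal, and taking the median depth of the `k` closest superpixels that have depth.

```python
import heapq
from statistics import median


def synthesize_depth(colors, edges, depths, k=3):
    """Assign a depth to every superpixel (hypothetical sketch).

    colors: dict superpixel id -> (r, g, b) mean color
    edges:  iterable of (a, b) pairs of spatially adjacent superpixels
    depths: dict superpixel id -> list of reconstructed depth samples
            (an empty list marks a poorly reconstructed superpixel)
    k:      number of depth-bearing superpixels to borrow from (assumed)
    """
    # Build an undirected graph; edge weight = Euclidean color distance,
    # so paths prefer superpixels of similar appearance (an assumption).
    graph = {sp: [] for sp in colors}
    for a, b in edges:
        w = sum((x - y) ** 2 for x, y in zip(colors[a], colors[b])) ** 0.5
        graph[a].append((b, w))
        graph[b].append((a, w))

    result = {}
    for sp, samples in depths.items():
        if samples:
            # Superpixel already has reconstructed depth: keep its median.
            result[sp] = median(samples)
            continue

        # Dijkstra from the empty superpixel until k depth-bearing
        # superpixels have been reached along cheapest color paths.
        dist = {sp: 0.0}
        heap = [(0.0, sp)]
        visited = set()
        found = []
        while heap and len(found) < k:
            d, u = heapq.heappop(heap)
            if u in visited:
                continue
            visited.add(u)
            if depths.get(u):
                found.append(median(depths[u]))
            for v, w in graph[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))

        # No reachable depth at all: leave the superpixel unassigned.
        result[sp] = median(found) if found else None
    return result


# Toy data: superpixel 1 has no depth; it looks like 0, not like 2.
toy_colors = {0: (10, 10, 10), 1: (12, 10, 10), 2: (200, 200, 200)}
toy_edges = [(0, 1), (1, 2)]
toy_depths = {0: [4.0, 5.0, 6.0], 1: [], 2: [50.0]}
toy_result = synthesize_depth(toy_colors, toy_edges, toy_depths, k=1)
```

In this toy example, superpixel 1 borrows its depth from superpixel 0, which is similar in color, rather than from the differently colored superpixel 2. That mirrors the intuition that superpixel boundaries tend to follow depth discontinuities.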

SIGGRAPH 2013: Technical Papers

“Depth Synthesis and Local Warps for Plausible Image‐Based Navigation” by Chaurasia, Duchene, Sorkine-Hornung and Drettakis

Session/Category Title: Video & Warping
