Efficient 3D Object Segmentation From Densely Sampled Light Fields With Applications to 3D Reconstruction

Precise object segmentation in image data is a fundamental problem with various applications, including 3D object reconstruction. We present an efficient algorithm to automatically segment a static foreground object from a highly cluttered background in light fields. A key insight and contribution of our article is that a significant increase in the available input data can enable the design of novel, highly efficient approaches. In particular, the central idea of our method is to exploit high spatio-angular sampling on the order of thousands of input frames, for example, captured as a hand-held video, such that new structures are revealed due to the increased coherence in the data. We first show how purely local gradient information contained in slices of such a dense light field can be combined with information about the camera trajectory to make efficient estimates of the foreground and background. These estimates are then propagated to textureless regions using edge-aware filtering in the epipolar volume. Finally, we enforce global consistency in a gathering step to derive a precise object segmentation in both 2D and 3D space, which captures fine geometric details even in very cluttered scenes. The design of each of these steps is motivated by efficiency and scalability, allowing us to handle large, real-world video datasets on a standard desktop computer. We demonstrate how the results of our method can be used to considerably improve the speed and quality of image-based 3D reconstruction algorithms, and we compare our results to state-of-the-art segmentation and multiview stereo methods.
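The propagation step described above (spreading sparse foreground/background estimates into textureless regions with edge-aware filtering) can be illustrated with a minimal sketch. It assumes a single 2D epipolar-plane image as the guide, with rows as frames and columns as pixels, plus sparse per-pixel labels (1 = foreground, 0 = background) and confidences. It swaps in the recursive domain-transform filter of Gastal and Oliveira inside a normalized convolution. The filter choice, the helper names `domain_transform_rf` and `propagate_labels`, and the parameters `sigma_s`, `sigma_r` and `iterations` are illustrative assumptions, not the authors' implementation, which operates on the full epipolar volume.

```python
import numpy as np

# Hypothetical helpers for illustration; not the paper's code.


def _recursive_pass(signal, dist, a):
    """Causal then anti-causal first-order recursive filter along the last axis.

    dist[..., i] is the domain-transform distance between samples i-1 and i,
    so dist[..., 0] is never used. Large distances (strong guide edges) give
    small feedback weights a**dist, which stops the smoothing at those edges.
    """
    out = np.array(signal, dtype=float)  # working copy, input is not modified
    w = a ** dist
    n = out.shape[-1]
    for i in range(1, n):  # left-to-right sweep
        out[..., i] += w[..., i] * (out[..., i - 1] - out[..., i])
    for i in range(n - 2, -1, -1):  # right-to-left sweep
        out[..., i] += w[..., i + 1] * (out[..., i + 1] - out[..., i])
    return out


def domain_transform_rf(values, guide, sigma_s=20.0, sigma_r=0.1, iterations=3):
    """Edge-aware smoothing of `values`, steered by the grayscale `guide` image."""
    values = np.asarray(values, dtype=float)
    guide = np.asarray(guide, dtype=float)
    ratio = sigma_s / sigma_r
    # Domain-transform distances between horizontal and vertical neighbours.
    dx = np.ones_like(guide)
    dy = np.ones_like(guide)
    dx[:, 1:] += ratio * np.abs(np.diff(guide, axis=1))
    dy[1:, :] += ratio * np.abs(np.diff(guide, axis=0))
    out = values
    for k in range(iterations):
        # Shrinking per-iteration sigma, as in the multi-iteration recursive filter.
        sigma_h = (sigma_s * np.sqrt(3.0) * 2.0 ** (iterations - k - 1)
                   / np.sqrt(4.0 ** iterations - 1.0))
        a = np.exp(-np.sqrt(2.0) / sigma_h)
        out = _recursive_pass(out, dx, a)        # along each row (pixel axis)
        out = _recursive_pass(out.T, dy.T, a).T  # along each column (frame axis)
    return out


def propagate_labels(guide, labels, confidence, unknown=0.5, **kwargs):
    """Normalized convolution: spread sparse, confidence-weighted labels."""
    labels = np.asarray(labels, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    num = domain_transform_rf(labels * confidence, guide, **kwargs)
    den = domain_transform_rf(confidence, guide, **kwargs)
    # Pixels that received (almost) no confident evidence stay "unknown".
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), unknown)


if __name__ == "__main__":
    # Hypothetical EPI: 20 frames x 40 pixels, two textureless regions.
    epi = np.where(np.arange(40) < 20, 0.2, 0.8) * np.ones((20, 1))
    labels = np.zeros((20, 40))
    conf = np.zeros((20, 40))
    labels[:, 2], conf[:, 2] = 1.0, 1.0  # one confident foreground column
    conf[:, 37] = 1.0                    # one confident background column
    print(np.round(propagate_labels(epi, labels, conf)[0], 2))
```

Dividing the filtered label-times-confidence by the filtered confidence makes each unlabeled pixel a weighted average of nearby confident estimates, while the guide's intensity edges sharply reduce the weights across boundaries. Each recursive pass costs time linear in the number of pixels, which fits the efficiency goal stated in the abstract, although the paper's actual filter and parameters may differ.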

SIGGRAPH 2016: Technical Papers

“Efficient 3D Object Segmentation From Densely Sampled Light Fields With Applications to 3D Reconstruction” by Yücer, Sorkine-Hornung, Wang and Sorkine-Hornung

Session/Category Title: SHAPE ANALYSIS
