Automatic Scene Inference for 3D Object Compositing

We present a user-friendly image editing system that supports drag-and-drop object insertion (the user simply drags objects into the image, and the system automatically places them in 3D and relights them appropriately), post-process illumination editing, and depth-of-field manipulation. Underlying our system is a fully automatic technique for recovering a comprehensive 3D scene model (geometry, illumination, diffuse albedo, and camera parameters) from a single low-dynamic-range photograph. This is made possible by two novel contributions: an illumination inference algorithm that recovers a full lighting model of the scene (including light sources that are not directly visible in the photograph), and a depth estimation algorithm that combines data-driven depth transfer with geometric reasoning about the scene layout. A user study shows that our system produces perceptually convincing results and achieves the same level of realism as techniques that require significant user interaction.
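To illustrate how a recovered per-pixel depth map can drive depth-of-field manipulation, the sketch below computes a blur-circle diameter for each pixel using the standard thin-lens model. This is an assumption made for illustration only: the abstract does not state which lens model or blur procedure the system uses, and the function names, default focal length, and f-number are hypothetical.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m, f_number):
    """Thin-lens blur-circle diameter on the sensor, in metres.

    depth_m        -- distance from the lens to the scene point
    focus_m        -- distance at which the lens is focused
    focal_length_m -- lens focal length
    f_number       -- aperture setting N (aperture diameter = focal length / N)
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture_m = focal_length_m / f_number
    # c = A * |d - s| / d * f / (s - f)
    return (aperture_m * abs(depth_m - focus_m) / depth_m
            * focal_length_m / (focus_m - focal_length_m))


def blur_map(depth_rows, focus_m, focal_length_m=0.05, f_number=2.0):
    """Map a 2D depth map (list of rows, metres) to per-pixel blur diameters."""
    return [
        [circle_of_confusion(d, focus_m, focal_length_m, f_number) for d in row]
        for row in depth_rows
    ]


if __name__ == "__main__":
    # Hypothetical 2x3 depth map in metres; the lens is focused at 2 m.
    depths = [[1.0, 2.0, 4.0],
              [1.5, 2.0, 8.0]]
    for row in blur_map(depths, focus_m=2.0):
        print(["%.6f" % c for c in row])
```

Pixels at the focus distance get a zero-diameter blur circle, and the diameter grows as depth moves away from the focal plane. A refocusing pass would then blur each pixel with a kernel scaled by this value, converted from sensor metres to pixels.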

SIGGRAPH 2014: Technical Papers

“Automatic Scene Inference for 3D Object Compositing” by Karsch, Sunkavalli, Hadap, Carr, Jin, et al. …

Session/Category Title: Shady Images
