Photo tourism: exploring photo collections in 3D

Noah Snavely; Steven M. Seitz; Richard Szeliski

“Photo tourism: exploring photo collections in 3D” by Snavely, Seitz and Szeliski

Next: “Photo zoom: high resolution from unordered... »

« Previous: “Photo clip art” by Lalonde, Hoiem, Efros,...

Conference:

SIGGRAPH 2006

Type(s):

Technical Papers

Title:

Photo tourism: exploring photo collections in 3D

Presenter(s)/Author(s):

Noah Snavely

Steven M. Seitz

Richard Szeliski

Abstract:

We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image to model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.

References:

1. Aliaga, D., Funkhouser, T., Yanovsky, D., and Carlbom, I. 2003. Sea of images. IEEE Computer Graphics and Applications 23, 6, 22–30. Google ScholarDigital Library
2. Aliaga, D., Yanovsky, D., Funkhouser, T., and Carlbom, I. 2003. Interactive image-based rendering using feature globalization. In Proc. SIGGRAPH Symposium on Interactive 3D Graphics, 163–170. Google ScholarDigital Library
3. Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. Y. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. of the ACM 45, 6, 891–923. Google ScholarDigital Library
4. Brown, M., and Lowe, D. G. 2005. Unsupervised 3d object recognition and reconstruction in unordered datasets. In Proc. Int. Conf. on 3D Digital Imaging and Modelling, 56–63. Google ScholarDigital Library
5. Buehler, C., Bosse, M., McMillan, L., Gortler, S., and Cohen, M. 2001. Unstructured lumigraph rendering. In SIGGRAPH Conf. Proc., 425–432. Google ScholarDigital Library
6. Chen, S., and Williams, L. 1993. View interpolation for image synthesis. In SIGGRAPH Conf. Proc., 279–288. Google ScholarDigital Library
7. Chew, L. P. 1987. Constrained delaunay triangulations. In Proc. Sym. on Computational geometry, 215–222. Google ScholarDigital Library
8. Cooper, M., Foote, J., Girgensohn, A., and Wilcox, L. 2003. Temporal event clustering for digital photo collections. In Proc. ACM Int. Conf. on Multimedia, 364–373. Google ScholarDigital Library
9. Debevec, P. E., Taylor, C. J., and Malik, J. 1996. Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In SIGGRAPH Conf. Proc., 11–20. Google ScholarDigital Library
10. Dick, A. R., Torr, P. H. S., and Cipolla, R. 2004. Modelling and interpretation of architecture from several images. Int. J. of Computer Vision 60, 2, 111–134. Google ScholarDigital Library
11. Feiner, S., MacIntyre, B., Hollerer, T., and Webster, A. 1997. A touring machine: Prototyping 3d mobile augmented reality systems for exploring the urban environment. In Proc. IEEE Int. Sym. on Wearable Computers, 74–81. Google ScholarDigital Library
12. Fischler, M., and Bolles, R. 1987. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Readings in computer vision: issues, problems, principles, and paradigms, 726–740. Google ScholarDigital Library
13. Gortler, S. J., Grzeszczuk, R., Szeliski, R., and Cohen, M. F. 1996. The Lumigraph. In SIGGRAPH Conf. Proc., 43–54. Google ScholarDigital Library
14. Grzeszczuk, R. 2002. Course 44: Image-based modeling. In SIGGRAPH 2002.Google Scholar
15. Hartley, R. I., and Zisserman, A. 2004. Multiple View Geometry. Cambridge University Press, Cambridge, UK. Google ScholarDigital Library
16. Irani, M., and Anandan, P. 1998. Video indexing based on mosaic representation. IEEE Trans. on Pattern Analysis and Machine Intelligence 86, 5, 905–921.Google Scholar
17. Johansson, B., and Cipolla, R. 2002. A system for automatic pose-estimation from a single image in a city scene. In Proc. IASTED Int. Conf. Signal Processing, Pattern Recognition and Applications.Google Scholar
18. Kadobayashi, R., and Tanaka, K. 2005. 3d viewpoint-based photo search and information browsing. In Proc. ACM Int. Conf. on Research and development in information retrieval, 621–622. Google ScholarDigital Library
19. Levoy, M., and Hanrahan, P. 1996. Light field rendering. In SIGGRAPH Conf. Proc., 31–42. Google ScholarDigital Library
20. Lippman, A. 1980. Movie maps: An application of the optical videodisc to computer graphics. In SIGGRAPH Conf. Proc., 32–43. Google ScholarDigital Library
21. Lourakis, M., and Argyros, A. 2004. The design and implementation of a generic sparse bundle adjustment software package based on the levenberg-marquardt algorithm. Tech. Rep. 340, Inst. of Computer Science-FORTH, Heraklion, Crete, Greece. Available from www.ics.forth.gr/~lourakis/sba.Google Scholar
22. Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. Int. J. of Computer Vision 60, 2, 91–110. Google ScholarDigital Library
23. McCurdy, N., and Griswold, W. 2005. A systems architecture for ubiquitous video. In Proc. Int. Conf. on mobile systems, applications, and services, 1–14. Google ScholarDigital Library
24. McMillan, L., and Bishop, G. 1995. Plenoptic modeling: An image-based rendering system. In SIGGRAPH Conf. Proc., 39–46. Google ScholarDigital Library
25. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and van Gool, L. 2005. A comparison of affine region detectors. Int. J. of Computer Vision 65, 1/2, 43–72. Google ScholarDigital Library
26. Naaman, M., Paepcke, A., and Garcia-Molina, H. 2003. From where to what: Metadata sharing for digital photographs with geographic coordinates. In Proc. Int. Conf. on Cooperative Information Systems, 196–217.Google Scholar
27. Naaman, M., Song, Y. J., Paepcke, A., and Garcia-Molina, H. 2004. Automatic organization for digital photographs with geographic coordinates. In Proc. ACM/IEEE-CS Joint Conf. on Digital libraries, 53–62. Google ScholarDigital Library
28. Nocedal, J., and Wright, S. J. 1999. Numerical Optimization. Springer Series in Operations Research. Springer-Verlag, New York, NY.Google Scholar
29. Pollefeys, M., Van Gool, L., Vergauwen, M., Verbiest, F., Cornelis, K., Tops, J., and Koch, R. 2004. Visual modeling with a hand-held camera. Int. J. of Computer Vision 59, 3, 207–232. Google ScholarDigital Library
30. Robertson, D. P., and Cipolla, R. 2002. Building architectural models from many views using map constraints. In Proc. European Conf. on Computer Vision, vol. II, 155–169. Google ScholarDigital Library
31. Rodden, K., and Wood, K. R. 2003. How do people manage their digital photographs? In Proc. Conf. on Human Factors in Computing Systems, 409–416. Google ScholarDigital Library
32. Román, A., Garg, G., and Levoy, M. 2004. Interactive design of multi-perspective images for visualizing urban landscapes. In Proc. IEEE Visualization, 537–544. Google ScholarDigital Library
33. Russell, B. C., Torralba, A., Murphy, K. P., and Freeman, W. T. 2005. Labelme: A database and web-based tool for image annotation. Tech. Rep. MIT-CSAIL-TR-2005-056, Massachusetts Institute of Technology.Google Scholar
34. Schaffalitzky, F., and Zisserman, A. 2002. Multi-view matching for unordered image sets, or “How do I organize my holiday snaps?”. In Proc. European Conf. on Computer Vision, vol. 1, 414–431. Google ScholarDigital Library
35. Schmid, C., and Zisserman, A. 1997. Automatic line matching across views. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 666–671. Google ScholarDigital Library
36. Seitz, S. M., and Dyer, C. M. 1996. View morphing. In SIGGRAPH Conf. Proc., 21–30. Google ScholarDigital Library
37. Sivic, J., and Zisserman, A. 2003. Video Google: A text retrieval approach to object matching in videos. In Proc. Int. Conf. on Computer Vision, 1470–1477. Google ScholarDigital Library
38. Steedly, D., Essa, I., and Delleart, F. 2003. Spectral partitioning for structure from motion. In Proc. Int. Conf. on Computer Vision, 996–103. Google ScholarDigital Library
39. Szeliski, R. 2005. Image alignment and stitching: A tutorial. Tech. Rep. MSR-TR-2004-92, Microsoft Research.Google Scholar
40. Teller, S., et al. 2003. Calibrated, registered images of an extended urban area. Int. J. of Computer Vision 53, 1, 93–107. Google ScholarDigital Library
41. Toyama, K., Logan, R., and Roseway, A. 2003. Geographic location tags on digital images. In Proc. Int. Conf. on Multimedia, 156–166. Google ScholarDigital Library
42. von Ahn, L., and Dabbish, L. 2004. Labeling images with a computer game. In Proc. Conf. on Human Factors in Computing Systems, 319–326. Google ScholarDigital Library
43. Zitnick, L., Kang, S. B., Uyttendaele, M., Winder, S., and Szeliski, R. 2004. High-quality video view interpolation using a layered representation. In SIGGRAPH Conf. Proc., 600–608. Google ScholarDigital Library

ACM Digital Library Publication: