“Photo tourism: exploring photo collections in 3D” by Snavely, Seitz and Szeliski

  • ©Noah Snavely, Steven Seitz, and Richard Szeliski







    We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image-to-model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo-sharing sites.


