“Automatic triage for a photo series”

  • ©Huiwen Chang, Fisher Yu, Jue Wang, Douglas Ashley, and Adam Finkelstein









    People often take a series of nearly redundant pictures to capture a moment or scene. However, selecting photos to keep or share from a large collection is a painful chore. To address this problem, we seek a relative quality measure within a series of photos taken of the same scene, which can be used for automatic photo triage. Towards this end, we gather a large dataset composed of photo series distilled from personal photo albums. The dataset contains 15,545 unedited photos organized in 5,953 series. By augmenting this dataset with ground-truth human preferences among photos within each series, we establish a benchmark for measuring the effectiveness of algorithmic models of how people select photos. We introduce several new approaches for modeling human preference based on machine learning. We also describe applications for the dataset and predictor, including a smart album viewer, automatic photo enhancement, and providing overviews of video clips.
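To make the benchmark idea concrete, here is a minimal sketch of how a predictor could be scored against human preferences within one series. It is an illustration only, not the method from the paper: the data layout (preference pairs as `(preferred, other)` photo IDs), the per-photo score dictionary, and the function names `pairwise_agreement` and `pick_best` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each pair is (preferred_photo_id, other_photo_id),
# meaning human raters preferred the first photo over the second.
Pair = Tuple[str, str]


def pairwise_agreement(scores: Dict[str, float], preferences: List[Pair]) -> float:
    """Return the fraction of human preference pairs that the scores order the same way."""
    if not preferences:
        raise ValueError("no preference pairs given")
    correct = sum(
        1 for preferred, other in preferences if scores[preferred] > scores[other]
    )
    return correct / len(preferences)


def pick_best(scores: Dict[str, float], series: List[str]) -> str:
    """Return the photo in the series with the highest predicted relative quality."""
    return max(series, key=lambda photo_id: scores[photo_id])


# Made-up scores and preferences for one three-photo series.
example_scores = {"a": 0.9, "b": 0.4, "c": 0.6}
example_prefs = [("a", "b"), ("a", "c"), ("b", "c")]

agreement = pairwise_agreement(example_scores, example_prefs)
best = pick_best(example_scores, ["a", "b", "c"])
```

In this toy series the scores agree with two of the three human preferences, and `pick_best` selects photo `"a"` as the one to keep. Because the measure is relative, scores are only compared between photos of the same series.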


