Automatic triage for a photo series

People often take a series of nearly redundant pictures to capture a moment or scene. However, selecting photos to keep or share from a large collection is a painful chore. To address this problem, we seek a relative quality measure within a series of photos taken of the same scene, which can be used for automatic photo triage. To this end, we gather a large dataset composed of photo series distilled from personal photo albums. The dataset contains 15,545 unedited photos organized in 5,953 series. By augmenting this dataset with ground truth human preferences among photos within each series, we establish a benchmark for measuring the effectiveness of algorithmic models of how people select photos. We introduce several new approaches for modeling human preference based on machine learning. We also describe applications for the dataset and predictor, including a smart album viewer, automatic photo enhancement, and providing overviews of video clips.
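As a rough illustration of what a relative quality measure within a series could look like, the sketch below turns pairwise human preferences into per-photo scores using a Bradley-Terry model fitted by a simple iterative update. This is an assumption made for illustration: the abstract does not say how preferences are aggregated, and the function name `bradley_terry_scores` and the `(winner, loser)` input format are hypothetical.

```python
from collections import defaultdict


def bradley_terry_scores(n_photos, comparisons, iters=200):
    """Estimate a relative score for each photo in ONE series.

    comparisons: list of (winner, loser) photo indices, one entry per
    human judgment (hypothetical input format, not from the paper).
    Returns a list of non-negative scores that sum to 1.
    """
    wins = [0.0] * n_photos
    pair_counts = defaultdict(float)  # number of judgments per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[(min(winner, loser), max(winner, loser))] += 1.0

    # Start from a uniform guess and apply the standard MM update.
    scores = [1.0 / n_photos] * n_photos
    for _ in range(iters):
        denom = [0.0] * n_photos
        for (i, j), count in pair_counts.items():
            d = count / (scores[i] + scores[j])
            denom[i] += d
            denom[j] += d
        # Photos that were never compared keep their previous score.
        updated = [
            wins[i] / denom[i] if denom[i] > 0 else scores[i]
            for i in range(n_photos)
        ]
        total = sum(updated)
        scores = [s / total for s in updated]
    return scores


def best_photo(n_photos, comparisons):
    """Index of the photo with the highest estimated score."""
    scores = bradley_terry_scores(n_photos, comparisons)
    return max(range(n_photos), key=lambda i: scores[i])
```

Scores fitted this way are only comparable within one series, which matches the relative (rather than absolute) notion of quality the abstract describes. A learned predictor could then be evaluated by how often it agrees with these preference-derived rankings.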


SIGGRAPH 2016: Technical Papers

Session/Category Title: PHOTO ORGANIZATION & MANIPULATION
