RAID: a relation-augmented image descriptor

As humans, we regularly interpret scenes based on how objects are related, rather than based on the objects themselves. For example, we see a person riding an object X or a plank bridging two objects. Current methods provide limited support for searching for content based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to encode region-to-region relations as the spatial distribution of point-to-region relationships between two image regions. RAID allows sketch-based retrieval and requires minimal training data, making it suitable even for querying uncommon relations. We evaluate the proposed descriptor by querying large image databases and successfully retrieve non-trivial images demonstrating complex inter-region relations, which are easily missed or erroneously classified by existing methods. We assess the robustness of RAID on multiple datasets, even when the region segmentation is computed automatically or is very noisy.
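As a rough illustration of the key idea (a region-to-region relation summarized as the spatial distribution of point-to-region relations), the following toy sketch may help. It is not the authors' implementation: the function names `point_to_region` and `relation_descriptor`, the angular-sector histogram used as the point-to-region relation, the 2×2 spatial grid, and the L1 comparison are all assumptions chosen for brevity.

```python
import numpy as np

def point_to_region(point, region_mask, n_sectors=4):
    """Fraction of region pixels falling in each angular sector around `point`.

    Hypothetical point-to-region relation. `point` is (row, col); angles are
    measured in image coordinates, so rows grow downward.
    """
    rows, cols = np.nonzero(region_mask)
    d_row, d_col = rows - point[0], cols - point[1]
    keep = (d_row != 0) | (d_col != 0)             # skip the point itself
    angles = np.arctan2(d_row[keep], d_col[keep])  # values in [-pi, pi]
    width = 2 * np.pi / n_sectors
    # Sector 0 is centred on the +col direction; indices wrap around.
    sectors = np.floor((angles + width / 2) / width).astype(int) % n_sectors
    hist = np.bincount(sectors, minlength=n_sectors).astype(float)
    return hist / max(hist.sum(), 1.0)

def relation_descriptor(mask_a, mask_b, grid=2, n_sectors=4):
    """Average the point-to-region histograms of B over a coarse grid on A.

    Returns a flat vector of length grid * grid * n_sectors, i.e. a crude
    spatial distribution of point-to-region relations (assumed design).
    """
    rows, cols = np.nonzero(mask_a)
    if rows.size == 0:
        raise ValueError("mask_a must contain at least one pixel")
    r0, c0 = rows.min(), cols.min()
    height, width = rows.max() + 1 - r0, cols.max() + 1 - c0
    desc = np.zeros((grid, grid, n_sectors))
    counts = np.zeros((grid, grid))
    for r, c in zip(rows, cols):
        gr = (r - r0) * grid // height             # grid cell of this point
        gc = (c - c0) * grid // width
        desc[gr, gc] += point_to_region((r, c), mask_b, n_sectors)
        counts[gr, gc] += 1
    desc /= np.maximum(counts, 1)[..., None]       # mean histogram per cell
    return desc.ravel()

# Usage: two square regions, one above the other, compared in both orders.
above = np.zeros((20, 20), dtype=bool)
above[2:6, 8:12] = True
below = np.zeros((20, 20), dtype=bool)
below[14:18, 8:12] = True
d_forward = relation_descriptor(above, below)
d_reverse = relation_descriptor(below, above)
l1_distance = np.abs(d_forward - d_reverse).sum()  # larger means less similar
```

In this toy setting, swapping the roles of the two regions changes the descriptor, which is the kind of asymmetry a relation-based query (for example, "X on top of Y" versus "Y on top of X") would need to capture.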

SIGGRAPH 2016: Technical Papers

Session/Category Title: SHAPE ANALYSIS
