“RAID: a relation-augmented image descriptor”

  © Paul Guerrero, Niloy J. Mitra, and Peter Wonka


Session/Category Title: SHAPE ANALYSIS




    As humans, we regularly interpret scenes based on how objects are related, rather than based on the objects themselves. For example, we see a person riding an object X or a plank bridging two objects. Current methods provide limited support for searching content based on such relations. We present RAID, a relation-augmented image descriptor that supports queries based on inter-region relations. The key idea of our descriptor is to encode region-to-region relations as the spatial distribution of point-to-region relationships between two image regions. RAID allows sketch-based retrieval and requires minimal training data, making it well suited even for querying uncommon relations. We evaluate the proposed descriptor by querying large image databases and successfully retrieve non-trivial images that exhibit complex inter-region relations, which existing methods easily miss or misclassify. We assess the robustness of RAID on multiple datasets, even when the region segmentation is computed automatically or is very noisy.
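The key idea stated above, describing how two regions relate through the spatial distribution of point-to-region relationships, can be illustrated with a deliberately simplified sketch. This is not the published RAID construction. The choice of point-to-region relation (direction and distance from each point of region A to its nearest point in region B), the spatial binning by angle around A's centroid, the L1 comparison, and the helper names `relation_descriptor`, `rect`, `l1`, and `retrieve` are all hypothetical assumptions made for illustration only.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
Region = Set[Point]  # toy region: a set of integer pixel coordinates (y axis up)


def rect(x0: int, y0: int, x1: int, y1: int) -> Region:
    """Toy region: all grid points in an axis-aligned box (bounds inclusive)."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def _bin(value: float, hi: float, n: int) -> int:
    """Map value in [0, hi] to one of n equal bins, clamping out-of-range values."""
    return max(0, min(int(value / hi * n), n - 1))


def relation_descriptor(a: Region, b: Region, n_spatial: int = 4,
                        n_dir: int = 8, n_dist: int = 3) -> List[float]:
    """HYPOTHETICAL simplified relation descriptor, not the published RAID.

    For every point p of region a, a point-to-region relation to region b is
    recorded as the (direction bin, distance bin) of the vector from p to its
    nearest point in b. The relations are accumulated in a histogram that is
    also indexed by where p lies in a (angle bin around a's centroid), giving
    a spatial distribution of point-to-region relations.
    """
    two_pi = 2.0 * math.pi
    cx = sum(x for x, _ in a) / len(a)
    cy = sum(y for _, y in a) / len(a)
    xs = [x for x, _ in a | b]
    ys = [y for _, y in a | b]
    # Normalize distances by the joint extent so the descriptor ignores scale.
    scale = max(max(xs) - min(xs), max(ys) - min(ys), 1)
    hist = [0.0] * (n_spatial * n_dir * n_dist)
    for px, py in a:
        # Nearest point of b by brute force (ties resolved by set iteration
        # order); points of a that lie inside b get direction bin 0, distance 0.
        qx, qy = min(b, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)
        dx, dy = qx - px, qy - py
        s = _bin(math.atan2(py - cy, px - cx) % two_pi, two_pi, n_spatial)
        d = _bin(math.atan2(dy, dx) % two_pi, two_pi, n_dir)
        r = _bin(math.hypot(dx, dy), scale, n_dist)
        hist[(s * n_dir + d) * n_dist + r] += 1.0
    return [h / len(a) for h in hist]


def l1(u: List[float], v: List[float]) -> float:
    """L1 distance between normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def retrieve(query: List[float], database: Dict[str, List[float]]) -> List[str]:
    """Rank database entries by increasing descriptor distance to the query."""
    return sorted(database, key=lambda name: l1(query, database[name]))


if __name__ == "__main__":
    # Query sketch: region A drawn directly above region B.
    query = relation_descriptor(rect(0, 4, 3, 5), rect(0, 0, 3, 1))
    database = {
        "above": relation_descriptor(rect(10, 14, 15, 16), rect(10, 10, 15, 11)),
        "beside": relation_descriptor(rect(0, 0, 1, 3), rect(4, 0, 5, 3)),
    }
    print(retrieve(query, database))  # prints ['above', 'beside']
```

Because every feature is measured relative to the two regions, this toy descriptor is translation invariant and, through the normalization by the joint bounding-box extent, roughly scale invariant. It is intentionally not rotation invariant, since relations such as "above" depend on orientation; the retrieval step simply ranks stored region pairs by how closely their relation histograms match the sketched query.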


    1. Arnold, S., M., W., Worring, M., Santini, S., Gupta, A., and Jain, R. 2000. Content-based image retrieval at the end of the early years. IEEE PAMI 22, 12 (Dec.), 1349–1380. Google ScholarDigital Library
    2. Badadapure, P. R. 2013. Content-Based Image Retrieval by Combining Structural and Content Based Features. International Journal of Engineering and Advanced Technology 2, 4, 154–156.Google Scholar
    3. Belongie, S., Malik, J., and Puzicha, J. 2002. Shape matching and object recognition using shape contexts. Pattern Analysis and Machine Intelligence, IEEE Transactions on 24, 4, 509–522. Google ScholarDigital Library
    4. Berthouzoz, F., Li, W., Dontcheva, M., and Agrawala, M. 2011. A framework for content-adaptive photo manipulation macros: Application to face, landscape, and global manipulations. ACM TOG 30, 5 (Oct.), 120:1–120:14. Google ScholarDigital Library
    5. Bloch, I. 2005. Fuzzy spatial relationships for image processing and interpretation: A review. In Image and Vision Computing, vol. 23, 89–110. Google ScholarDigital Library
    6. 2015. Boost polygon, version 1.58. www.boost.org.Google Scholar
    7. Cao, Y., Wang, C., Zhang, L., and Zhang, L. 2011. Edgel index for large-scale sketch-based image search. In IEEE CVPR, 761–768. Google ScholarDigital Library
    8. Celebi, M. E., and Aslandogan, Y. A. 2005. A comparative study of three moment-based shape descriptors. In IEEE Proc. of the Internat. Conf. on Information Technology, 788–793. Google ScholarDigital Library
    9. Chandran, S., and Kiran, N. 2003. Image retrieval with embedded region relationships. In Proceedings of SAC, 760. Google ScholarDigital Library
    10. Chao, Y.-W., Wang, Z., He, Y., Wang, J., and Deng, J. 2015. Hico: A benchmark for recognizing human-object interactions in images. In Proceedings of the IEEE International Conference on Computer Vision. Google ScholarDigital Library
    11. Chen, T., Cheng, M.-M., Tan, P., Shamir, A., and Hu, S.-M. 2009. Sketch2photo: Internet image montage. ACM TOG 28, 5 (Dec.), 124:1–124:10. Google ScholarDigital Library
    12. Chen, K., Lai, Y.-K., Wu, Y.-X., Martin, R., and Hu, S.-M. 2014. Automatic semantic modeling of indoor scenes from low-quality rgb-d data using contextual information. ACM TOG 33, 6 (Nov.), 208:1–208:12. Google ScholarDigital Library
    13. Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. 2015. Semantic image segmentation with deep convolutional nets and fully connected crfs. ICLR (Nov.).Google Scholar
    14. Choi, W., Shahid, K., and Savarese, S. 2009. What are they doing?: Collective activity classification using spatio-temporal relationship among people. In ICCV Workshops, 1282–1289.Google Scholar
    15. Chua, T. S., Tan, K.-L., and Ooi, B. C. 1997. Fast signature-based color-spatial image retrieval. In Multimedia Computing and Systems ’97. Proceedings., IEEE International Conference on, 362–369. Google ScholarDigital Library
    16. Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2009. A descriptor for large scale image retrieval based on sketched feature lines. In Eurographics Symposium on Sketch-Based Interfaces and Modeling, 29–38. Google ScholarDigital Library
    17. Eitz, M., Hildebrand, K., Boubekeur, T., and Alexa, M. 2009. A descriptor for large scale image retrieval based on sketched feature lines. In SBIM ’09, ACM, New York, NY, USA, 29–36. Google ScholarDigital Library
    18. Eitz, M., Richter, R., Hildebrand, K., Boubekeur, T., and Alexa, M. 2011. Photosketcher: Interactive sketch-based image synthesis. Computer Graphics and Applications, IEEE 31, 6 (Nov), 56–66. Google ScholarDigital Library
    19. Eitz, M., Richter, R., Boubekeur, T., Hildebrand, K., and Alexa, M. 2012. Sketch-based shape retrieval. ACM TOG 31, 4 (July), 31:1–31:10. Google ScholarDigital Library
    20. Fisher, M., Savva, M., and Hanrahan, P. 2011. Characterizing structural relationships in scenes using graph kernels. In ACM TOG, vol. 30, ACM, 34. Google ScholarDigital Library
    21. Fisher, M., Ritchie, D., Savva, M., Funkhouser, T., and Hanrahan, P. 2012. Example-based synthesis of 3d object arrangements. In ACM SIGGRAPH Asia. Google ScholarDigital Library
    22. Fisher, M., Savva, M., Li, Y., Hanrahan, P., and Niessner, M. 2015. Activity-centric scene synthesis for functional 3d scene modeling. ACM TOG 34, 6. Google ScholarDigital Library
    23. Flusser, J. 1992. Invariant shape description and measure of object similarity. In Image Processing and its Applications, 1992., International Conference on, 139–142.Google Scholar
    24. Goshtasby, A. 1985. Description and discrimination of planar shapes using shape matrices. IEEE PAMI 7, 6, 738–743. Google ScholarDigital Library
    25. Hays, J., and Efros, A. A. 2007. Scene completion using millions of photographs. ACM TOG 26, 3 (July). Google ScholarDigital Library
    26. Hsieh, S.-M., and Hsu, C.-C. 2008. Retrieval of images by spatial and object similarities. Inf. Process. Manage. 44, 3 (May), 1214–1233. Google ScholarDigital Library
    27. Hu, S.-M., Zhang, F.-L., Wang, M., Martin, R. R., and Wang, J. 2013. PatchNet: A Patch-based Image Representation for Interactive Library-driven Image Editing. ACM TOG 32, 6, 1–12. Google ScholarDigital Library
    28. Hu, R., Zhu, C., van Kaick, O., Liu, L., Shamir, A., and Zhang, H. 2015. Interaction context (icon): Towards a geometric functionality descriptor. ACM TOG 34, 4 (July), 83:1–83:12. Google ScholarDigital Library
    29. Huang, H., Yin, K., Gong, M., Lischinski, D., Cohen-Or, D., Ascher, U., and Chen, B. 2013. “mind the gap”: Tele-registration for structure-driven image completion. ACM TOG 32, 6 (Nov.), 174:1–174:10. Google ScholarDigital Library
    30. Huang, S., Wang, W., and Zhang, H. 2014. Retrieving images using saliency detection and graph matching. In IEEE ICIP, 3087–3091.Google Scholar
    31. Jansen, S., Shantia, A., and Wiering, M. A. 2015. The neural-sift feature descriptor for visual vocabulary object recognition. In IJCNN, 1–8.Google Scholar
    32. Karpathy, A., and Li, F.-F. 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions. In IEEE CVPR.Google Scholar
    33. Kazmi, I. K., You, L., and Zhang, J. J. 2013. A survey of 2d and 3d shape descriptors. 2014 11th International Conference on Computer Graphics, Imaging and Visualization 0, 1–10. Google ScholarDigital Library
    34. Kim, V. G., Chaudhuri, S., Guibas, L., and Funkhouser, T. 2014. Shape2Pose: Human-Centric Shape Analysis. ACM SIGGRAPH 33, 4. Google ScholarDigital Library
    35. Ko, B., and Byun, H. 2002. Multiple Regions and Their Spatial Relationship-Based Image Retrieval. In LNCS 2383. 81–90. Google ScholarDigital Library
    36. Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D. A., Bernstein, M., and Fei-Fei, L. 2016. Visual genome: Connecting language and vision using crowd-sourced dense image annotations.Google Scholar
    37. Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., and Berg, T. L. 2013. Baby talk: Understanding and generating simple image descriptions. IEEE PAMI 35, 12, 2891–2903. Google ScholarDigital Library
    38. Lan, T., Yang, W., Wang, Y., and Mori, G. 2012. Image retrieval with structured object queries using latent ranking SVM. In Lect. Notes in Computer Science, vol. 7577 LNCS, 129–142. Google ScholarDigital Library
    39. Lan, T., Raptis, M., Sigal, L., and Mori, G. 2013. From subcategories to visual composites: A multi-level framework for object detection. In IEEE ICCV. Google ScholarDigital Library
    40. Lee, S. L. S., and Hwang, E. H. E. 2002. Spatial similarity and annotation-based image retrieval system. Proceedings of Fourth Int. Symposium on Multimedia Software Engineering. Google ScholarDigital Library
    41. Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. 2014. Microsoft COCO: common objects in context. CoRR abs/1405.0312.Google Scholar
    42. Liu, T., Chaudhuri, S., Kim, V. G., Huang, Q.-X., Mitra, N. J., and Funkhouser, T. 2014. Creating Consistent Scene Graphs Using a Probabilistic Grammar. ACM Transactions on Graphics (Proc. of SIGGRAPH Asia) 33, 6. Google ScholarDigital Library
    43. Long, J., Shelhamer, E., and Darrell, T. 2015. Fully convolutional networks for semantic segmentation. IEEE CVPR.Google Scholar
    44. Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. Int. Journal of Computer Vision 60, 2, 91–110. Google ScholarDigital Library
    45. Malisiewicz, T., and A., E. A. 2009. Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships. In NIPS, 1–9.Google Scholar
    46. Ooi, B. C., Tan, K.-L., Chua, T. S., and Hsu, W. 1998. Fast image retrieval using color-spatial information. The VLDB Journal 7, 2, 115–128. Google ScholarDigital Library
    47. Pentland, A., Picard, R. W., and Sclaroff, S. 1996. Photobook: Content-based manipulation of image databases. Int. J. Comput. Vision 18, 3 (June), 233–254. Google ScholarDigital Library
    48. Rubner, Y., Tomasi, C., and Guibas, L. J. 1998. A metric for distributions with applications to image databases. IEEE Computer Society, Washington, DC, USA, IEEE ICCV, 59–66. Google ScholarDigital Library
    49. Sadeghi, M. A., and Farhadi, A. 2011. Recognition using visual phrases. IEEE Computer Society, Washington, DC, USA, IEEE CVPR, 1745–1752. Google ScholarDigital Library
    50. Shao, T., Monszpart, A., Zheng, Y., Koo, B., Xu, W., Zhou, K., and Mitra, N. 2014. Imagining the unseen: Stability-based cuboid arrangements for scene understanding. ACM SIGGRAPH Asia.* Joint first authors. Google ScholarDigital Library
    51. Shechtman, E., and Irani, M. 2007. Matching local self-similarities across images and videos. In IEEE CVPR, 1–8.Google Scholar
    52. Smith, J. R., and Chang, S.-F. 1996. Visualseek: A fully automated content-based image query system. In Proceedings of the Fourth ACM International Conference on Multimedia, ACM, New York, NY, USA, MULTIMEDIA ’96, 87–98. Google ScholarDigital Library
    53. Teague, M. R. 1980. Image analysis via the general theory of moments*. J. Opt. Soc. Am. 70, 8 (Aug), 920–930.Google ScholarCross Ref
    54. Wang, J., and Hua, X.-S. 2011. Interactive image search by color map. ACM Trans. Intell. Syst. Technol. 3, 1, 12:1–12:23. Google ScholarDigital Library
    55. Wang, Y.-H., 2003. Image indexing and similarity retrieval based on spatial relationship model.Google Scholar
    56. Xu, H., Wang, J., Hua, X.-S., and Li, S. 2010. Image search by concept map. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, SIGIR ’10, 275–282. Google ScholarDigital Library
    57. Xu, K., Chen, K., Fu, H., Sun, W.-L., and Hu, S.-M. 2013. Sketch2scene: Sketch-based co-retrieval and co-placement of 3d models. ACM TOG 32, 4 (July), 123:1–123:15. Google ScholarDigital Library
    58. Yücer, K., Jacobson, A., Hornung, A., and Sorkine, O. 2012. Transfusive image manipulation. ACM TOG 31, 6 (Nov.), 176:1–176:9. Google ScholarDigital Library
    59. Zhang, D., and Lu, G. 2004. Review of shape representation and description techniques. Pattern Recognition 37, 1, 1–19.Google ScholarCross Ref
    60. Zhao, X., Wang, H., and Komura, T. 2014. Indexing 3d scenes using the interaction bisector surface. ACM TOG 33, 3 (June), 22:1–22:14. Google ScholarDigital Library
    61. Zheng, Y., Cohen-Or, D., Averkiou, M., and Mitra, N. J. 2014. Recurring part arrangements in shape collections. Computer Graphics Forum. Google ScholarDigital Library
    62. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., and Torr, P. 2015. Conditional random fields as recurrent neural networks. In IEEE ICCV. Google ScholarDigital Library
    63. Zhou, X. M., Ang, C. H., and Ling, T. W. 2001. Image retrieval based on object’s orientation spatial relationship. Pattern Recognition Letters 22, 5, 469–477. Google ScholarDigital Library
