“CrossLink: joint understanding of image and 3D model collections through shape and camera pose variations” by Hueting, Ovsjanikov and Mitra
Session/Category Title: Shapes and Images
Abstract:
Collections of images and 3D models hide in them many interesting aspects of our surroundings. Significant efforts have been devoted to organizing and exploring such data repositories. Most such efforts, however, process the two data modalities separately, and do not take full advantage of the complementary information that exists across domains, which can help solve difficult problems in one modality by exploiting the structure in the other. Beyond the obvious difference in data representations, a key difficulty in such joint analysis lies in the significant variability in the structure and inherent properties of the 2D and 3D data collections, which hinders cross-domain analysis and exploration. We introduce CrossLink, a system for joint image-3D model processing that uses the complementary strengths of each data modality to facilitate analysis and exploration. We first show how our system significantly improves the quality of text-based 3D model search by using side information from an image database. We then demonstrate how to consistently align the filtered 3D model collections and use them to re-sort image collections based on pose and shape attributes. We evaluate our framework both quantitatively and qualitatively on 2D image and 3D model collections spanning 20 object categories, and demonstrate how a wide variety of tasks in each data modality can strongly benefit from the complementary information present in the other, paving the way toward a richer 2D and 3D processing toolbox.
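The abstract states that image collections supply side information for filtering text-based 3D model search results, but gives no implementation details. The sketch below is purely illustrative and not the authors' actual pipeline: it assumes that some generic image descriptor (e.g., CNN features) has already been computed for category images and for rendered views of each candidate model, and it re-ranks the candidates by how well their best-matching view resembles the image collection. The names rerank_models_by_image_evidence, model_view_feats, and image_feats are hypothetical.

```python
import numpy as np

def rerank_models_by_image_evidence(model_view_feats, image_feats, top_k=50):
    """Re-rank text-retrieved 3D models using an image collection (illustrative sketch).

    model_view_feats: dict mapping model_id -> (num_views, d) array of
        descriptors for rendered views of that model (hypothetical input).
    image_feats: (num_images, d) array of descriptors for category images.

    Each model is scored by the average cosine similarity between the images
    and the model's best-matching rendered view, so models whose renderings
    resemble the image collection rise to the top of the ranking.
    """
    eps = 1e-8
    # L2-normalise so dot products become cosine similarities.
    imgs = image_feats / (np.linalg.norm(image_feats, axis=1, keepdims=True) + eps)

    scores = {}
    for model_id, views in model_view_feats.items():
        v = views / (np.linalg.norm(views, axis=1, keepdims=True) + eps)
        sim = v @ imgs.T                  # (num_views, num_images) similarity matrix
        best_per_image = sim.max(axis=0)  # best view for each image
        # Average over the top_k most supportive images for robustness to outliers.
        scores[model_id] = float(np.mean(np.sort(best_per_image)[-top_k:]))

    return sorted(scores, key=scores.get, reverse=True)
```

Taking the maximum over rendered views before averaging over images is one plausible way to handle the unknown camera pose of each photograph, in the spirit of the pose variations the paper targets; the real system may score and filter models quite differently.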


