“VRProp-Net: Real-time Interaction with Virtual Props” by Taylor, McNicholas and Cosker

  • ©Catherine Taylor, Robin McNicholas, and Darren Cosker



Entry Number: 31


    VRProp-Net: Real-time Interaction with Virtual Props




    Virtual and Augmented Reality (VR and AR) are two fast-growing media, not only in the entertainment industry but also in health, education and engineering. A good VR or AR application seamlessly merges the real and virtual worlds, making the user feel fully immersed. Traditionally, a computer-generated object is interacted with using controllers or hand gestures [HTC 2019; Microsoft 2019; Oculus 2019]. However, these motions can feel unnatural and do not accurately represent the motion of interacting with a real object. Alternatively, a physical object can be used to control the motion of a virtual object. At present, this can be done by tracking purely rigid motion using an external sensor [HTC 2019]. Another option is to track a sparse set of markers, for example with a motion capture system, and use their positions to drive the motion of an underlying non-rigid model. However, this approach is sensitive to changes in marker position and to occlusions, and often involves costly non-standard hardware [Vicon 2019]. In addition, these approaches often require a virtual model to be manually sculpted and rigged, which can be a time-consuming process. Neural networks have been shown to be successful tools in computer vision, with several key methods using networks for tracking rigid and non-rigid motion in RGB images [Andrychowicz et al. 2018; Kanazawa et al. 2018; Pumarola et al. 2018]. While these methods show potential, they rely on multiple RGB cameras or on large amounts of labelled training data that are costly to obtain. To address these limitations, we propose an end-to-end pipeline for creating interactive virtual props from real-world physical objects. As part of our pipeline, we propose a new neural network, VRProp-Net, based on a Wide Residual Network [Zagoruyko and Komodakis 2016], to accurately predict rigid and non-rigid deformation parameters from unlabelled RGB images. We compare VRProp-Net with a basic ResNet34 [He et al. 2016] at predicting 3D pose and shape for non-rigid objects. We demonstrate our results on several rigid and non-rigid objects.
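The abstract does not say how the predicted rigid and non-rigid deformation parameters are represented or applied to the virtual prop. As a minimal illustrative sketch, and not the authors' method, the Python below assumes the rigid part is a rotation matrix plus a translation vector and the non-rigid part is a set of weights over a linear basis of per-vertex displacements. The names `rotation_z`, `deform`, `basis` and `weights` are hypothetical and do not come from the paper.

```python
import math

def rotation_z(angle):
    """Return a 3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def deform(rest_vertices, basis, weights, rotation, translation):
    """Apply assumed non-rigid offsets in the rest frame, then a rigid transform.

    rest_vertices: list of [x, y, z] positions of the undeformed model.
    basis: list of displacement modes, each a list of [dx, dy, dz] per vertex.
    weights: one scalar per mode (the assumed non-rigid parameters).
    rotation, translation: the assumed rigid parameters.
    """
    posed = []
    for i, v in enumerate(rest_vertices):
        # Non-rigid part: rest position plus the weighted sum of mode displacements.
        shaped = [
            v[k] + sum(w * mode[i][k] for w, mode in zip(weights, basis))
            for k in range(3)
        ]
        # Rigid part: rotate the shaped vertex, then translate it.
        posed.append([
            sum(rotation[r][k] * shaped[k] for k in range(3)) + translation[r]
            for r in range(3)
        ])
    return posed

# Toy two-vertex "prop" with a single mode that lifts the second vertex in y.
rest_vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
basis = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
weights = [0.5]
posed = deform(rest_vertices, basis, weights,
               rotation_z(math.pi / 2), [0.0, 0.0, 1.0])
print(posed)
```

In a pipeline of the kind described, a network would supply `weights`, `rotation` and `translation` for each frame; here they are fixed by hand purely to show how such parameters could drive a mesh.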


    • M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. W. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. 2018. Learning Dexterous In-Hand Manipulation. CoRR (2018).
    • R. D. Cook. 1989. Concepts and applications of finite element analysis (3rd ed.).
    • K. He, X. Zhang, S. Ren, and J. Sun. 2016. Identity mappings in deep residual networks. In European conference on computer vision. 630–645.
    • HTC. 2019. Discover Virtual Reality Beyond Imagination. https://www.vive.com/uk/.
    • A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik. 2018. End-to-end recovery of human shape and pose. In Proceedings of the CVPR. 7122–7131.
    • Microsoft. 2019. Microsoft HoloLens | Mixed Reality Technology for Business. https://www.microsoft.com/en-us/hololens.
    • Oculus. 2019. Oculus Rift. https://www.oculus.com/rift/.
    • A. Pumarola, A. Agudo, L. Porzi, A. Sanfeliu, V. Lepetit, and F. Moreno-Noguer. 2018. Geometry-aware network for non-rigid shape prediction from a single view. In Proceedings of CVPR. 4681–4690.
    • Vicon. 2019. Motion Capture Systems. https://www.vicon.com/.
    • S. Zagoruyko and N. Komodakis. 2016. Wide residual networks. arXiv preprint arXiv:1605.07146 (2016).