“InteractionFusion: real-time reconstruction of hand poses and deformable objects in hand-object interactions” by Zhang, Bo, Yong and Xu

  • ©Zi-Hao Bo, Jun-Hai Yong, and Feng Xu

Session/Category Title: Human Capture and Modeling


Abstract:


    Hand-object interaction is challenging to reconstruct but important for many applications, such as HCI and robotics. Previous works focus on either the hand or the object, whereas we jointly track the hand poses, fuse the 3D object model, and reconstruct its rigid and nonrigid motions, all in real time. To achieve this, we first use a DNN to segment the hand and the object in the two input depth streams, and a pre-trained LSTM network to predict the current hand pose from the previous poses. With this information, we propose a unified optimization framework that jointly tracks the hand poses and object motions. The optimization integrates the segmented depth maps, the predicted motion, a spatially and temporally varying rigidity regularizer, and a real-time contact constraint. A nonrigid fusion technique is then applied to reconstruct the object model. Experiments demonstrate that our method can resolve the ambiguity caused by heavy occlusions between the hand and the object, and generates accurate results for various objects and interacting motions.
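
    The abstract does not spell out the optimization, so the following is only a minimal sketch of the general idea of combining a data term with a predicted-motion term in one least-squares problem. It is not the authors' solver: the function name solve_joint, the weight w_pred, and the linear toy data term (A, b) are all assumptions made for illustration, and the rigidity and contact terms are omitted. It shows how a predicted pose can pin down a parameter that the observations leave unconstrained, for example because of occlusion.

```python
import numpy as np


def solve_joint(A, b, x_pred, w_pred):
    """Minimize ||A x - b||^2 + w_pred * ||x - x_pred||^2.

    Hypothetical toy formulation: (A, b) stands in for a linearized
    data term, x_pred for a pose predicted from previous frames.
    Both terms are stacked into a single linear least-squares system.
    """
    n = x_pred.shape[0]
    s = np.sqrt(w_pred)
    A_stacked = np.vstack([A, s * np.eye(n)])
    b_stacked = np.concatenate([b, s * x_pred])
    x, *_ = np.linalg.lstsq(A_stacked, b_stacked, rcond=None)
    return x


# Toy case: the observations constrain only the first two of three
# pose parameters; the third is "occluded" and has no data term.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
b = np.array([0.5, -0.2])
x_pred = np.array([0.4, -0.1, 0.3])

x = solve_joint(A, b, x_pred, w_pred=0.1)
print(x)
```

    In this toy setting the unobserved third parameter falls back to its predicted value, while the observed parameters stay close to the data and are pulled only slightly toward the prediction.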

