“Online optical marker-based hand tracking with deep labels” by Han, Liu and Wang

  • ©Shangchen Han, Beibei Liu, and Robert Y. Wang



Entry Number: 166



Session/Category Title: Bodies in Motion: Human Performance Capture




    Optical marker-based motion capture is the dominant way to obtain high-fidelity human body animation for special effects, movies, and video games. However, motion capture has seen limited application to the human hand because it is difficult to automatically identify (or label) identical markers on self-similar fingers. We propose a technique that frames the labeling problem as a keypoint regression problem conducive to a solution using convolutional neural networks. We demonstrate the robustness of our labeling solution to occlusion, ghost markers, hand shape, and even motions involving two hands or handheld objects. Our technique is equally applicable to sparse or dense marker sets and can run in real time to support interaction prototyping with high-fidelity hand tracking and hand presence in virtual reality.
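One way to picture labeling as keypoint regression: if a network has already predicted a 3D position for each named marker, labels can be assigned to the unlabeled observed markers by a minimum-cost matching. The sketch below is an illustration under that assumption, not the authors' implementation. The function `label_markers`, the marker names, and the coordinates are hypothetical, the "predictions" are hard-coded stand-ins for network output, and the brute-force search is only practical for a handful of markers.

```python
from itertools import permutations
from math import dist


def label_markers(predicted, observed):
    """Assign each predicted label to a distinct observed marker.

    predicted: dict label -> (x, y, z); hypothetical stand-in for the
               keypoint positions a regression network would output.
    observed:  list of unlabeled (x, y, z) marker positions; extra
               "ghost" markers are simply left unassigned.
    Returns a dict label -> index into `observed` that minimizes the
    total Euclidean distance (brute force over all assignments).
    """
    labels = list(predicted)
    if len(observed) < len(labels):
        # Occluded markers are not handled in this sketch.
        raise ValueError("fewer observed markers than labels")
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(observed)), len(labels)):
        cost = sum(dist(predicted[l], observed[i]) for l, i in zip(labels, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return dict(zip(labels, best))


# Hypothetical predicted keypoints (arbitrary units).
predicted = {
    "thumb_tip": (0.0, 0.0, 0.0),
    "index_tip": (3.0, 0.0, 0.0),
    "middle_tip": (6.0, 0.0, 0.0),
}
# Unlabeled observations in arbitrary order, including one ghost marker.
observed = [(6.1, 0.2, 0.0), (0.1, -0.1, 0.0), (20.0, 20.0, 20.0), (2.9, 0.1, 0.0)]

assignment = label_markers(predicted, observed)
print(assignment)
```

For realistic marker counts, a polynomial-time assignment solver would replace the exhaustive search, and unmatched or distant observations would need an explicit rejection threshold.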


