“MoCap-solver: a neural solver for optical motion capture data” by Chen, Wang, Zhang, Xu, Zhang, et al. …

  • Kang Chen, Yupan Wang, Song-Hai Zhang, Sen-Zhe Xu, Weidong Zhang, and Shi-Min Hu







    In a conventional optical motion capture (MoCap) workflow, two processes are needed to turn captured raw marker sequences into correct skeletal animation sequences. First, the various tracking errors present in the markers must be fixed (cleaning or refining). Second, an agent skeletal mesh must be prepared for the actor/actress and used to determine skeleton information from the markers (re-targeting or solving). The whole process, normally referred to as solving MoCap data, is extremely time-consuming and labor-intensive, and is usually the most costly part of animation production; hence, there is great demand in industry for automated tools. In this work, we present MoCap-Solver, a production-ready neural solver for optical MoCap data. It directly produces skeleton sequences and clean marker sequences from raw MoCap markers, without any tedious manual operations. Our key idea is to use neural encoders for three key intrinsic components, namely the template skeleton, the marker configuration, and the motion, and to learn to predict these latent vectors from imperfect marker sequences containing noise and errors. Decoding these latent vectors directly recovers sequences of clean markers and skeletons. Moreover, we provide a novel normalization strategy based on learning a pose-dependent marker reliability function, which greatly improves system robustness. Experimental results demonstrate that our algorithm consistently outperforms the state of the art on both synthetic and real-world datasets.
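    To make the three-way decomposition in the abstract concrete, the following is a minimal illustrative sketch (not the authors' code) of the generative direction: given joint world transforms (derived from the template skeleton and the motion) and a marker configuration (per-marker local offsets with skinning-style weights over nearby joints), clean marker positions can be reconstructed. The function name, array shapes, and the linear-blend formulation are assumptions made for illustration.

    ```python
    import numpy as np

    def markers_from_components(joint_rot, joint_pos, offsets, weights):
        """Reconstruct marker world positions from the three components.

        joint_rot : (J, 3, 3) world rotation of each joint (skeleton + motion)
        joint_pos : (J, 3)    world position of each joint
        offsets   : (M, J, 3) marker offset in each joint's local frame
                              (the marker configuration)
        weights   : (M, J)    per-marker blend weights over joints (rows sum to 1)
        returns   : (M, 3)    clean marker world positions
        """
        # Map each marker's local offset into world space per joint ...
        per_joint = np.einsum('jab,mjb->mja', joint_rot, offsets) + joint_pos[None, :, :]
        # ... then blend the per-joint candidates with the configuration weights.
        return np.einsum('mj,mja->ma', weights, per_joint)
    ```

    In MoCap-Solver itself, neural encoders predict the latent codes of these three components from noisy raw markers, and the decoders invert that mapping; the forward model above only illustrates why recovering the skeleton, marker configuration, and motion is sufficient to recover clean markers.
    
    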


    1. Kfir Aberman, Peizhuo Li, Dani Lischinski, Olga Sorkine-Hornung, Daniel Cohen-Or, and Baoquan Chen. 2020. Skeleton-aware networks for deep motion retargeting. ACM Trans. Graph. 39, 4 (2020), 62.
    2. Ijaz Akhter, Tomas Simon, Sohaib Khan, Iain A. Matthews, and Yaser Sheikh. 2012. Bilinear spatiotemporal basis models. ACM Trans. Graph. 31, 2 (2012), 17:1–17:12.
    3. Andreas Aristidou, Daniel Cohen-Or, Jessica K. Hodgins, and Ariel Shamir. 2018. Self-similarity analysis for motion capture cleaning. CGF 37, 2 (2018), 297–309.
    4. Andreas Aristidou and Joan Lasenby. 2013. Real-time marker prediction and CoR estimation in optical motion capture. Vis. Comput. 29, 1 (2013), 7–26.
    5. Jan Baumann, Björn Krüger, Arno Zinke, and Andreas Weber. 2011. Data-driven completion of motion capture data. In Proc. of VRIPHYS. 111–118.
    6. Michael Burke and Joan Lasenby. 2016. Estimating missing marker positions using low dimensional Kalman smoothing. J. Biomechanics 49, 9 (2016), 1854–1858.
    7. Jinxiang Chai and Jessica K. Hodgins. 2005. Performance animation from low-dimensional control signals. ACM Trans. Graph. 24, 3 (2005), 686–696.
    8. CMU. 2000. CMU graphics lab motion capture database. http://mocap.cs.cmu.edu (2000).
    9. Klaus Dorfmüller-Ulhaas. 2007. Robust optical user motion tracking using a Kalman filter. (2007).
    10. Yinfu Feng, Mingming Ji, Jun Xiao, Xiaosong Yang, Jian J. Zhang, Yueting Zhuang, and Xuelong Li. 2015. Mining spatial-temporal patterns and structural sparsity for human motion data denoising. IEEE Trans. Cyber. 45, 12 (2015), 2693–2706.
    11. Yinfu Feng, Jun Xiao, Yueting Zhuang, Xiaosong Yang, Jian J. Zhang, and Rong Song. 2014. Exploiting temporal stability and low-rank structure for motion capture data refinement. Inf. Sci. 277 (2014), 777–793.
    12. Katerina Fragkiadaki, Sergey Levine, Panna Felsen, and Jitendra Malik. 2015. Recurrent network models for human dynamics. In Proc. of ICCV. 4346–4354.
    13. Félix G. Harvey, Mike Yurick, Derek Nowrouzezahrai, and Christopher J. Pal. 2020. Robust motion in-betweening. ACM Trans. Graph. 39, 4 (2020), 60.
    14. Gustav Eje Henter, Simon Alexanderson, and Jonas Beskow. 2020. MoGlow: probabilistic and controllable motion synthesis using normalising flows. ACM Trans. Graph. 39, 6 (2020), 236:1–236:14.
    15. Lorna Herda, Pascal Fua, Ralf Plänkers, Ronan Boulic, and Daniel Thalmann. 2000. Skeleton-based motion capture for robust reconstruction of human motion. In Proc. of CA. IEEE, 77.
    16. Daniel Holden. 2018. Robust solving of optical motion capture data by denoising. ACM Trans. Graph. 37, 4 (2018), 165:1–165:12.
    17. Daniel Holden, Jun Saito, and Taku Komura. 2016. A deep learning framework for character motion synthesis and editing. ACM Trans. Graph. 35, 4 (2016), 138:1–138:11.
    18. Daniel Holden, Jun Saito, Taku Komura, and Thomas Joyce. 2015. Learning motion manifolds with convolutional autoencoders. In SIGGRAPH Asia Technical Briefs. 18:1–18:4.
    19. Alexander Hornung, Sandip Sar-Dessai, and Leif Kobbelt. 2005. Self-calibrating optical motion tracking for articulated bodies. In Proc. of VR. 75–82.
    20. Manuel Kaufmann, Emre Aksan, Jie Song, Fabrizio Pece, Remo Ziegler, and Otmar Hilliges. 2020. Convolutional autoencoders for human motion infilling. CoRR (2020).
    21. Adam G. Kirk, James F. O'Brien, and David A. Forsyth. 2005. Skeletal parameter estimation from optical motion capture data. In Proc. of CVPR. 782–788.
    22. Ranch Y. Q. Lai, Pong C. Yuen, and Kelvin K. W. Lee. 2011. Motion capture data completion and denoising by singular value thresholding. In EG Short Papers. 45–48.
    23. Kyungho Lee, Seyoung Lee, and Jehee Lee. 2018. Interactive character animation by learning multi-objective control. ACM Trans. Graph. 37, 6 (2018), 180:1–180:10.
    24. Lei Li, James McCann, Nancy S. Pollard, and Christos Faloutsos. 2010. BoLeRO: A principled technique for including bone length constraints in motion capture occlusion filling. In Proc. of SCA. 179–188.
    25. Shujie Li, Yang Zhou, Haisheng Zhu, Wenjun Xie, Yang Zhao, and Xiaoping Liu. 2019. Bidirectional recurrent autoencoder for 3D skeleton motion data refinement. Comput. Graph. 81 (2019), 92–103.
    26. Shu-Jie Li, Hai-Sheng Zhu, Liping Zheng, and Lin Li. 2020. A perceptual-based noise-agnostic 3D skeleton motion data refinement network. IEEE Access 8 (2020), 52927–52940.
    27. Guodong Liu and Leonard McMillan. 2006. Estimation of missing markers in human motion capture. Vis. Comput. 22, 9-11 (2006), 721–728.
    28. Xin Liu, Yiu-ming Cheung, Shu-Juan Peng, Zhen Cui, Bineng Zhong, and Ji-Xiang Du. 2014. Automatic motion capture data denoising via filtered subspace clustering and low rank matrix approximation. Signal Process. 105 (2014), 350–362.
    29. Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: A skinned multi-person linear model. ACM Trans. Graph. 34, 6 (2015), 248.
    30. Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. 2019. AMASS: Archive of motion capture as surface shapes. In Proc. of ICCV. 5441–5450.
    31. Utkarsh Mall, G. Roshan Lal, Siddhartha Chaudhuri, and Parag Chaudhuri. 2017. A deep recurrent framework for cleaning motion capture data. CoRR (2017).
    32. Sang Il Park and Jessica K. Hodgins. 2006. Capturing and animating skin deformation in human motion. ACM Trans. Graph. 25, 3 (2006), 881–889.
    33. Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. 2019. Expressive body capture: 3D hands, face, and body from a single image. In Proc. of CVPR. 10975–10985.
    34. Dario Pavllo, Mathias Delahaye, Thibault Porssut, Bruno Herbelin, and Ronan Boulic. 2019. Real-time neural network prediction for handling two-hands mutual occlusions. Comput. Graph. X 2 (2019).
    35. Dario Pavllo, Christoph Feichtenhofer, Michael Auli, and David Grangier. 2020. Modeling human motion with Quaternion-based neural networks. Int. J. Comput. Vis. 128, 4 (2020), 855–872.
    36. Maksym Perepichka, Daniel Holden, Sudhir Mudur, and Tiberiu Popa. 2019. Robust marker trajectory repair for MOCAP using kinematic reference. In Proc. of MIG. 1–10.
    37. Kathleen M. Robinette, Sherri Blackwell, Hein Daanen, Mark Boehmer, and Scott Fleming. 2002. Civilian American and European surface anthropometry resource (CAESAR). Technical Report.
    38. Javier Romero, Dimitrios Tzionas, and Michael J. Black. 2017. Embodied hands: Modeling and capturing hands and bodies together. ACM Trans. Graph. 36, 6 (2017), 245.
    39. Jochen Tautges, Arno Zinke, Björn Krüger, Jan Baumann, Andreas Weber, Thomas Helten, Meinard Müller, Hans-Peter Seidel, and Bernd Eberhardt. 2011. Motion reconstruction using sparse accelerometer data. ACM Trans. Graph. 30, 3 (2011), 18:1–18:12.
    40. Graham W. Taylor, Geoffrey E. Hinton, and Sam T. Roweis. 2006. Modeling human motion using binary latent variables. In Proc. of NIPS. 1345–1352.
    41. Zhao Wang, Shuang Liu, Rongqiang Qian, Tao Jiang, Xiaosong Yang, and Jian J. Zhang. 2016. Human motion data refinement unitizing structural sparsity and spatial-temporal information. In Proc. of ICSP. 975–982.
    42. Jun Xiao, Yinfu Feng, Mingming Ji, Xiaosong Yang, Jian J. Zhang, and Yueting Zhuang. 2015. Sparse motion bases selection for human motion denoising. Signal Process. 110 (2015), 108–122.
    43. Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proc. of AAAI. 7444–7452.
    44. Victor B. Zordan and Nicholas C. Van Der Horst. 2003. Mapping optical motion capture data to skeletal motion using a physical model. In Proc. of SCA. 245–250.
