“A Locality-based Neural Solver for Optical Motion Capture” by Pan, Zheng, Jiang, Xu, Gu, et al. …
Title:
- A Locality-based Neural Solver for Optical Motion Capture
Session/Category Title: Motion Capture and Reconstruction
Abstract:
We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network that treats markers and joints as different types of nodes and uses graph convolution operations to extract the local features of markers and joints and transform them into clean motions. To deal with anomalous markers (e.g., missing markers or markers with large tracking errors), our key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but weaker correlations with other markers. We refer to this property as locality, and it enables us to fill in missing markers (e.g., markers lost to occlusion). We also identify marker outliers caused by tracking errors by examining their acceleration profiles. Furthermore, we propose a training regime based on representation learning and data augmentation, in which the model is trained on masked data. The masking schemes are designed to mimic the missing and noisy markers often observed in real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparisons show that our method outperforms state-of-the-art methods, reducing the position error of predicted occluded markers by approximately 20%, which in turn reduces the error of the reconstructed joint rotations and positions by 30%. The code and data for this paper are available at github.com/localmocap/LocalMoCap.
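The abstract names three mechanisms: corrupting clean data with masks that mimic missing and noisy markers, filling a missing marker from its immediate neighbors (locality), and flagging outliers from acceleration profiles. The pure-Python sketch below illustrates each one with a simple hand-written heuristic, assuming a sequence stored as `frames[t][m]` (position of marker `m` at frame `t`, or `None` when missing). It is not the authors' graph neural network or their released code. The function names `mask_frames`, `fill_from_neighbors` and `flag_acceleration_outliers`, the `neighbors` mapping, the translation-only fill, and the local-peak acceleration threshold are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float, float]
# frames[t][m] is the (x, y, z) position of marker m at frame t, or None if missing.
Frames = List[List[Optional[Point]]]


def mask_frames(frames: Frames, drop_prob: float, noise_std: float,
                rng: random.Random) -> Tuple[Frames, List[List[bool]]]:
    """Hypothetical augmentation: drop markers at random and jitter the rest.

    Returns the corrupted copy and a mask in which True marks a dropped marker.
    """
    corrupted: Frames = []
    mask: List[List[bool]] = []
    for row in frames:
        new_row: List[Optional[Point]] = []
        mask_row: List[bool] = []
        for p in row:
            if p is None or rng.random() < drop_prob:
                new_row.append(None)
                mask_row.append(True)
            else:
                new_row.append(tuple(v + rng.gauss(0.0, noise_std) for v in p))
                mask_row.append(False)
        corrupted.append(new_row)
        mask.append(mask_row)
    return corrupted, mask


def _centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def fill_from_neighbors(frames: Frames, t: int, m: int,
                        neighbors: Dict[int, List[int]]) -> Optional[Point]:
    """Estimate marker m at frame t using only its neighboring markers.

    Finds the nearest frame where m and all its neighbors are visible, takes the
    offset of m from the neighbor centroid there, and re-applies that offset to
    the neighbor centroid at frame t (translation-only local rigidity assumption).
    """
    nbrs = neighbors.get(m, [])
    if not nbrs or any(frames[t][j] is None for j in nbrs):
        return None
    # Try reference frames in order of increasing temporal distance from t.
    for r in sorted(range(len(frames)), key=lambda s: abs(s - t)):
        ref = frames[r]
        if ref[m] is None or any(ref[j] is None for j in nbrs):
            continue
        c_ref = _centroid([ref[j] for j in nbrs])
        c_now = _centroid([frames[t][j] for j in nbrs])
        return tuple(ref[m][k] - c_ref[k] + c_now[k] for k in range(3))
    return None


def flag_acceleration_outliers(frames: Frames, fps: float,
                               threshold: float) -> Set[Tuple[int, int]]:
    """Flag (frame, marker) pairs whose acceleration is a local peak above threshold.

    Acceleration uses the central second difference (p[t+1] - 2p[t] + p[t-1]) / dt^2;
    frames with a missing marker in the 3-frame window are assigned 0.
    """
    dt2 = (1.0 / fps) ** 2
    T = len(frames)
    M = len(frames[0]) if T else 0
    acc = [[0.0] * M for _ in range(T)]
    for t in range(1, T - 1):
        for m in range(M):
            p, c, n = frames[t - 1][m], frames[t][m], frames[t + 1][m]
            if p is None or c is None or n is None:
                continue
            acc[t][m] = math.sqrt(sum(((n[k] - 2 * c[k] + p[k]) / dt2) ** 2
                                      for k in range(3)))
    # A one-frame spike also inflates the second difference at t-1 and t+1,
    # so keep only the temporal local maximum.
    return {(t, m) for t in range(1, T - 1) for m in range(M)
            if acc[t][m] > threshold
            and acc[t][m] >= acc[t - 1][m] and acc[t][m] >= acc[t + 1][m]}


# Usage: three markers moving along x at constant velocity; marker 1 sits between 0 and 2.
clean: Frames = [[(0.01 * t + dx, 0.0, 1.0) for dx in (-0.1, 0.0, 0.1)] for t in range(10)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

occluded = [row[:] for row in clean]
occluded[4][1] = None
print(fill_from_neighbors(occluded, 4, 1, neighbors))  # close to (0.04, 0.0, 1.0)

spiked = [row[:] for row in clean]
x, y, z = clean[5][1]
spiked[5][1] = (x + 0.5, y, z)  # a 0.5 m one-frame tracking glitch
print(flag_acceleration_outliers(spiked, fps=100.0, threshold=100.0))  # {(5, 1)}
```

The translation-only fill ignores any rotation of the local marker cluster, which is one reason a fixed heuristic like this is only a rough stand-in for a model that learns neighbor correlations from data.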