Accurate realtime full-body motion capture using a single depth camera

We present a fast, automatic method for accurately capturing full-body motion data using a single depth camera. At the core of our system lies a realtime registration process that accurately reconstructs 3D human poses from individual monocular depth images, even in the presence of significant occlusions. The key idea is to formulate the registration problem in a Maximum A Posteriori (MAP) framework and iteratively register a 3D articulated human body model with monocular depth cues via linear system solvers. We integrate depth data, silhouette information, full-body geometry, temporal pose priors, and occlusion reasoning into a unified MAP estimation framework. Our 3D tracking process, however, requires manual initialization and recovery from failures. We address this challenge by combining 3D tracking with 3D pose detection. This combination not only automates the whole process but also significantly improves the robustness and accuracy of the system. Our whole algorithm is highly parallel and is therefore easily implemented on a GPU. We demonstrate the power of our approach by capturing a wide range of human movements in real time, and we achieve state-of-the-art accuracy in our comparison against alternative systems such as Kinect [2012].
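The core idea in the abstract, registering an articulated model by maximizing a posterior through repeated linear solves, can be illustrated with a deliberately tiny sketch. Everything below is an assumption for illustration and not the authors' implementation: a hypothetical planar two-link chain (the names `LINK1`, `LINK2`, `forward_kinematics`, `jacobian`, `solve_2x2`, and `map_register` are made up) stands in for the full-body model, observed joint positions stand in for matched depth points, and only a Gaussian data term plus a Gaussian temporal prior are modeled (no silhouette, geometry, or occlusion terms). Under those assumptions the negative log posterior is a nonlinear least-squares energy, and each Gauss-Newton step reduces to one small linear system.

```python
import math

# Hypothetical toy model (not the paper's body model): a planar two-link chain
# standing in for an articulated skeleton. theta = (shoulder, elbow) angles in radians.
LINK1, LINK2 = 1.0, 1.0


def forward_kinematics(theta):
    """Return model points [elbow_x, elbow_y, hand_x, hand_y] for joint angles theta."""
    t1, t2 = theta
    ex, ey = LINK1 * math.cos(t1), LINK1 * math.sin(t1)
    return [ex, ey,
            ex + LINK2 * math.cos(t1 + t2),
            ey + LINK2 * math.sin(t1 + t2)]


def jacobian(theta):
    """Analytic 4x2 Jacobian of forward_kinematics with respect to theta."""
    t1, t2 = theta
    s1, c1 = math.sin(t1), math.cos(t1)
    s12, c12 = math.sin(t1 + t2), math.cos(t1 + t2)
    return [[-LINK1 * s1, 0.0],
            [LINK1 * c1, 0.0],
            [-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
            [LINK1 * c1 + LINK2 * c12, LINK2 * c12]]


def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def map_register(observed, theta_prev, lam, iters=20):
    """Gauss-Newton minimization of the (assumed) negative log posterior
        E(theta) = |fk(theta) - observed|^2 + lam * |theta - theta_prev|^2,
    i.e. a Gaussian data likelihood plus a Gaussian temporal prior.
    Each iteration solves (J^T J + lam I) delta = -(J^T r + lam (theta - theta_prev))."""
    theta = list(theta_prev)  # initialize from the previous frame's pose
    for _ in range(iters):
        fk = forward_kinematics(theta)
        r = [fk[i] - observed[i] for i in range(4)]  # data residuals
        J = jacobian(theta)
        # Normal-equation matrix: J^T J plus the prior weight on the diagonal
        A = [[sum(J[k][i] * J[k][j] for k in range(4)) + (lam if i == j else 0.0)
              for j in range(2)] for i in range(2)]
        # Right-hand side: minus half the gradient of E
        b = [-(sum(J[k][i] * r[k] for k in range(4)) + lam * (theta[i] - theta_prev[i]))
             for i in range(2)]
        delta = solve_2x2(A, b)
        theta = [theta[i] + delta[i] for i in range(2)]
    return theta


if __name__ == "__main__":
    true_pose = [0.5, 0.8]
    observed = forward_kinematics(true_pose)
    print(map_register(observed, theta_prev=[0.4, 0.7], lam=1e-3))
```

With a small prior weight `lam`, the estimate should move from the previous-frame pose to nearly the observed pose; with a very large `lam`, the temporal prior dominates and the pose barely moves. The design point the sketch is meant to show is that a Gaussian likelihood and a Gaussian prior turn the MAP update into a linear solve per iteration, which is the kind of step that parallelizes well.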

References:

1. Amit, Y., and Geman, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation. 9(7):1545–1588.
2. Baak, A., Müller, M., Bharaj, G., Seidel, H.-P., and Theobalt, C. 2011. A data-driven approach for real-time full body pose reconstruction from a depth camera. In IEEE 13th International Conference on Computer Vision (ICCV), 1092–1099.
3. Baker, S., and Matthews, I. 2004. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision. 56(3):221–255.
4. Bregler, C., Malik, J., and Pullen, K. 2004. Twist based acquisition and tracking of animal and human kinematics. International Journal of Computer Vision. 56(3):179–194.
5. Chai, J., and Hodgins, J. 2005. Performance animation from low-dimensional control signals. ACM Transactions on Graphics. 24(3):686–696.
6. Ganapathi, V., Plagemann, C., Koller, D., and Thrun, S. 2010. Real time motion capture using a single time-of-flight camera. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 755–762.
7. Girshick, R., Shotton, J., Kohli, P., Criminisi, A., and Fitzgibbon, A. 2011. Efficient regression of general-activity human poses from depth images. In Proceedings of IEEE 13th International Conference on Computer Vision, 415–422.
8. Grest, D., Kruger, V., and Koch, R. 2007. Single view motion tracking by depth and silhouette information. In Proceedings of the 15th Scandinavian Conference on Image Analysis (SCIA), 719–729.
9. Kinect, 2012. Microsoft Kinect for Xbox 360.
10. Knoop, S., Vacek, S., and Dillmann, R. 2006. Sensor fusion for 3D human body tracking with an articulated 3D body model. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1686–1691.
11. Lepetit, V., and Fua, P. 2006. Keypoint recognition using randomized trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(9):1465–1479.
12. Liu, H., Wei, X., Chai, J., Ha, I., and Rhee, T. 2011. Realtime human motion control with a small number of inertial sensors. In Symposium on Interactive 3D Graphics and Games, ACM, I3D ’11, 133–140.
13. Microsoft Kinect API for Windows, 2012. http://www.microsoft.com/en-us/kinectforwindows/.
14. Moeslund, T. B., Hilton, A., and Kruger, V. 2006. A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding. 104:90–126.
15. Plagemann, C., Ganapathi, V., Koller, D., and Thrun, S. 2010. Realtime identification and localization of body parts from depth images. In Proceedings of International Conferences on Robotics and Automation (ICRA 2010), 3108–3113.
16. Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., and Blake, A. 2011. Real-time human pose recognition in parts from a single depth image. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1297–1304.
17. Siddiqui, M., and Medioni, G. 2010. Human pose estimation from a single view point, real-time range sensor. In CVCG at CVPR.
18. Slyper, R., and Hodgins, J. 2008. Action capture with accelerometers. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 193–199.
19. Tautges, J., Zinke, A., Krüger, B., Baumann, J., Weber, A., Helten, T., Müller, M., Seidel, H.-P., and Eberhardt, B. 2011. Motion reconstruction using sparse accelerometer data. ACM Transactions on Graphics. 30(3):18:1–18:12.
20. Vicon Systems, 2011. http://www.vicon.com.
21. Ye, M., Wang, X., Yang, R., Ren, L., and Pollefeys, M. 2011. Accurate 3D pose estimation from a single depth image. In Proceedings of IEEE 13th International Conference on Computer Vision, 731–738.

SIGGRAPH Asia 2012: Technical Papers
