“Motion capture from body-mounted cameras” by Shiratori, Park, Sigal, Sheikh and Hodgins

  • ©Takaaki Shiratori, Hyun Soo Park, Leonid Sigal, Yaser Sheikh, and Jessica K. Hodgins




    Motion capture from body-mounted cameras



    Motion capture technology generally requires that recordings be performed in a laboratory or closed stage setting with controlled lighting. This restriction precludes the capture of motions that require an outdoor setting or the traversal of large areas. In this paper, we present the theory and practice of using body-mounted cameras to reconstruct the motion of a subject. Outward-looking cameras are attached to the limbs of the subject, and the joint angles and root pose are estimated through non-linear optimization. The optimization objective function incorporates terms for image matching error and temporal continuity of motion. Structure-from-motion is used to estimate the skeleton structure and to provide initialization for the non-linear optimization procedure. Global motion is estimated and drift is controlled by matching the captured set of videos to reference imagery. We show results in settings where capture would be difficult or impossible with traditional motion capture systems, including walking outside and swinging on monkey bars. The quality of the motion reconstruction is evaluated by comparing our results against motion capture data produced by a commercially available optical system.
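
    As a rough illustration of the kind of objective the abstract describes (an image matching term plus a temporal continuity term), one generic form is sketched below. Every symbol here is an assumption introduced for illustration, not notation taken from the paper: \theta_t stands for the joint angles and root pose at frame t, X_j for 3D scene points recovered by structure-from-motion, x_{t,c,j} for the 2D feature observed for point j in body-mounted camera c at frame t, V_{t,c} for the set of points visible in that image, \pi_c for the projection into camera c given the pose, and \lambda for a smoothness weight. The actual objective may use different residuals, weights, or a different smoothness model.

    ```latex
    % Hypothetical sketch: reprojection error plus first-order temporal smoothness.
    E(\theta_1, \dots, \theta_T)
      = \sum_{t=1}^{T} \sum_{c=1}^{C} \sum_{j \in V_{t,c}}
          \left\| x_{t,c,j} - \pi_c\!\left(\theta_t, X_j\right) \right\|^2
      + \lambda \sum_{t=2}^{T} \left\| \theta_t - \theta_{t-1} \right\|^2
    ```

    Minimizing the first sum alone fits each frame independently and can jitter. The second sum penalizes frame-to-frame changes in pose, trading some per-frame fit for smoother motion as \lambda grows.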


