“VNect: real-time 3D human pose estimation with a single RGB camera” by Mehta, Sridhar, Sotnychenko, Rhodin, Shafiei, et al. …

  • ©Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, and Christian Theobalt

Title:

    VNect: real-time 3D human pose estimation with a single RGB camera

Session/Category Title: Get More Out of Your Photo



Abstract:


    We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control—thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method’s accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.
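    The abstract does not spell out how a fully-convolutional network can regress 2D and 3D joint positions jointly. One plausible reading, given here only as an illustrative assumption, is that the network predicts a 2D heatmap plus three coordinate maps per joint, and the 3D position is read from the coordinate maps at the heatmap peak. The sketch below shows that readout for a single joint on toy data; the names `argmax_2d`, `read_joint`, `x_map`, `y_map`, and `z_map` are hypothetical and the values are made up.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one H x W map, stored row by row


def argmax_2d(heatmap: Grid) -> Tuple[int, int]:
    """Return (row, col) of the largest value in a 2D grid."""
    best_r, best_c = 0, 0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


def read_joint(heatmap: Grid, x_map: Grid, y_map: Grid, z_map: Grid):
    """Read one joint's 2D map location and its 3D position.

    Assumption: the 2D location is the heatmap peak, and the 3D position
    is whatever the three coordinate maps store at that same pixel.
    """
    r, c = argmax_2d(heatmap)
    joint_2d = (c, r)  # (x, y) in map pixels
    joint_3d = (x_map[r][c], y_map[r][c], z_map[r][c])
    return joint_2d, joint_3d


# Toy 2 x 3 maps for a single joint (illustrative values only).
heatmap = [[0.1, 0.2, 0.1],
           [0.0, 0.9, 0.3]]
x_map = [[0.0, 0.0, 0.0], [0.0, 120.0, 0.0]]
y_map = [[0.0, 0.0, 0.0], [0.0, -40.0, 0.0]]
z_map = [[0.0, 0.0, 0.0], [0.0, 310.0, 0.0]]

joint_2d, joint_3d = read_joint(heatmap, x_map, y_map, z_map)
print(joint_2d, joint_3d)
```

    Because the readout works on whole-image maps, a scheme like this would not depend on a tight crop around the person, which is consistent with the abstract's claim. The kinematic skeleton fitting and temporal stabilization described in the abstract are not covered by this sketch.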


