“VNect: real-time 3D human pose estimation with a single RGB camera” by Mehta, Sridhar, Sotnychenko, Rhodin, Shafiei, et al. …
Title:
- VNect: real-time 3D human pose estimation with a single RGB camera
Session/Category Title: Get More Out of Your Photo
Presenter(s)/Author(s):
- Dushyant Mehta
- Srinath Sridhar
- Oleksandr Sotnychenko
- Helge Rhodin
- Mohammad Shafiei
- Hans-Peter Seidel
- Weipeng Xu
- Dan Casas
- Christian Theobalt
Abstract:
We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control—thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method’s accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.
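The abstract does not spell out the layout of the network output. As a minimal sketch, assuming the fully-convolutional regressor emits one 2D heatmap per joint plus three per-joint maps holding root-relative x, y, and z coordinates, a joint's 2D and 3D positions could be read out together at the heatmap peak. The function name `read_joints`, the array shapes, and the map names are illustrative assumptions, not the authors' code.

```python
import numpy as np


def read_joints(heatmaps, loc_x, loc_y, loc_z):
    """Read 2D and 3D joint positions from per-joint maps.

    All inputs are assumed to have shape (num_joints, height, width).
    Returns (joints_2d, joints_3d) with shapes (num_joints, 2) and
    (num_joints, 3).
    """
    num_joints, height, width = heatmaps.shape

    # Find the peak of each joint's heatmap.
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (height, width))

    # 2D position as (u, v) = (column, row) in heatmap pixels.
    joints_2d = np.stack([cols, rows], axis=1)

    # 3D position: sample the coordinate maps at the 2D peak.
    j = np.arange(num_joints)
    joints_3d = np.stack(
        [loc_x[j, rows, cols], loc_y[j, rows, cols], loc_z[j, rows, cols]],
        axis=1,
    )
    return joints_2d, joints_3d


if __name__ == "__main__":
    # Toy input: 2 joints on a 4x5 grid, with one clear peak per joint.
    heatmaps = np.zeros((2, 4, 5))
    heatmaps[0, 1, 3] = 1.0
    heatmaps[1, 2, 0] = 1.0
    loc_x = np.full((2, 4, 5), 0.1)
    loc_y = np.full((2, 4, 5), 0.2)
    loc_z = np.full((2, 4, 5), 0.3)
    print(read_joints(heatmaps, loc_x, loc_y, loc_z))
```

Because the maps are sampled wherever the heatmap peaks, a readout of this kind does not depend on the person being centered in the frame, which fits the abstract's remark that tightly cropped input is not required. The kinematic skeleton fitting described in the abstract would then take such per-frame estimates as its input.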