“SFV: reinforcement learning of physical skills from videos” – ACM SIGGRAPH HISTORY ARCHIVES

2018 SA Technical Papers: Peng, “SFV: reinforcement learning of physical skills from videos”

Title:

    SFV: reinforcement learning of physical skills from videos

Session/Category Title:   Character animation

Abstract:


    Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.

