“Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views” by Kang, Lee, Zhang and Lee – ACM SIGGRAPH HISTORY ARCHIVES

Conference:

    SIGGRAPH Asia 2023

Type(s):

    Technical Papers

Title:

    Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views

Session/Category Title:   Multidisciplinary Fusion


Presenter(s)/Author(s):

    Kang, Lee, Zhang and Lee

Abstract:


    We present Ego3DPose, a highly accurate binocular egocentric 3D pose reconstruction system. The binocular egocentric setup is practical and useful in various applications; however, it remains largely under-explored. It suffers from low pose estimation accuracy due to viewing distortion, severe self-occlusion, and the limited field of view covering the joints in egocentric 2D images. Here, we notice that two important 3D cues contained in the egocentric binocular input, stereo correspondences and perspective, are neglected. Current methods rely heavily on 2D image features and learn 3D information only implicitly, which biases them toward commonly observed motions and leads to low overall accuracy. We observe that they fail not only in challenging occlusion cases but also in estimating the positions of visible joints. To address these challenges, we propose two novel approaches. First, we design a two-path network architecture with a new path that estimates the pose of each limb independently. This path learns to output the 3D orientation of each limb, together with a confidence, from that limb's heatmaps in the stereo views. Because it does not rely on full-body information, it alleviates bias toward learned full-body poses. Second, we leverage the egocentric view of body limbs, which exhibits strong perspective variance (e.g., a hand appears significantly larger when it is close to the camera). We propose a new perspective-aware representation based on trigonometry that enables the network to estimate the 3D orientation of limbs. Finally, we develop an end-to-end pose reconstruction network that combines both techniques. Our comprehensive evaluations demonstrate that Ego3DPose outperforms state-of-the-art models, reducing pose estimation error (MPJPE) by 23.1% on the UnrealEgo dataset. Our qualitative results highlight the superiority of our approach across a range of scenarios and challenges.
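The abstract reports accuracy as MPJPE (mean per-joint position error), the average Euclidean distance between predicted and ground-truth 3D joint positions, and describes predicting a 3D orientation for each limb. The Python sketch below is a minimal illustration under stated assumptions, not the Ego3DPose implementation. It encodes a limb direction with a generic azimuth/elevation (sine/cosine) parameterization rather than the paper's perspective-aware representation, assumes bone lengths are known, chains limb directions into joint positions along a single kinematic chain, and scores the result with MPJPE. All function names (`direction_from_angles`, `angles_from_direction`, `chain_limbs`, `mpjpe`, `relative_reduction`) and the sample values are hypothetical. Benchmarks often align the root joint before averaging; this sketch does not.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def direction_from_angles(azimuth: float, elevation: float) -> Vec3:
    # Hypothetical trigonometric encoding of a limb orientation (radians).
    # This is a generic spherical parameterization, not Ego3DPose's
    # perspective-aware representation.
    cos_el = math.cos(elevation)
    return (cos_el * math.cos(azimuth), cos_el * math.sin(azimuth), math.sin(elevation))


def angles_from_direction(d: Vec3) -> Tuple[float, float]:
    # Inverse of direction_from_angles; d need not be unit length.
    x, y, z = d
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-length direction")
    # Clamp to guard asin against tiny floating-point overshoot.
    elevation = math.asin(max(-1.0, min(1.0, z / norm)))
    return math.atan2(y, x), elevation


def chain_limbs(root: Vec3, limbs: Sequence[Tuple[float, float, float]]) -> List[Vec3]:
    # Each limb is (length, azimuth, elevation); bone lengths are assumed known.
    # Joint k+1 = joint k + length * unit direction of limb k.
    joints = [root]
    for length, azimuth, elevation in limbs:
        dx, dy, dz = direction_from_angles(azimuth, elevation)
        px, py, pz = joints[-1]
        joints.append((px + length * dx, py + length * dy, pz + length * dz))
    return joints


def mpjpe(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> float:
    # Mean per-joint position error: average Euclidean distance over joints
    # (no root or rigid alignment is applied here).
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline_error: float, new_error: float) -> float:
    # Fraction by which new_error improves on baseline_error (0.231 means 23.1%).
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical two-limb chain in metres; the prediction tilts the first limb.
    gt = chain_limbs((0.0, 0.0, 0.0), [(0.30, 0.0, 0.0), (0.25, math.pi / 2, 0.0)])
    pred = chain_limbs((0.0, 0.0, 0.0), [(0.30, 0.0, 0.1), (0.25, math.pi / 2, 0.0)])
    print(f"MPJPE: {mpjpe(pred, gt):.4f} m")
    # Hypothetical baseline and new errors, only to show what a percentage reduction means.
    print(f"Reduction: {relative_reduction(100.0, 76.9):.1%}")
```

Because each joint is placed relative to its parent, an orientation error in one limb shifts every joint further down the chain, which is why per-limb orientation accuracy matters for MPJPE. `relative_reduction` only illustrates what a percentage drop in MPJPE means; the 23.1% figure comes from the authors' evaluation on UnrealEgo.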
