“High-fidelity facial reflectance and geometry inference from an unconstrained image” by Yamaguchi, Saito, Nagano, Zhao, Chen, et al. …

  • ©

Conference:


Type(s):


Entry Number: 162

Title:

    High-fidelity facial reflectance and geometry inference from an unconstrained image

Session/Category Title:   Portraits & Speech


Presenter(s)/Author(s):


Moderator(s):



Abstract:


    We present a deep learning-based technique to infer high-quality facial reflectance and geometry given a single unconstrained image of the subject, which may contain partial occlusions and arbitrary illumination conditions. The reconstructed high-resolution textures, which are generated in only a few seconds, include high-resolution skin surface reflectance maps, representing both the diffuse and specular albedo, and medium- and high-frequency displacement maps, thereby allowing us to render compelling digital avatars under novel lighting conditions. To extract this data, we train our deep neural networks with a high-quality skin reflectance and geometry database created with a state-of-the-art multi-view photometric stereo system using polarized gradient illumination. Given the raw facial texture map extracted from the input image, our neural networks synthesize complete reflectance and displacement maps, as well as complete missing regions caused by occlusions. The completed textures exhibit consistent quality throughout the face due to our network architecture, which propagates texture features from the visible region, resulting in high-fidelity details that are consistent with those seen in visible regions. We describe how this highly underconstrained problem is made tractable by dividing the full inference into smaller tasks, which are addressed by dedicated neural networks. We demonstrate the effectiveness of our network design with robust texture completion from images of faces that are largely occluded. With the inferred reflectance and geometry data, we demonstrate the rendering of high-fidelity 3D avatars from a variety of subjects captured under different lighting conditions. In addition, we perform evaluations demonstrating that our method can infer plausible facial reflectance and geometric details comparable to those obtained from high-end capture devices, and outperform alternative approaches that require only a single unconstrained input image.

References:


    1. M. Aittala, T. Aila, and J. Lehtinen. 2016. Reflectance modeling by neural texture synthesis. ACM Trans. Graph. 35, 4 (2016), 65. Google ScholarDigital Library
    2. O. Alexander, M. Rogers, W. Lambeth, M. Chiang, and P. Debevec. 2009. The Digital Emily Project: Photoreal Facial Modeling and Animation. In ACM SIGGRAPH 2009 Courses. ACM, New York, NY, USA, Article 12, 12:1–12:15 pages. Google ScholarDigital Library
    3. J. T. Barron and J. Malik. 2015a. Shape, illumination, and reflectance from shading. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 8 (2015), 1670–1687.Google ScholarDigital Library
    4. J. T. Barron and J. Malik. 2015b. Shape, Illumination, and Reflectance from Shading. IEEE Transactions on Pattern Analysis and Machine Intelligence (2015).Google Scholar
    5. T. Beeler, B. Bickel, P. Beardsley, B. Sumner, and M. Gross. 2010. High-quality single-shot capture of facial geometry. In ACM Trans. Graph., Vol. 29. ACM, 40. Google ScholarDigital Library
    6. T. Beeler, F. Hahn, D. Bradley, B. Bickel, P. Beardsley, C. Gotsman, R. W. Sumner, and M. Gross. 2011. High-quality passive facial performance capture using anchor frames. In ACM Trans. Graph., Vol. 30. ACM, 75. Google ScholarDigital Library
    7. V. Blanz and T. Vetter. 1999. A morphable model for the synthesis of 3D faces. In Proc. SIGGRAPH. 187–194. Google ScholarDigital Library
    8. J. Booth, A. Roussos, S. Zafeiriou, A. Ponniah, and D. Dunaway 2016. A 3d morphable model learnt from 10,000 faces. In Proc. CVPR. 5543–5552.Google ScholarCross Ref
    9. D. Bradley, T. Beeler, K. Mitchell, and others. 2017. Real-Time Multi-View Facial Capture with Synthetic Training. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 325–336. Google ScholarDigital Library
    10. C. Cao, D. Bradley, K. Zhou, and T. Beeler. 2015. Real-time high-fidelity facial performance capture. ACM Trans. Graph. 34, 4 (2015), 46. Google ScholarDigital Library
    11. C. Cao, H. Wu, Y. Weng, T. Shao, and K. Zhou. 2016. Real-time facial animation with image-based dynamic avatars. ACM Trans. Graph. 35, 4 (2016), 126. Google ScholarDigital Library
    12. P. Debevec, T. Hawkins, C. Tchou, H.-P. Duiker, and W. Sarokin. 2000. Acquiring the Reflectance Field of a Human Face. In Proc. SIGGRAPH. Google ScholarDigital Library
    13. R. Donner, M. Reiter, G. Langs, P. Peloschek, and H. Bischof. 2006. Fast Active Appearance Model Search Using Canonical Correlation Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 10 (2006), 1690–1694. Google ScholarDigital Library
    14. C. N Duong, K. Luu, K. G. Quach, and T. D. Bui. 2015. Beyond principal components: Deep boltzmann machines for face modeling. In Proc. CVPR. 4786–4794.Google Scholar
    15. G.J. Edwards, C.J. Taylor, and T. F. Cootes. 1998. Interpreting Face Images Using Active Appearance Models. In Proceedings of the 3rd. International Conference on Face and Gesture Recognition (FG ’98). IEEE Computer Society, 300–. Google ScholarDigital Library
    16. A. A. Efros and W. T. Freeman. 2001. Image Quilting for Texture Synthesis and Transfer. In Proc. SIGGRAPH. ACM, 341–346. Google ScholarDigital Library
    17. A. A. Efros and T. K. Leung. 1999. Texture Synthesis by Non-Parametric Sampling. In IEEE ICCV. 1033–. Google ScholarDigital Library
    18. G. Fyffe, A. Jones, O. Alexander, R. Ichikari, and P. Debevec. 2014. Driving high-resolution facial scans with video performance capture. ACM Trans. Graph. 34, 1 (2014), 8. Google ScholarDigital Library
    19. G. Fyffe, K. Nagano, L. Huynh, S. Saito, J. Busch, A. Jones, H. Li, and P. Debevec. 2017. Multi-View Stereo on Consistent Face Topology. In Computer Graphics Forum, Vol. 36. Wiley Online Library, 295–309. Google ScholarDigital Library
    20. P. Garrido, L. Valgaerts, C. Wu, and C. Theobalt. 2013. Reconstructing Detailed Dynamic Face Geometry from Monocular Video. In ACM Trans. Graph., Vol. 32. 158:1–158:10. Google ScholarDigital Library
    21. L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman. 2016. Preserving Color in Neural Artistic Style Transfer. CoRR abs/1606.05897 (2016).Google Scholar
    22. L. A. Gatys, A. S. Ecker, and M. Bethge. 2015. Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks. CoRR abs/1505.07376 (2015). Google ScholarDigital Library
    23. A. Ghosh, G. Fyffe, B. Tunwattanapong, J. Busch, X. Yu, and P. Debevec. 2011. Multiview Face Capture Using Polarized Spherical Gradient Illumination. ACM Trans. Graph. 30, 6, Article 129 (2011), 129:1–129:10 pages. Google ScholarDigital Library
    24. M. Glencross, G.J. Ward, F. Melendez, C.Jay, J. Liu, and R. Hubbold. 2008. A perceptually validated model for surface depth hallucination. ACM Trans. Graph. 27, 3 (2008), 59. Google ScholarDigital Library
    25. A. Golovinskiy, W. Matusik, H. Pfister, S. Rusinkiewicz, and T Funkhouser. 2006. A Statistical Model for Synthesis of Detailed Facial Geometry. ACM Trans. Graph. 25, 3 (2006), 1025–1034. Google ScholarDigital Library
    26. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 2672–2680. Google ScholarDigital Library
    27. P. F. Gotardo, T. Simon, Y. Sheikh, and I. Matthews. 2015. Photogeometric scene flow for high-detail dynamic 3d reconstruction. In Proc. ICCV. 846–854. Google ScholarDigital Library
    28. P. Graham, B. Tunwattanapong, J. Busch, X. Yu, A. Jones, P. Debevec, and A. Ghosh. 2013a. Measurement-Based Synthesis of Facial Microgeometry. In Computer Graphics Forum, Vol. 32. Wiley Online Library, 335–344. Google ScholarDigital Library
    29. P. Graham, B. Tunwattanapong, J. Busch, X. Yu, A. Jones, P. Debevec, and A. Ghosh. 2013b. Measurement-based Synthesis of Facial Microgeometry. In EUROGRAPHICS. Google ScholarDigital Library
    30. J. Han, K. Zhou, L.-Y. Wei, M. Gong, H. Bao, X. Zhang, and B. Guo. 2006. Fast example-based surface texture synthesis via discrete optimization. The Visual Computer 22, 9–11 (2006), 918–925. Google ScholarDigital Library
    31. A. Haro, B. Guenterz, and I. Essay. 2001. Real-time, Photo-realistic, Physically Based Rendering of Fine Scale Human Skin Structure. In Eurographics Workshop on Rendering, S. J. Gortle and K. Myszkowski (Eds.). Google ScholarDigital Library
    32. L. Hu, S. Saito, L. Wei, K. Nagano, J. Seo, J. Fursund, I. Sadeghi, C. Sun, Y.-C. Chen, and H. Li. 2017. Avatar Digitization From a Single Image For Real-Time Rendering. ACM Trans. Graph. 36, 6 (2017). Google ScholarDigital Library
    33. A. E. Ichim, S. Bouaziz, and M. Pauly. 2015. Dynamic 3D Avatar Creation from Handheld Video Input. ACM Trans. Graph. 34, 4, Article 45 (2015), 45:1–45:14 pages. Google ScholarDigital Library
    34. S. Iizuka, E. Simo-Serra, and H. Ishikawa. 2017. Globally and Locally Consistent Image Completion. ACM Trans. Graph. 36, 4, Article 107 (2017), 107:1–107:14 pages. Google ScholarDigital Library
    35. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. 2016. Image-to-image translation with conditional adversarial networks. arXiv:1611.07004 (2016).Google Scholar
    36. M. K.Johnson, F. Cole, A. Raj, and E. H. Adelson. 2011. Microgeometry Capture using an Elastomeric Sensor. ACM Trans. Graph 30, 4 (2011), 46:1–46:8. Google ScholarDigital Library
    37. T. Karras, T. Aila, S. Laine, and J. Lehtinen. 2017. Progressive Growing of GANs for Improved Quality, Stability, and Variation. CoRR abs/1710.10196 (2017).Google Scholar
    38. I. Kemelmacher-Shlizerman. 2013. Internet-based Morphable Model. IEEE ICCV (2013). Google ScholarDigital Library
    39. I. Kemelmacher-Shlizerman and R. Basri. 2011. 3D face reconstruction from a single image using a single reference face shape. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 2 (2011), 394–405. Google ScholarDigital Library
    40. I. Kemelmacher-Shlizerman and S. M. Seitz. 2011. Face reconstruction in the wild. In IEEE ICCV. IEEE, 1746–1753. Google ScholarDigital Library
    41. H. Kim, M. Zollhöfer, A. Tewari, J. Thies, C. Richardt, and C. Theobalt. 2018. Inverse-FaceNet: Deep Monocular Inverse Face Rendering. In Proc. CVPR.Google Scholar
    42. D. P. Kingma and J. Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014).Google Scholar
    43. T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum. 2015. Deep Convolutional Inverse Graphics Network. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). Curran Associates, Inc., 2539–2547. Google ScholarDigital Library
    44. V. Kwatra, I. Essa, A. Bobick, and N. Kwatra. 2005. Texture optimization for example-based synthesis. ACM Trans. Graph. 24, 3 (2005), 795–802. Google ScholarDigital Library
    45. V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick. 2003. Graphcut Textures: Image and Video Synthesis Using Graph Cuts. In Proc. SIGGRAPH. ACM, 277–286. Google ScholarDigital Library
    46. M. S. Langer and S. W. Zucker. 1994. Shape-from-shading on a cloudy day. JOSA A 11, 2 (1994), 467–478.Google ScholarCross Ref
    47. A. Lasram and S. Lefebvre. 2012. Parallel patch-based texture synthesis. In Proceedings of the Fourth ACM SIGGRAPH/Eurographics conference on High-Performance Graphics. Eurographics Association, 115–124. Google ScholarDigital Library
    48. C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and others. 2016. Photo-realistic single image super-resolution using a generative adversarial network. arXiv:1609.04802 (2016).Google Scholar
    49. S. Lefebvre and H. Hoppe. 2006. Appearance-space texture synthesis. ACM Trans. Graph. 25, 3 (2006), 541–548. Google ScholarDigital Library
    50. C. Li, K. Zhou, and S. Lin. 2014. Intrinsic Face Image Decomposition with Human Face Priors. In Proc. ECCV (5)’14. 218–233.Google Scholar
    51. H. Li, L. Trutoiu, K. Olszewski, L. Wei, T. Trutna, P.-L. Hsieh, A. Nicholls,, A. Nicholls, and C. Ma. 2015. Facial Performance Sensing Head-Mounted Display. ACM Trans. Graph. 34, 4 (July 2015). Google ScholarDigital Library
    52. Y. Li, S. Liu, J. Yang, and M.-H. Yang. 2017. Generative Face Completion. In Proc. CVPR.Google ScholarCross Ref
    53. C. Liu, H.-Y. Shum, and W. T. Freeman. 2007. Face Hallucination: Theory and Practice. Int. J. Comput. Vision 75, 1 (2007), 115–134. Google ScholarDigital Library
    54. F. Liu, D. Zeng, J. Li, and Q.-j. Zhao. 2017. On 3D face reconstruction via cascaded regression in shape space. Frontiers of Information Technology & Electronic Engineering 18, 12(2017), 1978–1990.Google ScholarCross Ref
    55. Z. Liu, P. Luo, X. Wang, and X. Tang. 2015. Deep Learning Face Attributes in the Wild. In IEEE ICCV. Google ScholarDigital Library
    56. D.S. Ma, J. Correll, and B. Wittenbrink. 2015. The Chicago face database: A free stimulus set of faces and norming data. Behavior Research Methods 47, 4 (2015), 1122–1135.Google ScholarCross Ref
    57. W.-C. Ma, T. Hawkins, P. Peers, C.-F. Chabert, M. Weiss, and P. Debevec. 2007a. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination. In Proc. EGSR 2007. Eurographics Association, 183–194. Google ScholarDigital Library
    58. W.-C. Ma, T. Hawkins, P. Peers, C.-F. Chabert, M. Weiss, and P. Debevec. 2007b. Rapid Acquisition of Specular and Diffuse Normal Maps from Polarized Spherical Gradient Illumination. In Eurographics Symposium on Rendering. Google ScholarDigital Library
    59. W.-C. Ma, A. Jones, J.-Y. Chiang, T. Hawkins, S. Frederiksen, P. Peers, M. Vukovic, M. Ouhyoung, and P. Debevec. 2008. Facial Performance Synthesis Using Deformation-driven Polynomial Displacement Maps. In Proc. SIGGRAPH. ACM, 121:1–121:10. Google ScholarDigital Library
    60. I. Matthews and S. Baker. 2004. Active Appearance Models Revisited. Int. J. Comput. Vision 60, 2 (2004), 135–164. Google ScholarDigital Library
    61. S. McDonagh, M. Klaudiny, D. Bradley, T. Beeler, I. Matthews, and K. Mitchell. 2016. Synthetic prior design for real-time face tracking. In 3D Vision (3DV), 2016 Fourth International Conference on. IEEE, 639–648.Google Scholar
    62. U. Mohammed, S. J. D. Prince, and J. Kautz. 2009. Visio-lization: Generating Novel Facial Images. In ACM Trans. Graph. ACM, Article 57, 57:1–57:8 pages. Google ScholarDigital Library
    63. K. Nagano, G. Fyffe, O. Alexander, J. Barbič, H. Li, A. Ghosh, and P. Debevec. 2015. Skin Microstructure Deformation with Displacement Map Convolution. ACM Trans. Graph. 34, 4 (2015). Google ScholarDigital Library
    64. C. Nhan Duong, K. Luu, K. Gia Quach, and T. D. Bui. 2015. Beyond principal components: Deep boltzmann machines for face modeling. In Proc. CVPR. 4786–4794.Google Scholar
    65. K. Olszewski, Z. Li, C. Yang, Y. Zhou, R. Yu, Z. Huang, S. Xiang, S. Saito, P. Kohli, and H. Li. 2017. Realistic Dynamic Facial Textures From a Single Image Using GANs. In IEEE ICCV.Google Scholar
    66. K. Olszewski, J. J. Lim, S. Saito, and H. Li. 2016. High-Fidelity Facial and Speech Animation for VR HMDs. ACM Trans. Graph. 35, 6 (December 2016). Google ScholarDigital Library
    67. D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. 2016. Context encoders: Feature learning by inpainting. In Proc. CVPR. 2536–2544.Google Scholar
    68. A. Radford, L. Metz, and S. Chintala. 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. CoRR abs/1511.06434 (2015).Google Scholar
    69. E. Richardson, M. Sela, and R. Kimmel. 2016. 3D face reconstruction by learning from synthetic data. In 3D Vision (3DV), 2016 Fourth International Conference on. IEEE, 460–469.Google Scholar
    70. E. Richardson, M. Sela, R. Or-El, and R. Kimmel. 2017. Learning detailed face reconstruction from a single image. In Proc. CVPR. IEEE, 5553–5562.Google Scholar
    71. S. Romdhani and T. Vetter. 2005. Estimating 3D Shape and Texture Using Pixel Intensity, Edges, Specular Highlights, Texture Constraints and a Prior.. In Proc. CVPR. 986–993. Google ScholarDigital Library
    72. S. Saito, T. Li, and H. Li. 2016. Real-Time Facial Segmentation and Performance Capture from RGB Input. In Proc. ECCV.Google Scholar
    73. S. Saito, L. Wei, L. Hu, K. Nagano, and H. Li. 2017. Photorealistic Facial Texture Inference Using Deep Neural Networks. In Proc. CVPR.Google Scholar
    74. M. Sela, E. Richardson, and R. Kimmel. 2017. Unrestricted facial geometry reconstruction using image-to-image translation. In IEEE ICCV. IEEE, 1585–1594.Google Scholar
    75. S. Sengupta, A. Kanazawa, C. D. Castillo, and D. Jacobs. 2017. SfSNet: Learning Shape, Reflectance and Illuminance of Faces in the Wild. arXiv.1712.01261 (2017).Google Scholar
    76. F. Shi, H.-T. Wu, X. Tong, and J. Chai. 2014. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM Trans. Graph. 33, 6 (2014), 222. Google ScholarDigital Library
    77. Z. Shu, E. Yumer, S. Hadap, K. Sunkavalli, E. Shechtman, and D. Samaras. 2017. Neural Face Editing with Intrinsic Image Disentangling. arXiv:1704.04131 (2017).Google Scholar
    78. Solid Angle. 2016. (2016). http://www.solidangle.com/arnold/.Google Scholar
    79. S. Suwajanakorn, I. Kemelmacher-Shlizerman, and S. M. Seitz. 2014. Total moving face reconstruction. In Proc. ECCV. Springer, 796–812.Google Scholar
    80. A. Tewari, M. Zollhöfer, P. Garrido, F. Bernard, H. Kim, P. Pérez, and C. Theobalt. 2017a. Self-supervised Multi-level Face Model Learning for Monocular Reconstruction at over 250 Hz. arXiv.1712.02859 (2017).Google Scholar
    81. A. Tewari, M. Zollhöfer, H. Kim, P. Garrido, F. Bernard, P. Perez, and C. Theobalt. 2017b. Mofa: Model-based deep convolutional face autoencoder for unsupervised monocular reconstruction. In IEEE ICCV, Vol. 2.Google Scholar
    82. The Digital Human League. 2015. Digital Emily 2.0. (2015). http://gl.ict.usc.edu/Research/DigitalEmily2/.Google Scholar
    83. J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner. 2016a. Face2Face: Real-time Face Capture and Reenactment of RGB Videos. In Proc. CVPR.Google Scholar
    84. J. Thies, M. Zollöfer, M. Stamminger, C. Theobalt, and M. Nießner. 2016b. FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality. arXiv:1610.03151 (2016).Google Scholar
    85. M. Turk and A. Pentland. 1991. Eigenfaces for Recognition. J. Cognitive Neuroscience 3, 1 (1991), 71–86. Google ScholarDigital Library
    86. J. von der Pahlen, J. Jimenez, E. Danvoye, P. Debevec, G. Fyffe, and O. Alexander. 2014. Digital Ira and Beyond: Creating Real-time Photoreal Digital Actors. In ACM SIGGRAPH 2014 Courses. ACM, New York, NY, USA, Article 1, 1:1–1:384 pages. Google ScholarDigital Library
    87. L.-Y. Wei, S. Lefebvre, V. Kwatra, and G. Turk. 2009. State of the art in example-based texture synthesis. In Eurographics 2009, State of the Art Report, EG-STAR. Eurographics Association, 93–117.Google Scholar
    88. L.-Y. Wei and M. Levoy. 2000. Fast Texture Synthesis Using Tree-structured Vector Quantization. In Proc. SIGGRAPH. 479–488. Google ScholarDigital Library
    89. T. Weyrich, W. Matusik, H. Pfister, B. Bickel, C. Donner, C. Tu, J. McAndless, J. Lee, A. Ngan, H. W. Jensen, and M. Gross. 2006. Analysis of Human Faces using a Measurement-Based Skin Reflectance Model. ACM Trans. Graph. 25, 3 (2006), 1013–1024. Google ScholarDigital Library
    90. C. A. Wilson, A. Ghosh, P. Peers, J.-Y. Chiang, J. Busch, and P. Debevec. 2010. Temporal upsampling of performance geometry using photometric alignment. ACM Trans. Graph. 29, 2 (2010), 17. Google ScholarDigital Library
    91. C. Wu, D. Bradley, M. Gross, and T. Beeler. 2016. An anatomically-constrained local deformation model for monocular face capture. ACM Trans. Graph. 35, 4 (2016), 115. Google ScholarDigital Library
    92. R. A. Yeh*, C. Chen*, T. Y. Lim, S. A. G., M. Hasegawa-Johnson, and M. N. Do. 2017. Semantic Image Inpainting with Deep Generative Models. In Proc. CVPR. * equal contribution.Google Scholar
    93. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. 2017. Pyramid Scene Parsing Network. In Proc. CVPR.Google Scholar
    94. J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. 2017. Toward Multimodal Image-to-image Translation. In Advances in Neural Information Processing Systems 30.Google Scholar
    95. X. Zhu, Z. Lei, J. Yan, D. Yi, and S. Z. Li. 2015. High-fidelity pose and expression normalization for face recognition in the wild. In Proc. CVPR. 787–796.Google Scholar


ACM Digital Library Publication:



Overview Page: