“High-fidelity facial reflectance and geometry inference from an unconstrained image” by Yamaguchi, Saito, Nagano, Zhao, Chen, et al. …
Entry Number: 162
Title:
- High-fidelity facial reflectance and geometry inference from an unconstrained image
Session/Category Title: Portraits & Speech
Presenter(s)/Author(s):
- Shugo Yamaguchi
- Shun-Suke Saito
- Koki Nagano
- Yajie Zhao
- Weikai Chen
- Kyle Olszewski
- Shigeo Morishima
- Hao Li
Abstract:
We present a deep learning-based technique to infer high-quality facial reflectance and geometry from a single unconstrained image of the subject, which may contain partial occlusions and arbitrary illumination conditions. The reconstructed high-resolution textures, generated in only a few seconds, include skin surface reflectance maps representing both the diffuse and specular albedo, together with medium- and high-frequency displacement maps, allowing us to render compelling digital avatars under novel lighting conditions. To extract this data, we train our deep neural networks on a high-quality skin reflectance and geometry database created with a state-of-the-art multi-view photometric stereo system using polarized gradient illumination. Given the raw facial texture map extracted from the input image, our neural networks synthesize complete reflectance and displacement maps, and also fill in missing regions caused by occlusions. The completed textures exhibit consistent quality throughout the face because our network architecture propagates texture features from the visible region, resulting in high-fidelity details consistent with those in the visible areas. We describe how this highly underconstrained problem is made tractable by dividing the full inference into smaller tasks, each addressed by a dedicated neural network. We demonstrate the effectiveness of our network design with robust texture completion from images of faces that are largely occluded. With the inferred reflectance and geometry data, we demonstrate the rendering of high-fidelity 3D avatars for a variety of subjects captured under different lighting conditions. In addition, we perform evaluations demonstrating that our method can infer plausible facial reflectance and geometric details comparable to those obtained from high-end capture devices, and that it outperforms alternative approaches requiring only a single unconstrained input image.
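The abstract's key design idea, dividing the underconstrained full inference into smaller tasks handled by dedicated networks (occlusion-aware texture completion first, then per-map inference), can be sketched as the following pipeline. This is a minimal illustrative mock-up in Python/NumPy, not the authors' implementation: the function names, map resolutions, and the trivial fill-in logic are assumptions standing in for trained CNNs.

```python
import numpy as np

def complete_texture(partial_tex, visibility_mask):
    """Fill occluded regions, propagating information from visible pixels.

    Stand-in for the completion network: here we simply copy the mean
    visible color into occluded pixels; the real network synthesizes
    plausible high-frequency detail instead.
    """
    mean_visible = partial_tex[visibility_mask].mean(axis=0)
    completed = partial_tex.copy()
    completed[~visibility_mask] = mean_visible
    return completed

def infer_diffuse_albedo(tex):
    # Placeholder for the dedicated diffuse-albedo network (3 channels).
    return tex

def infer_specular_albedo(tex):
    # Placeholder for the specular-albedo network (single channel).
    return tex.mean(axis=-1, keepdims=True)

def infer_displacement(tex):
    # Placeholder for medium- and high-frequency displacement inference,
    # modeled here as a crude frequency split of a grayscale texture.
    gray = tex.mean(axis=-1)
    medium = gray            # stand-in for a low-pass band
    high = gray - medium     # residual high-frequency band
    return medium, high

def infer_maps(partial_tex, visibility_mask):
    """Full pipeline: completion first, then dedicated per-map networks."""
    tex = complete_texture(partial_tex, visibility_mask)
    return {
        "diffuse_albedo": infer_diffuse_albedo(tex),
        "specular_albedo": infer_specular_albedo(tex),
        "displacement": infer_displacement(tex),
    }

# Usage: a half-occluded 64x64 texture run through the pipeline.
partial = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64), dtype=bool)
mask[:, :32] = True  # left half visible, right half occluded
maps = infer_maps(partial, mask)
```

The point of the decomposition is that each sub-network solves a better-constrained problem (completion sees only texture; each map network sees a complete texture), which is how the paper reports making the overall inference tractable.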