“Corrective 3D reconstruction of lips from monocular video” by Garrido, Zollhöfer, Wu, Bradley, Perez, et al. … – ACM SIGGRAPH HISTORY ARCHIVES

  • 2016 SA Technical Papers_Garrido_Corrective 3D Reconstruction of Lips from Monocular Video

Conference:

    SIGGRAPH Asia 2016


Type(s):

    Technical Papers

Title:

    Corrective 3D reconstruction of lips from monocular video

Session/Category Title:   Scanning & Tracking People


Presenter(s)/Author(s):

    Garrido, Zollhöfer, Wu, Bradley, Perez, et al.


Abstract:


    In facial animation, the accurate shape and motion of the lips of virtual humans is of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach, and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
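The core idea described in the abstract — learning a corrective mapping from inaccurate monocular lip reconstructions to high-quality multi-view ground truth — can be sketched in miniature with ridge regression. This is an illustrative simplification only, not the paper's implementation: the paper trains a robust regressor in the gradient domain with additional 2D lip-contour features, whereas the sketch below regresses flattened vertex coordinates directly, and all dimensions, the synthetic training data, and the regularization weight are assumptions.

```python
import numpy as np

# Hypothetical setup: n_frames training pairs of (coarse monocular shape,
# accurate multi-view shape), each flattened to a vector of vertex coordinates.
rng = np.random.default_rng(0)
n_frames, n_coarse, n_true = 500, 90, 90
X = rng.normal(size=(n_frames, n_coarse))                 # coarse lip shapes
M = rng.normal(size=(n_coarse, n_true))                    # unknown true mapping
Y = X @ M + 0.01 * rng.normal(size=(n_frames, n_true))     # "ground-truth" shapes

# Ridge regression (closed form): W = (X^T X + lam I)^{-1} X^T Y.
lam = 1e-2  # regularization weight (assumed)
W = np.linalg.solve(X.T @ X + lam * np.eye(n_coarse), X.T @ Y)

def correct_lips(coarse):
    """Map a coarse lip-shape vector to its corrected estimate."""
    return coarse @ W

residual = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative training residual: {residual:.4f}")
```

At test time, only the coarse monocular reconstruction is needed; the learned mapping supplies the correction, which is what lets the method run on commodity video footage without the multi-view rig used during training.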



Submit a story:

If you would like to submit a story about this presentation, please contact us: historyarchives@siggraph.org