“Automatic acquisition of high-fidelity facial performances using monocular videos” by Shi, Wu, Tong and Chai
Session/Category Title: Capturing Everything
Abstract:
This paper presents a facial performance capture system that automatically acquires high-fidelity facial performances from uncontrolled monocular videos (e.g., Internet videos). We start by detecting and tracking important facial features, such as the nose tip and mouth corners, across the entire sequence. We then use the detected features, together with multilinear facial models, to reconstruct the subject's 3D head pose and large-scale facial deformation at each frame. Next, we utilize per-pixel shading cues to add fine-scale surface details, such as emerging or disappearing wrinkles and folds, to the large-scale facial deformation. In a final step, we iterate the reconstruction procedure over the large-scale facial geometry and the fine-scale facial details to further improve the accuracy of the reconstruction. We have tested our system on monocular videos downloaded from the Internet, demonstrating its accuracy and robustness under a variety of uncontrolled lighting conditions and across individuals with significantly different facial shapes. We also show that our system advances the state of the art in facial performance capture by comparing it against alternative methods.
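The abstract describes a coarse-to-fine pipeline: sparse feature tracking, multilinear-model fitting for head pose and large-scale deformation, shading-based recovery of fine detail, and an iteration between the coarse and fine stages. The sketch below is only a hypothetical illustration of how those stages could be wired together; the function names (detect_landmarks, fit_multilinear_model, refine_with_shading) and their stub bodies are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the coarse-to-fine loop described in the abstract.
# The helpers are placeholder stubs so the example runs end to end.
import numpy as np

def detect_landmarks(frame):
    # Stub: a real tracker would return 2D positions of the nose tip,
    # mouth corners, and other facial features in this frame.
    return np.zeros((68, 2))

def fit_multilinear_model(landmarks, init_mesh=None):
    # Stub: fit head pose and large-scale expression of a multilinear
    # face model to the tracked landmarks (optionally warm-started).
    pose = np.eye(4)
    coarse_mesh = np.zeros((1000, 3)) if init_mesh is None else init_mesh
    return pose, coarse_mesh

def refine_with_shading(frame, pose, mesh):
    # Stub: add fine-scale detail (wrinkles, folds) using per-pixel shading cues.
    return mesh

def reconstruct_performance(frames, n_iterations=3):
    """Alternate between large-scale fitting and shading-based refinement."""
    results = []
    for frame in frames:
        landmarks = detect_landmarks(frame)             # step 1: feature tracking
        pose, mesh = fit_multilinear_model(landmarks)   # step 2: coarse reconstruction
        for _ in range(n_iterations):                   # step 3: iterate coarse <-> fine
            mesh = refine_with_shading(frame, pose, mesh)
            pose, mesh = fit_multilinear_model(landmarks, init_mesh=mesh)
        results.append((pose, mesh))
    return results

# Example: run the sketch on a few dummy frames.
frames = [np.zeros((480, 640, 3)) for _ in range(5)]
print(len(reconstruct_performance(frames)))
```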