“Universal Facial Encoding of Codec Avatars From VR Headsets”

Conference:


Type(s):


Title:

    Universal Facial Encoding of Codec Avatars From VR Headsets

Presenter(s)/Author(s):



Abstract:


    We present a robust, self-supervised encoding algorithm that achieves high-fidelity, real-time 3D facial animation from head-mounted cameras on a consumer VR headset. Our model generalizes to unseen users and to variations in illumination, even under incomplete views of the face, enabling accessible, authentic avatar-mediated telepresence in VR.
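
Illustrative sketch:

    The abstract describes an encoder that maps partial head-mounted-camera (HMC) views to a code that drives a photorealistic avatar. As a rough illustration only (this is not the authors' architecture; the class name, layer sizes, view count, and late-fusion scheme are all assumptions), such an encoder might look like this in PyTorch:

    import torch
    import torch.nn as nn

    class HMCEncoder(nn.Module):
        """Toy encoder: partial monochrome HMC views -> expression code."""

        def __init__(self, num_views: int = 4, code_dim: int = 256):
            super().__init__()
            # One shared backbone processes every camera view, so no
            # single (possibly occluded) viewpoint is privileged.
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Late fusion of per-view features into a compact expression
            # code that would drive a pre-trained avatar decoder.
            self.head = nn.Sequential(
                nn.Linear(64 * num_views, 512), nn.ReLU(),
                nn.Linear(512, code_dim),
            )

        def forward(self, views: torch.Tensor) -> torch.Tensor:
            # views: (batch, num_views, 1, H, W) monochrome camera crops
            b, v = views.shape[:2]
            feats = self.backbone(views.flatten(0, 1))  # (b * v, 64)
            return self.head(feats.reshape(b, -1))      # (b, code_dim)

    # Example: two frames, four camera views each.
    encoder = HMCEncoder()
    codes = encoder(torch.randn(2, 4, 1, 192, 192))  # -> shape (2, 256)

    The training loop is omitted: in a self-supervised setup of this kind, the supervision signal would plausibly come from re-rendering the avatar with the predicted code and comparing against the observed camera views, rather than from labeled expressions.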


ACM Digital Library Publication:



Overview Page:



Submit a story:

If you would like to submit a story about this presentation, please contact us: historyarchives@siggraph.org