“Mononizing binocular videos” by Hu, Xia, Fu and Wong – ACM SIGGRAPH HISTORY ARCHIVES

“Mononizing binocular videos” by Hu, Xia, Fu and Wong

  • 2020 SA Technical Papers_Hu_Mononizing binocular videos

Conference:


Type(s):


Title:

    Mononizing binocular videos

Session/Category Title:   Learning New Viewpoints


Presenter(s)/Author(s):



Abstract:


    This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it. Mono-nize means we purposely convert a binocular video into a regular monocular video with the stereo information implicitly encoded in a visual but nearly-imperceptible form. Hence, we can impartially distribute and show the mononized video as an ordinary monocular video. Unlike ordinary monocular videos, we can restore from it the original binocular video and show it on a stereoscopic display. To start, we formulate an encoding-and-decoding framework with the pyramidal deformable fusion module to exploit long-range correspondences between the left and right views, a quantization layer to suppress the restoring artifacts, and the compression noise simulation module to resist the compression noise introduced by modern video codecs. Our framework is self-supervised, as we articulate our objective function with loss terms defined on the input: a monocular term for creating the mononized video, an invertibility term for restoring the original video, and a temporal term for frame-to-frame coherence. Further, we conducted extensive experiments to evaluate our generated mononized videos and restored binocular videos for diverse types of images and 3D movies. Quantitative results on both standard metrics and user perception studies show the effectiveness of our method.

References:


    1. Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc V. Gool. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. In Advances in Neural Information Processing Systems.Google Scholar
    2. Amir Atapour-Abarghouei and Toby P. Breckon. 2018. Real-time monocular depth estimation using synthetic data with domain adaptation via image style transfer. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    3. Johannes Ballé, Valero Laparra, and Eero P. Simoncelli. 2017. End-to-end Optimized Image Compression. In International Conference on Learning Representations (ICLR).Google Scholar
    4. Shumeet Baluja. 2017. Hiding images in plain sight: Deep steganography. In Advances in Neural Information Processing Systems.Google Scholar
    5. Kiana Calagari, Mohamed Elgharib, Piotr Didyk, Alexandre Kaspar, Wojciech Matusik, and Mohamed Hefeeda. 2017. Data Driven 2-D-To-3-D Video Conversion for Soccer. IEEE Transactions on Multimedia 20, 3 (2017), 605–619.Google ScholarDigital Library
    6. Pierre Charbonnier, Laure Blanc-Feraud, Gilles Aubert, and Michel Barlaud. 1994. Two deterministic half-quadratic regularization algorithms for computed imaging. In IEEE International Conference on Image Processing (ICIP).Google ScholarCross Ref
    7. Yue Chen, Debargha Mukherjee, Jingning Han, Adrian Grange, Yaowu Xu, Zoe Liu, Sarah Parker, Cheng Chen, Hui Su, Urvang Joshi, Ching-Han Chiang, Yunqing Wang, Paul Wilkins, Jim Bankoski, Luc N. Trudeau, Nathan E. Egge, Jean-Marc Valin, Thomas Davies, Steinar Midtskogen, Andrey Norkin, and Peter De Rivaz. 2018. An Overview of Core Coding Tools in the AV1 Video Codec. In IEEE Picture Coding Symposium (PCS).Google ScholarCross Ref
    8. Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. 2019. Variable rate deep image compression with a conditional autoencoder. In International Conference on Computer Vision (ICCV).Google ScholarCross Ref
    9. Xiaodong Cun, Feng Xu, Chi-Man Pun, and Hao Gao. 2018. Depth-Assisted Full Resolution Network for Single Image-Based View Synthesis. IEEE Computer Graphics and Applications 39, 2 (2018), 52–64.Google ScholarCross Ref
    10. Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, and Yichen Wei. 2017. Deformable convolutional networks. In International Conference on Computer Vision (ICCV).Google ScholarCross Ref
    11. Piotr Didyk, Tobias Ritschel, Elmar Eisemann, Karol Myszkowski, and Hans-Peter Seidel. 2011. A perceptual model for disparity. ACM Transactions on Graphics (SIGGRAPH) 30, 4 (2011), 96:1–96:10.Google ScholarDigital Library
    12. Piotr Didyk, Tobias Ritschel, Elmar Eisemann, Karol Myszkowski, Hans-Peter Seidel, and Wojciech Matusik. 2012. A luminance-contrast-aware disparity model and applications. ACM Transactions on Graphics (SIGGRAPH Asia) 31, 6 (2012), 184:1–184:10.Google Scholar
    13. David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. In Advances in Neural Information Processing Systems.Google Scholar
    14. José M. Fácil, Benjamin Ummenhofer, Huizhong Zhou, Luis Montesano, Thomas Brox, and Javier Civera. 2019. CAM-Convs: Camera-Aware Multi-Scale Convolutions for Single-View Depth. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    15. John Flynn, Ivan Neulander, James Philbin, and Noah Snavely. 2016. DeepStereo: Learning to predict new views from the world’s imagery. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    16. Taiki Fukiage, Takahiro Kawabe, and Shin’ya Nishida. 2017. Hiding of phase-based stereo disparity for ghost-free viewing without glasses. ACM Transactions on Graphics (SIGGRAPH) 36, 4 (2017), 147:1–147:17.Google ScholarDigital Library
    17. Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 2013. Vision meets robotics: The KITTI dataset. The International Journal of Robotics Research (IJRR) 32, 11 (2013), 1231–1237.Google ScholarDigital Library
    18. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.Google ScholarDigital Library
    19. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref
    20. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Binarized neural networks. In Advances in Neural Information Processing Systems.Google Scholar
    21. Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML).Google Scholar
    22. Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems.Google Scholar
    23. Petr Kellnhofer, Piotr Didyk, Karol Myszkowski, Mohamed M. Hefeeda, Hans-Peter Seidel, and Wojciech Matusik. 2016. GazeStereo3D: Seamless disparity manipulations. ACM Transactions on Graphics (SIGGRAPH) 35, 4 (2016), 1–13.Google ScholarDigital Library
    24. Petr Kellnhofer, Piotr Didyk, Szu-Po Wang, Pitchaya Sitthi-Amorn, William Freeman, Fredo Durand, and Wojciech Matusik. 2017. 3DTV at home: Eulerian-Lagrangian stereo-to-multiview conversion. ACM Transactions on Graphics (SIGGRAPH) 36, 4 (2017), 146:1–146:13.Google ScholarDigital Library
    25. Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations (ICLR).Google Scholar
    26. Yi Lai, Qian Wang, and Yin Gao. 2017. Content-Based Scalable Multi-View Video Coding Using 4D Wavelet. International Journal of Hybrid Information Technology 10, 8 (2017), 91–100.Google ScholarCross Ref
    27. Manuel Lang, Alexander Hornung, Oliver Wang, Steven Poulakos, Aljoscha Smolic, and Markus H. Gross. 2010. Nonlinear disparity mapping for stereoscopic 3D. ACM Transactions on Graphics (SIGGRAPH) 29, 4 (2010), 75:1–75:10.Google ScholarDigital Library
    28. Thomas Leimkühler, Petr Kellnhofer, Tobias Ritschel, Karol Myszkowski, and HansPeter Seidel. 2018. Perceptual real-time 2D-to-3D conversion using cue fusion. IEEE Transactions on Visualization & Computer Graphics 24, 6 (2018), 2037–2050.Google ScholarCross Ref
    29. Yue Li, Dong Liu, Houqiang Li, Li Li, Zhu Li, and Feng Wu. 2019b. Learning a Convolutional Neural Network for Image Compact-Resolution. IEEE Transactions on Image Processing (TIP) 28, 3 (2019), 1092–1107.Google ScholarCross Ref
    30. Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, and William T. Freeman. 2019a. Learning the depths of moving people by watching frozen people. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    31. Miaomiao Liu, Xuming He, and Mathieu Salzmann. 2018. Geometry-aware deep network for single-image novel view synthesis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref
    32. Stephen Lombardi, Tomas Simon, Jason M. Saragih, Gabriel Schwartz, Andreas M. Lehrmann, and Yaser Sheikh. 2019. Neural volumes: learning dynamic renderable volumes from images. ACM Transactions on Graphics (SIGGRAPH) 38, 4 (2019), 65:1–65:14.Google ScholarDigital Library
    33. Yue Luo, Jimmy Ren, Mude Lin, Jiahao Pang, Wenxiu Sun, Hongsheng Li, and Liang Lin. 2018. Single view stereo matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref
    34. Bruhanth Mallik and Akbar Sheikh Akbari. 2016. HEVC based multi-view video codec using frame interleaving technique. In International Conference on Developments in eSystems Engineering (DeSE).Google ScholarCross Ref
    35. Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv:2003.08934 [cs.CV]Google Scholar
    36. Debargha Mukherjee, Jingning Han, Jim Bankoski, Ronald Bultje, Adrian Grange, John Koleszar, Paul Wilkins, and Yaowu Xu. 2015. A technical overview of VP9—The latest open-source video codec. SMPTE Motion Imaging Journal 124, 1 (2015), 44–54.Google ScholarCross Ref
    37. Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. 2017. Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    38. Simon Niklaus, Long Mai, Jimei Yang, and Feng Liu. 2019. 3D Ken Burns effect from a single image. ACM Transactions on Graphics (SIGGRAPH Asia) 38, 6 (2019), 184:1–184:15.Google Scholar
    39. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems.Google Scholar
    40. Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2017. XNOR-Net: ImageNet classification using binary convolutional neural networks. In European Conference on Computer Vision (ECCV).Google Scholar
    41. Mark A. Robertson and Robert L. Stevenson. 2005. DCT quantization noise in compressed images. IEEE Transactions on Circuits and Systems for Video Technology (CSVT) 15, 1 (2005), 27–38.Google ScholarDigital Library
    42. Steven Scher, Jing Liu, Rajan Vaish, Prabath Gunawardane, and James Davis. 2013. 3D+2DTV: 3D displays with no ghosting for viewers without glasses. ACM Transactions on Graphics (SIGGRAPH) 32, 3 (2013), 21:1–21:10.Google ScholarDigital Library
    43. Donald J. Schuirmann. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 6 (1987), 657–680.Google ScholarCross Ref
    44. Guillaume Seguin, Karteek Alahari, Josef Sivic, and Ivan Laptev. 2015. Pose Estimation and Segmentation of People in 3D Movies. IEEE Transactions on Pattern Analysis & Machine Intelligence (PAMI) 37, 8 (2015), 1643–1655.Google ScholarDigital Library
    45. Meng-Li Shih, Shih-Yang Su, Johannes Kopf, and Jia-Bin Huang. 2020. 3D Photography using Context-aware Layered Depth Inpainting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref
    46. Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, and Noah Snavely. 2019. Pushing the Boundaries of View Extrapolation with Multiplane Images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    47. Gary J. Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology (CSVT) 22, 12 (2012), 1649–1668.Google ScholarDigital Library
    48. Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. 2018. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref
    49. Gerhard Tech, Ying Chen, Karsten Müller, Jens-Rainer Ohm, Anthony Vetro, and Ye-Kui Wang. 2016. Overview of the multiview and 3D extensions of high efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology (CSVT) 26, 1 (2016), 35–49.Google ScholarDigital Library
    50. Anthony Vetro. 2010. Frame compatible formats for 3D video distribution. In IEEE International Conference on Image Processing (ICIP).Google ScholarCross Ref
    51. Anthony Vetro, Thomas Wiegand, and Gary J. Sullivan. 2011. Overview of the stereo and multiview video coding extensions of the H. 264/MPEG-4 AVC standard. Proc. IEEE 99, 4 (2011), 626–642.Google ScholarCross Ref
    52. Yingqian Wang, Longguang Wang, Jungang Yang, Wei An, and Yulan Guo. 2019b. Flickr1024: A Large-Scale Dataset for Stereo Image Super-Resolution. In International Conference on Computer Vision Workshops.Google ScholarCross Ref
    53. Zihan Wang, Neng Gao, Xin Wang, Ji Xiang, Daren Zha, and Linghui Li. 2019a. HidingGAN: High Capacity Information Hiding with Generative Adversarial Network. Computer Graphics Forum 38, 7 (2019), 393–401.Google ScholarCross Ref
    54. Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers. IEEE.Google Scholar
    55. Eric Wengrowski and Kristin Dana. 2019. Light Field Messaging With Deep Photographic Steganography. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google Scholar
    56. Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H. 264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology (CSVT) 13, 7 (2003), 560–576.Google ScholarDigital Library
    57. Menghan Xia, Xueting Liu, and Tien-Tsin Wong. 2018. Invertible grayscale. ACM Transactions on Graphics (SIGGRAPH Asia) 37, 6 (2018), 246:1–246:10.Google Scholar
    58. Junyuan Xie, Ross B. Girshick, and Ali Farhadi. 2016. Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks. In European Conference on Computer Vision (ECCV).Google ScholarCross Ref
    59. Zexiang Xu, Sai Bi, Kalyan Sunkavalli, Sunil Hadap, Hao Su, and Ravi Ramamoorthi. 2019. Deep view synthesis from sparse photometric images. ACM Transactions on Graphics (SIGGRAPH) 38, 4 (2019), 76:1–76:13.Google ScholarDigital Library
    60. Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. 2018. Stereo magnification: learning view synthesis using multiplane images. ACM Transactions on Graphics (SIGGRAPH) 37, 4 (2018), 65:1–65:12.Google ScholarDigital Library
    61. Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. 2018. HiDDeN: Hiding Data With Deep Networks. In European Conference on Computer Vision (ECCV).Google Scholar
    62. Xizhou Zhu, Han Hu, Stephen Lin, and Jifeng Dai. 2019. Deformable ConvNets v2: More deformable, better results. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Google ScholarCross Ref


ACM Digital Library Publication:



Overview Page:



Submit a story:

If you would like to submit a story about this presentation, please contact us: historyarchives@siggraph.org