“DeepLens: shallow depth of field from a single image” – ACM SIGGRAPH HISTORY ARCHIVES

Conference:

    SIGGRAPH Asia 2018

Type(s):

    Technical Paper

Title:

    DeepLens: shallow depth of field from a single image

Session/Category Title:

    Image processing

Presenter(s)/Author(s):

    Wang

Abstract:


    We aim to generate high-resolution shallow depth-of-field (DoF) images from a single all-in-focus image with controllable focal distance and aperture size. To achieve this, we propose a novel neural network model comprising a depth prediction module, a lens blur module, and a guided upsampling module. All modules are differentiable and are learned from data. To train our depth prediction module, we collect a dataset of 2462 RGB-D images captured by mobile phones with a dual-lens camera, and use existing segmentation datasets to improve border prediction. We further leverage a synthetic dataset with known depth to supervise the lens blur and guided upsampling modules. The effectiveness of our system and training strategies is verified in our experiments. Our method can generate high-quality shallow DoF images at high resolution, and it produces significantly fewer artifacts than the baselines and existing solutions for single-image shallow DoF synthesis. Compared with the iPhone portrait mode, a state-of-the-art shallow DoF solution based on a dual-lens depth camera, our method generates comparable results while allowing greater flexibility in choosing focal points and aperture size, and it is not limited to one capture setup.
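    The lens blur the abstract describes follows the classic thin-lens intuition: each pixel's blur radius (circle of confusion) grows with the mismatch between its depth and the chosen focal distance, scaled by the aperture. The paper's lens blur module is a learned, differentiable network; the sketch below is only a naive, non-learned illustration of that idea, and the function names and simplified radius formula are our own assumptions, not the paper's.

```python
import numpy as np

def coc_radius(depth, focal_dist, aperture, max_radius=8.0):
    # Simplified circle-of-confusion radius in pixels: blur grows with
    # |1/focal_dist - 1/depth|, scaled by aperture, clipped to a maximum.
    r = aperture * np.abs(1.0 / focal_dist - 1.0 / depth)
    return np.clip(r, 0.0, max_radius)

def shallow_dof(image, depth, focal_dist, aperture):
    # Naive gather-style blur: each output pixel averages input pixels
    # within a square window sized by its circle-of-confusion radius.
    h, w = image.shape[:2]
    radii = coc_radius(depth, focal_dist, aperture)
    out = np.empty_like(image, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            r = int(round(radii[y, x]))
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = image[y0:y1, x0:x1].mean(axis=(0, 1))
    return out
```

    With a depth map equal to the focal distance everywhere, the radius is zero and the image passes through unchanged; pixels far from the focal plane are averaged over progressively larger neighborhoods. The paper replaces this hand-coded gather with a trained network, which avoids the boundary bleeding such naive filtering produces.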


