Sampling based scene-space video processing

Many compelling video processing effects can be achieved if per-pixel depth information and 3D camera calibrations are known. However, the success of such methods is highly dependent on the accuracy of this “scene-space” information. We present a novel, sampling-based framework for processing video that enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation. Instead of trying to improve the explicit 3D scene representation, the key idea of our method is to exploit the high redundancy of approximate scene information that arises because most scene points are visible multiple times across many frames of video. Based on this observation, we propose a novel pixel gathering and filtering approach. The gathering step is general and collects pixel samples in scene-space, while the filtering step is application-specific and computes a desired output video from the gathered sample sets. Our approach is easily parallelizable and has been implemented on the GPU, allowing us to take full advantage of large volumes of video data and facilitating practical runtimes on HD video using a standard desktop computer. Our generic scene-space formulation is able to comprehensively describe a multitude of video processing applications such as denoising, deblurring, super-resolution, object removal, computational shutter functions, and other scene-space camera effects. We present results for various casually captured, hand-held, moving, compressed, monocular videos depicting challenging scenes recorded in uncontrolled environments.
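The gather-then-filter idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the camera model (a pinhole camera with no rotation, translated only along x), the dictionary layout of frames, and the names `unproject`, `project`, `gather`, and `filter_denoise` are all assumptions made for illustration. The filter shown is a simple distance-weighted average standing in for a denoising filter; the paper's application-specific filters are not reproduced here.

```python
import math

def unproject(u, v, depth, cam):
    """Lift pixel (u, v) with known depth to a 3D world point.
    Assumed camera: pinhole, no rotation, centered at (tx, 0, 0)."""
    x = (u - cam["cx"]) * depth / cam["f"] + cam["tx"]
    y = (v - cam["cy"]) * depth / cam["f"]
    return (x, y, depth)

def project(point, cam):
    """Project a 3D world point into the image plane of `cam`."""
    x, y, z = point
    u = cam["f"] * (x - cam["tx"]) / z + cam["cx"]
    v = cam["f"] * y / z + cam["cy"]
    return (u, v)

def gather(frames, out_cam, out_pixel, radius):
    """Gathering step: collect (color, distance) samples from every frame
    whose scene-space point reprojects within `radius` of `out_pixel`."""
    samples = []
    for frame in frames:
        for (u, v), (color, depth) in frame["pixels"].items():
            pu, pv = project(unproject(u, v, depth, frame["cam"]), out_cam)
            dist = math.hypot(pu - out_pixel[0], pv - out_pixel[1])
            if dist <= radius:
                samples.append((color, dist))
    return samples

def filter_denoise(samples, sigma):
    """Filtering step (one hypothetical choice): Gaussian-weighted mean,
    weighting each sample by its reprojection distance."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for _, d in samples]
    total = sum(weights)
    if total == 0.0:
        return None  # no usable samples for this output pixel
    return sum(w * c for w, (c, _) in zip(weights, samples)) / total

# Two frames observe the same scene point (0, 0, 10) from different positions.
cam_a = {"f": 100.0, "cx": 50.0, "cy": 50.0, "tx": 0.0}
cam_b = {"f": 100.0, "cx": 50.0, "cy": 50.0, "tx": 1.0}
frames = [
    {"cam": cam_a, "pixels": {(50, 50): (0.4, 10.0)}},
    # The second pixel in frame B belongs to a different scene point.
    {"cam": cam_b, "pixels": {(40, 50): (0.6, 10.0), (70, 50): (0.9, 10.0)}},
]

samples = gather(frames, out_cam=cam_a, out_pixel=(50.0, 50.0), radius=1.5)
result = filter_denoise(samples, sigma=1.0)
```

In this sketch the gathering step does not depend on the effect being computed, while `filter_denoise` could be swapped for another filter over the same sample set, which mirrors the general/application-specific split the abstract describes. Each output pixel is processed independently, which is consistent with the abstract's remark that the approach parallelizes easily.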

SIGGRAPH 2015: Technical Papers

Session/Category Title: Video Processing
