“Sampling based scene-space video processing”

  • ©Felix Klose, Oliver Wang, Jean-Charles Bazin, Marcus A. Magnor, and Alexander Sorkine-Hornung

Session/Category Title: Video Processing


Abstract:


    Many compelling video processing effects can be achieved if per-pixel depth information and 3D camera calibrations are known. However, the success of such methods is highly dependent on the accuracy of this “scene-space” information. We present a novel, sampling-based framework for processing video that enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation. Instead of trying to improve the explicit 3D scene representation, the key idea of our method is to exploit the high redundancy of approximate scene information that arises due to most scene points being visible multiple times across many frames of video. Based on this observation, we propose a novel pixel gathering and filtering approach. The gathering step is general and collects pixel samples in scene-space, while the filtering step is application-specific and computes a desired output video from the gathered sample sets. Our approach is easily parallelizable and has been implemented on GPU, allowing us to take full advantage of large volumes of video data and facilitating practical runtimes on HD video using a standard desktop computer. Our generic scene-space formulation is able to comprehensively describe a multitude of video processing applications such as denoising, deblurring, super resolution, object removal, computational shutter functions, and other scene-space camera effects. We present results for various casually captured, hand-held, moving, compressed, monocular videos depicting challenging scenes recorded in uncontrolled environments.
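The gather-then-filter idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the gathering geometry or the filter weights, so everything below is an assumption made for illustration. The sketch assumes a pinhole camera with shared intrinsics `K`, poses in the convention x_cam = R x_world + t, nearest-pixel lookup, and a simple depth-agreement test. The function names (`backproject`, `project`, `gather`, `filter_denoise`), the frame dictionary layout, and the `depth_tol` parameter are all hypothetical. Gathering reprojects one reference pixel into every frame and collects the color samples whose stored depth roughly agrees; the filter shown is a weighted mean, as one might use for denoising.

```python
import numpy as np

def backproject(u, v, depth, K, R, t):
    """Lift pixel (u, v) at the given depth to a world-space point.
    Assumed pose convention: x_cam = R @ x_world + t."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    x_cam = ray * depth
    return R.T @ (x_cam - t)

def project(x_world, K, R, t):
    """Project a world point into a camera; returns (u, v, depth) or None."""
    x_cam = R @ x_world + t
    if x_cam[2] <= 0:  # point is behind the camera
        return None
    uvw = K @ x_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2], x_cam[2]

def gather(u, v, ref, frames, K, depth_tol=0.05):
    """Collect color samples for reference pixel (u, v) from all frames.
    A sample is kept only if the frame's stored depth agrees with the
    reprojected depth to within a relative tolerance (hypothetical test)."""
    p = backproject(u, v, ref["depth"][v, u], K, ref["R"], ref["t"])
    colors, weights = [], []
    for f in frames:
        proj = project(p, K, f["R"], f["t"])
        if proj is None:
            continue
        x, y, z = proj
        xi, yi = int(round(x)), int(round(y))
        h, w = f["depth"].shape
        if not (0 <= xi < w and 0 <= yi < h):
            continue
        err = abs(f["depth"][yi, xi] - z)
        if err > depth_tol * z:  # likely occluded or badly estimated
            continue
        colors.append(f["color"][yi, xi])
        # Down-weight samples whose depth agrees less well.
        weights.append(np.exp(-(err / (depth_tol * z)) ** 2))
    return np.array(colors), np.array(weights)

def filter_denoise(colors, weights):
    """Application-specific filter: here, a weighted mean of the samples."""
    if len(colors) == 0:
        return None
    return (weights[:, None] * colors).sum(axis=0) / weights.sum()
```

In this sketch only `filter_denoise` would change between applications; the gathering step stays the same, which mirrors the split between a general gathering step and an application-specific filtering step described above.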
