“Joint Stabilization and Direction of 360 Degree Videos” by Tang, Wang, Liu and Tan

  • ©Chengzhou Tang, Oliver Wang, Feng Liu, and Ping Tan






Session Title: Video


    Three-hundred-sixty-degree (360°) video provides an immersive experience for viewers, allowing them to freely explore the world by turning their heads. However, creating high-quality 360° video content can be challenging, as viewers may miss important events by looking in the wrong direction, or they may see things that ruin the immersion, such as stitching artifacts and the film crew. We take advantage of the fact that not all directions are equally likely to be observed; due to ergonomic constraints, most viewers are more likely to see content located at “true north,” i.e., in front of them. We therefore propose 360° video direction, where the video is jointly optimized to orient important events to the front of the viewer and visual clutter behind them, while producing smooth camera motion. Unlike with traditional video, viewers can still explore the space as desired, but with the knowledge that the most important content is likely to be in front of them. Constraints can be user guided, either added directly on the equirectangular projection or obtained by recording “guidance” viewing directions while watching the video in a VR headset, or they can be computed automatically, such as via visual saliency or forward-motion direction. To accomplish this, we propose a new motion estimation technique specifically designed for 360° video that outperforms the commonly used five-point algorithm on wide-angle video. We additionally formulate the direction problem as an optimization where a novel parametrization of spherical warping allows us to correct for some degree of parallax effects. We compare our approach to recent methods that address stabilization only and to methods that convert 360° video to narrow field-of-view video. Our pipeline can also enable the viewing of wide-angle non-360° footage in a spherical 360° space, giving an immersive “virtual cinema” experience for a wide range of existing content filmed with first-person cameras.
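    As a rough illustration of what it means to orient content toward “true north” on an equirectangular frame, the following minimal Python sketch maps a pixel to a viewing direction on the unit sphere and computes the yaw rotation that would bring that direction to the front. It is not the paper's method, which jointly optimizes stabilization and direction with a spherical warp; the function names, the axis convention (front is +z, up is +y), and the restriction to a single per-frame yaw angle are all assumptions made for this example.

```python
import math


def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction (x, y, z).

    Assumed convention (not from the paper): the image centre is the
    front of the viewer (+z), longitude grows to the right, and
    latitude grows upward (+y).
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi]
    lat = (0.5 - v / height) * math.pi        # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def yaw_to_front(direction):
    """Yaw angle (radians) that moves `direction` to longitude zero."""
    x, _, z = direction
    return -math.atan2(x, z)


def rotate_yaw(direction, angle):
    """Rotate a direction about the vertical axis, adding `angle` to its longitude."""
    x, y, z = direction
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


# Hypothetical guidance constraint: an event a quarter turn to the right.
event = equirect_to_direction(1500, 300, 2000, 1000)
fronted = rotate_yaw(event, yaw_to_front(event))
```

    Applying such a yaw independently to every frame would generally produce jittery motion; in the approach described above, the orientation is instead solved for jointly with smoothness of the camera path.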


