“An integrated 6DoF video camera and system design” by Pozo, Toksvig, Schrager, Hsu, Mathur, et al. …
Title:
- An integrated 6DoF video camera and system design
Session/Category Title: Light Hardware
Presenter(s)/Author(s):
- Albert Parra Pozo
- Michael Toksvig
- Terry Filiba Schrager
- Joyce Hsu
- Uday Mathur
- Alexander Sorkine-Hornung
- Richard Szeliski
- Brian Cabral
Abstract:
Designing a fully integrated 360° video camera supporting 6DoF head motion parallax requires overcoming many technical hurdles, including camera placement, optical design, sensor resolution, system calibration, real-time video capture, depth reconstruction, and real-time novel view synthesis. While there is a large body of work describing various system components, such as multi-view depth estimation, our paper is the first to describe a complete, reproducible system that considers the challenges arising when designing, building, and deploying a full end-to-end 6DoF video camera and playback environment. Our system includes a computational imaging software pipeline supporting online markerless calibration, high-quality reconstruction, and real-time streaming and rendering. Most of our exposition is based on a professional 16-camera configuration, which will be commercially available to film producers. However, our software pipeline is generic and can handle a variety of camera geometries and configurations. The entire calibration and reconstruction software pipeline along with example datasets is open sourced to encourage follow-up research in high-quality 6DoF video reconstruction and rendering.


