“Scanner: efficient video analysis at scale” by Poms, Crichton, Hanrahan and Fatahalian

  • ©Alex Poms, Will Crichton, Patrick (Pat) Hanrahan, and Kayvon Fatahalian



Entry Number: 138

Session Title:

    Pipelines and Languages for the GPU


    Scanner: efficient video analysis at scale




    A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel processing across large numbers of machines. Few programmers have the capability to operate efficiently at these scales, limiting the field’s ability to explore new applications that leverage big video data. In response, we have created Scanner, a system for productive and efficient video analysis at scale. Scanner organizes video collections as tables in a data store optimized for sampling frames from compressed video, and executes pixel processing computations, expressed as dataflow graphs, on these frames. Scanner schedules video analysis applications expressed using these abstractions onto heterogeneous throughput computing hardware, such as multi-core CPUs, GPUs, and media processing ASICs, for high-throughput pixel processing. We demonstrate the productivity of Scanner by authoring a variety of video processing applications including the synthesis of stereo VR video streams from multi-camera rigs, markerless 3D human pose reconstruction from video, and data-mining big video datasets such as hundreds of feature-length films or over 70,000 hours of TV news. These applications achieve near-expert performance on a single machine and scale efficiently to hundreds of machines, enabling formerly long-running big video data analysis tasks to be carried out in minutes to hours.
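    The abstract's two abstractions (videos stored as tables of frames, and pixel processing expressed as a dataflow graph run on sampled frames) can be pictured with a toy sketch. This is not Scanner's API: every name below (`FrameTable`, `Op`, `run_graph`) is hypothetical, the "video" is a synthetic list of tiny frames, and the graph runs serially in one process rather than being scheduled across CPUs, GPUs, or machines.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# All names here are hypothetical illustrations, not Scanner's real interface.


@dataclass
class FrameTable:
    """A video modeled as a table: one row per frame, indexed by frame number."""
    name: str
    frames: List[List[int]]  # each frame is a flat list of pixel intensities

    def sample(self, stride: int) -> List[int]:
        """Return the row indices chosen by taking every `stride`-th frame."""
        return list(range(0, len(self.frames), stride))


@dataclass
class Op:
    """A node in the dataflow graph: a function of its upstream outputs."""
    name: str
    fn: Callable[..., object]
    inputs: Sequence[str]  # names of upstream ops, or "frame" for the source


def run_graph(table: FrameTable, rows: List[int],
              ops: List[Op]) -> Dict[int, Dict[str, object]]:
    """Evaluate ops (given in topological order) on each sampled row."""
    results = {}
    for row in rows:
        values = {"frame": table.frames[row]}
        for op in ops:
            values[op.name] = op.fn(*(values[i] for i in op.inputs))
        results[row] = {op.name: values[op.name] for op in ops}
    return results


# A tiny synthetic "video": 6 frames of 4 pixels each.
video = FrameTable("demo", [[i, i + 1, i + 2, i + 3] for i in range(6)])

# Two-stage graph: mean brightness per frame, then a threshold on it.
graph = [
    Op("brightness", lambda f: sum(f) / len(f), ["frame"]),
    Op("is_bright", lambda b: b > 3.0, ["brightness"]),
]

# Process every second frame only, mimicking sparse frame sampling.
output = run_graph(video, video.sample(stride=2), graph)
```

    Keeping the sampling step separate from the graph is the point of the sketch: the same graph can be reused on a different selection of rows without changing any op.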


