“Roto++: accelerating professional rotoscoping using shape manifolds”

Wenbin Li, Fabio Viola, Jonathan Starck, Gabriel J. Brostow, and Neill D. F. Campbell






Session Title: CAMERA CONTROL & VR



    Rotoscoping (cutting out different characters/objects/layers in raw video footage) is a ubiquitous task in modern post-production and represents a significant investment in person-hours. In this work, we study the particular task of professional rotoscoping for high-end, live action movies and propose a new framework that works with roto-artists to accelerate the workflow and improve their productivity. Working with the existing keyframing paradigm, our first contribution is the development of a shape model that is updated as artists add successive keyframes. This model is used to improve the output of traditional interpolation and tracking techniques, reducing the number of keyframes that need to be specified by the artist. Our second contribution is to use the same shape model to provide a new interactive tool that allows an artist to reduce the time spent editing each keyframe. The more keyframes that are edited, the better the interactive tool becomes, accelerating the process and making the artist more efficient without compromising their control. Finally, we also provide a new, professionally rotoscoped dataset that enables truly representative, real-world evaluation of rotoscoping methods. We used this dataset to perform a number of experiments, including an expert study with professional roto-artists, to show, quantitatively, the advantages of our approach.
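The core idea of updating a shape model from artist keyframes and interpolating in its latent space can be illustrated with a deliberately simplified sketch. The model below is a plain linear shape space (mean plus principal components via SVD), not the paper's actual manifold model; the function names, the contour representation as an (N, 2) point array, and the two-keyframe blend are all illustrative assumptions.

```python
import numpy as np

def fit_shape_model(keyframes):
    """Fit a linear shape space (mean + principal directions) to a list of
    keyframe contours, each an (N, 2) array of N corresponding points."""
    X = np.stack([k.reshape(-1) for k in keyframes])  # (K, 2N) data matrix
    mean = X.mean(axis=0)
    # SVD of the centred data gives the principal shape directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt

def project(contour, mean, Vt):
    """Latent coordinates of a contour in the fitted shape space."""
    return Vt @ (contour.reshape(-1) - mean)

def interpolate(contour_a, contour_b, t, mean, Vt):
    """Blend two keyframe contours by interpolating their latent codes,
    so in-between shapes stay on the learned shape space."""
    z = (1 - t) * project(contour_a, mean, Vt) + t * project(contour_b, mean, Vt)
    return (mean + Vt.T @ z).reshape(-1, 2)
```

As more keyframes are edited, refitting the model enlarges the span of plausible shapes, which is the mechanism by which the interactive tool improves over a session; the paper's model is richer than this linear sketch, but the keyframe-driven update loop is the same in spirit.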


