“Video tooning” by Wang, Xu, Shum and Cohen

  • ©Jue Wang, Yingqing Xu, Heung-Yeung Shum, and Michael F. Cohen




    Video tooning



    We describe a system for transforming an input video into a highly abstracted, spatio-temporally coherent cartoon animation with a range of styles. To achieve this, we treat video as a space-time volume of image data. We have developed an anisotropic kernel mean shift technique to segment the video data into contiguous volumes. These provide a simple cartoon style in themselves, but more importantly provide the capability to semi-automatically rotoscope semantically meaningful regions. In our system, the user simply outlines objects on keyframes. A mean shift guided interpolation algorithm is then employed to create three-dimensional semantic regions by interpolation between the keyframes, while maintaining smooth trajectories along the time dimension. These regions provide the basis for creating smooth two-dimensional edge sheets and stroke sheets embedded within the spatio-temporal video volume. The regions, edge sheets, and stroke sheets are rendered by slicing them at particular times. A variety of styles of rendering are shown. The temporal coherence provided by the smoothed semantic regions and sheets results in a temporally consistent non-photorealistic appearance.
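    The idea of segmenting a space-time volume and then rendering by slicing it at a given time can be illustrated with a small sketch. This is not the system described above: it uses plain mean shift with a flat, isotropic kernel rather than the anisotropic kernel, and the function names (mean_shift_modes, label_modes), the toy video, and the spatial_scale, bandwidth, and merge_radius values are all assumptions chosen for illustration.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=50, tol=1e-6):
    """Move each point toward a local density mode using a flat (uniform) kernel."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Distances from every current mode estimate to every data point.
        dists = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)  # 1 inside the window, 0 outside
        shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.max(np.linalg.norm(shifted - modes, axis=1)) < tol
        modes = shifted
        if converged:
            break
    return modes

def label_modes(modes, merge_radius):
    """Give the same label to points whose modes lie within merge_radius."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) <= merge_radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Toy video (assumed data): 4 frames of 6x6 pixels, a bright 2x2 square drifting right.
T, H, W = 4, 6, 6
video = np.zeros((T, H, W))
for t in range(T):
    video[t, 2:4, t:t + 2] = 1.0

# One feature vector per voxel: down-weighted (x, y, t) position plus intensity.
t_idx, y_idx, x_idx = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
spatial_scale = 0.05  # assumed weight of position relative to intensity
features = np.stack(
    [x_idx * spatial_scale, y_idx * spatial_scale, t_idx * spatial_scale, video],
    axis=-1,
).reshape(-1, 4)

# Segment the whole volume at once, then "render" one frame by slicing at time t = 2.
modes = mean_shift_modes(features, bandwidth=0.5)
segmentation = label_modes(modes, merge_radius=0.1).reshape(T, H, W)
frame_2 = segmentation[2]
```

    Because the labels are assigned to the volume as a whole, the moving square keeps a single label in every frame, which is the kind of temporal coherence that per-frame segmentation does not guarantee.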


    1. AGARWALA, A. 2002. Snaketoonz: A semi-automatic approach to creating cel animation from video. In Proceedings of NPAR 2002.
    2. BELONGIE, S., MALIK, J., AND PUZICHA, J. 2002. Shape matching and object recognition using shape contexts. IEEE Trans. on Pattern Analysis and Machine Intelligence 24, 4, 509–522.
    3. CHRISTOUDIAS, C., GEORGESCU, B., AND MEER, P. 2002. Synergism in low-level vision. In Proc. of 16th International Conference on Pattern Recognition, 150–155.
    4. COLLOMOSSE, J. P., ROWNTREE, D., AND HALL, P. M. 2003. Stroke surfaces: A spatio-temporal framework for temporally coherent non-photorealistic animations. University of Bath, Technical Report CSBU 2003-01 (June 2003).
    5. COMANICIU, D., AND MEER, P. 2002. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Analysis and Machine Intelligence 24, 5, 603–619.
    6. COMANICIU, D., RAMESH, V., AND MEER, P. 2000. Real-time tracking of non-rigid objects using mean shift. In Proc. of IEEE Conf. on Comp. Vis. and Pat. Rec. (CVPR'00), 142–151.
    7. DECARLO, D., AND SANTELLA, A. 2002. Stylization and abstraction of photographs. In Proceedings of SIGGRAPH 2002, 769–776.
    8. DEMENTHON, D. 2002. Spatio-temporal segmentation of video by hierarchical mean shift analysis. In Proc. of Statistical Methods in Video Processing Workshop.
    9. FLOATER, M. S. 2003. Mean value coordinates. Computer Aided Geometric Design 20, 19–27.
    10. FUKUNAGA, K., AND HOSTETLER, L. 1975. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Information Theory 21, 32–40.
    11. HERTZMANN, A., AND PERLIN, K. 2000. Painterly rendering for video and interaction. In Proceedings of NPAR 2000, 7–12.
    12. HERTZMANN, A. 2001. Paint by relaxation. In Proc. Computer Graphics International 2001, 47–54.
    13. HOCH, M., AND LITWINOWICZ, P. C. 1996. A semi-automatic system for edge tracking with snakes. The Visual Computer 12, 2, 75–83.
    14. HSU, S. C., AND LEE, I. H. H. 1994. Drawing and animation using skeletal strokes. In Proceedings Computer Graphics (ACM SIGGRAPH), ACM Press, 109–118.
    15. JONKER, R., AND VOLGENANT, A. 1987. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38, 325–340.
    16. KASS, M., WITKIN, A., AND TERZOPOULOS, D. 1987. Snakes: Active contour models. International Journal of Computer Vision 1, 4, 321–331.
    17. KLEIN, A. W., SLOAN, P.-P. J., FINKELSTEIN, A., AND COHEN, M. F. 2002. Stylized video cubes. In Proceedings of SCA 2002.
    18. KORT, A. 2002. Computer aided inbetweening. In NPAR 2002: Second International Symposium on Non Photorealistic Rendering, 125–132.
    19. LINKLATER, R. 2001. Waking Life DVD. Twentieth Century Fox Home Video.
    20. LITWINOWICZ, P. 1997. Processing images and video for an impressionist effect. In Proceedings of SIGGRAPH 1997, ACM Press / ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM, 151–158.
    21. LORENSEN, W., AND CLINE, H. 1987. Marching cubes: a high resolution 3D surface reconstruction algorithm. In Proceedings of SIGGRAPH 1987, 163–169.
    22. WANG, J., THIESSON, B., XU, Y., AND COHEN, M. F. 2004. Image and video segmentation by anisotropic kernel mean shift. In Proc. European Conference on Computer Vision, 2004.
