“3D position, attitude and shape input using video tracking of hands and lips” by Blake and Isard

  © Andrew Blake and Michael Isard







    Recent developments in video tracking allow the outlines of moving, natural objects in a video-camera input stream to be tracked live, at full video rate. Previous systems have been able to do this for specially illuminated objects or for naturally illuminated but polyhedral objects. Other systems have been able to track non-polyhedral objects in motion, in some cases from live video, but they follow only centroids or key points rather than whole curves. The system described here can accurately track the curved silhouettes of moving non-polyhedral objects (for example hands, lips, legs, vehicles and fruit) at frame rate, without any special hardware beyond a desktop workstation, a video camera and a framestore. The new algorithms are a synthesis of methods from deformable models, B-spline curve representation and control theory. This paper shows how such a facility can be used to turn parts of the body, for instance hands and lips, into input devices. Rigid motion of a hand can be used as a 3D mouse, with non-rigid gestures signalling a button press or the “lifting” of the mouse. Both rigid and non-rigid motions of lips can be tracked independently and used as inputs, for example to animate a computer-generated face.
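    The abstract names B-spline curve representation and control theory as ingredients but gives no formulas. The following is a minimal, hypothetical sketch under stated assumptions, not the authors' algorithm: a closed outline is represented as a periodic uniform quadratic B-spline, and each control-point coordinate is smoothed over time by an independent constant-velocity Kalman filter, a standard estimator from control theory. The names `bspline_point`, `ConstVelKalman1D` and `ContourTracker`, and the noise parameters `q` and `r`, are illustrative assumptions.

```python
"""Hypothetical sketch: a closed quadratic B-spline contour whose control
points are smoothed over time by constant-velocity Kalman filters.
Names and parameters are illustrative assumptions, not from the paper."""

import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


def bspline_point(ctrl: List[Point], s: float) -> Point:
    """Evaluate a closed (periodic) uniform quadratic B-spline at parameter s.

    s runs over [0, N) for N control points; span i blends Q_i, Q_{i+1},
    Q_{i+2}, with indices taken modulo N so the curve closes on itself.
    """
    n = len(ctrl)
    k = math.floor(s)
    i = k % n
    t = s - k  # local parameter within the span, in [0, 1)
    # Quadratic uniform B-spline blending functions; they sum to 1.
    w0 = 0.5 * (1 - t) ** 2
    w1 = 0.5 + t - t * t
    w2 = 0.5 * t * t
    q0, q1, q2 = ctrl[i], ctrl[(i + 1) % n], ctrl[(i + 2) % n]
    return (w0 * q0[0] + w1 * q1[0] + w2 * q2[0],
            w0 * q0[1] + w1 * q1[1] + w2 * q2[1])


@dataclass
class ConstVelKalman1D:
    """Kalman filter for one scalar coordinate, state = (position, velocity)."""
    q: float  # process-noise intensity (assumed value)
    r: float  # measurement-noise variance (assumed value)
    x: float = 0.0
    v: float = 0.0
    # Symmetric covariance P = [[p00, p01], [p01, p11]], initially vague.
    p00: float = 1e3
    p01: float = 0.0
    p11: float = 1e3

    def predict(self, dt: float) -> None:
        # State transition F = [[1, dt], [0, 1]]: position advances by v*dt.
        self.x += self.v * dt
        # P <- F P F^T + Q, with white-noise-acceleration process noise Q.
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p11 = self.p11
        self.p00 = p00 + self.q * dt ** 3 / 3
        self.p01 = p01 + self.q * dt ** 2 / 2
        self.p11 = p11 + self.q * dt

    def update(self, z: float) -> None:
        # The measurement observes position only: H = [1, 0].
        s = self.p00 + self.r  # innovation variance
        k0 = self.p00 / s      # gain applied to position
        k1 = self.p01 / s      # gain applied to velocity
        innov = z - self.x
        self.x += k0 * innov
        self.v += k1 * innov
        # P <- (I - K H) P
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11


class ContourTracker:
    """Filters each control-point coordinate independently (a simplification)."""

    def __init__(self, initial: List[Point], q: float = 1.0, r: float = 4.0):
        self.filters = [(ConstVelKalman1D(q, r, x=px), ConstVelKalman1D(q, r, x=py))
                        for px, py in initial]

    def step(self, measured: List[Point], dt: float = 1.0) -> List[Point]:
        """Predict, then correct with measured control points; return estimates."""
        out = []
        for (fx, fy), (mx, my) in zip(self.filters, measured):
            fx.predict(dt)
            fx.update(mx)
            fy.predict(dt)
            fy.update(my)
            out.append((fx.x, fy.x))
        return out
```

    For a square of control points `[(0, 0), (2, 0), (2, 2), (0, 2)]`, `bspline_point(ctrl, 0.0)` returns `(1.0, 0.0)`, the midpoint of the first two control points, as expected for a quadratic B-spline at a span boundary. A working tracker would also need a measurement step that searches the image for edges near the predicted curve and produces the `measured` control points; that step is omitted here.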



Published in the ACM Digital Library.