“Seeing is believing: body motion dominates in multisensory conversations” by Ennis, McDonnell and O’Sullivan

    In many scenes with human characters, interacting groups are an important factor in maintaining a sense of realism. However, little is known about what makes these characters appear realistic. In this paper, we investigate human sensitivity to audio mismatches (i.e., when individuals’ voices are not matched to their gestures) and visual desynchronization (i.e., when the body motions of the individuals in a group are misaligned in time) in virtual human conversers. Using motion capture data from a range of both polite conversations and arguments, we conduct a series of perceptual experiments and determine some factors that contribute to the plausibility of virtual conversing groups. We found that participants are more sensitive to visual desynchronization of body motions than to mismatches between the characters’ gestures and their voices. Furthermore, synthetic conversations can appear sufficiently realistic provided there is an appropriate balance between talker and listener roles, regardless of body motion desynchronization or mismatched audio.


