“WordsEye: an automatic text-to-scene conversion system” by Coyne and Sproat

  • ©Bob Coyne and Richard Sproat





    Natural language is an easy and effective medium for describing visual ideas and mental images. Thus, we foresee the emergence of language-based 3D scene generation systems to let ordinary users quickly create 3D scenes without having to learn special software, acquire artistic skills, or even touch a desktop window-oriented interface. WordsEye is such a system for automatically converting text into representative 3D scenes. WordsEye relies on a large database of 3D models and poses to depict entities and actions. Every 3D model can have associated shape displacements, spatial tags, and functional properties to be used in the depiction process. We describe the linguistic analysis and depiction techniques used by WordsEye along with some general strategies by which more abstract concepts are made depictable.


