“Crafting 3D Worlds for Intelligent Vision: generation, reconstruction, and interaction” by Guo
Conference:
Type(s):
Title:
- Crafting 3D Worlds for Intelligent Vision: generation, reconstruction, and interaction
Presenter(s)/Author(s):
Abstract:
AI is experiencing a pivotal shift from the language-centric intelligence of LLMs to the visual and spatial intelligence enabled by world models, which are key for general intelligence. Modeling dynamic 3D environments allows agents to perceive and interact with complex worlds, benefiting embodied robotics, autonomous systems, and related applications.
This talk will focus on the development and future of world models from a 3D vision perspective, introducing the Hunyuan World model series. Defining a “world” as an interactive 3D scene, we will explore our research across three core areas: world generation, reconstruction, and interaction.
The presentation will trace our work’s progression, beginning with Hunyuan World 1.0 for immersive 3D scene generation and our subsequent enhancements that dramatically accelerate generation speed and expand the scale of explorable 3D worlds. We will then introduce WorldMirror, our unified model designed for feed-forward, high-fidelity 3D reconstruction from varied input settings. Finally, we will demonstrate WorldPlay, our model for enabling dynamic world interaction with long-term 3D memory at real-time latency. Together, the Hunyuan World series charts a path from static generation to dynamic interaction, representing a significant step in world-level 3D content creation and providing strong baselines for future research in this field.


