“Towards AI-Driven 3D Creation at the Speed of Thought” by Huang – ACM SIGGRAPH HISTORY ARCHIVES


Conference:

    SIGGRAPH 2025
Type(s):

    Talk
Title:

    Towards AI-Driven 3D Creation at the Speed of Thought

Session/Category Title:

    ML in Production

Presenter(s)/Author(s):

    Ian Huang
Moderator(s):



Abstract:


    Advances in generative AI have enabled the creation of compelling stories, images, 3D assets, and videos, but integrating these models into professional 3D workflows remains a challenge. In many cases, AI-generated content is most valuable when it remains editable, allowing artists, engineers, and designers to refine and adapt it within existing tools. My research at Stanford has explored the extent to which today’s generative AI models can use 3D graphics tools like Blender. Across multiple projects, we’ve examined the capabilities and limitations of large vision-language models (VLMs) in automating tedious editing workflows and generating structured 3D scenes. I will summarize key insights from our work, including: (1) the first VLM framework for completing spatial graphical tasks using the visual reasoning capabilities of VLMs, along with strategies for improving performance without fine-tuning, such as inference scaling and verifier-guided generation; (2) the first benchmark for evaluating how well foundation models understand and execute 3D editing tasks, and the new scaling laws that such a benchmark enables us to study; and (3) how VLMs can place objects within 3D scenes while respecting both common-sense semantics and geometric constraints; our method achieves 2X lower error than prior state-of-the-art methods and produces placements that are preferred more than 4X as often. A central theme of this talk is that the knowledge encoded in existing creative tools is valuable: AI should learn to leverage these tools rather than replace them. Doing so will enable models to become powerful assistants within our 3D workflows while preserving editability and control.

    Finally, I will reflect on the open research questions in this space and discuss the exciting applications that AI 3D tool use could unlock across industries like robotics, game design, VFX, and industrial design, where AI will be able to solve complicated design optimization problems directly within simulation.
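    As a rough illustration of the verifier-guided generation strategy mentioned in the abstract, the sketch below implements a generic best-of-N selection loop: sample several candidate edits and keep the one a verifier scores highest. The `generate` and `verify` functions are hypothetical stand-ins (a real system would call a VLM to propose edits and score renders against a target); this is not the talk's actual implementation.

    ```python
    import random

    def generate(prompt, rng):
        # Hypothetical stand-in for a VLM proposing one candidate edit;
        # here the "quality" of each candidate is just a random score.
        return {"edit": f"candidate for {prompt!r}", "quality": rng.random()}

    def verify(candidate):
        # Hypothetical stand-in for a verifier (e.g., comparing a render
        # of the edited scene against the target image).
        return candidate["quality"]

    def best_of_n(prompt, n=8, seed=0):
        """Sample n candidates and return the one the verifier ranks highest."""
        rng = random.Random(seed)
        candidates = [generate(prompt, rng) for _ in range(n)]
        return max(candidates, key=verify)

    best = best_of_n("brighten the key light in the Blender scene")
    ```

    Spending more samples per task (larger `n`) is one simple form of the inference scaling the abstract refers to: quality improves at test time without fine-tuning the underlying model.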

References:


    [1] Yunqi Gu, Ian Huang, Jihyeon Je, Guandao Yang, and Leonidas Guibas. 2025. BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
    [2] Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, and Alireza Fathi. 2025. FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
    [3] Ian Huang, Guandao Yang, and Leonidas Guibas. 2024. BlenderAlchemy: Editing 3D Graphics with Vision-Language Models. In European Conference on Computer Vision. Springer, 297–314.


ACM Digital Library Publication:



Overview Page:


