An open-source platform built on Unreal Engine 5 that generates interactive, physically-grounded 3D environments from natural language, with self- and co-evolution for embodied AI training.
¹University of California San Diego ²New York University *Equal contribution
Watch SimCoder generate diverse UE5 scenes from text and image prompts, and see trained agents navigate them.
SimWorld Studio introduces three tightly integrated components that together close the loop between environment generation and agent learning.
Generate diverse, physically-grounded UE5 environments directly from natural language or image prompts. SimCoder synthesizes executable scene code via MCP tool calls and verifies it through physics and VLM checks, so no manual scene design is required.
When SimCoder encounters repeated failures, it automatically converts them into generalizable capabilities, authoring new MCP tools and skill-library entries that persist and improve all future generation tasks.
A bidirectional feedback loop: embodied agent success rates inform the coding agent to progressively raise difficulty, creating an adaptive curriculum that trains agents across all 8 difficulty levels without manual tuning.
SimWorld Studio integrates a UE5 backend with a Python MCP bridge, dual verification, skill registry, and a Gymnasium compliance layer.
SimWorld Studio pipeline: SimCoder receives natural language / image instructions, calls MCP tool APIs to generate UE5 scene code, runs dual verification (rule-based + VLM), self-evolves on failures, and exports a Gymnasium environment for downstream agent training.
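The dual verification step in the pipeline above can be illustrated with a minimal Python sketch. It assumes objects are approximated by axis-aligned bounding boxes and that the VLM is reachable through a plain callable. The names `Box`, `dual_verify`, and `fake_scorer`, and both thresholds, are hypothetical; the source does not state how SimWorld Studio implements or tunes these checks.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable

# Assumed acceptance thresholds; the source does not state the real criteria.
MIN_COLLISION_FREE = 1.0
MIN_VLM_SCORE = 0.7


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box, a hypothetical stand-in for a placed asset."""
    name: str  # assumed unique within a scene
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes interpenetrate on all three axes (touching is allowed)."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i]
        for i in range(3)
    )


def collision_free_rate(boxes: list[Box]) -> float:
    """Fraction of objects that do not overlap any other object."""
    colliding: set[str] = set()
    for a, b in combinations(boxes, 2):
        if overlaps(a, b):
            colliding.update((a.name, b.name))
    return 1.0 if not boxes else 1.0 - len(colliding) / len(boxes)


def dual_verify(
    boxes: list[Box],
    screenshot: bytes,
    prompt: str,
    vlm_scorer: Callable[[bytes, str], float],
) -> tuple[bool, str]:
    """Run the cheap rule-based check first, then the (expensive) VLM check."""
    rate = collision_free_rate(boxes)
    if rate < MIN_COLLISION_FREE:
        return False, f"rule-based check failed: collision-free rate {rate:.2f}"
    score = vlm_scorer(screenshot, prompt)
    if score < MIN_VLM_SCORE:
        return False, f"VLM check failed: score {score:.2f}"
    return True, "accepted"


def fake_scorer(screenshot: bytes, prompt: str) -> float:
    """Placeholder for a real VLM call; always returns a fixed score."""
    return 0.82


table = Box("table", (0, 0, 0), (2, 1, 1))
lamp = Box("lamp", (5, 5, 0), (5.5, 5.5, 2))
chair = Box("chair", (1.5, 0.5, 0), (2.5, 1.5, 1))  # overlaps the table

ok, reason = dual_verify([table, lamp], b"png", "a furnished room", fake_scorer)
rejected, why = dual_verify([table, lamp, chair], b"png", "a furnished room", fake_scorer)
```

Ordering the checks this way means a scene with interpenetrating objects is rejected before any VLM call is made, which is one plausible reason to pair a fast scalar check with a slower semantic one.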
SimCoder decomposes scene construction into a six-stage pipeline, using a growing library of skills and tools to achieve near-perfect physical validity while maintaining high semantic alignment with user intent.
SimCoder accepts a natural language (or image + text) prompt and orchestrates MCP tool calls, skill library composition, dual verification, and self-evolution to produce a physically valid UE5 scene.
UE5 function calls for asset placement, material assignment, and scene graph manipulation.
Reusable procedures (e.g. "furnish a room", "add outdoor lighting") composed for complex goals.
Collision-free rate, spatial validity, and object count checks give fast scalar feedback.
A vision-language model scores rendered screenshots for semantic alignment with the prompt.
Point-nav and object-nav tasks are derived automatically from the scene graph, with no annotation needed.
Each verified scene compiles into a standard Gymnasium environment with RGB-D, bearing, and distance observations.
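To make the exported observation channels concrete, here is a minimal sketch of a point-nav environment that follows the `reset`/`step` return shapes of the Gymnasium API without importing the library. The source names the channels (RGB-D, bearing, distance) but does not define them, so the conventions below are assumptions: bearing is the goal direction relative to the agent's heading, wrapped to [-pi, pi], and distance is Euclidean in scene units. `PointNavSketch`, its action format, and its reward are hypothetical, and RGB-D rendering is omitted.

```python
import math


def goal_observation(agent_xy, agent_yaw, goal_xy):
    """Return (distance, bearing) of the goal relative to the agent."""
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - agent_yaw
    # Wrap the angle difference back into [-pi, pi].
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, bearing


class PointNavSketch:
    """Duck-typed stand-in that mirrors Gymnasium's reset/step return shapes."""

    def __init__(self, start_xy, goal_xy, success_radius=0.5, max_steps=100):
        self.start_xy = start_xy
        self.goal_xy = goal_xy
        self.success_radius = success_radius
        self.max_steps = max_steps

    def _obs(self):
        distance, bearing = goal_observation(self.xy, self.yaw, self.goal_xy)
        # RGB-D frames would come from the renderer; omitted in this sketch.
        return {"distance": distance, "bearing": bearing}

    def reset(self, seed=None):
        """Return (observation, info), as Gymnasium's reset does."""
        self.xy, self.yaw, self.steps = self.start_xy, 0.0, 0
        return self._obs(), {}

    def step(self, action):
        """action = (turn in radians, forward distance in scene units)."""
        turn, forward = action
        self.yaw += turn
        self.xy = (
            self.xy[0] + forward * math.cos(self.yaw),
            self.xy[1] + forward * math.sin(self.yaw),
        )
        self.steps += 1
        obs = self._obs()
        terminated = obs["distance"] <= self.success_radius
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0  # assumed sparse success reward
        return obs, reward, terminated, truncated, {}


env = PointNavSketch(start_xy=(0.0, 0.0), goal_xy=(3.0, 4.0))
first_obs, info = env.reset()
# Turn to face the goal, then move forward by the full remaining distance.
obs, reward, terminated, truncated, info = env.step(
    (first_obs["bearing"], first_obs["distance"])
)
```

Because the agent turns by the observed bearing and advances by the observed distance, a single step lands it on the goal and the episode terminates with a success reward.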
Rather than discarding failed generation attempts, SimCoder extracts generalizable patterns and encodes them as permanent capabilities, expanding its skill repertoire autonomously over time.
Scene quality (VLM score) on the test set as SimCoder accumulates self-evolving examples. Each ablation condition is held fixed. Tools, verification, and self-evolution each contribute, and self-evolution delivers the largest single jump (+0.21, from 0.55 to 0.76).
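The idea of turning repeated failures into persistent capabilities can be sketched as a small registry that counts recurring failure patterns and promotes a fix to a reusable skill once the pattern recurs often enough. This is only an illustration under stated assumptions: `SkillRegistry`, the string-keyed patterns, and the `promote_after` threshold are hypothetical, and the source does not describe how SimCoder actually detects patterns or authors new tools.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SkillRegistry:
    """Hypothetical store that promotes recurring failure patterns to skills."""
    promote_after: int = 3  # assumed threshold; not stated in the source
    failures: Counter = field(default_factory=Counter)
    skills: dict[str, str] = field(default_factory=dict)

    def record_failure(self, pattern: str, fix: str) -> bool:
        """Count a failure; return True when its fix is promoted to a skill."""
        if pattern in self.skills:
            return False  # already covered by a persistent skill
        self.failures[pattern] += 1
        if self.failures[pattern] >= self.promote_after:
            self.skills[pattern] = fix
            return True
        return False


registry = SkillRegistry()
# The same failure pattern is seen three times, so the third call promotes it.
promotions = [
    registry.record_failure("floating_object", "snap_to_floor") for _ in range(3)
]
# A one-off failure is counted but not promoted.
one_off = registry.record_failure("missing_light", "add_default_light")
```

Requiring several occurrences before promotion is one simple way to keep one-off failures from bloating the skill library; a real system might instead judge generalizability with the model itself.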
Co-evolution closes the feedback loop: agent performance metrics guide SimCoder to generate environments of the right difficulty, producing an automatic curriculum without any manual reward shaping.
(a) Environment difficulty ramps across 8 levels as the agent improves. (b) Training dynamics: the co-evolving agent drops then recovers at each level transition. (c) Test performance on SimWorld-MMNav: co-evolution reaches 90% SR, an 18pp gain over fixed-environment training (72%) and a 40pp gain over the untrained baseline (50%).
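A success-rate-driven curriculum of the kind described above can be sketched as a simple level-update rule. The eight levels come from the text; everything else is an assumption made for illustration: the function name `next_level`, the promotion and demotion thresholds, and the demotion branch itself, since the source only mentions raising difficulty.

```python
def next_level(
    level: int,
    success_rate: float,
    promote_at: float = 0.8,  # assumed threshold
    demote_at: float = 0.3,   # assumed threshold; demotion is not in the source
    max_level: int = 8,
) -> int:
    """Raise difficulty when the agent succeeds often, lower it when it struggles."""
    if success_rate >= promote_at:
        return min(level + 1, max_level)
    if success_rate <= demote_at:
        return max(level - 1, 1)
    return level


level = 1
history = []
# Hypothetical per-round success rates reported by the embodied agent.
for success_rate in [0.85, 0.9, 0.5, 0.82, 0.2, 0.95]:
    level = next_level(level, success_rate)
    history.append(level)

print(history)  # [2, 3, 3, 4, 3, 4]
```

In a full loop, the returned level would be passed back to the coding agent as a generation constraint, so the next batch of environments matches the agent's current ability.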
@misc{kang2026simworldstudio,
title={SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning},
author={Haoqiang Kang and Xiaokang Ye and Yuhan Liu and Siddhant Hitesh Mantri and Lingjun Mao and James Fleming and Drishti Regmi and Lianhui Qin},
year={2026},
eprint={2605.09423},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.09423},
}
Acknowledgement. We sincerely thank Aria Lin (hl3353@columbia.edu) for her contributions to the related-work investigation, and Lingge Meng (lim021@ucsd.edu), Vinayak Sharma (v9sharma@ucsd.edu), and Ishan Vaish (ivaish@ucsd.edu) for their help with running exploratory experiments, recording demonstration videos, and supporting the development of the project.