SimWorld Studio:
Automatic Environment Generation
with Evolving Coding Agent
for Embodied Agent Learning

An open-source platform built on Unreal Engine 5 that generates interactive, physically-grounded 3D environments from natural language, with self- and co-evolution for embodied AI training.

Haoqiang Kang1,*, Xiaokang Ye1,*, Yuhan Liu2, Siddhant Hitesh Mantri1, Lingjun Mao1,
James Fleming1, Drishti Regmi1, Lianhui Qin1

1University of California San Diego    2New York University    *Equal contribution

0.98+ collision-free rate (physical validity)
+12pp cross-env transfer to SimWorld-MMNav
+18pp co-evolution SR gain over fixed curriculum

Environment Generation & Agent Training

Watch SimCoder generate diverse UE5 scenes from text and image prompts, and see trained agents navigate them.

Overview

LLM/VLM-based digital agents have advanced rapidly thanks to scalable sandboxes for coding, web navigation, and computer use, which provide rich interactive training grounds. In contrast, embodied agents still lack abundant, diverse, and automatically generated 3D environments for interactive learning. Existing embodied simulators rely on manually crafted scenes or procedural templates, while recent LLM-based 3D generation systems mainly produce static scenes rather than deployable environments with verifiable tasks and standard learning interfaces.

We introduce SimWorld Studio, an open-source platform built on Unreal Engine 5 for generating evolving embodied learning environments. At its core is SimCoder, a tool/skill-augmented coding agent that writes and executes engine-level code to construct physically grounded 3D worlds from language/image instructions. SimCoder self-evolves by using verifier feedback (e.g., compilation errors, physics checks, VLM critiques) to revise environments and autonomously add reusable tools and skills to its library. Generated worlds are exported as Gym-style environments for embodied agent learning.

SimWorld Studio further enables co-evolution between environment generation and embodied learning: agent performance feedback guides SimCoder to generate adaptive curricula near the learner's capability frontier, so that environments become increasingly challenging as the embodied agent improves. Three case studies on embodied navigation show that self-evolution improves generation reliability, generated environments substantially improve embodied agent performance that generalizes to unseen benchmarks, and co-evolution yields an 18-point success-rate gain over fixed-environment learning and a 40-point gain over an untrained agent.

Core Innovations

SimWorld Studio introduces three tightly integrated components that together close the loop between environment generation and agent learning.

Automatic Environment Generation

Generate diverse, physically-grounded UE5 environments directly from natural language or image prompts. SimCoder synthesizes executable scene code via MCP tool calls and verifies it through physics and VLM checks, with no manual scene design required.

Self-evolving SimCoder

When SimCoder encounters repeated failures, it automatically converts them into generalizable capabilities, authoring new MCP tools and skill-library entries that persist and improve all future generation tasks.

Co-Evolving Environment and Embodied Agent

A bidirectional feedback loop: embodied agent success rates inform the coding agent to progressively raise difficulty, creating an adaptive curriculum that trains agents across all 8 difficulty levels without manual tuning.

Platform Overview

SimWorld Studio integrates a UE5 backend with a Python MCP bridge, dual verification, a skill registry, and a Gymnasium compliance layer.

SimWorld Studio system architecture

SimWorld Studio pipeline: SimCoder receives natural language / image instructions, calls MCP tool APIs to generate UE5 scene code, runs dual verification (rule-based + VLM), self-evolves on failures, and exports a Gymnasium environment for downstream agent training.
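To make the dual-verification step in the pipeline concrete, here is a minimal sketch of a rule check followed by a VLM check. Everything in it is an illustrative assumption rather than the platform's actual code: the `Box` scene format, the definition of collision-free rate as the fraction of objects that overlap no other object, the `vlm_score` callable, and both thresholds are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one placed object (hypothetical scene format)."""
    name: str  # assumed unique within a scene
    min_xyz: tuple[float, float, float]
    max_xyz: tuple[float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes interpenetrate on all three axes; touching faces do not count."""
    return all(
        a.min_xyz[i] < b.max_xyz[i] and b.min_xyz[i] < a.max_xyz[i]
        for i in range(3)
    )


def collision_free_rate(boxes: Sequence[Box]) -> float:
    """Fraction of objects that overlap no other object (assumed definition)."""
    if not boxes:
        return 1.0
    colliding: set[str] = set()
    for a, b in combinations(boxes, 2):
        if overlaps(a, b):
            colliding.update((a.name, b.name))
    return 1.0 - len(colliding) / len(boxes)


def dual_verify(
    boxes: Sequence[Box],
    vlm_score: Callable[[], float],  # stand-in for a VLM scoring a rendered screenshot
    min_cfr: float = 0.98,           # assumed threshold
    min_vlm: float = 0.7,            # assumed threshold
) -> dict:
    """Run the cheap rule check first; call the VLM only if the rule check passes."""
    cfr = collision_free_rate(boxes)
    if cfr < min_cfr:
        return {"passed": False, "stage": "rule", "collision_free_rate": cfr}
    score = vlm_score()
    return {
        "passed": score >= min_vlm,
        "stage": "vlm",
        "collision_free_rate": cfr,
        "vlm_score": score,
    }


scene = [
    Box("table", (0, 0, 0), (2, 1, 1)),
    Box("chair", (1.5, 0, 0), (2.5, 1, 1)),  # interpenetrates the table
    Box("lamp", (5, 5, 0), (6, 6, 2)),
    Box("sofa", (8, 0, 0), (10, 1, 1)),
]
print(dual_verify(scene, vlm_score=lambda: 0.9))
```

Running the rule check first mirrors the "fast scalar feedback" role described for rule verification: a scene that fails on geometry never reaches the slower VLM call, and the returned dictionary tells the generator which stage to address in its revision.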

Generating Environments with Vibe Coding

SimCoder decomposes scene construction into a six-stage pipeline, using a growing library of skills and tools to achieve near-perfect physical validity while maintaining high semantic alignment with user intent.

SimCoder workflow: tool calling, skill library, verification, and self-evolution

SimCoder accepts a natural language (or image + text) prompt and orchestrates MCP tool calls, skill library composition, dual verification, and self-evolution to produce a physically valid UE5 scene.

1. MCP Tool Calling: UE5 function calls for asset placement, material assignment, and scene graph manipulation.
2. Skill Library: Reusable procedures (e.g. "furnish a room", "add outdoor lighting") composed for complex goals.
3. Rule Verification: Collision-free rate, spatial validity, and object count checks give fast scalar feedback.
4. VLM Verification: A vision-language model scores rendered screenshots for semantic alignment with the prompt.
5. Task Generation: Point-nav and object-nav tasks automatically derived from the scene graph, with no annotation needed.
6. Gymnasium Export: Each verified scene compiles into a standard Gym environment with RGB-D, bearing, and distance.
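To illustrate what the exported interface in the last stage might look like to a learner, the sketch below implements a toy point-navigation environment that follows the Gymnasium `reset`/`step` return shapes (`obs, info` and `obs, reward, terminated, truncated, info`) without importing the package. It is an assumption-laden stand-in: `ToyPointNavEnv`, its 2D dynamics, the reward, and `greedy_policy` are all hypothetical, bearing and distance are assumed to be observation fields, and RGB-D is omitted because it would need a renderer.

```python
import math
import random

TURN = math.pi / 6  # assumed turn increment (30 degrees)


class ToyPointNavEnv:
    """Hypothetical 2D stand-in for an exported scene with a point-nav task."""

    FORWARD, TURN_LEFT, TURN_RIGHT = 0, 1, 2

    def __init__(self, goal=(3.0, 0.0), step_size=0.5,
                 success_radius=0.3, max_steps=100):
        self.goal = goal
        self.step_size = step_size
        self.success_radius = success_radius
        self.max_steps = max_steps
        self.x = self.y = self.heading = 0.0
        self.steps = 0

    def _obs(self):
        dx, dy = self.goal[0] - self.x, self.goal[1] - self.y
        # Bearing: angle to the goal relative to the heading, wrapped to about [-pi, pi).
        raw = math.atan2(dy, dx) - self.heading
        bearing = (raw + math.pi) % (2 * math.pi) - math.pi
        return {"bearing": bearing, "distance": math.hypot(dx, dy)}

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.x, self.y = 0.0, 0.0
        self.heading = rng.uniform(-math.pi, math.pi)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        if action == self.FORWARD:
            self.x += self.step_size * math.cos(self.heading)
            self.y += self.step_size * math.sin(self.heading)
        elif action == self.TURN_LEFT:
            self.heading += TURN
        elif action == self.TURN_RIGHT:
            self.heading -= TURN
        else:
            raise ValueError(f"unknown action: {action}")
        self.steps += 1
        obs = self._obs()
        terminated = obs["distance"] <= self.success_radius
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0  # sparse success reward (assumed)
        return obs, reward, terminated, truncated, {"success": terminated}


def greedy_policy(obs):
    """Turn toward the goal until roughly aligned, then move forward."""
    if obs["bearing"] > TURN / 2:
        return ToyPointNavEnv.TURN_LEFT
    if obs["bearing"] < -TURN / 2:
        return ToyPointNavEnv.TURN_RIGHT
    return ToyPointNavEnv.FORWARD


def run_episode(env, policy, seed=0):
    """Standard Gym-style rollout loop; returns (success, steps taken)."""
    obs, info = env.reset(seed=seed)
    terminated = truncated = False
    while not (terminated or truncated):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
    return info["success"], env.steps


print(run_episode(ToyPointNavEnv(), greedy_policy))
```

Because the rollout loop depends only on the `reset`/`step` contract, the same `run_episode` function would work unchanged against any environment that honors that contract, which is the practical benefit of exporting scenes behind a standard interface.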

Growing Smarter with Each Failure

Rather than discarding failed generation attempts, SimCoder extracts generalizable patterns and encodes them as permanent capabilities, expanding its skill repertoire autonomously over time.

Ablation: scene quality vs number of self-evolving examples across component configurations

Scene quality (VLM score) on the test set as SimCoder accumulates self-evolving examples. Each ablation condition is held fixed. Tools, verification, and self-evolution each contribute, with self-evolution delivering the largest single jump (+0.21, from 0.55 to 0.76).
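One way to picture how repeated failures become persistent capabilities is a small registry that counts failures by signature and promotes a fix once a signature recurs often enough. This is only a sketch under stated assumptions: `SkillLibrary`, the failure signatures, the promotion threshold, and the use of a plain string as the "skill" are all hypothetical. In the described system, SimCoder authors actual MCP tools and skill-library entries, which this toy does not attempt.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Hypothetical registry: promotes a fix to a skill after repeated failures."""
    promote_after: int = 3  # assumed threshold for "repeated"
    failures: Counter = field(default_factory=Counter)
    skills: dict = field(default_factory=dict)

    def record_failure(self, signature: str, fix_hint: str) -> bool:
        """Log one failed attempt; return True if this call promoted a new skill."""
        if signature in self.skills:
            return False  # already covered by an existing skill
        self.failures[signature] += 1
        if self.failures[signature] >= self.promote_after:
            self.skills[signature] = fix_hint
            return True
        return False

    def hints_for(self, signatures):
        """Return stored fixes relevant to the given failure signatures."""
        return [self.skills[s] for s in signatures if s in self.skills]


library = SkillLibrary()
for attempt in range(3):
    promoted = library.record_failure(
        "object_floating_above_floor",          # hypothetical verifier signature
        "snap object base to the floor plane",  # hypothetical fix
    )
    print(attempt, promoted)
print(library.hints_for(["object_floating_above_floor", "unseen_failure"]))
```

The point of the threshold is to separate one-off errors from recurring patterns, so that only failures likely to appear again pay the cost of becoming a permanent library entry.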

Agents and Environments, Growing Together

Co-evolution closes the feedback loop: agent performance metrics guide SimCoder to generate environments of the right difficulty, producing an automatic curriculum without any manual reward shaping.

90% co-evolution SR on SimWorld-MMNav
72% fixed-environment baseline (fixed curriculum)
+18pp over fixed environment
+40pp over untrained agent (50% → 90% success rate)
Co-evolution results: environment difficulty progression, training dynamics, and test performance across conditions

(a) Environment difficulty ramps across 8 levels as the agent improves. (b) Training dynamics: the co-evolving agent drops then recovers at each level transition. (c) Test performance on SimWorld-MMNav: co-evolution reaches 90% SR, an 18pp gain over fixed-environment training (72%) and a 40pp gain over the untrained baseline (50%).
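The difficulty ramp in panel (a) can be approximated by a simple controller that raises the level once the agent's recent success rate crosses a threshold. The sketch below is a hedged illustration, not the method used in the work: only the count of 8 levels comes from the text, while `CurriculumController`, the sliding window size, and the promotion threshold are assumptions, and a real system would pass the chosen level to SimCoder as a generation instruction.

```python
from collections import deque


class CurriculumController:
    """Hypothetical controller: promote when the windowed success rate is high enough."""

    def __init__(self, num_levels=8, window=20, promote_at=0.8):
        self.level = 1
        self.num_levels = num_levels
        self.promote_at = promote_at        # assumed success-rate threshold
        self.recent = deque(maxlen=window)  # outcomes at the current level

    def report(self, success: bool) -> int:
        """Record one episode outcome; return the level for the next environment."""
        self.recent.append(success)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.level < self.num_levels:
            if sum(self.recent) / len(self.recent) >= self.promote_at:
                self.level += 1
                # Start a fresh estimate: results from easier scenes say little
                # about the harder level.
                self.recent.clear()
        return self.level


controller = CurriculumController(window=5)
for outcome in [True, True, False, True, True, True, True, True, True, True]:
    print(controller.report(outcome))
```

Clearing the window after each promotion is one plausible way to reproduce the pattern in panel (b), where measured success drops at a level transition and then recovers as the agent adapts to the harder environments.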

Cite This Work

@misc{kang2026simworldstudio,
      title={SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning},
      author={Haoqiang Kang and Xiaokang Ye and Yuhan Liu and Siddhant Hitesh Mantri and Lingjun Mao and James Fleming and Drishti Regmi and Lianhui Qin},
      year={2026},
      eprint={2605.09423},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.09423},
}

Acknowledgement. We sincerely thank Aria Lin (hl3353@columbia.edu) for her contributions to the related-work investigation, and Lingge Meng (lim021@ucsd.edu), Vinayak Sharma (v9sharma@ucsd.edu), and Ishan Vaish (ivaish@ucsd.edu) for their help with running exploratory experiments, recording demonstration videos, and supporting the development of the project.