SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

An open-source platform built on Unreal Engine 5 that generates interactive, physically grounded 3D environments from natural language, with self-evolution and co-evolution for embodied AI training.

Haoqiang Kang¹,*, Xiaokang Ye¹,*, Yuhan Liu², Siddhant Hitesh Mantri¹, Lingjun Mao¹,
James Fleming¹, Drishti Regmi¹, Lianhui Qin¹

¹University of California San Diego ²New York University *Equal contribution


0.98+ collision-free rate (physical validity)

+12pp cross-environment transfer to SimWorld-MMNav

+18pp co-evolution SR gain over fixed curriculum

Abstract

Overview

LLM/VLM-based digital agents have advanced rapidly thanks to scalable sandboxes for coding, web navigation, and computer use, which provide rich interactive training grounds. In contrast, embodied agents still lack abundant, diverse, and automatically generated 3D environments for interactive learning. Existing embodied simulators rely on manually crafted scenes or procedural templates, while recent LLM-based 3D generation systems mainly produce static scenes rather than deployable environments with verifiable tasks and standard learning interfaces.

We introduce SimWorld Studio, an open-source platform built on Unreal Engine 5 for generating evolving embodied learning environments. At its core is SimCoder, a tool/skill-augmented coding agent that writes and executes engine-level code to construct physically grounded 3D worlds from language/image instructions. SimCoder self-evolves by using verifier feedback (e.g., compilation errors, physics checks, VLM critiques) to revise environments and autonomously add reusable tools and skills to its library. Generated worlds are exported as Gym-style environments for embodied agent learning.

SimWorld Studio further enables co-evolution between environment generation and embodied learning: agent performance feedback guides SimCoder to generate adaptive curricula near the learner's capability frontier, so that environments become increasingly challenging as the embodied agent improves. Three case studies on embodied navigation show that self-evolution improves generation reliability, that training on generated environments substantially improves embodied agent performance and generalizes to unseen benchmarks, and that co-evolution yields an 18-point success-rate gain over fixed-environment learning and a 40-point gain over an untrained agent.

Contributions

Core Innovations

SimWorld Studio introduces three tightly integrated components that together close the loop between environment generation and agent learning.

🏗️

Automatic Environment Generation

Generate diverse, physically grounded UE5 environments directly from natural language or image prompts. SimCoder synthesizes executable scene code via MCP tool calls and verifies it through physics and VLM checks, so no manual scene design is required.

🧬

Self-evolving SimCoder

When SimCoder encounters repeated failures, it automatically converts them into generalizable capabilities — authoring new MCP tools and skill-library entries that persist and improve all future generation tasks.

🔄

Co-Evolving Environment and Embodied Agent

A bidirectional feedback loop: the embodied agent's success rate tells the coding agent when to raise difficulty, creating an adaptive curriculum that trains agents across all 8 difficulty levels without manual tuning.

Architecture

Platform Overview

SimWorld Studio integrates a UE5 backend with a Python MCP bridge, dual verification, a skill registry, and a Gymnasium compliance layer.

SimWorld Studio pipeline: SimCoder receives natural language / image instructions, calls MCP tool APIs to generate UE5 scene code, runs dual verification (rule-based + VLM), self-evolves on failures, and exports a Gymnasium environment for downstream agent training.
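The generate, verify, and revise loop described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `generate_with_feedback`, `Verdict`, and the toy generator and verifiers are hypothetical names invented here, and the string-based "scene code" stands in for real UE5 calls, which the page does not show.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Result of one verifier: pass/fail plus feedback for the next attempt."""
    ok: bool
    feedback: str = ""


Verifier = Callable[[str], Verdict]


def generate_with_feedback(
    prompt: str,
    generate: Callable[[str, List[str]], str],
    verifiers: List[Verifier],
    max_rounds: int = 3,
) -> Tuple[str, int, bool]:
    """Generate scene code, run every verifier, and retry with collected feedback."""
    feedback: List[str] = []
    code = ""
    for round_idx in range(1, max_rounds + 1):
        code = generate(prompt, feedback)
        verdicts = [check(code) for check in verifiers]
        failures = [v.feedback for v in verdicts if not v.ok]
        if not failures:
            return code, round_idx, True
        feedback.extend(failures)
    return code, max_rounds, False


# --- Hypothetical stand-ins for the coding agent and the two verifiers ---

def toy_generate(prompt: str, feedback: List[str]) -> str:
    # "Fixes" the overlap only after a verifier has complained about it.
    spacing = 2.0 if any("overlap" in f for f in feedback) else 0.0
    return f"place('chair', x=0.0); place('table', x={spacing})"


def toy_rule_check(code: str) -> Verdict:
    # Rule-based check: both objects at x=0.0 means they overlap.
    overlapping = "place('table', x=0.0)" in code
    return Verdict(ok=not overlapping, feedback="objects overlap" if overlapping else "")


def toy_vlm_check(code: str) -> Verdict:
    # Placeholder for a VLM critique: only checks that a table was requested.
    present = "table" in code
    return Verdict(ok=present, feedback="" if present else "table missing")


code, rounds, ok = generate_with_feedback(
    "a chair next to a table", toy_generate, [toy_rule_check, toy_vlm_check]
)
print(ok, rounds, code)
```

In this toy run the first attempt fails the rule check, the feedback is passed back to the generator, and the second attempt passes both verifiers.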

SimCoder

Generating Environments with Vibe Coding

SimCoder decomposes scene construction into a six-stage pipeline, using a growing library of skills and tools to achieve near-perfect physical validity while maintaining high semantic alignment with user intent.

SimCoder workflow: tool calling, skill library, verification, and self-evolution

SimCoder accepts a natural language (or image + text) prompt and orchestrates MCP tool calls, skill library composition, dual verification, and self-evolution to produce a physically valid UE5 scene.

MCP Tool Calling

UE5 function calls for asset placement, material assignment, and scene graph manipulation.

Skill Library

Reusable procedures (e.g. "furnish a room", "add outdoor lighting") composed for complex goals.

Rule Verification

Collision-free rate, spatial validity, and object count checks give fast scalar feedback.
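The page does not say how the collision-free rate is computed. One plausible reading, sketched here as an assumption, is the fraction of placed objects whose axis-aligned bounding boxes intersect no other object. `Box`, `overlaps`, and `collision_free_rate` are illustrative names, not part of SimWorld Studio.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box given by its min and max corners (x, y, z)."""
    lo: Tuple[float, float, float]
    hi: Tuple[float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    # Two AABBs intersect only if their extents overlap on every axis.
    # Strict inequalities mean boxes that merely touch do not count.
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))


def collision_free_rate(boxes: Sequence[Box]) -> float:
    """Fraction of objects that intersect no other object (assumed definition)."""
    if not boxes:
        return 1.0
    colliding = set()
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        if overlaps(a, b):
            colliding.update((i, j))
    return 1.0 - len(colliding) / len(boxes)


scene = [
    Box((0, 0, 0), (1, 1, 1)),
    Box((0.5, 0.5, 0), (1.5, 1.5, 1)),  # intersects the first box
    Box((3, 0, 0), (4, 1, 1)),
    Box((5, 0, 0), (6, 1, 1)),
]
print(collision_free_rate(scene))
```

The pairwise loop is quadratic in the number of objects, which is fine for a sketch; a real engine would typically use its own physics queries instead.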

VLM Verification

A vision-language model scores rendered screenshots for semantic alignment with the prompt.

Task Generation

Point-nav and object-nav tasks are derived automatically from the scene graph, with no annotation needed.

Gymnasium Export

Each verified scene compiles into a standard Gymnasium environment with RGB-D, bearing, and distance observations.
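To make the exported interface concrete, here is a toy point-navigation environment that follows the Gymnasium calling convention: `reset()` returns `(observation, info)` and `step()` returns `(observation, reward, terminated, truncated, info)`. It is a hedged sketch, not the exported environment: `ToyPointNavEnv`, its action set, and its reward are assumptions made here. It runs in 2D without the engine and returns only bearing and distance, omitting the RGB-D frame.

```python
import math


class ToyPointNavEnv:
    """Toy 2D point-nav task with a Gymnasium-style reset/step interface."""

    def __init__(self, goal=(3.0, 0.0), step_size=0.5, turn_deg=30.0,
                 success_radius=0.3, max_steps=50):
        self.goal = goal
        self.step_size = step_size
        self.turn = math.radians(turn_deg)
        self.success_radius = success_radius
        self.max_steps = max_steps
        self.x = self.y = self.heading = 0.0
        self.steps = 0

    def _obs(self):
        dx, dy = self.goal[0] - self.x, self.goal[1] - self.y
        distance = math.hypot(dx, dy)
        # Signed angle from the agent's heading to the goal, wrapped to [-pi, pi].
        raw = math.atan2(dy, dx) - self.heading
        bearing = math.atan2(math.sin(raw), math.cos(raw))
        # A real export would also include an RGB-D frame rendered by the engine.
        return {"bearing": bearing, "distance": distance}

    def reset(self, seed=None):
        # The toy task is deterministic, so the seed is accepted but unused.
        self.x, self.y, self.heading, self.steps = 0.0, 0.0, 0.0, 0
        return self._obs(), {}

    def step(self, action):
        if action == 0:    # move forward
            self.x += self.step_size * math.cos(self.heading)
            self.y += self.step_size * math.sin(self.heading)
        elif action == 1:  # turn left
            self.heading += self.turn
        elif action == 2:  # turn right
            self.heading -= self.turn
        else:
            raise ValueError(f"unknown action: {action}")
        self.steps += 1
        obs = self._obs()
        terminated = obs["distance"] <= self.success_radius
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return obs, reward, terminated, truncated, {"success": terminated}


env = ToyPointNavEnv()
obs, info = env.reset()
done = False
while not done:
    # Greedy policy: turn toward the goal, otherwise move forward.
    if obs["bearing"] > 0.3:
        action = 1
    elif obs["bearing"] < -0.3:
        action = 2
    else:
        action = 0
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
print(info["success"], env.steps)
```

Because the interface matches the standard five-tuple `step` contract, any training loop written against that contract can drive the environment without knowing how the scene was generated.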

Self-Evolution

Growing Smarter with Each Failure

Rather than discarding failed generation attempts, SimCoder extracts generalizable patterns and encodes them as permanent capabilities, expanding its skill repertoire autonomously over time.
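A minimal sketch of this idea, under assumptions: a registry counts recurring failure patterns and, once a pattern has been seen often enough, stores its fix as a reusable skill. The class name `SkillLibrary`, the `promote_after` threshold, and the `snap_to_floor` fix are hypothetical; the page does not describe how SimCoder decides when a failure becomes a skill.

```python
from collections import Counter
from typing import Callable, Dict, List

Scene = List[dict]
Fix = Callable[[Scene], Scene]


class SkillLibrary:
    """Toy registry that promotes a recurring failure pattern to a reusable skill."""

    def __init__(self, promote_after: int = 2):
        self.promote_after = promote_after  # assumed threshold, not from the source
        self.failures: Counter = Counter()
        self.skills: Dict[str, Fix] = {}

    def record_failure(self, pattern: str, fix: Fix) -> bool:
        """Count a failure; return True if this call added a new skill."""
        self.failures[pattern] += 1
        if pattern not in self.skills and self.failures[pattern] >= self.promote_after:
            self.skills[pattern] = fix
            return True
        return False

    def apply(self, pattern: str, scene: Scene) -> Scene:
        """Apply the stored skill for a pattern, or return the scene unchanged."""
        fix = self.skills.get(pattern)
        return fix(scene) if fix else scene


def snap_to_floor(scene: Scene) -> Scene:
    # Hypothetical fix for objects left floating above the floor.
    return [{**obj, "z": 0.0} for obj in scene]


library = SkillLibrary()
library.record_failure("floating_object", snap_to_floor)  # first failure: only counted
library.record_failure("floating_object", snap_to_floor)  # second failure: promoted
print(library.apply("floating_object", [{"name": "chair", "z": 1.2}]))
```

Once promoted, the skill persists in the registry and is available to every later generation task, which is the property the section emphasizes.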

Ablation: scene quality vs number of self-evolving examples across component configurations

Scene quality (VLM score) on the test set as SimCoder accumulates self-evolving examples, with each ablation configuration held fixed. Tools, verification, and self-evolution each contribute; self-evolution delivers the largest single jump (+0.21, from 0.55 to 0.76).

Co-Evolution

Agents and Environments, Growing Together

Co-evolution closes the feedback loop: agent performance metrics guide SimCoder to generate environments of the right difficulty, producing an automatic curriculum without any manual reward shaping.

90% co-evolution SR on SimWorld-MMNav

72% fixed-environment baseline (fixed curriculum)

+18pp over the fixed-environment baseline

+40pp over the untrained agent (50% → 90% success rate)

Co-evolution results: environment difficulty progression, training dynamics, and test performance across conditions

(a) Environment difficulty ramps across 8 levels as the agent improves. (b) Training dynamics: the co-evolving agent drops then recovers at each level transition. (c) Test performance on SimWorld-MMNav: co-evolution reaches 90% SR, an 18pp gain over fixed-environment training (72%) and a 40pp gain over the untrained baseline (50%).
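The difficulty ramp in panel (a) can be illustrated with a simple controller that raises the level once the agent's recent success rate clears a threshold. This is a sketch under assumptions, not the paper's method: the rolling window, the 0.8 promotion threshold, and the name `DifficultyController` are invented here. Only the count of 8 levels comes from the page.

```python
from collections import deque


class DifficultyController:
    """Raise the difficulty level when the rolling success rate is high enough."""

    def __init__(self, num_levels: int = 8, window: int = 20, promote_at: float = 0.8):
        self.num_levels = num_levels
        self.promote_at = promote_at          # assumed threshold
        self.level = 1
        self.recent = deque(maxlen=window)    # outcomes of the latest episodes

    def update(self, success: bool) -> int:
        """Record one episode outcome and return the level for the next episode."""
        self.recent.append(1.0 if success else 0.0)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and self.level < self.num_levels:
            if sum(self.recent) / len(self.recent) >= self.promote_at:
                self.level += 1
                self.recent.clear()  # gather fresh statistics at the new level
        return self.level


controller = DifficultyController(window=5)
for outcome in [True, True, False, True, True]:
    level = controller.update(outcome)
print(level)
```

Clearing the window after each promotion means the agent must prove itself again at the new level, which matches the drop-then-recover pattern described for panel (b).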

Citation

Cite This Work

@misc{kang2026simworldstudio,
      title={SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning},
      author={Haoqiang Kang and Xiaokang Ye and Yuhan Liu and Siddhant Hitesh Mantri and Lingjun Mao and James Fleming and Drishti Regmi and Lianhui Qin},
      year={2026},
      eprint={2605.09423},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.09423},
}


Acknowledgement. We sincerely thank Aria Lin (hl3353@columbia.edu) for her contributions to the related-work investigation, and Lingge Meng (lim021@ucsd.edu), Vinayak Sharma (v9sharma@ucsd.edu), and Ishan Vaish (ivaish@ucsd.edu) for their help with running exploratory experiments, recording demonstration videos, and supporting the development of the project.
