Key Features
🌍 Open-ended Realistic Simulation
Realistic physical and social simulation with open-ended, language-controllable world generation.
🤖 Rich LLM/VLM Agent Interface
Gym-like interface, multimodal observations, and grounded natural-language actions spanning multiple levels of abstraction.
💡 Diverse Reasoning Scenarios
Support for diverse long-horizon physical and social reasoning, enabling systematic agent training and evaluation.
Simulator Comparison
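Feature columns are grouped under the three key features: Open-ended Realistic Simulation (Simulation Realism, Procedural Generation, Language Control), Rich LLM/VLM Agent Interface (Open Vocabulary Action Space, High-level Control, Low-level Control), and Diverse Reasoning Scenarios (Social Reasoning, Physical Reasoning).

| Simulator | Simulation Realism | Procedural Generation | Language Control | Open Vocabulary Action Space | High-level Control | Low-level Control | Social Reasoning | Physical Reasoning |
|---|---|---|---|---|---|---|---|---|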
| SimWorld | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| AI2-THOR | ✓ | - | - | - | ✓ | - | ✓ | |
| Genesis | ✓ | - | - | - | ✓ | - | ✓ | |
| VirtualCommunity | ✓ | - | - | - | ✓ | ✓ | ✓ | |
| Mindcraft | ✓ | - | - | ✓ | ✓ | ✓ | - | |
| Minedojo | ✓ | - | - | - | ✓ | - | - | |
| MetaUrban | ✓ | - | - | - | ✓ | - | ✓ | |
| EmbodiedCity | - | - | - | - | ✓ | - | - | |
| CARLA | - | - | - | - | ✓ | - | ✓ | |
| GRUtopia | - | - | - | - | ✓ | - | ✓ | |
| OmniGibson | - | - | - | ✓ | ✓ | - | ✓ | |
| Habitat 3.0 | - | - | - | - | ✓ | - | ✓ | |
| UnrealZoo | - | - | - | - | ✓ | - | ✓ | |
Open-ended Realistic Simulation
Procedural Scene Generation
SimWorld’s procedural generation system uses a modular, extensible pipeline with three stages: road generation, building generation, and street-element generation, each adding more structural and visual detail.
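The three-stage structure can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in for illustration (the `Scene` container, the stage functions, the grid layout, and the `build_scene` helper are assumptions, not SimWorld's actual API); it only shows how a modular pipeline can pass a scene through road, building, and street-element stages in order.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical scene container; not SimWorld's real data model."""
    roads: list = field(default_factory=list)      # segments as ((x1, y1), (x2, y2))
    buildings: list = field(default_factory=list)  # building centers as (x, y)
    elements: list = field(default_factory=list)   # street elements as (kind, x, y)


def generate_roads(scene, rng, size=3, block=100.0):
    """Stage 1: lay out a square grid of roads (rng unused, kept for a uniform signature)."""
    for i in range(size + 1):
        scene.roads.append(((0.0, i * block), (size * block, i * block)))  # horizontal
        scene.roads.append(((i * block, 0.0), (i * block, size * block)))  # vertical
    return scene


def generate_buildings(scene, rng, size=3, block=100.0):
    """Stage 2: place one building near the center of each block, with random jitter."""
    for bx in range(size):
        for by in range(size):
            cx = (bx + 0.5) * block + rng.uniform(-10.0, 10.0)
            cy = (by + 0.5) * block + rng.uniform(-10.0, 10.0)
            scene.buildings.append((cx, cy))
    return scene


def generate_street_elements(scene, rng):
    """Stage 3: add one street element next to every building."""
    for x, y in scene.buildings:
        kind = rng.choice(["tree", "bench", "street_lamp"])
        scene.elements.append((kind, x + 20.0, y))
    return scene


# Stages run in order; new stages can be appended without touching existing ones.
PIPELINE = [generate_roads, generate_buildings, generate_street_elements]


def build_scene(seed=0, stages=PIPELINE):
    rng = random.Random(seed)  # seeded so the same seed reproduces the same scene
    scene = Scene()
    for stage in stages:
        scene = stage(scene, rng)
    return scene


scene = build_scene(seed=42)
print(len(scene.roads), len(scene.buildings), len(scene.elements))  # 8 9 9
```

Keeping each stage as a function with the same signature is one simple way to make such a pipeline extensible: a later stage only reads what earlier stages wrote to the scene.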
Various Environments
SimWorld offers a broad spectrum of meticulously designed environments, enabling diverse world-building and scenario development.
Physical and Social Dynamics
SimWorld simulates realistic physical, environmental, and social dynamics that shape the behavior of agents and the world around them.
Physical laws (e.g., gravity, momentum)
Lighting, weather, time of day
Traffic System
Language-based World Editing
Beyond static and procedurally generated maps, SimWorld supports open-ended, language-based world editing, allowing users and agents to create, modify, and compose scenes on the fly with natural-language commands.
“Generate several buildings that can fill the current empty block.”
“Generate a motorcycle and put it in the middle of the road.”
“Replace the buildings to make the overall style more consistent.”
Rich LLM/VLM Agent Interface
SimWorld provides a comprehensive interface for LLM/VLM agents with rich observation modalities and diverse action capabilities, enabling agents to perceive and interact with the environment in a natural and intuitive manner.
Observation Space
The simulator provides diverse observations, including visual sensor outputs (RGB, depth, segmentation), a scene graph, and GPS information (global and local maps).
RGB
Depth
Segmentation
Scene Graph
Global Map
Local Map
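To make the shape of a Gym-like loop over these observation modalities concrete, here is a small sketch. The `Observation` fields mirror the six modalities listed above, but the class itself, the `ToyEnv` environment, its `reset`/`step` return values, and the action strings are all assumptions made for illustration; they are not SimWorld's real interface.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical bundle of the six listed modalities (toy-sized placeholders)."""
    rgb: list           # H x W x 3 color values
    depth: list         # H x W distances
    segmentation: list  # H x W class ids
    scene_graph: dict   # objects and their relations
    global_map: dict    # city-level position information
    local_map: dict     # information about the agent's surroundings


class ToyEnv:
    """Stand-in environment with Gym-style reset/step; assumed, not SimWorld's API."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def _observe(self):
        h, w = 2, 2  # tiny images keep the example readable
        return Observation(
            rgb=[[[0, 0, 0] for _ in range(w)] for _ in range(h)],
            depth=[[1.0] * w for _ in range(h)],
            segmentation=[[0] * w for _ in range(h)],
            scene_graph={"agent": {"near": ["road"]}},
            global_map={"agent_xy": (self.position, 0), "goal_xy": (self.goal, 0)},
            local_map={"ahead_free": True},
        )

    def reset(self):
        self.position = 0
        return self._observe()

    def step(self, action):
        # Actions are plain strings here to echo natural-language control.
        if action == "move forward":
            self.position += 1
        done = self.position >= self.goal
        reward = 1.0 if done else 0.0
        return self._observe(), reward, done, {}


env = ToyEnv(goal=3)
obs = env.reset()
done, steps = False, 0
while not done:
    # A real agent would pass obs to an LLM/VLM; this policy reads only the local map.
    action = "move forward" if obs.local_map["ahead_free"] else "stop"
    obs, reward, done, info = env.step(action)
    steps += 1
print(steps, obs.global_map["agent_xy"])  # 3 (3, 0)
```

Bundling all modalities into one observation object lets an agent choose which inputs to use per step, for example images for a VLM and the scene graph or maps for a text-only LLM.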
Open-Vocabulary Action Space
SimWorld supports an open-vocabulary action space that accepts natural language commands, which are then decomposed by a built-in action planner into sequences of low-level primitive actions.
Driving vehicles in realistic traffic
Natural social interaction between agents
Human–robot collaboration in shared spaces
Picking up and delivering objects
Fine-grained object manipulation
Pointing and gesturing to ground language
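The decomposition of a natural-language command into primitive actions can be sketched as follows. This is a deliberately simple keyword-based stand-in, not SimWorld's built-in action planner: the primitive names, the `RULES` table, and the `plan` function are hypothetical and only illustrate the input/output contract (free-form text in, a sequence of low-level primitives out).

```python
from typing import List

# Assumed primitive vocabulary for this sketch only.
PRIMITIVES = {"move_forward", "turn_left", "turn_right", "pick_up", "drop", "stop"}

# Each rule maps a phrase to the primitives that realize it (hypothetical mapping).
RULES = [
    ("pick up", ["move_forward", "pick_up"]),
    ("deliver", ["move_forward", "drop"]),
    ("turn left", ["turn_left"]),
    ("turn right", ["turn_right"]),
    ("go forward", ["move_forward"]),
]


def plan(command: str) -> List[str]:
    """Decompose a command into primitives, ordered by where each phrase appears."""
    text = command.lower()
    hits = []
    for phrase, actions in RULES:
        index = text.find(phrase)
        if index != -1:
            hits.append((index, actions))
    hits.sort(key=lambda hit: hit[0])  # keep the order the user spoke in
    steps = [action for _, actions in hits for action in actions]
    steps.append("stop")  # always end in a safe state
    assert all(step in PRIMITIVES for step in steps)
    return steps


print(plan("Pick up the box, turn left, and deliver it"))
# ['move_forward', 'pick_up', 'turn_left', 'move_forward', 'drop', 'stop']
```

A lookup table like this cannot handle truly open-vocabulary input; it is used here only so the example runs without a language model, and unmatched commands fall through to a single `stop`.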
Diverse Reasoning Scenarios
SimWorld enables agents to perform complex reasoning and coordinated behaviors across diverse physical and social contexts.
Low-level motion control while avoiding obstacles.
Multimodal instruction-following navigation with visual hints.
Deliver food across the city, completing orders to earn money.