A multi-agent environment runtime for studying coordination under partial observability
MetaTraffic is a multi-agent reinforcement learning environment runtime for training and evaluating decentralized policies under realistic perception constraints.
The system focuses on environment design, simulation dynamics, and agent interaction rather than on model architecture alone. It is designed as a lightweight prototype of a training environment for agents that target real-world tasks.
Most RL environments assume either:
- single-agent control, or
- fully observable global state
MetaTraffic explores a more realistic setting:
Multiple agents interacting in a shared world, with limited local perception and competing objectives.
This setup is closer to real-world systems such as autonomous driving or robotic coordination.
Instead of exposing the full world state, each agent observes a local spatial representation of the environment.
This is implemented as a rasterized road mask centered on the agent. It functions as a lightweight occupancy grid, analogous to the local perception that LiDAR or radar provides in real vehicles.
This forces agents to learn:
- local decision making
- implicit coordination
- behavior under partial observability
The system is organized as a pipeline:
Map (SVG)
→ Road Graph Generation
→ Rasterized Mask (World Representation)
→ Physics Engine (Vehicle Dynamics)
→ Environment Runtime (DrivingEnv)
→ RL Training Loop
↘ Visualization / Debugging
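The README does not specify which vehicle model physics.py implements. As an illustration only, the following sketch assumes a kinematic bicycle model driven by continuous acceleration and steering commands; `VehicleState`, `bicycle_step`, and every parameter value (time step, wheelbase, speed and steering limits) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float        # position (m)
    y: float        # position (m)
    heading: float  # yaw angle (rad)
    speed: float    # forward speed (m/s)


def bicycle_step(s: VehicleState, accel: float, steer: float, dt: float = 0.1,
                 wheelbase: float = 2.5, max_speed: float = 15.0,
                 max_steer: float = 0.5) -> VehicleState:
    """Advance one vehicle by dt with a kinematic bicycle model (hypothetical parameters)."""
    # Clip the continuous action to physical limits; speed cannot go negative here.
    steer = max(-max_steer, min(max_steer, steer))
    speed = max(0.0, min(max_speed, s.speed + accel * dt))
    # Yaw rate of a kinematic bicycle referenced at the rear axle: v / L * tan(delta).
    heading = s.heading + speed / wheelbase * math.tan(steer) * dt
    # Wrap the heading into (-pi, pi].
    heading = math.atan2(math.sin(heading), math.cos(heading))
    # Integrate position with the already-updated speed and heading.
    x = s.x + speed * math.cos(heading) * dt
    y = s.y + speed * math.sin(heading) * dt
    return VehicleState(x, y, heading, speed)
```

The update is semi-implicit Euler (position uses the new speed and heading), a common and stable choice at small time steps.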
The system implements a Gym-like interface:
- reset() → initialize an episode with randomized start and goal positions
- step(action) → advance the simulation by one step
step(action) returns:
- observation
- reward
- done
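To make the reset()/step(action) contract concrete, here is a minimal sketch of a multi-agent version of that interface and a rollout loop that depends only on it. The names `MultiAgentEnv`, `ToyLineEnv`, and `rollout`, the per-agent dictionaries, and the 1-D toy dynamics are assumptions for illustration, not the actual DrivingEnv API.

```python
import random
from typing import Callable, Dict, List, Protocol, Tuple

Obs = List[float]


class MultiAgentEnv(Protocol):
    """Hypothetical interface mirroring reset() / step(action)."""
    def reset(self) -> Dict[int, Obs]: ...
    def step(self, actions: Dict[int, float]) -> Tuple[
        Dict[int, Obs], Dict[int, float], Dict[int, bool]]: ...


class ToyLineEnv:
    """Stand-in env: each agent moves along a 1-D line toward its own goal."""
    def __init__(self, n_agents: int = 2, seed: int = 0):
        self.n = n_agents
        self.rng = random.Random(seed)

    def reset(self) -> Dict[int, Obs]:
        # Randomized start and goal per agent.
        self.pos = {i: self.rng.uniform(0, 10) for i in range(self.n)}
        self.goal = {i: self.rng.uniform(0, 10) for i in range(self.n)}
        return self._obs()

    def _obs(self) -> Dict[int, Obs]:
        # Each agent sees only its own position and displacement to its goal.
        return {i: [self.pos[i], self.goal[i] - self.pos[i]] for i in range(self.n)}

    def step(self, actions: Dict[int, float]):
        rewards, dones = {}, {}
        for i, a in actions.items():
            before = abs(self.goal[i] - self.pos[i])
            self.pos[i] += max(-1.0, min(1.0, a))  # clipped continuous action
            after = abs(self.goal[i] - self.pos[i])
            rewards[i] = before - after            # progress toward the goal
            dones[i] = after < 0.5
        return self._obs(), rewards, dones


def rollout(env: MultiAgentEnv, policy: Callable[[Obs], float], max_steps: int = 100) -> int:
    """Run one episode with a shared decentralized policy; return its length."""
    obs = env.reset()
    for t in range(1, max_steps + 1):
        actions = {i: policy(o) for i, o in obs.items()}
        obs, rewards, dones = env.step(actions)
        if all(dones.values()):
            return t
    return max_steps
```

For example, `rollout(ToyLineEnv(3), lambda o: o[1])` runs a greedy policy that steps toward each agent's goal. Because the loop touches the environment only through reset() and step(), the same training code can drive different environments.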
This design separates world simulation from agent interaction, making the environment reusable for different training setups.
Each agent receives:
- 32×32 local spatial mask (road / obstacle encoding)
- normalized velocity
- heading direction (cos, sin)
- relative displacement to goal
This enforces partial observability and removes access to global state.
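A hypothetical assembly of the per-agent observation listed above. It assumes the world mask is a 2-D grid of 0/1 cells (1 = road), that the 32×32 window is world-aligned rather than rotated to the agent's heading, and that off-map cells are padded as non-road; `crop_local_mask`, `build_observation`, `max_speed`, and `goal_scale` are illustrative names and values.

```python
import math
from typing import List, Tuple


def crop_local_mask(world: List[List[int]], cx: int, cy: int,
                    size: int = 32, pad: int = 0) -> List[List[int]]:
    """Return a size x size window of `world` centered on cell (cx, cy).

    Cells outside the map are filled with `pad` (here 0 = non-road)."""
    half = size // 2
    h, w = len(world), len(world[0])
    out = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            row.append(world[y][x] if 0 <= y < h and 0 <= x < w else pad)
        out.append(row)
    return out


def build_observation(world: List[List[int]], cell: Tuple[int, int], speed: float,
                      heading: float, pos: Tuple[float, float],
                      goal: Tuple[float, float], max_speed: float = 15.0,
                      goal_scale: float = 100.0) -> List[float]:
    """Flatten the local mask and append the low-dimensional features."""
    mask = crop_local_mask(world, *cell)
    flat = [float(v) for row in mask for v in row]  # 32 * 32 = 1024 values
    extras = [
        speed / max_speed,                          # normalized velocity
        math.cos(heading), math.sin(heading),       # heading direction
        (goal[0] - pos[0]) / goal_scale,            # relative displacement to goal
        (goal[1] - pos[1]) / goal_scale,
    ]
    return flat + extras
```

Encoding heading as (cos, sin) avoids the discontinuity at ±π that a raw angle would have.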
Each episode consists of:
- generated or selected map
- randomized agent spawn points
- individual destinations
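A sketch of randomized spawn and destination sampling, assuming road cells are available as a list of grid coordinates. The spacing constraints (`min_gap` between spawns, `min_trip` between spawn and goal) and the function name are hypothetical; they illustrate one way to keep agents from spawning on top of each other or next to their own goal.

```python
import math
import random
from typing import List, Tuple

Cell = Tuple[int, int]


def sample_spawns_and_goals(road_cells: List[Cell], n_agents: int, rng: random.Random,
                            min_gap: float = 3.0, min_trip: float = 10.0,
                            max_tries: int = 1000) -> Tuple[List[Cell], List[Cell]]:
    """Pick spawn cells at least `min_gap` apart, each with a goal at least `min_trip` away."""
    spawns: List[Cell] = []
    goals: List[Cell] = []
    for _ in range(max_tries):
        if len(spawns) == n_agents:
            break
        s = rng.choice(road_cells)
        if any(math.dist(s, other) < min_gap for other in spawns):
            continue  # too close to another agent's spawn
        g = rng.choice(road_cells)
        if math.dist(s, g) < min_trip:
            continue  # trip too short to be informative
        spawns.append(s)
        goals.append(g)
    if len(spawns) < n_agents:
        raise RuntimeError("could not place all agents; relax min_gap or min_trip")
    return spawns, goals
```

Passing an explicit `random.Random` makes episodes reproducible from a seed, which helps when comparing runs.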
At every step, the system:
- updates world state
- applies agent actions
- resolves collisions and constraints
- computes rewards
- returns next observations
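The per-step sequence above can be sketched for point agents as follows. This is a simplified illustration, not the env.py implementation: actions are assumed to be (vx, vy) velocity commands, collisions are detected with a pairwise disc-overlap test, colliding agents are held at their previous positions, and the reward is progress toward the goal minus a fixed collision penalty. `detect_collisions` and `env_step` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec2 = Tuple[float, float]


def detect_collisions(positions: List[Vec2], radius: float = 1.0) -> List[bool]:
    """Flag every agent whose disc overlaps another agent's disc (O(n^2) pairwise check)."""
    n = len(positions)
    hit = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < 2 * radius:
                hit[i] = hit[j] = True
    return hit


def env_step(positions: List[Vec2], actions: List[Vec2], goals: List[Vec2],
             dt: float = 0.1, radius: float = 1.0):
    """One synchronous step for all agents (simplified sketch)."""
    # Apply actions: integrate each velocity command to get a proposed position.
    proposed = [(x + vx * dt, y + vy * dt) for (x, y), (vx, vy) in zip(positions, actions)]
    # Resolve collisions: colliding agents stay at their previous positions.
    hit = detect_collisions(proposed, radius)
    new_pos = [old if h else new for old, new, h in zip(positions, proposed, hit)]
    # Compute rewards: progress toward the goal minus a collision penalty.
    rewards = [math.dist(old, g) - math.dist(new, g) - (1.0 if h else 0.0)
               for old, new, g, h in zip(positions, new_pos, goals, hit)]
    # Next observations: only the goal displacement here (mask and velocity omitted).
    obs = [(g[0] - p[0], g[1] - p[1]) for p, g in zip(new_pos, goals)]
    dones = [math.dist(p, g) < radius for p, g in zip(new_pos, goals)]
    return new_pos, obs, rewards, dones
```

The pairwise check costs O(n²) per step; with many agents, bucketing agents into a spatial grid is a common way to prune candidate pairs.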
The environment includes stabilization mechanisms:
- potential-based shaping
- progress rewards
- collision penalties
- off-road penalties with temporal tracking
- stuck detection using history windows
These are critical for maintaining learning stability in multi-agent settings.
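The following sketch shows one way the stabilization terms could fit together for a single agent. It uses the standard potential-based shaping form F = γΦ(s′) − Φ(s) with Φ(s) equal to the negative distance to the goal, so the shaping term also serves as the progress reward here. The class name `RewardShaper`, the weights, the linearly escalating off-road penalty, and the window-based stuck test are assumptions, not the repository's actual reward function.

```python
import math
from collections import deque
from typing import Deque, Tuple

Vec2 = Tuple[float, float]


class RewardShaper:
    """Per-agent reward terms; all weights and thresholds are hypothetical."""

    def __init__(self, gamma: float = 0.99, w_collision: float = 1.0,
                 w_offroad: float = 0.1, stuck_window: int = 20, stuck_eps: float = 0.5):
        self.gamma = gamma
        self.w_collision = w_collision
        self.w_offroad = w_offroad
        self.offroad_steps = 0                       # consecutive steps spent off-road
        self.history: Deque[Vec2] = deque(maxlen=stuck_window)
        self.stuck_eps = stuck_eps

    @staticmethod
    def potential(pos: Vec2, goal: Vec2) -> float:
        # Phi(s) = -distance to goal: potential rises as the agent approaches the goal.
        return -math.dist(pos, goal)

    def __call__(self, prev_pos: Vec2, pos: Vec2, goal: Vec2,
                 collided: bool, on_road: bool) -> Tuple[float, bool]:
        # Potential-based shaping term: gamma * Phi(s') - Phi(s).
        r = self.gamma * self.potential(pos, goal) - self.potential(prev_pos, goal)
        if collided:
            r -= self.w_collision
        # Off-road penalty grows with the number of consecutive off-road steps.
        self.offroad_steps = 0 if on_road else self.offroad_steps + 1
        r -= self.w_offroad * self.offroad_steps
        # Stuck detection: little net displacement across a full history window.
        self.history.append(pos)
        stuck = (len(self.history) == self.history.maxlen
                 and math.dist(self.history[0], self.history[-1]) < self.stuck_eps)
        return r, stuck
```

Potential-based shaping has the useful property that it does not change which policies are optimal, so it can densify the signal without redefining the task; the collision and off-road penalties do not share that guarantee.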
This setting introduces several challenges:
- limited local perception
- delayed reactions
- emergent congestion
- collision frequency that grows non-linearly with agent count
- coordination instability
- noisy reward signals
- procedural map variation
- distribution shift
- inconsistent optimal policies
Training setup:
- Algorithm: MADDPG
- Continuous control
- Multi-agent training
- Parallel rollout support
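MADDPG trains a decentralized actor per agent, which sees only its own observation, together with a centralized critic per agent that conditions on all agents' observations and actions. The sketch below shows just that critic input and the resulting TD target, using plain callables as stand-ins for networks; `critic_input`, `td_target`, and the toy actors and critic are hypothetical, and γ = 0.95 is an illustrative default.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def critic_input(all_obs: Sequence[Vec], all_actions: Sequence[Vec]) -> Vec:
    """Centralized critic input: every agent's observation, then every agent's action."""
    flat = [v for o in all_obs for v in o]
    flat += [v for a in all_actions for v in a]
    return flat


def td_target(reward_i: float, next_obs: Sequence[Vec], done: bool,
              target_actors: Sequence[Callable[[Vec], Vec]],
              target_critic_i: Callable[[Vec], float], gamma: float = 0.95) -> float:
    """y_i = r_i + gamma * Q_i'(o'_1..o'_N, mu'_1(o'_1)..mu'_N(o'_N))."""
    # Each target actor acts on its own local observation only (decentralized execution).
    next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
    q_next = target_critic_i(critic_input(next_obs, next_actions))
    # No bootstrapping past a terminal transition.
    return reward_i + gamma * (0.0 if done else 1.0) * q_next
```

Only the critics need global information, and they are used only during training; at execution time each agent acts from its local observation, which matches the partial-observability constraint described above.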
The focus is not the algorithm itself, but how environment design shapes learning dynamics.
The visualization tooling provides:
- real-time simulation rendering
- collision inspection
- trajectory visualization
- shared pipeline between training runtime and web interface
This enables consistent behavior across simulation and visualization environments.
The road network generation is adapted from: https://github.com/ProbableTrain/MapGenerator
We extended it to function as part of the environment runtime:
- integrated into the environment reset pipeline
- converted vector maps into rasterized masks for fast spatial queries
- unified usage across Python training and web visualization
This transforms the generator from a standalone tool into a reusable simulation component.
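A minimal sketch of the vector-to-raster step: each road segment is drawn into a binary grid by marking cells whose centers lie within a road half-width of the segment. The segment format, cell size, `half_width`, and the function names are assumptions; how the generator's polylines are turned into segments is not described here. Once rasterized, an on-road check is a single array lookup, which is what makes spatial queries cheap.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the segment's line and clamp to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def rasterize_roads(segments: Sequence[Tuple[Point, Point]], width: int, height: int,
                    cell: float = 1.0, half_width: float = 1.0) -> List[List[int]]:
    """Mark grid cells whose centers lie within half_width of any road segment."""
    grid = [[0] * width for _ in range(height)]
    for a, b in segments:
        # Visit only cells inside the segment's bounding box, padded by the half-width.
        x0 = max(0, math.floor((min(a[0], b[0]) - half_width) / cell))
        x1 = min(width - 1, math.floor((max(a[0], b[0]) + half_width) / cell))
        y0 = max(0, math.floor((min(a[1], b[1]) - half_width) / cell))
        y1 = min(height - 1, math.floor((max(a[1], b[1]) + half_width) / cell))
        for gy in range(y0, y1 + 1):
            for gx in range(x0, x1 + 1):
                center = ((gx + 0.5) * cell, (gy + 0.5) * cell)
                if point_segment_distance(center, a, b) <= half_width:
                    grid[gy][gx] = 1
    return grid
```

The bounding-box pruning keeps the cost proportional to the area around each road instead of the whole map per segment.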
Current limitations:
- not distributed across multiple machines
- limited agent communication mechanisms
- not benchmarked against standard MARL suites
- scaling beyond the current agent count requires further optimization
Real-world AI systems operate under:
- partial observability
- noisy perception
- shared environments
- interacting agents
MetaTraffic provides a controlled environment to study these properties, bridging simulation and real-world system constraints.
Project structure:
RL/
    env.py
    physics.py
    train_parallel.py
    train_single.py
Web/
    visualization / interface
Acknowledgements: road generation is adapted from https://github.com/ProbableTrain/MapGenerator and extended for environment integration and cross-runtime usage.

