Add Automatic Domain Randomization (ADR) Framework for Robust Sim-to-Real Transfer #527

@Aaryan-549

Description

I would like to contribute an Automatic Domain Randomization (ADR) framework to dm_control that adaptively adjusts environment parameters during training based on policy performance. This feature is available in Isaac Gym and is critical for robust sim-to-real transfer, but is currently missing from dm_control.

Problem Statement

Currently, dm_control lacks built-in support for domain randomization, forcing researchers to:

  • Manually tune randomization ranges through trial-and-error, which is time-consuming and suboptimal
  • Use fixed randomization distributions that don't adapt to the learning agent's capabilities
  • Implement custom solutions for each project, leading to inconsistent and non-reusable code
  • Miss out on robust sim-to-real transfer that ADR enables

Competitors such as Isaac Gym provide ADR out of the box, which gives them a significant advantage for robotics research. As documented in the "Solving Rubik's Cube with a Robot Hand" paper (OpenAI, 2019), ADR enabled successful sim-to-real transfer by automatically expanding randomization ranges once the agent performs consistently well at the current range boundaries.

What is ADR?

Automatic Domain Randomization progressively increases environment randomization difficulty based on agent performance:

  1. Start with minimal randomization (near-nominal physics)
  2. Test on boundary conditions of randomization ranges
  3. Expand ranges when agent succeeds consistently on boundaries
  4. Contract ranges when agent fails consistently
  5. Result: Maximally robust policy without manual tuning
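The expand/contract rule in steps 3 and 4 can be sketched for a single parameter as follows. This is a minimal illustration rather than the proposed API: `ParamRange`, `update_bound`, and the `nominal` field are hypothetical names, and the 0.95/0.70 thresholds and 0.05 step are borrowed from the configuration proposed further down in this issue.

```python
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Current sampling range for one parameter (hypothetical structure)."""
    low: float
    high: float
    hard_low: float   # widest lower bound ADR may reach
    hard_high: float  # widest upper bound ADR may reach
    nominal: float    # bounds never contract past this value
    delta: float      # step size per update


def update_bound(pr, side, success_rate, t_high=0.95, t_low=0.70):
    """Widens one bound on consistent success, narrows it on consistent failure."""
    if success_rate > t_high:
        step = pr.delta    # expand
    elif success_rate < t_low:
        step = -pr.delta   # contract
    else:
        return pr          # between the thresholds: leave the range alone
    if side == "high":
        pr.high = min(pr.hard_high, max(pr.nominal, pr.high + step))
    else:
        pr.low = max(pr.hard_low, min(pr.nominal, pr.low - step))
    return pr


friction = ParamRange(low=0.8, high=1.2, hard_low=0.1, hard_high=3.0,
                      nominal=1.0, delta=0.05)
update_bound(friction, "high", success_rate=0.97)  # upper bound moves outward
update_bound(friction, "low", success_rate=0.50)   # lower bound moves inward
```

Updating each bound separately means a policy that copes with high friction but not low friction only widens the side it has mastered.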

Proposed Solution

I will implement a modular ADR framework for dm_control consisting of:

1. Core ADR Manager

class ADRManager:
    """Manages automatic domain randomization for dm_control environments.

    Attributes:
        randomization_params: Maps each randomizable parameter to its current
            [low, high] range, e.g. {'friction': [0.5, 1.5]}.
        performance_threshold_high: Success rate above which a range expands.
        performance_threshold_low: Success rate below which a range contracts.
        buffer_size: Number of boundary episodes to average over.
    """

    def __init__(self, config):
        # Fall back to the proposed defaults when a key is absent from config.
        thresholds = config.get('performance_thresholds', {})
        self.randomization_params = {}  # e.g., {'friction': [0.5, 1.5]}
        self.performance_threshold_high = thresholds.get('high', 0.95)
        self.performance_threshold_low = thresholds.get('low', 0.7)
        self.buffer_size = config.get('buffer_size', 100)

    def get_randomized_params(self, mode='training'):
        """Returns physics parameters for current episode.

        Args:
            mode: 'training' (sample from ranges) or 'boundary' (test limits)
        """

    def update_ranges(self, boundary_results):
        """Adjusts randomization ranges based on performance."""

    def should_expand(self, param_name, boundary):
        """Check if range should expand for given parameter."""
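One way `get_randomized_params` could distinguish the two modes, following the boundary-sampling scheme in the OpenAI paper: in 'boundary' mode a single randomly chosen parameter is pinned to one end of its range while the rest are sampled uniformly. The function name `sample_params` and its `(params, probe)` return shape are assumptions made for illustration.

```python
import numpy as np


def sample_params(ranges, mode="training", rng=None):
    """Returns (params, probe); probe is (name, side) in boundary mode, else None."""
    if rng is None:
        rng = np.random.default_rng()
    # Sample every parameter uniformly from its current [low, high] range.
    params = {name: float(rng.uniform(lo, hi)) for name, (lo, hi) in ranges.items()}
    probe = None
    if mode == "boundary":
        names = sorted(ranges)
        name = names[int(rng.integers(len(names)))]
        side = int(rng.integers(2))  # 0 selects the low bound, 1 the high bound
        params[name] = float(ranges[name][side])
        probe = (name, "high" if side else "low")
    return params, probe


ranges = {"friction": (0.8, 1.2), "mass": (0.9, 1.1)}
rng = np.random.default_rng(0)
train_params, train_probe = sample_params(ranges, "training", rng)
eval_params, eval_probe = sample_params(ranges, "boundary", rng)
```

Returning the probed `(name, side)` pair lets the caller credit the episode outcome to the right boundary when updating ranges.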

2. Randomizable Parameters

Support randomization of key physics properties:

  • Dynamics: Mass, inertia, friction, damping, armature
  • Actuation: Motor gains (kp, kd), force limits, control noise
  • Observation: Sensor noise, latency, dropouts
  • Geometry: Link lengths, COM positions (where applicable)
  • External forces: Random pushes, wind, ground perturbations
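As a rough sketch of the dynamics case, friction and mass can be scaled by writing to the MuJoCo model arrays `geom_friction`, `body_mass`, and `body_inertia`, which dm_control exposes on `physics.model`. The class name `DynamicsRandomizer` is hypothetical, and a stand-in object replaces `physics.model` here so the example runs without MuJoCo installed.

```python
from types import SimpleNamespace

import numpy as np


class DynamicsRandomizer:
    """Scales friction and mass relative to cached nominal values."""

    def __init__(self, model):
        self._model = model
        # Cache nominal values so repeated resets do not compound the scaling.
        self._friction = np.array(model.geom_friction, copy=True)
        self._mass = np.array(model.body_mass, copy=True)
        self._inertia = np.array(model.body_inertia, copy=True)

    def apply(self, friction_scale=1.0, mass_scale=1.0):
        # Column 0 of geom_friction is the sliding coefficient; the torsional
        # and rolling columns are left untouched in this sketch.
        self._model.geom_friction[:, 0] = self._friction[:, 0] * friction_scale
        # Scale inertia with mass so each body keeps a consistent density model.
        self._model.body_mass[:] = self._mass * mass_scale
        self._model.body_inertia[:] = self._inertia * mass_scale


# Stand-in for physics.model (assumption: two geoms, world body plus one link).
model = SimpleNamespace(
    geom_friction=np.array([[1.0, 0.005, 0.0001], [0.7, 0.005, 0.0001]]),
    body_mass=np.array([0.0, 2.0]),
    body_inertia=np.ones((2, 3)),
)
randomizer = DynamicsRandomizer(model)
randomizer.apply(friction_scale=0.5, mass_scale=2.0)
randomizer.apply(friction_scale=0.5, mass_scale=2.0)  # same result, no compounding
```

Caching the nominal values matters because the model persists across episodes; multiplying the live arrays in place would drift further from nominal on every reset.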

3. ADR-Compatible Environment Wrapper

class ADRWrapper:
    """Wraps dm_control environments to support ADR.
    
    Automatically applies randomization at reset and tracks performance.
    """
    
    def __init__(self, env, adr_manager, eval_fraction=0.1):
        self.env = env
        self.adr = adr_manager
        self.eval_fraction = eval_fraction  # Fraction of envs for boundary testing
        
    def reset(self, env_idx=None):
        """Reset with ADR parameters."""
        mode = 'boundary' if self._is_eval_env(env_idx) else 'training'
        params = self.adr.get_randomized_params(mode=mode)
        self._apply_randomization(params)
        return self.env.reset()
    
    def step(self, action):
        """Step and track performance for ADR."""
        timestep = self.env.step(action)
        self._record_performance(timestep)
        return timestep
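The performance tracking that `_record_performance` feeds could look roughly like the sketch below. dm_control timesteps carry a reward (which is `None` on the first timestep of an episode) but no success flag, so this assumes the caller has already reduced each boundary episode to a boolean, for example by thresholding the episode return. `BoundaryTracker` and its method names are hypothetical.

```python
from collections import defaultdict, deque


class BoundaryTracker:
    """Keeps a fixed-size window of outcomes per (parameter, side) boundary."""

    def __init__(self, buffer_size=100):
        self._buffer_size = buffer_size
        self._buffers = defaultdict(lambda: deque(maxlen=buffer_size))

    def record(self, param, side, success):
        self._buffers[(param, side)].append(1.0 if success else 0.0)

    def success_rate(self, param, side):
        """Returns the mean outcome, or None until the window is full."""
        buf = self._buffers.get((param, side))
        if buf is None or len(buf) < self._buffer_size:
            return None
        return sum(buf) / len(buf)

    def clear(self, param, side):
        # Called after a range update so stale outcomes do not carry over.
        self._buffers.pop((param, side), None)


tracker = BoundaryTracker(buffer_size=4)
for outcome in (True, True, True):
    tracker.record("friction", "high", outcome)
partial = tracker.success_rate("friction", "high")  # window not yet full
tracker.record("friction", "high", False)
full = tracker.success_rate("friction", "high")
tracker.clear("friction", "high")
after_clear = tracker.success_rate("friction", "high")
```

Withholding a rate until the window is full avoids expanding a range on the strength of a handful of lucky episodes.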

4. Configuration System

YAML-based configuration for easy setup:

adr_config:
  enabled: true
  performance_thresholds:
    high: 0.95  # Expand ranges
    low: 0.70   # Contract ranges
  buffer_size: 100  # Episodes for averaging
  evaluation_fraction: 0.1  # 10% of envs test boundaries
  
  randomization_params:
    dynamics:
      friction:
        initial_range: [0.8, 1.2]
        min_range: [0.5, 1.5]
        max_range: [0.1, 3.0]
        delta: 0.05  # Step size for expansion
      
      mass:
        initial_range: [0.9, 1.1]
        min_range: [0.5, 1.5]
        max_range: [0.3, 2.0]
        delta: 0.05
        
    actuation:
      kp_scale:
        initial_range: [0.95, 1.05]
        max_range: [0.5, 1.5]
        delta: 0.02
        
    observation:
      noise_std:
        initial_range: [0.0, 0.01]
        max_range: [0.0, 0.1]
        delta: 0.005
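A loader for this configuration would likely want to validate each entry before training starts. The sketch below works on the already-parsed `randomization_params` mapping (so it needs no YAML library) and checks only `initial_range`, `max_range`, and `delta`; the proposal does not spell out how `min_range` is used, so it is ignored here. `validate_param` and `load_params` are hypothetical helper names.

```python
def validate_param(name, spec):
    """Checks one parameter entry and returns its starting state."""
    low, high = spec["initial_range"]
    max_low, max_high = spec["max_range"]
    if low > high:
        raise ValueError(f"{name}: initial_range is inverted")
    if low < max_low or high > max_high:
        raise ValueError(f"{name}: initial_range must lie inside max_range")
    if spec["delta"] <= 0:
        raise ValueError(f"{name}: delta must be positive")
    return {"low": float(low), "high": float(high),
            "max_low": float(max_low), "max_high": float(max_high),
            "delta": float(spec["delta"])}


def load_params(randomization_params):
    """Flattens the group -> parameter nesting into 'group.param' keys."""
    return {
        f"{group}.{name}": validate_param(f"{group}.{name}", spec)
        for group, entries in randomization_params.items()
        for name, spec in entries.items()
    }


config = {
    "dynamics": {"friction": {"initial_range": [0.8, 1.2], "min_range": [0.5, 1.5],
                              "max_range": [0.1, 3.0], "delta": 0.05}},
    "observation": {"noise_std": {"initial_range": [0.0, 0.01],
                                  "max_range": [0.0, 0.1], "delta": 0.005}},
}
params = load_params(config)

# An initial range that starts outside max_range is rejected up front.
try:
    validate_param("mass", {"initial_range": [0.2, 1.1],
                            "max_range": [0.3, 2.0], "delta": 0.05})
    rejected = False
except ValueError:
    rejected = True
```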

5. Integration with dm_control Suite

from dm_control import suite
from dm_control.rl.adr import ADRManager, ADRWrapper
import yaml

# Load ADR configuration
with open('adr_config.yaml') as f:
    adr_config = yaml.safe_load(f)

# Create base environment
base_env = suite.load('walker', 'walk')

# Wrap with ADR
adr_manager = ADRManager(adr_config['adr_config'])
env = ADRWrapper(base_env, adr_manager)

# Training loop
for episode in range(10000):
    timestep = env.reset()
    while not timestep.last():
        action = policy(timestep.observation)
        timestep = env.step(action)
    
    # ADR automatically adjusts ranges based on performance

Technical Implementation Details

File Structure:

dm_control/
├── rl/
│   └── adr/
│       ├── __init__.py
│       ├── adr_manager.py       # Core ADR logic
│       ├── adr_wrapper.py       # Environment wrapper
│       ├── randomizers.py       # Parameter randomization functions
│       ├── performance_tracker.py  # Track boundary test results
│       └── configs/
│           └── default_adr.yaml
├── examples/
│   └── adr_training_example.py
└── tests/
    └── adr_test.py

Core Skills Used:

  • Python class design and OOP
  • NumPy for parameter sampling and statistics
  • YAML for configuration management
  • MuJoCo physics property manipulation
  • Statistical performance tracking
  • Clean API design for extensibility

Benefits

  1. Enables robust sim-to-real transfer without manual tuning
  2. Closes a feature gap with Isaac Gym - brings ADR support to dm_control
  3. Reusable across all dm_control tasks - works with suite, composer, locomotion
  4. Reduces research iteration time - no need to manually tune DR ranges
  5. Improves policy robustness - widens randomization automatically as the policy improves
  6. Well-documented approach - based on established OpenAI research

Success Metrics

  • ADR successfully expands randomization ranges during training
  • Policies trained with ADR show better robustness to parameter variations
  • Performance on boundary tests guides automatic range adjustments
  • Works across dm_control suite - walker, humanoid, quadruped, manipulator
  • Minimal overhead - <5% slowdown compared to fixed randomization

Example Use Case

Before ADR (manual):

# Researcher manually tunes these... takes days of trial-and-error
friction_range = [0.5, 1.5]  # Too wide? Too narrow? Who knows?
mass_range = [0.8, 1.2]

After ADR (automatic):

# ADR automatically finds optimal ranges during training
env = ADRWrapper(base_env, ADRManager(adr_config))
# Trains robustly without manual tuning!

Testing Plan

  1. Unit tests for ADR range expansion/contraction logic
  2. Integration tests with walker, humanoid tasks
  3. Benchmark tests comparing fixed DR vs ADR
  4. Robustness tests - policy performance under parameter variations
  5. Performance tests - overhead measurement

Why I Want to Fix This

This contribution would:

  • Address a major gap vs Isaac Gym and other simulators
  • Enable cutting-edge research in sim-to-real transfer
  • Use core ML/Python skills - statistics, numpy, clean APIs
  • Have clear success criteria - ADR should adapt ranges automatically
  • Benefit the entire community - usable across all dm_control tasks

I have experience with RL, sim-to-real transfer, and dm_control environments. I've implemented similar domain randomization systems before and understand the theoretical foundations from the OpenAI ADR paper. I'm excited to bring this critical feature to dm_control and make it competitive with other leading simulators.

Implementation Timeline

  • Week 1-2: Implement core ADRManager and performance tracking
  • Week 3: Build ADRWrapper with environment integration
  • Week 4: Add configuration system and parameter randomizers
  • Week 5: Comprehensive testing across dm_control suite
  • Week 6: Documentation, examples, and tutorials
  • Week 7: Performance optimization and edge case handling
  • Week 8: Address review feedback

I'm ready to start immediately and would appreciate guidance on dm_control-specific implementation details and preferred code style.
