Add Automatic Domain Randomization (ADR) Framework for Robust Sim-to-Real Transfer

## Description

I would like to contribute an **Automatic Domain Randomization (ADR)** framework to dm_control that adjusts the randomization ranges of environment parameters during training based on policy performance. Isaac Gym already offers this feature, which matters for robust sim-to-real transfer, but dm_control currently lacks it.

## Problem Statement

Currently, dm_control lacks built-in support for adaptive domain randomization, forcing researchers to:
- **Manually tune randomization ranges** through trial-and-error, which is time-consuming and suboptimal
- **Use fixed randomization distributions** that don't adapt to the learning agent's capabilities
- **Implement custom solutions** for each project, leading to inconsistent and non-reusable code
- **Miss out on robust sim-to-real transfer** that ADR enables

**Competitors like Isaac Gym provide ADR out of the box**, giving them a significant advantage for robotics research. As documented in the "Solving Rubik's Cube with a Robot Hand" paper (OpenAI 2019), ADR enables zero-shot sim-to-real transfer by automatically expanding randomization ranges once the agent performs consistently well at the current range boundaries.

## What is ADR?

Automatic Domain Randomization progressively increases environment randomization difficulty based on agent performance:

1. **Start with minimal randomization** (near-nominal physics)
2. **Test on boundary conditions** of randomization ranges
3. **Expand ranges** when agent succeeds consistently on boundaries
4. **Contract ranges** when agent fails consistently
5. **Result:** A policy that is robust across the widest ranges it can handle, without manual tuning
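
Steps 3 and 4 can be sketched as a small update rule. This is a minimal illustration under my own assumptions, not a final design: the `ParamRange` and `update_boundary` names are hypothetical, and the thresholds and step size are the example values used elsewhere in this proposal.

```python
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Current sampling interval for one parameter (illustrative names)."""
    low: float
    high: float
    min_low: float   # hard floor for `low`
    max_high: float  # hard ceiling for `high`
    delta: float     # step size per update


def update_boundary(rng: ParamRange, side: str, success_rate: float,
                    t_high: float = 0.95, t_low: float = 0.70) -> ParamRange:
    """Widen or narrow one side of the interval based on boundary performance."""
    if success_rate > t_high:
        step = rng.delta      # agent copes at this boundary: widen
    elif success_rate < t_low:
        step = -rng.delta     # agent struggles at this boundary: narrow
    else:
        return rng            # between thresholds: leave unchanged

    if side == "high":
        # Never exceed the ceiling, never cross below `low`.
        rng.high = min(rng.max_high, max(rng.low, rng.high + step))
    elif side == "low":
        # Never drop below the floor, never cross above `high`.
        rng.low = max(rng.min_low, min(rng.high, rng.low - step))
    else:
        raise ValueError(f"unknown side: {side!r}")
    return rng


friction = ParamRange(low=0.8, high=1.2, min_low=0.1, max_high=3.0, delta=0.05)
update_boundary(friction, "high", success_rate=0.97)  # widens the upper bound
update_boundary(friction, "low", success_rate=0.50)   # narrows the lower bound
print(friction.low, friction.high)
```

Each side of the interval is updated independently, so a policy that handles high friction but not low friction ends up with an asymmetric range.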

## Proposed Solution

I will implement a modular ADR framework for dm_control consisting of:

### 1. **Core ADR Manager**
```python
class ADRManager:
    """Manages automatic domain randomization for dm_control environments.

    Attributes:
        randomization_params: Maps each parameter name to its current
            [low, high] range, e.g. {'friction': [0.5, 1.5]}.
        performance_threshold_high: Expand a range when the boundary success
            rate exceeds this value.
        performance_threshold_low: Contract a range when the boundary success
            rate falls below this value.
        buffer_size: Number of boundary episodes to average over.
    """

    def __init__(self, config):
        thresholds = config.get('performance_thresholds', {})
        self.randomization_params = {}  # To be filled from the config
        self.performance_threshold_high = thresholds.get('high', 0.95)
        self.performance_threshold_low = thresholds.get('low', 0.7)
        self.buffer_size = config.get('buffer_size', 100)

    def get_randomized_params(self, mode='training'):
        """Returns physics parameters for the current episode.

        Args:
            mode: 'training' (sample from ranges) or 'boundary' (test limits)
        """
        raise NotImplementedError  # Interface sketch only

    def update_ranges(self, boundary_results):
        """Adjusts randomization ranges based on boundary performance."""
        raise NotImplementedError  # Interface sketch only

    def should_expand(self, param_name, boundary):
        """Returns True if the range should expand at the given boundary."""
        raise NotImplementedError  # Interface sketch only
```
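
To make the two sampling modes concrete, here is a hedged sketch of what `get_randomized_params` could do internally. It follows the boundary-sampling idea as I understand it from the OpenAI paper: in boundary mode, one parameter is pinned to its lower or upper bound and the others are sampled as usual. The standalone `sample_params` function and its return shape are assumptions for illustration.

```python
import random


def sample_params(ranges, mode="training", rng=random):
    """Sample one value per parameter from its current (low, high) interval.

    In 'boundary' mode one randomly chosen parameter is pinned to its lower
    or upper bound; the returned tag says which boundary was tested.
    """
    params = {name: rng.uniform(low, high)
              for name, (low, high) in ranges.items()}
    if mode != "boundary":
        return params, None
    name = rng.choice(sorted(ranges))    # which parameter to test
    side = rng.choice(["low", "high"])   # which end of its interval
    low, high = ranges[name]
    params[name] = low if side == "low" else high
    return params, (name, side)


ranges = {"friction": (0.8, 1.2), "mass": (0.9, 1.1)}
train_params, _ = sample_params(ranges)
eval_params, tested = sample_params(ranges, mode="boundary")
```

Returning the `(name, side)` tag alongside the parameters lets the caller attribute the episode outcome to the boundary that was actually tested.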

### 2. **Randomizable Parameters**

Support randomization of key physics properties:
- **Dynamics:** Mass, inertia, friction, damping, armature
- **Actuation:** Motor gains (kp, kd), force limits, control noise
- **Observation:** Sensor noise, latency, dropouts  
- **Geometry:** Link lengths, COM positions (where applicable)
- **External forces:** Random pushes, wind, ground perturbations
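
One detail worth settling early is how scaled parameters are written back: if each reset multiplies the current value, the scales compound across episodes. A minimal sketch of one way to avoid that is to snapshot the nominal values once and always scale from the snapshot. The `NominalScaler` class is hypothetical, and the dict of lists is only a stand-in so the example runs without MuJoCo; in dm_control the corresponding arrays would be attributes such as `physics.model.body_mass` and `physics.model.geom_friction`.

```python
import copy


class NominalScaler:
    """Applies multiplicative scales to a snapshot of nominal values.

    `model` is a plain dict of lists standing in for simulator arrays; it is
    a hypothetical stand-in, not a dm_control type.
    """

    def __init__(self, model):
        self._model = model
        self._nominal = copy.deepcopy(model)  # snapshot taken once, never mutated

    def apply(self, scales):
        """Overwrite each field with nominal * scale (scale defaults to 1.0)."""
        for field, nominal_values in self._nominal.items():
            scale = scales.get(field, 1.0)
            self._model[field][:] = [v * scale for v in nominal_values]


model = {"body_mass": [1.0, 2.5, 0.4], "geom_friction": [1.0, 1.0]}
scaler = NominalScaler(model)
scaler.apply({"body_mass": 1.1})
scaler.apply({"body_mass": 0.9})  # relative to nominal, not to the previous draw
```

After the second call the masses are 0.9 times their nominal values rather than 0.99 times, which is the behavior the range bounds in the config assume.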

### 3. **ADR-Compatible Environment Wrapper**

```python
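class ADRWrapper:
    """Wraps dm_control environments to support ADR.

    Automatically applies randomization at reset and tracks performance.
    """

    def __init__(self, env, adr_manager, eval_fraction=0.1):
        self.env = env
        self.adr = adr_manager
        self.eval_fraction = eval_fraction  # Fraction of envs for boundary testing

    def reset(self, env_idx=None):
        """Reset with ADR parameters."""
        mode = 'boundary' if self._is_eval_env(env_idx) else 'training'
        params = self.adr.get_randomized_params(mode=mode)
        self._apply_randomization(params)
        return self.env.reset()

    def step(self, action):
        """Step and track performance for ADR."""
        timestep = self.env.step(action)
        self._record_performance(timestep)
        return timestep

    def __getattr__(self, name):
        # Forward everything else (action_spec, observation_spec, ...) to the env.
        return getattr(self.env, name)

    def _is_eval_env(self, env_idx):
        """Decides whether this episode tests a range boundary."""
        raise NotImplementedError  # Interface sketch only

    def _apply_randomization(self, params):
        """Writes the sampled parameters into the wrapped environment."""
        raise NotImplementedError  # Interface sketch only

    def _record_performance(self, timestep):
        """Accumulates reward and reports boundary results to the manager."""
        raise NotImplementedError  # Interface sketch only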
```
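
The performance tracking behind `_record_performance` could be as simple as one fixed-size window per boundary. The sketch below is an assumption about how that might look; `BoundaryTracker` is a hypothetical name, and what counts as a "success" is task-specific (for suite tasks it would likely be an episode return above some threshold, since they expose rewards rather than a success flag).

```python
from collections import deque


class BoundaryTracker:
    """Keeps a fixed-size window of episode outcomes per (param, side) boundary."""

    def __init__(self, buffer_size=100):
        self._buffer_size = buffer_size
        self._buffers = {}

    def record(self, boundary, success):
        """Append one episode outcome; the oldest is dropped when full."""
        buf = self._buffers.setdefault(boundary, deque(maxlen=self._buffer_size))
        buf.append(1.0 if success else 0.0)

    def success_rate(self, boundary):
        """Returns the mean outcome once the window is full, else None."""
        buf = self._buffers.get(boundary)
        if buf is None or len(buf) < self._buffer_size:
            return None
        return sum(buf) / len(buf)

    def clear(self, boundary):
        """Drop stale outcomes after the boundary has moved."""
        self._buffers.pop(boundary, None)


tracker = BoundaryTracker(buffer_size=4)
for outcome in (True, True, True, False):
    tracker.record(("friction", "high"), outcome)
rate = tracker.success_rate(("friction", "high"))
```

Returning `None` until the window is full avoids expanding or contracting a range on the strength of a handful of episodes, and clearing the window after each change keeps results from the old boundary out of the next decision.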

### 4. **Configuration System**

YAML-based configuration for easy setup:

```yaml
adr_config:
  enabled: true
  performance_thresholds:
    high: 0.95  # Expand ranges
    low: 0.70   # Contract ranges
  buffer_size: 100  # Episodes for averaging
  evaluation_fraction: 0.1  # 10% of envs test boundaries
  
  randomization_params:
    dynamics:
      friction:
        initial_range: [0.8, 1.2]
        min_range: [0.95, 1.05]  # Narrowest range that contraction may reach
        max_range: [0.1, 3.0]    # Widest range that expansion may reach
        delta: 0.05  # Step size for expansion
      
      mass:
        initial_range: [0.9, 1.1]
        min_range: [0.95, 1.05]
        max_range: [0.3, 2.0]
        delta: 0.05
        
    actuation:
      kp_scale:
        initial_range: [0.95, 1.05]
        max_range: [0.5, 1.5]
        delta: 0.02
        
    observation:
      noise_std:
        initial_range: [0.0, 0.01]
        max_range: [0.0, 0.1]
        delta: 0.005
```
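
Because the ranges are hand-written, it seems worth validating each entry when the config is loaded. A minimal sketch, assuming the key names from the YAML above; the `validate_param` helper is hypothetical and only checks the fields that every entry carries.

```python
def validate_param(name, spec):
    """Raise ValueError if one parameter entry is internally inconsistent."""
    init_lo, init_hi = spec["initial_range"]
    max_lo, max_hi = spec["max_range"]
    if not init_lo <= init_hi:
        raise ValueError(f"{name}: initial_range is inverted")
    if not (max_lo <= init_lo and init_hi <= max_hi):
        raise ValueError(f"{name}: initial_range must lie inside max_range")
    if spec["delta"] <= 0:
        raise ValueError(f"{name}: delta must be positive")


# Mirrors the `friction` entry after yaml.safe_load has parsed it.
friction_spec = {"initial_range": [0.8, 1.2], "max_range": [0.1, 3.0], "delta": 0.05}
validate_param("friction", friction_spec)  # passes silently
```

Failing at load time is cheaper than discovering an inverted range several hours into a training run.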

### 5. **Integration with dm_control Suite**

```python
from dm_control import suite
from dm_control.rl.adr import ADRManager, ADRWrapper
import yaml

# Load ADR configuration
with open('adr_config.yaml') as f:
    adr_config = yaml.safe_load(f)

# Create base environment
base_env = suite.load('walker', 'walk')

# Wrap with ADR
adr_manager = ADRManager(adr_config['adr_config'])
env = ADRWrapper(base_env, adr_manager)

# Training loop (`policy` is a user-supplied callable mapping observations to actions)
for episode in range(10000):
    timestep = env.reset()
    while not timestep.last():
        action = policy(timestep.observation)
        timestep = env.step(action)
    
    # ADR automatically adjusts ranges based on performance
```

## Technical Implementation Details

**File Structure:**
```
dm_control/
├── rl/
│   └── adr/
│       ├── __init__.py
│       ├── adr_manager.py       # Core ADR logic
│       ├── adr_wrapper.py       # Environment wrapper
│       ├── randomizers.py       # Parameter randomization functions
│       ├── performance_tracker.py  # Track boundary test results
│       └── configs/
│           └── default_adr.yaml
├── examples/
│   └── adr_training_example.py
└── tests/
    └── adr_test.py
```

**Core Skills Used:**
- Python class design and OOP
- NumPy for parameter sampling and statistics
- YAML for configuration management
- MuJoCo physics property manipulation
- Statistical performance tracking
- Clean API design for extensibility

## Benefits

1. **Enables robust sim-to-real transfer** without manual tuning
2. **Closes a gap with Isaac Gym** - brings dm_control to parity on this feature
3. **Reusable across all dm_control tasks** - works with suite, composer, locomotion
4. **Reduces research iteration time** - no need to manually tune DR ranges
5. **Improves policy robustness** - ranges widen automatically as far as the policy can handle
6. **Well-documented approach** - based on established OpenAI research

## Success Metrics

- **ADR successfully expands** randomization ranges during training
- **Policies trained with ADR** are more robust to parameter variations than policies trained with fixed randomization
- **Performance on boundary tests** guides automatic range adjustments
- **Works across dm_control suite** - walker, humanoid, quadruped, manipulator
- **Minimal overhead** - <5% slowdown compared to fixed randomization
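
To make the first metric measurable, one option is the "ADR entropy" reported in the OpenAI paper, which for uniform sampling intervals reduces to the mean log-width of the ranges in nats per dimension. The sketch below is my reading of that definition; the `adr_entropy` function name is an assumption.

```python
import math


def adr_entropy(ranges):
    """Mean log-width of the sampling intervals, in nats per dimension."""
    widths = [high - low for low, high in ranges.values()]
    if any(w <= 0 for w in widths):
        raise ValueError("every interval needs positive width")
    return sum(math.log(w) for w in widths) / len(widths)


before = adr_entropy({"friction": (0.8, 1.2), "mass": (0.9, 1.1)})
after = adr_entropy({"friction": (0.5, 1.5), "mass": (0.7, 1.3)})
print(before, after)
```

A value that rises over training indicates that the ranges are expanding. Zero-width intervals, such as a noise range that starts at `[0.0, 0.0]`, would need special handling because their log-width is undefined.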

## Example Use Case

**Before ADR (manual):**
```python
# Researcher manually tunes these... takes days of trial-and-error
friction_range = [0.5, 1.5]  # Too wide? Too narrow? Who knows?
mass_range = [0.8, 1.2]
```

**After ADR (automatic):**
```python
# ADR adapts the ranges during training
env = ADRWrapper(base_env, ADRManager(adr_config))
# No hand-picked ranges required
```

## Testing Plan

1. **Unit tests** for ADR range expansion/contraction logic
2. **Integration tests** with walker, humanoid tasks
3. **Benchmark tests** comparing fixed DR vs ADR
4. **Robustness tests** - policy performance under parameter variations
5. **Performance tests** - overhead measurement

## Why I Want to Fix This

This contribution would:
- **Address a major gap** vs Isaac Gym and other simulators
- **Enable cutting-edge research** in sim-to-real transfer
- **Use core ML/Python skills** - statistics, numpy, clean APIs
- **Have clear success criteria** - ADR should adapt ranges automatically
- **Benefit the entire community** - usable across all dm_control tasks

I have experience with RL, sim-to-real transfer, and dm_control environments. I've implemented similar domain randomization systems before and understand the theoretical foundations from the OpenAI ADR paper. I'm excited to bring this critical feature to dm_control and make it competitive with other leading simulators.

## Implementation Timeline

- **Weeks 1-2:** Implement core `ADRManager` and performance tracking
- **Week 3:** Build `ADRWrapper` with environment integration
- **Week 4:** Add configuration system and parameter randomizers
- **Week 5:** Comprehensive testing across dm_control suite
- **Week 6:** Documentation, examples, and tutorials
- **Week 7:** Performance optimization and edge case handling
- **Week 8:** Address review feedback

I'm ready to start immediately and would appreciate guidance on dm_control-specific implementation details and preferred code style.
