Add IQR-based environment fluctuation detection to benchmark timing stats #690

Closed
JewelRoam wants to merge 2 commits into PaddlePaddle:develop from JewelRoam:iqr-2

@JewelRoam
Collaborator

…tats

Overview:
This commit introduces IQR (Interquartile Range) based environment fluctuation
detection to the timing statistics calculation in test_compiler_util.py. The
feature helps detect unstable benchmarking environments by measuring the
relative variation in timing results.

Key Changes:
- Enhanced get_timing_stats() function to compute median, Q1, Q3, and IQR
- Added environment variable GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD for
  configurable fluctuation detection sensitivity
- RuntimeError is raised when IQR/median exceeds the threshold
- Extended return stats dictionary with new fields: median, q1, q3, iqr

IQR/median Ratio:
- Measures relative variability of timing measurements
- Lower values indicate more consistent timing
- Higher values indicate environment instability or interference
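
As a rough illustration of the ratio, the sketch below compares two made-up timing samples. The data and the helper name relative_iqr are hypothetical, and the standard library's statistics.quantiles (with method="inclusive", which uses the same linear interpolation as NumPy's default percentile method) stands in for whatever the PR actually calls.

```python
import statistics

def relative_iqr(times_ms):
    """Return IQR / median for a list of timings (hypothetical helper)."""
    # n=4 yields the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(times_ms, n=4, method="inclusive")
    return (q3 - q1) / median

# Made-up samples, in milliseconds; not taken from the PR.
stable = [10.0, 10.1, 9.9, 10.0, 10.2]
noisy = [10.0, 14.0, 9.0, 18.0, 10.5]

print(f"stable: {relative_iqr(stable):.3f}")
print(f"noisy:  {relative_iqr(noisy):.3f}")
```

The stable sample lands well below a 0.2 threshold, while the noisy one lands well above it.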

Environment Variable Configuration:
- GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD (default: 0.2)
- 0.0: Disable detection (always accept results)
- 0.5: Lenient (only flag severe fluctuations > 50%)
- 0.2: Default (flag fluctuations > 20%)
- 0.1: Very strict (flag fluctuations > 10%)
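
One way the threshold could be read from the environment is sketched below. Only the variable name and the 0.2 default come from this description; the helper name read_threshold, the float parsing, and the rejection of negative values are assumptions for illustration.

```python
import os

ENV_NAME = "GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD"
DEFAULT_THRESHOLD = 0.2  # default stated in this description

def read_threshold(environ=os.environ):
    """Parse the detection threshold (hypothetical helper, not the PR's code)."""
    raw = environ.get(ENV_NAME)
    if raw is None or raw.strip() == "":
        return DEFAULT_THRESHOLD
    value = float(raw)  # raises ValueError on non-numeric input
    if value < 0:
        # Assumption: a negative threshold is treated as a configuration error.
        raise ValueError(f"{ENV_NAME} must be >= 0, got {value}")
    return value
```

For example, running a benchmark with GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD=0.5 in the environment would make this helper return 0.5.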

Detection Algorithm:
1. Calculate median, Q1 (25th percentile), Q3 (75th percentile)
2. Compute IQR = Q3 - Q1
3. Calculate relative IQR = IQR / median
4. Compare against threshold
5. Raise RuntimeError with detailed diagnostics if exceeded
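
The five steps above can be sketched as follows. This is a minimal approximation, not the code from test_compiler_util.py: the function name timing_stats_sketch is hypothetical, the returned keys mirror the fields listed under Key Changes, and skipping the check when the threshold is 0 or the median is 0 is an assumption.

```python
import numpy as np

def timing_stats_sketch(times_ms, threshold=0.2):
    """Compute quartile stats and flag unstable timings (illustrative only)."""
    arr = np.asarray(times_ms, dtype=float)
    # Step 1: median, Q1 (25th percentile), Q3 (75th percentile).
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    # Step 2: interquartile range.
    iqr = q3 - q1
    stats = {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "iqr": float(iqr),
    }
    # Assumption: threshold 0 disables detection; a zero median is skipped
    # to avoid dividing by zero.
    if threshold > 0 and median > 0:
        # Steps 3 and 4: relative IQR compared against the threshold.
        ratio = iqr / median
        if ratio > threshold:
            # Step 5: fail loudly with the raw values for inspection.
            raise RuntimeError(
                f"IQR/median {ratio:.3f} exceeds threshold {threshold}; "
                f"times={arr.tolist()}"
            )
    return stats
```

When all timings are equal, IQR is 0 and the ratio is 0, so the function returns normally.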

Error Message Format:
When fluctuation is detected, the error message includes:
- IQR/median ratio and threshold
- Q1 and Q3 values as percentages
- IQR as percentage
- Raw timing values for manual inspection
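
A message with those four parts might be assembled as in the sketch below. The exact wording is not given in this description, so the layout, the helper name format_fluctuation_error, and the choice to express Q1, Q3, and IQR as percentages of the median are all assumptions.

```python
def format_fluctuation_error(times_ms, q1, median, q3, threshold):
    """Build a diagnostic message (hypothetical format, not the PR's text)."""
    iqr = q3 - q1
    lines = [
        f"Environment fluctuation detected: IQR/median = {iqr / median:.3f} "
        f"(threshold {threshold})",
        # Assumption: percentages are taken relative to the median.
        f"Q1 = {q1 / median:.1%} of median, Q3 = {q3 / median:.1%} of median",
        f"IQR = {iqr / median:.1%} of median",
        f"Raw times (ms): {list(times_ms)}",
    ]
    return "\n".join(lines)

# Made-up values for illustration only.
msg = format_fluctuation_error([9.0, 10.0, 10.5, 14.0, 18.0], 10.0, 10.5, 14.0, 0.2)
print(msg)
```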

Use Cases:
- Multi-user GPU environments where timing variance is common
- CI/CD pipeline monitoring for performance regression detection
- Manual benchmark verification in shared resources
- Identifying external workload interference

Performance Impact:
- Minimal: Adds a single numpy array conversion and percentile calculations
- Scales with number of timing trials (typically 5-10 runs)
- O(n) complexity for array operations

Backward Compatibility:
- Fully backward compatible
- Existing code continues to work without modification
- Only raises RuntimeError when fluctuation is detected
- Default threshold (0.2) provides balanced sensitivity

Testing:
- Verified with sample timing data showing correct IQR calculations
- Tested threshold sensitivity with various timing distributions
- Confirmed graceful handling when all times are equal (IQR=0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@paddle-bot

paddle-bot bot commented Apr 15, 2026

Thanks for your contribution!

This change updates the speedup calculation in print_times_and_speedup()
to use median instead of mean for consistency with the IQR-based
fluctuation detection feature.

Rationale:
- Median is more robust against outliers for speedup calculations
- Using the median for both the speedup calculation and fluctuation
  detection keeps the timing statistics consistent
- Median provides better representation of typical performance
  in the presence of environmental noise
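
The robustness argument can be seen with a small made-up sample in which one run is slowed by outside interference. The numbers are illustrative, not measurements from this PR.

```python
import statistics

# Four typical runs and one run disturbed by another workload (made-up data).
times_ms = [10.0, 10.0, 10.0, 10.0, 50.0]

mean_ms = statistics.mean(times_ms)      # pulled upward by the single outlier
median_ms = statistics.median(times_ms)  # stays at the typical value

print(f"mean={mean_ms} ms, median={median_ms} ms")
```

Here the single slow run shifts the mean to 18 ms, while the median remains at 10 ms.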

Changes:
- eager_e2e_time_ms: get("e2e", {}).get("median", 0)
- compiled_e2e_time_ms: get("e2e", {}).get("median", 0)
- eager_gpu_time_ms: get("gpu", {}).get("median", 0)
- compiled_gpu_time_ms: get("gpu", {}).get("median", 0)
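
A sketch of how the median lookups above could feed a speedup figure follows. The nested stats layout is inferred from the get(...) calls in the list; the helper name median_speedup, the eager/compiled ratio, and returning None when a median is missing or zero are assumptions rather than the behaviour of print_times_and_speedup().

```python
def median_speedup(eager_stats, compiled_stats, key):
    """Return eager median / compiled median for 'e2e' or 'gpu' (hypothetical)."""
    eager_ms = eager_stats.get(key, {}).get("median", 0)
    compiled_ms = compiled_stats.get(key, {}).get("median", 0)
    # Assumption: a missing or zero median means no speedup can be reported.
    if eager_ms <= 0 or compiled_ms <= 0:
        return None
    return eager_ms / compiled_ms

# Made-up stats dictionaries for illustration.
eager = {"e2e": {"median": 20.0}, "gpu": {"median": 12.0}}
compiled = {"e2e": {"median": 10.0}, "gpu": {"median": 8.0}}

print(median_speedup(eager, compiled, "e2e"))
print(median_speedup(eager, compiled, "gpu"))
```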

Impact:
- Speedup values now reflect typical performance rather than
  average performance
- More stable speedup metrics across multiple runs
- Aligns with IQR fluctuation detection approach

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@JewelRoam JewelRoam closed this Apr 15, 2026
@JewelRoam JewelRoam deleted the iqr-2 branch April 15, 2026 09:05