Add IQR-based environment fluctuation detection to benchmark timing stats #690
Closed
JewelRoam wants to merge 2 commits into PaddlePaddle:develop from
Add IQR-based environment fluctuation detection to benchmark timing stats
Overview:
This commit introduces IQR (interquartile range) based environment
fluctuation detection to the timing statistics calculation in
test_compiler_util.py. The feature helps detect unstable benchmarking
environments by measuring the relative variation in timing results.
Key Changes:
- Enhanced get_timing_stats() to compute the median, Q1, Q3, and IQR
- Added the environment variable GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD
  for configurable fluctuation detection sensitivity
- A RuntimeError is raised when IQR/median exceeds the threshold
- Extended the returned stats dictionary with new fields: median, q1,
  q3, iqr
IQR/median Ratio:
- Measures the relative variability of timing measurements
- Lower values indicate more consistent timing
- Higher values indicate environment instability or interference
Environment Variable Configuration:
- GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD (default: 0.2)
  - 0.0: Disable detection (always accept results)
  - 0.1: Strict (flag fluctuations > 10%)
  - 0.2: Default (flag fluctuations > 20%)
  - 0.5: Lenient (only flag severe fluctuations > 50%)
Detection Algorithm:
1. Calculate the median, Q1 (25th percentile), and Q3 (75th percentile)
2. Compute IQR = Q3 - Q1
3. Calculate the relative IQR = IQR / median
4. Compare the relative IQR against the threshold
5. Raise a RuntimeError with detailed diagnostics if it is exceeded
Error Message Format:
When fluctuation is detected, the error message includes:
- The IQR/median ratio and the threshold
- Q1 and Q3 values as percentages
- IQR as a percentage
- Raw timing values for manual inspection
Use Cases:
- Multi-user GPU environments where timing variance is common
- CI/CD pipeline monitoring for performance regression detection
- Manual benchmark verification on shared resources
- Identifying external workload interference
Performance Impact:
- Minimal: adds a single NumPy array conversion and percentile
  calculations
- Scales with the number of timing trials (typically 5-10 runs)
- O(n) complexity for the array operations
Backward Compatibility:
- Existing code continues to work without modification
- The only behavioral change is a RuntimeError when fluctuation is
  detected; a threshold of 0.0 disables the check
- The default threshold (0.2) provides balanced sensitivity
Testing:
- Verified with sample timing data showing correct IQR calculations
- Tested threshold sensitivity with various timing distributions
- Confirmed graceful handling when all times are equal (IQR = 0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
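A minimal sketch of the detection algorithm described in this commit, for illustration only; it is not the code from this PR. The signature `get_timing_stats(times_ms, threshold=None)`, the `mean` field in the returned dictionary, and the exact error text are assumptions. It uses the standard-library `statistics` module instead of NumPy: `statistics.quantiles(..., method="inclusive")` interpolates the same way as the default linear method of `numpy.percentile`. A threshold of 0.0 (or a zero median) skips the check.

```python
import os
import statistics


def get_timing_stats(times_ms, threshold=None):
    """Hypothetical sketch: summarize timing trials and flag unstable runs."""
    if threshold is None:
        # Assumed parsing of the environment variable named in the commit message.
        threshold = float(
            os.environ.get("GRAPH_NET_FLUCTUATION_DETECT_THRESHOLD", "0.2")
        )
    times = [float(t) for t in times_ms]
    if len(times) < 2:
        raise ValueError("need at least two timing trials to compute quartiles")

    median = statistics.median(times)
    # "inclusive" quartiles match numpy.percentile's default linear interpolation.
    q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
    iqr = q3 - q1
    stats = {
        "mean": statistics.fmean(times),  # assumed pre-existing field
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
    }

    # threshold == 0.0 disables detection; a zero median would divide by zero.
    if threshold > 0.0 and median > 0.0:
        ratio = iqr / median
        if ratio > threshold:
            raise RuntimeError(
                f"Environment fluctuation detected: IQR/median = {ratio:.2%} "
                f"exceeds threshold {threshold:.2%} "
                f"(Q1={q1:.3f} ms, Q3={q3:.3f} ms, IQR={iqr:.3f} ms); "
                f"raw times: {times}"
            )
    return stats
```

With a threshold of 0.2, trials such as `[10.0, 10.2, 10.1, 9.9, 10.3]` pass (IQR/median ≈ 0.02), while `[10.0, 10.0, 20.0, 30.0, 30.0]` raise a `RuntimeError` (IQR/median = 1.0).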
Thanks for your contribution!
This change updates the speedup calculation in print_times_and_speedup()
to use median instead of mean for consistency with the IQR-based
fluctuation detection feature.
Rationale:
- The median is more robust against outliers in speedup calculations
- Basing speedup on the median, the same statistic that fluctuation
  detection is built around, keeps the timing statistics consistent
- The median better represents typical performance in the presence
  of environmental noise
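A small worked example of the robustness argument, using made-up trial times (not measurements from this PR): a single disturbed run pulls the mean 40% above the typical trial time, while the median stays at the typical value.

```python
import statistics

# Hypothetical trial times in ms: four steady runs plus one run slowed
# by interference from another job on the same GPU.
times_ms = [10.0, 10.5, 9.5, 10.0, 30.0]

mean_ms = statistics.fmean(times_ms)     # pulled up by the single outlier
median_ms = statistics.median(times_ms)  # stays at the typical value
print(f"mean={mean_ms:.1f} ms, median={median_ms:.1f} ms")
# prints "mean=14.0 ms, median=10.0 ms"
```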
Changes:
- eager_e2e_time_ms: get("e2e", {}).get("median", 0)
- compiled_e2e_time_ms: get("e2e", {}).get("median", 0)
- eager_gpu_time_ms: get("gpu", {}).get("median", 0)
- compiled_gpu_time_ms: get("gpu", {}).get("median", 0)
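A sketch of how the median-based lookups listed above could feed a speedup calculation. The helper name `compute_speedups`, the `{"e2e": {...}, "gpu": {...}}` dictionary layout, and defining speedup as eager time divided by compiled time are assumptions for illustration; the actual `print_times_and_speedup()` may be structured differently.

```python
def compute_speedups(eager_stats, compiled_stats):
    """Hypothetical helper: speedup from median times for each timing kind.

    eager_stats / compiled_stats are assumed to look like
    {"e2e": {"median": ...}, "gpu": {"median": ...}}.
    """
    speedups = {}
    for key in ("e2e", "gpu"):
        # Same lookup pattern as the commit: missing entries default to 0.
        eager_ms = eager_stats.get(key, {}).get("median", 0)
        compiled_ms = compiled_stats.get(key, {}).get("median", 0)
        if eager_ms > 0 and compiled_ms > 0:
            # > 1.0 means the compiled run is faster than eager mode.
            speedups[key] = eager_ms / compiled_ms
        else:
            speedups[key] = None  # timing unavailable, no speedup reported
    return speedups


eager = {"e2e": {"median": 12.0}, "gpu": {"median": 8.0}}
compiled = {"e2e": {"median": 6.0}, "gpu": {"median": 4.0}}
print(compute_speedups(eager, compiled))  # {'e2e': 2.0, 'gpu': 2.0}
```

Returning `None` rather than dividing by a defaulted 0 keeps a missing GPU timing from raising `ZeroDivisionError` or producing a misleading speedup.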
Impact:
- Speedup values now reflect typical performance rather than
average performance
- More stable speedup metrics across multiple runs
- Aligns with the IQR-based fluctuation detection approach
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The file changes are the same as in https://github.com/PaddlePaddle/ai4c/pull/72/changes and https://github.com/PaddlePaddle/ai4c/pull/75/changes