Optimize breaking_changes_checker performance for large packages#45937
Optimize breaking_changes_checker performance for large packages#45937
Conversation
… exit, and O(n) algorithms Key optimizations: 1. Add AST module cache (_ast_cache, _get_parsed_module) to avoid redundant file I/O and AST parsing in get_properties() and get_overloads() 2. Use exception-based early exit in ClassTreeAnalyzer to stop AST traversal once the target class is found 3. Replace O(n*m) run_async_cleanup() with O(n) set-based approach 4. Replace deepcopy + list.remove() in get_reportable_changes() with single-pass filtering - eliminates expensive deep copy and O(n^2) removal 5. Use re.search() instead of re.findall() for ignore rule matching 6. Add comprehensive performance tests demonstrating the optimizations Co-authored-by: msyyc <70930885+msyyc@users.noreply.github.com> Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/328bed23-3c8d-44f6-bea0-a24571d01b8d
…hable handling Co-authored-by: msyyc <70930885+msyyc@users.noreply.github.com> Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/328bed23-3c8d-44f6-bea0-a24571d01b8d
There was a problem hiding this comment.
Pull request overview
Improves breaking_changes_checker runtime on large Azure SDK packages by reducing redundant AST parsing and optimizing post-processing algorithms that previously had quadratic behavior.
Changes:
- Added AST parsing cache and an early-exit mechanism for locating class nodes during AST traversal.
- Optimized
run_async_cleanup()andget_reportable_changes()to avoidlist.remove()/deepcopy-driven quadratic patterns. - Added a new
test_performance.pytest module covering correctness and performance/benchmark scenarios.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
scripts/breaking_changes_checker/detect_breaking_changes.py |
Adds _ast_cache + cached parsing helpers and exception-based early exit in ClassTreeAnalyzer; clears cache at start of report generation. |
scripts/breaking_changes_checker/breaking_changes_tracker.py |
Reworks async cleanup and ignore-rule filtering to single-pass/set-based implementations. |
scripts/breaking_changes_checker/tests/test_performance.py |
Introduces new performance-focused tests/benchmarks and cache/early-exit verification. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
I think overall these look like good improvements to the tool, thanks for working on this! My main concern is that there are some things that are changing the behavior and while they seem correct, I wonder if things will begin breaking due to these changes. To build confidence, have you done a run through of the changelog generation for more mgmt plane libraries to check how it behaves in different scenarios?
| # Pre-create Suppression objects once instead of recreating per change | ||
| suppressions = [Suppression(*rule) for rule in ignored] | ||
|
|
||
| # Filter out ignored changes in a single pass instead of deepcopy + list.remove() |
There was a problem hiding this comment.
I recall there were issues without the deepcopy at some point. How many real packages have you tested this code with?
| should_keep = True | ||
| for suppression in suppressions: | ||
| if suppression.parameter_or_property_name is not None: | ||
| # If the ignore rule is for a property or parameter, we should check up to that level on the original change |
There was a problem hiding this comment.
Nit: this command was helpful to understand what we're doing
for #45952
The changelog generator is prohibitively slow on large packages like
azure-mgmt-networkdue to redundant file I/O/AST parsing, quadratic algorithms in post-processing, and unnecessary deep copies.AST caching and early exit
_ast_cache/_get_parsed_module):get_properties()andget_overloads()both parse the same source files per class per base class. Now parsed once, cached by path, shared across both call sites.ClassTreeAnalyzer: raises_ClassNodeFoundto break out of the fullast.NodeVisitortraversal immediately upon finding the target class, rather than continuing to walk the entire tree.Algorithm fixes
run_async_cleanup(): O(n²m) → O(n). Replaced nested loop +list.remove()with a set of non-aio change keys and a single-pass list comprehension. Handles unhashable types (lists in parameter ordering tuples) via recursive_make_hashable().get_reportable_changes(): eliminateddeepcopy+list.remove(). Single-pass filter builds a new list directly.Suppressionobjects pre-created once instead of per-change.re.findall()→re.search().Performance tests
14 new tests in
test_performance.pycovering correctness and benchmarks: