Skip to content

Optimize breaking_changes_checker performance for large packages#45937

Open
Copilot wants to merge 7 commits intomainfrom
copilot/optimize-breaking-changes-checker-performance
Open

Optimize breaking_changes_checker performance for large packages#45937
Copilot wants to merge 7 commits intomainfrom
copilot/optimize-breaking-changes-checker-performance

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 26, 2026

for #45952

The changelog generator is prohibitively slow on large packages like azure-mgmt-network due to redundant file I/O/AST parsing, quadratic algorithms in post-processing, and unnecessary deep copies.

AST caching and early exit

  • Module-level AST cache (_ast_cache/_get_parsed_module): get_properties() and get_overloads() both parse the same source files per class per base class. Now parsed once, cached by path, shared across both call sites.
  • Exception-based early exit in ClassTreeAnalyzer: raises _ClassNodeFound to break out of the full ast.NodeVisitor traversal immediately upon finding the target class, rather than continuing to walk the entire tree.

Algorithm fixes

  • run_async_cleanup(): O(n²m) → O(n). Replaced nested loop + list.remove() with a set of non-aio change keys and a single-pass list comprehension. Handles unhashable types (lists in parameter ordering tuples) via recursive _make_hashable().
# Before: O(n * m * m) — nested scan + linear removal
for change in non_aio_changes:
    for c in changes_list:
        if "aio" in c[2] and change[1] == c[1] and change[3:] == c[3:]:
            changes_list.remove(c)  # O(n)

# After: O(n) — set lookup + filter
non_aio_keys = {_make_key(bc) for bc in changes_list if "aio" not in bc[2]}
changes_list[:] = [bc for bc in changes_list if "aio" not in bc[2] or _make_key(bc) not in non_aio_keys]
  • get_reportable_changes(): eliminated deepcopy + list.remove(). Single-pass filter builds a new list directly. Suppression objects pre-created once instead of per-change. re.findall()re.search().

Performance tests

14 new tests in test_performance.py covering correctness and benchmarks:

  • 50K-change async cleanup completes in <2s
  • 5K changes with 100 ignore rules filtered in <1s
  • Full ChangelogTracker run on large synthetic API surface in <5s
  • AST cache hit verification and 200-lookup benchmark
  • Early exit speedup measurement (first-class vs last-class in 500-class file)

Copilot AI and others added 2 commits March 26, 2026 08:01
… exit, and O(n) algorithms

Key optimizations:
1. Add AST module cache (_ast_cache, _get_parsed_module) to avoid redundant file I/O
   and AST parsing in get_properties() and get_overloads()
2. Use exception-based early exit in ClassTreeAnalyzer to stop AST traversal once
   the target class is found
3. Replace O(n*m) run_async_cleanup() with O(n) set-based approach
4. Replace deepcopy + list.remove() in get_reportable_changes() with single-pass
   filtering - eliminates expensive deep copy and O(n^2) removal
5. Use re.search() instead of re.findall() for ignore rule matching
6. Add comprehensive performance tests demonstrating the optimizations

Co-authored-by: msyyc <70930885+msyyc@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/328bed23-3c8d-44f6-bea0-a24571d01b8d
@msyyc msyyc requested a review from catalinaperalta as a code owner March 27, 2026 05:22
Copilot AI review requested due to automatic review settings March 27, 2026 05:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves breaking_changes_checker runtime on large Azure SDK packages by reducing redundant AST parsing and optimizing post-processing algorithms that previously had quadratic behavior.

Changes:

  • Added AST parsing cache and an early-exit mechanism for locating class nodes during AST traversal.
  • Optimized run_async_cleanup() and get_reportable_changes() to avoid list.remove()/deepcopy-driven quadratic patterns.
  • Added a new test_performance.py test module covering correctness and performance/benchmark scenarios.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
scripts/breaking_changes_checker/detect_breaking_changes.py Adds _ast_cache + cached parsing helpers and exception-based early exit in ClassTreeAnalyzer; clears cache at start of report generation.
scripts/breaking_changes_checker/breaking_changes_tracker.py Reworks async cleanup and ignore-rule filtering to single-pass/set-based implementations.
scripts/breaking_changes_checker/tests/test_performance.py Introduces new performance-focused tests/benchmarks and cache/early-exit verification.

msyyc and others added 4 commits March 27, 2026 15:01
Copy link
Copy Markdown
Member

@catalinaperalta catalinaperalta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall these look like good improvements to the tool, thanks for working on this! My main concern is that there are some things that are changing the behavior and while they seem correct, I wonder if things will begin breaking due to these changes. To build confidence, have you done a run through of the changelog generation for more mgmt plane libraries to check how it behaves in different scenarios?

# Pre-create Suppression objects once instead of recreating per change
suppressions = [Suppression(*rule) for rule in ignored]

# Filter out ignored changes in a single pass instead of deepcopy + list.remove()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall there were issues without the deepcopy at some point. How many real packages have you tested this code with?

should_keep = True
for suppression in suppressions:
if suppression.parameter_or_property_name is not None:
# If the ignore rule is for a property or parameter, we should check up to that level on the original change
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this command was helpful to understand what we're doing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants