Skip to content

ci: suppress LeakSanitizer SEGV in TensorFlow/MLIR/LLVM stack#5301

Closed
njzjz-bot wants to merge 2 commits intodeepmodeling:masterfrom
njzjz-bot:ci/lsan-suppress-tf-segv
Closed

ci: suppress LeakSanitizer SEGV in TensorFlow/MLIR/LLVM stack#5301
njzjz-bot wants to merge 2 commits intodeepmodeling:masterfrom
njzjz-bot:ci/lsan-suppress-tf-segv

Conversation

@njzjz-bot
Copy link
Contributor

@njzjz-bot njzjz-bot commented Mar 10, 2026

LeakSanitizer SEGVs during stack unwinding on certain TensorFlow/MLIR/LLVM operations (ThreadRegistry::GetThreadLocked). This appears to be an upstream issue in libsanitizer when handling complex thread topologies or late allocations.

Adding a suppression for the crashing function to treat this as a known CI flake and prevent job failure.

Crash Log:

==4388==ERROR: LeakSanitizer: SEGV on unknown address 0x7f7f77468ff8 (pc 0x7f7f76e14839 bp 0x7f7f01f85a60 sp 0x7f7f01f85a10 T511)
==4388==The signal is caused by a READ memory access.
    #0 0x7f7f76e14839 in __sanitizer::ThreadRegistry::GetThreadLocked(unsigned int) ../../../../src/libsanitizer/sanitizer_common/sanitizer_thread_registry.h:103
    ...
    #5 0x7f7f76e132e0 in operator new(unsigned long, std::align_val_t, std::nothrow_t const&) ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:265
    #6 0x7f7f750da74f in llvm::allocate_buffer(unsigned long, unsigned long)

Summary by CodeRabbit

  • Chores
    • Updated CI configuration to add a LeakSanitizer suppression entry that reduces false-positive leak reports.
    • Mitigates intermittent CI test failures by treating a known unwinding crash as a CI flake; includes notes referencing the upstream issue.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 98a79b41-c391-4335-833a-010bcacb5887

📥 Commits

Reviewing files that changed from the base of the PR and between 2ae86df and 7de0d2d.

📒 Files selected for processing (1)
  • .github/workflows/suppr.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/suppr.txt

📝 Walkthrough

Walkthrough

Adds a LeakSanitizer suppression entry to .github/workflows/suppr.txt for ThreadRegistry::GetThreadLocked, with CI comments noting the suppression targets a crash/unwinding flake rather than a real leak.

Changes

Cohort / File(s) Summary
Suppression Configuration
​.github/workflows/suppr.txt
Added a LeakSanitizer suppression rule matching ThreadRegistry::GetThreadLocked and CI-flake comments explaining the suppression targets an unwinding/SEGV CI flake, not an actual leak.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a LeakSanitizer suppression for a SEGV occurring in the TensorFlow/MLIR/LLVM stack, which matches the PR's core objective of suppressing a known CI flake.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/suppr.txt:
- Line 10: The suppression entry for ThreadRegistry::GetThreadLocked is missing
the required LeakSanitizer prefix; update the suppression pattern to include the
"leak:" prefix (i.e., change the entry referencing
ThreadRegistry::GetThreadLocked so it reads
leak:ThreadRegistry::GetThreadLocked) so the leak suppression is recognized;
locate the existing suppression list and modify the
ThreadRegistry::GetThreadLocked line accordingly.
- Around line 6-10: The suppression entry for ThreadRegistry::GetThreadLocked is
missing the required LSan type prefix; update the line containing the symbol
ThreadRegistry::GetThreadLocked to include the "leak:" prefix (i.e., change it
to leak:ThreadRegistry::GetThreadLocked) so it matches the other suppressions
and conforms to LSan suppression syntax.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 68213295-d3e1-426b-b921-fc4309dca3b1

📥 Commits

Reviewing files that changed from the base of the PR and between 24e54bf and 603ea19.

📒 Files selected for processing (1)
  • .github/workflows/suppr.txt

@njzjz-bot
Copy link
Contributor Author

Fix: LSAN suppression file requires a type prefix (e.g. 'leak:'). Previous commit had a bare symbol line which caused . Updated suppression to:

so it parses correctly.

@njzjz-bot
Copy link
Contributor Author

Follow-up (shell quoting fix): the suppression file must contain entries of form <type>:<pattern>. The bare ThreadRegistry::GetThreadLocked line caused the parse failure. Now changed to leak:ThreadRegistry::GetThreadLocked.

@codecov
Copy link

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.32%. Comparing base (24e54bf) to head (7de0d2d).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5301      +/-   ##
==========================================
- Coverage   82.32%   82.32%   -0.01%     
==========================================
  Files         768      768              
  Lines       77098    77097       -1     
  Branches     3659     3659              
==========================================
- Hits        63469    63468       -1     
+ Misses      12458    12457       -1     
- Partials     1171     1172       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@njzjz-bot njzjz-bot force-pushed the ci/lsan-suppress-tf-segv branch from 2ae86df to 7de0d2d Compare March 10, 2026 06:45
@njzjz
Copy link
Member

njzjz commented Mar 10, 2026

It seems that the error in #5300 is occasional, and cannot be reproduced again. Thus, I decide to close this PR since I cannot validate whether this PR fixes something.

@njzjz njzjz closed this Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants