Conversation
There was a problem hiding this comment.
Review completed. The implementation effectively addresses the mega-memory issue by validating merge group cohesion. The approach is sound and well-tested.
🤖 Automated review complete. Please react with 👍 or 👎 on the individual review comments to provide feedback on their usefulness.
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
|
Follow-up for the automated review:
The important constraint here is that |
|
Follow-up for the Bugbot report:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
| candidate_memory.id, | ||
| sorted(extra_ids), | ||
| ) | ||
| return False |
There was a problem hiding this comment.
Cohesion check blocks merges with many similar memories
Medium Severity
The extra_ids check in _semantic_merge_group_is_cohesive will produce false rejections when the database contains more similar memories than SEMANTIC_DEDUP_QUERY_LIMIT (11). With e.g. 12 nearly-identical coffee-preference memories, the outer search caps candidates at 10, but each candidate's cohesion probe (also limited to 11 results) can discover the excluded 11th/12th memory as an "extra neighbor," causing extra_ids to be non-empty and blocking the entire merge. This contradicts the PR discussion's claim that the cohesion helper "cannot legitimately observe" memories outside the merge group — it can whenever total similar memories exceed the search limit.


Summary
Context
Fixes #200.
Note
Medium Risk
Changes semantic deduplication/compaction decision logic by adding additional vector searches to validate cluster cohesion, which can alter merge behavior and has some performance impact in a core data-path.
Overview
Prevents semantic compaction “mega-memories” by adding a cohesion gate to
deduplicate_by_semantic_search: candidate neighbors must be mutually consistent, otherwise the merge is skipped to avoid bridge memories pulling in unrelated items.Also adjusts semantic-search limits (adds one-slot headroom while still capping evaluated candidates) and adds regressions: an integration test using real embeddings for the diet-vs-basketball false positive, plus a new deterministic controlled-embedding test suite covering bridge/chain scenarios and ensuring same-topic paraphrases still merge.
Written by Cursor Bugbot for commit 7456467. This will update automatically on new commits. Configure here.