Skip to content

Prevent semantic compaction mega-memories#201

Open
abrookins wants to merge 3 commits intomainfrom
fix/semantic-compaction-cohesion
Open

Prevent semantic compaction mega-memories#201
abrookins wants to merge 3 commits intomainfrom
fix/semantic-compaction-cohesion

Conversation

@abrookins
Copy link
Collaborator

@abrookins abrookins commented Mar 13, 2026

Summary

Context

Fixes #200.


Note

Medium Risk
Changes semantic deduplication/compaction decision logic by adding additional vector searches to validate cluster cohesion, which can alter merge behavior and has some performance impact in a core data-path.

Overview
Prevents semantic compaction “mega-memories” by adding a cohesion gate to deduplicate_by_semantic_search: candidate neighbors must be mutually consistent, otherwise the merge is skipped to avoid bridge memories pulling in unrelated items.

Also adjusts semantic-search limits (adds one-slot headroom while still capping evaluated candidates) and adds regressions: an integration test using real embeddings for the diet-vs-basketball false positive, plus a new deterministic controlled-embedding test suite covering bridge/chain scenarios and ensuring same-topic paraphrases still merge.

Written by Cursor Bugbot for commit 7456467. This will update automatically on new commits. Configure here.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. The implementation effectively addresses the mega-memory issue by validating merge group cohesion. The approach is sound and well-tested.


🤖 Automated review complete. Please react with 👍 or 👎 on the individual review comments to provide feedback on their usefulness.

@jit-ci
Copy link

jit-ci bot commented Mar 13, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@abrookins
Copy link
Collaborator Author

Follow-up for the automated review:

  • kept the semantic dedup candidate cap at 10, but made it explicit via a shared SEMANTIC_DEDUP_SEARCH_LIMIT constant so the outer search and the cohesion probe cannot drift
  • added a regression showing that a fully capped 10-memory merge group still passes the cohesion check

The important constraint here is that deduplicate_by_semantic_search() already limits the merge group to 10 returned candidates, so the cohesion helper cannot legitimately observe a larger merge set in the current implementation.

@abrookins
Copy link
Collaborator Author

Follow-up for the Bugbot report:

  • deduplicate_by_semantic_search() now queries with one slot of headroom before filtering out memory.id, then caps the merge group back to SEMANTIC_DEDUP_SEARCH_LIMIT
  • the cohesion probe uses the same headroom so compaction sees the same candidate window as the initial indexing path
  • added a regression that reproduces the indexed-anchor case where the old compaction path could drop a valid candidate and reject the merge

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

candidate_memory.id,
sorted(extra_ids),
)
return False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cohesion check blocks merges with many similar memories

Medium Severity

The extra_ids check in _semantic_merge_group_is_cohesive will produce false rejections when the database contains more similar memories than SEMANTIC_DEDUP_QUERY_LIMIT (11). With e.g. 12 nearly-identical coffee-preference memories, the outer search caps candidates at 10, but each candidate's cohesion probe (also limited to 11 results) can discover the excluded 11th/12th memory as an "extra neighbor," causing extra_ids to be non-empty and blocking the entire merge. This contradicts the PR discussion's claim that the cohesion helper "cannot legitimately observe" memories outside the merge group — it can whenever total similar memories exceed the search limit.

Additional Locations (1)
Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Semantic Merging Creates "Mega-Memories" with Unrelated Topics (Snowball Effect)

1 participant