fix(T11552): collision-safe brain_decisions.id — decision-store works under concurrent writes (P1 integrity)#901

Merged
kryptobaseddev merged 1 commit into main from task/T11552-brain-decision-id on Jun 1, 2026

Conversation

@kryptobaseddev
Owner

Summary

Fixes a P1 Living-Brain integrity defect (T11552): cleo memory decision-store hit a repeatable UNIQUE constraint failed: brain_decisions.id under concurrent writes, blocking the BRAIN decision-store and forcing the CLEO_OWNER_OVERRIDE fallback. Two design agents independently could not anchor a decision atom on 2026-06-01.

Root cause (proven)

storeDecision allocated the sequential Dnnn id with a MAX(id)+1 read in application code (nextDecisionId, an async function with await boundaries) and only INSERTed the row later. The full path: cleo memory decision-store → decision.store op → engine-compat.ts → storeDecision() → nextDecisionId() → INSERT INTO brain_decisions (id PRIMARY KEY).

Two agents writing in the same instant both read the same MAX(id) (e.g. both see D042), both compute D043, and the second INSERT collides on the id primary key. Every writer whose read lands before the first INSERT sees the same MAX, so the collision is deterministic and repeatable under concurrent load, which is exactly the observed symptom.
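The race described above can be reproduced without a database. The following is a minimal sketch, not the project's code: `FakeDecisionTable`, `racyStore`, and `demoRace` are hypothetical names, and an in-memory Set stands in for the `brain_decisions` primary key.

```typescript
// In-memory stand-in for brain_decisions; the Set plays the PRIMARY KEY role.
// All names here are illustrative assumptions, not the real accessor API.
class FakeDecisionTable {
  private ids = new Set<string>();

  constructor(seed: string[] = []) {
    seed.forEach((id) => this.ids.add(id));
  }

  // Read step: highest numeric suffix currently stored (0 when empty).
  maxSuffix(): number {
    let max = 0;
    this.ids.forEach((id) => {
      max = Math.max(max, Number(id.slice(1)));
    });
    return max;
  }

  // Write step: rejects duplicates the way a PRIMARY KEY would.
  insert(id: string): void {
    if (this.ids.has(id)) {
      throw new Error("UNIQUE constraint failed: brain_decisions.id");
    }
    this.ids.add(id);
  }
}

const formatId = (n: number): string => "D" + String(n).padStart(3, "0");

// Pre-fix shape: read, yield to the event loop, then write.
async function racyStore(table: FakeDecisionTable): Promise<string> {
  const next = formatId(table.maxSuffix() + 1); // read
  await Promise.resolve(); // await boundary: the other caller reads here
  table.insert(next); // write
  return next;
}

// Two writers start from D042; each outcome is the stored id or the error text.
async function demoRace(): Promise<string[]> {
  const table = new FakeDecisionTable(["D042"]);
  const writers = [racyStore(table), racyStore(table)];
  return Promise.all(
    writers.map((p) => p.then((id) => id, (e: Error) => e.message)),
  );
}
```

Both callers compute D043 before either inserts, so the first write succeeds and the second is rejected by the uniqueness check.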

Fix (root-cause, not band-aid)

The next id is now computed by a MAX(CAST(substr(id,2) AS INTEGER)) + 1 subquery evaluated inside the INSERT statement (BrainDataAccessor.addDecisionWithSequentialId). node:sqlite executes the statement synchronously and atomically, so the id read and the row write are one indivisible operation that concurrent async callers cannot interleave.

  • Race-free: no await boundary splits read from write.
  • Numeric ordering: correct past D999 → D1000 (the prior lexical ORDER BY id regressed there).
  • Legacy-id tolerant: GLOB 'D[0-9]*' ignores foreign id shapes like D-abcd.
  • No silent drop: a real INSERT, never INSERT OR IGNORE (the exodus anti-pattern). A bounded retry remains as defense-in-depth for the genuine cross-process case.
  • The UNIQUE constraint is not weakened.
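As a rough illustration of the properties listed above, the sketch below pairs a hypothetical statement shape with a plain TypeScript model of the id arithmetic. Only the `MAX(CAST(substr(id,2) AS INTEGER)) + 1` expression, the `GLOB 'D[0-9]*'` filter, and the `brain_decisions.id` column come from this description. The `decision` column, the zero-padding, and the function names are assumptions, and `nextIdLexical` is a reconstruction of the old ordering for comparison only.

```typescript
// Hypothetical statement shape (assumed columns and padding): the id is
// computed by a subquery inside the INSERT itself, so read and write are
// a single statement.
const INSERT_WITH_SEQUENTIAL_ID = `
  INSERT INTO brain_decisions (id, decision)
  SELECT 'D' || printf('%03d', COALESCE(MAX(CAST(substr(id, 2) AS INTEGER)), 0) + 1), ?
  FROM brain_decisions
  WHERE id GLOB 'D[0-9]*'
`;

// Mirrors GLOB 'D[0-9]*': a "D", one digit, then anything.
const SEQUENTIAL_ID = /^D[0-9]/;

// Mirrors CAST(substr(id, 2) AS INTEGER): the leading digits after "D".
function numericSuffix(id: string): number {
  return parseInt(id.slice(1), 10);
}

const formatDecisionId = (n: number): string =>
  "D" + String(n).padStart(3, "0");

// Post-fix arithmetic: numeric max over sequential-shaped ids only.
function nextIdNumeric(ids: string[]): string {
  const max = ids
    .filter((id) => SEQUENTIAL_ID.test(id))
    .reduce((m, id) => Math.max(m, numericSuffix(id)), 0);
  return formatDecisionId(max + 1);
}

// Reconstruction of the pre-fix ordering: lexical max, where "D999"
// sorts after "D1000" because "9" > "1".
function nextIdLexical(ids: string[]): string {
  const top = [...ids].sort().pop() ?? "D000";
  return formatDecisionId(numericSuffix(top) + 1);
}

// With D999 and D1000 both present, the lexical variant proposes an id
// that already exists, while the numeric variant moves on.
const existing = ["D998", "D999", "D1000", "D-abcd"];
console.log(nextIdLexical(["D999", "D1000"])); // D1000
console.log(nextIdNumeric(existing)); // D1001
```

The foreign-shaped `D-abcd` is filtered out before the maximum is taken, so it cannot affect the next sequential id.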

Proof

  • New regression tests (packages/core/src/memory/__tests__/decisions.test.ts): 30 concurrent storeDecision calls all persist with distinct ids (a fan-out that exhausts a naive read-then-write retry budget), plus an injected cross-process PRIMARY KEY collision the retry loop recovers from. Both tests fail against pre-fix origin/main with the exact UNIQUE constraint failed: brain_decisions.id symptom (verified by reverting the source fix).
  • Real CLI proof: three separate OS processes firing cleo memory decision-store concurrently against a migrated brain.db all succeed with distinct ids (D002/D003/D004), no override.
  • 50-way in-process concurrent insert proof: all 50 persist with distinct sequential ids.
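The shape of these proofs can be sketched against an in-memory model. This is an illustration under stated assumptions rather than the actual test file: `FakeTable`, its `injected` hook, the `maxAttempts` default of 3, and this `storeDecision` signature are hypothetical, and a synchronous method stands in for the single statement that node:sqlite executes atomically.

```typescript
// In-memory stand-in for brain_decisions (hypothetical, for illustration).
class FakeTable {
  private ids = new Set<string>();
  // Test hook: ids that "another process" commits just before our write lands.
  injected: string[] = [];

  // Read and write in one synchronous call, so no other async caller in
  // this process can run between them.
  addWithSequentialId(): string {
    let max = 0;
    this.ids.forEach((id) => {
      max = Math.max(max, Number(id.slice(1)));
    });
    const id = "D" + String(max + 1).padStart(3, "0");
    const stolen = this.injected.shift();
    if (stolen !== undefined) this.ids.add(stolen);
    if (this.ids.has(id)) {
      throw new Error("UNIQUE constraint failed: brain_decisions.id");
    }
    this.ids.add(id);
    return id;
  }
}

// Bounded retry: recompute the id on a collision, give up after maxAttempts,
// and rethrow anything that is not a uniqueness failure.
async function storeDecision(table: FakeTable, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    await Promise.resolve(); // stand-in for unrelated async work before the write
    try {
      return table.addWithSequentialId();
    } catch (err) {
      const isCollision =
        err instanceof Error && /^UNIQUE constraint failed/.test(err.message);
      if (!isCollision || attempt >= maxAttempts) throw err;
    }
  }
}

// Concurrent fan-out: n callers share one table and must all get distinct ids.
async function fanOut(n: number): Promise<string[]> {
  const table = new FakeTable();
  return Promise.all(Array.from({ length: n }, () => storeDecision(table)));
}

// Injected collision: the first proposed id is taken from underneath the
// caller, and the retry is expected to land on the next one.
async function recoverFromInjectedCollision(): Promise<string> {
  const table = new FakeTable();
  table.injected.push("D001");
  return storeDecision(table);
}
```

In this model the fan-out never needs the retry, because allocation and insert share one synchronous call. The retry only fires for the injected collision, which mirrors the split described above between the in-process fix and the cross-process defense.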

Scope

Only packages/core/src/memory/decisions.ts, packages/core/src/store/memory-accessor.ts, and the test. Does NOT touch the conduit layer (separate L3 agent owns it). Built on origin/main post-#899 (E6-L2 brain rewrite).

Note for follow-up (out of scope)

A separate concurrency defect was observed: two processes both running first-run CREATE TABLE brain_attention on a brand-new empty brain.db race on schema bootstrap. This is the first-run-migration path, not the decision-id bug, and is out of T11552's scope.

🤖 Generated with Claude Code

… under concurrent writes (P1 integrity)

Root cause: storeDecision allocated the sequential Dnnn id via a MAX(id)+1
read in application code (nextDecisionId, async with await boundaries) and
INSERTed later. Two agents writing in the same instant both read the same
MAX(id), both proposed the same id, and the second INSERT collided on the
id PRIMARY KEY — losing the decision and forcing CLEO_OWNER_OVERRIDE.

Fix: compute the next id with a MAX(...)+1 subquery INSIDE the INSERT
statement (BrainDataAccessor.addDecisionWithSequentialId). node:sqlite runs
it synchronously/atomically so concurrent callers cannot interleave between
read and write. Also numeric (correct past D999→D1000) and legacy-id
tolerant (GLOB 'D[0-9]*'). Bounded retry kept as cross-process defense;
never INSERT OR IGNORE so no silent drop.

Regression test: 30 concurrent storeDecision inserts all persist with
distinct ids + injected cross-process collision recovery. Both fail on the
pre-fix code with the exact UNIQUE constraint failed: brain_decisions.id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kryptobaseddev kryptobaseddev merged commit b589bc9 into main Jun 1, 2026
68 checks passed
@kryptobaseddev kryptobaseddev deleted the task/T11552-brain-decision-id branch June 1, 2026 10:38