fix(memos): address review feedback on PRs #1813 and #1817#1830
fix(memos): address review feedback on PRs #1813 and #1817#1830chiefmojo wants to merge 5 commits into
Conversation
- Add POST /api/v1/rpc (JSON-RPC-over-HTTP) for out-of-process transport - Add MemosHttpClient class (Python HTTP client, drop-in for MemosBridgeClient) - Provider prefers HTTP when daemon already running on :18800 - Add kill_zombie_bridges() and probe_viewer_status() to daemon_manager - Exempt /api/v1/rpc from session cookie auth (loopback-only) - Remove tracked ephemeral files (daemon/bridge.pid, memos-local/*) - Update .gitignore for runtime state
runReflect's orphan-steps fallback was inserting new trace rows whenever a recovered episode's snapshot timestamps didn't match existing DB rows. This caused trace_ids_json to grow on every bridge restart, keeping reward.traceCount != traceIds.length and looping forever — generating 48k+ empty traces and continuous OpenRouter traffic. Guard: when meta.recoveryReason === "dirty_reward_rescore", log and skip the insert instead of writing new rows. Legitimate orphan handling (test paths, genuinely dropped events) is unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
daemon_manager.py: extend ensure_viewer_daemon health-poll deadline from 15s to 45s to cover cold Node.js starts (tsx compile + SQLite open + FTS warmup). After the first 15s, back off probes from 0.5s to 2s to avoid hammering a slow-starting daemon. __init__.py: increase _open_session timeout in init from 30s to 60s so session.open doesn't time out before the daemon finishes starting. Add a 1s double-spawn guard: when probe_viewer_status() returns "free", wait 1s and re-probe before falling through to ensure_viewer_daemon(). Eliminates the race where a second gateway session spawns a second stdio bridge because the daemon from the first session hasn't bound the port yet. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The RPC endpoint was unconditionally exempt from session-cookie auth to allow the local Python adapter to connect without a browser session. On hub-mode deployments (bindHost=0.0.0.0) this left the endpoint reachable from the network without any auth when password protection is enabled. Now the exemption only applies when remoteAddress is 127.0.0.1, ::1, or ::ffff:127.0.0.1. Standard installs (bindHost=127.0.0.1) are unaffected; the Python adapter always connects from loopback and continues to work. Network callers on hub deployments must hold a valid session cookie. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
and MemTensor#1817 Ship-stoppers: - auth.ts: add reverse proxy assumption comment to loopback RPC exemption - __init__.py: fix unbound http_bridge/http_bridge_late on constructor exception; extract _connect_http_bridge helper to deduplicate 3 connect-and-register sites - capture.ts + memory-core.ts: extract RECOVERY_REASONS constant to recovery-constants.ts; replace hardcoded "dirty_reward_rescore" strings Bugs: - rpc.ts: filter undefined from batch results (notifications serialized as null); return 204 when all batch items are notifications (per JSON-RPC 2.0 §6) - daemon_manager.py: add lsof(8) macOS fallback in kill_zombie_bridges; skip killing entirely when daemon PID cannot be identified Should fix: - daemon_manager.py: move import re to module level; add startup_lock_active() - __init__.py: make cold-start re-probe sleep conditional on startup_lock_active() Nits: - .gitignore: remove redundant bare data entry - tests: use RECOVERY_REASONS.DIRTY_REWARD_RESCORE in memory-core.test.ts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Memtensor-AI
left a comment
There was a problem hiding this comment.
Needs changes — a few issues worth addressing before merge
Overall this is solid work: the unbound-variable fix, the shared constant extraction, and the macOS lsof fallback are all good improvements. A few things stood out:
Issues
1. bridge_client.py — Callable used without import
MemosHttpClient.register_host_handler and on_event/on_log use Callable in type hints, but the diff doesn't show it being imported. The existing MemosBridgeClient may already have it via TYPE_CHECKING, but if Callable isn't in the runtime namespace this will raise NameError at instantiation time.
2. bridge_client.py — entry["event"].set() deleted
The diff ends with the removal of entry["event"].set() at line 422 (the last line of _resolve). That line wakes up the caller blocked in request(). If this deletion is real and not a diff artifact, MemosBridgeClient.request() will hang forever waiting on the event. Please confirm this is just a context boundary in the diff.
3. daemon_manager.py — kill_zombie_bridges race with short-lived PIDs
Between ps aux and os.kill(pid, SIGTERM), the PID could have been recycled. This is a classic TOCTOU issue. Low probability on most systems, but on containers with short PID namespaces it's more realistic. Consider checking /proc/{pid}/cmdline (Linux) or using psutil if available before sending the signal.
4. daemon_manager.py — ss parsing is fragile
The regex r"pid=(\d+)" works for current ss output, but ss -tlnp output format varies across distutils versions. Since you already fall through to lsof, consider trying lsof first on all platforms (it's available on Linux too) for consistency, and only falling back to ss if lsof is missing.
5. __init__.py — viewer_status reassignment inside elif branch
elif viewer_status == "free":
if startup_lock_active():
time.sleep(1.0)
viewer_status = probe_viewer_status()
if viewer_status == "running_memos":
...If startup_lock_active() is False, the second if checks the original "free" value and correctly skips. That's fine. But if the re-probe returns something other than "running_memos" (e.g. "blocked"), execution falls through to the if self._bridge is None stdio path, which is correct but not obvious. A brief comment would help future readers.
6. rpc.ts — diff is truncated, can't verify the 204 behavior
The description mentions batch notifications now return 204 instead of [], but the rpc.ts diff isn't included. If the response is truly empty (no body), ensure the Content-Type header isn't set to application/json — some HTTP clients choke on 204 + Content-Type: application/json with no body.
Nits
.gitignore: The comment# ── Runtime / ephemeral (Violet addition 2026-05-27) ──has a future date (2026). Typo?__init__.py: The bareexcept Exception: passaroundkill_zombie_bridges()swallows all errors silently. At minimum alogger.debugwould help troubleshooting.auth.tsdiff is also truncated — can't verify the documented loopback assumption comment was actually added.
What looks good
- The
_connect_http_bridge()extraction properly initializes toNonebefore construction — exactly the fix needed for the unbound variable bug. RECOVERY_REASONSconstant extraction is clean and the type export is a nice touch.- The macOS
lsoffallback with the "skip if unknown" safety net is the right call. - Timeout bump from 15s → 45s with adaptive backoff is pragmatic.
Summary
Addresses all feedback from the Memtensor-AI reviews on PRs #1813 (HTTP transport + JSON-RPC endpoint) and #1817 (bridge startup race, orphan trace fix, RPC loopback auth). This branch consolidates both PRs plus the review fixes.
Changes
Ship-stoppers
auth.ts: Added comment documenting the direct-bind assumption on the loopback RPC exemption. If a reverse proxy is ever introduced,socket.remoteAddressreflects the proxy IP andX-Forwarded-Formust be consulted instead — auth will bypass silently without this documented.__init__.py: Fixed unboundhttp_bridge/http_bridge_late— the type annotation and constructor were on the same line, so a constructor exception left the variable unbound when theexceptblock called.close(). Extracted_connect_http_bridge()helper which deduplicates the three identical connect-and-register sites and correctly initializes toNonebefore constructing.capture.ts+memory-core.ts: Extracted"dirty_reward_rescore"tocore/pipeline/recovery-constants.tsasRECOVERY_REASONS.DIRTY_REWARD_RESCORE; both files and the test now import the shared constant. A rename in the recovery path no longer silently breaks the orphan-skip guard incapture.ts.Bugs
rpc.ts: Batch notifications were serializing asnull(JSON-RPC 2.0 §6 violation). Added type-predicate filter to stripundefinedresults; all-notifications batch now returns 204 instead of an empty array (also forbidden by §6).daemon_manager.py:ss -tlnpis Linux-only — on macOSdaemon_pidwas alwaysNone, causingkill_zombie_bridgesto kill everybridge.cjsprocess including the daemon itself. Addedlsof -iTCP:18800 -sTCP:LISTENmacOS fallback; skips killing entirely if PID still cannot be identified.Should-fix / nits
daemon_manager.py:import removed to module level; addedstartup_lock_active()helper.__init__.py: Cold-start re-probe delay is now conditional onstartup_lock_active()— skipped on first cold launch..gitignore: Removed redundant baredataentry.memory-core.test.ts: Updated assertions to useRECOVERY_REASONS.DIRTY_REWARD_RESCORE.Related
Addresses review feedback on #1813 and #1817.
🤖 Generated with Claude Code