Skip to content

fix(memos): address review feedback on PRs #1813 and #1817#1830

Closed
chiefmojo wants to merge 5 commits into
MemTensor:mainfrom
chiefmojo:fix/memos-review-feedback
Closed

fix(memos): address review feedback on PRs #1813 and #1817#1830
chiefmojo wants to merge 5 commits into
MemTensor:mainfrom
chiefmojo:fix/memos-review-feedback

Conversation

@chiefmojo
Copy link
Copy Markdown

Summary

Addresses all feedback from the Memtensor-AI reviews on PRs #1813 (HTTP transport + JSON-RPC endpoint) and #1817 (bridge startup race, orphan trace fix, RPC loopback auth). This branch consolidates both PRs plus the review fixes.

Changes

Ship-stoppers

  • auth.ts: Added comment documenting the direct-bind assumption on the loopback RPC exemption. If a reverse proxy is ever introduced, socket.remoteAddress reflects the proxy IP and X-Forwarded-For must be consulted instead — auth will bypass silently without this documented.

  • __init__.py: Fixed unbound http_bridge / http_bridge_late — the type annotation and constructor were on the same line, so a constructor exception left the variable unbound when the except block called .close(). Extracted _connect_http_bridge() helper which deduplicates the three identical connect-and-register sites and correctly initializes to None before constructing.

  • capture.ts + memory-core.ts: Extracted "dirty_reward_rescore" to core/pipeline/recovery-constants.ts as RECOVERY_REASONS.DIRTY_REWARD_RESCORE; both files and the test now import the shared constant. A rename in the recovery path no longer silently breaks the orphan-skip guard in capture.ts.

Bugs

  • rpc.ts: Batch notifications were serializing as null (JSON-RPC 2.0 §6 violation). Added type-predicate filter to strip undefined results; all-notifications batch now returns 204 instead of an empty array (also forbidden by §6).

  • daemon_manager.py: ss -tlnp is Linux-only — on macOS daemon_pid was always None, causing kill_zombie_bridges to kill every bridge.cjs process including the daemon itself. Added lsof -iTCP:18800 -sTCP:LISTEN macOS fallback; skips killing entirely if PID still cannot be identified.

Should-fix / nits

  • daemon_manager.py: import re moved to module level; added startup_lock_active() helper.
  • __init__.py: Cold-start re-probe delay is now conditional on startup_lock_active() — skipped on first cold launch.
  • .gitignore: Removed redundant bare data entry.
  • memory-core.test.ts: Updated assertions to use RECOVERY_REASONS.DIRTY_REWARD_RESCORE.

Related

Addresses review feedback on #1813 and #1817.

🤖 Generated with Claude Code

Violet and others added 5 commits May 27, 2026 12:58
- Add POST /api/v1/rpc (JSON-RPC-over-HTTP) for out-of-process transport
- Add MemosHttpClient class (Python HTTP client, drop-in for MemosBridgeClient)
- Provider prefers HTTP when daemon already running on :18800
- Add kill_zombie_bridges() and probe_viewer_status() to daemon_manager
- Exempt /api/v1/rpc from session cookie auth (loopback-only)
- Remove tracked ephemeral files (daemon/bridge.pid, memos-local/*)
- Update .gitignore for runtime state
runReflect's orphan-steps fallback was inserting new trace rows whenever
a recovered episode's snapshot timestamps didn't match existing DB rows.
This caused trace_ids_json to grow on every bridge restart, keeping
reward.traceCount != traceIds.length and looping forever — generating
48k+ empty traces and continuous OpenRouter traffic.

Guard: when meta.recoveryReason === "dirty_reward_rescore", log and skip
the insert instead of writing new rows. Legitimate orphan handling (test
paths, genuinely dropped events) is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
daemon_manager.py: extend ensure_viewer_daemon health-poll deadline
from 15s to 45s to cover cold Node.js starts (tsx compile + SQLite
open + FTS warmup). After the first 15s, back off probes from 0.5s
to 2s to avoid hammering a slow-starting daemon.

__init__.py: increase _open_session timeout in init from 30s to 60s
so session.open doesn't time out before the daemon finishes starting.

Add a 1s double-spawn guard: when probe_viewer_status() returns
"free", wait 1s and re-probe before falling through to
ensure_viewer_daemon(). Eliminates the race where a second gateway
session spawns a second stdio bridge because the daemon from the
first session hasn't bound the port yet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The RPC endpoint was unconditionally exempt from session-cookie auth
to allow the local Python adapter to connect without a browser session.
On hub-mode deployments (bindHost=0.0.0.0) this left the endpoint
reachable from the network without any auth when password protection
is enabled.

Now the exemption only applies when remoteAddress is 127.0.0.1, ::1,
or ::ffff:127.0.0.1. Standard installs (bindHost=127.0.0.1) are
unaffected; the Python adapter always connects from loopback and
continues to work. Network callers on hub deployments must hold a
valid session cookie.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
 and MemTensor#1817

Ship-stoppers:
- auth.ts: add reverse proxy assumption comment to loopback RPC exemption
- __init__.py: fix unbound http_bridge/http_bridge_late on constructor exception;
  extract _connect_http_bridge helper to deduplicate 3 connect-and-register sites
- capture.ts + memory-core.ts: extract RECOVERY_REASONS constant to
  recovery-constants.ts; replace hardcoded "dirty_reward_rescore" strings

Bugs:
- rpc.ts: filter undefined from batch results (notifications serialized as null);
  return 204 when all batch items are notifications (per JSON-RPC 2.0 §6)
- daemon_manager.py: add lsof(8) macOS fallback in kill_zombie_bridges;
  skip killing entirely when daemon PID cannot be identified

Should fix:
- daemon_manager.py: move import re to module level; add startup_lock_active()
- __init__.py: make cold-start re-probe sleep conditional on startup_lock_active()

Nits:
- .gitignore: remove redundant bare data entry
- tests: use RECOVERY_REASONS.DIRTY_REWARD_RESCORE in memory-core.test.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Memtensor-AI Memtensor-AI added the plugin Plugin/adapter/bridge layer (apps/ directory) label May 28, 2026
Copy link
Copy Markdown
Collaborator

@Memtensor-AI Memtensor-AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me review this PR carefully.

Needs changes — a few issues worth addressing before merge

Overall this is solid work: the unbound-variable fix, the shared constant extraction, and the macOS lsof fallback are all good improvements. A few things stood out:

Issues

1. bridge_client.pyCallable used without import

MemosHttpClient.register_host_handler and on_event/on_log use Callable in type hints, but the diff doesn't show it being imported. The existing MemosBridgeClient may already have it via TYPE_CHECKING, but if Callable isn't in the runtime namespace this will raise NameError at instantiation time.

2. bridge_client.pyentry["event"].set() deleted

The diff ends with the removal of entry["event"].set() at line 422 (the last line of _resolve). That line wakes up the caller blocked in request(). If this deletion is real and not a diff artifact, MemosBridgeClient.request() will hang forever waiting on the event. Please confirm this is just a context boundary in the diff.

3. daemon_manager.pykill_zombie_bridges race with short-lived PIDs

Between ps aux and os.kill(pid, SIGTERM), the PID could have been recycled. This is a classic TOCTOU issue. Low probability on most systems, but on containers with short PID namespaces it's more realistic. Consider checking /proc/{pid}/cmdline (Linux) or using psutil if available before sending the signal.

4. daemon_manager.pyss parsing is fragile

The regex r"pid=(\d+)" works for current ss output, but ss -tlnp output format varies across distutils versions. Since you already fall through to lsof, consider trying lsof first on all platforms (it's available on Linux too) for consistency, and only falling back to ss if lsof is missing.

5. __init__.pyviewer_status reassignment inside elif branch

elif viewer_status == "free":
    if startup_lock_active():
        time.sleep(1.0)
        viewer_status = probe_viewer_status()
    if viewer_status == "running_memos":
        ...

If startup_lock_active() is False, the second if checks the original "free" value and correctly skips. That's fine. But if the re-probe returns something other than "running_memos" (e.g. "blocked"), execution falls through to the if self._bridge is None stdio path, which is correct but not obvious. A brief comment would help future readers.

6. rpc.ts — diff is truncated, can't verify the 204 behavior

The description mentions batch notifications now return 204 instead of [], but the rpc.ts diff isn't included. If the response is truly empty (no body), ensure the Content-Type header isn't set to application/json — some HTTP clients choke on 204 + Content-Type: application/json with no body.

Nits

  • .gitignore: The comment # ── Runtime / ephemeral (Violet addition 2026-05-27) ── has a future date (2026). Typo?
  • __init__.py: The bare except Exception: pass around kill_zombie_bridges() swallows all errors silently. At minimum a logger.debug would help troubleshooting.
  • auth.ts diff is also truncated — can't verify the documented loopback assumption comment was actually added.

What looks good

  • The _connect_http_bridge() extraction properly initializes to None before construction — exactly the fix needed for the unbound variable bug.
  • RECOVERY_REASONS constant extraction is clean and the type export is a nice touch.
  • The macOS lsof fallback with the "skip if unknown" safety net is the right call.
  • Timeout bump from 15s → 45s with adaptive backoff is pragmatic.

@chiefmojo
Copy link
Copy Markdown
Author

Closing in favor of applying review fixes directly to #1813 and #1817 — keeps each PR focused and reviewable.

@chiefmojo chiefmojo closed this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plugin Plugin/adapter/bridge layer (apps/ directory)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants