fix: relay reconnection does not trigger when relay is down on startup #899

@rekmarks

Problem

When a kernel initializes remote comms with a known relay that is currently unreachable, the libp2p node starts successfully but has no /p2p-circuit address. Relay-based connections (both inbound and outbound) are unavailable, and the kernel does not recover when the relay comes back up.

The custom relay reconnection logic in ConnectionFactory (packages/ocap-kernel/src/remotes/platform/connection-factory.ts) only triggers on connection:close events:

this.#libp2p.addEventListener('connection:close', (evt) => {
  const remotePeerId = evt.detail.remotePeer.toString();
  if (this.#relayPeerIds.has(remotePeerId)) {
    this.#scheduleRelayReconnect(remotePeerId);
  }
});

A connection:close event is never emitted for a relay that was never successfully connected, so #scheduleRelayReconnect / #reconnectRelay are never invoked. libp2p's autoDial does not reliably compensate for this; in practice, the kernel stays in a broken state with respect to that relay until it is restarted.

Expected behavior

If a relay is unreachable on startup, the kernel should actively retry connecting to it with the same exponential-backoff mechanism used for post-connection relay dropouts (#reconnectRelay, base delay 5s, max delay 60s, max 10 attempts), and recover automatically when the relay comes back up.
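
To make the quoted limits concrete, here is a minimal sketch of the delay schedule they imply. The function name `relayBackoffDelayMs` and the doubling factor are assumptions for illustration; the issue only states the base delay, the cap, and the attempt limit, and the actual #reconnectRelay implementation may compute delays differently.

```typescript
// Values quoted in this issue for #reconnectRelay.
const BASE_DELAY_MS = 5000;
const MAX_DELAY_MS = 60000;
const MAX_ATTEMPTS = 10;

// Hypothetical helper: returns the delay before the given 1-based attempt,
// or undefined once the attempt budget is exhausted.
// Doubling the delay on each attempt is an assumption, not confirmed by the source.
function relayBackoffDelayMs(attempt: number): number | undefined {
  if (attempt < 1 || attempt > MAX_ATTEMPTS) {
    return undefined;
  }
  const uncapped = BASE_DELAY_MS * Math.pow(2, attempt - 1);
  return Math.min(uncapped, MAX_DELAY_MS);
}
```

Under these assumptions the delay reaches the 60s cap on the fifth attempt and stays there until the tenth and final attempt.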

Suggested fix

After libp2p.start(), check whether each known relay is connected. For any relay not yet connected, call #scheduleRelayReconnect immediately so the retry path is exercised symmetrically for both startup failures and post-connection dropouts.
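
A rough sketch of the startup check described above, written against a small structural stand-in rather than the real ConnectionFactory. The type `RelayAwareNode`, the function names, and the `scheduleRelayReconnect` callback parameter (standing in for the private #scheduleRelayReconnect method) are all hypothetical; only `getConnections()` and `remotePeer` correspond to the libp2p API.

```typescript
// Minimal stand-in for the slice of the libp2p node used here.
type RelayAwareNode = {
  getConnections(): { remotePeer: { toString(): string } }[];
};

// Returns the known relay peer IDs that have no open connection.
function findDisconnectedRelays(
  node: RelayAwareNode,
  relayPeerIds: Set<string>,
): string[] {
  const connected = new Set(
    node.getConnections().map((conn) => conn.remotePeer.toString()),
  );
  return Array.from(relayPeerIds).filter((id) => !connected.has(id));
}

// Intended to run once, right after libp2p.start() resolves.
// The callback stands in for the private #scheduleRelayReconnect method.
function scheduleStartupRelayReconnects(
  node: RelayAwareNode,
  relayPeerIds: Set<string>,
  scheduleRelayReconnect: (peerId: string) => void,
): string[] {
  const missing = findDisconnectedRelays(node, relayPeerIds);
  for (const peerId of missing) {
    scheduleRelayReconnect(peerId);
  }
  return missing;
}
```

Routing startup failures through the same scheduling call keeps a single retry path, so any limits applied to post-connection dropouts would apply here as well.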

A periodic watchdog that detects and reschedules disconnected relays would also address this more robustly.
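
One possible shape for such a watchdog, as a sketch only. Every name here (`RelayWatchdogOptions`, `runRelayWatchdogTick`, `startRelayWatchdog`, and the three callbacks) is hypothetical; the callbacks stand in for whatever connection and retry state ConnectionFactory actually keeps.

```typescript
type RelayWatchdogOptions = {
  relayPeerIds: string[];
  // True if the relay currently has an open connection.
  isConnected: (peerId: string) => boolean;
  // True if a reconnect is already scheduled or in flight for the relay.
  isReconnectPending: (peerId: string) => boolean;
  // Stand-in for the private #scheduleRelayReconnect method.
  scheduleRelayReconnect: (peerId: string) => void;
  intervalMs: number;
};

// One pass: schedule a reconnect for each relay that is neither connected
// nor already being retried. Returns the peer IDs that were scheduled.
function runRelayWatchdogTick(options: RelayWatchdogOptions): string[] {
  const rescheduled: string[] = [];
  for (const peerId of options.relayPeerIds) {
    if (!options.isConnected(peerId) && !options.isReconnectPending(peerId)) {
      options.scheduleRelayReconnect(peerId);
      rescheduled.push(peerId);
    }
  }
  return rescheduled;
}

// Runs a tick every intervalMs; the returned function stops the watchdog.
function startRelayWatchdog(options: RelayWatchdogOptions): () => void {
  const timer = setInterval(() => {
    runRelayWatchdogTick(options);
  }, options.intervalMs);
  return () => clearInterval(timer);
}
```

The pending check is there so that a tick does not stack a second retry on top of one already in progress. Whether a relay that has exhausted its attempts should be retried again by the watchdog is a design decision this sketch leaves open.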

Affected file

packages/ocap-kernel/src/remotes/platform/connection-factory.ts
