fix(webapp): emit realtime backend metrics through OpenTelemetry
The realtime backend's metrics were registered with the in-process
Prometheus registry, which is no longer how the webapp ships metrics.
They now emit through the OpenTelemetry meter (realtime_native.*,
realtime_notifier.*, realtime_shadow.*) so they flow through the
internal metrics exporter like the rest of the webapp's
instrumentation. Gauges become observable gauges sampled at export
time; names move from prom-style _total suffixes to the meter's
dot-namespaced convention.
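
To illustrate what "observable gauges sampled at export time" means in practice, here is a minimal, dependency-free sketch. It is a toy model, not the OpenTelemetry API and not this commit's code: `ToyMeter`, `registerObservableGauge`, `collect`, and the metric name `realtime_native.active_feeds` are all hypothetical names chosen for illustration. The point is only that the gauge's value is read by a callback when the exporter collects, rather than being pushed with `set()` on every change.

```typescript
// Toy stand-in for a meter. NOT the OpenTelemetry API; every name is hypothetical.
type Attributes = Record<string, string>;
type Observation = { name: string; value: number; attributes: Attributes };
type ObserveFn = (value: number, attributes?: Attributes) => void;
type GaugeCallback = (observe: ObserveFn) => void;

class ToyMeter {
  private gauges: Array<{ name: string; callback: GaugeCallback }> = [];

  // Register a callback; nothing is recorded until collect() runs.
  registerObservableGauge(name: string, callback: GaugeCallback): void {
    this.gauges.push({ name, callback });
  }

  // Models one export cycle: every callback is invoked and reports the current value.
  collect(): Observation[] {
    const observations: Observation[] = [];
    for (const { name, callback } of this.gauges) {
      callback((value, attributes = {}) => {
        observations.push({ name, value, attributes });
      });
    }
    return observations;
  }
}

// Application state that the gauge reports on (hypothetical).
const activeFeeds = new Set<string>();

const meter = new ToyMeter();
meter.registerObservableGauge("realtime_native.active_feeds", (observe) => {
  observe(activeFeeds.size);
});

activeFeeds.add("feed-a");
activeFeeds.add("feed-b");
const first = meter.collect(); // samples the size at this export

activeFeeds.delete("feed-a");
const second = meter.collect(); // samples again; intermediate changes are not recorded

console.log(first[0].value, second[0].value);
```

One consequence of this design is that values which change and revert between two export cycles are never seen, which is usually acceptable for gauges such as queue depth or connection counts.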
"Live realtime wakeups by reason. A rising 'timeout' share suggests a write site is missing its publishChangeRecord delegate.",
});

-const runSetResolves = new Counter({
-  name: "realtime_native_runset_resolve_total",
-  help: "Multi-run (tag-list/batch) resolve+hydrate outcomes. 'hit'/'coalesced' vs 'miss' shows how effectively concurrent same-filter feeds share a single ClickHouse + Postgres query under an env-wide wake.",
"Multi-run (tag-list/batch) resolve+hydrate outcomes. 'hit'/'coalesced' vs 'miss' shows how effectively concurrent same-filter feeds share a single ClickHouse + Postgres query.",
});

-const runSetQueryMs = new Histogram({
-  name: "realtime_native_runset_query_ms",
-  help: "Latency of the multi-run resolve (ClickHouse) and hydrate (Postgres) stages.",
description: "Latency of the multi-run resolve (ClickHouse) and hydrate (Postgres) stages.",
+  unit: "ms",
});

-const livePollPaths = new Counter({
-  name: "realtime_native_live_poll_total",
-  help: "How live polls resolved. 'fast-hydrate' = the router woke the feed with matched runs hydrated by id (no ClickHouse); 'full-resolve' = the backstop timeout did a ClickHouse resolve. A high fast-path share is the local-membership routing working.",
"How live polls resolved. 'fast-hydrate' = router wake with rows hydrated by id (no ClickHouse); 'full-resolve' = backstop; 'cold-resolve' = fresh env subscription probed once.",
help: "Runs hydrated by the EnvChangeRouter's batch-hydrate (one query per column set per wake, shared across all feeds matching the same run — the hot-shared-tag fan-out collapse).",
help: "Fresh ClickHouse resolves that had to queue for an admission permit. A rising count means a distinct-filter reconnect stampede is being throttled (the gate is doing its job).",
"Fresh ClickHouse resolves that had to queue for an admission permit. A rising count means a distinct-filter reconnect stampede is being throttled (the gate is doing its job).",
});

-const replays = new Counter({
-  name: "realtime_native_replays_total",
-  help: "Buffered change records replayed to a newly-armed feed (inter-poll gap recovery). 'delivered' = rows reached the feed; 'empty' = candidates hydrated but none survived the filter/diff.",
"Buffered change records replayed to a newly-armed feed (inter-poll gap recovery). 'delivered' = rows reached the feed; 'empty' = candidates hydrated but none survived the filter/diff.",
});

-const deliveryLagMs = new Histogram({
-  name: "realtime_native_delivery_lag_ms",
-  help: "Live emissions: now minus the newest emitted row's updatedAt (PG clock vs app clock, so approximate). The end-to-end delivery SLI — a p99 near the backstop hold means wakes are being missed.",
"Replay-buffer evictions. 'window' expiry is normal; 'cap' means an env churns more runs inside the window than the buffer holds (replay guarantee degrading — retune the knobs).",
});

-const emittedRows = new Histogram({
-  name: "realtime_native_emitted_rows",
-  help: "Rows per live emission. Deltas should be small; a fat tail means working-set/offset-floor fallbacks are re-emitting full sets.",
"Live emissions: now minus the newest emitted row's updatedAt (PG clock vs app clock, so approximate). The end-to-end delivery SLI — a p99 near the backstop hold means wakes are being missed.",
+  unit: "ms",
});

-const backstops = new Counter({
-  name: "realtime_native_backstop_total",
-  help: "Backstop full resolves by outcome. 'empty' is normal idle behavior; sustained 'delivered' means the notify/replay path missed changes — alert on it.",
"Backstop full resolves by outcome. 'empty' is normal idle behavior; sustained 'delivered' means the notify/replay path missed changes — alert on it.",
});

-const replayEvictions = new Counter({
-  name: "realtime_native_replay_evictions_total",
-  help: "Replay-buffer evictions. 'window' expiry is normal; 'cap' means an env churns more runs inside the window than the buffer holds (replay guarantee degrading — retune the knobs).",