Add WebSocket failover counter metric and URL change logging #661
NPM Publishing labels 🏷️
🛑 This PR needs labels to indicate how to increase the current package version in the automated workflows. Please add one of the following labels:
src/metrics/index.ts
Outdated
    }),
    wsConnectionFailoverCount: new client.Gauge({
      name: 'ws_connection_failover_count',
      help: 'The number of consecutive connection issues (unresponsive/no data, abnormal closures), used to trigger URL failover. Resets to 0 when data flows successfully.',
I don't see where this is reset to 0.
It doesn't. The underlying variable it is meant to expose, streamHandlerInvocationsWithNoConnection, also never resets. It just increments forever, and Tiingo uses modulo arithmetic on it. It is used in this PR:
https://github.com/smartcontractkit/external-adapters-js/pull/4543/files (even before my changes).
Open to resetting it if there is a good reason.
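For readers following the thread, here is a minimal sketch of one way a never-resetting counter can drive URL selection with modulo arithmetic. The names `selectUrl`, `failuresPerUrl`, and the example URLs are hypothetical and are not taken from the adapter code; the actual Tiingo logic may differ.

```typescript
// Hypothetical helper: pick a WebSocket URL from a failure counter that only ever grows.
// `selectUrl` and `failuresPerUrl` are illustrative names, not the adapter's API.
function selectUrl(urls: string[], failureCount: number, failuresPerUrl: number): string {
  if (urls.length === 0) throw new Error('at least one URL is required')
  if (failuresPerUrl <= 0) throw new Error('failuresPerUrl must be positive')
  // Every `failuresPerUrl` failures move on to the next URL; the modulo wraps
  // back to the first URL, so the counter itself never needs to be reset.
  const index = Math.floor(failureCount / failuresPerUrl) % urls.length
  const url = urls[index]
  if (url === undefined) throw new Error('failureCount must be a non-negative integer')
  return url
}

// Placeholder URLs, assumed for illustration only.
const exampleUrls = ['wss://primary.example', 'wss://secondary.example']
const picks = [0, 4, 5, 9, 10].map((n) => selectUrl(exampleUrls, n, 5))
```

Because only the quotient modulo the number of URLs matters, a gauge exposing the raw counter keeps growing while the selected URL cycles.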
I don't know if it should reset, but the description says "Resets to 0 when data flows successfully."
After chatting with @cawthorne in person: we'll remove that part of the description, since it does not reset to 0. Good call.
      help: 'The number of addresses in PoR request input parameters',
      labelNames: ['feed_id'] as const,
    }),
    wsConnectionFailoverCount: new client.Gauge({
The other thing I was toying with was incrementing this counter when a ws connection closes with a code != 1000 (an unhealthy close).
Currently we only increment when an initial ws connection fails to connect or when an open connection is unresponsive.
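A rough sketch of what that could look like, assuming a small hypothetical `FailoverCounter` class in place of the transport's real state. Only the `onClose` method represents the proposed addition; 1000 is the standard WebSocket "normal closure" status code.

```typescript
// 1000 is the WebSocket status code for a normal closure (RFC 6455).
const NORMAL_CLOSURE = 1000

// Hypothetical stand-in for the transport's counter state; names are illustrative only.
class FailoverCounter {
  private count = 0

  // Existing behaviour described above: the initial connection attempt failed.
  onConnectFailed(): void {
    this.count += 1
  }

  // Existing behaviour described above: an open connection stopped responding.
  onUnresponsive(): void {
    this.count += 1
  }

  // Proposed addition: count any close that is not a normal closure.
  onClose(code: number): void {
    if (code !== NORMAL_CLOSURE) {
      this.count += 1
    }
  }

  get value(): number {
    return this.count
  }
}
```

With this shape, a clean shutdown (code 1000) leaves the counter untouched, while an abnormal closure such as 1006 counts toward failover.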
Summary
Adds observability for the WebSocket failover mechanism to help diagnose connection issues.
Problem
During a Tiingo incident (2026-01-13 03:19-03:32 UTC), we could not determine if failover triggered:
- streamHandlerInvocationsWithNoConnection counter not exposed as metric
- CENSOR_SENSITIVE_LOGS=true
This made it impossible to answer:
Changes
1. New Prometheus Metric
- ws_connection_failover_count gauge metric
- streamHandlerInvocationsWithNoConnection value in real-time
- transport_name for per-transport tracking
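
As an illustration of the idea, the sketch below mirrors a per-transport counter into a labelled gauge on every increment. `LabelledGauge` is a minimal in-memory stand-in written for this example (in the PR that role is played by `client.Gauge`), and `TransportState` and `recordNoConnection` are hypothetical names, not the framework's actual classes.

```typescript
// Minimal in-memory stand-in for a gauge labelled by transport_name.
// This is an assumption for illustration, not the prom-client API.
class LabelledGauge {
  private readonly values = new Map<string, number>()

  set(transportName: string, value: number): void {
    this.values.set(transportName, value)
  }

  get(transportName: string): number {
    return this.values.get(transportName) ?? 0
  }
}

// Hypothetical transport state that owns the counter and reports it.
class TransportState {
  private streamHandlerInvocationsWithNoConnection = 0
  private readonly name: string
  private readonly gauge: LabelledGauge

  constructor(name: string, gauge: LabelledGauge) {
    this.name = name
    this.gauge = gauge
  }

  // Increment the counter and publish the new value under this transport's label.
  recordNoConnection(): void {
    this.streamHandlerInvocationsWithNoConnection += 1
    this.gauge.set(this.name, this.streamHandlerInvocationsWithNoConnection)
  }
}

const failoverGauge = new LabelledGauge()
const cryptoTransport = new TransportState('crypto', failoverGauge)
cryptoTransport.recordNoConnection()
cryptoTransport.recordNoConnection()
```

Setting the gauge at the point of increment keeps the exported value in step with the internal counter, and the label keeps transports from overwriting each other.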