OpenTelemetry: metrics for WebFinger and actor discovery#772
Conversation
|
@codex review |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds opt-in OpenTelemetry counters and duration histograms for outgoing WebFinger lookups, incoming WebFinger handling, and actor discovery; it threads a MeterProvider through call paths, introduces bounded result/scheme enums, per-MeterProvider instrument caching, recording helpers, tests, and docs/changelog updates. ChangesOpenTelemetry Metrics for WebFinger and Actor Discovery
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codex Review: Didn't find any major issues. Another round soon, please! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/webfinger/src/lookup.ts (1)
274-283:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReject malformed
acct:authorities before building the lookup URL.
acct:parsing currently accepts authorities that still contain path/query/fragment characters, which can produce unintended outbound paths and classify malformed input asnot_found/errorinstead ofinvalid.Suggested fix
if (resource.protocol === "acct:") { const atPos = resource.pathname.lastIndexOf("@"); if (atPos < 0) return { resource: null, result: "invalid" }; server = resource.pathname.substring(atPos + 1); - if (server === "") return { resource: null, result: "invalid" }; + if (server === "" || /[/?#\s]/.test(server)) { + return { resource: null, result: "invalid" }; + } } else {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/webfinger/src/lookup.ts` around lines 274 - 283, When parsing acct: authorities (the atPos/server extraction using resource.pathname.lastIndexOf and server = resource.pathname.substring(atPos + 1)), validate the extracted server string does not contain any path/query/fragment characters (e.g. '/', '?', '#') and is non-empty; if it contains any of those characters return { resource: null, result: "invalid" } instead of proceeding to build the lookup URL. Apply this check before constructing the URL (the new URL(`${protocol}//${server}/.well-known/webfinger`) and setting url.searchParams) so malformed acct: inputs are rejected as invalid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/fedify/src/federation/metrics.ts`:
- Around line 376-387: WebFingerHandleAttributes currently exposes scheme as an
unconstrained string and recordWebFingerHandle forwards it directly; change the
metric API boundary to enforce a bounded set of schemes (e.g., a string union or
enum like "http" | "https" | "acct" | "mailto" | "other") and update
recordWebFingerHandle to validate/sanitize webfinger.resource.scheme against
that set, mapping unknown values to a single fallback bucket (e.g., "other" or
"unknown") before recording the metric to preserve bounded cardinality; apply
the same validation/sanitization wherever WebFingerHandleAttributes.scheme is
forwarded.
---
Outside diff comments:
In `@packages/webfinger/src/lookup.ts`:
- Around line 274-283: When parsing acct: authorities (the atPos/server
extraction using resource.pathname.lastIndexOf and server =
resource.pathname.substring(atPos + 1)), validate the extracted server string
does not contain any path/query/fragment characters (e.g. '/', '?', '#') and is
non-empty; if it contains any of those characters return { resource: null,
result: "invalid" } instead of proceeding to build the lookup URL. Apply this
check before constructing the URL (the new
URL(`${protocol}//${server}/.well-known/webfinger`) and setting
url.searchParams) so malformed acct: inputs are rejected as invalid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c716e5e8-305e-4c77-9a82-508064d8ac2a
📒 Files selected for processing (13)
CHANGES.mddocs/manual/opentelemetry.mdpackages/fedify/src/federation/metrics.test.tspackages/fedify/src/federation/metrics.tspackages/fedify/src/federation/middleware.tspackages/fedify/src/federation/webfinger.test.tspackages/fedify/src/federation/webfinger.tspackages/vocab/src/actor.test.tspackages/vocab/src/actor.tspackages/vocab/src/lookup.test.tspackages/vocab/src/lookup.tspackages/webfinger/src/lookup.test.tspackages/webfinger/src/lookup.ts
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry metrics for WebFinger and actor handle discovery paths. It adds counters and histograms for outgoing WebFinger lookups (webfinger.lookup), incoming WebFinger requests (webfinger.handle), and actor handle discovery attempts (activitypub.actor.discovery). The implementation includes bucketing of resource schemes to maintain bounded cardinality and introduces specific result taxonomies for each metric family. Documentation and comprehensive unit tests for these new metrics have also been added. I have no feedback to provide.
Codecov Report❌ Patch coverage is
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
WebFingerHandleAttributes.scheme was an unconstrained string. The only producer (computeResourceScheme() in webfinger.ts) already buckets the scheme to a 5-value allow list, so the runtime metric was cardinality-safe today. But every sibling helper in metrics.ts (recordKeyLookup, recordSignatureVerificationDuration, and so on) constrains its bounded metric attributes via literal-union types, and `scheme: string` was the only outlier. A future caller could have regressed the cardinality guarantee without the compiler flagging it. Adds an exported WebFingerResourceScheme literal union of "acct" | "http" | "https" | "mailto" | "other", narrows WebFingerHandleAttributes.scheme to it, narrows computeResourceScheme()'s return type, and uses a type predicate (isAllowedResourceScheme) so TypeScript infers the bounded type without runtime casts. No runtime re-sanitization inside recordWebFingerHandle(); the type signature matches the contract the other Record* helpers in this file already use. fedify-dev#772 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/webfinger/src/lookup.ts`:
- Around line 277-282: The current validation only checks the extracted server
string (variable server) for path/query/fragment chars but misses query/hash on
the parsed resource; update the guard in the lookup routine that returns {
resource: null, result: "invalid" } to also check resource.search and
resource.hash (or equivalent parsed URL properties) and reject when either is
non-empty, and add regression tests alongside the existing path-component test
covering acct:user@host?x=1 and acct:user@host#frag to assert they are rejected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: da968b05-627c-4cda-91c4-37c069d32c69
📒 Files selected for processing (4)
packages/fedify/src/federation/metrics.tspackages/fedify/src/federation/webfinger.tspackages/webfinger/src/lookup.test.tspackages/webfinger/src/lookup.ts
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request implements OpenTelemetry metrics for WebFinger and actor handle discovery, adding counters and histograms for webfinger.lookup, webfinger.handle, and activitypub.actor.discovery. The implementation ensures metric cardinality safety by bucketing resource schemes and provides detailed outcome classifications. Additionally, it strengthens acct: URI validation to comply with RFC 7565 by rejecting authorities containing path, query, or fragment components. I have no feedback to provide.
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
An earlier commit on this branch added a regex check over the substring extracted from `pathname` to reject `acct:user@host/path`, but missed query and fragment cases. The WHATWG URL parser routes `?…` into `resource.search` and `#…` into `resource.hash`, not into `pathname`, so `acct:user@host?x=1` and `acct:user@host#frag` still passed the guard and would forward to a remote WebFinger lookup despite being RFC 7565-malformed (the acct: authority is bare host: no path, query, or fragment allowed). The fix gates on `resource.search !== ""` and `resource.hash !== ""` in addition to the existing extracted-server regex. The three RFC-disallowed components have to be checked independently because the URL parser lands them in different URL fields; the comment above the guard now spells this out. Adds two regression cases in the existing "invalid resource" step. fedify-dev#772 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive OpenTelemetry metrics for WebFinger and actor handle discovery paths. It adds new counters and histograms for outgoing WebFinger lookups, incoming WebFinger requests, and actor handle discovery attempts, featuring detailed result classifications and cardinality-safe resource scheme bucketing. The implementation includes updates to the core federation logic, vocabulary handling, and the WebFinger client, including the introduction of a specific ActorHandleNotFoundError for better error classification. These changes are supported by updated documentation and extensive unit tests. I have no feedback to provide.
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Adds `meterProvider` option to `lookupWebFinger()` so each call can emit one `webfinger.lookup` counter and one `webfinger.lookup.duration` histogram measurement with bounded attributes. The motivation is operational visibility: discovery failures are operationally noisy, and operators today have to grep span events to see aggregate WebFinger reliability and latency. A counter plus histogram pair lets operators graph rate, latency, and outcome mix without sampling spans. The new `webfinger.lookup.result` attribute is one of `found`, `not_found`, `invalid`, `network_error`, or `error`, derived from a typed outcome returned by the refactored internal helper. `404` and `410` map to `not_found`; other non-2xx HTTP responses map to `error`; JSON parse failures, max-redirection overruns, cross-protocol redirects, malformed Location headers, and malformed `acct:` resources map to `invalid`; `validatePublicUrl()` rejection or a thrown `fetch()` maps to `network_error`. `webfinger.resource.scheme` is always present; `http.response.status_code` is recorded only when an HTTP response was observed; `activitypub.remote.host` reflects the latest hostname Fedify attempted, so an operator can see who actually returned the failure even after one or more redirects. The helper is opt-in: omitting the option emits no measurement. This mirrors `lookupObject()`'s pattern (packages/vocab/src/lookup.ts), so callers that do not use OpenTelemetry are unaffected. Wiring the federation's `MeterProvider` through to inner `lookupWebFinger` calls lands in later commits on this branch. fedify-dev#316 fedify-dev#619 fedify-dev#739 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.1-codex
Adds `meterProvider` option to `getActorHandle()` so each actor handle
discovery emits one `activitypub.actor.discovery` counter and one
`activitypub.actor.discovery.duration` histogram measurement.
The motivation matches the umbrella WebFinger metrics work: a server
appears broken to users when handle resolution is slow, blocked, or
returns malformed alias data, but operators today have no aggregate
view of the actor-handle discovery path beyond per-call spans. The
new metric covers the actor → handle resolution `getActorHandle()`
performs. Handle/URL → object resolution is already covered by
`activitypub.object.lookup` on `lookupObject()`.
The `activitypub.actor.discovery.result` attribute is one of
`resolved`, `not_found`, or `error`. WebFinger-level failure detail
(HTTP status, parse failure, network failure, etc.) stays on
`webfinger.lookup`; the discovery counter intentionally collapses
those into a coarser actor-level view. The "Actor does not have
enough information…" `TypeError` is thrown via a private
`ActorHandleNotFoundError` subclass so the metric distinguishes the
real not-found terminal path from other `TypeError`-shaped failures
that come from malformed remote data (an alias that does not parse
as a URL, an invalid preferred username that breaks
`normalizeActorHandle()`, etc.); those classify as `error`. The
subclass extends `TypeError` so the documented `throws {TypeError}`
contract is preserved.
`activitypub.remote.host` records `actor.id.hostname` when known,
omitted otherwise. Actor IDs and handle strings are deliberately
excluded so attacker-controlled actor data cannot inflate metric
cardinality.
The helper is opt-in: omitting `meterProvider` emits no measurement,
mirroring the `lookupObject()` and `lookupWebFinger()` patterns.
When the meter provider is set, it is also forwarded to the nested
`lookupWebFinger` and `verifyCrossOriginActorHandle` calls so each
discovery emits both the actor-discovery measurements and the one
or two underlying `webfinger.lookup` measurements; the shared
`activitypub.remote.host` attribute lets operators correlate them.
Wiring the federation's `MeterProvider` through to context-level
`getActorHandle` callers lands in later commits on this branch.
fedify-dev#316
fedify-dev#619
fedify-dev#739
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.1-codex
Adds the `webfinger.handle` counter and `webfinger.handle.duration` histogram for incoming WebFinger requests handled by Fedify, plus the WebFinger-resource-scheme bucketing that keeps both the inbound and the existing outbound `webfinger.lookup` metric cardinality bounded. The new metric carries `webfinger.handle.result` (one of `resolved`, `invalid`, `not_found`, `tombstoned`, or `error`), `webfinger.resource.scheme` (when extractable), and `http.response.status_code` (when a response was produced). The result is derived from the response status code that `handleWebFinger()` returns: `200 → resolved`, `400 → invalid`, `404 → not_found`, `410 → tombstoned`, every other status (or a thrown exception) → `error`. The queried resource string is deliberately excluded from metric attributes; it remains on the existing `webfinger.handle` span for trace-level investigation. Because the `resource=` query parameter is attacker-controlled, the `webfinger.resource.scheme` attribute is bucketed to a small allow list (`acct`, `http`, `https`, `mailto` from RFC 7565), and anything else is recorded as `other`. The same bucketing is applied to the outbound `webfinger.lookup` metric attribute at the recording site, keeping cardinality stable when a remote returns a redirect to an unusual scheme; the `webfinger.resource.scheme` span attribute is unchanged so traces still carry the raw value for investigation. `Federation.fetch()` now passes the federation's `MeterProvider` to `handleWebFinger()`, and `Context.lookupWebFinger()` defaults the new `meterProvider` option from the federation's provider, mirroring the `lookupObject()` plumbing. Applications that pass a `meterProvider` to `createFederation()` therefore get both the inbound `webfinger.handle*` and outbound `webfinger.lookup*` metrics automatically. fedify-dev#316 fedify-dev#619 fedify-dev#739 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.1-codex
Adds the three new counter and histogram pairs from this branch to the operator-facing OpenTelemetry documentation and to the in-progress Fedify 2.3.0 changelog entry. docs/manual/opentelemetry.md: - Three new rows in the instrumented metrics table for `activitypub.actor.discovery[.duration]`, `webfinger.lookup[.duration]`, and `webfinger.handle[.duration]`. - Three new metric attribute subsections describing the bounded result enumerations, the `webfinger.resource.scheme` allow list (and the fact that this bucketing applies to the metric attribute only, not the span attribute), the cardinality exclusions (resource strings, full lookup URLs, full handles, actor IDs), and the cross-origin caveat for nested WebFinger lookups during actor discovery. - Three new rows in the semantic attributes table for the `*.result` attributes plus an updated `webfinger.resource.scheme` row noting the metric-side bucketing. CHANGES.md: - One new bullet under the Version 2.3.0 @fedify/fedify section summarising the six new instruments, the bounded attribute set, and the opt-in semantics. The note draws the right line between federation-wired plumbing (which now covers the inbound `webfinger.handle` family and the federation-bound `Context.lookupWebFinger()`) and direct `getActorHandle()` calls, which remain opt-in via `GetActorHandleOptions.meterProvider`. The nested-WebFinger forwarding note flags the cross-origin verification case so operators do not expect a single shared remote host per discovery. fedify-dev#316 fedify-dev#739 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.1-codex
When `lookupObject()` cannot resolve a handle through the document
loader, it falls back to `lookupWebFinger()` to find the actor URL.
This nested call previously dropped the caller's `meterProvider`
option, so an application that passed a meter provider to
`lookupObject()` (or via `Context.lookupObject()`) recorded an
`activitypub.object.lookup` measurement but no `webfinger.lookup`
measurement for the underlying probe.
Forwards the option into the nested call and adds a regression test
asserting that `lookupObject("@user@host", { meterProvider })` emits
both the `activitypub.object.lookup` counter and the
`webfinger.lookup` counter, matching the same forwarding behavior
that `getActorHandle()` already has.
fedify-dev#316
fedify-dev#739
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.1-codex
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/webfinger/src/lookup.ts (1)
340-353:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix the redirect-limit off-by-one.
Line 342 rejects the Nth redirect, not the N+1st. With the current
>=check,maxRedirection: 1fails on the first 302, and the default5only follows 4 redirects.Suggested fix
redirected++; const maxRedirection = options.maxRedirection ?? DEFAULT_MAX_REDIRECTION; - if (redirected >= maxRedirection) { + if (redirected > maxRedirection) { logger.error( "Too many redirections ({redirections}) while fetching WebFinger " + "resource descriptor.", { redirections: redirected }, );As per coding guidelines, "Add regression tests that demonstrate the bug before fixing it."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/webfinger/src/lookup.ts` around lines 340 - 353, The redirect count check in lookup.ts incorrectly rejects the Nth redirect because it uses "redirected >= maxRedirection" (variables: redirected, options.maxRedirection, DEFAULT_MAX_REDIRECTION) — change the comparison to "redirected > maxRedirection" so that a maxRedirection value allows that many redirects before failing; update the logger.error call context remains the same (logger.error) and return path stays, and add regression tests that demonstrate the off-by-one (e.g., set maxRedirection=1 and assert one 302 is allowed but two fail) before making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/fedify/src/federation/middleware.ts`:
- Line 1674: The WebFinger metrics are being enabled via an implicit global
meter because calls pass this.meterProvider (which falls back to the global
provider); change the implementation so only an explicitly provided
federation-level provider is propagated: add and use a dedicated field (e.g.
explicitMeterProvider or meterProviderProvided) that is set only when a
meterProvider is passed into the constructor/config, preserve the existing
this.meterProvider getter for compatibility, and replace direct uses of
this.meterProvider in the WebFinger-related call sites (the occurrences you saw)
with that explicit field (or undefined when not set) so no global fallback is
used implicitly.
In `@packages/webfinger/src/lookup.ts`:
- Around line 273-289: The branch that special-cases resource.protocol ===
"acct:" must also handle "mailto:" so mailto: URIs with opaque paths are parsed
like acct: (extract the host from resource.pathname after the last "@", validate
server !== "" and that it contains no / ? # and resource.search/resource.hash
are empty) instead of falling through to the generic opaque-path behavior;
update the condition to include resource.protocol === "mailto:" (use the same
extraction/validation logic that references resource.pathname, resource.search,
resource.hash and the server variable) and add a regression test demonstrating
mailto:juliet@example.com resolves to querying
https://example.com/.well-known/webfinger?resource=mailto:... rather than
producing a network_error.
---
Outside diff comments:
In `@packages/webfinger/src/lookup.ts`:
- Around line 340-353: The redirect count check in lookup.ts incorrectly rejects
the Nth redirect because it uses "redirected >= maxRedirection" (variables:
redirected, options.maxRedirection, DEFAULT_MAX_REDIRECTION) — change the
comparison to "redirected > maxRedirection" so that a maxRedirection value
allows that many redirects before failing; update the logger.error call context
remains the same (logger.error) and return path stays, and add regression tests
that demonstrate the off-by-one (e.g., set maxRedirection=1 and assert one 302
is allowed but two fail) before making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 110c0153-c4da-41f4-9019-b2d8d6d02c78
📒 Files selected for processing (13)
CHANGES.mddocs/manual/opentelemetry.mdpackages/fedify/src/federation/metrics.test.tspackages/fedify/src/federation/metrics.tspackages/fedify/src/federation/middleware.tspackages/fedify/src/federation/webfinger.test.tspackages/fedify/src/federation/webfinger.tspackages/vocab/src/actor.test.tspackages/vocab/src/actor.tspackages/vocab/src/lookup.test.tspackages/vocab/src/lookup.tspackages/webfinger/src/lookup.test.tspackages/webfinger/src/lookup.ts
`maxRedirection`'s JSDoc reads "The maximum number of redirections to follow", but `lookupWebFingerInternal` incremented `redirected` before the `redirected >= maxRedirection` check. So a caller passing `maxRedirection: N` actually followed N-1 redirects and rejected on the Nth, drifting by one from the documented contract. Sibling code in *packages/vocab-runtime/src/docloader.ts* and *packages/fedify/src/sig/http.ts* uses `>=` too, but increments the counter on recursion *after* the check, so those paths correctly allow N redirects. The WebFinger lookup was the outlier. The minimal fix flips `>=` to `>`. After the change, `maxRedirection: N` follows the Nth redirect and rejects on the (N+1)th, matching both the JSDoc and the sibling code. Adds a regression sub-step "maxRedirection: 1 follows exactly one redirect" that fails under the old code (a single-redirect mock returns null) and passes under the fix. Updates the existing "custom maxRedirection" step so its lower-bound case still exercises the failure path under the new semantics (`maxRedirection: 1` against a 2-redirect mock instead of `maxRedirection: 2`). Assisted-by: Claude Code:claude-opus-4-7
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/webfinger/src/lookup.ts (1)
273-293:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle
mailto:resources in the opaque-path branch too.
WEBFINGER_LOOKUP_SCHEME_WHITELISTnow treatsmailtoas a supported resource scheme, but this parser still only extracts the lookup host fromacct:URIs.mailto:juliet@example.comfalls through the generic branch, buildsmailto:///.well-known/webfinger, and gets reported asnetwork_errorinstead of queryinghttps://example.com/.well-known/webfinger?resource=mailto:....Suggested fix
- if (resource.protocol === "acct:") { + if (resource.protocol === "acct:" || resource.protocol === "mailto:") { const atPos = resource.pathname.lastIndexOf("@"); if (atPos < 0) return { resource: null, result: "invalid" }; server = resource.pathname.substring(atPos + 1); - // Per RFC 7565, an `acct:` URI's authority is bare `host`: no path, - // query, or fragment. The WHATWG URL parser routes a path into + // Per RFC 7565, `acct:` / `mailto:` resources derive the WebFinger host + // from the opaque `user@host` path; no extra path, query, or fragment + // component is allowed. The WHATWG URL parser routes a path into // `pathname` (after the `user@host` authority) and routes `?…` / // `#…` into `search` / `hash`, so the three components live in // different places and must be checked independently. Reject any - // acct URI that carries an extraneous component so a malformed + // resource URI that carries an extraneous component so a malformed // input cannot pass through to a remote WebFinger lookup.As per coding guidelines, "Add regression tests that demonstrate the bug before fixing it."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/webfinger/src/lookup.ts` around lines 273 - 293, The parser currently only extracts the lookup host for `acct:` URIs; update the opaque-path branch in lookup.ts that checks resource.protocol to also handle `mailto:` by extracting the mailbox host from resource.pathname (like the `acct:` logic: find last '@', validate server is non-empty and contains no `/?#`, and ensure resource.search and resource.hash are empty), then set protocol and server appropriately so mailto resources build the correct WebFinger host and query; keep the existing else branch unchanged, and add a regression test that uses `mailto:juliet@example.com` to assert it queries https://example.com/.well-known/webfinger?resource=mailto:... and not produce a network_error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/webfinger/src/lookup.test.ts`:
- Around line 217-234: The new subtests mutate global fetchMock state inside the
outer lookupWebFinger() test but only call fetchMock.hardReset() at the end of
fn(), so a failing assertion can skip teardown; wrap the outer test body (the
fn() that sets redirectCount and calls lookupWebFinger with maxRedirection: 1
and 2) in a try/finally block and move fetchMock.hardReset() (and any other
global cleanup like resetting redirectCount if needed) into the finally so
cleanup always runs even if a subtest/assertion fails—follow the same pattern
used by the metrics test.
---
Duplicate comments:
In `@packages/webfinger/src/lookup.ts`:
- Around line 273-293: The parser currently only extracts the lookup host for
`acct:` URIs; update the opaque-path branch in lookup.ts that checks
resource.protocol to also handle `mailto:` by extracting the mailbox host from
resource.pathname (like the `acct:` logic: find last '@', validate server is
non-empty and contains no `/?#`, and ensure resource.search and resource.hash
are empty), then set protocol and server appropriately so mailto resources build
the correct WebFinger host and query; keep the existing else branch unchanged,
and add a regression test that uses `mailto:juliet@example.com` to assert it
queries https://example.com/.well-known/webfinger?resource=mailto:... and not
produce a network_error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 84501e2a-3a92-41c4-b1b2-6d3f763e88a0
📒 Files selected for processing (2)
packages/webfinger/src/lookup.test.tspackages/webfinger/src/lookup.ts
RFC 7033 §4.5 lists `mailto:` alongside `acct:` as a permissible
WebFinger resource form, and RFC 6068 makes `mailto:` an opaque-path
scheme: the `user@host` authority lives in `pathname`, not in
`host`.
The existing `else` branch in `lookupWebFingerInternal()` builds
`${protocol}//${server}/.well-known/webfinger` from `resource.host`,
but `new URL("mailto:juliet@example.com").host === ""`. The WHATWG
URL parser accepts the resulting `https:///.well-known/webfinger`
and re-interprets it as `https://.well-known/webfinger`, so the
lookup fails with `network_error` against a bogus DNS name instead
of querying the user's host. The PR also lists `mailto` in the
`webfinger.resource.scheme` metric allow list, so misclassifying a
valid `mailto:` query as `network_error` was documentation drift.
Extends the existing opaque-path branch to match both schemes,
sharing the host-extraction and the same RFC-conformance guards
(reject path, query, fragment, or empty server). Adds a "mailto"
regression step alongside the existing "acct" and "https" tests:
it fails on the prior code and passes after the fix.
fedify-dev#772 (comment)
Assisted-by: Claude Code:claude-opus-4-7
The PR documents that passing `meterProvider` to `createFederation()` is what enables the new WebFinger metric families (`webfinger.handle`, `webfinger.lookup` via federation-bound `Context.lookupWebFinger`). But the two call sites in *middleware.ts* (the `handleWebFinger` dispatch and `ContextImpl.lookupWebFinger`) defaulted through `this.meterProvider`, the getter that falls back to `metrics.getMeterProvider()` when the federation-level field is unset. An operator with a globally-installed meter provider, who deliberately omits `meterProvider` from `createFederation`, would therefore still get the WebFinger metrics emitted; that contradicts the documented opt-in contract. The metric helpers in @fedify/webfinger and @fedify/fedify already gate emission on `meterProvider != null`, so the fix is to thread the raw `this._meterProvider` field (which is `undefined` when unset) instead of the getter. `ContextImpl.lookupWebFinger` defaults from `options.meterProvider ?? this.federation._meterProvider`, matching the same chain. Inline comments at both call sites explain why we bypass the getter, so future readers do not "fix" it back to the global-fallback shape. `ContextImpl.lookupObject` has the same `this.meterProvider` default but predates this PR; that one is intentionally out of scope here. fedify-dev#772 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
The outer `lookupWebFinger()` test installs the global fetch spy with
`fetchMock.spyGlobal()` and cleans up at the bottom with
`fetchMock.hardReset()`, with no try/finally between them. If any
`await t.step(...)` propagates a rejection, cleanup is skipped and
the global fetch spy leaks into the metric test below as well as any
other test files that run later in the same process.
The same pattern was already fixed for the *metrics* test in this
file via an earlier commit. The recent additions to the outer test
(the new "mailto" step, the redirect / scheme metric setups, the
`maxRedirection: 1` regression step) made the leak risk concrete.
Wraps the body from `fetchMock.spyGlobal()` to `fetchMock.hardReset()`
in `try { ... } finally { fetchMock.removeRoutes(); fetchMock.hardReset(); }`,
matching the metric test's pattern.
The diff looks large because the body re-indents one level, but the
semantic change is just the try/finally wrapper and a
`removeRoutes()` call in the finally clause.
fedify-dev#772 (comment)
Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry metrics for WebFinger and actor handle discovery paths, adding counters and histograms for outgoing lookups, incoming requests, and discovery attempts. The implementation includes attribute bucketing for resource schemes to prevent metric cardinality inflation and refines the maxRedirection logic to correctly follow the specified number of redirects. Comprehensive documentation and tests have been added across the affected packages. I have no feedback to provide.
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry metrics for WebFinger and actor handle discovery, enabling operators to monitor discovery rates, latency, and outcomes. The changes span core federation logic, documentation, and test suites, implementing an opt-in meter provider pattern to maintain cardinality safety. A review comment correctly identified that the current validation for 'mailto:' URIs is overly strict regarding query parameters and provided a code suggestion to align the implementation with RFC 6068.
A previous commit on this branch added a shared `acct: || mailto:` branch in `lookupWebFingerInternal()` for opaque-path host extraction, but kept the same `resource.search !== "" || resource.hash !== ""` rejection for both schemes. That rejection is overly strict for `mailto:`. RFC 6068 §2 explicitly allows `mailto:` URIs to carry `?hfields=…` header fields (and fragment identifiers), so for example `mailto:user@example.com?subject=Hi` is a syntactically valid mailto URI and a legitimate WebFinger resource per RFC 7033 §4.5. The current code returned `result=invalid` for those inputs. RFC 7565 §3 explicitly forbids path, query, or fragment on `acct:` URIs, so the asymmetry matters: the search/hash rejection stays on `acct:` and is dropped for `mailto:`. Splits the validation accordingly. The bare-authority character check (no /, ?, or # in the extracted server; non-empty) still applies to both schemes. The strict search/hash rejection now applies only when `resource.protocol === "acct:"`. Adds a regression step "mailto with hfields" with a `?subject=Hi` resource, asserting the lookup forwards the full resource URI to the WebFinger endpoint and resolves successfully. The new step fails on the prior code and passes after the fix. Inline comments cite the RFCs (6068 §2, 7565 §3) so future readers see why the two schemes diverge. fedify-dev#772 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements OpenTelemetry metrics for WebFinger and actor handle discovery. It introduces new metrics families—webfinger.lookup, webfinger.handle, and activitypub.actor.discovery—along with corresponding duration histograms. The changes include result classification logic, attribute bucketing for cardinality safety, and integration into the federation middleware and vocabulary lookups. Extensive tests and documentation updates are also provided. I have no feedback to provide.
The catch block in `lookupWebFinger`'s span wrapper used to reassign
`outcome = { resource: null, result: "error" }` before rethrowing.
That literal is identical to the initial assignment three lines
above, and the only path that can reach the catch with `outcome`
holding a different value would require either
`await lookupWebFingerInternal()` to resolve and then
`span.setStatus()` (or the simple property access that follows it)
to throw, which is not a realistic scenario.
The reassignment was therefore dead code. Removed it, and added an
inline comment explaining that `outcome` is initialised as the
error placeholder specifically so the `finally` block always has a
valid value to record when the internal call rejects; the catch no
longer has to touch it.
fedify-dev#772 (comment)
Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request implements OpenTelemetry metrics for WebFinger and actor handle discovery across the fedify, vocab, and webfinger packages. It introduces new recording functions, updates internal logic to propagate the meterProvider, and adds extensive tests to verify the new metrics. I have no feedback to provide on these changes.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a976bcfbef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
`classifyWebFingerHandleResult()` mapped any HTTP 200 to
`webfinger.handle.result=resolved`, but `handleWebFingerInternal()`
delegates five different miss paths to the caller-provided
`onNotFound(request)` callback: no actor dispatcher, alias mapper
miss, identifier resolution miss, dispatcher returned null, and
mismatched `acct:` host. The callback can legally return any status
code, including a framework-style 200 fallback page. In that setup
unresolved WebFinger requests were counted as successful discoveries
and skewed the metric's success/error ratios and latency slices.
Wraps `options.onNotFound` once in the outer `handleWebFinger()` so
we know which response object came from the callback.
`classifyWebFingerHandleResult()` now takes that wrapped response as
a second argument and short-circuits to `not_found` when the
returned `Response` is identity-equal to the callback's, regardless
of the status code. The wrapping uses object spread, so the rest of
`WebFingerHandlerParameters` (including the meterProvider option
itself) is forwarded untouched.
Adds a regression step "records result=not_found when onNotFound
returns 200" with a callback returning
`new Response("custom fallback", { status: 200 })`. It asserts
`webfinger.handle.result=not_found` and
`http.response.status_code=200`. The step fails on the prior code
(which records `resolved`) and passes after the fix.
fedify-dev#772 (comment)
Assisted-by: Claude Code:claude-opus-4-7
|
@codex review |
|
/gemini review |
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Code Review
This pull request implements OpenTelemetry metrics for WebFinger and actor handle discovery paths across the Fedify codebase. It introduces new instrumentation in @fedify/fedify, @fedify/vocab, and @fedify/webfinger to track discovery rates, latency, and terminal outcomes. The implementation follows an opt-in pattern using a meterProvider to ensure metrics are only recorded when explicitly configured, and includes comprehensive updates to documentation and regression tests. I have no feedback to provide as there were no review comments.
Closes #739. Part of #316 (umbrella metrics issue) and builds on the meter-provider plumbing landed via #619.
What
Three new counter and histogram pairs covering the WebFinger and actor handle discovery paths:
webfinger.lookupandwebfinger.lookup.durationlookupWebFinger()webfinger.handleandwebfinger.handle.durationFederation.fetch()activitypub.actor.discoveryandactivitypub.actor.discovery.durationgetActorHandle()actor handle discoveryEach family carries a bounded result attribute so dashboards can slice discovery failures by terminal outcome:
found/not_found/invalid/network_error/errorfor outgoing lookups;resolved/invalid/not_found/tombstoned/errorfor incoming requests;resolved/not_found/errorfor actor discovery.webfinger.resource.schemeis bucketed toacct,http,https,mailto, orotherso an attacker-controlled query string cannot inflate metric cardinality.activitypub.remote.hostrecords the URL hostname only. Full resource URIs, lookup URLs, and handle strings stay on the corresponding spans for trace-level investigation.lookupWebFinger()andgetActorHandle()follow the opt-inlookupObject()pattern: omittingmeterProvideremits no measurement. PassingmeterProvidertocreateFederation()wires upwebfinger.handleand the federation-boundContext.lookupWebFinger()automatically. DirectgetActorHandle()calls remain opt-in throughGetActorHandleOptions.meterProvider, and the option is forwarded into the nested WebFinger lookups so a single discovery emits both the discovery measurement and the underlyingwebfinger.lookupmeasurement(s).lookupObject()now also forwardsmeterProviderinto its WebFinger fallback so the same nested coverage applies to handle-driven object lookups.Why
Discovery failures are hard to diagnose from user reports alone. A server can appear broken when WebFinger lookups are slow, blocked by a remote firewall, or returning malformed resource descriptors. Spans capture individual lookups well, but operators need aggregate counters and histograms to graph reliability and latency without sampling.
Notable design decisions
getActorHandle()previously threw a plainTypeErrorfrom two places: the explicit "actor lacks enough information" sentinel, and from malformed remote data flowing throughnew URL(...)andnormalizeActorHandle(). Classifying everyTypeErrorasnot_foundwould mismark malformed-data failures.The sentinel is now a private
ActorHandleNotFoundError extends TypeErrorsubclass. It keeps the documentedthrows {TypeError}API contract intact while letting the metric distinguish "real not found" from "other failure" precisely.The outgoing
lookupWebFingerInternalwas refactored to return a typed{ resource, result, statusCode?, remoteHost? }outcome rather than justResourceDescriptor | null. The refactor lets each failure branch return the right bounded result, and a try/catch around theLocationheader URL constructor preservesresult=invalidplushttp.response.status_codeeven when a redirect target is unparseable.Both
webfinger.handleandwebfinger.lookupbucket the scheme attribute through the same allow list. The corresponding spans keep the raw scheme value for trace-level investigation; only the metric attribute is bucketed.Test plan
webfinger.lookup.resultbucket, redirect-path classifications, scheme bucketing, and meterProvider opt-in behavior.activitypub.actor.discovery.resultbucket, nestedwebfinger.lookuppropagation, the malformed-alias regression, and meterProvider opt-in behavior.lookupObject()forwardsmeterProviderinto its WebFinger fallback.recordWebFingerHandle().webfinger.handle.resultbucket, scheme bucketing, and meterProvider opt-in behavior.mise run checkpasses.mise run test:denoandmise run test:nodepass.mise run docs:buildpasses.Docs and changelog
docs/manual/opentelemetry.md gains rows in the metrics table, per-metric attribute subsections (mirroring the existing
activitypub.key.lookup/activitypub.object.lookupstyle), and three new rows in the semantic attributes table. CHANGES.md gets one new entry under the in-progress Version 2.3.0 @fedify/fedify section.