OpenTelemetry: metrics for key lookup and remote document fetches#771
Conversation
Introduce five new OpenTelemetry instruments in FederationMetrics, with bounded attribute taxonomies so operators get aggregate visibility into public-key lookups and remote JSON-LD document fetches without exploding metric cardinality: - activitypub.key.lookup (counter) - activitypub.key.lookup.duration (histogram) - activitypub.document.fetch (counter) - activitypub.document.fetch.duration (histogram) - activitypub.document.cache (counter) The taxonomy is encoded as the LookupKind and LookupResult union types, together with the narrower DocumentFetchKind / DocumentFetchResult / KeyLookupResult aliases so the type system rejects impossible combinations (e.g. recordDocumentFetch with kind public_key, or a key lookup ending in cache miss). recordKeyLookup, recordDocumentFetch, and recordDocumentCache take a remoteUrl: URL rather than a host string and derive activitypub.remote.host through the existing getRemoteHost() helper, so callers cannot accidentally smuggle key IDs, actor URLs, or full URLs into a metric attribute. classifyFetchError centralises the result classification, mapping FetchError responses to not_found (404 / 410) or error with their status code and treating FetchError without a Response, bare TypeError (the shape native fetch() raises on DNS / connect / TLS failures), and AbortError as network_error. Counter and histogram are always emitted together on document fetches so aggregate rate and latency views stay in sync. Only the instrument and helper plumbing lands in this commit; call-site wiring in fetchKey, the document loader, kvCache, and the Federation middleware follows in subsequent commits. fedify-dev#316 fedify-dev#738 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.5
Extend `kvCache()` with optional `meterProvider` and `kind` parameters so it can emit one `activitypub.document.cache` measurement each time it consults the KV store: `result=miss` when it falls through to the inner loader and `result=hit` when the KV store returns a cached `RemoteDocument`. The metric is recorded through `recordDocumentCache()` from Step 1, which derives `activitypub.remote.host` from a `URL` and keeps the bounded `activitypub.lookup.kind` / `activitypub.lookup.result` attribute taxonomy. `kind` defaults to `"object"` so the generic document-loader case does not need a new option, and context loaders opt in by passing `kind: "context"`. Preloaded-context short-circuits and rule-less calls do not consult the KV cache and therefore do not emit a measurement. The next commit wires this option through the Federation middleware's document and context loader factories. fedify-dev#316 fedify-dev#738 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.5
Add `instrumentDocumentLoader()` to the federation metrics module so any
`DocumentLoader` Fedify wraps can record one
`activitypub.document.fetch` + one `activitypub.document.fetch.duration`
measurement per call. The wrapper
- takes a meter provider, a bounded `DocumentFetchKind`, and an
optional `cacheEnabled` flag,
- parses the requested URL once and derives
`activitypub.remote.host` via `getRemoteHost()`, omitting the
attribute when the URL does not parse,
- records `result=fetched` on success and runs
`classifyFetchError()` on rejection (rethrowing the original
error), so non-2xx HTTP responses surface as `not_found` or
`error` with `http.response.status_code`, and native fetch
failures (`TypeError`, `AbortError`, `FetchError` without a
response) surface as `network_error`, and
- short-circuits to the inner loader untouched when no meter
provider is supplied, so opt-out is the default and applications
without OpenTelemetry pay nothing.
`classifyFetchError()`'s return type is narrowed from `LookupResult`
to `DocumentFetchResult` since it never returns `hit` or `miss`; that
also makes it directly assignable to both `DocumentFetchAttributes`
and `KeyLookupAttributes`.
The Federation middleware wiring that uses this wrapper follows in
the next commit.
fedify-dev#316
fedify-dev#738
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Add an optional `meterProvider` to `FetchKeyOptions` and instrument the
shared `fetchKeyWithResult()` helper so every `fetchKey()` and
`fetchKeyDetailed()` call now records exactly one
`activitypub.key.lookup` + `activitypub.key.lookup.duration`
measurement, classified by outcome:
- cached valid key or cached negative entry → result=hit
- fresh fetch + parseable key → result=fetched
- fresh fetch + no usable key (resolveFetchedKey returns null)
→ result=invalid
- documentLoader threw → classified via `classifyFetchError()`,
surfacing `http.response.status_code` on `FetchError` rows and
mapping 404 / 410 to not_found, FetchError without a response /
TypeError / AbortError to network_error
- any unexpected throw upstream → result=error (default outcome
is never overridden before the failing line)
Each measurement carries `activitypub.lookup.kind=public_key`,
`activitypub.cache.enabled` (true when `options.keyCache` is set), the
key URL's hostname under `activitypub.remote.host`, and the HTTP status
code when an HTTP response was observed. The outcome is set only on
the successful return paths, so a throw from `onCachedUnavailable` or
`onFetchError` leaves the default `error` outcome rather than
mislabelling a partial failure as `hit`.
The HTTP signature verifier (`sig/http.ts`) had two `fetchKeyDetailed`
call sites — the draft-cavage path and the RFC 9421 path — that only
forwarded `tracerProvider` and not `meterProvider`. Both now thread
`meterProvider` through, and the verifier-path test asserts the new
`activitypub.key.lookup` measurement lands on the configured test
provider. The Linked Data Signatures and Object Integrity Proofs
paths already forward the full `options` object, so spreading
`options.meterProvider` is enough there.
fedify-dev#316
fedify-dev#738
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Layer the new OpenTelemetry instrumentation onto the loader factories
that `FederationImpl` constructs. The default `documentLoaderFactory`
now produces a `kvCache()` configured with `meterProvider` and
`kind: "object"`, and a separate default `contextLoaderFactory` does
the same with `kind: "context"`. Every factory's output is wrapped
in `instrumentDocumentLoader()` so each call records exactly one
`activitypub.document.fetch` plus `activitypub.document.fetch.duration`,
and the `kvCache()` layer emits `activitypub.document.cache` hit/miss.
The authenticated document loader factory is also wrapped, with
`cacheEnabled: false` since it intentionally bypasses the cache.
`activitypub.cache.enabled` is set to `true` only for the built-in
factories where Fedify can guarantee `kvCache()` is in the path; for
user-supplied factories caching is opaque, so the attribute is omitted
rather than recorded as a confident `true`/`false`. When no
`meterProvider` is configured, `instrumentDocumentLoader()` short-
circuits and returns the inner loader unchanged, so `ctx.documentLoader`
and `ctx.contextLoader` keep strict reference equality with
user-supplied loaders for callers that do not use OpenTelemetry.
The historical fallback where omitting `contextLoaderFactory` reuses
the (possibly user-customised) `documentLoaderFactory` is preserved;
the wrapping just relabels the metric attributes to `kind=context` at
the call site.
The `authenticatedDocumentLoaderFactory` wrapper now also forwards
the optional `DocumentLoaderFactoryOptions` second argument so
user-supplied authenticated factories still receive their per-call
`allowPrivateAddress` / `userAgent` overrides.
Three new tests in `middleware.test.ts` cover:
- `ctx.documentLoader` records `kind=object` + omitted
`cache.enabled` when the user supplies a custom factory.
- `ctx.contextLoader` records `kind=context`.
- `ctx.documentLoader` and `ctx.contextLoader` are strict-equal to
the user-supplied loader when no `meterProvider` is configured.
A fourth test verifies that a user-supplied
`authenticatedDocumentLoaderFactory` still receives its second
`DocumentLoaderFactoryOptions` argument.
fedify-dev#316
fedify-dev#738
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Record a new `activitypub.object.lookup` counter at every
`lookupObject()` call so operators can see how many lookups resolved
to an Actor, to a non-actor Object, or to nothing — without having
to mix that classification into the loader-level
`activitypub.document.fetch` counter (which would over-count
unique remote fetches).
The counter takes `activitypub.lookup.kind` ∈ `{actor, object, other}`
and an optional `activitypub.remote.host` (hostname only). The
host-extraction helper handles `URL` instances, plain `https://…`
strings, `acct:user@host` URIs (both `URL` and string forms), and bare
fediverse handles (`@user@host`, `user@host`). Anything that does not
reduce cleanly to an authority — paths, query strings, fragments, or
whitespace mixed in with the handle suffix — is dropped from the
metric attribute rather than recorded, so adversarial inputs cannot
inflate cardinality on `activitypub.remote.host`.
The metric is opt-in: `lookupObject()` only emits when its caller
passes a `meterProvider`. `Context.lookupObject()` threads
`this.meterProvider` through so Federation-managed call sites get the
counter automatically; direct callers of `lookupObject()` that do not
set up OpenTelemetry pay nothing.
The instrument lives in `@fedify/vocab` (the same package as
`lookupObject` itself), so the recording site does not introduce a
circular dependency on `@fedify/fedify`. The counter is created
lazily per `MeterProvider` via a `WeakMap` cache that mirrors the
`getFederationMetrics` pattern.
fedify-dev#316
fedify-dev#738
Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5
Extend `docs/manual/opentelemetry.md` with the new instruments introduced for issue fedify-dev#738. The "Instrumented metrics" table gains six rows for `activitypub.key.lookup`, `activitypub.key.lookup.duration`, `activitypub.document.fetch`, `activitypub.document.fetch.duration`, `activitypub.document.cache`, and `activitypub.object.lookup`, and each receives its own attribute subsection. The prose explains the shared bounded `activitypub.lookup.kind` and `activitypub.lookup.result` taxonomy, when `activitypub.cache.enabled` and `http.response.status_code` are recorded, what is deliberately excluded from the metric attributes for cardinality discipline, and how the new `activitypub.key.lookup.duration` complements the signature-scoped `activitypub.signature.key_fetch.duration` histogram that already covers the HTTP / Linked Data / Object Integrity slice. `CHANGES.md` gets a 2.3.0 entry under `@fedify/fedify` summarising the new instruments, the attribute discipline, and the relationship to the existing signature-scoped key-fetch histogram, with a link reference to issue fedify-dev#738. fedify-dev#316 fedify-dev#738 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.5
Add a `[!NOTE]` block to the "Explicit MeterProvider configuration" section that calls out which metrics are opt-in (only emit when an explicit `meterProvider` is configured) and which follow the standard "fall back to the global `MeterProvider`" behavior. Only `activitypub.document.fetch[.duration]` and `activitypub.document.cache` are opt-in inside Fedify, because the `instrumentDocumentLoader()` wrapper short-circuits to the original loader when no meter provider is supplied so that `ctx.documentLoader === userLoader` continues to hold and existing reference-identity tests do not break. Every other metric — including `activitypub.key.lookup[.duration]` and `activitypub.object.lookup` when reached through `Context.lookupObject()` — uses Fedify's standard global-provider fallback because the recording helpers do not short-circuit and `Context.meterProvider` resolves to the global default when nothing was configured. A trailing sentence clarifies that direct `lookupObject()` calls from `@fedify/vocab` (bypassing the Federation context) still require an explicit `LookupObjectOptions.meterProvider` since they have no Federation-level provider to fall back on. fedify-dev#316 fedify-dev#738 Assisted-by: Claude Code:claude-opus-4-7 Assisted-by: Codex:gpt-5.5
|
@codex review |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds bounded OpenTelemetry instruments and helpers for public-key lookups, remote JSON‑LD document fetches and cache hit/miss, and object/actor lookups; threads a MeterProvider through federation middleware and signature/key-fetch paths; extends kvCache and lookupObject to emit opt-in metrics; adds tests and docs. ChangesOpenTelemetry metrics for federation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/vocab/src/lookup.test.ts`:
- Around line 857-858: Teardown currently calls fetchMock.removeRoutes() but
does not fully restore the global fetch after using fetchMock.spyGlobal();
update the test teardown to call fetchMock.hardReset() (in addition to
removeRoutes()) so the global fetch spy is removed and state is not leaked
across tests; locate the teardown where fetchMock.removeRoutes() is invoked and
append a call to fetchMock.hardReset().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f9497b6f-7cfb-448f-b2b2-f5b50d666c4d
📒 Files selected for processing (14)
CHANGES.mddocs/manual/opentelemetry.mdpackages/fedify/src/federation/metrics.test.tspackages/fedify/src/federation/metrics.tspackages/fedify/src/federation/middleware.test.tspackages/fedify/src/federation/middleware.tspackages/fedify/src/sig/http.test.tspackages/fedify/src/sig/http.tspackages/fedify/src/sig/key.test.tspackages/fedify/src/sig/key.tspackages/fedify/src/utils/kv-cache.test.tspackages/fedify/src/utils/kv-cache.tspackages/vocab/src/lookup.test.tspackages/vocab/src/lookup.ts
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive set of OpenTelemetry metrics to monitor ActivityPub operations, including public key lookups, remote JSON-LD document fetches, KV cache performance, and object classification. The implementation adds instrumentation to core loaders and signature verification paths while ensuring that document and context loader metrics remain opt-in to preserve reference identity for existing tests. The review feedback identifies a security improvement opportunity in the authenticated document loader to mitigate SSRF risks by strictly blocking private network access for indirect resource fetches, such as JSON-LD contexts, even when direct fetches are relaxed.
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
The new `lookupObject() records activitypub.object.lookup counter` test block installs a `fetchMock.spyGlobal()` spy but previously only called `fetchMock.removeRoutes()` in teardown, leaving the global `fetch` spy installed after the block finished. Call `fetchMock.hardReset()` after `removeRoutes()` so the global `fetch` is restored, matching the pattern the file's other `spyGlobal()` blocks already use. fedify-dev#771 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/vocab/src/lookup.test.ts`:
- Around line 701-703: The fetch-mock teardown is currently conditional and can
leak if a test fails; immediately register unconditional cleanup right after
calling fetchMock.spyGlobal() (e.g., via t.after(...) or by wrapping the test
body in try/finally) so that fetchMock.hardReset() / fetchMock.restore() always
run; update the locations where fetchMock.spyGlobal() is used (see the block
containing fetchMock.spyGlobal(), fetchMock.removeRoutes(), fetchMock.get(...)
and the similar occurrence around lines 857-858) to ensure teardown is
registered before any assertions or setup that might throw.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2de61636-5c5c-4900-a54f-0bee571b7869
📒 Files selected for processing (1)
packages/vocab/src/lookup.test.ts
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive OpenTelemetry metrics for ActivityPub operations, including public key lookups, remote JSON-LD document fetches, KV-backed cache hits/misses, and lookupObject() classifications. It adds new counters and histograms to track latency and success rates while carefully excluding high-cardinality data to prevent cardinality explosion. The implementation includes a new instrumentDocumentLoader utility and updates to the core federation engine, signature verification, and vocabulary lookup logic. Feedback was provided regarding several violations of the repository's style guide in the documentation, specifically concerning the incorrect use of spaces around em dashes in narrative text.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
The new `lookupObject() records activitypub.object.lookup counter`
test block calls `fetchMock.spyGlobal()` at the top and pairs it with
`fetchMock.removeRoutes(); fetchMock.hardReset();` at the bottom, but
the cleanup was unconditional only on the happy path: if any
`await t.step(...)` propagated a rejection, the global `fetch` spy
stayed installed and leaked into later tests.
Wrap the block body in `try { ... } finally { ... }` so the spy is
always torn down, even when an assertion inside a step fails.
`t.after()` was not used because Deno's `TestContext` does not expose
an `after` hook and the fixture wrapper is built around the
Deno-native context shape.
Existing test steps were re-indented one level deeper to fit inside
the try block; no test logic changed.
fedify-dev#771 (comment)
Assisted-by: Claude Code:claude-opus-4-7
Seven em-dashes in narrative paragraphs of the OpenTelemetry guide
used surrounding spaces, which the project style guide explicitly
forbids in running prose ("em dashes in narrative text should not
have surrounding spaces"). Gemini's code-assist bot flagged all
seven on the PR.
Rather than just removing the spaces around each em-dash, rewrite
the punctuation so the prose does not use em-dashes at all. Em
dashes read as LLM-style in this codebase, and the replacements
(commas, semicolons, colons, parentheses, or a sentence split) all
read better in context.
No semantic content changed.
fedify-dev#771 (comment)
fedify-dev#771 (comment)
fedify-dev#771 (comment)
fedify-dev#771 (comment)
fedify-dev#771 (comment)
fedify-dev#771 (comment)
fedify-dev#771 (comment)
Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry metrics for public key lookups, remote JSON-LD document fetches, and object lookups to enhance observability of cache hits and fetch performance. New instruments like activitypub.key.lookup and activitypub.document.fetch track success rates and latencies across the federation layer. The implementation includes instrumentation for document loaders and key fetchers, along with updated documentation and extensive test coverage. Feedback suggests that the authenticated document loader factory should be modified to respect per-call options such as allowPrivateAddress and userAgent instead of relying solely on federation-level defaults.
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
The built-in default `authenticatedDocumentLoaderFactory` previously ignored its second `factoryOpts: DocumentLoaderFactoryOptions` argument and always built the authenticated loader with the constructor-level `allowPrivateAddress` / `userAgent`. That was inconsistent with the regular `documentLoaderFactory` and `contextLoaderFactory` defaults a few lines up, which already use `opts?.allowPrivateAddress ?? allowPrivateAddress` and `opts?.userAgent ?? userAgent` to honor per-call overrides. Mirror that pattern in the authenticated default so the constructor defaults remain the fallback but per-call overrides now win. The user-supplied auth factory branch already forwards `factoryOpts` directly via `userAuthFactory(identity, factoryOpts)` (added earlier in this PR), so both branches now behave consistently. fedify-dev#771 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive OpenTelemetry metrics for ActivityPub operations, including public key lookups, remote JSON-LD document fetches, and object classification within lookupObject(). It adds new counters and histograms to track latency, cache performance, and resolution results while maintaining strict cardinality bounds on attributes. The implementation includes an opt-in mechanism for loader metrics to preserve reference identity for existing tests. Documentation has been updated to reflect these changes, and extensive unit and integration tests have been added across the federation and vocabulary packages. I have no feedback to provide as there were no review comments to assess.
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
The `activitypub.object.lookup` measurement's `kind` was assigned
after `lookupObjectInternal` returned, but only past several lines
that can throw, including `result.toJsonLd(options)` inside the
`span.addEvent("activitypub.object.fetched", ...)` payload. If any
of those threw, `kind` stayed at its initial `"other"` value, so the
`finally` block recorded `kind=other` even when `lookupObjectInternal`
had returned a non-null Actor or Object.
Classify the result with `kind = isActor(result) ? "actor" : "object"`
immediately after `lookupObjectInternal` returns (before the span
attributes and span event work). A thrown error past that point now
still records the correct `kind` value on the metric.
fedify-dev#771 (comment)
Assisted-by: Claude Code:claude-opus-4-7
The test was named "fetchKey() omits the meter provider has no effect on behavior", which reads as if the subject is "fetchKey() omits the meter provider" and the verb is "has no effect" (it is not a complete clause). Rename it to "fetchKey() works when meterProvider is omitted", which matches the test's actual purpose: a sanity check that omitting the new optional `meterProvider` keeps `fetchKey()` functional. No test logic changed. fedify-dev#771 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
|
@codex review |
There was a problem hiding this comment.
Code Review
This pull request introduces comprehensive OpenTelemetry instrumentation for Fedify, adding new metrics for public key lookups, remote JSON-LD document fetches, document cache lookups, and object resolution. It includes the necessary metric definitions, helper functions for recording these metrics, and updates to the document loader and key lookup logic to emit these events. The changes are accompanied by extensive tests and updated documentation. The review comment regarding the exclusion of 'invalid' from 'DocumentFetchResult' is valid and should be addressed to ensure type safety.
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
`DocumentFetchResult` was previously defined as `Exclude<LookupResult, "hit" | "miss">`, which still left `"invalid"` in the type union. But the document-loader instrumentation never produces `"invalid"`: the success path always emits `"fetched"`, and `classifyFetchError()` only returns `"not_found"`, `"network_error"`, or `"error"`. `"invalid"` is reserved for the key-lookup family, where the parser can decide that a successful HTTP response still does not contain a usable key, which the existing `activitypub.document.fetch` docs already call out. Narrow `DocumentFetchResult` to `Exclude<LookupResult, "hit" | "miss" | "invalid">` so the unreachable variant becomes a compile-time error if any future code path tries to record `result: "invalid"` on the document-fetch family. JSDoc updated to mention the third exclusion and why. No runtime behavior changes; this is a type-safety tightening. fedify-dev#771 (comment) Assisted-by: Claude Code:claude-opus-4-7
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry metrics for public key lookups, remote JSON-LD document fetches, and object lookups—improving observability of cache hits and fetch latency. The implementation spans the federation middleware, signature verification, and KV cache utilities, ensuring high-cardinality identifiers are excluded to prevent metric inflation. I have no feedback to provide as no review comments were submitted.
|
@codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Resolves #738 (sub-task of #316). Fedify already produces spans (
activitypub.fetch_key,activitypub.fetch_document,activitypub.lookup_object) and the signature-scopedactivitypub.signature.key_fetch.durationhistogram, but operators cannot answer aggregate questions like “how often are key lookups served from cache?”, “which remote host is responsible for slow document fetches?”, or “what share oflookupObject()calls resolve to an Actor?” with the existing telemetry. This PR adds six new instruments and the wiring to emit them across Fedify's loader and lookup paths.New instruments, all defined on
FederationMetricsin packages/fedify/src/federation/metrics.ts exceptactivitypub.object.lookup, which lives in packages/vocab/src/lookup.ts to avoid a circular dependency on@fedify/fedify:activitypub.key.lookup(counter) andactivitypub.key.lookup.duration(Histogram) cover everyfetchKey()/fetchKeyDetailed()call, including the three signature verification paths. The sharedclassifyFetchError()classifier maps outcomes tohit,fetched,not_found(with the HTTP status code),invalid(a successful HTTP response that did not parse into a usable key),network_error, anderror. The existingactivitypub.signature.key_fetch.durationhistogram stays in place; it keeps theactivitypub.signature.kinddimension that is the right way to slice signature verification latency by HTTP/Linked Data/Object Integrity.activitypub.document.fetch(counter) andactivitypub.document.fetch.duration(Histogram) cover every Fedify-wrapped document or context loader invocation, including the authenticated loader. TheinstrumentDocumentLoader()wrapper records the requested URL's hostname only, runs failures throughclassifyFetchError()(which now also recognises a bareTypeErrorandAbortErrorasnetwork_error, the shapes nativefetch()raises on DNS/connect/TLS/abort failures before any response is observed), and rethrows the original error so caller behavior is unchanged.activitypub.document.cache(counter) recordshit/missfrom insidekvCache()itself, where the cache decision is actually made. Preloaded JSON-LD contexts and rule-less calls do not consult the KV cache and therefore do not emit a measurement.activitypub.object.lookup(counter) records everylookupObject()call, classified asactor,object, orother(the last covers bothnullresolutions and thrown failures, since the measurement is emitted in afinallyblock). It has no companion duration histogram on purpose: the underlying document loader already recordsactivitypub.document.fetch.durationand a second duration here would double-count latency for the same wire-level fetch.All instruments share a bounded
activitypub.lookup.kindandactivitypub.lookup.resultattribute taxonomy, plusactivitypub.remote.host(hostname only, never the full URL or path),http.response.status_code(only when an HTTP response was observed), andactivitypub.cache.enabledwhere Fedify can confidently report cache presence. The TypeScript typesLookupKind,LookupResult,DocumentFetchKind,DocumentFetchResult, andKeyLookupResultenforce the bounded value sets at compile time, so impossible combinations likerecordDocumentFetchwithkind: "public_key"fail in the type-checker rather than landing as a stray attribute series at runtime. The host-extraction helper forlookupObject()is strict: it parses URLs, handlesacct:user@hostURIs in both string andURLforms, and bare@user@host/user@hosthandles, but rejects anything that contains path/query/fragment/whitespace separators so adversarial inputs cannot inflate cardinality onactivitypub.remote.host.Key IDs, actor IDs, object IDs, JSON-LD context URLs, full URLs, and fediverse handles are deliberately excluded from every new metric attribute. They remain on the corresponding spans for trace-level investigation.
The document and context loader wrappers short-circuit to the inner loader when no
meterProvideris configured oncreateFederation(). That preserves strict reference identity forContext.documentLoader,Context.contextLoader, and the authenticated document loader, so existing tests assertingctx.documentLoader === userLoadercontinue to pass. Every other metric, includingactivitypub.key.lookupandactivitypub.object.lookupwhen reached throughContext.lookupObject(), follows Fedify's standard “fall back to the globalMeterProvider“ behavior. A> [!NOTE]block in docs/manual/opentelemetry.md documents this distinction so it is not a surprise.The
authenticatedDocumentLoaderFactorywrapper also now forwards the optional secondDocumentLoaderFactoryOptionsargument so user-supplied authenticated factories keep receiving their per-callallowPrivateAddress/userAgentoverrides; a regression test in packages/fedify/src/federation/middleware.test.ts pins that behavior.Tests live alongside each piece of plumbing: helper-level coverage in packages/fedify/src/federation/metrics.test.ts, cache hit/miss in packages/fedify/src/utils/kv-cache.test.ts, key-lookup outcomes (hit, fetched, 404, TypeError) in packages/fedify/src/sig/key.test.ts, the HTTP signature verifier's
meterProviderforwarding in packages/fedify/src/sig/http.test.ts, Federation-level smoke tests in packages/fedify/src/federation/middleware.test.ts, andlookupObject()actor classification (including the strict host extraction foracct:URIs and adversarial handle suffixes) in packages/vocab/src/lookup.test.ts. The metric table, per-metric attribute breakdown, the new note about opt-in vs global-fallback semantics, and the CHANGES.md entry for Version 2.3.0 are all included.Test plan
mise checkpassesmise test:denopassesmise test:nodepassesmise run docs:buildsucceeds against the new docs/manual/opentelemetry.md additionsMeterProvider, drive a fediverse fetch, and confirm the new instruments appear with the documented attributes and bounded values (cache hit/miss on a repeatedlookupObject,kind=actorvskind=objectsplit,not_foundrow withhttp.response.status_codeon a 404 key fetch)