
OpenTelemetry: metrics for ActivityPub fanout and activity lifecycle#770

Merged
dahlia merged 10 commits into fedify-dev:main from dahlia:otel/fanout-retry
May 18, 2026
Conversation

@dahlia
Member

@dahlia dahlia commented May 18, 2026

Closes #742, part of #316.

Background

#619 added per-recipient OpenTelemetry counters for delivery attempts, permanent failures, and delivery duration. #759 added per-task counters for the inbox, outbox, and fanout queue workers. Operators can already see how fast deliveries are draining and how loaded the queues are, but two things stayed invisible:

  • How many recipient inboxes a single fanout expanded into. A spike in delivery rate could be one busy fanout or a hundred small ones, and the existing metrics cannot tell them apart.
  • How many activities are flowing through Fedify's inbox and outbox lifecycle stages at the activity level, independent of per-recipient delivery rows and queue-task rows. Without this, a “queued but never processed” or “abandoned after retries” pattern is only visible by joining the queue task counters against logs.

What changes

packages/fedify/src/federation/metrics.ts adds these instruments:

  • activitypub.fanout.recipients (histogram, {recipient}): the recipient inbox count for each fanout enqueue. Only the activity type accompanies the value; recipient URLs and actor IDs stay out of the metric.
  • activitypub.inbox.activity (counter, {activity}): an inbound activity classified via the new activitypub.processing.result attribute as queued, processed, retried, rejected, or abandoned. Wired at every Fedify-managed lifecycle point in packages/fedify/src/federation/inbox.ts (routeActivity) and packages/fedify/src/federation/middleware.ts (#listenInboxMessage).
  • activitypub.outbox.activity (counter, {activity}): an outbound activity classified as queued, retried, or abandoned. The queued row fires from recordOutboxEnqueue on initial enqueue (attempt === 0); the retry and abandon rows fire from #listenOutboxMessage.

Per-recipient sent/failed rows are deliberately left on the existing activitypub.delivery.sent{success} and activitypub.delivery.permanent_failure counters; the new outbox counter does not duplicate them.
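
To make the attribute shape concrete, here is a minimal sketch of the two result vocabularies and a shared attribute builder. The union type names and attribute keys come from this PR; `buildActivityAttributes` is a hypothetical name used only for illustration and is not necessarily how metrics.ts structures it.

```typescript
// Result vocabularies named in this PR; bounding them keeps cardinality low.
type InboxActivityResult =
  | "queued"
  | "processed"
  | "retried"
  | "rejected"
  | "abandoned";
type OutboxActivityResult = "queued" | "retried" | "abandoned";

type Attributes = Record<string, string>;

// Hypothetical helper: always sets activitypub.processing.result and adds
// activitypub.activity.type only when the type is known.
function buildActivityAttributes(
  result: InboxActivityResult | OutboxActivityResult,
  activityType?: string,
): Attributes {
  const attributes: Attributes = { "activitypub.processing.result": result };
  if (activityType != null) {
    attributes["activitypub.activity.type"] = activityType;
  }
  return attributes;
}

const withType = buildActivityAttributes(
  "queued",
  "https://www.w3.org/ns/activitystreams#Create",
);
const withoutType = buildActivityAttributes("rejected");
console.log(withType, withoutType);
```

Because the result is a closed union and the activity type is a bounded IRI, the attribute set stays small no matter how many actors or inboxes are involved.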

Design choices

  • Did not add activitypub.fanout.tasks or activitypub.outbox.retry from the issue body. Both overlap with existing fedify.queue.task.completed{role=fanout} and fedify.queue.task.enqueued{role=outbox, attempt>0}. The manual page now tells operators to read those existing measurements instead.
  • Split inbox and outbox into separate counters because their result sets differ. Outbox leaves sent and failed on the delivery counters; inbox can record rejected during routing.
  • Kept activitypub.activity.type as the full IRI (https://www.w3.org/ns/activitystreams#Create) to match existing Fedify metric and span attributes.
  • Native-retry backends do not record retried or abandoned. When queue.nativeRetrial is true Fedify returns before the retry-scheduling path, so the events never reach the recording site. The manual notes this, and the issue's native-retry-count open question stays future work.

Documentation

docs/manual/opentelemetry.md documents the three metrics, their attributes, and how the activity-level counters relate to activitypub.delivery.* and fedify.queue.task.*. It also adds activitypub.processing.result to the ActivityPub semantic-attributes table. CHANGES.md adds the unreleased 2.3.0 entry.

Test plan

  • mise run check
  • mise run test:deno (31,516 passed, 0 failed)
  • mise run test:node
  • mise run docs:build
  • Codex code review loop, per commit and on the final branch

Regression coverage exercises fanout recipient counts, outbox retry and abandonment, inbox queue-worker results, and routing-time rejected/queued/processed results.

dahlia added 7 commits May 18, 2026 08:13
Add the instrument, type, and helper plumbing for three new metrics
introduced to close the gap between per-delivery counters (added in
fedify-dev#619) and per-queue-task counters (added in fedify-dev#759).
Operators want a view of activity-level pressure that does not depend
on the queue mechanism: how many recipients a fanout produced and how
many activities flowed through the inbox or outbox lifecycle.

This commit only registers the instruments; the call sites that
actually record measurements arrive in subsequent commits.

Three new instruments:

  -  activitypub.fanout.recipients (Histogram, {recipient}) records
     the recipient count for a single fanout.
  -  activitypub.inbox.activity (Counter, {activity}) classifies an
     inbound activity as queued, processed, retried, rejected, or
     abandoned via the new activitypub.processing.result attribute.
  -  activitypub.outbox.activity (Counter, {activity}) classifies an
     outbound activity as queued, retried, or abandoned via the
     same attribute.  Per-recipient sent/failed views are left on
     activitypub.delivery.* and not duplicated here.

Two new union types (InboxActivityResult, OutboxActivityResult)
bound the attribute values so cardinality stays safe.

Helpers (recordFanoutRecipients, recordInboxActivity,
recordOutboxActivity) mirror the existing recordOutboxEnqueue style
and route through the cached getFederationMetrics(meterProvider).

fedify-dev#742

Assisted-by: Claude Code:claude-opus-4-7

Record the new histogram metric at the point where Fedify enqueues a
fanout message.  The recipient count and activity type are already
known there, and recording before the message is acted on means
operators can see how much pressure each fanout produced even when
the downstream outbox queue is slow or backed up.

Recording is placed after the fanoutQueue.enqueue() await so that an
enqueue failure does not inflate the histogram.  Recipient URLs and
actor IDs deliberately stay out of the metric; only the bounded
activity type IRI accompanies the count.
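
The ordering described above can be sketched as follows. This is an illustration under stated assumptions: `FanoutQueue`, `enqueueFanout`, and the in-memory `recorded` array are hypothetical stand-ins, and the real helper also takes a meter provider.

```typescript
// Hypothetical stand-in for the fanout queue; Fedify's real types differ.
interface FanoutQueue {
  enqueue(message: { activityType: string; inboxes: string[] }): Promise<void>;
}

const recorded: { value: number; activityType: string }[] = [];

// Stand-in for the histogram recording helper.
function recordFanoutRecipients(
  recipientCount: number,
  activityType: string,
): void {
  recorded.push({ value: recipientCount, activityType });
}

async function enqueueFanout(
  queue: FanoutQueue,
  activityType: string,
  inboxes: string[],
): Promise<void> {
  // If enqueue() rejects, the next statement never runs, so a failed
  // enqueue does not inflate the histogram.
  await queue.enqueue({ activityType, inboxes });
  // Only the count and the bounded activity type are recorded;
  // the inbox URLs themselves stay out of the metric.
  recordFanoutRecipients(inboxes.length, activityType);
}

const okQueue: FanoutQueue = { enqueue: () => Promise.resolve() };
const failingQueue: FanoutQueue = {
  enqueue: () => Promise.reject(new Error("queue unavailable")),
};

const CREATE = "https://www.w3.org/ns/activitystreams#Create";
const done: Promise<void> = enqueueFanout(okQueue, CREATE, [
  "https://a.example/inbox",
  "https://b.example/inbox",
])
  .then(() => enqueueFanout(failingQueue, CREATE, ["https://c.example/inbox"]))
  .catch(() => {
    // The second enqueue failed, so nothing was recorded for it.
  });
```

After both calls settle, only the successful fanout has contributed a sample.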

fedify-dev#742

Assisted-by: Claude Code:claude-opus-4-7

Wire the activitypub.outbox.activity counter at the three Fedify-
managed lifecycle points for an outbound activity:

  -  queued: recorded from recordOutboxEnqueue() when the message's
     attempt is 0, so both Context.sendActivity() and
     OutboxContext.forwardActivity() benefit without per-caller
     wiring.  Retry re-enqueues (attempt > 0) intentionally skip
     this row; the retry-scheduling site records result=retried
     instead, with the failure context.
  -  retried: recorded inside #listenOutboxMessage() after the
     retry message is enqueued.  Native-retrial backends short-
     circuit earlier with a thrown error, so this counter remains
     a Fedify-managed signal only.
  -  abandoned: recorded in the same handler when the retry policy
     returns null and Fedify gives up on the delivery.

Per-recipient sent/failed are deliberately left on
activitypub.delivery.sent and activitypub.delivery.permanent_failure
so this counter stays activity-centric and does not duplicate the
fedify-dev#619 delivery metrics.
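
A rough sketch of the three recording points, assuming a retry policy that returns a delay or null. `handleDeliveryFailure`, `RetryPolicy`, and the in-memory `counts` object are hypothetical; only the result names and the attempt-0 rule come from the description above.

```typescript
type OutboxActivityResult = "queued" | "retried" | "abandoned";

const counts: Record<OutboxActivityResult, number> = {
  queued: 0,
  retried: 0,
  abandoned: 0,
};

function recordOutboxActivity(result: OutboxActivityResult): void {
  counts[result] += 1;
}

// Only the initial enqueue (attempt 0) produces a queued row.
function recordOutboxEnqueue(attempt: number): void {
  if (attempt === 0) recordOutboxActivity("queued");
}

// Hypothetical retry policy: a delay in milliseconds, or null to give up.
type RetryPolicy = (attempt: number) => number | null;

function handleDeliveryFailure(
  attempt: number,
  policy: RetryPolicy,
  nativeRetrial: boolean,
  error: Error,
): void {
  // Native-retrial backends own retries: rethrow before any recording.
  if (nativeRetrial) throw error;
  const delay = policy(attempt);
  if (delay == null) {
    recordOutboxActivity("abandoned");
    return;
  }
  recordOutboxEnqueue(attempt + 1); // attempt > 0, so no queued row
  recordOutboxActivity("retried");
}

const policy: RetryPolicy = (attempt) => (attempt < 2 ? 1000 : null);
const failure = new Error("delivery failed");

recordOutboxEnqueue(0);
handleDeliveryFailure(0, policy, false, failure);
handleDeliveryFailure(1, policy, false, failure);
handleDeliveryFailure(2, policy, false, failure);

let nativeThrew = false;
try {
  handleDeliveryFailure(0, policy, true, failure);
} catch {
  nativeThrew = true; // nothing was recorded for the native-retrial case
}
console.log(counts, nativeThrew);
```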

fedify-dev#742

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5

Wire the activitypub.inbox.activity counter at every Fedify-managed
lifecycle point for an inbound activity, separating queue-mode and
no-queue paths so the same five result values (queued, processed,
retried, rejected, abandoned) classify both.

In routeActivity() (no-queue routing and queue handoff):

  -  rejected: idempotency cache hit at the HTTP routing layer,
     missing actor, no-queue listener error, and unsupported
     activity type with no registered listener.
  -  queued: successful queue.enqueue() of the incoming message.
  -  processed: successful no-queue listener completion, recorded
     immediately after the listener returns and before the
     idempotency cache write so a kv.set() failure does not lose
     the event.

In #listenInboxMessage() (queue worker path):

  -  rejected: the rare second idempotency cache hit at processing
     time (race with concurrent processing), and the no-listener
     case observed at queue-processing time.
  -  processed: successful queued listener completion (in the same
     before-cache-write position).
  -  retried: Fedify enqueued a retry message because the listener
     threw and the retry policy returned a delay.  Native-retrial
     backends short-circuit earlier with a thrown error and are
     intentionally not counted, mirroring the outbox lifecycle.
  -  abandoned: the inbox retry policy returned null.

Tests cover the queue worker lifecycle (processed/retried/abandoned)
via processQueuedTask() and the routing layer (queued/processed/
rejected for both unsupported-type and duplicate paths) via
ContextImpl.routeActivity().
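
The routing-time classification can be summarized as a small decision function. This is a sketch, not the actual routeActivity() code: `RoutingOutcome` and `classifyAtRouting` are hypothetical, and the order of the checks is an assumption inferred from the lists above (in particular, that queue handoff happens before listener lookup).

```typescript
type InboxActivityResult =
  | "queued"
  | "processed"
  | "retried"
  | "rejected"
  | "abandoned";

// Hypothetical summary of what routing knows about one incoming activity.
interface RoutingOutcome {
  duplicate: boolean; // idempotency cache hit
  hasActor: boolean;
  queueEnabled: boolean;
  hasListener: boolean;
  listenerThrew: boolean;
}

function classifyAtRouting(outcome: RoutingOutcome): InboxActivityResult {
  if (outcome.duplicate) return "rejected";
  if (!outcome.hasActor) return "rejected";
  // With a queue, listener lookup is deferred to the worker.
  if (outcome.queueEnabled) return "queued";
  if (!outcome.hasListener) return "rejected";
  if (outcome.listenerThrew) return "rejected";
  return "processed";
}

const base: RoutingOutcome = {
  duplicate: false,
  hasActor: true,
  queueEnabled: false,
  hasListener: true,
  listenerThrew: false,
};

const results = {
  noQueueSuccess: classifyAtRouting(base),
  queueHandoff: classifyAtRouting({ ...base, queueEnabled: true }),
  duplicate: classifyAtRouting({ ...base, duplicate: true }),
  unsupported: classifyAtRouting({ ...base, hasListener: false }),
};
console.log(results);
```

In this sketch retried and abandoned never come out of routing; they belong to the queue worker path.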

fedify-dev#742

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5

Extend the OpenTelemetry manual page with the three new instruments
introduced by fedify-dev#742 and a
narrative paragraph that ties them to the existing per-recipient and
per-task metric families.

Documentation changes:

  -  Add three rows to the "Instrumented metrics" table for
     activitypub.fanout.recipients, activitypub.inbox.activity, and
     activitypub.outbox.activity.
  -  Add per-instrument attribute description blocks listing the
     processing.result vocabulary (queued, processed, retried,
     rejected, abandoned for inbox; queued, retried, abandoned for
     outbox) and where each value is recorded.
  -  Note the native-retrial caveat: queue backends that declare
     nativeRetrial defer retry handling, so retried and abandoned
     are not recorded for those backends.
  -  Note the fanout strategy semantics: with the default
     fanout: "auto", activities below the 5-recipient threshold are
     delivered directly and do not appear in
     activitypub.fanout.recipients; fanout: "force" always enqueues a
     fanout task, and fanout: "skip" bypasses fanout.
  -  Add a paragraph explaining that the activity-level counters
     complement the per-recipient activitypub.delivery.* counters and
     the per-task fedify.queue.task.* metrics rather than replacing
     them.
  -  Add activitypub.processing.result to the ActivityPub semantic
     attributes reference table.

Changelog:

  -  Add an entry under the unreleased 2.3.0 section listing the
     three new instruments and the native-retrial caveat.
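
The fanout strategy note above can be illustrated with a short sketch. `usesFanoutQueue` and `FANOUT_THRESHOLD` are hypothetical names, and treating exactly 5 recipients as a fanout is an assumption read from the word "below".

```typescript
type FanoutStrategy = "auto" | "force" | "skip";

// The 5-recipient threshold comes from the manual note above.
const FANOUT_THRESHOLD = 5;

// Returns true when a fanout task would be enqueued, which is the only
// case where activitypub.fanout.recipients receives a sample.
function usesFanoutQueue(
  strategy: FanoutStrategy,
  recipientCount: number,
): boolean {
  switch (strategy) {
    case "force":
      return true;
    case "skip":
      return false;
    case "auto":
      return recipientCount >= FANOUT_THRESHOLD;
  }
}

const decisions = [
  usesFanoutQueue("auto", 4),
  usesFanoutQueue("auto", 5),
  usesFanoutQueue("force", 1),
  usesFanoutQueue("skip", 500),
];
console.log(decisions);
```

One practical consequence: under "auto", the histogram never shows values below the threshold, so its lower buckets stay empty unless some senders use "force".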

fedify-dev#742

Assisted-by: Claude Code:claude-opus-4-7
Assisted-by: Codex:gpt-5.5

Codex pointed out the previous wording said an outbox or fanout task
was enqueued, but Fedify only records
activitypub.outbox.activity{queued} when an initial outbox message
is enqueued (including the per-recipient outbox messages that the
fanout worker produces, not the fanout enqueue itself).  Update the
exported type's JSDoc so it matches the manual page and the actual
recording site.

fedify-dev#742

Assisted-by: Codex:gpt-5.5
Assisted-by: Claude Code:claude-opus-4-7

@dahlia
Member Author

dahlia commented May 18, 2026

@codex review

@coderabbitai

coderabbitai Bot commented May 18, 2026

Walkthrough

Adds OpenTelemetry instrumentation for ActivityPub: a fanout recipient histogram and inbox/outbox activity lifecycle counters with an activitypub.processing.result attribute. Recording calls are wired into inbox routing and middleware, covered by unit and integration tests, and documented in the changelog and manual.

Changes

ActivityPub Activity Lifecycle Metrics

| Layer / File(s) | Summary |
| --- | --- |
| Metrics contract and instrument definitions (packages/fedify/src/federation/metrics.ts) | Adds exported InboxActivityResult and OutboxActivityResult union types and new instrument fields for activitypub.fanout.recipients, activitypub.inbox.activity, and activitypub.outbox.activity. |
| Metrics creation and recording helpers (packages/fedify/src/federation/metrics.ts) | Registers the new instruments, implements recording methods and shared attribute builder (always sets activitypub.processing.result, conditionally activitypub.activity.type), and exports recordFanoutRecipients, recordInboxActivity, recordOutboxActivity; updates recordOutboxEnqueue to record queued on attempt 0. |
| Inbox routing instrumentation (packages/fedify/src/federation/inbox.ts) | Integrates recordInboxActivity calls into routeActivity at decision points: rejected (idempotence/missing-actor/unsupported/error), queued (enqueue), and processed (successful handling); import reformatted. |
| Outbox and fanout middleware instrumentation (packages/fedify/src/federation/middleware.ts) | Wires recordOutboxActivity for retried/abandoned outbox outcomes, recordInboxActivity for retried/abandoned/processed/rejected inbox outcomes, and recordFanoutRecipients after fanout task enqueue. |
| Metrics helper unit tests (packages/fedify/src/federation/metrics.test.ts) | Tests verify histogram and counter measurements, attribute presence/absence for activity type, per-result counters for inbox/outbox activity, and enqueue-attempt behavior for outbox enqueue. |
| Middleware & routing integration tests (packages/fedify/src/federation/middleware.test.ts) | Integration tests assert lifecycle counters for outbox retry/abandon and inbox queued/processed/retried/abandoned/rejected flows, plus fanout recipient histogram and attribute correctness. |
| Changelog and OpenTelemetry manual (CHANGES.md, docs/manual/opentelemetry.md) | Adds Version 2.3.0 changelog entry and updates OpenTelemetry manual to list the new instruments, define activitypub.processing.result lifecycle values and recording rules, and explain relation to delivery and queue metrics. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #316: Implements ActivityPub inbox/outbox/fanout metrics that overlap the recipient-count and lifecycle counters in this change.

Possibly related PRs

Suggested labels

component/federation, component/otel, component/outbox, component/inbox, type/enhancement

Suggested reviewers

  • sij411
  • 2chanhaeng
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately summarizes the primary change: adding OpenTelemetry metrics for ActivityPub fanout and activity lifecycle, which is the core focus of all modifications. |
| Description check | ✅ Passed | The PR description comprehensively explains the background, changes, design choices, and test coverage related to the ActivityPub metrics implementation. |
| Linked Issues check | ✅ Passed | The PR successfully implements all core requirements from issue #742: adds activitypub.fanout.recipients histogram, activitypub.inbox.activity and activitypub.outbox.activity counters with activitypub.processing.result attributes, avoids exposing high-cardinality identifiers, and documents relationships to existing delivery/queue metrics. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing the three new ActivityPub metrics and their integration; no unrelated modifications to other features or systems are present. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces new OpenTelemetry metrics to track ActivityPub activity lifecycles, adding activitypub.inbox.activity, activitypub.outbox.activity, and activitypub.fanout.recipients. These metrics provide an activity-level view of inbox and outbox pressure by classifying outcomes as queued, processed, retried, rejected, or abandoned. The implementation includes updates to the federation engine, new recording helpers in the metrics module, and comprehensive unit tests. I have no feedback to provide.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/fedify/src/federation/inbox.ts (1)

279-287: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Record processed only after the idempotency write succeeds.

This fires before the KV marker is persisted on Lines 284-287. If kv.set() fails, the call errors out but telemetry still reports the activity as processed, and a retry will increment processed again. Move the metric below the write so it reflects durable completion.

Suggested fix
-      recordInboxActivity(
-        meterProvider,
-        "processed",
-        getTypeId(activity!).href,
-      );
       if (cacheKey != null) {
         await kv.set(cacheKey, true, {
           ttl: Temporal.Duration.from({ days: 1 }),
         });
       }
+      recordInboxActivity(
+        meterProvider,
+        "processed",
+        getTypeId(activity!).href,
+      );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/fedify/src/federation/inbox.ts` around lines 279 - 287, The metric
"processed" is being recorded before the idempotency KV write succeeds; move the
recordInboxActivity call to after the await kv.set(...) so that
recordInboxActivity(meterProvider, "processed", getTypeId(activity!).href) only
runs if the kv.set(cacheKey, true, { ttl: Temporal.Duration.from({ days: 1 }) })
completes successfully (i.e., after the await), ensuring the processed metric
reflects durable persistence of the idempotency marker.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/manual/opentelemetry.md`:
- Around line 333-388: Add concrete usage examples for the new lifecycle metrics
by inserting short Prometheus/Grafana snippets showing how to query and
visualize activitypub.inbox.activity, activitypub.outbox.activity, and
activitypub.fanout.recipients; include example PromQL that filters by
activitypub.processing.result and activitypub.activity.type (e.g.,
rate(activitypub.inbox.activity{activitypub_processing_result="processed",activitypub_activity_type="Create"}[5m])
and a histogram query for activitypub.fanout.recipients (e.g.,
histogram_quantile(0.95, sum(rate(activitypub_fanout_recipients_bucket[5m])) by
(le, activitypub_activity_type)) ) plus a short Grafana panel description for
each metric (counter for queued/processed/retried/rejected/abandoned,
per-recipient outbox view, and fanout size distribution) so readers can
copy/paste ready-to-use dashboard queries that reference the documented labels.

In `@packages/fedify/src/federation/metrics.ts`:
- Around line 656-665: The wrapper recordFanoutRecipients currently declares
(meterProvider, activityType: string | undefined, recipientCount: number) but
passes parameters to getFederationMetrics(...).recordFanoutRecipients in the
order (recipientCount, activityType), causing an order mismatch; change the
wrapper signature to (meterProvider: MeterProvider | undefined, recipientCount:
number, activityType?: string) so the parameter order matches the instance
method recordFanoutRecipients and other wrappers like
recordInboxActivity/recordOutboxActivity, and use the optional parameter syntax
activityType?: string; keep the call to
getFederationMetrics(meterProvider).recordFanoutRecipients(recipientCount,
activityType) unchanged.

In `@packages/fedify/src/federation/middleware.ts`:
- Line 1024: The metric call recordInboxActivity(this.meterProvider,
"processed", activityType) is being emitted too early; move it so it runs only
after the idempotency KV write completes successfully (the durable idempotency
write surrounding the idempotency logic in this file), i.e., call
recordInboxActivity after the await/Promise resolution of the idempotency
put/commit (the code that persists the idempotency marker) and only on success
(not before the write or inside a try that can fail), so failures won't count as
processed and retries won't double-count.
- Around line 917-921: Permanent delivery failure branches currently return
early and never emit the terminal activitypub.outbox.activity; before each early
return in the permanent-failure/HTTP 404/410 handling code, call
recordOutboxActivity(this.meterProvider, "abandoned", message.activityType) so
these dropped queued outbox items are counted as "abandoned" — locate the
permanent-failure return paths in the middleware file and add the same
recordOutboxActivity call (using this.meterProvider and message.activityType)
just prior to returning.

---

Outside diff comments:
In `@packages/fedify/src/federation/inbox.ts`:
- Around line 279-287: The metric "processed" is being recorded before the
idempotency KV write succeeds; move the recordInboxActivity call to after the
await kv.set(...) so that recordInboxActivity(meterProvider, "processed",
getTypeId(activity!).href) only runs if the kv.set(cacheKey, true, { ttl:
Temporal.Duration.from({ days: 1 }) }) completes successfully (i.e., after the
await), ensuring the processed metric reflects durable persistence of the
idempotency marker.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 004517d7-4cbe-4872-ab75-ff7e1a22bc7a

📥 Commits

Reviewing files that changed from the base of the PR and between 15a3316 and e1434e8.

📒 Files selected for processing (7)
  • CHANGES.md
  • docs/manual/opentelemetry.md
  • packages/fedify/src/federation/inbox.ts
  • packages/fedify/src/federation/metrics.test.ts
  • packages/fedify/src/federation/metrics.ts
  • packages/fedify/src/federation/middleware.test.ts
  • packages/fedify/src/federation/middleware.ts

Comment thread docs/manual/opentelemetry.md
Comment thread packages/fedify/src/federation/metrics.ts
Comment thread packages/fedify/src/federation/middleware.ts
Comment thread packages/fedify/src/federation/middleware.ts
@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 98.57143% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/fedify/src/federation/inbox.ts | 93.54% | 2 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| packages/fedify/src/federation/metrics.ts | 98.22% <100.00%> (+0.40%) ⬆️ |
| packages/fedify/src/federation/middleware.ts | 96.30% <100.00%> (+0.09%) ⬆️ |
| packages/fedify/src/federation/inbox.ts | 86.97% <93.54%> (+0.96%) ⬆️ |

... and 1 file with indirect coverage changes


dahlia added 2 commits May 18, 2026 12:30
The wrapper used (meterProvider, activityType, recipientCount) while
the underlying FederationMetrics.recordFanoutRecipients method takes
(recipientCount, activityType?), and the sibling wrappers
recordInboxActivity and recordOutboxActivity already mirror their
instance methods.  Reorder the wrapper to take recipientCount before
activityType, switch to the optional `activityType?: string` form,
and update the lone call site in middleware.ts plus the unit tests.
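
For reference, the reordered wrapper might look roughly like this. `MeterProviderSketch` and `FederationMetricsSketch` are hypothetical stand-ins for the OpenTelemetry MeterProvider and Fedify's cached metrics object; only the parameter order and the optional `activityType?: string` form are taken from the commit message.

```typescript
// Hypothetical stand-in for an OpenTelemetry MeterProvider.
type MeterProviderSketch = object;

// Hypothetical stand-in for the cached metrics object.
class FederationMetricsSketch {
  readonly fanoutSamples: {
    recipientCount: number;
    activityType: string | undefined;
  }[] = [];

  recordFanoutRecipients(recipientCount: number, activityType?: string): void {
    this.fanoutSamples.push({ recipientCount, activityType });
  }
}

const defaultMetrics = new FederationMetricsSketch();
const metricsCache = new WeakMap<MeterProviderSketch, FederationMetricsSketch>();

// One metrics object per provider, so instruments are created only once.
function getFederationMetrics(
  meterProvider: MeterProviderSketch | undefined,
): FederationMetricsSketch {
  if (meterProvider == null) return defaultMetrics;
  let metrics = metricsCache.get(meterProvider);
  if (metrics == null) {
    metrics = new FederationMetricsSketch();
    metricsCache.set(meterProvider, metrics);
  }
  return metrics;
}

// Wrapper parameters now follow the instance method's order:
// recipientCount first, optional activityType last.
function recordFanoutRecipients(
  meterProvider: MeterProviderSketch | undefined,
  recipientCount: number,
  activityType?: string,
): void {
  getFederationMetrics(meterProvider).recordFanoutRecipients(
    recipientCount,
    activityType,
  );
}

const provider: MeterProviderSketch = {};
recordFanoutRecipients(
  provider,
  12,
  "https://www.w3.org/ns/activitystreams#Create",
);
recordFanoutRecipients(provider, 3);
const samples = getFederationMetrics(provider).fanoutSamples;
console.log(samples);
```

Putting the optional argument last also lets callers omit it instead of passing an explicit undefined.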

fedify-dev#770 (comment)

Assisted-by: Claude Code:claude-opus-4-7

The activitypub.outbox.activity counter previously emitted abandoned
only when the outbox retry policy returned null after exhausted
attempts.  The permanent-failure branch of #listenOutboxMessage
returned early after recording activitypub.delivery.permanent_failure
and invoking outboxPermanentFailureHandler, so 404/410-style failures
never received a terminal activity-level row.  That left queued
unreconcilable against retried + abandoned for these failures.

Emit abandoned at that early return as well.  The per-recipient
permanent-failure detail (remote host, status code) stays on the
existing activitypub.delivery.permanent_failure counter; the new
abandoned row only marks the activity-level lifecycle as concluded.
The OutboxActivityResult JSDoc and the activitypub.outbox.activity
block in the OpenTelemetry manual page describe both abandon paths.
A regression assertion is added to the existing 410 Gone test step.
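
The two abandon paths can be sketched side by side. Everything here is illustrative: `onDeliveryFailure` is a hypothetical function, the counters are plain in-memory numbers, and treating exactly 404 and 410 as the permanent statuses is an assumption based on the "404/410-style" wording above.

```typescript
type OutboxActivityResult = "queued" | "retried" | "abandoned";

const outboxActivity: Record<OutboxActivityResult, number> = {
  queued: 0,
  retried: 0,
  abandoned: 0,
};

// Stand-in for the activitypub.delivery.permanent_failure counter.
let permanentFailures = 0;

// Assumed set of statuses treated as permanent failures.
const PERMANENT_STATUSES = new Set<number>([404, 410]);

// nextDelayMs is what a retry policy would return: a delay, or null to give up.
function onDeliveryFailure(status: number, nextDelayMs: number | null): void {
  if (PERMANENT_STATUSES.has(status)) {
    // Per-recipient detail stays on the delivery counter...
    permanentFailures += 1;
    // ...while the activity-level lifecycle is marked as concluded.
    outboxActivity.abandoned += 1;
    return;
  }
  if (nextDelayMs == null) {
    outboxActivity.abandoned += 1; // retries exhausted
    return;
  }
  outboxActivity.retried += 1;
}

onDeliveryFailure(410, 1000); // permanent failure: abandoned immediately
onDeliveryFailure(503, 1000); // transient failure: retried
onDeliveryFailure(503, null); // retry policy gave up: abandoned
console.log(outboxActivity, permanentFailures);
```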

fedify-dev#770 (comment)

Assisted-by: Claude Code:claude-opus-4-7
@dahlia
Member Author

dahlia commented May 18, 2026

/gemini review


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (2)
docs/manual/opentelemetry.md (1)

333-393: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add concrete usage examples for the new lifecycle metrics.

The semantic documentation is thorough, but readers need practical query examples to operationalize these metrics. Consider adding short PromQL snippets showing:

  • How to query activitypub.inbox.activity by activitypub.processing.result (e.g., rate of processed vs. rejected)
  • How to visualize activitypub.outbox.activity lifecycle outcomes (queued, retried, abandoned)
  • How to calculate histogram quantiles for activitypub.fanout.recipients (e.g., p95 fanout size by activity type)

As per coding guidelines, docs/**/*.md: "Include examples for new features in documentation."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/manual/opentelemetry.md` around lines 333 - 393, Add concrete PromQL
usage examples for the new lifecycle metrics: insert short query snippets next
to the `activitypub.inbox.activity` section showing how to filter by
`activitypub.processing.result` (e.g., rate of `processed` vs `rejected`), next
to `activitypub.outbox.activity` showing queries for lifecycle outcomes
(`queued`, `retried`, `abandoned`) and visualization/grouping by recipient or
activity type, and next to `activitypub.fanout.recipients` showing histogram
quantile calculations (e.g., p95 fanout size by `activitypub.activity.type`);
use the exact metric names `activitypub.inbox.activity`,
`activitypub.outbox.activity`, and `activitypub.fanout.recipients` in the
examples and keep each snippet brief and copy-pastable.
packages/fedify/src/federation/middleware.ts (1)

1029-1029: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Emit processed after the queued inbox item is durably finalized.

This metric is recorded before the idempotency write on Lines 1126-1129. If that KV write fails, the worker exits with an error but telemetry has already counted a processed activity, and a retry will increment it again.

Suggested fix
-          recordInboxActivity(this.meterProvider, "processed", activityType);
         } catch (error) {
           try {
             await this.inboxErrorHandler?.(context, error as Error);
@@
         if (cacheKey != null) {
           await this.kv.set(cacheKey, true, {
             ttl: Temporal.Duration.from({ days: 1 }),
           });
         }
+        recordInboxActivity(this.meterProvider, "processed", activityType);
         logger.info(
           "Activity {activityId} has been processed.",
           {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/fedify/src/federation/middleware.ts` at line 1029, The metric call
recordInboxActivity(this.meterProvider, "processed", activityType) is emitted
too early; move it so it runs only after the durable idempotency KV write
succeeds. Update the code around the idempotency write (the block that persists
the processed/inbox idempotency key) to call
recordInboxActivity(this.meterProvider, "processed", activityType) after the
write resolves/returns successfully and do not call it if the write throws or
returns an error (i.e., inside the success branch / after await of the KV write,
not before).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 425ac242-19e9-4e51-a65e-52687687e7f7

📥 Commits

Reviewing files that changed from the base of the PR and between e1434e8 and bd4d2d9.

📒 Files selected for processing (5)
  • docs/manual/opentelemetry.md
  • packages/fedify/src/federation/metrics.test.ts
  • packages/fedify/src/federation/metrics.ts
  • packages/fedify/src/federation/middleware.test.ts
  • packages/fedify/src/federation/middleware.ts

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces new OpenTelemetry metrics to track the lifecycle of ActivityPub activities at a more granular level. Specifically, it adds activitypub.inbox.activity, activitypub.outbox.activity, and activitypub.fanout.recipients to monitor processing results such as 'queued', 'processed', 'retried', 'rejected', and 'abandoned'. The implementation includes updates to the documentation, core federation logic in inbox.ts and middleware.ts, and the addition of comprehensive unit and integration tests. I have no feedback to provide as there were no review comments.

@dahlia
Member Author

dahlia commented May 18, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@dahlia dahlia requested review from 2chanhaeng and sij411 May 18, 2026 04:10
Contributor

@2chanhaeng 2chanhaeng left a comment

LGTM! But just one nitpick comment. 😅

Comment thread packages/fedify/src/federation/inbox.ts
The activity parameter of routeActivity() is declared
`activity: Activity` in RouteActivityParameters, not `Activity | null`
or `Activity | undefined`, so the non-null assertions sprinkled
through the function body have no work to do.  The inner
tracer.startActiveSpan callback also does not strip the parameter's
narrowing, so the assertions inside it were never load-bearing
either.  Drop every `activity!` in this function, and reduce the
matching `activity?.id?.href` to `activity.id?.href` for the same
reason.

fedify-dev#770 (comment)

Assisted-by: Claude Code:claude-opus-4-7
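
The reasoning in the commit message above can be illustrated with a small sketch. The `Activity` and `RouteActivityParameters` shapes here are simplified, hypothetical stand-ins (the real Fedify types carry far more), and `run` is an assumed placeholder for any callback-taking helper such as `tracer.startActiveSpan`. Because the parameter is declared non-nullable, only the nullable `id` needs optional chaining:

```typescript
// Hypothetical, simplified stand-ins for Fedify's types; not the real definitions.
interface Activity {
  id: { href: string } | null;
}

interface RouteActivityParameters {
  activity: Activity; // declared non-nullable, as the commit message describes
}

// Assumed placeholder for a helper that invokes a callback.
function run<T>(fn: () => T): T {
  return fn();
}

function activityIdHref(
  { activity }: RouteActivityParameters,
): string | undefined {
  // `activity` has type `Activity` both here and inside the callback, so
  // `activity!` and `activity?.` would be no-ops. Only `id` can be null.
  return run(() => activity.id?.href);
}
```

Under these assumptions the compiler accepts `activity.id?.href` inside the callback without any assertion, which is why dropping `activity!` does not change behavior.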
@dahlia
Member Author

dahlia commented May 18, 2026

/gemini review


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/fedify/src/federation/inbox.ts`:
- Around line 279-287: The telemetry emit for "processed" is happening before
the idempotency write and can be emitted even when kv.set fails; move the
recordInboxActivity(...) call so it runs only after the kv.set(...) completes
successfully (i.e., after awaiting kv.set when cacheKey != null). Update the
block that checks cacheKey (using cacheKey, kv.set, getTypeId(activity).href and
recordInboxActivity) so the ttl write is awaited first and then call
recordInboxActivity for the "processed" event.
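
The ordering requested above can be sketched as follows. This is a minimal illustration, not the code in `inbox.ts`: the `KvStore` interface, the `markProcessed` helper, and the array-backed `recordInboxActivity` are all hypothetical stand-ins with simplified signatures. The point is only that the metric call sits after the awaited write, so a rejected `kv.set` skips it:

```typescript
// Hypothetical stand-ins; the real Fedify KV store and metric helper differ.
interface KvStore {
  set(key: string, value: string): Promise<void>;
}

type InboxResult =
  | "queued"
  | "processed"
  | "retried"
  | "rejected"
  | "abandoned";

// Stand-in recorder: appends to an array instead of emitting a metric.
const recorded: InboxResult[] = [];
function recordInboxActivity(result: InboxResult): void {
  recorded.push(result);
}

// Hypothetical helper showing the order of operations only.
async function markProcessed(
  kv: KvStore,
  cacheKey: string | null,
  typeId: string,
): Promise<void> {
  if (cacheKey != null) {
    // Await the idempotency write first. If it rejects, the error
    // propagates and the metric call below is never reached.
    await kv.set(cacheKey, typeId);
  }
  // Reached only when there was no write to do or the write succeeded.
  recordInboxActivity("processed");
}
```

With this shape, a failing store leaves `recorded` untouched, while a successful write (or a `null` cache key) adds exactly one `"processed"` entry.
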
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 00323ad8-54ba-4370-9b6e-50656ec62eba

📥 Commits

Reviewing files that changed from the base of the PR and between bd4d2d9 and f93da7b.

📒 Files selected for processing (1)
  • packages/fedify/src/federation/inbox.ts

Comment thread packages/fedify/src/federation/inbox.ts
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces new OpenTelemetry metrics to track ActivityPub activity lifecycle events and fanout recipient counts, providing a higher-level view of inbox and outbox pressure. Specifically, it adds the activitypub.inbox.activity and activitypub.outbox.activity counters to classify activities by outcomes such as queued, processed, or abandoned, alongside an activitypub.fanout.recipients histogram. The implementation includes updates to the documentation, the core metrics module, and the integration of recording logic within the federation middleware and inbox routing. I have no feedback to provide as there were no review comments to assess.

@dahlia dahlia merged commit 6cc0266 into fedify-dev:main May 18, 2026
17 checks passed

Development

Successfully merging this pull request may close these issues.

OpenTelemetry metrics for ActivityPub fanout and retry behavior

3 participants