Add kn.revision.request.queued and kn.revision.request.active metrics #16367

knative-prow[bot] merged 4 commits into knative:main
Conversation
Adds an observable gauge metric that reports the number of in-flight requests per revision in the activator's throttler. Uses a callback-based approach to avoid any overhead on the request hot path - values are only computed when metrics are scraped.
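For reference, a minimal sketch of that pattern with the OpenTelemetry Go metric API; the queueDepthProvider interface is a hypothetical stand-in for the throttler, while the metric name matches the kn.activator.queue.depth gauge quoted in the review below:

```go
package net

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// queueDepthProvider is a hypothetical interface for anything that can
// report the current in-flight request count per revision.
type queueDepthProvider interface {
	QueueDepths() map[string]int64
}

// registerQueueDepthGauge wires the provider to an observable gauge.
// Nothing runs per request; the callback fires only when the meter is
// scraped, which is what keeps the hot path free of metric bookkeeping.
func registerQueueDepthGauge(meter metric.Meter, p queueDepthProvider) (metric.Registration, error) {
	gauge, err := meter.Int64ObservableGauge(
		"kn.activator.queue.depth",
		metric.WithDescription("Number of requests currently queued in the activator for a revision"),
	)
	if err != nil {
		return nil, err
	}
	return meter.RegisterCallback(func(_ context.Context, o metric.Observer) error {
		for rev, depth := range p.QueueDepths() {
			o.ObserveInt64(gauge, depth,
				metric.WithAttributes(attribute.String("kn.revision.name", rev)))
		}
		return nil
	}, gauge)
}
```

Unregistering the returned metric.Registration is what the Shutdown method discussed below does.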
Hi @prashanthjos. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Codecov Report

❌ Patch coverage is …
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main   #16367      +/-   ##
==========================================
+ Coverage   80.17%   80.25%   +0.08%
==========================================
  Files         216      216
  Lines       13440    13483      +43
==========================================
+ Hits        10775    10821      +46
+ Misses       2300     2296       -4
- Partials      365      366       +1
```

☔ View full report in Codecov by Sentry.
linkvt left a comment:
Hi @prashanthjos, thanks for the PR!
I left a few comments; this will also still need a review from a maintainer with more experience.
/ok-to-test
pkg/activator/net/metrics_test.go (outdated)

```go
func TestThrottlerMetricsShutdownNilObserver(t *testing.T) {
```

Seems like a duplicate of what is tested in TestNewThrottlerMetricsWithNilProvider.
pkg/activator/net/metrics_test.go (outdated)

```go
func TestThrottlerMetricsEmptyQueueDepth(t *testing.T) {
```

We could maybe merge this and TestNewThrottlerMetrics by using a table-driven test? See https://go.dev/wiki/TableDrivenTests#using-a-map-to-store-test-cases
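A possible shape for that merge, using the map-keyed layout from the linked wiki page; the case data and the length check are placeholders for the real fake-provider setup and metric assertions:

```go
package net

import "testing"

// Sketch of a map-keyed table-driven test; each case name doubles as the
// subtest name, so adding a new case is a one-line change.
func TestThrottlerMetrics(t *testing.T) {
	tests := map[string]struct {
		depths  map[string]int64 // per-revision depths fed to a fake provider
		wantObs int              // observations expected when the gauge is scraped
	}{
		"nil provider":      {depths: nil, wantObs: 0},
		"empty queue depth": {depths: map[string]int64{}, wantObs: 0},
		"single revision":   {depths: map[string]int64{"rev-1": 3}, wantObs: 1},
	}
	for name, tc := range tests {
		t.Run(name, func(t *testing.T) {
			// Placeholder assertion; the real test would scrape the meter
			// and compare the observed data points.
			if got := len(tc.depths); got != tc.wantObs {
				t.Errorf("observations = %d, want %d", got, tc.wantObs)
			}
		})
	}
}
```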
pkg/activator/net/throttler.go (outdated)

```go
// Initialize metrics with the throttler as the queue depth provider.
metrics, err := newThrottlerMetrics(mp, t)
if err != nil {
	logger.Warnw("Failed to initialize throttler metrics", "error", err)
```

I understand that there is currently no error passed to callers of NewThrottler, but I don't think that swallowing the error here is the best solution.
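One alternative to swallowing the error, sketched below with stub types standing in for the PR's real Throttler and newThrottlerMetrics, is to let NewThrottler return it so callers decide how to handle a broken metrics setup:

```go
package net

import (
	"fmt"

	"go.opentelemetry.io/otel/metric"
)

// Stubs so the sketch compiles standalone; the real definitions live in the PR.
type throttlerMetrics struct{}

type Throttler struct {
	metrics *throttlerMetrics
}

func newThrottlerMetrics(mp metric.MeterProvider, t *Throttler) (*throttlerMetrics, error) {
	_, _ = mp, t
	return &throttlerMetrics{}, nil
}

// NewThrottler propagates the metrics-initialization error to the caller
// instead of logging and dropping it.
func NewThrottler(mp metric.MeterProvider) (*Throttler, error) {
	t := &Throttler{}
	m, err := newThrottlerMetrics(mp, t)
	if err != nil {
		return nil, fmt.Errorf("initializing throttler metrics: %w", err)
	}
	t.metrics = m
	return t, nil
}
```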
pkg/activator/net/metrics.go (outdated)

```go
// Shutdown unregisters the metrics callback.
func (m *throttlerMetrics) Shutdown() error {
```

Maybe I'm missing something, but it seems this is only called in tests?
pkg/activator/net/metrics.go (outdated)

```go
queueDepth, err := meter.Int64ObservableGauge(
	"kn.activator.queue.depth",
	metric.WithDescription("Number of requests currently queued in the activator for a revision"),
```

The metric name and description use the word "queue", but below we are tracking in-flight requests, which seems a bit different to me. Is this intended?

The terminology "queue" is used in the documentation, and I am open to changing it:

> The activator is part of the data-plane. It is responsible for queueing incoming requests (if a Knative Service is scaled to zero). It communicates with the autoscaler to bring scaled-to-zero Services back up and forward the queued requests. The activator can also act as a request buffer to handle traffic bursts.

Additional details can be found here.
Thank you @vincent-pli, I will wait for @dprotaso to also review and will address all the comments together.
We already have the metric kn.revision.request.concurrency - is this not the metric you're looking for, based on your description?
Thank you Dave. Looking at the code more carefully, I see that ConcurrencyReporter.Handler wraps the activationHandler (which calls throttler.Try), so handleRequestIn/handleRequestOut are called before entering and after exiting the throttler, respectively. This means kn.revision.request.concurrency already captures requests that are waiting in the throttler queue. The difference between the two metrics comes down to aggregation: we want to capture the raw number instead of having it pre-averaged, so we can apply our own aggregations (average/sum/count/max/percentiles) based on specific monitoring needs. Pre-averaged data limits flexibility in how we can analyze queue behavior.
I think it would be easier to add this type of instrumentation where we already produce our spans - then we shouldn't need to mess with our breakers:

serving/pkg/activator/handler/handler.go, line 114 in 11899d4

It might be worth refactoring the spans logic as well into a similar struct that encapsulates both, similar to https://github.com/knative/serving/blob/main/pkg/activator/handler/metrics.go

As for attribute naming, we already have kn.revision.name and k8s.namespace.name, so we should use those. We can/probably should include other attributes if they exist.

As for the metric names - thoughts on kn.revision.request.queued and kn.revision.request.active?
@dprotaso I am OK with the other changes, but from handler.go we can only count requests entering/exiting throttler.Try(). We can't distinguish "waiting in queue" vs "actively executing" without accessing the breaker internals.
@prashanthjos it should work similar to how we measure spans:

```go
// increment queue count
if err := a.throttler.Try(tryContext, revID, func(dest string, isClusterIP bool) error {
	// decrement queue count
	// increment executing count
	// defer decrement executing count
	...
	return nil
}); err != nil {
	// we need to distinguish between failure to proxy and failure to queue
	// decrement queue count if failure to queue
	...
}
```
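For concreteness, here is a compilable version of that sketch under some assumptions: the trier interface stands in for the real throttler (whose Try also takes a revision ID), the instruments and attribute options are passed in, and the executed flag is one way to tell "failed to queue" apart from "failed to proxy":

```go
package handler

import (
	"context"

	"go.opentelemetry.io/otel/metric"
)

// trier abstracts throttler.Try for this sketch; the real signature also
// takes a revision ID.
type trier interface {
	Try(ctx context.Context, fn func(dest string, isClusterIP bool) error) error
}

// tryWithMetrics places the increments/decrements around Try. The callback
// only runs once capacity is granted, so the executed flag distinguishes a
// failure to queue (callback never ran) from a failure to proxy.
func tryWithMetrics(
	ctx context.Context,
	t trier,
	queued, active metric.Int64UpDownCounter,
	opts metric.MeasurementOption,
	proxy func(dest string, isClusterIP bool) error,
) error {
	// Detached context: metric recording must not be cancelled with the request.
	mctx := context.Background()

	queued.Add(mctx, 1, opts)
	executed := false
	err := t.Try(ctx, func(dest string, isClusterIP bool) error {
		executed = true
		// Capacity granted: no longer queued, now active.
		queued.Add(mctx, -1, opts)
		active.Add(mctx, 1, opts)
		defer active.Add(mctx, -1, opts)
		return proxy(dest, isClusterIP)
	})
	if err != nil && !executed {
		// The request never got capacity: undo the queued increment.
		queued.Add(mctx, -1, opts)
	}
	return err
}
```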
@dprotaso I changed the code, please let me know your thoughts.
pkg/activator/handler/handler.go (outdated)

```go
tryContext, trySpan := a.tracer.Start(r.Context(), "throttler_try")

revID := RevIDFrom(r.Context())
rev := RevisionFrom(r.Context())
```

Why do we need rev when it seems like revID has what we need?
pkg/activator/handler/handler.go (outdated)

```go
if err := a.throttler.Try(tryContext, revID, func(dest string, isClusterIP bool) error {
	// Request got capacity - decrement queued, increment active
	a.metrics.requestQueued.Add(r.Context(), -1, metricOpts)
```

Don't use r.Context() when recording metrics - we shouldn't cancel metric recording when the request is cancelled.
/lgtm

Thanks @prashanthjos. cc @linkvt for any post-merge comments.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, prashanthjos

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Description
This PR moves activator queue metrics from the throttler to the handler layer, adding two new metrics that report instantaneous request counts per revision.
Design
Uses UpDownCounter instruments with a callback-based tracking pattern:

- context.Background() is used when recording, to prevent metric recording from being cancelled when requests are cancelled
Metric Details

- kn.revision.request.queued
- kn.revision.request.active

Labels (both metrics; see the creation sketch below):

- kn.revision.name: Revision name
- k8s.namespace.name: Namespace
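For illustration, a minimal sketch of creating these two instruments and their label set with the OpenTelemetry Go API; the constructor, helper names, and metric descriptions are paraphrased rather than the PR's actual code, though the requestQueued field name matches the diff quoted earlier:

```go
package handler

import (
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

type requestMetrics struct {
	requestQueued metric.Int64UpDownCounter
	requestActive metric.Int64UpDownCounter
}

func newRequestMetrics(meter metric.Meter) (*requestMetrics, error) {
	queued, err := meter.Int64UpDownCounter(
		"kn.revision.request.queued",
		metric.WithDescription("Number of requests waiting in the activator for capacity"),
	)
	if err != nil {
		return nil, err
	}
	active, err := meter.Int64UpDownCounter(
		"kn.revision.request.active",
		metric.WithDescription("Number of requests currently being handled for a revision"),
	)
	if err != nil {
		return nil, err
	}
	return &requestMetrics{requestQueued: queued, requestActive: active}, nil
}

// revisionAttrs builds the label set listed above; pass the result as the
// AddOption on every Add call for a given revision.
func revisionAttrs(revision, namespace string) metric.MeasurementOption {
	return metric.WithAttributes(
		attribute.String("kn.revision.name", revision),
		attribute.String("k8s.namespace.name", namespace),
	)
}
```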
Testing

- pkg/activator/handler and pkg/activator/net tests pass

release-note
Add kn.revision.request.queued and kn.revision.request.active metrics to monitor instantaneous request counts per revision in the activator