@dejanzele
Member

What type of PR is this?

Enhancement

What this PR does / why we need it

Adds per-pool scheduling metrics to track success/failure outcomes for each pool independently.

Currently, a scheduling failure in any one pool fails the entire scheduling cycle with a single error, so the failing pool cannot be told apart from the metrics alone. The new metrics enable:

  1. Identifying which pool is failing
  2. Alerting on specific pool failures
  3. Tracking pool health over time

New metrics:

  • armada_scheduler_pool_scheduling_outcome - counter with labels pool and outcome (success/failure)
  • armada_scheduler_pool_scheduling_errors - counter with labels pool and error_type (context_creation/schedule/upsert)
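
For reviewers, here is a rough sketch of how the two counters are meant to be recorded per pool. It is illustrative only and is not the code in this PR. PoolMetrics is a hypothetical in-memory stand-in for the Prometheus counters, and PoolStages, StageError and SchedulePools are made-up names. The sketch also assumes that the cycle still stops at the first failing pool, as it does today.

```go
package main

import "fmt"

// Label values taken from the PR description.
const (
	OutcomeSuccess = "success"
	OutcomeFailure = "failure"

	ErrTypeContextCreation = "context_creation"
	ErrTypeSchedule        = "schedule"
	ErrTypeUpsert          = "upsert"
)

// labels is a hypothetical stand-in for a two-label Prometheus label set.
type labels struct{ pool, value string }

// PoolMetrics is a hypothetical in-memory stand-in for the two counters.
// It is not safe for concurrent use.
type PoolMetrics struct {
	outcomes map[labels]int // armada_scheduler_pool_scheduling_outcome{pool, outcome}
	errors   map[labels]int // armada_scheduler_pool_scheduling_errors{pool, error_type}
}

func NewPoolMetrics() *PoolMetrics {
	return &PoolMetrics{outcomes: map[labels]int{}, errors: map[labels]int{}}
}

// Outcome returns the outcome counter value for one pool and outcome.
func (m *PoolMetrics) Outcome(pool, outcome string) int {
	return m.outcomes[labels{pool, outcome}]
}

// Errors returns the error counter value for one pool and error type.
func (m *PoolMetrics) Errors(pool, errorType string) int {
	return m.errors[labels{pool, errorType}]
}

// StageError (hypothetical) records which pool and stage failed.
type StageError struct {
	Pool, Stage string
	Err         error
}

func (e *StageError) Error() string {
	return fmt.Sprintf("pool %q: %s: %v", e.Pool, e.Stage, e.Err)
}

func (e *StageError) Unwrap() error { return e.Err }

// PoolStages (hypothetical) holds the three per-pool steps named by error_type.
type PoolStages struct {
	CreateContext, Schedule, Upsert func(pool string) error
}

// SchedulePools runs the stages for each pool in order and records per-pool
// metrics. Assumption: the first failure still aborts the whole cycle.
func SchedulePools(pools []string, s PoolStages, m *PoolMetrics) error {
	steps := []struct {
		errorType string
		run       func(pool string) error
	}{
		{ErrTypeContextCreation, s.CreateContext},
		{ErrTypeSchedule, s.Schedule},
		{ErrTypeUpsert, s.Upsert},
	}
	for _, pool := range pools {
		for _, step := range steps {
			if err := step.run(pool); err != nil {
				m.errors[labels{pool, step.errorType}]++
				m.outcomes[labels{pool, OutcomeFailure}]++
				return &StageError{Pool: pool, Stage: step.errorType, Err: err}
			}
		}
		m.outcomes[labels{pool, OutcomeSuccess}]++
	}
	return nil
}
```

In this sketch a failing pool increments both counters once, so the outcome counter answers "which pool failed" and the errors counter answers "at which stage".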

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

@dejanzele dejanzele force-pushed the feat/pool-scheduling-metrics branch 4 times, most recently from dc32c4d to 1ad6afe Compare December 23, 2025 15:26
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
@dejanzele dejanzele force-pushed the feat/pool-scheduling-metrics branch from 1ad6afe to 2f736dd Compare December 23, 2025 17:15