fix(metrics): standardize dataset labels and correct GC metric names #2027

Merged: Theodus merged 3 commits into main from theodus/metrics on Mar 25, 2026
@Theodus Theodus commented Mar 25, 2026

Replace split dataset_name/dataset_namespace labels with a single dataset label formatted as namespace/name for consistent cardinality across components.

  • Standardize dataset identification across components to a single dataset = namespace/name label, replacing the inconsistent mix of dataset_name, dataset_namespace, and dataset labels across server and worker-core
  • Fix copy-paste bugs where expired_files_found was registered as "files_not_found" (colliding with the real counter) and expired_entries_deleted was registered as "files_failed_to_delete"
  • Align docs/metrics.md with actual code: add dataset label to all query/streaming metrics, replace stale version with manifest_hash, and add missing manifest_hash/job_id labels to dump, compaction, and GC metrics
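The label convention described above can be sketched as follows. This is a minimal illustration, not code from the PR: `dataset_label` and `DATASET_LABEL_KEY` are hypothetical names, and the real `dataset_kvs` / `base_kvs` builders presumably use OTel key-value types rather than plain tuples.

```rust
/// Label key that every component is meant to share after this change.
/// (Hypothetical constant; the actual code may inline the string.)
const DATASET_LABEL_KEY: &str = "dataset";

/// Builds the single `dataset` label as `namespace/name`,
/// replacing the former `dataset_namespace` + `dataset_name` pair.
/// Hypothetical helper for illustration only.
fn dataset_label(namespace: &str, name: &str) -> (&'static str, String) {
    (DATASET_LABEL_KEY, format!("{namespace}/{name}"))
}
```

For example, `dataset_label("acme", "blocks")` yields the key `dataset` with the value `acme/blocks`. Emitting one label instead of two keeps the label set identical across server and worker-core, so queries can join on a single key.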

Replace split dataset_name/dataset_namespace labels with a single `dataset`
label formatted as `namespace/name` for consistent cardinality across components.

- Drop redundant `dataset_name` label from `dataset_kvs` in server metrics
- Remove `dataset_name` from three inline `kv` arrays in `flight.rs`
- Replace `dataset_name` + `dataset_namespace` pair in `worker-core` `base_kvs`
with a single `dataset` = `namespace/name` label
- Update `docs/metrics.md` to reflect `dataset` label on all query and
streaming metrics (previously incorrectly documented as "Labels: None")
… with code

Fix copy-paste errors where `expired_files_found` and `expired_entries_deleted`
were registered with wrong OTel metric names, causing silent collisions and
missing data in Prometheus.

- Rename `expired_files_found` metric from `"files_not_found"` to
`"expired_files_found"` (was colliding with the actual `files_not_found`
counter)
- Rename `expired_entries_deleted` metric from `"files_failed_to_delete"` to
`"expired_entries_deleted"` with corrected description
- Replace stale `version` label with `manifest_hash` across all dump/extraction
metric docs
- Add missing `manifest_hash` and `job_id` labels to compaction and GC metric
docs
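One way to catch this class of copy-paste error early is to reject duplicate metric names at registration time. The sketch below uses only the standard library; `MetricNames` and `register` are hypothetical and are not part of the OTel API or of this codebase.

```rust
use std::collections::HashSet;

/// Hypothetical guard that tracks which metric names are already taken.
#[derive(Default)]
struct MetricNames {
    seen: HashSet<&'static str>,
}

impl MetricNames {
    /// Records `name` and returns it, or returns an error naming the
    /// duplicate if the same name was registered before.
    fn register(&mut self, name: &'static str) -> Result<&'static str, String> {
        if self.seen.insert(name) {
            Ok(name)
        } else {
            Err(format!("duplicate metric name: {name}"))
        }
    }
}
```

With such a guard, registering `"files_not_found"` a second time (as the old `expired_files_found` registration effectively did) would return an error instead of silently colliding, while the corrected name `"expired_files_found"` would register cleanly.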
@Theodus Theodus changed the title from "refactor(server): standardize dataset metric label to namespace/name" to "fix(metrics): standardize dataset labels and correct GC metric names" Mar 25, 2026

@LNSD LNSD left a comment

LGTM ✅

@Theodus Theodus merged commit 83fc175 into main Mar 25, 2026
8 checks passed
@Theodus Theodus deleted the theodus/metrics branch March 25, 2026 18:13