feat(metrics): Add per-component CPU usage metric #25185

Draft
gwenaskell wants to merge 3 commits into master from yoenn.burban/OPA-5012-add-per-component-cpu-metric

Conversation

@gwenaskell
Contributor

Summary

This PR adds a new metric reporting the CPU time consumed by sync/function transforms, to get a better measurement of their CPU usage.

See the RFC for more details.

Vector configuration

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

  • Closes: OPA-5012

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@github-actions (bot) added the "domain: topology" (Anything related to Vector's topology code) and "domain: rfc" labels on Apr 14, 2026

Operators use it exactly like `process_cpu_seconds_total` from the Prometheus
ecosystem:

```promql
# [...]
```

[...] CPU, excluding preemption, involuntary context switches, and any time another
process used the core.

**Linux and macOS — `clock_gettime(CLOCK_THREAD_CPUTIME_ID)`**
```rust
use std::time::Duration;

#[cfg(any(target_os = "linux", target_os = "macos"))]
fn thread_cpu_time() -> Duration {
    let mut ts = libc::timespec { tv_sec: 0, tv_nsec: 0 };
    // SAFETY: ts is a valid pointer to a timespec struct and
    // CLOCK_THREAD_CPUTIME_ID is a valid clock id on Linux >= 2.6
    // and macOS >= 10.12.
    unsafe {
        libc::clock_gettime(libc::CLOCK_THREAD_CPUTIME_ID, &mut ts);
    }
    // The excerpt is cut off after the call above; this conversion is an
    // assumed completion, not necessarily the PR's exact code.
    Duration::new(ts.tv_sec as u64, ts.tv_nsec as u32)
}
```

poll, the delta between two calls around `transform_all` gives exact CPU time
consumed by that transform invocation.
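
A minimal sketch of that delta measurement, assuming a 64-bit Linux or macOS target. To stay dependency-free it declares `clock_gettime` by hand rather than using the `libc` crate as the RFC excerpt does. The `Timespec` binding, the clock-id constants, and the `measure_cpu` helper are illustrative names, not part of the PR.

```rust
use std::time::Duration;

// Hand-written binding (assumption: 64-bit target, where both fields are i64).
#[repr(C)]
struct Timespec {
    tv_sec: i64,
    tv_nsec: i64,
}

// Platform values of CLOCK_THREAD_CPUTIME_ID.
#[cfg(target_os = "linux")]
const CLOCK_THREAD_CPUTIME_ID: i32 = 3;
#[cfg(target_os = "macos")]
const CLOCK_THREAD_CPUTIME_ID: i32 = 16;

unsafe extern "C" {
    fn clock_gettime(clock_id: i32, tp: *mut Timespec) -> i32;
}

/// CPU time consumed so far by the calling thread.
fn thread_cpu_time() -> Duration {
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    // SAFETY: `ts` is a valid, writable timespec for the duration of the call.
    let rc = unsafe { clock_gettime(CLOCK_THREAD_CPUTIME_ID, &mut ts) };
    assert_eq!(rc, 0, "clock_gettime failed");
    Duration::new(ts.tv_sec as u64, ts.tv_nsec as u32)
}

/// Hypothetical helper: runs `work` on the current thread and returns its
/// result together with the thread CPU time spent inside it.
fn measure_cpu<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = thread_cpu_time();
    let out = work();
    let elapsed = thread_cpu_time().saturating_sub(start);
    (out, elapsed)
}
```

Wrapping a synchronous call as `measure_cpu(|| transform(batch))` (with `transform` and `batch` standing in for real code) yields the CPU time of that one invocation. Time the thread spends sleeping or descheduled is not counted, which is the property the RFC relies on.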

**Overhead:** On Linux, `clock_gettime(CLOCK_THREAD_CPUTIME_ID)` is
```rust
/// Returns the CPU time consumed by the calling thread.
///
/// On Linux and macOS, uses clock_gettime(CLOCK_THREAD_CPUTIME_ID) (nanosecond precision).
```

CPU-bottlenecked vs. backpressure-limited, enabling informed tuning.
- **Composable with existing metrics.** `rate(cpu_seconds[1m])` gives CPU
cores used; dividing by `utilization` separates CPU from pipeline effects.
- **Low overhead.** Two `clock_gettime` calls per batch (~80ns total on Linux)
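
To make the "CPU cores used" reading concrete, here is a small sketch of the arithmetic behind a rate over a cumulative CPU-seconds counter. The function name and parameters are hypothetical, and PromQL's `rate()` additionally handles extrapolation and multiple samples, which this sketch ignores.

```rust
/// Average number of CPU cores used between two samples of a monotonically
/// increasing CPU-seconds counter. Returns `None` for a non-positive
/// interval or when the counter went backwards (for example after a restart).
fn cores_used(prev_cpu_secs: f64, curr_cpu_secs: f64, interval_secs: f64) -> Option<f64> {
    if interval_secs <= 0.0 || curr_cpu_secs < prev_cpu_secs {
        return None;
    }
    Some((curr_cpu_secs - prev_cpu_secs) / interval_secs)
}
```

For example, a counter that moves from 120 s to 150 s over a 60 s window corresponds to 30 / 60 = 0.5 cores on average.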