feat(sources): add source latency metric #24987

Open
gwenaskell wants to merge 9 commits into master from yoenn.burban/OPA-4855-add-source-latency-metric

gwenaskell (Contributor) commented Mar 23, 2026

Summary

This PR generalizes the http_server_handler_duration_seconds metric to all sources: it adds metrics that show how long batches of events take to be handled, to help debug latency and backpressure issues.

It adds two distributions, source_send_latency_seconds and source_send_batch_latency_seconds, which record the time spent waiting for a single event or array of events, and for a full payload, respectively, to be pulled by the buffer.

For simplicity, these metrics are added directly to the source_sender's Output object, so every existing source gets them transparently. As a consequence, the metrics account neither for the processing time spent in the source itself (decoding and enriching the events) nor for the time elapsed before an ack is returned. The former is usually difficult to record without a complex refactoring of the source implementation, because decoding is buried in the source reader channel; however, the compute time used by the source itself is usually small compared to the delay caused by downstream latency. The latter (acks) might be significant when using E2E acks. Since not all sources support acknowledgments, and those that do implement them differently, measuring it will probably require case-by-case implementation.
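To make the two measurements concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes a blocking `std::sync::mpsc` bounded channel in place of Vector's async buffer, and the `Histogram` and `Output` types below are hypothetical stand-ins that only store samples. The per-chunk timer wraps a single channel send, while the batch timer wraps the whole chunking loop.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a histogram: it only stores the observed samples.
#[derive(Default)]
struct Histogram {
    samples: Vec<Duration>,
}

impl Histogram {
    fn record(&mut self, elapsed: Duration) {
        self.samples.push(elapsed);
    }
}

/// Hypothetical, simplified output: a bounded channel plus two histograms.
struct Output {
    tx: SyncSender<Vec<u64>>,
    send_latency: Histogram,
    send_batch_latency: Histogram,
}

impl Output {
    /// Sends one chunk and records how long the blocking send took.
    fn send(&mut self, chunk: Vec<u64>) {
        let start = Instant::now();
        self.tx.send(chunk).expect("receiver dropped");
        self.send_latency.record(start.elapsed());
    }

    /// Splits a payload into chunks, sends each one, and records the total time.
    /// `chunk_size` must be non-zero.
    fn send_batch(&mut self, events: Vec<u64>, chunk_size: usize) {
        let start = Instant::now();
        for chunk in events.chunks(chunk_size) {
            self.send(chunk.to_vec());
        }
        self.send_batch_latency.record(start.elapsed());
    }
}

/// Sends 10 events in chunks of 4 to a slow consumer and returns
/// (per-chunk samples, per-batch samples, events received).
fn demo() -> (usize, usize, usize) {
    // Capacity 1 so that the slow consumer creates backpressure on the sender.
    let (tx, rx) = sync_channel::<Vec<u64>>(1);
    let consumer = std::thread::spawn(move || {
        let mut received = 0;
        for chunk in rx {
            std::thread::sleep(Duration::from_millis(5));
            received += chunk.len();
        }
        received
    });

    let mut out = Output {
        tx,
        send_latency: Histogram::default(),
        send_batch_latency: Histogram::default(),
    };
    out.send_batch((0..10).collect(), 4);
    let chunk_samples = out.send_latency.samples.len();
    let batch_samples = out.send_batch_latency.samples.len();

    // Dropping the output closes the channel so the consumer loop ends.
    drop(out);
    (chunk_samples, batch_samples, consumer.join().unwrap())
}
```

Calling `demo()` yields three per-chunk samples and one per-batch sample for the same payload, which mirrors the "single array/event" versus "full payload" distinction described above.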

Vector configuration

n/a

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

  • Closes: OPA-4855

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@github-actions github-actions bot added the domain: core Anything related to core crates i.e. vector-core, core-common, etc label Mar 23, 2026
@gwenaskell gwenaskell force-pushed the yoenn.burban/OPA-4855-add-source-latency-metric branch from 3229c6c to 7b7b37a Compare April 13, 2026 14:53
@gwenaskell gwenaskell force-pushed the yoenn.burban/OPA-4855-add-source-latency-metric branch from 7b7b37a to 7646d06 Compare April 13, 2026 14:55
@gwenaskell gwenaskell marked this pull request as ready for review April 13, 2026 14:58
@gwenaskell gwenaskell requested a review from a team as a code owner April 13, 2026 14:58
pront (Member) commented Apr 13, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7646d065be

@gwenaskell gwenaskell requested a review from a team as a code owner April 14, 2026 08:23
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Apr 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e287c34ee

Comment on lines +258 to 262
    let send_batch_start = Instant::now();

    for events in array::events_into_arrays(events, Some(CHUNK_SIZE)) {
        self.send(events, &mut unsent_event_count)
            .await

P2: Measure batch latency at channel-send boundaries

source_send_batch_latency_seconds is intended (and documented) as downstream channel blocking time, but the timer starts before the loop and wraps self.send(...), which includes per-event lag metric emission and metadata mutation before send_with_timeout. On large/complex batches, this inflates the histogram with source-side CPU work and can mislead backpressure debugging or alerting even when channel wait time is unchanged.

gwenaskell (Contributor, Author) replied:

Yes, we discussed that in the previous comment. I think it is good to have a measurement that encompasses most of the logic shared across all sources, to get a better idea of how slow a source is to respond. We can combine source_send_latency_seconds and source_send_batch_latency_seconds to measure the cost of that source-side CPU work alone.
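One possible reading of "combine" is a simple subtraction: the batch timer covers the whole loop, the per-send timer covers only the channel wait, so the difference approximates the shared source-side work. The sketch below illustrates that reading under stated assumptions: the `Scrape` struct and its field names are hypothetical, and it assumes both histograms expose a cumulative sum in seconds (as Prometheus-style `_sum` series do).

```rust
/// Hypothetical snapshot of the cumulative sums (in seconds) of the two histograms.
#[derive(Clone, Copy)]
struct Scrape {
    send_latency_sum: f64,
    send_batch_latency_sum: f64,
}

/// Estimates the time spent outside the channel send between two scrapes,
/// assuming the batch timer fully encloses every per-send timer.
fn overhead_seconds(prev: Scrape, curr: Scrape) -> f64 {
    let batch = curr.send_batch_latency_sum - prev.send_batch_latency_sum;
    let send = curr.send_latency_sum - prev.send_latency_sum;
    // Clamp at zero: timer jitter could otherwise yield a tiny negative value.
    (batch - send).max(0.0)
}
```

For example, if the batch sum grew by 2.5 s and the per-send sum by 2.0 s over the same interval, the estimate is 0.5 s of source-side work.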
