@yeya24 yeya24 commented Jan 31, 2026

What this PR does:

After enabling remote write v2, I am seeing the following errors from my Prometheus server.

time=2026-01-30T02:17:34.780Z level=ERROR source=queue_manager.go:1723 msg="non-recoverable error" component=remote remote_name=16566e url=https://distributor.cortex:80/api/v1/push failedSampleCount=1964 failedHistogramCount=36 failedExemplarCount=0 err="sent v2 request with 1964 samples, 36 histograms and 0 exemplars; got 2xx, but PRW 2.0 response header statistics indicate 0 samples, 0 histograms and 0 exemplars were accepted; assumining failure e.g. the target only supports PRW 1.0 prometheus.WriteRequest, but does not check the Content-Type header correctly"

After investigation, those requests turned out to be HA deduped, so the samples were counted as failed. This happens because HA dedup is treated as an error internally, and we return stats of 0 samples written for HA deduped requests. That goes against the sender's expectation: Prometheus treats those samples as failed to send, and it won't retry because the response is a 202 Accepted HTTP status.

To fix this, this PR sets the sample stats in the response to the number of samples in the remote write v2 request when the request is HA deduped.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

…plicated

Signed-off-by: yeya24 <benye@amazon.com>