Reuse grpc buffer in querier's store-gateway stream#7519
Conversation
way stream Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
| // | ||
| // With SeriesResponse implementing ReleasableMessage, calling Free() after each Recv() | ||
| // returns the unmarshal buffer to the pool, reducing per-message allocations by ~32KB. | ||
| func BenchmarkGrpcStoreGatewayCalls(b *testing.B) { |
There was a problem hiding this comment.
This seems not the right place for benchmarking store gateways. Can you move to its own folder?
|
|
||
| // MessageWithBufRef holds a reference to gRPC unmarshal buffers for explicit lifecycle management. | ||
| // It satisfies cortexpb.ReleasableMessage via structural typing. | ||
| type MessageWithBufRef struct { |
There was a problem hiding this comment.
We shouldn't change Thanos via vendor
There was a problem hiding this comment.
I'll change upstream if this helps, still in process of testing and wanted to get opinions on draft pr since I'm not very familiar with grpc
|
I think we should also apply this fix in distributor_queryable.go when querying from ingesters. I remember seeing a similar kind of heap profile for ingester queries as well. |
| } | ||
|
|
||
| // Buffer reference for explicit memory management via cortexCodec. | ||
| cortexpb.MessageWithBufRef Ref = 1001 [(gogoproto.embed) = true, (gogoproto.customtype) = "github.com/cortexproject/cortex/pkg/cortexpb.MessageWithBufRef", (gogoproto.nullable) = false]; |
There was a problem hiding this comment.
This will a problem. Can we avoid having upstream changes in Thanos. We can't have this circular dependency with Thanos importing Cortex.
There was a problem hiding this comment.
Could you scope this PR down to just the detach? That alone fixes the OOM (buffer gets GC'd once nothing references it). The pool-reuse part needs the proto changes which create a circular dep with Thanos.
In parallel we can open an upstream thanos PR to add a generic BufRef to SeriesResponse — once vendored back, we can swap detach for Free() to get the alloc wins.
What this PR does:
Reuse grpc buffer in querier's store-gateway stream to reduce memory usage in large queries
Current heap dump during heavy queries:

GRPC Store-Gateway Bench Test NEW:
GRPC Store-Gateway Bench Test OLD:
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]docs/configuration/v1-guarantees.mdupdated if this PR introduces experimental flags