[metrics] Support DDSketch in the parquet pipeline#6257
[metrics] Support DDSketch in the parquet pipeline#6257
Conversation
db0e0db to
86c034b
Compare
c3fc790 to
2261237
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 727f085864
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
i cannot review this PR due to lack of context. I know what DDSketch are, but I do not know what they are used for in the context the metrics ingestion pipeline, why they are stored in different files, etc. |
2261237 to
6e7d6a9
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e7d6a90ae
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
6e7d6a9 to
7490fe3
Compare
|
|
||
| let keys_inner = keys_builder.values(); | ||
| for &k in &dp.keys { | ||
| keys_inner.append_value(k); |
There was a problem hiding this comment.
Where do we store the length of keys/counts? I.e. how do we recover the number of elements per data_point from _inner arrays?
There was a problem hiding this comment.
they are stored implicitly by Arrow - it stores an array of offsets, so you can get the length of row i with offsets[i+1] - offsets[i]
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 37f4298f66
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a38c47b7c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dff8e08fa0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40cfe5a158
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for metadata in &splits_metadata { | ||
| let insertable = | ||
| InsertableMetricsSplit::from_metadata(metadata, MetricsSplitState::Staged) | ||
| .map_err(|e| MetastoreError::JsonSerializeError { | ||
| struct_name: "MetricsSplitMetadata".to_string(), | ||
| message: e.to_string(), | ||
| })?; | ||
| let insertable = InsertableParquetSplit::from_metadata(metadata, SplitState::Staged) | ||
| .map_err(|err| MetastoreError::JsonSerializeError { |
There was a problem hiding this comment.
Reject split-kind mismatches during parquet staging
stage_parquet_splits_impl chooses the destination table from the RPC (kind) but never verifies that each ParquetSplitMetadata.kind matches that table. A mixed or misrouted request can therefore persist sketch metadata into metrics_splits (or vice versa), violating table invariants and causing downstream routing/listing logic that relies on split kind to query the wrong API/table. Add an explicit metadata.kind == kind check and fail fast on mismatch before building insert rows.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a35fdb8600
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
a35fdb8 to
3bbfb7f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3bbfb7ff26
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3bbfb7ff26
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d430881b0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let index_uid = request.index_uid().clone(); | ||
| let splits_metadata = request.deserialize_splits_metadata()?; | ||
|
|
There was a problem hiding this comment.
Enforce request index UID for staged sketch metadata
stage_sketch_splits deserializes and forwards splits_metadata_json without checking that each ParquetSplitMetadata.index_uid matches request.index_uid, so a caller can submit StageSketchSplitsRequest.index_uid = A while embedding metadata for B. In file-backed mode this stores rows under index A whose serialized metadata claims B, which breaks request-scoped invariants and can cause later list/publish responses to carry inconsistent index identities; this path should validate equality (or overwrite metadata index UID from the request) before staging.
Useful? React with 👍 / 👎.
1d43088 to
ad84838
Compare
ad84838 to
0655391
Compare
Description
This PR can be reviewed commit by commit.
This PR updates the parquet pipeline to process DDSketches. See https://datadoghq.atlassian.net/wiki/spaces/QKHS/pages/6291357728/DDSketch+in+Parquet for more information about the DDSketch spec.
How was this PR tested?
Describe how you tested this PR.