Antalya 26.1 Backport of #96620 - Iceberg partitioing fix#1579
Antalya 26.1 Backport of #96620 - Iceberg partitioing fix#1579zvonand merged 1 commit intoantalya-26.1from
Conversation
…g_fix Iceberg partitioing fix
|
AI audit note: This review comment was generated by AI (claude-4.6-sonnet-medium-thinking). Audit update for PR #1579 (Iceberg partitioning fix — ChunkPartitioner scatter selector): Confirmed defects:
Coverage summary:
|
|
Just another audit review from the QA side - does this require an open issue or it's not related and we can leave it as is? @mkmkme AI audit note: This review comment was generated by AI (gpt-5.3-codex). Audit update for PR #1579 (Iceberg partitioning fix): Confirmed defectsLow: Removed preallocation for partitioned result vectorImpact: Extra dynamic reallocations while building per-partition chunks; no correctness break, but avoidable CPU/memory overhead for high partition cardinality inserts. Anchor: Trigger: Any Affected transition: Why defect: The previous code reserved Affected subsystem and blast radius: Iceberg write path partition fan-out; impacts write-time efficiency only, especially on large partition fan-out blocks. Smallest logical reproduction steps: Create an Iceberg table with partition transform yielding many partitions in one insert block, insert a block that maps rows across multiple partitions, then observe Logical fault-injection mapping: Inject high partition cardinality (many unique partition keys) to amplify vector growth behavior. Fix direction (short): Add Regression test direction (short): Add a micro-benchmark/unit perf guard for partition fan-out path to assert no repeated Code evidence: // PR #1579 removes the earlier reserve call after moving declaration scope.
std::vector<std::pair<ChunkPartitioner::PartitionKey, Chunk>> result;
for (auto && [key, partition_columns] : result_columns)
{
size_t column_size = partition_columns[0]->size();
result.push_back({key, Chunk(std::move(partition_columns), column_size)});
}Coverage summaryScope reviewed: PR diff for Categories failed: Performance/resource handling (avoidable vector reallocation in final materialization stage). Categories passed: Root correctness fix (hashing transformed partition columns), lifetime/ownership safety ( Assumptions/limits: Static audit only; runtime perf quantification was not executed, and the new integration test path is limited to |
It doesn't require an open issue, but it is related. As mentioned in my self-report above, it's not a logical error and only leads to a small potential performance penalty. Also it's a thing in the original PR. For the sake of the backport, we prefer being 1-to-1 aligned with the upstream. |
PR #1579 — CI verification
New vs base (upstream "Checks New Fails") — NOT PR-related:
Regression New Fails — NOT PR-related:
Known Fails (45 BROKEN — all pre-existing):All catalogued under "Checks Known Fails" with established reasons:
Infrastructure:
Passing suites of note:
ConclusionNo failures are caused by this PR. The single "new fail" ( |
Iceberg partitioing fix
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Iceberg partitioing fix (ClickHouse#96620 by @scanhex12)
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: