Antalya 26.1 Backport of #96620 - Iceberg partitioning fix #1579

Merged
zvonand merged 1 commit into antalya-26.1 from backports/antalya-26.1/96620
Mar 27, 2026

Conversation

@mkmkme (Collaborator) commented Mar 26, 2026

Iceberg partitioning fix

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Iceberg partitioning fix (ClickHouse#96620 by @scanhex12)

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

Workflow [PR], commit [dd35d60]

@mkmkme (Collaborator, Author) commented Mar 26, 2026

AI audit note: This review comment was generated by AI (claude-4.6-sonnet-medium-thinking).

Audit update for PR #1579 (Iceberg partitioning fix — ChunkPartitioner scatter selector):

Confirmed defects:

  • Low: Missing result.reserve() after variable relocation
    • Impact: Extra vector reallocations on large partition counts; no correctness impact.
    • Anchor: ChunkPartitioner.cpp / ChunkPartitioner::partitionChunk
    • Trigger: Any partitionChunk call with >1 partition.
    • Why defect: The original code had result.reserve(result_columns.size()) immediately before the fill loop. The fix moves result's declaration to before buildScatterSelector (where result_columns.size() is not yet known) and drops the reserve entirely rather than re-adding it just before the fill loop, yielding O(log N) reallocations instead of none.
    • Fix direction: Add result.reserve(result_columns.size()) directly before the for (auto && [key, partition_columns] : result_columns) fill loop.
    • Note: This behaviour is aligned with upstream PR Iceberg partitioning fix ClickHouse/ClickHouse#96620: the diff there is byte-for-byte identical, including the missing reserve. The fix should be applied upstream first and then picked up in a follow-on backport.

Coverage summary:

  • Scope reviewed: ChunkPartitioner.cpp (partitionChunk), ChunkPartitioner.h, IcebergWrites.cpp (construct + consume paths), Mutations.cpp (getPartitionedChunks), buildScatterSelector (MergeTreeDataWriter.cpp), and new integration test.
  • Categories failed: result.reserve() regression (Low).
  • Categories passed: Root bug fix correctness; ownership lifetime safety (functions_columns holding IColumn::Ptr keeps raw_columns valid through buildScatterSelector); exception safety; no variable shadowing (outer result declared after the transform loop); empty-functions guard (ChunkPartitioner constructed only when partition spec fields > 0 at all call sites — IcebergWrites.cpp:704, Mutations.cpp:600 — so raw_columns is never empty at buildScatterSelector); thread safety; data correctness (buildScatterSelector now hashes partition-key columns, not raw data columns, producing the correct N-group scatter).
  • Assumptions/limits: Static analysis only; format_version=1 path not covered by the new test (parametrized on format_version=2 only).


@ianton-ru ianton-ru left a comment


LGTM

@Selfeer (Collaborator) commented Mar 27, 2026

Just another audit review from the QA side - does this require an open issue, or is it unrelated so we can leave it as is? @mkmkme

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1579 (Iceberg partitioning fix):

Confirmed defects

Low: Removed preallocation for partitioned result vector

Impact: Extra dynamic reallocations while building per-partition chunks; no correctness break, but avoidable CPU/memory overhead for high partition cardinality inserts.

Anchor: src/Storages/ObjectStorage/DataLakes/Iceberg/ChunkPartitioner.cpp / ChunkPartitioner::partitionChunk.

Trigger: Any partitionChunk() execution with more than one produced partition.

Affected transition: partitioned mutable columns -> final vector<PartitionKey, Chunk> materialization.

Why defect: The previous code reserved result capacity from result_columns.size() before the push-back loop. This PR moves result declaration earlier and drops that reserve, reintroducing growth reallocations.

Affected subsystem and blast radius: Iceberg write path partition fan-out; impacts write-time efficiency only, especially on large partition fan-out blocks.

Smallest logical reproduction steps: Create an Iceberg table with partition transform yielding many partitions in one insert block, insert a block that maps rows across multiple partitions, then observe result.push_back(...) growth without prior reserve(result_columns.size()).

Logical fault-injection mapping: Inject high partition cardinality (many unique partition keys) to amplify vector growth behavior.

Fix direction (short): Add result.reserve(result_columns.size()) immediately before the loop that pushes finalized partition chunks.

Regression test direction (short): Add a micro-benchmark/unit perf guard for partition fan-out path to assert no repeated result capacity growth under known partition counts.

Code evidence:

// PR #1579 removes the earlier reserve call after moving declaration scope.
std::vector<std::pair<ChunkPartitioner::PartitionKey, Chunk>> result;
for (auto && [key, partition_columns] : result_columns)
{
    size_t column_size = partition_columns[0]->size();
    result.push_back({key, Chunk(std::move(partition_columns), column_size)});
}

Coverage summary

Scope reviewed: PR diff for ChunkPartitioner::partitionChunk and test_writes_with_partitioned_table_count_partitions, plus call-chain touchpoints in IcebergWrites/Mutations and selector logic in buildScatterSelector.

Categories failed: Performance/resource handling (avoidable vector reallocation in final materialization stage).

Categories passed: Root correctness fix (hashing transformed partition columns), lifetime/ownership safety (functions_columns retains transformed columns across selector build), empty-partitioner guard at call sites, exception-safety/partial-update behavior in touched flow, concurrency/locking impact (no new shared mutable state), C++ UB/race/invalidation checks in changed path.

Assumptions/limits: Static audit only; runtime perf quantification was not executed, and the new integration test path is limited to format_version=2, storage_type=s3.

@mkmkme (Collaborator, Author) commented Mar 27, 2026

Just another audit review from the QA side - does this require an open issue, or is it unrelated so we can leave it as is? @mkmkme

It doesn't require an open issue, but it is related. As mentioned in my self-report above, it's not a logical error and only leads to a small potential performance penalty. It is also present in the original PR.

For the sake of the backport, we prefer being 1-to-1 aligned with the upstream.

@Selfeer (Collaborator) commented Mar 27, 2026

PR #1579 — CI verification

New vs base (upstream "Checks New Fails") — NOT PR-related:

  • test_storage_delta/test_cdf.py::test_cdf[] (Integration amd_binary 2/5)
    Error: Code: 742. DELTA_KERNEL_ERROR — "Expected the first commit to have version 2, got None" in DeltaLake::TableChanges::getTableChanges().
    This is a Delta Lake CDF test, not an Iceberg test. The PR only touches Iceberg ChunkPartitioner.cpp and Iceberg-specific integration tests — zero overlap with Delta Lake code paths. The test did not appear as a failure on PR Antalya 26.1 Backport of #96191 - Introduce async prefetch and staleness for Iceberg metadata #1575 (same base branch, merged just prior). Likely a pre-existing flake or sensitivity to test data ordering in Delta kernel. Not caused by this PR.

Regression New Fails — NOT PR-related:

  • /swarms/feature/node failure/check restart clickhouse on swarm node (x86_64 + aarch64)
    /swarms/feature/node failure/network failure (aarch64 only)
    Swarms node-restart / network-failure scenarios — infra-level tests, same pattern seen on PR Fix export task not being killed during s3 outage #1564 and other PRs. No relation to Iceberg partitioning.

  • /parquet/datatypes/unsupportednull (x86_64 + aarch64)
    Same parquet regression failure seen across multiple PRs on this branch (also failed on PR Fix export task not being killed during s3 outage #1564). Pre-existing flake. No relation to Iceberg.

Known Fails (45 BROKEN — all pre-existing):

All catalogued under "Checks Known Fails" with established reasons:

  • test_backup_restore_on_cluster — NETLINK_ERROR (multiple shards/arches)
  • test_dirty_pages_force_purge — KNOWN #1369
  • test_s3_cache_locality — INVESTIGATE timeout
  • 02815_no_throw_in_simple_queries — KNOWN unstable upstream
  • 03206_no_exceptions_clickhouse_local — KNOWN unstable upstream
  • 03441_deltalake_clickhouse_public_datasets — INVESTIGATE S3 unreachable
  • Plus additional entries across multiple stateless / integration shards (see report)

Infrastructure:

  • Grype (Alpine image): 1 high/critical CVE on Alpine tag; non-Alpine Grype passed. Not a functional issue.

Passing suites of note:

  • All stateless test suites: green (all 16 shards)
  • All integration suites except amd_binary 2/5: green (14/15 shards)
  • S3 export partition (x86_64 + aarch64): green
  • Parquet minio + parquet AWS S3: green
  • All builds, fuzzers, stress tests, unit tests: green

Conclusion

No failures are caused by this PR. The single "new fail" (test_cdf Delta Lake) is in an unrelated code path (Delta, not Iceberg). Regression failures (swarms, parquet unsupportednull) are recurring across multiple PRs on this branch.

@Selfeer Selfeer added the verified Verified by QA label Mar 27, 2026
@zvonand zvonand merged commit 3addd5f into antalya-26.1 Mar 27, 2026
259 of 277 checks passed