Spark: Support variant_get predicate pushdown for file skipping #15385

Open
qlong wants to merge 1 commit into apache:main from qlong:variant-file-skipping-sparkv2filters

@qlong commented Feb 20, 2026

This PR adds support for manifest-based file skipping on variant columns.
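
For readers unfamiliar with the mechanism, here is a minimal sketch of bounds-based file skipping in plain Python. It is not Iceberg code: `DataFile`, the `bounds` mapping keyed by (column, variant path), `may_contain_eq`, and `plan_files` are hypothetical names invented for illustration, and the sketch assumes each file records an integer lower and upper bound for a shredded variant field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model: (column name, variant path) -> (lower bound, upper bound).
Bounds = Dict[Tuple[str, str], Tuple[int, int]]


@dataclass
class DataFile:
    path: str
    bounds: Bounds = field(default_factory=dict)


def may_contain_eq(f: DataFile, column: str, variant_path: str, value: int) -> bool:
    """Return False only when the recorded bounds prove no row can match."""
    key = (column, variant_path)
    if key not in f.bounds:
        # No bounds recorded (for example, the field was not shredded): cannot skip.
        return True
    lower, upper = f.bounds[key]
    return lower <= value <= upper


def plan_files(files: List[DataFile], column: str, variant_path: str, value: int) -> List[str]:
    """Keep only the files that might contain a matching row."""
    return [f.path for f in files if may_contain_eq(f, column, variant_path, value)]


files = [
    DataFile("a.parquet", {("v", "$.id"): (1, 100)}),
    DataFile("b.parquet", {("v", "$.id"): (101, 200)}),
    DataFile("c.parquet"),  # no bounds, so it must always be read
]

# Models a filter such as: variant_get(v, '$.id', 'int') = 150
selected = plan_files(files, "v", "$.id", 150)
print(selected)  # ['b.parquet', 'c.parquet']
```

A file is skipped only when its bounds rule out a match; a file without bounds is always kept, so skipping never changes query results.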

Changes:

  • SparkV2Filters: Convert variant_get/try_variant_get to Expressions.extract()
  • Spark3Util.describe: Output extract terms as variant_get() for EXPLAIN
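
The two changes above can be pictured with a small stand-alone model. This is a sketch under stated assumptions, not the actual implementation: `VariantGetCall`, `ExtractTerm`, `Predicate`, `convert`, and `describe` are hypothetical Python stand-ins, and the exact string that `Spark3Util.describe` produces is assumed rather than taken from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantGetCall:
    """Hypothetical stand-in for a pushed-down variant_get/try_variant_get call."""
    func: str       # "variant_get" or "try_variant_get"
    column: str
    path: str
    type_name: str


@dataclass(frozen=True)
class ExtractTerm:
    """Hypothetical stand-in for the term built by Expressions.extract()."""
    column: str
    path: str
    type_name: str


@dataclass(frozen=True)
class Predicate:
    op: str
    term: ExtractTerm
    literal: object


def convert(call: VariantGetCall, op: str, literal: object) -> Optional[Predicate]:
    """Turn a comparison on a variant_get call into a predicate on an extract term."""
    if call.func not in ("variant_get", "try_variant_get"):
        return None  # assumption: anything else is simply not pushed down
    return Predicate(op, ExtractTerm(call.column, call.path, call.type_name), literal)


def describe(pred: Predicate) -> str:
    """Render an extract term back as variant_get(), as an EXPLAIN-style string."""
    t = pred.term
    return f"variant_get({t.column}, '{t.path}', '{t.type_name}') {pred.op} {pred.literal!r}"


call = VariantGetCall("try_variant_get", "v", "$.id", "int")
pred = convert(call, ">", 150)
print(describe(pred))  # variant_get(v, '$.id', 'int') > 150
```

In this model both function names map to the same extract term, so the description always reads `variant_get()`, matching the second bullet.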

Tests:

  • Added unit tests
  • Manual end-to-end testing with spark-sql built against the dependency PRs: verified that variant_get is pushed down to Iceberg for file skipping, and confirmed from the Spark history that files are skipped.

The PR depends on:

  1. Api: Support variant extract and fix manifest bounds byte order (#15384): variant bound fix
  2. Spark: Support writing shredded variant in Iceberg-Spark (#14297): shredded variant support for Spark (@aihuaxu)
  3. [SPARK-55617] Add VariantGet to V2ExpressionBuilder for DSv2 filter pushdown (apache/spark#54394): Spark-side change to add VariantGet to DSv2 filters

This PR can be safely merged once the first dependency PR (#15384) is merged.

Related issue:

  1. Variant Data Type Support (#10392)

- SparkV2Filters: Convert variant_get/try_variant_get to
  Expressions.extract()
- Spark3Util.describe: Output extract terms as variant_get() for EXPLAIN
- Add tests for both

Depends on Spark PR: apache/spark#54394

@huaxingao (Contributor)

cc @aihuaxu

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Mar 31, 2026

@steveloughran (Contributor)

Not stale! We need this! Queries on variants are pretty bad right now, and skipping files can start to recover that.

github-actions bot removed the stale label Apr 1, 2026

3 participants