@andygrove
Member

Which issue does this PR close?

Closes #.

Rationale for this change

DataFusion 52's default PhysicalExprAdapter can fail when casting complex nested types (List, Map) between physical and logical schemas. This adds a fallback path in SparkPhysicalExprAdapter that wraps type-mismatched columns with CometCastColumnExpr using spark_parquet_convert for the actual conversion.

Changes to CometCastColumnExpr:

  • Add optional SparkParquetOptions for complex nested type conversions
  • Use == instead of equals_datatype to detect field name differences in nested types (Struct, List, Map)
  • Add relabel_array for types that differ only in field names (e.g., List element "item" vs "element", Map "key_value" vs "entries")
  • Fall back to spark_parquet_convert for structural nested type changes
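
The three-way decision described in the list above (pass through, relabel, or convert) can be sketched with a toy type model. This is only an illustration under stated assumptions: `Ty`, `same_shape`, `Plan`, and `plan` are hypothetical names invented here, not the arrow-rs or Comet API, and `same_shape` merely imitates a name-insensitive comparison in the spirit of `equals_datatype`.

```rust
// Toy stand-in for a nested data type. Hypothetical; not arrow-rs `DataType`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Utf8,
    // A list carries the name of its element field and the element type.
    List(String, Box<Ty>),
    // A struct carries (field name, field type) pairs.
    Struct(Vec<(String, Ty)>),
}

// Name-insensitive comparison: nested field names are ignored,
// only the shape of the type is compared.
fn same_shape(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        (Ty::List(_, x), Ty::List(_, y)) => same_shape(x, y),
        (Ty::Struct(xs), Ty::Struct(ys)) => {
            xs.len() == ys.len()
                && xs.iter().zip(ys).all(|((_, x), (_, y))| same_shape(x, y))
        }
        // Leaf types (and mismatched variants) fall back to strict equality.
        _ => a == b,
    }
}

// What the cast expression would do for a (physical, logical) pair.
#[derive(Debug, PartialEq)]
enum Plan {
    PassThrough, // types are strictly equal, names included
    Relabel,     // same shape, only nested field names differ
    Convert,     // structural difference, needs a real conversion
}

fn plan(physical: &Ty, logical: &Ty) -> Plan {
    if physical == logical {
        // Strict `==` sees field names, so this branch is only taken
        // when nothing at all differs.
        Plan::PassThrough
    } else if same_shape(physical, logical) {
        Plan::Relabel
    } else {
        Plan::Convert
    }
}
```

In this model, a list whose element field is named "item" compared against one named "element" yields `Plan::Relabel`, whereas a name-insensitive check used on its own would have reported the two types as equal and skipped the relabeling.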

Changes to SparkPhysicalExprAdapter:

  • Try default adapter first, fall back to wrap_all_type_mismatches when it fails on complex nested types
  • Route Struct/List/Map casts to CometCastColumnExpr instead of Spark Cast, which doesn't handle nested type rewriting
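
The "try default first, then fall back" control flow described above can be sketched as follows. This is a minimal sketch, not the PR's code: expressions are modeled as plain strings, `default_rewrite` and `rewrite` are hypothetical helpers, and the signature given to `wrap_all_type_mismatches` is an assumption made for illustration.

```rust
// Hypothetical stand-in for the default adapter: it is assumed to fail
// whenever the expression involves a nested type.
fn default_rewrite(expr: &str) -> Result<String, String> {
    if expr.contains("List") || expr.contains("Map") {
        Err(format!("cannot cast nested type in {}", expr))
    } else {
        Ok(format!("default({})", expr))
    }
}

// Hypothetical stand-in for the fallback: wrap the mismatched column in a
// Comet cast instead of relying on the default adapter.
fn wrap_all_type_mismatches(expr: &str) -> Result<String, String> {
    Ok(format!("comet_cast({})", expr))
}

// Try the default path first; only on failure take the fallback path.
fn rewrite(expr: &str) -> Result<String, String> {
    default_rewrite(expr).or_else(|_| wrap_all_type_mismatches(expr))
}
```

Under these assumptions, `rewrite("col: Int32")` goes through the default path, while `rewrite("col: List<Struct>")` ends up wrapped by the fallback. Keeping the default adapter first means simple columns are unaffected by the new path.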

What changes are included in this PR?

How are these changes tested?

andygrove and others added 4 commits February 10, 2026 10:50
DataFusion 52's default PhysicalExprAdapter can fail when casting
complex nested types (List<Struct>, Map) between physical and logical
schemas. This adds a fallback path in SparkPhysicalExprAdapter that
wraps type-mismatched columns with CometCastColumnExpr using
spark_parquet_convert for the actual conversion.

Changes to CometCastColumnExpr:
- Add optional SparkParquetOptions for complex nested type conversions
- Use == instead of equals_datatype to detect field name differences
  in nested types (Struct, List, Map)
- Add relabel_array for types that differ only in field names (e.g.,
  List element "item" vs "element", Map "key_value" vs "entries")
- Fall back to spark_parquet_convert for structural nested type changes

Changes to SparkPhysicalExprAdapter:
- Try default adapter first, fall back to wrap_all_type_mismatches
  when it fails on complex nested types
- Route Struct/List/Map casts to CometCastColumnExpr instead of
  Spark Cast, which doesn't handle nested type rewriting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Cargo cache key only hashed Cargo.lock and Cargo.toml, not the actual .rs source files. This meant changes to Rust code without dependency changes would restore a stale cache, potentially using an old libcomet.so built from different source.

Add hashFiles('native/**/*.rs') to the cache key and update restore-keys to use the dependency hash as a prefix, allowing proper incremental builds when only source changes.
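
To illustrate why the dependency hash works as a restore prefix, here is a simplified model of cache lookup: an exact match on the key is tried first, then each restore key is tried as a prefix. The key layout (`os-cargo-depsHash-srcHash`) and the helper names `cache_key` and `lookup` are assumptions for this sketch, not the workflow's actual values.

```rust
// Build a cache key from the dependency hash and the source hash.
// The layout is assumed for illustration only.
fn cache_key(os: &str, deps_hash: &str, src_hash: &str) -> String {
    format!("{}-cargo-{}-{}", os, deps_hash, src_hash)
}

// Simplified lookup: exact key first, then each restore key as a prefix,
// in order. `entries` is assumed to be ordered newest first.
fn lookup<'a>(
    entries: &'a [&'a str],
    key: &str,
    restore_keys: &[&str],
) -> Option<&'a str> {
    if let Some(hit) = entries.iter().find(|e| **e == key) {
        return Some(*hit);
    }
    restore_keys
        .iter()
        .find_map(|prefix| entries.iter().find(|e| e.starts_with(*prefix)).copied())
}
```

In this model, a source-only change produces a new key that misses the exact match but still restores the newest cache with the same dependency hash, so the build can proceed incrementally. A dependency change alters the prefix itself, so no stale entry is restored.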

2 participants