
fix: propagate inner-field metadata through make_array and array_agg #22176

Open
CuteChuanChuan wants to merge 3 commits into apache:main from CuteChuanChuan:raymond/issue-21982-add-inner-field-metadata


CuteChuanChuan commented May 14, 2026

Which issue does this PR close?

Rationale for this change

SQL UDFs/UDAFs that wrap a column into a composite type (List, Struct, Map, ...) currently drop the input field's metadata when they build the output's inner `Field`.
This breaks Arrow extension types (`ARROW:extension:*` metadata keys). Lists constructed in SQL silently lose their extension-type identity, so any downstream comparison against a list whose inner field does carry that metadata sees two different types.

The fix needs two parts. First, a planning-time hook (`return_field_from_args` / `return_field`) must propagate the metadata into the output field. Second, the runtime construction paths must thread the planned `FieldRef` through to the produced array, so the array built at execution time matches the planned schema.
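The following minimal, std-only sketch models the failure mode. It replaces the real arrow-rs `Field` with a simplified `Field` struct (name, data type as a plain string, nullable flag, metadata map). Both function names are hypothetical. `arrow.uuid` is one of Arrow's canonical extension types and serves here only as sample metadata.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for arrow-rs `Field` (NOT the real type): the data type is a
/// plain string here instead of the `DataType` enum.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

/// A column tagged with the canonical `arrow.uuid` extension type.
fn uuid_column() -> FieldRef {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    Arc::new(Field {
        name: "id".to_string(),
        data_type: "FixedSizeBinary(16)".to_string(),
        nullable: false,
        metadata,
    })
}

/// Old behaviour (modelled): only the data type is copied, so the extension tag is lost.
fn inner_field_dropping_metadata(input: &FieldRef) -> FieldRef {
    Arc::new(Field {
        name: "item".to_string(),
        data_type: input.data_type.clone(),
        nullable: true,
        metadata: HashMap::new(),
    })
}

/// Fixed behaviour (modelled): data type and metadata are both carried into the inner field.
fn inner_field_preserving_metadata(input: &FieldRef) -> FieldRef {
    Arc::new(Field {
        name: "item".to_string(),
        data_type: input.data_type.clone(),
        nullable: true,
        metadata: input.metadata.clone(),
    })
}

fn main() {
    let column = uuid_column();
    let before = inner_field_dropping_metadata(&column);
    let after = inner_field_preserving_metadata(&column);
    // Same physical type either way, but only `after` still identifies itself as arrow.uuid.
    assert_eq!(before.data_type, after.data_type);
    println!("before: {:?}", before.metadata.get("ARROW:extension:name"));
    println!("after:  {:?}", after.metadata.get("ARROW:extension:name"));
}
```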

What changes are included in this PR?

Helpers (`datafusion-common::utils`)

  • `nullable_inner_field_from` / `nullable_list_item_field_from`: build canonical inner fields for composite types, preserving the input's data type and metadata. Both are built on `FieldExt::renamed` and take `FieldRef` by value, so callers decide when to clone.
  • `SingleRowListArrayBuilder::with_field` now propagates metadata too.
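The sketch below shows one way the two helpers could fit together. Only the helper names come from this PR. Everything else is an assumption: `Field`/`FieldRef` are simplified stand-ins, the `FieldExt::renamed` signature is modelled rather than copied from DataFusion, and the helpers are assumed to force `nullable = true` and use arrow-rs's default list item name `"item"`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for arrow-rs `Field` (NOT the real type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

/// Modelled after the idea of `FieldExt::renamed`; the real trait's signature may differ.
trait FieldExt {
    fn renamed(self, new_name: &str) -> Self;
}

impl FieldExt for FieldRef {
    fn renamed(self, new_name: &str) -> Self {
        if self.name == new_name {
            return self; // nothing to change: hand back the same Arc, no allocation
        }
        // Moves the Field out if this Arc is the only owner, otherwise clones it.
        let mut field = Arc::unwrap_or_clone(self);
        field.name = new_name.to_string();
        Arc::new(field)
    }
}

/// Sketch of `nullable_inner_field_from`: rename, force nullable, keep type and metadata.
fn nullable_inner_field_from(field: FieldRef, name: &str) -> FieldRef {
    let renamed = field.renamed(name);
    if renamed.nullable {
        return renamed;
    }
    let mut field = Arc::unwrap_or_clone(renamed);
    field.nullable = true;
    Arc::new(field)
}

/// Sketch of `nullable_list_item_field_from`, assuming the default list item name "item".
fn nullable_list_item_field_from(field: FieldRef) -> FieldRef {
    nullable_inner_field_from(field, "item")
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    let input = Arc::new(Field {
        name: "id".to_string(),
        data_type: "FixedSizeBinary(16)".to_string(),
        nullable: false,
        metadata,
    });
    // Passing by value: `input` is moved in, so the Field is not cloned here.
    let item = nullable_list_item_field_from(input);
    println!("{:?}", item);
}
```

In this model, passing the `FieldRef` by value lets `Arc::unwrap_or_clone` reuse the allocation when the caller hands over its only reference, and an already-canonical field passes through untouched. A caller that still needs the original only has to clone the `Arc` (a reference-count bump) before the call.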

make_array

  • New `return_field_from_args` implementation.
  • New `array_array_with_field` runtime variant that threads the planned inner `FieldRef` through; the existing `array_array` becomes a thin shim kept for backward compatibility.
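The split between planning and runtime can be sketched with std-only stand-in types. Two things here are assumptions made for illustration: the rule that the first non-Null argument supplies the element type and metadata (the real function also performs type coercion), and the hypothetical `ListValue` type and `example.celsius` extension name.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for arrow-rs `DataType` / `Field` (NOT the real types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    List(FieldRef),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

fn field(name: &str, data_type: DataType, metadata: HashMap<String, String>) -> FieldRef {
    Arc::new(Field { name: name.to_string(), data_type, nullable: true, metadata })
}

/// Planning step (sketch of `return_field_from_args`). Assumption: the first argument
/// whose type is not Null supplies the element type and the metadata.
fn return_field_from_args(arg_fields: &[FieldRef]) -> FieldRef {
    let item = arg_fields
        .iter()
        .find(|f| f.data_type != DataType::Null)
        .map(|f| field("item", f.data_type.clone(), f.metadata.clone()))
        .unwrap_or_else(|| field("item", DataType::Null, HashMap::new()));
    field("make_array", DataType::List(item), HashMap::new())
}

/// Hypothetical single-row list value: the element values plus the item field used to build them.
#[derive(Debug)]
struct ListValue {
    item_field: FieldRef,
    values: Vec<Option<i64>>,
}

/// Runtime step (sketch of `array_array_with_field`): reuse the planned item field.
fn array_array_with_field(values: Vec<Option<i64>>, item_field: FieldRef) -> ListValue {
    ListValue { item_field, values }
}

/// Back-compat shim (sketch of `array_array`): only a data type is known, so no metadata.
fn array_array(values: Vec<Option<i64>>, data_type: DataType) -> ListValue {
    array_array_with_field(values, field("item", data_type, HashMap::new()))
}

fn main() {
    let mut metadata = HashMap::new();
    // Hypothetical extension name used only for this example.
    metadata.insert("ARROW:extension:name".to_string(), "example.celsius".to_string());
    let arg = field("temp", DataType::Int64, metadata);

    let planned = return_field_from_args(&[Arc::clone(&arg), arg]);
    if let DataType::List(item) = &planned.data_type {
        let list = array_array_with_field(vec![Some(21), None], Arc::clone(item));
        println!("{:?}", list.item_field.metadata);
    }
    let legacy = array_array(vec![Some(21)], DataType::Int64);
    println!("{:?}", legacy.item_field.metadata);
}
```

Because the shim only has a data type, callers that stay on it keep compiling but get metadata-free item fields in this model. Call sites that have a planned field would need the `_with_field` variant to benefit.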

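array_agg

  • New `return_field` and `state_fields` overrides.
  • All four accumulators (`ArrayAgg`, `Distinct`, `OrderSensitive`, `Groups`) now carry a `FieldRef` instead of a `DataType`, so the input's metadata is propagated.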
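On the array_agg side, the same pattern can be sketched: the accumulator keeps the input `FieldRef`, and both the result field and the state schema are derived from it. The types are simplified stand-ins. The state-field name `array_agg_state`, the single-column state layout, and the `example.celsius` extension name are assumptions for illustration, not DataFusion's actual choices.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for arrow-rs `DataType` / `Field` (NOT the real types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    List(FieldRef),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

/// Builds a List field whose nullable "item" keeps the input's data type and metadata.
fn list_field_from(name: &str, input: &FieldRef) -> FieldRef {
    let item = Arc::new(Field {
        name: "item".to_string(),
        data_type: input.data_type.clone(),
        nullable: true,
        metadata: input.metadata.clone(),
    });
    Arc::new(Field {
        name: name.to_string(),
        data_type: DataType::List(item),
        nullable: true,
        metadata: HashMap::new(),
    })
}

/// Sketch of `return_field`.
fn return_field(input: &FieldRef) -> FieldRef {
    list_field_from("array_agg", input)
}

/// Sketch of `state_fields`; the name and one-column layout are hypothetical.
fn state_fields(input: &FieldRef) -> Vec<FieldRef> {
    vec![list_field_from("array_agg_state", input)]
}

/// Simplified accumulator that stores the input `FieldRef` (previously only a `DataType`).
struct ArrayAggAccumulator {
    input_field: FieldRef,
    values: Vec<Option<i64>>,
}

impl ArrayAggAccumulator {
    fn try_new(input_field: &FieldRef) -> Result<Self, String> {
        Ok(Self { input_field: Arc::clone(input_field), values: Vec::new() })
    }

    fn update_batch(&mut self, batch: &[Option<i64>]) {
        self.values.extend_from_slice(batch);
    }

    /// The output field is derived from the stored input field, so metadata survives.
    fn evaluate(&self) -> (FieldRef, Vec<Option<i64>>) {
        (return_field(&self.input_field), self.values.clone())
    }
}

fn main() {
    let mut metadata = HashMap::new();
    // Hypothetical extension name used only for this example.
    metadata.insert("ARROW:extension:name".to_string(), "example.celsius".to_string());
    let input = Arc::new(Field {
        name: "temp".to_string(),
        data_type: DataType::Int64,
        nullable: false,
        metadata,
    });
    let mut acc = ArrayAggAccumulator::try_new(&input).expect("construction succeeds");
    acc.update_batch(&[Some(20), None]);
    acc.update_batch(&[Some(22)]);
    let (field, values) = acc.evaluate();
    println!("{:?} {:?}", field, values);
    println!("{:?}", state_fields(&input));
}
```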

Are these changes tested?

Yes

  • New SLT file `array_metadata_propagation.slt` covering `make_array` and `array_agg`.

Are there any user-facing changes?

Yes

  • Lists constructed in SQL with `make_array` and `array_agg` now retain the Arrow extension-type identity of their input fields.

Public API changes:

  • `ArrayAggAccumulator::try_new`, `DistinctArrayAggAccumulator::try_new`,
    `OrderSensitiveArrayAggAccumulator::try_new`, and `ArrayAggGroupsAccumulator::new` now take `&FieldRef` instead of `&DataType` / an owned `DataType`.
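For downstream code that constructs these accumulators directly, the migration should be mechanical. The sketch below does not use the real constructors. It relies on a simplified stand-in type and a hypothetical `construct_accumulator` function. Passing the actual input field keeps its metadata. Wrapping a bare data type also compiles, but it reintroduces the metadata loss this PR fixes; with the real arrow-rs API, the wrapping would look like `Arc::new(Field::new("item", data_type, true))`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for arrow-rs `Field` (NOT the real type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

type FieldRef = Arc<Field>;

/// Hypothetical accumulator with the new constructor shape: `&FieldRef`, not `&DataType`.
struct Accumulator {
    input_field: FieldRef,
}

fn construct_accumulator(input_field: &FieldRef) -> Accumulator {
    Accumulator { input_field: Arc::clone(input_field) }
}

/// Shim for call sites that only have a data type: it compiles, but carries no metadata.
fn field_from_data_type(data_type: &str) -> FieldRef {
    Arc::new(Field {
        name: "item".to_string(),
        data_type: data_type.to_string(),
        nullable: true,
        metadata: HashMap::new(),
    })
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.uuid".to_string());
    let input_field = Arc::new(Field {
        name: "id".to_string(),
        data_type: "FixedSizeBinary(16)".to_string(),
        nullable: false,
        metadata,
    });

    // Preferred: hand over the real input field so its metadata reaches the output.
    let preferred = construct_accumulator(&input_field);
    // Fallback: wrapping only the data type loses the extension tag again.
    let fallback = construct_accumulator(&field_from_data_type(&input_field.data_type));
    println!("{:?} vs {:?}", preferred.input_field.metadata, fallback.input_field.metadata);
}
```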

- Add `nullable_inner_field_from` / `nullable_list_item_field_from` helpers in `datafusion-common::utils` (built on `FieldExt::renamed`).
- Extend `SingleRowListArrayBuilder::with_field` to also propagate metadata.
- `make_array`: add `return_field_from_args`; thread inner `FieldRef` through new `array_array_with_field` runtime variant.
- `array_agg`: add `return_field` and `state_fields` overrides; all four accumulators (`ArrayAgg`, `Distinct`, `OrderSensitive`, `Groups`) now carry `FieldRef` instead of `DataType`, propagating metadata.
- Add SLT `array_metadata_propagation.slt` covering `make_array` and `array_agg`.
- Update memory-accounting tests for new struct layout.

Labels

common: Related to common crate; functions: Changes to functions implementation; spark; sqllogictest: SQL Logic Tests (.slt)


Development

Successfully merging this pull request may close these issues.

make_array / array_agg drop inner-Field metadata when constructing List<T>
