[fix](serde) match STRUCT sub-fields by name when loading JSON #64011
Open
csun5285 wants to merge 1 commit into
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
BE UT Coverage Report: Increment line coverage, Increment coverage report
|
Contributor
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
|
Contributor
TPC-H: Total hot run time: 29321 ms |
Contributor
TPC-DS: Total hot run time: 170190 ms |
Stream Load into a STRUCT column reads each value as a string and converts it with DataTypeStructSerDe::from_string (CAST varchar -> struct). That path matched sub-fields by position, so JSON keys whose order differed from the DDL turned the whole struct column into NULL, and a row missing a field failed to load.

from_string now detects named mode by the delimiter structure and matches sub-fields by name (case-insensitive), filling missing nullable fields with NULL. Unknown/extra fields are ignored in non-strict mode (load), matching the simdjson JSON reader that feeds STRUCT columns on JSON stream load; strict CAST instead rejects an unknown field name. Positional input still requires an exact field count, and struct-to-struct CAST stays positional.

Add BE unit tests (DataTypeStructSerDeTest.FromStringByFieldName) covering by-name matching, case-insensitivity, missing/unknown/extra fields, positional input and the error paths, and a stream-load regression test (test_struct_field_align). Update the existing struct expectations in test_stream_load, test_stream_load_move_memtable and test_cast_struct to the by-name results.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
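The matching rules in this commit message can be modeled in a few lines. What follows is a minimal Python sketch of the described behavior, not the C++ in DataTypeStructSerDe::from_string. All names here (`parse_struct`, `match_named`, `match_positional`, `StructParseError`) are hypothetical. The sketch assumes the input has already been tokenized: named input is a list of `(key, value)` pairs, positional input is a list of bare values. Raising on a missing non-nullable field and letting the last duplicate key win are assumptions; the commit message does not specify either case.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical schema model: ordered (field_name, nullable) pairs.
Schema = Sequence[Tuple[str, bool]]


class StructParseError(ValueError):
    """Raised when the input cannot be mapped onto the schema."""


def match_named(schema: Schema, items: List[Tuple[str, Any]], strict: bool) -> List[Any]:
    # Index schema fields by lower-cased name for case-insensitive lookup.
    index = {name.lower(): i for i, (name, _) in enumerate(schema)}
    row: List[Any] = [None] * len(schema)
    seen = [False] * len(schema)
    for key, value in items:
        pos = index.get(key.lower())
        if pos is None:
            if strict:
                # Strict CAST rejects an unknown field name.
                raise StructParseError(f"unknown field: {key}")
            continue  # Non-strict (load) mode ignores unknown/extra fields.
        row[pos] = value  # Assumption: a repeated key overwrites the earlier value.
        seen[pos] = True
    for i, (name, nullable) in enumerate(schema):
        # Missing nullable fields stay None (NULL).
        # Assumption: a missing non-nullable field is an error.
        if not seen[i] and not nullable:
            raise StructParseError(f"missing non-nullable field: {name}")
    return row


def match_positional(schema: Schema, values: List[Any]) -> List[Any]:
    # Positional input still requires an exact field count.
    if len(values) != len(schema):
        raise StructParseError(
            f"expected {len(schema)} fields, got {len(values)}"
        )
    return list(values)


def parse_struct(schema: Schema, items: List[Any], strict: bool = False) -> List[Any]:
    # Stand-in for "detects named mode by the delimiter structure":
    # here, named mode means every item is a (key, value) pair.
    named = bool(items) and all(
        isinstance(it, tuple) and len(it) == 2 for it in items
    )
    if named:
        return match_named(schema, items, strict)
    return match_positional(schema, items)
```

With a schema of `[("id", True), ("name", True)]`, calling `parse_struct(schema, [("NAME", "x"), ("id", 1)])` yields `[1, "x"]` regardless of key order or case, and `parse_struct(schema, [("id", 1)])` yields `[1, None]`.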
ee43d13 to
d585cbe
Compare
Contributor
Author
|
run buildall |
Contributor
TPC-H: Total hot run time: 29666 ms |
Contributor
TPC-DS: Total hot run time: 170390 ms |
Contributor
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
|
csun5285 added a commit to csun5285/doris-website that referenced this pull request on Jun 4, 2026
Document the behavior change from apache/doris#64011: when casting a string to STRUCT with field names (named mode), fields are now matched by name (case-insensitive) instead of by position. The input field order may differ from the schema, missing fields are filled with NULL, and unknown fields are rejected in strict mode / ignored in non-strict mode. Positional input (no field names) still requires an exact field count.

Updated EN/ZH for dev and 4.x.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
11 tasks
Stream Load into a STRUCT column reads each value as a string and converts it with DataTypeStructSerDe::from_string (CAST varchar -> struct). That path matched sub-fields by position, so JSON keys whose order differed from the DDL turned the whole struct column into NULL, and a row missing a field failed to load.
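To make the failure mode concrete, here is a small Python model of purely positional matching. It is an illustration under stated assumptions, not the previous Doris code: `old_positional_cast` and `SCHEMA` are hypothetical, the sketch assumes the keys of the input are dropped and only the values are paired with the schema by index, and it assumes a sub-field conversion failure nulls the whole struct while a field-count mismatch is an error.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical schema standing in for STRUCT<id:INT, name:STRING>:
# ordered (field_name, converter) pairs.
SCHEMA: Sequence[Tuple[str, Callable[[str], Any]]] = [("id", int), ("name", str)]


def old_positional_cast(
    schema: Sequence[Tuple[str, Callable[[str], Any]]],
    raw_values: List[str],
) -> Optional[List[Any]]:
    # Assumption: a row with the wrong number of values fails outright.
    if len(raw_values) != len(schema):
        raise ValueError(
            f"expected {len(schema)} fields, got {len(raw_values)}"
        )
    out: List[Any] = []
    for (_name, convert), raw in zip(schema, raw_values):
        try:
            # The i-th value is converted with the i-th field's type,
            # whatever key it originally carried.
            out.append(convert(raw))
        except ValueError:
            # Assumption: one failed sub-field turns the whole struct into NULL.
            return None
    return out


# Values in DDL order convert cleanly.
in_order = old_positional_cast(SCHEMA, ["1", "alice"])

# Values from {"name": "alice", "id": 1}: "alice" lands on the INT field.
reordered = old_positional_cast(SCHEMA, ["alice", "1"])
```

In this model `in_order` is `[1, "alice"]` while `reordered` is `None`, which mirrors the NULL struct described above; passing only `["1"]` raises, mirroring the row that fails to load when a field is missing.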
doc: apache/doris-website#3907
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)