fix(substrait): normalize table names from Substrait NamedTable for Calcite interop#20993
Draft
bvolpato wants to merge 1 commit intoapache:mainfrom
Draft
fix(substrait): normalize table names from Substrait NamedTable for Calcite interop#20993bvolpato wants to merge 1 commit intoapache:mainfrom
bvolpato wants to merge 1 commit intoapache:mainfrom
Conversation
…alcite interop Normalize Substrait NamedTable names using TableReference::parse_str, matching how DataFusion's SQL planner normalizes identifiers at parse time. Since Substrait has no concept of quoted identifiers, all names are treated as unquoted and lowercased. This fixes interoperability with producers like Apache Calcite/Isthmus which emit uppercase table names (e.g. LINEITEM) while DataFusion's catalog stores names in lowercase (e.g. lineitem). This addresses 118 out of 120 failing consumer-testing plans.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Substrait plans produced by Apache Calcite/Isthmus use uppercase table names (e.g.
LINEITEM,PARTSUPP), following the SQL standard. DataFusion normalizes identifiers to lowercase. Since Substrait has no concept of quoted vs unquoted identifiers (NamedTable.namesis just a list of plain strings), this mismatch causes all table lookups to fail.Out of 120 Isthmus-produced Substrait plans in the consumer-testing repo, 118 fail with the same error:
No table named 'LINEITEM'(or similar uppercase name).What changes are included in this PR?
Normalize Substrait
NamedTablenames at the input boundary usingTableReference::parse_str(), matching how DataFusion's SQL planner normalizes identifiers at parse time viaIdentNormalizer. This is explicit, consistent with the rest of the codebase, and stays in sync if normalization rules ever change.Files changed:
read_rel.rs— core fix: useTableReference::parse_str(&nt.names.join("."))instead of constructingTableReferencevariants directlytests/utils.rs— test utility updated for consistent normalizationsubstrait_validations.rs— useparse_strfor test table registrationconsumer_integration.rs— updated snapshots + 2 new tests for uppercase name handlingemit_kind_tests.rs,logical_plans.rs— snapshot updatesAre these changes tested?
Yes, 2 new tests added:
test_uppercase_table_name_resolves_to_lowercase— verifies Calcite interop (uppercase plan + lowercase catalog)test_uppercase_table_name_with_plan_schemas— verifies consistency withadd_plan_schemas_to_ctxAll existing tests pass. Snapshot updates reflect the normalization (table names lowercase in plan output).
Are there any user-facing changes?
Table names from consumed Substrait plans will now be normalized to lowercase, matching DataFusion's default SQL behavior. This is a bug fix — previously, plans from Calcite/Isthmus were unusable.