fix(substrait): normalize table names from Substrait NamedTable for Calcite interop #20993

Draft
bvolpato wants to merge 1 commit into apache:main from bvolpato:bvolpato/substrait-normalize-table-names

@bvolpato
Contributor

Which issue does this PR close?

Rationale for this change

Substrait plans produced by Apache Calcite/Isthmus use uppercase table names (e.g. LINEITEM, PARTSUPP), following the SQL standard's convention of folding unquoted identifiers to uppercase. DataFusion's SQL planner instead normalizes unquoted identifiers to lowercase. Since Substrait has no concept of quoted vs unquoted identifiers (NamedTable.names is just a list of plain strings), the consumer passes the uppercase names through unchanged, and lookups against a catalog that holds the lowercase names fail.

Out of 120 Isthmus-produced Substrait plans in the consumer-testing repo, 118 fail with the same error: No table named 'LINEITEM' (or similar uppercase name).
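The failure mode can be illustrated with a small standalone sketch. It does not use DataFusion's catalog API: the `HashMap`, the `build_catalog` and `lookup_verbatim` helpers, and the table ids are hypothetical stand-ins, chosen only to show why an exact-match lookup on the plan's name misses a lowercase catalog entry.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a catalog whose tables were registered
/// under lowercase names. The u32 values are placeholder table ids.
fn build_catalog() -> HashMap<String, u32> {
    let mut catalog = HashMap::new();
    catalog.insert("lineitem".to_string(), 1);
    catalog.insert("partsupp".to_string(), 2);
    catalog
}

/// Exact-match lookup: the name from the plan is used verbatim,
/// so any difference in case is a miss.
fn lookup_verbatim(catalog: &HashMap<String, u32>, name: &str) -> Result<u32, String> {
    catalog
        .get(name)
        .copied()
        .ok_or_else(|| format!("No table named '{}'", name))
}
```

With this model, `lookup_verbatim(&build_catalog(), "LINEITEM")` returns the error string, while the same call with `"lineitem"` succeeds.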

What changes are included in this PR?

Normalize Substrait NamedTable names at the input boundary using TableReference::parse_str(), matching how DataFusion's SQL planner normalizes identifiers at parse time via IdentNormalizer. This is explicit, consistent with the rest of the codebase, and stays in sync if normalization rules ever change.

Files changed:

  • read_rel.rs — core fix: use TableReference::parse_str(&nt.names.join(".")) instead of constructing TableReference variants directly
  • tests/utils.rs — test utility updated for consistent normalization
  • substrait_validations.rs — use parse_str for test table registration
  • consumer_integration.rs — updated snapshots + 2 new tests for uppercase name handling
  • emit_kind_tests.rs, logical_plans.rs — snapshot updates
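As a rough illustration of the core change in read_rel.rs, the sketch below models "join the names with a dot, then parse as an unquoted reference". It is a simplified standalone model, not the DataFusion code: `TableRef`, `parse_unquoted`, and `from_named_table` are hypothetical names, and the model assumes that no name part contains a dot or quote characters and that more than three parts is rejected.

```rust
/// Simplified model of a table reference with one to three parts.
#[derive(Debug, PartialEq)]
enum TableRef {
    Bare { table: String },
    Partial { schema: String, table: String },
    Full { catalog: String, schema: String, table: String },
}

/// Model of parsing a dotted, unquoted name: split on '.', lowercase
/// every part, then pick the variant by the number of parts.
/// Assumption: parts contain no dots or quotes.
fn parse_unquoted(input: &str) -> Option<TableRef> {
    let parts: Vec<String> = input.split('.').map(|p| p.to_lowercase()).collect();
    match parts.as_slice() {
        [t] => Some(TableRef::Bare { table: t.clone() }),
        [s, t] => Some(TableRef::Partial {
            schema: s.clone(),
            table: t.clone(),
        }),
        [c, s, t] => Some(TableRef::Full {
            catalog: c.clone(),
            schema: s.clone(),
            table: t.clone(),
        }),
        _ => None, // this model rejects more than three parts
    }
}

/// Model of the input boundary: names from a NamedTable are joined
/// and normalized in one place before any catalog lookup.
fn from_named_table(names: &[&str]) -> Option<TableRef> {
    if names.is_empty() {
        return None;
    }
    parse_unquoted(&names.join("."))
}
```

In this model, `from_named_table(&["LINEITEM"])` yields a bare reference to `lineitem`, so the later lookup sees the same spelling the catalog holds. Doing the normalization once at the boundary keeps the rest of the consumer unaware of the producer's casing.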

Are these changes tested?

Yes, 2 new tests added:

  • test_uppercase_table_name_resolves_to_lowercase — verifies Calcite interop (uppercase plan + lowercase catalog)
  • test_uppercase_table_name_with_plan_schemas — verifies consistency with add_plan_schemas_to_ctx

All existing tests pass. Snapshot updates reflect the normalization (table names lowercase in plan output).

Are there any user-facing changes?

Table names from consumed Substrait plans will now be normalized to lowercase, matching DataFusion's default SQL behavior. This is a bug fix — previously, plans from Calcite/Isthmus were unusable.

…alcite interop

Normalize Substrait NamedTable names using TableReference::parse_str,
matching how DataFusion's SQL planner normalizes identifiers at parse
time. Since Substrait has no concept of quoted identifiers, all names
are treated as unquoted and lowercased.

This fixes interoperability with producers like Apache Calcite/Isthmus
which emit uppercase table names (e.g. LINEITEM) while DataFusion's
catalog stores names in lowercase (e.g. lineitem). This addresses the
118 failures among the 120 Isthmus-produced consumer-testing plans.
Labels

substrait Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

substrait generated by Apache Calcite does not run in DataFusion