ORCA: order-preserving LINT mapping for string statistics #1742

Draft
yjhjstz wants to merge 1 commit into apache:main from yjhjstz:orca/string-stats-ordering

yjhjstz (Member) commented May 13, 2026

Previously, string statistics were mapped to LINT values via hashtext, which does not preserve string ordering. As a result, ORCA's estimates for range and comparison predicates on string columns could be arbitrarily far off, because hash values are unrelated to sort order. This change introduces an order-preserving mapping, so that a < b on strings implies LINT(a) <= LINT(b) (strings that share the same 7-byte sort-key prefix map to the same value), restoring meaningful selectivity estimates for inequality predicates on text columns.
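
To illustrate why an order-preserving mapping matters for inequality predicates, here is a minimal sketch, not the code in this PR. The names `PrefixToLint` and `EstimateLessThan` are hypothetical, and plain byte-wise ("C" collation) ordering is assumed. It maps each sampled string to an integer and estimates the selectivity of `col < bound` by comparing the mapped integers only.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: pack the first 7 bytes of s, big-endian, into a
// 64-bit integer. Shorter strings are zero-padded on the right.
// Assumes byte-wise ("C" collation) ordering of the input.
std::int64_t PrefixToLint(const std::string &s) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 7; ++i) {
        unsigned char b = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
        v = (v << 8) | b;
    }
    // At most 56 bits are used, so the signed result is never negative.
    return static_cast<std::int64_t>(v);
}

// Estimate the selectivity of `col < bound` from a sample of column values
// by comparing mapped integers instead of the strings themselves.
double EstimateLessThan(const std::vector<std::string> &sample,
                        const std::string &bound) {
    if (sample.empty()) return 0.0;
    const std::int64_t b = PrefixToLint(bound);
    const auto hits = std::count_if(
        sample.begin(), sample.end(),
        [b](const std::string &s) { return PrefixToLint(s) < b; });
    return static_cast<double>(hits) / static_cast<double>(sample.size());
}
```

With a hash in place of `PrefixToLint`, the integer comparison inside `EstimateLessThan` would say nothing about which strings actually sort before `bound`.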

Replace the hash-based LINT mapping used by ORCA's varchar/text/bpchar statistics with an order-preserving 7-byte locale sort-key prefix, so that single-column MCV and histogram estimates respect the column's collation order instead of collapsing to hash values.
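
One way to picture the 7-byte sort-key prefix is sketched below. This is an assumption-laden illustration rather than the PR's implementation: it uses the standard `std::strxfrm` to obtain a sort key for the current C locale, whereas the actual patch may derive keys differently (for example per column collation or through ICU). `SortKeyPrefixToLint` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not the PR's code: derive a locale sort key with
// std::strxfrm, keep its first 7 bytes, and pack them big-endian so that
// integer comparison follows strcmp order on the key prefix.
// Note: input is treated as a C string, so embedded NUL bytes truncate it.
std::int64_t SortKeyPrefixToLint(const std::string &s) {
    // With a zero-length buffer, strxfrm only reports the key length
    // (excluding the terminating NUL).
    const std::size_t len = std::strxfrm(nullptr, s.c_str(), 0);
    std::vector<char> key(len + 1, '\0');
    std::strxfrm(key.data(), s.c_str(), key.size());

    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 7; ++i) {
        unsigned char b = i < len ? static_cast<unsigned char>(key[i]) : 0;
        v = (v << 8) | b;
    }
    // Seven bytes occupy 56 bits, so the value fits a signed 64-bit
    // integer without ever setting the sign bit.
    return static_cast<std::int64_t>(v);
}
```

Keeping the prefix to 7 bytes rather than 8 leaves the top byte clear, which avoids negative values when the result is stored as a signed 64-bit integer. The trade-off is that strings whose sort keys agree on the first 7 bytes collapse to the same value.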

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


yjhjstz force-pushed the orca/string-stats-ordering branch from 26ec795 to 6890283 on May 14, 2026 01:21
yjhjstz force-pushed the orca/string-stats-ordering branch 2 times, most recently from 7feb5b7 to 56cb1fc on May 15, 2026 15:30