Fix tdigest #21007

Open
LiaCastaneda wants to merge 2 commits into apache:main from LiaCastaneda:lia/fix-tdigest

@LiaCastaneda LiaCastaneda commented Mar 17, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

SELECT 
  approx_percentile(x, 0.5) AS p50,
  approx_percentile(x, 0.90) AS p90,
  approx_percentile(x, 0.99) AS p99
FROM (VALUES (1), (2), (3), (4), (5), (10), (20), (50), (100), (1000)) AS t(x)

This returns 6, 550, 100 for p50, p90, and p99 respectively. Here p90 > p99, which looks wrong: a higher quantile should never be smaller than a lower one.

What changes are included in this PR?

Adds a boundary check in estimate_quantile(): when rank >= count - 1.0, it returns max directly instead of interpolating. Without this check, the interpolation could produce a value larger than the estimate for a higher quantile, causing p90 > p99 on sparse datasets.
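
To make the boundary check concrete, here is a minimal sketch of the idea on a plain sorted slice. It is not DataFusion's TDigest code: the function name `estimate_quantile_sketch`, the `rank = q * count` convention, and the linear interpolation between neighbouring values are all assumptions chosen for illustration, and real t-digest centroids carry weights that this sketch ignores.

```rust
/// Hypothetical, simplified quantile estimator over already-sorted values.
/// Illustrates only the `rank >= count - 1.0` boundary check described above;
/// it does not reproduce DataFusion's centroid-based implementation.
fn estimate_quantile_sketch(sorted: &[f64], q: f64) -> f64 {
    assert!(!sorted.is_empty(), "need at least one value");
    assert!((0.0..=1.0).contains(&q), "q must be in [0, 1]");

    let count = sorted.len() as f64;
    let rank = q * count; // assumed rank convention

    // Boundary check: at or past the last rank, return the maximum directly
    // instead of interpolating beyond the final value.
    if rank >= count - 1.0 {
        return sorted[sorted.len() - 1];
    }

    // Otherwise interpolate linearly between the two neighbouring values.
    let lo = rank.floor() as usize;
    let frac = rank - lo as f64;
    sorted[lo] + frac * (sorted[lo + 1] - sorted[lo])
}

/// Returns true when the estimates for the given ascending quantiles
/// are themselves non-decreasing.
fn quantiles_are_ordered(sorted: &[f64], qs: &[f64]) -> bool {
    let estimates: Vec<f64> = qs
        .iter()
        .map(|&q| estimate_quantile_sketch(sorted, q))
        .collect();
    estimates.windows(2).all(|w| w[0] <= w[1])
}
```

On the ten values from the query above, both p90 and p99 fall into the clamped region in this sketch, so they return the maximum and the ordering p50 <= p90 <= p99 holds. The actual values DataFusion produces after the fix may differ, since its estimator works on weighted centroids.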

Are these changes tested?

Yes, added test_sparse_dataset_quantile_ordering.

Are there any user-facing changes?

@github-actions github-actions bot added the functions Changes to functions implementation label Mar 17, 2026
@github-actions github-actions bot added the core Core DataFusion crate label Mar 18, 2026
@LiaCastaneda LiaCastaneda marked this pull request as ready for review March 18, 2026 10:04
@LiaCastaneda LiaCastaneda force-pushed the lia/fix-tdigest branch 2 times, most recently from b0a96b6 to a2e57dc Compare March 18, 2026 11:51
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 18, 2026