Skip to content

Conversation

@Shekharrajak
Copy link

Which issue does this PR close?

Closes #2894

Rationale for this change

Comet currently falls back to Spark for ALL final hash aggregates when there's no Comet partial aggregate in the child plan. This is overly conservative because some aggregates have compatible intermediate buffer formats between Spark and Comet.
For example, MIN, MAX, COUNT, and bitwise aggregates (BIT_AND, BIT_OR, BIT_XOR) have simple intermediate buffers (single value) that are compatible between Spark and Comet. These can safely run with "Spark partial / Comet final" execution.
Other aggregates like SUM, AVG, VARIANCE, etc. have known incompatibilities (e.g., decimal overflow handling differences, complex intermediate buffers) and should continue to fall back when there's no Comet partial aggregate.

What changes are included in this PR?

Added supportsSparkPartialCometFinal method to CometAggregateExpressionSerde trait - Default is false

Added helper function - aggSupportsMixedExecution() in QueryPlanSerde

How are these changes tested?

"CometExecRule should not allow Spark partial and Comet final for unsafe aggregates" - Verifies SUM still falls back to Spark

"CometExecRule should allow Spark partial and Comet final for safe aggregates" - Verifies MIN/MAX/COUNT can use Comet final with Spark partial

@Shekharrajak Shekharrajak force-pushed the fix/issue-2894-aggregate-fallback branch from f2e6748 to 51869b1 Compare December 27, 2025 09:42
@codecov-commenter
Copy link

codecov-commenter commented Dec 27, 2025

Codecov Report

❌ Patch coverage is 53.84615% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.58%. Comparing base (f09f8af) to head (51869b1).
⚠️ Report is 803 commits behind head on main.

Files with missing lines Patch % Lines
...main/scala/org/apache/comet/serde/aggregates.scala 50.00% 3 Missing ⚠️
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 33.33% 1 Missing and 1 partial ⚠️
...n/scala/org/apache/spark/sql/comet/operators.scala 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2994      +/-   ##
============================================
- Coverage     56.12%   54.58%   -1.54%     
- Complexity      976     1256     +280     
============================================
  Files           119      167      +48     
  Lines         11743    15505    +3762     
  Branches       2251     2571     +320     
============================================
+ Hits           6591     8464    +1873     
- Misses         4012     5822    +1810     
- Partials       1140     1219      +79     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comet falls back to Spark for final hash aggregate in some cases when it could be supported

2 participants