Natively support time-window grouping expressions: window, session_window, window_time #4553

@andygrove

Background

Comet currently falls back to Spark for Spark's time-window grouping expressions:

  • window(timeColumn, windowDuration, [slideDuration], [startTime]) (TimeWindow) - tumbling/sliding windows, common in batch aggregation (GROUP BY window(ts, '1 hour')), not just streaming.
  • session_window(timeColumn, gapDuration) (SessionWindow) - session windows.
  • window_time(window) (WindowTime) - extracts the event time from a window column.
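To make the semantics of the last two expressions concrete, here is a minimal pure-Python sketch (not Comet or Spark code). It treats timestamps as integer microseconds since the epoch and windows as half-open `[start, end)` pairs. The names `session_windows` and `window_time` below are illustrative helpers, and the rule that an event joins a session only when it arrives strictly before the current session end is an assumption that should be checked against Spark's behaviour.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_us, end_us), half-open: [start, end)


def session_windows(timestamps_us: List[int], gap_us: int) -> List[Window]:
    """Group event times into sessions separated by at least `gap_us`.

    Each event opens a candidate window [ts, ts + gap). Candidates that
    overlap the current session are merged into it.
    ASSUMPTION: an event extends a session only if ts < current session end.
    """
    sessions: List[Window] = []
    for ts in sorted(timestamps_us):
        if sessions and ts < sessions[-1][1]:
            start, end = sessions[-1]
            # Extend the open session so it ends `gap_us` after the latest event.
            sessions[-1] = (start, max(end, ts + gap_us))
        else:
            # No overlap with the previous session: start a new one.
            sessions.append((ts, ts + gap_us))
    return sessions


def window_time(window: Window) -> int:
    """Event time of a window: one microsecond before its exclusive end."""
    return window[1] - 1
```

For example, `session_windows([0, 5, 30], gap_us=10)` merges the first two events into one session and leaves the third in its own session.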

Comet has no serde for TimeWindow / SessionWindow / WindowTime today, so any query using them falls back.

Notes

These are not plain scalar functions: Spark's analyzer (TimeWindowing / SessionWindowing rules) rewrites window() / session_window() into an Expand plus grouping on a computed window struct. Native support would need to handle the rewritten form (struct construction and the window-boundary arithmetic) and the grouping that follows.
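The window-boundary arithmetic can be sketched in plain Python as follows. This is an illustrative model of the idea, not Spark's or Comet's implementation: each row is expanded into every candidate window that could contain its timestamp, and candidates that do not contain it are filtered out. The function name `assign_windows` and its parameters are hypothetical, and timestamps and durations are assumed to be integer microseconds.

```python
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start_us, end_us), half-open: [start, end)


def assign_windows(
    ts_us: int,
    window_us: int,
    slide_us: Optional[int] = None,
    start_us: int = 0,
) -> List[Window]:
    """Return every window [start, end) that contains `ts_us`.

    With slide_us omitted (or equal to window_us) the windows tumble and
    exactly one window is returned; otherwise they slide and overlap.
    """
    if slide_us is None:
        slide_us = window_us  # tumbling window
    # Start of the most recent window beginning at or before ts_us.
    # Python's % is non-negative for a positive modulus, so this also
    # works for timestamps before the epoch.
    last_start = ts_us - (ts_us - start_us) % slide_us
    # Upper bound on how many windows can overlap one timestamp
    # (integer ceiling division).
    max_overlap = -(-window_us // slide_us)
    windows: List[Window] = []
    for i in range(max_overlap):
        start = last_start - i * slide_us
        end = start + window_us
        # Keep only candidates that actually contain the timestamp.
        if start <= ts_us < end:
            windows.append((start, end))
    return windows
```

A tumbling one-hour window yields a single `(start, end)` pair per row, so the subsequent grouping is a plain group-by on that pair. A sliding window yields several pairs per row, which is why the rewritten plan needs an Expand before the aggregation.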

window() over a fixed duration is the most commonly used of the three and would be the natural starting point.

Acceptance criteria

  • window, session_window, and window_time execute natively in Comet and produce results that match Spark.
  • Add SQL file test coverage under expressions/datetime/.
