
Conversation

@antiguru
Member

@antiguru antiguru commented Jan 10, 2023

Similarly to #17056, use a non-allocating monoid for all top-k aggregations.

This depended on a change to Differential (TimelyDataflow/differential-dataflow#375) before it could land; that change has since landed.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.

  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.

  • If this PR will require changes to cloud orchestration, there is a
    companion cloud PR to account for those changes that is tagged with
    the release-blocker label (example).

  • This PR includes the following user-facing behavior changes:

antiguru and others added 3 commits January 13, 2023 11:34
Introduce a top-1 monoid that shares the column order among all peers and
retains buffers for efficient row unpacking. This allows for
allocation-free row comparisons where otherwise the row needs to be
unpacked into a new vector. The implementation relies on
reference-counted shared state, which makes it unsuitable for sharing
across thread boundaries.

Signed-off-by: Moritz Hoffmann <mh@materialize.com>
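The commit message above describes the shared-state design only in prose; below is a minimal, hypothetical Rust sketch of the idea. The `Datum`, `PackedRow`, `SharedState`, and `Top1Monoid` names and the comparison logic are illustrative assumptions, not Materialize's actual types or its real top-1 monoid implementation.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::rc::Rc;

// Illustrative stand-ins; Materialize's real Row/Datum types are more involved.
type Datum = i64;

/// A "packed" row whose columns must be unpacked before they can be
/// compared in an arbitrary ORDER BY order.
#[derive(Clone)]
struct PackedRow(Vec<Datum>);

/// State shared by all top-1 monoid values on one worker: the ORDER BY
/// column order plus two scratch buffers that are reused for unpacking,
/// so comparisons allocate nothing once the buffers have grown.
struct SharedState {
    order_by: Vec<usize>,
    left: RefCell<Vec<Datum>>,
    right: RefCell<Vec<Datum>>,
}

/// Top-1 monoid value: retains the best row seen so far. `Rc` makes the
/// shared state cheap to clone, but the type is not `Send`, so values
/// cannot cross thread boundaries.
#[derive(Clone)]
struct Top1Monoid {
    row: PackedRow,
    shared: Rc<SharedState>,
}

impl Top1Monoid {
    /// Compare two rows on the shared column order, reusing the shared
    /// buffers instead of unpacking into fresh vectors.
    fn cmp_rows(&self, other: &Self) -> Ordering {
        let mut left = self.shared.left.borrow_mut();
        let mut right = self.shared.right.borrow_mut();
        left.clear();
        left.extend(self.row.0.iter().copied());
        right.clear();
        right.extend(other.row.0.iter().copied());
        for &col in &self.shared.order_by {
            match left[col].cmp(&right[col]) {
                Ordering::Equal => continue,
                non_eq => return non_eq,
            }
        }
        Ordering::Equal
    }

    /// Semigroup "plus": keep whichever row sorts first under ORDER BY.
    fn plus(self, other: Self) -> Self {
        if self.cmp_rows(&other) != Ordering::Greater {
            self
        } else {
            other
        }
    }
}
```

Because the shared state sits behind an `Rc`, cloning a value is cheap, but the type is deliberately not `Send`, which matches the commit message's note that it is unsuitable for sharing across thread boundaries.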
@vmarcos
Contributor

vmarcos commented Jan 19, 2023

This PR extends #17056 with improvements for monotonic top-1 aggregation, namely reuse of the column order across top-1 monoids as well as per-worker pre-aggregation. However, in a local development environment, these improvements have not yet shown performance benefits, as documented in an internal spreadsheet:

| Monotonic | Query Time - 1 worker [ms] | Query Time - 8 workers [ms] | Speedup | Dataflow records | Time Factor - 1 worker | Time Factor - 8 workers | Records Factor |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Top-1 | 2,393.51 | 366.63 | 6.53 | 3 | 1.02 | 1.01 | 1 |

The settings used for the evaluation are the same as those described in #17056 (comment). Here, the time and record factors are relative to main and show no real benefit from the added techniques in the setting evaluated.

After discussion with @antiguru, we felt that the draft PR #17058 adds quite a bit of complexity for a relatively small benefit at this point. So we suggest leaving the top-1 improvement in #17058 in the icebox, perhaps until we can actually see some benefit from per-worker pre-aggregation for top-1, e.g., in distributed dataflow executions.
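As a conceptual illustration of the per-worker pre-aggregation mentioned above, here is a hedged, self-contained Rust sketch; the `GroupKey`, `Row`, and `preaggregate_top1` names are hypothetical and not part of this PR's code. The idea is that each worker first reduces its local share of the input to at most one candidate row per group, so only candidates need to be exchanged to the worker that owns the group.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

// Illustrative types; the real code operates on Materialize rows and
// differential-dataflow collections rather than plain vectors.
type GroupKey = u64;
type Row = Vec<i64>;

/// Reduce one worker's local input to at most one candidate row per group
/// (here "best" simply means lexicographically smallest) before the data
/// is exchanged to the worker responsible for each group.
fn preaggregate_top1(
    local_input: impl IntoIterator<Item = (GroupKey, Row)>,
) -> HashMap<GroupKey, Row> {
    let mut best: HashMap<GroupKey, Row> = HashMap::new();
    for (key, row) in local_input {
        match best.entry(key) {
            Entry::Occupied(mut slot) => {
                // Keep the smaller of the existing candidate and the new row.
                if row < *slot.get() {
                    slot.insert(row);
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(row);
            }
        }
    }
    best
}

fn main() {
    // Two groups, several rows each; only one candidate per group survives.
    let local = vec![
        (1, vec![5, 10]),
        (1, vec![3, 7]),
        (2, vec![8, 1]),
        (2, vec![9, 0]),
    ];
    let candidates = preaggregate_top1(local);
    assert_eq!(candidates[&1], vec![3, 7]);
    assert_eq!(candidates[&2], vec![8, 1]);
}
```

In a single-process, low-volume setup like the one benchmarked above, the exchange traffic saved by this step is small, which is consistent with the lack of observed benefit; the technique would be expected to matter more in distributed dataflow executions.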

@antiguru antiguru closed this Jan 19, 2023
