Specialize Top1Monoid for thread-local aggregation #17056

antiguru · 2023-01-09T22:50:13Z

Introduce a top-1 monoid that shares the column order among all peers and retains buffers for efficient row unpacking. This allows for allocation-free row comparisons where otherwise the row needs to be unpacked into a new vector. The implementation relies on reference-counted shared state, which makes it unsuitable for sharing across thread boundaries.
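As an illustration of the idea described above, here is a minimal, self-contained sketch. It is not the code in this PR: `PackedRow`, `Shared`, `Top1Local` and the `(column, descending)` order encoding are hypothetical stand-ins, and rows are assumed to be plain `i64` columns. It shows how `Rc`-shared state can carry both the column order and reusable unpacking buffers, so that comparing two rows allocates nothing once the buffers have grown.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::rc::Rc;

/// Hypothetical stand-in for a packed row: i64 columns stored as little-endian bytes.
#[derive(Clone, Debug, PartialEq, Eq)]
struct PackedRow(Vec<u8>);

impl PackedRow {
    fn pack(cols: &[i64]) -> Self {
        PackedRow(cols.iter().flat_map(|c| c.to_le_bytes()).collect())
    }

    /// Unpacks into a caller-provided buffer, reusing its allocation.
    fn unpack_into(&self, buf: &mut Vec<i64>) {
        buf.clear();
        for chunk in self.0.chunks_exact(8) {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(chunk);
            buf.push(i64::from_le_bytes(bytes));
        }
    }
}

/// State shared by all monoid instances on one thread (assumed layout).
struct Shared {
    /// (column index, descending?) pairs, most significant first.
    order: Vec<(usize, bool)>,
    left: RefCell<Vec<i64>>,
    right: RefCell<Vec<i64>>,
}

impl Shared {
    fn new(order: Vec<(usize, bool)>) -> Self {
        Shared { order, left: RefCell::new(Vec::new()), right: RefCell::new(Vec::new()) }
    }
}

/// A top-1 accumulator; `Rc` makes it thread-local (`Rc` is not `Send`).
#[derive(Clone)]
struct Top1Local {
    row: PackedRow,
    shared: Rc<Shared>,
}

impl Top1Local {
    /// Compares two rows under the shared order using the shared buffers.
    fn cmp_rows(&self, other: &Self) -> Ordering {
        let mut l = self.shared.left.borrow_mut();
        let mut r = self.shared.right.borrow_mut();
        self.row.unpack_into(&mut l);
        other.row.unpack_into(&mut r);
        for &(col, desc) in &self.shared.order {
            let ord = l[col].cmp(&r[col]);
            let ord = if desc { ord.reverse() } else { ord };
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    }

    /// Monoid "plus": keep whichever row sorts first under the shared order.
    fn plus_equals(&mut self, other: &Self) {
        if self.cmp_rows(other) == Ordering::Greater {
            self.row.0.clone_from(&other.row.0);
        }
    }
}
```

In this sketch every accumulator holds only a row and one reference-counted pointer, so the order and buffers exist once per thread rather than once per value.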

Signed-off-by: Moritz Hoffmann <mh@materialize.com>

Checklist

This PR has adequate test coverage / QA involvement has been duly considered.
This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

src/compute/src/render/top_k.rs

frankmcsherry

This seems nice! It may also be a nice way to get a not-terrible way to do custom ordering for arrangements, too?

antiguru · 2023-01-09T23:06:19Z

> It may also be a nice way to get a not-terrible way to do custom ordering for arrangements, too?

Yes, we'd need to arrange the data prior to creating the monoid and handing it to the arrangement because with Rc we can't Exchange, but arrange_core allows us to bypass this problem.
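The ordering constraint in this reply (move the data first, build the `Rc`-backed state afterwards) can be sketched with the standard library alone. This is an analogy, not timely or differential dataflow code: `std::thread` and `std::sync::mpsc` stand in for the exchange between workers, and `Candidate`, `key` and `local_top1` are hypothetical names. Plain rows are `Send` and cross the thread boundary; the `Rc` is created only on the receiving side.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Compile-time probe: only instantiable for types that may cross threads.
fn assert_send<T: Send>() {}

/// Projects `row` onto the ordering columns.
fn key<'a>(row: &'a [i64], order: &'a [usize]) -> impl Iterator<Item = i64> + 'a {
    order.iter().map(move |&c| row[c])
}

/// A row together with thread-local shared ordering state.
struct Candidate {
    row: Vec<i64>,
    order: Rc<Vec<usize>>,
}

/// Sends plain rows to a worker thread, which wraps them with `Rc` state
/// locally and returns the smallest row under `order`.
fn local_top1(rows: Vec<Vec<i64>>, order: Vec<usize>) -> Option<Vec<i64>> {
    assert_send::<Vec<i64>>();
    // assert_send::<Rc<Vec<usize>>>(); // would not compile: `Rc` is not `Send`

    let (tx, rx) = mpsc::channel::<Vec<i64>>();
    let worker = thread::spawn(move || {
        // The `Rc` is created after the data has crossed the thread boundary.
        let order = Rc::new(order);
        let mut best: Option<Candidate> = None;
        for row in rx {
            let cand = Candidate { row, order: Rc::clone(&order) };
            let replace = match &best {
                None => true,
                Some(cur) => key(&cand.row, &cand.order).lt(key(&cur.row, &cur.order)),
            };
            if replace {
                best = Some(cand);
            }
        }
        // Only the plain row leaves the thread; the `Rc` is dropped here.
        best.map(|c| c.row)
    });

    for row in rows {
        tx.send(row).expect("worker hung up");
    }
    drop(tx); // close the channel so the worker's loop ends
    worker.join().expect("worker panicked")
}
```

Uncommenting the `Rc` probe produces a compile error, which is the same property that rules out exchanging the monoid itself.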


vmarcos

Good stuff! I have extended our measurements in the internal spreadsheet with results for this PR and for #17058. We can see that this PR brings about a clearly positive effect on time measurements.

PR #17058 is a bit harder to evaluate. There appears to be a time cost for top-1 when we increase the number of workers. One would expect a positive effect on memory usage, but I estimate this to be small compared to the total allocated memory in the experiments. So I did not try to measure it.

vmarcos · 2023-01-19T17:57:00Z

Going into a bit more depth, measurements in a local development environment for this PR as documented in an internal spreadsheet with a TPC-H database at scale factor 1 and the monotonic top-k queries in epic MaterializeInc/database-issues#4838 were as follows:

| Monotonic | Query Time - 1 worker [ms] | Query Time - 8 workers [ms] | Speedup | Dataflow records | Time Factor - 1 worker | Time Factor - 8 workers | Records Factor |
|---|---|---|---|---|---|---|---|
| Top-k, small (10) | 4,303.79 | 648.74 | 6.6 | 27 | 1.30 | 1.26 | 1.00 |
| Top-k, large (600000) | 12,192.32 | 15,784.00 | 0.8 | 1,478,994 | 1.47 | 1.08 | 1.00 |

Above, the time and records factors compare this PR with measurements for PR #16813. As we can see, this PR produces lower running times across all configurations evaluated. The gains for small $k$ and 8 workers were above a factor of 1.25. The speedup behavior is also in line with PR #16813. For large $k$, PR #16813 showed no speedup, as expected, since the query does not benefit from parallelism in this setting. In this PR, we see a degradation in query execution time with multiple workers, but the absolute time is reduced with respect to PR #16813, thus still showing a gain.
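For readers checking the numbers, the Speedup column is consistent with dividing the 1-worker time by the 8-worker time and rounding to one decimal. That reading is an assumption inferred from the reported values; the helper names below are hypothetical.

```rust
/// Speedup of the 8-worker run relative to the 1-worker run
/// (assumed definition: 1-worker time divided by 8-worker time).
fn speedup(t1_ms: f64, t8_ms: f64) -> f64 {
    t1_ms / t8_ms
}

/// Rounds to one decimal place, matching the precision in the table.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}
```

Under this reading, `round1(speedup(4303.79, 648.74))` gives 6.6 for small $k$ and `round1(speedup(12192.32, 15784.00))` gives 0.8 for large $k$. A value below 1 means the 8-worker run was slower than the 1-worker run.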

There is some extra code complexity, but the PR illustrates a technique that could be employed in other situations. So we feel that this PR should be merged. We also believe that PR #17058 should at this point not be merged. We provide more details there.

antiguru requested review from frankmcsherry, petrosagg and vmarcos January 9, 2023 22:50

frankmcsherry reviewed Jan 9, 2023

frankmcsherry approved these changes Jan 9, 2023

antiguru force-pushed the Top1MonoidLocal branch from 17d1a88 to 8be2ec7 Compare January 9, 2023 23:03

antiguru force-pushed the Top1MonoidLocal branch from 8be2ec7 to 80657ea Compare January 9, 2023 23:07

antiguru mentioned this pull request Jan 10, 2023

Specialize Top1Monoid for thread-local aggregation #17058

Closed

4 tasks

vmarcos approved these changes Jan 10, 2023

antiguru merged commit 3039cea into MaterializeInc:main Jan 19, 2023

antiguru deleted the Top1MonoidLocal branch January 19, 2023 18:08
