-
Notifications
You must be signed in to change notification settings - Fork 487
Specialize Top1Monoid for thread-local aggregation #17056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
frankmcsherry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems nice! It may also be a nice way to get a not-terrible way to do custom ordering for arrangements, too?
17d1a88 to
8be2ec7
Compare
Yes, we'd need to arrange the data prior to creating the monoid and handing it to the arrangement because with |
Introduce a top-1 monoid that shares the column order among all peers and retains buffers for efficient row unpacking. This allows for allocation-free row comparisons where otherwise the row needs to be unpacked into a new vector. The implementation relies on reference-counted shared state, which makes it unsuitable for sharing across thread boundaries. Signed-off-by: Moritz Hoffmann <mh@materialize.com>
8be2ec7 to
80657ea
Compare
vmarcos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff! I have extended our measurements in the internal spreadsheet with results for this PR and for #17058. We can see that this PR brings about a clearly positive effect on time measurements.
PR #17058 is a bit harder to evaluate. There appears to be a time cost for top-1 when we increase the number of workers. One would expect a positive effect on memory usage, but I estimate this to be small compared to the total allocated memory in the experiments. So I did not try to measure it.
|
Going into a bit more depth, measurements in a local development environment for this PR as documented in an internal spreadsheet with a TPC-H database at scale factor 1 and the monotonic top-k queries in epic MaterializeInc/database-issues#4838 were as follows:
Above, the time and records factors compare this PR with measurements for PR #16813. As we can see, the PR produces lower running times across all configurations evaluated. The gains for small There is some extra code complexity, but the PR illustrates a technique that could be employed in other situations. So we feel that this PR should be merged. We also believe that PR #17058 should at this point not be merged. We provide more details there. |
Introduce a top-1 monoid that shares the column order among all peers and retains buffers for efficient row unpacking. This allows for allocation-free row comparisons where otherwise the row needs to be unpacked into a new vector. The implementation relies on reference-counted shared state, which makes it unsuitable for sharing across thread boundaries.
Signed-off-by: Moritz Hoffmann mh@materialize.com
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way) and therefore is tagged with aT-protolabel.companion cloud PR to account for those changes that is tagged with
the release-blocker label (example).