
Migrate Join logic away from traits #668

Merged
frankmcsherry merged 2 commits into TimelyDataflow:master from frankmcsherry:de_vec_join
Feb 20, 2026

@frankmcsherry
Member

The logic for join has been provided by traits like `Join` and `JoinCore`, whose implementations involve `VecCollection` more than we might like. The `VecCollection` implementations have become inherent methods, which leaves `Arranged` as the only other implementor; it gets some inherent methods and some clean-up.

There are some very clear breaking changes here, and probably a few more to implement.

If you previously used an arrangement to join with a collection, you'll need to use `join_core` instead. Many places called `semijoin` on arrangements, which is just wrong (well, probably correct in practice, but it only behaves as a semijoin because the data happen to have primary keys). These calls also need light changes, generally to use `join_core`. That currently still produces `VecCollection` output, but I'd like to make it easier to avoid that without having to drop all the way down to `join_traces`.
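To illustrate the shape of the migration, here is a small in-memory model, not differential dataflow's API, of a `join_core`-style operator: for every pair of records with matching keys, a user closure receives the key and both values and returns an iterator of outputs, each emitted with the product of the input diffs. The names `join_core_model` and `Updates` are hypothetical. A former `semijoin` call, where the right-hand side holds `(key, ())` pairs, becomes a closure that simply re-emits the left record.

```rust
// In-memory model of a `join_core`-style operator; NOT differential dataflow's API.
// A collection is modeled as a list of ((key, value), diff) updates.
type Updates<K, V> = Vec<((K, V), isize)>;

/// Hypothetical helper: for every pair of updates with equal keys, call `logic`
/// on (key, left value, right value) and emit each result with the product diff.
fn join_core_model<K, V1, V2, D, I, L>(
    left: &Updates<K, V1>,
    right: &Updates<K, V2>,
    mut logic: L,
) -> Vec<(D, isize)>
where
    K: PartialEq,
    I: IntoIterator<Item = D>,
    L: FnMut(&K, &V1, &V2) -> I,
{
    let mut output = Vec::new();
    for ((k1, v1), d1) in left {
        for ((k2, v2), d2) in right {
            if k1 == k2 {
                // The closure may produce zero, one, or many outputs per match.
                for datum in logic(k1, v1, v2) {
                    output.push((datum, d1 * d2));
                }
            }
        }
    }
    output
}

fn main() {
    // Left: (user id, name). Right: keys paired with unit values.
    let users: Updates<u32, &str> = vec![((1, "alice"), 1), ((2, "bob"), 1)];
    let active: Updates<u32, ()> = vec![((1, ()), 1)];

    // What used to be a `semijoin` becomes a closure that re-emits the left record.
    let kept = join_core_model(&users, &active, |k, v, _unit| Some((*k, *v)));
    println!("{:?}", kept); // [((1, "alice"), 1)]
}
```

Returning an iterator rather than a single value is what lets one operator cover projections, filters, and semijoin-like patterns without extra wrapper methods.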

We should probably delete or deprecate `semijoin` and `antijoin`. They are the right "pattern", but they don't actually guarantee that they do what their names say (e.g. they do not ensure that the keys in their collections are distinct). They are also just sugar for one or two lines each, with a lot of boilerplate supporting them.
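A minimal sketch of the point about distinctness, assuming collections are modeled as in-memory `(record, diff)` lists. The functions `semijoin`, `antijoin`, and `consolidate` below are hypothetical model functions, not the library's: "semijoin" joins against a key collection and keeps the left record, and "antijoin" is the input minus that semijoin. Because diffs multiply in a join, a key that appears twice in the key collection doubles the semijoin output and drives the antijoin negative, rather than yielding a true semijoin or antijoin.

```rust
// Model only: these are NOT differential dataflow's `semijoin`/`antijoin`,
// just the short join-and-subtract patterns described in the text.
use std::collections::BTreeMap;

/// Sum diffs per record and drop zero totals, giving a canonical multiset.
fn consolidate<T: Ord + Clone>(updates: Vec<(T, isize)>) -> Vec<(T, isize)> {
    let mut totals: BTreeMap<T, isize> = BTreeMap::new();
    for (record, diff) in updates {
        *totals.entry(record).or_insert(0) += diff;
    }
    totals.into_iter().filter(|(_, diff)| *diff != 0).collect()
}

/// "Semijoin": join with the key collection and keep the left record.
/// Diffs multiply, so a key present twice doubles each match.
fn semijoin<K: Ord + Clone, V: Ord + Clone>(
    data: &[((K, V), isize)],
    keys: &[(K, isize)],
) -> Vec<((K, V), isize)> {
    let mut out = Vec::new();
    for ((k, v), d1) in data {
        for (key, d2) in keys {
            if k == key {
                out.push(((k.clone(), v.clone()), d1 * d2));
            }
        }
    }
    consolidate(out)
}

/// "Antijoin": the input minus its semijoin.
fn antijoin<K: Ord + Clone, V: Ord + Clone>(
    data: &[((K, V), isize)],
    keys: &[(K, isize)],
) -> Vec<((K, V), isize)> {
    let mut out: Vec<((K, V), isize)> = data.to_vec();
    for (record, diff) in semijoin(data, keys) {
        out.push((record, -diff));
    }
    consolidate(out)
}

fn main() {
    let data = vec![((1, 'a'), 1), ((2, 'b'), 1)];
    // Key 1 appears twice: nothing enforces that `keys` is distinct.
    let keys = vec![(1, 1), (1, 1)];
    println!("{:?}", semijoin(&data, &keys)); // [((1, 'a'), 2)]
    println!("{:?}", antijoin(&data, &keys)); // [((1, 'a'), -1), ((2, 'b'), 1)]
}
```

With a distinct key collection both functions behave as their names suggest; the model makes no attempt to enforce that, which mirrors the concern above.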

```rust
/// The underlying `Stream<G, BatchWrapper<T::Batch>>` is a much more efficient way to access the data,
/// and this method should only be used when the data need to be transformed or exchanged, rather than
/// supplied as arguments to an operator using the same key-value structure.
pub fn as_vecs(&self) -> VecCollection<G, (Tr::KeyOwn, Tr::ValOwn), Tr::Diff>
```
Member Author


I forgot to mention that I added this. We should use it everywhere instead of calls to `as_collection` whose logic happens to do the same thing.
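A toy model of the relationship described here, assuming an arrangement can be pictured as an index from keys to `(value, diff)` lists. `ArrangedModel` and its two methods are hypothetical stand-ins, not differential dataflow types: the closure-taking `as_collection` is the general flattening, and `as_vecs` is the special case whose logic just produces owned `(key, value)` pairs, so callers no longer need to spell that closure out.

```rust
// Toy model of an arrangement: keys map to (value, diff) lists.
// `ArrangedModel` and its methods are hypothetical, not differential dataflow's API.
use std::collections::BTreeMap;

struct ArrangedModel<K, V> {
    index: BTreeMap<K, Vec<(V, isize)>>,
}

impl<K: Clone, V: Clone> ArrangedModel<K, V> {
    /// General flattening: apply `logic` to each borrowed (key, value) pair.
    fn as_collection<D, L: FnMut(&K, &V) -> D>(&self, mut logic: L) -> Vec<(D, isize)> {
        let mut out = Vec::new();
        for (key, vals) in &self.index {
            for (val, diff) in vals {
                out.push((logic(key, val), *diff));
            }
        }
        out
    }

    /// The common special case: produce owned (key, value) pairs.
    fn as_vecs(&self) -> Vec<((K, V), isize)> {
        self.as_collection(|k, v| (k.clone(), v.clone()))
    }
}

fn main() {
    let mut index = BTreeMap::new();
    index.insert("apple", vec![(3, 1)]);
    index.insert("pear", vec![(5, 1), (7, -1)]);
    let arranged = ArrangedModel { index };

    // Both produce the same output; `as_vecs` just names the intent.
    let via_closure = arranged.as_collection(|k, v| (k.clone(), v.clone()));
    assert_eq!(via_closure, arranged.as_vecs());
    println!("{:?}", arranged.as_vecs()); // [(("apple", 3), 1), (("pear", 5), 1), (("pear", 7), -1)]
}
```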

Member

@antiguru left a comment


Looks good, apart from the mdbook failures.

@frankmcsherry force-pushed the de_vec_join branch 2 times, most recently from a10bedc to d8dfab3 on February 20, 2026 17:16
@frankmcsherry merged commit fce4a22 into TimelyDataflow:master on Feb 20, 2026
5 checks passed


2 participants