-
Notifications
You must be signed in to change notification settings - Fork 487
adapter: replacement materialized views #34234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2bf4ec7 to
fec6902
Compare
fec6902 to
2db420a
Compare
|
Note that currently it's possible to panic envd by creating an MV that depends on itself, like this: Though this doesn't actually do anything cool, just panics envd 😞 I'll fix this in a follow-up as well. |
| // TODO(alter-mv): Wait until there is overlap between the old MV's write frontier and the | ||
| // new MV's as-of, to ensure no times are skipped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? During planning we should ensure that we're using the existing MV as a dependency and hence should select a as-of that is valid wrt. the old MV. The new MV would eventually catch up to the old MV's write frontier, and then start writing. If the new MV is already past the old MV's write frontier, we should still write the uncompacted history between the two, potentially in a batch spanning many time stamps. That said, I might be missing something, so at least worth explaining more :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider the most simple example of a MV that has only a single input, and we replace that with a MV that also only has a single input.
I1 --> MV
I2 --> RPL
The interesting case is the one where the since of I2 is greater than the upper of MV. In timestamp selection, we cannot select an as-of that's less than the since of the input, so for RPL we'll most likely select I2's since as the as-of. But since that since is greater than MV's upper, if we'd cut over without waiting for MV's upper to catch up, we'd skip times in the output.
It doesn't really matter that RPL also depends on MV in this example. An additional dependency can only push the as-of further into the future, not pull it into the past. We can't select an as-of that's before I2's since, since then we wouldn't be able to read the input data anymore.
But yeah, seems like a good idea to put a comment somewhere!
aljoscha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some comments inline. I think the code is good but needs the follow-ups that you're already aware off, and maybe resolving some of the murkiness around dependencies, sinces, and how that system currently works.
|
|
||
| // For now, we don't support schema evolution for materialized views. | ||
| if &target.desc.latest() != global_lir_plan.desc() { | ||
| return Err(AdapterError::Unstructured(anyhow!("incompatible schemas"))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might eventually want a structured error for this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I left that here because I wanted to look into whether this check should/can be moved to planning instead, as a follow-up.
| .and_then(|r| r.try_step_forward()); | ||
| let until = Antichain::from_iter(until_ts); | ||
|
|
||
| // If this is a replacement MV, ensure that `storage_as_of` > the `since` of the target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shows that we're in somewhat weird territory with the primary/target business. The replacement doesn't really depend on the target, and so normally we wouldn't want these since invariants here. What have you thought about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is needed because in storage collections there is this soft assert. The comment above it explains that we need this for source exports depending on their remap shards. However, the soft assert also applies to storage collections depending on their primaries, where the check doesn't quite make sense.
I didn't want to muck with the deepest storage collections internals here, so I went with the easy workaround.
The replacement doesn't really depend on the target
I would say it does! At least at the SQL-level you can't drop the target without dropping the replacement, as you'd end up with a "dangling pointer". On the storage collection level you could drop them separately, but if you dropped the target GlobalId, there wouldn't be anything left to downgrade the shard's critical since, which seems bad.
| .map(|gid| scx.catalog.resolve_item_id(&gid)) | ||
| .collect(); | ||
|
|
||
| if let Some(id) = replacement_target { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will dropping an MV while there is a live replacement block unless you use CASCADE? And will CASCADE also drop the replacements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! See the tests in replacement-materialized-views.td.
|
Approved now, because we know that we will keep iterating on this one, and it unblocks some of the follow-up work, demos, the like! |
Once materialized views can have multiple historical storage collections associated with them, we'll need to make sure to drop all of those when dropping a materialized view. There is always only a single compute collection to drop. This commit ensures we'll do the right thing and also simplifies the item dropping code a bit in the process: Instead of keeping track of indexes/MVs/CTs to drop, we instead only track compute and storage collections to drop. Doing so allows deleting a bunch of duplicate code.
2db420a to
937ba62
Compare
This commit adds a feature flag to gate the creation and application of replacement materialized views. It's disabled by default and enabled in CI.
937ba62 to
a0bbb0b
Compare
antiguru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left one comment around SQL syntax.
| } | ||
|
|
||
| if let Some(target) = &self.replacing { | ||
| f.write_str(" REPLACING "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about the SQL syntax, but don't have a strong opinion:
replacingmight indicate something continuous, but we're only replacing once.replacesalso seems to be that it actively replaces.replacewould be less opinionated. Seems similar to SQL'sCASCADE.
Just putting this out here, maybe a native speaker has some input!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion from me either! I've put this down as one of the remaining things to figure out in https://github.com/MaterializeInc/database-issues/issues/9903
When a replacement MV is applied, the catalog is updated by dropping the replacement and updating the target MV with a new version. This transfers both the replacement's compute and storage collection to the target MV, so they should not be dropped. The implication application logic must thus be extended to account for that.
For compute collections, it is wrong to assume that all GlobalIds associated with a catalog entry (specifically an MV catalog entry) also map to live compute collections. Instead, only the "write GlobalId" has a live compute collection. Compute collections that were previously maintaining the catalog item have likely been dropped already.
9c732d0 to
4d63825
Compare
|
TFTRs! |
This PR implements an MVP enabling replacement of materialized views. It follows the same ideas as #34032 but models replacement MVs as special MVs instead of introducing a separate catalog item type. This matches the product requirements we arrived at in this Slack thread.
The PR adds the following syntax:
Modelling replacement MVs as materialized views means that the amount of code changes required is much less than in the alternative approach that introduces a new item type. It also means that diagnostic queries that work for MVs also immediately work for replacements, including
SHOW CREATE,EXPLAIN, andEXPLAIN ANALYZE.There are a few special cases of replacement MVs that this PR doesn't address yet:
All of these are left as follow ups.
Motivation
Part of https://github.com/MaterializeInc/database-issues/issues/9903
Tips for reviewer
Commits are meaningful and can be reviewed individually.
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.