Design: replica capacity #31304
Signed-off-by: Moritz Hoffmann <mh@materialize.com>
mgree left a comment:
Looks good to me. As a document, maybe a touch more abstract than we need it? If we're treating disk as elastic memory, do we really mean any resource other than memory? Can CPU starvation cause a cluster to not come up at all, or just not within a timeframe we're happy with?
-->
A core problem of using Materialize today is that users cannot rely on a configuration that works today to continue working in the future.
We give indications about a workload's status, but it is easy to ignore them, or drift into an unsupported configuration through workload changes or unintended DDL operations.
Suggested change:
We give indications about a workload's status, but it is easy to ignore them, misunderstand them, or drift into an unsupported configuration through workload changes or unintended DDL operations.
At the moment, we present detailed metrics, such as memory, CPU and disk utilization, for replicas, with the hope that the metrics characterize the health of a replica and allow users to make scaling decisions.
While the metrics are useful from an operational perspective, they are not suitable for predicting the future behavior of a replica.
Specifically, they are a poor predictor of whether a replica can successfully restart, since resource utilization during restart can differ from steady state.
Suggested change:
Specifically, memory and disk utilization on their own are a poor, easily misunderstood predictor of whether a replica can successfully restart, since resource utilization during restart almost certainly differs from steady state.
In the absence of advance analysis of query plans, it's better to look at observed metrics.
We know the steady-state memory utilization and ignore other signals like the size of its inputs.
The required resources are within a factor of two of the steady-state utilization.
Suggested change:
The memory required to restart and hydrate a replica is within a factor of two of the steady-state utilization.
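Under the factor-of-two hypothesis, the restart and hydration memory of an index can be bounded from its observed steady-state memory alone. The following is a minimal sketch, assuming steady-state memory is known in bytes per index; `HYDRATION_FACTOR` and `index_restart_bound` are hypothetical names for illustration, not part of Materialize.

```python
# Hypothetical constant encoding the design's hypothesis: hydrating an index
# needs at most twice its steady-state memory. Not a Materialize setting.
HYDRATION_FACTOR = 2.0

GIB = 1024 ** 3


def index_restart_bound(steady_state_bytes: int) -> float:
    """Estimated upper bound on the memory needed to restart and hydrate one index.

    Only the observed steady-state memory is used; other signals, such as
    the size of the index's inputs, are deliberately ignored.
    """
    if steady_state_bytes < 0:
        raise ValueError("steady-state memory must be non-negative")
    return HYDRATION_FACTOR * steady_state_bytes


# An index using 3 GiB at steady state may need up to 6 GiB while rehydrating.
print(index_restart_bound(3 * GIB) / GIB)  # 6.0
```

The appeal of this model is that it needs no query-plan analysis: it relies only on a metric that is already observed.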
### Materialized views
Materialized views suffer from the same problem as indexes, but they do not necessarily maintain their output in memory.
The resource requirements can be approximated as within a factor of two of its steady-state plus the output size.
Suggested change:
The memory required to restart and hydrate a materialized view can be approximated as within a factor of two of its steady-state utilization, plus the output size.
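For a materialized view, a sketch of the estimate might look as follows. It assumes the rule reads as twice the steady-state memory plus the output size, with the factor not applied to the output; the other reading, twice the sum, would be a one-line change. The names are hypothetical.

```python
# Hypothetical constant for the design's factor-of-two hypothesis.
HYDRATION_FACTOR = 2.0

GIB = 1024 ** 3


def materialized_view_restart_bound(steady_state_bytes: int, output_bytes: int) -> float:
    """Estimated memory to restart and hydrate one materialized view.

    Assumed reading of the rule: twice the steady-state memory, plus the
    size of the view's output (the factor is not applied to the output).
    """
    if steady_state_bytes < 0 or output_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return HYDRATION_FACTOR * steady_state_bytes + output_bytes


# 1 GiB of steady-state memory and a 4 GiB output: 2 * 1 + 4 = 6 GiB.
print(materialized_view_restart_bound(1 * GIB, 4 * GIB) / GIB)  # 6.0
```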
### Sources and sinks
@antiguru cannot say much about non-compute objects. :/
One hopes that the memory required is linear in the snapshot size?
### Combining resource requirements
Once we know the resource requirements of each object, we can estimate the resources required to successfully restart a replica.
You say "resource" but do you really mean anything other than memory?
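Restricting "resources" to memory, as the comment suggests, combining per-object estimates could be sketched as below. This assumes that all compute objects on a replica hydrate concurrently after a restart, so their bounds add up; `ComputeObject`, `replica_restart_bound`, and `can_restart` are hypothetical names, and sources and sinks are left out because the design does not yet model them.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical constant for the design's factor-of-two hypothesis.
HYDRATION_FACTOR = 2.0
GIB = 1024 ** 3


@dataclass
class ComputeObject:
    """Hypothetical per-object record; field names are illustrative."""
    name: str
    steady_state_bytes: int
    # Output size only matters for materialized views; 0 for indexes.
    output_bytes: int = 0


def object_restart_bound(obj: ComputeObject) -> float:
    # Twice the steady-state memory, plus any output the object produces.
    return HYDRATION_FACTOR * obj.steady_state_bytes + obj.output_bytes


def replica_restart_bound(objects: Iterable[ComputeObject]) -> float:
    # Assumes every object hydrates at the same time after a restart,
    # so the per-object bounds simply add up.
    return sum(object_restart_bound(o) for o in objects)


def can_restart(objects: Iterable[ComputeObject], replica_memory_bytes: int) -> bool:
    return replica_restart_bound(objects) <= replica_memory_bytes


objects = [
    ComputeObject("idx", steady_state_bytes=3 * GIB),
    ComputeObject("mv", steady_state_bytes=1 * GIB, output_bytes=4 * GIB),
]
print(replica_restart_bound(objects) / GIB)  # 12.0
print(can_restart(objects, 16 * GIB))        # True
print(can_restart(objects, 8 * GIB))         # False
```

Assuming concurrent hydration gives a conservative bound; whether it is tight depends on how hydration is actually scheduled.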
like to skip or delay it.
-->
Before implementing any of this in code, we can validate the hypotheses by observing how Materialize behaves in production.
Are these questions we can answer using historical data?
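If historical data records each replica's steady-state memory and its peak memory during hydration, checking the factor-of-two hypothesis reduces to a simple ratio test. A minimal sketch, assuming such pairs can be extracted from existing metrics; the sample format and `fraction_within_factor` are hypothetical.

```python
from typing import Iterable, Tuple


def fraction_within_factor(
    samples: Iterable[Tuple[float, float]], factor: float = 2.0
) -> float:
    """Share of observed restarts whose peak memory stayed within `factor`
    times the steady-state memory.

    Each sample is a hypothetical (steady_state_bytes, peak_hydration_bytes)
    pair, e.g. gathered from historical replica metrics.
    """
    total = 0
    within = 0
    for steady, peak in samples:
        total += 1
        if peak <= factor * steady:
            within += 1
    if total == 0:
        raise ValueError("need at least one sample")
    return within / total


# Ratios 1.5, 1.95, 2.5 and 2.0: three of four restarts fit the hypothesis.
print(fraction_within_factor([(1, 1.5), (2, 3.9), (1, 2.5), (4, 8)]))  # 0.75
```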
-->
* Should we have a finer-grained model that can take sequential hydration into account?
  While possible, it would require us to exercise more control over what gets hydrated in which order, which I think is a separate problem.
Yes: rehydration is a capacity scheduling problem.
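To illustrate why hydration order turns this into a scheduling problem, the sketch below compares a concurrent model with a sequential one. It assumes that an object needs twice its steady-state memory while hydrating and drops to its steady-state memory once hydrated, and that already-hydrated objects keep their memory. The function names are hypothetical, and this is not a proposed scheduler.

```python
from typing import Sequence

# Hypothetical constant for the design's factor-of-two hypothesis.
HYDRATION_FACTOR = 2.0


def concurrent_peak(steady: Sequence[float]) -> float:
    # All objects hydrate at once, so every object is at its peak together.
    return sum(HYDRATION_FACTOR * s for s in steady)


def sequential_peak(steady: Sequence[float]) -> float:
    # Objects hydrate one at a time in the given order. While object k
    # hydrates, objects 0..k-1 already sit at their steady-state memory.
    peak = 0.0
    hydrated = 0.0
    for s in steady:
        peak = max(peak, hydrated + HYDRATION_FACTOR * s)
        hydrated += s
    return peak


print(concurrent_peak([1, 4]))  # 10.0
print(sequential_peak([1, 4]))  # 9.0
print(sequential_peak([4, 1]))  # 8.0
```

Under these assumptions, sequential hydration lowers the peak, and the order itself changes the result, which is why controlling the order is a separate problem.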
Design for a replica capacity measurement.
cc @mgree
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.