@antiguru (Member) commented Feb 6, 2025

Design for a replica capacity measurement.

cc @mgree

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Signed-off-by: Moritz Hoffmann <mh@materialize.com>
@mgree (Contributor) left a comment

Looks good to me. As a document, maybe a touch more abstract than we need it? If we're treating disk as elastic memory, do we really mean any resource other than memory? Can CPU starvation cause a cluster to not come up at all, or just not within a timeframe we're happy with?

-->

A core problem of using Materialize today is that users cannot rely on a configuration that works today to continue working in the future.
We give indications about a workload's status, but it is easy to ignore them, or drift into an unsupported configuration by workload changes or unintended DDL operations.
Contributor

Suggested change
- We give indications about a workload's status, but it is easy to ignore them, or drift into an unsupported configuration by workload changes or unintended DDL operations.
+ We give indications about a workload's status, but it is easy to ignore them, misunderstand them, or drift into an unsupported configuration by workload changes or unintended DDL operations.


At the moment, we present detailed metrics, such as memory, CPU and disk utilization, for replicas, with the hope that the metrics successfully characterize the health of a replica, and allow users to make scaling decisions.
While the metrics are useful from an operational perspective, they are not suitable to predict the future behavior of a replica.
Specifically, they are a bad predictor for whether a replica can successfully restart, since the resource utilization during restart can be different from steady-state.
Contributor

Suggested change
- Specifically, they are a bad predictor for whether a replica can successfully restart, since the resource utilization during restart can be different from steady-state.
+ Specifically, memory/disk utilization on their own are a bad/easy-to-misunderstand predictor for whether a replica can successfully restart, since the resource utilization during restart is almost certainly different from steady-state.

In the absence of advance analysis of query plans, it's better to look at observed metrics.

We know the steady-state memory utilization and ignore other signals like the size of its inputs.
The required resources are within a factor of two from the steady-state utilization.
Contributor

Suggested change
- The required resources are within a factor of two from the steady-state utilization.
+ The memory resources required to restart and hydrate a replica are within a factor of two from the steady-state utilization.

### Materialized views

Materialized views suffer from the same problem as indexes, but they do not necessarily maintain their output in memory.
The resource requirements can be approximated as within a factor of two of its steady-state plus the output size.
Contributor

Suggested change
- The resource requirements can be approximated as within a factor of two of its steady-state plus the output size.
+ The memory resource requirements to restart and hydrate can be approximated as within a factor of two of its steady-state plus the output size.


### Sources and sinks

@antiguru cannot say much about non-compute objects. :/
Contributor

One hopes that the memory required is linear in the snapshot size?


### Combining resource requirements

Once we know the resource requirements of each object we can estimate the resources required to successfully restart a replica.
Contributor

You say "resource" but do you really mean anything other than memory?
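
For concreteness, here is a minimal sketch of the combination step under the memory-only reading suggested in this review. It is not taken from the design doc or from Materialize's code: the names (`ObjectUsage`, `restart_estimate`, `can_restart`) are hypothetical, "within a factor of two" is treated as an upper bound of `2 * steady_state` (plus the output size for materialized views), and all objects are assumed to hydrate at the same time. Sources and sinks are left out because the doc gives no estimate for them.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    INDEX = "index"
    MATERIALIZED_VIEW = "materialized_view"


@dataclass(frozen=True)
class ObjectUsage:
    """Observed memory figures for one object (hypothetical shape)."""
    name: str
    kind: Kind
    steady_state_bytes: int
    output_bytes: int = 0  # only used for materialized views


# Assumption: the doc's "within a factor of two" used as an upper bound.
HYDRATION_FACTOR = 2


def restart_estimate(obj: ObjectUsage) -> int:
    """Upper-bound memory estimate to restart and hydrate one object."""
    estimate = HYDRATION_FACTOR * obj.steady_state_bytes
    if obj.kind is Kind.MATERIALIZED_VIEW:
        # Materialized views additionally account for their output size.
        estimate += obj.output_bytes
    return estimate


def replica_restart_estimate(objects: list[ObjectUsage]) -> int:
    """Sum per-object estimates, assuming everything hydrates concurrently."""
    return sum(restart_estimate(o) for o in objects)


def can_restart(objects: list[ObjectUsage], capacity_bytes: int) -> bool:
    """True if the combined estimate fits into the replica's memory capacity."""
    return replica_restart_estimate(objects) <= capacity_bytes
```

Summing per-object bounds is deliberately pessimistic. A model that accounts for sequential hydration would need a different combination than a plain sum.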

like to skip or delay it.
-->

Before implementing any of this idea in code, we can validate the hypotheses by observing how Materialize behaves in production.
Contributor

Are these questions we can answer using historical data?

-->

* Should we have a finer-grained model that can take sequential hydration into account?
While possible, it would require us to exercise more control over what gets hydrated in which order, which I think is a separate problem.
Contributor

Yes: rehydration is a capacity scheduling problem.