[FLINK-39016][Runtime/REST] Add configurable TTL for ExecutionGraph cache independent of web refresh interval #27509
 * Time-to-live for cached ExecutionGraph. If not set, defaults to the value of {@link
 * #REFRESH_INTERVAL}.
 *
 * <p>Setting this to 0 (or a very small value) means the cache will always fetch fresh data,
In the comments it says "0 (or a very small value)" but the other comments only talk of 0. I think we should be explicit in the text as to what the smallest number is that would cause caching.
Good catch! You're right that the Javadoc and withDescription were inconsistent. Looking at the DefaultExecutionGraphCache implementation, only a value of exactly 0 deterministically disables caching (since currentTime < currentTime is always false). A "very small value" would only probabilistically result in cache misses, not guarantee them. I've updated the Javadoc to remove the ambiguous "(or a very small value)" phrasing, making it consistent with the withDescription text — both now explicitly state that setting this to 0 disables caching.
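The expiry check described above can be illustrated with a minimal sketch. This is not the actual DefaultExecutionGraphCache code: the class `SingleEntryCache`, its fields, and the injected clock are hypothetical names used only for illustration, and the sketch assumes an entry counts as valid only while `now < expiresAt`.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class TtlCacheSketch {

    /** Hypothetical single-entry cache; an illustration, not Flink's implementation. */
    static final class SingleEntryCache<T> {
        private final long ttlMillis;
        private final LongSupplier clock; // injected so the example is deterministic
        private T value;
        private long expiresAt;
        private boolean populated;
        int loads; // counts how often the loader was invoked

        SingleEntryCache(long ttlMillis, LongSupplier clock) {
            this.ttlMillis = ttlMillis;
            this.clock = clock;
        }

        T get(Supplier<T> loader) {
            long now = clock.getAsLong();
            // The entry is served only while now < expiresAt. With a TTL of 0,
            // expiresAt equals the load time, so this check fails even when the
            // next lookup happens within the same millisecond.
            if (populated && now < expiresAt) {
                return value;
            }
            value = loader.get();
            expiresAt = now + ttlMillis;
            populated = true;
            loads++;
            return value;
        }
    }

    public static void main(String[] args) {
        long[] fakeNow = {1_000L}; // the clock never advances in this example

        SingleEntryCache<String> zeroTtl = new SingleEntryCache<>(0, () -> fakeNow[0]);
        zeroTtl.get(() -> "graph");
        zeroTtl.get(() -> "graph");
        System.out.println("TTL 0 ms loads: " + zeroTtl.loads);

        SingleEntryCache<String> tinyTtl = new SingleEntryCache<>(1, () -> fakeNow[0]);
        tinyTtl.get(() -> "graph");
        tinyTtl.get(() -> "graph");
        System.out.println("TTL 1 ms loads: " + tinyTtl.loads);
    }
}
```

With the clock held fixed, the zero-TTL cache reloads on every lookup, while the 1 ms cache serves the second lookup from the cached entry. This is why a small non-zero value cannot be documented as disabling caching.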
<td><h5>web.execution-graph.cache-ttl</h5></td>
<td style="word-wrap: break-word;">(none)</td>
<td>Duration</td>
<td>Time-to-live for cached ExecutionGraph. If not set, defaults to the value of '<code class="highlighter-rouge">web.refresh-interval</code>'. Setting this to 0 means the cache will always fetch fresh data, which is useful for real-time state synchronization scenarios.</td>
The only reference to web.refresh-interval I see in the docs is the web.refresh-interval option itself.
The fact that the config parameter starts with web. implies that it is for the Web UI. There are no other parameters starting with web.execution-graph, and the execution graph is a Flink internal concept. How should I understand the context in which this config option applies?
Thanks for the review! You are absolutely right. The web. prefix is reserved for Web UI-facing configurations, and execution-graph is a Flink internal concept that doesn't fit in this namespace. I've moved the config option to RestOptions with the key rest.cache.execution-graph.timeout, following the same naming convention as the existing rest.cache.checkpoint-statistics.timeout. The documentation has been updated accordingly.
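The option's documentation states that, when no value is set, the cache timeout defaults to the refresh interval. A minimal sketch of that fallback follows, assuming the behavior is unchanged after the move to rest.cache.execution-graph.timeout. The method `effectiveCacheTimeout` and the example durations are hypothetical and do not come from the PR.

```java
import java.time.Duration;
import java.util.Optional;

public class CacheTimeoutFallbackSketch {

    /**
     * Hypothetical helper: returns the configured cache timeout if present,
     * otherwise falls back to the refresh interval.
     */
    static Duration effectiveCacheTimeout(
            Optional<Duration> configuredTimeout, Duration refreshInterval) {
        // An explicitly configured zero is a present value and is kept as-is;
        // only an absent value triggers the fallback.
        return configuredTimeout.orElse(refreshInterval);
    }

    public static void main(String[] args) {
        Duration refreshInterval = Duration.ofSeconds(3); // example value only

        System.out.println(effectiveCacheTimeout(Optional.empty(), refreshInterval));
        System.out.println(effectiveCacheTimeout(Optional.of(Duration.ZERO), refreshInterval));
        System.out.println(effectiveCacheTimeout(Optional.of(Duration.ofMillis(500)), refreshInterval));
    }
}
```

Modeling "not set" as an empty Optional keeps it distinct from an explicit 0, so disabling the cache and inheriting the refresh interval remain separate cases.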
2a894f6 to b13a852
 * Tests that ExecutionGraph cache TTL can be set to zero for real-time state synchronization.
 */
@Test
void testExecutionGraphCacheTTLZeroValue() {
This zero value test can be merged into testExecutionGraphCacheTTLCustomValue.
Please rebase onto the latest master branch and squash all commits.
b13a852 to 46cf2fc
…ache independent of web refresh interval
46cf2fc to 3744421
What is the purpose of the change
This pull request introduces a new configuration option web.execution-graph.cache-ttl that allows users to configure the TTL (Time-to-Live) for the ExecutionGraph cache independently from the web.refresh-interval.
Previously, the ExecutionGraph cache TTL was implicitly tied to the web refresh interval, which made it difficult for users who need real-time job state synchronization (e.g., for monitoring dashboards or orchestration systems) to get fresh ExecutionGraph data without affecting the overall web UI refresh behavior.
With this change, users can:
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation