Skip to content

Conversation

@zacw7
Copy link
Member

@zacw7 zacw7 commented Dec 12, 2025

== RELEASE NOTES ==

Prestissimo
* Add ``ssd-cache-max-entries`` config to limit SSD cache entries and prevent unbounded metadata memory growth (default value: 10M entries, ~500MB metadata).  :pr:`26795`

@zacw7 zacw7 requested review from a team, elharo and steveburnett as code owners December 12, 2025 22:52
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Dec 12, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 12, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Adds a configurable upper bound on the number of SSD cache entries and wires it through system configuration to the SSD cache setup, with a default of 10M entries (~500MB metadata).

Sequence diagram for propagating SSD cache maxEntries into SsdCache

sequenceDiagram
  participant PrestoServer
  participant SystemConfig
  participant SsdCacheConfig
  participant SsdCache

  PrestoServer->>SystemConfig: ssdCacheReadVerificationEnabled()
  SystemConfig-->>PrestoServer: bool
  PrestoServer->>SystemConfig: ssdCacheMaxEntries()
  SystemConfig-->>PrestoServer: uint64_t
  PrestoServer->>SsdCacheConfig: create cacheConfig(..., ssdCacheReadVerificationEnabled, ssdCacheMaxEntries)
  SsdCacheConfig-->>PrestoServer: cacheConfig
  PrestoServer->>SsdCache: create SsdCache(cacheConfig)
Loading

Class diagram for new SSD cache maxEntries configuration flow

classDiagram
  class SystemConfig {
    +bool ssdCacheReadVerificationEnabled()
    +uint64_t ssdCacheMaxEntries()
  }
  class PrestoServer {
    +std::unique_ptr~velox::cache::SsdCache~ setupSsdCache()
  }
  class SsdCacheConfig {
    +SsdCacheConfig(uint64_t ssdCacheSize, uint64_t ssdCacheCheckpointSize, bool asyncCacheSsdDisableFileCow, bool ssdCacheChecksumEnabled, bool ssdCacheReadVerificationEnabled, uint64_t ssdCacheMaxEntries)
  }
  class SsdCache {
    +SsdCache(SsdCacheConfig cacheConfig)
  }

  PrestoServer --> SystemConfig : uses
  PrestoServer --> SsdCacheConfig : creates
  PrestoServer --> SsdCache : creates
  SsdCacheConfig --> SsdCache : configures
Loading

File-Level Changes

Change Details Files
Introduce a new ssd-cache-max-entries system configuration property with a default of 10M entries.
  • Declare new config key constant kSsdCacheMaxEntries and document its semantics and default in SystemConfig
  • Register kSsdCacheMaxEntries as a numeric property with default 10,000,000 in the SystemConfig constructor
  • Expose a typed accessor ssdCacheMaxEntries() returning uint64_t via optionalProperty
presto-native-execution/presto_cpp/main/common/Configs.h
presto-native-execution/presto_cpp/main/common/Configs.cpp
Plumb the new SSD cache max-entries limit into SSD cache initialization and document the config.
  • Extend PrestoServer::setupSsdCache() to pass ssdCacheMaxEntries() into the velox::cache::SsdCache configuration
  • Update Presto C++ properties documentation to describe the new ssd-cache-max-entries configuration option and its default
presto-native-execution/presto_cpp/main/PrestoServer.cpp
presto-docs/src/main/sphinx/presto_cpp/properties.rst

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The new ssdCacheMaxEntries config uses uint64_t, but most config numerics are signed; consider using a signed type and explicitly validating non-negative values to avoid surprising wraparound if a negative value is configured.
  • The comment states that 0 means "no limit" for ssdCacheMaxEntries, but this behavior isn't visible in the changes here; ensure that the underlying SsdCache / cacheConfig actually implements this semantic so the config description matches behavior.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `ssdCacheMaxEntries` config uses `uint64_t`, but most config numerics are signed; consider using a signed type and explicitly validating non-negative values to avoid surprising wraparound if a negative value is configured.
- The comment states that `0` means "no limit" for `ssdCacheMaxEntries`, but this behavior isn't visible in the changes here; ensure that the underlying `SsdCache` / `cacheConfig` actually implements this semantic so the config description matches behavior.

## Individual Comments

### Comment 1
<location> `presto-docs/src/main/sphinx/presto_cpp/properties.rst:553-554` </location>
<code_context>
+* **Default value:** ``10000000``
+
+  Maximum number of entries allowed in the SSD cache. A value of 0 means no limit.
+  When the limit is reached, new entry writes will be skipped.
+
+  The default of 10 million entries keeps metadata memory usage around 500MB, as each
</code_context>

<issue_to_address>
**nitpick (typo):** Consider rephrasing "new entry writes" for clarity

You could instead say “writes of new entries” or “writes of additional entries” so it’s clearer that only writes of new entries are skipped once the limit is reached.

```suggestion
  Maximum number of entries allowed in the SSD cache. A value of 0 means no limit.
  When the limit is reached, writes of new entries will be skipped.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zacw7 zacw7 requested a review from kewang1024 December 12, 2025 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:Meta PR from Meta

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants