Add a memory-bound FileStatisticsCache for the ListingTable #20047
Which issue does this PR close?
This change introduces a default, size-limited FileStatisticsCache implementation for the ListingTable, carrying out the steps proposed in #19052 (comment):
Added heap-size estimation for file statistics and the related data types used in caching (a temporary measure until "Add heap memory estimation for statistics" #19599 and "Add a crate for HeapSize trait" arrow-rs#9138 are resolved).
Redesigned DefaultFileStatisticsCache around an LruQueue to make it memory-bound, following "Adds memory-bound DefaultListFilesCache" #18855.
Introduced a size limit that, together with the heap_size estimate, bounds the memory used by the cache.
Moved FileStatisticsCache creation into CacheManager, making it session-scoped and shared across statements and listing tables.
Disabled caching in some of the SQL logic tests whose expected output changed as a result.
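The steps above can be illustrated with a small std-only Rust sketch. Each cached entry is charged an estimated size, entries are kept in least-recently-used order, and the oldest entries are evicted until the running total fits within the size limit. This is not DataFusion's DefaultFileStatisticsCache or LruQueue: `FileStats`, `EstimateHeapSize` and `BoundedStatsCache` are hypothetical names, and the size arithmetic is a rough approximation.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for per-file statistics (not DataFusion's Statistics type).
#[derive(Clone, Debug, PartialEq)]
struct FileStats {
    num_rows: usize,
    column_names: Vec<String>,
}

/// Hypothetical heap-size estimate, standing in for a real HeapSize trait.
trait EstimateHeapSize {
    fn heap_size(&self) -> usize;
}

impl EstimateHeapSize for FileStats {
    // Approximation: the Vec's String headers plus the bytes of each string.
    fn heap_size(&self) -> usize {
        self.column_names.len() * std::mem::size_of::<String>()
            + self.column_names.iter().map(String::len).sum::<usize>()
    }
}

struct Entry {
    stats: FileStats,
    size: usize, // estimated bytes charged against the limit
    tick: u64,   // last-access counter, used as the LRU ordering key
}

/// Memory-bounded LRU cache keyed by file path.
struct BoundedStatsCache {
    limit: usize,
    used: usize,
    tick: u64,
    entries: HashMap<String, Entry>,
    lru: BTreeMap<u64, String>, // smallest tick = least recently used
}

impl BoundedStatsCache {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0, tick: 0, entries: HashMap::new(), lru: BTreeMap::new() }
    }

    /// Estimated cost of one entry: key bytes + inline struct size + heap estimate.
    fn entry_size(path: &str, stats: &FileStats) -> usize {
        path.len() + std::mem::size_of::<FileStats>() + stats.heap_size()
    }

    fn get(&mut self, path: &str) -> Option<&FileStats> {
        self.tick += 1;
        let tick = self.tick;
        let entry = self.entries.get_mut(path)?;
        // Move the entry to the most-recently-used end of the order.
        self.lru.remove(&entry.tick);
        entry.tick = tick;
        self.lru.insert(tick, path.to_string());
        Some(&entry.stats)
    }

    fn remove(&mut self, path: &str) -> Option<FileStats> {
        let old = self.entries.remove(path)?;
        self.lru.remove(&old.tick);
        self.used -= old.size;
        Some(old.stats)
    }

    fn put(&mut self, path: &str, stats: FileStats) {
        // Drop any previous entry for this path so its size is not counted twice.
        self.remove(path);
        let size = Self::entry_size(path, &stats);
        if size > self.limit {
            return; // can never fit, so it is not cached at all
        }
        // Evict least recently used entries until the new one fits.
        while self.used + size > self.limit {
            let (_, key) = self.lru.pop_first().expect("used > 0 means the cache is non-empty");
            let old = self.entries.remove(&key).expect("lru and entries stay in sync");
            self.used -= old.size;
        }
        self.tick += 1;
        self.lru.insert(self.tick, path.to_string());
        self.entries.insert(path.to_string(), Entry { stats, size, tick: self.tick });
        self.used += size;
    }
}
```

In this sketch a limit of 0 caches nothing, because every entry has a non-zero estimated size. To make one cache session-scoped and shared across statements and listing tables, one option would be to hold it behind something like `Arc<Mutex<BoundedStatsCache>>`.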
Closes "Add a default FileStatisticsCache implementation for the ListingTable" #19217.
Closes "Add limit to DefaultFileStatisticsCache" #19052.
Rationale for this change
See above.
What changes are included in this PR?
See above.
Are these changes tested?
Yes.
Are there any user-facing changes?
A new runtime setting, datafusion.runtime.file_statistics.cache_limit, which sets the size limit of the file statistics cache.
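The PR text does not say what value format `datafusion.runtime.file_statistics.cache_limit` accepts. Assuming, hypothetically, a human-readable size string such as "64M", the sketch below shows how such a value could be turned into the byte count that the cache compares its estimated heap usage against. `parse_size_limit` is a hypothetical helper, not a DataFusion API.

```rust
/// Hypothetical parser for a size such as "512K", "64M" or "1G".
/// Suffixes are treated as powers of 1024; a bare number is taken as bytes.
/// Returns None for malformed input or on overflow.
fn parse_size_limit(s: &str) -> Option<usize> {
    let s = s.trim();
    // The suffixes are ASCII, so dropping the last byte is a valid char boundary.
    let (digits, multiplier) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 1024usize),
        'M' | 'm' => (&s[..s.len() - 1], 1024 * 1024),
        'G' | 'g' => (&s[..s.len() - 1], 1024 * 1024 * 1024),
        _ => (s, 1),
    };
    digits.trim().parse::<usize>().ok()?.checked_mul(multiplier)
}
```

Returning `None` rather than falling back to a default keeps a typo in the setting from silently producing an unexpected limit.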