Remove file prefetching from FileStream#20916
Conversation
Simplify the FileStream state machine by removing the mechanism that opens the next file in parallel while scanning the current one. Files are now opened sequentially (Scan -> Idle -> Open) instead of prefetching. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
run benchmarks |
|
Benchmark job started for this request (job |
|
Benchmark job started for this request (job |
|
Benchmark job started for this request (job |
This reverts commit 38fe60a.
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usagetpch — base (merge-base)
tpch — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usagetpcds — base (merge-base)
tpcds — branch
|
| .metrics | ||
| .clone(); | ||
| let _timer = scanning_total_metric.timer(); | ||
| self.start_next_file().transpose() |
There was a problem hiding this comment.
So this @alamb is what I was mostly talking about. It will read the footer (what we want) but AFAIK also:
- build the pruning predicate (I think this is suboptimal, too early)
- prune row groups
- optionally load the page index
- return the stream (without driving that forward)
We should be able to do this much better with the IO / CPU separation.
There was a problem hiding this comment.
This makes sense to me 👍🏻
|
🤖 |
|
🤖: Benchmark completed Details
|
|
run benchmarks |
|
Benchmark job started for this request (job |
|
Benchmark job started for this request (job |
|
Benchmark job started for this request (job |
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_pushdown — base (merge-base)
clickbench_pushdown — branch
|
|
The benchmarks with S3-like latency show that it is pretty ineffective (or even counterproductive, see query 6 of ClickBench) at this point. Let's remove this to simplify this code and make room for the better CPU/IO pipelining? |
adriangb
left a comment
There was a problem hiding this comment.
Yep agreed let’s move forward with this, the logic and data both support it.
|
I think we should ask @thinkharderdev about this one as he put in the original prefetch as I recall Maybe we could also put out a call to some other downstream users (via the mailing list or something) to ask them to test What I worry about is that if we remove this feature and someone was relying on it downstream they will hit a regression without any way to revert / go back |
|
Yeah good call. I posted on discord and the mailing list. |
Agreed - let's bake it a bit more to gather feedback thanks @adriangb @alamb |
thinkharderdev
left a comment
There was a problem hiding this comment.
Makes sense. The original goal here was to pipeline the IO but current file opening blends too much IO and CPU work for this to be effective I think
|
run benchmark supported? |
|
Hi @Dandandan, thanks for the request (#20916 (comment)). Supported benchmarks:
Usage: Per-side configuration ( env:
SHARED_SETTING: enabled
baseline:
ref: v45.0.0
env:
DATAFUSION_RUNTIME_MEMORY_LIMIT: 1G
changed:
ref: v46.0.0
env:
DATAFUSION_RUNTIME_MEMORY_LIMIT: 2G |
|
run benchmark tpch10 clickbench_partitioned clickbench_1 clickbench_extended |
|
Thanks all, let's wait a couple of days for further feedback - I launched some more benchmarks to check them all |
|
run benchmark tpch10 clickbench_partitioned clickbench_1 clickbench_extended |
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_extended — base (merge-base)
clickbench_extended — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_1 — base (merge-base)
clickbench_1 — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_1 — base (merge-base)
clickbench_1 — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usageclickbench_partitioned — base (merge-base)
clickbench_partitioned — branch
|
|
🤖 Benchmark completed (GKE) | trigger Details
Resource Usagetpch10 — base (merge-base)
tpch10 — branch
|
Simplify the FileStream state machine by removing the mechanism that opens the next file in parallel while scanning the current one. Files are now opened sequentially (Scan -> Idle -> Open) instead of prefetching.
Which issue does this PR close?
Rationale for this change
Current prefetching is ineffective or even counterproductive, see e.g. query 6 when testing against simulated S3 latency:
Let's remove it so we can make room for better prefetching taking care of pruning / late materilization better, cache behavior and IO/CPU split.
What changes are included in this PR?
Remove this API / prefetching to make room for more impactful prefetching,
Are these changes tested?
Existing tests.
Are there any user-facing changes?
Yes,
NextOpennext: Option<NextOpen>,are removed ofpubmembers.