Expose ExecutionPlan statistics across the FFI boundary#22157
Expose ExecutionPlan statistics across the FFI boundary#22157mailmindlin wants to merge 12 commits into
ExecutionPlan statistics across the FFI boundary#22157Conversation
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
Not sure why it shows up on the FFI enum but not the original
97fb89e to
a171b9e
Compare
There was a problem hiding this comment.
Pull request overview
This PR plumbs ExecutionPlan::partition_statistics and TableProvider::statistics across the datafusion-ffi boundary so out-of-process plugins can report real statistics to the planner instead of always returning Statistics::new_unknown/None. It reuses the existing datafusion_proto_common::Statistics prost encoding (the same approach already used for filter expressions) rather than mirroring ScalarValue in #[repr(C)].
Changes:
- New
datafusion_ffi::statisticsmodule withserialize_statistics/deserialize_statisticshelpers backed bydatafusion-proto-common. - New
partition_statisticsfunction pointer onFFI_ExecutionPlanand correspondingForeignExecutionPlan::partition_statisticsimplementation (Result-propagating). - New
statisticsfunction pointer onFFI_TableProviderand correspondingForeignTableProvider::statisticsimplementation (logs +debug_assert!on decode error since the trait can only returnOption).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
datafusion/ffi/src/lib.rs |
Registers the new statistics module. |
datafusion/ffi/src/statistics.rs |
New helpers wrapping prost round-trip with three unit round-trip tests. |
datafusion/ffi/src/execution_plan.rs |
Adds partition_statistics FFI field, wrapper, and ForeignExecutionPlan impl; extends EmptyExec test helper with with_statistics; adds a round-trip integration test. |
datafusion/ffi/src/table_provider.rs |
Adds statistics FFI field, wrapper, and ForeignTableProvider::statistics impl with log::warn!/debug_assert! on decode failure; adds a TableWithStats-based round-trip test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
timsaucer
left a comment
There was a problem hiding this comment.
Looking good, but a few things worth addressing
| /// `datafusion_proto_common::Statistics`. | ||
| pub partition_statistics: unsafe extern "C" fn( | ||
| plan: &Self, | ||
| partition: FFI_Option<u64>, |
There was a problem hiding this comment.
why FFI_Option<u64> instead of FFI_Option<usize>?
There was a problem hiding this comment.
I'm not sure I understand what you mean
There was a problem hiding this comment.
Fascinating. If you type in FFI_Option<u64> without the back ticks then it doesn't show the <u64>. added the ticks
into_iter().collect() is hard to optimize if you're not cheating like the stdlib
Which issue does this PR close?
ExecutionPlanstatistics across the FFI boundary #22152Rationale for this change
ExecutionPlan::partition_statisticsandTableProvider::statisticsare not currently transported across the DataFusion FFI boundary, so foreign plans and providers always reportStatistics::new_unknown/None. This blocks optimizer rules that depend on statistics (e.g. join reordering, partition pruning) from working with out-of-process plugins, which defeats the point of exposing those hooks to plugin authors.StatisticscontainsPrecision<ScalarValue>for column min/max/sum.ScalarValueis a large enum that's impractical to mirror in#[repr(C)], so I reuse the existingdatafusion_proto_common::Statisticsprost encoding — the same pattern this crate already uses for filter expressions.What changes are included in this PR?
datafusion_ffi::statisticsmodule with[de]serialize_statisticshelpers wrapping thedatafusion_proto_common::Statisticsround-trip.partition_statisticsfield onFFI_ExecutionPlanand correspondingExecutionPlan::partition_statisticsimpl onForeignExecutionPlanstatisticsfield onFFI_TableProviderand correspondingTableProvider::statisticsimpl onForeignTableProvider. Since the trait returnsOption<Statistics>, the implementation cannot propagate decode errors, it logs alog::warn!and triggers adebug_assert!.This PR is expected to be merged after #22136 so it includes those changes.
Are these changes tested?
Yes:
statistics.rscover three round-trip cases:Statistics::new_unknown, fully-exact statistics withScalarValue::Int32/Int64/Utf8min/max/sum, and mixedPrecision::Exact/Inexact/Absentvalues.execution_plan.rsexercisesForeignExecutionPlan::partition_statisticswith bothNoneandSome(idx)partitions, against a plan with no statistics (returnsStatistics::new_unknown) and a plan with concrete statistics.table_provider.rsuses a thinTableWithStatswrapper overMemTableto verify both theNonepath and the concreteStatisticspath throughForeignTableProvider::statistics.Are there any user-facing changes?
This is a breaking ABI change for the
datafusion-fficrate:FFI_ExecutionPlangains apartition_statisticsfield.FFI_TableProvidergains astatisticsfield.Plugins compiled against earlier versions of
datafusion-ffiwill need to be recompiled. There are no breaking changes to the Rust trait surface or toStatisticsitself; downstreamExecutionPlan/TableProviderimplementations require no changes.