Add Python bindings for accessing ExecutionMetrics by ShreyeshArangath · Pull Request #1381 · apache/datafusion-python

ShreyeshArangath · 2026-02-15T01:52:11Z

Which issue does this PR close?

Rationale for this change

Today, DataFusion Python only exposes execution metrics through formatted console output via explain(analyze=True). This makes it difficult to programmatically inspect execution behavior.

There is currently no structured Python API to access per-operator metrics such as output_rows, elapsed_compute, spill_count, and other runtime metrics collected during execution.

This PR introduces APIs to surface the execution metrics, mirroring the Rust API in datafusion::physical_plan::metrics.

What changes are included in this PR?

Added plan caching to PyDataFrame so the physical plan used during execution is retained and available for metrics access.
Kept the metrics() method and added a collect_metrics() helper that walks the execution plan tree and aggregates metrics from all operators.
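The tree walk described above can be sketched in plain Python. PlanNode is a hypothetical stand-in for the bound ExecutionPlan (its label plays the role of node.display(), which is what operator_name ended up being), and skipping nodes that report no metrics is an assumption of this sketch, not documented behavior of collect_metrics().

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for an ExecutionPlan node (not the real binding)."""

    label: str  # plays the role of node.display()
    metrics: dict[str, int] | None = None  # None when the node reports nothing
    children: list[PlanNode] = field(default_factory=list)


def collect_metrics(root: PlanNode) -> list[tuple[str, dict[str, int]]]:
    """Return (operator_name, metrics) pairs, parents before children."""
    result: list[tuple[str, dict[str, int]]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.metrics is not None:
            result.append((node.label, node.metrics))
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))
    return result


plan = PlanNode(
    "ProjectionExec",
    {"output_rows": 2},
    [PlanNode("FilterExec", {"output_rows": 2}, [PlanNode("DataSourceExec", {"output_rows": 3})])],
)
for operator_name, metrics in collect_metrics(plan):
    print(f"{operator_name}: {metrics['output_rows']} rows")
```

An explicit stack is used instead of recursion so that very deep plans cannot hit Python's recursion limit.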

Are there any user-facing changes?

Users can now programmatically access execution metrics:

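  from datafusion import SessionContext

  ctx = SessionContext()
  ctx.from_pydict({"x": [1, 2, 3]}, name="t")  # register a small table named t
  df = ctx.sql("SELECT * FROM t WHERE x > 1")
  df.collect()  # execute the query so operators record their metrics
  plan = df.execution_plan()
  metrics = plan.collect_metrics()
  for operator_name, metrics_set in metrics:
      print(f"{operator_name}: {metrics_set.output_rows} rows")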

timsaucer

At a high level, I think this could bring a lot of value. Thank you for putting in the work!

From an implementation perspective, did you consider, instead of caching the prior execution plan, simply adding collect(), execute_stream(), and so forth to PyExecutionPlan? That seems like it would more closely mirror the upstream repo and simplify the code. I haven't spent much time going through the details of why you're caching the prior plan, so it's very possible I missed something.

ShreyeshArangath · 2026-02-20T05:58:50Z

@timsaucer Thanks for the suggestion! When I initially designed the change, I did consider moving collect() / execute_*() onto the plan object. The reason I didn't go that route had more to do with how observability fits into real usage patterns (from the cases that I have seen).

Today, I think users naturally treat a DataFrame as the primary handle for a query:

df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
batches = df.collect()

Requiring metrics to go through ExecutionPlan would effectively change the model to something like this:

df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
plan = df.execution_plan()
batches = plan.collect()
metrics = plan.collect_metrics()

I thought this would force users to restructure pipelines and thread a plan object through call chains just to access metrics. The level of effort (LoE) required to get people to adopt it seemed high to me.

My goal was to let users access metrics with minimal changes, without changing how they run queries:

df = ctx.sql("SELECT * FROM t WHERE column1 > 1")
batches = df.collect()
plan = df.execution_plan()
metrics = plan.collect_metrics()

I'm happy to switch to the plan-based approach if we prefer stronger alignment with the upstream API, but I leaned toward this design to make observability easier to adopt without disrupting current usage patterns. Lmk what you think.

timsaucer

First off, I love this PR!

I've become convinced that your approach is better than what I was suggesting with regard to making them create a plan and execute it!

One area I am concerned about is that display() bypasses all of this mechanism. That is good and bad. The good part is that the metrics would be different anyway, because the smaller collection that happens during display ends early. The bad part is that it's probably confusing for a user to see the data but then be told that we don't have the metrics for the data in front of them. What do you think?

The biggest area that I think is really necessary is user-facing documentation. I'm willing to chip in and help with this if you need. We want to tell users how to use these metrics, both mechanically (for example, that the DataFrame must have been executed) and what information they provide. Plus, the metrics differ depending on which stage of the plan you get them from, and some metrics are reported per partition rather than as aggregate values.

timsaucer · 2026-02-25T21:07:14Z

+    def metrics(self) -> MetricsSet | None:
+        """Return metrics for this plan node after execution, or None if unavailable."""
+        raw = self._raw_plan.metrics()
+        if raw is None:
+            return None
+        return MetricsSet(raw)


This is leading me to think we should have some high level documentation, probably in the DataFrame page (or a subpage under it). Some of the things it would be good to do are to explain to a user what kinds of information they could find under these metrics and why that data are not available until after the DataFrame has been executed.

+1, I think that would be super helpful. I can extend this to include a new user-facing RST page covering things like what metrics are, when they're available, how the physical plan tree maps to operators, etc.

timsaucer · 2026-02-25T21:07:48Z

+        """Walk the plan tree and collect metrics from all operators.
+
+        Returns a list of (operator_name, MetricsSet) tuples.


"Walk the plan tree and collect metrics" probably does not make a lot of sense to someone other than a developer. I think we can make this more user focused.

I haven't dug in, but is operator_name the name of the execution plan?

Updated! operator_name was meant to be the node.display() string.

timsaucer · 2026-02-25T21:10:19Z

+    Provides both individual metric access and convenience aggregations
+    across partitions.


A bit of an explanation is probably useful here. Again, I don't think we can assume the user understands that there are both individual execution plan metrics and aggregate ones. I think that some operators have metrics that cannot be aggregated. In general, I suspect we really do need some high-level documentation with examples we can point to that makes all of this more concrete.

On second read I now see this is aggregating across partitions. So does that mean the metrics() fn is returning per partition metrics for one ExecutionPlan? Asking for my understanding mostly.
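To make the partition question concrete: an operator can report the same metric name once per partition, and the convenience properties combine those entries into one value. The sketch below is plain Python with a hypothetical Metric record. Summing counts and nanosecond timers while taking the earliest start and latest end timestamp is my understanding of how upstream DataFusion aggregates by name, stated here as an assumption rather than something this PR confirms.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Metric:
    """One metric reported by one partition of one operator (hypothetical shape)."""

    name: str
    partition: int | None
    value: int | datetime


def aggregate_by_name(metrics: list[Metric]) -> dict[str, int | datetime]:
    """Combine per-partition values into one value per metric name."""
    combined: dict[str, int | datetime] = {}
    for m in metrics:
        if m.name not in combined:
            combined[m.name] = m.value
        elif m.name == "start_timestamp":
            combined[m.name] = min(combined[m.name], m.value)  # earliest start
        elif m.name == "end_timestamp":
            combined[m.name] = max(combined[m.name], m.value)  # latest end
        else:
            combined[m.name] += m.value  # counts and nanosecond timers add up
    return combined


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2026, 1, 1, 0, 0, 5, tzinfo=timezone.utc)
per_partition = [
    Metric("output_rows", 0, 10),
    Metric("output_rows", 1, 7),
    Metric("start_timestamp", 0, t1),
    Metric("start_timestamp", 1, t0),
]
print(aggregate_by_name(per_partition))
```

This is also why some metrics "cannot be aggregated" by a single rule: adding up two start timestamps is meaningless, so each kind needs its own combining function.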

timsaucer · 2026-02-25T21:12:27Z

+
+    @property
+    def elapsed_compute(self) -> int | None:
+        """Sum of elapsed_compute across all partitions, in nanoseconds."""


We probably want to describe what elapsed_compute is rather than assume user knowledge.

timsaucer · 2026-02-25T21:12:42Z

+
+    @property
+    def spill_count(self) -> int | None:
+        """Sum of spill_count across all partitions."""


Same with spill count. Do you know what units it has?

From my understanding, it counts spill-to-disk events (so it has no units). Updated the doc.
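For illustration, user-facing docstrings along the lines requested in these two comments might look like the following. The MetricsSummary class and its constructor are hypothetical; describing elapsed_compute as CPU time spent inside the operator follows my reading of upstream DataFusion's metric docs, and describing spill_count as a count of spill-to-disk events follows the reply above.

```python
from __future__ import annotations


class MetricsSummary:
    """Hypothetical wrapper: metric name -> list of per-partition values."""

    def __init__(self, per_partition: dict[str, list[int]]) -> None:
        self._values = per_partition

    def _sum(self, name: str) -> int | None:
        values = self._values.get(name)
        return sum(values) if values else None

    @property
    def elapsed_compute(self) -> int | None:
        """CPU time this operator spent doing its own work, in nanoseconds.

        Summed across partitions, so on a parallel run it can exceed the
        wall-clock time of the query. Returns None if the operator does not
        report this metric.
        """
        return self._sum("elapsed_compute")

    @property
    def spill_count(self) -> int | None:
        """Number of times this operator spilled intermediate data to disk.

        A plain count with no units, summed across partitions. Returns None
        if the operator does not report this metric.
        """
        return self._sum("spill_count")


m = MetricsSummary({"elapsed_compute": [1_500_000, 2_500_000], "spill_count": [0, 1]})
print(f"{m.elapsed_compute / 1_000_000:.1f} ms, {m.spill_count} spills")
```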

timsaucer · 2026-02-25T21:20:59Z

+        let df = self.df.as_ref().clone();
+        let plan = wait_for_future(py, df.create_physical_plan())?
+            .map_err(PyDataFusionError::from)?;
+        *self.last_plan.lock() = Some(Arc::clone(&plan));
+        let task_ctx = Arc::new(self.df.as_ref().task_ctx());
+        let batches = wait_for_future(py, df_collect(plan, task_ctx))?


If I run collect() twice on a DataFrame, should we instead take the lock on last_plan and clone the cached plan? I suspect that in the vast majority of cases there isn't a huge performance difference compared with how you have it.

timsaucer · 2026-02-25T21:22:31Z

+        if let Some(plan) = self.last_plan.lock().as_ref() {
+            return Ok(PyExecutionPlan::new(Arc::clone(plan)));
+        }
        let plan = wait_for_future(py, self.df.as_ref().clone().create_physical_plan())??;
        Ok(plan.into())


If you go the route of using the existing last_plan for collect() like in my other comment, then I think you could set it here just like you do in collect().

timsaucer · 2026-02-25T21:22:54Z

+        let plan = wait_for_future(py, df.create_physical_plan())?
+            .map_err(PyDataFusionError::from)?;
+        *self.last_plan.lock() = Some(Arc::clone(&plan));
+        let task_ctx = Arc::new(self.df.as_ref().task_ctx());


It feels like we're doing this in a bunch of places, so maybe make a private helper function.
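A Python analogue of the private helper suggested here, combined with the plan-reuse idea from the previous comments: one method returns the cached plan if there is one and otherwise creates and caches it, so collect(), execution_plan(), and the other terminal operations stay consistent. This is a sketch only; the real code is Rust, and PlanCache and the create_plan callable (standing in for create_physical_plan()) are hypothetical. One caveat worth checking in the real implementation: re-executing a cached plan may accumulate metrics across runs rather than reset them.

```python
from __future__ import annotations

import threading
from typing import Callable, Generic, TypeVar

P = TypeVar("P")


class PlanCache(Generic[P]):
    """Hypothetical analogue of PyDataFrame.last_plan guarded by a mutex."""

    def __init__(self, create_plan: Callable[[], P]) -> None:
        self._create_plan = create_plan
        self._lock = threading.Lock()
        self._last_plan: P | None = None

    def get_or_create(self) -> P:
        """Return the cached plan, creating and caching it on first use."""
        with self._lock:
            if self._last_plan is None:
                self._last_plan = self._create_plan()
            return self._last_plan


created: list[str] = []


def make_plan() -> str:
    # Records each call so we can see that planning happens only once.
    created.append("plan")
    return f"plan-{len(created)}"


cache = PlanCache(make_plan)
first = cache.get_or_create()
second = cache.get_or_create()
```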

timsaucer · 2026-02-25T21:26:49Z

+        self.metrics.output_rows()
+    }
+
+    /// Returns the sum of all `elapsed_compute` metrics in nanoseconds, or None if not present.


There are a lot of boilerplate comments like this, where the function is self-explanatory and not exposed to the end user.

timsaucer · 2026-02-25T21:27:41Z

+
+    /// Returns the numeric value of this metric, or None for non-numeric types.
+    #[getter]
+    fn value(&self) -> Option<usize> {


It feels like we could return Option<Py<PyAny>> and try casting the value appropriately.

Agreed! Attempted to fix this, lmk what you think.
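A rough Python-side sketch of returning a value typed by metric kind instead of always a usize. MetricKind and convert_value are hypothetical names, and the kinds only loosely follow upstream's MetricValue variants (counts, nanosecond timers, timestamps). Timestamps are converted with integer arithmetic only, which sidesteps the floating point precision concern raised later in this review.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from enum import Enum


class MetricKind(Enum):
    COUNT = "count"  # e.g. output_rows, spill_count
    TIME_NANOS = "time"  # e.g. elapsed_compute
    TIMESTAMP_NANOS = "timestamp"  # e.g. start_timestamp, end_timestamp
    OTHER = "other"  # values with no sensible scalar form


_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_value(kind: MetricKind, raw: int | None) -> int | datetime | None:
    """Map a raw metric value to the most natural Python type for its kind."""
    if raw is None or kind is MetricKind.OTHER:
        return None
    if kind is MetricKind.TIMESTAMP_NANOS:
        # Integer arithmetic only; Python datetimes stop at microseconds.
        return _EPOCH + timedelta(microseconds=raw // 1_000)
    return raw  # counts and nanosecond timers are already plain integers
```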

ShreyeshArangath · 2026-03-19T20:47:32Z

Apologies for the delayed response on this one 😅

One area I am concerned about is that display() bypasses all of this mechanism. That is good and bad. The good part is that the metrics would be different anyway, because the smaller collection that happens during display ends early. The bad part is that it's probably confusing for a user to see the data but then be told that we don't have the metrics for the data in front of them. What do you think?

That's a totally fair concern, and I think it's worth addressing; it's definitely going to trip people up. I'm still thinking it through, but one high-level idea is to have display() also cache its plan so users can at least inspect metrics from the display execution (I haven't looked into what would be required to support this). The metrics would reflect the truncated run, but timestamps and compute times would still be meaningful. WDYT?

For documentation, I've already started some work in this PR; please do lmk what you think (you likely have a lot more context on what a user might expect).

timsaucer · 2026-03-27T18:18:13Z

I was taking a look at this to see if we want to include it in DF53, but after rebase I'm getting two problems:

Metrics are set on DataSourceExec even before it's executed, so the unit test that expects None is failing. This does expose that we need a good explanation of when metrics exist and when they don't.
test_union_distinct is now causing a panic:

FAILED python/tests/test_dataframe.py::test_union_distinct - pyo3_runtime.PanicException: partition not used yet


timsaucer · 2026-04-06T11:48:21Z

How is this coming along? We've got a short list of PRs to merge before we start the release on 53, and I'm wondering if you think this is close to ready for review, so we can decide whether to include it in this release or hold it for the next.

ShreyeshArangath · 2026-04-06T16:19:18Z

How is this coming along? We've got a short list of PRs to merge before we start the release on 53, and I'm wondering if you think this is close to ready for review, so we can decide whether to include it in this release or hold it for the next.

Just trying to fix the last couple of build issues, but functionality-wise I think this is ready for review. It'd be nice to include it in the next release. Thanks!

timsaucer

This is some excellent work!

I am a little torn about including it in the DF53 release because it is a decent change to one of the most important and fundamental APIs in the library and it hasn't spent any time on main for other developers to use/test it.

Would you mind if we merge this into main immediately after we release DF53? That would let it soak on main for a while so more of our devs experience the change in case there are some edge cases we haven't thought of.

timsaucer · 2026-04-08T18:20:41Z

+        }
+    }
+
+    fn value_as_datetime<'py>(&self, py: Python<'py>) -> PyResult<Option<Bound<'py, PyAny>>> {


This is a @property on the Python side, so should we have #[getter]?

timsaucer · 2026-04-08T18:34:53Z

+Metrics are populated only **after** the DataFrame has been executed.
+Execution is triggered by any of the terminal operations:


It's actually up to each executor to decide when the metrics are calculated, right? Aren't there some that may populate metrics ahead of time? I think that's part of what I found when looking at #1381 (comment).

timsaucer · 2026-04-08T18:43:29Z

+    fn timestamp_to_pyobject<'py>(
+        py: Python<'py>,
+        ts: &Timestamp,
+    ) -> PyResult<Option<Bound<'py, PyAny>>> {
+        match ts.value() {
+            Some(dt) => {
+                let nanos = dt.timestamp_nanos_opt().ok_or_else(|| {
+                    PyErr::new::<pyo3::exceptions::PyOverflowError, _>("timestamp out of range")
+                })?;
+                let datetime_mod = py.import("datetime")?;
+                let datetime_cls = datetime_mod.getattr("datetime")?;
+                let tz_utc = datetime_mod.getattr("timezone")?.getattr("utc")?;
+                let secs = nanos / 1_000_000_000;
+                let micros = (nanos % 1_000_000_000) / 1_000;
+                let result = datetime_cls.call_method1(
+                    "fromtimestamp",
+                    (secs as f64 + micros as f64 / 1_000_000.0, tz_utc),
+                )?;
+                Ok(Some(result))
+            }
+            None => Ok(None),
+        }
+    }
+}


My agent suggested this to avoid the floating point precision loss:

fn timestamp_to_pyobject<'py>(
    py: Python<'py>,
    ts: &Timestamp,
) -> PyResult<Option<Bound<'py, PyAny>>> {
    match ts.value() {
        Some(dt) => {
            let datetime_mod = py.import("datetime")?;
            let datetime_cls = datetime_mod.getattr("datetime")?;
            let tz_utc = datetime_mod.getattr("timezone")?.getattr("utc")?;
            // year()/month()/day() come from chrono::Datelike and
            // hour()/minute()/second() from chrono::Timelike.
            let result = datetime_cls.call1((
                dt.year(),
                dt.month(),
                dt.day(),
                dt.hour(),
                dt.minute(),
                dt.second(),
                dt.timestamp_subsec_micros(),
                tz_utc,
            ))?;
            Ok(Some(result))
        }
        None => Ok(None),
    }
}

nice, updated :)

ShreyeshArangath · 2026-04-08T22:41:35Z

Would you mind if we merge this into main immediately after we release DF53? That would let it soak on main for a while so more of our devs experience the change in case there are some edge cases we haven't thought of.

Yes, that's completely reasonable :) It is a lot of new code, so I do expect issues; we can polish this for the next release.

ShreyeshArangath changed the title ~~feat: add Python bindings for accessing ExecutionMetrics~~ Add Python bindings for accessing ExecutionMetrics Feb 15, 2026

ShreyeshArangath marked this pull request as ready for review February 15, 2026 01:53

timsaucer reviewed Feb 18, 2026


ShreyeshArangath added 2 commits February 25, 2026 10:20

feat: add Python bindings for accessing ExecutionMetrics

697de36

test: imporve tests

0a57da6

ShreyeshArangath force-pushed the feat/support-metrics branch from 075e1ec to 0a57da6 February 25, 2026 18:21

ShreyeshArangath requested a review from timsaucer February 25, 2026 18:21

timsaucer reviewed Feb 25, 2026


timsaucer mentioned this pull request Feb 25, 2026

Improve online documentation page for DataFrame #1397

Open

ShreyeshArangath added 2 commits March 19, 2026 13:33

first round of reviews

e1d0c81

plan caching

7200857

ShreyeshArangath added 5 commits March 29, 2026 15:48

address some concerns

d2b6c9f

Merge branch 'main' into feat/support-metrics

30ec047

# Conflicts: # crates/core/src/metrics.rs # uv.lock

merge and address comments

a8623c2

fix Ci issues

a0ddc25

attempt to fix lint

afe8df8

fix build

71e20ed

fix docstring

98d5904

ShreyeshArangath requested a review from timsaucer April 8, 2026 17:21

timsaucer approved these changes Apr 8, 2026


timsaucer reviewed Apr 8, 2026


address some more comments

7631a82

timsaucer merged commit 8a7efea into apache:main Apr 13, 2026
21 checks passed