
Commit eb3c194

timsaucer and claude committed
docs: clarify Python UDF inlining docstring; drop unresolved :doc: refs
Rewrite with_python_udf_inlining docstring for readability and remove references to /user-guide/io/distributing_work, which does not exist yet. Keep security warning inline as a .. warning:: Security block, matching the existing pattern in Expr.to_bytes / from_bytes / __reduce__. The central doc will land in a follow-on PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 14178db commit eb3c194

3 files changed

Lines changed: 30 additions & 40 deletions


python/datafusion/context.py

Lines changed: 29 additions & 36 deletions
@@ -1771,42 +1771,35 @@ def with_physical_extension_codec(self, codec: Any) -> SessionContext:
         return new
 
     def with_python_udf_inlining(self, *, enabled: bool) -> SessionContext:
-        """Toggle inline encoding of Python-defined UDFs on this session.
-
-        ``enabled`` is keyword-only:
-        ``with_python_udf_inlining(enabled=False)`` reads at the call
-        site as the inverse of
-        ``with_python_udf_inlining(enabled=True)``, where a positional
-        ``True`` / ``False`` would not.
-
-        When ``True`` (the default), Python scalar, aggregate, and window
-        UDFs travel inside the serialized expression and are
-        reconstructed on the receiver without pre-registration.
-
-        Set ``False`` to:
-
-        * Produce serialized bytes that round-trip through a non-Python
-          decoder (cross-language portability). UDFs are stored by name
-          only; the receiver must have matching registrations.
-        * Refuse to reconstruct Python UDFs from
-          :meth:`Expr.from_bytes` input that may come from an untrusted
-          source — ``cloudpickle.loads`` will not be invoked.
-
-        The toggle applies directly to :meth:`Expr.to_bytes` /
-        :meth:`Expr.from_bytes` calls that pass this session as their
-        ``ctx`` argument. To make the toggle apply through
-        :func:`pickle.dumps` (which calls :meth:`Expr.to_bytes` with no
-        context), install this session as the driver's sender context
-        via :func:`datafusion.ipc.set_sender_ctx` — and install it as
-        the worker's context via
-        :func:`datafusion.ipc.set_worker_ctx` for the corresponding
-        :func:`pickle.loads`.
-
-        For the full security model, see
-        :doc:`/user-guide/io/distributing_work` (Security section). In
-        short: this toggle narrows only the :meth:`Expr.from_bytes`
-        surface; :func:`pickle.loads` on untrusted bytes remains
-        unsafe regardless of the toggle.
+        """Control whether Python UDFs are embedded in serialized expressions.
+
+        When ``enabled=True`` (the default), serialized expressions carry
+        the Python code for any scalar, aggregate, or window UDFs they
+        reference. The receiver rebuilds the UDFs from those bytes and
+        does not need to register them first.
+
+        When ``enabled=False``, serialized expressions store only the
+        UDF names. This has two uses:
+
+        * **Cross-language portability.** The bytes can be decoded by a
+          non-Python receiver, which must already have UDFs registered
+          under matching names.
+        * **Safer deserialization.** :meth:`Expr.from_bytes` will refuse
+          to rebuild Python UDFs rather than call ``cloudpickle.loads``
+          on untrusted input.
+
+        The setting affects :meth:`Expr.to_bytes` and
+        :meth:`Expr.from_bytes` whenever this session is passed as the
+        ``ctx`` argument. :func:`pickle.dumps` and :func:`pickle.loads`
+        do not pass a context, so to apply the setting through pickle,
+        register this session with
+        :func:`datafusion.ipc.set_sender_ctx` on the sender and
+        :func:`datafusion.ipc.set_worker_ctx` on the receiver.
+
+        .. warning:: Security
+            This setting narrows only :meth:`Expr.from_bytes`. Calling
+            :func:`pickle.loads` on untrusted bytes remains unsafe
+            regardless of the toggle.
         """
         new_internal = self.ctx.with_python_udf_inlining(enabled)
         new = SessionContext.__new__(SessionContext)
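
The two modes the new docstring describes can be illustrated with a toy model. This is a minimal sketch and not DataFusion's implementation: `encode_udf`, `decode_udf`, the `registry` dict, and the JSON envelope are all hypothetical, and `marshal` stands in for `cloudpickle` so the example needs only the standard library. Like `cloudpickle.loads`, `marshal.loads` must not be run on untrusted input, which is why the decoder refuses inlined code when `allow_inline` is false.

```python
import base64
import json
import marshal
import types


def encode_udf(name, func, *, inline):
    """Serialize a UDF reference; embed its code only when inline=True."""
    payload = {"name": name}
    if inline:
        # Hypothetical stand-in for cloudpickle: ship the code object by value.
        raw = marshal.dumps(func.__code__)
        payload["code"] = base64.b64encode(raw).decode("ascii")
    return json.dumps(payload).encode("utf-8")


def decode_udf(data, *, registry, allow_inline):
    """Rebuild an inlined UDF, or look it up by name in the registry."""
    payload = json.loads(data.decode("utf-8"))
    if "code" in payload:
        if not allow_inline:
            # Mirrors the "refuse to rebuild" behaviour: no code is loaded.
            raise ValueError("inlined UDF rejected: inlining is disabled")
        code = marshal.loads(base64.b64decode(payload["code"]))
        return types.FunctionType(code, {})
    # By-name mode: the receiver must already have a matching registration.
    return registry[payload["name"]]


def double(x):
    return x * 2


inline_blob = encode_udf("double", double, inline=True)
by_name_blob = encode_udf("double", double, inline=False)

# Inlined bytes are self-sufficient; by-name bytes need the registry.
rebuilt = decode_udf(inline_blob, registry={}, allow_inline=True)
looked_up = decode_udf(
    by_name_blob, registry={"double": double}, allow_inline=False
)
```

In this sketch the by-name bytes contain no Python code at all, which is what would let a non-Python receiver decode them.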

python/datafusion/expr.py

Lines changed: 1 addition & 2 deletions
@@ -454,8 +454,7 @@ def to_bytes(self, ctx: SessionContext | None = None) -> bytes:
         Built-in functions and Python UDFs (scalar, aggregate, window)
         travel inside the returned bytes; the worker does not need to
         pre-register them. UDFs imported via the FFI capsule protocol
-        travel by name only and must be registered on the worker. See
-        :doc:`/user-guide/io/distributing_work`.
+        travel by name only and must be registered on the worker.
 
         .. warning:: Security
             Bytes returned here may embed a cloudpickled Python
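
The security warnings in these docstrings rest on a property of pickle itself: loading bytes can call an arbitrary callable chosen by whoever produced them. A minimal standard-library sketch of that property follows. The `Payload` class is hypothetical, and the harmless `operator.mul` stands in for whatever an attacker would actually choose.

```python
import operator
import pickle


class Payload:
    """Object whose pickled form names a callable to run at load time."""

    def __reduce__(self):
        # pickle records (callable, args) and invokes it inside loads().
        return (operator.mul, (6, 7))


blob = pickle.dumps(Payload())

# loads() calls operator.mul(6, 7); no Payload instance is rebuilt.
result = pickle.loads(blob)
```

Because the callable is picked by the sender, no setting applied after `pickle.loads` has started can make untrusted bytes safe, which is consistent with the warning that the toggle narrows only `Expr.from_bytes`.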

python/datafusion/ipc.py

Lines changed: 0 additions & 2 deletions
@@ -72,8 +72,6 @@ def init_worker():
 inlining on). The sender context only affects pickle / ``to_bytes``
 encoding; explicit ``expr.to_bytes(ctx)`` calls still use the supplied
 ``ctx``.
-
-See :doc:`/user-guide/io/distributing_work` for the full pattern.
 """
 
 from __future__ import annotations
