@@ -1771,42 +1771,35 @@ def with_physical_extension_codec(self, codec: Any) -> SessionContext:
17711771 return new
17721772
17731773 def with_python_udf_inlining(self, *, enabled: bool) -> SessionContext:
1774- """Toggle inline encoding of Python-defined UDFs on this session.
1775-
1776- ``enabled`` is keyword-only:
1777- ``with_python_udf_inlining(enabled=False)`` reads at the call
1778- site as the inverse of
1779- ``with_python_udf_inlining(enabled=True)``, where a positional
1780- ``True`` / ``False`` would not.
1781-
1782- When ``True`` (the default), Python scalar, aggregate, and window
1783- UDFs travel inside the serialized expression and are
1784- reconstructed on the receiver without pre-registration.
1785-
1786- Set ``False`` to:
1787-
1788- * Produce serialized bytes that round-trip through a non-Python
1789- decoder (cross-language portability). UDFs are stored by name
1790- only; the receiver must have matching registrations.
1791- * Refuse to reconstruct Python UDFs from
1792- :meth:`Expr.from_bytes` input that may come from an untrusted
1793- source — ``cloudpickle.loads`` will not be invoked.
1794-
1795- The toggle applies directly to :meth:`Expr.to_bytes` /
1796- :meth:`Expr.from_bytes` calls that pass this session as their
1797- ``ctx`` argument. To make the toggle apply through
1798- :func:`pickle.dumps` (which calls :meth:`Expr.to_bytes` with no
1799- context), install this session as the driver's sender context
1800- via :func:`datafusion.ipc.set_sender_ctx` — and install it as
1801- the worker's context via
1802- :func:`datafusion.ipc.set_worker_ctx` for the corresponding
1803- :func:`pickle.loads`.
1804-
1805- For the full security model, see
1806- :doc:`/user-guide/io/distributing_work` (Security section). In
1807- short: this toggle narrows only the :meth:`Expr.from_bytes`
1808- surface; :func:`pickle.loads` on untrusted bytes remains
1809- unsafe regardless of the toggle.
1774+ """Control whether Python UDFs are embedded in serialized expressions.
1775+
1776+ When ``enabled=True`` (the default), serialized expressions carry
1777+ the Python code for any scalar, aggregate, or window UDFs they
1778+ reference. The receiver rebuilds the UDFs from those bytes and
1779+ does not need to register them first.
1780+
1781+ When ``enabled=False``, serialized expressions store only the
1782+ UDF names. This has two uses:
1783+
1784+ * **Cross-language portability.** The bytes can be decoded by a
1785+ non-Python receiver, which must already have UDFs registered
1786+ under matching names.
1787+ * **Safer deserialization.** :meth:`Expr.from_bytes` will refuse
1788+ to rebuild Python UDFs rather than call ``cloudpickle.loads``
1789+ on untrusted input.
1790+
1791+ The setting affects :meth:`Expr.to_bytes` and
1792+ :meth:`Expr.from_bytes` whenever this session is passed as the
1793+ ``ctx`` argument. :func:`pickle.dumps` and :func:`pickle.loads`
1794+ do not pass a context, so to apply the setting through pickle,
1795+ register this session with
1796+ :func:`datafusion.ipc.set_sender_ctx` on the sender and
1797+ :func:`datafusion.ipc.set_worker_ctx` on the receiver.
1798+
1799+ .. warning:: Security
1800+ This setting narrows only :meth:`Expr.from_bytes`. Calling
1801+ :func:`pickle.loads` on untrusted bytes remains unsafe
1802+ whether inlining is enabled or disabled.
18101803 """
18111804 new_internal = self.ctx.with_python_udf_inlining(enabled)
18121805 new = SessionContext.__new__(SessionContext)
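The behaviour the revised docstring describes can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySession`, `encode_udf`, and `decode_udf` are hypothetical names invented for illustration, the wire format is made up, and stdlib `pickle` stands in for `cloudpickle`. It is not the datafusion implementation. It shows the two modes (embed the callable, or store its name only), the receiver-side refusal when inlining is disabled, and why a keyword-only `enabled` keeps call sites readable.

```python
import base64
import json
import math
import pickle
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class ToySession:
    """Hypothetical stand-in for a session; not the datafusion class."""

    inline_udfs: bool = True
    registry: Dict[str, Callable] = field(default_factory=dict)

    def with_udf_inlining(self, *, enabled: bool) -> "ToySession":
        # Return a new session and leave this one unchanged.
        return ToySession(inline_udfs=enabled, registry=dict(self.registry))


def encode_udf(name: str, func: Callable, ctx: ToySession) -> bytes:
    """Serialize a UDF reference; embed the callable only when inlining is on."""
    payload = {"name": name}
    if ctx.inline_udfs:
        # Assumption: stdlib pickle stands in for cloudpickle here.
        payload["body"] = base64.b64encode(pickle.dumps(func)).decode("ascii")
    return json.dumps(payload).encode("utf-8")


def decode_udf(data: bytes, ctx: ToySession) -> Callable:
    """Rebuild a UDF, refusing embedded code when inlining is off."""
    payload = json.loads(data)
    if "body" in payload:
        if not ctx.inline_udfs:
            # The unpickling call is never reached in this branch.
            raise ValueError("inlined UDF rejected: inlining is disabled")
        return pickle.loads(base64.b64decode(payload["body"]))
    # Name-only payload: the receiver must already have a registration.
    return ctx.registry[payload["name"]]


# Inlining on: the receiver needs no prior registration.
wire = encode_udf("root", math.sqrt, ToySession())
inlined = decode_udf(wire, ToySession())

# Inlining off: only the name travels, and the receiver looks it up.
strict = ToySession(registry={"root": math.sqrt}).with_udf_inlining(enabled=False)
by_name = encode_udf("root", math.sqrt, strict)
looked_up = decode_udf(by_name, strict)
```

In this sketch, `strict` rejects `wire` with a `ValueError` before any unpickling happens, which mirrors the "safer deserialization" use in the docstring. A positional call such as `with_udf_inlining(False)` raises `TypeError`, so every call site has to spell out `enabled=...`.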