## Summary

Propose adding a `cloudpickle.patch_multiprocessing()` helper that replaces `multiprocessing.reduction.ForkingPickler` with a cloudpickle-based pickler, enabling `Pool.map(lambda x: x**2, range(10))` to work out of the box.
## Motivation: ecosystem fragmentation

Every project that needs cloudpickle + `multiprocessing.Pool` independently reinvents this patching. At least 6 projects maintain their own version:
| Project | Approach |
| --- | --- |
| loky/joblib | Full custom `_LokyPickler` subsystem in `loky/backend/reduction.py` |
| PySpark | Own `CloudPickleSerializer` wrapping `cloudpickle.dumps`/`loads` |
| Ray | Bundled fork as `ray.cloudpickle` with custom object store |
| Dask | Custom serialization protocol in the distributed scheduler |
| multiprocess | Complete fork of CPython's `multiprocessing` with dill substituted |
| trading-strategy/exec-sandbox/pypeln/pyrocko | Ad-hoc monkey patches of varying correctness |
Most ad-hoc implementations are incomplete because of a non-obvious CPython pitfall (see below).
## The `_ForkingPickler` double-binding pitfall
CPython has two separate name bindings for `ForkingPickler`:

```python
# multiprocessing/reduction.py
class ForkingPickler(pickle.Pickler):
    ...
```

```python
# multiprocessing/connection.py
from .context import reduction

_ForkingPickler = reduction.ForkingPickler  # captured at import time

class Connection:
    def send(self, obj):
        self._send_bytes(_ForkingPickler.dumps(obj))  # uses the captured reference
```
Patching `reduction.ForkingPickler` alone is insufficient — `Connection.send()` still uses the stale `_ForkingPickler` reference captured at import time. You must also patch `multiprocessing.connection._ForkingPickler`. Most ad-hoc implementations miss this.

Additionally, `reduction.dump()` is a module-level function that also needs replacing for completeness.
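The staleness is ordinary Python import semantics and can be reproduced with two toy modules standing in for `reduction` and `connection` (the names below are illustrative, not part of `multiprocessing`):

```python
import pickle
import types

# Toy stand-ins for multiprocessing.reduction and multiprocessing.connection.
reduction = types.ModuleType("reduction")
reduction.ForkingPickler = pickle.Pickler  # original class

connection = types.ModuleType("connection")
# Simulates `_ForkingPickler = reduction.ForkingPickler` run at import time:
connection._ForkingPickler = reduction.ForkingPickler

class PatchedPickler(pickle.Pickler):
    pass

# Patch only the first binding site, as many ad-hoc implementations do.
reduction.ForkingPickler = PatchedPickler

assert reduction.ForkingPickler is PatchedPickler
# The import-time capture still points at the original class:
assert connection._ForkingPickler is pickle.Pickler
```

The second assertion is exactly the bug: rebinding a module attribute never updates names that were copied out of the module before the patch ran.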
## Proposed API

```python
import cloudpickle

cloudpickle.patch_multiprocessing()
```
One call, idempotent, patches all three binding sites:

- `multiprocessing.reduction.ForkingPickler` — the class
- `multiprocessing.reduction.dump` — the module-level helper
- `multiprocessing.connection._ForkingPickler` — the import-time captured reference
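Idempotency here amounts to the fact that the patch only re-assigns the same objects, so calling it twice leaves the same bindings in place. A sketch with illustrative stand-ins (`FakePickler` and `fake_dump` are not the real replacements):

```python
import multiprocessing.connection
import multiprocessing.reduction
import pickle

# Illustrative stand-ins for the cloudpickle-backed replacements.
class FakePickler(pickle.Pickler):
    pass

def fake_dump(obj, file, protocol=None):
    FakePickler(file, protocol).dump(obj)

# Save the originals so the demo can restore them afterwards.
orig = (multiprocessing.reduction.ForkingPickler,
        multiprocessing.reduction.dump,
        multiprocessing.connection._ForkingPickler)

# Patching twice is harmless: plain attribute assignment is idempotent.
for _ in range(2):
    multiprocessing.reduction.ForkingPickler = FakePickler
    multiprocessing.reduction.dump = fake_dump
    multiprocessing.connection._ForkingPickler = FakePickler

assert multiprocessing.reduction.ForkingPickler is FakePickler
assert multiprocessing.connection._ForkingPickler is FakePickler

# Restore the stdlib bindings.
(multiprocessing.reduction.ForkingPickler,
 multiprocessing.reduction.dump,
 multiprocessing.connection._ForkingPickler) = orig
```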
## Reference implementation
Here's a minimal working implementation (tested on Python 3.14):
```python
import copyreg
import io
import multiprocessing.connection
import multiprocessing.reduction

import cloudpickle


class CloudForkingPickler(cloudpickle.Pickler):
    """ForkingPickler replacement backed by cloudpickle."""

    _extra_reducers = {}
    _copyreg_dispatch_table = copyreg.dispatch_table

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dispatch_table = self._copyreg_dispatch_table.copy()
        self.dispatch_table.update(self._extra_reducers)

    @classmethod
    def register(cls, type, reduce):
        """Register a custom reducer, mirroring ForkingPickler.register()."""
        cls._extra_reducers[type] = reduce

    @classmethod
    def dumps(cls, obj, protocol=None):
        buf = io.BytesIO()
        cls(buf, protocol).dump(obj)
        return buf.getbuffer()

    loads = staticmethod(cloudpickle.loads)


def patch_multiprocessing():
    """Replace multiprocessing's ForkingPickler with the cloudpickle-based version."""
    # 1. The class itself
    multiprocessing.reduction.ForkingPickler = CloudForkingPickler
    # 2. The module-level dump() helper
    multiprocessing.reduction.dump = lambda obj, file, protocol=None: \
        CloudForkingPickler(file, protocol).dump(obj)
    # 3. The import-time captured reference in connection.py
    multiprocessing.connection._ForkingPickler = CloudForkingPickler
```
After `patch_multiprocessing()`:

```python
from multiprocessing import Pool

with Pool(4) as p:
    print(p.map(lambda x: x**2, range(10)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
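The example works because cloudpickle serializes functions by value, shipping the code object itself, while the stdlib pickler serializes functions by reference and rejects lambdas. That difference is the whole reason `Pool.map` over a lambda fails without the patch:

```python
import pickle

import cloudpickle

square = lambda x: x ** 2

# Stdlib pickle tries to look the function up by qualified name,
# which fails for a lambda (its __qualname__ is "<lambda>").
try:
    pickle.dumps(square)
    stdlib_ok = True
except (pickle.PicklingError, AttributeError):
    stdlib_ok = False

# cloudpickle serializes the code object itself, so it round-trips.
restored = cloudpickle.loads(cloudpickle.dumps(square))

assert not stdlib_ok
assert restored(7) == 49
```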
## Why cloudpickle (not CPython)

There's an open discussion on discuss.python.org about adding a pluggable pickler API to multiprocessing, but no PEP has materialized. cloudpickle is the pragmatic place for this — it already provides `Pickler`/`dumps`/`loads`, and adding a one-shot integration helper is a small, natural extension.
## Alternatives considered

- "Just use loky/joblib" — Valid for many users, but loky replaces the entire process-management layer. Many projects only need cloudpickle serialization with the stdlib `multiprocessing.Pool`.
- "Just use multiprocess (dill)" — Requires replacing all `multiprocessing` imports. dill is heavier than cloudpickle and has different serialization semantics.
- "Document the pattern instead" — The `_ForkingPickler` double binding makes documentation insufficient; people will keep getting it wrong.
Happy to submit a PR if there's interest.