Subinterpreters are supported. Each subinterpreter maintains its own
``sys.lazy_modules`` and import state, so lazy imports in one subinterpreter
do not affect others.

Performance
-----------

Lazy imports have **no measurable performance overhead**. The implementation
is designed to be performance-neutral for both code that uses lazy imports and
code that doesn't.

Runtime performance
~~~~~~~~~~~~~~~~~~~

After reification (first use), lazy imports have **zero overhead**. The
adaptive interpreter specializes the bytecode (typically after 2-3 accesses),
eliminating any remaining checks. For example, ``LOAD_GLOBAL`` becomes
``LOAD_GLOBAL_MODULE``, which accesses the module exactly as it would for a
normal import.

The `pyperformance suite`_ confirms that the implementation is
performance-neutral.

.. _pyperformance suite: https://github.com/facebookexperimental/
   free-threading-benchmarking/blob/main/results/bm-20250922-3.15.0a0-27836e5/
   bm-20250922-vultr-x86_64-DinoV-lazy_imports-3.15.0a0-27836e5-vs-base.svg

Filter function performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The filter function (set via ``sys.set_lazy_imports_filter()``) is called for
every *potentially lazy* import to determine whether it should actually be
lazy. When no filter is set, this is simply a NULL check (testing whether a
filter function has been registered), which is a highly predictable branch
that adds essentially no overhead. When a filter is installed, it is called
for each potentially lazy import, but this still has **almost no measurable
performance cost**. To measure this, we benchmarked importing all 278
top-level importable modules from the Python standard library (which
transitively loads 392 modules in total, including all submodules and
dependencies), then forced reification of every loaded module to ensure
everything was fully materialized.

Note that these measurements establish the baseline overhead of the filter
mechanism itself. Any user-defined filter function that performs additional
work beyond a trivial check will add overhead proportional to the complexity
of that work; in practice, we expect this overhead to be dwarfed by the
performance benefits of avoiding unnecessary imports. The benchmarks below
measure the minimal cost of the filter dispatch mechanism when the filter
function does essentially nothing, as in the sketch that follows.
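
A minimal sketch of such "do nothing" filters is shown below. The parameter
names, and the exact signature the filter is called with, are illustrative
assumptions here rather than a definitive description of the API:

.. code-block:: python

   import sys

   def force_eager(importer, name, fromlist):
       # Configuration 2 below: the filter is invoked for every potentially
       # lazy import but always answers "not lazy", so imports run eagerly.
       return False

   def allow_lazy(importer, name, fromlist):
       # Configuration 3 below: always answers "lazy"; reification happens
       # later, on first use.
       return True

   sys.set_lazy_imports_filter(force_eager)  # or allow_lazy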

We compared four different configurations:

.. list-table::
   :header-rows: 1
   :widths: 50 25 25

   * - Configuration
     - Mean ± Std Dev (ms)
     - Overhead vs Baseline
   * - **Eager imports** (baseline)
     - 161.2 ± 4.3
     - 0%
   * - **Lazy + filter forcing eager**
     - 161.7 ± 4.2
     - +0.3% ± 3.7%
   * - **Lazy + filter allowing lazy + reification**
     - 162.0 ± 4.0
     - +0.5% ± 3.7%
   * - **Lazy + no filter + reification**
     - 161.4 ± 4.3
     - +0.1% ± 3.8%

The four configurations:

1. **Eager imports (baseline)**: Normal Python imports with no lazy machinery;
   this is standard Python behavior.

2. **Lazy + filter forcing eager**: The filter function returns ``False`` for
   all imports, forcing eager execution, and all imports are reified at script
   end. This measures pure filter-calling overhead, since every import goes
   through the filter but executes eagerly.

3. **Lazy + filter allowing lazy + reification**: The filter function returns
   ``True`` for all imports, allowing lazy execution, and all imports are
   reified at script end. This measures filter overhead when imports are
   actually lazy.

4. **Lazy + no filter + reification**: No filter is installed; imports are
   lazy and reified at script end. This is the baseline for lazy behavior
   without a filter.

The benchmarks used `hyperfine <https://github.com/sharkdp/hyperfine>`_,
testing 278 standard library modules, with each run in a fresh Python process.
All configurations force the import of exactly the same set of modules (all
modules loaded by the eager baseline) to ensure a fair comparison; a sketch of
the generated benchmark script is shown below.
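
The benchmark script itself was generated ahead of time. The generator below
is an illustrative reconstruction, not the actual harness: it assumes the
``lazy import`` syntax proposed in this PEP and uses
``sys.stdlib_module_names`` to enumerate candidate modules:

.. code-block:: python

   import importlib
   import sys

   # Keep only top-level stdlib modules that are importable on this platform.
   importable = []
   for name in sorted(sys.stdlib_module_names):
       if name.startswith("_"):
           continue  # skip private/implementation modules
       try:
           importlib.import_module(name)
       except ImportError:
           continue
       importable.append(name)

   with open("bench_lazy.py", "w") as f:
       for name in importable:
           # PEP-proposed syntax; the eager baseline writes plain imports.
           f.write(f"lazy import {name}\n")
       for name in importable:
           # Touching each binding at the end of the script forces
           # reification, so every configuration loads the same modules.
           f.write(f"_ = {name}.__name__\n")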

The benchmark environment used CPU isolation with 32 logical CPUs (0-15 at
3200 MHz, 16-31 at 2400 MHz), the performance scaling governor, Turbo Boost
disabled, and full ASLR randomization. The overhead error bars are computed
using standard error propagation for the formula ``(value - baseline) /
baseline``, accounting for the uncertainties in both the measured value and
the baseline.
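
As a quick sanity check (not part of the benchmark harness), the
"+0.3% ± 3.7%" row can be reproduced directly from the table values:

.. code-block:: python

   from math import sqrt

   baseline, sigma_b = 161.2, 4.3   # eager imports, mean ± std dev (ms)
   value, sigma_v = 161.7, 4.2      # lazy + filter forcing eager (ms)

   overhead = (value - baseline) / baseline

   # First-order error propagation for r = (value - baseline) / baseline:
   # sigma_r**2 = (sigma_v / baseline)**2 + (value * sigma_b / baseline**2)**2
   sigma_r = sqrt((sigma_v / baseline) ** 2
                  + (value * sigma_b / baseline ** 2) ** 2)

   print(f"{overhead:+.1%} ± {sigma_r:.1%}")   # +0.3% ± 3.7%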

Startup time improvements
~~~~~~~~~~~~~~~~~~~~~~~~~

The primary performance benefit of lazy imports is reduced startup time, by
loading only the modules actually used at runtime rather than optimistically
loading entire dependency trees at startup.
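
For example, a command-line tool that needs a heavy dependency for only one of
its subcommands never pays for it on the common path. The sketch below uses
the ``lazy import`` syntax proposed in this PEP, with a hypothetical
``plotting`` module standing in for an expensive dependency:

.. code-block:: python

   import sys

   lazy import json      # binds a lazy object; nothing is loaded yet
   lazy import plotting  # hypothetical, expensive-to-import dependency

   def main(argv):
       if len(argv) > 1 and argv[1] == "plot":
           plotting.render()  # first use: ``plotting`` is imported here
       else:
           # ``plotting`` is never imported on this path.
           print(json.dumps({"status": "ok"}))

   if __name__ == "__main__":
       main(sys.argv)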

Real-world deployments at scale have demonstrated that the benefits can be
very large, though they naturally depend on the specific codebase and usage
patterns. Organizations with large, interconnected codebases have reported
substantial reductions in server reload times, ML training initialization
time, command-line tool startup time, and Jupyter notebook loading time.
Memory usage improvements have also been observed, as unused modules remain
unloaded.

For detailed case studies and performance data from production deployments,
see:

- `Python Lazy Imports With Cinder
  <https://developers.facebook.com/blog/post/2022/06/15/python-lazy-imports-with-cinder/>`__
  (Meta Instagram Server)
- `Lazy is the new fast: How Lazy Imports and Cinder accelerate machine
  learning at Meta
  <https://engineering.fb.com/2024/01/18/developer-tools/lazy-imports-cinder-machine-learning-meta/>`__
  (Meta ML Workloads)
- `Inside HRT's Python Fork
  <https://www.hudsonrivertrading.com/hrtbeat/inside-hrts-python-fork/>`__
  (Hudson River Trading)

The benefits scale with codebase complexity: the larger and more
interconnected the codebase, the more dramatic the improvements.

Typing and tools
----------------