Skip to content

feat: support np.random.Generator#3983

Merged
flying-sheep merged 36 commits intomainfrom
pa/rng
Mar 20, 2026
Merged

feat: support np.random.Generator#3983
flying-sheep merged 36 commits intomainfrom
pa/rng

Conversation

@flying-sheep
Copy link
Member

@flying-sheep flying-sheep commented Feb 23, 2026

The idea is to make our code behave as closely as possible to how it did, but make it read like modern code:

  1. add decorator that converts random_state into rng argument (deduplicating them and using 0 by default)
  2. add helpers that allow old behavior (e.g. for APIs that only take RandomState)
    • _FakeRandomGen.wrap_global and _if_legacy_apply_global to conditionally replace np.random.seed and other global-state-mutating functions
    • _legacy_random_state to get back a random_state argument for APIs that don’t take rng, e.g. scikit-learn stuff and "random_state" in adata.uns[thing]["params"]
  3. after this PR: make feat: presets #3653 change the default behavior when a preset is passed.

I also didn’t convert the transformer argument to neighbors (yet?), or deprecated stuff like louvain or the external APIs.

Reviewing

First a short info abut how Generators work:

  • they can spawn independent children (doing so advances their internal state once)
  • all their other methods advance their internal state
  • they are reproducible in the same environment (when initialized with the same seed of course) but make no reproducibility guarantee across versions
  • the convention is to use rng: SeedLike | RNGLike | None = None for the argument, None meaning random initialization

Now questions to the reviewers:

  • Should we store the new RNG in adata.uns? If no, this fixes Passing a RandomState instance can cause failures to save #1131
  • Should we keep random_state in the docs?
  • How should we annotate that the default rng isn’t actually None but “a new instance of _LegacyRandom(0)” but people can pass rng=None to get the future default behavior?
  • Should I handle passing rng to neighbors transformer?
  • Did I miss other spots where rng can be passed?
  • Did I miss any spots where we called np.random.seed()

TODO:

  • add the decorator
  • add helpers for restarting (e.g. if 0 was passed, it’d be reused by the functions called in function body)
  • for the functions that called it: re-add the np.random.seed calls (maybe if isinstance(rng, _FakeRandomGen): gen = legacy_numpy_gen(rng) or so?)
    • partially done, finish the work
  • add spawn to _FakeRandomGen (that does nothing) and use spawn for a tree structure
  • ingest

@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.21%. Comparing base (af57cff) to head (c745802).
⚠️ Report is 7 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/scanpy/plotting/_tools/paga.py 68.96% 9 Missing ⚠️
src/scanpy/_utils/random.py 88.88% 7 Missing ⚠️
src/scanpy/tools/_draw_graph.py 78.57% 6 Missing ⚠️
src/scanpy/neighbors/__init__.py 89.47% 2 Missing ⚠️
src/scanpy/preprocessing/_pca/__init__.py 87.50% 1 Missing ⚠️
src/scanpy/tools/_ingest.py 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3983      +/-   ##
==========================================
+ Coverage   77.96%   78.21%   +0.24%     
==========================================
  Files         118      119       +1     
  Lines       12517    12676     +159     
==========================================
+ Hits         9759     9914     +155     
- Misses       2758     2762       +4     
Flag Coverage Δ
hatch-test.low-vers 77.50% <92.30%> (+0.25%) ⬆️
hatch-test.pre 77.16% <91.71%> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/scanpy/_docs.py 100.00% <100.00%> (ø)
src/scanpy/datasets/_datasets.py 91.26% <100.00%> (+0.36%) ⬆️
src/scanpy/experimental/_docs.py 100.00% <ø> (ø)
src/scanpy/experimental/pp/_normalization.py 94.18% <100.00%> (+0.13%) ⬆️
src/scanpy/experimental/pp/_recipes.py 100.00% <100.00%> (ø)
src/scanpy/preprocessing/_deprecated/sampling.py 100.00% <100.00%> (ø)
src/scanpy/preprocessing/_pca/_compat.py 100.00% <100.00%> (ø)
src/scanpy/preprocessing/_recipes.py 91.07% <100.00%> (+0.33%) ⬆️
src/scanpy/preprocessing/_scrublet/__init__.py 97.05% <100.00%> (+0.31%) ⬆️
src/scanpy/preprocessing/_scrublet/core.py 92.81% <100.00%> (+0.09%) ⬆️
... and 19 more

... and 2 files with indirect coverage changes

@flying-sheep flying-sheep added this to the 1.13.0 milestone Feb 26, 2026
@flying-sheep flying-sheep marked this pull request as ready for review February 26, 2026 12:55
@ilan-gold
Copy link
Contributor

I think we do it when the user passed a transformer class or when using an explicitly transformer that supports it (sklearn?)

Could you check this out? My impression looking at our code was that there was no mechanism for "updating" a user-passed-in transformer i.e., giving it an RNG (state). So I'm not sure even what this would look like. You would use https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator.set_params or something?

@flying-sheep
Copy link
Member Author

@ilan-gold ah yeah I assumed we supported passing a transformer class, but we only support passing an instance.

@scverse-benchmark
Copy link

scverse-benchmark bot commented Mar 12, 2026

Benchmark changes

Change Before [af57cff] After [fbdc47e] Ratio Benchmark (Parameter)
+ 175±1ms 205±0.4ms 1.17 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc3k', 'random_state')
+ 11.2±0.05ms 25.1±0.1ms 2.25 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc68k_reduced', 'random_state')
- 306±6ms 188±2ms 0.61 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc3k', 'random_state')
- 16.1±0.3ms 12.4±0.4ms 0.77 preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc68k_reduced', 'random_state')

Comparison: https://github.com/scverse/scanpy/compare/af57cffc6eb7fa77618b2ab026231f72cd029c12..fbdc47e5c7315e36b224779d1635fc039348df34
Last changed: Fri, 20 Mar 2026 13:41:05 +0000

More details: https://github.com/scverse/scanpy/pull/3983/checks?check_run_id=67904739742

Copy link
Member

@selmanozleyen selmanozleyen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding TODO's and issues will clarify the spawn usages are for possible parallel implementations.

Also we need to clarify if we will create rng per core or independent component.

@flying-sheep
Copy link
Member Author

I tried to parallelize downsample and ran into a bunch of numba limitations and bugs: #4004

@flying-sheep
Copy link
Member Author

Since it’s impossible to parallelize downsampling with replacement=False using numba right now, I just removed the spawning. If we merge #4004, it’ll change things anyway, no matter how it looks.

@flying-sheep flying-sheep merged commit 9de11b1 into main Mar 20, 2026
13 of 14 checks passed
@flying-sheep flying-sheep deleted the pa/rng branch March 20, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switch to numpy.random.Generator Passing a RandomState instance can cause failures to save

3 participants