feat: support np.random.Generator by flying-sheep · Pull Request #3983 · scverse/scanpy

flying-sheep · 2026-02-23T15:27:34Z

Closes #3371 (Switch to numpy.random.Generator)
Tests included or not required because:

Release notes not necessary because:

The idea is to make our code behave as closely as possible to how it did, but make it read like modern code:

- add a decorator that converts random_state into an rng argument (deduplicating them and using 0 by default)
- add helpers that allow the old behavior (e.g. for APIs that only take RandomState)
  - _FakeRandomGen.wrap_global and _if_legacy_apply_global to conditionally replace np.random.seed and other global-state-mutating functions
  - _legacy_random_state to get back a random_state argument for APIs that don’t take rng, e.g. scikit-learn stuff and "random_state" in adata.uns[thing]["params"]
- after this PR: make #3653 (feat: presets) change the default behavior when a preset is passed.
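
To make the decorator idea concrete, here is a minimal sketch of how a `random_state` keyword could be folded into `rng`. It is not the code from this PR: the names `legacy_random_state`, `_UNSET` and `sample_indices` are hypothetical, only integer seeds and existing Generators are handled, and it uses `np.random.default_rng` directly, so it does not reproduce the legacy `RandomState` streams the way the real helpers aim to.

```python
import functools

import numpy as np

_UNSET = object()  # hypothetical sentinel: "argument not passed" vs. an explicit None


def legacy_random_state(default_seed=0):
    """Hypothetical decorator: accept `random_state` or `rng`, hand the function an `rng`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, rng=_UNSET, random_state=_UNSET, **kwargs):
            if rng is not _UNSET and random_state is not _UNSET:
                raise TypeError("Pass either `rng` or `random_state`, not both.")
            if random_state is not _UNSET:
                seed = random_state  # old spelling, mapped onto the new argument
            elif rng is not _UNSET:
                seed = rng  # may be an int, a Generator, or None (fresh entropy)
            else:
                seed = default_seed  # nothing passed: keep the old default of 0
            # default_rng returns an existing Generator unchanged
            return func(*args, rng=np.random.default_rng(seed), **kwargs)

        return wrapper

    return decorator


@legacy_random_state()
def sample_indices(n, *, rng):
    """Toy function body that only ever sees a Generator."""
    return rng.permutation(n)
```

With this sketch, `sample_indices(5)`, `sample_indices(5, random_state=0)` and `sample_indices(5, rng=0)` all draw from a generator seeded with 0, while an explicit `rng=None` opts into random initialization.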

I also didn’t convert the transformer argument to neighbors (yet?), or deprecated stuff like louvain or the external APIs.

Reviewing

First, a short note about how Generators work:

- they can spawn independent children (doing so advances their internal state once)
- all their other methods advance their internal state
- they are reproducible in the same environment (when initialized with the same seed of course) but make no reproducibility guarantee across versions
- the convention is to use rng: SeedLike | RNGLike | None = None for the argument, None meaning random initialization

Now questions to the reviewers:

- Should we store the new RNG in adata.uns? If not, this fixes #1131 (Passing a RandomState instance can cause failures to save).
- Should we keep random_state in the docs?
- How should we annotate that the default rng isn’t actually None but “a new instance of _LegacyRandom(0)”, while people can still pass rng=None to get the future default behavior?
- Should I handle passing rng to the neighbors transformer?
- Did I miss other spots where rng can be passed?
- Did I miss any spots where we called np.random.seed()?

TODO:

- add the decorator
- add helpers for restarting (e.g. if 0 was passed, it’d be reused by the functions called in function body)
- for the functions that called it: re-add the np.random.seed calls (maybe if isinstance(rng, _FakeRandomGen): gen = legacy_numpy_gen(rng) or so?)
  - partially done, finish the work
- add spawn to _FakeRandomGen (that does nothing) and use spawn for a tree structure
  - ENH: Should there be an rng.clone() or similar? numpy/numpy#24086 (comment)
- ingest

codecov · 2026-02-24T13:00:56Z

Codecov Report

❌ Patch coverage is 92.30769% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.21%. Comparing base (af57cff) to head (c745802).
⚠️ Report is 7 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/scanpy/plotting/_tools/paga.py	68.96%	9 Missing ⚠️
src/scanpy/_utils/random.py	88.88%	7 Missing ⚠️
src/scanpy/tools/_draw_graph.py	78.57%	6 Missing ⚠️
src/scanpy/neighbors/__init__.py	89.47%	2 Missing ⚠️
src/scanpy/preprocessing/_pca/__init__.py	87.50%	1 Missing ⚠️
src/scanpy/tools/_ingest.py	97.22%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3983      +/-   ##
==========================================
+ Coverage   77.96%   78.21%   +0.24%     
==========================================
  Files         118      119       +1     
  Lines       12517    12676     +159     
==========================================
+ Hits         9759     9914     +155     
- Misses       2758     2762       +4

Flag	Coverage Δ
hatch-test.low-vers	`77.50% <92.30%> (+0.25%)`	⬆️
hatch-test.pre	`77.16% <91.71%> (+0.25%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/scanpy/_docs.py	`100.00% <100.00%> (ø)`
src/scanpy/datasets/_datasets.py	`91.26% <100.00%> (+0.36%)`	⬆️
src/scanpy/experimental/_docs.py	`100.00% <ø> (ø)`
src/scanpy/experimental/pp/_normalization.py	`94.18% <100.00%> (+0.13%)`	⬆️
src/scanpy/experimental/pp/_recipes.py	`100.00% <100.00%> (ø)`
src/scanpy/preprocessing/_deprecated/sampling.py	`100.00% <100.00%> (ø)`
src/scanpy/preprocessing/_pca/_compat.py	`100.00% <100.00%> (ø)`
src/scanpy/preprocessing/_recipes.py	`91.07% <100.00%> (+0.33%)`	⬆️
src/scanpy/preprocessing/_scrublet/__init__.py	`97.05% <100.00%> (+0.31%)`	⬆️
src/scanpy/preprocessing/_scrublet/core.py	`92.81% <100.00%> (+0.09%)`	⬆️
... and 19 more

... and 2 files with indirect coverage changes

ilan-gold · 2026-03-11T10:03:20Z

I think we do it when the user passed a transformer class or when explicitly using a transformer that supports it (sklearn?)

Could you check this out? My impression looking at our code was that there was no mechanism for "updating" a user-passed-in transformer i.e., giving it an RNG (state). So I'm not sure even what this would look like. You would use https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator.set_params or something?
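
For illustration only, one shape the `set_params` idea could take is sketched below. Nothing here is scanpy code: `StandInTransformer` is a hypothetical stand-in that mimics the `get_params`/`set_params` protocol of scikit-learn estimators so the example runs without scikit-learn, and `seed_transformer` is a made-up helper name.

```python
class StandInTransformer:
    """Hypothetical stand-in exposing estimator-style get_params/set_params."""

    def __init__(self, n_neighbors=15, random_state=None):
        self.n_neighbors = n_neighbors
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"n_neighbors": self.n_neighbors, "random_state": self.random_state}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self


def seed_transformer(transformer, seed):
    """Set `random_state` on an instance only if it advertises that parameter.

    Returns True if the seed was applied, False if the object was left alone.
    """
    get_params = getattr(transformer, "get_params", None)
    if get_params is None or "random_state" not in get_params():
        return False
    transformer.set_params(random_state=seed)
    return True
```

Note that this mutates the instance the user passed in, which is one reason the approach may not be desirable.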

flying-sheep · 2026-03-12T14:23:54Z

@ilan-gold ah yeah I assumed we supported passing a transformer class, but we only support passing an instance.

scverse-benchmark · 2026-03-12T14:56:27Z

Benchmark changes

Change	Before [`af57cff`]	After [`fbdc47e`]	Ratio	Benchmark (Parameter)
+	175±1ms	205±0.4ms	1.17	preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc3k', 'random_state')
+	11.2±0.05ms	25.1±0.1ms	2.25	preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_per_cell('pbmc68k_reduced', 'random_state')
-	306±6ms	188±2ms	0.61	preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc3k', 'random_state')
-	16.1±0.3ms	12.4±0.4ms	0.77	preprocessing_counts.PreprocessingCountsRngSuite.time_downsample_total('pbmc68k_reduced', 'random_state')

Comparison: https://github.com/scverse/scanpy/compare/af57cffc6eb7fa77618b2ab026231f72cd029c12..fbdc47e5c7315e36b224779d1635fc039348df34
Last changed: Fri, 20 Mar 2026 13:41:05 +0000

More details: https://github.com/scverse/scanpy/pull/3983/checks?check_run_id=67904739742

selmanozleyen

I think adding TODOs and issues will clarify that the spawn usages are for possible parallel implementations.

Also, we need to clarify whether we will create an rng per core or per independent component.

flying-sheep · 2026-03-18T17:22:42Z

I tried to parallelize downsample and ran into a bunch of numba limitations and bugs: #4004

flying-sheep · 2026-03-20T13:05:32Z

Since it’s impossible to parallelize downsampling with replacement=False using numba right now, I just removed the spawning. If we merge #4004, it’ll change things anyway, no matter how it looks.

flying-sheep added 2 commits February 23, 2026 16:26

feat: support np.random.Generator

de9c481

add decorator

8ab6661

flying-sheep added 7 commits February 24, 2026 14:19

scrublet

1ef8780

almost done

5308a1a

Merge branch 'main' into pa/rng

93a8c0b

fix scrublet_simulate_doublets

32b3ddc

fix _RNGIgraph compat

c3da2bb

Merge branch 'main' into pa/rng

2c82b67

whoops

bd85d95

flying-sheep added this to the 1.13.0 milestone Feb 26, 2026

relnote

8247cdb

flying-sheep marked this pull request as ready for review February 26, 2026 12:55

flying-sheep added 5 commits February 26, 2026 14:37

don’t store rng in random_state arg

47f3ceb

make consistent

1e43b2a

use sub-generators

64a0f26

docs

baf2c85

paga

7e2fab5

flying-sheep requested review from ilan-gold and selmanozleyen February 27, 2026 13:22

selmanozleyen reviewed Feb 27, 2026

src/scanpy/preprocessing/_pca/_compat.py (outdated)

selmanozleyen reviewed Feb 27, 2026

src/scanpy/preprocessing/_scrublet/sparse_utils.py (outdated)

selmanozleyen reviewed Feb 27, 2026

src/scanpy/tools/_umap.py

selmanozleyen reviewed Feb 27, 2026

src/scanpy/neighbors/__init__.py

selmanozleyen reviewed Feb 27, 2026

src/scanpy/preprocessing/_simple.py

selmanozleyen reviewed Feb 27, 2026

src/scanpy/preprocessing/_pca/__init__.py (outdated)

flying-sheep added 2 commits February 27, 2026 16:19

test

8ad699a

Selman’s findings

a4b2d12

flying-sheep commented Feb 27, 2026

src/scanpy/preprocessing/_pca/__init__.py (outdated)

selmanozleyen reviewed Feb 27, 2026

src/scanpy/preprocessing/_simple.py (outdated)

flying-sheep added 3 commits March 12, 2026 12:43

no spawning without loops/parallel

c98ce73

test

93a9d90

undo spawn param

11155c3

flying-sheep added the benchmark label Mar 12, 2026

flying-sheep mentioned this pull request Mar 12, 2026

Add Harmony to scanpy #3953

Open

fix pca

4570320

flying-sheep requested review from ilan-gold and selmanozleyen March 13, 2026 09:38

flying-sheep added 3 commits March 13, 2026 11:30

more bench

d4814ee

fix tests

4d82e6b

whoops

65fb583

ilan-gold reviewed Mar 13, 2026

src/scanpy/tools/_umap.py (outdated)

src/scanpy/preprocessing/_pca/__init__.py (outdated)

src/scanpy/preprocessing/_pca/_compat.py

no rng warning

d8d0e80

selmanozleyen approved these changes Mar 16, 2026

ilan-gold reviewed Mar 18, 2026

src/scanpy/neighbors/__init__.py (outdated)

src/scanpy/neighbors/__init__.py

flying-sheep removed the benchmark label Mar 20, 2026

flying-sheep and others added 4 commits March 20, 2026 11:32

Update src/scanpy/neighbors/__init__.py

a491b3a

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

f736f2d

for more information, see https://pre-commit.ci

don’t store random_state metadata if it’s ignored

1596277

comment on RNG spawning

fbdc47e

flying-sheep added the benchmark label Mar 20, 2026

fix tests

c745802

flying-sheep removed the benchmark label Mar 20, 2026

flying-sheep merged commit 9de11b1 into main Mar 20, 2026
13 of 14 checks passed

flying-sheep deleted the pa/rng branch March 20, 2026 14:01
