feat: support np.random.Generator #3983
Codecov Report ❌ Patch coverage is …. Additional details and impacted files:
@@ Coverage Diff @@
## main #3983 +/- ##
==========================================
+ Coverage 77.96% 78.21% +0.24%
==========================================
Files 118 119 +1
Lines 12517 12676 +159
==========================================
+ Hits 9759 9914 +155
- Misses 2758 2762 +4
Flags with carried forward coverage won't be shown.
Could you check this out? My impression looking at our code was that there was no mechanism for “updating” a user-passed-in transformer, i.e. giving it an RNG (state). So I'm not even sure what this would look like. You would use https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator.set_params or something?
@ilan-gold ah yeah, I assumed we supported passing a transformer class, but we only support passing an instance.
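For reference, the `set_params` route from the linked docs would look roughly like this for a user-passed transformer instance (a sketch; `PCA` is just a stand-in for any estimator with a seedable parameter):

```python
import numpy as np
from sklearn.decomposition import PCA

# The user hands us an already-constructed transformer instance ...
transformer = PCA(n_components=2)

# ... so the only generic way to give it RNG state afterwards is
# BaseEstimator.set_params with its `random_state` parameter
# (scikit-learn accepts an int seed or a RandomState here).
transformer.set_params(random_state=0)

X = np.random.default_rng(0).normal(size=(20, 5))
print(transformer.fit_transform(X).shape)  # → (20, 2)
```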
Benchmark changes
Comparison: https://github.com/scverse/scanpy/compare/af57cffc6eb7fa77618b2ab026231f72cd029c12..fbdc47e5c7315e36b224779d1635fc039348df34
More details: https://github.com/scverse/scanpy/pull/3983/checks?check_run_id=67904739742
selmanozleyen left a comment:
I think adding TODOs and linked issues will clarify that the spawn usages are for possible parallel implementations.
Also, we need to clarify whether we will create an RNG per core or per independent component.
I tried to parallelize downsample and ran into a bunch of numba limitations and bugs: #4004
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
for more information, see https://pre-commit.ci
Since it’s impossible to parallelize downsampling with …
support `numpy.random.Generator` #3371

The idea is to make our code behave as closely as possible to how it did, but make it read like modern code:

- turn every `random_state` int argument into an `rng` argument (deduplicating them and using 0 by default)
- `_FakeRandomGen` (wrapping a legacy `RandomState`)
- `_FakeRandomGen.wrap_global` and `_if_legacy_apply_global` to conditionally replace `np.random.seed` and other global-state-mutating functions
- `_legacy_random_state` to get back a `random_state` argument for APIs that don’t take `rng`, e.g. scikit-learn stuff and `"random_state"` in `adata.uns[thing]["params"]`

I also didn’t convert the `transformer` argument to `neighbors` (yet?), or deprecated stuff like `louvain` or the `external` APIs.

Reviewing
First, some short info about how Generators work:

- `spawn` creates independent children (doing so advances their internal state once)
- `rng: SeedLike | RNGLike | None = None` for the argument, `None` meaning random initialization

Now questions to the reviewers:
- … `adata.uns`? If no, this fixes “Passing a RandomState instance can cause failures to save” (#1131)
- … `random_state` in the docs?
- … `rng` isn’t actually `None` but “a new instance of `_LegacyRandom(0)`”, but people can pass `rng=None` to get the future default behavior?
- … `rng` to neighbors transformer?
- … `rng` can be passed?
- … `np.random.seed()`?

TODO:
- … `np.random.seed` calls (maybe `if isinstance(rng, _FakeRandomGen): gen = legacy_numpy_gen(rng)` or so?)
- add `spawn` to `_FakeRandomGen` (that does nothing) and use `spawn` for a tree structure
- `rng.clone()` or similar? numpy/numpy#24086 (comment)