Improve transparency by warning when max_cohort_size triggers downsampling #930
Open
adilraza99 wants to merge 2 commits into malariagen:master from
When n_samples > max_cohort_size, the dataset is randomly downsampled without notification. This adds a UserWarning explaining the original and new sample counts, and how to disable downsampling. No change to logic, defaults, or returned data.
Hi @jonbrenas, this PR adds a UserWarning when max_cohort_size triggers downsampling. All checks are passing. Would appreciate your review.
Hi @jonbrenas, thanks again for your review and guidance. If you know of any important issues where additional help would be useful, I’d be happy to work on them. Appreciate your time!
Summary
This PR improves transparency when cohort downsampling occurs.
Currently, if max_cohort_size is specified and the number of available
samples exceeds this value, the cohort is silently downsampled. Because
no notification is emitted, users may incorrectly assume all available
samples were used, which can affect reproducibility and interpretation.
This change emits a UserWarning when downsampling is triggered.

Fixes #912
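
For reviewers who want to see the behaviour in isolation, here is a minimal sketch of the pattern this PR describes. It is not the code from the diff: `downsample_cohort`, its `random_seed` parameter, and the exact warning text are hypothetical, and the hint that `max_cohort_size=None` disables downsampling is an assumption about the API rather than something stated in this conversation.

```python
import random
import warnings


def downsample_cohort(sample_ids, max_cohort_size=None, random_seed=42):
    """Hypothetical helper: randomly downsample a cohort, warning when it happens.

    Assumption: max_cohort_size=None means "no limit".
    """
    n_samples = len(sample_ids)

    # Nothing to do if no limit is set or the cohort already fits.
    if max_cohort_size is None or n_samples <= max_cohort_size:
        return list(sample_ids)

    # Tell the user that downsampling happened, with both sample counts.
    warnings.warn(
        f"Cohort downsampled from {n_samples} to {max_cohort_size} samples "
        "because max_cohort_size was exceeded. "
        "Set max_cohort_size=None to disable downsampling.",
        UserWarning,
        stacklevel=2,
    )

    # Seeded random choice without replacement, kept in original order.
    rng = random.Random(random_seed)
    keep = sorted(rng.sample(range(n_samples), max_cohort_size))
    return [sample_ids[i] for i in keep]


if __name__ == "__main__":
    ids = [f"sample_{i}" for i in range(10)]
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        subset = downsample_cohort(ids, max_cohort_size=3)
    print(len(subset), [str(w.message) for w in caught])
```

Because the warning is a standard `UserWarning`, callers who expect downsampling can silence it with `warnings.filterwarnings("ignore", category=UserWarning)`, and test suites can assert on it with `pytest.warns(UserWarning)`.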