Improve transparency by warning when max_cohort_size triggers downsampling #930

Open
adilraza99 wants to merge 2 commits into malariagen:master from adilraza99:GH912-cohort-downsampling-warning

@adilraza99
Contributor

Summary

This PR improves transparency when cohort downsampling occurs.

Currently, if max_cohort_size is specified and the number of available
samples exceeds this value, the cohort is silently downsampled. Because
no notification is emitted, users may incorrectly assume all available
samples were used, which can affect reproducibility and interpretation.

This change emits a UserWarning when downsampling is triggered.

Fixes #912

When n_samples > max_cohort_size, the cohort is randomly downsampled.
The new UserWarning reports the original and downsampled sample counts
and explains how to disable downsampling.

No change to logic, defaults, or returned data.
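
For illustration, the behavior described above can be sketched roughly as follows. This is not the code in this PR: `select_cohort_samples`, its `random_seed` default, the use of the standard-library `random` module, and the exact warning text (including the suggestion to pass `max_cohort_size=None`) are all assumptions made for the example.

```python
import random
import warnings


def select_cohort_samples(sample_ids, max_cohort_size=None, random_seed=42):
    """Hypothetical helper: return sample_ids, downsampled if the cohort is too large."""
    sample_ids = list(sample_ids)
    n_samples = len(sample_ids)

    # No limit set, or the cohort already fits: return everything, no warning.
    if max_cohort_size is None or n_samples <= max_cohort_size:
        return sample_ids

    # Tell the caller that downsampling is about to happen.
    # The message wording here is an assumption, not the PR's actual text.
    warnings.warn(
        f"Cohort has {n_samples} samples, which exceeds "
        f"max_cohort_size={max_cohort_size}; randomly downsampling to "
        f"{max_cohort_size} samples. Set max_cohort_size=None to disable "
        "downsampling.",
        UserWarning,
        stacklevel=2,
    )

    # Seeded selection, so the same inputs give the same subset.
    rng = random.Random(random_seed)
    keep = sorted(rng.sample(range(n_samples), max_cohort_size))
    return [sample_ids[i] for i in keep]


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        subset = select_cohort_samples(
            [f"S{i}" for i in range(10)], max_cohort_size=3
        )
    print(len(subset), [str(w.message) for w in caught])
```

Because the warning is emitted before the selection and does not alter it, the returned subset is the same as it would be without the warning. Callers who would rather fail than proceed with a reduced cohort could escalate it with `warnings.simplefilter("error", UserWarning)`.
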
@adilraza99
Contributor Author

Hi @jonbrenas

This PR adds a UserWarning when max_cohort_size triggers downsampling,
making the behavior transparent while preserving reproducibility.

All checks are passing. Would appreciate your review.
Thanks!

Collaborator

@jonbrenas jonbrenas left a comment

LGTM

@adilraza99
Contributor Author

Hi @jonbrenas,

Thanks again for your review and guidance.

If you know of any important issues where additional help would be useful, I’d be happy to work on them.

Appreciate your time!

Development

Successfully merging this pull request may close these issues.

Silent cohort downsampling when max_cohort_size is exceeded reduces transparency and reproducibility

2 participants