Describe the bug
This bug came up in the process of generating this notebook for our recent preprint. As part of the workflow, we fit a simple model with two design factors. One design factor has two levels; the other has seventy.
In 0.4.8, the procedure takes about four minutes, the vast majority of it spent on fitting dispersions. The mean/dispersion curve looks OK.
In 0.5.1, the dispersion fitting fails.
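For a sense of scale, here is a minimal sketch of how wide the design matrix gets for this kind of model. It assumes treatment (dummy) coding with an intercept and no interaction terms, which may not match exactly what pydeseq2 builds internally. The helper `n_design_columns` is hypothetical and not part of pydeseq2.

```python
def n_design_columns(levels_per_factor, intercept=True):
    """Count the columns of a treatment-coded design matrix.

    Assumption: each categorical factor with n levels contributes n - 1
    dummy columns, plus one optional intercept column. No interactions.
    """
    return int(intercept) + sum(n - 1 for n in levels_per_factor)


# Two design factors: one with two levels, one with seventy.
n_cols = n_design_columns([2, 70])
print(n_cols)
```

Under these assumptions, every gene's dispersion fit involves a design matrix with this many columns, which is where most of the runtime is expected to go.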
To Reproduce
Notebook here. Test data here.
New version:
anndata==0.11.4
matplotlib==3.10.3
numpy==2.2.6
pandas==2.2.3
pydeseq2==0.5.1
scanpy==1.11.1
scipy==1.15.3
tqdm==4.67.1
python==3.13.2
Old version:
tqdm==4.67.1
scipy==1.11.4
scanpy==1.9.6
pydeseq2==0.4.8
pandas==2.2.3
numpy==1.26.2
matplotlib==3.8.2
anndata==0.10.3
python==3.9.7
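
Since the two environments differ in more than just pydeseq2, a quick comparison of the pins above may help narrow down the cause. This is only a sketch using the standard library; the helper `parse_pins` is hypothetical, and the pins are copied from the two lists in this report.

```python
NEW = """
anndata==0.11.4
matplotlib==3.10.3
numpy==2.2.6
pandas==2.2.3
pydeseq2==0.5.1
scanpy==1.11.1
scipy==1.15.3
tqdm==4.67.1
python==3.13.2
"""

OLD = """
tqdm==4.67.1
scipy==1.11.4
scanpy==1.9.6
pydeseq2==0.4.8
pandas==2.2.3
numpy==1.26.2
matplotlib==3.8.2
anndata==0.10.3
python==3.9.7
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    return dict(line.strip().split("==") for line in text.split() if line.strip())


old, new = parse_pins(OLD), parse_pins(NEW)

# Packages whose pinned version differs between the two environments.
changed = {
    name: (old[name], new[name])
    for name in sorted(old)
    if old[name] != new.get(name)
}

# Subset of those where the leading (major) version number changed.
major_bumps = [
    name
    for name, (before, after) in changed.items()
    if before.split(".")[0] != after.split(".")[0]
]

for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
print("major version changes:", major_bumps)
```

Only pandas and tqdm are identical across the two environments, so the regression cannot be attributed to the pydeseq2 upgrade alone without further testing.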
Expected behavior
- Most immediately, the dispersion estimation should probably not fail, given that it works in an older version.
- If the dispersion fitting procedure changed at some point between 0.4.8 and 0.5.1, it would be good if this were documented. However, the only changelog entries I see relate to vst (unrelated) and refitting (also unrelated).
- I do not expect the dispersion computation to scale so unfavorably with the number of levels. The example shown here takes a few minutes for `dds_f`, which has 70 levels, but over an hour for `dds_m`, which has a little over 200 levels. This is about a 20x increase in runtime for a 2.8x increase in the size of the design matrix.
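
To put the scaling in the last point into numbers, the sketch below computes the exponent implied by those two ratios. It assumes a simple power-law relation (runtime proportional to design-matrix size raised to some power k), which is only a rough model; the function name `implied_exponent` is hypothetical.

```python
import math


def implied_exponent(runtime_ratio, size_ratio):
    """Solve runtime_ratio = size_ratio ** k for k (power-law assumption)."""
    return math.log(runtime_ratio) / math.log(size_ratio)


# Ratios taken from the report: ~20x runtime for ~2.8x design-matrix size.
k = implied_exponent(20, 2.8)
print(f"implied exponent: {k:.2f}")
```

With these rough figures the exponent comes out close to 3, i.e. approximately cubic growth in the number of design-matrix columns rather than linear.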
Screenshots
See above.
Desktop (please complete the following information):
- OS: Linux-5.10.93-87.444.amzn2.x86_64-x86_64-with-glibc2.26