Merged
271 commits
3ca7ea4
fix: do not require pydantic
JoshLoecker Dec 17, 2024
cb896b9
fix: use new file name
JoshLoecker Dec 17, 2024
d620960
fix: zfpkm should be performed on rows, not cols
JoshLoecker Dec 17, 2024
9541824
style: reorganize code
JoshLoecker Dec 17, 2024
4ba45e0
style: rename arguments
JoshLoecker Dec 17, 2024
0a0824e
fix: ignore mismatched types
JoshLoecker Dec 17, 2024
2dbebb6
fix: misspelling
JoshLoecker Dec 17, 2024
cf06da4
feat: move zscore graphing to its own module
JoshLoecker Dec 25, 2024
e14af0c
fix: explicit import graphing
JoshLoecker Dec 25, 2024
a509786
fix: hotfix - zscore ceiling is incorrect
JoshLoecker Dec 25, 2024
941e073
chore(deps): bump astral-sh/ruff-action from 2 to 3 (#213)
dependabot[bot] Jan 6, 2025
fdfd0ec
chore(deps): bump astral-sh/setup-uv from 4 to 5
dependabot[bot] Jan 6, 2025
6d74649
Merge pull request #212 from HelikarLab/dependabot/github_actions/ast…
JoshLoecker Jan 6, 2025
fb384ec
fix: pin numpy to less than version 2; update scanpy
JoshLoecker Jan 12, 2025
e659eb2
refactor!: moved `return_placeholder_data` to `como.utils`
JoshLoecker Jan 12, 2025
b5d0a91
fix: properly return row and column values
JoshLoecker Jan 12, 2025
4079deb
feat: added function to set up logging throughout COMO
JoshLoecker Jan 12, 2025
bf7f139
feat: added function to log and raise an error
JoshLoecker Jan 12, 2025
237d1fc
refactor!: moved Algorithms to data_types
JoshLoecker Jan 12, 2025
34ccd44
refactor!: moved Compartments to data_types
JoshLoecker Jan 12, 2025
038ab17
refactor: re-organized constants
JoshLoecker Jan 12, 2025
c337e4d
fix: removed items from __all__ that were moved
JoshLoecker Jan 12, 2025
99676ed
refactor: imported required packages and types
JoshLoecker Jan 12, 2025
8662f7c
fix: added Algorithm as top level import
JoshLoecker Jan 12, 2025
c454e8d
fix: added CobraCompartments as top-level import
JoshLoecker Jan 12, 2025
510c77c
fix: process ids as string type
JoshLoecker Jan 12, 2025
3e93488
refactor: combine async file reading
JoshLoecker Jan 12, 2025
e99d3bd
refactor: remove un-used function
JoshLoecker Jan 12, 2025
c1c6d00
refactor: fix async reading of data
JoshLoecker Jan 12, 2025
5c912b3
feat: allow passing strings for dataframes
JoshLoecker Jan 12, 2025
e942b85
fix: importing of required and unused modules
JoshLoecker Jan 12, 2025
0345a14
feat: use new log_and_raise_error
JoshLoecker Jan 12, 2025
4d66cd3
feat: set appropriate logging
JoshLoecker Jan 12, 2025
40cdfd4
feat: combine lines async file reading
JoshLoecker Jan 12, 2025
509ed96
feat: create Enum from rna types
JoshLoecker Jan 12, 2025
55c04d1
fix: use new RNA type enum
JoshLoecker Jan 12, 2025
d7ecd50
refactor: convert literals to Enums
JoshLoecker Jan 12, 2025
c69fd1a
fix: re-organize code
JoshLoecker Jan 12, 2025
189cfe5
fix: added dataclasses for simplification of data
JoshLoecker Jan 12, 2025
4008dc2
feat: use _log_and_raise_error function
JoshLoecker Jan 12, 2025
111465d
fix: drop na values
JoshLoecker Jan 12, 2025
14fb0fa
feat: async read matrix files
JoshLoecker Jan 12, 2025
f632679
refactor: calculate size more pythonically
JoshLoecker Jan 12, 2025
5690616
fix: use better variable for ensembl gene
JoshLoecker Jan 12, 2025
746099b
refactor: drop na values
JoshLoecker Jan 12, 2025
132660f
chore: fix logging message
JoshLoecker Jan 12, 2025
1f7d252
feat: use new LogLevel Enum
JoshLoecker Jan 12, 2025
7451732
fix: resolve context directory only if provided
JoshLoecker Jan 12, 2025
58b66e9
refactor: import/remove modules
JoshLoecker Jan 12, 2025
5c7e3b2
refactor: remove extra read_counts function
JoshLoecker Jan 12, 2025
369b5a1
fix: drop na values when building compiling matrix
JoshLoecker Jan 12, 2025
e708644
fix: fpkm calculations
JoshLoecker Jan 12, 2025
002c705
style: ruff formatting
JoshLoecker Jan 12, 2025
f8bf998
style: fix docstring
JoshLoecker Jan 12, 2025
e7f65e0
fix: zfpkm calculations
JoshLoecker Jan 12, 2025
93e9844
feat: use concurrent.futures for easier processing
JoshLoecker Jan 12, 2025
d6cb677
feat: allow setting bandwidth and peak parameters
JoshLoecker Jan 12, 2025
b2e5fe6
style: ruff whitespace fixes
JoshLoecker Jan 12, 2025
720543f
refactor: more efficient zfpkm plotting
JoshLoecker Jan 12, 2025
1daa66e
feat: use new RNAType Enum
JoshLoecker Jan 12, 2025
0b005b7
fix: drop na values
JoshLoecker Jan 12, 2025
cb6e995
refactor: provide bandwidth and peak parameters
JoshLoecker Jan 12, 2025
8c16d94
feat: allow force-plotting zfpkm
JoshLoecker Jan 12, 2025
97c7d1d
refactor: rename FilteringTechnique constants
JoshLoecker Jan 12, 2025
b67cd84
style: ruff whitespace and log message formatting
JoshLoecker Jan 12, 2025
638de62
refactor: use _process, not _save_rnaseq_tests
JoshLoecker Jan 12, 2025
ba85237
refactor: provide path to rnaseq matrix
JoshLoecker Jan 12, 2025
681e871
refactor: process provided rnaseq matrix
JoshLoecker Jan 12, 2025
f4d6be9
fix: proper check if dataframe is empty/None
JoshLoecker Jan 12, 2025
54cd8ac
fix: only write normalized matrix if data exists
JoshLoecker Jan 12, 2025
ea9654d
fix: proper calculation of high confidence genes
JoshLoecker Jan 12, 2025
e663406
refactor: remove internal _create_metadata_df func
JoshLoecker Jan 12, 2025
fda629e
refactor: proper usage of _set_up_logging function
JoshLoecker Jan 12, 2025
59f7f95
fix: proper usage of _log_and_raise_error function
JoshLoecker Jan 12, 2025
3b63b7d
refactor: proper processing of metadata
JoshLoecker Jan 12, 2025
0ff2eb7
refactor: move logic to out of main function
JoshLoecker Jan 12, 2025
7770643
fix: add proper function parameters
JoshLoecker Jan 12, 2025
bdc0f05
refactor: import required modules
JoshLoecker Jan 12, 2025
475b27a
fix: proper calling of _log_and_raise_error
JoshLoecker Jan 12, 2025
2c65b08
feat: allow providing log level and location
JoshLoecker Jan 12, 2025
e9d9193
refactor: import required modules
JoshLoecker Jan 12, 2025
632bfa8
refactor: remove data-specific Enums
JoshLoecker Jan 12, 2025
f9669a6
refactor: async data loading
JoshLoecker Jan 12, 2025
938f6cb
refactor: use new RNAType; _log_and_raise_error
JoshLoecker Jan 12, 2025
ed2044e
refactor: drop na values
JoshLoecker Jan 12, 2025
6074d6b
feat: add logging
JoshLoecker Jan 12, 2025
9b52049
refactor: type hinting; drop na values
JoshLoecker Jan 12, 2025
38a9f1e
feat: added function to update missing genome data
JoshLoecker Jan 12, 2025
11cd6b4
refactor: build and use new parameter data types
JoshLoecker Jan 12, 2025
0d484a2
refactor: pass new parameter data types
JoshLoecker Jan 12, 2025
07fc452
refactor: calculate variables near their usage
JoshLoecker Jan 12, 2025
c371c12
style: more verbose variable name
JoshLoecker Jan 12, 2025
026f1cc
feat: add functions for building batch data
JoshLoecker Jan 12, 2025
ebf3edd
feat: add function for validating data sources
JoshLoecker Jan 12, 2025
0f1d584
refactor!: provide None as default argument
JoshLoecker Jan 12, 2025
f0a2aab
refactor: better building of source data variables
JoshLoecker Jan 12, 2025
b9bade6
feat: allow providing log level and location
JoshLoecker Jan 12, 2025
325b559
refactor: move variable creation closer to usage
JoshLoecker Jan 12, 2025
463d545
refactor: use _log_and_raise_error
JoshLoecker Jan 12, 2025
f1c242d
refactor: import required modules
JoshLoecker Jan 12, 2025
437082a
refactor!: move Solver to data_types
JoshLoecker Jan 12, 2025
4694a5a
refactor: remove _Arguments cli parsing
JoshLoecker Jan 12, 2025
91a538f
fix: bracket and logical gene rule creation
JoshLoecker Jan 12, 2025
675a26b
refactor: reduce extraneous function usage
JoshLoecker Jan 12, 2025
618edc7
style: ruff whitespace formatting
JoshLoecker Jan 12, 2025
70371fe
refactor: remove extraneous variable
JoshLoecker Jan 12, 2025
cd36b5e
refactor: pythonic if-statement calculation
JoshLoecker Jan 12, 2025
d5f1fdf
refactor: remove extraneous comment + variables
JoshLoecker Jan 12, 2025
b586e25
style: more verbose parameter names
JoshLoecker Jan 12, 2025
498e84a
refactor: async usage where possible
JoshLoecker Jan 12, 2025
1668b35
refactor: type hinting
JoshLoecker Jan 12, 2025
f162b94
refactor: pythonic collection of gene activity
JoshLoecker Jan 12, 2025
52f335f
fix: update mapping process to be more readable
JoshLoecker Jan 12, 2025
eab0713
refactor: import required modules
JoshLoecker Jan 12, 2025
23a01f0
refactor: move dataclasses to data_types.py
JoshLoecker Jan 12, 2025
a207fcc
refactor: make async functions
JoshLoecker Jan 12, 2025
5d918c5
refactor: simplify processing, use async
JoshLoecker Jan 12, 2025
97cc2a9
refactor: use relevant data types
JoshLoecker Jan 12, 2025
aabcb8f
refactor: pythonic approach to z score calculation
JoshLoecker Jan 12, 2025
0a795e1
refactor: async create directories
JoshLoecker Jan 12, 2025
fdf415f
refactor: move when early return happens
JoshLoecker Jan 12, 2025
8c1ef04
fix: remove unused z score calculation
JoshLoecker Jan 12, 2025
60eb969
fix: store gene ids as string
JoshLoecker Jan 12, 2025
9cf76c2
refactor: more efficient pandas melting
JoshLoecker Jan 12, 2025
c7d1477
fix: graph title name
JoshLoecker Jan 12, 2025
3e38a0f
refactor: add appropriate column names
JoshLoecker Jan 12, 2025
e323a9a
refactor: rename parameter names
JoshLoecker Jan 12, 2025
b575d40
refactor: log for early return
JoshLoecker Jan 12, 2025
b2d7b24
feat: async create directories
JoshLoecker Jan 12, 2025
d847da2
style: add logging
JoshLoecker Jan 12, 2025
3a47f20
refactor: remove old z score calculation code
JoshLoecker Jan 12, 2025
229ea7c
refactor: better function name
JoshLoecker Jan 12, 2025
77c9ff1
refactor: pythonic z score calculation
JoshLoecker Jan 12, 2025
3df7503
style: log for early return
JoshLoecker Jan 12, 2025
7a41401
refactor: remove old z score calculation code
JoshLoecker Jan 12, 2025
fc7b95b
refactor: more efficient pandas melting
JoshLoecker Jan 12, 2025
78a1ab6
refactor: temporarily remove z score graphing
JoshLoecker Jan 12, 2025
f34b1db
refactor: use new function names
JoshLoecker Jan 12, 2025
71ee8cc
refactor: use matplotlib for plot creation
JoshLoecker Jan 12, 2025
b6cf039
refactor: exchange plotly for seaborn
JoshLoecker Jan 12, 2025
4c424b5
feat: validate that source types process in order
JoshLoecker Jan 12, 2025
22b82ef
refactor: add logging, use _log_and_raise_error
JoshLoecker Jan 12, 2025
82ce6b3
refactor: remove plotly graphing
JoshLoecker Jan 12, 2025
b846efd
feat: do not throw error if missing png path
JoshLoecker Jan 12, 2025
a43485f
refactor: move NamedTuple to data_types
JoshLoecker Jan 12, 2025
0264c3c
fix: comment unused variables
JoshLoecker Jan 12, 2025
be4e89e
fix: comment unused variables
JoshLoecker Jan 12, 2025
ab96976
fix: line too long
JoshLoecker Jan 12, 2025
8a0f4c6
style: ruff formatting and linting fixes
JoshLoecker Jan 12, 2025
7e53dac
fix: allow undocumented public package
JoshLoecker Jan 12, 2025
bd65dbe
fix: do not create virtual environment to format notebooks
JoshLoecker Jan 12, 2025
e419816
fix: use python 3.10 to evade errors
JoshLoecker Jan 12, 2025
7fa8d00
fix: use uv tool to run nbconvert
JoshLoecker Jan 12, 2025
c6f02bc
style: format code, Jupyter Notebook(s), and Python imports with `ruff`
JoshLoecker Jan 12, 2025
0646150
feat: expand test suite to include python 3.11 and 3.12
JoshLoecker Jan 12, 2025
3412c31
Merge remote-tracking branch 'origin/fix/single-cell-processing' into…
JoshLoecker Jan 12, 2025
9a72ae5
fix: check if provided data is a path before checking if it exists
JoshLoecker Jan 12, 2025
8259105
fix: read StringIO data instead of attempting to make dataframe from it
JoshLoecker Jan 12, 2025
330ed84
fix: statsmodels version dependent on python version
JoshLoecker Jan 12, 2025
b6b6633
fix: use tpm instead of quantile
JoshLoecker Jan 13, 2025
21f780b
fix: typo
JoshLoecker Jan 13, 2025
b119787
chore: update uv lock
JoshLoecker Jan 13, 2025
a441231
chore: code re-arrangement
JoshLoecker Jan 13, 2025
09e1852
chore: remove aiofiles as a dependency
JoshLoecker Jan 13, 2025
80f32d4
fix: renamed RNAType.(trna,mrna) to RNAType.(TRNA,MRNA)
JoshLoecker Jan 14, 2025
2c1687b
fix: re-attempt processing if JSON decode error occurs
JoshLoecker Jan 14, 2025
0f317f3
refactor: remove dependency on aiofiles
JoshLoecker Jan 14, 2025
2229c51
fix: do not call listify twice
JoshLoecker Jan 14, 2025
9a6a42d
refactor: remove dependency on aiofiles
JoshLoecker Jan 14, 2025
f288015
refactor: allow providing a list of model filepaths to write to
JoshLoecker Jan 14, 2025
c0d3816
fix: check that license information is present if using gurobi
JoshLoecker Jan 14, 2025
df86903
feat: add NONE for no logging
JoshLoecker Jan 14, 2025
f925078
refactor: remove dependency on aiofiles
JoshLoecker Jan 14, 2025
13aa4ae
style: ruff formatting
JoshLoecker Jan 31, 2025
1ac47f0
fix: make parent directories before saving files
JoshLoecker Jan 31, 2025
72c3630
refactor: use critical log instead of warning
JoshLoecker Jan 31, 2025
8be741d
refactor: reset execution output and count
JoshLoecker Jan 31, 2025
35a0bf2
fix: rename column names
JoshLoecker Jan 31, 2025
13f3253
chore: increase line length
JoshLoecker Jun 3, 2025
12aa4ad
refactor: updated to be more python-like
JoshLoecker Jun 3, 2025
a3c4b28
style: ruff formatting; error handling
JoshLoecker Jun 3, 2025
c0f5f00
refactor: remove unwrapping aio file reading for better consistency/r…
JoshLoecker Jun 3, 2025
f1a4137
feat: add dataclasses to handle types more easily
JoshLoecker Jun 3, 2025
8c71b57
refactor: add logging
JoshLoecker Jun 3, 2025
b632a72
refactor: added slots to dataclass for reduced memory usage; ruff for…
JoshLoecker Jun 3, 2025
f651a85
refactor: better managing/converting of ensembl and entrez gene IDs
JoshLoecker Jun 3, 2025
5a30ced
refactor: more efficient tpm calculation
JoshLoecker Jun 3, 2025
645eeef
refactor: use pandas dataframes instead of numpy arrays for developer…
JoshLoecker Jun 3, 2025
146a806
refactor: fix plot generation
JoshLoecker Jun 3, 2025
a9d7800
refactor: ruff formatting
JoshLoecker Jun 3, 2025
992469a
refactor: ruff formatting
JoshLoecker Jun 3, 2025
8c00a1f
refactor: ruff formatting
JoshLoecker Jun 3, 2025
d651700
style: better variable naming
JoshLoecker Jun 3, 2025
66bc302
refactor: fix creating evaluable gene reaction rules
JoshLoecker Jun 3, 2025
a2cb161
style: ruff formatting
JoshLoecker Jun 3, 2025
5e331ee
refactor: only publish package when tag starts with 'v'
JoshLoecker Jun 3, 2025
f572104
refactor: use better method of converting to numeric dtype
JoshLoecker Aug 7, 2025
3a547ea
chore: bump package requirements
JoshLoecker Sep 5, 2025
ca99e0a
fix: swap minimum and maximum bounds for exchange reactions
JoshLoecker Sep 5, 2025
efbeed1
feat: provide `identifier_column` for dynamic splitting of gene expre…
JoshLoecker Sep 5, 2025
aac7bbd
feat: use async file reading
JoshLoecker Sep 5, 2025
d1dae99
refactor!: rename `graph` to `plot`
JoshLoecker Sep 5, 2025
b7983c3
chore: remove trailing underscore; do not write index
JoshLoecker Sep 5, 2025
009fbb2
feat: add identifier column check for gene expression data processing
JoshLoecker Sep 5, 2025
0a7c0a6
fix: replace '-' with pd.NA in drug target genes DataFrame
JoshLoecker Sep 5, 2025
3468625
style: ruff formatting
JoshLoecker Sep 5, 2025
6561c24
refactor: use pd.NA instead of checking "-"
JoshLoecker Sep 5, 2025
bf56069
fix: only drop NA values from ensembl_gene_id, not entire dataframe
JoshLoecker Sep 5, 2025
dc324d5
feat: added function to build heatmap of conditions vs pathways
JoshLoecker Sep 5, 2025
74d883d
feat: added function to plot z score distribution
JoshLoecker Sep 5, 2025
b55215d
feat: added `pipelines` module to make using utilities easier
JoshLoecker Sep 5, 2025
40efbe1
fix: merge conflict
JoshLoecker Sep 5, 2025
d65a219
chore: fix typo
JoshLoecker Sep 5, 2025
3e671d9
style: format code, Jupyter Notebook(s), and Python imports with `ruff`
JoshLoecker Sep 5, 2025
1e0a5bb
fix: break up complex conditional
JoshLoecker Sep 5, 2025
6adaa7b
fix: _validate_source_arguments requires arg expansion
JoshLoecker Sep 5, 2025
c26f437
fix: typo
JoshLoecker Sep 5, 2025
da2beaf
fix: typo
JoshLoecker Sep 5, 2025
5eb0254
Merge remote-tracking branch 'origin/fix-startup' into fix-startup
JoshLoecker Sep 5, 2025
462ef3f
refactor: simplify function signatures and improve docstrings
JoshLoecker Sep 5, 2025
df84aea
fix: invalid collection of gene counts
JoshLoecker Sep 5, 2025
bcb6689
fix(rnaseq_preprocess): ensure correct order of async file reads and …
JoshLoecker Sep 5, 2025
feb7d11
chore: add type hint
JoshLoecker Sep 10, 2025
dbf12f4
refactor: use python optional-groups instead of uv dependency-groups
JoshLoecker Sep 10, 2025
700489b
feat: added Kolmogorov-Smirnov tests
JoshLoecker Sep 18, 2025
eafaa9f
feat: added Mann-Whitney U-tests
JoshLoecker Sep 18, 2025
315d5c6
feat: added Fisher Exact tests
JoshLoecker Sep 18, 2025
16d5c83
feat: added unit tests for KS statistics
JoshLoecker Sep 18, 2025
ae1ca04
feat: added unit tests for Mann Whitney statistics
JoshLoecker Sep 18, 2025
d26bc77
feat: added unit tests for Fisher's Exact statistics
JoshLoecker Sep 18, 2025
371a0ad
refactor: rename variables for better understanding
JoshLoecker Oct 1, 2025
5419d95
refactor: only return a `cobra.Model` reconstruction
JoshLoecker Oct 1, 2025
b5cea99
refactor: add comments stating purpose of code blocks
JoshLoecker Oct 2, 2025
3ea0070
feat: optionally force include exchange reactions
JoshLoecker Oct 2, 2025
b9104ae
chore: more specific type hints
JoshLoecker Oct 2, 2025
85aee68
feat(dev): make private classes public
JoshLoecker Oct 2, 2025
b0479e2
fix: allow .xml or .sbml model types
JoshLoecker Oct 2, 2025
35882d7
style: reword log messages
JoshLoecker Oct 2, 2025
437e013
fix: use better variable name
JoshLoecker Oct 2, 2025
57ced4f
feat(dev): optional lowercasing of dataframe column names
JoshLoecker Oct 2, 2025
bff3654
feat(dev): fix ty warning
JoshLoecker Oct 2, 2025
312c7bb
Merge branch 'develop' into fix-imat-build
JoshLoecker Oct 6, 2025
b511c09
fix: uv lock dependencies
JoshLoecker Oct 6, 2025
f00ee17
Merge remote-tracking branch 'origin/fix-imat-build' into fix-imat-build
JoshLoecker Oct 6, 2025
5be0a8a
fix: uv lock dependencies
JoshLoecker Oct 6, 2025
50d85b8
fix: remove `--cov` for coverage updating
JoshLoecker Oct 6, 2025
11ac5c2
fix: remove duplicate code from merge conflict
JoshLoecker Oct 6, 2025
cb8153a
Move ref models (#232)
JoshLoecker Oct 6, 2025
d688e0d
chore: add V3 reference model from utils-update
JoshLoecker Oct 7, 2025
50e19d2
fix: typo for expected test value
JoshLoecker Oct 7, 2025
4 changes: 2 additions & 2 deletions .github/workflows/continuous_integration.yml
@@ -11,7 +11,7 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
 matrix:
-python-version: [ "3.10", "3.11", "3.12", "3.13" ]
+python-version: [ "3.11", "3.12", "3.13" ]
 steps:
 - name: Checkout
 uses: actions/checkout@v4
@@ -27,7 +27,7 @@ jobs:
 run: uv sync --python "${{ matrix.python-version }}" --all-extras --dev

 - name: Run tests
-run: uv run --python "${{ matrix.python-version }}" pytest --cov --junitxml=junit.xml -o junit_family=legacy
+run: uv run --python "${{ matrix.python-version }}" pytest

 - name: Cache Clear
 run: uv cache prune --ci
229 changes: 83 additions & 146 deletions main/COMO.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions main/como/__init__.py
@@ -1,3 +1,4 @@
from como import plot
from como.data_types import AdjustmentMethod, Algorithm, CobraCompartments, FilteringTechnique, LogLevel, Solver
from como.utils import stringlist_to_list

49 changes: 37 additions & 12 deletions main/como/combine_distributions.py
@@ -23,7 +23,7 @@
 )


-async def _combine_z_distribution_for_batch(
+def _combine_z_distribution_for_batch(
 context_name: str,
 batch: _BatchEntry,
 matrix: pd.DataFrame,
@@ -33,6 +33,21 @@ async def _combine_z_distribution_for_batch(
 weighted_z_floor: int,
 weighted_z_ceiling: int,
 ) -> pd.DataFrame:
+"""Combine z-score distributions across samples for a single batch.
+
+Args:
+context_name: Name of the context (e.g., tissue or condition).
+batch: Batch entry containing batch number and sample names.
+matrix: DataFrame with 'ensembl_gene_id' and sample columns.
+source: Source type (e.g., trna, mrna, scrna, proteomics).
+output_combined_matrix_filepath: Path to save the combined z-score matrix.
+output_figure_dirpath: Path to save the z-score distribution figure.
+weighted_z_floor: Minimum z-score value after combining.
+weighted_z_ceiling: Maximum z-score value after combining.
+
+Returns:
+A pandas dataframe of the weighted z-distributions
+"""
 output_combined_matrix_filepath.parent.mkdir(parents=True, exist_ok=True)
 output_figure_dirpath.mkdir(parents=True, exist_ok=True)

@@ -80,15 +95,29 @@ async def _combine_z_distribution_for_batch(
 return weighted_matrix


-async def _combine_z_distribution_for_source(
+def _combine_z_distribution_for_source(
 merged_source_data: pd.DataFrame,
 context_name: str,
 num_replicates: int,
 output_combined_matrix_filepath: Path,
 output_figure_filepath: Path,
 weighted_z_floor: int = -6,
 weighted_z_ceiling: int = 6,
-):
+) -> pd.DataFrame:
+"""Combine z-score distributions across batches for a single source.
+
+Args:
+merged_source_data: DataFrame with 'ensembl_gene_id' and batch columns.
+context_name: Name of the context (e.g., tissue or condition).
+num_replicates: Number of replicates (samples) for weighting.
+output_combined_matrix_filepath: Path to save the combined z-score matrix.
+output_figure_filepath: Path to save the z-score distribution figure.
+weighted_z_floor: Minimum z-score value after combining.
+weighted_z_ceiling: Maximum z-score value after combining.
+
+Returns:
+A pandas dataframe of the weighted z-distributions
+"""
 if _num_columns(merged_source_data) <= 2:
 logger.warning("A single source exists, returning matrix as-is because no additional combining can be done")
 merged_source_data.columns = ["ensembl_gene_id", "combine_z"]
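
Review note: the docstrings above describe `weighted_z_floor` and `weighted_z_ceiling` as bounds applied after combining, and `num_replicates` as a weighting input, but the weighting formula itself is outside the rendered hunks. As a minimal sketch only, assuming a weighted Stouffer combination (an assumption, not confirmed by this diff), the clipping step could behave as follows. The helper name `combine_weighted_z`, the batch column names, and the weights are hypothetical.

```python
import numpy as np
import pandas as pd


def combine_weighted_z(
    z_scores: pd.DataFrame,
    weights: pd.Series,
    floor: float = -6.0,
    ceiling: float = 6.0,
) -> pd.Series:
    """Hypothetical helper: weighted Stouffer combination, clipped to [floor, ceiling].

    Assumes `z_scores` has one column per batch and no missing values.
    """
    # Align the weights with the batch columns.
    w = weights.reindex(z_scores.columns).astype(float)
    # Weighted Stouffer: sum(w_i * z_i) / sqrt(sum(w_i ** 2)).
    combined = z_scores.mul(w, axis=1).sum(axis=1) / np.sqrt((w**2).sum())
    # Bound extreme values, mirroring the floor/ceiling defaults of -6 and 6.
    return combined.clip(lower=floor, upper=ceiling)


# Made-up data: two genes, two batches weighted by replicate count.
z = pd.DataFrame({"batch1": [1.0, 10.0], "batch2": [1.0, 10.0]}, index=["geneA", "geneB"])
replicates = pd.Series({"batch1": 3, "batch2": 1})
combined = combine_weighted_z(z, replicates)
print(combined)
```

In this toy input the second gene's combined score exceeds the ceiling and is clipped to 6, while the first gene passes through unchanged.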
@@ -144,14 +173,10 @@ def _combine_z_distribution_for_context(
 return pd.DataFrame({"ensembl_gene_id": [], "combine_z": []})

 z_matrices = [
-result.z_score_matrix.set_index("ensembl_gene_id").rename(columns=dict.fromkeys(result.z_score_matrix.columns[1:], result.type.value))
-for result in zscore_results
+res.z_score_matrix.set_index("ensembl_gene_id").rename(columns=dict.fromkeys(res.z_score_matrix.columns[1:], res.type.value))
+for res in zscore_results
 ]
-z_matrix = pd.DataFrame()
-for matrix in z_matrices:
-z_matrix = z_matrix.merge(right=matrix, left_index=True, right_index=True, how="outer") if not z_matrix.empty else matrix
-z_matrix = z_matrix.reset_index(drop=False)
-# z_matrix = pd.concat(z_matrices, axis=1, join="outer").reset_index()
+z_matrix = pd.concat(z_matrices, axis=1, join="outer").reset_index()
 if _num_columns(z_matrix) <= 1:
 logger.trace(f"Only 1 source exists for '{context}', returning dataframe as-is because no data exists to combine")
 z_matrix.columns = ["ensembl_gene_id", "combine_z"]
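
Review note: this hunk swaps a loop of successive outer merges for a single `pd.concat(..., axis=1, join="outer")`. The sketch below, using made-up gene IDs and source names rather than COMO data, suggests the two forms give the same frame when each input has a unique index and a distinct column name.

```python
import pandas as pd

# Hypothetical per-source z-score matrices, indexed by gene ID (illustrative values only).
trna = pd.DataFrame({"ensembl_gene_id": ["ENSG1", "ENSG2"], "trna": [0.5, -1.2]}).set_index("ensembl_gene_id")
mrna = pd.DataFrame({"ensembl_gene_id": ["ENSG2", "ENSG3"], "mrna": [1.0, 2.0]}).set_index("ensembl_gene_id")
z_matrices = [trna, mrna]

# Previous approach: fold the list with successive outer merges on the index.
merged = pd.DataFrame()
for matrix in z_matrices:
    merged = merged.merge(right=matrix, left_index=True, right_index=True, how="outer") if not merged.empty else matrix
merged = merged.reset_index(drop=False)

# New approach: one column-wise outer concat; genes missing from a source become NaN.
concatenated = pd.concat(z_matrices, axis=1, join="outer").reset_index()

print(concatenated)
```

One behavioural difference worth checking before relying on the equivalence: with duplicated gene IDs inside a single matrix, the merge loop multiplies rows, whereas a column-wise concat over differing indexes is expected to raise an error.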
@@ -229,7 +254,7 @@ async def _begin_combining_distributions(
 matrix=matrix[[GeneIdentifier.ENSEMBL_GENE_ID.value, *batch.sample_names]],
 source=source,
 output_combined_matrix_filepath=(
-output_filepaths[source.value].parent / f"{context_name}_{source.value}_batch{batch.batch_num}_combined_z_distribution_.csv"
+output_filepaths[source.value].parent / f"{context_name}_{source.value}_batch{batch.batch_num}_combined_z_distribution.csv"
 ),
 output_figure_dirpath=output_figure_dirpath,
 weighted_z_floor=weighted_z_floor,
@@ -243,7 +268,7 @@ async def _begin_combining_distributions(
 for df in batch_results:
 merged_batch_results = df if merged_batch_results.empty else merged_batch_results.merge(df, on="ensembl_gene_id", how="outer")

-merged_source_results: pd.DataFrame = await _combine_z_distribution_for_source(
+merged_source_results: pd.DataFrame = _combine_z_distribution_for_source(
 merged_source_data=merged_batch_results,
 context_name=context_name,
 num_replicates=sum(batch.num_samples for batch in batch_names[source.value]),
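
Review note: with `_combine_z_distribution_for_source` now a plain function, the caller drops `await`. The diff does not state the motivation. The sketch below uses hypothetical stand-in functions (`combine_for_source`, `begin_combining`), not COMO's code, to show the direct call and, as an optional alternative if blocking the event loop ever matters, `asyncio.to_thread`.

```python
import asyncio


def combine_for_source(values: list[float]) -> float:
    """Hypothetical synchronous, CPU-bound helper (stand-in for the combining step)."""
    return sum(values) / len(values)


async def begin_combining(values: list[float]) -> float:
    # A plain function is called directly from a coroutine; no `await` is needed.
    direct = combine_for_source(values)
    # Optional alternative: run the same function in a worker thread so the
    # event loop stays responsive while it executes.
    offloaded = await asyncio.to_thread(combine_for_source, values)
    assert direct == offloaded
    return direct


result = asyncio.run(begin_combining([1.0, 2.0, 3.0]))
print(result)
```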