-
Notifications
You must be signed in to change notification settings - Fork 8
Fix zFPKM Calculations #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…dicates an incorrect taxon id Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ht across the study Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
BREAKING-CHANGE: instead of providing a type `PeakIdentificationParameters`, simply provide the min peak height and distance directly Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…, not at the maximum fpkm distribution Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ecting gene data Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces NumPy-specific type hints (e.g., np.float64, np.float32, np.int8) with Python built-in types (float, int) across the codebase. It also includes significant refactoring of zFPKM calculations to match R's implementation, adds new helper modules for density estimation and peak finding, updates enum values to lowercase, and removes test/debugging code.
- Replaces NumPy-specific dtype references with Python built-in types for better compatibility
- Refactors zFPKM transformations to align with R's implementation using custom density and peak-finding functions
- Updates enum values from uppercase to lowercase for consistency
- Adds new configuration file (ty.toml) and helper modules (density.py, peak_finder.py, approx.py)
Reviewed Changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| ty.toml | Adds new type checker configuration |
| tests/unit/test_*.py | Updates test fixtures to use float/int instead of NumPy types |
| ruff.toml | Reformats linting configuration and moves F401 to unfixable rules |
| main/como/utils.py | Updates type hints and adds logic to handle DataFrame index names |
| main/como/stats/*.py | Replaces NumPy types with built-in types in statistical test classes |
| main/como/rnaseq_*.py | Major refactoring including zFPKM calculation updates and gene ID handling |
| main/como/proteomics_*.py | Updates type hints and gene symbol processing logic |
| main/como/data_types.py | Changes enum values from uppercase to lowercase |
| main/como/create_context_specific_model.py | Adds taxon parameter, refactors model I/O, and updates type hints |
| main/como/merge_xomics.py | Updates enum usage and improves batch number extraction |
| main/como/combine_distributions.py | Refactors z-score combination logic to handle replicates per batch |
| main/como/peak_finder.py | New module implementing R's findpeaks function |
| main/como/density.py | New module implementing R's density function for KDE |
| main/como/approx.py | New module implementing R's approx interpolation function |
Comments suppressed due to low confidence (1)
main/como/create_context_specific_model.py:1
- Comment uses incorrect type checker ignore syntax. Should be
type: ignorenotty: ignore.
from __future__ import annotations
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| A NumPy array of the filtered data. | ||
| """ | ||
| if not isinstance(data, pd.DataFrame | npt.NDArray): | ||
| if not isinstance(data, pd.DataFrame | np.ndarray): |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isinstance() call should use a tuple, not a union type with |. This should be isinstance(data, (pd.DataFrame, np.ndarray)). The current syntax will raise a TypeError at runtime.
| if not isinstance(data, pd.DataFrame | np.ndarray): | |
| if not isinstance(data, (pd.DataFrame, np.ndarray)): |
| "ensembl_gene_id": merged_source_data["ensembl_gene_id"], | ||
| "combine_z": weighted_matrix, | ||
| } | ||
| # alidate alignment between columns and replicate vector |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected spelling of 'Validate' (missing 'V').
| # alidate alignment between columns and replicate vector | |
| # Validate alignment between columns and replicate vector |
| # replace gene_id with activity, using optional whitespace before and after the gene id | ||
| # Do not replace the whitespace (if it exists) before and after the gene ID | ||
| gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=activity, string=gene_reaction_rule) | ||
| gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=str(gene_reaction_rule)) |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling str() on gene_reaction_rule in every iteration is unnecessary if it's already a string. Consider verifying the type once before the loop or using a type assertion if you're certain it's always a string.
| gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=str(gene_reaction_rule)) | |
| gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=gene_reaction_rule) |
| ) | ||
| ) | ||
| df = pd.concat(dataframes, axis="columns") | ||
| print(df) |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement left in production code. This should either be removed or replaced with proper logging using logger.debug(df) or logger.trace(df).
| print(df) | |
| logger.debug(df) |
|
|
||
| @overload | ||
| async def _read_file(path: pd.DataFrame, h5ad_as_df: bool, **kwargs) -> pd.DataFrame: ... | ||
| async def _read_file(path: pd.DataFrame, h5ad_as_df: Literal[True] | Literal[False], **kwargs) -> pd.DataFrame: ... |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement has no effect.
|
|
||
| @overload | ||
| async def _read_file(path: sc.AnnData, h5ad_as_df: bool = False, **kwargs) -> sc.AnnData: ... | ||
| async def _read_file(path: sc.AnnData, h5ad_as_df: Literal[False] = False, **kwargs) -> sc.AnnData: ... |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement has no effect.
| def _num_rows(item: pd.DataFrame | npt.NDArray) -> int: | ||
| return item.shape[0] | ||
| @overload | ||
| async def _read_file(path: sc.AnnData, h5ad_as_df: Literal[True] = True, **kwargs) -> pd.DataFrame: ... |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement has no effect.
|
|
||
|
|
||
| @overload | ||
| async def _read_file(path: Path, h5ad_as_df: Literal[False] = False, **kwargs) -> pd.DataFrame | sc.AnnData: ... |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement has no effect.
|
|
||
| @overload | ||
| async def _read_file(path: sc.AnnData, h5ad_as_df: bool = True, **kwargs) -> pd.DataFrame: ... | ||
| async def _read_file(path: Path, h5ad_as_df: Literal[True] = True, **kwargs) -> pd.DataFrame: ... |
Copilot
AI
Nov 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement has no effect.
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
This pull request replicates R's
zFPKMfunctionality by rewriting underlying C code from R into Python using NumPy.