Skip to content

Conversation

@JoshLoecker
Copy link
Member

This pull request replicates R's zFPKM functionality by rewriting underlying C code from R into Python using NumPy.

…dicates an incorrect taxon id

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ht across the study

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
BREAKING-CHANGE: instead of providing a type `PeakIdentificationParameters`,  simply provide the min peak height and distance directly
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…, not at the maximum fpkm distribution

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…sible

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
…ecting gene data

Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Copilot AI review requested due to automatic review settings November 3, 2025 16:55
@JoshLoecker JoshLoecker changed the base branch from main to develop November 3, 2025 16:55
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces NumPy-specific type hints (e.g., np.float64, np.float32, np.int8) with Python built-in types (float, int) across the codebase. It also includes significant refactoring of zFPKM calculations to match R's implementation, adds new helper modules for density estimation and peak finding, updates enum values to lowercase, and removes test/debugging code.

  • Replaces NumPy-specific dtype references with Python built-in types for better compatibility
  • Refactors zFPKM transformations to align with R's implementation using custom density and peak-finding functions
  • Updates enum values from uppercase to lowercase for consistency
  • Adds new configuration file (ty.toml) and helper modules (density.py, peak_finder.py, approx.py)

Reviewed Changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
ty.toml Adds new type checker configuration
tests/unit/test_*.py Updates test fixtures to use float/int instead of NumPy types
ruff.toml Reformats linting configuration and moves F401 to unfixable rules
main/como/utils.py Updates type hints and adds logic to handle DataFrame index names
main/como/stats/*.py Replaces NumPy types with built-in types in statistical test classes
main/como/rnaseq_*.py Major refactoring including zFPKM calculation updates and gene ID handling
main/como/proteomics_*.py Updates type hints and gene symbol processing logic
main/como/data_types.py Changes enum values from uppercase to lowercase
main/como/create_context_specific_model.py Adds taxon parameter, refactors model I/O, and updates type hints
main/como/merge_xomics.py Updates enum usage and improves batch number extraction
main/como/combine_distributions.py Refactors z-score combination logic to handle replicates per batch
main/como/peak_finder.py New module implementing R's findpeaks function
main/como/density.py New module implementing R's density function for KDE
main/como/approx.py New module implementing R's approx interpolation function
Comments suppressed due to low confidence (1)

main/como/create_context_specific_model.py:1

  • Comment uses incorrect type checker ignore syntax. Should be type: ignore not ty: ignore.
from __future__ import annotations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

A NumPy array of the filtered data.
"""
if not isinstance(data, pd.DataFrame | npt.NDArray):
if not isinstance(data, pd.DataFrame | np.ndarray):
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isinstance() call should use a tuple, not a union type with |. This should be isinstance(data, (pd.DataFrame, np.ndarray)). The current syntax will raise a TypeError at runtime.

Suggested change
if not isinstance(data, pd.DataFrame | np.ndarray):
if not isinstance(data, (pd.DataFrame, np.ndarray)):

Copilot uses AI. Check for mistakes.
"ensembl_gene_id": merged_source_data["ensembl_gene_id"],
"combine_z": weighted_matrix,
}
# alidate alignment between columns and replicate vector
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'Validate' (missing 'V').

Suggested change
# alidate alignment between columns and replicate vector
# Validate alignment between columns and replicate vector

Copilot uses AI. Check for mistakes.
# replace gene_id with activity, using optional whitespace before and after the gene id
# Do not replace the whitespace (if it exists) before and after the gene ID
gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=activity, string=gene_reaction_rule)
gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=str(gene_reaction_rule))
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling str() on gene_reaction_rule in every iteration is unnecessary if it's already a string. Consider verifying the type once before the loop or using a type assertion if you're certain it's always a string.

Suggested change
gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=str(gene_reaction_rule))
gene_reaction_rule = re.sub(pattern=rf"\b{gene_id}\b", repl=str(activity), string=gene_reaction_rule)

Copilot uses AI. Check for mistakes.
)
)
df = pd.concat(dataframes, axis="columns")
print(df)
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement left in production code. This should either be removed or replaced with proper logging using logger.debug(df) or logger.trace(df).

Suggested change
print(df)
logger.debug(df)

Copilot uses AI. Check for mistakes.

@overload
async def _read_file(path: pd.DataFrame, h5ad_as_df: bool, **kwargs) -> pd.DataFrame: ...
async def _read_file(path: pd.DataFrame, h5ad_as_df: Literal[True] | Literal[False], **kwargs) -> pd.DataFrame: ...
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement has no effect.

Copilot uses AI. Check for mistakes.

@overload
async def _read_file(path: sc.AnnData, h5ad_as_df: bool = False, **kwargs) -> sc.AnnData: ...
async def _read_file(path: sc.AnnData, h5ad_as_df: Literal[False] = False, **kwargs) -> sc.AnnData: ...
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement has no effect.

Copilot uses AI. Check for mistakes.
def _num_rows(item: pd.DataFrame | npt.NDArray) -> int:
return item.shape[0]
@overload
async def _read_file(path: sc.AnnData, h5ad_as_df: Literal[True] = True, **kwargs) -> pd.DataFrame: ...
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement has no effect.

Copilot uses AI. Check for mistakes.


@overload
async def _read_file(path: Path, h5ad_as_df: Literal[False] = False, **kwargs) -> pd.DataFrame | sc.AnnData: ...
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement has no effect.

Copilot uses AI. Check for mistakes.

@overload
async def _read_file(path: sc.AnnData, h5ad_as_df: bool = True, **kwargs) -> pd.DataFrame: ...
async def _read_file(path: Path, h5ad_as_df: Literal[True] = True, **kwargs) -> pd.DataFrame: ...
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement has no effect.

Copilot uses AI. Check for mistakes.
Signed-off-by: Josh Loecker <joshloecker@icloud.com>
@JoshLoecker JoshLoecker merged commit a609a56 into develop Nov 3, 2025
3 checks passed
@JoshLoecker JoshLoecker deleted the fix-zfpkm branch November 3, 2025 17:02
JoshLoecker added a commit that referenced this pull request Nov 3, 2025
JoshLoecker added a commit that referenced this pull request Nov 3, 2025
JoshLoecker added a commit that referenced this pull request Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants