1 | | -# Copykat_python |
2 | | -The Copykat re-write in Python |
| 1 | +# CopyKAT-Python |
| 2 | + |
| 3 | +CopyKAT-Python is a Python implementation of the CopyKAT workflow for inferring large-scale copy number alterations from single-cell RNA-seq data. It is designed for Python-based single-cell analysis pipelines and aims to improve usability, scalability, and integration with modern `AnnData`/`Scanpy` workflows. |
| 4 | + |
| 5 | +## Why CopyKAT-Python? |
| 6 | + |
| 7 | +The original CopyKAT-R package is widely used for distinguishing aneuploid tumor cells from diploid normal cells using scRNA-seq data. However, users have reported practical limitations when applying CopyKAT-R to large modern datasets. |
| 8 | + |
| 9 | +Open GitHub issues in the original CopyKAT repository describe two recurring problems:
| 10 | +
| 11 | +- Long runtimes, including reports of more than 1 hour for roughly 8,000 cells.
| 12 | +- Difficulty processing very large datasets, from hundreds of thousands to millions of cells.
| 13 | + |
| 14 | +CopyKAT-Python was developed to address these practical issues while preserving the main biological idea of CopyKAT: large-scale chromosomal expression patterns can be used to infer copy number profiles and separate malignant from non-malignant cells. |
| 15 | + |
| 16 | +## What Is Improved? |
| 17 | + |
| 18 | +Compared with CopyKAT-R, CopyKAT-Python focuses on: |
| 19 | + |
| 20 | +- Native Python workflow support. |
| 21 | +- Easier integration with `AnnData`, `Scanpy`, and Python pipelines. |
| 22 | +- Improved handling of large datasets. |
| 23 | +- More transparent intermediate outputs. |
| 24 | +- Clearer confidence reporting for uncertain cells. |
| 25 | +- More flexible downstream use of CNV matrices and cell-level annotations. |
| 26 | +- Better reproducibility through Python package management and scripted workflows. |
| 27 | + |
| 28 | +CopyKAT-Python is not intended to be a line-by-line clone of CopyKAT-R. It is a Python reimplementation designed to reproduce the core CopyKAT strategy while improving scalability and usability. |
| 29 | + |
| 30 | +## Confidence of Results |
| 31 | + |
| 32 | +CopyKAT-Python reports tumor/normal predictions together with confidence-related outputs. High-confidence results usually show clear chromosome-arm or whole-chromosome CNV patterns, consistent CNV profiles within clusters, and strong separation between inferred diploid and aneuploid cells. |
| 33 | + |
| 34 | +Lower-confidence results may occur when a sample has weak CNV signal, low sequencing depth, few normal reference cells, or strong batch effects, or when the tumor genome is near-diploid.
| 35 | + |
| 36 | +## Why Results May Differ from CopyKAT-R |
| 37 | + |
| 38 | +CopyKAT-Python results may not be identical to CopyKAT-R because of differences in: |
| 39 | + |
| 40 | +- Gene annotation versions. |
| 41 | +- Filtering and preprocessing. |
| 42 | +- Numerical implementation. |
| 43 | +- Smoothing and segmentation details. |
| 44 | +- Clustering behavior and random seeds. |
| 45 | +- Handling of uncertain or `not.defined` cells. |
| 46 | + |
| 47 | +These differences are expected for an independent Python implementation. |
| 48 | + |
| 49 | +## Figures and Tables to Add |
| 50 | + |
| 51 | +### Figure 1. Workflow overview |
| 52 | + |
| 53 | +Input expression matrix → genes ordered by genomic position → smoothing → CNV inference → clustering → tumor/normal prediction.
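
The workflow steps listed above can be illustrated with a toy NumPy sketch. This is not the CopyKAT-Python implementation: the function name `toy_cnv_pipeline`, the moving-average smoother, the population-mean baseline, and the midpoint threshold are all assumptions chosen for brevity. They stand in for the package's actual smoothing, baseline estimation, and clustering steps.

```python
import numpy as np


def toy_cnv_pipeline(expr, gene_order, window=5):
    """Toy illustration of the workflow; hypothetical, not the package API.

    expr       : cells x genes array of log-normalized expression
    gene_order : column indices that sort genes by genomic position
    window     : number of neighboring genes averaged during smoothing
    """
    # 1. Order genes by genomic position.
    ordered = expr[:, gene_order]

    # 2. Express each gene relative to its mean across all cells.
    #    (Assumption: a population mean stands in for a diploid baseline.)
    relative = ordered - ordered.mean(axis=0, keepdims=True)

    # 3. Smooth along the genome with a simple moving average.
    kernel = np.ones(window) / window
    smoothed = np.array(
        [np.convolve(row, kernel, mode="same") for row in relative]
    )

    # 4. Summarize each cell's CNV signal as the variance of its profile.
    score = smoothed.var(axis=1)

    # 5. Split cells at the midpoint of the score range.
    #    (Assumption: a crude stand-in for clustering.)
    threshold = (score.min() + score.max()) / 2
    labels = np.where(score > threshold, "aneuploid", "diploid")
    return smoothed, labels
```

On real data, a flat profile (low variance) suggests a diploid cell, while broad shifts across neighboring genes suggest a gain or loss. The midpoint threshold only makes sense when both groups are present, which is one reason a real implementation needs a more careful baseline and clustering step.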
| 54 | + |
| 55 | +### Figure 2. Example CNV heatmap |
| 56 | + |
| 57 | +Show inferred CNV profiles across chromosomes with cells grouped by predicted tumor/normal status. |
| 58 | + |
| 59 | +### Figure 3. CopyKAT-R vs CopyKAT-Python comparison |
| 60 | + |
| 61 | +Show side-by-side CNV heatmaps or classification agreement on the same dataset. |
| 62 | + |
| 63 | +### Table 1. Runtime and scalability benchmark |
| 64 | + |
| 65 | +| Dataset | Cells | Genes | CopyKAT-R runtime | CopyKAT-Python runtime | Notes | |
| 66 | +|---|---:|---:|---:|---:|---| |
| 67 | +| TODO | TODO | TODO | TODO | TODO | TODO | |
| 68 | + |
| 69 | +### Table 2. Classification concordance |
| 70 | + |
| 71 | +| Dataset | Tumor/normal agreement | Aneuploid agreement | Diploid agreement | Uncertain cells | Notes | |
| 72 | +|---|---:|---:|---:|---:|---| |
| 73 | +| TODO | TODO | TODO | TODO | TODO | TODO | |
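
One possible way to fill in the concordance columns above is sketched below. The definitions are assumptions, not settled conventions of this project: agreement is computed only over cells that both tools classify, per-class agreement is conditioned on the CopyKAT-R call, and a cell counts as uncertain if either tool labels it `not.defined`. The function name `concordance` is hypothetical.

```python
def concordance(r_labels, py_labels, uncertain="not.defined"):
    """Compare per-cell labels from two tools (hypothetical helper)."""
    if len(r_labels) != len(py_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(r_labels, py_labels))
    # Keep only cells that both tools classified.
    defined = [(r, p) for r, p in pairs if r != uncertain and p != uncertain]

    def agreement(subset):
        # Fraction of cells with identical labels; NaN for an empty subset.
        if not subset:
            return float("nan")
        return sum(r == p for r, p in subset) / len(subset)

    return {
        "tumor_normal_agreement": agreement(defined),
        "aneuploid_agreement": agreement(
            [(r, p) for r, p in defined if r == "aneuploid"]
        ),
        "diploid_agreement": agreement(
            [(r, p) for r, p in defined if r == "diploid"]
        ),
        "uncertain_cells": len(pairs) - len(defined),
    }
```

Both label lists must be in the same cell order, so align them on cell barcodes before calling the function.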
| 74 | + |
| 75 | +## Installation |
| 76 | + |
| 77 | +```bash |
| 78 | +git clone https://github.com/NavinLab/copykat-python.git |
| 79 | +cd copykat-python |
| 80 | +pip install -e .
| 81 | +```