Commit 038ad65

Revise README for CopyKAT-Python project
Updated the README to describe CopyKAT-Python and its improvements over CopyKAT-R, and to add installation instructions.
1 parent 8195add commit 038ad65

1 file changed

Lines changed: 80 additions & 2 deletions

README.md

@@ -1,2 +1,80 @@
1-
# Copykat_python
2-
The Copykat re-write in Python
1+
# CopyKAT-Python
2+
3+
CopyKAT-Python is a Python implementation of the CopyKAT workflow for inferring large-scale copy number alterations from single-cell RNA-seq data. It is designed for Python-based single-cell analysis pipelines and aims to improve usability, scalability, and integration with modern `AnnData`/`Scanpy` workflows.
4+
5+
## Why CopyKAT-Python?
6+
7+
The original CopyKAT-R package is widely used for distinguishing aneuploid tumor cells from diploid normal cells using scRNA-seq data. However, users have reported practical limitations when applying CopyKAT-R to large modern datasets.
8+
9+
Open GitHub issues in the original CopyKAT repository highlight several recurring problems:
10+
11+
- Long runtime, including reports of >1 hour for ~8,000 cells.
12+
- Difficulty processing very large datasets, from hundreds of thousands to millions of cells.
13+
14+
CopyKAT-Python was developed to address these practical issues while preserving the main biological idea of CopyKAT: large-scale chromosomal expression patterns can be used to infer copy number profiles and separate malignant from non-malignant cells.
15+
16+
## What Is Improved?
17+
18+
Compared with CopyKAT-R, CopyKAT-Python focuses on:
19+
20+
- Native Python workflow support.
21+
- Easier integration with `AnnData`, `Scanpy`, and Python pipelines.
22+
- Improved handling of large datasets.
23+
- More transparent intermediate outputs.
24+
- Clearer confidence reporting for uncertain cells.
25+
- More flexible downstream use of CNV matrices and cell-level annotations.
26+
- Better reproducibility through Python package management and scripted workflows.
27+
28+
CopyKAT-Python is not intended to be a line-by-line clone of CopyKAT-R. It is a Python reimplementation designed to reproduce the core CopyKAT strategy while improving scalability and usability.
29+
30+
## Confidence of Results
31+
32+
CopyKAT-Python reports tumor/normal predictions together with confidence-related outputs. High-confidence results usually show clear chromosome-arm or whole-chromosome CNV patterns, consistent CNV profiles within clusters, and strong separation between inferred diploid and aneuploid cells.
33+
34+
Lower-confidence results may occur in samples with weak CNV signal, low sequencing depth, few normal reference cells, strong batch effects, or tumors with near-diploid genomes.
35+
36+
## Why Results May Differ from CopyKAT-R
37+
38+
CopyKAT-Python results may not be identical to CopyKAT-R because of differences in:
39+
40+
- Gene annotation versions.
41+
- Filtering and preprocessing.
42+
- Numerical implementation.
43+
- Smoothing and segmentation details.
44+
- Clustering behavior and random seeds.
45+
- Handling of uncertain or `not.defined` cells.
46+
47+
These differences are expected for an independent Python implementation.
48+
49+
## Figures and Tables to Add
50+
51+
### Figure 1. Workflow overview
52+
53+
Input expression matrix → gene genomic ordering → smoothing → CNV inference → clustering → tumor/normal prediction.
54+
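The stages listed above can be sketched on synthetic data. This is a toy illustration only, not CopyKAT-Python's implementation: the function names (`smooth_along_genome`, `two_group_split`, `toy_cnv_calls`), the moving-average window, and the two-means split are assumptions that stand in for the package's own smoothing, CNV inference, and clustering steps.

```python
import numpy as np


def smooth_along_genome(values, window=25):
    """Moving average over genes (columns), assumed to be in genomic order."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, values
    )


def two_group_split(scores, n_iter=20):
    """1-D two-means clustering; assumes at least two distinct scores."""
    low, high = scores.min(), scores.max()
    for _ in range(n_iter):
        cut = (low + high) / 2
        low, high = scores[scores <= cut].mean(), scores[scores > cut].mean()
    return (low + high) / 2


def toy_cnv_calls(counts, gene_order, normal_idx, window=25):
    """Hypothetical pipeline: normalize, order genes, center, smooth, split."""
    # Log-normalize each cell by its library size.
    lib_size = counts.sum(axis=1, keepdims=True)
    log_expr = np.log1p(counts / lib_size * 1e4)
    # Put genes (columns) into genomic order.
    ordered = log_expr[:, gene_order]
    # Center on known normal cells so a diploid profile sits near zero.
    relative = ordered - ordered[normal_idx].mean(axis=0)
    # Smooth along the genome to expose large-scale gains and losses.
    cnv = smooth_along_genome(relative, window)
    # Score each cell by how far its profile departs from flat, then split.
    scores = cnv.std(axis=1)
    labels = np.where(scores > two_group_split(scores), "aneuploid", "diploid")
    return cnv, labels


# Synthetic data (an assumption): 30 normal cells, 30 cells with one gain and one loss.
rng = np.random.default_rng(0)
n_cells, n_genes = 60, 400
mean_expr = np.full((n_cells, n_genes), 5.0)
mean_expr[30:, 100:200] *= 1.5   # simulated gain
mean_expr[30:, 250:330] *= 0.5   # simulated loss
position = rng.permutation(n_genes)            # genomic rank of each input column
counts = rng.poisson(mean_expr)[:, position]   # columns arrive in arbitrary order

cnv, labels = toy_cnv_calls(counts, np.argsort(position), normal_idx=np.arange(10))
print(dict(zip(*np.unique(labels, return_counts=True))))
```

The first ten cells serve as the normal reference here; with real data, the choice of reference cells is one of the factors that affects confidence.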
55+
### Figure 2. Example CNV heatmap
56+
57+
Show inferred CNV profiles across chromosomes with cells grouped by predicted tumor/normal status.
58+
59+
### Figure 3. CopyKAT-R vs CopyKAT-Python comparison
60+
61+
Show side-by-side CNV heatmaps or classification agreement on the same dataset.
62+
63+
### Table 1. Runtime and scalability benchmark
64+
65+
| Dataset | Cells | Genes | CopyKAT-R runtime | CopyKAT-Python runtime | Notes |
66+
|---|---:|---:|---:|---:|---|
67+
| TODO | TODO | TODO | TODO | TODO | TODO |
68+
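A small timing harness along the following lines could be used to fill in the runtime columns. It is a sketch under stated assumptions: `time_call` and `placeholder_workload` are hypothetical names, and the workload is a stand-in that would be replaced by the actual CopyKAT-R or CopyKAT-Python invocation being benchmarked.

```python
import statistics
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Run func several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def placeholder_workload(n_cells, n_genes):
    """Stand-in for a real CNV inference call (an assumption, not CopyKAT)."""
    return sum((i * j) % 7 for i in range(n_cells) for j in range(n_genes))


# Time the placeholder at a few dataset sizes and print Markdown table rows.
rows = []
for n_cells in (50, 100, 200):
    seconds = time_call(placeholder_workload, n_cells, 200)
    rows.append((n_cells, 200, seconds))

print("| Cells | Genes | Runtime (s) |")
print("|---:|---:|---:|")
for n_cells, n_genes, seconds in rows:
    print(f"| {n_cells} | {n_genes} | {seconds:.4f} |")
```

Reporting the median of several repeats reduces the influence of one-off slowdowns such as cold caches.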
69+
### Table 2. Classification concordance
70+
71+
| Dataset | Tumor/normal agreement | Aneuploid agreement | Diploid agreement | Uncertain cells | Notes |
72+
|---|---:|---:|---:|---:|---|
73+
| TODO | TODO | TODO | TODO | TODO | TODO |
74+
75+
## Installation
76+
77+
```bash
git clone https://github.com/NavinLab/copykat-python.git
cd copykat-python
pip install -e .
```
