omicML: a bioinformatics and machine learning based graphical user interface for transcriptomic biomarker identification

An intuitive graphical user interface that combines transcriptomic data analysis with ML-based classification via integrating R and Python packages/libraries. It supports both RNA-Seq and microarray data following automatic preprocessing and differential expression analysis. It annotates differentially expressed genes with descriptions, gene ontology, and pathway information and incorporates comparative analysis. The extensive ML pipeline sequentially enables both unsupervised and supervised learning, benchmarks multiple ML classifiers, assesses feature importance, develops single-gene and multi-gene predictive models, and systematically finalizes the biomarker algorithm.

Production instance: https://backend.omicml.org
Frontend: https://omicml.org
Repository: https://github.com/definecoder/omicML-server

Preprint

bioRxiv

omicML: An Integrative Bioinformatics and Machine Learning Framework for Transcriptomic Biomarker Identification

Joy Prokash Debnath, Kabir Hossen, Md. Sayeam Khandaker, Shawon Majid, Md Mehrajul Islam, Siam Arefin, Preonath Chondrow Dev, Saifuddin Sarker, Tanvir Hossain

bioRxiv, Cold Spring Harbor Laboratory (2025)

DOI: 10.1101/2025.10.25.684517

View PDF

Features

Module	Capabilities
RNA-seq Analysis	Upload count + metadata, log2 normalization, quantile normalization, DESeq2 DGE, volcano plots
Microarray Analysis	Limma-based DGE, before/after normalization QC
Quality Control	PCA, t-SNE, UMAP, hierarchical clustering, K-means, boxplots
Batch Effect Correction	SVA (Surrogate Variable Analysis)
Machine Learning	Feature selection, model benchmarking (XGBoost, Random Forest)
Gene Annotation	BiomaRt-based ID mapping across multiple organisms and ID types
Venn Diagrams	Multi-set gene list comparison
Heatmaps	Publication-ready annotated heatmaps
STRING Networks	Protein–protein interaction queries and enrichment analysis

Architecture

omicML-server/
├── api/
│   ├── main.py                  # FastAPI application entry point
│   ├── requirements.txt         # Python dependencies
│   ├── database.py              # SQLite database (SQLAlchemy)
│   ├── routers/                 # API route handlers
│   │   ├── auth_router.py       # User registration & JWT login
│   │   ├── operation_router.py  # RNA-seq analysis pipeline
│   │   ├── micro_router.py      # Microarray pipeline
│   │   ├── annotation_router.py # Gene annotation (BiomaRt)
│   │   ├── venn_router.py       # Venn diagram generation
│   │   ├── heatmap_router.py    # Heatmap generation
│   │   └── file_router.py       # File serving
│   ├── code/                    # R analysis scripts
│   ├── string/                  # STRING database scripts
│   ├── models/                  # Pydantic schemas & ORM models
│   └── core/                    # Security (JWT), constants
├── Dockerfile
└── README.md

The server bridges Python (FastAPI) and R (rpy2) to execute statistical analyses. Each user's data is stored in isolated per-user directories under api/code/{user_id}/.

System Requirements

Minimum

CPU: 2 cores
RAM: 8 GB (16 GB recommended for large datasets)
Disk: 10 GB free (Docker image is ~6 GB due to R packages)

Software

Dependency	Version	Notes
Python	3.11.x	Required for local development
R	4.4+	Required for local development
Docker	20.10+	Required for Docker deployment

Note: The Docker image bundles Python 3.11, R 4.5, and all required packages. Local development requires you to install Python and R separately.

Installation

Option 1 — Docker (Recommended)

Docker is the recommended installation method. It bundles all Python and R dependencies, eliminating manual environment setup.

Prerequisites: Docker installed and running.

# 1. Clone the repository
git clone https://github.com/definecoder/omicML-server.git
cd omicML-server

# 2. Build the Docker image
#    (installs all Python and R packages — takes ~15–30 minutes on first build)
docker build -t omicml-server .

# 3. Run the container
docker run --name omicml-server -p 8000:8000 -d omicml-server:latest

# 4. Verify the server is running
curl http://localhost:8000/
# Expected: {"message": "PLAGL1 Server is running..."}

The server will be available at http://localhost:8000.

To stop the container:

docker stop omicml-server

To rebuild after code changes:

docker rm -f omicml-server && docker build -t omicml-server . && \
docker run --name omicml-server -p 8000:8000 -d omicml-server:latest

Option 2 — Local Development

Step 1 — Install Python 3.11

Download from python.org or via your system package manager.

# Ubuntu/Debian
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update && sudo apt-get install -y python3.11 python3.11-venv python3.11-dev

Step 2 — Install R 4.4+

# Ubuntu/Debian (CRAN repository)
wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | \
  sudo gpg --dearmor -o /usr/share/keyrings/cran.gpg
echo "deb [signed-by=/usr/share/keyrings/cran.gpg] https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/" | \
  sudo tee /etc/apt/sources.list.d/cran.list
sudo apt-get update && sudo apt-get install -y r-base

# macOS (via Homebrew)
brew install r

# Windows: download installer from https://cran.r-project.org/bin/windows/base/

Step 3 — Install system libraries (Ubuntu/Debian only)

sudo apt-get install -y \
  libssl-dev libcurl4-openssl-dev libxml2-dev libgit2-dev libglpk-dev \
  libgmp-dev libgsl-dev libfreetype6-dev libpng-dev libtiff5-dev \
  libjpeg-dev libcairo2-dev libharfbuzz-dev libfribidi-dev cmake

Step 4 — Install R packages

Open an R console and run:

# Install BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Set Bioconductor version (ensures compatibility across all packages)
BiocManager::install(version = "3.21", ask = FALSE, update = TRUE)

# Install Bioconductor packages
BiocManager::install(
  c("WGCNA", "DESeq2", "limma", "biomaRt", "sva",
    "STRINGdb", "apeglm", "impute"),
  ask = FALSE, update = FALSE
)

# Install CRAN packages
BiocManager::install(
  c("tidyverse", "Rtsne", "umap", "ggplot2", "readr", "ape",
    "mice", "dplyr", "gplots", "ggVennDiagram", "pheatmap",
    "RColorBrewer", "stringr", "here", "lme4"),
  ask = FALSE
)

Expected versions (Bioconductor 3.21): DESeq2 ~1.44, limma ~3.60, biomaRt ~2.60, sva ~3.52, WGCNA ~1.73, STRINGdb ~2.16.

Step 5 — Set up the Python virtual environment

# From the repository root
python3.11 -m venv venv

# Activate
source venv/bin/activate      # Linux/macOS
# venv\Scripts\activate       # Windows

# Install Python dependencies
cd api
pip install --upgrade pip
pip install -r requirements.txt

Python dependencies summary:

Package	Version	Purpose
fastapi	0.112.0	Web framework
uvicorn	0.30.5	ASGI server
rpy2	3.5.16	Python–R bridge
pandas	2.2.2	Data manipulation
numpy	1.24.3	Numerical computing
scikit-learn	1.6.1	Machine learning utilities
xgboost	2.1.4	Gradient boosting
umap-learn	0.5.7	UMAP dimensionality reduction
matplotlib	3.7.1	Plotting
seaborn	0.13.2	Statistical visualization
SQLAlchemy	2.0.32	ORM / SQLite
python-jose	3.3.0	JWT authentication
passlib	1.7.4	Password hashing
pydantic	2.8.2	Data validation

Full pinned list: api/requirements.txt

Step 6 — Run the development server

# From the api/ directory (with venv activated)
fastapi dev main.py

The server starts at http://localhost:8000.

Configuration

The following settings can be modified in api/core/consts.py:

Variable	Default	Description
`BASE_URL`	`http://localhost:8000`	Base URL used in file-link responses. Set to your production domain.

The JWT secret key is defined in api/core/security.py. For production deployments, replace the default key with a strong random secret and manage it via an environment variable or secrets manager.

API Overview

All protected endpoints require a Bearer JWT token in the Authorization header.

Method	Endpoint	Auth	Description
`POST`	`/auth/register`	—	Register a new user
`POST`	`/auth/login`	—	Login and obtain a JWT token
`POST`	`/operations/init`	✓	Upload count data and metadata
`GET`	`/operations/analyze`	✓	Run normalization and QC visualizations
`GET`	`/operations/dge-analysis`	✓	Differential gene expression + volcano plots
`GET`	`/operations/visualize-dimensions`	✓	PCA / t-SNE / UMAP
`GET`	`/operations/find-best-model`	✓	Benchmark ML models
`POST`	`/operations/annotation/annotate_genes`	✓	Annotate genes via BiomaRt
`POST`	`/operations/venn/venn_diagram`	✓	Generate Venn diagram
`GET`	`/operations/heatmap/heatmap_diagram`	✓	Generate heatmap
`GET`	`/figures/{user_id}/{file}`	—	Retrieve a generated figure (PNG/PDF)
`GET`	`/files/{user_id}/{file}`	—	Retrieve a generated data file (CSV)

Full interactive documentation is available at http://localhost:8000/docs.

Test Dataset & Quickstart

Download Test Data

Two CSV files from the Mpox / PLAGL1 study are provided as test data:

File	Description	Download
`count_data.csv`	RNA-seq raw gene count matrix (genes × samples)	Download from Google Drive
`metadata.csv`	Sample metadata with condition labels	Download from Google Drive

Input Format

count_data.csv — Gene count matrix:

Row names (first column): gene identifiers (e.g., Ensembl IDs or gene symbols)
Column names (first row): sample IDs (must match the row names of metadata.csv)
Values: non-negative integers (raw counts)

,Sample1,Sample2,Sample3,Sample4
ENSG00000000003,1200,985,1102,1340
ENSG00000000005,450,520,490,610
...

metadata.csv — Sample annotation:

Row names (first column): sample IDs (must match column names of count_data.csv)
Columns: sample attributes; the condition column defines groups for DGE analysis

,condition
Sample1,control
Sample2,control
Sample3,infected
Sample4,infected
...

Step-by-Step Quickstart (cURL)

Replace http://localhost:8000 with https://backend.omicml.org to test against the production server.

1. Register a user account

curl -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "yourpassword", "name": "Your Name"}'

Response:

{
  "message": "User created successfully",
  "token": "<JWT_TOKEN>"
}

2. Login (if account already exists)

curl -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "password": "yourpassword"}'

Response:

{
  "message": "User logged in successfully",
  "token": "<JWT_TOKEN>"
}

Save the token for subsequent requests:

TOKEN="<JWT_TOKEN>"

3. Upload count data and metadata

curl -X POST http://localhost:8000/operations/init \
  -H "Authorization: Bearer $TOKEN" \
  -F "count_data=@count_data.csv" \
  -F "meta_data=@metadata.csv"

Response:

{
  "message": "file uploaded & Processed successfully!",
  "count_data": "/path/to/count_data.csv",
  "meta_data": "/path/to/meta_data.csv"
}

4. Run normalization and quality control analysis

curl -X GET http://localhost:8000/operations/analyze \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "message": "Analysis completed successfully!",
  "results": {
    "boxplot_denorm_img": "http://localhost:8000/figures/{user_id}/Boxplot_denorm.png",
    "boxplot_norm_img":   "http://localhost:8000/figures/{user_id}/Boxplot_norm.png",
    "pca_denorm_img":     "http://localhost:8000/figures/{user_id}/PCA_denorm.png",
    "pca_norm_img":       "http://localhost:8000/figures/{user_id}/PCA_norm.png",
    "tsne_norm_img":      "http://localhost:8000/figures/{user_id}/tSNE_norm.png",
    "umap_norm_img":      "http://localhost:8000/figures/{user_id}/UMAP_norm.png",
    "htree_norm_img":     "http://localhost:8000/figures/{user_id}/htree_norm.png"
  }
}

Each URL in the response points to a downloadable PNG or PDF figure.

5. Run differential gene expression analysis

curl -X GET "http://localhost:8000/operations/dge-analysis" \
  -H "Authorization: Bearer $TOKEN"

Runs DESeq2 and returns URLs to volcano plot figures showing significantly up- and down-regulated genes per condition contrast.

6. Retrieve a generated figure

# Replace {user_id} with your numeric user ID from the analysis response
curl -O http://localhost:8000/figures/{user_id}/PCA_norm.png

Expected Output Summary

Endpoint	Output files	Description
`GET /operations/analyze`	`Boxplot_denorm.png/pdf`	Sample expression distributions before normalization
`GET /operations/analyze`	`Boxplot_norm.png/pdf`	Sample expression distributions after normalization
`GET /operations/analyze`	`PCA_denorm.png/pdf`	PCA before normalization
`GET /operations/analyze`	`PCA_norm.png/pdf`	PCA after normalization
`GET /operations/analyze`	`tSNE_norm.png/pdf`	t-SNE dimensionality reduction
`GET /operations/analyze`	`UMAP_norm.png/pdf`	UMAP dimensionality reduction
`GET /operations/analyze`	`htree_norm.png/pdf`	Hierarchical clustering dendrogram
`GET /operations/dge-analysis`	`volcano_*.png/pdf`	Volcano plots per condition comparison

All figures are generated in both PNG (for web display) and PDF (for publication) formats.

Interactive API Docs

Once the server is running, the full interactive Swagger UI is available at:

http://localhost:8000/docs

This lets you explore and test all endpoints directly from your browser with request/response schemas, without needing cURL or Postman.

A Postman collection is also included in the repository. Import it via Postman > Import to run all endpoints with pre-configured requests.

Citation

If you use omicML in your research, please cite:

Debnath, J.P., Hossen, K., Khandaker, M.S., Majid, S., Islam, M.M., Arefin, S., Chondrow Dev, P., Sarker, S. and Hossain, T., 2025. omicML: An Integrative Bioinformatics and Machine Learning Framework for Transcriptomic Biomarker Identification. bioRxiv, pp.2025-10.

License

License terms to be determined. Please contact the authors before use in derivative works.

Name		Name	Last commit message	Last commit date
Latest commit History 97 Commits
api		api
.DS_Store		.DS_Store
.gitignore		.gitignore
.pylintrc		.pylintrc
.tool-versions		.tool-versions
Dockerfile		Dockerfile
R_PACKAGE_MANAGEMENT.md		R_PACKAGE_MANAGEMENT.md
Readme.md		Readme.md
deploy.sh		deploy.sh
postman_collection.json		postman_collection.json

definecoder/omicML-server

Folders and files

Latest commit

History

Repository files navigation

omicML: a bioinformatics and machine learning based graphical user interface for transcriptomic biomarker identification

Preprint

Table of Contents

Features

Architecture

System Requirements

Minimum

Software

Installation

Option 1 — Docker (Recommended)

Option 2 — Local Development

Step 1 — Install Python 3.11

Step 2 — Install R 4.4+

Step 3 — Install system libraries (Ubuntu/Debian only)

Step 4 — Install R packages

Step 5 — Set up the Python virtual environment

Step 6 — Run the development server

Configuration

API Overview

Test Dataset & Quickstart

Download Test Data

Input Format

Step-by-Step Quickstart (cURL)

1. Register a user account

2. Login (if account already exists)

3. Upload count data and metadata

4. Run normalization and quality control analysis

5. Run differential gene expression analysis

6. Retrieve a generated figure

Expected Output Summary

Interactive API Docs

Citation

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages