GEPBind is a hybrid antibody-antigen affinity predictor for $\Delta G$ regression that combines:
- sequence representations from protein language models (winner uses ESM2), and
- graph-based structural encoding (winner uses GINE + Performer).

Included in this repository:

- Core training/evaluation code:
  - train_hybrid.py
  - esm2_embedder.py
  - src/graphgps/mamba/
- Reproducibility data assets (75% clustered split):
  - datasets/pairs_sabdab_clean_clustered75_noprune.csv
  - datasets/seq_natural.fasta
  - datasets/ABAG-DG/sabdab_clean/processed/data.pt
- Winner checkpoint and metadata:
  - checkpoints/win75_hybrid_s6/checkpoint_best.pt
  - checkpoints/win75_hybrid_s6/test_metrics.json
  - checkpoints/win75_hybrid_s6/hparams.json
- Curated results:
  - results/winner/
  - results/ablation_core/
  - results/ablation_graph_encoder/
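
To inspect the winner's recorded settings and scores without loading the checkpoint itself, the two JSON files can be read with the standard library. This is a minimal sketch: `load_run_metadata` is a hypothetical helper (not part of the repository), and it makes no assumption about which keys the files contain.

```python
import json
from pathlib import Path


def load_run_metadata(run_dir):
    """Load hparams.json and test_metrics.json from a checkpoint directory.

    Hypothetical helper for illustration. A file that is missing maps to None.
    """
    run_dir = Path(run_dir)
    metadata = {}
    for name in ("hparams", "test_metrics"):
        path = run_dir / f"{name}.json"
        metadata[name] = json.loads(path.read_text()) if path.is_file() else None
    return metadata


if __name__ == "__main__":
    # Path taken from the file list above; run from the repository root.
    meta = load_run_metadata("checkpoints/win75_hybrid_s6")
    for name, content in meta.items():
        print(name, content)
```
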

Install dependencies:

    pip install -r requirements.txt
    pip install -e mamba

Optional (only for antibody-specific PLM ablations):

    pip install -r requirements-optional.txt

Evaluate the winning checkpoint, retrain it, or rerun the ablations:

    bash scripts/eval_winning_checkpoint.sh
    bash scripts/run_winning_holdout_train.sh
    bash scripts/run_seq_graph_ablation_holdout.sh
    bash scripts/run_graph_encoder_ablation_holdout.sh

See configs/winning_holdout_s6.txt for the exact setting used in the reported winner.

- Winner holdout: RMSE 1.5693, Pearson 0.5752
- Seq-only holdout: RMSE 1.7586, Pearson 0.4423
- Graph-only holdout: RMSE 1.7445, Pearson 0.3707
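
For readers who want to recompute the two reported metrics from their own predictions, the standard definitions of RMSE and Pearson correlation can be sketched as follows. This is a plain-Python illustration, not the repository's evaluation code; the function names and the example values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    y_true = [-10.2, -8.7, -12.1, -9.5]
    y_pred = [-9.8, -9.1, -11.0, -10.3]
    print(f"RMSE {rmse(y_true, y_pred):.4f}, Pearson {pearson(y_true, y_pred):.4f}")
```
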
Grouped 10-fold CV:
- Hybrid: RMSE 1.7259 +/- 0.3176, Pearson 0.3387 +/- 0.2116
- Seq-only: RMSE 1.7475 +/- 0.3224, Pearson 0.3255 +/- 0.2660
- Graph-only: RMSE 1.8008 +/- 0.2932, Pearson 0.2943 +/- 0.1275
Full summary: results/RESULTS_SUMMARY.md
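
As a rough illustration of what a grouped k-fold protocol involves, the sketch below assigns whole groups to folds so that no group appears in both the training and the test side of a split, then reports a mean and standard deviation across folds. It rests on assumptions: that groups correspond to cluster IDs, that folds are balanced greedily, and that the spread is a sample standard deviation. None of these details are confirmed by the results above, and the function names are hypothetical.

```python
import statistics
from collections import defaultdict


def grouped_kfold(groups, n_folds):
    """Yield (train_idx, test_idx) pairs with every group confined to one fold."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if len(members) < n_folds:
        raise ValueError("need at least as many groups as folds")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[group])

    for f in range(n_folds):
        test_idx = sorted(folds[f])
        train_idx = sorted(
            i for other in range(n_folds) if other != f for i in folds[other]
        )
        yield train_idx, test_idx


def summarize(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Made-up cluster labels for eight samples.
    clusters = ["a", "a", "b", "c", "c", "c", "d", "e"]
    for train_idx, test_idx in grouped_kfold(clusters, n_folds=3):
        print("train", train_idx, "test", test_idx)
```

Keeping each group inside a single fold prevents near-duplicate complexes from leaking between training and test data, which is one plausible reason the cross-validation scores are lower and more variable than the single holdout numbers.
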