Add synthetic-EHR generative evaluation metrics by chufangao · Pull Request #1148 · sunlabuiuc/PyHealth

chufangao · 2026-05-18T05:29:33Z

Summary

Adds pyhealth/metrics/generative/, a subpackage for evaluating synthetic
EHR data along three axes — privacy, utility, and statistical
fidelity.

- privacy.py: calc_nnaar (Nearest Neighbor Adversarial Accuracy Risk), calc_membership_inference (membership inference attack), and compute_discriminator_privacy (real-vs-synthetic discriminator score).
- utility.py: compute_mle (machine learning efficacy, TRTR vs TSTR) and compute_prevalence_metrics (code-prevalence similarity: R², Pearson, RMSE).
- utils.py: shared data prep, a self-contained LSTM classifier, and a random-forest baseline.
- evaluate_synthetic_ehr(): convenience orchestrator that runs the full suite and returns one merged {metric: (mean, std)} dict.

The metrics are ported from a standalone evaluation script. The
MIMIC-specific data-loading/CLI glue is dropped so the functions work on any
flat EHR dataframe (one row per patient/visit/code event). Public functions
are re-exported from pyhealth.metrics.
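
To make the `{metric: (mean, std)}` return shape concrete, here is a minimal sketch of how scores from repeated runs might be collapsed into such a dict. The helper name `merge_metric_runs`, the metric keys, and the numbers are hypothetical and are not taken from this PR; the aggregation inside evaluate_synthetic_ehr() may differ.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def merge_metric_runs(
    runs: List[Dict[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """Collapse repeated runs into {metric: (mean, std)}.

    Hypothetical helper for illustration only; not part of PyHealth.
    """
    merged: Dict[str, Tuple[float, float]] = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        # Population std (ddof=0), so a single run yields std 0.0.
        merged[name] = (mean(values), pstdev(values))
    return merged


# Illustrative numbers, not results from the PR.
runs = [
    {"prevalence_rmse": 0.02, "mia_accuracy": 0.50},
    {"prevalence_rmse": 0.04, "mia_accuracy": 0.54},
]
summary = merge_metric_runs(runs)
print(summary)
```

A flat dict keyed by metric name keeps the result easy to log or turn into a one-row table, at the cost of hiding per-run values.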

Cleanups applied during the port

- Uses logging instead of bare print calls.
- Fixed a latent CUDA crash in the LSTM eval loop by moving tensors to the CPU before NumPy conversion (.cpu().numpy()).
- Replaced scipy.stats.pearsonr with numpy.corrcoef to avoid an undeclared scipy dependency.
- Input dataframes are copied instead of mutated in place.
- Google-style docstrings, type hints, PEP8 (≤88 chars).
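
As a rough illustration of the scipy-free Pearson computation, the sketch below derives RMSE, Pearson, and R² from per-code prevalence vectors using only NumPy. The function name `prevalence_similarity`, the binary patient-by-code input layout, and the R² convention (synthetic prevalence treated as the prediction of real prevalence) are assumptions for this example; compute_prevalence_metrics in the PR may be defined differently.

```python
import numpy as np


def prevalence_similarity(real: np.ndarray, synth: np.ndarray) -> dict:
    """Compare per-code prevalence between real and synthetic data.

    Both inputs are binary patient-by-code matrices (1 = code present).
    Illustrative sketch only; not the PR's compute_prevalence_metrics.
    """
    p_real = real.mean(axis=0)    # fraction of real patients with each code
    p_synth = synth.mean(axis=0)  # same for synthetic patients

    rmse = float(np.sqrt(np.mean((p_real - p_synth) ** 2)))
    # Pearson r is the off-diagonal entry of the 2x2 correlation matrix.
    pearson = float(np.corrcoef(p_real, p_synth)[0, 1])
    # R^2 treating synthetic prevalence as the prediction of real prevalence.
    ss_res = float(np.sum((p_real - p_synth) ** 2))
    ss_tot = float(np.sum((p_real - p_real.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "pearson": pearson, "r2": r2}


rng = np.random.default_rng(0)
# Each of 8 codes gets its own base rate so prevalences are not constant.
rates = np.linspace(0.1, 0.8, 8)
real = (rng.random((200, 8)) < rates).astype(int)
exact_copy = real.copy()
result = prevalence_similarity(real, exact_copy)
print(result)
```

For an exact copy this gives an RMSE of 0 with Pearson and R² at 1, matching the behavior the tests describe. If every code had the same prevalence, the correlation and R² would be undefined, so a real implementation would need to guard that case.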

Tests

tests/core/test_generative_metrics.py — 18 unittest cases, all passing:

- 13 functional tests covering each metric and the orchestrator (lstm + rf modes, argument validation).
- 5 behavioral tests (TestMetricsBehavior) that verify each metric responds sensibly across three synthetic datasets (an exact copy of the training data, a similar set with ~15% of codes perturbed, and a different set with a disjoint code vocabulary):

| Metric | Verified behavior (exact copy → similar → different) |
| --- | --- |
| Prevalence | RMSE `0 → 0.03 → 0.26`; exact copy → RMSE 0, R²/Pearson = 1 |
| NNAAR | Flags memorization: `1.0 → 0.1 → 0.0` |
| Membership inference | Attack accuracy `1.0 → 0.94 → 0.46` (chance for unrelated data) |
| Discriminator privacy | Disjoint-vocabulary data trivially flagged; real-derived data is not |
| MLE (utility) | Exact copy reproduces real utility exactly; ratio degrades `1.0 → 0.98 → 0.81` |
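
The membership-inference trend in the table can be illustrated with a common distance-threshold attack. This is a sketch under stated assumptions, not the PR's calc_membership_inference: the function name `membership_attack_accuracy`, the Hamming distance on binary feature rows, and the median threshold are all choices made for this example.

```python
import numpy as np


def membership_attack_accuracy(train, holdout, synth):
    """Distance-threshold membership inference on binary feature rows.

    A record is guessed to be a training member when its nearest synthetic
    neighbor (Hamming distance) is closer than the median over all attacked
    records. Illustrative sketch; not the PR's implementation.
    """
    attacked = np.vstack([train, holdout])
    is_member = np.concatenate(
        [np.ones(len(train), dtype=bool), np.zeros(len(holdout), dtype=bool)]
    )
    # Pairwise Hamming distances, shape (n_attacked, n_synth).
    dists = (attacked[:, None, :] != synth[None, :, :]).sum(axis=2)
    nearest = dists.min(axis=1)
    guess = nearest < np.median(nearest)
    return float((guess == is_member).mean())


rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(50, 32))
holdout = rng.integers(0, 2, size=(50, 32))
unrelated = rng.integers(0, 2, size=(50, 32))

# Synthetic data that memorizes the training set is fully exposed.
acc_copy = membership_attack_accuracy(train, holdout, train.copy())
# Synthetic data unrelated to the training set gives the attacker no signal.
acc_unrelated = membership_attack_accuracy(train, holdout, unrelated)
print(acc_copy, acc_unrelated)
```

With an exact copy, every training record sits at distance 0 from the synthetic set, so the attack separates members from non-members perfectly. With unrelated synthetic data the distances carry no membership signal and accuracy should hover around chance.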

Docs

Added docs/api/metrics/pyhealth.metrics.generative.rst and a toctree entry
in docs/api/metrics.rst.

Notes

The discriminator-privacy score is degenerate for exact copies (the model
predicts a constant on identical features, so the score reflects test-split
balance rather than 0.5). The behavioral test asserts the robust direction —
disjoint synthetic data is cleanly flagged while real-derived data is not.
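
The degenerate case comes down to simple arithmetic: a model that outputs the same class for every row scores exactly the share of that class in the test split. The sketch below uses a hypothetical 60/40 split and a hypothetical helper name, `constant_prediction_accuracy`; neither comes from the PR.

```python
def constant_prediction_accuracy(test_labels, predicted_label):
    """Accuracy of a model that outputs `predicted_label` for every row."""
    hits = sum(1 for y in test_labels if y == predicted_label)
    return hits / len(test_labels)


# Hypothetical test split: 1 = real, 0 = synthetic.
test_labels = [1] * 60 + [0] * 40
acc_always_real = constant_prediction_accuracy(test_labels, 1)
acc_always_synth = constant_prediction_accuracy(test_labels, 0)
print(acc_always_real, acc_always_synth)  # 0.6 0.4
```

Neither value is 0.5 unless the split happens to be perfectly balanced, which is why the behavioral test checks only the direction of the score.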

Adds pyhealth/metrics/generative/, a subpackage for evaluating synthetic EHR data along privacy, utility, and statistical-fidelity axes:

- privacy.py: NNAAR, membership inference attack, discriminator privacy
- utility.py: machine learning efficacy (TRTR vs TSTR), code-prevalence similarity (R2, Pearson, RMSE)
- utils.py: shared data prep, an LSTM classifier, and a random-forest baseline
- evaluate_synthetic_ehr(): convenience orchestrator for the full suite

These functions are ported from a standalone evaluation script. The MIMIC-specific data-loading/CLI glue is dropped; the metrics work on any flat EHR dataframe. Public functions are re-exported from pyhealth.metrics.

Adds unit tests in tests/core/test_generative_metrics.py and Sphinx docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chufangao and others added 2 commits May 17, 2026 23:46

chufangao wants to merge 2 commits into sunlabuiuc:master from chufangao:chil26_evals2
