Add structural analysis to the SepTop protocol by hannahbaumann · Pull Request #1982 · OpenFreeEnergy/openfe

hannahbaumann · 2026-05-28T10:05:26Z

Checklist

All new code is appropriately documented (user-facing code must have complete docstrings).
Added a news entry, or the changes are not user-facing.
Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow, so they don't need to be run on every commit, only before merging and when relevant changes are made (generally at the reviewer's discretion).

GPU integration tests
example notebook testing
packaging tests: run this for any large feature PRs or PRs that add test data.

Developers certificate of origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

hannahbaumann · 2026-05-28T10:06:06Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

hannahbaumann · 2026-05-29T11:04:42Z

This is what it would look like for the complex right now.

codecov · 2026-05-29T13:30:30Z

Codecov Report

❌ Patch coverage is 97.11538% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.59%. Comparing base (5347082) to head (78f01d5).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/openfe/protocols/openmm_septop/base_units.py | 94.11% | 6 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1982      +/-   ##
==========================================
- Coverage   94.94%   90.59%   -4.35%     
==========================================
  Files         216      217       +1     
  Lines       20481    20687     +206     
==========================================
- Hits        19445    18741     -704     
- Misses       1036     1946     +910

| Flag | Coverage Δ |
|---|---|
| fast-tests | `90.59% <97.11%> (?)` |
| slow-tests | `?` |

hannahbaumann · 2026-06-02T09:12:02Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

hannahbaumann · 2026-06-02T09:34:13Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

jthorton · 2026-06-03T16:17:19Z

+        u_top = mda.Universe(pdb_file)
+        for state_idx in range(n_lambda):
+            universe = create_universe_single_state(u_top._topology, ds, state=state_idx)
+            prot = universe.select_atoms(protein_selection)


Not sure if it's possible or faster, but could we do this selection once on u_top to get the atom indices, and then just index with them to get the atoms in each loop iteration?

I updated this, not sure if it's faster, but definitely makes sense!
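
The idea discussed above, resolving the selection once and reusing the resulting indices for every lambda state, can be sketched without MDAnalysis as follows. This is a minimal illustration and not the code in this PR: the `atom_names` array, the `select_ca_indices` helper, and the per-state coordinate array are hypothetical stand-ins for the topology and the per-state universes. With MDAnalysis, the analogous step would be to take `.indices` from the AtomGroup selected on `u_top` and index `universe.atoms` with them inside the loop.

```python
import numpy as np

# Hypothetical stand-in for a topology: one atom name per atom.
atom_names = np.array(["N", "CA", "C", "O", "N", "CA", "C", "O"])


def select_ca_indices(names: np.ndarray) -> np.ndarray:
    """Resolve the selection once and return integer atom indices."""
    return np.flatnonzero(names == "CA")


# Done once, outside the loop over lambda states.
ca_indices = select_ca_indices(atom_names)

# Hypothetical per-state coordinates with shape (n_states, n_atoms, 3).
rng = np.random.default_rng(0)
coords_per_state = rng.normal(size=(3, atom_names.size, 3))

ca_coords = []
for state_coords in coords_per_state:
    # Plain integer indexing; no selection is re-evaluated here.
    ca_coords.append(state_coords[ca_indices])
```

Whether this is measurably faster depends on how expensive the selection is relative to the rest of the loop, but it guarantees that every state uses exactly the same atoms.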

jthorton · 2026-06-03T16:20:39Z

+        trj_file: pathlib.Path,
+        output_directory: pathlib.Path,
+        dry: bool,
+        simtype: str,


Use a Literal type here, restricted to "complex" and "solvent".

Thanks, fixed!
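
For illustration, a `Literal` annotation for this argument could look like the sketch below. The alias name `SimType` and the `check_simtype` helper are assumptions made for this example, not names taken from the PR. Static checkers such as mypy enforce the annotation; the runtime check is optional and only shown to make the allowed values explicit.

```python
from typing import Literal, get_args

# Hypothetical alias; the two allowed values come from the review comment.
SimType = Literal["complex", "solvent"]


def check_simtype(simtype: str) -> SimType:
    """Return simtype unchanged if it is one of the allowed values."""
    allowed = get_args(SimType)
    if simtype not in allowed:
        raise ValueError(f"simtype must be one of {allowed}, got {simtype!r}")
    return simtype  # type: ignore[return-value]
```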

jthorton · 2026-06-03T16:25:24Z

+                    n_frames = len(range(0, ds.dimensions["iteration"].size, ds.PositionInterval))
+                else:
+                    n_frames = ds.dimensions["iteration"].size
+                skip = max(n_frames // 500, 1)


Is there a reason why we want to limit the analysis to only 500 frames? Should this be exposed?

Good point, I exposed it!
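
A minimal sketch of the stride calculation from the diff with the frame budget exposed as a parameter. The function name `frame_stride` and the parameter name `max_frames` are hypothetical; the PR may expose the value under a different name or in a settings object.

```python
def frame_stride(n_frames: int, max_frames: int = 500) -> int:
    """Stride so that roughly max_frames frames are analysed.

    Mirrors `skip = max(n_frames // 500, 1)` from the diff, with the
    hard-coded 500 replaced by a parameter.
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    return max(n_frames // max_frames, 1)


# Number of frames actually visited for a few trajectory lengths.
analysed = {n: len(range(0, n, frame_stride(n))) for n in (300, 999, 2000)}
```

Because the stride uses floor division, the budget is a target rather than a strict cap: a 999-frame trajectory gets a stride of 1, so all 999 frames are analysed.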

jthorton · 2026-06-03T16:33:31Z

+        if simtype == "complex":
+            np.savez_compressed(
+                npz_file,
+                ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
+                ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
+                ligand_A_COM_drift=np.asarray(data["ligand_A_COM_drift"], dtype=np.float32),
+                ligand_B_COM_drift=np.asarray(data["ligand_B_COM_drift"], dtype=np.float32),
+                protein_2D_RMSD=np.asarray(data["protein_2D_RMSD"], dtype=np.float32),
+                time_ps=np.asarray(data["time_ps"], dtype=np.float32),
+            )
+        else:
+            np.savez_compressed(
+                npz_file,
+                ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
+                ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
+                time_ps=np.asarray(data["time_ps"], dtype=np.float32),
+            )


What about building a dict of the shared data that is saved in both cases (time_ps and the ligand_A/B_RMSD data), and then, if simtype == "complex", adding the extra data to the dict? Then you can have a single np.savez_compressed call.

That's a great idea, updated this!
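
One way the suggested refactor could look, as a sketch only. The array keys are taken from the diff; the function name `save_structural_data` and its signature are assumptions for this example, and the actual change in the PR may be structured differently.

```python
import io

import numpy as np


def save_structural_data(file, data: dict, simtype: str) -> list[str]:
    """Write the analysis arrays with a single np.savez_compressed call."""
    # Keys written for both the solvent and the complex leg.
    keys = ["ligand_A_RMSD", "ligand_B_RMSD", "time_ps"]
    if simtype == "complex":
        # Extra data that only exists for the complex leg.
        keys += ["ligand_A_COM_drift", "ligand_B_COM_drift", "protein_2D_RMSD"]
    arrays = {key: np.asarray(data[key], dtype=np.float32) for key in keys}
    np.savez_compressed(file, **arrays)
    return keys


# Usage with an in-memory buffer instead of a file on disk.
example = {"ligand_A_RMSD": [0.1, 0.2], "ligand_B_RMSD": [0.3, 0.4], "time_ps": [0.0, 1.0]}
buffer = io.BytesIO()
save_structural_data(buffer, example, simtype="solvent")
buffer.seek(0)
loaded = np.load(buffer)
```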

jthorton · 2026-06-03T16:50:42Z

+        ligand_B_indices: list[int],
+        rdmol_A: Chem.Mol,
+        rdmol_B: Chem.Mol,
+        protein_selection: str = "protein and name CA",


Do you want to expose this to users? Also, does it make sense to remove the protein_selection default from the private functions, so that if you do update the default you only have to do it in one place, i.e. a single source of truth on the public function?

Removed the default from the private functions.

I like the idea of exposing it to the user, but I'm not sure where yet. Would it make sense to add it to the MultiStateOutputSettings? Or should we create a new MultistateAnalysisSettings class? I think this would also be nice for the HybTop protocol, in case someone runs host-guest systems or similar.

jthorton · 2026-06-03T16:56:47Z

+        selection_indices = np.array(setup.outputs["selection_indices"])
+        ligand_A_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_A_indices"]))[
+            0
+        ].tolist()
+        ligand_B_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_B_indices"]))[
+            0
+        ].tolist()


I got lost trying to follow this through, but could this go wrong if the user accidentally changes the settings to save only the protein and water?

So I think in that case it would return an empty list: np.isin would return only False, so np.where wouldn't find anything.
I added a warning at the beginning of _structural_analysis that gives a meaningful structural_analysis_error. Does that make sense?
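
The behaviour described here can be checked with a small example. The helper names `remap_indices` and `ligands_present`, the index values, and the warning text are made up for illustration; only the `np.where(np.isin(...))[0].tolist()` expression comes from the diff.

```python
import warnings

import numpy as np


def remap_indices(selection_indices, system_indices) -> list[int]:
    """Positions within the saved subset that hold the given system atoms."""
    selection_indices = np.asarray(selection_indices)
    return np.where(np.isin(selection_indices, system_indices))[0].tolist()


def ligands_present(ligand_A: list[int], ligand_B: list[int]) -> bool:
    """Warn and return False if either ligand is missing from the subset."""
    if not ligand_A or not ligand_B:
        warnings.warn("Ligand atoms are not part of the saved selection; skipping analysis.")
        return False
    return True


# Ligand atoms 50 and 51 were saved: they sit at positions 3 and 4.
found = remap_indices([10, 11, 12, 50, 51], [50, 51])
# Ligand atoms were not saved: np.isin is all False, so the list is empty.
missing = remap_indices([10, 11, 12], [50, 51])
```

An empty list is easy to miss further downstream, which is why an explicit check at the start of the analysis is useful.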

hannahbaumann · 2026-06-04T13:01:27Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

github-actions · 2026-06-04T13:06:31Z

No API break detected ✅

First pass at adding structural analysis to the SepTop protocol

ea6e58a

hannahbaumann marked this pull request as draft May 28, 2026 10:05

hannahbaumann changed the title ~~Add structural analysis to the SepTop protocol~~ [WIP] Add structural analysis to the SepTop protocol May 28, 2026

pre-commit-ci Bot and others added 3 commits May 28, 2026 10:06

[pre-commit.ci] auto fixes from pre-commit.com hooks

b0538ac

for more information, see https://pre-commit.ci

Small fix

0ae11d9

Small fix

00d3066

hannahbaumann self-assigned this May 28, 2026

hannahbaumann added 3 commits May 29, 2026 09:43

Fix ligand indices mapping

14381f9

Fix complex alignment

811a1b6

Small fix

54d4839

Add ligand indices to test

e1c101e

hannahbaumann added 2 commits June 1, 2026 15:40

Some updates

14ce7fd

Add tests for structural analysis

d953848

hannahbaumann marked this pull request as ready for review June 2, 2026 09:12

pre-commit-ci Bot and others added 2 commits June 2, 2026 09:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

850b1a5

for more information, see https://pre-commit.ci

fix mypy

5394785

pre-commit-ci Bot and others added 2 commits June 2, 2026 09:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

62a34dc

for more information, see https://pre-commit.ci

Update env

d1e3abe

hannahbaumann requested review from IAlibay and jthorton June 2, 2026 12:41

jameseastwood assigned IAlibay and unassigned hannahbaumann Jun 3, 2026

jthorton reviewed Jun 3, 2026

hannahbaumann assigned jthorton Jun 4, 2026

Handle empty ligand case

14d7072

pre-commit-ci Bot and others added 2 commits June 4, 2026 13:01

[pre-commit.ci] auto fixes from pre-commit.com hooks

fedc1ea

for more information, see https://pre-commit.ci

mypy fixes

78f01d5

hannahbaumann requested a review from jthorton June 4, 2026 13:49

hannahbaumann changed the title ~~[WIP] Add structural analysis to the SepTop protocol~~ Add structural analysis to the SepTop protocol Jun 4, 2026
