Skip to content

Add structural analysis to the SepTop protocol#1982

Open
hannahbaumann wants to merge 17 commits into
mainfrom
structural_analysis_septop
Open

Add structural analysis to the SepTop protocol#1982
hannahbaumann wants to merge 17 commits into
mainfrom
structural_analysis_septop

Conversation

@hannahbaumann
Copy link
Copy Markdown
Contributor

@hannahbaumann hannahbaumann commented May 28, 2026

Checklist

  • All new code is appropriately documented (user-facing code must have complete docstrings).
  • Added a news entry, or the changes are not user-facing.
  • Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).

Developers certificate of origin

@hannahbaumann hannahbaumann marked this pull request as draft May 28, 2026 10:05
@hannahbaumann hannahbaumann changed the title Add structural analysis to the SepTop protocol [WIP] Add structural analysis to the SepTop protocol May 28, 2026
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@hannahbaumann hannahbaumann self-assigned this May 28, 2026
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

This is what it would look like for the complex right now.

protein_2D_RMSD ligand_B_RMSD ligand_B_COM_drift ligand_A_RMSD ligand_A_COM_drift

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 97.11538% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.59%. Comparing base (5347082) to head (78f01d5).

Files with missing lines Patch % Lines
src/openfe/protocols/openmm_septop/base_units.py 94.11% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1982      +/-   ##
==========================================
- Coverage   94.94%   90.59%   -4.35%     
==========================================
  Files         216      217       +1     
  Lines       20481    20687     +206     
==========================================
- Hits        19445    18741     -704     
- Misses       1036     1946     +910     
Flag Coverage Δ
fast-tests 90.59% <97.11%> (?)
slow-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@hannahbaumann hannahbaumann marked this pull request as ready for review June 2, 2026 09:12
@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

u_top = mda.Universe(pdb_file)
for state_idx in range(n_lambda):
universe = create_universe_single_state(u_top._topology, ds, state=state_idx)
prot = universe.select_atoms(protein_selection)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if its possible or faster but could we do this selection once on the u_top to get the indices of the atoms and then just indexing to get the atoms in each loop?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this, not sure if it's faster, but definitely makes sense!

trj_file: pathlib.Path,
output_directory: pathlib.Path,
dry: bool,
simtype: str,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a literal type here with complex and solvent

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed!

n_frames = len(range(0, ds.dimensions["iteration"].size, ds.PositionInterval))
else:
n_frames = ds.dimensions["iteration"].size
skip = max(n_frames // 500, 1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we want to keep the analysis to only 500 frames, should this be exposed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I exposed it!

Comment on lines +1776 to +1792
if simtype == "complex":
np.savez_compressed(
npz_file,
ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
ligand_A_COM_drift=np.asarray(data["ligand_A_COM_drift"], dtype=np.float32),
ligand_B_COM_drift=np.asarray(data["ligand_B_COM_drift"], dtype=np.float32),
protein_2D_RMSD=np.asarray(data["protein_2D_RMSD"], dtype=np.float32),
time_ps=np.asarray(data["time_ps"], dtype=np.float32),
)
else:
np.savez_compressed(
npz_file,
ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32),
ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32),
time_ps=np.asarray(data["time_ps"], dtype=np.float32),
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about building a dict of shared data that will be saved in both cases so the time_ps and ligand_A/B_RMSD data and then if the simtype=="complex" then add the extra data to the dict then you can have single np.savez_compressed call?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea, updated this!

ligand_B_indices: list[int],
rdmol_A: Chem.Mol,
rdmol_B: Chem.Mol,
protein_selection: str = "protein and name CA",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to expose this to users, and does it make sense to remove the protein_selection default from the private functions so that if you do update the default, you only have to do it in a single place, i.e single source of truth on the public function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the default from the private functions.

I like the idea of exposing it to the user, but I'm not sure where yet. Would it make sense to add it to the MultiStateOutputSettings? Or should we create a new MultistateAnalysisSettings class? I think this would also be nice for the HybTop protocol, in case someone runs host-guest systems or similar.

Comment on lines +1912 to +1918
selection_indices = np.array(setup.outputs["selection_indices"])
ligand_A_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_A_indices"]))[
0
].tolist()
ligand_B_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_B_indices"]))[
0
].tolist()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got lost trying to follow this through but could this go wrong if the user accidentally changes the settings to only save the protein and water by mistake?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think in that case it would return an empty list, np.isin would return only false, so np.where wouldn't find anything.
I added a warning at the beginning of _structural_analysis that would give a meaningful structural_analysis_error. Does that make sense?

@hannahbaumann
Copy link
Copy Markdown
Contributor Author

pre-commit.ci autofix

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

No API break detected ✅

@hannahbaumann hannahbaumann requested a review from jthorton June 4, 2026 13:49
@hannahbaumann hannahbaumann changed the title [WIP] Add structural analysis to the SepTop protocol Add structural analysis to the SepTop protocol Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants