Add structural analysis to the SepTop protocol#1982
Conversation
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1982 +/- ##
==========================================
- Coverage 94.94% 90.59% -4.35%
==========================================
Files 216 217 +1
Lines 20481 20687 +206
==========================================
- Hits 19445 18741 -704
- Misses 1036 1946 +910
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
| u_top = mda.Universe(pdb_file) | ||
| for state_idx in range(n_lambda): | ||
| universe = create_universe_single_state(u_top._topology, ds, state=state_idx) | ||
| prot = universe.select_atoms(protein_selection) |
There was a problem hiding this comment.
Not sure if its possible or faster but could we do this selection once on the u_top to get the indices of the atoms and then just indexing to get the atoms in each loop?
There was a problem hiding this comment.
I updated this, not sure if it's faster, but definitely makes sense!
| trj_file: pathlib.Path, | ||
| output_directory: pathlib.Path, | ||
| dry: bool, | ||
| simtype: str, |
There was a problem hiding this comment.
Use a literal type here with complex and solvent
There was a problem hiding this comment.
Thanks, fixed!
| n_frames = len(range(0, ds.dimensions["iteration"].size, ds.PositionInterval)) | ||
| else: | ||
| n_frames = ds.dimensions["iteration"].size | ||
| skip = max(n_frames // 500, 1) |
There was a problem hiding this comment.
Is there a reason why we want to keep the analysis to only 500 frames, should this be exposed?
There was a problem hiding this comment.
Good point, I exposed it!
| if simtype == "complex": | ||
| np.savez_compressed( | ||
| npz_file, | ||
| ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32), | ||
| ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32), | ||
| ligand_A_COM_drift=np.asarray(data["ligand_A_COM_drift"], dtype=np.float32), | ||
| ligand_B_COM_drift=np.asarray(data["ligand_B_COM_drift"], dtype=np.float32), | ||
| protein_2D_RMSD=np.asarray(data["protein_2D_RMSD"], dtype=np.float32), | ||
| time_ps=np.asarray(data["time_ps"], dtype=np.float32), | ||
| ) | ||
| else: | ||
| np.savez_compressed( | ||
| npz_file, | ||
| ligand_A_RMSD=np.asarray(data["ligand_A_RMSD"], dtype=np.float32), | ||
| ligand_B_RMSD=np.asarray(data["ligand_B_RMSD"], dtype=np.float32), | ||
| time_ps=np.asarray(data["time_ps"], dtype=np.float32), | ||
| ) |
There was a problem hiding this comment.
What about building a dict of shared data that will be saved in both cases so the time_ps and ligand_A/B_RMSD data and then if the simtype=="complex" then add the extra data to the dict then you can have single np.savez_compressed call?
There was a problem hiding this comment.
That's a great idea, updated this!
| ligand_B_indices: list[int], | ||
| rdmol_A: Chem.Mol, | ||
| rdmol_B: Chem.Mol, | ||
| protein_selection: str = "protein and name CA", |
There was a problem hiding this comment.
Do you want to expose this to users, and does it make sense to remove the protein_selection default from the private functions so that if you do update the default, you only have to do it in a single place, i.e single source of truth on the public function.
There was a problem hiding this comment.
Removed the default from the private functions.
I like the idea of exposing it to the user, but I'm not sure where yet. Would it make sense to add it to the MultiStateOutputSettings? Or should we create a new MultistateAnalysisSettings class? I think this would also be nice for the HybTop protocol, in case someone runs host-guest systems or similar.
| selection_indices = np.array(setup.outputs["selection_indices"]) | ||
| ligand_A_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_A_indices"]))[ | ||
| 0 | ||
| ].tolist() | ||
| ligand_B_indices = np.where(np.isin(selection_indices, setup.outputs["ligand_B_indices"]))[ | ||
| 0 | ||
| ].tolist() |
There was a problem hiding this comment.
I got lost trying to follow this through but could this go wrong if the user accidentally changes the settings to only save the protein and water by mistake?
There was a problem hiding this comment.
So I think in that case it would return an empty list, np.isin would return only false, so np.where wouldn't find anything.
I added a warning at the beginning of _structural_analysis that would give a meaningful structural_analysis_error. Does that make sense?
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
|
No API break detected ✅ |





Checklist
newsentry, or the changes are not user-facing.pre-commit.ci autofix.Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).
Developers certificate of origin