
Commit 1707b48

Add Py_AASequence wrapper for amino acid sequence operations (#24)
* Initial plan
* Add Py_AASequence wrapper class with reverse and shuffle operations
  Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
* Complete Py_AASequence implementation with tests and security checks
  Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
* Add Py_AASequence documentation to README.md
  Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: timosachsenberg <5803621+timosachsenberg@users.noreply.github.com>
1 parent 72ef880 commit 1707b48


4 files changed (+641, -0 lines)


README.md

Lines changed: 99 additions & 0 deletions
````diff
@@ -267,6 +267,78 @@ for key in feature:
     print(key, feature[key])
 ```
 
+### Working with Amino Acid Sequences
+
+The `Py_AASequence` wrapper provides a Pythonic interface to amino acid sequences with support for common operations like sequence reversal and shuffling for decoy generation. All operations delegate to pyOpenMS functionality to minimize reimplementation.
+
+```python
+from openms_python import Py_AASequence
+
+# Create a sequence from string
+seq = Py_AASequence.from_string("PEPTIDERK")
+
+# Access properties
+print(f"Sequence: {seq.sequence}") # PEPTIDERK
+print(f"Length: {len(seq)}") # 9
+print(f"Mono weight: {seq.mono_weight:.2f} Da") # 1083.56 Da
+print(f"Formula: {seq.formula}") # C46H77N13O17
+
+# Iterate over amino acids
+for aa in seq:
+    print(aa) # P, E, P, T, I, D, E, R, K
+
+# Generate decoy sequences
+reversed_seq = seq.reverse()
+print(reversed_seq.sequence) # KREDITPEP
+
+# Reverse with enzyme constraint (preserves cleavage sites)
+reversed_enzyme = seq.reverse_with_enzyme("Trypsin")
+print(reversed_enzyme.sequence) # EDITPEPRK
+
+# Shuffle with reproducible seed
+shuffled = seq.shuffle(enzyme="Trypsin", seed=42)
+print(shuffled.sequence) # IPEDTEPRK (same with seed=42)
+
+# Calculate m/z for different charge states
+mz1 = seq.get_mz(1) # 1084.56
+mz2 = seq.get_mz(2) # 542.79
+mz3 = seq.get_mz(3) # 362.19
+
+# Query sequence content
+has_tide = seq.has_substring("TIDE") # True
+starts_pep = seq.has_prefix("PEP") # True
+ends_rk = seq.has_suffix("RK") # True
+
+# Access individual residues
+first_aa = seq[0] # "P"
+
+# Work with modified sequences
+mod_seq = Py_AASequence.from_string("PEPTIDEM(Oxidation)K")
+print(f"Is modified: {mod_seq.is_modified}") # True
+print(f"Unmodified: {mod_seq.unmodified_sequence}") # PEPTIDEMK
+```
+
+**Properties:**
+- `sequence`: Full sequence string with modifications
+- `unmodified_sequence`: Sequence without modifications
+- `mono_weight`: Monoisotopic weight in Da
+- `average_weight`: Average weight in Da
+- `formula`: Molecular formula
+- `is_modified`: Whether sequence has modifications
+- `has_n_terminal_modification`: N-terminal modification status
+- `has_c_terminal_modification`: C-terminal modification status
+- `native`: Access to underlying pyOpenMS AASequence
+
+**Methods:**
+- `from_string(sequence_str)`: Create from string (class method)
+- `reverse()`: Reverse entire sequence
+- `reverse_with_enzyme(enzyme)`: Reverse peptides between cleavage sites
+- `shuffle(enzyme, max_attempts, seed)`: Shuffle with enzyme constraints
+- `get_mz(charge)`: Calculate m/z for charge state
+- `has_substring(substring)`: Check for substring
+- `has_prefix(prefix)`: Check for prefix
+- `has_suffix(suffix)`: Check for suffix
+
 ### Working with Spectra
 
 ```python
````
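The quick-start example in this hunk expects `reverse_with_enzyme("Trypsin")` to turn `PEPTIDERK` into `EDITPEPRK`, while plain `reverse()` gives `KREDITPEP`. The sketch below reproduces both results in plain Python so it runs without pyOpenMS. It is not the wrapper's code, which according to the README delegates to pyOpenMS. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and keeps a peptide's C-terminal K/R in place while reversing the residues before it. The function names `tryptic_peptides` and `reverse_keeping_cleavage_sites` are hypothetical.

```python
def tryptic_peptides(sequence: str) -> list[str]:
    """Split after every K or R not followed by P (simplified trypsin, no missed cleavages)."""
    peptides: list[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end at a cleavage site
        peptides.append(sequence[start:])
    return peptides


def reverse_keeping_cleavage_sites(sequence: str) -> str:
    """Reverse each peptide; a C-terminal K/R (the cleavage site) stays where it is."""
    decoy = []
    for peptide in tryptic_peptides(sequence):
        if peptide[-1] in "KR":
            # Reverse everything before the cleavage residue, then re-append it
            decoy.append(peptide[-2::-1] + peptide[-1])
        else:
            decoy.append(peptide[::-1])
    return "".join(decoy)


print("PEPTIDERK"[::-1])                            # KREDITPEP (plain reversal)
print(tryptic_peptides("PEPTIDERK"))                # ['PEPTIDER', 'K']
print(reverse_keeping_cleavage_sites("PEPTIDERK"))  # EDITPEPRK
```

Keeping the cleavage residue fixed means the decoy still digests into peptides with the same C-terminal residues and lengths as the target, which is the usual reason for enzyme-aware decoys.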
```diff
@@ -866,6 +938,29 @@ plt.show()
 - `normalize_intensity(max_value)`: Normalize intensities to max value
 - `normalize_to_tic()`: Normalize so intensities sum to 1.0
 
+### Py_AASequence
+
+**Properties:**
+- `sequence`: Full sequence string with modifications
+- `unmodified_sequence`: Sequence without modifications
+- `mono_weight`: Monoisotopic weight in Da
+- `average_weight`: Average weight in Da
+- `formula`: Molecular formula
+- `is_modified`: Whether sequence has modifications
+- `has_n_terminal_modification`: N-terminal modification status
+- `has_c_terminal_modification`: C-terminal modification status
+- `native`: Access to underlying pyOpenMS AASequence
+
+**Methods:**
+- `from_string(sequence_str)`: Create from string (class method)
+- `reverse()`: Reverse entire sequence
+- `reverse_with_enzyme(enzyme)`: Reverse peptides between enzymatic cleavage sites
+- `shuffle(enzyme, max_attempts, seed)`: Shuffle peptides with enzyme constraints
+- `get_mz(charge)`: Calculate m/z for given charge state
+- `has_substring(substring)`: Check if sequence contains substring
+- `has_prefix(prefix)`: Check if sequence starts with prefix
+- `has_suffix(suffix)`: Check if sequence ends with suffix
+
 ### Py_MSSpectrum
 
 **Properties:**
```
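The API reference lists `shuffle(enzyme, max_attempts, seed)`. As a rough illustration of what the seed and the attempt limit control, the sketch below shuffles the residues inside each tryptic peptide with a seeded `random.Random`, keeps a C-terminal K/R in place, and retries up to `max_attempts` times until the shuffled peptide differs from the original. The acceptance rule, the simplified trypsin rule, and the helper names are assumptions for illustration only; pyOpenMS's decoy generator may use a different criterion, so this sketch is not expected to reproduce the `IPEDTEPRK` output shown in the README.

```python
import random


def tryptic_peptides(sequence: str) -> list[str]:
    """Split after every K or R not followed by P (simplified trypsin, no missed cleavages)."""
    peptides: list[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def shuffle_keeping_cleavage_sites(sequence: str, seed: int, max_attempts: int = 30) -> str:
    """Shuffle residues within each peptide; a C-terminal K/R stays in place."""
    rng = random.Random(seed)  # private generator: the same seed gives the same decoy
    decoy = []
    for peptide in tryptic_peptides(sequence):
        if peptide[-1] in "KR":
            body, site = peptide[:-1], peptide[-1]
        else:
            body, site = peptide, ""
        candidate = body
        for _ in range(max_attempts):
            residues = list(body)
            rng.shuffle(residues)
            candidate = "".join(residues)
            if candidate != body:  # hypothetical acceptance rule: any change is accepted
                break
        decoy.append(candidate + site)
    return "".join(decoy)


first = shuffle_keeping_cleavage_sites("PEPTIDERK", seed=42)
second = shuffle_keeping_cleavage_sites("PEPTIDERK", seed=42)
print(first, first == second)  # prints the decoy, then True
```

Using a dedicated `random.Random(seed)` instead of the module-level generator keeps the result reproducible without affecting other code that uses `random`.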
```diff
@@ -954,6 +1049,10 @@ pip install -e ".[dev]"
 | Iterate chromatograms | Manual loop + range check | `for chrom in exp.chromatograms():` |
 | Peak data | `peaks = spec.get_peaks(); mz = peaks[0]` | `mz, intensity = spec.peaks` |
 | DataFrame | Not available | `df = exp.to_dataframe()` |
+| Create sequence | `oms.AASequence.fromString("PEP")` | `Py_AASequence.from_string("PEP")` |
+| Get sequence weight | `seq.getMonoWeight()` | `seq.mono_weight` |
+| Reverse sequence | `DecoyGenerator().reverseProtein(seq)` | `seq.reverse()` |
+| Iterate residues | Manual loop with `getResidue(i)` | `for aa in seq:` |
 
 ## Contributing
 
```
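The comparison table maps `seq.getMonoWeight()` to `seq.mono_weight`, and the README quick-start lists 1083.56 Da and m/z values of 1084.56, 542.79 and 362.19 for `PEPTIDERK` at charges 1 to 3. Those numbers follow from summing monoisotopic residue masses, adding one water for the termini, and applying m/z = (M + z * m_proton) / z with m_proton = 1.007276 Da. The sketch below checks that arithmetic for unmodified sequences; the residue table is a hypothetical subset covering only the residues in the example, and the README states that the wrapper itself delegates these calculations to pyOpenMS.

```python
# Monoisotopic residue masses in Da (residue = free amino acid minus H2O).
# Only the residues needed for the PEPTIDERK example are listed.
RESIDUE_MONO_MASS = {
    "P": 97.05276,
    "E": 129.04259,
    "T": 101.04768,
    "I": 113.08406,
    "D": 115.02694,
    "R": 156.10111,
    "K": 128.09496,
}
WATER = 18.01056   # H2O added once for the N- and C-termini
PROTON = 1.007276  # proton mass in Da


def mono_weight(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MONO_MASS[aa] for aa in sequence) + WATER


def mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mono_weight(sequence) + charge * PROTON) / charge


print(f"{mono_weight('PEPTIDERK'):.2f}")  # 1083.56
for z in (1, 2, 3):
    print(z, f"{mz('PEPTIDERK', z):.2f}")  # 1084.56, 542.79, 362.19
```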
openms_python/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -29,6 +29,7 @@
 from .py_featuremap import Py_FeatureMap
 from .py_consensusmap import Py_ConsensusMap
 from .py_experimentaldesign import Py_ExperimentalDesign
+from .py_aasequence import Py_AASequence
 from .py_identifications import (
     ProteinIdentifications,
     PeptideIdentifications,
```
```diff
@@ -107,6 +108,7 @@ def get_example(name: str, *, load: bool = False, target_dir: Union[str, Path, N
     "Py_FeatureMap",
     "Py_ConsensusMap",
     "Py_ExperimentalDesign",
+    "Py_AASequence",
     "ProteinIdentifications",
     "PeptideIdentifications",
     "Identifications",
```
