| 1 | +# GitHub Copilot Instructions for openms-python |
| 2 | + |
| 3 | +## Repository Overview |
| 4 | + |
| 5 | +`openms-python` is a Pythonic wrapper around pyOpenMS for mass spectrometry data analysis. The goal is to provide an intuitive, Python-friendly interface that makes working with mass spectrometry data feel natural for Python developers and data scientists. |
| 6 | + |
| 7 | +**Key Principle**: Make pyOpenMS more Pythonic by wrapping verbose C++ bindings with intuitive Python APIs. |
| 8 | + |
| 9 | +## Code Style and Conventions |
| 10 | + |
| 11 | +### Python Style |
| 12 | +- Follow PEP 8 conventions |
| 13 | +- Use Black formatter with 100 character line length (configured in `pyproject.toml`) |
| 14 | +- Target Python 3.8+ compatibility |
| 15 | +- Use type hints for better IDE support and code clarity |
| 16 | +- Prefer clear, descriptive names over abbreviations |
| 17 | + |
| 18 | +### Wrapper Design Patterns |
| 19 | + |
| 20 | +1. **Properties over getters/setters**: Use `@property` decorators instead of verbose get/set methods |
| 21 | + ```python |
| 22 | + # Good |
| 23 | + spec.retention_time |
| 24 | + # Avoid |
| 25 | + spec.getRT() |
| 26 | + ``` |
| 27 | + |
| 28 | +2. **Pythonic iteration**: Support Python's iteration protocols (`__iter__`, `__len__`, `__getitem__`) |
| 29 | + ```python |
| 30 | + for spec in experiment.ms1_spectra(): |
| 31 | +     print(spec.retention_time) |
| 32 | + ``` |
| 33 | + |
| 34 | +3. **Method chaining**: Return `self` from mutation methods to enable fluent interfaces |
| 35 | + ```python |
| 36 | + exp.filter_by_ms_level(1).filter_by_rt(100, 500) |
| 37 | + ``` |
| 38 | + |
| 39 | +4. **DataFrame integration**: Provide `to_dataframe()` and `from_dataframe()` methods for pandas interoperability |
| 40 | + |
| 41 | +5. **Context managers**: Support `with` statements for file I/O operations |
| 42 | + |
| 43 | +6. **Mapping interface for metadata**: Classes wrapping `MetaInfoInterface` should support dict-like access |
| 44 | + ```python |
| 45 | + feature["label"] = "sample_a" |
| 46 | + ``` |
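The first three patterns can be sketched from the implementation side as follows. This is a minimal illustration only: `SketchSpectrum` and `SketchExperiment` are hypothetical stand-ins written for this example, not classes from openms-python or pyOpenMS, and the inclusive retention-time bounds are an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SketchSpectrum:
    """Hypothetical stand-in for a wrapped spectrum (not a pyOpenMS class)."""

    rt: float
    ms_level: int

    @property
    def retention_time(self) -> float:
        # Property access replaces a getRT()-style getter.
        return self.rt


class SketchExperiment:
    """Hypothetical container showing iteration protocols and chaining."""

    def __init__(self, spectra: List[SketchSpectrum]) -> None:
        self._spectra = list(spectra)

    def __len__(self) -> int:
        return len(self._spectra)

    def __iter__(self) -> Iterator[SketchSpectrum]:
        return iter(self._spectra)

    def __getitem__(self, index: int) -> SketchSpectrum:
        return self._spectra[index]

    def filter_by_ms_level(self, level: int) -> "SketchExperiment":
        self._spectra = [s for s in self._spectra if s.ms_level == level]
        return self  # returning self is what makes chaining possible

    def filter_by_rt(self, min_rt: float, max_rt: float) -> "SketchExperiment":
        # Assumption: both bounds are inclusive.
        self._spectra = [s for s in self._spectra if min_rt <= s.rt <= max_rt]
        return self


exp = SketchExperiment(
    [SketchSpectrum(50.0, 1), SketchSpectrum(120.0, 1), SketchSpectrum(130.0, 2)]
)
exp.filter_by_ms_level(1).filter_by_rt(100, 500)
rts = [spec.retention_time for spec in exp]
```

Because the filters here mutate the container in place, a real wrapper would need to document that behaviour or offer a copying variant.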
| 47 | + |
| 48 | +### Class Naming Convention |
| 49 | +- Wrapper classes use the `Py_` prefix (e.g., `Py_MSExperiment`, `Py_FeatureMap`) |
| 50 | +- This distinguishes them from pyOpenMS classes while maintaining recognizability |
| 51 | + |
| 52 | +### File Organization |
| 53 | +- Core wrapper classes: `py_*.py` files (e.g., `py_msexperiment.py`, `py_featuremap.py`) |
| 54 | +- I/O utilities: `io.py` and `_io_utils.py` |
| 55 | +- Helper utilities: `_meta_mapping.py` for metadata handling |
| 56 | +- Workflow helpers: `workflows.py` for high-level pipelines |
| 57 | +- Example data: `examples/` directory contains sample files like `small.mzML` |
| 58 | + |
| 59 | +## Testing Requirements |
| 60 | + |
| 61 | +### Test Structure |
| 62 | +- All tests live in the `tests/` directory |
| 63 | +- Test files follow `test_*.py` naming convention |
| 64 | +- Use pytest as the testing framework |
| 65 | +- Aim for good coverage of wrapper functionality |
| 66 | + |
| 67 | +### Running Tests |
| 68 | +```bash |
| 69 | +# Install development dependencies |
| 70 | +pip install -e ".[dev]" |
| 71 | + |
| 72 | +# Run all tests |
| 73 | +pytest -v |
| 74 | + |
| 75 | +# Run with coverage |
| 76 | +pytest -v --cov=openms_python --cov-report=term-missing |
| 77 | +``` |
| 78 | + |
| 79 | +### Test Patterns |
| 80 | +- Test basic wrapper functionality (properties, methods) |
| 81 | +- Test DataFrame conversions (to/from) |
| 82 | +- Test file I/O (load/store operations) |
| 83 | +- Test iteration and filtering |
| 84 | +- Test method chaining |
| 85 | +- Use `conftest.py` for shared fixtures |
| 86 | + |
| 87 | +## Development Setup |
| 88 | + |
| 89 | +### Installation |
| 90 | +```bash |
| 91 | +git clone https://github.com/openms/openms-python.git |
| 92 | +cd openms-python |
| 93 | +pip install -e ".[dev]" |
| 94 | +``` |
| 95 | + |
| 96 | +### Dependencies |
| 97 | +- **Core**: pyopenms (>=3.0.0), pandas (>=1.3.0), numpy (>=1.20.0) |
| 98 | +- **Dev**: pytest, pytest-cov, black, flake8, mypy |
| 99 | + |
| 100 | +### Code Formatting |
| 101 | +```bash |
| 102 | +# Format code with Black |
| 103 | +black openms_python tests |
| 104 | + |
| 105 | +# Check style with flake8 |
| 106 | +flake8 openms_python tests |
| 107 | +``` |
| 108 | + |
| 109 | +## Key Architecture Patterns |
| 110 | + |
| 111 | +### 1. Wrapper Pattern |
| 112 | +Most classes wrap a corresponding pyOpenMS class and delegate to it while exposing a Pythonic interface (`oms` below stands for `import pyopenms as oms`): |
| 113 | +```python |
| 114 | +class Py_MSSpectrum: |
| 115 | +    def __init__(self, spec=None): |
| 116 | +        self._spec = spec if spec is not None else oms.MSSpectrum() |
| 117 | + |
| 118 | +    @property |
| 119 | +    def retention_time(self): |
| 120 | +        return self._spec.getRT() |
| 121 | +``` |
| 122 | + |
| 123 | +### 2. Factory Methods |
| 124 | +Use class methods for alternative constructors: |
| 125 | +```python |
| 126 | +@classmethod |
| 127 | +def from_file(cls, filepath): |
| 128 | +    ...  # Load from file and return a new instance |
| 129 | + |
| 130 | +@classmethod |
| 131 | +def from_dataframe(cls, df): |
| 132 | +    ...  # Create a new instance from a pandas DataFrame |
| 133 | +``` |
| 134 | + |
| 135 | +### 3. Smart Filtering |
| 136 | +Provide multiple ways to filter data: |
| 137 | +- Method-based: `filter_by_rt(min_rt, max_rt)` |
| 138 | +- Property-based: `rt_filter[min:max]` |
| 139 | +- Iterator-based: `ms1_spectra()`, `ms2_spectra()` |
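One way the property-based form `rt_filter[min:max]` could be supported is a small helper object whose `__getitem__` accepts a slice. The sketch below is an assumption about how such a helper might look, not the package's actual implementation: `RTFilterSketch` and `ExperimentSketch` are hypothetical names, plain floats stand in for spectra, and the bounds are treated as inclusive.

```python
from typing import Iterable, List


class RTFilterSketch:
    """Hypothetical helper returned by an `rt_filter` property."""

    def __init__(self, retention_times: List[float]) -> None:
        self._rts = retention_times

    def __getitem__(self, window: slice) -> List[float]:
        if not isinstance(window, slice):
            raise TypeError("rt_filter expects a slice such as rt_filter[100:500]")
        # An open-ended slice (rt_filter[:500]) leaves that side unbounded.
        lower = float("-inf") if window.start is None else window.start
        upper = float("inf") if window.stop is None else window.stop
        return [rt for rt in self._rts if lower <= rt <= upper]


class ExperimentSketch:
    """Hypothetical container; retention times stand in for spectra."""

    def __init__(self, retention_times: Iterable[float]) -> None:
        self._rts = list(retention_times)

    @property
    def rt_filter(self) -> RTFilterSketch:
        return RTFilterSketch(self._rts)

    def filter_by_rt(self, min_rt: float, max_rt: float) -> List[float]:
        # The method-based form delegates to the same slicing logic.
        return self.rt_filter[min_rt:max_rt]


exp = ExperimentSketch([50.0, 120.0, 480.0, 900.0])
selected = exp.rt_filter[100:500]
```

Routing both forms through one code path keeps the method-based and property-based filters from drifting apart.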
| 140 | + |
| 141 | +### 4. Metadata Handling |
| 142 | +Classes that wrap `MetaInfoInterface` should implement the mapping protocol: |
| 143 | +- `__getitem__`, `__setitem__`, `__delitem__` |
| 144 | +- `__contains__`, `__iter__`, `__len__` |
| 145 | +- `get()`, `pop()`, `update()` methods |
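A compact way to get the whole protocol is to subclass `collections.abc.MutableMapping`, which derives `__contains__`, `get()`, `pop()` and `update()` from five core methods. In the sketch below, `FakeMetaInfo` is a stand-in defined only for this example; its method names are modelled on pyOpenMS's `MetaInfoInterface` as I understand it, so check them against the installed version before relying on them. `MetaMappingSketch` is likewise a hypothetical name.

```python
from collections.abc import MutableMapping


class FakeMetaInfo:
    """Stand-in for an object exposing MetaInfoInterface-style methods."""

    def __init__(self):
        self._values = {}

    def setMetaValue(self, key, value):
        self._values[key] = value

    def getMetaValue(self, key):
        return self._values[key]

    def metaValueExists(self, key):
        return key in self._values

    def removeMetaValue(self, key):
        self._values.pop(key, None)

    def getKeys(self, keys):
        # Fills the list passed in, mirroring the output-parameter style.
        keys.extend(self._values)


class MetaMappingSketch(MutableMapping):
    """Hypothetical dict-like view over a wrapped metadata object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def _keys(self):
        keys = []
        self._wrapped.getKeys(keys)
        return keys

    def __getitem__(self, key):
        if not self._wrapped.metaValueExists(key):
            raise KeyError(key)
        return self._wrapped.getMetaValue(key)

    def __setitem__(self, key, value):
        self._wrapped.setMetaValue(key, value)

    def __delitem__(self, key):
        if not self._wrapped.metaValueExists(key):
            raise KeyError(key)
        self._wrapped.removeMetaValue(key)

    def __iter__(self):
        return iter(self._keys())

    def __len__(self):
        return len(self._keys())


feature = MetaMappingSketch(FakeMetaInfo())
feature["label"] = "sample_a"
feature.update({"charge": 2})
```

Raising `KeyError` for missing keys matters here, because the inherited `get()`, `pop()` and `__contains__` all rely on it.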
| 146 | + |
| 147 | +## Common Tasks |
| 148 | + |
| 149 | +### Adding a New Wrapper Class |
| 150 | +1. Create a new `py_<classname>.py` file |
| 151 | +2. Wrap the corresponding pyOpenMS class |
| 152 | +3. Add Pythonic properties for common getters/setters |
| 153 | +4. Implement `__len__`, `__iter__`, `__getitem__` if applicable |
| 154 | +5. Add `to_dataframe()` and `from_dataframe()` if appropriate |
| 155 | +6. Add `load()` and `store()` methods for file I/O |
| 156 | +7. Write comprehensive tests in `tests/test_py_<classname>.py` |
| 157 | +8. Update `__init__.py` to export the new class |
| 158 | +9. Add examples to README.md |
| 159 | + |
| 160 | +### Adding Helper Functions |
| 161 | +- High-level workflow functions go in `workflows.py` |
| 162 | +- I/O utilities go in `io.py` or `_io_utils.py` |
| 163 | +- Metadata utilities go in `_meta_mapping.py` |
| 164 | + |
| 165 | +### Documentation |
| 166 | +- Add docstrings to all public classes and methods |
| 167 | +- Include usage examples in docstrings |
| 168 | +- Update README.md with new features |
| 169 | +- Keep API reference section in README current |
| 170 | + |
| 171 | +## Special Considerations |
| 172 | + |
| 173 | +### Memory Management |
| 174 | +- Be mindful of memory when working with large datasets |
| 175 | +- Provide streaming alternatives for large files (see `stream_mzml`) |
| 176 | +- Consider using generators for iteration over large collections |
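The generator advice can be illustrated without any file I/O. In this sketch `fake_reader` is a hypothetical source that produces `(retention_time, ms_level)` pairs one at a time; it says nothing about the actual signature of `stream_mzml`, which is not shown in this document.

```python
from typing import Iterable, Iterator, Tuple


def fake_reader(count: int) -> Iterator[Tuple[float, int]]:
    """Hypothetical source yielding (retention_time, ms_level) lazily."""
    for index in range(count):
        yield (float(index), 1 if index % 2 == 0 else 2)


def iter_ms1_rts(spectra: Iterable[Tuple[float, int]]) -> Iterator[float]:
    """Yield MS1 retention times without materialising the whole input."""
    for rt, ms_level in spectra:
        if ms_level == 1:
            yield rt


first_three = []
for rt in iter_ms1_rts(fake_reader(1_000_000)):
    first_three.append(rt)
    if len(first_three) == 3:
        break  # stops early; the remaining entries are never generated
```

Only the entries actually consumed are ever created, so peak memory stays flat regardless of how large the source is.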
| 177 | + |
| 178 | +### pyOpenMS Compatibility |
| 179 | +- The package depends on pyOpenMS >= 3.0.0 |
| 180 | +- When wrapping pyOpenMS classes, preserve all functionality |
| 181 | +- Add convenience methods but don't remove or break existing capabilities |
| 182 | + |
| 183 | +### Error Handling |
| 184 | +- Provide clear, helpful error messages |
| 185 | +- Validate inputs before passing to pyOpenMS |
| 186 | +- Handle common edge cases (empty containers, missing files, etc.) |
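As a rough sketch of validating inputs before they reach pyOpenMS, the helpers below check a retention-time window and a file path. Both function names are hypothetical, and the rule that retention times must be non-negative is an assumption made for this example.

```python
from pathlib import Path
from typing import Tuple, Union


def validate_rt_range(min_rt: float, max_rt: float) -> Tuple[float, float]:
    """Return the range as floats, or raise ValueError with a clear message."""
    if min_rt < 0 or max_rt < 0:
        # Assumption: negative retention times are never valid input.
        raise ValueError(
            f"Retention times must be non-negative, got {min_rt} and {max_rt}"
        )
    if min_rt > max_rt:
        raise ValueError(f"min_rt ({min_rt}) must not exceed max_rt ({max_rt})")
    return float(min_rt), float(max_rt)


def require_existing_file(path: Union[str, Path]) -> Path:
    """Return the path as a Path object, or raise FileNotFoundError."""
    candidate = Path(path)
    if not candidate.is_file():
        raise FileNotFoundError(f"Input file not found: {candidate}")
    return candidate


rt_window = validate_rt_range(100, 500)
```

Failing early with the offending values in the message is usually easier to debug than an error surfacing from inside the C++ bindings.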
| 187 | + |
| 188 | +### Performance |
| 189 | +- Wrapper overhead should be minimal |
| 190 | +- Avoid unnecessary data copies |
| 191 | +- Use NumPy arrays for peak data when possible |
| 192 | +- Consider performance implications of DataFrame conversions |
| 193 | + |
| 194 | +## Examples and Documentation |
| 195 | + |
| 196 | +The README.md contains extensive examples. When adding new features: |
| 197 | +1. Add code examples showing the improvement over pyOpenMS |
| 198 | +2. Use "Before (pyOpenMS)" vs "After (openms-python)" format |
| 199 | +3. Include practical use cases |
| 200 | +4. Show integration with pandas/numpy when relevant |
| 201 | + |
| 202 | +## CI/CD |
| 203 | + |
| 204 | +The repository uses GitHub Actions for continuous integration: |
| 205 | +- Workflow: `.github/workflows/integration-tests.yml` |
| 206 | +- Runs on: Python 3.10 (configurable via matrix) |
| 207 | +- Tests run automatically on push to main and on pull requests |
| 208 | + |
| 209 | +## Contributing Guidelines |
| 210 | + |
| 211 | +When contributing: |
| 212 | +1. Make minimal, focused changes |
| 213 | +2. Maintain backward compatibility unless a breaking change is explicitly intended |
| 214 | +3. Add tests for new functionality |
| 215 | +4. Format code with Black |
| 216 | +5. Ensure all tests pass |
| 217 | +6. Update documentation as needed |
| 218 | + |
| 219 | +## Questions or Issues? |
| 220 | + |
| 221 | +- Check existing documentation in README.md |
| 222 | +- Review existing wrapper implementations for patterns |
| 223 | +- Look at test files for usage examples |
| 224 | +- Open a discussion on GitHub for design questions |