Add comprehensive CONTRIBUTING.md guide for new contributors #860
jonbrenas merged 4 commits into malariagen:master from adilraza99:docs/add-contributing-md · Feb 21, 2026
b69d587 Add comprehensive CONTRIBUTING.md guide for new contributors (adilraza99)
f58a18d Update CONTRIBUTING guide to address maintainer review feedback (adilraza99)
d69d533 Merge branch 'master' into docs/add-contributing-md (jonbrenas)
29a37af docs: clarify Poetry workflow and add optional shell plugin instructions (adilraza99)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,238 @@ | ||
| # Contributing to malariagen-data-python | ||
|
|
||
| Thanks for your interest in contributing to this project! This guide will help you get started. | ||
|
|
||
| ## About the project | ||
|
|
||
| This package provides Python tools for accessing and analyzing genomic data from [MalariaGEN](https://www.malariagen.net/), a global research network studying the genomic epidemiology of malaria and its vectors. It provides access to data on _Anopheles_ mosquito species and _Plasmodium_ malaria parasites, with functionality for variant analysis, haplotype clustering, population genetics, and visualization. | ||
|
|
||
| ## Setting up your development environment | ||
|
|
||
| ### Prerequisites | ||
|
|
||
| You'll need: | ||
|
|
||
| - Python 3.10.x (CI-tested version) | ||
| - [Poetry](https://python-poetry.org/) for dependency management | ||
| - [Git](https://git-scm.com/) for version control | ||
|
|
||
| ### Initial setup | ||
|
|
||
| 1. **Fork and clone the repository** | ||
|
|
||
| Fork the repository on GitHub, then clone your fork: | ||
|
|
||
| ```bash | ||
| git clone git@github.com:[your-username]/malariagen-data-python.git | ||
| cd malariagen-data-python | ||
| ``` | ||
|
|
||
| 2. **Add the upstream remote** | ||
|
|
||
| ```bash | ||
| git remote add upstream https://github.com/malariagen/malariagen-data-python.git | ||
| ``` | ||
|
|
||
| 3. **Install Poetry** (if not already installed) | ||
|
|
||
| ```bash | ||
| pipx install poetry | ||
| ``` | ||
|
|
||
| 4. **Install the project and its dependencies** | ||
|
|
||
| ```bash | ||
| poetry install | ||
| ``` | ||
|
|
||
| **Recommended**: Use `poetry run` to run commands inside the virtual environment: | ||
|
|
||
| ```bash | ||
| poetry run pytest | ||
| poetry run python script.py | ||
| ``` | ||
|
|
||
| **Optional**: If you prefer an interactive shell session, install the shell plugin first: | ||
|
|
||
| ```bash | ||
| poetry self add poetry-plugin-shell | ||
| ``` | ||
|
|
||
| Then activate the environment with: | ||
|
|
||
| ```bash | ||
| poetry shell | ||
| ``` | ||
|
|
||
| After activation, commands run directly inside the virtual environment: | ||
|
|
||
| ```bash | ||
| pytest | ||
| python script.py | ||
| ``` | ||
|
|
||
| 5. **Install pre-commit hooks** | ||
|
|
||
| ```bash | ||
| pipx install pre-commit | ||
| pre-commit install | ||
| ``` | ||
|
|
||
| Pre-commit hooks will automatically run `ruff` (linter and formatter) on your changes before each commit. | ||
|
|
||
| ## Development workflow | ||
|
|
||
| ### Creating a new feature or fix | ||
|
|
||
| 1. **Sync with upstream** | ||
|
|
||
| ```bash | ||
| git checkout master | ||
| git pull upstream master | ||
| ``` | ||
|
|
||
| 2. **Create a feature branch** | ||
|
|
||
| If an issue does not already exist for your change, [create one](https://github.com/malariagen/malariagen-data-python/issues/new) first. Then create a branch using the convention `GH{issue number}-{short description}`: | ||
|
|
||
| ```bash | ||
| git checkout -b GH123-fix-broken-filter | ||
| # or | ||
| git checkout -b GH456-add-new-analysis | ||
| ``` | ||
|
|
||
| 3. **Make your changes** | ||
|
|
||
| Write your code, add tests, and update documentation as needed. | ||
|
|
||
| 4. **Run tests locally** | ||
|
|
||
| Fast unit tests (no external data access): | ||
|
|
||
| ```bash | ||
| poetry run pytest -v tests/anoph | ||
| ``` | ||
|
|
||
| All unit tests (requires setting up credentials for legacy tests): | ||
|
|
||
| ```bash | ||
| poetry run pytest -v tests --ignore tests/integration | ||
| ``` | ||
|
|
||
| 5. **Check code quality** | ||
|
|
||
| The pre-commit hooks will run automatically, but you can also run them manually: | ||
|
|
||
| ```bash | ||
| pre-commit run --all-files | ||
| ``` | ||
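The branch naming convention from step 2 above (`GH{issue number}-{short description}`) can be checked mechanically. The following is a minimal sketch, not part of the project: the helper name `parse_branch_name` is hypothetical, and restricting the description to lowercase words separated by hyphens is an assumption based on the two example branch names in this guide.

```python
import re
from typing import Optional, Tuple

# Assumed reading of the convention: "GH", an issue number, a hyphen, then a
# lowercase description made of hyphen-separated words (as in the guide's examples).
BRANCH_PATTERN = re.compile(r"GH(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)")


def parse_branch_name(name: str) -> Optional[Tuple[int, str]]:
    """Return (issue_number, description), or None if the name does not match."""
    match = BRANCH_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For example, `parse_branch_name("GH123-fix-broken-filter")` returns `(123, "fix-broken-filter")`, while a name without the `GH{issue number}-` prefix returns `None`.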
|
|
||
| ### Code style | ||
|
|
||
| We use `ruff` for both linting and formatting. The configuration is in `pyproject.toml`. Key points: | ||
|
|
||
| - Line length: 88 characters (the Black default) | ||
| - Follow PEP 8 conventions | ||
| - Use type hints where appropriate | ||
| - Write clear docstrings (we use numpydoc format) | ||
|
|
||
| The pre-commit hooks will handle most formatting automatically. If you want to run ruff manually: | ||
|
|
||
| ```bash | ||
| ruff check . | ||
| ruff format . | ||
| ``` | ||
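To illustrate the type-hint and numpydoc points in the list above, here is a small sketch of how a function might be documented. The function `count_missing_calls` and the `-1` missing-value encoding are hypothetical and chosen only for illustration; they are not taken from the `malariagen_data` API.

```python
from typing import Sequence


def count_missing_calls(calls: Sequence[int], missing: int = -1) -> int:
    """Count missing genotype calls.

    Parameters
    ----------
    calls : sequence of int
        Genotype calls, one integer per call.
    missing : int, default -1
        Value used to encode a missing call (an assumption for this example).

    Returns
    -------
    int
        Number of entries in `calls` equal to `missing`.

    Examples
    --------
    >>> count_missing_calls([0, 1, -1, -1])
    2
    """
    return sum(1 for call in calls if call == missing)
```

The section headings (`Parameters`, `Returns`, `Examples`) and their dashed underlines follow the numpydoc layout, which is the form Sphinx reads when generating API documentation.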
|
|
||
| ### Testing | ||
|
|
||
| - **Write tests for new functionality**: Add unit tests in the `tests/` directory | ||
| - **Test coverage**: Aim to maintain or improve test coverage | ||
| - **Fast tests**: Unit tests should use simulated data when possible (see `tests/anoph/`) | ||
| - **Integration tests**: Tests requiring GCS data access are slower and run separately | ||
|
|
||
| Run the tests with runtime type checking (via the typeguard pytest plugin) with: | ||
|
|
||
| ```bash | ||
| poetry run pytest -v tests --typeguard-packages=malariagen_data,malariagen_data.anoph | ||
| ``` | ||
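As a rough illustration of the "fast tests" guideline above, the sketch below shows a pytest-style unit test that runs on simulated data and needs no external data access. All names here (`simulate_genotypes`, `alt_allele_frequency`) are hypothetical and use only the standard library; the project's own fixtures under `tests/anoph/` may be organised quite differently.

```python
import random


def simulate_genotypes(n_variants, n_samples, seed=42):
    """Simulate diploid genotype calls as pairs of 0/1 alleles (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    return [
        [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_samples)]
        for _ in range(n_variants)
    ]


def alt_allele_frequency(variant_calls):
    """Fraction of alternate (1) alleles across all samples at one variant."""
    alleles = [allele for call in variant_calls for allele in call]
    return sum(alleles) / len(alleles)


def test_alt_allele_frequency_on_simulated_data():
    genotypes = simulate_genotypes(n_variants=50, n_samples=20)
    freqs = [alt_allele_frequency(calls) for calls in genotypes]
    # One frequency per variant, each a valid proportion.
    assert len(freqs) == 50
    assert all(0.0 <= f <= 1.0 for f in freqs)
    # A hand-built case with a known answer: 3 alt alleles out of 4.
    assert alt_allele_frequency([(0, 1), (1, 1)]) == 0.75
```

Seeding the generator keeps the simulated input identical between runs, so a failure can be reproduced locally and in CI.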
|
|
||
| ### Documentation | ||
|
|
||
| - Update docstrings if you modify public APIs | ||
| - Documentation is built using Sphinx with the pydata theme | ||
| - API docs are auto-generated from docstrings | ||
| - Follow the [numpydoc](https://numpydoc.readthedocs.io/) style guide | ||
|
|
||
| ## Submitting your contribution | ||
|
|
||
| ### Before opening a pull request | ||
|
|
||
| - [ ] Tests pass locally | ||
| - [ ] Pre-commit hooks pass (run `pre-commit run --all-files` to check manually) | ||
| - [ ] Code is well-documented | ||
| - [ ] Commit messages are clear and descriptive | ||
|
|
||
| ### Opening a pull request | ||
|
|
||
| 1. **Push your branch** | ||
|
|
||
| ```bash | ||
| git push origin your-branch-name | ||
| ``` | ||
|
|
||
| 2. **Create the pull request** | ||
| - Go to the [repository on GitHub](https://github.com/malariagen/malariagen-data-python) | ||
| - Click "Pull requests" → "New pull request" | ||
| - Select your fork and branch | ||
| - Write a clear title and description | ||
|
|
||
| 3. **Pull request description should include:** | ||
| - What problem does this solve? | ||
| - How does it solve it? | ||
| - Any relevant issue numbers (e.g., "Fixes #123") | ||
| - Testing done | ||
| - Any breaking changes or migration notes | ||
|
|
||
| ### Review process | ||
|
|
||
| - PRs require approval from a project maintainer | ||
| - CI tests must pass (pytest on Python 3.10 with NumPy 1.26.4) | ||
| - Address review feedback by pushing new commits to your branch | ||
| - Once approved, a maintainer will merge your PR | ||
|
|
||
| ## AI-assisted contributions | ||
|
Collaborator
I need to discuss what our policy is around AI-use with @cclarkson and @ahernank, but this looks like a great start. |
||
|
|
||
| We welcome contributions that involve AI tools (like GitHub Copilot, ChatGPT, or similar). If you use AI assistance: | ||
|
|
||
| - Review and understand any AI-generated code before submitting | ||
| - Ensure the code follows project conventions and passes all tests | ||
| - You remain responsible for the quality and correctness of the contribution | ||
| - Disclosure of AI usage is optional | ||
|
|
||
| ## Communication | ||
|
|
||
| - **Issues**: Use [GitHub Issues](https://github.com/malariagen/malariagen-data-python/issues) for bug reports and feature requests | ||
| - **Discussions**: For questions and general discussion, use [GitHub Discussions](https://github.com/malariagen/malariagen-data-python/discussions) | ||
| - **Pull requests**: Use PR comments for code review discussions | ||
| - **Email**: For data access questions, contact [support@malariagen.net](mailto:support@malariagen.net) | ||
|
|
||
| ## Finding something to work on | ||
|
|
||
| - Look for issues labeled [`good first issue`](https://github.com/malariagen/malariagen-data-python/labels/good%20first%20issue) | ||
| - Check for issues labeled [`help wanted`](https://github.com/malariagen/malariagen-data-python/labels/help%20wanted) | ||
| - Improve documentation or add examples | ||
| - Increase test coverage | ||
|
|
||
| ## Questions? | ||
|
|
||
| If you're unsure about anything, feel free to: | ||
|
|
||
| - Open an issue to ask | ||
| - Start a discussion on GitHub Discussions | ||
| - Ask in your pull request | ||
|
|
||
| We appreciate your contributions and will do our best to help you succeed! | ||
|
|
||
| ## License | ||
|
|
||
| By contributing to this project, you agree that your contributions will be licensed under the [MIT License](LICENSE). | ||
I think a step is missing around creating an issue if one does not already exist. In either case, we generally favour branches being named 'GH{#issue}-{branch description}', where {#issue} is the number/identifier of the issue and {branch description} describes the content of the branch (more important when the branch only addresses part of the issue).