- Docker must be installed and the Docker engine must be running.
- uv for dependency management.
Tests are organized into three categories:
Automated tests (tests/automated) run in CI and do not call any third-party APIs. They include:
- Integration tests — API endpoints, database operations, core functionality, security, and tasks.
- Unit tests — isolated logic tests.
Alembic tests (tests/alembic) validate that database migration scripts are well-formed and can be applied cleanly.
Manual tests (tests/manual) call third-party APIs (Google, MuckRock, etc.) and are not run automatically. The directory intentionally lacks the test prefix to prevent accidental inclusion in pytest runs.
Run these individually and only when needed — they may incur API costs.
Start the local database, then run pytest:
# Start the database
cd local_database
docker compose up -d
cd ..
# Run automated tests
uv run pytest tests/automated
# Run alembic tests
uv run pytest tests/alembic

Alternatively, running Docker Compose from the repository root spins up a two-container setup (FastAPI app + PostgreSQL):
docker compose up -d
Then run tests inside the container:
docker exec data-source-identification-app-1 pytest /app/tests/automated
Note: The docker-compose.yml in the root is configured for Linux Docker (used in GitHub Actions). For local development on Windows/macOS, you may need to change POSTGRES_HOST from 172.17.0.1 to host.docker.internal. See the comments in docker-compose.yml.
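The host switch described in the note can be sketched in Python. This is an illustrative helper under stated assumptions, not code from the repository: the function name `default_postgres_host` is hypothetical, and the two addresses are simply the ones quoted in the note.

```python
import platform
from typing import Optional

# Addresses quoted in the note above; the helper itself is hypothetical.
LINUX_DOCKER_HOST = "172.17.0.1"
DESKTOP_DOCKER_HOST = "host.docker.internal"


def default_postgres_host(system: Optional[str] = None) -> str:
    """Return a plausible POSTGRES_HOST value for the given operating system.

    `system` uses the names reported by platform.system(), for example
    "Linux", "Windows", or "Darwin" (macOS).
    """
    system = system or platform.system()
    return LINUX_DOCKER_HOST if system == "Linux" else DESKTOP_DOCKER_HOST


if __name__ == "__main__":
    # Print the value that would suit the machine running this script.
    print(default_postgres_host())
```

The printed value could then be copied into POSTGRES_HOST; the comments in docker-compose.yml remain the authoritative guidance.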
The GitHub Actions workflow (.github/workflows/test_app.yml) runs on every pull request:
- Starts a PostgreSQL 15 service container with health checks.
- Installs uv and project dependencies.
- Runs pytest tests/automated.
- Runs pytest tests/alembic.
The pipeline has a 20-minute timeout.
A separate workflow (.github/workflows/python_checks.yml) runs flake8 linting via reviewdog on pull requests. These are advisory warnings and do not block merges.
tests/
├── conftest.py # Session fixtures: DB setup, teardown, client instances
├── helpers/
│ ├── alembic_runner.py # Alembic test utilities
│ ├── data_creator/ # Test data generation helpers
│ └── setup/ # Database populate/wipe utilities
├── test_data/ # Static test data (JSON files, etc.)
├── automated/
│ ├── integration/
│ │ ├── api/ # Endpoint tests (agencies, annotate, batch, etc.)
│ │ ├── core/ # Core async operation tests
│ │ ├── db/
│ │ │ ├── client/ # Database client method tests
│ │ │ └── structure/ # Schema validation tests
│ │ ├── readonly/ # Read-only operation tests
│ │ ├── security_manager/ # Auth/authz tests
│ │ └── tasks/ # Task implementation tests
│ └── unit/ # Unit tests
├── alembic/
│ └── test_revisions.py # Migration validation
└── manual/
  ├── core/lifecycle/ # Core lifecycle tests
  ├── source_collectors/ # Collector integration tests
  └── unsorted/ # Miscellaneous manual tests
From pytest.ini:
- Timeout: 300 seconds per test.
- Async mode: auto (all async tests are automatically detected).
- Fixture loop scope: function (each test gets its own event loop).
- Manual tests are marked and excluded from automated runs.
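For orientation, the settings listed above might correspond to a pytest.ini along the lines of the text embedded in the sketch below. The option names (`timeout`, `asyncio_mode`, `asyncio_default_fixture_loop_scope`) are assumptions based on the pytest-timeout and pytest-asyncio plugins; the repository's actual file may spell them differently or contain more entries. The sketch parses the assumed text with Python's configparser so the values can be inspected.

```python
import configparser

# Assumed pytest.ini content. The option names come from the pytest-timeout
# and pytest-asyncio plugins, not from the repository itself.
ASSUMED_PYTEST_INI = """
[pytest]
timeout = 300
asyncio_mode = auto
asyncio_default_fixture_loop_scope = function
"""


def load_pytest_settings(text: str) -> dict:
    """Parse ini-style text and return the [pytest] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["pytest"])


settings = load_pytest_settings(ASSUMED_PYTEST_INI)

if __name__ == "__main__":
    for key, value in settings.items():
        print(f"{key} = {value}")
```

Note that configparser returns every value as a string, so the timeout is read as "300" rather than as an integer.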
- Place automated tests in tests/automated/integration/ or tests/automated/unit/.
- Use the fixtures defined in conftest.py for database access (adb_client, db_client).
- Use helpers in tests/helpers/data_creator/ to generate test data.
- If your test calls a third-party API, place it in tests/manual/ and do not prefix the directory with test.
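As a rough illustration of the first guideline, a unit test placed under tests/automated/unit/ could look like the sketch below. The file name, the `strip_trailing_slash` function, and its behavior are hypothetical stand-ins for whatever logic is under test; a real test would import the function from the project rather than define it inline. Tests that need the database would instead request the conftest.py fixtures, which are not shown here.

```python
# Hypothetical file: tests/automated/unit/test_strip_trailing_slash.py


def strip_trailing_slash(url: str) -> str:
    """Hypothetical function under test: trim whitespace and trailing slashes."""
    return url.strip().rstrip("/")


def test_strip_trailing_slash_removes_slash():
    assert strip_trailing_slash("https://example.com/") == "https://example.com"


def test_strip_trailing_slash_is_idempotent():
    # Applying the function twice should give the same result as applying it once.
    once = strip_trailing_slash("  https://example.com/path/ ")
    assert strip_trailing_slash(once) == once
```

Because the file name starts with test_, pytest's default collection rules would pick it up when running uv run pytest tests/automated.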