| 1 | +# Test Strategy |
| 2 | + |
| 3 | +Canonical map of **what we test, at which layer, and where the gaps are**. |
| 4 | +Pairs with the how-to-run instructions in `CONTRIBUTING.md` and the manual |
| 5 | +client runbook in `.github/INTEGRATION-TEST.md`. |
| 6 | + |
| 7 | +Last verified: 2026-05-29 — 284 tests passing, ruff clean, pyright 0 errors. |
| 8 | + |
| 9 | +## 1. The pyramid (current shape) |
| 10 | + |
| 11 | +``` |
| 12 | + / stdio E2E \ 9 tests — real MCP process over stdio |
| 13 | + / integration \ ~38 tests — multi-version, publish, cache, phase1 |
| 14 | + / unit / service \ ~230 tests — services, retrieval, ingestion, compare |
| 15 | + / contract + regression \ 38 tests — schema snapshots, stability, curated cases |
| 16 | +``` |
| 17 | + |
| 18 | +This is the right shape: a wide, fast base and a thin, slow top. Keep new |
| 19 | +tests pushed **down** the pyramid — only add an stdio E2E test when a bug can |
| 20 | +*only* manifest across the process boundary (framing, lifespan DI, stdout |
| 21 | +hygiene). |
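As a rough illustration of a check that only makes sense across the process boundary, the sketch below launches a stand-in child process and verifies that every stdout line parses as a JSON object. The inline child script, `run_child`, and `non_protocol_lines` are assumptions made for this example; the real suite would launch the actual server instead.

```python
import json
import subprocess
import sys

# Stand-in child: logs to stderr, writes one JSON-RPC-shaped line to stdout.
CHILD_SCRIPT = r"""
import json, sys
print("starting up", file=sys.stderr)          # diagnostics belong on stderr
request = json.loads(sys.stdin.readline())
print(json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": {}}))
"""


def non_protocol_lines(stdout_text: str) -> list[str]:
    """Return stdout lines that are not JSON objects (i.e. stray noise)."""
    noise = []
    for line in stdout_text.splitlines():
        if not line.strip():
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            noise.append(line)
            continue
        if not isinstance(parsed, dict):
            noise.append(line)
    return noise


def run_child(request: dict) -> subprocess.CompletedProcess:
    """Send one newline-delimited request to the child and collect its output."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
    )


completed = run_child({"jsonrpc": "2.0", "id": 1, "method": "ping"})
stdout_noise = non_protocol_lines(completed.stdout)
```

A stray `print()` in the child would show up in `stdout_noise`, which is exactly the class of bug a unit test against the service layer cannot see.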
| 22 | + |
| 23 | +## 2. Expected features → coverage map |
| 24 | + |
| 25 | +The server exposes **6 MCP tools**. Every tool must have at least one |
| 26 | +behavioral test and appear in the schema snapshot. |
| 27 | + |
| 28 | +| Tool / feature | Primary tests | Layer | Status | |
| 29 | +|------------------------|----------------------------------------------------------------------|------------------|--------| |
| 30 | +| `search_docs` | `test_services`, `test_retrieval`, `test_synonyms`, `test_stability` | unit + regression| STRONG | |
| 31 | +| `get_docs` | `test_services`, `test_retrieval`, `test_persistent_docs_cache`, `test_mcp_get_docs_cache_smoke` | unit + integration | STRONG | |
| 32 | +| `list_versions` | `test_services`, `test_multi_version` | unit + integration| GOOD | |
| 33 | +| `compare_versions` | `test_compare_versions` (15), `test_services` | unit | GOOD | |
| 34 | +| `lookup_package_docs` | `test_package_docs` (8) | unit only | THIN | |
| 35 | +| `detect_python_version`| `test_detection` (12) | unit | GOOD | |
| 36 | + |
| 37 | +Cross-cutting coverage: |
| 38 | + |
| 39 | +- **Schema contract**: `test_schema.py`, `test_schema_snapshot.py` — input/output |
| 40 | + JSON schemas for each tool are frozen as fixtures; a wire-shape change fails CI. |
| 41 | +- **Multi-version routing**: `test_multi_version.py` — version param resolution and |
| 42 | + default fallback across indexed doc sets. |
| 43 | +- **Regression**: `test_retrieval_regression.py` (curated query→expected cases) and |
| 44 | + `test_stability.py` (property-based invariants that survive CPython doc revisions). |
| 45 | +- **Process hygiene**: `test_stdio_smoke.py`, `test_stdio_hygiene.py` — confirm a real |
| 46 | + stdio server starts, answers, and keeps stdout free of non-protocol noise. |
| 47 | +- **Packaging / CI**: `test_packaging.py`, `test_ci_workflows.py` — installable |
| 48 | + artifact + workflow file invariants. |
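The schema-contract idea can be sketched as follows. This is a minimal illustration, not the contents of `test_schema_snapshot.py`: the helper `check_snapshot`, the file naming, and the example schema are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(name: str, schema: dict, snapshot_dir: Path) -> bool:
    """Compare `schema` with the frozen copy; create the file on first run.

    Returns True when the schema matches (or was just recorded), False on drift.
    """
    path = snapshot_dir / f"{name}.json"
    rendered = json.dumps(schema, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(rendered, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == rendered


# Hypothetical input schema for one tool; the real schemas live in the repo.
search_docs_input = {
    "type": "object",
    "properties": {"query": {"type": "string"}, "version": {"type": "string"}},
    "required": ["query"],
}

with tempfile.TemporaryDirectory() as tmp:
    snapshot_dir = Path(tmp)
    first_run = check_snapshot("search_docs.input", search_docs_input, snapshot_dir)
    unchanged = check_snapshot("search_docs.input", search_docs_input, snapshot_dir)

    # Simulate a wire-shape change: `version` becomes required.
    drifted_schema = {**search_docs_input, "required": ["query", "version"]}
    drifted = check_snapshot("search_docs.input", drifted_schema, snapshot_dir)
```

Recording the snapshot automatically on first run is a convenience in this sketch; a CI configuration would more likely fail when the fixture is missing so that a deleted snapshot cannot pass silently.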
| 49 | + |
| 50 | +## 3. What to test, by component type |
| 51 | + |
| 52 | +- **Services** (`services/`): business logic in isolation against a `tmp_path` |
| 53 | + SQLite fixture. Cover the happy path, every error branch (`DocsServerError` |
| 54 | + subclasses), and token-budget trimming. |
| 55 | +- **Retrieval/ranking** (`retrieval/`): query parsing, FTS5 behavior, ranker |
| 56 | + ordering. Use property assertions (`>= 1 result`, substring match) over exact |
| 57 | + content so upstream doc edits don't break the suite. |
| 58 | +- **Ingestion** (`ingestion/`): parse valid + deliberately broken `.fjson` |
| 59 | + fixtures; assert idempotency on re-publish. |
| 60 | +- **Server layer** (`server.py`): thin — it only delegates to services and maps |
| 61 | + `DocsServerError → ToolError`. Cover that mapping via stdio smoke, not unit tests. |
| 62 | +- **Detection** (`detection.py`): pure environment probing. The former coverage gap is closed; see item 1 in section 5 and the reference cases in section 6. |
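To make the "property assertions over exact content" guidance for retrieval concrete, here is a small sketch. `toy_search`, `Hit`, and the two-entry corpus are hypothetical stand-ins; the real suite queries the FTS5-backed index rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    snippet: str
    score: float


def toy_search(query: str, corpus: dict[str, str]) -> list[Hit]:
    """Hypothetical stand-in for the real retrieval call: rank by term frequency."""
    terms = query.lower().split()
    hits = []
    for title, body in corpus.items():
        text = f"{title} {body}".lower()
        score = float(sum(text.count(term) for term in terms))
        if score > 0:
            hits.append(Hit(title=title, snippet=body[:80], score=score))
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


def assert_search_properties(hits: list[Hit], expected_substring: str) -> None:
    """Property-style checks that tolerate upstream wording changes."""
    assert len(hits) >= 1, "expected at least one result"
    assert expected_substring in hits[0].title.lower(), "top hit is off-topic"
    scores = [hit.score for hit in hits]
    assert scores == sorted(scores, reverse=True), "results must be ranked by score"


corpus = {
    "asyncio.gather": "Run awaitable objects concurrently and gather results.",
    "itertools.chain": "Make an iterator that returns elements from each iterable.",
}
hits = toy_search("asyncio gather", corpus)
assert_search_properties(hits, expected_substring="asyncio")
```

None of the assertions quote the documentation body, so rewording a page upstream leaves the test green as long as the right page still ranks first.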
| 63 | + |
| 64 | +## 4. Coverage targets |
| 65 | + |
| 66 | +No line-coverage gate is enforced (no `pytest-cov` in the dev deps). The bar is |
| 67 | +**behavioral**, not numeric: |
| 68 | + |
| 69 | +- Every public tool has ≥1 happy-path + ≥1 error-path test. |
| 70 | +- Every `errors.py` exception type is raised by at least one test. |
| 71 | +- Every wire-facing model is pinned by a schema snapshot. |
| 72 | + |
| 73 | +Adopt these as the definition of done for new tools. A line-coverage gate is |
| 74 | +optional future work; if added, target the `services/` and `retrieval/` |
| 75 | +packages, not `server.py` (intentionally thin) or `__main__.py`. |
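One way to keep the "every exception type is raised by at least one test" rule honest is a small meta-check over the error hierarchy. The sketch below is an assumption about how that could look: only the name `DocsServerError` comes from this document, while the subclasses and the `exercised_in_tests` set are invented for illustration.

```python
class DocsServerError(Exception):
    """Base class; the subclasses below are hypothetical examples."""


class VersionNotFoundError(DocsServerError):
    pass


class PageNotFoundError(DocsServerError):
    pass


class SymbolNotFoundError(PageNotFoundError):
    pass


def all_subclasses(base: type) -> set[type]:
    """Collect every direct and indirect subclass of `base`."""
    found = set()
    for sub in base.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found


def untested_errors(base: type, exercised: set[type]) -> set[str]:
    """Names of exception types that no test is known to raise."""
    return {cls.__name__ for cls in all_subclasses(base) - exercised}


# In a real suite this set would be maintained by hand or collected from
# the tests themselves; here it is hard-coded for illustration.
exercised_in_tests = {VersionNotFoundError, PageNotFoundError}
missing = untested_errors(DocsServerError, exercised_in_tests)
```

A test asserting that `missing` is empty would fail as soon as someone adds a new error type without a matching error-path test.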
| 76 | + |
| 77 | +## 5. Known gaps |
| 78 | + |
| 79 | +1. **`detection.py` — CLOSED (2026-05-29).** `tests/test_detection.py` now |
| 80 | + covers all three branches of the fallback chain (`.python-version` file → |
| 81 | + `python3` in PATH → `sys.version_info`), `_parse_major_minor` parsing, and |
| 82 | + `match_to_indexed` — 12 tests. The isolation pattern (`monkeypatch.chdir` to |
| 83 | + escape a real `.python-version`, `monkeypatch.setattr` on `subprocess.run`) |
| 84 | + is the reference for testing order-dependent environment probing. |
| 85 | +2. **`lookup_package_docs` has no stdio smoke (LOW).** Covered at the service |
| 86 | + layer only; the PyPI-allowlist trust boundary is never exercised end-to-end. |
| 87 | +3. **No negative version-resolution E2E (LOW).** Unknown-version errors are |
| 88 | + unit-tested but not asserted over the stdio boundary. |
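The isolation pattern named in item 1 can be sketched like this. The `detect_python_version` body is a hypothetical stand-in written to follow the fallback chain described above, not the project's implementation; the tests rely only on pytest's built-in `tmp_path` and `monkeypatch` fixtures.

```python
import subprocess
import sys
from pathlib import Path


def detect_python_version() -> tuple[str, str]:
    """Hypothetical stand-in for the fallback chain, not the project's code."""
    version_file = Path.cwd() / ".python-version"
    if version_file.is_file():
        parts = version_file.read_text(encoding="utf-8").strip().split(".")
        if len(parts) >= 2 and parts[0].isdigit() and parts[1].isdigit():
            return f"{parts[0]}.{parts[1]}", ".python-version file"
    try:
        proc = subprocess.run(
            ["python3", "--version"], capture_output=True, text=True, check=True
        )
        words = proc.stdout.split()  # e.g. ["Python", "3.13.2"]
        major, minor = words[1].split(".")[:2]
        return f"{major}.{minor}", "python3 in PATH"
    except (OSError, subprocess.CalledProcessError, IndexError, ValueError):
        pass
    return f"{sys.version_info[0]}.{sys.version_info[1]}", "server runtime"


def test_version_file_wins(tmp_path, monkeypatch):
    (tmp_path / ".python-version").write_text("3.12.4\n", encoding="utf-8")
    monkeypatch.chdir(tmp_path)  # escape any real .python-version in the repo
    assert detect_python_version() == ("3.12", ".python-version file")


def test_falls_back_to_python3(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)  # empty directory: no version file

    def fake_run(cmd, **kwargs):
        return subprocess.CompletedProcess(cmd, 0, stdout="Python 3.13.2\n", stderr="")

    monkeypatch.setattr(subprocess, "run", fake_run)
    assert detect_python_version() == ("3.13", "python3 in PATH")


def test_falls_back_to_runtime(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)

    def missing_binary(cmd, **kwargs):
        raise FileNotFoundError("python3")

    monkeypatch.setattr(subprocess, "run", missing_binary)
    expected = f"{sys.version_info[0]}.{sys.version_info[1]}"
    assert detect_python_version() == (expected, "server runtime")
```

Each test pins the earlier links of the chain before asserting on a later one, which is what keeps order-dependent probing deterministic regardless of the developer's machine.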
| 89 | + |
| 90 | +## 6. Reference cases — `detection.py` (now implemented in `test_detection.py`) |
| 91 | + |
| 92 | +| Case | Expectation | |
| 93 | +|----------------------------------------|------------------------------------------| |
| 94 | +| `.python-version` file present in cwd | returns `(version, ".python-version file")` | |
| 95 | +| `.python-version` malformed / empty | falls through to next source, no crash | |
| 96 | +| no file, `python3` on PATH | returns `(version, "python3 in PATH")` | |
| 97 | +| no file, no `python3` | returns runtime `(X.Y, "server runtime")`| |
| 98 | +| `_parse_major_minor("Python 3.13.2")` | `"3.13"` | |
| 99 | +| `_parse_major_minor("no digits here")` | `None` | |
| 100 | +| `match_to_indexed("3.13", ["3.13"])` | `"3.13"` | |
| 101 | +| `match_to_indexed("3.9", ["3.13"])` | `None` | |