Commit 97a3a22

chore(docs): update docs
1 parent 986fe5f commit 97a3a22

8 files changed

Lines changed: 294 additions & 69 deletions

AGENTS.md

Lines changed: 159 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,8 +1,11 @@
11
# AGENTS.md — CodeClone (AI Agent Playbook)
22

3-
This document is the **source of truth** for how AI agents should work in this repository.
3+
This document is the **source of truth** for agent operating rules in this repository.
44
It is optimized for **determinism**, **CI stability**, and **reproducible changes**.
55

6+
For architecture, module ownership, and runtime behavior, the **current repository code is the source of truth**.
7+
If AGENTS.md and code diverge, follow code and update AGENTS.md accordingly.
8+
69
> Repository goal: maximize **honesty**, **reproducibility**, **determinism**, and **precision** for real‑world CI
710
> usage.
811
@@ -65,9 +68,7 @@ Key artifacts:
6568
Run these locally before proposing changes:
6669

6770
```bash
68-
uv run ruff check .
69-
uv run mypy .
70-
uv run pytest -q
71+
uv run pre-commit run --all-files
7172
```
7273

7374
If you touched baseline/cache/report contracts, also run the repo’s audit runner (or the scenario script if present).
@@ -266,9 +267,160 @@ Before cutting a release:
266267

267268
---
268269

269-
## 12) Where to put new code
270+
## 12) Repository architecture
271+
272+
Architecture is layered, but grounded in current code (not aspirational diagrams):
273+
274+
- **CLI / orchestration surface** (`codeclone/cli.py`, `codeclone/_cli_*.py`) parses args, resolves runtime mode,
275+
coordinates pipeline calls, and prints UX.
276+
- **Pipeline orchestrator** (`codeclone/pipeline.py`) owns end-to-end flow: bootstrap → discovery → processing →
277+
analysis → report artifacts → gating.
278+
- **Core analysis** (`codeclone/extractor.py`, `codeclone/cfg.py`, `codeclone/normalize.py`, `codeclone/blocks.py`,
279+
`codeclone/grouping.py`, `codeclone/scanner.py`) produces normalized structural facts and clone candidates.
280+
- **Domain/contracts layer** (`codeclone/models.py`, `codeclone/contracts.py`, `codeclone/errors.py`) defines typed
281+
entities and stable enums/constants used across layers.
282+
- **Persistence contracts** (`codeclone/baseline.py`, `codeclone/cache.py`, `codeclone/metrics_baseline.py`) store
283+
trusted comparison state and optimization state.
284+
- **Canonical report + projections** (`codeclone/report/json_contract.py`, `codeclone/report/*.py`) converts analysis
285+
facts to deterministic, contract-shaped outputs.
286+
- **HTML/UI rendering** (`codeclone/html_report.py`, `codeclone/_html_*.py`, `codeclone/templates.py`) renders views
287+
from report/meta facts.
288+
- **Tests-as-spec** (`tests/`) lock behavior, contracts, determinism, and architecture boundaries.
289+
290+
Non-negotiable interpretation:
291+
292+
- Core produces facts; renderers present facts.
293+
- Baseline/cache are persistence contracts, not analysis truth.
294+
- UI/report must not invent gating semantics.
295+
296+
## 13) Module map
297+
298+
Use this map to route changes to the right owner module.
299+
300+
- `codeclone/cli.py` — public CLI entry and control-flow coordinator; add orchestration and top-level UX here; do not
301+
move core analysis logic here.
302+
- `codeclone/_cli_*.py` — CLI support slices (args, config, runtime, summary, reports, baselines, gating); keep them
303+
thin and reusable; do not encode domain semantics that belong to pipeline/core/contracts.
304+
- `codeclone/pipeline.py` — canonical orchestration and data plumbing between scanner/extractor/metrics/report/gating;
305+
change integration flow here; do not move HTML-only presentation logic here.
306+
- `codeclone/extractor.py` — AST extraction, CFG fingerprint input preparation, symbol/declaration collection, and
307+
per-file metrics inputs; change parsing/extraction semantics here; do not couple this module to CLI/report
308+
rendering/baseline logic.
309+
- `codeclone/grouping.py` / `codeclone/blocks.py` / `codeclone/blockhash.py` — clone grouping and block/segment
310+
mechanics; change grouping behavior here; do not mix in CLI/report UX concerns.
311+
- `codeclone/metrics/` — metric computations and dead-code/dependency/health logic; change metric math and thresholds
312+
here; do not make metrics depend on renderer/UI concerns.
313+
- `codeclone/structural_findings.py` — structural finding extraction/normalization policy; keep it report-layer factual
314+
and deterministic.
315+
- `codeclone/suppressions.py` — inline `# noqa: codeclone[...]` parse/bind/index logic; keep it declaration-scoped and
316+
deterministic.
317+
- `codeclone/baseline.py` — baseline schema/trust/integrity/compatibility contract; all baseline format changes go here
318+
with explicit contract process.
319+
- `codeclone/cache.py` — cache schema/integrity/profile compatibility and serialization; cache remains
320+
optimization-only.
321+
- `codeclone/report/json_contract.py` — canonical report schema builder/integrity payload; any JSON contract shape
322+
change belongs here.
323+
- `codeclone/report/*.py` (other modules) — deterministic projections/format transforms (
324+
text/markdown/sarif/derived/findings/suggestions); avoid injecting new analysis heuristics here.
325+
- `codeclone/html_report.py` — HTML presentation layer from report/meta payload; no hidden analysis decisions.
326+
- `codeclone/models.py` — shared typed models crossing modules; keep model changes contract-aware.
327+
- `tests/` — executable specification: architecture rules, contracts, goldens, invariants, regressions.
328+
329+
## 14) Dependency direction
330+
331+
Dependency direction is enforceable and partially test-guarded (`tests/test_architecture.py`):
332+
333+
- `codeclone.report.*` must not import `codeclone.cli`, `codeclone.html_report`, or `codeclone.ui_messages`.
334+
- `codeclone.extractor` must not import `codeclone.report`, `codeclone.cli`, or `codeclone.baseline`.
335+
- `codeclone.grouping` must not import `codeclone.cli`, `codeclone.baseline`, or `codeclone.html_report`.
336+
- `codeclone.baseline` and `codeclone.cache` must not import `codeclone.cli`, `codeclone.ui_messages`, or
337+
`codeclone.html_report`.
338+
- `codeclone.models` may import only `codeclone.contracts` and `codeclone.errors` from local modules.
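
The import rules above lend themselves to a small AST-based check. What follows is a minimal sketch, not the contents of `tests/test_architecture.py`: the helper names `imported_modules` and `forbidden_imports` are hypothetical, relative imports are skipped for brevity, and the source is passed in as a string rather than read from the real package.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the dotted names of modules imported by `source` (absolute imports only)."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # `from pkg import mod` may bind the submodule `pkg.mod`, so record it too.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def forbidden_imports(source: str, forbidden: tuple[str, ...]) -> list[str]:
    """List imports in `source` that equal or sit under a forbidden module."""
    return sorted(
        name
        for name in imported_modules(source)
        if any(name == bad or name.startswith(bad + ".") for bad in forbidden)
    )
```

A test built on this would read a module's source and assert that `forbidden_imports(...)` returns an empty list for that module's forbidden set.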
339+
340+
Operational rules:
341+
342+
- Core/domain code must not depend on HTML/UI.
343+
- Renderers depend on canonical report payload/model; canonical report code must not depend on renderer/UI.
344+
- Metrics/report layers must not recompute or invent core facts in UI.
345+
- CLI helper modules (`_cli_*`) must orchestrate/format, not own domain semantics.
346+
- Persistence semantics (baseline/cache trust/integrity) must stay in persistence/domain modules, not in render/UI
347+
layers.
348+
349+
## 15) Suppression policy
350+
351+
Inline suppressions are explicit local policy, not analysis truth.
352+
353+
- Supported syntax is `# noqa: codeclone[rule-id,...]` via `codeclone/suppressions.py`.
354+
- Binding scope is declaration-only (`def`, `async def`, `class`) using:
355+
- leading comment on the line immediately before declaration
356+
- inline comment on declaration line
357+
- Binding is target-specific (`filepath`, `qualname`, declaration span, kind). No file-wide/global implicit scope.
358+
- Unknown/malformed directives are ignored safely; analysis must not fail because of suppression syntax issues.
359+
- Current active semantic effect is dead-code suppression (`dead-code`) through `extractor.py` →
360+
`DeadCandidate.suppressed_rules` → `metrics/dead_code.py`.
361+
- Suppressed dead-code findings are excluded from active dead-code findings and health impact, but remain observable in
362+
report surfaces where implemented (JSON summary/details, text/markdown/html, CLI counters).
363+
- Suppressions must not silently alter unrelated finding families.
364+
365+
Prefer explicit inline suppressions for runtime/dynamic false positives instead of broad framework heuristics.
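
To illustrate the "ignored safely" rule, here is a minimal sketch of directive parsing. It is an assumption-laden illustration and not the code in `codeclone/suppressions.py`: the regular expression and the function name `parse_rules` are hypothetical, and declaration binding is left out.

```python
import re

# Assumed pattern for `# noqa: codeclone[rule-a,rule-b]`; the real parser may differ.
_DIRECTIVE = re.compile(r"#\s*noqa:\s*codeclone\[([^\]]*)\]")


def parse_rules(comment: str) -> frozenset[str]:
    """Return the rule ids named in a suppression comment, or an empty set.

    Malformed or unrelated comments yield an empty set instead of raising,
    so a bad directive can never make analysis fail.
    """
    match = _DIRECTIVE.search(comment)
    if match is None:
        return frozenset()
    rules = (part.strip() for part in match.group(1).split(","))
    return frozenset(rule for rule in rules if rule)
```

Returning an empty set for malformed input keeps the failure mode quiet and deterministic: the declaration is simply treated as unsuppressed.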
366+
367+
## 16) Change routing
368+
369+
If you change a contract-sensitive zone, route docs/tests/approval deliberately.
370+
371+
| Change zone | Must update docs | Must update tests | Explicit approval required when | Contract-change trigger |
372+
|--------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
373+
| Baseline schema/trust/integrity (`codeclone/baseline.py`) | `docs/book/06-baseline.md`, `docs/book/14-compatibility-and-versioning.md`, `docs/book/appendix/b-schema-layouts.md`, `CHANGELOG.md` | `tests/test_baseline.py`, CI/CLI behavior tests (`tests/test_cli_inprocess.py`, `tests/test_cli_unit.py`) | schema/trust semantics, compatibility windows, payload integrity logic change | baseline key layout/status semantics/compat rules change |
374+
| Cache schema/profile/integrity (`codeclone/cache.py`) | `docs/book/07-cache.md`, `docs/book/appendix/b-schema-layouts.md`, `CHANGELOG.md` | `tests/test_cache.py`, pipeline/CLI cache integration tests | cache schema/status/profile compatibility semantics change | cache payload/version/status semantics change |
375+
| Canonical report JSON shape (`codeclone/report/json_contract.py`, report projections) | `docs/book/08-report.md` (+ `docs/book/10-html-render.md` if rendering contract impacted), `CHANGELOG.md` | `tests/test_report.py`, `tests/test_report_contract_coverage.py`, `tests/test_report_branch_invariants.py`, relevant report-format tests | finding/meta/summary schema changes | stable JSON fields/meaning/order guarantees change |
376+
| CLI flags/help/exit behavior (`codeclone/cli.py`, `_cli_*`, `contracts.py`) | `docs/book/09-cli.md`, `docs/book/03-contracts-exit-codes.md`, `README.md`, `CHANGELOG.md` | `tests/test_cli_unit.py`, `tests/test_cli_inprocess.py`, `tests/test_cli_smoke.py` | exit-code semantics, script-facing behavior, flag contracts change | user-visible CLI contract changes |
377+
| Fingerprint-adjacent analysis (`extractor/cfg/normalize/grouping`) | `docs/book/05-core-pipeline.md`, `docs/cfg.md`, `docs/book/14-compatibility-and-versioning.md`, `CHANGELOG.md` | `tests/test_fingerprint.py`, `tests/test_extractor.py`, `tests/test_cfg.py`, golden tests (`tests/test_detector_golden.py`, `tests/test_golden_v2.py`) | always (see Section 1.6) | clone identity / NEW-vs-KNOWN / fingerprint inputs change |
378+
| Suppression semantics/reporting (`suppressions`, extractor dead-code wiring, report/UI counters) | `docs/book/19-inline-suppressions.md`, `docs/book/16-dead-code-contract.md`, `docs/book/08-report.md`, and interface docs if surfaced (`09-cli`, `10-html-render`) | `tests/test_suppressions.py`, `tests/test_extractor.py`, `tests/test_metrics_modules.py`, `tests/test_pipeline_metrics.py`, report/html/cli tests | declaration scope semantics, rule effect, or contract-visible counters/fields change | suppression changes alter active finding output or contract-visible report payload |
379+
380+
Golden rule: do not “fix” failures by snapshot refresh unless the underlying contract change is intentional, documented,
381+
and approved.
382+
383+
## 17) Testing taxonomy
384+
385+
Treat tests as specification with explicit intent:
386+
387+
- **Unit tests** — module-level behavior and edge conditions (e.g., `tests/test_cfg.py`, `tests/test_normalize.py`,
388+
`tests/test_metrics_modules.py`, `tests/test_suppressions.py`).
389+
- **Contract tests** — baseline/cache/report/CLI public semantics (e.g., `tests/test_baseline.py`,
390+
`tests/test_cache.py`, `tests/test_report_contract_coverage.py`, `tests/test_cli_unit.py`).
391+
- **Golden tests** — snapshot sentinels for stable outputs (`tests/test_detector_golden.py`, `tests/test_golden_v2.py`).
392+
- **Determinism/invariant tests** — ordering, branch-path invariants, and canonical stability (e.g.,
393+
`tests/test_report_branch_invariants.py`, `tests/test_core_branch_coverage.py`).
394+
- **Scenario/regression tests** — multi-step integration and process-level behavior (e.g.,
395+
`tests/test_cli_inprocess.py`, `tests/test_pipeline_process.py`, `tests/test_cli_smoke.py`).
396+
397+
Policy:
398+
399+
- Expand the closest taxonomy bucket when changing behavior.
400+
- If a change touches a public surface, include/adjust contract tests, not only unit tests.
401+
- Goldens validate intended contract shifts; they are not a substitute for reasoning or routing.
402+
403+
## 18) Public vs internal surfaces
404+
405+
### Public / contract-sensitive surfaces
406+
407+
- CLI flags, defaults, exit codes, and stable script-facing messages.
408+
- Baseline schema/trust semantics/integrity compatibility (`2.0` baseline contract family).
409+
- Cache schema/status/profile compatibility/integrity (`CACHE_VERSION` contract family).
410+
- Canonical report JSON schema/payload semantics (`REPORT_SCHEMA_VERSION` contract family).
411+
- Documented finding families/kinds/ids and suppression-facing report fields.
412+
- Metrics baseline schema/compatibility where used by CI/gating.
413+
- Benchmark schema/outputs if consumed as a reproducible contract surface.
414+
415+
### Internal implementation surfaces
416+
417+
- Local helpers and formatting utilities (`_html_*`, many private `_as_*` normalizers, local transformers).
418+
- Internal orchestration decomposition inside `_cli_*` modules.
419+
- Private utility refactors that do not change public payloads, exit semantics, ordering, or trust rules.
420+
421+
If classification is ambiguous, treat it as contract-sensitive and add tests/docs before merging.
270422

271-
## 13) Python language + typing rules (3.10 → 3.14)
423+
## 19) Python language + typing rules (3.10 → 3.14)
272424

273425
These rules are **repo policy**. If you need to violate one, you must explain why in the PR.
274426

@@ -330,8 +482,6 @@ Use modern syntax when it stays compatible with 3.10+:
330482
Prefer these rules:
331483

332484
- **Domain / contracts / enums** live near the domain owner (baseline statuses in baseline domain).
333-
- **Core logic** should not depend on HTML.
334-
- **Render** depends on report model, never the other way around.
335485
- If a module becomes a “god module”, split by:
336486
- model (types)
337487
- io/serialization
@@ -342,7 +492,7 @@ Avoid deep package hierarchies unless they clearly reduce coupling.
342492

343493
---
344494

345-
## 14) Minimal checklist for PRs (agents)
495+
## 20) Minimal checklist for PRs (agents)
346496

347497
- [ ] Change is deterministic.
348498
- [ ] Contracts preserved or versioned.

README.md

Lines changed: 20 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -80,32 +80,6 @@ codeclone . --fail-cycles --fail-dead-code
8080
codeclone . --fail-on-new-metrics
8181
```
8282

83-
### Inline Suppressions For Known FP
84-
85-
Use local declaration-level suppressions when a finding is accepted by design
86-
(for example runtime callbacks invoked by a framework):
87-
88-
```python
89-
# noqa: codeclone[dead-code]
90-
def handle_exception(exc: Exception) -> None:
91-
...
92-
93-
94-
class Middleware: # noqa: codeclone[dead-code]
95-
...
96-
```
97-
98-
Rules:
99-
100-
- supports `def`, `async def`, and `class`
101-
- supports previous-line and end-of-line forms on declaration lines
102-
- requires explicit rule list: `codeclone[...]`
103-
- does not provide file-level/global ignores
104-
- suppressed dead-code candidates are excluded from active findings/health and
105-
reported separately in JSON/TXT/Markdown/HTML
106-
- CLI metrics line shows suppression context, for example:
107-
`Dead code ✔ clean (9 suppressed)`
108-
10983
### Pre-commit
11084

11185
```yaml
@@ -180,8 +154,23 @@ Structural findings include:
180154
- `clone_guard_exit_divergence`
181155
- `clone_cohort_drift`
182156

183-
Dead-code detection is intentionally deterministic and static. Dynamic/runtime false positives are resolved
184-
via explicit inline suppressions, not via broad heuristics or implicit framework-specific guesses.
157+
### Inline Suppressions
158+
159+
CodeClone keeps dead-code detection deterministic and static by default. When a symbol is intentionally
160+
invoked through runtime dynamics (for example framework callbacks, plugin loading, or reflection), suppress
161+
the known false positive explicitly at the declaration site:
162+
163+
```python
164+
# noqa: codeclone[dead-code]
165+
def handle_exception(exc: Exception) -> None:
166+
...
167+
168+
169+
class Middleware: # noqa: codeclone[dead-code]
170+
...
171+
```
172+
173+
Dynamic/runtime false positives are resolved via explicit inline suppressions, not via broad heuristics.
185174

186175
<details>
187176
<summary>JSON report shape (v2.1)</summary>
@@ -263,7 +252,8 @@ via explicit inline suppressions, not via broad heuristics or implicit framework
263252
}
264253
```
265254

266-
Canonical contract: [`docs/book/08-report.md`](docs/book/08-report.md)
255+
Canonical contract: [`docs/book/08-report.md`](docs/book/08-report.md) and [
256+
`docs/book/16-dead-code-contract.md`](docs/book/16-dead-code-contract.md)
267257

268258
</details>
269259

@@ -314,7 +304,7 @@ CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
314304
```
315305

316306
Performance claims are backed by the reproducible benchmark workflow documented
317-
in [docs/book/18-benchmarking.md](docs/book/18-benchmarking.md)
307+
in [docs/book/18-benchmarking.md](docs/book/18-benchmarking.md)
318308

319309
</details>
320310

docs/book/08-report.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,7 @@ Refs:
108108
- `tests/test_report_branch_invariants.py::test_overview_and_sarif_branch_invariants`
109109
- `tests/test_report.py::test_json_includes_clone_guard_exit_divergence_structural_group`
110110
- `tests/test_report.py::test_json_includes_clone_cohort_drift_structural_group`
111+
- `tests/test_report.py::test_report_json_dead_code_suppressed_items_are_reported_separately`
111112

112113
## Non-guarantees
113114

docs/book/09-cli.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,10 @@ Refs:
4444
- Internal errors use `fmt_internal_error` with optional debug details.
4545
- Runtime footer uses explicit wording: `Pipeline done in <seconds>s`.
4646
This metric is CLI pipeline time and does not include external launcher/startup overhead (for example `uv run`).
47+
- Dead-code metric line is stateful and deterministic:
48+
- `N found (M suppressed)` when active dead-code items exist
49+
- `✔ clean` when both active and suppressed are zero
50+
- `✔ clean (M suppressed)` when active is zero but suppressed > 0
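
The three states above can be sketched as a single pure function. The name `dead_code_status` is hypothetical and this is not the CLI's actual formatter. The sketch follows the documented `N found (M suppressed)` form literally, so how the real CLI renders the case of active items with zero suppressions is an assumption here.

```python
def dead_code_status(active: int, suppressed: int) -> str:
    """Format the dead-code status text from active and suppressed counts."""
    if active > 0:
        # Assumption: the suppressed count is always shown alongside active findings.
        return f"{active} found ({suppressed} suppressed)"
    if suppressed > 0:
        return f"✔ clean ({suppressed} suppressed)"
    return "✔ clean"
```

Because the output depends only on the two counts, the same inputs always produce the same line, which matches the "stateful and deterministic" wording.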
4751

4852
Refs:
4953

docs/book/10-html-render.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,8 @@ Refs:
3535
- HTML must not recompute detection semantics; it renders facts from core/report layers.
3636
- Explainability hints shown in UI are sourced from `build_block_group_facts` data.
3737
- Provenance panel mirrors report metadata contract.
38+
- Dead-code UI is a single top-level `Dead Code` tab with deterministic split
39+
sub-tabs: `Active` and `Suppressed`.
3840

3941
Refs:
4042

@@ -47,6 +49,8 @@ Refs:
4749
- All user/content fields are escaped for text/attributes before insertion.
4850
- Missing file snippets render explicit fallback blocks.
4951
- Novelty controls reflect baseline trust split note and per-group novelty flags.
52+
- Suppressed dead-code rows are rendered only from report dead-code suppression
53+
payloads and do not become active dead-code findings in UI tables.
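
As a rough illustration of rendering from pre-split facts, the sketch below builds one escaped HTML list per sub-tab. The payload keys `items` and `suppressed_items` and the function name `render_dead_code_tabs` are hypothetical and are not taken from the report contract or from `codeclone/html_report.py`.

```python
from html import escape


def render_dead_code_tabs(payload: dict[str, list[str]]) -> dict[str, str]:
    """Build one HTML list per sub-tab from a payload that is already split.

    The renderer only reads what the report decided; it never moves a row
    between the active and suppressed groups.
    """
    tabs = {
        "Active": payload.get("items", []),  # hypothetical key
        "Suppressed": payload.get("suppressed_items", []),  # hypothetical key
    }
    return {
        title: "<ul>" + "".join(f"<li>{escape(name)}</li>" for name in names) + "</ul>"
        for title, names in tabs.items()
    }
```

Keeping the split in the report payload means the HTML layer has no classification logic to drift out of sync with the JSON and text outputs.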
5054

5155
Refs:
5256
