From bd12feaab1ab00ad1c3dca56fac79ec520c4424a Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 16 Jan 2026 13:53:58 +0000 Subject: [PATCH 1/2] docs: add AGENTS.md guidelines for AI coding assistants Add structured guidance for AI assistants working with Mellea: - AGENTS.md for contributors modifying Mellea internals - docs/AGENTS_TEMPLATE.md for downstream projects to copy --- AGENTS.md | 78 +++++++++++++++++ docs/AGENTS_TEMPLATE.md | 183 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 261 insertions(+) create mode 100644 AGENTS.md create mode 100644 docs/AGENTS_TEMPLATE.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..b50bb8c7 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,78 @@ + + +# Agent Guidelines for Mellea Contributors + +> **Which guide?** Modifying `mellea/`, `cli/`, or `test/` → this file. Writing code that imports Mellea → [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md). + +## 1. Quick Reference +```bash +pre-commit install # Required: install git hooks +uv sync # Install deps & fix lockfile +uv run pytest -m "not qualitative" # Fast loop (unit tests only) +uv run pytest # Full suite (includes LLM tests) +uv run pytest -m integration # Tests requiring API keys +uv run ruff format . && uv run ruff check . # Lint & format +``` +**Branches**: `feat/topic`, `fix/issue-id`, `docs/topic` + +## 2. Directory Structure +| Path | Contents | +|------|----------| +| `mellea/stdlib` | Core: Sessions, Genslots, Requirements, Sampling, Context | +| `mellea/backends` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM | +| `mellea/helpers` | Utilities, logging, model ID tables | +| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) | +| `test/` | All tests. Unmarked = unit tests (no network/API keys) | +| `scratchpad/` | Experiments (git-ignored) | + +## 3. 
Test Markers +- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI) +- `@pytest.mark.integration` — Requires API keys +- **Unmarked** — Pure unit tests: no network, deterministic + +⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast. + +## 4. Coding Standards +- **Types required** on all core functions +- **Docstrings are prompts** — be specific, the LLM reads them +- **Google-style docstrings** +- **Ruff** for linting/formatting +- Use `...` in `@generative` function bodies +- Prefer primitives over classes + +## 5. Commits & Hooks +[Angular format](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit): `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `release:` + +Pre-commit runs: ruff, mypy, uv-lock, codespell + +## 6. Timing +> **Don't cancel**: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state. + +## 7. Common Issues +| Problem | Fix | +|---------|-----| +| `ComponentParseError` | Add examples to docstring | +| `uv.lock` out of sync | Run `uv sync` | +| Ollama refused | Run `ollama serve` | + +## 8. Self-Review (before notifying user) +1. `uv run pytest -m "not qualitative"` passes? +2. `ruff format` and `ruff check` clean? +3. New functions typed with concise docstrings? +4. Unit tests added for new functionality? +5. Avoided over-engineering? + +## 9. Writing Tests +- Place tests in `test/` mirroring source structure +- Name files `test_*.py` (required for pydocstyle) +- Use `gh_run` fixture for CI-aware tests (see `conftest.py`) +- **No LLM calls** in unmarked tests—mock or mark `qualitative` +- If a test fails, fix the **code**, not the test (unless the test was wrong) + +## 10. Feedback Loop +Found a bug, workaround, or pattern? 
Update the docs: +- **Issue/workaround?** → Add to Section 7 (Common Issues) in this file +- **Usage pattern?** → Add to [`docs/AGENTS_TEMPLATE.md`](docs/AGENTS_TEMPLATE.md) +- **New pitfall?** → Add warning near relevant section diff --git a/docs/AGENTS_TEMPLATE.md b/docs/AGENTS_TEMPLATE.md new file mode 100644 index 00000000..a1543c1e --- /dev/null +++ b/docs/AGENTS_TEMPLATE.md @@ -0,0 +1,183 @@ + + +# Mellea Usage Guidelines + +> **This file**: For code that *imports* Mellea. For Mellea internals, see [`../AGENTS.md`](../AGENTS.md). + +Copy below into your `AGENTS.md` or system prompt. + +--- + +### Library: Mellea +Use `mellea` for LLM interactions. No direct OpenAI/Anthropic calls or LangChain OutputParsers. + +**Prerequisites**: `pip install mellea` · [Docs](https://mellea.ai) · [Repo](https://github.com/generative-computing/mellea) + +#### 1. The `@generative` Pattern +**Don't** write prompt templates or regex parsers: +```python +# BAD - don't do this +response = openai.chat.completions.create(...) +age = int(re.search(r"\d+", response).group()) +``` +**Do** use typed function signatures: +```python +from mellea import generative, start_session + +@generative +def extract_age(text: str) -> int: + """Extract the user's age from text.""" + ... + +m = start_session() +age = extract_age(m, text="Alice is 30") # Returns int(30) +``` + +#### 2. Complex Types +```python +from pydantic import BaseModel +from mellea import generative + +class UserProfile(BaseModel): + name: str + age: int + interests: list[str] + +@generative +def parse_profile(bio: str) -> UserProfile: ... +``` + +#### 3. 
Chain-of-Thought +Add `reasoning` field to force the LLM to "think" before answering: +```python +from typing import Literal +from pydantic import BaseModel, Field + +class AnalysisResult(BaseModel): + reasoning: str # LLM fills first + conclusion: Literal["approve", "reject"] + confidence: float = Field(ge=0.0, le=1.0) + +@generative +def analyze_document(doc: str) -> AnalysisResult: ... +``` + +#### 4. Control Flow +Use Python `if/for/while`. No graph frameworks needed: +```python +if analyze_sentiment(m, email) == "negative": + draft = draft_apology(m, email) +else: + draft = draft_response(m, email) +``` + +#### 5. Instruct-Validate-Repair +For strict requirements, use `m.instruct()`: +```python +from mellea.stdlib.requirements import req, simple_validate +from mellea.stdlib.sampling import RejectionSamplingStrategy + +email = m.instruct( + "Write an invite for {{name}}", + requirements=[ + req("Must be formal"), + req("Lowercase only", validation_fn=simple_validate(lambda x: x.islower())) + ], + strategy=RejectionSamplingStrategy(loop_budget=3), + user_variables={"name": "Alice"} +) +``` + +#### 6. Small Model Fix +Small models (1B-8B) can't calculate. Extract params with LLM, compute in Python: +```python +from pydantic import BaseModel + +class PhysicsParams(BaseModel): + speed_a: float + speed_b: float + delay_hours: float + +@generative +def extract_params(text: str) -> PhysicsParams: + """EXTRACT numbers only. Do not calculate.""" + ... + +def calculate_gap(p: PhysicsParams) -> float: + return p.speed_a * p.delay_hours +``` + +#### 7. One-Shot Examples +If model struggles, add examples to docstring: +```python +@generative +def identify_fruit(text: str) -> str | None: + """ + Extract fruit from text, or None if none mentioned. + Ex: "I ate an apple" -> "apple" + Ex: "The sky is blue" -> None + """ + ... +``` + +#### 8. 
Backend Config +```python +from mellea import start_session +from mellea.backends.model_options import ModelOption + +m = start_session( + model_id="granite3.3:8b", + model_options={ModelOption.TEMPERATURE: 0.0, ModelOption.MAX_NEW_TOKENS: 500} +) +``` +Options: `TEMPERATURE`, `MAX_NEW_TOKENS`, `SYSTEM_PROMPT`, `SEED`, `TOOLS`, `CONTEXT_WINDOW`, `THINKING`, `STREAM` + +#### 9. Async +```python +@generative +async def extract_age(text: str) -> int: + """Extract age.""" + ... + +result = await extract_age(m, text="Alice is 30") +``` +Session methods: `ainstruct`, `achat`, `aact`, `avalidate`, `aquery`, `atransform` + +#### 10. Auth +- **Ollama**: `start_session()` (no setup) +- **OpenAI**: `export OPENAI_API_KEY="..."` +- **Watsonx**: `export WATSONX_API_KEY="..."`, `WATSONX_URL`, `WATSONX_PROJECT_ID` + +**Never hardcode API keys.** + +#### 11. Anti-Patterns +- **Don't** retry `@generative` calls — Mellea handles retries internally +- **Don't** use `json.loads()` — use typed returns +- **Don't** wrap single functions in classes +- **Do** use `try/except` at app boundaries for network errors + +#### 12. Debugging +```python +from mellea.core import FancyLogger +FancyLogger.get_logger().setLevel("DEBUG") +``` +- `m.last_prompt()` — see exact prompt sent + +#### 13. Common Errors +| Error | Fix | +|-------|-----| +| `ComponentParseError` | LLM output didn't match type—add docstring examples | +| `TypeError: missing positional argument` | First arg must be session `m` | +| `ConnectionRefusedError` | Run `ollama serve` | +| Output wrong/None | Model too small—try larger or add `reasoning` field | + +#### 14. Testing +```bash +uv run pytest -m "not qualitative" # Fast loop +uv run pytest # Full (verify prompts work) +``` + +#### 15. Feedback +Found a workaround or pattern? Add it to Section 13 (Common Errors) above, or update this file with new guidance. 
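The instruct-validate-repair loop in Section 5 is, at its core, bounded rejection sampling. Below is a pure-Python sketch of that control flow with a stubbed generator standing in for the LLM call — `instruct_with_repair` and its parameters are illustrative names, not Mellea's API:

```python
from collections.abc import Callable


def instruct_with_repair(
    generate: Callable[[], str],
    validators: list[Callable[[str], bool]],
    loop_budget: int = 3,
) -> str:
    """Regenerate until every validator passes, or raise when the budget is spent."""
    last = ""
    for _ in range(loop_budget):
        last = generate()  # one LLM sample per loop iteration
        if all(check(last) for check in validators):
            return last
    raise RuntimeError(f"no sample passed validation in {loop_budget} attempts: {last!r}")


# Stubbed "model": the first draft violates the lowercase rule, the second passes.
drafts = iter(["Dear Alice, please join us.", "dear alice, please join us."])
result = instruct_with_repair(lambda: next(drafts), [str.islower])
```

`RejectionSamplingStrategy(loop_budget=3)` plays roughly the role of this loop inside `m.instruct()`.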
From ed490564aa5aebc6f64f9563e64fa1847d810b53 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 16 Jan 2026 14:30:47 +0000 Subject: [PATCH 2/2] docs: fix AGENTS.md to match actual test infrastructure - Use uv sync --all-extras --all-groups (required for tests) - Add ollama serve requirement - Remove non-existent integration marker - Fix test timing expectations (~2 min, not instant) - Remove contradictory unmarked test guidance --- AGENTS.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index b50bb8c7..b7b2950d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -9,10 +9,10 @@ AGENTS.md — Instructions for AI coding assistants (Claude, Cursor, Copilot, Co ## 1. Quick Reference ```bash pre-commit install # Required: install git hooks -uv sync # Install deps & fix lockfile -uv run pytest -m "not qualitative" # Fast loop (unit tests only) -uv run pytest # Full suite (includes LLM tests) -uv run pytest -m integration # Tests requiring API keys +uv sync --all-extras --all-groups # Install all deps (required for tests) +ollama serve # Start Ollama (required for most tests) +uv run pytest -m "not qualitative" # Skips LLM quality tests (~2 min) +uv run pytest # Full suite (includes LLM quality tests) uv run ruff format . && uv run ruff check . # Lint & format ``` **Branches**: `feat/topic`, `fix/issue-id`, `docs/topic` @@ -24,13 +24,12 @@ uv run ruff format . && uv run ruff check . # Lint & format | `mellea/backends` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM | | `mellea/helpers` | Utilities, logging, model ID tables | | `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) | -| `test/` | All tests. Unmarked = unit tests (no network/API keys) | +| `test/` | All tests (run from repo root) | | `scratchpad/` | Experiments (git-ignored) | ## 3. 
Test Markers -- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI) -- `@pytest.mark.integration` — Requires API keys -- **Unmarked** — Pure unit tests: no network, deterministic +- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI via `CICD=1`) +- **Unmarked** — Unit tests (may still require Ollama running locally) ⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast. @@ -67,8 +66,8 @@ Pre-commit runs: ruff, mypy, uv-lock, codespell ## 9. Writing Tests - Place tests in `test/` mirroring source structure - Name files `test_*.py` (required for pydocstyle) -- Use `gh_run` fixture for CI-aware tests (see `conftest.py`) -- **No LLM calls** in unmarked tests—mock or mark `qualitative` +- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`) +- Mark tests checking LLM output quality with `@pytest.mark.qualitative` - If a test fails, fix the **code**, not the test (unless the test was wrong) ## 10. Feedback Loop
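The `CICD=1` skip described in the revised Test Markers section can be wired up with pytest's standard collection hook. This is a hedged sketch of one plausible wiring — the repo's actual `test/conftest.py` may implement it differently:

```python
# Sketch of CI-aware marker handling for a conftest.py: when CICD=1,
# every test carrying @pytest.mark.qualitative is skipped at collection time.
import os

import pytest


def pytest_collection_modifyitems(config, items):
    if os.environ.get("CICD") == "1":
        skip_llm = pytest.mark.skip(reason="qualitative tests are skipped in CI")
        for item in items:
            if "qualitative" in item.keywords:  # marker names appear in item.keywords
                item.add_marker(skip_llm)
```

Running `CICD=1 uv run pytest` with a hook like this collects the full suite but reports qualitative tests as skipped, matching the `uv run pytest -m "not qualitative"` fast loop.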