From 280e8feefe91360f3597db0aa99e5bdab027d4ad Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 11:12:18 -0600
Subject: [PATCH 01/17] Python SDK beta proposal

---
 py/docs/python_beta_api_proposal.md | 281 +++++++++++++++++++++++
 py/docs/python_beta_sdk_audit.md    |  92 ++++++++
 py/docs/python_beta_sdk_design.md   | 330 ++++++++++++++++++++++++++++
 3 files changed, 703 insertions(+)
 create mode 100644 py/docs/python_beta_api_proposal.md
 create mode 100644 py/docs/python_beta_sdk_audit.md
 create mode 100644 py/docs/python_beta_sdk_design.md

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
new file mode 100644
index 0000000000..8162cfc29b
--- /dev/null
+++ b/py/docs/python_beta_api_proposal.md
@@ -0,0 +1,281 @@
+# Genkit Python — Public API Surface Proposal
+
+This doc defines the public API of the Genkit Python SDK: the set of symbols developers can import, that we commit to keeping stable, and that we document and support. Everything else is internal and can change without notice.
+
+There are two audiences for this SDK:
+
+1. **App developers** — people building AI features with Genkit. They need `Genkit`, decorators, data types, and not much else.
+2. **Plugin authors** — people building model providers, vector stores, telemetry exporters, web framework integrations. They need the action system, schema types, and metadata builders.
+
+These audiences have separate entry points.
+
+---
+
+## Methodology
+
+Every import across 30+ samples, 20+ plugins, and the test suite was audited to understand what symbols people actually use. The symbol lists and usage counts below come from that audit. The documentation audit ([DOCUMENTATION_AUDIT.md](./DOCUMENTATION_AUDIT.md)) independently confirmed the import path confusion — the verification agent used internal paths because no public boundary existed.
+
+---
+
+## Type architecture
+
+Before presenting the symbol lists, it helps to understand why there are multiple types for seemingly the same thing. The SDK has three layers of types, each serving a different purpose.
+
+### Layer 1: Schema types (auto-generated)
+
+These are auto-generated from `genkit-schemas.json`, the shared cross-language schema. They're plain Pydantic `BaseModel` classes — data containers with no convenience methods.
+
+```python
+# genkit/core/typing.py (auto-generated)
+class GenerateResponse(BaseModel):
+    candidates: list[Candidate] | None = None
+    usage: GenerationUsage | None = None
+    request: GenerateRequest | None = None
+    ...
+
+class Message(BaseModel):
+    role: Role
+    content: list[Part]
+    metadata: dict[str, Any] | None = None
+
+class Part(RootModel):
+    root: TextPart | MediaPart | DataPart | ToolRequestPart | ToolResponsePart | ...
+
+class OutputConfig(BaseModel):
+    format: str | None = None
+    schema_: dict[str, Any] | None = None
+    instructions: str | bool | None = None
+    constrained: bool | None = None
+```
+
+Schema types form the wire contract between the framework and plugins: a model plugin receives a `GenerateRequest` and returns a `GenerateResponse`. App developers also construct some of them directly, so they split by audience:
+
+| Audience | Types | Examples |
+|----------|-------|---------|
+| **Plugin authors** (contract types) | Request/response schemas, config types | `GenerateRequest`, `GenerateResponse`, `OutputConfig`, `ModelInfo` |
+| **App developers** (content types) | Things users construct and pass around | `Message`, `Part`, `TextPart`, `Media`, `Document`, `Role` |
+
+Today they all live in `genkit.types` (or `genkit.core.typing` internally), mixed together.
+
+### Layer 2: Veneers (hand-written wrappers)
+
+These extend schema types with convenience methods — `.text`, `.output`, `.tool_requests`. They're what app developers interact with when receiving responses.
+
+```python
+# genkit/blocks/model.py (hand-written)
+class GenerateResponseWrapper(GenerateResponse):
+    @property
+    def text(self) -> str: ...
+    @property
+    def output(self) -> Any: ...
+    @property
+    def tool_requests(self) -> list[ToolRequestPart]: ...
+    @property
+    def messages(self) -> list[MessageWrapper]: ...
+
+class MessageWrapper:  # wraps Message, doesn't extend it
+    def __init__(self, message: Message): ...
+    @property
+    def text(self) -> str: ...
+    @property
+    def tool_requests(self) -> list[ToolRequestPart]: ...
+```
+
+Key distinction:
+- `GenerateResponseWrapper` **extends** `GenerateResponse` (inheritance). Aliasing it as `GenerateResponse` publicly is safe — construction is compatible.
+- `MessageWrapper` **wraps** `Message` (composition). Its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`.
+
+Veneers are for app developers receiving responses. Plugin authors constructing responses use the schema types directly.
+
+### Layer 3: Config helpers (hand-written)
+
+Type-carrying wrappers for configuration:
+
+```python
+# genkit/blocks/interfaces.py (hand-written)
+class Input(Generic[T]):
+    """Carries type info for input validation."""
+    def __init__(self, schema: type[T]): ...
+
+class Output(Generic[T]):
+    """Carries type info for output parsing."""
+    def __init__(self, schema: type[T], format: str = "json", ...): ...
+```
+
+`Input[T]` and `Output[T]` exist so that `generate()` and `prompt()` can carry generic type information — `ai.generate(output=Output(MyModel))` returns `GenerateResponse[MyModel]` with typed `.output`.
+
+### Where the layers bleed
+
+1. **App developers construct schema types directly.** `Message(role="user", content=[Part(text="hello")])` is a schema type, not a veneer. Content-building types are schema types used by both audiences.
+
+2. **Schema and config types overlap.** `OutputConfig` (schema, Layer 1) and `Output[T]` (config helper, Layer 3) configure the same thing. `generate()` accepts both: `output: OutputConfig | OutputConfigDict | Output[Any] | None`. (This is addressed in [python_beta_sdk_design.md, section 5](./python_beta_sdk_design.md).)
+
+3. **Veneers are exported under internal names.** The "Wrapper" suffix on `GenerateResponseWrapper` is an implementation detail that leaked into the public API.
+
+4. **`genkit.types` mixes audiences.** Plugin contract types and app developer types sit in the same module.
+
+The proposal below addresses all four of these problems.
+
+---
+
+## Entry point 1: `from genkit import ...`
+
+The framework entry point for app developers. Veneers, context, and errors. Data types live in `genkit.types` (see below) — this follows the Go SDK pattern where types are a separate package, and matches what Python samples already do in practice.
+
+```python
+from genkit import (
+    Genkit,
+    ActionRunContext,
+    GenerateResponse,     # veneer — aliased from GenerateResponseWrapper
+    GenkitError,
+    UserFacingError,
+)
+```
+
+**5 symbols.** Tight, intentional, hard to get wrong.
+
+| Symbol | Why it's here |
+|--------|--------------|
+| `Genkit` | The entry point. Every app starts with this. (48 files) |
+| `ActionRunContext` | Context object inside flows and tools. (20 files) |
+| `GenerateResponse` | Return type of `ai.generate()` — veneer with `.text`, `.output`, `.tool_requests`. |
+| `GenkitError` | Base error class for catching framework errors. |
+| `UserFacingError` | Errors safe to surface to HTTP clients. |
+
+### Veneer aliasing
+
+Users should never see "Wrapper" suffixes. The fix:
+
+```python
+# genkit/__init__.py
+from genkit.blocks.model import GenerateResponseWrapper as GenerateResponse
+from genkit.blocks.model import GenerateResponseChunkWrapper as GenerateResponseChunk
+```
+
+`GenerateResponseWrapper` uses inheritance, so this alias is safe. `GenerateResponseChunkWrapper` follows the same pattern.
+
+**`MessageWrapper` is the exception.** It uses composition — its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`. So `Message` remains the schema type everywhere. Users interact with `MessageWrapper` via `response.messages` but never construct it directly.
+
+### `ExecutablePrompt` — should it be public?
+
+`ExecutablePrompt` is the class returned by `ai.prompt()`. Today it's not exported — users can't type-annotate a variable that holds a prompt reference.
+
+```python
+# Today: no way to annotate this
+my_prompt = ai.prompt("greeting")
+
+# Proposed: export as Prompt
+from genkit import Prompt
+my_prompt: Prompt = ai.prompt("greeting")
+```
+
+Recommendation: export it as `Prompt`. It's a core concept, and being unable to type-annotate it is a gap. This would bring the top-level surface to 6 symbols.
+
+### What was removed
+
+| Symbol | Reason |
+|--------|--------|
+| `tool_response` | Only 3 sample usages. JS/Go use a method on the tool instance. |
+| `Plugin` | Users pass plugin instances (`GoogleAI()`), never reference the type. Moved to `genkit.plugin`. |
+| `get_logger` | Thin wrapper around `logging.getLogger("genkit")`. Use the stdlib. |
+| `GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions` | Internal implementation types. |
+
+### `ToolRunContext` placement
+
+`ToolRunContext` extends `ActionRunContext` with tool-specific features. Both types are kept (for documentation clarity, future-proofing, and runtime `isinstance` checks), but only `ActionRunContext` is exported from the top level. `ToolRunContext` is available from `genkit.types` for type annotations when needed.
+
+---
+
+## Entry point 2: `from genkit.types import ...`
+
+Content types that app developers construct and pass around — the schema types (Layer 1) that users interact with directly.
+
+```python
+from genkit.types import (
+    # Content
+    Part, TextPart, MediaPart, Media,
+    DataPart, ToolRequestPart, ToolResponsePart, CustomPart,
+
+    # Messages
+    Message, Role,
+
+    # Documents
+    Document, DocumentData,
+
+    # Context
+    ToolRunContext,
+
+    # Evaluation
+    BaseEvalDataPoint,
+
+    # Tool control
+    ToolChoice,
+
+    # Generation config
+    GenerationCommonConfig,
+)
+```
+
+This module is focused: things app developers construct and pass to Genkit methods. Plugin contract types (`GenerateRequest`, `OutputConfig`, `ModelInfo`) move to `genkit.plugin` because they don't belong in the app developer's import path.
+
+### No re-exports at the top level
+
+`from genkit import Part, Message` does **not** work. Content types live in `genkit.types` only. This keeps the top-level surface tight and makes it unambiguous where types come from. Samples already use `from genkit.types import ...` — this formalizes the existing pattern.
+
+---
+
+## Entry point 3: `from genkit.plugin import ...`
+
+Everything a plugin author needs to implement a model provider, retriever, embedder, evaluator, or web framework integration.
+
+```python
+from genkit.plugin import (
+    # Base class
+    Plugin,
+
+    # Action system
+    Action, ActionRunContext,
+
+    # Schema types (wire format — what plugins receive and return)
+    GenerateRequest, GenerateResponse, GenerateResponseChunk,
+    Message, OutputConfig, ModelInfo, Supports,
+
+    # Request/response types for other action types
+    RetrieverRequest, RetrieverResponse,
+    EmbedRequest, EmbedResponse,
+
+    # Metadata builders
+    model_metadata, retriever_metadata, embedder_metadata,
+
+    # Telemetry
+    TelemetryConfig,
+)
+```
+
+Note: `GenerateResponse` here is the **schema type** (auto-generated, no convenience methods). In `from genkit import GenerateResponse`, it's the **veneer** (with `.text`, `.output`, etc.). Different classes, different modules, different audiences. The two can share a name because each audience imports from its own entry point; code that needs both in one file has to alias one on import (`import ... as ...`).
+
+### Cross-language comparison
+
+| Language | App developer imports | Plugin author imports |
+|----------|----------------------|---------------------|
+| **JS** | `import { genkit, z } from 'genkit'` (unified) | Same package |
+| **Go** | `import "github.com/firebase/genkit/go/ai"` (types separate) | Same package, different types |
+| **Python (proposed)** | `from genkit import Genkit` + `from genkit.types import Part, Message` | `from genkit.plugin import GenerateRequest, Plugin` |
+
+Python follows the Go pattern — types are a separate import. This matches what samples already do in practice.
+
+---
+
+## Internal modules
+
+Everything under `genkit._core`, `genkit._blocks`, and `genkit._ai` (note underscore prefix) carries no stability guarantee. Today these modules lack the underscore (`genkit.core`, `genkit.blocks`, `genkit.ai`), which is why samples and the documentation agent used internal paths. Renaming them is part of this proposal — the underscore is Python's convention for "private, use at your own risk."
+
+---
+
+## Open design question: `Input[T]` / `Output[T]`
+
+This is the one genuinely open question in the public API surface. It's covered in depth as a design decision in [python_beta_sdk_design.md, section 5](./python_beta_sdk_design.md).
+
+Summary: `Output[T]` carries generic type information for typed responses (`ai.generate(output=Output(MyModel))` → `GenerateResponse[MyModel]`). The alternative is inline kwargs (`output_schema=MyModel`), which loses the generic typing. A tech lead challenged the naming — "Input of what? Output of what?" — arguing the names are too generic.
+
+Three options: inline only, wrapper only, or keep both. The design review recommends inline only (flat `output_schema=` kwargs, with `@overload` carrying the generic type), which removes `Output[T]` and makes the naming question moot.
diff --git a/py/docs/python_beta_sdk_audit.md b/py/docs/python_beta_sdk_audit.md
new file mode 100644
index 0000000000..b29f37bf94
--- /dev/null
+++ b/py/docs/python_beta_sdk_audit.md
@@ -0,0 +1,92 @@
+# API Audit: Resolved Decisions
+
+The documentation audit surfaced 10 API issues. The items below had clear Pythonic answers and are resolved. The remaining issues (streaming, public API surface, output configuration, async support, method signatures, and class structure) are open design questions covered in [python_beta_sdk_design.md](./python_beta_sdk_design.md).
+
+---
+
+## Keyword-only arguments
+
+All public methods currently accept positional arguments. Nothing prevents `ai.generate("gemini", "Hi", None, None, ["search"])` — five positional args where the middle three are just filling slots. This is the single most common source of fragile call sites.
+
+**Decision:** Every public method gets a `*` marker, placed directly after `self` or after the single positional argument where one is allowed (e.g., `input` on prompt `__call__`). Everything else is keyword-only.
+
+```python
+# Before — positional abuse possible
+ai.generate("gemini", "Hi", None, None, ["search"])
+
+# After — every argument named
+ai.generate(model="gemini", prompt="Hi", tools=["search"])
+```
+
+This is a common convention in modern Python APIs. The OpenAI and Anthropic SDKs, among others, make parameters keyword-only on methods with more than 2-3 parameters.
+
+## Decorator shorthands
+
+Already implemented. `@ai.tool()`, `@ai.flow()` exist alongside imperative `define_*` methods. App developers use decorators; plugin authors use the imperative API. This also resolved the handler signature discoverability issue — decorators make expected signatures clear through type hints, while the imperative `define_*` methods accept generic callables with no signature guidance.
+
+## Part constructor
+
+Runtime testing revealed that `Part(text="hello")` works via Pydantic's union parsing — `Part` is a `RootModel[Union[TextPart, MediaPart, ...]]` and Pydantic resolves the correct variant from keyword arguments. The verbose `Part(root=TextPart(text="hello"))` form also works but adds no value. Samples use the verbose form.
+
+**Decision:** Bless the shorthand as the documented pattern. Both forms produce identical objects.
+
+```python
+Part(text="hello")                                              # blessed
+Part(media=Media(url="https://...", content_type="image/png"))  # blessed
+```
+
+## RetrieverResponse iterability
+
+`RetrieverResponse` has a `.documents` field but doesn't implement Python's sequence protocol. The audit found 9 occurrences of code trying to iterate over the response directly — the most common single error pattern.
+
+**Decision:** Implement `__iter__`, `__len__`, `__getitem__` delegating to `self.documents`.
+
+```python
+# Before: must go through .documents
+for doc in response.documents: ...
+
+# After: the response itself is iterable, sized, and indexable
+response = await ai.retrieve(retriever=my_retriever, query=query)
+for doc in response:
+    print(doc.text)
+len(response)    # number of documents
+response[0]      # first document
+```
+
+This follows the Python convention that collection-like objects should implement the sequence protocol. `RetrieverResponse` is conceptually a list of documents with metadata — it should behave like one.
+
+## response.media property
+
+JS has `response.media` for image generation responses. The audit found 5 occurrences of code using this property — all runtime errors in Python. Users currently have to navigate `response.message.content[0].media`.
+
+**Decision:** Add a `response.media` convenience property on `GenerateResponseWrapper`.
+
+```python
+response = await ai.generate(model="googleai/imagen3", prompt="a cat")
+image = response.media  # Media | None
+```
+
+## Veneer naming
+
+The SDK has auto-generated schema types (`GenerateResponse` from `genkit-schemas.json`) and hand-written wrappers that add convenience methods (`GenerateResponseWrapper`). Users interact with the wrapper but see the "Wrapper" suffix in type hints and docs.
+
+**Decision:** Alias the wrapper under the clean name at the public surface: `GenerateResponseWrapper` is exported as `GenerateResponse` from the top-level `genkit` package. The auto-generated schema type remains available as `GenerateResponse` in `genkit.plugin` for plugin authors. See [TYPE_LAYERS.md](./TYPE_LAYERS.md) for the full type architecture.
+
+## Type consolidation
+
+Two nearly identical types existed: `BaseDataPoint` (generic) and `BaseEvalDataPoint` (evaluator-specific). The audit found samples using them interchangeably.
+
+**Decision:** Merge into `BaseEvalDataPoint`. Remove `BaseDataPoint` from the public API.
+
+## Public API cleanup
+
+Several symbols were in the public `__all__` that don't belong:
+
+- **`tool_response`** — only 3 sample usages. JS and Go use a method on the tool instance. Removed.
+- **`dump_dict` / `dump_json`** — internal serialization utilities. Removed.
+- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Python developers know the stdlib. Removed.
+- **`GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions`** — internal implementation types. Removed.
+
+## Evaluator API
+
+The evaluator API (`GenkitMetricType`, `MetricConfig`, `PluginOptions`) has its own design issues — the audit found the API shape diverges significantly from what the naming suggests. Not addressed in this review; flagged for separate follow-up.
diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
new file mode 100644
index 0000000000..96eb8368b7
--- /dev/null
+++ b/py/docs/python_beta_sdk_design.md
@@ -0,0 +1,330 @@
+# Genkit Python SDK — API Design Review
+
+## 1. Background
+
+The Python SDK launched to match JS and Go feature timelines. It achieved feature parity, but the API surface was never designed independently. Patterns were ported from JS rather than designed natively for Python.
+
+The Python SDK is public but hasn't cut a stable release. The JS SDK went through a similar cleanup between v0.5 and v1.0, and the migration cost grew with each release. Python is earlier in that curve and changes are still cheap.
+
+This doc lays out guiding principles for the API so that new framework features are added in a consistent, standardized way.
+
+## 2. Principles
+
+### Pythonic API conventions
+
+The JS SDK uses options objects, camelCase, and callback patterns. "Pythonic" means a set of concrete conventions:
+
+**Zero-to-one positional arguments.** Every public method allows at most one positional arg — the "obvious" one (e.g., `input` for a prompt call). Everything else is keyword-only via the `*` marker. This prevents positional abuse and makes call sites self-documenting:
+
+```python
+# Bad: what are these arguments?
+ai.generate("gemini", "Hi", None, None, ["search"])
+
+# Good: every argument is named
+ai.generate(model="gemini", prompt="Hi", tools=["search"])
+```
+
+**Kwargs over options dicts.** JS groups parameters into an options object. Python has first-class keyword arguments. Dict-based configuration loses autocomplete, type checking, and discoverability. This applies to `generate()`, `prompt()`, and every public method.
+
+**One way to configure each behavior.** PEP 20: "There should be one — and preferably only one — obvious way to do it." If output format can be set via `output=Output(...)`, it shouldn't also be settable via `output_format=`, `output_content_type=`, and `output_constrained=`. Multiple paths to the same result create confusion in docs, samples, and LLM-generated code.
+
+**Types should help, not hinder.** The SDK uses Pydantic models, generics, and type hints extensively. These should earn their keep: autocomplete that works, return types that carry meaning (`GenerateResponse[MyModel]` so `.output` is typed), and import paths that make sense. When a developer has to choose between `OutputConfig`, `OutputConfigDict`, and `Output[T]` to configure the same behavior, the type system is creating friction, not reducing it.
+
+**Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. We define three public entry points (`genkit`, `genkit.types`, `genkit.plugin`) and treat everything else as internal with no stability guarantee. Internal modules should be underscore-prefixed (`genkit._core`, `genkit._blocks`) to signal this — today they lack the underscore, which is why samples accidentally depend on them. The mechanics of this boundary are covered in section 4.
+
+## 3. Initial Audit
+
+While working on updated docs, we identified several friction points in the developer experience.
+
+For many of these friction points, there was a clear Pythonic standard to follow: keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [python_beta_sdk_audit.md](./python_beta_sdk_audit.md)
+
+The remaining sections in this doc are open questions that need some discussion to resolve.
+
+## 4. Public API surface & type architecture
+
+Today there is no formal public/internal boundary. The documentation audit found samples importing from `genkit.core.action`, `genkit.blocks.model`, and `genkit.ai` — all internal paths that happen to work. This means any internal module rename or refactor is a breaking change for external developers, even if the public API hasn't changed. App developers and plugin authors share a single `genkit.types` module, which means app developers are exposed to plugin contract types they'll never use — and plugin authors have to sift through content types to find the schema types they need. Wrapper classes are exported under internal names like `GenerateResponseWrapper`, so the implementation detail of "this is a wrapper around an auto-generated type" leaks into every type hint and docstring.
+
+We propose formalizing three entry points, separated by audience:
+
+- **`from genkit import ...`** — App developers. 5-6 symbols: `Genkit`, `ActionRunContext`, `GenerateResponse` (veneer), `GenkitError`, `UserFacingError`,  `Prompt`.
+- **`from genkit.types import ...`** — App developers (data types). Content types: `Part`, `Message`, `Document`, `Role`, `ToolChoice`, `GenerationCommonConfig`, etc.
+- **`from genkit.plugin import ...`** — Plugin authors. Plugin contract: `Plugin`, `GenerateRequest`, `GenerateResponse` (schema), `OutputConfig`, `ModelInfo`, metadata builders, etc.
+
+Internal modules (`genkit.core`, `genkit.blocks`, `genkit.ai`) would be renamed with underscore prefixes (`genkit._core`, `genkit._blocks`, `genkit._ai`) to signal "private, no stability guarantee", the standard Python convention.
+
+The full proposal, including the type architecture (auto-generated schema types vs hand-written veneers vs config helpers), symbol lists, rationale for each inclusion/exclusion, and the `MessageWrapper` aliasing problem, is in [python_beta_api_proposal.md](./python_beta_api_proposal.md).
+
+## 5. Output configuration
+
+The `generate()` method currently accepts output configuration through five different parameters, one of which takes three different types:
+
+```python
+# Way 1-4: Inline kwargs
+await ai.generate(prompt="...", output_format="json", output_content_type="application/json",
+                  output_instructions="Return valid JSON", output_constrained=True)
+
+# Way 5a: Wire-format type (leaks through the type union)
+await ai.generate(prompt="...", output=OutputConfig(format="json", schema=my_schema))
+
+# Way 5b: TypedDict
+await ai.generate(prompt="...", output={"format": "json", "schema_": my_schema})
+
+# Way 5c: Config helper with generics
+await ai.generate(prompt="...", output=Output(schema=MyModel))
+```
+
+Four inline params, plus an `output` param that accepts three different types. This directly violates "one obvious way to do it", and it shows up in practice: because of this ambiguity, the documentation agent used different approaches in different files while updating the docs.
+
+**Flat kwargs vs. wrapper object.** We considered both approaches:
+
+A **wrapper object** (`output=OutputConfig(Recipe)` or `output=Recipe`) bundles the schema with secondary options (format, constrained, instructions) into one param, reducing `generate()`'s parameter count. But it introduces a new type developers have to learn and import, and creates a naming problem — `Output[T]` is too generic ("Output of what?"), and the obvious alternative `OutputConfig[T]` collides with the existing wire-format `OutputConfig` used by plugin authors.
+
+**Flat kwargs** (`output_schema=Recipe`) is the more Pythonic approach. Python functions embrace explicit parameters with defaults: `requests.get()` accepts more than a dozen keyword arguments, `json.dumps()` has nine keyword-only parameters, and `subprocess.run()` documents more than a dozen. No config objects. The secondary output params (`output_format`, `output_constrained`, `output_content_type`, `output_instructions`) stay as kwargs with sensible defaults: `output_format` auto-defaults to `'json'` when a schema is set, while the rest default to `None` and are rarely used. The common case is just:
+
+```python
+response = await ai.generate(prompt="...", output_schema=Recipe)
+response.output.name  # typed as str — IDE autocomplete works
+```
+
+No new types, no imports beyond the Pydantic model, and the 95% case is one kwarg.
+
+**Recommendation.** Flat kwargs. Remove the `output` param (which currently accepts `OutputConfig | OutputConfigDict | Output[T]`). Keep `output_schema`, `output_format`, `output_constrained`, `output_content_type`, and `output_instructions` as individual keyword-only params. Use `@overload` so `output_schema: type[T]` parameterizes the return type. Remove `Output[T]`, `OutputConfigDict`, and the wire-format `OutputConfig` from the app developer's surface entirely. The wire-format `OutputConfig` remains an internal/plugin type — plugin authors use it when implementing model plugins to read output configuration from the `GenerateRequest`, but app developers never see it.
+
+`output_schema` accepts three forms: a Pydantic model class (`type[T]`, the common case, which gives typed returns), a raw JSON schema dict (`dict`, for dynamic schemas, returns `Any`), or a registered schema name (`str`, looked up from the registry at runtime, returns `Any`). Only the Pydantic class form carries the generic type.
+
+**The same applies to `input_schema`.** The current `Input[T]` wrapper exists for the same (incorrect) reason as `Output[T]`. It's used on `define_prompt()` and `ai.prompt()` to type the prompt's input parameter. With flat kwargs and overloads, `input_schema: type[T]` carries the generic directly — `Input[T]` can be removed:
+
+```python
+prompt = ai.define_prompt(
+    name='recipe',
+    input_schema=RecipeInput,
+    output_schema=Recipe,
+    prompt='Make a recipe for {{dish}}',  # Handlebars-style placeholder, as in .prompt files
+)
+
+response = await prompt(RecipeInput(dish='pizza'))
+response.output.name  # typed as str — IDE knows this is Recipe
+```
+
+`Input[T]` and `Output[T]` are both removed from the public API — they no longer need to exist. `input_schema` and `output_schema` already exist as params today; they just need the `type[T]` overloads to carry the generic. That's three types eliminated from the app developer surface (`Input[T]`, `Output[T]`, `OutputConfigDict`) and zero new types introduced.
+
+**The Dotprompt typing gap.** This decision also simplifies the Dotprompt story. When a schema is defined in a `.prompt` file's YAML frontmatter (`output: { schema: Recipe }`), the SDK uses it to constrain the model's JSON output at runtime. But the type checker doesn't know this — `.prompt` files can't carry Python type references — so `response.output` is `Any`. To get typed output, pass the schema at the call site:
+
+```python
+# Without output_schema — runtime parsing works, but typing is Any
+recipe = ai.prompt('recipe')
+response = await recipe({'food': 'pizza'})
+response.output  # Any — no autocomplete
+
+# With output_schema — typed
+recipe = ai.prompt('recipe', output_schema=Recipe)
+response = await recipe({'food': 'pizza'})
+response.output.name  # str — IDE knows this is Recipe
+```
+
+This is inherent to Python's static type system. The redundancy (schema in both `.prompt` and Python) is the cost of typed output, and it's a cost every framework with external schema files pays. The flat kwarg `output_schema=Recipe` keeps this as lightweight as possible — no wrapper type needed, just name the class.
+
+## 6. Streaming API
+
+The SDK currently has two streaming patterns:
+
+```python
+# generate_stream() — returns a tuple
+stream, future = ai.generate_stream(prompt="Tell me a story")
+async for chunk in stream:
+    print(chunk.text, end="")
+response = await future
+
+# prompt.stream() — returns an object with .stream accessor
+result = prompt.stream({"topic": "AI"})
+async for chunk in result.stream:
+    print(chunk.text, end="")
+response = await result.response
+```
+
+The Python standard for streaming is iterators: the OpenAI and Anthropic SDKs, along with most other major Python SDKs, use them.
+
+Genkit can't use a plain async generator (the OpenAI pattern) because Genkit responses carry more than text — structured output parsing, usage statistics, tool request handling, and the assembled `Message` for multi-turn conversations. A plain generator can't expose a final response object after iteration.
+
+**Proposed streaming syntax:**
+
+```python
+# Simple case — looks identical to a plain generator
+async for chunk in ai.generate_stream(prompt="Tell me a story"):
+    print(chunk.text, end="")
+
+# When you need the final response — assign, iterate, then access
+result = ai.generate_stream(prompt="Tell me a story")
+async for chunk in result:
+    print(chunk.text, end="")
+response = await result.response  # structured output, usage stats, tool requests
+```
+
+**What changes:**
+- `generate_stream()` returns a directly iterable object (implements `__aiter__`) with a `.response` property for the final assembled response
+- `prompt.stream()` uses the same pattern — one streaming convention across the SDK
+- The tuple return is removed — no more destructuring
+
+## 7. Sync and async support
+
+Every Genkit Python method is `async def`. There is no sync API. This is a Python-specific problem: JS is inherently async, and Go handles concurrency transparently with goroutines. Of the three SDK languages, Python is the only one where the developer has to explicitly choose.
+
+The practical consequences: a Flask route handler can't call `ai.generate()` without managing an event loop. A Jupyter notebook cell needs `await` or `nest_asyncio` workarounds. A CLI script requires wrapping everything in `async def main()` and `ai.run_main()`. These are the most common entry points for developers trying Genkit for the first time.
+
+For context, every major Python LLM SDK offers both sync and async: OpenAI and Anthropic ship dual clients (`OpenAI` / `AsyncOpenAI`), LangChain has dual methods (`.invoke()` / `.ainvoke()`), Google Cloud AI uses a separate async transport. Even Hugging Face, which is sync-only, made a deliberate choice. Genkit is the only async-only SDK in the ecosystem. A developer coming from OpenAI's `client.chat.completions.create()` — no `await`, no `async def` — hits immediate friction.
+
+**Proposal.** Dual clients — `Genkit` (sync) and `AsyncGenkit` (async). This is the industry standard: OpenAI, Anthropic, and Cohere all ship it. The async client holds the real implementation; the sync client delegates to it. We prefer the dual-client pattern (separate classes) over dual methods (`generate()` / `agenerate()` on the same class, à la LangChain) because it keeps each class's type signatures clean — every method on `Genkit` returns `T`, every method on `AsyncGenkit` returns `Awaitable[T]` — and avoids polluting autocomplete with `a`-prefixed duplicates of every method.
+
+```python
+from genkit import Genkit, AsyncGenkit
+
+# Sync (scripts, Flask, notebooks)
+ai = Genkit(plugins=[GoogleAI()])
+response = ai.generate(model="googleai/gemini-2.0-flash", prompt="Hi")
+
+# Async (FastAPI, high-concurrency)
+ai = AsyncGenkit(plugins=[GoogleAI()])
+response = await ai.generate(model="googleai/gemini-2.0-flash", prompt="Hi")
+```
+
+The maintenance cost is manageable: the sync client is auto-generated from the async client's method signatures, as OpenAI and Anthropic do. No duplicate implementation, no diverging logic.
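+
+One way the sync client could delegate to the async one is sketched here, assuming a private event loop on a background thread. `AsyncClient` and `SyncClient` are hypothetical names used only for illustration, and this is not a description of how the OpenAI or Anthropic SDKs are built.
+
+```python
+import asyncio
+import threading
+from collections.abc import Coroutine
+from typing import Any, TypeVar
+
+T = TypeVar("T")
+
+
+class AsyncClient:
+    """Hypothetical async client that holds the real implementation."""
+
+    async def generate(self, *, prompt: str) -> str:
+        await asyncio.sleep(0)  # stands in for a network call
+        return f"echo: {prompt}"
+
+
+class SyncClient:
+    """Hypothetical sync facade that runs the async client on its own loop thread."""
+
+    def __init__(self) -> None:
+        self._async = AsyncClient()
+        self._loop = asyncio.new_event_loop()
+        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
+        self._thread.start()
+
+    def _run(self, coro: Coroutine[Any, Any, T]) -> T:
+        # Works even when the calling thread already runs a loop (for example a notebook).
+        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
+
+    def generate(self, *, prompt: str) -> str:
+        return self._run(self._async.generate(prompt=prompt))
+
+    def close(self) -> None:
+        # Stop the background loop, wait for its thread, then release the loop.
+        self._loop.call_soon_threadsafe(self._loop.stop)
+        self._thread.join()
+        self._loop.close()
+```
+
+Each sync method is a one-line wrapper, so the logic lives only in the async class. The trade-off is one extra thread per sync client.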
+
+## 8. Method signatures
+
+The current `generate()` signature has 20 parameters:
+
+```python
+async def generate(
+    self,
+    model=None, prompt=None, system=None, messages=None,
+    tools=None, return_tool_requests=None, tool_choice=None,
+    tool_responses=None, config=None, max_turns=None,
+    on_chunk=None, context=None,
+    output_format=None, output_content_type=None,
+    output_instructions=None, output_constrained=None,
+    output=None, use=None, docs=None,
+) -> GenerateResponseWrapper[Any]:
+```
+
+None are keyword-only. Several don't belong.
+
+**What changes:**
+
+- **Add `*`** (section 3) — all params keyword-only.
+- **Keep output as flat kwargs** (section 5) — `output_schema`, `output_format`, `output_constrained` stay as individual kwargs with defaults. Remove the `output` param that accepted `OutputConfig | OutputConfigDict | Output[T]`. Net: same param count for output, but one way to configure instead of five.
+- **Remove `on_chunk`** — `generate()` has a streaming callback parameter, but streaming belongs on `generate_stream()`.
+- **Move `tool_responses` to `resume`** — only used when resuming from a tool interrupt. JS already groups this under a `resume` options object.
+
+**After cleanup:**
+
+```python
+async def generate(
+    self,
+    *,
+    model: str | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    config: GenerationCommonConfig | dict | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT] | None = None,
+    output_format: str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[DocumentData] | None = None,
+) -> GenerateResponse[OutputT]:
+```
+
+**`prompt.__call__()` also changes.** Today it takes a JS-style opts dict:
+
+```python
+# Before — opts dict, no autocomplete
+response = await my_prompt({"name": "Ted"}, opts={"config": {"temperature": 0.4}})
+
+# After — kwargs, full IDE support
+response = await my_prompt({"name": "Ted"}, config={"temperature": 0.4})
+```
+
+The `opts` dict (a TypedDict with 16 fields) is replaced with individual kwargs:
+
+```python
+async def __call__(
+    self,
+    input: InputT | None = None,
+    *,
+    model: str | None = None,
+    config: GenerationCommonConfig | dict[str, Any] | None = None,
+    messages: list[Message] | None = None,
+    docs: list[DocumentData] | None = None,
+    tools: list[str] | None = None,
+    tool_choice: ToolChoice | None = None,
+    output_schema: type | dict[str, Any] | None = None,
+    output_format: str | None = None,
+    output_constrained: bool | None = None,
+    return_tool_requests: bool | None = None,
+    max_turns: int | None = None,
+    use: list[ModelMiddleware] | None = None,
+    context: dict[str, Any] | None = None,
+) -> GenerateResponse[OutputT]:
+```
+
+`input` stays as the one positional arg (the template variables). Everything else is keyword-only. The `resume` options move to a separate `resume()` method or a `resume` kwarg (matching the `generate()` cleanup above). `on_chunk` is removed: streaming belongs on `prompt.stream()`.
+
+**`generate_stream()` — after cleanup:**
+
+```python
+def generate_stream(
+    self,
+    *,
+    model: str | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    config: GenerationCommonConfig | dict | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT] | None = None,
+    output_format: str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[DocumentData] | None = None,
+) -> GenerateStreamResponse[OutputT]:
+```
+
+Same params as `generate()`. The return type changes from `tuple[AsyncIterator, Future]` to a single `GenerateStreamResponse` object that is directly async-iterable and exposes `.response` (see section 6).
+
+**`retrieve()` — after cleanup:**
+
+```python
+async def retrieve(
+    self,
+    *,
+    retriever: str,
+    query: str | DocumentData,
+    options: dict[str, object] | None = None,
+) -> RetrieverResponse:
+```
+
+Already clean — just needs the `*` marker to enforce keyword-only. `retriever` and `query` become required (they were Optional before, but calling without them always fails).
+
+**`embed()` — after cleanup:**
+
+```python
+async def embed(
+    self,
+    *,
+    embedder: str,
+    content: str | Document | DocumentData,
+    metadata: dict[str, object] | None = None,
+    options: dict[str, object] | None = None,
+) -> list[Embedding]:
+```
+
+Same treatment — `*` marker, `embedder` and `content` become required.

From 8853bb55bde8b8710fca261e20bc25de23ef8904 Mon Sep 17 00:00:00 2001
From: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Date: Tue, 17 Feb 2026 11:24:04 -0600
Subject: [PATCH 02/17] Update py/docs/python_beta_sdk_design.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 py/docs/python_beta_sdk_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index 96eb8368b7..e539fe4ea1 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -277,7 +277,7 @@ async def __call__(
 **`generate_stream()` — after cleanup:**
 
 ```python
-def generate_stream(
+async def generate_stream(
     self,
     *,
     model: str | None = None,

From e10fbfe97c371beb79b6f5e287b094c67bf5f6f5 Mon Sep 17 00:00:00 2001
From: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Date: Tue, 17 Feb 2026 11:24:19 -0600
Subject: [PATCH 03/17] Update py/docs/python_beta_sdk_design.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 py/docs/python_beta_sdk_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index e539fe4ea1..59a580a7ca 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -90,7 +90,7 @@ No new types, no imports beyond the Pydantic model, and the 95% case is one kwar
 
 **Recommendation.** Flat kwargs. Remove the `output` param (which currently accepts `OutputConfig | OutputConfigDict | Output[T]`). Keep `output_schema`, `output_format`, `output_constrained`, `output_content_type`, and `output_instructions` as individual keyword-only params. Use `@overload` so `output_schema: type[T]` parameterizes the return type. Remove `Output[T]`, `OutputConfigDict`, and the wire-format `OutputConfig` from the app developer's surface entirely. The wire-format `OutputConfig` remains an internal/plugin type — plugin authors use it when implementing model plugins to read output configuration from the `GenerateRequest`, but app developers never see it.
 
-`output_schema` accepts three forms: a Pydantic model class (`type[T]` — the common case, gives typed returns), a raw JSON schema dict (`dict` — for dynamic schemas, returns `Any`), or a registered schema name (`str` - looked up from registry at runtime, returns `Any`). Only the Pydantic class form carries the generic type. 
+`output_schema` accepts three forms: a Pydantic model class (`type[T]` — the common case, gives typed returns), a raw JSON schema dict (`dict` — for dynamic schemas, returns `Any`), or a registered schema name (`str`, looked up from registry at runtime, returns `Any`). Only the Pydantic class form carries the generic type. 
 
 **The same applies to `input_schema`.** The current `Input[T]` wrapper exists for the same (incorrect) reason as `Output[T]`. It's used on `define_prompt()` and `ai.prompt()` to type the prompt's input parameter. With flat kwargs and overloads, `input_schema: type[T]` carries the generic directly — `Input[T]` can be removed:
 

From 8eb607a35715f2a69c22aac34cb00ca052046b81 Mon Sep 17 00:00:00 2001
From: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Date: Tue, 17 Feb 2026 11:24:33 -0600
Subject: [PATCH 04/17] Update py/docs/python_beta_sdk_design.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 py/docs/python_beta_sdk_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index 59a580a7ca..c2bc2a42d7 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -52,7 +52,7 @@ We propose formalizing three entry points, separated by audience:
 
 Internal modules (`genkit.core`, `genkit.blocks`, `genkit.ai`) would be renamed with underscore prefixes (`genkit._core`, `genkit._blocks`) to signal "private, no stability guarantee" — the standard Python convention.
 
-The full proposal — including the type architecture (auto-generated schema types vs hand-written veneers vs config helpers), symbol lists, rationale for each inclusion/exclusion, and the `MessageWrapper` aliasing problem — is in [PUBLIC_API_PROPOSAL.md](./PUBLIC_API_PROPOSAL.md).
+The full proposal — including the type architecture (auto-generated schema types vs hand-written veneers vs config helpers), symbol lists, rationale for each inclusion/exclusion, and the `MessageWrapper` aliasing problem — is in [python_beta_api_proposal.md](./python_beta_api_proposal.md).
 
 ## 5. Output configuration
 

From ab970c05438cb120c6666956600bd75d6026d34a Mon Sep 17 00:00:00 2001
From: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Date: Tue, 17 Feb 2026 11:24:45 -0600
Subject: [PATCH 05/17] Update py/docs/python_beta_sdk_audit.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 py/docs/python_beta_sdk_audit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_sdk_audit.md b/py/docs/python_beta_sdk_audit.md
index b29f37bf94..8df0a75c5b 100644
--- a/py/docs/python_beta_sdk_audit.md
+++ b/py/docs/python_beta_sdk_audit.md
@@ -70,7 +70,7 @@ image = response.media  # Media | None
 
 The SDK has auto-generated schema types (`GenerateResponse` from `genkit-schemas.json`) and hand-written wrappers that add convenience methods (`GenerateResponseWrapper`). Users interact with the wrapper but see the "Wrapper" suffix in type hints and docs.
 
-**Decision:** Alias the wrapper under the clean name at the public surface: `GenerateResponseWrapper` exported as `GenerateResponse` from `from genkit import ...`. The auto-generated schema type remains available as `GenerateResponse` in `genkit.plugin` for plugin authors. See [TYPE_LAYERS.md](./TYPE_LAYERS.md) for the full type architecture.
+**Decision:** Alias the wrapper under the clean name at the public surface: `GenerateResponseWrapper` exported as `GenerateResponse` from `from genkit import ...`. The auto-generated schema type remains available as `GenerateResponse` in `genkit.plugin` for plugin authors. See [python_beta_api_proposal.md](./python_beta_api_proposal.md) for the full type architecture.
 
 ## Type consolidation
 

From 45e3f1efde23652188b1309220220d026e46d0e5 Mon Sep 17 00:00:00 2001
From: huangjeff5 <64040981+huangjeff5@users.noreply.github.com>
Date: Tue, 17 Feb 2026 11:36:06 -0600
Subject: [PATCH 06/17] Update python_beta_api_proposal.md

---
 py/docs/python_beta_api_proposal.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 8162cfc29b..843c93db2a 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -13,7 +13,7 @@ These audiences have separate entry points.
 
 ## Methodology
 
-Every import across 30+ samples, 20+ plugins, and the test suite was audited to understand what symbols people actually use. The symbol lists and usage counts below come from that audit. The documentation audit ([DOCUMENTATION_AUDIT.md](./DOCUMENTATION_AUDIT.md)) independently confirmed the import path confusion — the verification agent used internal paths because no public boundary existed.
+Every import across 30+ samples, 20+ plugins, and the test suite was audited to understand what symbols people actually use. The symbol lists and usage counts below come from that audit.
 
 ---
 

From 01ecdc91a74ba882b41c0e09f760e41c86576828 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 11:48:43 -0600
Subject: [PATCH 07/17] fix links

---
 py/docs/python_beta_sdk_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index c2bc2a42d7..70322da36f 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -36,7 +36,7 @@ ai.generate(model="gemini", prompt="Hi", tools=["search"])
 
 While working on updated docs, we identified several friction points in the developer experience. 
 
-For many of these friction points, there was a clear Pythonic standard to follow — keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [API_AUDIT.md](./API_AUDIT.md)
+For many of these friction points, there was a clear Pythonic standard to follow: keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [python_beta_sdk_audit.md](./python_beta_sdk_audit.md)
 
 The remaining sections in this doc are open questions that need some discussion to resolve.
 

From df41e82f9fd870f45be92c26342c2db473a9e1aa Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 14:49:50 -0600
Subject: [PATCH 08/17] clean up dotprompt section

---
 py/docs/python_beta_sdk_design.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index 70322da36f..fe6a906a38 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -77,7 +77,7 @@ Four inline params, plus an `output` param that accepts three different types. T
 
 **Flat kwargs vs. wrapper object.** We considered both approaches:
 
-A **wrapper object** (`output=OutputConfig(Recipe)` or `output=Recipe`) bundles the schema with secondary options (format, constrained, instructions) into one param, reducing `generate()`'s parameter count. But it introduces a new type developers have to learn and import, and creates a naming problem — `Output[T]` is too generic ("Output of what?"), and the obvious alternative `OutputConfig[T]` collides with the existing wire-format `OutputConfig` used by plugin authors.
+A **wrapper object** (`output=OutputConfig(Recipe)` or `output=Recipe`) bundles the schema with secondary options (format, constrained, instructions) into one param, reducing `generate()`'s parameter count. But it introduces a new type developers have to learn and import.
 
 **Flat kwargs** (`output_schema=Recipe`) is the more Pythonic approach. Python functions embrace explicit parameters with defaults — `requests.get()` has 15+ kwargs, `json.dumps()` has 8, `subprocess.run()` has 12. No config objects. The secondary output params (`output_format`, `output_constrained`, `output_content_type`, `output_instructions`) stay as kwargs with sensible defaults — `output_format` auto-defaults to `'json'` when a schema is set, the rest default to `None` and are rarely used. The common case is just:
 
@@ -108,7 +108,7 @@ response.output.name  # typed as str — IDE knows this is Recipe
 
 `Input[T]` and `Output[T]` are both removed from the public API — they no longer need to exist. `input_schema` and `output_schema` already exist as params today; they just need the `type[T]` overloads to carry the generic. That's three types eliminated from the app developer surface (`Input[T]`, `Output[T]`, `OutputConfigDict`) and zero new types introduced.
 
-**The Dotprompt typing gap.** This decision also simplifies the Dotprompt story. When a schema is defined in a `.prompt` file's YAML frontmatter (`output: { schema: Recipe }`), the SDK uses it to constrain the model's JSON output at runtime. But the type checker doesn't know this — `.prompt` files can't carry Python type references — so `response.output` is `Any`. To get typed output, pass the schema at the call site:
+Dotprompt should work in a similar way but with one additional nuance. When a schema is defined in a `.prompt` file's YAML frontmatter (`output: { schema: Recipe }`), the SDK uses it to constrain the model's JSON output at runtime. But the type checker doesn't know this — `.prompt` files can't carry Python type references — so `response.output` is `Any`. To get typed output, pass the schema at the call site:
 
 ```python
 # Without output_schema — runtime parsing works, but typing is Any

From 51f4008b0b99d7545be54552133cd58fb0e978be Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 16:35:31 -0600
Subject: [PATCH 09/17] clean up doc

---
 py/docs/python_beta_sdk_design.md | 35 ++++++++++++-------------------
 1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index fe6a906a38..e4b5f03c32 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -26,10 +26,6 @@ ai.generate(model="gemini", prompt="Hi", tools=["search"])
 
 **Kwargs over options dicts.** JS groups parameters into an options object. Python has first-class keyword arguments. Dict-based configuration loses autocomplete, type checking, and discoverability. This applies to `generate()`, `prompt()`, and every public method.
 
-**One way to configure each behavior.** PEP 20: "There should be one — and preferably only one — obvious way to do it." If output format can be set via `output=Output(...)`, it shouldn't also be settable via `output_format=`, `output_content_type=`, and `output_constrained=`. Multiple paths to the same result create confusion in docs, samples, and LLM-generated code.
-
-**Types should help, not hinder.** The SDK uses Pydantic models, generics, and type hints extensively. These should earn their keep: autocomplete that works, return types that carry meaning (`GenerateResponse[MyModel]` so `.output` is typed), and import paths that make sense. When a developer has to choose between `OutputConfig`, `OutputConfigDict`, and `Output[T]` to configure the same behavior, the type system is creating friction, not reducing it.
-
 **Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. We define three public entry points (`genkit`, `genkit.types`, `genkit.plugin`) and treat everything else as internal with no stability guarantee. Internal modules should be underscore-prefixed (`genkit._core`, `genkit._blocks`) to signal this — today they lack the underscore, which is why samples accidentally depend on them. The mechanics of this boundary are covered in section 4.
 
 ## 3. Initial Audit
@@ -54,27 +50,24 @@ Internal modules (`genkit.core`, `genkit.blocks`, `genkit.ai`) would be renamed
 
 The full proposal — including the type architecture (auto-generated schema types vs hand-written veneers vs config helpers), symbol lists, rationale for each inclusion/exclusion, and the `MessageWrapper` aliasing problem — is in [python_beta_api_proposal.md](./python_beta_api_proposal.md).
 
+
+^^^ Upon discussion, we got more details on aliasing. App developers may need access to the wire format for unit testing, and they are more likely to need it than the veneer (which I think is handled internally). Pavel also raised flow vs. generate: one returns the veneer and the other returns the wire format, so app developers may need to use either.
+
+^^^ Upon discussion, there is no clear reason to separate `from genkit import` from `from genkit.types import`.
+
 ## 5. Output configuration
 
-The `generate()` method currently accepts output configuration five different ways:
+The `generate()` method currently accepts output configuration multiple ways:
 
 ```python
-# Way 1-4: Inline kwargs
+# Way 1: Inline kwargs
 await ai.generate(prompt="...", output_format="json", output_content_type="application/json",
                   output_instructions="Return valid JSON", output_constrained=True)
 
-# Way 5a: Wire-format type (leaks through the type union)
-await ai.generate(prompt="...", output=OutputConfig(format="json", schema=my_schema))
-
-# Way 5b: TypedDict
-await ai.generate(prompt="...", output={"format": "json", "schema_": my_schema})
-
-# Way 5c: Config helper with generics
+# Way 2: Config helper with generics
 await ai.generate(prompt="...", output=Output(schema=MyModel))
 ```
 
-Four inline params, plus an `output` param that accepts three different types. This directly violates "one obvious way to do it" — and it shows up in practice. My documentation agent used different approaches in different files while updating documentation because of this ambiguity.
-
 **Flat kwargs vs. wrapper object.** We considered both approaches:
 
 A **wrapper object** (`output=OutputConfig(Recipe)` or `output=Recipe`) bundles the schema with secondary options (format, constrained, instructions) into one param, reducing `generate()`'s parameter count. But it introduces a new type developers have to learn and import.
@@ -88,11 +81,11 @@ response.output.name  # typed as str — IDE autocomplete works
 
 No new types, no imports beyond the Pydantic model, and the 95% case is one kwarg.
 
-**Recommendation.** Flat kwargs. Remove the `output` param (which currently accepts `OutputConfig | OutputConfigDict | Output[T]`). Keep `output_schema`, `output_format`, `output_constrained`, `output_content_type`, and `output_instructions` as individual keyword-only params. Use `@overload` so `output_schema: type[T]` parameterizes the return type. Remove `Output[T]`, `OutputConfigDict`, and the wire-format `OutputConfig` from the app developer's surface entirely. The wire-format `OutputConfig` remains an internal/plugin type — plugin authors use it when implementing model plugins to read output configuration from the `GenerateRequest`, but app developers never see it.
+**Recommendation.** Flat kwargs. Remove the `output` param. Keep `output_schema`, `output_format`, `output_constrained`, `output_content_type`, and `output_instructions` as individual keyword-only params. Use `@overload` so `output_schema: type[T]` parameterizes the return type. The wire-format `OutputConfig` remains an internal/plugin type — plugin authors use it when implementing model plugins to read output configuration from the `GenerateRequest`, but app developers never see it.
 
 `output_schema` accepts three forms: a Pydantic model class (`type[T]` — the common case, gives typed returns), a raw JSON schema dict (`dict` — for dynamic schemas, returns `Any`), or a registered schema name (`str`, looked up from registry at runtime, returns `Any`). Only the Pydantic class form carries the generic type. 
 
-**The same applies to `input_schema`.** The current `Input[T]` wrapper exists for the same (incorrect) reason as `Output[T]`. It's used on `define_prompt()` and `ai.prompt()` to type the prompt's input parameter. With flat kwargs and overloads, `input_schema: type[T]` carries the generic directly — `Input[T]` can be removed:
+**The same applies to `input_schema`.** With flat kwargs and overloads, `input_schema: type[T]` carries the generic directly — `Input[T]` can be removed:
 
 ```python
 prompt = ai.define_prompt(
@@ -106,8 +99,6 @@ response = await prompt(RecipeInput(dish='pizza'))
 response.output.name  # typed as str — IDE knows this is Recipe
 ```
 
-`Input[T]` and `Output[T]` are both removed from the public API — they no longer need to exist. `input_schema` and `output_schema` already exist as params today; they just need the `type[T]` overloads to carry the generic. That's three types eliminated from the app developer surface (`Input[T]`, `Output[T]`, `OutputConfigDict`) and zero new types introduced.
-
 Dotprompt should work in a similar way but with one additional nuance. When a schema is defined in a `.prompt` file's YAML frontmatter (`output: { schema: Recipe }`), the SDK uses it to constrain the model's JSON output at runtime. But the type checker doesn't know this — `.prompt` files can't carry Python type references — so `response.output` is `Any`. To get typed output, pass the schema at the call site:
 
 ```python
@@ -173,7 +164,7 @@ The practical consequences: a Flask route handler can't call `ai.generate()` wit
 
 For context, every major Python LLM SDK offers both sync and async: OpenAI and Anthropic ship dual clients (`OpenAI` / `AsyncOpenAI`), LangChain has dual methods (`.invoke()` / `.ainvoke()`), Google Cloud AI uses a separate async transport. Even Hugging Face, which is sync-only, made a deliberate choice. Genkit is the only async-only SDK in the ecosystem. A developer coming from OpenAI's `client.chat.completions.create()` — no `await`, no `async def` — hits immediate friction.
 
-**Proposal.** Dual clients — `Genkit` (sync) and `AsyncGenkit` (async). This is the industry standard: OpenAI, Anthropic, and Cohere all ship it. The async client holds the real implementation; the sync client delegates to it. We prefer the dual-client pattern (separate classes) over dual methods (`generate()` / `agenerate()` on the same class, à la LangChain) because it keeps each class's type signatures clean — every method on `Genkit` returns `T`, every method on `AsyncGenkit` returns `Awaitable[T]` — and avoids polluting autocomplete with `a`-prefixed duplicates of every method.
+**Proposal.** Dual clients — `Genkit` (sync) and `AsyncGenkit` (async). This is the industry standard: OpenAI, Anthropic, and Cohere all ship it. The async client holds the real implementation; the sync client delegates to it. We prefer the dual-client pattern (separate classes) over dual methods (`generate()` / `agenerate()` on the same class) because it keeps each class's type signatures clean — every method on `Genkit` returns `T`, every method on `AsyncGenkit` returns `Awaitable[T]` — and avoids polluting autocomplete with `a`-prefixed duplicates of every method.
 
 ```python
 from genkit import Genkit, AsyncGenkit
@@ -201,12 +192,12 @@ async def generate(
     tool_responses=None, config=None, max_turns=None,
     on_chunk=None, context=None,
     output_format=None, output_content_type=None,
-    output_instructions=None, output_constrained=None,
+    output_instructions=None, output_constrained=None, *,
     output=None, use=None, docs=None,
 ) -> GenerateResponseWrapper[Any]:
 ```
 
-None are keyword-only. Several don't belong.
+Only the last three params (`output`, `use`, `docs`) are keyword-only, and the placement of the `*` seems arbitrary. Several params don't belong.
 
 **What changes:**
 

From 022b1ef1fbfc441f09199b22fcece2ba5f6f94e2 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 16:49:38 -0600
Subject: [PATCH 10/17] Address more feedback, notes

---
 py/docs/python_beta_sdk_audit.md  | 24 ++++++++++++------------
 py/docs/python_beta_sdk_design.md |  4 ++++
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/py/docs/python_beta_sdk_audit.md b/py/docs/python_beta_sdk_audit.md
index 8df0a75c5b..df25d70d96 100644
--- a/py/docs/python_beta_sdk_audit.md
+++ b/py/docs/python_beta_sdk_audit.md
@@ -1,6 +1,6 @@
 # API Audit: Resolved Decisions
 
-The documentation audit surfaced 10 API issues. The items below had clear Pythonic answers and are resolved. The remaining issues — streaming, public API surface, output configuration, async support, method signatures, and class structure — are open design questions covered in [PYTHON_API_REVIEW.md](./PYTHON_API_REVIEW.md).
+The documentation audit surfaced various API issues. The items below had clear Pythonic answers and are resolved. The remaining issues (streaming, public API surface, output configuration, async support, method signatures, and class structure) are open design questions covered in [python_beta_sdk_design.md](./python_beta_sdk_design.md).
 
 ---
 
@@ -22,7 +22,9 @@ This is standard Python convention. OpenAI, Anthropic, and most modern Python AP
 
 ## Decorator shorthands
 
-Already implemented. `@ai.tool()`, `@ai.flow()` exist alongside imperative `define_*` methods. App developers use decorators; plugin authors use the imperative API. This also resolved the handler signature discoverability issue — decorators make expected signatures clear through type hints, while the imperative `define_*` methods accept generic callables with no signature guidance.
+`@ai.tool()`, `@ai.flow()` exist alongside imperative `define_*` methods. App developers use decorators; plugin authors use the imperative API. This also resolved the handler signature discoverability issue — decorators make expected signatures clear through type hints, while the imperative `define_*` methods accept generic callables with no signature guidance.
+
+^^ Need decorators for other primitives as well.
 
 ## Part constructor
 
@@ -66,12 +68,6 @@ response = await ai.generate(model="googleai/imagen3", prompt="a cat")
 image = response.media  # Media | None
 ```
 
-## Veneer naming
-
-The SDK has auto-generated schema types (`GenerateResponse` from `genkit-schemas.json`) and hand-written wrappers that add convenience methods (`GenerateResponseWrapper`). Users interact with the wrapper but see the "Wrapper" suffix in type hints and docs.
-
-**Decision:** Alias the wrapper under the clean name at the public surface: `GenerateResponseWrapper` exported as `GenerateResponse` from `from genkit import ...`. The auto-generated schema type remains available as `GenerateResponse` in `genkit.plugin` for plugin authors. See [python_beta_api_proposal.md](./python_beta_api_proposal.md) for the full type architecture.
-
 ## Type consolidation
 
 Two nearly-identical types existed: `BaseDataPoint` (generic) and `BaseEvalDataPoint` (evaluator-specific). The audit found samples using them interchangeably.
@@ -82,11 +78,15 @@ Two nearly-identical types existed: `BaseDataPoint` (generic) and `BaseEvalDataP
 
 Several symbols were in the public `__all__` that don't belong:
 
-- **`tool_response`** — only 3 sample usages. JS and Go use a method on the tool instance. Removed.
-- **`dump_dict` / `dump_json`** — internal serialization utilities. Removed.
-- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Python developers know the stdlib. Removed.
-- **`GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions`** — internal implementation types. Removed.
+- **`tool_response`** — only 3 sample usages. JS and Go use a method on the tool instance.
+- **`dump_dict` / `dump_json`** — internal serialization utilities.
+- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Python developers know the stdlib.
+- **`GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions`** — internal implementation types.
+
+^^ Need to audit __all__ in all packages and see what comes up. 
 
 ## Evaluator API
 
 The evaluator API (`GenkitMetricType`, `MetricConfig`, `PluginOptions`) has its own design issues — the audit found the API shape diverges significantly from what the naming suggests. Not addressed in this review; flagged for separate follow-up.
+
+^^ Follow up
diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index e4b5f03c32..923ef151b3 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -55,6 +55,10 @@ The full proposal — including the type architecture (auto-generated schema typ
 
 ^^^ Upon dicsussion, no clear reason to separate from genkit import vs from genkit.types import
 
+^^^ Audit what's exposed via __all__ in all the packages (there are some random helpers for example)
+
+^^^ Consider internal code organization as well, what goes in blocks? core? web? types? Internal code organization is somewhat generic/sprawling/unopinionated 
+
 ## 5. Output configuration
 
 The `generate()` method currently accepts output configuration multiple ways:

From ceae83eb3a5872cd7d416a68d20a681296d16ec4 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 17 Feb 2026 18:24:21 -0600
Subject: [PATCH 11/17] add some more comments

---
 py/docs/python_beta_api_proposal.md | 12 ++----------
 py/docs/python_beta_sdk_design.md   |  3 +++
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 843c93db2a..e4a9e0065d 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -9,6 +9,8 @@ There are two audiences for this SDK:
 
 These audiences have separate entry points.
 
+^^ TODO: Unify `from genkit import *` and `from genkit.types import *`
+
 ---
 
 ## Methodology
@@ -269,13 +271,3 @@ Python follows the Go pattern — types are a separate import. This matches what
 ## Internal modules
 
 Everything under `genkit._core`, `genkit._blocks`, and `genkit._ai` (note underscore prefix) carries no stability guarantee. Today these modules lack the underscore (`genkit.core`, `genkit.blocks`, `genkit.ai`), which is why samples and the documentation agent used internal paths. Renaming them is part of this proposal — the underscore is Python's convention for "private, use at your own risk."
-
----
-
-## Open design question: `Input[T]` / `Output[T]`
-
-This is the one genuinely open question in the public API surface. It's covered in depth as a design decision in [PYTHON_API_REVIEW.md, section 5](./PYTHON_API_REVIEW.md).
-
-Summary: `Output[T]` carries generic type information for typed responses (`ai.generate(output=Output(MyModel))` → `GenerateResponse[MyModel]`). The alternative is inline kwargs (`output_schema=MyModel`), which loses the generic typing. A tech lead challenged the naming — "Input of what? Output of what?" — arguing the names are too generic.
-
-Three options: inline only, wrapper only, or keep both. Recommendation is wrapper only (consolidate to one `output=` param), with the name open for discussion.
diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index 923ef151b3..0aa0b71ff0 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -28,6 +28,9 @@ ai.generate(model="gemini", prompt="Hi", tools=["search"])
 
 **Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. We define three public entry points (`genkit`, `genkit.types`, `genkit.plugin`) and treat everything else as internal with no stability guarantee. Internal modules should be underscore-prefixed (`genkit._core`, `genkit._blocks`) to signal this — today they lack the underscore, which is why samples accidentally depend on them. The mechanics of this boundary are covered in section 4.
 
+^^ genkit.plugin => for core genkit plugin imports (used by plugin author only)
+^^ genkit.plugins.___ => for actual plugin imports exposed by plugin authors
+
 ## 3. Initial Audit
 
 While working on updated docs, we identified several friction points in the developer experience. 

From ee9a2fe191a1722ec7664701c6068a27bd531264 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Thu, 19 Feb 2026 15:28:23 -0600
Subject: [PATCH 12/17] wip - updates 2/19

---
 py/docs/python_beta_api_proposal.md    | 523 ++++++++-----
 py/docs/python_beta_sdk_design.md      |   5 +-
 py/docs/python_package_reorg.md        | 450 +++++++++++
 py/docs/python_type_audit_checklist.md | 179 +++++
 py/docs/python_type_audit_details.md   | 996 +++++++++++++++++++++++++
 5 files changed, 1976 insertions(+), 177 deletions(-)
 create mode 100644 py/docs/python_package_reorg.md
 create mode 100644 py/docs/python_type_audit_checklist.md
 create mode 100644 py/docs/python_type_audit_details.md

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index e4a9e0065d..64ad343f87 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -1,273 +1,448 @@
 # Genkit Python — Public API Surface Proposal
 
-This doc defines the public API of the Genkit Python SDK: the set of symbols developers can import, that we commit to keeping stable, and that we document and support. Everything else is internal and can change without notice.
+Stable public symbols — what we document, support, and commit to. Everything else is internal.
 
-There are two audiences for this SDK:
+Two audiences, separate entry points:
 
-1. **App developers** — people building AI features with Genkit. They need `Genkit`, decorators, data types, and not much else.
-2. **Plugin authors** — people building model providers, vector stores, telemetry exporters, web framework integrations. They need the action system, schema types, and metadata builders.
+1. **App developers** — `from genkit import ...` for framework objects, content types, errors.
+2. **Plugin authors / advanced users** — domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) for request/response schemas, config types, metadata builders.
 
-These audiences have separate entry points.
-
-^^ TODO: Unify `from genkit import *` and `from genkit.types import *`
+> **Type architecture detail:** The SDK has schema types (auto-generated Pydantic models from `genkit-schemas.json`) and veneers (hand-written wrappers that add convenience methods like `.text`, `.output`). See the [Type Architecture](#type-architecture) appendix at the end of this doc for the full breakdown.
 
 ---
 
-## Methodology
+## Entry point 1: `from genkit import ...`
 
-Every import across 30+ samples, 20+ plugins, and the test suite was audited to understand what symbols people actually use. The symbol lists and usage counts below come from that audit.
+The single entry point for app developers. Framework objects, veneers, context, errors, and all content/data types live here. No separate `genkit.types` import needed.
 
----
+```python
+from genkit import (
+    # Core
+    Genkit,
+    ActionRunContext,
+    GenerateResponse,      # veneer — aliased from GenerateResponseWrapper
+    GenerateResponseChunk, # veneer — aliased from GenerateResponseChunkWrapper (streaming)
+    GenkitError,
+    UserFacingError,
+    Prompt,
 
-## Type architecture
+    # Content
+    Part, TextPart, MediaPart, Media,
+    DataPart, ToolRequestPart, ToolResponsePart, CustomPart,
+    ReasoningPart,
 
-Before presenting the symbol lists, it helps to understand why there are multiple types for seemingly the same thing. The SDK has three layers of types, each serving a different purpose.
+    # Messages
+    Message, Role, Metadata,
 
-### Layer 1: Schema types (auto-generated)
+    # Documents
+    Document, DocumentData, DocumentPart,
 
-These are auto-generated from `genkit-schemas.json`, the shared cross-language schema. They're plain Pydantic `BaseModel` classes — data containers with no convenience methods.
+    # Context
+    ToolRunContext,
+    ToolInterruptError,
 
-```python
-# genkit/core/typing.py (auto-generated)
-class GenerateResponse(BaseModel):
-    candidates: list[Candidate] | None = None
-    usage: GenerationUsage | None = None
-    request: GenerateRequest | None = None
-    ...
-
-class Message(BaseModel):
-    role: Role
-    content: list[Part]
-    metadata: dict[str, Any] | None = None
-
-class Part(RootModel):
-    root: TextPart | MediaPart | DataPart | ToolRequestPart | ToolResponsePart | ...
-
-class OutputConfig(BaseModel):
-    format: str | None = None
-    schema_: dict[str, Any] | None = None
-    instructions: str | bool | None = None
-    constrained: bool | None = None
-```
+    # Evaluation
+    BaseEvalDataPoint,
 
-These are the internal contract between the framework and plugins. A model plugin receives a `GenerateRequest` and returns a `GenerateResponse`. They split into two audiences:
+    # Tool control
+    ToolChoice,
 
-| Audience | Types | Examples |
-|----------|-------|---------|
-| **Plugin authors** (contract types) | Request/response schemas, config types | `GenerateRequest`, `GenerateResponse`, `OutputConfig`, `ModelInfo` |
-| **App developers** (content types) | Things users construct and pass around | `Message`, `Part`, `TextPart`, `Media`, `Document`, `Role` |
+    # Generation config
+    GenerationCommonConfig,
 
-Today they all live in `genkit.types` (or `genkit.core.typing` internally), mixed together.
+    # Plugin authoring (also used by advanced app developers)
+    Plugin,
+    Action,
+    ActionMetadata,
+    ActionKind,
+    StatusName,
+    to_json_schema,
+)
+```
 
-### Layer 2: Veneers (hand-written wrappers)
+**33 symbols.** One import covers both app developers (26 symbols) and plugin authors (7 more, counting `DocumentPart`). This is normal for Python: OpenAI and Anthropic export far more from their top level.
+
+- `Genkit` — the entry point. Every app starts with this.
+- `ActionRunContext` — context object inside flows and tools.
+- `GenerateResponse` — return type of `ai.generate()` — veneer with `.text`, `.output`, `.tool_requests`.
+- `GenerateResponseChunk` — chunk type from `ai.generate_stream()` — veneer with `.text` (aliased from `GenerateResponseChunkWrapper`).
+- `Prompt` — return type of `ai.prompt()`. Core concept, needs to be type-annotatable.
+- `GenkitError` — base error class for catching framework errors.
+- `UserFacingError` — errors safe to surface to HTTP clients.
+- `Part`, `TextPart`, `MediaPart`, `Media`, `DataPart`, `ToolRequestPart`, `ToolResponsePart`, `CustomPart`, `ReasoningPart` — content types developers construct and pass around.
+- `Message`, `Role`, `Metadata` — message construction for multi-turn conversations.
+- `Document`, `DocumentData`, `DocumentPart` — RAG document types.
+- `ToolRunContext` — extended context for tool handlers (extends `ActionRunContext`).
+- `ToolInterruptError` — error type for tool interrupts.
+- `ToolChoice` — tool selection control for `generate()`.
+- `GenerationCommonConfig` — model config (temperature, top_k, etc.).
+- `BaseEvalDataPoint` — evaluation data point type.
+- `Plugin` — base class for all plugin types (plugin authors).
+- `Action` — core action type (plugin authors).
+- `ActionMetadata` — action registration metadata (plugin authors).
+- `ActionKind` — action type enum: model, retriever, embedder, etc. (plugin authors).
+- `StatusName` — error status codes (plugin authors, error handling).
+- `to_json_schema` — converts Pydantic models to JSON Schema (plugin authors, 10+ plugins use this during action registration).
 
-These extend schema types with convenience methods — `.text`, `.output`, `.tool_requests`. They're what app developers interact with when receiving responses.
+### Veneer aliasing
+
+Users should never see "Wrapper" suffixes. The fix:
 
 ```python
-# genkit/blocks/model.py (hand-written)
-class GenerateResponseWrapper(GenerateResponse):
-    @property
-    def text(self) -> str: ...
-    @property
-    def output(self) -> Any: ...
-    @property
-    def tool_requests(self) -> list[ToolRequestPart]: ...
-    @property
-    def messages(self) -> list[MessageWrapper]: ...
-
-class MessageWrapper:  # wraps Message, doesn't extend it
-    def __init__(self, message: Message): ...
-    @property
-    def text(self) -> str: ...
-    @property
-    def tool_requests(self) -> list[ToolRequestPart]: ...
+# genkit/__init__.py
+from genkit.blocks.model import GenerateResponseWrapper as GenerateResponse
+from genkit.blocks.model import GenerateResponseChunkWrapper as GenerateResponseChunk
 ```
 
-Key distinction:
-- `GenerateResponseWrapper` **extends** `GenerateResponse` (inheritance). Aliasing it as `GenerateResponse` publicly is safe — construction is compatible.
-- `MessageWrapper` **wraps** `Message` (composition). Its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`.
+Both use inheritance (extend the schema type), so these aliases are safe — `isinstance` checks still work.
 
-Veneers are for app developers receiving responses. Plugin authors constructing responses use the schema types directly.
+**`MessageWrapper` is the exception.** It uses composition — its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`. So `Message` remains the schema type everywhere. Users interact with `MessageWrapper` via `response.messages` but never construct it directly.
 
-### Layer 3: Config helpers (hand-written)
+### `ExecutablePrompt` — should it be public?
 
-Type-carrying wrappers for configuration:
+`ExecutablePrompt` is the class returned by `ai.prompt()`. Today it's not exported — users can't type-annotate a variable that holds a prompt reference.
 
 ```python
-# genkit/blocks/interfaces.py (hand-written)
-class Input(Generic[T]):
-    """Carries type info for input validation."""
-    def __init__(self, schema: type[T]): ...
-
-class Output(Generic[T]):
-    """Carries type info for output parsing."""
-    def __init__(self, schema: type[T], format: str = "json", ...): ...
+# Today: no way to annotate this
+my_prompt = ai.prompt("greeting")
+
+# Proposed: export as Prompt
+from genkit import Prompt
+my_prompt: Prompt = ai.prompt("greeting")
 ```
 
-`Input[T]` and `Output[T]` exist so that `generate()` and `prompt()` can carry generic type information — `ai.generate(output=Output(MyModel))` returns `GenerateResponse[MyModel]` with typed `.output`.
+Recommendation: export it as `Prompt`. It's a core concept, and being unable to type-annotate it is a gap. The top-level import list above already includes it.
 
-### Where the layers bleed
+### What was removed
 
-1. **App developers construct schema types directly.** `Message(role="user", content=[Part(text="hello")])` is a schema type, not a veneer. Content-building types are schema types used by both audiences.
+- `tool_response` — only 3 sample usages. JS/Go use a method on the tool instance.
+- `Plugin` — app developers pass plugin instances (`GoogleAI()`) and never reference the type. Not removed outright: it stays in the top-level import for plugin authors (see "Shared across domains").
+- `get_logger` — thin wrapper around `logging.getLogger("genkit")`. Use the stdlib.
+- `GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions` — internal implementation types.
 
-2. **Schema and config types overlap.** `OutputConfig` (schema, Layer 1) and `Output[T]` (config helper, Layer 3) configure the same thing. `generate()` accepts both: `output: OutputConfig | OutputConfigDict | Output[Any] | None`. (This is addressed in [PYTHON_API_REVIEW.md, section 5](./PYTHON_API_REVIEW.md).)
+### `ToolRunContext` placement
 
-3. **Veneers exported under internal names.** `GenerateResponseWrapper` — the "Wrapper" suffix is an implementation detail that leaked into the public API.
+`ToolRunContext` extends `ActionRunContext` with tool-specific features. Both types are kept (for documentation clarity, future-proofing, and runtime `isinstance` checks), and both are exported from the top-level `genkit` import, so tool handlers can annotate their context without a separate `genkit.types` import.
 
-4. **`genkit.types` mixes audiences.** Plugin contract types and app developer types sit in the same module.
+---
 
-The proposal below addresses all four of these problems.
+Types specific to a domain (model, retriever, embedder, etc.) live in domain sub-modules — not in the top-level `genkit` import. These types are used by both plugin authors and advanced app developers (e.g., writing middleware or defining custom models).
 
 ---
 
-## Entry point 1: `from genkit import ...`
+## Domain sub-modules
+
+Organized by action type, mirroring the JS SDK's `genkit/model`, `genkit/retriever`, etc. Each sub-module contains the wire-format types, metadata builders, helpers, and options for that domain. Both plugin authors and advanced app developers import from here.
 
-The framework entry point for app developers. Veneers, context, and errors. Data types live in `genkit.types` (see below) — this follows the Go SDK pattern where types are a separate package, and matches what Python samples already do in practice.
+### `genkit.model`
+
+Everything related to model implementation and the model wire format.
 
 ```python
-from genkit import (
-    Genkit,
-    ActionRunContext,
-    GenerateResponse,     # veneer — aliased from GenerateResponseWrapper
-    GenkitError,
-    UserFacingError,
+from genkit.model import (
+    # Wire-format types
+    GenerateRequest,
+    GenerateResponse,       # schema type — NOT the veneer
+    GenerateResponseChunk,
+    GenerationUsage,
+    Candidate,
+    OutputConfig,
+    FinishReason,
+    GenerateActionOptions,
+    Error,                  # schema error type (not GenkitError)
+    Operation,              # long-running operation type
+
+    # Tool wire-format types (used by model handlers to process tool calls)
+    ToolRequest,
+    ToolDefinition,
+    ToolResponse,
+
+    # Model info and capabilities
+    ModelInfo,
+    Supports,
+    Constrained,
+    Stage,
+
+    # Registration and metadata
+    model_action_metadata,
+    model_ref,
+    ModelReference,
+
+    # Background / long-running models (e.g. video generation)
+    BackgroundAction,
+    lookup_background_action,
+
+    # Helpers
+    compute_usage_stats,
+    resolve_api_key,             # resolves API key: request config overrides plugin default
+    GenerationCommonConfig,
+
+    # Model middleware - WIP
+    ModelMiddleware,
+    ModelMiddlewareNext,
 )
 ```
 
-**5 symbols.** Tight, intentional, hard to get wrong.
+Used by: model plugin authors, app developers writing middleware (`GenerateRequest`, `GenerateResponse`, `ModelMiddlewareNext`), app developers defining custom models (`ModelInfo`, `Supports`).
 
-| Symbol | Why it's here |
-|--------|--------------|
-| `Genkit` | The entry point. Every app starts with this. (48 files) |
-| `ActionRunContext` | Context object inside flows and tools. (20 files) |
-| `GenerateResponse` | Return type of `ai.generate()` — veneer with `.text`, `.output`, `.tool_requests`. |
-| `GenkitError` | Base error class for catching framework errors. |
-| `UserFacingError` | Errors safe to surface to HTTP clients. |
+**Notes on helpers:**
 
-### Veneer aliasing
+- **`resolve_api_key(config, plugin_key)`** — resolves which API key to use: per-request key from `GenerationCommonConfig` overrides the plugin-level default. In JS, this logic lives duplicated in each plugin's `utils.ts` (`calculateApiKey`). Centralizing it in `genkit.model` avoids every plugin re-inventing key resolution for multi-tenancy. The lower-level extraction function (`extract_request_api_key`) stays in `genkit.blocks.model` but is not re-exported to the public API — only the google-genai plugin needs it for the `apiKey: false` ADC edge case.
+- **`compute_usage_stats(input, response)`** — renamed from `get_basic_usage_stats`. Counts characters, images, videos, and audio in input/output messages. "Compute" reflects that it does work (not a lookup), and "basic" was dropped (basic compared to what?).
+- **`text_from_content`** — removed from public API. Consumers should use the veneer layer instead:
+  - **Messages:** `MessageWrapper.text` (available on `response.messages[i].text`)
+  - **Responses:** `GenerateResponse.text` (the veneer's `.text` property)
+  - **Stream chunks:** `GenerateResponseChunkWrapper.text`
+  - **Documents:** `Document.text()` (already exists on the `Document` class)
+  - Current consumers: google-genai reranker (should use `Document.text()`), internal middleware (should use `doc.text()`), tests (should use chunk/response veneers).
 
-Users should never see "Wrapper" suffixes. The fix:
+### `genkit.retriever`
 
 ```python
-# genkit/__init__.py
-from genkit.blocks.model import GenerateResponseWrapper as GenerateResponse
-from genkit.blocks.model import GenerateResponseChunkWrapper as GenerateResponseChunk
+from genkit.retriever import (
+    # Wire-format types
+    RetrieverRequest,
+    RetrieverResponse,
+
+    # Registration and metadata
+    retriever_action_metadata,
+    create_retriever_ref,
+    RetrieverOptions,
+
+    # Indexer support
+    IndexerRequest,
+    IndexerOptions,
+    indexer_action_metadata,
+    create_indexer_ref,
+)
 ```
 
-`GenerateResponseWrapper` uses inheritance, so this alias is safe. `GenerateResponseChunkWrapper` follows the same pattern.
+Used by: retriever/indexer plugin authors, app developers using `RetrieverResponse` as a type annotation.
 
-**`MessageWrapper` is the exception.** It uses composition — its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`. So `Message` remains the schema type everywhere. Users interact with `MessageWrapper` via `response.messages` but never construct it directly.
+### `genkit.embedder`
 
-### `ExecutablePrompt` — should it be public?
+```python
+from genkit.embedder import (
+    # Wire-format types
+    EmbedRequest,
+    EmbedResponse,
+    Embedding,
+
+    # Registration and metadata
+    embedder_action_metadata,
+    create_embedder_ref,
+    EmbedderOptions,
+    EmbedderSupports,
+)
+```
 
-`ExecutablePrompt` is the class returned by `ai.prompt()`. Today it's not exported — users can't type-annotate a variable that holds a prompt reference.
+### `genkit.reranker`
 
 ```python
-# Today: no way to annotate this
-my_prompt = ai.prompt("greeting")
-
-# Proposed: export as Prompt
-from genkit import Prompt
-my_prompt: Prompt = ai.prompt("greeting")
+from genkit.reranker import (
+    reranker_action_metadata,
+    create_reranker_ref,
+    RankedDocument,
+    RerankerRequest,
+    RerankerResponse,
+    RankedDocumentData,
+    RankedDocumentMetadata,
+)
 ```
 
-Recommendation: export it as `Prompt`. It's a core concept, and being unable to type-annotate it is a gap. This would bring the top-level to 6 symbols.
+### `genkit.evaluator`
 
-### What was removed
+```python
+from genkit.evaluator import (
+    # Wire-format types
+    EvalRequest,
+    EvalResponse,
+    EvalFnResponse,
+    Score,
+    Details,
+    BaseEvalDataPoint,
+    EvalStatusEnum,
 
-| Symbol | Reason |
-|--------|--------|
-| `tool_response` | Only 3 sample usages. JS/Go use a method on the tool instance. |
-| `Plugin` | Users pass plugin instances (`GoogleAI()`), never reference the type. Moved to `genkit.plugin`. |
-| `get_logger` | Thin wrapper around `logging.getLogger("genkit")`. Use the stdlib. |
-| `GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions` | Internal implementation types. |
+    # Registration and metadata
+    evaluator_action_metadata,
+    evaluator_ref,
+)
+```
 
-### `ToolRunContext` placement
+Used by: evaluator plugin authors and app developers writing custom evaluators (samples show both).
 
-`ToolRunContext` extends `ActionRunContext` with tool-specific features. Both types are kept (for documentation clarity, future-proofing, and runtime `isinstance` checks), but only `ActionRunContext` is exported from the top level. `ToolRunContext` is available from `genkit.types` for type annotations when needed.
+### `genkit.web`
 
----
+Web framework integration (FastAPI, Flask, custom ASGI apps).
 
-## Entry point 2: `from genkit.types import ...`
+```python
+from genkit.web import (
+    FlowWrapper,
+    ContextProvider,
+    RequestData,
+    create_flows_asgi_app,
+)
+```
 
-Content types that app developers construct and pass around — the schema types (Layer 1) that users interact with directly.
+Used by: fastapi plugin, flask plugin, app developers serving flows over HTTP.
+
+### `genkit.telemetry`
 
 ```python
-from genkit.types import (
-    # Content
-    Part, TextPart, MediaPart, Media,
-    DataPart, ToolRequestPart, ToolResponsePart, CustomPart,
+from genkit.telemetry import (
+    add_custom_exporter,
+    is_dev_environment,
+    GENKIT_VERSION,
+    GENKIT_CLIENT_HEADER,
+    tracer,
+)
+```
 
-    # Messages
-    Message, Role,
+Used by: telemetry plugins (observability, Google Cloud, Firebase, Amazon Bedrock, Cloudflare, Microsoft Foundry).
 
-    # Documents
-    Document, DocumentData,
+`AdjustingTraceExporter` and `RedactedSpan` should live in the telemetry *plugin* (e.g., `genkit-google-cloud`), not in core. Both are implementation details of specific telemetry providers — JS and Go also keep these in their cloud plugins, not core.
 
-    # Context
-    ToolRunContext,
-
-    # Evaluation
-    BaseEvalDataPoint,
+### Shared across domains (plugin authoring)
 
-    # Tool control
-    ToolChoice,
+These symbols are used by plugin authors across multiple domains. They live in the top-level `from genkit import ...` alongside the app-developer symbols:
 
-    # Generation config
-    GenerationCommonConfig,
+```python
+from genkit import (
+    # (app-developer symbols from Entry point 1 above, plus:)
+
+    # Plugin authoring
+    Plugin,             # base class for all plugin types
+    Action,             # core action type
+    ActionMetadata,     # action registration metadata
+    ActionKind,         # action type enum (model, retriever, embedder, etc.)
+    StatusName,         # error status codes
+    DocumentPart,       # part type within Documents (vs. message Parts)
+    to_json_schema,     # converts Pydantic models to JSON Schema
 )
 ```
 
-This module is focused: things app developers construct and pass to Genkit methods. Plugin contract types (`GenerateRequest`, `OutputConfig`, `ModelInfo`) have been moved to `genkit.plugin` — they don't belong in the app developer's import path.
+This brings the total top-level surface to **33 symbols** (26 app-developer + 7 plugin-authoring). All are stable, documented types; none are implementation details.
 
-### No re-exports at the top level
+---
 
-`from genkit import Part, Message` does **not** work. Content types live in `genkit.types` only. This keeps the top-level surface tight and makes it unambiguous where types come from. Samples already use `from genkit.types import ...` — this formalizes the existing pattern.
+## Internal — resolved decisions
 
----
+Helpers that were candidates for export. Each has a final verdict.
 
-## Entry point 3: `from genkit.plugin import ...`
+| Symbol | Consumers | Verdict | Reasoning |
+|---|---|---|---|
+| `get_logger` | 15+ plugins, 10+ samples | **Drop.** | Structlog wrapper. Neither JS nor Go force a logging library. Use stdlib `logging`. |
+| `get_cached_client` | 9 plugins | **Internal (reconsider later).** | Per-event-loop httpx client cache. Solves a real async problem (~100 lines to reimplement). No JS/Go equivalent. Keep internal but may export if third-party plugins need it. |
+| `dump_dict` / `dump_json` | 15+ consumers | **Remove.** Fix at source. | Wrappers for `model_dump(exclude_none=True, by_alias=True)`. Fix: emit `GenkitBaseModel` from the code generator that defaults these flags. Then `.model_dump()` just works. See [pydantic/pydantic#10141](https://github.com/pydantic/pydantic/issues/10141). |
+| `get_callable_json` | fastapi, flask, core | **Remove.** Add method instead. | Converts exceptions to JSON for HTTP responses. Fix: add `.to_json()` and `.http_status` to `GenkitError` (matches Go's `.ToReflectionError()`). |
+| `matches_uri_template` | MCP plugin only | **Internal.** | 15-line regex helper. MCP plugin should own its copy. |
 
-Everything a plugin author needs to implement a model provider, retriever, embedder, evaluator, or web framework integration.
+- **`create_reflection_asgi_app`, `RuntimeManager`, `ServerSpec`, `Registry`** — dev-mode reflection server infrastructure. Used by fastapi plugin, multi-server sample, and core `_base_async.py`.
 
-```python
-from genkit.plugin import (
-    # Base class
-    Plugin,
+  **The problem:** Genkit needs a dev-mode reflection server (HTTP API on a separate port) so the Genkit Dev UI can introspect the running app. In JS and Go, this is fully automatic — the `Genkit` constructor (JS) or `Init()` (Go) starts the reflection server internally because JS has an ambient event loop and Go has goroutines. The user never touches it.
 
-    # Action system
-    Action, ActionRunContext,
+  Python can't copy this because:
+  1. **No event loop at construction time.** `Genkit()` is called synchronously at module level. There's no running `asyncio` event loop yet — you can't start an async HTTP server from a synchronous constructor.
+  2. **Server ownership.** In the FastAPI/Flask use case, the web server owns the process (uvicorn for FastAPI, a WSGI server for Flask) and, where there is one, the event loop. Genkit is a library inside someone else's app, so it can't spin up a second server on its own.
 
-    # Schema types (wire format — what plugins receive and return)
-    GenerateRequest, GenerateResponse, GenerateResponseChunk,
-    Message, OutputConfig, ModelInfo, Supports,
+  When Genkit owns the process (`ai.run_main(coro)`), the reflection server starts automatically (same as JS/Go). The problem is only the "Genkit as a library" path (FastAPI/Flask), where the plugin currently needs four internal imports to wire up the reflection server manually.
 
-    # Request/response types for other action types
-    RetrieverRequest, RetrieverResponse,
-    EmbedRequest, EmbedResponse,
+  **Fix: add an async `ai.start_dev_server()` method on `Genkit`.** This single method encapsulates all the wiring (creates the reflection app from its own registry, binds a port, starts uvicorn, registers with `RuntimeManager`) and returns an async cleanup callable. The fastapi plugin's lifespan becomes trivial:
 
-    # Metadata builders
-    model_metadata, retriever_metadata, embedder_metadata,
+  ```python
+  cleanup = await ai.start_dev_server()
+  yield
+  await cleanup()
+  ```
 
-    # Telemetry
-    TelemetryConfig,
-)
-```
+  This eliminates `create_reflection_asgi_app`, `RuntimeManager`, `ServerSpec`, and `Registry` from external consumption. All four stay internal. The multi-server sample also uses `ai.start_dev_server()` instead of manually wiring internals.
+
+### `GenerateResponse` naming: veneer vs. schema type
 
-Note: `GenerateResponse` here is the **schema type** (auto-generated, no convenience methods). In `from genkit import GenerateResponse`, it's the **veneer** (with `.text`, `.output`, etc.). Different classes, different modules, different audiences. This coexistence works without shadowing because the two types never appear in the same import path.
+`GenerateResponse` appears in two places:
+
+- **`from genkit import GenerateResponse`** — the **veneer** with `.text`, `.output`, `.tool_requests` (aliased from `GenerateResponseWrapper`). This is what app developers use.
+- **`from genkit.model import GenerateResponse`** — the **schema type** (auto-generated wire format). This is what model handlers receive and return.
+
+These are different classes in different modules. No shadowing occurs because a file imports from one or the other, never both. The veneer extends the schema type (inheritance), so they're compatible.
 
 ### Cross-language comparison
 
-| Language | App developer imports | Plugin author imports |
-|----------|----------------------|---------------------|
-| **JS** | `import { genkit, z } from 'genkit'` (unified) | Same package |
-| **Go** | `import "github.com/firebase/genkit/go/ai"` (types separate) | Same package, different types |
-| **Python (proposed)** | `from genkit import Genkit` + `from genkit.types import Part, Message` | `from genkit.plugin import GenerateRequest, Plugin` |
+- **JS** — `import { genkit } from 'genkit'` for common types, `import { ... } from 'genkit/model'` / `'genkit/retriever'` / etc. for domain-specific types.
+- **Go** — `import "github.com/firebase/genkit/go/genkit"` for the framework, `import "github.com/firebase/genkit/go/ai"` for all domain types (single package).
+- **Python (proposed)** — `from genkit import Genkit, Part, Message` for common types, `from genkit.model import ...` / `from genkit.retriever import ...` / etc. for domain-specific types.
 
-Python follows the Go pattern — types are a separate import. This matches what samples already do in practice.
+Python follows the JS pattern — common types in the top-level import, domain-specific types in sub-modules organized by action type.
 
 ---
 
 ## Internal modules
 
 Everything under `genkit._core`, `genkit._blocks`, and `genkit._ai` (note underscore prefix) carries no stability guarantee. Today these modules lack the underscore (`genkit.core`, `genkit.blocks`, `genkit.ai`), which is why samples and the documentation agent used internal paths. Renaming them is part of this proposal — the underscore is Python's convention for "private, use at your own risk."
+
+The domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are **re-export facades** — they import from the internal modules and re-export a curated public surface. The actual implementation stays in `genkit._blocks` and `genkit._core`. This decouples the public API from internal code organization, so internal refactors don't break users.
+
+---
+
+## Changes from status quo
+
+What plugins and samples currently do that needs to change. This is the migration work.
+
+### Removed from public API
+
+- **`text_from_content`** — standalone function for extracting text from `list[Part]`. Consumers should use veneers instead: `GenerateResponse.text`, `MessageWrapper.text`, `GenerateResponseChunkWrapper.text`, or `Document.text()`. Affected: google-genai reranker, internal middleware, tests.
+- **`tool_response`** — only 3 sample usages. JS/Go use a method on the tool instance.
+- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Consumers should use stdlib `logging` directly. Affected: 15+ plugins, 10+ samples (trivial change).
+- **`GenkitRegistry`** — internal implementation type. Should not be imported by plugins.
+- **`SimpleRetrieverOptions`** — internal implementation type.
+
+### Moved to plugins (out of core)
+
+- **`AdjustingTraceExporter`** — base class for trace exporters that adjust spans before export. Currently in `genkit.core.trace.adjusting_exporter`. Should move to the telemetry plugin (e.g., `genkit-google-cloud`). JS and Go both keep this in their cloud plugins, not core.
+- **`RedactedSpan`** — span wrapper that redacts `genkit:input`/`genkit:output` attributes. Same file as `AdjustingTraceExporter`. Should move with it. No equivalent in JS/Go core.
+
+### Reorganized (new public paths)
+
+- **`genkit.types` → `genkit`** — all app developer types unified into `from genkit import ...`. No separate `genkit.types` import.
+- **`genkit.plugin` → domain sub-modules** — plugin types split by action type: `genkit.model`, `genkit.retriever`, `genkit.embedder`, `genkit.reranker`, `genkit.evaluator`, `genkit.telemetry`.
+- **`Plugin` class** — no longer exported from top-level `genkit`; shared across the domain sub-modules instead (imported by plugin authors, not app developers).
+- **`FlowWrapper`** — moved from internal to a web sub-module export.
+- **`ContextProvider`, `RequestData`** — moved from internal to a web sub-module export.
+- **`create_flows_asgi_app`** — moved from internal to a web sub-module export.
+
+### Renamed
+
+- **`GenerateResponseWrapper` → `GenerateResponse`** (at the `genkit` top-level) — alias removes the "Wrapper" suffix leak.
+- **`genkit.core` → `genkit._core`**, **`genkit.blocks` → `genkit._blocks`**, **`genkit.ai` → `genkit._ai`** — underscore prefix signals internal. This is what breaks all the existing direct imports from plugins/samples.
+
+### Now explicitly internal (plugins must stop importing)
+
+These are things plugins/samples currently import from internal paths. After the rename to `_core`/`_blocks`/`_ai`, these imports break. The public replacements are listed.
+
+- `from genkit.blocks.model import ...` → `from genkit.model import ...`
+- `from genkit.blocks.retriever import ...` → `from genkit.retriever import ...`
+- `from genkit.blocks.reranker import ...` → `from genkit.reranker import ...`
+- `from genkit.blocks.document import Document` → `from genkit import Document`
+- `from genkit.core.typing import ...` → `from genkit import ...` (for content types) or `from genkit.model import ...` (for wire-format types)
+- `from genkit.core.action import Action, ActionRunContext` → `from genkit import ActionRunContext` or `from genkit.model import Action`
+- `from genkit.core.error import GenkitError` → `from genkit import GenkitError`
+- `from genkit.core.logging import get_logger` → `import logging; logger = logging.getLogger("genkit")`
+- `from genkit.core.http_client import get_cached_client` → stays internal (no public replacement yet; reconsider for export)
+- `from genkit.codec import dump_dict, dump_json` → stays internal (no public replacement)
+- `from genkit.core.registry import Registry` → stays internal (code smell; use `Genkit` instance)
+- `from genkit.core.reflection import create_reflection_asgi_app` → stays internal
+- `from genkit.ai._runtime import RuntimeManager` → stays internal
+- `from genkit.ai._server import ServerSpec` → stays internal
+- `from genkit.blocks.resource import matches_uri_template` → stays internal (MCP plugin should own this)
+
+---
+
+## Appendix
+
+### Type architecture
+
+Two layers of types:
+
+**Schema types (auto-generated).** Generated from `genkit-schemas.json` (the shared cross-language schema). Plain Pydantic `BaseModel` classes — data containers with no convenience methods. These are the contract between the framework and plugins. A model plugin receives a `GenerateRequest` and returns a `GenerateResponse`. Content-building types (`Message`, `Part`, `TextPart`, etc.) are also schema types — app developers construct them directly.
+
+**Veneers (hand-written wrappers).** Extend schema types with convenience methods (`.text`, `.output`, `.tool_requests`). `GenerateResponseWrapper` extends `GenerateResponse` via inheritance — aliasing it as `GenerateResponse` publicly is safe. `MessageWrapper` wraps `Message` via composition — its constructor takes a `Message` instance, so aliasing as `Message` would break `Message(role="user", content=[...])`. Users interact with `MessageWrapper` through `response.messages` but never construct it.
diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index 0aa0b71ff0..c9baf3116a 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -45,8 +45,7 @@ Today there is no formal public/internal boundary. The documentation audit found
 
 We propose formalizing three entry points, separated by audience:
 
-- **`from genkit import ...`** — App developers. 5-6 symbols: `Genkit`, `ActionRunContext`, `GenerateResponse` (veneer), `GenkitError`, `UserFacingError`,  `Prompt`.
-- **`from genkit.types import ...`** — App developers (data types). Content types: `Part`, `Message`, `Document`, `Role`, `ToolChoice`, `GenerationCommonConfig`, etc.
+- **`from genkit import ...`** — App developers. ~22 symbols: framework objects (`Genkit`, `ActionRunContext`, `GenerateResponse`, `Prompt`, `GenkitError`, `UserFacingError`) and content/data types (`Part`, `Message`, `Document`, `Role`, `ToolChoice`, `GenerationCommonConfig`, etc.) in a single import. `genkit.types` remains as a backward-compatible re-export but is no longer the canonical path.
 - **`from genkit.plugin import ...`** — Plugin authors. Plugin contract: `Plugin`, `GenerateRequest`, `GenerateResponse` (schema), `OutputConfig`, `ModelInfo`, metadata builders, etc.
 
 Internal modules (`genkit.core`, `genkit.blocks`, `genkit.ai`) would be renamed with underscore prefixes (`genkit._core`, `genkit._blocks`) to signal "private, no stability guarantee" — the standard Python convention.
@@ -56,7 +55,7 @@ The full proposal — including the type architecture (auto-generated schema typ
 
 ^^^ Upon discussion, we got more details on aliasing. App developers may need access to the wire format for unit testing. They are more likely to need that actually vs. the veneer (which I think is handled internally). Also I remember Pavel said something about flow vs. generate. One returns veneer vs. other returns the wire format. He said app developer may need to use one or the other.
 
-^^^ Upon dicsussion, no clear reason to separate from genkit import vs from genkit.types import
+**Resolved — unified import:** No clear reason to separate `from genkit import ...` and `from genkit.types import ...`. Merged into a single entry point. See [python_beta_api_proposal.md](./python_beta_api_proposal.md).
 
 ^^^ Audit what's exposed via __all__ in all the packages (there are some random helpers for example)
 
diff --git a/py/docs/python_package_reorg.md b/py/docs/python_package_reorg.md
new file mode 100644
index 0000000000..857e93e232
--- /dev/null
+++ b/py/docs/python_package_reorg.md
@@ -0,0 +1,450 @@
+# Python SDK — Package Reorganization
+
+Proposal to align the Python SDK's internal package structure with Go and JS,
+enforce public/internal boundaries, and split oversized files.
+
+Related docs:
+- [python_beta_type_design.md](./python_beta_type_design.md) — type audit
+- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — checklist
+- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface
+- [GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md) — Genkit class
+
+---
+
+## Current state
+
+```
+genkit/                          7 sub-packages, 73 .py files
+├── ai/                          god object + helpers (5 files, 4,500 lines)
+├── aio/                         async utilities (4 files)
+├── blocks/                      domain types (14 files, 7,800 lines)
+│   └── formats/                 output format impls
+├── core/                        framework internals (15 files, 5,500 lines)
+│   ├── action/                  Action class, context, types
+│   └── trace/                   OTel exporters/processors
+├── lang/                        deprecation helpers (1 file)
+├── types/                       barrel re-export
+├── web/                         ASGI server management (8 files)
+│   └── manager/
+├── __init__.py                  public API barrel
+├── codec.py                     JSON serialization helpers
+├── model_types.py               GenerationCommonConfig + api_key helpers
+└── testing.py                   test doubles
+```
+
+### Problems
+
+1. **`blocks/` doesn't exist in Go or JS.** Both put domain types in `ai/`.
+   Python's extra layer creates the question "does this go in `ai/` or `blocks/`?"
+
+2. **Orphan packages.** `aio/` (4 files), `lang/` (1 file), `types/` (barrel).
+   None earn their existence as top-level packages.
+
+3. **Giant files.** `blocks/prompt.py` (2,446 lines), `ai/_registry.py` (1,680),
+   `ai/_aio.py` (1,164). JS/Go equivalents are 600–900 lines.
+
+4. **No boundary enforcement.** Plugins import from `genkit.core.action._action`,
+   `genkit.blocks.model`, `genkit.ai._runtime` — deep internal paths. No `__all__`
+   on most `__init__.py` files.
+
+5. **Loose root files.** `codec.py` and `model_types.py` are orphans that belong
+   in `core/` and `ai/` respectively.
+
+---
+
+## Proposed structure
+
+```
+genkit/
+├── __init__.py              public API barrel (__all__ defined)
+├── ai/                      AI domain types + Genkit class
+│   ├── __init__.py          public exports (__all__ defined)
+│   ├── prompt.py            ExecutablePrompt + define_prompt (like Go ai/prompt.go)
+│   ├── streaming.py         GenerateStreamResponse
+│   ├── model.py             GenerateResponseWrapper, ChunkWrapper, MessageWrapper,
+│   │                        ModelReference, GenerationCommonConfig, define_model,
+│   │                        resolve_api_key, compute_usage_stats
+│   ├── document.py          Document, RankedDocument
+│   ├── retriever.py         RetrieverRef, RetrieverOptions, define_retriever, etc.
+│   ├── embedding.py         Embedder, EmbedderRef, EmbedderOptions, define_embedder
+│   ├── reranker.py          RerankerRef, RerankerOptions, define_reranker
+│   ├── evaluator.py         EvaluatorRef, define_evaluator
+│   ├── tools.py             ToolRunContext, ToolInterruptError, define_tool
+│   ├── resource.py          resource actions, define_resource
+│   ├── formats/             output format system
+│   │   ├── types.py         FormatDef, Formatter, FormatterConfig
+│   │   ├── json.py, text.py, jsonl.py, enum.py, array.py
+│   ├── _internal/
+│   │   ├── _genkit.py       Genkit class body (from ai/_aio.py)
+│   │   ├── _genkit_base.py  Genkit __init__, server startup (from ai/_base_async.py)
+│   │   ├── _prompt_render.py  dotprompt rendering + PromptCache (split from blocks/prompt.py)
+│   │   ├── _generate.py     generate() orchestration, tool loop (from blocks/generate.py)
+│   │   ├── _middleware.py    model middleware execution
+│   │   └── _messages.py     message construction helpers
+│
+├── core/                    framework primitives (not AI-specific)
+│   ├── __init__.py          public exports (__all__ defined)
+│   ├── action.py            Action, ActionRunContext, ActionMetadata (flattened)
+│   ├── action_types.py      ActionKind, ActionResponse, ActionMetadataKey
+│   ├── error.py             GenkitError, UserFacingError
+│   ├── plugin.py            Plugin ABC
+│   ├── flow.py              FlowWrapper (generic streaming wrapper)
+│   ├── background.py        BackgroundAction (start/check/cancel pattern)
+│   ├── dap.py               DynamicActionProvider, DapConfig
+│   ├── status_types.py      StatusCodes, Status
+│   ├── typing.py            auto-generated schema types (DO NOT EDIT header)
+│   ├── _internal/
+│   │   ├── _registry.py     Registry class
+│   │   ├── _server.py       ServerSpec (reflection API config — moved from ai/)
+│   │   ├── _runtime.py      RuntimeManager (.genkit/runtimes/ files — moved from ai/)
+│   │   ├── _flows.py        flow registration helpers
+│   │   ├── _context.py      RequestData, ContextMetadata
+│   │   ├── _tracing.py      tracing setup, span creation
+│   │   ├── _trace/          OTel exporters and processors
+│   │   │   ├── _default_exporter.py
+│   │   │   ├── _adjusting_exporter.py
+│   │   │   ├── _realtime_processor.py
+│   │   │   └── _types.py    GenkitSpan
+│   │   ├── _schema.py       schema utilities, to_json_schema
+│   │   ├── _extract.py      JSON extraction from text
+│   │   ├── _codec.py        dump_dict, dump_json (from root codec.py)
+│   │   ├── _http_client.py  HTTP client helpers
+│   │   ├── _environment.py  EnvVar, GenkitEnvironment
+│   │   ├── _aio.py          Channel, loop utils (from aio/)
+│   │   ├── _logging.py      get_logger
+│   │   ├── _constants.py
+│   │   └── _deprecations.py (from lang/)
+│
+├── _web/                    dev server only (all internal)
+│   ├── reflection.py        Dev UI reflection API (moved from core/)
+│   └── _runtime.py          RuntimeManager — writes .genkit/runtimes/ files
+│
+│   DELETED: web/manager/ (~1,500 lines, 7 types)
+│   ServerManager, ASGIServerAdapter, UvicornAdapter, GranianAdapter,
+│   SignalHandler, ServerLifecycle, ServerConfig, AbstractBaseServer,
+│   ports.py, info.py — all unused by framework/plugins. Only consumer
+│   was one sample (web-multi-server). The reflection server uses raw
+│   uvicorn directly (~15 lines in _base_async.py). No abstraction needed.
+│
+└── testing.py               ProgrammableModel, EchoModel, StaticResponseModel
+```
+
+### What changed
+
+| Change | Details |
+|---|---|
+| **Delete `blocks/`** | All files move into `ai/`. Domain types live where Go/JS put them. |
+| **Delete `aio/`** | `Channel` + loop utils → `core/_internal/_aio.py` |
+| **Delete `lang/`** | `deprecations.py` → `core/_internal/_deprecations.py` |
+| **Delete `types/`** | Barrel re-export removed. `genkit/__init__.py` handles this. |
+| **Delete `web/manager/`** | ~1,500 lines of unused multi-server orchestration. Reflection server uses raw uvicorn (~15 lines). |
+| **Delete `core/flows.py`** | `create_flows_asgi_app()` — auto-exposes flows as HTTP endpoints. Firebase Cloud Functions pattern that doesn't fit Python (Cloud Functions uses Flask, not ASGI; no `onCallGenkit` for Python). Users should use FastAPI/Flask instead. JS has this (`startFlowServer`) because the Express ecosystem aligns; Python's doesn't. ~370 lines. |
+| **Rename `web/` → `_web/`** | Prefix signals "internal, don't import". Now just reflection + runtime. |
+| **Move `core/reflection.py` → `_web/`** | It's a Starlette ASGI app, not a core primitive. Breaks `core/` → `web/` cycle. |
+| **Move `codec.py`** | → `core/_internal/_codec.py` |
+| **Delete `model_types.py`** | `GenerationCommonConfig` → `ai/model.py`. API key helpers renamed to `resolve_api_key` and exposed from `model.py`. `get_basic_usage_stats` renamed to `compute_usage_stats`. |
+| **Move `FlowWrapper`** | `ai/_registry.py` → `core/flow.py` (matches Go/JS) |
+| **Move `BackgroundAction`** | `blocks/background_model.py` → `core/background.py` (matches Go/JS) |
+| **Move `DynamicActionProvider`** | `blocks/dap.py` → `core/dap.py` (matches Go/JS) |
+| **Split `prompt.py`** | 2,446 → ~600 (prompt.py) + ~200 (streaming.py) + ~800 (_prompt_render.py) + ~400 (_prompt_cache.py) |
+| **Dissolve `ai/_registry.py`** | define_* functions move to their domain files (like Go). `define_model` → `ai/model.py`, `define_retriever` → `ai/retriever.py`, etc. Genkit method stubs stay in `ai/_internal/_genkit.py`. `_registry.py` ceases to exist. |
+| **Add `_internal/`** | Pydantic v2 pattern: private implementation behind `_internal/` |
+| **Add `__all__`** | Every public `__init__.py` declares its exports |
+
+---
+
+## Cross-language alignment
+
+After the reorg, every audited type lands in the same package as Go and JS.
+
+### `core/` — framework primitives (all three SDKs agree)
+
+| Python type | Go equivalent | JS equivalent |
+|---|---|---|
+| `Action` | `core/api/action.go` Action | `core/src/action.ts` Action |
+| `ActionRunContext` | `core/context.go` ActionContext | `core/src/context.ts` ActionContext |
+| `ActionMetadata` | `core/api/action.go` ActionDesc | `core/src/action.ts` ActionMetadata |
+| `ActionKind` | `core/api/action.go` ActionType | `core/src/registry.ts` ActionType |
+| `GenkitError` | `core/error.go` | `core/src/error.ts` |
+| `UserFacingError` | `core/error.go` | `core/src/error.ts` |
+| `Plugin` | `core/api/plugin.go` | `core/src/plugin.ts` PluginProvider |
+| `StatusCodes` | `core/status_types.go` | `core/src/statusTypes.ts` |
+| `FlowWrapper` | `core/flow.go` Flow | `core/src/flow.ts` Flow |
+| `BackgroundAction` | `core/background_action.go` | `core/src/background-action.ts` |
+| `DynamicActionProvider` | `core/api/plugin.go` DynamicPlugin | `core/src/dynamic-action-provider.ts` |
+| `Channel` | N/A (Go built-in) | `core/src/async.ts` |
+| `Registry` | `core/api/registry.go` (interface) | `core/src/registry.ts` |
+
+### `ai/` — AI domain types (all three SDKs agree)
+
+| Python type | Go equivalent | JS equivalent |
+|---|---|---|
+| `Genkit` | `genkit/genkit.go` | `genkit/src/genkit.ts` |
+| `ExecutablePrompt` | `ai/prompt.go` Prompt | `ai/src/prompt.ts` |
+| `GenerateStreamResponse` | N/A (callback-based) | `ai/src/generate.ts` |
+| `GenerateResponseWrapper` | `ai/gen.go` ModelResponse | `ai/src/generate/response.ts` |
+| `GenerateResponseChunkWrapper` | `ai/gen.go` ModelResponseChunk | `ai/src/generate/chunk.ts` |
+| `MessageWrapper` | `ai/gen.go` Message | `ai/src/message.ts` |
+| `Document` | `ai/document.go` | `ai/src/document.ts` |
+| `RankedDocument` | `ai/gen.go` RankedDocumentData | `ai/src/reranker.ts` |
+| `ToolRunContext` | `ai/tools.go` | `ai/src/tool.ts` |
+| `ToolInterruptError` | `ai/tools.go` (unexported) | `ai/src/tool.ts` |
+| `ModelReference` | `ai/generate.go` ModelRef | `ai/src/model.ts` |
+| `EmbedderRef` | `ai/embedder.go` | `ai/src/embedder.ts` EmbedderReference |
+| `RetrieverRef` / `IndexerRef` | `ai/retriever.go` | `ai/src/retriever.ts` |
+| `RerankerRef` | N/A | `ai/src/reranker.ts` RerankerReference |
+| `EvaluatorRef` | `ai/evaluator.go` | `ai/src/evaluator.ts` |
+| `Embedder` | `ai/embedder.go` | `ai/src/embedder.ts` |
+| `EmbedderOptions` / `Supports` | `ai/embedder.go` | `ai/src/embedder.ts` |
+| `RetrieverOptions` / `IndexerOptions` | `ai/retriever.go` | `ai/src/retriever.ts` |
+| `RerankerOptions` | N/A | `ai/src/reranker.ts` |
+| `FormatDef` / `Formatter` | `ai/formatter.go` | `ai/src/formats/types.ts` |
+| `GenerationCommonConfig` | `ai/gen.go` | `ai/src/model-types.ts` |
+| `ActionMetadata` | `core/api/action.go` | `core/src/action.ts` |
+
+Mismatches: **zero.** Every type ends up in the same package as Go and JS.
+
+(`Genkit` is a special case — Go/JS have a separate `genkit/` package, Python uses
+the top-level `genkit/__init__.py`. Same role, different mechanism.)
+
+---
+
+## Plugin import paths — before and after
+
+### Model plugin (e.g., google-genai gemini.py)
+
+```python
+# Before (6 deep imports):
+from genkit.ai import ActionRunContext, GENKIT_CLIENT_HEADER
+from genkit.blocks.model import get_basic_usage_stats
+from genkit.codec import dump_dict, dump_json
+from genkit.core.error import GenkitError, StatusName
+from genkit.core.tracing import tracer
+from genkit.core.typing import GenerationCommonConfig, Message, ...
+
+# After (2 imports):
+from genkit.ai import (
+    ActionRunContext, GenkitError, GenerationCommonConfig,
+    Message, compute_usage_stats, dump_dict, dump_json,
+)
+from genkit.core import tracer, GENKIT_CLIENT_HEADER
+```
+
+### Retriever plugin (e.g., vertex-ai vector_search.py)
+
+```python
+# Before (5 deep imports):
+from genkit.ai import Genkit
+from genkit.blocks.document import Document
+from genkit.blocks.retriever import RetrieverOptions, retriever_action_metadata
+from genkit.core.action.types import ActionKind
+from genkit.core.schema import to_json_schema
+
+# After (1 import):
+from genkit.ai import (
+    Genkit, Document, RetrieverOptions,
+    retriever_action_metadata, ActionKind, to_json_schema,
+)
+```
+
+### Telemetry plugin (e.g., observability)
+
+```python
+# Before (3 deep imports):
+from genkit.core.environment import is_dev_environment
+from genkit.core.trace.adjusting_exporter import AdjustingTraceExporter
+from genkit.core.tracing import add_custom_exporter
+
+# After (1 import):
+from genkit.core import is_dev_environment, AdjustingTraceExporter, add_custom_exporter
+```
+
+---
+
+## Circular import fix: `core/` → `_web/` cycle
+
+**Problem.** Today `core/` has a hidden dependency on `web/`:
+
+- `core/reflection.py` imports `genkit.web.manager` (it **is** a Starlette ASGI app)
+- `core/flows.py` imports `genkit.web.manager` (it **is** a Starlette ASGI app)
+- `web/` modules import from `genkit.core.*`
+
+This creates a package-level cycle: `core/ ↔ web/`.
+
+**Root cause.** Both `reflection.py` and `flows.py` are 100% HTTP server
+code — Starlette routes, ASGI apps, request/response handling. They ended
+up in `core/` by accident, not because they provide core primitives.
+
+**Fix.**
+
+- `core/reflection.py` → move to `_web/reflection.py`
+- `core/flows.py` → **delete** (see "What changed" table — Firebase pattern
+  that doesn't fit Python; users should use FastAPI/Flask)
+
+```
+_web/
+├── reflection.py    ← was core/reflection.py
+└── _runtime.py      ← RuntimeManager
+```
+
+### Additional cross-package violations to fix
+
+**`core/plugin.py` → `blocks/` (becomes `core/` → `ai/` after reorg).**
+The `Plugin` base class has two convenience methods — `model(name)` and
+`embedder(name)` — that do deferred runtime imports of `ModelReference` and
+`EmbedderRef` from `blocks/`. After the reorg, `blocks/` merges into `ai/`,
+creating a `core/ → ai/` layering violation.
+
+Fix: move `ModelReference` and `EmbedderRef` into `core/` (they're tiny
+types — just a `name: str` wrapper). Or remove the helper methods from
+`Plugin` and let plugins construct refs directly.
+
+**`ai/_base_async.py` → `web/manager/_ports.py`.**
+Imports `find_free_port_sync` — a 15-line stdlib socket utility. After the
+reorg, `web/manager/` is deleted.
+
+Fix: move `find_free_port_sync` to `core/_internal/_ports.py`. It's pure
+stdlib (`socket.bind`), no dependencies.
+
+### After all fixes
+
+The dependency graph is strictly one-directional:
+
+```
+_web/  →  ai/  →  core/
+  └────────────────↗
+```
+
+`core/` has zero imports from `_web/` or `ai/`. Clean layering.
+
+---
+
+## Boundary enforcement
+
+### 1. `__all__` on every public `__init__.py`
+
+```python
+# genkit/__init__.py
+__all__ = [
+    'Genkit', 'Document', 'GenkitError', 'UserFacingError',
+    'GenerateResponse', 'GenerateStreamResponse',
+    'ActionRunContext', 'ToolRunContext', 'Plugin',
+    # ... ~50 symbols
+]
+
+# genkit/ai/__init__.py
+__all__ = [
+    'Genkit', 'ExecutablePrompt', 'GenerateStreamResponse',
+    'Document', 'RankedDocument', 'ToolRunContext',
+    # ... AI domain types + plugin helpers
+]
+
+# genkit/core/__init__.py
+__all__ = [
+    'Action', 'ActionRunContext', 'ActionMetadata', 'ActionKind',
+    'GenkitError', 'UserFacingError', 'Plugin', 'FlowWrapper',
+    # ... framework types + plugin helpers
+]
+```
+
+### 2. `import-linter` in CI
+
+```ini
+# .importlinter
+[importlinter]
+root_package = genkit
+
+[importlinter:contract:layers]
+name = Package layers
+type = layers
+layers =
+    genkit._web
+    genkit.ai
+    genkit.core
+
+[importlinter:contract:no-internal-from-plugins]
+name = Plugins must not import _internal
+type = forbidden
+source_modules =
+    genkit.plugins
+forbidden_modules =
+    genkit.ai._internal
+    genkit.core._internal
+```
+
+### 3. `_internal/` convention
+
+This follows Pydantic v2's pattern: everything in `_internal/` can change without
+notice between minor versions, and the public modules re-export what's needed.
+
+---
+
+## File size targets
+
+| File | Current | Target | How |
+|---|---|---|---|
+| `blocks/prompt.py` | 2,446 | ~600 | Split into prompt.py + streaming.py + _prompt_render.py + _prompt_cache.py |
+| `ai/_registry.py` | 1,680 | **0 (deleted)** | define_* functions move to domain files (model.py, retriever.py, etc.). Genkit method stubs absorbed into _genkit.py. File ceases to exist. |
+| `ai/_aio.py` | 1,164 | ~800 | Rename to _genkit.py, extract server startup to _genkit_base.py |
+| `blocks/generate.py` | 1,088 | ~600 | Extract tool loop to _generate.py, keep public generate function |
+| `core/typing.py` | 1,066 | 1,066 | Auto-generated, don't touch. Add DO NOT EDIT header. |
+
+Target: no hand-written file over 800 lines. Matches Go/JS norms.
+
+---
+
+## Migration path
+
+This is a **one-time refactor** with no logic changes, no API changes, no behavior
+changes. The diff is:
+
+1. Move files
+2. Update import paths (find-and-replace across plugins)
+3. Add `__all__` to `__init__.py` files
+4. Split 3 oversized files
+
+### Order of operations
+
+1. **Add `__all__` to existing `__init__.py` files** — zero-risk, clarifies
+   public API immediately. Can land as its own PR.
+
+2. **Merge `blocks/` into `ai/`** — the big structural move. Update all
+   import paths. One PR.
+
+3. **Move `FlowWrapper`, `BackgroundAction`, `DynamicActionProvider` to `core/`** —
+   small cross-language alignment fix. One PR.
+
+4. **Kill orphans** — delete `aio/`, `lang/`, `types/`, move root files.
+   One PR.
+
+5. **Create `_internal/` directories** — move implementation files behind
+   the boundary. Update internal imports. One PR.
+
+6. **Rename `web/` → `_web/`, move `core/reflection.py` into `_web/`,
+   delete `core/flows.py`** — breaks the `core/ ↔ web/` circular
+   dependency and removes the unused flows server. One PR.
+
+7. **Split oversized files** — `prompt.py`, `_registry.py`, `generate.py`.
+   One PR each.
+
+8. **Add `import-linter` to CI** — one PR, enforces the new structure going
+   forward.
+
+Each step is independently shippable and independently revertible.
+
+---
+
+## What we're NOT doing
+
+- **Not changing the public API.** `from genkit import Genkit` still works.
+  All public symbols stay accessible from `genkit.__init__`.
+
+- **Not splitting into multiple PyPI packages.** `genkit` stays as one
+  installable package. `ai/` and `core/` are internal organization.
+
+- **Not changing runtime behavior.** This is purely a code organization refactor.
+
+- **Not touching `core/typing.py`.** Auto-generated schema types stay as-is.
+
+- **Not touching plugins' public APIs.** Plugins' `__init__.py` exports
+  are unchanged. Only their internal imports from `genkit.*` are updated.
diff --git a/py/docs/python_type_audit_checklist.md b/py/docs/python_type_audit_checklist.md
new file mode 100644
index 0000000000..a9e219c8ab
--- /dev/null
+++ b/py/docs/python_type_audit_checklist.md
@@ -0,0 +1,179 @@
+# Hand-Written Type Audit — Checklist
+
+121 classes total (119 audited + 2 private: `_LatencyTrackable`, `_ModelCopyable`).
+
+Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md),
+[python_class_audits.md](./python_class_audits.md),
+[GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md).
+
+---
+
+## Must fix (5) — significant design rework
+
+- [ ] `Genkit` — god object, 38 methods, positional args, `generate_stream`
+      returns raw tuple, `define_prompt` has 23 params. Audited in
+      GENKIT_CLASS_DESIGN.md.
+
+- [ ] `ExecutablePrompt` — `opts: TypedDict` kills IDE autocomplete on
+      `__call__`, `stream`, `render`. 220-line `render()`. Fragile
+      `_ensure_resolved()` copies 20 fields. Audited in python_class_audits.md §2.
+
+- [ ] `GenerateStreamResponse` — not used by `Genkit.generate_stream()` (returns
+      raw tuple instead), not directly iterable (no `__aiter__`), lives in
+      wrong module (`blocks/prompt.py`). Audited in python_class_audits.md §5.
+
+- [ ] `GenerateResponseWrapper` — `assert_valid()`/`assert_valid_schema()` are
+      empty placeholders, missing `reasoning`/`media`/`data`/`model` properties
+      that JS has. Audited in python_class_audits.md §3.
+
+- [ ] `ToolInterruptError` — extends `Exception` not `GenkitError` (blocked on
+      #4346), `str(err)` returns empty string, `metadata` not keyword-only.
+      Audited in python_class_audits.md §6.
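+
+On the `ToolInterruptError` item above: an empty `str(err)` is the usual
+result when an exception subclass takes only keyword arguments and never
+forwards a message to `Exception.__init__`. The sketch below uses two
+hypothetical stand-in classes (`BareInterrupt`, `DescribedInterrupt`), not the
+SDK's actual implementation, to show the symptom and one possible fix.
+
+```python
+class BareInterrupt(Exception):
+    """Hypothetical stand-in: stores metadata but passes no message upward."""
+
+    def __init__(self, *, metadata: dict | None = None) -> None:
+        self.metadata = metadata or {}
+
+
+class DescribedInterrupt(Exception):
+    """Same shape, but forwards a readable message to Exception."""
+
+    def __init__(self, *, metadata: dict | None = None) -> None:
+        self.metadata = metadata or {}
+        super().__init__(f"tool interrupted (metadata={self.metadata!r})")
+
+
+bare = BareInterrupt(metadata={"step": 2})
+described = DescribedInterrupt(metadata={"step": 2})
+
+print(repr(str(bare)))  # empty: no positional args reached Exception
+print(str(described))   # carries the message passed to super().__init__()
+```
+
+Making `metadata` keyword-only, as the item suggests, is what leaves `args`
+empty here, so the two fixes probably need to land together.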
+
+---
+
+## Should fix (29) — non-trivial changes needed
+
+- [ ] `UserFacingError` — positional args, should be keyword-only.
+- [ ] `GenkitError` — two serialization methods + standalone function, consolidate.
+- [ ] `Document` — `.text()` is a method, not property. Inconsistent with every
+      other `.text` in the SDK. Breaking change. Audited in python_class_audits.md §4.
+- [ ] `FlowWrapper` — `stream()` returns tuple, should return `GenerateStreamResponse`.
+- [ ] `GenerationResponseError` — positional args, should be keyword-only.
+- [ ] `Plugin` — has `model()`/`embedder()` convenience but not `retriever()` etc.
+      Causes layering violation (circular import).
+- [ ] `TelemetryServerSpanExporter` — creates new `httpx.Client()` per `export()`
+      call (no connection pooling), ignores HTTP errors.
+- [ ] `ServerSpec` — confusingly similar name to `ServerConfig` (being deleted).
+      Rename to `ReflectionServerConfig` or similar.
+- [ ] `ModelReference` / `EmbedderRef` / `RetrieverRef` / `IndexerRef` /
+      `RerankerRef` / `EvaluatorRef` — wildly inconsistent shapes. `ModelReference`
+      allows extras, `EvaluatorRef` uses different fields, `EmbedderRef` missing
+      `info`. See python_beta_type_design.md §20.
+- [ ] `GenerateResponseChunkWrapper` / `MessageWrapper` — missing `reasoning`,
+      `media`, `data` properties that JS has. See python_beta_type_design.md §21.
+- [ ] `Action` — mutable `input_schema`/`output_schema` (should be immutable),
+      `on_chunk`/`on_trace_start` callbacks on public API (Python uses `async for`),
+      `run()` should be deleted, `arun()`/`arun_raw()` confusing, no `__call__`.
+      Audited in python_class_audits.md §1.
+- [ ] `ActionRunContext` / `ToolRunContext` — missing trace_id/span_id (JS
+      provides), `ToolRunContext` accesses parent private fields.
+- [ ] `FormatDef` — uses `@abc.abstractmethod` but doesn't extend `abc.ABC`.
+      One-line fix.
+- [ ] `GenkitSpan` — `__getattr__` proxy makes type invisible to type checkers.
+      Low priority.
+- [ ] `Logger` — 20-method Protocol. `warn`/`warning` redundant alias,
+      `fatal`/`critical` redundant alias. JS Logger has 7 methods.
+- [ ] `AdjustingTraceExporter` — belongs in telemetry plugin, not core SDK.
+      JS equivalent lives in `js/plugins/google-cloud/`.
+- [ ] `RealtimeSpanProcessor` — belongs in telemetry plugin, not core SDK.
+- [ ] `RedactedSpan` — used exclusively by `AdjustingTraceExporter`, moves
+      with it to telemetry plugin.
+- [ ] `GablorkenInput` — test fixture exported publicly in `__all__`. Should
+      be private or inlined into `test_models()`.
+- [ ] `PromptCache` — plain class with 3 optional fields, not even a dataclass.
+      Fold into `ExecutablePrompt` as private attributes.
+- [ ] `RerankerParams` — misnamed. Has `reranker`, `query`, `documents` — this
+      is action input, should be `RerankerRequest` for consistency with
+      `RetrieverRequest`/`IndexerRequest`.
+- [ ] `ResumeOptions` — TypedDict (same autocomplete-killer as
+      `PromptGenerateOptions`). Convert to dataclass or flatten when
+      `PromptGenerateOptions` is replaced with kwargs.
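+
+On the `FormatDef` item above: `@abc.abstractmethod` is only enforced when the
+class uses the `ABCMeta` metaclass, most simply by inheriting from `abc.ABC`.
+The sketch below uses hypothetical stand-in classes (`LooseFormat`,
+`StrictFormat`), not the SDK's `FormatDef`, to show what the one-line fix
+changes.
+
+```python
+import abc
+
+
+class LooseFormat:
+    """Hypothetical stand-in: abstract method, but no ABC base class."""
+
+    @abc.abstractmethod
+    def handle(self, text: str) -> str: ...
+
+
+class StrictFormat(abc.ABC):
+    """Same method, with the ABC base that makes the decorator effective."""
+
+    @abc.abstractmethod
+    def handle(self, text: str) -> str: ...
+
+
+# Succeeds: without ABCMeta nothing checks for unimplemented methods.
+loose = LooseFormat()
+
+# Fails: abc.ABC refuses to instantiate a class with abstract methods.
+try:
+    StrictFormat()
+    strict_rejected = False
+except TypeError:
+    strict_rejected = True
+```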
+
+---
+
+## Delete (33) — remove entirely
+
+**Replaced by kwargs on `define_*` methods:**
+- [ ] `EmbedderOptions` — flatten to kwargs on `define_embedder()`
+- [ ] `RetrieverOptions` — flatten to kwargs on `define_retriever()`
+- [ ] `IndexerOptions` — flatten to kwargs on `define_indexer()`
+- [ ] `RerankerOptions` — flatten to kwargs on `define_reranker()`
+- [ ] `ResourceOptions` — `define_resource()` already has the same kwargs
+- [ ] `DapConfig` — flatten to kwargs on `define_dynamic_action_provider()`
+- [ ] `DapCacheConfig` — one-field dataclass (`ttl_millis`), fold into parent
+- [ ] `DefineBackgroundModelOptions` — flatten to kwargs on `define_background_model()`
+- [ ] `SimpleRetrieverOptions` — flatten to kwargs on `define_simple_retriever()`
+
+**Replaced by flat kwargs on prompt methods:**
+- [ ] `PromptGenerateOptions` — 17-field TypedDict, THE autocomplete-killer
+- [ ] `OutputOptions` — dies when `PromptGenerateOptions` is replaced
+- [ ] `OutputConfigDict` — dies when `Output[T]` is replaced
+
+**Dead code / unused:**
+- [ ] `Input` / `Output` — replace with `output_schema` kwarg + `@overload`
+- [ ] `Retriever` — dead code, never instantiated
+- [ ] `ToolRequestLike` — used in 1 place as cast target, delete
+- [ ] `ResourceFn` — dead weight, only used in union with `Callable[..., ...]`
+- [ ] `MatchableAction` — code smell, `Action` already has `.matches` field
+- [ ] `ASGIApp` — defined but never used as type annotation
+- [ ] `ServerManagerProtocol` — lives in `web/manager/` being deleted
+
+**Error wire formats (consolidate into one `ErrorResponse`):**
+- [ ] `HttpErrorWireFormat` — dies with `core/flows.py`
+- [ ] `GenkitReflectionApiDetailsWireFormat` — collapse into `ErrorResponse`
+- [ ] `GenkitReflectionApiErrorWireFormat` — collapse into `ErrorResponse`
+
+**Server over-engineering (`web/manager/` deleted — 15-line problem):**
+- [ ] `ServerManager`
+- [ ] `ASGIServerAdapter`
+- [ ] `UvicornAdapter`
+- [ ] `GranianAdapter`
+- [ ] `Server`
+- [ ] `ServerConfig`
+- [ ] `ServerLifecycle`
+- [ ] `AbstractBaseServer`
+- [ ] `SignalHandler`
+- [ ] `ServerType`
+
+---
+
+## Clean (52) — no changes needed
+
+**User-facing types:**
+- [x] `RankedDocument` — `Document` subclass with `.score`. All 3 SDKs.
+- [x] `EmbedderSupports` — value type for embedder capabilities. All 3 SDKs.
+- [x] `Formatter` / `FormatterConfig` — format system base types. All 3 SDKs.
+- [x] `ActionMetadata` — 9-field data bag for action registration. All 3 SDKs.
+- [x] `GenerationCommonConfig` — extends schema type with `api_key`. ~36 files.
+- [x] `ContextMetadata` / `RequestData` — request-level context for web frameworks.
+- [x] `BackgroundAction` — wraps start/check/cancel for long-running ops. All 3 SDKs.
+- [x] `DynamicActionProvider` — runtime action discovery (MCP). All 3 SDKs.
+- [x] `Channel` — async iteration channel for streaming. JS has same.
+- [x] `Registry` — central action/plugin/schema registry. All 3 SDKs.
+- [x] `Embedder` — wraps embedder `Action` with `embed()`. Go has same.
+
+**Enums:**
+- [x] `ActionKind` — StrEnum, 17 action types.
+- [x] `ActionMetadataKey` — StrEnum, 3 keys.
+- [x] `StatusCodes` — IntEnum, gRPC-style status codes.
+- [x] `EnvVar` — StrEnum, `GENKIT_ENV`.
+- [x] `GenkitEnvironment` — StrEnum, `DEV`/`PROD`.
+- [x] `DeprecationStatus` — Enum, 3 values. Python-only.
+
+**Internal plumbing:**
+- [x] `ActionResponse` — action result wrapper. All 3 SDKs.
+- [x] `Status` — status with code + message.
+- [x] `ResourceInput` / `ResourceOutput` — action I/O for resources. All 3 SDKs.
+- [x] `RetrieverRequest` / `RetrieverSupports` / `RetrieverInfo` — retriever wire types.
+- [x] `IndexerRequest` / `IndexerInfo` — indexer wire types.
+- [x] `RerankerSupports` / `RerankerInfo` — reranker wire types.
+- [x] `PartCounts` — token counting helper.
+- [x] `PromptConfig` — BaseModel stored with prompt action. Internal.
+- [x] `ExtractItemsResult` — JSON extraction helper. Python-only.
+- [x] `DeprecationInfo` — deprecation metadata. Python-only.
+
+**Format implementations (all subclass FormatDef):**
+- [x] `TextFormat` / `JsonFormat` / `JsonlFormat` / `EnumFormat` / `ArrayFormat`
+
+**Runtime/testing:**
+- [x] `RuntimeManager` — writes `.genkit/runtimes/` for Dev UI discovery.
+- [x] `SimpleCache` — thread-safe TTL cache for DAP. Internal to `DynamicActionProvider`.
+- [x] `ProgrammableModel` / `EchoModel` / `StaticResponseModel` — test doubles.
+- [x] `SkipTestError` / `ModelTestError` / `ModelTestResult` / `TestCaseReport` — test infra.
+
+**Other:**
+- [x] `UnstableApiError` — `GenkitError` subclass for beta API gating. Matches JS.
+- [x] `DeprecatedEnumMeta` — metaclass for enum deprecation warnings. Python-only.
+- [x] `GenkitBase` / `GenkitRegistry` — `Genkit` class hierarchy. Audited in
+      GENKIT_CLASS_DESIGN.md.
diff --git a/py/docs/python_type_audit_details.md b/py/docs/python_type_audit_details.md
new file mode 100644
index 0000000000..9c51b7858c
--- /dev/null
+++ b/py/docs/python_type_audit_details.md
@@ -0,0 +1,996 @@
+# Class Audits — Action, ExecutablePrompt, GenerateResponseWrapper, Document
+
+Method-by-method audit of the four most important classes users interact with
+(after `Genkit` itself, which is covered in [GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md)).
+
+---
+
+## 1. `Action`
+
+`core/action/_action.py` — the foundational type. Everything in Genkit is an
+Action. ~65 files reference it.
+
+### Class shape
+
+```
+Action(Generic[InputT, OutputT, ChunkT])
+  ├── Properties (read-only): kind, name, description, metadata, input_type, is_async
+  ├── Properties (read/write!): input_schema, output_schema
+  ├── run(input, on_chunk, context, _telemetry_labels)         → ActionResponse
+  ├── arun(input, on_chunk, context, on_trace_start, ...)      → ActionResponse
+  ├── arun_raw(raw_input, on_chunk, context, on_trace_start, ...) → ActionResponse
+  └── stream(input, context, telemetry_labels, timeout)        → tuple[AsyncIterator, Future]
+```
+
+### JS comparison
+
+JS `Action` is a **type alias**, not a class — it's a callable function with
+attached properties:
+
+```typescript
+type Action<I, O, S> = ((input?, options?) => Promise<O>) & {
+  __action: ActionMetadata<I, O, S>;
+  run(input?, options?): Promise<ActionResult<O>>;
+  stream(input?, opts?): StreamingResponse<O, S>;
+};
+```
+
+Key differences:
+- **JS actions are callable** — `await action(input)` works. Python requires
+  `await action.arun(input)`.
+- **JS has one `run()` method** that returns `ActionResult`. Python has three:
+  `run()` (sync), `arun()` (async), `arun_raw()` (async + validation).
+- **JS `stream()` returns `StreamingResponse`** with `.stream` and `.response`
+  properties. Python returns a raw tuple.
+
+### `run()`
+
+```python
+# Today
+def run(
+    self,
+    input: InputT | None = None,
+    on_chunk: StreamingCallback | None = None,
+    context: dict[str, object] | None = None,
+    _telemetry_labels: dict[str, object] | None = None,
+) -> ActionResponse[OutputT]
+```
+
+**Verdict: delete entirely.** Only exists to support sync flow/tool wrappers
+(2 callsites in `_registry.py`). The framework is async-first — sync user
+functions should be auto-wrapped with `ensure_async()` at registration time
+instead of maintaining a parallel sync execution path. JS and Go don't have
+this because they don't have separate sync/async function types.
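+
+A minimal sketch of the auto-wrapping idea, assuming `ensure_async()` takes a
+callable and returns an awaitable-returning callable. The body here is
+illustrative rather than the SDK's implementation; running sync code in a
+worker thread via `asyncio.to_thread` is one possible choice, and calling the
+function inline is the simpler alternative.
+
+```python
+import asyncio
+import functools
+import inspect
+from collections.abc import Awaitable, Callable
+from typing import Any
+
+
+def ensure_async(fn: Callable[..., Any]) -> Callable[..., Awaitable[Any]]:
+    """Return fn unchanged if it is already async, otherwise wrap it."""
+    if inspect.iscoroutinefunction(fn):
+        return fn
+
+    @functools.wraps(fn)
+    async def wrapper(*args: Any, **kwargs: Any) -> Any:
+        # Assumption: sync user code runs in a worker thread so that it
+        # cannot block the event loop.
+        return await asyncio.to_thread(fn, *args, **kwargs)
+
+    return wrapper
+
+
+def sync_tool(x: int) -> int:
+    return x + 1
+
+
+async def async_tool(x: int) -> int:
+    return x * 2
+
+
+async def main() -> list[int]:
+    # After wrapping, both kinds of function are awaited the same way.
+    return [await ensure_async(sync_tool)(1), await ensure_async(async_tool)(2)]
+
+
+results = asyncio.run(main())
+```
+
+With wrapping done once at registration time, the execution path only ever
+sees coroutine functions, which is what would make a separate `run()`
+unnecessary.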
+
+### `arun()`
+
+```python
+# Today
+async def arun(
+    self,
+    input: InputT | None = None,
+    on_chunk: StreamingCallback | None = None,
+    context: dict[str, object] | None = None,
+    on_trace_start: Callable[[str, str], None] | None = None,
+    _telemetry_labels: dict[str, object] | None = None,
+) -> ActionResponse[OutputT]
+```
+
+**Issues:**
+1. **`on_chunk` is a JS callback pattern leaking into the public API.** Python's
+   streaming convention is `async for` (async iterators), not callbacks. `on_chunk`
+   is internal plumbing — `Action.stream()` already wraps it into a `Channel`
+   (async iterator) for users. The only external caller passing `on_chunk` directly
+   is `core/reflection.py` (Dev UI server). Regular users should never see this
+   parameter; they should use `stream()` instead.
+2. **`on_trace_start` is internal Dev UI plumbing** — only called by
+   `core/reflection.py` to grab trace/span IDs for the Dev UI response. No user
+   ever passes this. Shouldn't be on the public method.
+3. `_telemetry_labels` is underscore-prefixed (private by convention) yet sits
+   on a public method signature.
+4. `input` is optional but many actions require it — fails at runtime.
+
+### `arun_raw()`
+
+```python
+# Today
+async def arun_raw(
+    self,
+    raw_input: InputT | None = None,
+    on_chunk: StreamingCallback | None = None,
+    context: dict[str, object] | None = None,
+    on_trace_start: Callable[[str, str], None] | None = None,
+    telemetry_labels: dict[str, object] | None = None,
+) -> ActionResponse[OutputT]
+```
+
+**Issues:**
+1. **Confusing name** — "raw" means "I'll validate for you via Pydantic." But
+   `arun()` does NOT validate. So `arun_raw` does more work, not less. The name
+   implies the opposite.
+2. The only difference from `arun()` is Pydantic validation on input. This
+   should be a flag, not a separate method.
+3. Same `on_chunk`/`on_trace_start` callback leakage as `arun()`.
+4. `telemetry_labels` has no underscore here but has underscore in `arun()`.
+   Inconsistent.
+
+### `stream()`
+
+```python
+# Today
+def stream(
+    self,
+    input: InputT | None = None,
+    context: dict[str, object] | None = None,
+    telemetry_labels: dict[str, object] | None = None,
+    timeout: float | None = None,
+) -> tuple[AsyncIterator[ChunkT], asyncio.Future[ActionResponse[OutputT]]]
+```
+
+**Issues:**
+1. **Returns a tuple** — not directly iterable. Must destructure:
+   `chunks, future = action.stream(input)`. JS returns a `StreamingResponse`
+   with `.stream` and `.response`.
+2. **Not async** — synchronously creates a Channel and kicks off a task. This
+   is fine mechanically but unexpected (an async operation that doesn't use `await`).
+3. Creates a redundant `result_future` that wraps `stream.closed` — why not
+   just expose `stream.closed` directly?
+
+### `input_schema` / `output_schema` (property setters)
+
+```python
+@input_schema.setter
+def input_schema(self, value: dict[str, object]) -> None:
+    self._input_schema = value
+    self._metadata[ActionMetadataKey.INPUT_KEY] = value
+
+@output_schema.setter
+def output_schema(self, value: dict[str, object]) -> None:
+    self._output_schema = value
+    self._metadata[ActionMetadataKey.OUTPUT_KEY] = value
+```
+
+**Issues:**
+1. **Actions should be immutable after construction.** Mutable schemas invite
+   subtle bugs — if someone stores a reference to `action.input_schema` and
+   the schema later changes, they have stale data.
+2. The setters exist for lazy-loaded prompts that set schema after registration.
+   This is a hack around the construction order — the prompt system should pass
+   schemas at construction time, or the action should accept a schema-factory.
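+
+One way the schema-factory option could look, sketched with a hypothetical
+`LazySchemaAction` class and `schema_factory` parameter (neither exists in the
+SDK today). The schema is built on first access and exposed through a property
+with no setter.
+
+```python
+from collections.abc import Callable
+from functools import cached_property
+
+
+class LazySchemaAction:
+    """Hypothetical stand-in for Action with a read-only, lazy schema."""
+
+    def __init__(self, name: str, schema_factory: Callable[[], dict]) -> None:
+        self._name = name
+        self._schema_factory = schema_factory
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @cached_property
+    def _resolved_schema(self) -> dict:
+        # Runs once, on first access (e.g. after a prompt file has loaded).
+        return self._schema_factory()
+
+    @property
+    def input_schema(self) -> dict:
+        # No setter is defined, so assignment raises AttributeError.
+        return self._resolved_schema
+
+
+action = LazySchemaAction("greet", lambda: {"type": "string"})
+
+try:
+    action.input_schema = {"type": "number"}
+    assignment_blocked = False
+except AttributeError:
+    assignment_blocked = True
+```
+
+The returned dict is still mutable in this sketch; full immutability would
+also need a copy or a read-only mapping.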
+
+### Proposed `Action` changes
+
+```python
+class Action(Generic[InputT, OutputT, ChunkT]):
+    # All properties read-only (remove setters)
+    kind: ActionKind       # read-only
+    name: str              # read-only
+    input_schema: dict     # read-only
+    output_schema: dict    # read-only
+
+    async def __call__(
+        self,
+        input: InputT | None = None,
+        *,
+        context: dict[str, object] | None = None,
+    ) -> ActionResponse[OutputT]:
+        """Primary execution method. Validates input, runs async."""
+
+    def stream(
+        self,
+        input: InputT | None = None,
+        *,
+        context: dict[str, object] | None = None,
+        timeout: float | None = None,
+    ) -> StreamResponse[OutputT, ChunkT]:
+        """Returns a StreamResponse (iterable + awaitable response)."""
+
+    def __repr__(self) -> str:
+        return f"Action(kind={self.kind}, name={self.name!r})"
+```
+
+**Removed:** `run()` (delete — only 2 internal callsites in `_registry.py`,
+rewrite to use `__call__` with sync-async bridging), `arun()` (replaced by
+`__call__`), `arun_raw()` (merge validation into `__call__`), schema setters,
+`on_chunk`/`on_trace_start` from public signatures.
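+
+The proposal does not spell out `StreamResponse`. A minimal sketch of one
+possible shape follows, assuming it wraps an async iterator of chunks and a
+future for the final result; the constructor arguments and the `demo()`
+producer are hypothetical.
+
+```python
+import asyncio
+from collections.abc import AsyncIterator
+from typing import Generic, TypeVar
+
+OutputT = TypeVar("OutputT")
+ChunkT = TypeVar("ChunkT")
+
+
+class StreamResponse(Generic[OutputT, ChunkT]):
+    """Hypothetical shape: iterate for chunks, await .response for the result."""
+
+    def __init__(
+        self,
+        chunks: AsyncIterator[ChunkT],
+        response: "asyncio.Future[OutputT]",
+    ) -> None:
+        self._chunks = chunks
+        self._response = response
+
+    def __aiter__(self) -> AsyncIterator[ChunkT]:
+        return self._chunks
+
+    @property
+    def response(self) -> "asyncio.Future[OutputT]":
+        return self._response
+
+
+async def demo() -> tuple[list[str], str]:
+    done = asyncio.get_running_loop().create_future()
+
+    async def produce() -> AsyncIterator[str]:
+        for word in ("hello", "world"):
+            yield word
+        done.set_result("hello world")  # final result, set after last chunk
+
+    result = StreamResponse(produce(), done)
+    chunks = [chunk async for chunk in result]  # no tuple destructuring
+    return chunks, await result.response
+
+
+chunks, final = asyncio.run(demo())
+```
+
+Callers that only want the final value can skip iteration and await
+`result.response`, provided something else drains the chunks.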
+
+---
+
+## 2. `ExecutablePrompt`
+
+`blocks/prompt.py` — returned by `define_prompt()` and `prompt()`. The primary
+way users work with prompts.
+
+### Class shape
+
+```
+ExecutablePrompt(Generic[InputT, OutputT])
+  ├── ref (property)                                → dict
+  ├── __call__(input, opts: TypedDict | None)       → GenerateResponseWrapper
+  ├── stream(input, opts, *, timeout)               → GenerateStreamResponse
+  ├── render(input, opts)                           → GenerateActionOptions
+  ├── as_tool()                                     → Action
+  └── _ensure_resolved()                            → None (lazy loading)
+```
+
+25 constructor params. The constructor stores every prompt option as an instance
+field — model, config, system, prompt, messages, output_format,
+output_content_type, output_instructions, output_schema, output_constrained,
+max_turns, return_tool_requests, metadata, tools, tool_choice, use, docs,
+resources, plus internal fields (_name, _ns, _prompt_action, _cache_prompt).
+
+### JS comparison
+
+JS `ExecutablePrompt` is an **interface**:
+
+```typescript
+interface ExecutablePrompt<I, O, CustomOptions> {
+  ref: { name: string; metadata?: Record<string, any> };
+  (input?, opts?): Promise<GenerateResponse<O>>;
+  stream(input?, opts?): GenerateStreamResponse<O>;
+  render(input?, opts?): Promise<GenerateOptions<O>>;
+  asTool(): Promise<ToolAction>;
+}
+```
+
+Same methods, same shape. The difference is in how `opts` works.
+
+### `__call__()`
+
+```python
+# Today
+async def __call__(
+    self,
+    input: InputT | None = None,
+    opts: PromptGenerateOptions | None = None,
+) -> GenerateResponseWrapper[OutputT]
+```
+
+**Issues:**
+1. **`opts` is a TypedDict** — `PromptGenerateOptions` is a dict-like type.
+   This kills IDE autocomplete. Users must know the TypedDict keys by heart.
+   Compare with kwargs:
+   ```python
+   # TypedDict (today) — no autocomplete on keys
+   await prompt(input, opts={'model': 'gemini-2.0-flash', 'config': {...}})
+
+   # Kwargs (proposed) — full autocomplete
+   await prompt(input, model='gemini-2.0-flash', config={...})
+   ```
+2. **JS does the same thing** — `opts` is an object parameter there too. But
+   TypeScript has much better autocomplete for object literals. Python
+   TypedDicts don't get the same treatment from IDEs.
+
+### `stream()`
+
+```python
+# Today
+def stream(
+    self,
+    input: InputT | None = None,
+    opts: PromptGenerateOptions | None = None,
+    *,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[OutputT]
+```
+
+**Clean.** Returns `GenerateStreamResponse` (not a tuple). This is correct —
+it's the one place streaming is done right. The irony is that
+`Genkit.generate_stream()` doesn't use this type but `ExecutablePrompt.stream()`
+does.
+
+### `render()`
+
+```python
+# Today
+async def render(
+    self,
+    input: InputT | dict[str, Any] | None = None,
+    opts: PromptGenerateOptions | None = None,
+) -> GenerateActionOptions
+```
+
+**Issues:**
+1. **220 lines of merging logic** — the method body is enormous. It merges
+   config, model, tools, output options, tool_choice, return_tool_requests,
+   max_turns, metadata, docs, resources, and messages from three sources
+   (prompt defaults, opts overrides, and input rendering). This is the
+   complexity center of the entire prompt system.
+2. `input` accepts `InputT | dict[str, Any]` — mixed typing. Should be one
+   or the other. The method body has 4 type-specific branches plus a fallback
+   cast (None, dict, Pydantic v2, Pydantic v1).
+
+### `as_tool()`
+
+```python
+# Today
+async def as_tool(self) -> Action
+```
+
+**Clean.** Simple lookup. Minor naming difference from JS (`asTool`).
+
+### `_ensure_resolved()`
+
+```python
+async def _ensure_resolved(self) -> None
+```
+
+**Issues:**
+1. **Lazy loading that can fail** — if the prompt was created via `ai.prompt(name)`,
+   it's unresolved until first use. The first `__call__`, `stream`, `render`, or
+   `as_tool` triggers resolution. If the prompt file doesn't exist, the error
+   appears at call time, not at construction time.
+2. **Copies all fields from resolved prompt** — 20 field assignments. If a new
+   field is added to `ExecutablePrompt`, someone must remember to add it here too.
+   This is fragile.
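+
+One way to remove the hand-maintained assignment list, sketched under the
+assumption that the copied options could live in a single dataclass. The
+`PromptState` class and `adopt_state()` helper are hypothetical and list only
+three of the fields.
+
+```python
+from dataclasses import dataclass, fields
+
+
+@dataclass
+class PromptState:
+    """Hypothetical bundle of the options a resolved prompt carries."""
+
+    model: str | None = None
+    system: str | None = None
+    max_turns: int | None = None
+
+
+def adopt_state(target: PromptState, resolved: PromptState) -> None:
+    # Iterating over fields() means a newly added field is copied
+    # automatically; there is no per-field assignment to forget.
+    for field in fields(resolved):
+        setattr(target, field.name, getattr(resolved, field.name))
+
+
+placeholder = PromptState()
+adopt_state(placeholder, PromptState(model="gemini-2.0-flash", max_turns=3))
+```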
+
+### Proposed `ExecutablePrompt` changes
+
+```python
+class Prompt(Generic[InputT, OutputT]):
+    """Renamed from ExecutablePrompt (shorter, clearer)."""
+
+    @property
+    def ref(self) -> PromptRef: ...
+
+    async def __call__(
+        self,
+        input: InputT | None = None,
+        *,
+        model: str | None = None,
+        config: dict | GenerationCommonConfig | None = None,
+        tools: list[str] | None = None,
+        tool_choice: ToolChoice | None = None,
+        return_tool_requests: bool | None = None,
+        max_turns: int | None = None,
+        context: dict[str, object] | None = None,
+        output_schema: type | None = None,
+        output_format: str | None = None,
+        docs: list[DocumentData] | None = None,
+    ) -> GenerateResponse[OutputT]:
+        """Execute the prompt. Flat kwargs instead of opts TypedDict."""
+
+    def stream(
+        self,
+        input: InputT | None = None,
+        *,
+        # same kwargs as __call__
+        timeout: float | None = None,
+    ) -> GenerateStreamResponse[OutputT]: ...
+
+    async def render(
+        self,
+        input: InputT | None = None,
+        *,
+        # same kwargs as __call__
+    ) -> GenerateOptions: ...
+
+    async def as_tool(self) -> Action: ...
+```
+
+**Key changes:**
+- Rename `ExecutablePrompt` → `Prompt` (shorter).
+- Replace `opts: TypedDict` with flat kwargs for IDE autocomplete.
+- Simplify `render()` — extract merging logic into a shared helper.
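+
+A minimal sketch of what a shared merge helper might look like, assuming a
+"later non-None value wins" rule. The `merge_options()` name and the rule are
+illustrative; the real `render()` may need deeper merging for nested values
+such as `config`.
+
+```python
+from typing import Any
+
+
+def merge_options(*layers: dict[str, Any]) -> dict[str, Any]:
+    """Merge option layers left to right; later non-None values win."""
+    merged: dict[str, Any] = {}
+    for layer in layers:
+        for key, value in layer.items():
+            if value is not None:
+                merged[key] = value
+    return merged
+
+
+prompt_defaults = {"model": "gemini-2.0-flash", "max_turns": 5, "tools": None}
+call_overrides = {"model": None, "max_turns": 2, "tools": ["search"]}
+
+# The model comes from the defaults; max_turns and tools from the overrides.
+options = merge_options(prompt_defaults, call_overrides)
+```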
+
+---
+
+## 3. `GenerateResponseWrapper`
+
+`blocks/model.py` — the response users get from `generate()`. The thing they
+interact with most after calling the model.
+
+### Class shape
+
+```
+GenerateResponseWrapper(GenerateResponse, Generic[OutputT])
+  ├── Private: _message_parser, _schema_type
+  ├── message: MessageWrapper | None
+  ├── text (cached_property)                → str
+  ├── output (cached_property)              → OutputT
+  ├── messages (cached_property)            → list[Message]
+  ├── tool_requests (cached_property)       → list[ToolRequestPart]
+  ├── interrupts (cached_property)          → list[ToolRequestPart]
+  ├── assert_valid()                        → None (PLACEHOLDER)
+  └── assert_valid_schema()                 → None (PLACEHOLDER)
+```
+
+### JS comparison
+
+JS `GenerateResponse<O>` has everything Python has, plus:
+
+| Property/Method | JS | Python |
+|---|---|---|
+| `text` | getter | cached_property |
+| `output` | getter | cached_property |
+| `reasoning` | getter | **missing** |
+| `media` | getter | **missing** |
+| `data` | getter | **missing** |
+| `toolRequests` | getter | cached_property |
+| `interrupts` | getter | cached_property |
+| `messages` | getter | cached_property |
+| `model` | field | **missing** |
+| `raw` | field | **missing** |
+| `assertValid()` | **implemented** | **placeholder (TODO)** |
+| `assertValidSchema()` | **implemented** | **placeholder (TODO)** |
+| `isValid()` | method | **missing** |
+| `toJSON()` | method | Pydantic handles it |
+
+### `__init__()`
+
+```python
+# Today
+def __init__(
+    self,
+    response: GenerateResponse,
+    request: GenerateRequest,
+    message_parser: Callable[[MessageWrapper], object] | None = None,
+    schema_type: type[BaseModel] | None = None,
+) -> None
+```
+
+**Issues:**
+1. Wraps a `GenerateResponse` but copies all fields into `super().__init__()`.
+   Could just store the response and delegate. The copy-and-reconstruct
+   pattern is fragile — if `GenerateResponse` adds a field, this breaks.
+2. `message_parser` and `schema_type` are internal — users never pass these.
+   They should be keyword-only or prefixed.
+
+### `assert_valid()` / `assert_valid_schema()`
+
+```python
+def assert_valid(self) -> None:
+    # TODO(#4343): implement
+    pass
+
+def assert_valid_schema(self) -> None:
+    # TODO(#4343): implement
+    pass
+```
+
+**Issue:** Empty placeholders since initial implementation. JS has these
+fully implemented — they check for empty responses, missing messages,
+malformed content, and schema violations. These are important for production
+use — without them, users can't validate responses programmatically.
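+
+A sketch of how the throwing and non-throwing variants could relate, using a
+hypothetical `SketchResponse` stand-in and `InvalidResponseError` type. It
+covers only two of the checks listed above (missing message, empty content)
+and is not a port of the JS implementation.
+
+```python
+from dataclasses import dataclass, field
+
+
+class InvalidResponseError(Exception):
+    """Hypothetical error type for this sketch."""
+
+
+@dataclass
+class SketchResponse:
+    """Hypothetical stand-in for the response wrapper."""
+
+    content: list[str] = field(default_factory=list)
+    has_message: bool = True
+
+    def assert_valid(self) -> None:
+        if not self.has_message:
+            raise InvalidResponseError("response has no message")
+        if not self.content:
+            raise InvalidResponseError("message has no content")
+
+    def is_valid(self) -> bool:
+        # Non-throwing variant built on top of assert_valid().
+        try:
+            self.assert_valid()
+        except InvalidResponseError:
+            return False
+        return True
+
+
+ok = SketchResponse(content=["hi"]).is_valid()
+empty = SketchResponse().is_valid()
+```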
+
+### `messages`
+
+```python
+@cached_property
+def messages(self) -> list[Message]:
+    if self.message is None:
+        return list(self.request.messages) if self.request else []
+    return [
+        *(self.request.messages if self.request else []),
+        self.message._original_message,  # private field access!
+    ]
+```
+
+**Issue:** Accesses `self.message._original_message` (private field on
+`MessageWrapper`). Should expose a public method on `MessageWrapper` for this,
+like `.unwrap()` or `.to_message()`.
+
+### `output`
+
+```python
+@cached_property
+def output(self) -> OutputT:
+    if self._message_parser and self.message is not None:
+        parsed = self._message_parser(self.message)
+    else:
+        parsed = extract_json(self.text)
+
+    if self._schema_type is not None and parsed is not None and isinstance(parsed, dict):
+        return cast(OutputT, self._schema_type.model_validate(parsed))
+
+    return cast(OutputT, parsed)
+```
+
+**Issue:** Falls back to `extract_json(self.text)` when no parser is set. This
+regex-based JSON extraction is fragile — it scans the text for `{...}` or
+`[...]`. If the model returns markdown with JSON in a code fence, this might
+extract it or might not. JS has the same pattern, so this is cross-language
+consistent at least.
+
+### Proposed `GenerateResponseWrapper` changes
+
+```python
+class GenerateResponse(Generic[OutputT]):
+    """Rename from GenerateResponseWrapper (drop 'Wrapper' suffix)."""
+
+    # Existing
+    text: str                              # property
+    output: OutputT                        # property
+    messages: list[Message]                # property
+    tool_requests: list[ToolRequestPart]   # property
+    interrupts: list[ToolRequestPart]      # property
+
+    # Add (parity with JS)
+    reasoning: str                         # for chain-of-thought models
+    media: MediaPart | None                # first media part
+    data: OutputT | None                   # first data part
+    model: str | None                      # which model generated this
+
+    # Implement
+    def assert_valid(self) -> None: ...            # actually check response
+    def assert_valid_schema(self) -> None: ...     # actually check schema
+    def is_valid(self) -> bool: ...                # non-throwing version
+```
+
+---
+
+## 4. `Document`
+
+`blocks/document.py` — used by every retrieval, embedding, and reranking
+operation. ~25 files reference it.
+
+### Class shape
+
+```
+Document(DocumentData)
+  ├── text()                                → str  (METHOD, not property!)
+  ├── media()                               → list[Media]
+  ├── data()                                → str
+  ├── data_type()                           → str | None
+  ├── get_embedding_documents(embeddings)   → list[Document]
+  ├── from_document_data(data)              → Document  (static)
+  ├── from_text(text, metadata)             → Document  (static)
+  ├── from_media(url, content_type, meta)   → Document  (static)
+  └── from_data(data, data_type, metadata)  → Document  (static)
+```
+
+### JS comparison
+
+| Member | JS | Python |
+|---|---|---|
+| `text` | **getter** (property) | **method** `text()` |
+| `media` | **getter** (property) | **method** `media()` |
+| `data` | **getter** (property) | **method** `data()` |
+| `dataType` | **getter** (property) | **method** `data_type()` |
+| `toJSON()` | method | (Pydantic handles) |
+| `getEmbeddingDocuments()` | method | method |
+| `fromText()` | static | static |
+| `fromMedia()` | static | static |
+| `fromData()` | static | static |
+
+### `text()` — method vs property
+
+```python
+# Today (Python)
+def text(self) -> str:
+    texts = []
+    for p in self.content:
+        part = p.root if hasattr(p, 'root') else p
+        text_val = getattr(part, 'text', None)
+        if isinstance(text_val, str):
+            texts.append(text_val)
+    return ''.join(texts)
+```
+
+```typescript
+// JS — property
+get text(): string {
+    return this.content.map((part) => part.text || '').join('');
+}
+```
+
+**Issues:**
+1. **The single most confusing inconsistency in the SDK.** `MessageWrapper.text`
+   is a property. `GenerateResponseWrapper.text` is a property.
+   `GenerateResponseChunkWrapper.text` is a property. `Document.text()` is a
+   method. Users will write `doc.text` (no parens) and get a bound method
+   reference instead of a string. No error, no warning, just silent bugs.
+2. **Breaking change to fix** — this is a public API. Changing from method to
+   property will break every call site that uses `doc.text()`. But the
+   inconsistency is worse than the break.
+3. Same issue applies to `media()`, `data()`, `data_type()`.
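+
+The silent failure in point 1 can be reproduced with two hypothetical stand-in
+classes (plain classes, not the SDK's Pydantic-based `Document`):
+
+```python
+class MethodDocument:
+    """Hypothetical stand-in for today's Document: text() is a method."""
+
+    def __init__(self, parts: list[str]) -> None:
+        self.parts = parts
+
+    def text(self) -> str:
+        return "".join(self.parts)
+
+
+class PropertyDocument:
+    """Same data, with text exposed as a property."""
+
+    def __init__(self, parts: list[str]) -> None:
+        self.parts = parts
+
+    @property
+    def text(self) -> str:
+        return "".join(self.parts)
+
+
+# Forgetting the parentheses yields a bound method, which is always truthy,
+# so this check passes even though the document has no text.
+looks_non_empty = bool(MethodDocument([]).text)
+
+# Interpolation silently embeds the method's repr instead of the text.
+interpolated = f"{MethodDocument(['hi']).text}"
+
+# With a property, the same attribute access returns the string.
+fixed = PropertyDocument(["hi"]).text
+```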
+
+### `data()` — calls `text()` twice
+
+```python
+def data(self) -> str:
+    if self.text():      # first call — scans all content
+        return self.text()  # second call — scans all content again
+    if self.media():
+        return self.media()[0].url
+    return ''
+```
+
+**Issue:** Scans content twice. Should cache or store the result. Not a
+correctness bug but wasteful. Same issue with `data_type()` calling `text()`
+and `media()` again.
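+
+A sketch of the caching idea with `functools.cached_property`, on a
+hypothetical plain-class stand-in. Whether this carries over unchanged to the
+Pydantic-based `Document` is an assumption, and the cached value goes stale if
+the content is mutated afterwards.
+
+```python
+from functools import cached_property
+
+
+class CachedDocument:
+    """Hypothetical plain-class stand-in for Document."""
+
+    def __init__(self, parts: list[str]) -> None:
+        self.parts = parts
+        self.scans = 0  # counts content scans, for demonstration only
+
+    @cached_property
+    def text(self) -> str:
+        self.scans += 1
+        return "".join(self.parts)
+
+    @property
+    def data(self) -> str:
+        # Reads text twice, like the current data(), but scans only once.
+        if self.text:
+            return self.text
+        return ""
+
+
+doc = CachedDocument(["a", "b"])
+value = doc.data
+```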
+
+### Constructor — deep copies
+
+```python
+def __init__(
+    self,
+    content: list[DocumentPart],
+    metadata: dict[str, Any] | None = None,
+) -> None:
+    doc_content = deepcopy(content)
+    doc_metadata = deepcopy(metadata)
+    super().__init__(content=doc_content, metadata=doc_metadata)
+```
+
+**Issue:** Always deep-copies content and metadata. JS does the same, so
+this is cross-language consistent. But in Python, `deepcopy` on Pydantic
+models is expensive. For large batches (e.g., embedding pipelines with
+thousands of documents), this could be a performance bottleneck.
+
+### Proposed `Document` changes
+
+```python
+class Document(DocumentData):
+    @property
+    def text(self) -> str: ...          # Change to property
+
+    @property
+    def media(self) -> list[Media]: ... # Change to property
+
+    @property
+    def data(self) -> str: ...          # Change to property
+
+    @property
+    def data_type(self) -> str | None: ...  # Change to property
+
+    # Static factories stay the same
+    @staticmethod
+    def from_text(text: str, metadata: dict | None = None) -> Document: ...
+    @staticmethod
+    def from_media(url: str, content_type: str | None = None, ...) -> Document: ...
+    @staticmethod
+    def from_data(data: str, data_type: str | None = None, ...) -> Document: ...
+```
+
+**Key changes:**
+- All accessors become `@property` (or `@cached_property` for perf).
+- Breaking change for `text()`, `media()`, `data()`, `data_type()` call sites.
+- Consider lazy `@cached_property` to avoid scanning content multiple times.
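+
+A minimal sketch of the `@cached_property` option, using hypothetical stand-in
+types (`ToyPart`, `ToyDocument`, and `media_urls` are illustrative names, not the
+SDK's classes). Each accessor scans `content` at most once per instance:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from functools import cached_property
+
+
+@dataclass
+class ToyPart:
+    """Hypothetical stand-in for a document part (not the SDK's DocumentPart)."""
+
+    text: str | None = None
+    media_url: str | None = None
+
+
+@dataclass
+class ToyDocument:
+    """Hypothetical stand-in for Document; only the accessor pattern matters."""
+
+    content: list[ToyPart] = field(default_factory=list)
+
+    @cached_property
+    def text(self) -> str:
+        # Runs once; the result is then stored on the instance.
+        return "".join(p.text for p in self.content if p.text is not None)
+
+    @cached_property
+    def media_urls(self) -> list[str]:
+        return [p.media_url for p in self.content if p.media_url is not None]
+
+    @property
+    def data(self) -> str:
+        # Reuses the cached values instead of rescanning content.
+        return self.text or (self.media_urls[0] if self.media_urls else "")
+
+
+doc = ToyDocument(content=[ToyPart(text="hello "), ToyPart(text="world")])
+print(doc.text)  # property access, no call parentheses
+```
+
+The trade-off: a cached value goes stale if `content` is mutated after the first
+access, so this fits best when documents are treated as immutable after
+construction.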
+
+---
+
+## 5. `GenerateStreamResponse`
+
+**File:** `blocks/prompt.py` (lines 414–539)
+**Base class:** `Generic[OutputT]`
+
+### Class shape
+
+```python
+class GenerateStreamResponse(Generic[OutputT]):
+    _channel: Channel[GenerateResponseChunkWrapper, GenerateResponseWrapper[OutputT]]
+    _response_future: asyncio.Future[GenerateResponseWrapper[OutputT]]
+
+    @property
+    def stream(self) -> AsyncIterable[GenerateResponseChunkWrapper]: ...
+
+    @property
+    def response(self) -> Awaitable[GenerateResponseWrapper[OutputT]]: ...
+```
+
+Two properties, two private fields. That's the entire class.
+
+### JS comparison
+
+```typescript
+// js/ai/src/generate.ts
+export interface GenerateStreamResponse<O extends z.ZodTypeAny = z.ZodTypeAny> {
+  get stream(): AsyncIterable<GenerateResponseChunk>;
+  get response(): Promise<GenerateResponse<O>>;
+}
+```
+
+JS has the identical interface — `stream` + `response`. But critically, JS uses
+this type everywhere: both `generateStream()` and `prompt.stream()` return it.
+
+### Go comparison
+
+Go has no wrapper class. `GenerateStream()` returns `iter.Seq2[*ModelStreamValue, error]`
+— a native Go iterator. Each yielded `ModelStreamValue` has either `.Chunk` (streaming)
+or `.Done == true` with `.Response` (final). Go-idiomatic, no need for a wrapper.
+
+### Issue 1: Not used by `Genkit.generate_stream()`
+
+This is the biggest problem. The main streaming entry point returns a raw tuple:
+
+```python
+# ai/_aio.py
+def generate_stream(self, ...) -> tuple[
+    AsyncIterator[GenerateResponseChunkWrapper],
+    asyncio.Future[GenerateResponseWrapper[Any]],
+]:
+```
+
+But `ExecutablePrompt.stream()` returns `GenerateStreamResponse`. So there are
+two inconsistent streaming APIs in the same SDK:
+
+```python
+# Prompt streaming — nice wrapper
+result = prompt.stream({"topic": "AI"})
+async for chunk in result.stream:
+    print(chunk.text)
+final = await result.response
+
+# Genkit.generate_stream() — raw tuple
+stream, future = ai.generate_stream(prompt="hello")
+async for chunk in stream:
+    print(chunk.text)
+final = await future
+```
+
+JS doesn't have this split — both paths return `GenerateStreamResponse`.
+
+### Issue 2: Not directly iterable
+
+You can't do `async for chunk in result:` — you must access `.stream` first.
+Python convention for iterable wrappers is to implement `__aiter__`:
+
+```python
+# Current — requires .stream access
+async for chunk in result.stream:
+    print(chunk.text)
+
+# Expected Pythonic pattern
+async for chunk in result:
+    print(chunk.text)
+```
+
+JS has the same `.stream` access pattern, but Python's `async for` protocol
+makes direct iteration a stronger convention.
+
+### Issue 3: Lives in wrong module
+
+Defined in `blocks/prompt.py` even though it's a general streaming response type.
+It's not prompt-specific — `Genkit.generate_stream()` should use it too.
+Should live in `blocks/generate.py` or `blocks/model.py`.
+
+### Issue 4: No `__await__`
+
+You can't `await` the response directly on the object:
+
+```python
+# Current — must access .response
+final = await result.response
+
+# Could support direct await
+final = await result
+```
+
+This is a minor convenience but makes the object more Pythonic.
+
+### Issue 5: No `__repr__`
+
+`repr(result)` gives `<GenerateStreamResponse object at 0x...>`. Should show
+useful state (e.g., whether stream is consumed, whether response is resolved).
+
+### Proposed `GenerateStreamResponse` changes
+
+1. **Wire into `Genkit.generate_stream()`** — return `GenerateStreamResponse`
+   instead of raw tuple. This is the highest-priority fix. One streaming API,
+   not two.
+
+2. **Add `__aiter__`** — delegate to `self._channel` so `async for chunk in result:`
+   works directly.
+
+3. **Add `__await__`** — delegate to `self._response_future` so `final = await result`
+   works as a shortcut for `await result.response`.
+
+4. **Move to `blocks/generate.py`** or a shared module — it's not prompt-specific.
+
+5. **Rename to `StreamResponse`** — shorter, matches the pattern of removing
+   redundant prefixes (`GenerateResponseWrapper` → `GenerateResponse`).
+
+6. **Add `__repr__`** — show stream/response state.
+
+After these changes:
+
+```python
+# Unified streaming API
+result = ai.generate_stream(prompt="hello")
+# OR
+result = prompt.stream({"topic": "AI"})
+
+# Direct iteration (no .stream needed)
+async for chunk in result:
+    print(chunk.text)
+
+# Direct await (no .response needed)
+final = await result
+
+# .stream and .response still work for explicit access
+async for chunk in result.stream:
+    print(chunk.text)
+final = await result.response
+```
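+
+A minimal sketch of how the wrapper could support both protocols, assuming it
+holds an async iterator and an `asyncio.Future`. This toy `StreamResponse` uses
+plain `str` chunks and is not the SDK implementation (the real class wraps a
+`Channel`):
+
+```python
+from __future__ import annotations
+
+import asyncio
+from collections.abc import AsyncIterator, Generator
+from typing import Any, Generic, TypeVar
+
+ChunkT = TypeVar("ChunkT")
+ResponseT = TypeVar("ResponseT")
+
+
+class StreamResponse(Generic[ChunkT, ResponseT]):
+    """Toy wrapper: iterable for chunks, awaitable for the final response."""
+
+    def __init__(self, stream: AsyncIterator[ChunkT], response: asyncio.Future[ResponseT]) -> None:
+        self._stream = stream
+        self._response = response
+
+    @property
+    def stream(self) -> AsyncIterator[ChunkT]:
+        return self._stream
+
+    @property
+    def response(self) -> asyncio.Future[ResponseT]:
+        return self._response
+
+    def __aiter__(self) -> AsyncIterator[ChunkT]:
+        # Enables `async for chunk in result:`.
+        return self._stream.__aiter__()
+
+    def __await__(self) -> Generator[Any, None, ResponseT]:
+        # Enables `final = await result`.
+        return self._response.__await__()
+
+    def __repr__(self) -> str:
+        state = "resolved" if self._response.done() else "pending"
+        return f"StreamResponse(response={state})"
+
+
+async def demo() -> tuple[list[str], str]:
+    future: asyncio.Future[str] = asyncio.get_running_loop().create_future()
+
+    async def chunks() -> AsyncIterator[str]:
+        for piece in ("hel", "lo"):
+            yield piece
+        future.set_result("hello")  # final response resolves once the stream ends
+
+    result = StreamResponse(chunks(), future)
+    seen = [chunk async for chunk in result]
+    final = await result
+    return seen, final
+
+
+print(asyncio.run(demo()))
+```
+
+Because `__aiter__` and `__await__` only delegate, `.stream` and `.response`
+keep working unchanged for callers that prefer explicit access.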
+
+---
+
+## 6. `ToolInterruptError`
+
+**File:** `blocks/tools.py` (lines 172–188)
+**Base class:** `Exception`
+
+### Class shape
+
+```python
+class ToolInterruptError(Exception):
+    metadata: dict[str, Any]
+
+    def __init__(self, metadata: dict[str, Any] | None = None) -> None:
+        super().__init__()
+        self.metadata = metadata or {}
+```
+
+One field, one constructor. No methods, no `__str__`, no `__repr__`.
+
+### JS comparison
+
+```typescript
+// js/ai/src/tool.ts
+export class ToolInterruptError extends Error {
+  constructor(readonly metadata?: Record<string, any>) {
+    super();
+    this.name = 'ToolInterruptError';
+  }
+}
+```
+
+Same shape but sets `this.name`. Both extend base error (not framework error).
+JS comment: "It's meant to be caught by the framework, not public API."
+
+### Go comparison
+
+```go
+// go/ai/tools.go (unexported)
+type toolInterruptError struct {
+    Metadata map[string]any
+}
+
+func (e *toolInterruptError) Error() string {
+    if e.Metadata != nil {
+        data, _ := json.MarshalIndent(e.Metadata, "", "  ")
+        return fmt.Sprintf("tool execution interrupted: \n\n%s", string(data))
+    }
+    return "tool execution interrupted"
+}
+
+func IsToolInterruptError(err error) (bool, map[string]any) { ... }
+```
+
+Go is the best here: unexported type (can't be constructed by users),
+public `IsToolInterruptError()` helper for checking, and a useful `Error()` string.
+
+### Issue 1: Extends `Exception` not `GenkitError`
+
+The TODO at line 171 says it all:
+
+```python
+# TODO(#4346): make this extend GenkitError once it has INTERRUPTED status
+```
+
+This means `except GenkitError` won't catch tool interrupts. Users who write
+broad Genkit error handlers will miss these. Blocked on adding an `INTERRUPTED`
+status code to `StatusCodes`.
+
+### Issue 2: No error message
+
+```python
+err = ToolInterruptError(metadata={"step": "confirm"})
+str(err)   # => ''
+repr(err)  # => 'ToolInterruptError()'
+```
+
+Compare Go, whose `Error()` string is `tool execution interrupted:` followed by
+the indented metadata JSON, which is actually useful in logs. Python's version
+is silent, which makes debugging painful.
+
+### Issue 3: `metadata` should be keyword-only
+
+```python
+# Currently allows positional
+ToolInterruptError({"key": "val"})
+
+# Should require keyword
+ToolInterruptError(metadata={"key": "val"})
+```
+
+All other error constructors in the SDK are being moved to keyword-only.
+This should follow.
+
+### Issue 4: No `__repr__`
+
+As noted above, `repr()` is useless. Should show metadata.
+
+### Issue 5: `or {}` instead of an explicit `None` check
+
+```python
+self.metadata = metadata or {}
+```
+
+This is not a mutable-default bug: a new dict is created on each call, which is
+fine. The pattern is inconsistent with the rest of the codebase, though, which
+uses `field(default_factory=dict)` for dataclasses or explicit `None` checks. It
+also replaces a caller-supplied empty dict with a new one. Minor.
+
+### Proposed `ToolInterruptError` changes
+
+1. **Extend `GenkitError`** once `StatusCodes.INTERRUPTED` exists (unblock #4346).
+   This gives: status code, serialization, cause chaining for free.
+
+2. **Add `__str__`** — `"tool execution interrupted"` + metadata dump (match Go).
+
+3. **Add `__repr__`** — `ToolInterruptError(metadata={'step': 'confirm'})`.
+
+4. **Make `metadata` keyword-only**:
+   ```python
+   def __init__(self, *, metadata: dict[str, Any] | None = None) -> None:
+   ```
+
+5. **Consider Go pattern** — make the class private (`_ToolInterruptError`) with
+   a public `is_tool_interrupt(err)` helper, since the JS comment says "not public
+   API." Python can't fully hide it (users need `except ToolInterruptError`), but
+   the Go pattern is worth noting.
+
+After these changes:
+
+```python
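+import json
+from typing import Any
+
+# Assumes GenkitError and StatusCodes.INTERRUPTED are importable (pending #4346).
+class ToolInterruptError(GenkitError):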
+    def __init__(self, *, metadata: dict[str, Any] | None = None) -> None:
+        super().__init__(status=StatusCodes.INTERRUPTED, message="tool execution interrupted")
+        self.metadata: dict[str, Any] = metadata or {}
+
+    def __str__(self) -> str:
+        if self.metadata:
+            return f"tool execution interrupted: {json.dumps(self.metadata, indent=2)}"
+        return "tool execution interrupted"
+
+    def __repr__(self) -> str:
+        return f"ToolInterruptError(metadata={self.metadata!r})"
+```
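+
+Items 2 to 4 do not depend on #4346. A minimal sketch of an interim version on
+the current `Exception` base follows; the `is_tool_interrupt` helper is
+hypothetical and only illustrates the Go-style check from item 5:
+
+```python
+from __future__ import annotations
+
+import json
+from typing import Any
+
+
+class ToolInterruptError(Exception):
+    """Interim sketch: still extends Exception until an INTERRUPTED status exists."""
+
+    def __init__(self, *, metadata: dict[str, Any] | None = None) -> None:
+        super().__init__()
+        # Explicit None check, so a caller-supplied empty dict is kept as-is.
+        self.metadata: dict[str, Any] = {} if metadata is None else metadata
+
+    def __str__(self) -> str:
+        if self.metadata:
+            # default=str keeps logging from failing on non-JSON values.
+            dumped = json.dumps(self.metadata, indent=2, default=str)
+            return f"tool execution interrupted: {dumped}"
+        return "tool execution interrupted"
+
+    def __repr__(self) -> str:
+        return f"ToolInterruptError(metadata={self.metadata!r})"
+
+
+def is_tool_interrupt(err: BaseException) -> bool:
+    """Hypothetical helper mirroring Go's IsToolInterruptError."""
+    return isinstance(err, ToolInterruptError)
+
+
+try:
+    raise ToolInterruptError(metadata={"step": "confirm"})
+except Exception as err:
+    print(is_tool_interrupt(err), repr(err))
+```
+
+Swapping the base class to `GenkitError` later would not change the constructor
+signature, so call sites written against this version would keep working.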
+
+---
+
+## Summary of all issues
+
+### High priority (user-facing, correctness, or API consistency)
+
+| Class | Issue | Effort |
+|---|---|---|
+| `Action` | `stream()` returns tuple instead of iterable object | medium |
+| `Action` | No `__call__` — can't do `await action(input)` | low |
+| `Action` | `on_chunk` callback on public API — Python uses `async for` not callbacks | medium |
+| `Action` | `arun()` vs `arun_raw()` confusing, inconsistent naming | medium |
+| `Action` | Mutable `input_schema`/`output_schema` setters | low |
+| `ExecutablePrompt` | `opts: TypedDict` kills autocomplete | medium |
+| `ExecutablePrompt` | `render()` is 220 lines of merging | refactor |
+| `GenerateResponseWrapper` | `assert_valid()`/`assert_valid_schema()` empty | medium |
+| `GenerateResponseWrapper` | Missing `reasoning`, `media`, `data` | low |
+| `GenerateResponseWrapper` | Missing `model` field | low |
+| `GenerateStreamResponse` | Not used by `Genkit.generate_stream()` — two streaming APIs | medium |
+| `GenerateStreamResponse` | Not directly iterable (no `__aiter__`) | low |
+| `ToolInterruptError` | Extends `Exception` not `GenkitError` — blocked on #4346 | medium |
+| `Document` | `text()` is method, not property — inconsistent | **breaking** |
+
+### Medium priority (engineering quality)
+
+| Class | Issue | Effort |
+|---|---|---|
+| `Action` | No `__repr__` | low |
+| `Action` | `_telemetry_labels` inconsistent underscore | low |
+| `Action` | `on_trace_start` Dev UI plumbing leaked into public API | low |
+| `ExecutablePrompt` | 25 constructor params | refactor |
+| `ExecutablePrompt` | `_ensure_resolved()` copies 20 fields — fragile | refactor |
+| `GenerateResponseWrapper` | Accesses `message._original_message` | low |
+| `GenerateResponseWrapper` | Constructor copies fields from response — fragile | refactor |
+| `GenerateStreamResponse` | Lives in `blocks/prompt.py` — not prompt-specific | low |
+| `ToolInterruptError` | No `__str__` — empty string in logs | low |
+| `ToolInterruptError` | `metadata` should be keyword-only | low |
+| `Document` | `data()` calls `text()` twice | low |
+| `Document` | `deepcopy` on every construction — perf risk | low |
+
+### Low priority (nice to have)
+
+| Class | Issue | Effort |
+|---|---|---|
+| `Action` | `run()` sync method — remove entirely (2 internal callsites) | low |
+| `ExecutablePrompt` | Rename to `Prompt` | **breaking** |
+| `GenerateResponseWrapper` | Rename to `GenerateResponse` | **breaking** |
+| `GenerateResponseWrapper` | Add `is_valid()` non-throwing check | low |
+| `GenerateStreamResponse` | No `__await__` for direct `await result` | low |
+| `GenerateStreamResponse` | Rename to `StreamResponse` | **breaking** |
+| `ToolInterruptError` | No `__repr__` | low |

From 240378ff1450240ac1cd73cd61b8061c4422c501 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Fri, 20 Feb 2026 11:12:54 -0600
Subject: [PATCH 13/17] more updates

---
 py/docs/python_beta_api_proposal.md    | 410 +++++++------------------
 py/docs/python_beta_sdk_design.md      | 245 +++++++++++++--
 py/docs/python_package_reorg.md        | 269 +++++++---------
 py/docs/python_type_audit_checklist.md |  12 +-
 4 files changed, 470 insertions(+), 466 deletions(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 64ad343f87..48ea9df57d 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -1,237 +1,142 @@
 # Genkit Python — Public API Surface Proposal
 
-Stable public symbols — what we document, support, and commit to. Everything else is internal.
+What's importable, what's not, and where the boundary is.
 
 Two audiences, separate entry points:
 
-1. **App developers** — `from genkit import ...` for framework objects, content types, errors.
-2. **Plugin authors / advanced users** — domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) for request/response schemas, config types, metadata builders.
-
-> **Type architecture detail:** The SDK has schema types (auto-generated Pydantic models from `genkit-schemas.json`) and veneers (hand-written wrappers that add convenience methods like `.text`, `.output`). See [Type Architecture](#type-architecture) appendix at the end of this doc for the full breakdown.
+1. **Simple Path** — `from genkit import ...`
+2. **Advanced Usage** — domain sub-modules (`genkit.model`, `genkit.retriever`, etc.)
 
 ---
 
-## Entry point 1: `from genkit import ...`
+## 1. Proposed imports
 
-The single entry point for app developers. Framework objects, veneers, context, errors, and all content/data types live here. No separate `genkit.types` import needed.
+### `from genkit import ...` — app developers
 
 ```python
 from genkit import (
     # Core
     Genkit,
     ActionRunContext,
-    GenerateResponse,      # veneer — aliased from GenerateResponseWrapper
-    GenerateResponseChunk, # veneer — aliased from GenerateResponseChunkWrapper (streaming)
+    GenerateResponse,       # veneer (aliased from GenerateResponseWrapper)
+    GenerateResponseChunk,  # veneer (aliased from GenerateResponseChunkWrapper)
+    StreamResponse,         # renamed from GenerateStreamResponse
+    ExecutablePrompt,
     GenkitError,
     UserFacingError,
-    Prompt,
 
-    # Content
+    # Content types
     Part, TextPart, MediaPart, Media,
     DataPart, ToolRequestPart, ToolResponsePart, CustomPart,
     ReasoningPart,
 
     # Messages
-    Message, Role, Metadata,
+    Message, Role,
 
     # Documents
     Document, DocumentData, DocumentPart,
 
-    # Context
+    # Tool context
     ToolRunContext,
     ToolInterruptError,
+    ToolChoice,
+
+    # Generation config
+    GenerationCommonConfig,
 
     # Evaluation
     BaseEvalDataPoint,
 
-    # Tool control
-    ToolChoice,
+    # Web framework integration
+    RequestData,
+    ContextProvider,
 
-    # Generation config
-    GenerationCommonConfig,
+    # Constants
+    GENKIT_VERSION,
+    GENKIT_CLIENT_HEADER,
+    is_dev_environment,
 
-    # Plugin authoring (also used by advanced app developers)
+    # Plugin authoring (also used by advanced app devs)
     Plugin,
     Action,
     ActionMetadata,
     ActionKind,
-    StatusName,
-    to_json_schema,
+    StatusCodes,
 )
 ```
 
-**~29 symbols.** One import covers both app developers (~22 symbols) and plugin authors (~7 additional). This is normal for Python — OpenAI and Anthropic export far more from their top level.
-
-- `Genkit` — the entry point. Every app starts with this.
-- `ActionRunContext` — context object inside flows and tools.
-- `GenerateResponse` — return type of `ai.generate()` — veneer with `.text`, `.output`, `.tool_requests`.
-- `GenerateResponseChunk` — chunk type from `ai.generate_stream()` — veneer with `.text` (aliased from `GenerateResponseChunkWrapper`).
-- `Prompt` — return type of `ai.prompt()`. Core concept, needs to be type-annotatable.
-- `GenkitError` — base error class for catching framework errors.
-- `UserFacingError` — errors safe to surface to HTTP clients.
-- `Part`, `TextPart`, `MediaPart`, `Media`, `DataPart`, `ToolRequestPart`, `ToolResponsePart`, `CustomPart`, `ReasoningPart` — content types developers construct and pass around.
-- `Message`, `Role`, `Metadata` — message construction for multi-turn conversations.
-- `Document`, `DocumentData`, `DocumentPart` — RAG document types.
-- `ToolRunContext` — extended context for tool handlers (extends `ActionRunContext`).
-- `ToolInterruptError` — error type for tool interrupts.
-- `ToolChoice` — tool selection control for `generate()`.
-- `GenerationCommonConfig` — model config (temperature, top_k, etc.).
-- `BaseEvalDataPoint` — evaluation data point type.
-- `Plugin` — base class for all plugin types (plugin authors).
-- `Action` — core action type (plugin authors).
-- `ActionMetadata` — action registration metadata (plugin authors).
-- `ActionKind` — action type enum: model, retriever, embedder, etc. (plugin authors).
-- `StatusName` — error status codes (plugin authors, error handling).
-- `to_json_schema` — converts Pydantic models to JSON Schema (plugin authors, 10+ plugins use this during action registration).
-
-### Veneer aliasing
-
-Users should never see "Wrapper" suffixes. The fix:
-
-```python
-# genkit/__init__.py
-from genkit.blocks.model import GenerateResponseWrapper as GenerateResponse
-from genkit.blocks.model import GenerateResponseChunkWrapper as GenerateResponseChunk
-```
-
-Both use inheritance (extend the schema type), so these aliases are safe — `isinstance` checks still work.
-
-**`MessageWrapper` is the exception.** It uses composition — its constructor takes a `Message` instance, not raw fields. Aliasing it as `Message` would break `Message(role="user", content=[...])`. So `Message` remains the schema type everywhere. Users interact with `MessageWrapper` via `response.messages` but never construct it directly.
+**37 symbols.** One import covers both app developers (32) and plugin authors
+(5 additional, under the `# Plugin authoring` comment). Normal for Python: OpenAI and Anthropic export more.
 
-### `ExecutablePrompt` — should it be public?
+Notes:
+- `GenerateResponse` / `GenerateResponseChunk` — aliases that hide the "Wrapper" suffix.
+  Both use inheritance, so `isinstance` checks work.
+- `Message` is the schema type, not `MessageWrapper`. `MessageWrapper` uses composition
+  so aliasing it would break `Message(role="user", content=[...])`. Users get
+  `MessageWrapper` via `response.messages` but never construct it.
+- `ExecutablePrompt` — exported so users can type-annotate: `my_prompt: ExecutablePrompt = ai.prompt("greeting")`.
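+
+A minimal sketch of why the inheritance-based alias is safe while aliasing a
+composition-based wrapper is not. All classes here are hypothetical stand-ins
+(`SchemaResponse`, `SchemaMessage`, and simplified wrappers), not the SDK's
+types:
+
+```python
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass
+class SchemaResponse:
+    """Stand-in for an auto-generated schema type."""
+
+    parts: list[str]
+
+
+class GenerateResponseWrapper(SchemaResponse):
+    """Inheritance-based veneer: same constructor, extra convenience property."""
+
+    @property
+    def text(self) -> str:
+        return "".join(self.parts)
+
+
+# The alias hides the "Wrapper" suffix; isinstance checks still pass.
+GenerateResponse = GenerateResponseWrapper
+
+
+@dataclass
+class SchemaMessage:
+    """Stand-in for the schema Message type."""
+
+    role: str
+    content: list[str]
+
+
+class MessageWrapper:
+    """Composition-based veneer: the constructor takes a message, not raw fields."""
+
+    def __init__(self, message: SchemaMessage) -> None:
+        self._message = message
+
+    @property
+    def text(self) -> str:
+        return "".join(self._message.content)
+
+
+resp = GenerateResponse(parts=["hel", "lo"])
+print(isinstance(resp, SchemaResponse), resp.text)
+
+try:
+    # If MessageWrapper were aliased as Message, this call shape would break.
+    MessageWrapper(role="user", content=["hi"])  # type: ignore[call-arg]
+except TypeError as exc:
+    print("composition wrapper rejects raw fields:", exc)
+```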
 
-`ExecutablePrompt` is the class returned by `ai.prompt()`. Today it's not exported — users can't type-annotate a variable that holds a prompt reference.
-
-```python
-# Today: no way to annotate this
-my_prompt = ai.prompt("greeting")
-
-# Proposed: export as Prompt
-from genkit import Prompt
-my_prompt: Prompt = ai.prompt("greeting")
-```
-
-Recommendation: export it as `Prompt`. It's a core concept, and being unable to type-annotate it is a gap. This would bring the top-level to 6 symbols.
-
-### What was removed
-
-- `tool_response` — only 3 sample usages. JS/Go use a method on the tool instance.
-- `Plugin` — users pass plugin instances (`GoogleAI()`), never reference the type. Moved to shared across domains.
-- `get_logger` — thin wrapper around `logging.getLogger("genkit")`. Use the stdlib.
-- `GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions` — internal implementation types.
-
-### `ToolRunContext` placement
-
-`ToolRunContext` extends `ActionRunContext` with tool-specific features. Both types are kept (for documentation clarity, future-proofing, and runtime `isinstance` checks), but only `ActionRunContext` is exported from the top level. `ToolRunContext` is available from `genkit.types` for type annotations when needed.
-
----
-
-Types specific to a domain (model, retriever, embedder, etc.) live in domain sub-modules — not in the top-level `genkit` import. These types are used by both plugin authors and advanced app developers (e.g., writing middleware or defining custom models).
-
----
-
-## Domain sub-modules
-
-Organized by action type, mirroring the JS SDK's `genkit/model`, `genkit/retriever`, etc. Each sub-module contains the wire-format types, metadata builders, helpers, and options for that domain. Both plugin authors and advanced app developers import from here.
-
-### `genkit.model`
-
-Everything related to model implementation and the model wire format.
+### `genkit.model` — model plugin authors
 
 ```python
 from genkit.model import (
-    # Wire-format types
     GenerateRequest,
-    GenerateResponse,       # schema type — NOT the veneer
+    GenerateResponse,        # schema type (NOT the veneer)
     GenerateResponseChunk,
     GenerationUsage,
     Candidate,
     OutputConfig,
     FinishReason,
     GenerateActionOptions,
-    Error,                  # schema error type (not GenkitError)
-    Operation,              # long-running operation type
-
-    # Tool wire-format types (used by model handlers to process tool calls)
+    Error,
+    Operation,
     ToolRequest,
     ToolDefinition,
     ToolResponse,
-
-    # Model info and capabilities
     ModelInfo,
     Supports,
     Constrained,
     Stage,
-
-    # Registration and metadata
     model_action_metadata,
     model_ref,
     ModelReference,
-
-    # Background / long-running models (e.g. video generation)
     BackgroundAction,
     lookup_background_action,
-
-    # Helpers
     compute_usage_stats,
-    resolve_api_key,             # resolves API key: request config overrides plugin default
+    resolve_api_key,
     GenerationCommonConfig,
-
-    # Model middleware - WIP
     ModelMiddleware,
     ModelMiddlewareNext,
 )
 ```
 
-Used by: model plugin authors, app developers writing middleware (`GenerateRequest`, `GenerateResponse`, `ModelMiddlewareNext`), app developers defining custom models (`ModelInfo`, `Supports`).
-
-**Notes on helpers:**
-
-- **`resolve_api_key(config, plugin_key)`** — resolves which API key to use: per-request key from `GenerationCommonConfig` overrides the plugin-level default. In JS, this logic lives duplicated in each plugin's `utils.ts` (`calculateApiKey`). Centralizing it in `genkit.model` avoids every plugin re-inventing key resolution for multi-tenancy. The lower-level extraction function (`extract_request_api_key`) stays in `genkit.blocks.model` but is not re-exported to the public API — only the google-genai plugin needs it for the `apiKey: false` ADC edge case.
-- **`compute_usage_stats(input, response)`** — renamed from `get_basic_usage_stats`. Counts characters, images, videos, and audio in input/output messages. "Compute" reflects that it does work (not a lookup), and "basic" was dropped (basic compared to what?).
-- **`text_from_content`** — removed from public API. Consumers should use the veneer layer instead:
-  - **Messages:** `MessageWrapper.text` (available on `response.messages[i].text`)
-  - **Responses:** `GenerateResponse.text` (the veneer's `.text` property)
-  - **Stream chunks:** `GenerateResponseChunkWrapper.text`
-  - **Documents:** `Document.text()` (already exists on the `Document` class)
-  - Current consumers: google-genai reranker (should use `Document.text()`), internal middleware (should use `doc.text()`), tests (should use chunk/response veneers).
+`GenerateResponse` naming: `from genkit import GenerateResponse` = veneer.
+`from genkit.model import GenerateResponse` = schema type. No shadowing — a file
+imports from one or the other. Veneer extends schema via inheritance.
 
 ### `genkit.retriever`
 
 ```python
 from genkit.retriever import (
-    # Wire-format types
     RetrieverRequest,
     RetrieverResponse,
-
-    # Registration and metadata
     retriever_action_metadata,
-    create_retriever_ref,
-    RetrieverOptions,
-
-    # Indexer support
+    retriever_ref,
     IndexerRequest,
-    IndexerOptions,
     indexer_action_metadata,
-    create_indexer_ref,
+    indexer_ref,
 )
 ```
 
-Used by: retriever/indexer plugin authors, app developers using `RetrieverResponse` as a type annotation.
-
 ### `genkit.embedder`
 
 ```python
 from genkit.embedder import (
-    # Wire-format types
     EmbedRequest,
     EmbedResponse,
     Embedding,
-
-    # Registration and metadata
     embedder_action_metadata,
-    create_embedder_ref,
-    EmbedderOptions,
+    embedder_ref,
     EmbedderSupports,
 )
 ```
@@ -241,7 +146,7 @@ from genkit.embedder import (
 ```python
 from genkit.reranker import (
     reranker_action_metadata,
-    create_reranker_ref,
+    reranker_ref,
     RankedDocument,
     RerankerRequest,
     RerankerResponse,
@@ -254,7 +159,6 @@ from genkit.reranker import (
 
 ```python
 from genkit.evaluator import (
-    # Wire-format types
     EvalRequest,
     EvalResponse,
     EvalFnResponse,
@@ -262,187 +166,111 @@ from genkit.evaluator import (
     Details,
     BaseEvalDataPoint,
     EvalStatusEnum,
-
-    # Registration and metadata
     evaluator_action_metadata,
     evaluator_ref,
 )
 ```
 
-Used by: evaluator plugin authors and app developers writing custom evaluators (samples show both).
-
-### `genkit.web`
-
-Web framework integration (FastAPI, Flask, custom ASGI apps).
-
-```python
-from genkit.web import (
-    FlowWrapper,
-    ContextProvider,
-    RequestData,
-    create_flows_asgi_app,
-)
-```
-
-Used by: fastapi plugin, flask plugin, app developers serving flows over HTTP.
-
-### `genkit.telemetry`
-
-```python
-from genkit.telemetry import (
-    add_custom_exporter,
-    is_dev_environment,
-    GENKIT_VERSION,
-    GENKIT_CLIENT_HEADER,
-    tracer,
-)
-```
-
-Used by: telemetry plugins (observability, Google Cloud, Firebase, Amazon Bedrock, Cloudflare, Microsoft Foundry).
-
-`AdjustingTraceExporter` and `RedactedSpan` should live in the telemetry *plugin* (e.g., `genkit-google-cloud`), not in core. Both are implementation details of specific telemetry providers — JS and Go also keep these in their cloud plugins, not core.
-
-### Shared across domains (plugin authoring)
-
-These symbols are used by plugin authors across multiple domains. They live in the top-level `from genkit import ...` alongside the app-developer symbols:
+### `genkit.tracing` — telemetry plugin authors
 
 ```python
-from genkit import (
-    # (app-developer symbols from Entry point 1 above, plus:)
-
-    # Plugin authoring
-    Plugin,             # base class for all plugin types
-    Action,             # core action type
-    ActionMetadata,     # action registration metadata
-    ActionKind,         # action type enum (model, retriever, embedder, etc.)
-    StatusName,         # error status codes
-    DocumentPart,       # part type within Documents (vs. message Parts)
-    to_json_schema,     # converts Pydantic models to JSON Schema
-)
+from genkit.tracing import tracer, add_custom_exporter
 ```
 
-This brings the total top-level surface to **~29 symbols** (22 app-developer + 7 plugin-authoring). All are stable, documented types — no implementation details.
-
----
-
-## Internal — resolved decisions
-
-Helpers that were candidates for export. Each has a final verdict.
+## 2. What we removed from imports (and why)
 
-| Symbol | Consumers | Verdict | Reasoning |
-|---|---|---|---|
-| `get_logger` | 15+ plugins, 10+ samples | **Drop.** | Structlog wrapper. Neither JS nor Go force a logging library. Use stdlib `logging`. |
-| `get_cached_client` | 9 plugins | **Internal (reconsider later).** | Per-event-loop httpx client cache. Solves a real async problem (~100 lines to reimplement). No JS/Go equivalent. Keep internal but may export if third-party plugins need it. |
-| `dump_dict` / `dump_json` | 15+ consumers | **Remove.** Fix at source. | Wrappers for `model_dump(exclude_none=True, by_alias=True)`. Fix: emit `GenkitBaseModel` from the code generator that defaults these flags. Then `.model_dump()` just works. See [pydantic/pydantic#10141](https://github.com/pydantic/pydantic/issues/10141). |
-| `get_callable_json` | fastapi, flask, core | **Remove.** Add method instead. | Converts exceptions to JSON for HTTP responses. Fix: add `.to_json()` and `.http_status` to `GenkitError` (matches Go's `.ToReflectionError()`). |
-| `matches_uri_template` | MCP plugin only | **Internal.** | 15-line regex helper. MCP plugin should own its copy. |
+### Removed from `from genkit import ...`
 
-- **`create_reflection_asgi_app`, `RuntimeManager`, `ServerSpec`, `Registry`** — dev-mode reflection server infrastructure. Used by fastapi plugin, multi-server sample, and core `_base_async.py`.
+| Symbol | Why |
+|---|---|
+| `Input` / `Output` | Type deleted. Replaced by `output_schema` kwarg. Neither JS nor Go needed this. |
+| `GenkitRegistry` | Internal implementation type. Plugins use the `Genkit` instance. |
+| `FlowWrapper` | Internal. Not needed by app developers. |
+| `SimpleRetrieverOptions` | Type deleted. Flatten to kwargs on `define_simple_retriever()`. |
+| `PromptGenerateOptions` | Type deleted. 17-field TypedDict that killed IDE autocomplete. |
+| `OutputOptions` | Type deleted. Dies with `PromptGenerateOptions`. |
+| `ResumeOptions` | No longer top-level. Passed as `resume=` kwarg on prompt methods. |
+| `tool_response` | Only 3 sample usages. JS/Go use a method on the tool instance. |
+| `GENKIT_CLIENT_HEADER` / `GENKIT_VERSION` | Relocated, not removed: the deep internal import path (`genkit.core.constants`) goes away, and both are now exported from top-level `from genkit import ...`. |
 
-  **The problem:** Genkit needs a dev-mode reflection server (HTTP API on a separate port) so the Genkit Dev UI can introspect the running app. In JS and Go, this is fully automatic — the `Genkit` constructor (JS) or `Init()` (Go) starts the reflection server internally because JS has an ambient event loop and Go has goroutines. The user never touches it.
+### Removed from domain sub-modules
 
-  Python can't copy this because:
-  1. **No event loop at construction time.** `Genkit()` is called synchronously at module level. There's no running `asyncio` event loop yet — you can't start an async HTTP server from a synchronous constructor.
-  2. **ASGI server ownership.** In the FastAPI/Flask use case, uvicorn owns the process and the event loop. Genkit is a library inside someone else's ASGI app — it can't spin up a second server on its own.
+| Symbol | Module | Why |
+|---|---|---|
+| `RetrieverOptions` | `genkit.retriever` | Type deleted. Flatten to kwargs on `define_retriever()`. |
+| `IndexerOptions` | `genkit.retriever` | Type deleted. Flatten to kwargs on `define_indexer()`. |
+| `EmbedderOptions` | `genkit.embedder` | Type deleted. Flatten to kwargs on `define_embedder()`. |
+| `RerankerOptions` | `genkit.reranker` | Type deleted. Flatten to kwargs on `define_reranker()`. |
 
-  When Genkit owns the process (`ai.run_main(coro)`), the reflection server starts automatically (same as JS/Go). The problem is only the "Genkit as a library" path (FastAPI/Flask), where the plugin currently needs four internal imports to wire up the reflection server manually.
+### Removed helpers (no longer importable)
 
-  **Fix: add `await ai.start_dev_server()` method on `Genkit`.** One async method that encapsulates all the wiring (creates reflection app from its own registry, binds a port, starts uvicorn, registers with RuntimeManager). The fastapi plugin's lifespan becomes trivial:
+| Helper | Why |
+|---|---|
+| `get_logger` | Structlog wrapper. Use stdlib `logging`. Neither JS nor Go force a logging library. |
+| `text_from_content` | Use veneers instead: `response.text`, `message.text`, `doc.text`. |
+| `dump_dict` / `dump_json` | Fix at source — `GenkitBaseModel` defaults handle this. See [sdk_design §9](./python_beta_sdk_design.md). |
+| `get_callable_json` | Dies with `core/flows.py`. |
+| `create_flows_asgi_app` | Cloud Functions pattern — doesn't fit Python where FastAPI/Flask own routing. |
 
-  ```python
-  cleanup = await ai.start_dev_server()
-  yield
-  await cleanup()
-  ```
+### Internalized (no longer importable)
 
-  This eliminates `create_reflection_asgi_app`, `RuntimeManager`, `ServerSpec`, and `Registry` from external consumption. All four stay internal. The multi-server sample also uses `ai.start_dev_server()` instead of manually wiring internals.
+| Symbol | Why |
+|---|---|
+| `to_json_schema` | `define_*` accepts types directly — no plugin needs manual conversion. Moves to `core/_internal/_schema.py`. See [sdk_design §10](./python_beta_sdk_design.md). |
+| `extract_json` | Zero plugin consumers. Only used internally by `formats/`. Moves to `core/_internal/_extract.py`. |
 
-### `GenerateResponse` naming: veneer vs. schema type
+JS exports both (`genkit/schema`, `genkit/extract`) but no JS plugin imports them either.
 
-`GenerateResponse` appears in two places:
+### Moved to plugins (out of core SDK)
 
-- **`from genkit import GenerateResponse`** — the **veneer** with `.text`, `.output`, `.tool_requests` (aliased from `GenerateResponseWrapper`). This is what app developers use.
-- **`from genkit.model import GenerateResponse`** — the **schema type** (auto-generated wire format). This is what model handlers receive and return.
-
-These are different classes in different modules. No shadowing occurs because a file imports from one or the other, never both. The veneer extends the schema type (inheritance), so they're compatible.
-
-### Cross-language comparison
-
-- **JS** — `import { genkit } from 'genkit'` for common types, `import { ... } from 'genkit/model'` / `'genkit/retriever'` / etc. for domain-specific types.
-- **Go** — `import "github.com/firebase/genkit/go/genkit"` for the framework, `import "github.com/firebase/genkit/go/ai"` for all domain types (single package).
-- **Python (proposed)** — `from genkit import Genkit, Part, Message` for common types, `from genkit.model import ...` / `from genkit.retriever import ...` / etc. for domain-specific types.
-
-Python follows the JS pattern — common types in the top-level import, domain-specific types in sub-modules organized by action type.
+| Symbol | Destination | Why |
+|---|---|---|
+| `AdjustingTraceExporter` | telemetry plugin | JS equivalent is in `js/plugins/google-cloud/`, not core. |
+| `RealtimeSpanProcessor` | telemetry plugin | Telemetry implementation detail. |
+| `RedactedSpan` | telemetry plugin | Only used by `AdjustingTraceExporter`. |
 
 ---
 
-## Internal modules
-
-Everything under `genkit._core`, `genkit._blocks`, and `genkit._ai` (note underscore prefix) carries no stability guarantee. Today these modules lack the underscore (`genkit.core`, `genkit.blocks`, `genkit.ai`), which is why samples and the documentation agent used internal paths. Renaming them is part of this proposal — the underscore is Python's convention for "private, use at your own risk."
+## 3. What we added to imports (and why)
 
-The domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are **re-export facades** — they import from the internal modules and re-export a curated public surface. The actual implementation stays in `genkit._blocks` and `genkit._core`. This decouples the public API from internal code organization, so internal refactors don't break users.
+### Added to `from genkit import ...`
 
----
+| Symbol | Why |
+|---|---|
+| `StreamResponse` | Renamed from `GenerateStreamResponse`. Return type of all streaming APIs. Previously not importable — `generate_stream()` returned a raw tuple. |
+| `GenerateResponseChunk` | Veneer alias. Previously not exported from top level. |
+| `ToolInterruptError` | User-facing error type for tool interrupts. Previously only importable from internal path. |
+| `ToolChoice` | Tool selection control for `generate()`. Previously internal. |
+| `StatusCodes` | Error status codes for plugin authors. Previously only in `genkit.core`. |
+| `ReasoningPart` | Content type for chain-of-thought. New model capability. |
+| `DataPart` / `CustomPart` | Content types that were missing from top-level exports. |
 
-## Changes from status quo
+### Added to `genkit.model`
 
-What plugins and samples currently do that needs to change. This is the migration work.
+| Symbol | Why |
+|---|---|
+| `compute_usage_stats` | Renamed from `get_basic_usage_stats`. Centralized — avoids each plugin re-inventing token counting. |
+| `resolve_api_key` | Resolves per-request API key vs plugin default. Previously duplicated across plugins. |
 
-### Removed from public API
-
-- **`text_from_content`** — standalone function for extracting text from `list[Part]`. Consumers should use veneers instead: `GenerateResponse.text`, `MessageWrapper.text`, `GenerateResponseChunkWrapper.text`, or `Document.text()`. Affected: google-genai reranker, internal middleware, tests.
-- **`tool_response`** — only 3 sample usages. JS/Go use a method on the tool instance.
-- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Consumers should use stdlib `logging` directly. Affected: 15+ plugins, 10+ samples (trivial change).
-- **`GenkitRegistry`** — internal implementation type. Should not be imported by plugins.
-- **`SimpleRetrieverOptions`** — internal implementation type.
-
-### Moved to plugins (out of core)
-
-- **`AdjustingTraceExporter`** — base class for trace exporters that adjust spans before export. Currently in `genkit.core.trace.adjusting_exporter`. Should move to the telemetry plugin (e.g., `genkit-google-cloud`). JS and Go both keep this in their cloud plugins, not core.
-- **`RedactedSpan`** — span wrapper that redacts `genkit:input`/`genkit:output` attributes. Same file as `AdjustingTraceExporter`. Should move with it. No equivalent in JS/Go core.
-
-### Reorganized (new public paths)
-
-- **`genkit.types` → `genkit`** — all app developer types unified into `from genkit import ...`. No separate `genkit.types` import.
-- **`genkit.plugin` → domain sub-modules** — plugin types split by action type: `genkit.model`, `genkit.retriever`, `genkit.embedder`, `genkit.reranker`, `genkit.evaluator`, `genkit.telemetry`. 
-- **`Plugin` class** — moved from top-level `genkit` to shared across domains (imported by plugin authors, not app developers).
-- **`FlowWrapper`** — moved from internal to a web sub-module export.
-- **`ContextProvider`, `RequestData`** — moved from internal to a web sub-module export.
-- **`create_flows_asgi_app`** — moved from internal to a web sub-module export.
+---
 
-### Renamed
+## 4. Internal design decisions
 
-- **`GenerateResponseWrapper` → `GenerateResponse`** (at the `genkit` top-level) — alias removes the "Wrapper" suffix leak.
-- **`genkit.core` → `genkit._core`**, **`genkit.blocks` → `genkit._blocks`**, **`genkit.ai` → `genkit._ai`** — underscore prefix signals internal. This is what breaks all the existing direct imports from plugins/samples.
+The following design changes affect the public API indirectly. Full details
+(rationale, import DAG, migration plans, open questions) are in
+[python_beta_sdk_design.md](./python_beta_sdk_design.md).
 
-### Now explicitly internal (plugins must stop importing)
+**Serialization cleanup (`GenkitBaseModel`)** — Internal base class that
+defaults `model_dump()` to `exclude_none=True, by_alias=True`. Eliminates
+`dump_dict`/`dump_json` wrappers and fixes 11 inconsistent serialization calls.
+See [sdk_design §9](./python_beta_sdk_design.md).
 
-These are things plugins/samples currently import from internal paths. After the rename to `_core`/`_blocks`/`_ai`, these imports break. The public replacements are listed.
+**`define_*` accepts raw Python types** — `define_model`, `define_retriever`,
+etc. accept `type | dict | None` directly instead of requiring pre-converted
+JSON Schema dicts. `to_json_schema` and `extract_json` move to
+`core/_internal/`. See [sdk_design §10](./python_beta_sdk_design.md).
 
-- `from genkit.blocks.model import ...` → `from genkit.model import ...`
-- `from genkit.blocks.retriever import ...` → `from genkit.retriever import ...`
-- `from genkit.blocks.reranker import ...` → `from genkit.reranker import ...`
-- `from genkit.blocks.document import Document` → `from genkit import Document`
-- `from genkit.core.typing import ...` → `from genkit import ...` (for content types) or `from genkit.model import ...` (for wire-format types)
-- `from genkit.core.action import Action, ActionRunContext` → `from genkit import ActionRunContext` or `from genkit.model import Action`
-- `from genkit.core.error import GenkitError` → `from genkit import GenkitError`
-- `from genkit.core.logging import get_logger` → `import logging; logger = logging.getLogger("genkit")`
-- `from genkit.core.http_client import get_cached_client` → stays internal (no public replacement yet; reconsider for export)
-- `from genkit.codec import dump_dict, dump_json` → stays internal (no public replacement)
-- `from genkit.core.registry import Registry` → stays internal (code smell; use `Genkit` instance)
-- `from genkit.core.reflection import create_reflection_asgi_app` → stays internal
-- `from genkit.ai._runtime import RuntimeManager` → stays internal
-- `from genkit.ai._server import ServerSpec` → stays internal
-- `from genkit.blocks.resource import matches_uri_template` → stays internal (MCP plugin should own this)
+**`ErrorResponse` consolidation** — Replaces 3 error wire format types with a
+single internal Pydantic model. See [sdk_design §11](./python_beta_sdk_design.md).
 
 ---
-
-## Appendix
-
-### Type architecture
-
-Two layers of types:
-
-**Schema types (auto-generated).** Auto-generated from `genkit-schemas.json` (shared cross-language schema). Plain Pydantic `BaseModel` classes — data containers with no convenience methods. These are the contract between the framework and plugins. A model plugin receives a `GenerateRequest` and returns a `GenerateResponse`. Content-building types (`Message`, `Part`, `TextPart`, etc.) are also schema types — app developers construct them directly.
-
-**Veneers (hand-written wrappers).** Extend schema types with convenience methods (`.text`, `.output`, `.tool_requests`). `GenerateResponseWrapper` extends `GenerateResponse` via inheritance — aliasing it as `GenerateResponse` publicly is safe. `MessageWrapper` wraps `Message` via composition — its constructor takes a `Message` instance, so aliasing as `Message` would break `Message(role="user", content=[...])`. Users interact with `MessageWrapper` through `response.messages` but never construct it.
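Note on the `get_logger` removal in the api_proposal diff above: the replacement is stdlib `logging`. A minimal sketch of what a plugin might do, assuming a hypothetical logger name `genkit.plugins.my_plugin` and a hypothetical helper `configure_genkit_logging` (neither is defined by this patch):

```python
import logging

# Hypothetical plugin logger name. Namespacing it under "genkit" lets an
# application configure all Genkit-related logging through one parent logger.
logger = logging.getLogger("genkit.plugins.my_plugin")


def configure_genkit_logging(level: int = logging.INFO) -> logging.Logger:
    """Set the level on the shared "genkit" parent logger and return it."""
    parent = logging.getLogger("genkit")
    parent.setLevel(level)
    return parent


# Child loggers with no explicit level inherit the parent's level.
configure_genkit_logging(logging.DEBUG)
logger.debug("plugin initialised")
```

Because the child logger has no level of its own, it picks up whatever the application sets on `genkit`, which is the behavior a dedicated wrapper would otherwise have to provide.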
diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md
index c9baf3116a..98643c47ff 100644
--- a/py/docs/python_beta_sdk_design.md
+++ b/py/docs/python_beta_sdk_design.md
@@ -1,4 +1,10 @@
-# Genkit Python SDK — API Design Review
+# Genkit Python SDK — Design Review
+
+Related docs:
+- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface (what's importable)
+- [python_package_reorg.md](./python_package_reorg.md) — internal package structure
+- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — type deletions/fixes
+- [python_beta_sdk_audit.md](./python_beta_sdk_audit.md) — initial friction audit
 
 ## 1. Background
 
@@ -6,7 +12,7 @@ The Python SDK launched to match JS and Go feature timelines. It achieved featur
 
 The Python SDK is public but hasn't cut a stable release. The JS SDK went through a similar cleanup between v0.5 and v1.0, and the migration cost grew with each release. Python is earlier in that curve and changes are still cheap.
 
-In this doc, we're laying out some guiding principles for designing the API so we have more consistency and standardization for adding new framework features going forward.
+This doc covers **design decisions** — the "how" and "why" behind internal architecture choices. For the public import surface (the "what"), see [python_beta_api_proposal.md](./python_beta_api_proposal.md).
 
 ## 2. Principles
 
@@ -26,40 +32,33 @@ ai.generate(model="gemini", prompt="Hi", tools=["search"])
 
 **Kwargs over options dicts.** JS groups parameters into an options object. Python has first-class keyword arguments. Dict-based configuration loses autocomplete, type checking, and discoverability. This applies to `generate()`, `prompt()`, and every public method.
 
-**Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. We define three public entry points (`genkit`, `genkit.types`, `genkit.plugin`) and treat everything else as internal with no stability guarantee. Internal modules should be underscore-prefixed (`genkit._core`, `genkit._blocks`) to signal this — today they lack the underscore, which is why samples accidentally depend on them. The mechanics of this boundary are covered in section 4.
-
-^^ genkit.plugin => for core genkit plugin imports (used by plugin author only)
-^^ genkit.plugins.___ => for actual plugin imports exposed by plugin authors
+**Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. Public entry points are `from genkit import ...` (app developers) and domain sub-modules like `genkit.model`, `genkit.retriever`, `genkit.tracing` (plugin authors). Internal modules use `_internal/` directories following the Pydantic v2 convention. There is no public `genkit.core` or `genkit.ai` import path — those are internal structure only. Full symbol lists are in [python_beta_api_proposal.md](./python_beta_api_proposal.md). The internal package structure is in [python_package_reorg.md](./python_package_reorg.md).
 
 ## 3. Initial Audit
 
 While working on updated docs, we identified several friction points in the developer experience. 
 
-For many of these friction points, there was a clear Pythonic standard to follow — keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [python_beta_sdk_audit.md](./py/docs/python_beta_sdk_audit.md)
+For many of these friction points, there was a clear Pythonic standard to follow — keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [python_beta_sdk_audit.md](./python_beta_sdk_audit.md)
 
 The remaining sections in this doc are open questions that need some discussion to resolve.
 
 ## 4. Public API surface & type architecture
 
-Today there is no formal public/internal boundary. The documentation audit found samples importing from `genkit.core.action`, `genkit.blocks.model`, and `genkit.ai` — all internal paths that happen to work. This means any internal module rename or refactor is a breaking change for external developers, even if the public API hasn't changed. App developers and plugin authors share a single `genkit.types` module, which means app developers are exposed to plugin contract types they'll never use — and plugin authors have to sift through content types to find the schema types they need. Wrapper classes are exported under internal names like `GenerateResponseWrapper`, so the implementation detail of "this is a wrapper around an auto-generated type" leaks into every type hint and docstring.
-
-We propose formalizing three entry points, separated by audience:
-
-- **`from genkit import ...`** — App developers. ~22 symbols: framework objects (`Genkit`, `ActionRunContext`, `GenerateResponse`, `Prompt`, `GenkitError`, `UserFacingError`) and content/data types (`Part`, `Message`, `Document`, `Role`, `ToolChoice`, `GenerationCommonConfig`, etc.) in a single import. `genkit.types` remains as a backward-compatible re-export but is no longer the canonical path.
-- **`from genkit.plugin import ...`** — Plugin authors. Plugin contract: `Plugin`, `GenerateRequest`, `GenerateResponse` (schema), `OutputConfig`, `ModelInfo`, metadata builders, etc.
+Today there is no formal public/internal boundary. The documentation audit found samples importing from `genkit.core.action`, `genkit.blocks.model`, and `genkit.ai` — all internal paths that happen to work. This means any internal module rename or refactor is a breaking change for external developers, even if the public API hasn't changed.
 
-Internal modules (`genkit.core`, `genkit.blocks`, `genkit.ai`) would be renamed with underscore prefixes (`genkit._core`, `genkit._blocks`) to signal "private, no stability guarantee" — the standard Python convention.
+**Resolved decisions:**
 
-The full proposal — including the type architecture (auto-generated schema types vs hand-written veneers vs config helpers), symbol lists, rationale for each inclusion/exclusion, and the `MessageWrapper` aliasing problem — is in [python_beta_api_proposal.md](./python_beta_api_proposal.md).
+- **Single entry point.** `from genkit import ...` covers both app developers (~25 symbols) and plugin authors (~9 additional). No separate `genkit.types` or `genkit.plugin`. Domain sub-modules (`genkit.model`, `genkit.retriever`, `genkit.tracing`, etc.) provide wire-format types for plugin authors who need them.
 
+- **No public `genkit.core`.** Internal packages (`core/`, `ai/`) use `_internal/` directories following Pydantic v2's convention. `genkit/__init__.py` re-exports everything users need. See [python_package_reorg.md](./python_package_reorg.md) for the full structure.
 
-^^^ Upon discussion, we got more details on aliasing. App developers may need access to the wire format for unit testing. They are more likely to need that actually vs. the veneer (which I think is handled internally). Also I remember Pavel said something about flow vs. generate. One returns veneer vs. other returns the wire format. He said app developer may need to use one or the other.
+- **Veneer aliasing.** `GenerateResponseWrapper` → `GenerateResponse` via inheritance (so `isinstance` works). `MessageWrapper` stays as-is because it uses composition — aliasing would break `Message(role="user", content=[...])`. App developers get `MessageWrapper` via `response.messages` but never construct it directly.
 
-**Resolved — unified import:** No clear reason to separate `from genkit import ...` and `from genkit.types import ...`. Merged into a single entry point. See [python_beta_api_proposal.md](./python_beta_api_proposal.md).
+- **`__all__` on every public `__init__.py`.** Enforced by `import-linter` in CI.
 
-^^^ Audit what's exposed via __all__ in all the packages (there are some random helpers for example)
+- **Internal code organization.** `blocks/` is deleted (merged into `ai/`). `aio/`, `lang/`, `types/` are deleted (absorbed into `core/_internal/`). `web/` renamed to `_web/`. See [python_package_reorg.md](./python_package_reorg.md).
 
-^^^ Consider internal code organization as well, what goes in blocks? core? web? types? Internal code organization is somewhat generic/sprawling/unopinionated 
+Full symbol lists and rationale for each inclusion/exclusion: [python_beta_api_proposal.md](./python_beta_api_proposal.md).
 
 ## 5. Output configuration
 
@@ -325,3 +324,211 @@ async def embed(
 ```
 
 Same treatment — `*` marker, `embedder` and `content` become required.
+
+## 9. Serialization cleanup — `GenkitBaseModel`
+
+### The problem
+
+Every Genkit type extends raw `pydantic.BaseModel`. Serialization to the wire
+(camelCase JSON, no null fields) requires passing two flags every time:
+
+```python
+obj.model_dump(exclude_none=True, by_alias=True)
+```
+
+Both flags are easy to forget, so `codec.py` provides `dump_dict()` and `dump_json()`
+wrappers. Even so, call sites are split three ways:
+
+| Pattern | Correct? | Count |
+|---|---|---|
+| `dump_dict(obj)` / `dump_json(obj)` | Yes (both flags) | ~20 calls across 13 files |
+| `.model_dump(exclude_none=True, by_alias=True)` | Yes (both flags) | 5 calls |
+| `.model_dump()` with partial or no flags | **No** | **11 calls** |
+
+### The fix: `GenkitBaseModel`
+
+Pydantic's `model_config` doesn't support `exclude_none` as a config key — it's
+a parameter to `model_dump()`. So we override the methods to change the defaults:
+
+```python
+from pydantic import BaseModel, ConfigDict
+from pydantic.alias_generators import to_camel
+
+class GenkitBaseModel(BaseModel):
+    model_config = ConfigDict(
+        populate_by_name=True,
+        alias_generator=to_camel,
+    )
+
+    def model_dump(self, *, exclude_none=True, by_alias=True, **kwargs):
+        return super().model_dump(exclude_none=exclude_none, by_alias=by_alias, **kwargs)
+
+    def model_dump_json(self, *, exclude_none=True, by_alias=True, **kwargs):
+        return super().model_dump_json(exclude_none=exclude_none, by_alias=by_alias, **kwargs)
+```
+
+Now `obj.model_dump()` does the right thing. You can still override:
+`obj.model_dump(exclude_none=False)` when you actually want nulls.
+
+**Where it lives:** `genkit/core/_internal/_base.py` — Level 0 in the import DAG
+(see §12). Zero genkit imports, no circular import risk.
+
+**Not re-exported.** `GenkitBaseModel` is strictly internal. App developers
+construct `Message(...)`, `Document(...)`, etc. and never see the base class.
+Plugin authors extend exported types like `GenerationCommonConfig` or use plain
+`BaseModel` for plugin-internal types.
+
+### Migration plan
+
+| Step | Scope | Risk |
+|---|---|---|
+| Create `GenkitBaseModel` in `genkit.core._internal._base` | 1 file | None |
+| Change core schema types to inherit from it | ~10 files in `genkit/core/`, `genkit/blocks/` | Low — behavioral change only on direct `.model_dump()` calls |
+| Audit the 11 inconsistent calls — some may intentionally want no aliases | Case-by-case | Medium — need to check if any internal-only paths rely on snake_case keys |
+| Simplify `dump_dict`/`dump_json` | `codec.py` | Low |
+| Remove `dump_dict`/`dump_json` from public API | `__init__.py` | None — already proposed for removal |
+
+### Open questions
+
+1. **Do any internal paths intentionally use snake_case keys?** The `prompt.py`
+   calls that skip `by_alias` might be feeding data back into `model_validate()`,
+   where snake_case is fine. Need to audit each of the 11 sites.
+
+2. **`document.py` dedup hash** — uses `model_dump_json()` with no flags for
+   equality comparison. If we change defaults, the hash changes for any model
+   that has aliases. This could break dedup for in-flight data. Probably fine
+   (dedup is ephemeral), but worth noting.
+
+3. **Third-party model types** — e.g. Google AI SDK types that Genkit wraps.
+   These won't inherit `GenkitBaseModel`, so `dump_dict()` still needs to handle
+   the `isinstance(obj, BaseModel)` case with explicit flags. Or we only use
+   `dump_dict` for third-party types and `.model_dump()` for our own.
+
+## 10. `define_*` should accept raw Python types
+
+### The problem
+
+17 plugins and 8 core files call `to_json_schema()` manually before passing
+schemas to `define_model`, `define_retriever`, etc.:
+
+```python
+# Current — every plugin does this:
+from genkit.core.schema import to_json_schema
+
+ai.define_model(
+    name='my-model',
+    metadata={'model': {'customOptions': to_json_schema(MyConfig)}},
+    config_schema=to_json_schema(MyConfig),
+    ...
+)
+```
+
+This is unnecessary boilerplate. The framework should handle the conversion.
+
+### Cross-language comparison
+
+- **JS** — `toJsonSchema` is public at `genkit/schema` (alongside `parseSchema`,
+  `validateSchema`, `JSONSchema`). But JS `defineModel` also accepts Zod schemas
+  directly — plugins don't *have* to call `toJsonSchema` manually.
+- **Go** — `jsonschema.Reflect()` is internal. `defineModel` in `ai/gen.go`
+  accepts Go types and converts internally.
+- **Python** — `to_json_schema` is public but lives at a deep path
+  (`genkit.core.schema`). And `define_*` functions *require* pre-converted dicts.
+
+Python is the only SDK where plugins are *forced* to call the schema conversion
+themselves. JS has it public but optional; Go internalizes it entirely.
+
+### The fix
+
+`define_*` functions accept `type | dict | None` directly:
+
+```python
+# After — plugins just pass the type:
+ai.define_model(
+    name='my-model',
+    config_schema=MyConfig,   # Python type, not JSON Schema dict
+    ...
+)
+```
+
+The framework calls `to_json_schema()` internally when building action metadata.
+Same for `define_retriever`, `define_embedder`, `define_reranker`, `define_evaluator`.
+
+`to_json_schema` moves to `core/_internal/_schema.py`. No plugin needs it.
+`extract_json` moves to `core/_internal/_extract.py`. Zero plugin consumers —
+only used by `formats/` internally.
+
+### Migration
+
+| Step | Scope | Risk |
+|---|---|---|
+| Update `define_*` signatures to accept `type \| dict \| None` | ~6 functions in ai/ | Low — dict passthrough preserves backward compat |
+| Move `to_json_schema` calls inside `define_*` functions | Same 6 functions | Low |
+| Move `schema.py` to `core/_internal/_schema.py` | 1 file | None |
+| Move `extract.py` to `core/_internal/_extract.py` | 1 file | None |
+| Update 16 plugins to drop `to_json_schema` import + calls | 16 plugin files | Medium — mechanical but wide |
+
+## 11. `ErrorResponse` — internal type consolidation
+
+`ErrorResponse` replaces three error wire-format types (`HttpErrorWireFormat`,
+`GenkitReflectionApiDetailsWireFormat`, `GenkitReflectionApiErrorWireFormat`)
+with a single Pydantic model: `message`, `status`, `details: dict | None`.
+It is internal only, used by the reflection server (`_web/_reflection.py`).
+
+## 12. Import DAG
+
+The internal import graph of the `genkit` package, simplified. Every new module
+or dependency should be evaluated against this to prevent circular imports.
+
+```
+Level 0  (no genkit imports — leaf modules):
+  core/_internal/_base.py      GenkitBaseModel
+  core/_internal/_compat.py    StrEnum, override, wait_for backfills
+  core/_internal/_schema.py    to_json_schema
+  core/_internal/_extract.py   extract_json, extract_items
+  core/_internal/_constants.py GENKIT_VERSION, GENKIT_CLIENT_HEADER
+  core/_internal/_logging.py   get_logger
+
+Level 1  (imports Level 0 only):
+  core/_internal/_typing.py    60+ BaseModel classes (imports _compat + _base)
+  core/error.py                GenkitError, UserFacingError, StatusCodes, Status
+                               (absorbs status_types.py — imports _base only)
+
+Level 2  (imports Level 0–1):
+  core/action.py               Action, ActionRunContext, ActionMetadata, ActionKind,
+                               ActionResponse (absorbs action_types.py)
+  core/_internal/_registry.py  Registry
+  core/_internal/_context.py   RequestData, ContextMetadata
+  core/_internal/_environment.py  EnvVar, is_dev_environment
+  core/_internal/_aio.py       Channel, run_async, ensure_async
+  core/_internal/_http_client.py  per-event-loop httpx.AsyncClient cache
+  core/plugin.py               Plugin ABC
+  core/_internal/_flow.py      FlowWrapper (~50 lines)
+  core/_internal/_background.py  BackgroundAction (imports action, error)
+  core/_internal/_dap.py       DynamicActionProvider (imports action)
+
+Level 3  (imports Level 0–2):
+  ai/model.py                  define_model, GenerateResponseWrapper, etc.
+  ai/retriever.py              define_retriever, RetrieverRef, etc.
+  ai/embedding.py              define_embedder, EmbedderRef, etc.
+  ai/evaluator.py              define_evaluator, EvaluatorRef
+  ai/tools.py                  define_tool, ToolRunContext
+  ai/prompt.py                 ExecutablePrompt, define_prompt
+  ai/_internal/_generate.py    generate() orchestration, tool loop
+  ai/_internal/_dotprompt.py   dotprompt template engine
+
+Level 4  (imports Level 0–3):
+  ai/_internal/_genkit.py      Genkit class body
+  ai/_internal/_genkit_base.py Genkit __init__, server startup
+  _web/_reflection.py          Dev UI ASGI app
+  _web/_runtime.py             RuntimeManager
+```
+
+**Rules:**
+- Each level may only import from levels below it.
+- `core/` has zero imports from `ai/` or `_web/`.
+- `ai/` has zero imports from `_web/`.
+- All `_internal/` modules are plumbing — can change between versions. Parent packages re-export what's needed. `import-linter` blocks plugins from importing `_internal/` paths.
+- `core/` has only 3 package-level files (`action.py`, `error.py`, `plugin.py`) — everything else is `_internal/`. These are stable abstractions listed in `core/__init__.py`'s `__all__`. They can still have `_`-prefixed private helpers inside — normal Python.
+- Since there's no public `genkit.core` or `genkit.ai` import path, the split is for SDK developer clarity, not external API.
+- Enforced by `import-linter` in CI (see [python_package_reorg.md](./python_package_reorg.md)).
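Note on the §12 layering rule ("each level may only import from levels below it"): the rule can be checked mechanically. The doc proposes `import-linter` for CI. The sketch below only illustrates the rule and does not show that tool's configuration. The level map is abridged from the DAG listing, and the function name and the edge data are hypothetical.

```python
# Abridged, hand-written level map based on the Import DAG in section 12.
LEVELS: dict[str, int] = {
    "core/_internal/_base.py": 0,
    "core/_internal/_compat.py": 0,
    "core/_internal/_typing.py": 1,
    "core/error.py": 1,
    "core/action.py": 2,
    "ai/model.py": 3,
    "ai/_internal/_genkit.py": 4,
}


def find_violations(edges: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs that break the layering rule."""
    violations = []
    for importer, targets in edges.items():
        for target in targets:
            # A module may only import from a strictly lower level.
            if LEVELS[target] >= LEVELS[importer]:
                violations.append((importer, target))
    return violations


# Illustrative import edges (not extracted from the real codebase).
edges = {
    "core/error.py": ["core/_internal/_base.py"],
    "core/action.py": ["core/error.py", "ai/model.py"],  # second edge points upward
}
print(find_violations(edges))
```

A real check would derive the edges from the source tree rather than from a literal dict; the comparison itself stays the same.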
diff --git a/py/docs/python_package_reorg.md b/py/docs/python_package_reorg.md
index 857e93e232..a83a09ea11 100644
--- a/py/docs/python_package_reorg.md
+++ b/py/docs/python_package_reorg.md
@@ -5,8 +5,8 @@ enforce public/internal boundaries, and split oversized files.
 
 Related docs:
 - [python_beta_type_design.md](./python_beta_type_design.md) — type audit
-- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — checklist
-- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface
+- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — checklist (33 types deleted, affects file contents below)
+- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface + `GenkitBaseModel` serialization fix + `define_*` accepts raw types (schema/extract internalized)
 - [GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md) — Genkit class
 
 ---
@@ -59,15 +59,15 @@ genkit/
 ├── __init__.py              public API barrel (__all__ defined)
 ├── ai/                      AI domain types + Genkit class
 │   ├── __init__.py          public exports (__all__ defined)
-│   ├── prompt.py            ExecutablePrompt + define_prompt (like Go ai/prompt.go)
+│   ├── prompt.py            ExecutablePrompt + define_prompt
 │   ├── streaming.py         GenerateStreamResponse
 │   ├── model.py             GenerateResponseWrapper, ChunkWrapper, MessageWrapper,
 │   │                        ModelReference, GenerationCommonConfig, define_model,
 │   │                        resolve_api_key, compute_usage_stats
 │   ├── document.py          Document, RankedDocument
-│   ├── retriever.py         RetrieverRef, RetrieverOptions, define_retriever, etc.
-│   ├── embedding.py         Embedder, EmbedderRef, EmbedderOptions, define_embedder
-│   ├── reranker.py          RerankerRef, RerankerOptions, define_reranker
+│   ├── retriever.py         RetrieverRef, define_retriever, etc. (RetrieverOptions deleted — kwargs)
+│   ├── embedding.py         Embedder, EmbedderRef, define_embedder (EmbedderOptions deleted — kwargs)
+│   ├── reranker.py          RerankerRef, define_reranker (RerankerOptions deleted — kwargs)
 │   ├── evaluator.py         EvaluatorRef, define_evaluator
 │   ├── tools.py             ToolRunContext, ToolInterruptError, define_tool
 │   ├── resource.py          resource actions, define_resource
@@ -77,46 +77,52 @@ genkit/
 │   ├── _internal/
 │   │   ├── _genkit.py       Genkit class body (from ai/_aio.py)
 │   │   ├── _genkit_base.py  Genkit __init__, server startup (from ai/_base_async.py)
-│   │   ├── _prompt_render.py  dotprompt rendering + PromptCache (split from blocks/prompt.py)
+│   │   ├── _dotprompt.py    dotprompt template engine — render_*, file loading, PromptCache
 │   │   ├── _generate.py     generate() orchestration, tool loop (from blocks/generate.py)
 │   │   ├── _middleware.py    model middleware execution
 │   │   └── _messages.py     message construction helpers
 │
 ├── core/                    framework primitives (not AI-specific)
 │   ├── __init__.py          public exports (__all__ defined)
-│   ├── action.py            Action, ActionRunContext, ActionMetadata (flattened)
-│   ├── action_types.py      ActionKind, ActionResponse, ActionMetadataKey
-│   ├── error.py             GenkitError, UserFacingError
+│   ├── action.py            Action, ActionRunContext, ActionMetadata, ActionKind,
+│   │                         ActionResponse, ActionMetadataKey (flattened —
+│   │                         absorbs action_types.py, 18 consumers, same concept)
+│   ├── error.py             GenkitError, UserFacingError, StatusCodes, Status,
+│   │                         http_status_code (absorbs status_types.py — only consumer)
 │   ├── plugin.py            Plugin ABC
-│   ├── flow.py              FlowWrapper (generic streaming wrapper)
-│   ├── background.py        BackgroundAction (start/check/cancel pattern)
-│   ├── dap.py               DynamicActionProvider, DapConfig
-│   ├── status_types.py      StatusCodes, Status
-│   ├── typing.py            auto-generated schema types (DO NOT EDIT header)
 │   ├── _internal/
+│   │   ├── _typing.py       auto-generated schema types (DO NOT EDIT header).
+│   │   │                     60+ BaseModel classes. Re-exported via genkit/__init__.py
+│   │   │                     and domain sub-modules. Nobody imports this directly.
+│   │   ├── _base.py         GenkitBaseModel (Pydantic base with exclude_none + by_alias defaults)
+│   │   ├── _compat.py       StrEnum (3.10), override (3.11), wait_for (3.10) backfills
+│   │   │                     (absorbs aio/_compat.py — dies when min Python ≥ 3.12)
 │   │   ├── _registry.py     Registry class
 │   │   ├── _server.py       ServerSpec (reflection API config — moved from ai/)
-│   │   ├── _runtime.py      RuntimeManager (.genkit/runtimes/ files — moved from ai/)
-│   │   ├── _flows.py        flow registration helpers
 │   │   ├── _context.py      RequestData, ContextMetadata
-│   │   ├── _tracing.py      tracing setup, span creation
-│   │   ├── _trace/          OTel exporters and processors
-│   │   │   ├── _default_exporter.py
-│   │   │   ├── _adjusting_exporter.py
-│   │   │   ├── _realtime_processor.py
-│   │   │   └── _types.py    GenkitSpan
-│   │   ├── _schema.py       schema utilities, to_json_schema
-│   │   ├── _extract.py      JSON extraction from text
-│   │   ├── _codec.py        dump_dict, dump_json (from root codec.py)
-│   │   ├── _http_client.py  HTTP client helpers
-│   │   ├── _environment.py  EnvVar, GenkitEnvironment
-│   │   ├── _aio.py          Channel, loop utils (from aio/)
-│   │   ├── _logging.py      get_logger
-│   │   ├── _constants.py
-│   │   └── _deprecations.py (from lang/)
+│   │   ├── _tracing.py      tracing setup, span creation, dev UI exporter,
+│   │   │                     RealtimeSpanProcessor (~350 lines merged from
+│   │   │                     tracing.py + default_exporter.py + realtime_processor.py.
+│   │   │                     AdjustingTraceExporter + RedactedSpan moved to
+│   │   │                     telemetry plugin — 5 plugins import, 0 core files do.)
+│   │   ├── _http_client.py  HTTP client cache (per-event-loop httpx.AsyncClient — 8 plugins use)
+│   │   ├── _environment.py  EnvVar, GenkitEnvironment, is_dev_environment()
+│   │   ├── _aio.py          Channel, run_async, run_loop, ensure_async, iter_over_async
+│   │   │                     (~500 lines merged from all 4 aio/* files)
+│   │   ├── _schema.py       to_json_schema (internal — define_* accepts types directly)
+│   │   ├── _extract.py      extract_json, extract_items (internal — only used by formats/)
+│   │   ├── _logging.py      get_logger (structlog wrapper — trim 20-method Protocol to ~7)
+│   │   ├── _constants.py    GENKIT_VERSION, GENKIT_CLIENT_HEADER
+│   │   ├── _flow.py         FlowWrapper (~50 lines — users never construct, returned by @ai.flow())
+│   │   ├── _background.py   BackgroundAction (2 internal consumers, not re-exported top-level)
+│   │   └── _dap.py          DynamicActionProvider (1 internal consumer, not re-exported top-level)
+│
+├── tracing.py               tracer, add_custom_exporter (public — matches JS genkit/tracing)
 │
 ├── _web/                    dev server only (all internal)
-│   ├── reflection.py        Dev UI reflection API (moved from core/)
+│   ├── _reflection.py       Dev UI reflection API (moved from core/). Starlette ASGI app
+│   │                         exposing /api/actions, /api/runAction, etc. Only consumer is
+│   │                         the runtime startup code that mounts it on uvicorn.
 │   └── _runtime.py          RuntimeManager — writes .genkit/runtimes/ files
 │
 │   DELETED: web/manager/ (~1,500 lines, 7 types)
@@ -135,80 +141,26 @@ genkit/
 |---|---|
 | **Delete `blocks/`** | All files move into `ai/`. Domain types live where Go/JS put them. |
 | **Delete `aio/`** | `Channel` + loop utils → `core/_internal/_aio.py` |
-| **Delete `lang/`** | `deprecations.py` → `core/_internal/_deprecations.py` |
+| **Delete `lang/`** | `deprecations.py` → inline into google-genai plugin (only consumer). |
 | **Delete `types/`** | Barrel re-export removed. `genkit/__init__.py` handles this. |
 | **Delete `web/manager/`** | ~1,500 lines of unused multi-server orchestration. Reflection server uses raw uvicorn (~15 lines). |
 | **Delete `core/flows.py`** | `create_flows_asgi_app()` — auto-exposes flows as HTTP endpoints. Firebase Cloud Functions pattern that doesn't fit Python (Cloud Functions uses Flask, not ASGI; no `onCallGenkit` for Python). Users should use FastAPI/Flask instead. JS has this (`startFlowServer`) because the Express ecosystem aligns; Python's doesn't. ~370 lines. |
 | **Rename `web/` → `_web/`** | Prefix signals "internal, don't import". Now just reflection + runtime. |
 | **Move `core/reflection.py` → `_web/`** | It's a Starlette ASGI app, not a core primitive. Breaks `core/` → `web/` cycle. |
-| **Move `codec.py`** | → `core/_internal/_codec.py` |
+| **Delete `codec.py`** | `dump_dict`/`dump_json` die with `GenkitBaseModel` (see [python_beta_api_proposal.md §5](./python_beta_api_proposal.md)). Third-party `BaseModel` fallback inlined into `_base.py`. |
 | **Delete `model_types.py`** | `GenerationCommonConfig` → `ai/model.py`. API key helpers renamed to `resolve_api_key` and exposed from `model.py`. `get_basic_usage_stats` renamed to `compute_usage_stats`. |
-| **Move `FlowWrapper`** | `ai/_registry.py` → `core/flow.py` (matches Go/JS) |
-| **Move `BackgroundAction`** | `blocks/background_model.py` → `core/background.py` (matches Go/JS) |
-| **Move `DynamicActionProvider`** | `blocks/dap.py` → `core/dap.py` (matches Go/JS) |
-| **Split `prompt.py`** | 2,446 → ~600 (prompt.py) + ~200 (streaming.py) + ~800 (_prompt_render.py) + ~400 (_prompt_cache.py) |
+| **Merge `action_types.py` into `action.py`** | 95 lines, same 18 consumers, same concept. `ActionKind`, `ActionResponse`, `ActionMetadataKey` live alongside `Action`. |
+| **Merge `status_types.py` into `error.py`** | Only consumer is `error.py`. `StatusCodes`, `Status`, `http_status_code` are tightly coupled with the error hierarchy. |
+| **Move `FlowWrapper` → `_internal/`** | `ai/_registry.py` → `core/_internal/_flow.py`. ~50 lines, 2 consumers, users never construct directly (returned by `@ai.flow()`). |
+| **Move `BackgroundAction` → `_internal/`** | `blocks/background_model.py` → `core/_internal/_background.py`. Not re-exported top-level, only 2 internal consumers. `genkit.model` sub-module re-exports it for plugin authors. |
+| **Move `DynamicActionProvider` → `_internal/`** | `blocks/dap.py` → `core/_internal/_dap.py`. Not re-exported top-level, single internal consumer (`ai/_registry.py`). |
+| **Split `prompt.py`** | 2,446 → ~600 (prompt.py) + ~200 (streaming.py) + ~800 (_dotprompt.py) |
+| **Move `typing.py` → `_internal/`** | `core/typing.py` → `core/_internal/_typing.py`. Auto-generated 60+ `BaseModel` classes. `core/` is not a public import path — public types are re-exported from `genkit/__init__.py` and domain sub-modules. The file is pure plumbing. |
+| **Internalize `schema.py` + `extract.py`** | Both move to `core/_internal/`. `define_*` functions accept raw Python types so no plugin needs `to_json_schema`. `extract_json` has zero plugin consumers — only used by `formats/`. JS exports both publicly but nobody imports them there either. See [python_beta_api_proposal.md §6](./python_beta_api_proposal.md). |
 | **Dissolve `ai/_registry.py`** | define_* functions move to their domain files (like Go). `define_model` → `ai/model.py`, `define_retriever` → `ai/retriever.py`, etc. Genkit method stubs stay in `ai/_internal/_genkit.py`. `_registry.py` ceases to exist. |
 | **Add `_internal/`** | Pydantic v2 pattern: private implementation behind `_internal/` |
 | **Add `__all__`** | Every public `__init__.py` declares its exports |
 
----
-
-## Cross-language alignment
-
-After the reorg, every audited type lands in the same package as Go and JS.
-
-### `core/` — framework primitives (all three SDKs agree)
-
-| Python type | Go equivalent | JS equivalent |
-|---|---|---|
-| `Action` | `core/api/action.go` Action | `core/src/action.ts` Action |
-| `ActionRunContext` | `core/context.go` ActionContext | `core/src/context.ts` ActionContext |
-| `ActionMetadata` | `core/api/action.go` ActionDesc | `core/src/action.ts` ActionMetadata |
-| `ActionKind` | `core/api/action.go` ActionType | `core/src/registry.ts` ActionType |
-| `GenkitError` | `core/error.go` | `core/src/error.ts` |
-| `UserFacingError` | `core/error.go` | `core/src/error.ts` |
-| `Plugin` | `core/api/plugin.go` | `core/src/plugin.ts` PluginProvider |
-| `StatusCodes` | `core/status_types.go` | `core/src/statusTypes.ts` |
-| `FlowWrapper` | `core/flow.go` Flow | `core/src/flow.ts` Flow |
-| `BackgroundAction` | `core/background_action.go` | `core/src/background-action.ts` |
-| `DynamicActionProvider` | `core/api/plugin.go` DynamicPlugin | `core/src/dynamic-action-provider.ts` |
-| `Channel` | N/A (Go built-in) | `core/src/async.ts` |
-| `Registry` | `core/api/registry.go` (interface) | `core/src/registry.ts` |
-
-### `ai/` — AI domain types (all three SDKs agree)
-
-| Python type | Go equivalent | JS equivalent |
-|---|---|---|
-| `Genkit` | `genkit/genkit.go` | `genkit/src/genkit.ts` |
-| `ExecutablePrompt` | `ai/prompt.go` Prompt | `ai/src/prompt.ts` |
-| `GenerateStreamResponse` | N/A (callback-based) | `ai/src/generate.ts` |
-| `GenerateResponseWrapper` | `ai/gen.go` ModelResponse | `ai/src/generate/response.ts` |
-| `GenerateResponseChunkWrapper` | `ai/gen.go` ModelResponseChunk | `ai/src/generate/chunk.ts` |
-| `MessageWrapper` | `ai/gen.go` Message | `ai/src/message.ts` |
-| `Document` | `ai/document.go` | `ai/src/document.ts` |
-| `RankedDocument` | `ai/gen.go` RankedDocumentData | `ai/src/reranker.ts` |
-| `ToolRunContext` | `ai/tools.go` | `ai/src/tool.ts` |
-| `ToolInterruptError` | `ai/tools.go` (unexported) | `ai/src/tool.ts` |
-| `ModelReference` | `ai/generate.go` ModelRef | `ai/src/model.ts` |
-| `EmbedderRef` | `ai/embedder.go` | `ai/src/embedder.ts` EmbedderReference |
-| `RetrieverRef` / `IndexerRef` | `ai/retriever.go` | `ai/src/retriever.ts` |
-| `RerankerRef` | N/A | `ai/src/reranker.ts` RerankerReference |
-| `EvaluatorRef` | `ai/evaluator.go` | `ai/src/evaluator.ts` |
-| `Embedder` | `ai/embedder.go` | `ai/src/embedder.ts` |
-| `EmbedderOptions` / `Supports` | `ai/embedder.go` | `ai/src/embedder.ts` |
-| `RetrieverOptions` / `IndexerOptions` | `ai/retriever.go` | `ai/src/retriever.ts` |
-| `RerankerOptions` | N/A | `ai/src/reranker.ts` |
-| `FormatDef` / `Formatter` | `ai/formatter.go` | `ai/src/formats/types.ts` |
-| `GenerationCommonConfig` | `ai/gen.go` | `ai/src/model-types.ts` |
-| `ActionMetadata` | `core/api/action.go` | `core/src/action.ts` |
-
-Mismatches: **zero.** Every type ends up in the same package as Go and JS.
-
-(`Genkit` is a special case — Go/JS have a separate `genkit/` package, Python uses
-the top-level `genkit/__init__.py`. Same role, different mechanism.)
-
----
-
 ## Plugin import paths — before and after
 
 ### Model plugin (e.g., google-genai gemini.py)
@@ -222,12 +174,13 @@ from genkit.core.error import GenkitError, StatusName
 from genkit.core.tracing import tracer
 from genkit.core.typing import GenerationCommonConfig, Message, ...
 
-# After (2 imports):
+# After (3 imports, from top-level genkit, genkit.ai, and genkit.tracing):
+from genkit import GenkitError, GENKIT_CLIENT_HEADER
+from genkit.tracing import tracer
 from genkit.ai import (
-    ActionRunContext, GenkitError, GenerationCommonConfig,
-    Message, get_basic_usage_stats, dump_dict, dump_json,
+    ActionRunContext, GenerationCommonConfig,
+    Message, compute_usage_stats,
 )
-from genkit.core import tracer, GENKIT_CLIENT_HEADER
 ```
 
 ### Retriever plugin (e.g., vertex-ai vector_search.py)
@@ -236,15 +189,12 @@ from genkit.core import tracer, GENKIT_CLIENT_HEADER
 # Before (5 deep imports):
 from genkit.ai import Genkit
 from genkit.blocks.document import Document
-from genkit.blocks.retriever import RetrieverOptions, retriever_action_metadata
+from genkit.blocks.retriever import retriever_action_metadata
 from genkit.core.action.types import ActionKind
 from genkit.core.schema import to_json_schema
 
-# After (1 import):
-from genkit.ai import (
-    Genkit, Document, RetrieverOptions,
-    retriever_action_metadata, ActionKind, to_json_schema,
-)
+# After (1 import — define_retriever accepts types directly, no manual to_json_schema):
+from genkit import Genkit, Document, ActionKind
 ```
 
 ### Telemetry plugin (e.g., observability)
@@ -255,8 +205,9 @@ from genkit.core.environment import is_dev_environment
 from genkit.core.trace.adjusting_exporter import AdjustingTraceExporter
 from genkit.core.tracing import add_custom_exporter
 
-# After (1 import):
-from genkit.core import is_dev_environment, AdjustingTraceExporter, add_custom_exporter
+# After (2 imports — AdjustingTraceExporter moves to telemetry plugin):
+from genkit import is_dev_environment
+from genkit.tracing import add_custom_exporter
 ```
 
 ---
@@ -283,7 +234,7 @@ up in `core/` by accident, not because they provide core primitives.
 
 ```
 _web/
-├── reflection.py    ← was core/reflection.py
+├── _reflection.py   ← was core/reflection.py
 └── _runtime.py      ← RuntimeManager
 ```
 
@@ -295,9 +246,13 @@ The `Plugin` base class has two convenience methods — `model(name)` and
 `EmbedderRef` from `blocks/`. After the reorg, `blocks/` merges into `ai/`,
 creating a `core/ → ai/` layering violation.
 
-Fix: move `ModelReference` and `EmbedderRef` into `core/` (they're tiny
-types — just a `name: str` wrapper). Or remove the helper methods from
-`Plugin` and let plugins construct refs directly.
+Fix: **restore the original async resolve-based helpers, add `embedder()`.** The
+current methods (added in #4278) construct `ModelReference`/`EmbedderRef` objects,
+which requires importing from `blocks/`. The original version (from #4132) called
+`self.resolve(ActionKind.MODEL, name)` and returned `Action` — no imports from
+`blocks/` or `ai/`, zero layering violation. Matches JS's
+`GenkitPluginV2Instance.model()`. The `embedder()` method gets the same treatment.
+Both are async, return `Action | None`, and only use types already in `core/`.
 
 **`ai/_base_async.py` → `web/manager/_ports.py`.**
 Imports `find_free_port_sync` — a 15-line stdlib socket utility. After the
@@ -324,29 +279,31 @@ _web/  →  ai/  →  core/
 ### 1. `__all__` on every public `__init__.py`
 
 ```python
-# genkit/__init__.py
+# genkit/__init__.py  (the ONLY public import path for most users)
 __all__ = [
     'Genkit', 'Document', 'GenkitError', 'UserFacingError',
-    'GenerateResponse', 'GenerateStreamResponse',
-    'ActionRunContext', 'ToolRunContext', 'Plugin',
-    # ... ~50 symbols
+    'GenerateResponse', 'StreamResponse', 'GenerateResponseChunk',
+    'ExecutablePrompt', 'Message', 'Role',
+    'Part', 'TextPart', 'MediaPart', 'Media',
+    'ToolRunContext', 'ToolInterruptError', 'ToolChoice',
+    'RequestData', 'ContextProvider',
+    'GENKIT_VERSION', 'GENKIT_CLIENT_HEADER', 'is_dev_environment',
+    'Plugin', 'Action', 'ActionMetadata', 'ActionKind', 'StatusCodes',
+    # ... ~34 symbols (see python_beta_api_proposal.md §1)
 ]
 
-# genkit/ai/__init__.py
-__all__ = [
-    'Genkit', 'ExecutablePrompt', 'GenerateStreamResponse',
-    'Document', 'RankedDocument', 'ToolRunContext',
-    # ... AI domain types + plugin helpers
-]
+# genkit/tracing.py  (telemetry plugin authors)
+__all__ = ['tracer', 'add_custom_exporter']
 
-# genkit/core/__init__.py
-__all__ = [
-    'Action', 'ActionRunContext', 'ActionMetadata', 'ActionKind',
-    'GenkitError', 'UserFacingError', 'Plugin', 'FlowWrapper',
-    # ... framework types + plugin helpers
-]
+# genkit/model.py, genkit/retriever.py, etc.  (domain sub-modules for plugin authors)
+# Each defines __all__ with its domain types.
 ```
 
+**No public `genkit.core` or `genkit.ai` import paths.** `core/` and `ai/` are
+internal package structure — `genkit/__init__.py` re-exports everything users need.
+Domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are for plugin authors
+who need wire-format types not in the top-level barrel.
+
 ### 2. `import-linter` in CI
 
 ```ini
@@ -374,8 +331,25 @@ forbidden_modules =
 
 ### 3. `_internal/` convention
 
-Following Pydantic v2's pattern. Everything in `_internal/` can change without
-notice between minor versions. The public modules re-export what's needed.
+Following Pydantic v2's pattern. The split works like this:
+
+**Files at package level (e.g. `core/action.py`, `core/error.py`):**
+- Clean abstractions within the package — the "logical public API" of that sub-package
+- Listed in the sub-package's `__init__.py` `__all__`
+- Other SDK modules import from here: `from genkit.core.action import Action`
+- Can still have private helpers (`_foo()`) inside the file — normal Python
+- Signals to SDK developers: "this is a stable abstraction"
+
+**Files in `_internal/` (e.g. `core/_internal/_registry.py`, `core/_internal/_typing.py`):**
+- Implementation machinery — can change between versions without notice
+- NOT listed in the sub-package's `__init__.py` `__all__`
+- Other SDK modules import directly when needed: `from genkit.core._internal._base import GenkitBaseModel`
+- `import-linter` prevents plugins from importing these paths
+- Signals to SDK developers: "this is plumbing, handle with care"
+
+Since there's **no public `genkit.core`** import path anyway, the split is primarily
+about signaling intent to other developers working on the SDK itself. External users
+import from `genkit`, `genkit.model`, `genkit.tracing`, etc. — they never see either level.
 
 ---
 
@@ -383,11 +357,11 @@ notice between minor versions. The public modules re-export what's needed.
 
 | File | Current | Target | How |
 |---|---|---|---|
-| `blocks/prompt.py` | 2,446 | ~600 | Split into prompt.py + streaming.py + _prompt_render.py + _prompt_cache.py |
+| `blocks/prompt.py` | 2,446 | ~600 | Split into prompt.py + streaming.py + _dotprompt.py (render_*, file loading, PromptCache) |
 | `ai/_registry.py` | 1,680 | **0 (deleted)** | define_* functions move to domain files (model.py, retriever.py, etc.). Genkit method stubs absorbed into _genkit.py. File ceases to exist. |
 | `ai/_aio.py` | 1,164 | ~800 | Rename to _genkit.py, extract server startup to _genkit_base.py |
 | `blocks/generate.py` | 1,088 | ~600 | Extract tool loop to _generate.py, keep public generate function |
-| `core/typing.py` | 1,066 | 1,066 | Auto-generated, don't touch. Add DO NOT EDIT header. |
+| `core/_internal/_typing.py` | 1,066 | 1,066 | Auto-generated, don't touch. Add DO NOT EDIT header. Moved to `_internal/`. |
 
 Target: no hand-written file over 800 lines. Matches Go/JS norms.
 
@@ -395,8 +369,16 @@ Target: no hand-written file over 800 lines. Matches Go/JS norms.
 
 ## Migration path
 
-This is a **one-time refactor** with no logic changes, no API changes, no behavior
-changes. The diff is:
+This is a **one-time refactor** with minimal logic changes. Most of the diff is
+file moves and import path updates. The API changes are:
+
+- `define_*` functions accept `type | dict | None` (see [§6](./python_beta_api_proposal.md))
+- `GenkitBaseModel` replaces `dump_dict`/`dump_json` (see [§5](./python_beta_api_proposal.md))
+- `to_json_schema` and `extract_json` become internal
+- Public import paths change from `genkit.core.*` / `genkit.blocks.*` to
+  `from genkit import ...` and domain sub-modules (`genkit.model`, etc.)
+
+The structural diff is:
 
 1. Move files
 2. Update import paths (find-and-replace across plugins)
@@ -431,20 +413,3 @@ changes. The diff is:
    forward.
 
 Each step is independently shippable and independently revertible.
-
----
-
-## What we're NOT doing
-
-- **Not changing the public API.** `from genkit import Genkit` still works.
-  All public symbols stay accessible from `genkit.__init__`.
-
-- **Not splitting into multiple PyPI packages.** `genkit` stays as one
-  installable package. `ai/` and `core/` are internal organization.
-
-- **Not changing runtime behavior.** This is purely a code organization refactor.
-
-- **Not touching `core/typing.py`.** Auto-generated schema types stay as-is.
-
-- **Not touching plugins' public APIs.** Plugins' `__init__.py` exports
-  are unchanged. Only their internal imports from `genkit.*` are updated.
diff --git a/py/docs/python_type_audit_checklist.md b/py/docs/python_type_audit_checklist.md
index a9e219c8ab..9d8f33931c 100644
--- a/py/docs/python_type_audit_checklist.md
+++ b/py/docs/python_type_audit_checklist.md
@@ -32,7 +32,7 @@ Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md),
 
 ---
 
-## Should fix (29) — non-trivial changes needed
+## Should fix (28) — non-trivial changes needed
 
 - [ ] `UserFacingError` — positional args, should be keyword-only.
 - [ ] `GenkitError` — two serialization methods + standalone function, consolidate.
@@ -60,8 +60,6 @@ Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md),
       provides), `ToolRunContext` accesses parent private fields.
 - [ ] `FormatDef` — uses `@abc.abstractmethod` but doesn't extend `abc.ABC`.
       One-line fix.
-- [ ] `GenkitSpan` — `__getattr__` proxy makes type invisible to type checkers.
-      Low priority.
 - [ ] `Logger` — 20-method Protocol. `warn`/`warning` redundant alias,
       `fatal`/`critical` redundant alias. JS Logger has 7 methods.
 - [ ] `AdjustingTraceExporter` — belongs in telemetry plugin, not core SDK.
@@ -82,7 +80,7 @@ Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md),
 
 ---
 
-## Delete (33) — remove entirely
+## Delete (34) — remove entirely
 
 **Replaced by kwargs on `define_*` methods:**
 - [ ] `EmbedderOptions` — flatten to kwargs on `define_embedder()`
@@ -100,6 +98,12 @@ Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md),
 - [ ] `OutputOptions` — dies when `PromptGenerateOptions` is replaced
 - [ ] `OutputConfigDict` — dies when `Output[T]` is replaced
 
+**Inlined into helpers (class unnecessary):**
+- [ ] `GenkitSpan` — `__getattr__` proxy kills type checking. Replace with free
+      functions in `_tracing.py` (`_set_genkit_attr`, `_set_span_input`,
+      `_set_span_output`). `is_root` becomes `span.parent is None` at call site.
+      `_trace/_types.py` deleted.
+
 **Dead code / unused:**
 - [ ] `Input` / `Output` — replace with `output_schema` kwarg + `@overload`
 - [ ] `Retriever` — dead code, never instantiated

From c73b439b60ae223befc35c801e57c3f90324b6f7 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 24 Feb 2026 14:58:57 -0600
Subject: [PATCH 14/17] Updates to api proposal

---
 py/docs/python_beta_api_proposal.md | 841 ++++++++++++++++++++++++----
 1 file changed, 733 insertions(+), 108 deletions(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 48ea9df57d..7b9ab5e91c 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -1,15 +1,15 @@
-# Genkit Python — Public API Surface Proposal
+# Genkit Python — Beta API Design Review
 
-What's importable, what's not, and where the boundary is.
-
-Two audiences, separate entry points:
-
-1. **Simple Path** — `from genkit import ...`
-2. **Advanced Usage** — domain sub-modules (`genkit.model`, `genkit.retriever`, etc.)
+This doc covers the full public API surface being locked at beta: what's importable,
+how the client is constructed, the high-traffic method signatures, and the return types
+users interact with. Section 5 lists open design questions requiring explicit sign-off.
 
 ---
 
-## 1. Proposed imports
+## 1. Import Surface
+
+Every symbol exported at beta. This is exhaustive by design — the list itself is what's
+being approved.
 
 ### `from genkit import ...` — app developers
 
@@ -18,12 +18,14 @@ from genkit import (
     # Core
     Genkit,
     ActionRunContext,
-    GenerateResponse,       # veneer (aliased from GenerateResponseWrapper)
-    GenerateResponseChunk,  # veneer (aliased from GenerateResponseChunkWrapper)
-    StreamResponse,         # renamed from GenerateStreamResponse
-    ExecutablePrompt,  
+    GenerateResponse,         # veneer alias of GenerateResponseWrapper
+    GenerateResponseChunk,    # veneer alias of GenerateResponseChunkWrapper
+    ActionStreamResponse,     # base streaming wrapper — Action.stream()
+    FlowStreamResponse,       # flow streaming wrapper — FlowWrapper.stream()
+    GenerateStreamResponse,   # generate/prompt streaming wrapper — subclass of FlowStreamResponse
+    ExecutablePrompt,
     GenkitError,
-    UserFacingError,
+    PublicError,              # renamed from UserFacingError; matches Go's NewPublicError
 
     # Content types
     Part, TextPart, MediaPart, Media,
@@ -52,36 +54,17 @@ from genkit import (
     ContextProvider,
 
     # Constants
-    GENKIT_VERSION,
-    GENKIT_CLIENT_HEADER,
     is_dev_environment,
 
-    # Plugin authoring (also used by advanced app devs)
-    Plugin,
-    Action,
-    ActionMetadata,
-    ActionKind,
-    StatusCodes,
 )
 ```
 
-**~34 symbols.** One import covers both app developers (~25) and plugin authors
-(~9 additional). Normal for Python — OpenAI and Anthropic export more.
-
-Notes:
-- `GenerateResponse` / `GenerateResponseChunk` — aliases that hide the "Wrapper" suffix.
-  Both use inheritance, so `isinstance` checks work.
-- `Message` is the schema type, not `MessageWrapper`. `MessageWrapper` uses composition
-  so aliasing it would break `Message(role="user", content=[...])`. Users get
-  `MessageWrapper` via `response.messages` but never construct it.
-- `ExecutablePrompt` — exported so users can type-annotate: `my_prompt: ExecutablePrompt = ai.prompt("greeting")`.
-
-### `genkit.model` — model plugin authors
+### `genkit.model`
 
 ```python
 from genkit.model import (
     GenerateRequest,
-    GenerateResponse,        # schema type (NOT the veneer)
+    GenerateResponse,        # schema type (NOT the veneer — see §5)
     GenerateResponseChunk,
     GenerationUsage,
     Candidate,
@@ -110,14 +93,11 @@ from genkit.model import (
 )
 ```
 
-`GenerateResponse` naming: `from genkit import GenerateResponse` = veneer.
-`from genkit.model import GenerateResponse` = schema type. No shadowing — a file
-imports from one or the other. Veneer extends schema via inheritance.
-
 ### `genkit.retriever`
 
 ```python
 from genkit.retriever import (
+    RetrieverRef,
     RetrieverRequest,
     RetrieverResponse,
     retriever_action_metadata,
@@ -132,6 +112,7 @@ from genkit.retriever import (
 
 ```python
 from genkit.embedder import (
+    EmbedderRef,
     EmbedRequest,
     EmbedResponse,
     Embedding,
@@ -177,100 +158,744 @@ from genkit.evaluator import (
 from genkit.tracing import tracer, add_custom_exporter
 ```
 
-## 2. What we removed from imports (and why)
+### `genkit.plugin` — all plugin authors
+
+```python
+from genkit.plugin import (
+    # Base class and framework primitives
+    Plugin,
+    Action,
+    ActionMetadata,
+    ActionKind,
+    StatusCodes,
+
+    # HTTP / version stamping (for setting x-goog-api-client and user-agent headers)
+    GENKIT_CLIENT_HEADER,
+    GENKIT_VERSION,
+
+    # Convenience re-exports from domain modules
+    # (identical to importing from genkit.model, genkit.retriever, etc.)
+    model_action_metadata, model_ref, ModelReference,
+    embedder_action_metadata, embedder_ref,
+    retriever_action_metadata, retriever_ref,
+    indexer_action_metadata, indexer_ref,
+    reranker_action_metadata, reranker_ref,
+    evaluator_action_metadata, evaluator_ref,
+)
+```
+
+Note: The domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are still the canonical
+paths for domain-specific types. `genkit.plugin` re-exports the cross-cutting framework primitives
+and provides a single entry point for plugin authors who don't want to hunt across multiple paths.
+
+---
+
+## 2. Client Construction
+
+```python
+ai = Genkit(
+    plugins: list[Plugin] | None = None,
+    model: str | None = None,
+    prompt_dir: str | Path | None = None,
+)
+```
+
+- `plugins` — list of initialized plugin instances
+- `model` — default model name used when `model=` is omitted from `generate()`
+- `prompt_dir` — directory to load `.prompt` files from; defaults to `./prompts` if it exists
 
-### Removed from `from genkit import ...`
+---
 
-| Symbol | Why |
-|---|---|
-| `Input` / `Output` | Type deleted. Replaced by `output_schema` kwarg. Neither JS nor Go needed this. |
-| `GenkitRegistry` | Internal implementation type. Plugins use the `Genkit` instance. |
-| `FlowWrapper` | Internal. Not needed by app developers. |
-| `SimpleRetrieverOptions` | Type deleted. Flatten to kwargs on `define_simple_retriever()`. |
-| `PromptGenerateOptions` | Type deleted. 17-field TypedDict that killed IDE autocomplete. |
-| `OutputOptions` | Type deleted. Dies with `PromptGenerateOptions`. |
-| `ResumeOptions` | No longer top-level. Passed as `resume=` kwarg on prompt methods. |
-| `tool_response` | Only 3 sample usages. JS/Go use a method on the tool instance. |
-| `GENKIT_CLIENT_HEADER` / `GENKIT_VERSION` | Previously deep internal import (`genkit.core.constants`). Now in top-level `from genkit import ...`. |
+## 3. Method Signatures
 
-### Removed from domain sub-modules
+High-traffic paths only — not exhaustive.
 
-| Symbol | Module | Why |
-|---|---|---|
-| `RetrieverOptions` | `genkit.retriever` | Type deleted. Flatten to kwargs on `define_retriever()`. |
-| `IndexerOptions` | `genkit.retriever` | Type deleted. Flatten to kwargs on `define_indexer()`. |
-| `EmbedderOptions` | `genkit.embedder` | Type deleted. Flatten to kwargs on `define_embedder()`. |
-| `RerankerOptions` | `genkit.reranker` | Type deleted. Flatten to kwargs on `define_reranker()`. |
+### `Genkit`
 
-### Removed helpers (no longer importable)
+```python
+from typing import Any, TypeVar, overload
+C = TypeVar('C', bound=GenerationCommonConfig)
+InputT = TypeVar('InputT')
+OutputT = TypeVar('OutputT')
+# Key invariant: a TypeVar is bound only when the caller passes a concrete type[T] argument.
+# A parameter typed type[T] with no default must be passed explicitly for its overload to
+# match, and passing it is what gives the type checker a concrete T to bind.
+# Catch-all overloads use dict[str, object] | None = None and return Any-parameterized types.
+
+# generate() has 4 overloads — the (model type) × (output_schema type) cross-product:
+#   [1] ModelReference[C] + type[OutputT]  →  GenerateResponse[OutputT]   ← C and OutputT both bound
+#   [2] ModelReference[C] + dict/None      →  GenerateResponse[Any]
+#   [3] str/None          + type[OutputT]  →  GenerateResponse[OutputT]   ← OutputT bound, C erased
+#   [4] str/None          + dict/None      →  GenerateResponse[Any]       ← catch-all
+#
+# Typed overloads (1, 3): output_schema has NO default — absence of a default is what forces
+# the type checker to bind OutputT only when the caller explicitly passes a concrete type.
+# Catch-all overloads (2, 4): output_schema: dict[str, object] | None = None covers
+# raw JSON Schema dicts, None, and the omitted-entirely case.
+
+# [1] ModelReference[C] + typed schema — both C and OutputT bound:
+@overload
+async def generate(
+    self,
+    *,
+    model: ModelReference[C],
+    config: C | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+) -> GenerateResponse[OutputT]: ...
+
+# [2] ModelReference[C] + untyped schema:
+@overload
+async def generate(
+    self,
+    *,
+    model: ModelReference[C],
+    config: C | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+) -> GenerateResponse[Any]: ...
+
+# [3] str model + typed schema — OutputT bound, config falls back to GenerationCommonConfig:
+@overload
+async def generate(
+    self,
+    *,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+) -> GenerateResponse[OutputT]: ...
+
+# [4] str model + untyped schema — catch-all (dict, None, or omitted):
+@overload
+async def generate(
+    self,
+    *,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+) -> GenerateResponse[Any]: ...
+
+# generate_stream() has the same 4-overload structure, returning GenerateStreamResponse[T]:
+
+# [1] ModelReference[C] + typed schema:
+@overload
+def generate_stream(
+    self,
+    *,
+    model: ModelReference[C],
+    config: C | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[OutputT]: ...
+
+# [2] ModelReference[C] + untyped schema:
+@overload
+def generate_stream(
+    self,
+    *,
+    model: ModelReference[C],
+    config: C | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[Any]: ...
+
+# [3] str model + typed schema:
+@overload
+def generate_stream(
+    self,
+    *,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[OutputT]: ...
+
+# [4] str model + untyped schema — catch-all:
+@overload
+def generate_stream(
+    self,
+    *,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    return_tool_requests: bool | None = None,
+    tool_choice: ToolChoice | None = None,
+    tool_responses: list[Part] | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[Document] | None = None,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[Any]: ...
+
+# Retrieval
+async def retrieve(
+    self,
+    retriever: str | RetrieverRef,
+    query: str | Document,
+    *,
+    options: dict[str, object] | None = None,  # plugin-defined schema; shape varies per retriever
+) -> list[Document]: ...  # JS parity: wire DocumentData converted to Document veneers
+
+# Embedding
+async def embed(
+    self,
+    embedder: str | EmbedderRef,
+    content: str | Document,
+    *,
+    options: dict[str, object] | None = None,  # plugin-defined schema; shape varies per embedder
+) -> list[Embedding]: ...
+
+# Prompt lookup — 4 overloads (InputT × OutputT cross-product):
+# [1] Both bound:
+@overload
+def prompt(
+    self,
+    name: str,
+    variant: str | None = None,
+    *,
+    input_schema: type[InputT],             # no default — binds InputT
+    output_schema: type[OutputT],           # no default — binds OutputT
+) -> ExecutablePrompt[InputT, OutputT]: ...
+
+# [2] InputT bound, OutputT untyped — typed input, unstructured output:
+@overload
+def prompt(
+    self,
+    name: str,
+    variant: str | None = None,
+    *,
+    input_schema: type[InputT],             # no default — binds InputT
+    output_schema: dict[str, object] | None = None,
+) -> ExecutablePrompt[InputT, Any]: ...
+
+# [3] OutputT bound, InputT untyped — unstructured input, typed output:
+@overload
+def prompt(
+    self,
+    name: str,
+    variant: str | None = None,
+    *,
+    input_schema: dict[str, object] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+) -> ExecutablePrompt[Any, OutputT]: ...
+
+# [4] Catch-all — neither bound (dict, None, or omitted):
+@overload
+def prompt(
+    self,
+    name: str,
+    variant: str | None = None,
+    *,
+    input_schema: dict[str, object] | None = None,
+    output_schema: dict[str, object] | None = None,
+) -> ExecutablePrompt[Any, Any]: ...
+
+# Decorators
+@ai.flow(name: str | None = None)
+async def my_flow(input: InputT) -> OutputT: ...
+# Returns: FlowWrapper
+
+@ai.tool(name: str | None = None, description: str | None = None)
+def my_tool(input: InputT, ctx: ToolRunContext | None = None) -> OutputT: ...
+```
 
-| Helper | Why |
-|---|---|
-| `get_logger` | Structlog wrapper. Use stdlib `logging`. Neither JS nor Go force a logging library. |
-| `text_from_content` | Use veneers instead: `response.text`, `message.text`, `doc.text`. |
-| `dump_dict` / `dump_json` | Fix at source — `GenkitBaseModel` defaults handle this. See [sdk_design §9](./python_beta_sdk_design.md). |
-| `get_callable_json` | Dies with `core/flows.py`. |
-| `create_flows_asgi_app` | Cloud Functions pattern — doesn't fit Python where FastAPI/Flask own routing. |
+### `ExecutablePrompt` — returned by `ai.prompt()` / `@ai.define_prompt`
 
-### Internalized (no longer importable)
+```python
+# Call like a function
+async def __call__(self, input: InputT | None = None) -> GenerateResponse[OutputT]: ...
+
+# Stream
+def stream(
+    self,
+    input: InputT | None = None,
+    *,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[OutputT]: ...
+
+# Render without executing
+async def render(
+    self,
+    input: InputT | dict[str, Any] | None = None,
+) -> GenerateActionOptions: ...
+```
 
-| Symbol | Why |
-|---|---|
-| `to_json_schema` | `define_*` accepts types directly — no plugin needs manual conversion. Moves to `core/_internal/_schema.py`. See [sdk_design §10](./python_beta_sdk_design.md). |
-| `extract_json` | Zero plugin consumers. Only used internally by `formats/`. Moves to `core/_internal/_extract.py`. |
+### `FlowWrapper` — returned by `@ai.flow`
 
-JS exports both (`genkit/schema`, `genkit/extract`) but no JS plugin imports them either.
+```python
+# Call like a function — same signature as the wrapped flow
+def __call__(self, *args: Any, **kwargs: Any) -> Awaitable[OutputT]: ...
+
+# Stream
+def stream(
+    self,
+    input: InputT | None = None,
+    *,
+    context: dict[str, object] | None = None,
+    telemetry_labels: dict[str, object] | None = None,
+    timeout: float | None = None,
+) -> FlowStreamResponse[ChunkT, OutputT]: ...
+```
 
-### Moved to plugins (out of core SDK)
+### Plugin authoring surface
 
-| Symbol | Destination | Why |
-|---|---|---|
-| `AdjustingTraceExporter` | telemetry plugin | JS equivalent is in `js/plugins/google-cloud/`, not core. |
-| `RealtimeSpanProcessor` | telemetry plugin | Telemetry implementation detail. |
-| `RedactedSpan` | telemetry plugin | Only used by `AdjustingTraceExporter`. |
+```python
+def define_model(
+    self,
+    name: str,
+    fn: ModelFn,
+    *,
+    config_schema: type[BaseModel] | dict[str, object] | None = None,
+    label: str | None = None,
+    supports: Supports | None = None,
+    versions: list[str] | None = None,
+    stage: Stage | None = None,
+) -> Action: ...
+
+def define_embedder(
+    self,
+    name: str,
+    fn: EmbedderFn,
+    *,
+    config_schema: type[BaseModel] | dict[str, object] | None = None,
+    label: str | None = None,
+    supports: EmbedderSupports | None = None,
+    dimensions: int | None = None,
+) -> Action: ...
+
+def define_retriever(
+    self,
+    name: str,
+    fn: RetrieverFn,
+    *,
+    config_schema: type[BaseModel] | dict[str, object] | None = None,
+    label: str | None = None,
+    supports: RetrieverSupports | None = None,
+) -> Action: ...
+
+# InputT binds through input_schema — all Callables and the return type are typed accordingly
+InputT = TypeVar('InputT')
+
+# define_prompt() — 4 overloads (InputT × OutputT cross-product):
+
+# [1] Both bound — typed input, typed output:
+@overload
+def define_prompt(
+    self,
+    name: str | None = None,
+    *,
+    variant: str | None = None,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    description: str | None = None,
+    input_schema: type[InputT],             # no default — binds InputT
+    system: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
+    prompt: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
+    messages: str | list[Message] | Callable[[InputT, dict | None], list[Message]] | None = None,
+    docs: list[Document] | Callable[[InputT, dict | None], list[Document]] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    max_turns: int | None = None,
+    use: list[ModelMiddleware] | None = None,
+) -> ExecutablePrompt[InputT, OutputT]: ...
+
+# [2] InputT bound, OutputT untyped — typed input, unstructured output:
+@overload
+def define_prompt(
+    self,
+    name: str | None = None,
+    *,
+    variant: str | None = None,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    description: str | None = None,
+    input_schema: type[InputT],             # no default — binds InputT
+    system: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
+    prompt: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
+    messages: str | list[Message] | Callable[[InputT, dict | None], list[Message]] | None = None,
+    docs: list[Document] | Callable[[InputT, dict | None], list[Document]] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    max_turns: int | None = None,
+    use: list[ModelMiddleware] | None = None,
+) -> ExecutablePrompt[InputT, Any]: ...
+
+# [3] OutputT bound, InputT untyped — unstructured input, typed output:
+@overload
+def define_prompt(
+    self,
+    name: str | None = None,
+    *,
+    variant: str | None = None,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    description: str | None = None,
+    input_schema: dict[str, object] | None = None,
+    system: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
+    prompt: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
+    messages: str | list[Message] | Callable[..., list[Message]] | None = None,
+    docs: list[Document] | Callable[..., list[Document]] | None = None,
+    output_schema: type[OutputT],           # no default — binds OutputT
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    max_turns: int | None = None,
+    use: list[ModelMiddleware] | None = None,
+) -> ExecutablePrompt[Any, OutputT]: ...
+
+# [4] Catch-all — neither bound (dict, None, or omitted):
+@overload
+def define_prompt(
+    self,
+    name: str | None = None,
+    *,
+    variant: str | None = None,
+    model: str | None = None,
+    config: GenerationCommonConfig | None = None,
+    description: str | None = None,
+    input_schema: dict[str, object] | None = None,
+    system: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
+    prompt: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
+    messages: str | list[Message] | Callable[..., list[Message]] | None = None,
+    docs: list[Document] | Callable[..., list[Document]] | None = None,
+    output_schema: dict[str, object] | None = None,
+    output_format: str | None = None,
+    output_content_type: str | None = None,
+    output_instructions: bool | str | None = None,
+    output_constrained: bool | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    max_turns: int | None = None,
+    use: list[ModelMiddleware] | None = None,
+) -> ExecutablePrompt[Any, Any]: ...
+
+# Action — returned by define_model, define_tool, etc.
+# .stream() returns the base wrapper; FlowWrapper.stream() and generate_stream() build on it
+action.stream(
+    input: InputT | None = None,
+    *,
+    context: dict[str, object] | None = None,
+    telemetry_labels: dict[str, object] | None = None,
+    timeout: float | None = None,
+) -> ActionStreamResponse[ChunkT, OutputT]
+
+# ActionRunContext[ChunkT] — producer interface inside action/flow/tool functions
+# Go: StreamCallback[Stream] param (nil = not streaming)
+# JS: ActionFnArg<S> / FlowSideChannel<S> — two types; Python unifies into one
+ctx.is_streaming              # bool — whether caller requested a stream
+ctx.send_chunk(chunk: ChunkT) # type-safe push; no-op if not streaming
+ctx.context                   # dict[str, object] — request context
+
+# ToolRunContext(ActionRunContext[object]) — ChunkT is object, not GenerateResponseChunk
+# Tools don't own their chunk schema — they borrow the parent generate's callback
+# JS ToolAction hardcodes streaming type as z.ZodTypeAny for the same reason
+```
 
 ---
 
-## 3. What we added to imports (and why)
+## 4. Return Type Surfaces
+
+What users get back from calls and interact with.
+
+### `GenerateResponse` — from `generate()`, `await prompt(input)`
+
+```python
+response.text          # str — full text of the response
+response.output        # OutputT — typed output if output schema was provided
+response.message       # MessageWrapper — the final message
+response.messages      # list[MessageWrapper] — full conversation history
+response.tool_requests # list[ToolRequestPart] — pending tool calls
+```
+
+### `MessageWrapper` — accessed via `response.message` / `response.messages`
+
+```python
+message.text           # str — text content of the message
+message.tool_requests  # list[ToolRequestPart]
+message.interrupts     # list[ToolRequestPart] — tool calls requiring user input
+```
+
+Note: `MessageWrapper` is not exported for construction. Users construct `Message(role=..., content=[...])` and receive `MessageWrapper` back. See §5.
+
+### `GenerateResponseChunk` — stream chunks from `generate_stream()`
+
+```python
+chunk.text             # str — text in this chunk
+chunk.output           # object — partial typed output
+chunk.accumulated_text # str — all text so far
+```
+
+### Streaming wrappers — see [`streaming.md`](streaming.md)
+
+Three wrapper types, one hierarchy (`ActionStreamResponse` → `FlowStreamResponse` → `GenerateStreamResponse`). All expose the same two properties:
 
-### Added to `from genkit import ...`
+```python
+result.stream    # AsyncIterable[ChunkT]
+result.response  # Awaitable[OutputT]
+```
 
-| Symbol | Why |
-|---|---|
-| `StreamResponse` | Renamed from `GenerateStreamResponse`. Return type of all streaming APIs. Previously not importable — `generate_stream()` returned a raw tuple. |
-| `GenerateResponseChunk` | Veneer alias. Previously not exported from top level. |
-| `ToolInterruptError` | User-facing error type for tool interrupts. Previously only importable from internal path. |
-| `ToolChoice` | Tool selection control for `generate()`. Previously internal. |
-| `StatusCodes` | Error status codes for plugin authors. Previously only in `genkit.core`. |
-| `ReasoningPart` | Content type for chain-of-thought. New model capability. |
-| `DataPart` / `CustomPart` | Content types that were missing from top-level exports. |
+| Type | Returned by | ChunkT | OutputT |
+|---|---|---|---|
+| `ActionStreamResponse[C, O]` | `action.stream()` | action-defined | action-defined |
+| `FlowStreamResponse[C, O]` | `flow.stream()` | flow-defined | flow-defined |
+| `GenerateStreamResponse[O]` | `generate_stream()`, `prompt.stream()` | `GenerateResponseChunk` (fixed) | `GenerateResponse[O]` |
 
-### Added to `genkit.model`
+### `RetrieverResponse` — from `retrieve()` today (proposed: `list[Document]`; see §3 and the appendix)
 
-| Symbol | Why |
-|---|---|
-| `compute_usage_stats` | Renamed from `get_basic_usage_stats`. Centralized — avoids each plugin re-inventing token counting. |
-| `resolve_api_key` | Resolves per-request API key vs plugin default. Previously duplicated across plugins. |
+```python
+response.documents     # list[Document]
+```
 
 ---
 
-## 4. Internal design decisions
+## 5. Design Flags
+
+Open questions requiring explicit sign-off.
+
+### Sync vs async API surface
+
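+Every I/O-performing `Genkit` method is async: `generate()`, `retrieve()`, and `embed()` are `async def`; `generate_stream()` returns a wrapper consumed via `async for` / `await`.
+Callers need a running event loop (or a flow); `run_main(coro)` is the helper for scripts. There is no sync API.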
+
+Options:
+1. **Async-only** (current) — clean, but friction for scripts and simple use cases
+2. **Sync wrappers** — `ai.generate()` blocks, `ai.generate_stream()` stays async
+3. **Two classes** — `Genkit` (async) and `SyncGenkit` (sync), à la `httpx`
+
+Changing this post-beta requires a new class or a deprecation cycle.
+
+
+### Naming: `GenerateResponse` — veneer vs schema type
+
+`from genkit import GenerateResponse` gives the veneer (wrapper with `.text`, `.output`, etc.).
+`from genkit.model import GenerateResponse` gives the raw Pydantic schema type.
 
-The following design changes affect the public API indirectly. Full details
-(rationale, import DAG, migration plans, open questions) are in
-[python_beta_sdk_design.md](./python_beta_sdk_design.md).
+Same name, different type, different import path. A file that imports from both gets a collision.
+Plugin authors import from `genkit.model`; app developers from `genkit`. In practice no single
+file should need both — but it's an implicit contract that could surprise users.
 
-**Serialization cleanup (`GenkitBaseModel`)** — Internal base class that
-defaults `model_dump()` to `exclude_none=True, by_alias=True`. Eliminates
-`dump_dict`/`dump_json` wrappers and fixes 11 inconsistent serialization calls.
-See [sdk_design §9](./python_beta_sdk_design.md).
+### `config` typing: `GenerationCommonConfig | dict[str, object]`
 
-**`define_*` accepts raw Python types** — `define_model`, `define_retriever`,
-etc. accept `type | dict | None` directly instead of requiring pre-converted
-JSON Schema dicts. `to_json_schema` and `extract_json` move to
-`core/_internal/`. See [sdk_design §10](./python_beta_sdk_design.md).
+`generate()`, `generate_stream()`, and `define_prompt()` currently accept:
 
-**`ErrorResponse` consolidation** — Replaces 3 error wire format types with a
-single internal Pydantic model. See [sdk_design §11](./python_beta_sdk_design.md).
+```python
+config: GenerationCommonConfig | dict[str, object] | None = None
+```
+
+`config` passes model-specific generation parameters — both common fields (`temperature`, `top_k`)
+and provider-specific ones (`safety_settings` for Gemini, `frequency_penalty` for OpenAI).
+`GenerationCommonConfig` only covers the common fields; `| dict` is an escape hatch for the rest.
+Cost: no IDE autocomplete on model-specific fields, silent typos.
+
+**How JS solves it:** `ModelArgument<CustomOptions>` is generic — when you pass a typed
+`ModelAction<GeminiOptions>` or `ModelReference<GeminiOptions>`, TypeScript infers
+`config: GeminiOptions` at compile time. String models fall back to untyped (same limitation as
+Python today). Go doesn't solve this at all — `ModelRef.config` is typed as `any`.
+
+**Proposed fix: make `ModelReference` generic, add overloads.**
+
+`ModelReference` already exists in Python but its `config` field is `dict[str, object]` — not
+generic. The fix:
+
+```python
+C = TypeVar('C', bound=GenerationCommonConfig)
+
+# 1. Make ModelReference generic (plugin authors export typed refs):
+class ModelReference(BaseModel, Generic[C]):
+    name: str
+    config: C | None = None
+    ...
+
+# Plugin exports:
+gemini_flash: ModelReference[GeminiConfig] = ModelReference(name="googleai/gemini-2.0-flash")
+
+# 2. generate() gains overloads keyed on the model argument (output_schema variants omitted; see §3):
+@overload
+async def generate(self, *, model: ModelReference[C], config: C | None = None, ...) -> GenerateResponse: ...
+@overload
+async def generate(self, *, model: str | None = None, config: GenerationCommonConfig | None = None, ...) -> GenerateResponse: ...
+```
+
+**Result:**
+```python
+# Typed path — IDE enforces config type, flags wrong plugin config:
+await ai.generate(model=gemini_flash, config=GeminiConfig(temperature=0.7, safety_settings=[...]))  # ✅
+await ai.generate(model=gemini_flash, config=OpenAIConfig(...))  # ❌ type error
+
+# String path — unchanged, falls back to GenerationCommonConfig:
+await ai.generate(model="googleai/gemini-2.0-flash", config=GenerationCommonConfig(temperature=0.7))
+```
+
+This achieves full JS parity on the typed path. `ModelReference` already exists — needs to be
+made generic and exported from `genkit`. Plugin authors export typed `ModelReference[C]` objects.
+`| dict` is dropped entirely.
+
+**Decision needed:** Ship this for beta, or ship `GenerationCommonConfig | None` (subclass
+approach, no cross-model safety) and do the generic `ModelReference[C]` post-beta?
+
+### Naming: `Message` vs `MessageWrapper`
+
+Users construct messages with `Message`:
+```python
+messages=[Message(role="user", content=[...])]
+```
+
+But `response.message` returns a `MessageWrapper` — a subclass that adds `.text`, `.tool_requests`,
+`.interrupts`. `MessageWrapper` is not exported, so users can't type-annotate it directly.
+
+Options:
+1. **Current** — `Message` for construction, `MessageWrapper` silently returned
+2. Export `MessageWrapper` under a better name (e.g. `ResponseMessage`)
+3. Add `.text` / `.tool_requests` to `Message` directly, eliminate the subclass
 
 ---
+
+---
+
+## Appendix: Pre-review action items
+
+Already decided — not for discussion. Listed for completeness.
+
+- Rename `UserFacingError` → `PublicError` (matches Go's `NewPublicError`; intent is "safe to return in HTTP response")
+- Remove `reflection_server_spec` from `Genkit.__init__` — server starts automatically via `GENKIT_ENV=dev`, port is auto-selected; expose port override as env var `GENKIT_REFLECTION_PORT` if needed (PR #4812 does the right thing but left the param in)
+- Make `ai.registry` private (`ai._registry`); remove direct access from all samples
+- Fix `part.root.text` / `part.root.media` ergonomics — Pydantic `RootModel` internals should not surface to users
+- Flatten `ExecutablePrompt` `opts: PromptGenerateOptions` TypedDict → flat kwargs (consistent with `generate()`)
+- Remove `on_chunk` callback from `generate()` — use `generate_stream()` instead
+- Change `generate_stream()` return type from `tuple[AsyncIterator, Future]` to `GenerateStreamResponse` — unifies with `prompt.stream()` which already returns `GenerateStreamResponse`
+- Introduce streaming type hierarchy (see `streaming.md`): `ActionStreamResponse[ChunkT, OutputT]` as base, `FlowStreamResponse[ChunkT, OutputT]` subclasses it, `GenerateStreamResponse[OutputT]` subclasses `FlowStreamResponse` with `ChunkT` pinned to `GenerateResponseChunk`
+- Fix `Action.stream()` to return `ActionStreamResponse[ChunkT, OutputT]` instead of raw tuple
+- Make `ActionRunContext` generic: `ActionRunContext[ChunkT]` so `send_chunk(chunk: ChunkT)` is type-safe — matches Go's `StreamCallback[Stream]` and JS's `ActionFnArg<S>`; currently `send_chunk(chunk: object)` accepts anything. `ToolRunContext` does NOT pin to `GenerateResponseChunk` — JS's `ToolAction` hardcodes the streaming type as `z.ZodTypeAny` (explicitly untyped) because tools borrow the parent generate's callback and don't own their chunk schema; `class ToolRunContext(ActionRunContext[object])` is the correct equivalent
+- Fix `FlowWrapper.stream()` to return `FlowStreamResponse[ChunkT, OutputT]` instead of raw tuple; fix `input: object` → `input: InputT`
+- Fix `Channel` internals: (1) simplify to `Generic[T]` — the `R` close-result type parameter is unnecessary coupling; (2) fix `_pop()` falsy check `if not r` → `if r is None` — current code incorrectly stops iteration on any falsy chunk value (empty string, `0`, `False`)
+- Tighten `Callable[..., Any]` on `define_prompt()` resolver params — current code uses `Callable[..., Any]` everywhere; correct parametrized forms are `Callable[[InputT, dict | None], str | Part | list[Part]]` for `system`/`prompt`, `Callable[[InputT, dict | None], list[Message]]` for `messages`, `Callable[[InputT, dict | None], list[Document]]` for `docs`
+- `ai.retrieve()` should return `list[Document]` not `RetrieverResponse` — JS converts wire `DocumentData` to `Document` veneers before returning (`response.documents.map(d => new Document(d))`); Python currently leaks the raw wire type, breaking the retrieve → generate pipeline ergonomics

From 2e0cc6d87ea2b7816d6abba9f397fd87b323a19b Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Tue, 24 Feb 2026 14:59:29 -0600
Subject: [PATCH 15/17] streaming wip

---
 py/docs/streaming.md | 220 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)
 create mode 100644 py/docs/streaming.md

diff --git a/py/docs/streaming.md b/py/docs/streaming.md
new file mode 100644
index 0000000000..335b6e48df
--- /dev/null
+++ b/py/docs/streaming.md
@@ -0,0 +1,220 @@
+# Python Streaming Design
+
+> **Status**: design proposal — see pre-review action items at the bottom for gaps between this design and the current implementation.
+
+---
+
+## Model
+
+Go exposes streaming as a single iterator that interleaves chunks and the final response. Python deliberately diverges from Go and follows the JS shape instead: every streaming call returns a **two-channel wrapper object** with separate properties for chunks and the final response.
+
+```python
+result = flow.stream(input)
+
+async for chunk in result.stream:    # AsyncIterable[ChunkT]
+    print(chunk)
+
+response = await result.response     # Awaitable[OutputT]
+```
+
+This avoids the awkward "last item is the response" sentinel pattern that Go uses (`iter.Seq2[*StreamingFlowValue[S, O], error]`) and lets callers consume the stream and the response independently — e.g. start displaying chunks while also `await`-ing the final value in a separate task.
+
+---
+
+## Type hierarchy
+
+Three concrete wrapper classes, one inheritance chain:
+
+```
+ActionStreamResponse[ChunkT, OutputT]          ← base (action.stream())
+    └── FlowStreamResponse[ChunkT, OutputT]    ← flow.stream()
+            └── GenerateStreamResponse[OutputT] ← generate_stream(), prompt.stream()
+                                                   ChunkT pinned to GenerateResponseChunk
+                                                   OutputT wrapped in GenerateResponse[OutputT]
+```
+
+```python
+from typing import AsyncIterable, Awaitable, Generic, TypeVar
+ChunkT = TypeVar('ChunkT')
+OutputT = TypeVar('OutputT')
+
+class ActionStreamResponse(Generic[ChunkT, OutputT]):
+    @property
+    def stream(self) -> AsyncIterable[ChunkT]: ...
+    @property
+    def response(self) -> Awaitable[OutputT]: ...
+
+class FlowStreamResponse(ActionStreamResponse[ChunkT, OutputT]):
+    pass  # same interface, narrows the source
+
+class GenerateStreamResponse(FlowStreamResponse[GenerateResponseChunk, GenerateResponse[OutputT]]):
+    # ChunkT is pinned — generate always emits GenerateResponseChunk
+    # OutputT is the user's schema type (e.g. MyModel), wrapped in GenerateResponse
+    pass
+```
+
+`GenerateStreamResponse[OutputT]` is effectively `FlowStreamResponse[GenerateResponseChunk, GenerateResponse[OutputT]]` with the chunk type fixed. This lets callers write `async for chunk in result.stream` and get `GenerateResponseChunk` objects with `.text`, `.index`, etc. without needing to annotate the type themselves.
+
+---
+
+## Surfaces
+
+### `action.stream()`
+
+```python
+action.stream(
+    input: InputT | None = None,
+    *,
+    context: dict[str, object] | None = None,
+    telemetry_labels: dict[str, object] | None = None,
+    timeout: float | None = None,
+) -> ActionStreamResponse[ChunkT, OutputT]
+```
+
+```python
+result = my_action.stream(input_data)
+async for chunk in result.stream:
+    print(chunk)
+output = await result.response
+```
+
+### `flow.stream()`
+
+```python
+flow.stream(
+    input: InputT | None = None,
+    *,
+    context: dict[str, object] | None = None,
+    timeout: float | None = None,
+) -> FlowStreamResponse[ChunkT, OutputT]
+```
+
+```python
+result = my_flow.stream({"query": "hello"})
+async for chunk in result.stream:
+    print(chunk)
+final = await result.response
+```
+
+### `generate_stream()`
+
+```python
+# 4 overloads — see python_beta_api_proposal.md §2 for full signatures
+def generate_stream(
+    self,
+    *,
+    model: ModelReference[C] | str | None = None,
+    output_schema: type[OutputT] | dict[str, object] | None = None,
+    ...
+) -> GenerateStreamResponse[OutputT]
+```
+
+```python
+result = ai.generate_stream(
+    model=gemini_flash,
+    prompt="Tell me a story",
+    output_schema=StorySchema,
+)
+async for chunk in result.stream:
+    print(chunk.text, end="", flush=True)
+story: GenerateResponse[StorySchema] = await result.response
+print(story.output.title)
+```
+
+### `prompt.stream()`
+
+```python
+# On ExecutablePrompt[InputT, OutputT]
+def stream(
+    self,
+    input: InputT | None = None,
+    *,
+    timeout: float | None = None,
+) -> GenerateStreamResponse[OutputT]
+```
+
+```python
+result = my_prompt.stream({"topic": "space"})
+async for chunk in result.stream:
+    print(chunk.text, end="")
+response = await result.response
+```
+
+---
+
+## Internal: `Channel[T]`
+
+All streaming wrappers are backed by a `Channel[T]` — a thin async queue that bridges the producer (action implementation) and consumer (caller).
+
+```python
+class Channel(Generic[T]):
+    async def send(self, chunk: T) -> None: ...      # producer pushes a chunk
+    def close(self) -> None: ...                     # producer signals completion
+    def set_response(self, value: Any) -> None: ...  # producer delivers final result
+    def __aiter__(self) -> AsyncIterator[T]: ...     # consumer iterates chunks
+```
+
+**Key invariants**:
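+- `None` is the sentinel that signals the iterator to stop, so `None` can never be a valid chunk value. If a stream needs an explicit "no value" chunk, wrap it in a small container object; `Optional[...]` does not help, because it admits `None` itself.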
+- The response future is separate from the chunk channel — `await result.response` never needs to drain the stream first.
+- `_pop()` must use `if r is None` (not `if not r`) — otherwise falsy chunks (empty string `""`, `0`, `False`) incorrectly terminate iteration. *(Pre-review action item — current code uses `if not r`.)*
+
+**Current implementation** (`genkit.aio.channel`): `Channel` is typed as `Generic[T, R]` with a second type parameter `R` for the close-result type. The design simplifies this to `Generic[T]` — the close-result type adds coupling without benefit. The response is a separate `asyncio.Future` on the wrapper object, not baked into the channel.
+
+---
+
+## Producer interface
+
+Action, flow, and model implementations emit chunks through `ActionRunContext[ChunkT]`, passed as the second argument to the action function:
+
+```python
+@ai.flow()
+async def my_flow(input: str, ctx: ActionRunContext[str]) -> str:
+    for word in input.split():
+        await ctx.send_chunk(word)   # type-safe: ChunkT is str
+    return input
+
+# ctx.is_streaming       -> bool; False means the caller did not request a stream and send_chunk is a no-op
+# ctx.send_chunk(chunk)  -> takes a ChunkT and pushes it to the consumer; no-op if not streaming
+# ctx.context            -> dict[str, object]; request-scoped metadata
+```
+
+**Cross-language comparison**:
+
+| | Producer interface | Notes |
+|---|---|---|
+| **Go** | `StreamCallback[Stream]` callback param (nil if not streaming) | Caller checks nil before calling |
+| **JS** | `ActionFnArg<S>` + `FlowSideChannel<S>` — two separate types | Flows and actions have different producer objects |
+| **Python** | `ActionRunContext[ChunkT]` — unified | Single class for actions, flows, and models; `is_streaming` replaces nil check |
+
+**`ToolRunContext`**: Tools do not define their own chunk schema — they borrow the parent `generate()` call's callback. Therefore `ToolRunContext` is `ActionRunContext[object]` (ChunkT = `object`, explicitly untyped), matching JS's `ToolAction` which hardcodes the streaming type as `z.ZodTypeAny`.
+
+---
+
+## Transport layer
+
+The reflection server (Dev UI ↔ Python runtime) uses **Server-Sent Events (SSE)** to forward chunks over HTTP. This is an implementation detail — it does not affect the consumer API. The `Channel` is the in-process abstraction; SSE is how it crosses the wire to the Dev UI during local development.
+
+---
+
+## Cross-language comparison
+
+| Surface | Go | JS | Python |
+|---|---|---|---|
+| **action.stream()** | `action.Stream(ctx, input, cb)` — `cb StreamCallback[S]` | `action.stream(input)` → `ActionStreamResponse<S, O>` | `action.stream(input)` → `ActionStreamResponse[ChunkT, OutputT]` |
+| **flow.stream()** | `flow.Stream(ctx, input)` → `iter.Seq2[*StreamingFlowValue[S,O], error]` | `flow.stream(input)` → `FlowStreamResponse<S, O>` | `flow.stream(input)` → `FlowStreamResponse[ChunkT, OutputT]` |
+| **generate_stream()** | `genkit.GenerateStream(ctx, req)` → `iter.Seq2[*GenerateResponseChunk, error]` | `ai.generateStream(opts)` → `GenerateStreamResponse<O>` | `ai.generate_stream(...)` → `GenerateStreamResponse[OutputT]` |
+| **prompt.stream()** | `prompt.Stream(ctx, input)` | `prompt.stream(input)` → `GenerateStreamResponse<O>` | `prompt.stream(input)` → `GenerateStreamResponse[OutputT]` |
+| **chat.stream()** | n/a | `chat.sendStream(input)` → `GenerateStreamResponse<O>` | not yet implemented |
+| **Chunk/response split** | Single iterator, last value is response | Two-channel wrapper object | Two-channel wrapper object |
+| **Producer** | `StreamCallback[S]` func param | `ActionFnArg<S>` / `FlowSideChannel<S>` | `ActionRunContext[ChunkT]` |
+
+---
+
+## What's not implemented yet
+
+- **`Chat.send_stream()`** — no streaming equivalent for `chat.send()`.
+- **`Action.stream()`** — currently returns a raw tuple `(AsyncIterator, Awaitable)`, not `ActionStreamResponse`. Needs to be updated to return the wrapper.
+- **`FlowWrapper.stream()`** — same: currently returns raw tuple. Needs to return `FlowStreamResponse[ChunkT, OutputT]`.
+- **`Channel` cleanup** — needs two fixes: simplify to `Generic[T]` (drop `R`), and fix `_pop()` falsy sentinel check.
+- **`ActionRunContext` generics** — currently `send_chunk(chunk: object)`. Needs to become `ActionRunContext[ChunkT]` with `send_chunk(chunk: ChunkT)` for type safety.

From 25f6d8e9ac753709ed8d2ac77468fe44918c5de5 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Thu, 26 Feb 2026 14:43:08 -0600
Subject: [PATCH 16/17] Updated types

---
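
Reviewer note on the condensed overload matrix in this patch. The mechanism the matrix relies on is that the typed overloads declare `output_schema: type[OutputT]` with no default, so a type checker binds `OutputT` only when the caller passes a concrete class; the catch-all overload accepts a dict, `None`, or nothing and returns an `Any`-parameterized response. The sketch below reduces that to two overloads on one parameter. It is an illustration under stated assumptions: `SketchResponse`, `parse_output`, and `Story` are hypothetical names, not SDK symbols, and the runtime body is a toy.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Generic, TypeVar, overload

OutputT = TypeVar("OutputT")


class SketchResponse(Generic[OutputT]):
    """Hypothetical stand-in for ModelResponse[OutputT]."""

    def __init__(self, output: OutputT) -> None:
        self.output = output


# Typed overload: no default on output_schema, so it only matches when the
# caller passes a class, and that class binds OutputT.
@overload
def parse_output(
    raw: dict[str, object], *, output_schema: type[OutputT]
) -> SketchResponse[OutputT]: ...


# Catch-all overload: a raw JSON Schema dict, None, or omitted entirely.
@overload
def parse_output(
    raw: dict[str, object], *, output_schema: dict[str, object] | None = None
) -> SketchResponse[Any]: ...


def parse_output(raw, *, output_schema=None):
    # Toy runtime behavior: build the class when one is given, else pass through.
    if isinstance(output_schema, type):
        return SketchResponse(output_schema(**raw))
    return SketchResponse(raw)


@dataclass
class Story:
    title: str


typed = parse_output({"title": "Moonrise"}, output_schema=Story)
untyped = parse_output({"title": "Moonrise"})
```

A type checker should infer `typed` as `SketchResponse[Story]`, so `typed.output.title` is checked, while `untyped.output` is `Any`. The full matrix in the proposal applies the same idea to two parameters at once (model and output schema, or input and output schema), which is where the four overloads come from.
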
 py/docs/python_beta_api_proposal.md | 616 +++++++++-------------------
 1 file changed, 205 insertions(+), 411 deletions(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 7b9ab5e91c..951de762f5 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -18,14 +18,12 @@ from genkit import (
     # Core
     Genkit,
     ActionRunContext,
-    GenerateResponse,         # veneer alias of GenerateResponseWrapper
-    GenerateResponseChunk,    # veneer alias of GenerateResponseChunkWrapper
-    ActionStreamResponse,     # base streaming wrapper — Action.stream()
-    FlowStreamResponse,       # flow streaming wrapper — FlowWrapper.stream()
-    GenerateStreamResponse,   # generate/prompt streaming wrapper — subclass of FlowStreamResponse
+    ModelResponse, # renamed from GenerateResponse, wire format + veneer unified
+    ModelResponseChunk, # renamed from GenerateResponseChunk, wire format + veneer unified
+
     ExecutablePrompt,
     GenkitError,
-    PublicError,         # renamed from UserFacingError — matches Go's NewPublicError
+    PublicError,         # renamed from UserFacingError
 
     # Content types
     Part, TextPart, MediaPart, Media,
@@ -36,25 +34,25 @@ from genkit import (
     Message, Role,
 
     # Documents
-    Document, DocumentData, DocumentPart,
+    Document, DocumentPart,
 
     # Tool context
     ToolRunContext,
-    ToolInterruptError,
+    ToolInterruptError, 
     ToolChoice,
 
     # Generation config
-    GenerationCommonConfig,
+    ModelConfig,  # Renamed from GenerationCommonConfig
 
     # Evaluation
     BaseEvalDataPoint,
 
-    # Web framework integration
-    RequestData,
-    ContextProvider,
+    Flow,                     # Useful for annotation? 50/50 on this one
 
-    # Constants
-    is_dev_environment,
+    # WIP - Streaming Type Annotation
+    ActionStreamResponse,     # base streaming wrapper — Action.stream()
+    FlowStreamResponse,       # flow streaming wrapper — Flow.stream()
+    ModelStreamResponse,      # model/prompt streaming wrapper — subclass of FlowStreamResponse
 
 )
 ```
@@ -63,9 +61,9 @@ from genkit import (
 
 ```python
 from genkit.model import (
-    GenerateRequest,
-    GenerateResponse,        # schema type (NOT the veneer — see §5)
-    GenerateResponseChunk,
+    ModelRequest, # Renamed from GenerateRequest
+    ModelResponse, # Renamed from GenerateResponse, wire format + veneer unified
+    ModelResponseChunk, # Renamed from GenerateResponseChunk, wire format + veneer unified
     GenerationUsage,
     Candidate,
     OutputConfig,
@@ -87,17 +85,16 @@ from genkit.model import (
     lookup_background_action,
     compute_usage_stats,
     resolve_api_key,
-    GenerationCommonConfig,
-    ModelMiddleware,
-    ModelMiddlewareNext,
+    ModelConfig # Renamed from GenerationCommonConfig
 )
 ```
 
+Note: DAP and Model Middleware exports will live in the `genkit.model` namespace. Both features are still being redesigned; this API surface will be updated once that work is done.
+
 ### `genkit.retriever`
 
 ```python
 from genkit.retriever import (
-    RetrieverRef,
     RetrieverRequest,
     RetrieverResponse,
     retriever_action_metadata,
@@ -112,7 +109,6 @@ from genkit.retriever import (
 
 ```python
 from genkit.embedder import (
-    EmbedderRef,
     EmbedRequest,
     EmbedResponse,
     Embedding,
@@ -152,16 +148,10 @@ from genkit.evaluator import (
 )
 ```
 
-### `genkit.tracing` — telemetry plugin authors
-
-```python
-from genkit.tracing import tracer, add_custom_exporter
-```
-
-### `genkit.plugin` — all plugin authors
+### `genkit.plugin_api` — all plugin authors
 
 ```python
-from genkit.plugin import (
+from genkit.plugin_api import (
     # Base class and framework primitives
     Plugin,
     Action,
@@ -185,9 +175,16 @@ from genkit.plugin import (
 ```
 
 Note: The domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are still the canonical
-paths for domain-specific types. `genkit.plugin` re-exports the cross-cutting framework primitives
+paths for domain-specific types. `genkit.plugin_api` re-exports the cross-cutting framework primitives
 and provides a single entry point for plugin authors who don't want to hunt across multiple paths.
 
+**Canonical import policy (beta):**
+- App developers use `from genkit import ...` for the application-facing API.
+- Plugin authors use `from genkit.plugin_api import ...` for framework primitives (`Plugin`, `Action`, etc.).
+- Domain modules (`genkit.model`, `genkit.retriever`, `genkit.embedder`, `genkit.reranker`, `genkit.evaluator`) are canonical for domain-specific types.
+- Prefer domain-specific imports over importing from `genkit.plugin_api` in all app-developer-facing docs and samples. Reserve the `genkit.plugin_api` convenience exports for plugin-author-facing documentation.
+- Telemetry/tracing helpers remain core/internal for beta (`genkit.core.tracing`) and align to OpenTelemetry semantics rather than a separate public tracing namespace. (WIP: the primary user journeys here still need to be fleshed out.)
+
 ---
 
 ## 2. Client Construction
@@ -213,231 +210,109 @@ High-traffic paths only — not exhaustive.
 ### `Genkit`
 
 ```python
-from typing import Any, overload
 C = TypeVar('C', bound=GenerationCommonConfig)
 InputT = TypeVar('InputT')
 OutputT = TypeVar('OutputT')
-# Key invariant: TypeVars are bound ONLY when the caller passes a concrete type[T] argument.
-# When a parameter typed type[T] has no default, the overload only matches when explicitly
-# provided — that absence-of-default is the mechanism that triggers TypeVar binding.
-# Catch-all overloads use dict[str, object] | None = None and return Any-parameterized types.
-
-# generate() has 4 overloads — the (model type) × (output_schema type) cross-product:
-#   [1] ModelReference[C] + type[OutputT]  →  GenerateResponse[OutputT]   ← C and OutputT both bound
-#   [2] ModelReference[C] + dict/None      →  GenerateResponse[Any]
-#   [3] str/None          + type[OutputT]  →  GenerateResponse[OutputT]   ← OutputT bound, C erased
-#   [4] str/None          + dict/None      →  GenerateResponse[Any]       ← catch-all
-#
-# Typed overloads (1, 3): output_schema has NO default — absence of a default is what forces
-# the type checker to bind OutputT only when the caller explicitly passes a concrete type.
-# Catch-all overloads (2, 4): output_schema: dict[str, object] | None = None covers
-# raw JSON Schema dicts, None, and the omitted-entirely case.
 
-# [1] ModelReference[C] + typed schema — both C and OutputT bound:
+# generate(): exact 4-overload matrix
+# Shared params omitted below:
+# prompt, system, messages, tools, return_tool_requests, tool_choice, tool_responses,
+# max_turns, context, output_format, output_content_type, output_instructions,
+# output_constrained, use, docs
+#
+# 1) typed model + typed output
 @overload
 async def generate(
     self,
     *,
     model: ModelReference[C],
     config: C | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-) -> GenerateResponse[OutputT]: ...
+    output_schema: type[OutputT],
+    ...,
+) -> ModelResponse[OutputT]: ...
 
-# [2] ModelReference[C] + untyped schema:
+# 2) typed model + untyped output
 @overload
 async def generate(
     self,
     *,
     model: ModelReference[C],
     config: C | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
     output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-) -> GenerateResponse[Any]: ...
+    ...,
+) -> ModelResponse[Any]: ...
 
-# [3] str model + typed schema — OutputT bound, config falls back to GenerationCommonConfig:
+# 3) string model + typed output
 @overload
 async def generate(
     self,
     *,
     model: str | None = None,
     config: GenerationCommonConfig | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-) -> GenerateResponse[OutputT]: ...
+    output_schema: type[OutputT],
+    ...,
+) -> ModelResponse[OutputT]: ...
 
-# [4] str model + untyped schema — catch-all (dict, None, or omitted):
+# 4) string model + untyped output
 @overload
 async def generate(
     self,
     *,
     model: str | None = None,
     config: GenerationCommonConfig | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
     output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-) -> GenerateResponse[Any]: ...
-
-# generate_stream() has the same 4-overload structure, returning GenerateStreamResponse[T]:
-
-# [1] ModelReference[C] + typed schema:
+    ...,
+) -> ModelResponse[Any]: ...
+
+# generate_stream(): same 4-overload matrix as generate()
+# Shared params omitted below:
+# prompt, system, messages, tools, return_tool_requests, tool_choice,
+# max_turns, context, output_format, output_content_type, output_instructions,
+# output_constrained, use, docs, timeout
+#
+# 1) typed model + typed output
 @overload
 def generate_stream(
     self,
     *,
     model: ModelReference[C],
     config: C | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-    timeout: float | None = None,
-) -> GenerateStreamResponse[OutputT]: ...
+    output_schema: type[OutputT],
+    ...,
+) -> ModelStreamResponse[OutputT]: ...
 
-# [2] ModelReference[C] + untyped schema:
+# 2) typed model + untyped output
 @overload
 def generate_stream(
     self,
     *,
     model: ModelReference[C],
     config: C | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
     output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-    timeout: float | None = None,
-) -> GenerateStreamResponse[Any]: ...
+    ...,
+) -> ModelStreamResponse[Any]: ...
 
-# [3] str model + typed schema:
+# 3) string model + typed output
 @overload
 def generate_stream(
     self,
     *,
     model: str | None = None,
     config: GenerationCommonConfig | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-    timeout: float | None = None,
-) -> GenerateStreamResponse[OutputT]: ...
+    output_schema: type[OutputT],
+    ...,
+) -> ModelStreamResponse[OutputT]: ...
 
-# [4] str model + untyped schema — catch-all:
+# 4) string model + untyped output
 @overload
 def generate_stream(
     self,
     *,
     model: str | None = None,
     config: GenerationCommonConfig | None = None,
-    prompt: str | Part | list[Part] | None = None,
-    system: str | Part | list[Part] | None = None,
-    messages: list[Message] | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    return_tool_requests: bool | None = None,
-    tool_choice: ToolChoice | None = None,
-    tool_responses: list[Part] | None = None,
-    max_turns: int | None = None,
-    context: dict[str, object] | None = None,
     output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    use: list[ModelMiddleware] | None = None,
-    docs: list[Document] | None = None,
-    timeout: float | None = None,
-) -> GenerateStreamResponse[Any]: ...
+    ...,
+) -> ModelStreamResponse[Any]: ...
 
 # Retrieval
 async def retrieve(
@@ -446,7 +321,7 @@ async def retrieve(
     query: str | Document,
     *,
     options: dict[str, object] | None = None,  # plugin-defined schema; shape varies per retriever
-) -> list[Document]: ...  # JS parity: wire DocumentData converted to Document veneers
+) -> list[Document]: ...
 
 # Embedding
 async def embed(
@@ -457,65 +332,68 @@ async def embed(
     options: dict[str, object] | None = None,  # plugin-defined schema; shape varies per embedder
 ) -> list[Embedding]: ...
 
-# Prompt lookup — 4 overloads (InputT × OutputT cross-product):
-# [1] Both bound:
+# Prompt lookup: same 4-overload input/output matrix as define_prompt()
+# Shared params omitted below:
+# variant
+#
+# 1) typed input + typed output
 @overload
 def prompt(
     self,
     name: str,
-    variant: str | None = None,
     *,
-    input_schema: type[InputT],             # no default — binds InputT
-    output_schema: type[OutputT],           # no default — binds OutputT
+    input_schema: type[InputT],
+    output_schema: type[OutputT],
+    ...,
 ) -> ExecutablePrompt[InputT, OutputT]: ...
 
-# [2] InputT bound, OutputT untyped — typed input, unstructured output:
+# 2) typed input + untyped output
 @overload
 def prompt(
     self,
     name: str,
-    variant: str | None = None,
     *,
-    input_schema: type[InputT],             # no default — binds InputT
+    input_schema: type[InputT],
     output_schema: dict[str, object] | None = None,
+    ...,
 ) -> ExecutablePrompt[InputT, Any]: ...
 
-# [3] OutputT bound, InputT untyped — unstructured input, typed output:
+# 3) untyped input + typed output
 @overload
 def prompt(
     self,
     name: str,
-    variant: str | None = None,
     *,
     input_schema: dict[str, object] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
+    output_schema: type[OutputT],
+    ...,
 ) -> ExecutablePrompt[Any, OutputT]: ...
 
-# [4] Catch-all — neither bound (dict, None, or omitted):
+# 4) untyped input + untyped output
 @overload
 def prompt(
     self,
     name: str,
-    variant: str | None = None,
     *,
     input_schema: dict[str, object] | None = None,
     output_schema: dict[str, object] | None = None,
+    ...,
 ) -> ExecutablePrompt[Any, Any]: ...
 
 # Decorators
 @ai.flow(name: str | None = None)
 async def my_flow(input: InputT) -> OutputT: ...
-# Returns: FlowWrapper
+# Returns: Flow
 
 @ai.tool(name: str | None = None, description: str | None = None)
-def my_tool(input: InputT, ctx: ToolRunContext | None = None) -> OutputT: ...
+def my_tool(input: InputT, ctx: ToolRunContext) -> OutputT: ...
 ```
 
 ### `ExecutablePrompt` — returned by `ai.prompt()` / `@ai.define_prompt`
 
 ```python
 # Call like a function
-await prompt(input: InputT | None = None) -> GenerateResponse[OutputT]
+await prompt(input: InputT | None = None) -> ModelResponse[OutputT]
 
 # Stream
 def stream(
@@ -523,7 +401,7 @@ def stream(
     input: InputT | None = None,
     *,
     timeout: float | None = None,
-) -> GenerateStreamResponse[OutputT]
+) -> ModelStreamResponse[OutputT]
 
 # Render without executing
 async def render(
@@ -532,7 +410,7 @@ async def render(
 ) -> GenerateActionOptions
 ```
 
-### `FlowWrapper` — returned by `@ai.flow`
+### `Flow` — returned by `@ai.flow`
 
 ```python
 # Call like a function — same signature as the wrapped flow
@@ -541,7 +419,7 @@ flow(*args, **kwargs) -> Awaitable[OutputT]
 # Stream
 def stream(
     self,
-    input: InputT | None = None,
+    input: InputT = None,
     *,
     context: dict[str, object] | None = None,
     telemetry_labels: dict[str, object] | None = None,
@@ -552,6 +430,52 @@ def stream(
 ### Plugin authoring surface
 
 ```python
+# define_prompt(): 4-overload input/output matrix only
+# Shared params omitted below:
+# name, variant, model, config, description, system, prompt, messages,
+# docs, output_format, output_content_type, output_instructions,
+# output_constrained, tools, tool_choice, return_tool_requests, max_turns, use
+#
+# 1) typed input + typed output
+@overload
+def define_prompt(
+    self,
+    *,
+    input: Input[InputT],
+    output: Output[OutputT],
+    ...,
+) -> ExecutablePrompt[InputT, OutputT]: ...
+
+# 2) typed input + untyped output
+@overload
+def define_prompt(
+    self,
+    *,
+    input: Input[InputT],
+    output: Output[Any] | None = None,
+    ...,
+) -> ExecutablePrompt[InputT, Any]: ...
+
+# 3) untyped input + typed output
+@overload
+def define_prompt(
+    self,
+    *,
+    input: Input[Any] | None = None,
+    output: Output[OutputT],
+    ...,
+) -> ExecutablePrompt[Any, OutputT]: ...
+
+# 4) untyped input + untyped output
+@overload
+def define_prompt(
+    self,
+    *,
+    input: Input[Any] | None = None,
+    output: Output[Any] | None = None,
+    ...,
+) -> ExecutablePrompt[Any, Any]: ...
+
 def define_model(
     self,
     name: str,
@@ -586,120 +510,35 @@ def define_retriever(
 ) -> Action: ...
 
 # InputT binds through input_schema — all Callables and the return type are typed accordingly
-InputT = TypeVar('InputT')
-
-# define_prompt() — 4 overloads (InputT × OutputT cross-product):
 
-# [1] Both bound — typed input, typed output:
-@overload
 def define_prompt(
     self,
     name: str | None = None,
     *,
     variant: str | None = None,
     model: str | None = None,
-    config: GenerationCommonConfig | None = None,
+    config: ModelConfig | None = None,  # or GeminiConfig, OpenAIConfig, etc. for model-specific fields
     description: str | None = None,
-    input_schema: type[InputT],             # no default — binds InputT
+    input_schema: type[InputT] | None = None,      # binds InputT for callables below
     system: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
     prompt: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
     messages: str | list[Message] | Callable[[InputT, dict | None], list[Message]] | None = None,
     docs: list[Document] | Callable[[InputT, dict | None], list[Document]] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
+    output_schema: type | dict[str, object] | None = None,
     output_format: str | None = None,
     output_content_type: str | None = None,
     output_instructions: bool | str | None = None,
     output_constrained: bool | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
+    tools: list[str | Action | ExecutablePrompt] | None = None,  # str = registered name, Action = inline tool, ExecutablePrompt = sub-agent
     tool_choice: ToolChoice | None = None,
     return_tool_requests: bool | None = None,
     max_turns: int | None = None,
     use: list[ModelMiddleware] | None = None,
-) -> ExecutablePrompt[InputT, OutputT]: ...
-
-# [2] InputT bound, OutputT untyped — typed input, unstructured output:
-@overload
-def define_prompt(
-    self,
-    name: str | None = None,
-    *,
-    variant: str | None = None,
-    model: str | None = None,
-    config: GenerationCommonConfig | None = None,
-    description: str | None = None,
-    input_schema: type[InputT],             # no default — binds InputT
-    system: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
-    prompt: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
-    messages: str | list[Message] | Callable[[InputT, dict | None], list[Message]] | None = None,
-    docs: list[Document] | Callable[[InputT, dict | None], list[Document]] | None = None,
-    output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    tool_choice: ToolChoice | None = None,
-    return_tool_requests: bool | None = None,
-    max_turns: int | None = None,
-    use: list[ModelMiddleware] | None = None,
-) -> ExecutablePrompt[InputT, Any]: ...
-
-# [3] OutputT bound, InputT untyped — unstructured input, typed output:
-@overload
-def define_prompt(
-    self,
-    name: str | None = None,
-    *,
-    variant: str | None = None,
-    model: str | None = None,
-    config: GenerationCommonConfig | None = None,
-    description: str | None = None,
-    input_schema: dict[str, object] | None = None,
-    system: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
-    prompt: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
-    messages: str | list[Message] | Callable[..., list[Message]] | None = None,
-    docs: list[Document] | Callable[..., list[Document]] | None = None,
-    output_schema: type[OutputT],           # no default — binds OutputT
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    tool_choice: ToolChoice | None = None,
-    return_tool_requests: bool | None = None,
-    max_turns: int | None = None,
-    use: list[ModelMiddleware] | None = None,
-) -> ExecutablePrompt[Any, OutputT]: ...
-
-# [4] Catch-all — neither bound (dict, None, or omitted):
-@overload
-def define_prompt(
-    self,
-    name: str | None = None,
-    *,
-    variant: str | None = None,
-    model: str | None = None,
-    config: GenerationCommonConfig | None = None,
-    description: str | None = None,
-    input_schema: dict[str, object] | None = None,
-    system: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
-    prompt: str | Part | list[Part] | Callable[..., str | Part | list[Part]] | None = None,
-    messages: str | list[Message] | Callable[..., list[Message]] | None = None,
-    docs: list[Document] | Callable[..., list[Document]] | None = None,
-    output_schema: dict[str, object] | None = None,
-    output_format: str | None = None,
-    output_content_type: str | None = None,
-    output_instructions: bool | str | None = None,
-    output_constrained: bool | None = None,
-    tools: list[str | Action | ExecutablePrompt] | None = None,
-    tool_choice: ToolChoice | None = None,
-    return_tool_requests: bool | None = None,
-    max_turns: int | None = None,
-    use: list[ModelMiddleware] | None = None,
-) -> ExecutablePrompt[Any, Any]: ...
+) -> ExecutablePrompt[InputT]: ...
 
+# Streaming - WIP
 # Action — returned by define_model, define_tool, etc.
-# Calling streams and returns the base wrapper; FlowWrapper/generate_stream build on top
+# Calling streams and returns the base wrapper; Flow/generate_stream build on top
 action.stream(
     input: InputT | None = None,
     *,
@@ -714,10 +553,6 @@ action.stream(
 ctx.is_streaming              # bool — whether caller requested a stream
 ctx.send_chunk(chunk: ChunkT) # type-safe push; no-op if not streaming
 ctx.context                   # dict[str, object] — request context
-
-# ToolRunContext(ActionRunContext[object]) — ChunkT is object, not GenerateResponseChunk
-# Tools don't own their chunk schema — they borrow the parent generate's callback
-# JS ToolAction hardcodes streaming type as z.ZodTypeAny for the same reason
 ```
 
 ---
@@ -726,17 +561,17 @@ ctx.context                   # dict[str, object] — request context
 
 What users get back from calls and interact with.
 
-### `GenerateResponse` — from `generate()`, `await prompt(input)`
+### `ModelResponse` — from `generate()`, `await prompt(input)`
 
 ```python
 response.text          # str — full text of the response
 response.output        # OutputT — typed output if output schema was provided
-response.message       # MessageWrapper — the final message
-response.messages      # list[MessageWrapper] — full conversation history
+response.message       # Message — the final message
+response.messages      # list[Message] — full conversation history
 response.tool_requests # list[ToolRequestPart] — pending tool calls
 ```
 
-### `MessageWrapper` — accessed via `response.message` / `response.messages`
+### `Message` — used for both inputs and returned responses
 
 ```python
 message.text           # str — text content of the message
@@ -744,9 +579,7 @@ message.tool_requests  # list[ToolRequestPart]
 message.interrupts     # list[ToolRequestPart] — tool calls requiring user input
 ```
 
-Note: `MessageWrapper` is not exported for construction. Users construct `Message(role=..., content=[...])` and receive `MessageWrapper` back. See §5.
-
-### `GenerateResponseChunk` — stream chunks from `generate_stream()`
+### `ModelResponseChunk` — stream chunks from `generate_stream()`
 
 ```python
 chunk.text             # str — text in this chunk
@@ -754,9 +587,9 @@ chunk.output           # object — partial typed output
 chunk.accumulated_text # str — all text so far
 ```
 
-### Streaming wrappers — see [`streaming.md`](streaming.md)
+### Streaming wrappers — WIP
 
-Three wrapper types, one hierarchy (`ActionStreamResponse` → `FlowStreamResponse` → `GenerateStreamResponse`). All expose the same two properties:
+Three wrapper types, one hierarchy (`ActionStreamResponse` → `FlowStreamResponse` → `ModelStreamResponse`). All expose the same two properties:
 
 ```python
 result.stream    # AsyncIterable[ChunkT]
@@ -767,123 +600,84 @@ result.response  # Awaitable[OutputT]
 |---|---|---|---|
 | `ActionStreamResponse[C, O]` | `action.stream()` | action-defined | action-defined |
 | `FlowStreamResponse[C, O]` | `flow.stream()` | flow-defined | flow-defined |
-| `GenerateStreamResponse[O]` | `generate_stream()`, `prompt.stream()` | `GenerateResponseChunk` (fixed) | `GenerateResponse[O]` |
+| `ModelStreamResponse[O]` | `generate_stream()`, `prompt.stream()` | `ModelResponseChunk` (fixed) | `ModelResponse[O]` |
 
-### `RetrieverResponse` — from `retrieve()`
+### `retrieve()` return value
 
 ```python
-response.documents     # list[Document]
+documents              # list[Document]
 ```
 
 ---
 
 ## 5. Design Flags
 
-Open questions requiring explicit sign-off.
-
-### Sync vs async API surface
-
-All `Genkit` methods are `async def`. Users must `await` every call or run inside a flow.
-`run_main(coro)` exists as a helper for scripts. There is no sync API.
-
-Options:
-1. **Async-only** (current) — clean, but friction for scripts and simple use cases
-2. **Sync wrappers** — `ai.generate()` blocks, `ai.generate_stream()` stays async
-3. **Two classes** — `Genkit` (async) and `SyncGenkit` (sync), à la `httpx`
-
-Changing this post-beta requires a new class or a deprecation cycle.
-
+### Single public type per concept
 
-### Naming: `GenerateResponse` — veneer vs schema type
+For beta, Python uses one public type per concept (no split between "wire type" and
+"veneer type" in the public API):
 
-`from genkit import GenerateResponse` gives the veneer (wrapper with `.text`, `.output`, etc.).
-`from genkit.model import GenerateResponse` gives the raw Pydantic schema type.
+- `ModelResponse` is the single public response type used by app code and plugin contracts.
+- `ModelResponseChunk` is the single public streaming chunk type.
+- `Message` and `Document` are the single public message/document types for both construction and returned values.
 
-Same name, different type, different import path. A file that imports from both gets a collision.
-Plugin authors import from `genkit.model`; app developers from `genkit`. In practice no single
-file should need both — but it's an implicit contract that could surprise users.
+This is an explicit beta design decision:
 
-### `config` typing: `GenerationCommonConfig | dict[str, object]`
+- Originally, JSON-schema-exported wire types were intended to be the plugin contract.
+- JS then added veneer/helper layers for frequently used types.
+- Python copied that split initially, and the resulting surface was too confusing.
+- We adopt Go's approach for common response/message-result types:
+  - Omit the most common response wire types from default autogen output.
+  - Handwrite canonical runtime types (`ModelResponse`, `ModelResponseChunk`).
+  - Use those same types for both plugin contracts and app-developer annotations/usages.
+- Rule: if a wire type is common enough that we'd add a veneer helper layer, do not expose two public types; use one handwritten canonical type instead.
 
-`generate()`, `generate_stream()`, and `define_prompt()` currently accept:
+### Plugin namespace role and boundaries
 
-```python
-config: GenerationCommonConfig | dict[str, object] | None = None
-```
+- We considered `genkit.plugin`, but it collides semantically with `genkit.plugins.*` (actual provider/plugin packages) and repeatedly confused app developers.
+- We therefore standardize on `genkit.plugin_api` for framework/plugin-author primitives.
+- It gathers framework primitives and convenience domain re-exports in one place. Without it, plugin developers cannot easily tell which common symbols they need, and the set of concepts to learn looks far larger than it is.
+- Canonical domain contracts should still be documented/imported from domain modules (`genkit.model`, `genkit.retriever`, etc.) to avoid import-path drift.
 
-`config` passes model-specific generation parameters — both common fields (`temperature`, `top_k`)
-and provider-specific ones (`safety_settings` for Gemini, `frequency_penalty` for OpenAI).
-`GenerationCommonConfig` only covers the common fields; `| dict` is an escape hatch for the rest.
-Cost: no IDE autocomplete on model-specific fields, silent typos.
+### Tradeoff: overload-heavy typing for `generate()` and `prompt()`
 
-**How JS solves it:** `ModelArgument<CustomOptions>` is generic — when you pass a typed
-`ModelAction<GeminiOptions>` or `ModelReference<GeminiOptions>`, TypeScript infers
-`config: GeminiOptions` at compile time. String models fall back to untyped (same limitation as
-Python today). Go doesn't solve this at all — `ModelRef.config` is typed as `any`.
+**Decision**
 
-**Proposed fix: make `ModelReference` generic, add overloads.**
+- `generate()` and `generate_stream()` each use a 4-overload matrix across two axes:
+  - model path (`ModelReference[C]` vs `str`)
+  - output typing (`output_schema: type[OutputT]` vs untyped schema)
+- Prompt APIs (`prompt()` and `define_prompt()`) also use 4 overloads, but only across:
+  - input typing
+  - output typing
+- We do **not** add model-config as a prompt overload axis.
 
-`ModelReference` already exists in Python but its `config` field is `dict[str, object]` — not
-generic. The fix:
+**Why this split**
 
-```python
-C = TypeVar('C', bound=GenerationCommonConfig)
+- For `generate*`, `config` is where plugin-specific correctness matters most. `ModelReference[C]`
+  lets type checking enforce that model config matches the selected model family.
+- For prompt APIs, the highest-value contracts are prompt input/output shapes. Those are what
+  prompt authors and prompt callers interact with most directly.
+- Adding model-config to prompt overload axes would increase prompt overloads from 4 to 8 for
+  relatively low additional value.
 
-# 1. Make ModelReference generic (plugin authors export typed refs):
-class ModelReference(BaseModel, Generic[C]):
-    name: str
-    config: C | None = None
-    ...
+**What this buys us**
 
-# Plugin exports:
-gemini_flash: ModelReference[GeminiConfig] = ModelReference(name="googleai/gemini-2.0-flash")
+- Strong config safety on the typed model path (`ModelReference[C]`).
+- Strongly typed `response.output` for schema-typed output paths.
+- Bounded overload growth (4 overloads per high-traffic API instead of 8+ for prompt APIs).
+- Practical parity with JS ergonomics while keeping one public response type per concept.
 
-# 2. generate() gains two overloads:
-@overload
-async def generate(self, *, model: ModelReference[C], config: C | None = None, ...) -> GenerateResponse: ...
-@overload
-async def generate(self, *, model: str | None = None, config: GenerationCommonConfig | None = None, ...) -> GenerateResponse: ...
-```
+**Cross-language note**
 
-**Result:**
-```python
-# Typed path — IDE enforces config type, flags wrong plugin config:
-await ai.generate(model=gemini_flash, config=GeminiConfig(temperature=0.7, safety_settings=[...]))  # ✅
-await ai.generate(model=gemini_flash, config=OpenAIConfig(...))  # ❌ type error
-
-# String path — unchanged, falls back to GenerationCommonConfig:
-await ai.generate(model="googleai/gemini-2.0-flash", config=GenerationCommonConfig(temperature=0.7))
-```
-
-This achieves full JS parity on the typed path. `ModelReference` already exists — needs to be
-made generic and exported from `genkit`. Plugin authors export typed `ModelReference[C]` objects.
-`| dict` is dropped entirely.
-
-**Decision needed:** Ship this for beta, or ship `GenerationCommonConfig | None` (subclass
-approach, no cross-model safety) and do the generic `ModelReference[C]` post-beta?
-
-### Naming: `Message` vs `MessageWrapper`
-
-Users construct messages with `Message`:
-```python
-messages=[Message(role="user", content=[...])]
-```
-
-But `response.message` returns a `MessageWrapper` — a subclass that adds `.text`, `.tool_requests`,
-`.interrupts`. `MessageWrapper` is not exported, so users can't type-annotate it directly.
-
-Options:
-1. **Current** — `Message` for construction, `MessageWrapper` silently returned
-2. Export `MessageWrapper` under a better name (e.g. `ResponseMessage`)
-3. Add `.text` / `.tool_requests` to `Message` directly, eliminate the subclass
-
----
+- JS has the same dynamic lookup limitation: `prompt(name)` cannot infer types from runtime registry
+  names unless types/schemas are provided at the call site.
+- Go does not provide equivalent generic config typing on model refs.
 
 ---
 
 ## Appendix: Pre-review action items
 
-Already decided — not for discussion. Listed for completeness.
+Smaller decisions made while auditing the existing codebase to clean up the API surface. They are listed here to guide implementation later and to record why each decision was made.
 
 - Rename `UserFacingError` → `PublicError` (matches Go's `NewPublicError`; intent is "safe to return in HTTP response")
 - Remove `reflection_server_spec` from `Genkit.__init__` — server starts automatically via `GENKIT_ENV=dev`, port is auto-selected; expose port override as env var `GENKIT_REFLECTION_PORT` if needed (PR #4812 does the right thing but left the param in)
@@ -891,11 +685,11 @@ Already decided — not for discussion. Listed for completeness.
 - Fix `part.root.text` / `part.root.media` ergonomics — Pydantic `RootModel` internals should not surface to users
 - Flatten `ExecutablePrompt` `opts: PromptGenerateOptions` TypedDict → flat kwargs (consistent with `generate()`)
 - Remove `on_chunk` callback from `generate()` — use `generate_stream()` instead
-- Change `generate_stream()` return type from `tuple[AsyncIterator, Future]` to `GenerateStreamResponse` — unifies with `prompt.stream()` which already returns `GenerateStreamResponse`
-- Introduce streaming type hierarchy (see `streaming.md`): `ActionStreamResponse[ChunkT, OutputT]` as base, `FlowStreamResponse[ChunkT, OutputT]` subclasses it, `GenerateStreamResponse[OutputT]` subclasses `FlowStreamResponse` with `ChunkT` pinned to `GenerateResponseChunk`
+- Change `generate_stream()` return type from `tuple[AsyncIterator, Future]` to `ModelStreamResponse` — unifies with `prompt.stream()` which already returns `ModelStreamResponse`
+- Introduce streaming type hierarchy (see `streaming.md`): `ActionStreamResponse[ChunkT, OutputT]` as base, `FlowStreamResponse[ChunkT, OutputT]` subclasses it, `ModelStreamResponse[OutputT]` subclasses `FlowStreamResponse` with `ChunkT` pinned to `ModelResponseChunk`
 - Fix `Action.stream()` to return `ActionStreamResponse[ChunkT, OutputT]` instead of raw tuple
-- Make `ActionRunContext` generic: `ActionRunContext[ChunkT]` so `send_chunk(chunk: ChunkT)` is type-safe — matches Go's `StreamCallback[Stream]` and JS's `ActionFnArg<S>`; currently `send_chunk(chunk: object)` accepts anything. `ToolRunContext` does NOT pin to `GenerateResponseChunk` — JS's `ToolAction` hardcodes the streaming type as `z.ZodTypeAny` (explicitly untyped) because tools borrow the parent generate's callback and don't own their chunk schema; `class ToolRunContext(ActionRunContext[object])` is the correct equivalent
-- Fix `FlowWrapper.stream()` to return `FlowStreamResponse[ChunkT, OutputT]` instead of raw tuple; fix `input: object` → `input: InputT`
+- Make `ActionRunContext` generic: `ActionRunContext[ChunkT]` so `send_chunk(chunk: ChunkT)` is type-safe. This matches Go's `StreamCallback[Stream]` and JS's `ActionFnArg<S>`, which are both typed on the chunk; Python currently uses `send_chunk(chunk: object)`, which accepts anything
+- Fix `Flow.stream()` to return `FlowStreamResponse[ChunkT, OutputT]` instead of raw tuple; fix `input: object` → `input: InputT`
 - Fix `Channel` internals: (1) simplify to `Generic[T]` — the `R` close-result type parameter is unnecessary coupling; (2) fix `_pop()` falsy check `if not r` → `if r is None` — current code incorrectly stops iteration on any falsy chunk value (empty string, `0`, `False`)
 - Tighten `Callable[..., Any]` on `define_prompt()` resolver params — current code uses `Callable[..., Any]` everywhere; correct parametrized forms are `Callable[[InputT, dict | None], str | Part | list[Part]]` for `system`/`prompt`, `Callable[[InputT, dict | None], list[Message]]` for `messages`, `Callable[[InputT, dict | None], list[Document]]` for `docs`
 - `ai.retrieve()` should return `list[Document]` not `RetrieverResponse` — JS converts wire `DocumentData` to `Document` veneers before returning (`response.documents.map(d => new Document(d))`); Python currently leaks the raw wire type, breaking the retrieve → generate pipeline ergonomics

From 7a76c93c15e7460736856564c73ebc1b2e9e8a18 Mon Sep 17 00:00:00 2001
From: Jeff Huang <huangjeff@google.com>
Date: Thu, 26 Feb 2026 14:47:05 -0600
Subject: [PATCH 17/17] Add ModelRef

---
 py/docs/python_beta_api_proposal.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
index 951de762f5..5cfca1e6b6 100644
--- a/py/docs/python_beta_api_proposal.md
+++ b/py/docs/python_beta_api_proposal.md
@@ -80,7 +80,7 @@ from genkit.model import (
     Stage,
     model_action_metadata,
     model_ref,
-    ModelReference,
+    ModelRef,  # Renamed from ModelReference
     BackgroundAction,
     lookup_background_action,
     compute_usage_stats,
@@ -99,6 +99,7 @@ from genkit.retriever import (
     RetrieverResponse,
     retriever_action_metadata,
     retriever_ref,
+    RetrieverRef,
     IndexerRequest,
     indexer_action_metadata,
     indexer_ref,
@@ -114,6 +115,7 @@ from genkit.embedder import (
     Embedding,
     embedder_action_metadata,
     embedder_ref,
+    EmbedderRef,
     EmbedderSupports,
 )
 ```
@@ -124,6 +126,7 @@ from genkit.embedder import (
 from genkit.reranker import (
     reranker_action_metadata,
     reranker_ref,
+    RerankerRef,
     RankedDocument,
     RerankerRequest,
     RerankerResponse,
@@ -144,6 +147,7 @@ from genkit.evaluator import (
     BaseEvalDataPoint,
     EvalStatusEnum,
     evaluator_action_metadata,
+    EvaluatorRef,
     evaluator_ref,
 )
 ```