diff --git a/py/docs/python_beta_api_proposal.md b/py/docs/python_beta_api_proposal.md
new file mode 100644
index 0000000000..5cfca1e6b6
--- /dev/null
+++ b/py/docs/python_beta_api_proposal.md
@@ -0,0 +1,699 @@

# Genkit Python — Beta API Design Review

This doc covers the full public API surface being locked at beta: what's importable,
how the client is constructed, the high-traffic method signatures, and the return types
users interact with. Section 5 lists the open design questions requiring explicit sign-off.

---

## 1. Import Surface

Every symbol exported at beta. This is exhaustive by design — the list itself is what's
being approved.

### `from genkit import ...` — app developers

```python
from genkit import (
    # Core
    Genkit,
    ActionRunContext,
    ModelResponse,       # renamed from GenerateResponse, wire format + veneer unified
    ModelResponseChunk,  # renamed from GenerateResponseChunk, wire format + veneer unified

    ExecutablePrompt,
    GenkitError,
    PublicError,  # renamed from UserFacingError

    # Content types
    Part, TextPart, MediaPart, Media,
    DataPart, ToolRequestPart, ToolResponsePart, CustomPart,
    ReasoningPart,

    # Messages
    Message, Role,

    # Documents
    Document, DocumentPart,

    # Tool context
    ToolRunContext,
    ToolInterruptError,
    ToolChoice,

    # Generation config
    ModelConfig,  # renamed from GenerationCommonConfig

    # Evaluation
    BaseEvalDataPoint,

    Flow,  # Useful for annotation?
    # ^ 50/50 on this one

    # WIP - Streaming Type Annotation
    ActionStreamResponse,  # base streaming wrapper — Action.stream()
    FlowStreamResponse,    # flow streaming wrapper — Flow.stream()
    ModelStreamResponse,   # model/prompt streaming wrapper — subclass of FlowStreamResponse

)
```

### `genkit.model`

```python
from genkit.model import (
    ModelRequest,        # renamed from GenerateRequest
    ModelResponse,       # renamed from GenerateResponse, wire format + veneer unified
    ModelResponseChunk,  # renamed from GenerateResponseChunk, wire format + veneer unified
    GenerationUsage,
    Candidate,
    OutputConfig,
    FinishReason,
    GenerateActionOptions,
    Error,
    Operation,
    ToolRequest,
    ToolDefinition,
    ToolResponse,
    ModelInfo,
    Supports,
    Constrained,
    Stage,
    model_action_metadata,
    model_ref,
    ModelRef,  # renamed from ModelReference
    BackgroundAction,
    lookup_background_action,
    compute_usage_stats,
    resolve_api_key,
    ModelConfig,  # renamed from GenerationCommonConfig
)
```

Note: DAP and model middleware exports will also live in the `genkit.model` namespace. These features are still being redesigned; this API surface will be updated once that work lands.
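The three WIP streaming wrappers listed in the import surface share a single consumption contract: iterate `.stream`, then await `.response`. A minimal runnable sketch of that contract — `FakeStreamResponse` is a hypothetical stand-in, not the real implementation:

```python
import asyncio
from collections.abc import AsyncIterator


class FakeStreamResponse:
    """Stand-in for the proposed *StreamResponse surface (illustrative only)."""

    def __init__(self, chunks: list[str]) -> None:
        self._chunks = chunks

    @property
    def stream(self) -> AsyncIterator[str]:
        # `.stream` — async-iterate chunks as they arrive.
        async def gen() -> AsyncIterator[str]:
            for c in self._chunks:
                yield c
        return gen()

    @property
    async def response(self) -> str:
        # `.response` — await the final, completed value.
        return ''.join(self._chunks)


async def main() -> str:
    result = FakeStreamResponse(['Hel', 'lo'])
    async for chunk in result.stream:   # consume chunks incrementally
        _ = chunk
    return await result.response        # then await the full response


print(asyncio.run(main()))  # Hello
```

The same two properties apply at every level of the proposed hierarchy, so code written against the base wrapper also works with flow and model streams.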

### `genkit.retriever`

```python
from genkit.retriever import (
    RetrieverRequest,
    RetrieverResponse,
    retriever_action_metadata,
    retriever_ref,
    RetrieverRef,
    IndexerRequest,
    indexer_action_metadata,
    indexer_ref,
)
```

### `genkit.embedder`

```python
from genkit.embedder import (
    EmbedRequest,
    EmbedResponse,
    Embedding,
    embedder_action_metadata,
    embedder_ref,
    EmbedderRef,
    EmbedderSupports,
)
```

### `genkit.reranker`

```python
from genkit.reranker import (
    reranker_action_metadata,
    reranker_ref,
    RerankerRef,
    RankedDocument,
    RerankerRequest,
    RerankerResponse,
    RankedDocumentData,
    RankedDocumentMetadata,
)
```

### `genkit.evaluator`

```python
from genkit.evaluator import (
    EvalRequest,
    EvalResponse,
    EvalFnResponse,
    Score,
    Details,
    BaseEvalDataPoint,
    EvalStatusEnum,
    evaluator_action_metadata,
    EvaluatorRef,
    evaluator_ref,
)
```

### `genkit.plugin_api` — all plugin authors

```python
from genkit.plugin_api import (
    # Base class and framework primitives
    Plugin,
    Action,
    ActionMetadata,
    ActionKind,
    StatusCodes,

    # HTTP / version stamping (for setting x-goog-api-client and user-agent headers)
    GENKIT_CLIENT_HEADER,
    GENKIT_VERSION,

    # Convenience re-exports from domain modules
    # (identical to importing from genkit.model, genkit.retriever, etc.)
    model_action_metadata, model_ref, ModelRef,  # ModelRef — renamed from ModelReference
    embedder_action_metadata, embedder_ref,
    retriever_action_metadata, retriever_ref,
    indexer_action_metadata, indexer_ref,
    reranker_action_metadata, reranker_ref,
    evaluator_action_metadata, evaluator_ref,
)
```

Note: The domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) remain the canonical
paths for domain-specific types. `genkit.plugin_api` re-exports the cross-cutting framework primitives
and provides a single entry point for plugin authors who don't want to hunt across multiple paths.
+ +**Canonical import policy (beta):** +- App developers use `from genkit import ...` for the application-facing API. +- Plugin authors use `from genkit.plugin_api import ...` for framework primitives (`Plugin`, `Action`, etc.). +- Domain modules (`genkit.model`, `genkit.retriever`, `genkit.embedder`, `genkit.reranker`, `genkit.evaluator`) are canonical for domain-specific types. +- Prefer domain-specific imports over importing from `genkit.plugin_api` in all app-developer facing docs and samples. `genkit.plugin_api` convenience exports should be reserved for plugin author-facing documentation. +- Telemetry/tracing helpers remain core/internal for beta (`genkit.core.tracing`) and align to OpenTelemetry semantics rather than a separate public tracing namespace. (WIP, need to flesh out the primary user journeys more clearly here) + +--- + +## 2. Client Construction + +```python +ai = Genkit( + plugins: list[Plugin] | None = None, + model: str | None = None, + prompt_dir: str | Path | None = None, +) +``` + +- `plugins` — list of initialized plugin instances +- `model` — default model name used when `model=` is omitted from `generate()` +- `prompt_dir` — directory to load `.prompt` files from; defaults to `./prompts` if it exists + +--- + +## 3. Method Signatures + +High-traffic paths only — not exhaustive. + +### `Genkit` + +```python +C = TypeVar('C', bound=GenerationCommonConfig) +InputT = TypeVar('InputT') +OutputT = TypeVar('OutputT') + +# generate(): exact 4-overload matrix +# Shared params omitted below: +# prompt, system, messages, tools, return_tool_requests, tool_choice, tool_responses, +# max_turns, context, output_format, output_content_type, output_instructions, +# output_constrained, use, docs +# +# 1) typed model + typed output +@overload +async def generate( + self, + *, + model: ModelReference[C], + config: C | None = None, + output_schema: type[OutputT], + ..., +) -> ModelResponse[OutputT]: ... 
+ +# 2) typed model + untyped output +@overload +async def generate( + self, + *, + model: ModelReference[C], + config: C | None = None, + output_schema: dict[str, object] | None = None, + ..., +) -> ModelResponse[Any]: ... + +# 3) string model + typed output +@overload +async def generate( + self, + *, + model: str | None = None, + config: GenerationCommonConfig | None = None, + output_schema: type[OutputT], + ..., +) -> ModelResponse[OutputT]: ... + +# 4) string model + untyped output +@overload +async def generate( + self, + *, + model: str | None = None, + config: GenerationCommonConfig | None = None, + output_schema: dict[str, object] | None = None, + ..., +) -> ModelResponse[Any]: ... + +# generate_stream(): same 4-overload matrix as generate() +# Shared params omitted below: +# prompt, system, messages, tools, return_tool_requests, tool_choice, +# max_turns, context, output_format, output_content_type, output_instructions, +# output_constrained, use, docs, timeout +# +# 1) typed model + typed output +@overload +def generate_stream( + self, + *, + model: ModelReference[C], + config: C | None = None, + output_schema: type[OutputT], + ..., +) -> ModelStreamResponse[OutputT]: ... + +# 2) typed model + untyped output +@overload +def generate_stream( + self, + *, + model: ModelReference[C], + config: C | None = None, + output_schema: dict[str, object] | None = None, + ..., +) -> ModelStreamResponse[Any]: ... + +# 3) string model + typed output +@overload +def generate_stream( + self, + *, + model: str | None = None, + config: GenerationCommonConfig | None = None, + output_schema: type[OutputT], + ..., +) -> ModelStreamResponse[OutputT]: ... + +# 4) string model + untyped output +@overload +def generate_stream( + self, + *, + model: str | None = None, + config: GenerationCommonConfig | None = None, + output_schema: dict[str, object] | None = None, + ..., +) -> ModelStreamResponse[Any]: ... 
+ +# Retrieval +async def retrieve( + self, + retriever: str | RetrieverRef, + query: str | Document, + *, + options: dict[str, object] | None = None, # plugin-defined schema; shape varies per retriever +) -> list[Document]: ... + +# Embedding +async def embed( + self, + embedder: str | EmbedderRef, + content: str | Document, + *, + options: dict[str, object] | None = None, # plugin-defined schema; shape varies per embedder +) -> list[Embedding]: ... + +# Prompt lookup: same 4-overload input/output matrix as define_prompt() +# Shared params omitted below: +# variant +# +# 1) typed input + typed output +@overload +def prompt( + self, + name: str, + *, + input_schema: type[InputT], + output_schema: type[OutputT], + ..., +) -> ExecutablePrompt[InputT, OutputT]: ... + +# 2) typed input + untyped output +@overload +def prompt( + self, + name: str, + *, + input_schema: type[InputT], + output_schema: dict[str, object] | None = None, + ..., +) -> ExecutablePrompt[InputT, Any]: ... + +# 3) untyped input + typed output +@overload +def prompt( + self, + name: str, + *, + input_schema: dict[str, object] | None = None, + output_schema: type[OutputT], + ..., +) -> ExecutablePrompt[Any, OutputT]: ... + +# 4) untyped input + untyped output +@overload +def prompt( + self, + name: str, + *, + input_schema: dict[str, object] | None = None, + output_schema: dict[str, object] | None = None, + ..., +) -> ExecutablePrompt[Any, Any]: ... + +# Decorators +@ai.flow(name: str | None = None) +async def my_flow(input: InputT) -> OutputT: ... +# Returns: Flow + +@ai.tool(name: str | None = None, description: str | None = None) +def my_tool(input: InputT, ctx: ToolRunContext) -> OutputT: ... 
```

### `ExecutablePrompt` — returned by `ai.prompt()` / `ai.define_prompt()`

```python
# Call like a function
await prompt(input: InputT | None = None) -> ModelResponse[OutputT]

# Stream
def stream(
    self,
    input: InputT | None = None,
    *,
    timeout: float | None = None,
) -> ModelStreamResponse[OutputT]

# Render without executing
async def render(
    self,
    input: InputT | dict[str, Any] | None = None,
) -> GenerateActionOptions
```

### `Flow` — returned by `@ai.flow`

```python
# Call like a function — same signature as the wrapped flow
flow(*args, **kwargs) -> Awaitable[OutputT]

# Stream
def stream(
    self,
    input: InputT = None,
    *,
    context: dict[str, object] | None = None,
    telemetry_labels: dict[str, object] | None = None,
    timeout: float | None = None,
) -> FlowStreamResponse[ChunkT, OutputT]
```

### Plugin authoring surface

```python
# define_prompt(): 4-overload input/output matrix only
# Shared params omitted below:
#   name, variant, model, config, description, system, prompt, messages,
#   docs, output_format, output_content_type, output_instructions,
#   output_constrained, tools, tool_choice, return_tool_requests, max_turns, use
#
# 1) typed input + typed output
@overload
def define_prompt(
    self,
    *,
    input: Input[InputT],
    output: Output[OutputT],
    ...,
) -> ExecutablePrompt[InputT, OutputT]: ...

# 2) typed input + untyped output
@overload
def define_prompt(
    self,
    *,
    input: Input[InputT],
    output: Output[Any] | None = None,
    ...,
) -> ExecutablePrompt[InputT, Any]: ...

# 3) untyped input + typed output
@overload
def define_prompt(
    self,
    *,
    input: Input[Any] | None = None,
    output: Output[OutputT],
    ...,
) -> ExecutablePrompt[Any, OutputT]: ...

# 4) untyped input + untyped output
@overload
def define_prompt(
    self,
    *,
    input: Input[Any] | None = None,
    output: Output[Any] | None = None,
    ...,
) -> ExecutablePrompt[Any, Any]: ...
+ +def define_model( + self, + name: str, + fn: ModelFn, + *, + config_schema: type[BaseModel] | dict[str, object] | None = None, + label: str | None = None, + supports: Supports | None = None, + versions: list[str] | None = None, + stage: Stage | None = None, +) -> Action: ... + +def define_embedder( + self, + name: str, + fn: EmbedderFn, + *, + config_schema: type[BaseModel] | dict[str, object] | None = None, + label: str | None = None, + supports: EmbedderSupports | None = None, + dimensions: int | None = None, +) -> Action: ... + +def define_retriever( + self, + name: str, + fn: RetrieverFn, + *, + config_schema: type[BaseModel] | dict[str, object] | None = None, + label: str | None = None, + supports: RetrieverSupports | None = None, +) -> Action: ... + +# InputT binds through input_schema — all Callables and the return type are typed accordingly + +def define_prompt( + self, + name: str | None = None, + *, + variant: str | None = None, + model: str | None = None, + config: ModelConfig | None = None, # or GeminiConfig, OpenAIConfig, etc. 
    # for model-specific fields
    description: str | None = None,
    input_schema: type[InputT] | None = None,  # binds InputT for callables below
    system: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
    prompt: str | Part | list[Part] | Callable[[InputT, dict | None], str | Part | list[Part]] | None = None,
    messages: str | list[Message] | Callable[[InputT, dict | None], list[Message]] | None = None,
    docs: list[Document] | Callable[[InputT, dict | None], list[Document]] | None = None,
    output_schema: type | dict[str, object] | None = None,
    output_format: str | None = None,
    output_content_type: str | None = None,
    output_instructions: bool | str | None = None,
    output_constrained: bool | None = None,
    tools: list[str | Action | ExecutablePrompt] | None = None,  # str = registered name, Action = inline tool, ExecutablePrompt = sub-agent
    tool_choice: ToolChoice | None = None,
    return_tool_requests: bool | None = None,
    max_turns: int | None = None,
    use: list[ModelMiddleware] | None = None,
) -> ExecutablePrompt[InputT]: ...

# Streaming - WIP
# Action — returned by define_model, define_tool, etc.
# Calling stream() returns the base wrapper; Flow/generate_stream build on top
action.stream(
    input: InputT | None = None,
    *,
    context: dict[str, object] | None = None,
    telemetry_labels: dict[str, object] | None = None,
    timeout: float | None = None,
) -> ActionStreamResponse[ChunkT, OutputT]

# ActionRunContext[ChunkT] — producer interface inside action/flow/tool functions
# Go: StreamCallback[Stream] param (nil = not streaming)
# JS: ActionFnArg / FlowSideChannel — two types; Python unifies into one
ctx.is_streaming               # bool — whether caller requested a stream
ctx.send_chunk(chunk: ChunkT)  # type-safe push; no-op if not streaming
ctx.context                    # dict[str, object] — request context
```

---

## 4. Return Type Surfaces

What users get back from calls and interact with.
+ +### `ModelResponse` — from `generate()`, `await prompt(input)` + +```python +response.text # str — full text of the response +response.output # OutputT — typed output if output schema was provided +response.message # Message — the final message +response.messages # list[Message] — full conversation history +response.tool_requests # list[ToolRequestPart] — pending tool calls +``` + +### `Message` — used for both inputs and returned responses + +```python +message.text # str — text content of the message +message.tool_requests # list[ToolRequestPart] +message.interrupts # list[ToolRequestPart] — tool calls requiring user input +``` + +### `ModelResponseChunk` — stream chunks from `generate_stream()` + +```python +chunk.text # str — text in this chunk +chunk.output # object — partial typed output +chunk.accumulated_text # str — all text so far +``` + +### Streaming wrappers — WIP + +Three wrapper types, one hierarchy (`ActionStreamResponse` → `FlowStreamResponse` → `ModelStreamResponse`). All expose the same two properties: + +```python +result.stream # AsyncIterable[ChunkT] +result.response # Awaitable[OutputT] +``` + +| Type | Returned by | ChunkT | OutputT | +|---|---|---|---| +| `ActionStreamResponse[C, O]` | `action.stream()` | action-defined | action-defined | +| `FlowStreamResponse[C, O]` | `flow.stream()` | flow-defined | flow-defined | +| `ModelStreamResponse[O]` | `generate_stream()`, `prompt.stream()` | `ModelResponseChunk` (fixed) | `ModelResponse[O]` | + +### `retrieve()` return value + +```python +documents # list[Document] +``` + +--- + +## 5. Design Flags + +### Single public type per concept + +For beta, Python uses one public type per concept (no split between "wire type" and +"veneer type" in the public API): + +- `ModelResponse` is the single public response type used by app code and plugin contracts. +- `ModelResponseChunk` is the single public streaming chunk type. 
+- `Message` and `Document` are the single public message/document types for both construction and returned values. + +This is an explicit beta design decision: + +- Originally, JSON-schema-exported wire types were intended to be the plugin contract. +- JS then added veneer/helper layers for frequently used types. +- Python copied that split initially, and the resulting surface was too confusing. +- We adopt Go's approach for common response/message-result types: + - Omit the most common response wire types from default autogen output. + - Handwrite canonical runtime types (`ModelResponse`, `ModelResponseChunk`). + - Use those same types for both plugin contracts and app-developer annotations/usages. +- Rule: if a wire type is common enough that we'd add a veneer helper layer, do not expose two public types; use one handwritten canonical type instead. + +### Plugin namespace role and boundaries + +- We considered `genkit.plugin`, but it collides semantically with `genkit.plugins.*` (actual provider/plugin packages) and repeatedly confused app developers. +- We therefore standardize on `genkit.plugin_api` for framework/plugin-author primitives. +- It exists to gather framework primitives plus convenience domain re-exports in one place. Otherwise, it's unclear what common stuff a plugin developer might need and the surface of concepts to grasp suddenly looks huge. +- Canonical domain contracts should still be documented/imported from domain modules (`genkit.model`, `genkit.retriever`, etc.) to avoid import-path drift. 
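The single-canonical-type rule above can be illustrated with a toy sketch — all names here are simplified stand-ins, not the real runtime types. One handwritten response class serves as both the plugin contract (producer side) and the app-facing surface (consumer side), with convenience accessors living directly on it instead of on a separate veneer:

```python
from dataclasses import dataclass, field


@dataclass
class ToyPart:
    text: str


@dataclass
class ToyMessage:
    role: str
    content: list[ToyPart] = field(default_factory=list)


@dataclass
class ToyModelResponse:
    """One canonical type: plugin contract and app-facing surface."""
    message: ToyMessage

    @property
    def text(self) -> str:
        # Convenience accessor lives on the canonical type itself,
        # so no separate wire-type/veneer split is needed.
        return ''.join(p.text for p in self.message.content)


def fake_plugin_model(prompt: str) -> ToyModelResponse:
    # A plugin returns the exact same type app code consumes.
    return ToyModelResponse(ToyMessage('model', [ToyPart('echo: '), ToyPart(prompt)]))


resp = fake_plugin_model('hi')
print(resp.text)  # echo: hi
```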
+ +### Tradeoff: overload-heavy typing for `generate()` and `prompt()` + +**Decision** + +- `generate()` and `generate_stream()` each use a 4-overload matrix across two axes: + - model path (`ModelReference[C]` vs `str`) + - output typing (`output_schema: type[OutputT]` vs untyped schema) +- Prompt APIs (`prompt()` and `define_prompt()`) also use 4 overloads, but only across: + - input typing + - output typing +- We do **not** add model-config as a prompt overload axis. + +**Why this split** + +- For `generate*`, `config` is where plugin-specific correctness matters most. `ModelReference[C]` + lets type checking enforce that model config matches the selected model family. +- For prompt APIs, the highest-value contracts are prompt input/output shapes. Those are what + prompt authors and prompt callers interact with most directly. +- Adding model-config to prompt overload axes would increase prompt overloads from 4 to 8 for + relatively low additional value. + +**What this buys us** + +- Strong config safety on the typed model path (`ModelReference[C]`). +- Strongly typed `response.output` for schema-typed output paths. +- Bounded overload growth (4 overloads per high-traffic API instead of 8+ for prompt APIs). +- Practical parity with JS ergonomics while keeping one public response type per concept. + +**Cross-language note** + +- JS has the same dynamic lookup limitation: `prompt(name)` cannot infer types from runtime registry + names unless types/schemas are provided at the call site. +- Go does not provide equivalent generic config typing on model refs. + +--- + +## Appendix: Pre-review action items + +Smaller decisions we made to clean up the API surface as part of auditing the existing codebase. Referenced here to help with implementation later and remember why we made some of these decisions. 
+ +- Rename `UserFacingError` → `PublicError` (matches Go's `NewPublicError`; intent is "safe to return in HTTP response") +- Remove `reflection_server_spec` from `Genkit.__init__` — server starts automatically via `GENKIT_ENV=dev`, port is auto-selected; expose port override as env var `GENKIT_REFLECTION_PORT` if needed (PR #4812 does the right thing but left the param in) +- Make `ai.registry` private (`ai._registry`); remove direct access from all samples +- Fix `part.root.text` / `part.root.media` ergonomics — Pydantic `RootModel` internals should not surface to users +- Flatten `ExecutablePrompt` `opts: PromptGenerateOptions` TypedDict → flat kwargs (consistent with `generate()`) +- Remove `on_chunk` callback from `generate()` — use `generate_stream()` instead +- Change `generate_stream()` return type from `tuple[AsyncIterator, Future]` to `ModelStreamResponse` — unifies with `prompt.stream()` which already returns `ModelStreamResponse` +- Introduce streaming type hierarchy (see `streaming.md`): `ActionStreamResponse[ChunkT, OutputT]` as base, `FlowStreamResponse[ChunkT, OutputT]` subclasses it, `ModelStreamResponse[OutputT]` subclasses `FlowStreamResponse` with `ChunkT` pinned to `ModelResponseChunk` +- Fix `Action.stream()` to return `ActionStreamResponse[ChunkT, OutputT]` instead of raw tuple +- Make `ActionRunContext` generic: `ActionRunContext[ChunkT]` so `send_chunk(chunk: ChunkT)` is type-safe — matches Go's `StreamCallback[Stream]` and JS's `ActionFnArg` which are both typed on the chunk; currently Python uses `send_chunk(chunk: object)` which accepts anything +- Fix `Flow.stream()` to return `FlowStreamResponse[ChunkT, OutputT]` instead of raw tuple; fix `input: object` → `input: InputT` +- Fix `Channel` internals: (1) simplify to `Generic[T]` — the `R` close-result type parameter is unnecessary coupling; (2) fix `_pop()` falsy check `if not r` → `if r is None` — current code incorrectly stops iteration on any falsy chunk value (empty string, `0`, 
`False`) +- Tighten `Callable[..., Any]` on `define_prompt()` resolver params — current code uses `Callable[..., Any]` everywhere; correct parametrized forms are `Callable[[InputT, dict | None], str | Part | list[Part]]` for `system`/`prompt`, `Callable[[InputT, dict | None], list[Message]]` for `messages`, `Callable[[InputT, dict | None], list[Document]]` for `docs` +- `ai.retrieve()` should return `list[Document]` not `RetrieverResponse` — JS converts wire `DocumentData` to `Document` veneers before returning (`response.documents.map(d => new Document(d))`); Python currently leaks the raw wire type, breaking the retrieve → generate pipeline ergonomics diff --git a/py/docs/python_beta_sdk_audit.md b/py/docs/python_beta_sdk_audit.md new file mode 100644 index 0000000000..df25d70d96 --- /dev/null +++ b/py/docs/python_beta_sdk_audit.md @@ -0,0 +1,92 @@ +# API Audit: Resolved Decisions + +The documentation audit surfaced various API issues. The items below had clear Pythonic answers and are resolved. The remaining issues — streaming, public API surface, output configuration, async support, method signatures, and class structure — are open design questions covered in [PYTHON_API_REVIEW.md](./PYTHON_API_REVIEW.md). + +--- + +## Keyword-only arguments + +All public methods currently accept positional arguments. Nothing prevents `ai.generate("gemini", "Hi", None, None, ["search"])` — five positional args where the middle three are just filling slots. This is the single most common source of fragile call sites. + +**Decision:** Every public method gets a `*` marker after `self`. At most one positional argument is allowed (e.g., `input` on prompt `__call__`). Everything else is keyword-only. + +```python +# Before — positional abuse possible +ai.generate("gemini", "Hi", None, None, ["search"]) + +# After — every argument named +ai.generate(model="gemini", prompt="Hi", tools=["search"]) +``` + +This is standard Python convention. 
OpenAI, Anthropic, and most modern Python APIs enforce keyword-only arguments on methods with more than 2-3 parameters. + +## Decorator shorthands + +`@ai.tool()`, `@ai.flow()` exist alongside imperative `define_*` methods. App developers use decorators; plugin authors use the imperative API. This also resolved the handler signature discoverability issue — decorators make expected signatures clear through type hints, while the imperative `define_*` methods accept generic callables with no signature guidance. + +^^ Need decorators for other primitives as well. + +## Part constructor + +Runtime testing revealed that `Part(text="hello")` works via Pydantic's union parsing — `Part` is a `RootModel[Union[TextPart, MediaPart, ...]]` and Pydantic resolves the correct variant from keyword arguments. The verbose `Part(root=TextPart(text="hello"))` form also works but adds no value. Samples use the verbose form. + +**Decision:** Bless the shorthand as the documented pattern. Both forms produce identical objects. + +```python +Part(text="hello") # blessed +Part(media=Media(url="https://...", content_type="image/png")) # blessed +``` + +## RetrieverResponse iterability + +`RetrieverResponse` has a `.documents` field but doesn't implement Python's sequence protocol. The audit found 9 occurrences of code trying to iterate over the response directly — the most common single error pattern. + +**Decision:** Implement `__iter__`, `__len__`, `__getitem__` delegating to `self.documents`. + +```python +# Before — must access .documents +for doc in response.documents: + +# After — response is directly iterable +for doc in await ai.retrieve(retriever=my_retriever, query=query): + print(doc.text) + +len(response) # number of documents +response[0] # first document +``` + +This follows the Python convention that collection-like objects should implement the sequence protocol. `RetrieverResponse` is conceptually a list of documents with metadata — it should behave like one. 
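A minimal sketch of the delegation (the real `RetrieverResponse` is a Pydantic model; this dataclass stand-in only shows the three protocol methods):

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str


@dataclass
class ToyRetrieverResponse:
    documents: list[Doc]

    # Sequence protocol delegating to self.documents, so the
    # response can be iterated, measured, and indexed directly.
    def __iter__(self):
        return iter(self.documents)

    def __len__(self) -> int:
        return len(self.documents)

    def __getitem__(self, index):
        return self.documents[index]


response = ToyRetrieverResponse([Doc('a'), Doc('b')])
print([d.text for d in response], len(response), response[0].text)  # ['a', 'b'] 2 a
```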
+ +## response.media property + +JS has `response.media` for image generation responses. The audit found 5 occurrences of code using this property — all runtime errors in Python. Users currently have to navigate `response.message.content[0].media`. + +**Decision:** Add a `response.media` convenience property on `GenerateResponseWrapper`. + +```python +response = await ai.generate(model="googleai/imagen3", prompt="a cat") +image = response.media # Media | None +``` + +## Type consolidation + +Two nearly-identical types existed: `BaseDataPoint` (generic) and `BaseEvalDataPoint` (evaluator-specific). The audit found samples using them interchangeably. + +**Decision:** Merge into `BaseEvalDataPoint`. Remove `BaseDataPoint` from the public API. + +## Public API cleanup + +Several symbols were in the public `__all__` that don't belong: + +- **`tool_response`** — only 3 sample usages. JS and Go use a method on the tool instance. +- **`dump_dict` / `dump_json`** — internal serialization utilities. +- **`get_logger`** — thin wrapper around `logging.getLogger("genkit")`. Python developers know the stdlib. +- **`GenkitRegistry`, `FlowWrapper`, `SimpleRetrieverOptions`** — internal implementation types. + +^^ Need to audit __all__ in all packages and see what comes up. + +## Evaluator API + +The evaluator API (`GenkitMetricType`, `MetricConfig`, `PluginOptions`) has its own design issues — the audit found the API shape diverges significantly from what the naming suggests. Not addressed in this review; flagged for separate follow-up. 
+ +^^ Follow up diff --git a/py/docs/python_beta_sdk_design.md b/py/docs/python_beta_sdk_design.md new file mode 100644 index 0000000000..98643c47ff --- /dev/null +++ b/py/docs/python_beta_sdk_design.md @@ -0,0 +1,534 @@ +# Genkit Python SDK — Design Review + +Related docs: +- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface (what's importable) +- [python_package_reorg.md](./python_package_reorg.md) — internal package structure +- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — type deletions/fixes +- [python_beta_sdk_audit.md](./python_beta_sdk_audit.md) — initial friction audit + +## 1. Background + +The Python SDK launched to match JS and Go feature timelines. It achieved feature parity, but the API surface was never designed independently. Patterns were ported from JS rather than designed natively for Python. + +The Python SDK is public but hasn't cut a stable release. The JS SDK went through a similar cleanup between v0.5 and v1.0, and the migration cost grew with each release. Python is earlier in that curve and changes are still cheap. + +This doc covers **design decisions** — the "how" and "why" behind internal architecture choices. For the public import surface (the "what"), see [python_beta_api_proposal.md](./python_beta_api_proposal.md). + +## 2. Principles + +### Pythonic API conventions + +The JS SDK uses options objects, camelCase, and callback patterns. "Pythonic" means a set of concrete conventions: + +**Zero-to-one positional arguments.** Every public method allows at most one positional arg — the "obvious" one (e.g., `input` for a prompt call). Everything else is keyword-only via the `*` marker. This prevents positional abuse and makes call sites self-documenting: + +```python +# Bad: what are these arguments? 
+ai.generate("gemini", "Hi", None, None, ["search"]) + +# Good: every argument is named +ai.generate(model="gemini", prompt="Hi", tools=["search"]) +``` + +**Kwargs over options dicts.** JS groups parameters into an options object. Python has first-class keyword arguments. Dict-based configuration loses autocomplete, type checking, and discoverability. This applies to `generate()`, `prompt()`, and every public method. + +**Flat imports, intentional boundaries.** Python has no access modifiers — any module is importable, and there's no way to enforce "private." This makes API boundary design a deliberate choice, not a language feature. Public entry points are `from genkit import ...` (app developers) and domain sub-modules like `genkit.model`, `genkit.retriever`, `genkit.tracing` (plugin authors). Internal modules use `_internal/` directories following the Pydantic v2 convention. There is no public `genkit.core` or `genkit.ai` import path — those are internal structure only. Full symbol lists are in [python_beta_api_proposal.md](./python_beta_api_proposal.md). The internal package structure is in [python_package_reorg.md](./python_package_reorg.md). + +## 3. Initial Audit + +While working on updated docs, we identified several friction points in the developer experience. + +For many of these friction points, there was a clear Pythonic standard to follow — keyword-only arguments on all methods, sequence protocol on `RetrieverResponse`, convenience properties like `response.media`, veneer aliasing (`GenerateResponseWrapper` → `GenerateResponse`), and cleanup of internal utilities from the public surface. More details here: [python_beta_sdk_audit.md](./python_beta_sdk_audit.md) + +The remaining sections in this doc are open questions that need some discussion to resolve. + +## 4. Public API surface & type architecture + +Today there is no formal public/internal boundary. 
The documentation audit found samples importing from `genkit.core.action`, `genkit.blocks.model`, and `genkit.ai` — all internal paths that happen to work. This means any internal module rename or refactor is a breaking change for external developers, even if the public API hasn't changed. + +**Resolved decisions:** + +- **Single entry point.** `from genkit import ...` covers both app developers (~25 symbols) and plugin authors (~9 additional). No separate `genkit.types` or `genkit.plugin`. Domain sub-modules (`genkit.model`, `genkit.retriever`, `genkit.tracing`, etc.) provide wire-format types for plugin authors who need them. + +- **No public `genkit.core`.** Internal packages (`core/`, `ai/`) use `_internal/` directories following Pydantic v2's convention. `genkit/__init__.py` re-exports everything users need. See [python_package_reorg.md](./python_package_reorg.md) for the full structure. + +- **Veneer aliasing.** `GenerateResponseWrapper` → `GenerateResponse` via inheritance (so `isinstance` works). `MessageWrapper` stays as-is because it uses composition — aliasing would break `Message(role="user", content=[...])`. App developers get `MessageWrapper` via `response.messages` but never construct it directly. + +- **`__all__` on every public `__init__.py`.** Enforced by `import-linter` in CI. + +- **Internal code organization.** `blocks/` is deleted (merged into `ai/`). `aio/`, `lang/`, `types/` are deleted (absorbed into `core/_internal/`). `web/` renamed to `_web/`. See [python_package_reorg.md](./python_package_reorg.md). + +Full symbol lists and rationale for each inclusion/exclusion: [python_beta_api_proposal.md](./python_beta_api_proposal.md). + +## 5. 
Output configuration
+
+The `generate()` method currently accepts output configuration in multiple ways:
+
+```python
+# Way 1: Inline kwargs
+await ai.generate(prompt="...", output_format="json", output_content_type="application/json",
+                  output_instructions="Return valid JSON", output_constrained=True)
+
+# Way 2: Config helper with generics
+await ai.generate(prompt="...", output=Output(schema=MyModel))
+```
+
+**Flat kwargs vs. wrapper object.** We considered both approaches:
+
+A **wrapper object** (`output=OutputConfig(Recipe)` or `output=Recipe`) bundles the schema with secondary options (format, constrained, instructions) into one param, reducing `generate()`'s parameter count. But it introduces a new type developers have to learn and import.
+
+**Flat kwargs** (`output_schema=Recipe`) is the more Pythonic approach. Python functions embrace explicit parameters with defaults — `requests.get()` has 15+ kwargs, `json.dumps()` has 8, `subprocess.run()` has 12. No config objects. The secondary output params (`output_format`, `output_constrained`, `output_content_type`, `output_instructions`) stay as kwargs with sensible defaults — `output_format` auto-defaults to `'json'` when a schema is set, the rest default to `None` and are rarely used. The common case is just:
+
+```python
+response = await ai.generate(prompt="...", output_schema=Recipe)
+response.output.name  # typed as str — IDE autocomplete works
+```
+
+No new types, no imports beyond the Pydantic model, and the 95% case is one kwarg.
+
+**Recommendation.** Flat kwargs. Remove the `output` param. Keep `output_schema`, `output_format`, `output_constrained`, `output_content_type`, and `output_instructions` as individual keyword-only params. Use `@overload` so `output_schema: type[T]` parameterizes the return type. 
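
As an illustrative sketch of how `@overload` can carry the generic (`parse_output` is a hypothetical helper, and a stdlib dataclass stands in for the Pydantic model):

```python
from dataclasses import dataclass
from typing import Any, TypeVar, overload

T = TypeVar('T')

@dataclass
class Recipe:
    name: str

# A schema class parameterizes the return type; no schema means untyped passthrough.
@overload
def parse_output(raw: dict[str, Any], output_schema: type[T]) -> T: ...
@overload
def parse_output(raw: dict[str, Any], output_schema: None = None) -> Any: ...

def parse_output(raw, output_schema=None):
    # The real SDK would validate with Pydantic; plain construction stands in here.
    return output_schema(**raw) if output_schema is not None else raw

recipe = parse_output({'name': 'Margherita'}, Recipe)  # checker infers Recipe
untyped = parse_output({'name': 'Margherita'})         # Any
```

`generate()` would apply the same pattern at larger scale: one implementation, an overload mapping `output_schema: type[T]` to `GenerateResponse[T]`, and one mapping its absence to `GenerateResponse[Any]`.
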
The wire-format `OutputConfig` remains an internal/plugin type — plugin authors use it when implementing model plugins to read output configuration from the `GenerateRequest`, but app developers never see it. + +`output_schema` accepts three forms: a Pydantic model class (`type[T]` — the common case, gives typed returns), a raw JSON schema dict (`dict` — for dynamic schemas, returns `Any`), or a registered schema name (`str`, looked up from registry at runtime, returns `Any`). Only the Pydantic class form carries the generic type. + +**The same applies to `input_schema`.** With flat kwargs and overloads, `input_schema: type[T]` carries the generic directly — `Input[T]` can be removed: + +```python +prompt = ai.define_prompt( + name='recipe', + input_schema=RecipeInput, + output_schema=Recipe, + prompt='Make a recipe for {dish}', +) + +response = await prompt(RecipeInput(dish='pizza')) +response.output.name # typed as str — IDE knows this is Recipe +``` + +Dotprompt should work in a similar way but with one additional nuance. When a schema is defined in a `.prompt` file's YAML frontmatter (`output: { schema: Recipe }`), the SDK uses it to constrain the model's JSON output at runtime. But the type checker doesn't know this — `.prompt` files can't carry Python type references — so `response.output` is `Any`. To get typed output, pass the schema at the call site: + +```python +# Without output_schema — runtime parsing works, but typing is Any +recipe = ai.prompt('recipe') +response = await recipe({'food': 'pizza'}) +response.output # Any — no autocomplete + +# With output_schema — typed +recipe = ai.prompt('recipe', output_schema=Recipe) +response = await recipe({'food': 'pizza'}) +response.output.name # str — IDE knows this is Recipe +``` + +This is inherent to Python's static type system. The redundancy (schema in both `.prompt` and Python) is the cost of typed output, and it's a cost every framework with external schema files pays. 
The flat kwarg `output_schema=Recipe` keeps this as lightweight as possible — no wrapper type needed, just name the class. + +## 6. Streaming API + +The SDK currently has two streaming patterns: + +```python +# generate_stream() — returns a tuple +stream, future = ai.generate_stream(prompt="Tell me a story") +async for chunk in stream: + print(chunk.text, end="") +response = await future + +# prompt.stream() — returns an object with .stream accessor +result = prompt.stream({"topic": "AI"}) +async for chunk in result.stream: + print(chunk.text, end="") +response = await result.response +``` + +The Python standard for streaming is iterators — OpenAI, Anthropic, and every major Python SDK use them. + +Genkit can't use a plain async generator (the OpenAI pattern) because Genkit responses carry more than text — structured output parsing, usage statistics, tool request handling, and the assembled `Message` for multi-turn conversations. A plain generator can't expose a final response object after iteration. + +**Proposed streaming syntax:** + +```python +# Simple case — looks identical to a plain generator +async for chunk in ai.generate_stream(prompt="Tell me a story"): + print(chunk.text, end="") + +# When you need the final response — assign, iterate, then access +result = ai.generate_stream(prompt="Tell me a story") +async for chunk in result: + print(chunk.text, end="") +response = await result.response # structured output, usage stats, tool requests +``` + +**What changes:** +- `generate_stream()` returns a directly iterable object (implements `__aiter__`) with a `.response` property for the final assembled response +- `prompt.stream()` uses the same pattern — one streaming convention across the SDK +- The tuple return is removed — no more destructuring + +## 7. Sync and async support + +Every Genkit Python method is `async def`. There is no sync API. This is a Python-specific problem — JS is inherently async, Go handles concurrency transparently with goroutines. 
Python is the only language where the developer has to explicitly choose. + +The practical consequences: a Flask route handler can't call `ai.generate()` without managing an event loop. A Jupyter notebook cell needs `await` or `nest_asyncio` workarounds. A CLI script requires wrapping everything in `async def main()` and `ai.run_main()`. These are the most common entry points for developers trying Genkit for the first time. + +For context, every major Python LLM SDK offers both sync and async: OpenAI and Anthropic ship dual clients (`OpenAI` / `AsyncOpenAI`), LangChain has dual methods (`.invoke()` / `.ainvoke()`), Google Cloud AI uses a separate async transport. Even Hugging Face, which is sync-only, made a deliberate choice. Genkit is the only async-only SDK in the ecosystem. A developer coming from OpenAI's `client.chat.completions.create()` — no `await`, no `async def` — hits immediate friction. + +**Proposal.** Dual clients — `Genkit` (sync) and `AsyncGenkit` (async). This is the industry standard: OpenAI, Anthropic, and Cohere all ship it. The async client holds the real implementation; the sync client delegates to it. We prefer the dual-client pattern (separate classes) over dual methods (`generate()` / `agenerate()` on the same class) because it keeps each class's type signatures clean — every method on `Genkit` returns `T`, every method on `AsyncGenkit` returns `Awaitable[T]` — and avoids polluting autocomplete with `a`-prefixed duplicates of every method. + +```python +from genkit import Genkit, AsyncGenkit + +# Sync (scripts, Flask, notebooks) +ai = Genkit(plugins=[GoogleAI()]) +response = ai.generate(model="googleai/gemini-2.0-flash", prompt="Hi") + +# Async (FastAPI, high-concurrency) +ai = AsyncGenkit(plugins=[GoogleAI()]) +response = await ai.generate(model="googleai/gemini-2.0-flash", prompt="Hi") +``` + +The maintenance cost is manageable: the sync client is auto-generated from the async client's method signatures, as OpenAI and Anthropic do. 
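
A minimal sketch of the delegation pattern, with hypothetical class names — production SDKs generate this layer and typically reuse a long-lived event loop (or worker thread) rather than calling `asyncio.run` per method:

```python
import asyncio

class AsyncGenkitSketch:
    """The async side holds the real implementation."""

    async def generate(self, *, prompt: str) -> str:
        await asyncio.sleep(0)  # stand-in for real async I/O
        return f'response to: {prompt}'

class GenkitSketch:
    """The sync facade delegates every call to the async client."""

    def __init__(self) -> None:
        self._async = AsyncGenkitSketch()

    def generate(self, *, prompt: str) -> str:
        # Runs the coroutine to completion; the caller never touches an event loop.
        return asyncio.run(self._async.generate(prompt=prompt))
```
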
No duplicate implementation, no diverging logic.
+
+## 8. Method signatures
+
+The current `generate()` signature has 20 parameters:
+
+```python
+async def generate(
+    self,
+    model=None, prompt=None, system=None, messages=None,
+    tools=None, return_tool_requests=None, tool_choice=None,
+    tool_responses=None, config=None, max_turns=None,
+    on_chunk=None, context=None,
+    output_format=None, output_content_type=None,
+    output_instructions=None, output_constrained=None, *,
+    output=None, use=None, docs=None,
+) -> GenerateResponseWrapper[Any]:
+```
+
+Almost none are keyword-only — the `*` sits before only the last three parameters, and its placement appears arbitrary. Several params don't belong here at all.
+
+**What changes:**
+
+- **Add `*`** (section 3) — all params keyword-only.
+- **Keep output as flat kwargs** (section 5) — `output_schema`, `output_format`, `output_constrained` stay as individual kwargs with defaults. Remove the `output` param that accepted `OutputConfig | OutputConfigDict | Output[T]`. Net: same param count for output, but one way to configure instead of five.
+- **Remove `on_chunk`** — `generate()` has a streaming callback parameter, but streaming belongs on `generate_stream()`.
+- **Move `tool_responses` to `resume`** — only used when resuming from a tool interrupt. JS already groups this under a `resume` options object.
+ +**After cleanup:** + +```python +async def generate( + self, + *, + model: str | None = None, + prompt: str | Part | list[Part] | None = None, + system: str | Part | list[Part] | None = None, + messages: list[Message] | None = None, + tools: list[str] | None = None, + tool_choice: ToolChoice | None = None, + return_tool_requests: bool | None = None, + config: GenerationCommonConfig | dict | None = None, + max_turns: int | None = None, + context: dict[str, object] | None = None, + output_schema: type[OutputT] | None = None, + output_format: str | None = None, + output_constrained: bool | None = None, + use: list[ModelMiddleware] | None = None, + docs: list[DocumentData] | None = None, +) -> GenerateResponse[OutputT]: +``` + +**`prompt.__call__()` also changes.** Today it takes a JS-style opts dict: + +```python +# Before — opts dict, no autocomplete +response = await my_prompt({"name": "Ted"}, opts={"config": {"temperature": 0.4}}) + +# After — kwargs, full IDE support +response = await my_prompt({"name": "Ted"}, config={"temperature": 0.4}) +``` + +The `opts` dict (a TypedDict with 16 fields) is replaced with individual kwargs: + +```python +async def __call__( + self, + input: InputT | None = None, + *, + model: str | None = None, + config: GenerationCommonConfig | dict[str, Any] | None = None, + messages: list[Message] | None = None, + docs: list[DocumentData] | None = None, + tools: list[str] | None = None, + tool_choice: ToolChoice | None = None, + output_schema: type | dict[str, Any] | None = None, + output_format: str | None = None, + output_constrained: bool | None = None, + return_tool_requests: bool | None = None, + max_turns: int | None = None, + use: list[ModelMiddleware] | None = None, + context: dict[str, Any] | None = None, +) -> GenerateResponse[OutputT]: +``` + +`input` stays as the one positional arg (the template variables). Everything else is keyword-only. 
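
The effect of the `*` marker, sketched with a hypothetical minimal signature — positional misuse fails immediately instead of silently binding the wrong parameter:

```python
def call_prompt(input=None, *, config=None, max_turns=None):
    """`input` is the sole positional parameter; everything after `*` is keyword-only."""
    return {'input': input, 'config': config, 'max_turns': max_turns}

call_prompt({'dish': 'pizza'}, config={'temperature': 0.4})  # OK

try:
    call_prompt({'dish': 'pizza'}, {'temperature': 0.4})  # config passed positionally
except TypeError:
    pass  # rejected: the `*` blocks extra positional arguments
```
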
The `resume` options move to a separate `resume()` method or a `resume` kwarg (matching the `generate()` cleanup above). `on_chunk` is removed — streaming belongs on `prompt.stream()`.
+
+**`generate_stream()` — after cleanup:**
+
+```python
+async def generate_stream(
+    self,
+    *,
+    model: str | None = None,
+    prompt: str | Part | list[Part] | None = None,
+    system: str | Part | list[Part] | None = None,
+    messages: list[Message] | None = None,
+    tools: list[str] | None = None,
+    tool_choice: ToolChoice | None = None,
+    return_tool_requests: bool | None = None,
+    config: GenerationCommonConfig | dict | None = None,
+    max_turns: int | None = None,
+    context: dict[str, object] | None = None,
+    output_schema: type[OutputT] | None = None,
+    output_format: str | None = None,
+    output_constrained: bool | None = None,
+    use: list[ModelMiddleware] | None = None,
+    docs: list[DocumentData] | None = None,
+) -> GenerateStreamResponse[OutputT]:
+```
+
+Same params as `generate()`. The return type changes from `tuple[AsyncIterator, Future]` to a single `GenerateStreamResponse` object that is directly async-iterable and exposes `.response` (see section 6).
+
+**`retrieve()` — after cleanup:**
+
+```python
+async def retrieve(
+    self,
+    *,
+    retriever: str,
+    query: str | DocumentData,
+    options: dict[str, object] | None = None,
+) -> RetrieverResponse:
+```
+
+Already clean — just needs the `*` marker to enforce keyword-only. `retriever` and `query` become required (they were Optional before, but calling without them always fails).
+
+**`embed()` — after cleanup:**
+
+```python
+async def embed(
+    self,
+    *,
+    embedder: str,
+    content: str | Document | DocumentData,
+    metadata: dict[str, object] | None = None,
+    options: dict[str, object] | None = None,
+) -> list[Embedding]:
+```
+
+Same treatment — `*` marker, `embedder` and `content` become required.
+
+## 9. Serialization cleanup — `GenkitBaseModel`
+
+### The problem
+
+Every Genkit type extends raw `pydantic.BaseModel`. Serialization to the wire
+(camelCase JSON, no null fields) requires passing two flags every time:
+
+```python
+obj.model_dump(exclude_none=True, by_alias=True)
+```
+
+Nobody remembers both flags. So `codec.py` provides `dump_dict()` and `dump_json()`
+wrappers. But call sites are split three ways:
+
+| Pattern | Correct? | Count |
+|---|---|---|
+| `dump_dict(obj)` / `dump_json(obj)` | Yes (both flags) | ~20 calls across 13 files |
+| `.model_dump(exclude_none=True, by_alias=True)` | Yes (both flags) | 5 calls |
+| `.model_dump()` with partial or no flags | **No** | **11 calls** |
+
+### The fix: `GenkitBaseModel`
+
+Pydantic's `model_config` doesn't support `exclude_none` as a config key — it's
+a parameter to `model_dump()`. So we override the methods to change the defaults:
+
+```python
+from pydantic import BaseModel, ConfigDict
+from pydantic.alias_generators import to_camel
+
+class GenkitBaseModel(BaseModel):
+    model_config = ConfigDict(
+        populate_by_name=True,
+        alias_generator=to_camel,
+    )
+
+    def model_dump(self, *, exclude_none=True, by_alias=True, **kwargs):
+        return super().model_dump(exclude_none=exclude_none, by_alias=by_alias, **kwargs)
+
+    def model_dump_json(self, *, exclude_none=True, by_alias=True, **kwargs):
+        return super().model_dump_json(exclude_none=exclude_none, by_alias=by_alias, **kwargs)
+```
+
+Now `obj.model_dump()` does the right thing. You can still override:
+`obj.model_dump(exclude_none=False)` when you actually want nulls.
+
+**Where it lives:** `genkit/core/_internal/_base.py` — Level 0 in the import DAG
+(see §12). Zero genkit imports, no circular import risk.
+
+**Not re-exported.** `GenkitBaseModel` is strictly internal. App developers
+construct `Message(...)`, `Document(...)`, etc. and never see the base class.
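
For illustration, a self-contained demo of the changed defaults (requires Pydantic v2; `WireExample` is a hypothetical type):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel

class GenkitBaseModel(BaseModel):
    model_config = ConfigDict(populate_by_name=True, alias_generator=to_camel)

    # Defaults flipped so bare model_dump() produces wire-ready output.
    def model_dump(self, *, exclude_none=True, by_alias=True, **kwargs):
        return super().model_dump(exclude_none=exclude_none, by_alias=by_alias, **kwargs)

class WireExample(GenkitBaseModel):
    finish_reason: str
    custom_data: Optional[dict] = None

WireExample(finish_reason='stop').model_dump()
# {'finishReason': 'stop'} — camelCase alias applied, None field dropped
```
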
+Plugin authors extend exported types like `GenerationCommonConfig` or use plain +`BaseModel` for plugin-internal types. + +### Migration plan + +| Step | Scope | Risk | +|---|---|---| +| Create `GenkitBaseModel` in `genkit.core._internal._base` | 1 file | None | +| Change core schema types to inherit from it | ~10 files in `genkit/core/`, `genkit/blocks/` | Low — behavioral change only on direct `.model_dump()` calls | +| Audit the 11 inconsistent calls — some may intentionally want no aliases | Case-by-case | Medium — need to check if any internal-only paths rely on snake_case keys | +| Simplify `dump_dict`/`dump_json` | `codec.py` | Low | +| Remove `dump_dict`/`dump_json` from public API | `__init__.py` | None — already proposed for removal | + +### Open questions + +1. **Do any internal paths intentionally use snake_case keys?** The `prompt.py` + calls that skip `by_alias` might be feeding data back into `model_validate()`, + where snake_case is fine. Need to audit each of the 11 sites. + +2. **`document.py` dedup hash** — uses `model_dump_json()` with no flags for + equality comparison. If we change defaults, the hash changes for any model + that has aliases. This could break dedup for in-flight data. Probably fine + (dedup is ephemeral), but worth noting. + +3. **Third-party model types** — e.g. Google AI SDK types that Genkit wraps. + These won't inherit `GenkitBaseModel`, so `dump_dict()` still needs to handle + the `isinstance(obj, BaseModel)` case with explicit flags. Or we only use + `dump_dict` for third-party types and `.model_dump()` for our own. + +## 10. 
`define_*` should accept raw Python types + +### The problem + +17 plugins and 8 core files call `to_json_schema()` manually before passing +schemas to `define_model`, `define_retriever`, etc.: + +```python +# Current — every plugin does this: +from genkit.core.schema import to_json_schema + +ai.define_model( + name='my-model', + metadata={'model': {'customOptions': to_json_schema(MyConfig)}}, + config_schema=to_json_schema(MyConfig), + ... +) +``` + +This is unnecessary boilerplate. The framework should handle the conversion. + +### Cross-language comparison + +- **JS** — `toJsonSchema` is public at `genkit/schema` (alongside `parseSchema`, + `validateSchema`, `JSONSchema`). But JS `defineModel` also accepts Zod schemas + directly — plugins don't *have* to call `toJsonSchema` manually. +- **Go** — `jsonschema.Reflect()` is internal. `defineModel` in `ai/gen.go` + accepts Go types and converts internally. +- **Python** — `to_json_schema` is public but lives at a deep path + (`genkit.core.schema`). And `define_*` functions *require* pre-converted dicts. + +Python is the only SDK where plugins are *forced* to call the schema conversion +themselves. JS has it public but optional; Go internalizes it entirely. + +### The fix + +`define_*` functions accept `type | dict | None` directly: + +```python +# After — plugins just pass the type: +ai.define_model( + name='my-model', + config_schema=MyConfig, # Python type, not JSON Schema dict + ... +) +``` + +The framework calls `to_json_schema()` internally when building action metadata. +Same for `define_retriever`, `define_embedder`, `define_reranker`, `define_evaluator`. + +`to_json_schema` moves to `core/_internal/_schema.py`. No plugin needs it. +`extract_json` moves to `core/_internal/_extract.py`. Zero plugin consumers — +only used by `formats/` internally. 
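
A sketch of the internal normalization, under stated assumptions — `_to_json_schema` is a hypothetical helper, a stdlib dataclass stands in for a real config type, and the real SDK would reflect Pydantic models (e.g. via `model_json_schema()`):

```python
from dataclasses import dataclass, fields

@dataclass
class MyConfig:
    temperature: float = 0.7

def _to_json_schema(schema):
    """Dicts pass through untouched; classes are reflected into a JSON Schema dict."""
    if schema is None or isinstance(schema, dict):
        return schema
    # Stand-in reflection for the sketch; Pydantic would produce real property schemas.
    return {'type': 'object', 'properties': {f.name: {} for f in fields(schema)}}

def define_model(*, name, config_schema=None):
    """Plugins pass the raw type; conversion happens inside the framework."""
    return {'name': name, 'configSchema': _to_json_schema(config_schema)}

action = define_model(name='my-model', config_schema=MyConfig)  # no manual to_json_schema
```

The dict branch is what preserves backward compatibility: plugins that still pass a pre-converted JSON Schema dict keep working unchanged.
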
+ +### Migration + +| Step | Scope | Risk | +|---|---|---| +| Update `define_*` signatures to accept `type \| dict \| None` | ~6 functions in ai/ | Low — dict passthrough preserves backward compat | +| Move `to_json_schema` calls inside `define_*` functions | Same 6 functions | Low | +| Move `schema.py` to `core/_internal/_schema.py` | 1 file | None | +| Move `extract.py` to `core/_internal/_extract.py` | 1 file | None | +| Update 16 plugins to drop `to_json_schema` import + calls | 16 plugin files | Medium — mechanical but wide | + +## 11. `ErrorResponse` — internal type consolidation + +Replaces 3 error wire format types (`HttpErrorWireFormat`, +`GenkitReflectionApiDetailsWireFormat`, `GenkitReflectionApiErrorWireFormat`). +Single Pydantic model with `message`, `status`, `details: dict | None`. +Internal only — used by the reflection server (`_web/_reflection.py`). + +## 12. Import DAG + +The internal import graph of the `genkit` package, simplified. Every new module +or dependency should be evaluated against this to prevent circular imports. 
+ +``` +Level 0 (no genkit imports — leaf modules): + core/_internal/_base.py GenkitBaseModel + core/_internal/_compat.py StrEnum, override, wait_for backfills + core/_internal/_schema.py to_json_schema + core/_internal/_extract.py extract_json, extract_items + core/_internal/_constants.py GENKIT_VERSION, GENKIT_CLIENT_HEADER + core/_internal/_logging.py get_logger + +Level 1 (imports Level 0 only): + core/_internal/_typing.py 60+ BaseModel classes (imports _compat + _base) + core/error.py GenkitError, UserFacingError, StatusCodes, Status + (absorbs status_types.py — imports _base only) + +Level 2 (imports Level 0–1): + core/action.py Action, ActionRunContext, ActionMetadata, ActionKind, + ActionResponse (absorbs action_types.py) + core/_internal/_registry.py Registry + core/_internal/_context.py RequestData, ContextMetadata + core/_internal/_environment.py EnvVar, is_dev_environment + core/_internal/_aio.py Channel, run_async, ensure_async + core/_internal/_http_client.py per-event-loop httpx.AsyncClient cache + core/plugin.py Plugin ABC + core/_internal/_flow.py FlowWrapper (~50 lines) + core/_internal/_background.py BackgroundAction (imports action, error) + core/_internal/_dap.py DynamicActionProvider (imports action) + +Level 3 (imports Level 0–2): + ai/model.py define_model, GenerateResponseWrapper, etc. + ai/retriever.py define_retriever, RetrieverRef, etc. + ai/embedding.py define_embedder, EmbedderRef, etc. + ai/evaluator.py define_evaluator, EvaluatorRef + ai/tools.py define_tool, ToolRunContext + ai/prompt.py ExecutablePrompt, define_prompt + ai/_internal/_generate.py generate() orchestration, tool loop + ai/_internal/_dotprompt.py dotprompt template engine + +Level 4 (imports Level 0–3): + ai/_internal/_genkit.py Genkit class body + ai/_internal/_genkit_base.py Genkit __init__, server startup + _web/_reflection.py Dev UI ASGI app + _web/_runtime.py RuntimeManager +``` + +**Rules:** +- Each level may only import from levels below it. 
+- `core/` has zero imports from `ai/` or `_web/`. +- `ai/` has zero imports from `_web/`. +- All `_internal/` modules are plumbing — can change between versions. Parent packages re-export what's needed. `import-linter` blocks plugins from importing `_internal/` paths. +- `core/` has only 3 package-level files (`action.py`, `error.py`, `plugin.py`) — everything else is `_internal/`. These are stable abstractions listed in `core/__init__.py`'s `__all__`. They can still have `_`-prefixed private helpers inside — normal Python. +- Since there's no public `genkit.core` or `genkit.ai` import path, the split is for SDK developer clarity, not external API. +- Enforced by `import-linter` in CI (see [python_package_reorg.md](./python_package_reorg.md)). diff --git a/py/docs/python_package_reorg.md b/py/docs/python_package_reorg.md new file mode 100644 index 0000000000..a83a09ea11 --- /dev/null +++ b/py/docs/python_package_reorg.md @@ -0,0 +1,415 @@ +# Python SDK — Package Reorganization + +Proposal to align the Python SDK's internal package structure with Go and JS, +enforce public/internal boundaries, and split oversized files. 
+ +Related docs: +- [python_beta_type_design.md](./python_beta_type_design.md) — type audit +- [python_type_audit_checklist.md](./python_type_audit_checklist.md) — checklist (33 types deleted, affects file contents below) +- [python_beta_api_proposal.md](./python_beta_api_proposal.md) — public API surface + `GenkitBaseModel` serialization fix + `define_*` accepts raw types (schema/extract internalized) +- [GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md) — Genkit class + +--- + +## Current state + +``` +genkit/ 7 sub-packages, 73 .py files +├── ai/ god object + helpers (5 files, 4,500 lines) +├── aio/ async utilities (4 files) +├── blocks/ domain types (14 files, 7,800 lines) +│ └── formats/ output format impls +├── core/ framework internals (15 files, 5,500 lines) +│ ├── action/ Action class, context, types +│ └── trace/ OTel exporters/processors +├── lang/ deprecation helpers (1 file) +├── types/ barrel re-export +├── web/ ASGI server management (8 files) +│ └── manager/ +├── __init__.py public API barrel +├── codec.py JSON serialization helpers +├── model_types.py GenerationCommonConfig + api_key helpers +└── testing.py test doubles +``` + +### Problems + +1. **`blocks/` doesn't exist in Go or JS.** Both put domain types in `ai/`. + Python's extra layer creates the question "does this go in `ai/` or `blocks/`?" + +2. **Orphan packages.** `aio/` (4 files), `lang/` (1 file), `types/` (barrel). + None earn their existence as top-level packages. + +3. **Giant files.** `blocks/prompt.py` (2,446 lines), `ai/_registry.py` (1,680), + `ai/_aio.py` (1,164). JS/Go equivalents are 600–900 lines. + +4. **No boundary enforcement.** Plugins import from `genkit.core.action._action`, + `genkit.blocks.model`, `genkit.ai._runtime` — deep internal paths. No `__all__` + on most `__init__.py` files. + +5. **Loose root files.** `codec.py` and `model_types.py` are orphans that belong + in `core/` and `ai/` respectively. 
+ +--- + +## Proposed structure + +``` +genkit/ +├── __init__.py public API barrel (__all__ defined) +├── ai/ AI domain types + Genkit class +│ ├── __init__.py public exports (__all__ defined) +│ ├── prompt.py ExecutablePrompt + define_prompt +│ ├── streaming.py GenerateStreamResponse +│ ├── model.py GenerateResponseWrapper, ChunkWrapper, MessageWrapper, +│ │ ModelReference, GenerationCommonConfig, define_model, +│ │ resolve_api_key, compute_usage_stats +│ ├── document.py Document, RankedDocument +│ ├── retriever.py RetrieverRef, define_retriever, etc. (RetrieverOptions deleted — kwargs) +│ ├── embedding.py Embedder, EmbedderRef, define_embedder (EmbedderOptions deleted — kwargs) +│ ├── reranker.py RerankerRef, define_reranker (RerankerOptions deleted — kwargs) +│ ├── evaluator.py EvaluatorRef, define_evaluator +│ ├── tools.py ToolRunContext, ToolInterruptError, define_tool +│ ├── resource.py resource actions, define_resource +│ ├── formats/ output format system +│ │ ├── types.py FormatDef, Formatter, FormatterConfig +│ │ ├── json.py, text.py, jsonl.py, enum.py, array.py +│ ├── _internal/ +│ │ ├── _genkit.py Genkit class body (from ai/_aio.py) +│ │ ├── _genkit_base.py Genkit __init__, server startup (from ai/_base_async.py) +│ │ ├── _dotprompt.py dotprompt template engine — render_*, file loading, PromptCache +│ │ ├── _generate.py generate() orchestration, tool loop (from blocks/generate.py) +│ │ ├── _middleware.py model middleware execution +│ │ └── _messages.py message construction helpers +│ +├── core/ framework primitives (not AI-specific) +│ ├── __init__.py public exports (__all__ defined) +│ ├── action.py Action, ActionRunContext, ActionMetadata, ActionKind, +│ │ ActionResponse, ActionMetadataKey (flattened — +│ │ absorbs action_types.py, 18 consumers, same concept) +│ ├── error.py GenkitError, UserFacingError, StatusCodes, Status, +│ │ http_status_code (absorbs status_types.py — only consumer) +│ ├── plugin.py Plugin ABC +│ ├── _internal/ +│ │ ├── _typing.py 
auto-generated schema types (DO NOT EDIT header). +│ │ │ 60+ BaseModel classes. Re-exported via genkit/__init__.py +│ │ │ and domain sub-modules. Nobody imports this directly. +│ │ ├── _base.py GenkitBaseModel (Pydantic base with exclude_none + by_alias defaults) +│ │ ├── _compat.py StrEnum (3.10), override (3.11), wait_for (3.10) backfills +│ │ │ (absorbs aio/_compat.py — dies when min Python ≥ 3.12) +│ │ ├── _registry.py Registry class +│ │ ├── _server.py ServerSpec (reflection API config — moved from ai/) +│ │ ├── _context.py RequestData, ContextMetadata +│ │ ├── _tracing.py tracing setup, span creation, dev UI exporter, +│ │ │ RealtimeSpanProcessor (~350 lines merged from +│ │ │ tracing.py + default_exporter.py + realtime_processor.py. +│ │ │ AdjustingTraceExporter + RedactedSpan moved to +│ │ │ telemetry plugin — 5 plugins import, 0 core files do.) +│ │ ├── _http_client.py HTTP client cache (per-event-loop httpx.AsyncClient — 8 plugins use) +│ │ ├── _environment.py EnvVar, GenkitEnvironment, is_dev_environment() +│ │ ├── _aio.py Channel, run_async, run_loop, ensure_async, iter_over_async +│ │ │ (~500 lines merged from all 4 aio/* files) +│ │ ├── _schema.py to_json_schema (internal — define_* accepts types directly) +│ │ ├── _extract.py extract_json, extract_items (internal — only used by formats/) +│ │ ├── _logging.py get_logger (structlog wrapper — trim 20-method Protocol to ~7) +│ │ ├── _constants.py GENKIT_VERSION, GENKIT_CLIENT_HEADER +│ │ ├── _flow.py FlowWrapper (~50 lines — users never construct, returned by @ai.flow()) +│ │ ├── _background.py BackgroundAction (2 internal consumers, not re-exported top-level) +│ │ └── _dap.py DynamicActionProvider (1 internal consumer, not re-exported top-level) +│ +├── tracing.py tracer, add_custom_exporter (public — matches JS genkit/tracing) +│ +├── _web/ dev server only (all internal) +│ ├── _reflection.py Dev UI reflection API (moved from core/). Starlette ASGI app +│ │ exposing /api/actions, /api/runAction, etc. 
Only consumer is +│ │ the runtime startup code that mounts it on uvicorn. +│ └── _runtime.py RuntimeManager — writes .genkit/runtimes/ files +│ +│ DELETED: web/manager/ (~1,500 lines, 7 types) +│ ServerManager, ASGIServerAdapter, UvicornAdapter, GranianAdapter, +│ SignalHandler, ServerLifecycle, ServerConfig, AbstractBaseServer, +│ ports.py, info.py — all unused by framework/plugins. Only consumer +│ was one sample (web-multi-server). The reflection server uses raw +│ uvicorn directly (~15 lines in _base_async.py). No abstraction needed. +│ +└── testing.py ProgrammableModel, EchoModel, StaticResponseModel +``` + +### What changed + +| Change | Details | +|---|---| +| **Delete `blocks/`** | All files move into `ai/`. Domain types live where Go/JS put them. | +| **Delete `aio/`** | `Channel` + loop utils → `core/_internal/_aio.py` | +| **Delete `lang/`** | `deprecations.py` → inline into google-genai plugin (only consumer). | +| **Delete `types/`** | Barrel re-export removed. `genkit/__init__.py` handles this. | +| **Delete `web/manager/`** | ~1,500 lines of unused multi-server orchestration. Reflection server uses raw uvicorn (~15 lines). | +| **Delete `core/flows.py`** | `create_flows_asgi_app()` — auto-exposes flows as HTTP endpoints. Firebase Cloud Functions pattern that doesn't fit Python (Cloud Functions uses Flask, not ASGI; no `onCallGenkit` for Python). Users should use FastAPI/Flask instead. JS has this (`startFlowServer`) because the Express ecosystem aligns; Python's doesn't. ~370 lines. | +| **Rename `web/` → `_web/`** | Prefix signals "internal, don't import". Now just reflection + runtime. | +| **Move `core/reflection.py` → `_web/`** | It's a Starlette ASGI app, not a core primitive. Breaks `core/` → `web/` cycle. | +| **Delete `codec.py`** | `dump_dict`/`dump_json` die with `GenkitBaseModel` (see [python_beta_api_proposal.md §5](./python_beta_api_proposal.md)). Third-party `BaseModel` fallback inlined into `_base.py`. 
| +| **Delete `model_types.py`** | `GenerationCommonConfig` → `ai/model.py`. API key helpers renamed to `resolve_api_key` and exposed from `model.py`. `get_basic_usage_stats` renamed to `compute_usage_stats`. | +| **Merge `action_types.py` into `action.py`** | 95 lines, same 18 consumers, same concept. `ActionKind`, `ActionResponse`, `ActionMetadataKey` live alongside `Action`. | +| **Merge `status_types.py` into `error.py`** | Only consumer is `error.py`. `StatusCodes`, `Status`, `http_status_code` are tightly coupled with the error hierarchy. | +| **Move `FlowWrapper` → `_internal/`** | `ai/_registry.py` → `core/_internal/_flow.py`. ~50 lines, 2 consumers, users never construct directly (returned by `@ai.flow()`). | +| **Move `BackgroundAction` → `_internal/`** | `blocks/background_model.py` → `core/_internal/_background.py`. Not re-exported top-level, only 2 internal consumers. `genkit.model` sub-module re-exports it for plugin authors. | +| **Move `DynamicActionProvider` → `_internal/`** | `blocks/dap.py` → `core/_internal/_dap.py`. Not re-exported top-level, single internal consumer (`ai/_registry.py`). | +| **Split `prompt.py`** | 2,446 → ~600 (prompt.py) + ~200 (streaming.py) + ~800 (_dotprompt.py) | +| **Move `typing.py` → `_internal/`** | `core/typing.py` → `core/_internal/_typing.py`. Auto-generated 60+ `BaseModel` classes. `core/` is not a public import path — public types are re-exported from `genkit/__init__.py` and domain sub-modules. The file is pure plumbing. | +| **Internalize `schema.py` + `extract.py`** | Both move to `core/_internal/`. `define_*` functions accept raw Python types so no plugin needs `to_json_schema`. `extract_json` has zero plugin consumers — only used by `formats/`. JS exports both publicly but nobody imports them there either. See [python_beta_api_proposal.md §6](./python_beta_api_proposal.md). | +| **Dissolve `ai/_registry.py`** | define_* functions move to their domain files (like Go). 
`define_model` → `ai/model.py`, `define_retriever` → `ai/retriever.py`, etc. Genkit method stubs stay in `ai/_internal/_genkit.py`. `_registry.py` ceases to exist. | +| **Add `_internal/`** | Pydantic v2 pattern: private implementation behind `_internal/` | +| **Add `__all__`** | Every public `__init__.py` declares its exports | + +## Plugin import paths — before and after + +### Model plugin (e.g., google-genai gemini.py) + +```python +# Before (6 deep imports): +from genkit.ai import ActionRunContext, GENKIT_CLIENT_HEADER +from genkit.blocks.model import get_basic_usage_stats +from genkit.codec import dump_dict, dump_json +from genkit.core.error import GenkitError, StatusName +from genkit.core.tracing import tracer +from genkit.core.typing import GenerationCommonConfig, Message, ... + +# After (2-3 imports — top-level genkit, genkit.ai, genkit.tracing): +from genkit import GenkitError, GENKIT_CLIENT_HEADER +from genkit.tracing import tracer +from genkit.ai import ( + ActionRunContext, GenerationCommonConfig, + Message, compute_usage_stats, +) +``` + +### Retriever plugin (e.g., vertex-ai vector_search.py) + +```python +# Before (5 deep imports): +from genkit.ai import Genkit +from genkit.blocks.document import Document +from genkit.blocks.retriever import retriever_action_metadata +from genkit.core.action.types import ActionKind +from genkit.core.schema import to_json_schema + +# After (1 import — define_retriever accepts types directly, no manual to_json_schema): +from genkit import Genkit, Document, ActionKind +``` + +### Telemetry plugin (e.g., observability) + +```python +# Before (3 deep imports): +from genkit.core.environment import is_dev_environment +from genkit.core.trace.adjusting_exporter import AdjustingTraceExporter +from genkit.core.tracing import add_custom_exporter + +# After (2 imports — AdjustingTraceExporter moves to telemetry plugin): +from genkit import is_dev_environment +from genkit.tracing import add_custom_exporter +``` + +--- + +## 
Circular import fix: `core/` → `_web/` cycle + +**Problem.** Today `core/` has a hidden dependency on `web/`: + +- `core/reflection.py` imports `genkit.web.manager` (it **is** a Starlette ASGI app) +- `core/flows.py` imports `genkit.web.manager` (it **is** a Starlette ASGI app) +- `web/` modules import from `genkit.core.*` + +This creates a package-level cycle: `core/ ↔ web/`. + +**Root cause.** Both `reflection.py` and `flows.py` are 100% HTTP server +code — Starlette routes, ASGI apps, request/response handling. They ended +up in `core/` by accident, not because they provide core primitives. + +**Fix.** + +- `core/reflection.py` → move to `_web/reflection.py` +- `core/flows.py` → **delete** (see "What changed" table — Firebase pattern + that doesn't fit Python; users should use FastAPI/Flask) + +``` +_web/ +├── _reflection.py ← was core/reflection.py +└── _runtime.py ← RuntimeManager +``` + +### Additional cross-package violations to fix + +**`core/plugin.py` → `blocks/` (becomes `core/` → `ai/` after reorg).** +The `Plugin` base class has two convenience methods — `model(name)` and +`embedder(name)` — that do deferred runtime imports of `ModelReference` and +`EmbedderRef` from `blocks/`. After the reorg, `blocks/` merges into `ai/`, +creating a `core/ → ai/` layering violation. + +Fix: **restore the original async resolve-based helpers, add `embedder()`.** The +current methods (added in #4278) construct `ModelReference`/`EmbedderRef` objects, +which requires importing from `blocks/`. The original version (from #4132) called +`self.resolve(ActionKind.MODEL, name)` and returned `Action` — no imports from +`blocks/` or `ai/`, zero layering violation. Matches JS's +`GenkitPluginV2Instance.model()`. The `embedder()` method gets the same treatment. +Both are async, return `Action | None`, and only use types already in `core/`. + +**`ai/_base_async.py` → `web/manager/_ports.py`.** +Imports `find_free_port_sync` — a 15-line stdlib socket utility. 
After the +reorg, `web/manager/` is deleted. + +Fix: move `find_free_port_sync` to `core/_internal/_ports.py`. It's pure +stdlib (`socket.bind`), no dependencies. + +### After all fixes + +The dependency graph is strictly one-directional: + +``` +_web/ → ai/ → core/ + └────────────────↗ +``` + +`core/` has zero imports from `_web/` or `ai/`. Clean layering. + +--- + +## Boundary enforcement + +### 1. `__all__` on every public `__init__.py` + +```python +# genkit/__init__.py (the ONLY public import path for most users) +__all__ = [ + 'Genkit', 'Document', 'GenkitError', 'UserFacingError', + 'GenerateResponse', 'StreamResponse', 'GenerateResponseChunk', + 'ExecutablePrompt', 'Message', 'Role', + 'Part', 'TextPart', 'MediaPart', 'Media', + 'ToolRunContext', 'ToolInterruptError', 'ToolChoice', + 'RequestData', 'ContextProvider', + 'GENKIT_VERSION', 'GENKIT_CLIENT_HEADER', 'is_dev_environment', + 'Plugin', 'Action', 'ActionMetadata', 'ActionKind', 'StatusCodes', + # ... ~34 symbols (see python_beta_api_proposal.md §1) +] + +# genkit/tracing.py (telemetry plugin authors) +__all__ = ['tracer', 'add_custom_exporter'] + +# genkit/model.py, genkit/retriever.py, etc. (domain sub-modules for plugin authors) +# Each defines __all__ with its domain types. +``` + +**No public `genkit.core` or `genkit.ai` import paths.** `core/` and `ai/` are +internal package structure — `genkit/__init__.py` re-exports everything users need. +Domain sub-modules (`genkit.model`, `genkit.retriever`, etc.) are for plugin authors +who need wire-format types not in the top-level barrel. + +### 2. 
`import-linter` in CI + +```ini +# .importlinter +[importlinter] +root_package = genkit + +[importlinter:contract:layers] +name = Package layers +type = layers +layers = + genkit._web + genkit.ai + genkit.core + +[importlinter:contract:no-internal-from-plugins] +name = Plugins must not import _internal +type = forbidden +source_modules = + genkit.plugins +forbidden_modules = + genkit.ai._internal + genkit.core._internal +``` + +### 3. `_internal/` convention + +Following Pydantic v2's pattern. The split works like this: + +**Files at package level (e.g. `core/action.py`, `core/error.py`):** +- Clean abstractions within the package — the "logical public API" of that sub-package +- Listed in the sub-package's `__init__.py` `__all__` +- Other SDK modules import from here: `from genkit.core.action import Action` +- Can still have private helpers (`_foo()`) inside the file — normal Python +- Signals to SDK developers: "this is a stable abstraction" + +**Files in `_internal/` (e.g. `core/_internal/_registry.py`, `core/_internal/_typing.py`):** +- Implementation machinery — can change between versions without notice +- NOT listed in the sub-package's `__init__.py` `__all__` +- Other SDK modules import directly when needed: `from genkit.core._internal._base import GenkitBaseModel` +- `import-linter` prevents plugins from importing these paths +- Signals to SDK developers: "this is plumbing, handle with care" + +Since there's **no public `genkit.core`** import path anyway, the split is primarily +about signaling intent to other developers working on the SDK itself. External users +import from `genkit`, `genkit.model`, `genkit.tracing`, etc. — they never see either level. 
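
Since there is no public `genkit.core` path, `__all__` is what star-import
consumers actually see. A toy demonstration (hypothetical module name, not the
real package layout) of what the declaration gates:

```python
# Toy illustration: __all__ decides what `from pkg import *` exposes,
# which is the boundary this plan relies on. 'toy_genkit' is made up.
import sys
import types

mod = types.ModuleType('toy_genkit')
mod.Genkit = type('Genkit', (), {})        # public symbol
mod._Registry = type('_Registry', (), {})  # internal plumbing
mod.__all__ = ['Genkit']
sys.modules['toy_genkit'] = mod

ns: dict = {}
exec('from toy_genkit import *', ns)
print('Genkit' in ns, '_Registry' in ns)  # → True False
```

Direct imports of non-`__all__` names still work (`__all__` only gates `*`);
the `import-linter` contracts above are what make the boundary hard.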
+ +--- + +## File size targets + +| File | Current | Target | How | +|---|---|---|---| +| `blocks/prompt.py` | 2,446 | ~600 | Split into prompt.py + streaming.py + _dotprompt.py (render_*, file loading, PromptCache) | +| `ai/_registry.py` | 1,680 | **0 (deleted)** | define_* functions move to domain files (model.py, retriever.py, etc.). Genkit method stubs absorbed into _genkit.py. File ceases to exist. | +| `ai/_aio.py` | 1,164 | ~800 | Rename to _genkit.py, extract server startup to _genkit_base.py | +| `blocks/generate.py` | 1,088 | ~600 | Extract tool loop to _generate.py, keep public generate function | +| `core/_internal/_typing.py` | 1,066 | 1,066 | Auto-generated, don't touch. Add DO NOT EDIT header. Moved to `_internal/`. | + +Target: no hand-written file over 800 lines. Matches Go/JS norms. + +--- + +## Migration path + +This is a **one-time refactor** with minimal logic changes. Most of the diff is +file moves and import path updates. The API changes are: + +- `define_*` functions accept `type | dict | None` (see [§6](./python_beta_api_proposal.md)) +- `GenkitBaseModel` replaces `dump_dict`/`dump_json` (see [§5](./python_beta_api_proposal.md)) +- `to_json_schema` and `extract_json` become internal +- Public import paths change from `genkit.core.*` / `genkit.blocks.*` to + `from genkit import ...` and domain sub-modules (`genkit.model`, etc.) + +The structural diff is: + +1. Move files +2. Update import paths (find-and-replace across plugins) +3. Add `__all__` to `__init__.py` files +4. Split 3 oversized files + +### Order of operations + +1. **Add `__all__` to existing `__init__.py` files** — zero-risk, clarifies + public API immediately. Can land as its own PR. + +2. **Merge `blocks/` into `ai/`** — the big structural move. Update all + import paths. One PR. + +3. **Move `FlowWrapper`, `BackgroundAction`, `DynamicActionProvider` to `core/`** — + small cross-language alignment fix. One PR. + +4. 
**Kill orphans** — delete `aio/`, `lang/`, `types/`, move root files. + One PR. + +5. **Create `_internal/` directories** — move implementation files behind + the boundary. Update internal imports. One PR. + +6. **Rename `web/` → `_web/`, move `core/reflection.py` into `_web/`, + delete `core/flows.py`** — breaks the `core/ ↔ web/` circular + dependency and removes the unused flows server. One PR. + +7. **Split oversized files** — `prompt.py`, `_registry.py`, `generate.py`. + One PR each. + +8. **Add `import-linter` to CI** — one PR, enforces the new structure going + forward. + +Each step is independently shippable and independently revertible. diff --git a/py/docs/python_type_audit_checklist.md b/py/docs/python_type_audit_checklist.md new file mode 100644 index 0000000000..9d8f33931c --- /dev/null +++ b/py/docs/python_type_audit_checklist.md @@ -0,0 +1,183 @@ +# Hand-Written Type Audit — Checklist + +121 classes total (119 audited + 2 private: `_LatencyTrackable`, `_ModelCopyable`). + +Detailed write-ups: [python_beta_type_design.md](./python_beta_type_design.md), +[python_class_audits.md](./python_class_audits.md), +[GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md). + +--- + +## Must fix (5) — significant design rework + +- [ ] `Genkit` — god object, 38 methods, positional args, `generate_stream` + returns raw tuple, `define_prompt` has 23 params. Audited in + GENKIT_CLASS_DESIGN.md. + +- [ ] `ExecutablePrompt` — `opts: TypedDict` kills IDE autocomplete on + `__call__`, `stream`, `render`. 220-line `render()`. Fragile + `_ensure_resolved()` copies 20 fields. Audited in python_class_audits.md §2. + +- [ ] `GenerateStreamResponse` — not used by `Genkit.generate_stream()` (returns + raw tuple instead), not directly iterable (no `__aiter__`), lives in + wrong module (`blocks/prompt.py`). Audited in python_class_audits.md §5. 
+ +- [ ] `GenerateResponseWrapper` — `assert_valid()`/`assert_valid_schema()` are + empty placeholders, missing `reasoning`/`media`/`data`/`model` properties + that JS has. Audited in python_class_audits.md §3. + +- [ ] `ToolInterruptError` — extends `Exception` not `GenkitError` (blocked on + #4346), `str(err)` returns empty string, `metadata` not keyword-only. + Audited in python_class_audits.md §6. + +--- + +## Should fix (28) — non-trivial changes needed + +- [ ] `UserFacingError` — positional args, should be keyword-only. +- [ ] `GenkitError` — two serialization methods + standalone function, consolidate. +- [ ] `Document` — `.text()` is a method, not property. Inconsistent with every + other `.text` in the SDK. Breaking change. Audited in python_class_audits.md §4. +- [ ] `FlowWrapper` — `stream()` returns tuple, should return `GenerateStreamResponse`. +- [ ] `GenerationResponseError` — positional args, should be keyword-only. +- [ ] `Plugin` — has `model()`/`embedder()` convenience but not `retriever()` etc. + Causes layering violation (circular import). +- [ ] `TelemetryServerSpanExporter` — creates new `httpx.Client()` per `export()` + call (no connection pooling), ignores HTTP errors. +- [ ] `ServerSpec` — confusingly similar name to `ServerConfig` (being deleted). + Rename to `ReflectionServerConfig` or similar. +- [ ] `ModelReference` / `EmbedderRef` / `RetrieverRef` / `IndexerRef` / + `RerankerRef` / `EvaluatorRef` — wildly inconsistent shapes. `ModelReference` + allows extras, `EvaluatorRef` uses different fields, `EmbedderRef` missing + `info`. See python_beta_type_design.md §20. +- [ ] `GenerateResponseChunkWrapper` / `MessageWrapper` — missing `reasoning`, + `media`, `data` properties that JS has. See python_beta_type_design.md §21. 
+- [ ] `Action` — mutable `input_schema`/`output_schema` (should be immutable), + `on_chunk`/`on_trace_start` callbacks on public API (Python uses `async for`), + `run()` should be deleted, `arun()`/`arun_raw()` confusing, no `__call__`. + Audited in python_class_audits.md §1. +- [ ] `ActionRunContext` / `ToolRunContext` — missing trace_id/span_id (JS + provides), `ToolRunContext` accesses parent private fields. +- [ ] `FormatDef` — uses `@abc.abstractmethod` but doesn't extend `abc.ABC`. + One-line fix. +- [ ] `Logger` — 20-method Protocol. `warn`/`warning` redundant alias, + `fatal`/`critical` redundant alias. JS Logger has 7 methods. +- [ ] `AdjustingTraceExporter` — belongs in telemetry plugin, not core SDK. + JS equivalent lives in `js/plugins/google-cloud/`. +- [ ] `RealtimeSpanProcessor` — belongs in telemetry plugin, not core SDK. +- [ ] `RedactedSpan` — used exclusively by `AdjustingTraceExporter`, moves + with it to telemetry plugin. +- [ ] `GablorkenInput` — test fixture exported publicly in `__all__`. Should + be private or inlined into `test_models()`. +- [ ] `PromptCache` — plain class with 3 optional fields, not even a dataclass. + Fold into `ExecutablePrompt` as private attributes. +- [ ] `RerankerParams` — misnamed. Has `reranker`, `query`, `documents` — this + is action input, should be `RerankerRequest` for consistency with + `RetrieverRequest`/`IndexerRequest`. +- [ ] `ResumeOptions` — TypedDict (same autocomplete-killer as + `PromptGenerateOptions`). Convert to dataclass or flatten when + `PromptGenerateOptions` is replaced with kwargs. 
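
The `FormatDef` bullet above is a one-line fix, but easy to under-appreciate:
`@abc.abstractmethod` is inert unless the class's metaclass is `ABCMeta`. A
minimal demonstration with toy classes (not the real `FormatDef`):

```python
import abc

class WithoutABC:  # today's FormatDef shape: decorator, but no ABC base
    @abc.abstractmethod
    def parse(self): ...

class WithABC(abc.ABC):  # the one-line fix
    @abc.abstractmethod
    def parse(self): ...

WithoutABC()  # silently instantiates — the abstract method is not enforced
try:
    WithABC()
except TypeError as e:
    print(e)  # can't instantiate abstract class WithABC ...
```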
+ +--- + +## Delete (34) — remove entirely + +**Replaced by kwargs on `define_*` methods:** +- [ ] `EmbedderOptions` — flatten to kwargs on `define_embedder()` +- [ ] `RetrieverOptions` — flatten to kwargs on `define_retriever()` +- [ ] `IndexerOptions` — flatten to kwargs on `define_indexer()` +- [ ] `RerankerOptions` — flatten to kwargs on `define_reranker()` +- [ ] `ResourceOptions` — `define_resource()` already has the same kwargs +- [ ] `DapConfig` — flatten to kwargs on `define_dynamic_action_provider()` +- [ ] `DapCacheConfig` — one-field dataclass (`ttl_millis`), fold into parent +- [ ] `DefineBackgroundModelOptions` — flatten to kwargs on `define_background_model()` +- [ ] `SimpleRetrieverOptions` — flatten to kwargs on `define_simple_retriever()` + +**Replaced by flat kwargs on prompt methods:** +- [ ] `PromptGenerateOptions` — 17-field TypedDict, THE autocomplete-killer +- [ ] `OutputOptions` — dies when `PromptGenerateOptions` is replaced +- [ ] `OutputConfigDict` — dies when `Output[T]` is replaced + +**Inlined into helpers (class unnecessary):** +- [ ] `GenkitSpan` — `__getattr__` proxy kills type checking. Replace with free + functions in `_tracing.py` (`_set_genkit_attr`, `_set_span_input`, + `_set_span_output`). `is_root` becomes `span.parent is None` at call site. + `_trace/_types.py` deleted. 
+ +**Dead code / unused:** +- [ ] `Input` / `Output` — replace with `output_schema` kwarg + `@overload` +- [ ] `Retriever` — dead code, never instantiated +- [ ] `ToolRequestLike` — used in 1 place as cast target, delete +- [ ] `ResourceFn` — dead weight, only used in union with `Callable[..., ...]` +- [ ] `MatchableAction` — code smell, `Action` already has `.matches` field +- [ ] `ASGIApp` — defined but never used as type annotation +- [ ] `ServerManagerProtocol` — lives in `web/manager/` being deleted + +**Error wire formats (consolidate into one `ErrorResponse`):** +- [ ] `HttpErrorWireFormat` — dies with `core/flows.py` +- [ ] `GenkitReflectionApiDetailsWireFormat` — collapse into `ErrorResponse` +- [ ] `GenkitReflectionApiErrorWireFormat` — collapse into `ErrorResponse` + +**Server over-engineering (`web/manager/` deleted — 15-line problem):** +- [ ] `ServerManager` +- [ ] `ASGIServerAdapter` +- [ ] `UvicornAdapter` +- [ ] `GranianAdapter` +- [ ] `Server` +- [ ] `ServerConfig` +- [ ] `ServerLifecycle` +- [ ] `AbstractBaseServer` +- [ ] `SignalHandler` +- [ ] `ServerType` + +--- + +## Clean (52) — no changes needed + +**User-facing types:** +- [x] `RankedDocument` — `Document` subclass with `.score`. All 3 SDKs. +- [x] `EmbedderSupports` — value type for embedder capabilities. All 3 SDKs. +- [x] `Formatter` / `FormatterConfig` — format system base types. All 3 SDKs. +- [x] `ActionMetadata` — 9-field data bag for action registration. All 3 SDKs. +- [x] `GenerationCommonConfig` — extends schema type with `api_key`. ~36 files. +- [x] `ContextMetadata` / `RequestData` — request-level context for web frameworks. +- [x] `BackgroundAction` — wraps start/check/cancel for long-running ops. All 3 SDKs. +- [x] `DynamicActionProvider` — runtime action discovery (MCP). All 3 SDKs. +- [x] `Channel` — async iteration channel for streaming. JS has same. +- [x] `Registry` — central action/plugin/schema registry. All 3 SDKs. 
+- [x] `Embedder` — wraps embedder `Action` with `embed()`. Go has same. + +**Enums:** +- [x] `ActionKind` — StrEnum, 17 action types. +- [x] `ActionMetadataKey` — StrEnum, 3 keys. +- [x] `StatusCodes` — IntEnum, gRPC-style status codes. +- [x] `EnvVar` — StrEnum, `GENKIT_ENV`. +- [x] `GenkitEnvironment` — StrEnum, `DEV`/`PROD`. +- [x] `DeprecationStatus` — Enum, 3 values. Python-only. + +**Internal plumbing:** +- [x] `ActionResponse` — action result wrapper. All 3 SDKs. +- [x] `Status` — status with code + message. +- [x] `ResourceInput` / `ResourceOutput` — action I/O for resources. All 3 SDKs. +- [x] `RetrieverRequest` / `RetrieverSupports` / `RetrieverInfo` — retriever wire types. +- [x] `IndexerRequest` / `IndexerInfo` — indexer wire types. +- [x] `RerankerSupports` / `RerankerInfo` — reranker wire types. +- [x] `PartCounts` — token counting helper. +- [x] `PromptConfig` — BaseModel stored with prompt action. Internal. +- [x] `ExtractItemsResult` — JSON extraction helper. Python-only. +- [x] `DeprecationInfo` — deprecation metadata. Python-only. + +**Format implementations (all subclass FormatDef):** +- [x] `TextFormat` / `JsonFormat` / `JsonlFormat` / `EnumFormat` / `ArrayFormat` + +**Runtime/testing:** +- [x] `RuntimeManager` — writes `.genkit/runtimes/` for Dev UI discovery. +- [x] `SimpleCache` — thread-safe TTL cache for DAP. Internal to `DynamicActionProvider`. +- [x] `ProgrammableModel` / `EchoModel` / `StaticResponseModel` — test doubles. +- [x] `SkipTestError` / `ModelTestError` / `ModelTestResult` / `TestCaseReport` — test infra. + +**Other:** +- [x] `UnstableApiError` — `GenkitError` subclass for beta API gating. Matches JS. +- [x] `DeprecatedEnumMeta` — metaclass for enum deprecation warnings. Python-only. +- [x] `GenkitBase` / `GenkitRegistry` — `Genkit` class hierarchy. Audited in + GENKIT_CLASS_DESIGN.md. 
diff --git a/py/docs/python_type_audit_details.md b/py/docs/python_type_audit_details.md new file mode 100644 index 0000000000..9c51b7858c --- /dev/null +++ b/py/docs/python_type_audit_details.md @@ -0,0 +1,996 @@ +# Class Audits — Action, ExecutablePrompt, GenerateResponseWrapper, Document + +Method-by-method audit of the four most important classes users interact with +(after `Genkit` itself, which is covered in [GENKIT_CLASS_DESIGN.md](../GENKIT_CLASS_DESIGN.md)). + +--- + +## 1. `Action` + +`core/action/_action.py` — the foundational type. Everything in Genkit is an +Action. ~65 files reference it. + +### Class shape + +``` +Action(Generic[InputT, OutputT, ChunkT]) + ├── Properties (read-only): kind, name, description, metadata, input_type, is_async + ├── Properties (read/write!): input_schema, output_schema + ├── run(input, on_chunk, context, _telemetry_labels) → ActionResponse + ├── arun(input, on_chunk, context, on_trace_start, ...) → ActionResponse + ├── arun_raw(raw_input, on_chunk, context, on_trace_start, ...) → ActionResponse + └── stream(input, context, telemetry_labels, timeout) → tuple[AsyncIterator, Future] +``` + +### JS comparison + +JS `Action` is a **type alias**, not a class — it's a callable function with +attached properties: + +```typescript +type Action = ((input?, options?) => Promise) & { + __action: ActionMetadata; + run(input?, options?): Promise>; + stream(input?, opts?): StreamingResponse; +}; +``` + +Key differences: +- **JS actions are callable** — `await action(input)` works. Python requires + `await action.arun(input)`. +- **JS has one `run()` method** that returns `ActionResult`. Python has three: + `run()` (sync), `arun()` (async), `arun_raw()` (async + validation). +- **JS `stream()` returns `StreamingResponse`** with `.stream` and `.response` + properties. Python returns a raw tuple. 
+ +### `run()` + +```python +# Today +def run( + self, + input: InputT | None = None, + on_chunk: StreamingCallback | None = None, + context: dict[str, object] | None = None, + _telemetry_labels: dict[str, object] | None = None, +) -> ActionResponse[OutputT] +``` + +**Verdict: delete entirely.** Only exists to support sync flow/tool wrappers +(2 callsites in `_registry.py`). The framework is async-first — sync user +functions should be auto-wrapped with `ensure_async()` at registration time +instead of maintaining a parallel sync execution path. JS and Go don't have +this because they don't have separate sync/async function types. + +### `arun()` + +```python +# Today +async def arun( + self, + input: InputT | None = None, + on_chunk: StreamingCallback | None = None, + context: dict[str, object] | None = None, + on_trace_start: Callable[[str, str], None] | None = None, + _telemetry_labels: dict[str, object] | None = None, +) -> ActionResponse[OutputT] +``` + +**Issues:** +1. **`on_chunk` is a JS callback pattern leaking into the public API.** Python's + streaming convention is `async for` (async iterators), not callbacks. `on_chunk` + is internal plumbing — `Action.stream()` already wraps it into a `Channel` + (async iterator) for users. The only external caller passing `on_chunk` directly + is `core/reflection.py` (Dev UI server). Regular users should never see this + parameter; they should use `stream()` instead. +2. **`on_trace_start` is internal Dev UI plumbing** — only called by + `core/reflection.py` to grab trace/span IDs for the Dev UI response. No user + ever passes this. Shouldn't be on the public method. +3. `_telemetry_labels` — same underscore issue. +4. `input` is optional but many actions require it — fails at runtime. 
+ +### `arun_raw()` + +```python +# Today +async def arun_raw( + self, + raw_input: InputT | None = None, + on_chunk: StreamingCallback | None = None, + context: dict[str, object] | None = None, + on_trace_start: Callable[[str, str], None] | None = None, + telemetry_labels: dict[str, object] | None = None, +) -> ActionResponse[OutputT] +``` + +**Issues:** +1. **Confusing name** — "raw" means "I'll validate for you via Pydantic." But + `arun()` does NOT validate. So `arun_raw` does more work, not less. The name + implies the opposite. +2. The only difference from `arun()` is Pydantic validation on input. This + should be a flag, not a separate method. +3. Same `on_chunk`/`on_trace_start` callback leakage as `arun()`. +4. `telemetry_labels` has no underscore here but has underscore in `arun()`. + Inconsistent. + +### `stream()` + +```python +# Today +def stream( + self, + input: InputT | None = None, + context: dict[str, object] | None = None, + telemetry_labels: dict[str, object] | None = None, + timeout: float | None = None, +) -> tuple[AsyncIterator[ChunkT], asyncio.Future[ActionResponse[OutputT]]] +``` + +**Issues:** +1. **Returns a tuple** — not directly iterable. Must destructure: + `chunks, future = action.stream(input)`. JS returns a `StreamingResponse` + with `.stream` and `.response`. +2. **Not async** — synchronously creates a Channel and kicks off a task. This + is fine mechanically but unexpected (an async operation that doesn't use `await`). +3. Creates a redundant `result_future` that wraps `stream.closed` — why not + just expose `stream.closed` directly? 
+ +### `input_schema` / `output_schema` (property setters) + +```python +@input_schema.setter +def input_schema(self, value: dict[str, object]) -> None: + self._input_schema = value + self._metadata[ActionMetadataKey.INPUT_KEY] = value + +@output_schema.setter +def output_schema(self, value: dict[str, object]) -> None: + self._output_schema = value + self._metadata[ActionMetadataKey.OUTPUT_KEY] = value +``` + +**Issues:** +1. **Actions should be immutable after construction.** Mutable schemas invite + subtle bugs — if someone stores a reference to `action.input_schema` and + the schema later changes, they have stale data. +2. The setters exist for lazy-loaded prompts that set schema after registration. + This is a hack around the construction order — the prompt system should pass + schemas at construction time, or the action should accept a schema-factory. + +### Proposed `Action` changes + +```python +class Action(Generic[InputT, OutputT, ChunkT]): + # All properties read-only (remove setters) + kind: ActionKind # read-only + name: str # read-only + input_schema: dict # read-only + output_schema: dict # read-only + + async def __call__( + self, + input: InputT | None = None, + *, + context: dict[str, object] | None = None, + ) -> ActionResponse[OutputT]: + """Primary execution method. Validates input, runs async.""" + + def stream( + self, + input: InputT | None = None, + *, + context: dict[str, object] | None = None, + timeout: float | None = None, + ) -> StreamResponse[OutputT, ChunkT]: + """Returns a StreamResponse (iterable + awaitable response).""" + + def __repr__(self) -> str: + return f"Action(kind={self.kind}, name={self.name!r})" +``` + +**Removed:** `run()` (delete — only 2 internal callsites in `_registry.py`, +rewrite to use `__call__` with sync-async bridging), `arun()` (replaced by +`__call__`), `arun_raw()` (merge validation into `__call__`), schema setters, +`on_chunk`/`on_trace_start` from public signatures. + +--- + +## 2. 
`ExecutablePrompt`

`blocks/prompt.py` — returned by `define_prompt()` and `prompt()`. The primary
way users work with prompts.

### Class shape

```
ExecutablePrompt(Generic[InputT, OutputT])
    ├── ref (property) → dict
    ├── __call__(input, opts: TypedDict | None) → GenerateResponseWrapper
    ├── stream(input, opts, *, timeout) → GenerateStreamResponse
    ├── render(input, opts) → GenerateActionOptions
    ├── as_tool() → Action
    └── _ensure_resolved() → None (lazy loading)
```

25 constructor params. The constructor stores every prompt option as an instance
field — model, config, system, prompt, messages, output_format,
output_content_type, output_instructions, output_schema, output_constrained,
max_turns, return_tool_requests, metadata, tools, tool_choice, use, docs,
resources, plus internal fields (_name, _ns, _prompt_action, _cache_prompt).

### JS comparison

JS `ExecutablePrompt` is an **interface**:

```typescript
interface ExecutablePrompt<I, O> {
  ref: { name: string; metadata?: Record<string, any> };
  (input?: I, opts?: PromptGenerateOptions): Promise<GenerateResponse<O>>;
  stream(input?: I, opts?: PromptGenerateOptions): GenerateStreamResponse<O>;
  render(input?: I, opts?: PromptGenerateOptions): Promise<GenerateOptions<O>>;
  asTool(): Promise<ToolAction>;
}
```

Same methods, same shape. The difference is in how `opts` works.

### `__call__()`

```python
# Today
async def __call__(
    self,
    input: InputT | None = None,
    opts: PromptGenerateOptions | None = None,
) -> GenerateResponseWrapper[OutputT]
```

**Issues:**
1. **`opts` is a TypedDict** — `PromptGenerateOptions` is a dict-like type.
   This kills IDE autocomplete. Users must know the TypedDict keys by heart.
   Compare with kwargs:
   ```python
   # TypedDict (today) — no autocomplete on keys
   await prompt(input, opts={'model': 'gemini-2.0-flash', 'config': {...}})

   # Kwargs (proposed) — full autocomplete
   await prompt(input, model='gemini-2.0-flash', config={...})
   ```
2. **JS does the same thing** — `opts` is an object parameter there too.
But + TypeScript has much better autocomplete for object literals. Python + TypedDicts don't get the same treatment from IDEs. + +### `stream()` + +```python +# Today +def stream( + self, + input: InputT | None = None, + opts: PromptGenerateOptions | None = None, + *, + timeout: float | None = None, +) -> GenerateStreamResponse[OutputT] +``` + +**Clean.** Returns `GenerateStreamResponse` (not a tuple). This is correct — +it's the one place streaming is done right. The irony is that +`Genkit.generate_stream()` doesn't use this type but `ExecutablePrompt.stream()` +does. + +### `render()` + +```python +# Today +async def render( + self, + input: InputT | dict[str, Any] | None = None, + opts: PromptGenerateOptions | None = None, +) -> GenerateActionOptions +``` + +**Issues:** +1. **220 lines of merging logic** — the method body is enormous. It merges + config, model, tools, output options, tool_choice, return_tool_requests, + max_turns, metadata, docs, resources, and messages from three sources + (prompt defaults, opts overrides, and input rendering). This is the + complexity center of the entire prompt system. +2. `input` accepts `InputT | dict[str, Any]` — mixed typing. Should be one + or the other. The method body has 4 branches to handle different input types + (None, dict, Pydantic v2, Pydantic v1, fallback cast). + +### `as_tool()` + +```python +# Today +async def as_tool(self) -> Action +``` + +**Clean.** Simple lookup. Minor naming difference from JS (`asTool`). + +### `_ensure_resolved()` + +```python +async def _ensure_resolved(self) -> None +``` + +**Issues:** +1. **Lazy loading that can fail** — if the prompt was created via `ai.prompt(name)`, + it's unresolved until first use. The first `__call__`, `stream`, `render`, or + `as_tool` triggers resolution. If the prompt file doesn't exist, the error + appears at call time, not at construction time. +2. **Copies all fields from resolved prompt** — 20 field assignments. 
If a new + field is added to `ExecutablePrompt`, someone must remember to add it here too. + This is fragile. + +### Proposed `ExecutablePrompt` changes + +```python +class Prompt(Generic[InputT, OutputT]): + """Renamed from ExecutablePrompt (shorter, clearer).""" + + @property + def ref(self) -> PromptRef: ... + + async def __call__( + self, + input: InputT | None = None, + *, + model: str | None = None, + config: dict | GenerationCommonConfig | None = None, + tools: list[str] | None = None, + tool_choice: ToolChoice | None = None, + return_tool_requests: bool | None = None, + max_turns: int | None = None, + context: dict[str, object] | None = None, + output_schema: type | None = None, + output_format: str | None = None, + docs: list[DocumentData] | None = None, + ) -> GenerateResponse[OutputT]: + """Execute the prompt. Flat kwargs instead of opts TypedDict.""" + + def stream( + self, + input: InputT | None = None, + *, + # same kwargs as __call__ + timeout: float | None = None, + ) -> GenerateStreamResponse[OutputT]: ... + + async def render( + self, + input: InputT | None = None, + *, + # same kwargs as __call__ + ) -> GenerateOptions: ... + + async def as_tool(self) -> Action: ... +``` + +**Key changes:** +- Rename `ExecutablePrompt` → `Prompt` (shorter). +- Replace `opts: TypedDict` with flat kwargs for IDE autocomplete. +- Simplify `render()` — extract merging logic into a shared helper. + +--- + +## 3. `GenerateResponseWrapper` + +`blocks/model.py` — the response users get from `generate()`. The thing they +interact with most after calling the model. 
+ +### Class shape + +``` +GenerateResponseWrapper(GenerateResponse, Generic[OutputT]) + ├── Private: _message_parser, _schema_type + ├── message: MessageWrapper | None + ├── text (cached_property) → str + ├── output (cached_property) → OutputT + ├── messages (cached_property) → list[Message] + ├── tool_requests (cached_property) → list[ToolRequestPart] + ├── interrupts (cached_property) → list[ToolRequestPart] + ├── assert_valid() → None (PLACEHOLDER) + └── assert_valid_schema() → None (PLACEHOLDER) +``` + +### JS comparison + +JS `GenerateResponse` has everything Python has, plus: + +| Property/Method | JS | Python | +|---|---|---| +| `text` | getter | cached_property | +| `output` | getter | cached_property | +| `reasoning` | getter | **missing** | +| `media` | getter | **missing** | +| `data` | getter | **missing** | +| `toolRequests` | getter | cached_property | +| `interrupts` | getter | cached_property | +| `messages` | getter | cached_property | +| `model` | field | **missing** | +| `raw` | field | **missing** | +| `assertValid()` | **implemented** | **placeholder (TODO)** | +| `assertValidSchema()` | **implemented** | **placeholder (TODO)** | +| `isValid()` | method | **missing** | +| `toJSON()` | method | Pydantic handles it | + +### `__init__()` + +```python +# Today +def __init__( + self, + response: GenerateResponse, + request: GenerateRequest, + message_parser: Callable[[MessageWrapper], object] | None = None, + schema_type: type[BaseModel] | None = None, +) -> None +``` + +**Issues:** +1. Wraps a `GenerateResponse` but copies all fields into `super().__init__()`. + Could just store the response and delegate. The copy-and-reconstruct + pattern is fragile — if `GenerateResponse` adds a field, this breaks. +2. `message_parser` and `schema_type` are internal — users never pass these. + They should be keyword-only or prefixed. 
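As a sketch of the store-and-delegate alternative (illustrative only — `RawResponse` and `WrappedResponse` are made-up stand-ins, not SDK types):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RawResponse:
    """Stand-in for the wire-format response object."""
    text: str = ''
    finish_reason: str = 'stop'
    extra: dict = field(default_factory=dict)


class WrappedResponse:
    """Wraps without copying: unknown attributes delegate to the raw response.

    If RawResponse grows a field tomorrow, the wrapper picks it up for free —
    no copy-and-reconstruct step to keep in sync.
    """

    def __init__(self, raw: RawResponse) -> None:
        self._raw = raw

    def __getattr__(self, name: str):
        # Invoked only when normal lookup fails, so wrapper-level
        # properties still take precedence over delegated fields.
        return getattr(self._raw, name)

    @property
    def shouted(self) -> str:
        return self._raw.text.upper()


resp = WrappedResponse(RawResponse(text='hello'))
print(resp.text)           # delegated
print(resp.finish_reason)  # delegated
print(resp.shouted)        # wrapper-level
```

With `__getattr__` delegation, a field added to the raw response type is reachable through the wrapper without any synchronization step.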
+ +### `assert_valid()` / `assert_valid_schema()` + +```python +def assert_valid(self) -> None: + # TODO(#4343): implement + pass + +def assert_valid_schema(self) -> None: + # TODO(#4343): implement + pass +``` + +**Issue:** Empty placeholders since initial implementation. JS has these +fully implemented — they check for empty responses, missing messages, +malformed content, and schema violations. These are important for production +use — without them, users can't validate responses programmatically. + +### `messages` + +```python +@cached_property +def messages(self) -> list[Message]: + if self.message is None: + return list(self.request.messages) if self.request else [] + return [ + *(self.request.messages if self.request else []), + self.message._original_message, # private field access! + ] +``` + +**Issue:** Accesses `self.message._original_message` (private field on +`MessageWrapper`). Should expose a public method on `MessageWrapper` for this, +like `.unwrap()` or `.to_message()`. + +### `output` + +```python +@cached_property +def output(self) -> OutputT: + if self._message_parser and self.message is not None: + parsed = self._message_parser(self.message) + else: + parsed = extract_json(self.text) + + if self._schema_type is not None and parsed is not None and isinstance(parsed, dict): + return cast(OutputT, self._schema_type.model_validate(parsed)) + + return cast(OutputT, parsed) +``` + +**Issue:** Falls back to `extract_json(self.text)` when no parser is set. This +regex-based JSON extraction is fragile — it scans the text for `{...}` or +`[...]`. If the model returns markdown with JSON in a code fence, this might +extract it or might not. JS has the same pattern, so this is cross-language +consistent at least. 
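To make the fragility concrete, here is a minimal fence-aware extractor — an illustrative sketch, not the SDK's `extract_json`:

```python
import json
import re


def extract_json_lenient(text: str):
    """Try a ```json fence first, then fall back to the first {...} span.

    Illustrative sketch; the SDK's extract_json may behave differently.
    """
    # 1. Prefer an explicit fenced block.
    fence = re.search(r'```(?:json)?\s*(.*?)```', text, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass
    # 2. Fall back to the outermost brace span.
    start, end = text.find('{'), text.rfind('}')
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None


markdown = 'Here you go:\n```json\n{"title": "Dune"}\n```\nEnjoy!'
print(extract_json_lenient(markdown))  # {'title': 'Dune'}
```

Whether the real helper prefers the fenced block or the first brace span is exactly the kind of behavior worth pinning down with tests before beta.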
+ +### Proposed `GenerateResponseWrapper` changes + +```python +class GenerateResponse(Generic[OutputT]): + """Rename from GenerateResponseWrapper (drop 'Wrapper' suffix).""" + + # Existing + text: str # property + output: OutputT # property + messages: list[Message] # property + tool_requests: list[ToolRequestPart] # property + interrupts: list[ToolRequestPart] # property + + # Add (parity with JS) + reasoning: str # for chain-of-thought models + media: MediaPart | None # first media part + data: OutputT | None # first data part + model: str | None # which model generated this + + # Implement + def assert_valid(self) -> None: ... # actually check response + def assert_valid_schema(self) -> None: ... # actually check schema + def is_valid(self) -> bool: ... # non-throwing version +``` + +--- + +## 4. `Document` + +`blocks/document.py` — used by every retrieval, embedding, and reranking +operation. ~25 files reference it. + +### Class shape + +``` +Document(DocumentData) + ├── text() → str (METHOD, not property!) 
+ ├── media() → list[Media] + ├── data() → str + ├── data_type() → str | None + ├── get_embedding_documents(embeddings) → list[Document] + ├── from_document_data(data) → Document (static) + ├── from_text(text, metadata) → Document (static) + ├── from_media(url, content_type, meta) → Document (static) + └── from_data(data, data_type, metadata) → Document (static) +``` + +### JS comparison + +| Member | JS | Python | +|---|---|---| +| `text` | **getter** (property) | **method** `text()` | +| `media` | **getter** (property) | **method** `media()` | +| `data` | **getter** (property) | **method** `data()` | +| `dataType` | **getter** (property) | **method** `data_type()` | +| `toJSON()` | method | (Pydantic handles) | +| `getEmbeddingDocuments()` | method | method | +| `fromText()` | static | static | +| `fromMedia()` | static | static | +| `fromData()` | static | static | + +### `text()` — method vs property + +```python +# Today (Python) +def text(self) -> str: + texts = [] + for p in self.content: + part = p.root if hasattr(p, 'root') else p + text_val = getattr(part, 'text', None) + if isinstance(text_val, str): + texts.append(text_val) + return ''.join(texts) +``` + +```typescript +// JS — property +get text(): string { + return this.content.map((part) => part.text || '').join(''); +} +``` + +**Issues:** +1. **The single most confusing inconsistency in the SDK.** `MessageWrapper.text` + is a property. `GenerateResponseWrapper.text` is a property. + `GenerateResponseChunkWrapper.text` is a property. `Document.text()` is a + method. Users will write `doc.text` (no parens) and get a bound method + reference instead of a string. No error, no warning, just silent bugs. +2. **Breaking change to fix** — this is a public API. Changing from method to + property will break every call site that uses `doc.text()`. But the + inconsistency is worse than the break. +3. Same issue applies to `media()`, `data()`, `data_type()`. 
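The silent-bug claim is easy to reproduce with a stand-in class (hypothetical, to isolate the failure mode — not the real `Document`):

```python
class DocLikeMethod:
    """Stand-in mimicking today's Document: text is a method."""
    def text(self) -> str:
        return 'hello'


class DocLikeProperty:
    """Stand-in for the proposed shape: text is a property."""
    @property
    def text(self) -> str:
        return 'hello'


doc = DocLikeMethod()
# Forgetting the parens yields a bound method, not a string — and a bound
# method is always truthy, so guards like `if doc.text:` silently pass.
print(callable(doc.text), bool(doc.text))   # True True
print(DocLikeProperty().text == 'hello')    # True
```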
+ +### `data()` — calls `text()` twice + +```python +def data(self) -> str: + if self.text(): # first call — scans all content + return self.text() # second call — scans all content again + if self.media(): + return self.media()[0].url + return '' +``` + +**Issue:** Scans content twice. Should cache or store the result. Not a +correctness bug but wasteful. Same issue with `data_type()` calling `text()` +and `media()` again. + +### Constructor — deep copies + +```python +def __init__( + self, + content: list[DocumentPart], + metadata: dict[str, Any] | None = None, +) -> None: + doc_content = deepcopy(content) + doc_metadata = deepcopy(metadata) + super().__init__(content=doc_content, metadata=doc_metadata) +``` + +**Issue:** Always deep-copies content and metadata. JS does the same, so +this is cross-language consistent. But in Python, `deepcopy` on Pydantic +models is expensive. For large documents (e.g., embedding pipelines with +thousands of documents), this could be a performance bottleneck. + +### Proposed `Document` changes + +```python +class Document(DocumentData): + @property + def text(self) -> str: ... # Change to property + + @property + def media(self) -> list[Media]: ... # Change to property + + @property + def data(self) -> str: ... # Change to property + + @property + def data_type(self) -> str | None: ... # Change to property + + # Static factories stay the same + @staticmethod + def from_text(text: str, metadata: dict | None = None) -> Document: ... + @staticmethod + def from_media(url: str, content_type: str | None = None, ...) -> Document: ... + @staticmethod + def from_data(data: str, data_type: str | None = None, ...) -> Document: ... +``` + +**Key changes:** +- All accessors become `@property` (or `@cached_property` for perf). +- Breaking change for `text()`, `media()`, `data()`, `data_type()` call sites. +- Consider lazy `@cached_property` to avoid scanning content multiple times. + +--- + +## 5. 
`GenerateStreamResponse`
+
+**File:** `blocks/prompt.py` (lines 414–539)
+**Base class:** `Generic[OutputT]`
+
+### Class shape
+
+```python
+class GenerateStreamResponse(Generic[OutputT]):
+    _channel: Channel[GenerateResponseChunkWrapper, GenerateResponseWrapper[OutputT]]
+    _response_future: asyncio.Future[GenerateResponseWrapper[OutputT]]
+
+    @property stream -> AsyncIterable[GenerateResponseChunkWrapper]
+    @property response -> Awaitable[GenerateResponseWrapper[OutputT]]
+```
+
+Two properties, two private fields. That's the entire class.
+
+### JS comparison
+
+```typescript
+// js/ai/src/generate.ts
+export interface GenerateStreamResponse<O> {
+  get stream(): AsyncIterable<GenerateResponseChunk>;
+  get response(): Promise<GenerateResponse<O>>;
+}
+```
+
+JS has the identical interface — `stream` + `response`. But critically, JS uses
+this type everywhere: both `generateStream()` and `prompt.stream()` return it.
+
+### Go comparison
+
+Go has no wrapper class. `GenerateStream()` returns `iter.Seq2[*ModelStreamValue, error]`
+— a native Go iterator. Each yielded `ModelStreamValue` has either `.Chunk` (streaming)
+or `.Done == true` with `.Response` (final). Go-idiomatic, no need for a wrapper.
+
+### Issue 1: Not used by `Genkit.generate_stream()`
+
+This is the biggest problem. The main streaming entry point returns a raw tuple:
+
+```python
+# ai/_aio.py
+def generate_stream(self, ...) -> tuple[
+    AsyncIterator[GenerateResponseChunkWrapper],
+    asyncio.Future[GenerateResponseWrapper[Any]],
+]:
+```
+
+But `ExecutablePrompt.stream()` returns `GenerateStreamResponse`. 
So there are +two inconsistent streaming APIs in the same SDK: + +```python +# Prompt streaming — nice wrapper +result = prompt.stream({"topic": "AI"}) +async for chunk in result.stream: + print(chunk.text) +final = await result.response + +# Genkit.generate_stream() — raw tuple +stream, future = ai.generate_stream(prompt="hello") +async for chunk in stream: + print(chunk.text) +final = await future +``` + +JS doesn't have this split — both paths return `GenerateStreamResponse`. + +### Issue 2: Not directly iterable + +You can't do `async for chunk in result:` — you must access `.stream` first. +Python convention for iterable wrappers is to implement `__aiter__`: + +```python +# Current — requires .stream access +async for chunk in result.stream: + print(chunk.text) + +# Expected Pythonic pattern +async for chunk in result: + print(chunk.text) +``` + +JS has the same `.stream` access pattern, but Python's `async for` protocol +makes direct iteration a stronger convention. + +### Issue 3: Lives in wrong module + +Defined in `blocks/prompt.py` even though it's a general streaming response type. +It's not prompt-specific — `Genkit.generate_stream()` should use it too. +Should live in `blocks/generate.py` or `blocks/model.py`. + +### Issue 4: No `__await__` + +You can't `await` the response directly on the object: + +```python +# Current — must access .response +final = await result.response + +# Could support direct await +final = await result +``` + +This is a minor convenience but makes the object more Pythonic. + +### Issue 5: No `__repr__` + +`repr(result)` gives ``. Should show +useful state (e.g., whether stream is consumed, whether response is resolved). + +### Proposed `GenerateStreamResponse` changes + +1. **Wire into `Genkit.generate_stream()`** — return `GenerateStreamResponse` + instead of raw tuple. This is the highest-priority fix. One streaming API, + not two. + +2. 
**Add `__aiter__`** — delegate to `self._channel` so `async for chunk in result:`
+   works directly.
+
+3. **Add `__await__`** — delegate to `self._response_future` so `final = await result`
+   works as a shortcut for `await result.response`.
+
+4. **Move to `blocks/generate.py`** or a shared module — it's not prompt-specific.
+
+5. **Rename to `StreamResponse`** — shorter, matches the pattern of removing
+   redundant prefixes (`GenerateResponseWrapper` → `GenerateResponse`).
+
+6. **Add `__repr__`** — show stream/response state.
+
+After these changes:
+
+```python
+# Unified streaming API
+result = ai.generate_stream(prompt="hello")
+# OR
+result = prompt.stream({"topic": "AI"})
+
+# Direct iteration (no .stream needed)
+async for chunk in result:
+    print(chunk.text)
+
+# Direct await (no .response needed)
+final = await result
+
+# .stream and .response still work for explicit access
+async for chunk in result.stream:
+    print(chunk.text)
+final = await result.response
+```
+
+---
+
+## 6. `ToolInterruptError`
+
+**File:** `blocks/tools.py` (lines 172–188)
+**Base class:** `Exception`
+
+### Class shape
+
+```python
+class ToolInterruptError(Exception):
+    metadata: dict[str, Any]
+
+    def __init__(self, metadata: dict[str, Any] | None = None) -> None:
+        super().__init__()
+        self.metadata = metadata or {}
+```
+
+One field, one constructor. No methods, no `__str__`, no `__repr__`.
+
+### JS comparison
+
+```typescript
+// js/ai/src/tool.ts
+export class ToolInterruptError extends Error {
+  constructor(readonly metadata?: Record<string, any>) {
+    super();
+    this.name = 'ToolInterruptError';
+  }
+}
+```
+
+Same shape but sets `this.name`. Both extend base error (not framework error).
+JS comment: "It's meant to be caught by the framework, not public API."
+ +### Go comparison + +```go +// go/ai/tools.go (unexported) +type toolInterruptError struct { + Metadata map[string]any +} + +func (e *toolInterruptError) Error() string { + if e.Metadata != nil { + data, _ := json.MarshalIndent(e.Metadata, "", " ") + return fmt.Sprintf("tool execution interrupted: \n\n%s", string(data)) + } + return "tool execution interrupted" +} + +func IsToolInterruptError(err error) (bool, map[string]any) { ... } +``` + +Go is the best here: unexported type (can't be constructed by users), +public `IsToolInterruptError()` helper for checking, and a useful `Error()` string. + +### Issue 1: Extends `Exception` not `GenkitError` + +The TODO at line 171 says it all: + +```python +# TODO(#4346): make this extend GenkitError once it has INTERRUPTED status +``` + +This means `except GenkitError` won't catch tool interrupts. Users who write +broad Genkit error handlers will miss these. Blocked on adding an `INTERRUPTED` +status code to `StatusCodes`. + +### Issue 2: No error message + +```python +err = ToolInterruptError(metadata={"step": "confirm"}) +str(err) # => '' +repr(err) # => 'ToolInterruptError()' +``` + +Compare Go: `"tool execution interrupted: \n\n{\"step\": \"confirm\"}"` — actually +useful in logs. Python's version is silent, which makes debugging painful. + +### Issue 3: `metadata` should be keyword-only + +```python +# Currently allows positional +ToolInterruptError({"key": "val"}) + +# Should require keyword +ToolInterruptError(metadata={"key": "val"}) +``` + +All other error constructors in the SDK are being moved to keyword-only. +This should follow. + +### Issue 4: No `__repr__` + +As noted above, `repr()` is useless. Should show metadata. + +### Issue 5: Mutable default via `or {}` + +```python +self.metadata = metadata or {} +``` + +This creates a new dict each time (which is fine), but the pattern is inconsistent +with the rest of the codebase which uses `field(default_factory=dict)` for dataclasses +or explicit `None` checks. 
Minor. + +### Proposed `ToolInterruptError` changes + +1. **Extend `GenkitError`** once `StatusCodes.INTERRUPTED` exists (unblock #4346). + This gives: status code, serialization, cause chaining for free. + +2. **Add `__str__`** — `"tool execution interrupted"` + metadata dump (match Go). + +3. **Add `__repr__`** — `ToolInterruptError(metadata={'step': 'confirm'})`. + +4. **Make `metadata` keyword-only**: + ```python + def __init__(self, *, metadata: dict[str, Any] | None = None) -> None: + ``` + +5. **Consider Go pattern** — make the class private (`_ToolInterruptError`) with + a public `is_tool_interrupt(err)` helper, since the JS comment says "not public + API." Python can't fully hide it (users need `except ToolInterruptError`), but + the Go pattern is worth noting. + +After these changes: + +```python +class ToolInterruptError(GenkitError): + def __init__(self, *, metadata: dict[str, Any] | None = None) -> None: + super().__init__(status=StatusCodes.INTERRUPTED, message="tool execution interrupted") + self.metadata: dict[str, Any] = metadata or {} + + def __str__(self) -> str: + if self.metadata: + return f"tool execution interrupted: {json.dumps(self.metadata, indent=2)}" + return "tool execution interrupted" + + def __repr__(self) -> str: + return f"ToolInterruptError(metadata={self.metadata!r})" +``` + +--- + +## Summary of all issues + +### High priority (user-facing, correctness, or API consistency) + +| Class | Issue | Effort | +|---|---|---| +| `Action` | `stream()` returns tuple instead of iterable object | medium | +| `Action` | No `__call__` — can't do `await action(input)` | low | +| `Action` | `on_chunk` callback on public API — Python uses `async for` not callbacks | medium | +| `Action` | `arun()` vs `arun_raw()` confusing, inconsistent naming | medium | +| `Action` | Mutable `input_schema`/`output_schema` setters | low | +| `ExecutablePrompt` | `opts: TypedDict` kills autocomplete | medium | +| `ExecutablePrompt` | `render()` is 220 lines of 
merging | refactor | +| `GenerateResponseWrapper` | `assert_valid()`/`assert_valid_schema()` empty | medium | +| `GenerateResponseWrapper` | Missing `reasoning`, `media`, `data` | low | +| `GenerateResponseWrapper` | Missing `model` field | low | +| `GenerateStreamResponse` | Not used by `Genkit.generate_stream()` — two streaming APIs | medium | +| `GenerateStreamResponse` | Not directly iterable (no `__aiter__`) | low | +| `ToolInterruptError` | Extends `Exception` not `GenkitError` — blocked on #4346 | medium | +| `Document` | `text()` is method, not property — inconsistent | **breaking** | + +### Medium priority (engineering quality) + +| Class | Issue | Effort | +|---|---|---| +| `Action` | No `__repr__` | low | +| `Action` | `_telemetry_labels` inconsistent underscore | low | +| `Action` | `on_trace_start` Dev UI plumbing leaked into public API | low | +| `ExecutablePrompt` | 25 constructor params | refactor | +| `ExecutablePrompt` | `_ensure_resolved()` copies 20 fields — fragile | refactor | +| `GenerateResponseWrapper` | Accesses `message._original_message` | low | +| `GenerateResponseWrapper` | Constructor copies fields from response — fragile | refactor | +| `GenerateStreamResponse` | Lives in `blocks/prompt.py` — not prompt-specific | low | +| `ToolInterruptError` | No `__str__` — empty string in logs | low | +| `ToolInterruptError` | `metadata` should be keyword-only | low | +| `Document` | `data()` calls `text()` twice | low | +| `Document` | `deepcopy` on every construction — perf risk | low | + +### Low priority (nice to have) + +| Class | Issue | Effort | +|---|---|---| +| `Action` | `run()` sync method — remove entirely (2 internal callsites) | low | +| `ExecutablePrompt` | Rename to `Prompt` | **breaking** | +| `GenerateResponseWrapper` | Rename to `GenerateResponse` | **breaking** | +| `GenerateResponseWrapper` | Add `is_valid()` non-throwing check | low | +| `GenerateStreamResponse` | No `__await__` for direct `await result` | low | +| 
`GenerateStreamResponse` | Rename to `StreamResponse` | **breaking** |
+| `ToolInterruptError` | No `__repr__` | low |
diff --git a/py/docs/streaming.md b/py/docs/streaming.md
new file mode 100644
index 0000000000..335b6e48df
--- /dev/null
+++ b/py/docs/streaming.md
@@ -0,0 +1,220 @@
+# Python Streaming Design
+
+> **Status**: design proposal — see pre-review action items at the bottom for gaps between this design and the current implementation.
+
+---
+
+## Model
+
+Go and JS expose streaming as a single iterator that interleaves chunks and the final response. Python diverges deliberately: every streaming call returns a **two-channel wrapper object** with separate properties for chunks and the final response.
+
+```python
+result = flow.stream(input)
+
+async for chunk in result.stream:   # AsyncIterable[ChunkT]
+    print(chunk)
+
+response = await result.response    # Awaitable[OutputT]
+```
+
+This avoids the awkward "last item is the response" sentinel pattern that Go uses (`iter.Seq2[*StreamingFlowValue[S, O], error]`) and lets callers consume the stream and the response independently — e.g. start displaying chunks while also `await`-ing the final value in a separate task.
+
+---
+
+## Type hierarchy
+
+Three concrete wrapper classes, one inheritance chain:
+
+```
+ActionStreamResponse[ChunkT, OutputT]          ← base (action.stream())
+  └── FlowStreamResponse[ChunkT, OutputT]      ← flow.stream()
+      └── GenerateStreamResponse[OutputT]      ← generate_stream(), prompt.stream()
+          ChunkT pinned to GenerateResponseChunk
+          OutputT wrapped in GenerateResponse[OutputT]
+```
+
+```python
+from typing import Generic, AsyncIterable, Awaitable, TypeVar
+ChunkT = TypeVar('ChunkT')
+OutputT = TypeVar('OutputT')
+
+class ActionStreamResponse(Generic[ChunkT, OutputT]):
+    @property
+    def stream(self) -> AsyncIterable[ChunkT]: ...
+    @property
+    def response(self) -> Awaitable[OutputT]: ...
+ +class FlowStreamResponse(ActionStreamResponse[ChunkT, OutputT]): + pass # same interface, narrows the source + +class GenerateStreamResponse(FlowStreamResponse[GenerateResponseChunk, GenerateResponse[OutputT]]): + # ChunkT is pinned — generate always emits GenerateResponseChunk + # OutputT is the user's schema type (e.g. MyModel), wrapped in GenerateResponse + pass +``` + +`GenerateStreamResponse[OutputT]` is effectively `FlowStreamResponse[GenerateResponseChunk, GenerateResponse[OutputT]]` with the chunk type fixed. This lets callers write `async for chunk in result.stream` and get `GenerateResponseChunk` objects with `.text`, `.index`, etc. without needing to annotate the type themselves. + +--- + +## Surfaces + +### `action.stream()` + +```python +action.stream( + input: InputT | None = None, + *, + context: dict[str, object] | None = None, + telemetry_labels: dict[str, object] | None = None, + timeout: float | None = None, +) -> ActionStreamResponse[ChunkT, OutputT] +``` + +```python +result = my_action.stream(input_data) +async for chunk in result.stream: + print(chunk) +output = await result.response +``` + +### `flow.stream()` + +```python +flow.stream( + input: InputT | None = None, + *, + context: dict[str, object] | None = None, + timeout: float | None = None, +) -> FlowStreamResponse[ChunkT, OutputT] +``` + +```python +result = my_flow.stream({"query": "hello"}) +async for chunk in result.stream: + print(chunk) +final = await result.response +``` + +### `generate_stream()` + +```python +# 4 overloads — see python_beta_api_proposal.md §2 for full signatures +def generate_stream( + self, + *, + model: ModelReference[C] | str | None = None, + output_schema: type[OutputT] | dict[str, object] | None = None, + ... 
+) -> GenerateStreamResponse[OutputT] +``` + +```python +result = ai.generate_stream( + model=gemini_flash, + prompt="Tell me a story", + output_schema=StorySchema, +) +async for chunk in result.stream: + print(chunk.text, end="", flush=True) +story: GenerateResponse[StorySchema] = await result.response +print(story.output.title) +``` + +### `prompt.stream()` + +```python +# On ExecutablePrompt[InputT, OutputT] +def stream( + self, + input: InputT | None = None, + *, + timeout: float | None = None, +) -> GenerateStreamResponse[OutputT] +``` + +```python +result = my_prompt.stream({"topic": "space"}) +async for chunk in result.stream: + print(chunk.text, end="") +response = await result.response +``` + +--- + +## Internal: `Channel[T]` + +All streaming wrappers are backed by a `Channel[T]` — a thin async queue that bridges the producer (action implementation) and consumer (caller). + +```python +class Channel(Generic[T]): + async def send(self, chunk: T) -> None: ... # producer pushes a chunk + def close(self) -> None: ... # producer signals completion + def set_response(self, value: Any) -> None: ... # producer delivers final result + def __aiter__(self) -> AsyncIterator[T]: ... # consumer iterates chunks +``` + +**Key invariants**: +- `None` is the sentinel that signals the iterator to stop — chunk types must not be `None` (use `Optional`-wrapped types if needed). +- The response future is separate from the chunk channel — `await result.response` never needs to drain the stream first. +- `_pop()` must use `if r is None` (not `if not r`) — otherwise falsy chunks (empty string `""`, `0`, `False`) incorrectly terminate iteration. *(Pre-review action item — current code uses `if not r`.)* + +**Current implementation** (`genkit.aio.channel`): `Channel` is typed as `Generic[T, R]` with a second type parameter `R` for the close-result type. The design simplifies this to `Generic[T]` — the close-result type adds coupling without benefit. 
The response is a separate `asyncio.Future` on the wrapper object, not baked into the channel.
+
+---
+
+## Producer interface
+
+Action, flow, and model implementations emit chunks through `ActionRunContext[ChunkT]`, passed as the second argument to the action function:
+
+```python
+@ai.flow()
+async def my_flow(input: str, ctx: ActionRunContext[str]) -> str:
+    for word in input.split():
+        ctx.send_chunk(word)  # type-safe: ChunkT is str
+    return input
+
+ctx.is_streaming               # bool — False means caller didn't request a stream; send_chunk is a no-op
+ctx.send_chunk(chunk: ChunkT)  # pushes chunk to consumer; no-op if not streaming
+ctx.context                    # dict[str, object] — request-scoped metadata
+```
+
+**Cross-language comparison**:
+
+| | Producer interface | Notes |
+|---|---|---|
+| **Go** | `StreamCallback[Stream]` callback param (nil if not streaming) | Caller checks nil before calling |
+| **JS** | `ActionFnArg` + `FlowSideChannel` — two separate types | Flows and actions have different producer objects |
+| **Python** | `ActionRunContext[ChunkT]` — unified | Single class for actions, flows, and models; `is_streaming` replaces nil check |
+
+**`ToolRunContext`**: Tools do not define their own chunk schema — they borrow the parent `generate()` call's callback. Therefore `ToolRunContext` is `ActionRunContext[object]` (ChunkT = `object`, explicitly untyped), matching JS's `ToolAction` which hardcodes the streaming type as `z.ZodTypeAny`.
+
+---
+
+## Transport layer
+
+The reflection server (Dev UI ↔ Python runtime) uses **Server-Sent Events (SSE)** to forward chunks over HTTP. This is an implementation detail — it does not affect the consumer API. The `Channel` is the in-process abstraction; SSE is how it crosses the wire to the Dev UI during local development.
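The two-channel design can be sketched end-to-end with nothing but `asyncio` primitives (a minimal `MiniStreamResponse` stand-in, not `genkit.aio.Channel`):

```python
import asyncio


class MiniStreamResponse:
    """Two-channel wrapper sketch: a chunk queue plus a separate response
    future. Illustrates the invariants above; not the real Channel."""

    _DONE = object()  # dedicated sentinel, so falsy chunks ('' / 0) survive

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._response: asyncio.Future = asyncio.Future()

    # Producer side ----------------------------------------------------
    def send(self, chunk) -> None:
        self._queue.put_nowait(chunk)

    def close(self, response) -> None:
        self._queue.put_nowait(self._DONE)
        self._response.set_result(response)

    # Consumer side ----------------------------------------------------
    @property
    def stream(self):
        return self._iterate()

    async def _iterate(self):
        while True:
            item = await self._queue.get()
            if item is self._DONE:
                return
            yield item

    @property
    def response(self):
        return self._response


async def main():
    result = MiniStreamResponse()

    async def producer() -> None:
        for word in ('hello', '', 'world'):  # '' is falsy but must flow through
            result.send(word)
        result.close(response='hello world')

    asyncio.ensure_future(producer())
    chunks = [c async for c in result.stream]
    final = await result.response  # independent of draining the stream
    return chunks, final


chunks, final = asyncio.run(main())
print(chunks, final)  # ['hello', '', 'world'] hello world
```

Note the dedicated `_DONE` sentinel: it sidesteps the falsy-chunk pitfall entirely, at the cost of diverging from the `None` sentinel the current `Channel` uses.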
+ +--- + +## Cross-language comparison + +| Surface | Go | JS | Python | +|---|---|---|---| +| **action.stream()** | `action.Stream(ctx, input, cb)` — `cb StreamCallback[S]` | `action.stream(input)` → `ActionStreamResponse` | `action.stream(input)` → `ActionStreamResponse[ChunkT, OutputT]` | +| **flow.stream()** | `flow.Stream(ctx, input)` → `iter.Seq2[*StreamingFlowValue[S,O], error]` | `flow.stream(input)` → `FlowStreamResponse` | `flow.stream(input)` → `FlowStreamResponse[ChunkT, OutputT]` | +| **generate_stream()** | `genkit.GenerateStream(ctx, req)` → `iter.Seq2[*GenerateResponseChunk, error]` | `ai.generateStream(opts)` → `GenerateStreamResponse` | `ai.generate_stream(...)` → `GenerateStreamResponse[OutputT]` | +| **prompt.stream()** | `prompt.Stream(ctx, input)` | `prompt.stream(input)` → `GenerateStreamResponse` | `prompt.stream(input)` → `GenerateStreamResponse[OutputT]` | +| **chat.stream()** | n/a | `chat.sendStream(input)` → `GenerateStreamResponse` | not yet implemented | +| **Chunk/response split** | Single iterator, last value is response | Two-channel wrapper object | Two-channel wrapper object | +| **Producer** | `StreamCallback[S]` func param | `ActionFnArg` / `FlowSideChannel` | `ActionRunContext[ChunkT]` | + +--- + +## What's not implemented yet + +- **`Chat.send_stream()`** — no streaming equivalent for `chat.send()`. +- **`Action.stream()`** — currently returns a raw tuple `(AsyncIterator, Awaitable)`, not `ActionStreamResponse`. Needs to be updated to return the wrapper. +- **`FlowWrapper.stream()`** — same: currently returns raw tuple. Needs to return `FlowStreamResponse[ChunkT, OutputT]`. +- **`Channel` cleanup** — needs two fixes: simplify to `Generic[T]` (drop `R`), and fix `_pop()` falsy sentinel check. +- **`ActionRunContext` generics** — currently `send_chunk(chunk: object)`. Needs to become `ActionRunContext[ChunkT]` with `send_chunk(chunk: ChunkT)` for type safety.