From 2e3e763baad34bb57eaf99b3d5878712ec376d49 Mon Sep 17 00:00:00 2001
From: Eduard van Valkenburg <github@vanvalkenburg.eu>
Date: Mon, 2 Feb 2026 10:22:20 +0100
Subject: [PATCH 01/19] Add ADR for Python ContextMiddleware unification

---
 .../00XX-python-context-middleware.md         | 1693 +++++++++++++++++
 1 file changed, 1693 insertions(+)
 create mode 100644 docs/decisions/00XX-python-context-middleware.md

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
new file mode 100644
index 0000000000..97f6e5e3a4
--- /dev/null
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -0,0 +1,1693 @@
+---
+# These are optional elements. Feel free to remove any of them.
+status: proposed
+contact: eavanvalkenburg
+date: 2026-02-02
+deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon
+consulted: taochenosu, moonbox3, dmytrostruk, giles17
+---
+
+# Unifying Context Management with ContextMiddleware
+
+## Context and Problem Statement
+
+The Agent Framework Python SDK currently has multiple abstractions for managing conversation context:
+
+| Concept | Purpose | Location |
+|---------|---------|----------|
+| `ContextProvider` | Injects instructions, messages, and tools before/after invocations | `_memory.py` |
+| `ChatMessageStore` / `ChatMessageStoreProtocol` | Stores and retrieves conversation history | `_threads.py` |
+| `AgentThread` | Manages conversation state and coordinates storage | `_threads.py` |
+
+This creates cognitive overhead for developers doing "Context Engineering" - the practice of dynamically managing what context (history, RAG results, instructions, tools) is sent to the model. Users must understand:
+- When to use `ContextProvider` vs `ChatMessageStore`
+- How `AgentThread` coordinates between them
+- Different lifecycle hooks (`invoking()`, `invoked()`, `thread_created()`)
+
+**How can we simplify context management into a single, composable pattern that handles all context-related concerns?**
+
+## Decision Drivers
+
+- **Simplicity**: Reduce the number of concepts users must learn
+- **Composability**: Enable multiple context sources to be combined flexibly
+- **Consistency**: Follow existing patterns in the framework (middleware)
+- **Flexibility**: Support both stateless and session-specific middleware
+- **Attribution**: Enable tracking which middleware added which messages/tools
+- **Zero-config**: Simple use cases should work without configuration
+
+## Current State Analysis
+
+### ContextProvider (Current)
+
+```python
+class ContextProvider(ABC):
+    async def thread_created(self, thread_id: str | None) -> None:
+        """Called when a new thread is created."""
+        pass
+
+    async def invoked(
+        self,
+        request_messages: ChatMessage | Sequence[ChatMessage],
+        response_messages: ChatMessage | Sequence[ChatMessage] | None = None,
+        invoke_exception: Exception | None = None,
+        **kwargs: Any,
+    ) -> None:
+        """Called after the agent receives a response."""
+        pass
+
+    @abstractmethod
+    async def invoking(self, messages: ChatMessage | MutableSequence[ChatMessage], **kwargs: Any) -> Context:
+        """Called before model invocation. Returns Context with instructions, messages, tools."""
+        pass
+```
+
+**Limitations:**
+- Separate `invoking()` and `invoked()` methods make pre/post processing awkward
+- Returns a `Context` object that must be merged externally
+- No clear way to compose multiple providers
+- No source attribution for debugging
+
+### ChatMessageStore (Current)
+
+```python
+class ChatMessageStoreProtocol(Protocol):
+    async def list_messages(self) -> list[ChatMessage]: ...
+    async def add_messages(self, messages: Sequence[ChatMessage]) -> None: ...
+    async def serialize(self, **kwargs: Any) -> dict[str, Any]: ...
+    @classmethod
+    async def deserialize(cls, state: MutableMapping[str, Any], **kwargs: Any) -> "ChatMessageStoreProtocol": ...
+```
+
+**Limitations:**
+- Only handles storage, no context injection
+- Separate concept from `ContextProvider`
+- No control over what gets stored (RAG context vs user messages)
+
+### AgentThread (Current)
+
+```python
+class AgentThread:
+    def __init__(
+        self,
+        *,
+        service_thread_id: str | None = None,
+        message_store: ChatMessageStoreProtocol | None = None,
+        context_provider: ContextProvider | None = None,
+    ) -> None: ...
+```
+
+**Limitations:**
+- Coordinates storage and context separately
+- Only one `context_provider` (no composition)
+- Naming confusion (`Thread` vs `Session`)
+
+## Design Decisions Summary
+
+The following key decisions shape the ContextMiddleware design:
+
+| # | Decision | Rationale |
+|---|----------|-----------|
+| 1 | **Agent vs Session Ownership** | Agent owns middleware config; Session owns resolved pipeline. Enables per-session factories. |
+| 2 | **Instance or Factory** | Middleware can be shared instances or `(session_id) -> Middleware` factories for per-session state. |
+| 3 | **Default Storage at Runtime** | `InMemoryStorageMiddleware` is added automatically when there is no `service_session_id`, `options.store` is not `True`, and the pipeline is empty. Evaluated at runtime so users can modify the pipeline first. |
+| 4 | **Multiple Storage Allowed** | Warn if multiple have `load_messages=True` (likely misconfiguration). |
+| 5 | **Single Storage Class** | One `StorageContextMiddleware` configured for memory/audit/evaluation - no separate classes. |
+| 6 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
+| 7 | **Smart Load Behavior** | `load_messages=None` (default) disables loading when `options.store=False` OR `service_session_id` present. |
+| 8 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. |
+| 9 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other middleware. |
+| 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
+| 11 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
+| 12 | **Middleware Ordering** | User-defined order; storage sees prior middleware (pre-processing) or all middleware (post-processing). |
+
+## Considered Options
+
+### Option 1: Status Quo - Keep Separate Abstractions
+
+Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concepts.
+
+**Pros:**
+- No migration required
+- Familiar to existing users
+
+**Cons:**
+- Cognitive overhead remains
+- No composability for context providers
+- Inconsistent with middleware pattern used elsewhere
+
+### Option 2: ContextMiddleware (Chosen)
+
+Create a unified `ContextMiddleware` that uses the onion/wrapper pattern (like existing `AgentMiddleware`, `ChatMiddleware`) to handle all context-related concerns.
+
+**Pros:**
+- Single concept for all context engineering
+- Familiar pattern from other middleware in the framework
+- Natural composition via pipeline
+- Pre/post processing in one method
+- Source attribution built-in
+
+**Cons:**
+- Breaking change (acceptable in preview)
+- Migration effort for existing users
+
+## Decision Outcome
+
+Chosen option: **"Option 2: ContextMiddleware"**, because it replaces three separate abstractions with a single one, follows the middleware pattern already used in the framework, and lets multiple context sources be composed in one pipeline.
+
+### Key Design Decisions
+
+#### 1. Onion/Wrapper Pattern
+
+Like other middleware in the framework, `ContextMiddleware` uses `process(context, next)`:
+
+```python
+class ContextMiddleware(ABC):
+    def __init__(self, source_id: str, *, session_id: str | None = None):
+        self.source_id = source_id
+        self.session_id = session_id
+
+    @abstractmethod
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        """Wrap the context flow - modify before next(), process after."""
+        pass
+```
+
+**Comparison to Current:**
+| Aspect | ContextProvider (Current) | ContextMiddleware (New) |
+|--------|--------------------------|------------------------|
+| Pre-processing | `invoking()` method | Before `await next(context)` |
+| Post-processing | `invoked()` method | After `await next(context)` |
+| Composition | Single provider only | Pipeline of middleware |
+| Pattern | Callback hooks | Onion/wrapper |
+
+#### 2. Agent vs Session Ownership
+
+- **Agent** owns `Sequence[ContextMiddlewareConfig]` (instances or factories)
+- **AgentSession** owns `ContextMiddlewarePipeline` (resolved at runtime)
+
+```python
+# Agent holds middleware configuration
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),
+        RAGContextMiddleware("rag"),
+    ]
+)
+
+# Session holds the resolved pipeline
+session = agent.get_new_session()
+```
+
+**Comparison to Current:**
+| Aspect | AgentThread (Current) | AgentSession (New) |
+|--------|----------------------|-------------------|
+| Storage | `message_store` attribute | Via `StorageContextMiddleware` in pipeline |
+| Context | `context_provider` attribute | Via any `ContextMiddleware` in pipeline |
+| Composition | One of each | Unlimited middleware |
+
+#### 3. Unified Storage Middleware
+
+Instead of separate `ChatMessageStore`, storage is a type of `ContextMiddleware`:
+
+```python
+class StorageContextMiddleware(ContextMiddleware):
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        load_messages: bool | None = None,  # None = smart mode
+        store_inputs: bool = True,
+        store_responses: bool = True,
+        store_context_messages: bool = False,
+        store_context_from: Sequence[str] | None = None,
+    ): ...
+```
+
+**Smart Load Behavior:**
+- `load_messages=None` (default): Automatically disable loading when:
+  - `context.options.get('store') is False`, OR
+  - `context.service_session_id is not None` (service handles storage)
+
+**Comparison to Current:**
+| Aspect | ChatMessageStore (Current) | StorageContextMiddleware (New) |
+|--------|---------------------------|------------------------------|
+| Load messages | Always via `list_messages()` | Configurable `load_messages` flag |
+| Store messages | Always via `add_messages()` | Configurable `store_*` flags |
+| What to store | All messages | Selective: inputs, responses, context |
+| RAG context | Not supported | `store_context_messages=True` |
+
+#### 4. Source Attribution via `source_id`
+
+Every middleware has a required `source_id` that attributes added messages:
+
+```python
+class SessionContext:
+    # Messages keyed by source_id
+    context_messages: dict[str, list[ChatMessage]]
+
+    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+        if source_id not in self.context_messages:
+            self.context_messages[source_id] = []
+        self.context_messages[source_id].extend(messages)
+
+    def get_messages(
+        self,
+        sources: Sequence[str] | None = None,
+        exclude_sources: Sequence[str] | None = None,
+    ) -> list[ChatMessage]:
+        """Get messages, optionally filtered by source."""
+        ...
+```
+
+**Benefits over Current:**
+- Debug which middleware added which messages
+- Filter messages by source (e.g., exclude RAG from storage)
+- Multiple instances of the same middleware type are distinguishable
+
+#### 5. Default Storage Behavior
+
+Zero-config works out of the box:
+
+```python
+# No middleware configured - still gets conversation history!
+agent = ChatAgent(chat_client=client, name="assistant")
+session = agent.get_new_session()
+response = await agent.run("Hello!", session=session)
+response = await agent.run("What did I say?", session=session)  # Remembers!
+```
+
+Default `InMemoryStorageMiddleware` is added at runtime when:
+- No `service_session_id` (service not managing storage)
+- `options.store` is not `True` (user not expecting service storage)
+- Pipeline is empty or None
+
+**Comparison to Current:**
+| Aspect | AgentThread (Current) | AgentSession (New) |
+|--------|----------------------|-------------------|
+| Default storage | Creates `ChatMessageStore` lazily | Creates `InMemoryStorageMiddleware` at runtime |
+| When | In `on_new_messages()` | In `run_context_pipeline()` |
+| Customizable | After creation | Before first `run()` |
+
+#### 6. Middleware Instance vs Factory
+
+Support both shared instances and per-session factories:
+
+```python
+# Instance (shared across sessions)
+agent = ChatAgent(
+    context_middleware=[RAGContextMiddleware("rag")]
+)
+
+# Factory (new instance per session)
+def create_session_cache(session_id: str | None) -> ContextMiddleware:
+    return SessionCacheMiddleware("cache", session_id=session_id)
+
+agent = ChatAgent(
+    context_middleware=[create_session_cache]
+)
+```
+
+#### 7. Renaming: Thread → Session
+
+`AgentThread` becomes `AgentSession` to better reflect its purpose:
+- "Thread" implies a sequence of messages
+- "Session" better captures the broader scope (state, middleware, lifecycle)
+
+### Migration Impact
+
+| Current | New | Notes |
+|---------|-----|-------|
+| `ContextProvider` | `ContextMiddleware` | Implement `process()` instead of `invoking()`/`invoked()` |
+| `ChatMessageStore` | `StorageContextMiddleware` | Extend and implement `get_messages()`/`save_messages()` |
+| `AgentThread` | `AgentSession` | Clean break, no alias |
+| `thread.message_store` | Via middleware in pipeline | Configure at agent level |
+| `thread.context_provider` | Via middleware in pipeline | Multiple providers supported |
+
+### Example: Current vs New
+
+**Current:**
+```python
+class MyContextProvider(ContextProvider):
+    async def invoking(self, messages, **kwargs) -> Context:
+        docs = await self.retrieve_documents(messages[-1].text)
+        return Context(messages=[ChatMessage.system(f"Context: {docs}")])
+
+    async def invoked(self, request, response, **kwargs) -> None:
+        await self.store_interaction(request, response)
+
+async with MyContextProvider() as provider:
+    agent = ChatAgent(chat_client=client, name="assistant")
+    thread = await agent.get_new_thread(message_store=ChatMessageStore())
+    thread.context_provider = provider
+    response = await agent.run("Hello", thread=thread)
+```
+
+**New:**
+```python
+class RAGMiddleware(ContextMiddleware):
+    async def process(self, context: SessionContext, next) -> None:
+        # Pre-processing
+        docs = await self.retrieve_documents(context.input_messages[-1].text)
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+
+        await next(context)
+
+        # Post-processing
+        await self.store_interaction(context.input_messages, context.response_messages)
+
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),
+        RAGMiddleware("rag"),
+    ]
+)
+session = agent.get_new_session()
+response = await agent.run("Hello", session=session)
+```
+
+## Implementation Plan
+
+See **Appendix A** for the detailed implementation plan including:
+- Complete class definitions
+- User experience examples
+- Phase-by-phase workplan
+
+---
+
+## Appendix A: Implementation Plan
+
+### New Types
+
+```python
+# Copyright (c) Microsoft. All rights reserved.
+
+from abc import ABC, abstractmethod
+from collections.abc import Awaitable, Callable, Sequence
+from typing import Any
+
+from ._types import ChatMessage
+from ._tools import ToolProtocol
+
+
+class SessionContext:
+    """State passed through the ContextMiddleware pipeline for a single invocation.
+
+    This object is created fresh for each agent invocation and flows through the
+    middleware pipeline. Middleware can read from and write to the mutable fields
+    to add context before invocation and process responses after.
+
+    Attributes:
+        session_id: The ID of the current session
+        service_session_id: Service-managed session ID (if present, service handles storage)
+        input_messages: The new messages being sent to the agent (read-only, set by caller)
+        context_messages: Dict mapping source_id -> messages added by that middleware.
+            Maintains insertion order (middleware execution order). Use add_messages()
+            to add messages with proper source attribution.
+        instructions: Additional instructions - middleware can append here
+        tools: Additional tools - middleware can append here
+        response_messages: After invocation, contains the agent's response (set by agent)
+        options: Options passed to agent.run() - READ-ONLY, for reflection only
+        metadata: Shared metadata dictionary for cross-middleware communication
+
+    Note:
+        - `options` is read-only; changes will NOT be merged back into the agent run
+        - `instructions` and `tools` are merged by the agent into the run options
+        - `context_messages` values are flattened in order when building the final input
+    """
+
+    def __init__(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+        input_messages: list[ChatMessage],
+        context_messages: dict[str, list[ChatMessage]] | None = None,
+        instructions: list[str] | None = None,
+        tools: list[ToolProtocol] | None = None,
+        response_messages: list[ChatMessage] | None = None,
+        options: dict[str, Any] | None = None,
+        metadata: dict[str, Any] | None = None,
+    ):
+        self.session_id = session_id
+        self.service_session_id = service_session_id
+        self.input_messages = input_messages
+        self.context_messages: dict[str, list[ChatMessage]] = context_messages or {}
+        self.instructions: list[str] = instructions or []
+        self.tools: list[ToolProtocol] = tools or []
+        self.response_messages = response_messages
+        self.options = options or {}  # READ-ONLY - for reflection only
+        self.metadata = metadata or {}
+
+    # --- Methods for adding context ---
+
+    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+        """Add context messages from a specific source.
+
+        Messages are stored keyed by source_id, maintaining insertion order
+        based on middleware execution order.
+
+        Args:
+            source_id: The middleware source_id adding these messages
+            messages: The messages to add
+        """
+        if source_id not in self.context_messages:
+            self.context_messages[source_id] = []
+        self.context_messages[source_id].extend(messages)
+
+    def add_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
+        """Add instructions to be prepended to the conversation.
+
+        Instructions are added to a flat list and are not keyed by source.
+        The source_id is accepted for attribution, but this sketch does not store it.
+
+        Args:
+            source_id: The middleware source_id adding these instructions
+            instructions: A single instruction string or sequence of strings
+        """
+        if isinstance(instructions, str):
+            instructions = [instructions]
+        self.instructions.extend(instructions)
+
+    def add_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
+        """Add tools to be available for this invocation.
+
+        Source attribution is written to tool.metadata when the tool exposes a dict there.
+
+        Args:
+            source_id: The middleware source_id adding these tools
+            tools: The tools to add
+        """
+        for tool in tools:
+            # Add source attribution to tool metadata
+            if hasattr(tool, 'metadata') and isinstance(tool.metadata, dict):
+                tool.metadata["context_source"] = source_id
+        self.tools.extend(tools)
+
+    # --- Methods for reading context ---
+
+    def get_messages(
+        self,
+        sources: Sequence[str] | None = None,
+        exclude_sources: Sequence[str] | None = None,
+    ) -> list[ChatMessage]:
+        """Get context messages, optionally filtered by source.
+
+        Returns messages in middleware execution order (dict insertion order).
+
+        Args:
+            sources: If provided, only include messages from these sources
+            exclude_sources: If provided, exclude messages from these sources
+
+        Returns:
+            Flattened list of messages in middleware execution order
+        """
+        result: list[ChatMessage] = []
+        for source_id, messages in self.context_messages.items():
+            if sources is not None and source_id not in sources:
+                continue
+            if exclude_sources is not None and source_id in exclude_sources:
+                continue
+            result.extend(messages)
+        return result
+
+    def get_all_messages(
+        self,
+        *,
+        include_input: bool = False,
+        include_response: bool = False,
+    ) -> list[ChatMessage]:
+        """Get all messages, optionally including input and response.
+
+        Returns messages in the order they would appear in a full conversation:
+        1. Context messages (from middleware, in execution order)
+        2. Input messages (if include_input=True)
+        3. Response messages (if include_response=True)
+
+        Args:
+            include_input: If True, append input_messages after context
+            include_response: If True, append response_messages at the end
+
+        Returns:
+            Flattened list of messages in conversation order
+        """
+        result: list[ChatMessage] = []
+
+        # Context messages in middleware execution order
+        for messages in self.context_messages.values():
+            result.extend(messages)
+
+        # Input messages (user's new messages for this invocation)
+        if include_input and self.input_messages:
+            result.extend(self.input_messages)
+
+        # Response messages (agent's response)
+        if include_response and self.response_messages:
+            result.extend(self.response_messages)
+
+        return result
+
+
+# Type alias for the next middleware callable
+ContextMiddlewareNext = Callable[[SessionContext], Awaitable[None]]
+
+# Type alias for middleware factory - takes session_id, returns middleware
+ContextMiddlewareFactory = Callable[[str | None], ContextMiddleware]
+
+# Union type for middleware configuration - either instance or factory
+ContextMiddlewareConfig = ContextMiddleware | ContextMiddlewareFactory
+
+
+class ContextMiddleware(ABC):
+    """Base class for context middleware (onion/wrapper pattern).
+
+    Context middleware wraps the context preparation and storage flow,
+    allowing modification of messages, tools, and instructions before
+    invocation and processing of responses after invocation.
+
+    The process() method receives a context and a next() callable.
+    Before calling next(), you can modify the context (add messages, tools, etc.).
+    After calling next(), the response_messages will be populated and you can
+    process them (store, extract info, etc.).
+
+    Lifecycle:
+    - session_created(): Called once when a new session is created
+    - process(): Called for each invocation, wraps the context flow
+
+    Attributes:
+        source_id: Unique identifier for this middleware instance (required).
+            Used for message/tool attribution so other middleware can filter.
+        session_id: The session ID, passed in by the factory when one is used.
+            None if middleware is shared across sessions (instance mode).
+
+    Note:
+        Middleware can be provided to agents as either:
+        - An instantiated middleware object (shared across all sessions)
+        - A factory function `(session_id: str | None) -> ContextMiddleware`
+          that creates a new instance per session
+
+    Examples:
+        # As instance (shared across sessions)
+        class MyContextMiddleware(ContextMiddleware):
+            def __init__(self, source_id: str):
+                super().__init__(source_id=source_id)
+
+            async def process(self, context, next):
+                context.add_instructions(self.source_id, "Be helpful!")
+                await next(context)
+
+                # POST-PROCESSING: Handle response after invocation
+                for msg in context.response_messages or []:
+                    print(f"Response: {msg.text}")
+
+        # As factory (new instance per session)
+        def create_session_middleware(session_id: str | None) -> ContextMiddleware:
+            return MySessionMiddleware(
+                source_id="session_specific",
+                session_id=session_id,
+            )
+    """
+
+    def __init__(self, source_id: str, *, session_id: str | None = None):
+        """Initialize the middleware.
+
+        Args:
+            source_id: Unique identifier for this middleware instance.
+                Used for message/tool attribution.
+            session_id: Optional session ID. A factory function receives it
+                and should pass it through when constructing the middleware.
+        """
+        self.source_id = source_id
+        self.session_id = session_id
+
+    async def session_created(self, session_id: str | None) -> None:
+        """Called when a new session is created.
+
+        Override this to load any initial data from persistent storage
+        or perform session-level initialization.
+
+        Note: If you need the session_id, prefer using `self.session_id`,
+        which is populated when the middleware is built by a factory.
+
+        Args:
+            session_id: The ID of the newly created session
+        """
+        pass
+
+    @abstractmethod
+    async def process(
+        self,
+        context: SessionContext,
+        next: ContextMiddlewareNext
+    ) -> None:
+        """Process the context, wrapping the call to next middleware.
+
+        Before calling next():
+        - Call context.add_messages() to add messages (RAG, memory, etc.)
+        - Call context.add_instructions() to add system instructions
+        - Call context.add_tools() to add tools for this invocation
+        - Call context.get_messages() to see history and context loaded so far
+        - Access context.input_messages to see new user messages
+
+        After calling next():
+        - context.response_messages contains the agent's response
+        - Store messages, extract information, perform cleanup
+
+        Args:
+            context: The invocation context being processed
+            next: Callable to invoke the next middleware in the chain
+        """
+        pass
+```
+
+### Storage Middleware Base
+
+```python
+class StorageContextMiddleware(ContextMiddleware):
+    """Base class for storage-focused context middleware.
+
+    A single class that can be configured for different use cases:
+    - Primary memory storage (loads + stores messages)
+    - Audit/logging storage (stores only, doesn't load)
+    - Evaluation storage (stores only for later analysis)
+
+    Loading behavior (when to add messages to context_messages[source_id]):
+    - `load_messages=True`: Always load messages
+    - `load_messages=False`: Never load (audit/logging mode)
+    - `load_messages=None` (default): Smart mode - load unless:
+      - `context.options.get('store', True)` is False, OR
+      - `context.service_session_id` is present (service manages storage)
+
+    Storage behavior:
+    - `store_inputs`: Store input messages (default True)
+    - `store_responses`: Store response messages (default True)
+    - Storage always happens unless explicitly disabled, regardless of load_messages
+
+    Warning: If multiple middleware have load_messages=True, a warning
+    is logged at pipeline creation time (likely misconfiguration).
+
+    Examples:
+        # Primary memory - loads and stores
+        memory = InMemoryStorageMiddleware(source_id="memory")
+
+        # Audit storage - stores only, doesn't add to context
+        audit = RedisStorageMiddleware(
+            source_id="audit",
+            load_messages=False,
+            redis_url="redis://...",
+        )
+
+        # Evaluation storage - stores responses only
+        eval_storage = CosmosStorageMiddleware(
+            source_id="evaluation",
+            load_messages=False,
+            store_inputs=False,
+            store_responses=True,
+        )
+
+        # Full audit - stores everything including RAG context
+        full_audit = CosmosStorageMiddleware(
+            source_id="full_audit",
+            load_messages=False,
+            store_context_messages=True,  # Also store context from other middleware
+        )
+    """
+
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        session_id: str | None = None,
+        load_messages: bool | None = None,  # None = smart mode
+        store_responses: bool = True,
+        store_inputs: bool = True,
+        store_context_messages: bool = False,  # Store context added by other middleware
+        store_context_from: Sequence[str] | None = None,  # Only store from these sources
+    ):
+        super().__init__(source_id, session_id=session_id)
+        self.load_messages = load_messages
+        self.store_responses = store_responses
+        self.store_inputs = store_inputs
+        self.store_context_messages = store_context_messages
+        self.store_context_from = list(store_context_from) if store_context_from else None
+
+    @abstractmethod
+    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
+        """Retrieve stored messages for this session."""
+        pass
+
+    @abstractmethod
+    async def save_messages(
+        self,
+        session_id: str | None,
+        messages: Sequence[ChatMessage]
+    ) -> None:
+        """Persist messages for this session."""
+        pass
+
+    def _should_load_messages(self, context: SessionContext) -> bool:
+        """Determine if we should load messages based on config and context."""
+        # Explicit configuration takes precedence
+        if self.load_messages is not None:
+            return self.load_messages
+
+        # Smart mode: don't load if service manages storage
+        if context.service_session_id is not None:
+            return False
+
+        # Smart mode: load unless options['store'] is explicitly False
+        return context.options.get('store', True) is not False
+
+    def _get_context_messages_to_store(self, context: SessionContext) -> list[ChatMessage]:
+        """Get context messages that should be stored based on configuration."""
+        if not self.store_context_messages:
+            return []
+
+        if self.store_context_from is not None:
+            # Only store from specific sources
+            return context.get_messages(sources=self.store_context_from)
+        else:
+            # Store all context messages (excluding our own to avoid duplication)
+            return context.get_messages(exclude_sources=[self.source_id])
+
+    async def process(
+        self,
+        context: SessionContext,
+        next: ContextMiddlewareNext
+    ) -> None:
+        # PRE: Load history if configured, keyed by our source_id
+        if self._should_load_messages(context):
+            history = await self.get_messages(context.session_id)
+            context.add_messages(self.source_id, history)
+
+        # Continue to next middleware
+        await next(context)
+
+        # POST: Store messages
+        messages_to_store: list[ChatMessage] = []
+
+        # Optionally store context messages from other middleware
+        messages_to_store.extend(self._get_context_messages_to_store(context))
+
+        if self.store_inputs:
+            messages_to_store.extend(context.input_messages)
+        if self.store_responses and context.response_messages:
+            messages_to_store.extend(context.response_messages)
+        if messages_to_store:
+            await self.save_messages(context.session_id, messages_to_store)
+```
+
+### Message/Tool Attribution
+
+The `SessionContext` provides explicit methods for adding context:
+
+```python
+# Adding messages (keyed by source_id in context_messages dict)
+context.add_messages(self.source_id, messages)
+
+# Adding instructions (flat list, source_id for debugging)
+context.add_instructions(self.source_id, "Be concise and helpful.")
+context.add_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
+
+# Adding tools (source attribution added to tool.metadata automatically)
+context.add_tools(self.source_id, [my_tool, another_tool])
+
+# Getting all messages in middleware execution order
+all_messages = context.get_all_messages()
+
+# Filtering by source
+memory_messages = context.get_messages(sources=["memory"])
+non_rag_messages = context.get_messages(exclude_sources=["rag"])
+
+# Direct access to check specific sources
+if "memory" in context.context_messages:
+    history = context.context_messages["memory"]
+```
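+
+As a rough illustration of the keyed-by-source semantics above, the following self-contained sketch uses a toy stand-in rather than the proposed `SessionContext`. `MiniContext` and its plain-string messages are assumptions made for this example only; the real class would hold `ChatMessage` objects and also track instructions and tools.
+
+```python
+from __future__ import annotations
+
+from collections.abc import Sequence
+
+
+class MiniContext:
+    """Toy stand-in for SessionContext: messages keyed by source_id (hypothetical)."""
+
+    def __init__(self) -> None:
+        # dicts preserve insertion order, so sources stay in middleware execution order
+        self.context_messages: dict[str, list[str]] = {}
+
+    def add_messages(self, source_id: str, messages: Sequence[str]) -> None:
+        self.context_messages.setdefault(source_id, []).extend(messages)
+
+    def get_messages(
+        self,
+        sources: Sequence[str] | None = None,
+        exclude_sources: Sequence[str] | None = None,
+    ) -> list[str]:
+        result: list[str] = []
+        for source_id, messages in self.context_messages.items():
+            if sources is not None and source_id not in sources:
+                continue
+            if exclude_sources is not None and source_id in exclude_sources:
+                continue
+            result.extend(messages)
+        return result
+
+    def get_all_messages(self) -> list[str]:
+        return self.get_messages()
+
+
+ctx = MiniContext()
+ctx.add_messages("memory", ["hi", "hello"])
+ctx.add_messages("rag", ["doc A"])
+ctx.add_messages("memory", ["later"])
+print(ctx.get_messages(exclude_sources=["rag"]))
+```
+
+In this sketch a source that adds messages twice keeps its original position, so ordering follows the first time each `source_id` contributed.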
+
+### AgentSession Class (replaces AgentThread)
+
+```python
+import uuid
+import warnings
+from collections.abc import Sequence
+from typing import Any
+
+
+def _resolve_middleware(
+    config: ContextMiddlewareConfig,
+    session_id: str | None,
+) -> ContextMiddleware:
+    """Resolve a middleware config to an instance.
+
+    If config is already a ContextMiddleware instance, return it.
+    If config is a factory callable, call it with session_id to create an instance.
+    """
+    if isinstance(config, ContextMiddleware):
+        return config
+    # It's a factory - call it with session_id
+    return config(session_id)
+
+
+class ContextMiddlewarePipeline:
+    """Executes a chain of context middleware in onion/wrapper style."""
+
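+    def __init__(self, middleware: Sequence[ContextMiddleware]):
+        self._middleware = list(middleware)
+        self._validate_middleware()
+
+    def __len__(self) -> int:
+        """Number of middleware in the pipeline (used by AgentSession's default-storage check)."""
+        return len(self._middleware)
+
+    def prepend(self, middleware: ContextMiddleware) -> None:
+        """Insert middleware at the front of the chain so it runs first."""
+        self._middleware.insert(0, middleware)
+        self._validate_middleware()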
+
+    @classmethod
+    def from_config(
+        cls,
+        configs: Sequence[ContextMiddlewareConfig],
+        session_id: str | None,
+    ) -> "ContextMiddlewarePipeline":
+        """Create a pipeline from middleware configs, resolving factories.
+
+        Args:
+            configs: Sequence of middleware instances or factories
+            session_id: Session ID to pass to factories
+
+        Returns:
+            A new pipeline with resolved middleware instances
+        """
+        middleware = [_resolve_middleware(config, session_id) for config in configs]
+        return cls(middleware)
+
+    def _validate_middleware(self) -> None:
+        """Warn if multiple middleware are configured to load messages."""
+        loaders = [
+            m for m in self._middleware
+            if isinstance(m, StorageContextMiddleware)
+            and m.load_messages is True
+        ]
+        if len(loaders) > 1:
+            warnings.warn(
+                f"Multiple storage middleware configured to load messages: "
+                f"{[m.source_id for m in loaders]}. "
+                f"This may cause duplicate messages in context. "
+                f"Consider setting load_messages=False on all but one.",
+                UserWarning
+            )
+
+    async def session_created(self, session_id: str | None) -> None:
+        """Notify all middleware that a session was created."""
+        for middleware in self._middleware:
+            await middleware.session_created(session_id)
+
+    async def execute(self, context: SessionContext) -> None:
+        """Execute the middleware pipeline."""
+
+        async def terminal(s: SessionContext) -> None:
+            # Terminal handler - nothing more to do
+            pass
+
+        # Build the chain from last to first
+        next_handler = terminal
+        for middleware in reversed(self._middleware):
+            # Capture middleware in closure
+            current_middleware = middleware
+            current_next = next_handler
+
+            async def handler(s: SessionContext, mw=current_middleware, nxt=current_next) -> None:
+                await mw.process(s, nxt)
+
+            next_handler = handler
+
+        # Execute the chain
+        await next_handler(context)
+
+
+class AgentSession:
+    """A conversation session with an agent.
+
+    AgentSession manages the conversation state and owns a ContextMiddlewarePipeline
+    that processes context before each invocation and handles responses after.
+
+    Note: The session is created by calling agent.get_new_session(), which constructs
+    the pipeline from the agent's context_middleware sequence, resolving any factories.
+
+    Attributes:
+        session_id: Unique identifier for this session
+        service_session_id: Service-managed session ID (if using service-side storage)
+        context_pipeline: The middleware pipeline for this session
+    """
+
+    def __init__(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+        context_pipeline: ContextMiddlewarePipeline | None = None,
+    ):
+        """Initialize the session.
+
+        Note: Prefer using agent.get_new_session() instead of direct construction.
+
+        Default storage behavior (applied at runtime, not init):
+        - If service_session_id is set: service handles storage, no default added
+        - If options.store=True: user expects service storage, no default added
+        - If no service_session_id AND store is not True AND no pipeline:
+          InMemoryStorageMiddleware is automatically added
+
+        Args:
+            session_id: Optional session ID (generated if not provided)
+            service_session_id: Optional service-managed session ID
+            context_pipeline: The middleware pipeline (created by agent)
+        """
+        self._session_id = session_id or str(uuid.uuid4())
+        self._service_session_id = service_session_id
+        self._context_pipeline = context_pipeline
+        self._initialized = False
+        self._default_storage_checked = False
+
+    @property
+    def session_id(self) -> str:
+        """The unique identifier for this session."""
+        return self._session_id
+
+    @property
+    def service_session_id(self) -> str | None:
+        """The service-managed session ID (if using service-side storage)."""
+        return self._service_session_id
+
+    @service_session_id.setter
+    def service_session_id(self, value: str | None) -> None:
+        self._service_session_id = value
+
+    @property
+    def context_pipeline(self) -> ContextMiddlewarePipeline | None:
+        """The middleware pipeline for this session."""
+        return self._context_pipeline
+
+    @context_pipeline.setter
+    def context_pipeline(self, value: ContextMiddlewarePipeline | None) -> None:
+        """Set the middleware pipeline for this session."""
+        self._context_pipeline = value
+
+    def _ensure_default_storage(self, options: dict[str, Any]) -> None:
+        """Add default InMemoryStorageMiddleware if needed.
+
+        Called at runtime (first run) so users can modify the pipeline
+        after session creation but before first invocation.
+
+        Default storage is added when ALL of these are true:
+        - No service_session_id (service not managing storage)
+        - options.store is not True (user not expecting service storage)
+        - Pipeline is empty or None (user hasn't configured middleware)
+        """
+        if self._default_storage_checked:
+            return
+        self._default_storage_checked = True
+
+        # User expects service-side storage
+        if options.get("store") is True:
+            return
+
+        # Service is managing storage
+        if self._service_session_id is not None:
+            return
+
+        # User has configured middleware
+        if self._context_pipeline is not None and len(self._context_pipeline) > 0:
+            return
+
+        # Add default in-memory storage
+        default_middleware = InMemoryStorageMiddleware("memory")
+        if self._context_pipeline is None:
+            self._context_pipeline = ContextMiddlewarePipeline([default_middleware])
+        else:
+            self._context_pipeline.prepend(default_middleware)
+
+    async def initialize(self) -> None:
+        """Initialize the session and notify middleware."""
+        if not self._initialized and self._context_pipeline is not None:
+            await self._context_pipeline.session_created(self._session_id)
+            self._initialized = True
+
+    async def run_context_pipeline(
+        self,
+        input_messages: list[ChatMessage],
+        *,
+        tools: list[ToolProtocol] | None = None,
+        options: dict[str, Any] | None = None,
+    ) -> SessionContext:
+        """Prepare context by running the middleware pipeline.
+
+        This runs the full middleware pipeline (pre-processing, then post-processing
+        after response_messages is set).
+
+        Args:
+            input_messages: New messages to send to the agent
+            tools: Additional tools available for this invocation
+            options: Options including 'store' flag (READ-ONLY, for reflection)
+
+        Returns:
+            The SessionContext with context messages, instructions, and tools populated
+        """
+        options = options or {}
+
+        # Check for default storage on first run (deferred from init)
+        self._ensure_default_storage(options)
+
+        await self.initialize()
+        context = SessionContext(
+            session_id=self._session_id,
+            service_session_id=self._service_session_id,
+            input_messages=input_messages,
+            tools=tools or [],
+            options=options,
+        )
+        if self._context_pipeline is not None:
+            await self._context_pipeline.execute(context)
+        return context
+
+
+# Example of how agent creates sessions:
+class ChatAgent:
+    def __init__(
+        self,
+        chat_client: ...,
+        *,
+        context_middleware: Sequence[ContextMiddlewareConfig] | None = None,
+        # ... other params
+    ):
+        self._context_middleware = list(context_middleware or [])
+        # ... other init
+
+    def get_new_session(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+    ) -> AgentSession:
+        """Create a new session with a fresh middleware pipeline.
+
+        Middleware factories are called with the session_id to create
+        session-specific instances.
+
+        Args:
+            session_id: Optional session ID (generated if not provided)
+            service_session_id: Optional service-managed session ID
+        """
+        resolved_session_id = session_id or str(uuid.uuid4())
+
+        # Only create pipeline if we have middleware configured
+        pipeline = None
+        if self._context_middleware:
+            pipeline = ContextMiddlewarePipeline.from_config(
+                self._context_middleware,
+                session_id=resolved_session_id,
+            )
+
+        return AgentSession(
+            session_id=resolved_session_id,
+            service_session_id=service_session_id,
+            context_pipeline=pipeline,
+        )
+
+    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> ...:
+        """Run the agent with the given input."""
+        # Default storage check happens inside session.run_context_pipeline()
+        # ... rest of run logic
+```
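+
+To make the onion/wrapper execution order of `ContextMiddlewarePipeline.execute()` concrete, here is a small runnable sketch using the same chain-building technique. `Tracer`, the standalone `execute()` function, and the `"invoke"` marker are assumptions for illustration only; in the design above the terminal handler does nothing.
+
+```python
+import asyncio
+from collections.abc import Awaitable, Callable
+
+Handler = Callable[[list[str]], Awaitable[None]]
+
+
+class Tracer:
+    """Toy middleware (hypothetical): records when its PRE and POST phases run."""
+
+    def __init__(self, source_id: str) -> None:
+        self.source_id = source_id
+
+    async def process(self, trace: list[str], next: Handler) -> None:
+        trace.append(f"pre:{self.source_id}")
+        await next(trace)
+        trace.append(f"post:{self.source_id}")
+
+
+async def execute(middleware: list[Tracer], trace: list[str]) -> None:
+    async def terminal(t: list[str]) -> None:
+        t.append("invoke")  # marks the innermost point of the chain
+
+    # Build the chain from last to first, as in the pipeline above
+    next_handler: Handler = terminal
+    for mw in reversed(middleware):
+        # Bind loop variables as defaults so each handler keeps its own pair
+        async def handler(t: list[str], mw: Tracer = mw, nxt: Handler = next_handler) -> None:
+            await mw.process(t, nxt)
+
+        next_handler = handler
+
+    await next_handler(trace)
+
+
+trace: list[str] = []
+asyncio.run(execute([Tracer("memory"), Tracer("rag")], trace))
+print(trace)
+# ['pre:memory', 'pre:rag', 'invoke', 'post:rag', 'post:memory']
+```
+
+The first middleware in the list is the outermost layer: its PRE phase runs first and its POST phase runs last, which is why a storage middleware placed first can see everything added by later middleware when it stores.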
+
+---
+
+## User Experience Examples
+
+### Example 0: Zero-Config Default (Simplest Use Case)
+
+```python
+from agent_framework import ChatAgent
+
+# No middleware configured - but conversation history still works!
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    # No context_middleware specified
+)
+
+# Create session - automatically gets InMemoryStorageMiddleware on first run
+session = agent.get_new_session()
+response = await agent.run("Hello, my name is Alice!", session=session)
+
+# Conversation history is preserved automatically
+response = await agent.run("What's my name?", session=session)
+# Agent remembers: "Your name is Alice!"
+
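+# With service-managed session - no default storage added (service handles it)
+service_session = agent.get_new_session(service_session_id="thread_abc123")
+
+# With store=True in options - user expects service storage, no default added
+# (use a fresh session: the default-storage check runs only on a session's first run)
+store_session = agent.get_new_session()
+response = await agent.run("Hello!", session=store_session, options={"store": True})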
+
+# User can manually add middleware to session before first run
+session = agent.get_new_session()
+session.context_pipeline = ContextMiddlewarePipeline([
+    MyCustomMiddleware(source_id="custom")
+])
+response = await agent.run("Hello!", session=session)  # No default added since pipeline exists
+```
+
+### Example 1: Explicit Memory Storage
+
+```python
+from agent_framework import ChatAgent
+from agent_framework.context import InMemoryStorageMiddleware
+
+# Explicit middleware configuration (same behavior as default, but explicit)
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_middleware=[
+        InMemoryStorageMiddleware(source_id="memory")
+    ]
+)
+
+# Create session and chat
+session = agent.get_new_session()
+response = await agent.run("Hello!", session=session)
+
+# Messages are automatically stored and loaded on next invocation
+response = await agent.run("What did I say before?", session=session)
+```
+
+### Example 1b: Using Middleware Factory for Per-Session State
+
+```python
+from agent_framework import ChatAgent
+from agent_framework.context import ContextMiddleware, InMemoryStorageMiddleware, SessionContext
+
+class SessionSpecificMiddleware(ContextMiddleware):
+    """Middleware that stores state per session."""
+
+    def __init__(self, source_id: str, session_id: str | None):
+        super().__init__(source_id=source_id)
+        self.session_id = session_id
+        self.invocation_count = 0  # Per-session counter
+
+    async def process(self, context: SessionContext, next) -> None:
+        self.invocation_count += 1
+        context.add_instructions(
+            self.source_id,
+            f"This is invocation #{self.invocation_count} in session {self.session_id}"
+        )
+        await next(context)
+
+
+# Factory function - receives session_id when session is created
+def create_session_middleware(session_id: str | None) -> ContextMiddleware:
+    return SessionSpecificMiddleware(
+        source_id="session_tracker",
+        session_id=session_id,
+    )
+
+
+# Agent with factory - each session gets its own middleware instance
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_middleware=[
+        InMemoryStorageMiddleware(source_id="memory"),  # Instance (shared)
+        create_session_middleware,  # Factory (per-session)
+    ]
+)
+
+# Each session gets a fresh SessionSpecificMiddleware instance
+session1 = agent.get_new_session()
+session2 = agent.get_new_session()
+# session1 and session2 have independent invocation_count
+```
+
+### Example 2: RAG + Memory + Audit (All StorageContextMiddleware)
+
+```python
+from agent_framework import ChatAgent
+from agent_framework.azure import CosmosStorageMiddleware, AzureAISearchContextMiddleware
+from agent_framework.redis import RedisStorageMiddleware
+
+# RAG middleware that injects relevant documents
+search_middleware = AzureAISearchContextMiddleware(
+    source_id="rag",
+    endpoint="https://...",
+    index_name="documents",
+)
+
+# Primary memory storage (loads + stores)
+# load_messages=None (default) = smart mode, respects options['store'] and service_session_id
+memory_middleware = RedisStorageMiddleware(
+    source_id="memory",
+    redis_url="redis://...",
+)
+
+# Audit storage - SAME CLASS, different configuration
+# load_messages=False = never loads, just stores for audit
+audit_middleware = CosmosStorageMiddleware(
+    source_id="audit",
+    connection_string="...",
+    load_messages=False,  # Don't load - just store for audit
+)
+
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_middleware=[
+        memory_middleware,   # First: loads history (smart mode)
+        search_middleware,   # Second: adds RAG context
+        audit_middleware,    # Third: stores for audit (no load)
+    ]
+)
+```
+
+### Example 3: Custom Context Middleware (Onion Pattern)
+
+```python
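+import json
+from typing import Any
+
+from agent_framework import ChatAgent
+from agent_framework.context import ContextMiddleware, InMemoryStorageMiddleware, SessionContext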
+
+class TimeContextMiddleware(ContextMiddleware):
+    """Adds current time to the context."""
+
+    def __init__(self, source_id: str):
+        super().__init__(source_id=source_id)
+
+    async def process(
+        self,
+        context: SessionContext,
+        next
+    ) -> None:
+        from datetime import datetime
+
+        # PRE: Add time instruction using explicit method
+        context.add_instructions(
+            self.source_id,
+            f"Current date and time: {datetime.now().isoformat()}"
+        )
+
+        # Continue to next middleware
+        await next(context)
+
+        # POST: Nothing to do after invocation for this middleware
+
+
+class UserPreferencesMiddleware(ContextMiddleware):
+    """Tracks and applies user preferences from conversation."""
+
+    def __init__(self, source_id: str):
+        super().__init__(source_id=source_id)
+        self._preferences: dict[str, dict[str, Any]] = {}
+
+    async def process(
+        self,
+        context: SessionContext,
+        next
+    ) -> None:
+        # PRE: Add known preferences as instructions
+        prefs = self._preferences.get(context.session_id or "", {})
+        if prefs:
+            context.add_instructions(
+                self.source_id,
+                f"User preferences: {json.dumps(prefs)}"
+            )
+
+        # Continue to next middleware and model invocation
+        await next(context)
+
+        # POST: Extract preferences from response
+        for msg in context.response_messages or []:
+            if "preference:" in msg.text.lower():
+                # Store extracted preference for future sessions
+                pass
+
+
+# Compose middleware - each with mandatory source_id
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware(source_id="memory"),
+        TimeContextMiddleware(source_id="time"),
+        UserPreferencesMiddleware(source_id="prefs"),
+    ]
+)
+```
+
+### Example 4: Filtering by Source (Using Dict-Based Context)
+
+```python
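+from agent_framework import ChatMessage
+from agent_framework.context import ContextMiddleware, SessionContext
+
+
+class SelectiveContextMiddleware(ContextMiddleware):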
+    """Middleware that only processes messages from specific sources."""
+
+    def __init__(self, source_id: str):
+        super().__init__(source_id=source_id)
+
+    async def process(
+        self,
+        context: SessionContext,
+        next
+    ) -> None:
+        # Check what sources have added messages so far
+        print(f"Sources so far: {list(context.context_messages.keys())}")
+
+        # Get messages excluding RAG context
+        non_rag_messages = context.get_messages(exclude_sources=["rag"])
+
+        # Or get only memory messages
+        if "memory" in context.context_messages:
+            memory_only = context.context_messages["memory"]
+
+        # Do something with filtered messages...
+        # e.g., sentiment analysis, topic extraction
+
+        # Continue to next middleware
+        await next(context)
+
+
+class RAGContextMiddleware(ContextMiddleware):
+    """Middleware that adds RAG context."""
+
+    def __init__(self, source_id: str):
+        super().__init__(source_id=source_id)
+
+    async def process(
+        self,
+        context: SessionContext,
+        next
+    ) -> None:
+        # Search for relevant documents based on input
+        # (_search is a placeholder for the retrieval logic, not shown here)
+        relevant_docs = await self._search(context.input_messages)
+
+        # Add RAG context using explicit method
+        rag_messages = [
+            ChatMessage(role="system", text=f"Relevant info: {doc}")
+            for doc in relevant_docs
+        ]
+        context.add_messages(self.source_id, rag_messages)
+
+        await next(context)
+```
+
+### Example 5: Smart Storage with options.store and service_session_id
+
+```python
+# Default StorageContextMiddleware already has smart behavior!
+# load_messages=None (default) means:
+#   - Don't load if options['store'] is False
+#   - Don't load if service_session_id is present (service manages storage)
+#   - Otherwise, load messages
+
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        RedisStorageMiddleware(
+            source_id="memory",
+            redis_url="redis://...",
+            # load_messages=None is the default - smart mode
+        )
+    ]
+)
+
+session = agent.get_new_session()
+
+# Normal run - loads and stores messages
+response = await agent.run("Hello!", session=session)
+
+# Run without loading history (but still stores for audit)
+response = await agent.run(
+    "What's 2+2?",
+    session=session,
+    options={"store": False}  # Don't load history for this call
+)
+
+# With service-managed session - won't load (service handles it)
+service_session = agent.get_new_session(service_session_id="thread_abc123")
+response = await agent.run("Hello!", session=service_session)
+# Storage middleware sees service_session_id, skips loading
+```
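+
+The smart-mode rules can be checked in isolation. The sketch below mirrors the decision order of `_should_load_messages` as a standalone function; `should_load` and the `cases` dict are names invented for this example, not part of the proposed API.
+
+```python
+from __future__ import annotations
+
+from typing import Any
+
+
+def should_load(
+    load_messages: bool | None,
+    service_session_id: str | None,
+    options: dict[str, Any],
+) -> bool:
+    """Standalone mirror of the _should_load_messages decision order (hypothetical helper)."""
+    if load_messages is not None:  # explicit configuration takes precedence
+        return load_messages
+    if service_session_id is not None:  # service manages history
+        return False
+    return bool(options.get("store", True))  # smart mode follows options['store']
+
+
+cases = {
+    "default": should_load(None, None, {}),
+    "store_false": should_load(None, None, {"store": False}),
+    "service_managed": should_load(None, "thread_abc123", {}),
+    "forced_on": should_load(True, "thread_abc123", {"store": False}),
+    "forced_off": should_load(False, None, {}),
+}
+print(cases)
+```
+
+Note that an explicit `load_messages=True` overrides both `service_session_id` and `options['store']`, so it should be set only when duplicate history is not a concern.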
+
+### Example 6: Multiple Instances of Same Middleware Type
+
+```python
+# You can have multiple instances of the same middleware class
+# by using different source_ids
+
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        # Primary storage for conversation history
+        RedisStorageMiddleware(
+            source_id="conversation_memory",
+            redis_url="redis://primary...",
+            load_messages=True,  # This one loads
+        ),
+        # Secondary storage for audit (different Redis instance)
+        RedisStorageMiddleware(
+            source_id="audit_log",
+            redis_url="redis://audit...",
+            load_messages=False,  # This one just stores
+        ),
+    ]
+)
+# No warning is emitted because only one middleware has load_messages=True
+```
+
+### Example 7: Middleware Ordering - RAG Before vs After Memory
+
+The order of middleware determines what context each middleware can see. This is especially important for RAG, which may benefit from seeing conversation history.
+
+```python
+from agent_framework import ChatAgent, ChatMessage
+from agent_framework.context import InMemoryStorageMiddleware, ContextMiddleware, SessionContext
+
+class RAGContextMiddleware(ContextMiddleware):
+    """RAG middleware that retrieves relevant documents based on available context."""
+
+    async def process(self, context: SessionContext, next) -> None:
+        # Build query from what we can see
+        query_parts = []
+
+        # We can always see the current input
+        for msg in context.input_messages:
+            query_parts.append(msg.text)
+
+        # Can we see history? Depends on middleware order!
+        history = context.get_all_messages()  # Gets context from middleware that ran before us
+        if history:
+            # Include recent history for better RAG context
+            recent = history[-3:]  # Last 3 messages
+            for msg in recent:
+                query_parts.append(msg.text)
+
+        query = " ".join(query_parts)
+        documents = await self._retrieve_documents(query)
+
+        # Add retrieved documents as context
+        rag_messages = [ChatMessage(role="system", text=f"Relevant context:\n{doc}") for doc in documents]
+        context.add_messages(self.source_id, rag_messages)
+
+        await next(context)
+
+    async def _retrieve_documents(self, query: str) -> list[str]:
+        # ... vector search implementation
+        return ["doc1", "doc2"]
+
+
+# =============================================================================
+# SCENARIO A: RAG runs BEFORE Memory
+# =============================================================================
+# RAG only sees the current input message - no conversation history
+# Use when: RAG should be based purely on the current query
+
+agent_rag_first = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        RAGContextMiddleware("rag"),           # Runs first - only sees input_messages
+        InMemoryStorageMiddleware("memory"),   # Runs second - loads/stores history
+    ]
+)
+
+# Flow:
+# 1. RAG.process() BEFORE next():
+#    - context.input_messages = ["What's the weather?"]
+#    - context.get_all_messages() = []  (empty - memory hasn't run yet)
+#    - RAG query based on: "What's the weather?" only
+#    - Adds: context_messages["rag"] = [retrieved docs]
+#
+# 2. Memory.process() BEFORE next():
+#    - context.get_all_messages() = [rag docs]  (sees RAG context)
+#    - Loads history: context_messages["memory"] = [previous conversation]
+#
+# 3. Agent invocation with: rag docs + history + input (context messages follow middleware order)
+#
+# 4. Memory.process() AFTER next():
+#    - Stores: input + response (not RAG docs by default)
+
+
+# =============================================================================
+# SCENARIO B: RAG runs AFTER Memory
+# =============================================================================
+# RAG sees conversation history - can use it for better retrieval
+# Use when: RAG should consider conversation context for better results
+
+agent_memory_first = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),   # Runs first - loads history
+        RAGContextMiddleware("rag"),           # Runs second - sees history + input
+    ]
+)
+
+# Flow:
+# 1. Memory.process() BEFORE next():
+#    - Loads history: context_messages["memory"] = [previous conversation]
+#
+# 2. RAG.process() BEFORE next():
+#    - context.input_messages = ["What's the weather?"]
+#    - context.get_all_messages() = [previous conversation]  (sees history!)
+#    - RAG query based on: recent history + "What's the weather?"
+#    - Better retrieval because RAG understands conversation context
+#    - Adds: context_messages["rag"] = [more relevant docs]
+#
+# 3. Agent invocation with: history + rag docs + input
+#
+# 4. Memory.process() AFTER next():
+#    - Stores: input + response
+
+
+# =============================================================================
+# SCENARIO C: RAG after Memory, with selective storage
+# =============================================================================
+# Memory first for better RAG, plus separate audit that stores RAG context
+
+agent_full_context = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),   # Primary history storage
+        RAGContextMiddleware("rag"),           # Gets history context for better retrieval
+        PersonaContextMiddleware("persona"),   # Adds persona instructions
+        # Audit storage - stores everything including RAG results
+        CosmosStorageMiddleware(
+            "audit",
+            load_messages=False,               # Don't load (memory handles that)
+            store_context_messages=True,       # Store RAG + persona context too
+        ),
+    ]
+)
+```
+
+### Example 8: Understanding the Onion Pattern for Storage
+
+```python
+# Detailed breakdown of what storage middleware sees at each phase:
+#
+# Middleware order: [Storage, RAG, Persona]
+#
+# BEFORE next() - Storage pre-processing:
+#   context.context_messages = {}  (empty, no one has added yet)
+#   context.input_messages = [user's message]
+#   context.response_messages = None
+#
+# BEFORE next() - RAG pre-processing:
+#   context.context_messages = {"memory": [...]}  (storage added history)
+#
+# BEFORE next() - Persona pre-processing:
+#   context.context_messages = {"memory": [...], "rag": [...]}
+#
+# --- Agent invocation happens ---
+#
+# AFTER next() - Persona post-processing:
+#   context.response_messages = [assistant's response]
+#
+# AFTER next() - RAG post-processing:
+#   (same state)
+#
+# AFTER next() - Storage post-processing:
+#   context.context_messages = {"memory": [...], "rag": [...], "persona": [...]}
+#   context.response_messages = [assistant's response]
+#
+#   Storage NOW has access to ALL context if store_context_messages=True
+
+class StorageWithLogging(StorageContextMiddleware):
+    """Example showing what storage sees at each phase."""
+
+    async def process(self, context: SessionContext, next) -> None:
+        # PRE: Load history
+        print(f"PRE - context sources: {list(context.context_messages.keys())}")
+        # Output: PRE - context sources: []
+
+        if self._should_load_messages(context):
+            history = await self.get_messages(context.session_id)
+            context.add_messages(self.source_id, history)
+
+        await next(context)
+
+        # POST: Now we see everything
+        print(f"POST - context sources: {list(context.context_messages.keys())}")
+        # Output: POST - context sources: ['memory', 'rag', 'persona']
+
+        # Store based on configuration
+        # 1. Determine which context messages to include
+        if self.store_context_messages:
+            if self.store_context_from:
+                # Only from specific sources
+                context_msgs = context.get_messages(sources=self.store_context_from)
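+            else:
+                # All context messages from other middleware (excluding our own
+                # loaded history, which would otherwise be stored twice)
+                context_msgs = context.get_messages(exclude_sources=[self.source_id])
+        else:
+            # Don't store context messages; only inputs and responses are added below
+            context_msgs = []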
+
+        # 2. Build final list: context + input + response
+        messages_to_store = list(context_msgs)
+        if self.store_inputs:
+            messages_to_store.extend(context.input_messages)
+        if self.store_responses:
+            messages_to_store.extend(context.response_messages or [])
+
+        if messages_to_store:
+            await self.save_messages(context.session_id, messages_to_store)
+```
+
+---
+
+### Workplan
+
+#### Phase 1: Core Implementation
+- [ ] Create `ContextMiddleware` base class in `_context_middleware.py` (onion/wrapper pattern)
+- [ ] Create `SessionContext` class with explicit add/get methods
+- [ ] Create `ContextMiddlewarePipeline` with `from_config()` factory method
+- [ ] Create `ContextMiddlewareFactory` type alias and resolution logic
+- [ ] Create `StorageContextMiddleware` base class with load_messages/store flags
+- [ ] Implement pipeline validation (warn on multiple loaders with `load_messages=True`)
+
+#### Phase 2: AgentSession Implementation
+- [ ] Create `AgentSession` class with `context_pipeline` attribute
+- [ ] Add `context_middleware: Sequence[ContextMiddlewareConfig]` parameter to `BaseAgent` and `ChatAgent`
+- [ ] Implement `get_new_session()` that resolves factories and creates pipeline
+- [ ] Wire up context pipeline execution in agent invocation flow
+- [ ] Remove `AgentThread` completely (no alias, clean break)
+
+#### Phase 3: Built-in Middleware
+- [ ] Create `InMemoryStorageMiddleware` (replaces `ChatMessageStore`)
+- [ ] Create `@context_middleware` decorator for function-based middleware
+
+#### Phase 4: Migrate Existing Implementations
+- [ ] Migrate `AzureAISearchContextProvider` → `AzureAISearchContextMiddleware`
+- [ ] Migrate `RedisProvider` → `RedisStorageMiddleware`
+- [ ] Migrate `Mem0Provider` → `Mem0ContextMiddleware`
+- [ ] Create optional `ContextProviderAdapter` for gradual migration (if needed)
+
+#### Phase 5: Cleanup & Documentation
+- [ ] Remove `ContextProvider` class
+- [ ] Remove `ChatMessageStore` / `ChatMessageStoreProtocol`
+- [ ] Update all samples to use new middleware pattern
+- [ ] Write migration guide
+- [ ] Update API documentation
+
+#### Phase 6: Testing
+- [ ] Unit tests for `ContextMiddleware` and pipeline execution order
+- [ ] Unit tests for middleware factory resolution
+- [ ] Unit tests for `StorageContextMiddleware` load/store behavior
+- [ ] Unit tests for `options.store` and `service_session_id` triggers
+- [ ] Unit tests for source attribution (mandatory source_id)
+- [ ] Unit tests for `store_context_messages` and `store_context_from` options
+- [ ] Integration tests for full agent flow with middleware
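
The Phase 1 item "Implement pipeline validation (warn on multiple loaders with `load_messages=True`)" could be sketched roughly as follows. This is a minimal illustration, not the ADR's implementation: `StorageStub` and `validate_pipeline` are hypothetical names, and the only attribute assumed from the design is a boolean `load_messages` flag on storage middleware.

```python
import warnings
from dataclasses import dataclass


@dataclass
class StorageStub:
    """Hypothetical stand-in for a StorageContextMiddleware instance."""

    source_id: str
    load_messages: bool = True


def validate_pipeline(middleware: list) -> list[str]:
    """Return the source_ids of all loaders and warn when there is more than one."""
    # Middleware without a load_messages attribute is treated as a non-loader.
    loaders = [mw.source_id for mw in middleware if getattr(mw, "load_messages", False)]
    if len(loaders) > 1:
        warnings.warn(
            f"More than one storage middleware has load_messages=True: {loaders}",
            UserWarning,
            stacklevel=2,
        )
    return loaders
```

The check warns rather than raises, which matches the wording of the workplan item and leaves room for setups that deliberately load from several stores.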

From 351b533c13032be30b3fa23795112e9d3159473b Mon Sep 17 00:00:00 2001
From: Eduard van Valkenburg <github@vanvalkenburg.eu>
Date: Mon, 2 Feb 2026 10:29:03 +0100
Subject: [PATCH 02/19] Add session serialization/deserialization design to ADR

---
 .../00XX-python-context-middleware.md         | 126 ++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 97f6e5e3a4..daa1c795cf 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -119,6 +119,7 @@ The following key decisions shape the ContextMiddleware design:
 | 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
 | 11 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
 | 12 | **Middleware Ordering** | User-defined order; storage sees prior middleware (pre-processing) or all middleware (post-processing). |
+| 13 | **Session Serialization via Agent** | `session.serialize()` captures middleware state; `agent.restore_session()` reconstructs pipeline. Each middleware implements optional `serialize()`/`restore()`. |
 
 ## Considered Options
 
@@ -314,6 +315,126 @@ agent = ChatAgent(
 - "Thread" implies a sequence of messages
 - "Session" better captures the broader scope (state, middleware, lifecycle)
 
+#### 8. Session Serialization/Deserialization
+
+Sessions need to be serializable for persistence across process restarts. Serialization happens through **agent methods** (not directly on session) because the agent holds the middleware configuration needed to reconstruct the pipeline.
+
+```python
+class ContextMiddleware(ABC):
+    """Each middleware can optionally implement serialization."""
+    
+    async def serialize(self) -> Any:
+        """Serialize middleware state to a persistable object.
+        
+        Returns any object that can be serialized (typically dict for JSON).
+        Default returns None (no state to persist).
+        """
+        return None
+    
+    async def restore(self, state: Any) -> None:
+        """Restore middleware state from a previously serialized object.
+        
+        Args:
+            state: The object returned by serialize()
+        """
+        pass
+
+
+class InMemoryStorageMiddleware(StorageContextMiddleware):
+    """Example: In-memory storage serializes its messages."""
+    
+    async def serialize(self) -> dict[str, Any]:
+        return {
+            "source_id": self.source_id,
+            "messages": [msg.to_dict() for msg in self._messages],
+        }
+    
+    async def restore(self, state: dict[str, Any]) -> None:
+        self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
+
+
+class AgentSession:
+    """Session serialization delegates to middleware."""
+    
+    async def serialize(self) -> dict[str, Any]:
+        """Serialize session state including all middleware state."""
+        middleware_states: dict[str, Any] = {}
+        if self._context_pipeline:
+            for middleware in self._context_pipeline:
+                state = await middleware.serialize()
+                if state is not None:
+                    middleware_states[middleware.source_id] = state
+        
+        return {
+            "session_id": self._session_id,
+            "service_session_id": self._service_session_id,
+            "middleware_states": middleware_states,
+        }
+
+
+class ChatAgent:
+    """Agent handles session restore because it owns middleware config."""
+    
+    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
+        """Restore a session from serialized state.
+        
+        The agent must restore the session because it holds the middleware
+        configuration needed to reconstruct the pipeline.
+        
+        Args:
+            serialized: Previously serialized session state
+            
+        Returns:
+            Restored AgentSession with middleware state restored
+        """
+        session_id = serialized.get("session_id")
+        service_session_id = serialized.get("service_session_id")
+        middleware_states = serialized.get("middleware_states", {})
+        
+        # Create fresh session with new pipeline
+        session = self.get_new_session(
+            session_id=session_id,
+            service_session_id=service_session_id,
+        )
+        
+        # Restore middleware state by source_id
+        if session.context_pipeline:
+            for middleware in session.context_pipeline:
+                if middleware.source_id in middleware_states:
+                    await middleware.restore(middleware_states[middleware.source_id])
+        
+        return session
+```
+
+**Usage:**
+```python
+# Save session
+state = await session.serialize()
+json_str = json.dumps(state)  # Or store in database, Redis, etc.
+
+# Later: restore session
+state = json.loads(json_str)
+session = await agent.restore_session(state)
+
+# Continue conversation
+response = await agent.run("What did we talk about?", session=session)
+```
+
+**Key Points:**
+- `serialize()` returns `Any` - typically dict for JSON, but could be bytes, protobuf, etc.
+- Each middleware decides what state to persist (messages, counters, embeddings, etc.)
+- Stateless middleware returns `None` from `serialize()` (skipped in output)
+- Agent reconstructs pipeline from its config, then restores state by `source_id`
+- `source_id` acts as the key to match serialized state to middleware instances
+
+**Comparison to Current:**
+| Aspect | AgentThread (Current) | AgentSession (New) |
+|--------|----------------------|-------------------|
+| Serialization | `thread.serialize()` → dict with messages | `session.serialize()` → dict with middleware states |
+| Deserialization | `AgentThread.deserialize(state, message_store=...)` | `agent.restore_session(state)` |
+| What's saved | Just messages | Each middleware's custom state |
+| Restore location | Class method on AgentThread | Instance method on Agent |
+
 ### Migration Impact
 
 | Current | New | Notes |
@@ -1658,16 +1779,20 @@ class StorageWithLogging(StorageContextMiddleware):
 - [ ] Create `ContextMiddlewareFactory` type alias and resolution logic
 - [ ] Create `StorageContextMiddleware` base class with load_messages/store flags
 - [ ] Implement pipeline validation (warn on multiple loaders with `load_messages=True`)
+- [ ] Add `serialize()` and `restore()` methods to `ContextMiddleware` base class
 
 #### Phase 2: AgentSession Implementation
 - [ ] Create `AgentSession` class with `context_pipeline` attribute
 - [ ] Add `context_middleware: Sequence[ContextMiddlewareConfig]` parameter to `BaseAgent` and `ChatAgent`
 - [ ] Implement `get_new_session()` that resolves factories and creates pipeline
 - [ ] Wire up context pipeline execution in agent invocation flow
+- [ ] Implement `AgentSession.serialize()` to capture middleware states
+- [ ] Implement `Agent.restore_session()` to reconstruct session from serialized state
 - [ ] Remove `AgentThread` completely (no alias, clean break)
 
 #### Phase 3: Built-in Middleware
 - [ ] Create `InMemoryStorageMiddleware` (replaces `ChatMessageStore`)
+- [ ] Implement `serialize()`/`restore()` for `InMemoryStorageMiddleware`
 - [ ] Create `@context_middleware` decorator for function-based middleware
 
 #### Phase 4: Migrate Existing Implementations
@@ -1690,4 +1815,5 @@ class StorageWithLogging(StorageContextMiddleware):
 - [ ] Unit tests for `options.store` and `service_session_id` triggers
 - [ ] Unit tests for source attribution (mandatory source_id)
 - [ ] Unit tests for `store_context_messages` and `store_context_from` options
+- [ ] Unit tests for session serialization/deserialization
 - [ ] Integration tests for full agent flow with middleware
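
The `source_id`-keyed round trip described in §8 can be exercised without the framework. The sketch below is a simplified stand-in under stated assumptions: `CounterMiddleware`, `StatelessMiddleware`, `serialize_pipeline`, `restore_pipeline`, and `build_pipeline` are hypothetical names, and a plain list plays the role of the pipeline. It only illustrates that stateless middleware is skipped in the output and that state is matched back by `source_id`.

```python
import asyncio
import json
from typing import Any


class CounterMiddleware:
    """Hypothetical stateful middleware that persists a single counter."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id
        self.count = 0

    async def serialize(self) -> Any:
        return {"count": self.count}

    async def restore(self, state: Any) -> None:
        self.count = state.get("count", 0)


class StatelessMiddleware:
    """Hypothetical middleware with nothing to persist (serialize returns None)."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id

    async def serialize(self) -> Any:
        return None

    async def restore(self, state: Any) -> None:
        pass


def build_pipeline() -> list:
    # Stands in for the agent rebuilding a pipeline from its own configuration.
    return [CounterMiddleware("counter"), StatelessMiddleware("rag")]


async def serialize_pipeline(pipeline: list) -> dict[str, Any]:
    states: dict[str, Any] = {}
    for middleware in pipeline:
        state = await middleware.serialize()
        if state is not None:  # stateless middleware is left out of the output
            states[middleware.source_id] = state
    return {"middleware_states": states}


async def restore_pipeline(pipeline: list, serialized: dict[str, Any]) -> None:
    states = serialized.get("middleware_states", {})
    for middleware in pipeline:
        if middleware.source_id in states:
            await middleware.restore(states[middleware.source_id])


async def round_trip() -> tuple[str, int]:
    original = build_pipeline()
    original[0].count = 3
    payload = json.dumps(await serialize_pipeline(original))

    fresh = build_pipeline()  # new instances, as after a process restart
    await restore_pipeline(fresh, json.loads(payload))
    return payload, fresh[0].count
```

Because state is keyed by `source_id`, two middleware instances sharing one `source_id` would overwrite each other's entry in this sketch, so unique identifiers per pipeline appear to be a precondition for this scheme.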

From 6e0dd2961d0793986df5b3abba78a8cb07de3ba0 Mon Sep 17 00:00:00 2001
From: Eduard van Valkenburg <github@vanvalkenburg.eu>
Date: Mon, 2 Feb 2026 10:43:48 +0100
Subject: [PATCH 03/19] Add Related Issues section mapping to ADR

---
 docs/decisions/00XX-python-context-middleware.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index daa1c795cf..80d5bdea7b 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -35,6 +35,18 @@ This creates cognitive overhead for developers doing "Context Engineering" - the
 - **Attribution**: Enable tracking which middleware added which messages/tools
 - **Zero-config**: Simple use cases should work without configuration
 
+## Related Issues
+
+This ADR addresses the following issues from the parent issue [#3575](https://github.com/microsoft/agent-framework/issues/3575):
+
+| Issue | Title | How Addressed |
+|-------|-------|---------------|
+| [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
+| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.get_new_session()` creates session with resolved middleware pipeline. `agent.restore_session(state)` restores from serialized state. See [§8 Serialization](#8-session-serializationdeserialization). |
+| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ `session.serialize()` captures state, but `agent.restore_session()` handles restoration (agent owns middleware config). See [§8 Serialization](#8-session-serializationdeserialization). |
+| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: `service_session_id` presence triggers smart behavior (don't load if service manages storage). Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
+| [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | ✅ Superseded: `ChatMessageStore` removed entirely. Replaced by `StorageContextMiddleware` (e.g., `InMemoryStorageMiddleware`). The middleware pattern is more flexible than a "provider" rename. |
+
 ## Current State Analysis
 
 ### ContextProvider (Current)

From 10e7700abd91d3f69ba53a1e8855186b732b92ff Mon Sep 17 00:00:00 2001
From: Eduard van Valkenburg <github@vanvalkenburg.eu>
Date: Mon, 2 Feb 2026 10:55:00 +0100
Subject: [PATCH 04/19] Update session management: create_session,
 get_session_by_id, agent.serialize_session

---
 .../00XX-python-context-middleware.md         | 157 +++++++++++++-----
 1 file changed, 120 insertions(+), 37 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 80d5bdea7b..b450e8ba0c 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -42,10 +42,10 @@ This ADR addresses the following issues from the parent issue [#3575](https://gi
 | Issue | Title | How Addressed |
 |-------|-------|---------------|
 | [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
-| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.get_new_session()` creates session with resolved middleware pipeline. `agent.restore_session(state)` restores from serialized state. See [§8 Serialization](#8-session-serializationdeserialization). |
-| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ `session.serialize()` captures state, but `agent.restore_session()` handles restoration (agent owns middleware config). See [§8 Serialization](#8-session-serializationdeserialization). |
+| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` (no params) and `agent.get_session_by_id(id)`. See [§9 Session Management Methods](#9-session-management-methods). |
+| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. See [§8 Serialization](#8-session-serializationdeserialization). |
 | [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: `service_session_id` presence triggers smart behavior (don't load if service manages storage). Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
-| [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | ✅ Superseded: `ChatMessageStore` removed entirely. Replaced by `StorageContextMiddleware` (e.g., `InMemoryStorageMiddleware`). The middleware pattern is more flexible than a "provider" rename. |
+| [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | 🔒 **Closed** - Superseded by this ADR. `ChatMessageStore` removed entirely, replaced by `StorageContextMiddleware`. |
 
 ## Current State Analysis
 
@@ -131,7 +131,8 @@ The following key decisions shape the ContextMiddleware design:
 | 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
 | 11 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
 | 12 | **Middleware Ordering** | User-defined order; storage sees prior middleware (pre-processing) or all middleware (post-processing). |
-| 13 | **Session Serialization via Agent** | `session.serialize()` captures middleware state; `agent.restore_session()` reconstructs pipeline. Each middleware implements optional `serialize()`/`restore()`. |
+| 13 | **Agent-owned Serialization** | `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. |
+| 14 | **Session Management Methods** | `agent.create_session()` (no required params) and `agent.get_session_by_id(id)` for clear lifecycle management. |
 
 ## Considered Options
 
@@ -209,7 +210,7 @@ agent = ChatAgent(
 )
 
 # Session holds the resolved pipeline
-session = agent.get_new_session()
+session = agent.create_session()
 ```
 
 **Comparison to Current:**
@@ -285,7 +286,7 @@ Zero-config works out of the box:
 ```python
 # No middleware configured - still gets conversation history!
 agent = ChatAgent(chat_client=client, name="assistant")
-session = agent.get_new_session()
+session = agent.create_session()
 response = await agent.run("Hello!", session=session)
 response = await agent.run("What did I say?", session=session)  # Remembers!
 ```
@@ -329,7 +330,7 @@ agent = ChatAgent(
 
 #### 8. Session Serialization/Deserialization
 
-Sessions need to be serializable for persistence across process restarts. Serialization happens through **agent methods** (not directly on session) because the agent holds the middleware configuration needed to reconstruct the pipeline.
+Sessions need to be serializable for persistence across process restarts. Serialization happens through **agent methods** because the agent holds the middleware configuration needed to reconstruct the pipeline.
 
 ```python
 class ContextMiddleware(ABC):
@@ -365,27 +366,33 @@ class InMemoryStorageMiddleware(StorageContextMiddleware):
         self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
 
 
-class AgentSession:
-    """Session serialization delegates to middleware."""
+class ChatAgent:
+    """Agent handles all session serialization."""
     
-    async def serialize(self) -> dict[str, Any]:
-        """Serialize session state including all middleware state."""
+    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
+        """Serialize a session's state for persistence.
+        
+        The agent handles serialization because it understands the middleware
+        configuration and can coordinate state capture across all middleware.
+        
+        Args:
+            session: The session to serialize
+            
+        Returns:
+            Serialized state that can be persisted (JSON-compatible dict)
+        """
         middleware_states: dict[str, Any] = {}
-        if self._context_pipeline:
-            for middleware in self._context_pipeline:
+        if session.context_pipeline:
+            for middleware in session.context_pipeline:
                 state = await middleware.serialize()
                 if state is not None:
                     middleware_states[middleware.source_id] = state
         
         return {
-            "session_id": self._session_id,
-            "service_session_id": self._service_session_id,
+            "session_id": session.session_id,
+            "service_session_id": session.service_session_id,
             "middleware_states": middleware_states,
         }
-
-
-class ChatAgent:
-    """Agent handles session restore because it owns middleware config."""
     
     async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
         """Restore a session from serialized state.
@@ -404,7 +411,7 @@ class ChatAgent:
         middleware_states = serialized.get("middleware_states", {})
         
         # Create fresh session with new pipeline
-        session = self.get_new_session(
+        session = self.create_session(
             session_id=session_id,
             service_session_id=service_session_id,
         )
@@ -421,7 +428,7 @@ class ChatAgent:
 **Usage:**
 ```python
 # Save session
-state = await session.serialize()
+state = await agent.serialize_session(session)
 json_str = json.dumps(state)  # Or store in database, Redis, etc.
 
 # Later: restore session
@@ -433,19 +440,95 @@ response = await agent.run("What did we talk about?", session=session)
 ```
 
 **Key Points:**
+- `agent.serialize_session(session)` - agent handles serialization
+- `agent.restore_session(state)` - agent handles restoration
+- Each middleware implements optional `serialize()` and `restore()` methods
 - `serialize()` returns `Any` - typically dict for JSON, but could be bytes, protobuf, etc.
-- Each middleware decides what state to persist (messages, counters, embeddings, etc.)
 - Stateless middleware returns `None` from `serialize()` (skipped in output)
-- Agent reconstructs pipeline from its config, then restores state by `source_id`
 - `source_id` acts as the key to match serialized state to middleware instances
 
 **Comparison to Current:**
 | Aspect | AgentThread (Current) | AgentSession (New) |
 |--------|----------------------|-------------------|
-| Serialization | `thread.serialize()` → dict with messages | `session.serialize()` → dict with middleware states |
+| Serialization | `thread.serialize()` | `agent.serialize_session(session)` |
 | Deserialization | `AgentThread.deserialize(state, message_store=...)` | `agent.restore_session(state)` |
 | What's saved | Just messages | Each middleware's custom state |
-| Restore location | Class method on AgentThread | Instance method on Agent |
+| Owner | Thread class | Agent instance |
+
+#### 9. Session Management Methods
+
+The agent provides clear methods for session lifecycle management:
+
+```python
+class ChatAgent:
+    def create_session(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+    ) -> AgentSession:
+        """Create a new session with a fresh middleware pipeline.
+        
+        This is the primary way to create sessions. Middleware factories
+        are called with the session_id to create session-specific instances.
+        
+        Args:
+            session_id: Optional session ID (generated if not provided)
+            service_session_id: Optional service-managed session ID
+            
+        Returns:
+            New AgentSession with resolved middleware pipeline
+        """
+        resolved_session_id = session_id or str(uuid.uuid4())
+        
+        pipeline = None
+        if self._context_middleware:
+            pipeline = ContextMiddlewarePipeline.from_config(
+                self._context_middleware,
+                session_id=resolved_session_id,
+            )
+        
+        return AgentSession(
+            session_id=resolved_session_id,
+            service_session_id=service_session_id,
+            context_pipeline=pipeline,
+        )
+    
+    def get_session_by_id(self, session_id: str) -> AgentSession:
+        """Get a session by ID with a fresh middleware pipeline.
+        
+        Use this when you have a session ID but no persisted state.
+        The middleware pipeline is freshly created (no state restored).
+        
+        For restoring a session with state, use restore_session() instead.
+        
+        Args:
+            session_id: The session ID to use
+            
+        Returns:
+            AgentSession with the specified ID and fresh middleware
+        """
+        return self.create_session(session_id=session_id)
+```
+
+**Usage:**
+```python
+# Create a brand new session
+session = agent.create_session()
+
+# Create session with specific ID (e.g., from external system)
+session = agent.create_session(session_id="user-123-session-456")
+
+# Create session for service-managed storage
+session = agent.create_session(service_session_id="thread_abc123")
+
+# Get session by ID (fresh pipeline, no state)
+session = agent.get_session_by_id("existing-session-id")
+
+# Restore session with full state
+state = load_from_database(session_id)
+session = await agent.restore_session(state)
+```
 
 ### Migration Impact
 
@@ -497,7 +580,7 @@ agent = ChatAgent(
         RAGMiddleware("rag"),
     ]
 )
-session = agent.get_new_session()
+session = agent.create_session()
 response = await agent.run("Hello", session=session)
 ```
 
@@ -1057,7 +1140,7 @@ class AgentSession:
     AgentSession manages the conversation state and owns a ContextMiddlewarePipeline
     that processes context before each invocation and handles responses after.
 
-    Note: The session is created by calling agent.get_new_session(), which constructs
+    Note: The session is created by calling agent.create_session(), which constructs
     the pipeline from the agent's context_middleware sequence, resolving any factories.
 
     Attributes:
@@ -1075,7 +1158,7 @@ class AgentSession:
     ):
         """Initialize the session.
 
-        Note: Prefer using agent.get_new_session() instead of direct construction.
+        Note: Prefer using agent.create_session() instead of direct construction.
 
         Default storage behavior (applied at runtime, not init):
         - If service_session_id is set: service handles storage, no default added
@@ -1208,7 +1291,7 @@ class ChatAgent:
         self._context_middleware = list(context_middleware or [])
         # ... other init
 
-    def get_new_session(
+    def create_session(
         self,
         *,
         session_id: str | None = None,
@@ -1262,7 +1345,7 @@ agent = ChatAgent(
 )
 
 # Create session - automatically gets InMemoryStorageMiddleware on first run
-session = agent.get_new_session()
+session = agent.create_session()
 response = await agent.run("Hello, my name is Alice!", session=session)
 
 # Conversation history is preserved automatically
@@ -1270,13 +1353,13 @@ response = await agent.run("What's my name?", session=session)
 # Agent remembers: "Your name is Alice!"
 
 # With service-managed session - no default storage added (service handles it)
-service_session = agent.get_new_session()
+service_session = agent.create_session()
 
 # With store=True in options - user expects service storage, no default added
 response = await agent.run("Hello!", session=session, options={"store": True})
 
 # User can manually add middleware to session before first run
-session = agent.get_new_session()
+session = agent.create_session()
 session.context_pipeline = ContextMiddlewarePipeline([
     MyCustomMiddleware(source_id="custom")
 ])
@@ -1299,7 +1382,7 @@ agent = ChatAgent(
 )
 
 # Create session and chat
-session = agent.get_new_session()
+session = agent.create_session()
 response = await agent.run("Hello!", session=session)
 
 # Messages are automatically stored and loaded on next invocation
@@ -1348,8 +1431,8 @@ agent = ChatAgent(
 )
 
 # Each session gets a fresh SessionSpecificMiddleware instance
-session1 = agent.get_new_session()
-session2 = agent.get_new_session()
+session1 = agent.create_session()
+session2 = agent.create_session()
 # session1 and session2 have independent invocation_count
 ```
 
@@ -1539,7 +1622,7 @@ agent = ChatAgent(
     ]
 )
 
-session = agent.get_new_session()
+session = agent.create_session()
 
 # Normal run - loads and stores messages
 response = await agent.run("Hello!", session=session)
@@ -1796,7 +1879,7 @@ class StorageWithLogging(StorageContextMiddleware):
 #### Phase 2: AgentSession Implementation
 - [ ] Create `AgentSession` class with `context_pipeline` attribute
 - [ ] Add `context_middleware: Sequence[ContextMiddlewareConfig]` parameter to `BaseAgent` and `ChatAgent`
-- [ ] Implement `get_new_session()` that resolves factories and creates pipeline
+- [ ] Implement `create_session()` that resolves factories and creates pipeline
 - [ ] Wire up context pipeline execution in agent invocation flow
 - [ ] Implement `AgentSession.serialize()` to capture middleware states
 - [ ] Implement `Agent.restore_session()` to reconstruct session from serialized state
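
The relationship between `create_session()` and `get_session_by_id()` in §9, and the claim that factories yield session-specific instances, can be illustrated with a small stand-alone sketch. All names here (`SketchAgent`, `Session`, `CountingMiddleware`) are hypothetical, and the factory signature (a callable taking the session ID) is an assumption, since the ADR text in this patch does not spell out `ContextMiddlewareFactory`.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountingMiddleware:
    """Hypothetical per-session middleware with independent state."""

    source_id: str
    session_id: str
    invocation_count: int = 0


@dataclass
class Session:
    """Hypothetical stand-in for AgentSession."""

    session_id: str
    service_session_id: Optional[str]
    pipeline: list


class SketchAgent:
    def __init__(self, context_middleware: list) -> None:
        self._context_middleware = list(context_middleware)

    def create_session(
        self,
        *,
        session_id: Optional[str] = None,
        service_session_id: Optional[str] = None,
    ) -> Session:
        resolved_id = session_id or str(uuid.uuid4())
        # Assumed resolution rule: callables are factories, anything else is an instance.
        pipeline = [
            item(resolved_id) if callable(item) else item
            for item in self._context_middleware
        ]
        return Session(resolved_id, service_session_id, pipeline)

    def get_session_by_id(self, session_id: str) -> Session:
        # Same as create_session with a fixed ID: fresh pipeline, no restored state.
        return self.create_session(session_id=session_id)
```

A short usage example under the same assumptions:

```python
agent = SketchAgent([lambda sid: CountingMiddleware("counter", sid)])
first = agent.create_session()
second = agent.get_session_by_id("user-123-session-456")
first.pipeline[0].invocation_count += 1  # does not affect `second`
```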

From 307337378d39c8b5590ae6b4a4ff8879be83e865 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 12:11:54 +0100
Subject: [PATCH 05/19] ADR: Add hooks alternative, context compaction
 discussion, and PR feedback

- Add Option 3: ContextHooks with before_run/after_run pattern
- Add detailed pros/cons for both wrapper and hooks approaches
- Add Open Discussion section on context compaction strategies
- Clarify response_messages is read-only (use AgentMiddleware for modifications)
- Add SimpleRAG examples showing input-only filtering
- Clarify default storage only added when NO middleware configured
- Add RAGWithBuffer examples for self-managed history
- Rename hook methods to before_run/after_run
---
 .../00XX-python-context-middleware.md         | 884 +++++++++++++++++-
 1 file changed, 843 insertions(+), 41 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index b450e8ba0c..0e2ffbcceb 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -35,6 +35,78 @@ This creates cognitive overhead for developers doing "Context Engineering" - the
 - **Attribution**: Enable tracking which middleware added which messages/tools
 - **Zero-config**: Simple use cases should work without configuration
 
+## Open Discussion: Context Compaction
+
+### Problem Statement
+
+A common need for long-running agents is **context compaction** - automatically summarizing or truncating conversation history when approaching token limits. This is particularly important for agents that make many tool calls in succession (tens or hundreds), where the context can grow without bound.
+
+Currently, this is challenging because:
+- `ChatMessageStore.list_messages()` is only called once at the start of `agent.run()`, not during the tool loop
+- `ChatMiddleware` operates on a copy of messages, so modifications don't persist across tool loop iterations
+- The function calling loop happens deep within the `ChatClient`, which is below the agent level
+
+### Design Question
+
+Should `ContextMiddleware`/`ContextHooks` be invoked:
+1. **Only at agent invocation boundaries** (current proposal) - before/after each `agent.run()` call
+2. **During the tool loop** - before/after each model call within a single `agent.run()`
+
+### Boundary vs In-Run Compaction
+
+While boundary and in-run compaction could potentially use the same mechanism, they have **different goals and behaviors**:
+
+**Boundary compaction** (before/after `agent.run()`):
+- **Before run**: Keep context manageable - load a compacted view of history
+- **After run**: Keep storage compact - summarize/truncate before persisting
+- Useful for maintaining reasonable context sizes across conversation turns
+- One reason to have **multiple storage middleware**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
+
+**In-run compaction** (during function calling loops):
+- Relevant for **function calling scenarios** where many tool calls accumulate
+- Typically **in-memory only** - there is no need to persist intermediate compaction, and it is only useful when the conversation/session is _not_ managed by the service
+- Different strategies apply:
+  - Remove old function call/result pairs entirely, keeping only the most recent N tool interactions
+  - Replace call/result pairs with a single summary message (with a different role)
+  - Summarize several function call/result pairs into one larger context message
+
+### Service-Managed vs Local Storage
+
+**Important:** In-run compaction is relevant only for **non-service-managed histories**. When using service-managed storage (`service_session_id` is set):
+- The service handles history management internally
+- Only the new calls and results are sent to/from the service each turn
+- The service is responsible for its own compaction strategy, which the framework does not control
+
+For local storage, the full message list is sent to the model on every call, making compaction the client's responsibility.
+
+### Options
+
+**Option A: Invocation-boundary only (current proposal)**
+- Simpler mental model
+- Consistent with `AgentMiddleware` pattern
+- In-run compaction would need to happen via a separate mechanism (e.g., `ChatMiddleware` at the client level)
+- Risk: Different compaction mechanisms at different layers could be confusing
+
+**Option B: Also during tool loops**
+- Single mechanism for all context manipulation
+- More powerful but more complex
+- Requires coordination with `ChatClient` internals
+- Risk: Performance overhead if middleware/hooks are expensive
+
+**Option C: Unified approach across layers**
+- Define a single context compaction abstraction that works at both agent and client levels
+- `ContextMiddleware`/`ContextHooks` could delegate to `ChatMiddleware` for mid-loop execution
+- Requires deeper architectural thought
+
+### Potential Extension Points (for any option)
+
+Regardless of the chosen approach, these extension points could support compaction:
+- A `CompactionStrategy` that can be shared between middleware/hooks and function calling configuration
+- Hooks for `ChatClient` to notify the agent layer when context limits are approaching
+- A unified `ContextManager` that coordinates compaction across layers
+
+**This section requires further discussion.**
+
 ## Related Issues
 
 This ADR addresses the following issues from the parent issue [#3575](https://github.com/microsoft/agent-framework/issues/3575):
@@ -143,30 +215,588 @@ Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concep
 **Pros:**
 - No migration required
 - Familiar to existing users
+- Each concept has a clear, focused responsibility
+- Existing documentation and examples remain valid
 
 **Cons:**
-- Cognitive overhead remains
-- No composability for context providers
-- Inconsistent with middleware pattern used elsewhere
+- Cognitive overhead: three concepts to learn for context management
+- No composability: only one `ContextProvider` per thread
+- Inconsistent with middleware pattern used elsewhere in the framework
+- `invoking()`/`invoked()` split makes related pre/post logic harder to follow
+- No source attribution for debugging which provider added which context
+- `ChatMessageStore` and `ContextProvider` overlap conceptually but are separate APIs
 
-### Option 2: ContextMiddleware (Chosen)
+### Option 2: ContextMiddleware - Wrapper Pattern
 
 Create a unified `ContextMiddleware` that uses the onion/wrapper pattern (like existing `AgentMiddleware`, `ChatMiddleware`) to handle all context-related concerns.
 
+```python
+class ContextMiddleware(ABC):
+    def __init__(self, source_id: str, *, session_id: str | None = None):
+        self.source_id = source_id
+        self.session_id = session_id
+
+    @abstractmethod
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        """Wrap the context flow - modify before next(), process after."""
+        # Pre-processing: add context, modify messages
+        context.add_messages(self.source_id, [...])
+
+        await next(context)  # Call next middleware or terminal handler
+
+        # Post-processing: log, store, react to response
+        await self.store(context.response_messages)
+```
+
 **Pros:**
 - Single concept for all context engineering
-- Familiar pattern from other middleware in the framework
-- Natural composition via pipeline
-- Pre/post processing in one method
+- Familiar pattern from other middleware in the framework (`AgentMiddleware`, `ChatMiddleware`)
+- Natural composition via pipeline with clear execution order
+- Pre/post processing in one method keeps related logic together
 - Source attribution built-in
+- Full control over the invocation chain (can short-circuit, retry, wrap with try/except)
+- Exception handling naturally scoped to the middleware that caused it
+
+**Cons:**
+- Forgetting `await next(context)` silently breaks the chain
+- Stack depth increases with each middleware layer
+- Harder to implement middleware that only needs pre OR post processing
+
+### Option 3: ContextHooks - Pre/Post Pattern
+
+Create a `ContextHooks` abstraction with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware.
+
+```python
+class ContextHooks(ABC):
+    def __init__(self, source_id: str, *, session_id: str | None = None):
+        self.source_id = source_id
+        self.session_id = session_id
+
+    async def before_run(self, context: SessionContext) -> None:
+        """Called before model invocation. Modify context here."""
+        pass
+
+    async def after_run(self, context: SessionContext) -> None:
+        """Called after model invocation. React to response here."""
+        pass
+```
+
+**Alternative naming options:**
+
+| Name | Rationale |
+|------|-----------|
+| `ContextHooks` | Emphasizes the hook-based nature, familiar from React/Git hooks |
+| `ContextHandler` | Generic term for something that handles context events |
+| `ContextInterceptor` | Common in Java/Spring, emphasizes interception points |
+| `ContextProcessor` | Emphasizes processing at defined stages |
+| `ContextPlugin` | Emphasizes extensibility, familiar from build tools |
+| `SessionHooks` | Ties to `AgentSession`, emphasizes session lifecycle |
+| `InvokeHooks` | Directly describes what's being hooked (the invoke call) |
+
+**Example usage:**
+
+```python
+class RAGHooks(ContextHooks):
+    async def before_run(self, context: SessionContext) -> None:
+        docs = await self.retrieve_documents(context.input_messages[-1].text)
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+
+    async def after_run(self, context: SessionContext) -> None:
+        await self.store_interaction(context.input_messages, context.response_messages)
+
+
+# Pipeline execution is linear, not nested:
+# 1. hook1.before_run(context)
+# 2. hook2.before_run(context)
+# 3. <model invocation>
+# 4. hook2.after_run(context)  # Reverse order for symmetry
+# 5. hook1.after_run(context)
+
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[
+        InMemoryStorageHooks("memory"),
+        RAGHooks("rag"),
+    ]
+)
+```
+
+**Pros:**
+- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
+- Clearer separation between what context hooks do and what `AgentMiddleware` can do
+- Impossible to forget calling `next()` - the framework handles sequencing
+- Easier to implement hooks that only need one phase (just override one method)
+- Lower cognitive overhead for developers new to middleware patterns
+- Clearer separation of concerns: pre-processing logic separate from post-processing
+- Easier to test: no need to mock `next` callable, just call methods directly
+- Flatter stack traces when debugging
+- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
+- Explicit about what happens when: no hidden control flow
 
 **Cons:**
-- Breaking change (acceptable in preview)
-- Migration effort for existing users
+- Diverges from the wrapper pattern used by `AgentMiddleware` and `ChatMiddleware`
+- Less powerful: cannot short-circuit the chain or implement retry logic
+- No "around" advice: cannot wrap invocation in a try/except or timing block
+- Exception in `before_run` may leave state inconsistent if no cleanup in `after_run`
+- Two methods to implement instead of one (though both are optional)
+- Harder to share state between before/after (need instance variables)
+- Cannot control whether subsequent hooks run (no early termination)
 
 ## Decision Outcome
 
-Chosen option: **"Option 2: ContextMiddleware"**, because it significantly reduces cognitive overhead, follows established patterns in the framework, and enables powerful composition for context engineering scenarios.
+**TBD** - This ADR presents two viable approaches:
+
+- **Option 2: ContextMiddleware (Wrapper Pattern)** - Consistent with existing middleware patterns, more powerful control flow
+- **Option 3: ContextHooks (Pre/Post Pattern)** - Simpler mental model, easier migration from current `ContextProvider`
+
+Both options share the same:
+- Agent vs Session ownership model
+- `source_id` attribution
+- Serialization/deserialization via agent methods
+- Session management methods (`create_session`, `get_session_by_id`, `serialize_session`, `restore_session`)
+- Renaming `AgentThread` → `AgentSession`
+
+The key difference is the execution model: nested wrapper vs linear phases.
+
+---
+
+## Detailed Design: Option 3 (ContextHooks - Pre/Post Pattern)
+
+### Key Design Decisions
+
+#### 1. Linear Pre/Post Pattern
+
+Unlike the wrapper pattern, hooks execute in a linear sequence with explicit phases:
+
+```python
+class ContextHooks(ABC):
+    def __init__(self, source_id: str, *, session_id: str | None = None):
+        self.source_id = source_id
+        self.session_id = session_id
+
+    async def before_run(self, context: SessionContext) -> None:
+        """Called before model invocation. Modify context here."""
+        pass
+
+    async def after_run(self, context: SessionContext) -> None:
+        """Called after model invocation. React to response here."""
+        pass
+```
+
+**Comparison to Current:**
+| Aspect | ContextProvider (Current) | ContextHooks (New) |
+|--------|--------------------------|------------------------|
+| Pre-processing | `invoking()` method | `before_run()` method |
+| Post-processing | `invoked()` method | `after_run()` method |
+| Composition | Single provider only | Pipeline of hooks |
+| Pattern | Callback hooks | Linear hooks (similar but composable) |
+
+#### 2. Agent vs Session Ownership
+
+Same ownership model as Option 2 - **Agent** owns configuration, **AgentSession** owns resolved pipeline:
+
+```python
+# Agent holds hooks configuration
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[
+        InMemoryStorageHooks("memory"),
+        RAGContextHooks("rag"),
+    ]
+)
+
+# Session holds the resolved pipeline
+session = agent.create_session()
+```
+
+#### 3. Unified Storage Hooks
+
+Storage is a type of `ContextHooks`:
+
+```python
+class StorageContextHooks(ContextHooks):
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        load_messages: bool | None = None,  # None = smart mode
+        store_inputs: bool = True,
+        store_responses: bool = True,
+        store_context_messages: bool = False,
+        store_context_from: Sequence[str] | None = None,
+    ): ...
+
+    async def before_run(self, context: SessionContext) -> None:
+        # Load messages into context
+        if self._should_load(context):
+            messages = await self.get_messages(context.session_id)
+            context.add_messages(self.source_id, messages)
+
+    async def after_run(self, context: SessionContext) -> None:
+        # Store messages after invocation
+        await self.save_messages(context)
+```
+
+#### 4. Source Attribution via `source_id`
+
+Same as Option 2 - every hook has a required `source_id`:
+
+```python
+class SessionContext:
+    context_messages: dict[str, list[ChatMessage]]
+
+    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+        if source_id not in self.context_messages:
+            self.context_messages[source_id] = []
+        self.context_messages[source_id].extend(messages)
+```
+
+#### 5. Default Storage Behavior
+
+Zero-config works out of the box:
+
+```python
+# No hooks configured - still gets conversation history!
+agent = ChatAgent(chat_client=client, name="assistant")
+session = agent.create_session()
+response = await agent.run("Hello!", session=session)
+response = await agent.run("What did I say?", session=session)  # Remembers!
+```
+
+A default `InMemoryStorageHooks` is added at runtime **only when** all of the following hold:
+- No `service_session_id` (service not managing storage)
+- `options.store` is not `True` (user not expecting service storage)
+- **No hooks pipeline configured at all** (pipeline is empty or None)
+
+**Important:** If the user configures *any* hooks (even non-storage hooks), the framework does **not** automatically add storage. This is intentional:
+- Once users start customizing the pipeline, they are considered advanced users and should explicitly configure storage
+- Automatic insertion would create ordering ambiguity (should storage be first? last?)
+- Explicit configuration is clearer than implicit behavior for non-trivial setups
+- We could consider adding a warning when no storage hook is present, `store=False`, and no `service_session_id` is set
+
+```python
+# This agent has NO automatic storage - user configured hooks but no storage
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[RAGContextHooks("rag")]  # No storage hook!
+)
+session = agent.create_session()
+await agent.run("Hello!", session=session)
+await agent.run("What did I say?", session=session)  # Won't remember!
+
+# To get storage, explicitly add it, in the right order:
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[
+        InMemoryStorageHooks("memory"),  # Explicit storage
+        RAGContextHooks("rag"),
+    ]
+)
+```
+
+#### 6. Hooks Instance vs Factory
+
+Same pattern as Option 2 - support both shared instances and per-session factories:
+
+```python
+# Instance (shared across sessions)
+agent = ChatAgent(
+    context_hooks=[RAGContextHooks("rag")]
+)
+
+# Factory (new instance per session)
+def create_session_cache(session_id: str | None) -> ContextHooks:
+    return SessionCacheHooks("cache", session_id=session_id)
+
+agent = ChatAgent(
+    context_hooks=[create_session_cache]
+)
+```
+
+#### 7. Renaming: Thread → Session
+
+Same as Option 2 - `AgentThread` becomes `AgentSession`.
+
+#### 8. Session Serialization/Deserialization
+
+Same agent-owned serialization pattern as Option 2:
+
+```python
+class ContextHooks(ABC):
+    async def serialize(self) -> Any:
+        """Serialize hooks state. Default returns None (no state)."""
+        return None
+
+    async def restore(self, state: Any) -> None:
+        """Restore hooks state from serialized object."""
+        pass
+
+
+class InMemoryStorageHooks(StorageContextHooks):
+    async def serialize(self) -> dict[str, Any]:
+        return {
+            "source_id": self.source_id,
+            "messages": [msg.to_dict() for msg in self._messages],
+        }
+
+    async def restore(self, state: dict[str, Any]) -> None:
+        self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
+```
+
+#### 9. Session Management Methods
+
+Same API as Option 2:
+
+```python
+class ChatAgent:
+    def create_session(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+    ) -> AgentSession: ...
+
+    def get_session_by_id(self, session_id: str) -> AgentSession: ...
+
+    async def serialize_session(self, session: AgentSession) -> dict[str, Any]: ...
+
+    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession: ...
+```
+
+### Pipeline Execution Model
+
+The key difference from Option 2 is the execution model:
+
+```python
+class ContextHooksPipeline:
+    def __init__(self, hooks: Sequence[ContextHooks]):
+        self._hooks = list(hooks)
+
+    async def run(self, context: SessionContext, invoke: Callable) -> None:
+        # Phase 1: All before_run in order
+        for hook in self._hooks:
+            await hook.before_run(context)
+
+        # Phase 2: Model invocation
+        await invoke(context)
+
+        # Phase 3: All after_run in reverse order (symmetry)
+        for hook in reversed(self._hooks):
+            await hook.after_run(context)
+```
+
+**Execution flow comparison:**
+
+```
+Option 2 (Wrapper/Onion):          Option 3 (Hooks/Linear):
+┌─────────────────────────┐        ┌─────────────────────────┐
+│ middleware1.process()   │        │ hook1.before_run()      │
+│  ┌───────────────────┐  │        │ hook2.before_run()      │
+│  │middleware2.process│  │        │ hook3.before_run()      │
+│  │  ┌─────────────┐  │  │        ├─────────────────────────┤
+│  │  │   invoke    │  │  │   vs   │      <invoke>           │
+│  │  └─────────────┘  │  │        ├─────────────────────────┤
+│  │ (post-processing) │  │        │ hook3.after_run()       │
+│  └───────────────────┘  │        │ hook2.after_run()       │
+│ (post-processing)       │        │ hook1.after_run()       │
+└─────────────────────────┘        └─────────────────────────┘
+```
+
+### Accessing Context from Other Hooks
+
+Non-storage hooks can read context added by other hooks via `context.context_messages`. However, hooks should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
+
+If a hook needs historical context (e.g., a RAG hook that wants to search based on the last few messages, not just the newest), it must **maintain its own message buffer** as part of its instance state. This makes the hook self-contained and predictable, similar to how storage hooks manage their own persistence.
+
+**Key principles:**
+- `context.input_messages` contains only the new message(s) for this invocation
+- `context.context_messages` contains messages added by hooks that ran earlier in the pipeline
+- For history beyond current input, hooks must track it themselves
+- Use `serialize()`/`restore()` to persist the buffer when the session is serialized and restored
+
+**Example: RAG hook with conversation history buffer**
+
+```python
+class RAGWithBufferHooks(ContextHooks):
+    """RAG hook that uses recent conversation history for better retrieval."""
+
+    def __init__(
+        self,
+        source_id: str,
+        retriever: Retriever,
+        *,
+        buffer_window: int = 5,  # Number of recent exchanges to consider
+        session_id: str | None = None,
+    ):
+        super().__init__(source_id, session_id=session_id)
+        self._retriever = retriever
+        self._buffer_window = buffer_window
+        self._message_buffer: list[ChatMessage] = []  # Self-managed history
+
+    async def before_run(self, context: SessionContext) -> None:
+        # Build search query from current input + recent history
+        recent_messages = self._message_buffer[-self._buffer_window * 2:]  # pairs of user/assistant
+        search_context = recent_messages + list(context.input_messages)
+
+        # Use conversation context for better retrieval
+        query = self._build_search_query(search_context)
+        docs = await self._retriever.search(query)
+
+        # Add retrieved context
+        context.add_messages(self.source_id, [
+            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
+        ])
+
+    async def after_run(self, context: SessionContext) -> None:
+        # Update our own history buffer with this exchange
+        self._message_buffer.extend(context.input_messages)
+        if context.response_messages:
+            self._message_buffer.extend(context.response_messages)
+
+        # Trim to prevent unbounded growth
+        max_messages = self._buffer_window * 4  # Retain twice the lookup window as headroom
+        if len(self._message_buffer) > max_messages:
+            self._message_buffer = self._message_buffer[-max_messages:]
+
+    async def serialize(self) -> dict[str, Any]:
+        """Persist the history buffer."""
+        return {
+            "source_id": self.source_id,
+            "message_buffer": [msg.to_dict() for msg in self._message_buffer],
+        }
+
+    async def restore(self, state: dict[str, Any]) -> None:
+        """Restore the history buffer."""
+        self._message_buffer = [
+            ChatMessage.from_dict(m) for m in state.get("message_buffer", [])
+        ]
+
+    def _build_search_query(self, messages: list[ChatMessage]) -> str:
+        # Combine recent messages into a search query
+        return " ".join(msg.text for msg in messages if msg.text)
+
+    def _format_docs(self, docs: list[Document]) -> str:
+        return "\n\n".join(doc.content for doc in docs)
+```
+
+**Usage:**
+```python
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[
+        InMemoryStorageHooks("memory"),
+        RAGWithBufferHooks("rag", retriever=my_retriever, buffer_window=3),
+    ]
+)
+
+session = agent.create_session()
+
+# First message - RAG uses only this message
+await agent.run("What is Python?", session=session)
+
+# Second message - RAG now uses both messages for better retrieval
+await agent.run("How does it compare to JavaScript?", session=session)
+
+# The RAG hook's internal buffer now contains the conversation,
+# enabling context-aware retrieval even though it's not a storage hook
+```
+
+This pattern allows any hook to behave like a "mini storage" for its own purposes while keeping the clear separation between storage hooks (which persist the canonical conversation) and context hooks (which enhance the invocation).
+
+**Example: Simple RAG using only current input (no history)**
+
+If you want RAG that only uses the current user input (ignoring conversation history), simply use `context.input_messages` directly:
+
+```python
+class SimpleRAGHooks(ContextHooks):
+    """RAG hook that uses only the current input for retrieval."""
+
+    def __init__(self, source_id: str, retriever: Retriever):
+        super().__init__(source_id)
+        self._retriever = retriever
+
+    async def before_run(self, context: SessionContext) -> None:
+        # Use ONLY the current input - no history needed
+        query = " ".join(msg.text for msg in context.input_messages if msg.text)
+        docs = await self._retriever.search(query)
+
+        context.add_messages(self.source_id, [
+            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
+        ])
+
+    def _format_docs(self, docs: list[Document]) -> str:
+        return "\n\n".join(doc.content for doc in docs)
+
+
+# Usage - storage hook provides history to the model, RAG only uses current input
+agent = ChatAgent(
+    chat_client=client,
+    context_hooks=[
+        InMemoryStorageHooks("memory"),  # Loads full history for the model
+        SimpleRAGHooks("rag", retriever=my_retriever),  # Only uses current input
+    ]
+)
+```
+
+The key distinction:
+- `context.input_messages` - only the new message(s) passed to this `agent.run()` call
+- `context.context_messages` - messages added by other hooks (e.g., history loaded by storage)
+- `context.get_all_messages()` - combines everything for the model
+
+### Example: Current vs New (Option 3)
+
+**Current:**
+```python
+class MyContextProvider(ContextProvider):
+    async def invoking(self, messages, **kwargs) -> Context:
+        docs = await self.retrieve_documents(messages[-1].text)
+        return Context(messages=[ChatMessage.system(f"Context: {docs}")])
+
+    async def invoked(self, request, response, **kwargs) -> None:
+        await self.store_interaction(request, response)
+
+async with MyContextProvider() as provider:
+    agent = ChatAgent(chat_client=client, name="assistant")
+    thread = await agent.get_new_thread(message_store=ChatMessageStore())
+    thread.context_provider = provider
+    response = await agent.run("Hello", thread=thread)
+```
+
+**New (Option 3 - Hooks):**
+```python
+class RAGHooks(ContextHooks):
+    async def before_run(self, context: SessionContext) -> None:
+        docs = await self.retrieve_documents(context.input_messages[-1].text)
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+
+    async def after_run(self, context: SessionContext) -> None:
+        await self.store_interaction(context.input_messages, context.response_messages)
+
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_hooks=[
+        InMemoryStorageHooks("memory"),
+        RAGHooks("rag"),
+    ]
+)
+session = agent.create_session()
+response = await agent.run("Hello", session=session)
+```
+
+### Migration Impact (Option 3)
+
+| Current | New (Option 3) | Notes |
+|---------|----------------|-------|
+| `ContextProvider` | `ContextHooks` | Rename `invoking()` → `before_run()`, `invoked()` → `after_run()` |
+| `ChatMessageStore` | `StorageContextHooks` | Extend and implement storage methods |
+| `AgentThread` | `AgentSession` | Clean break, no alias |
+| `thread.message_store` | Via hooks in pipeline | Configure at agent level |
+| `thread.context_provider` | Via hooks in pipeline | Multiple hooks supported |
+
+---
+
+## Detailed Design: Option 2 (ContextMiddleware - Wrapper Pattern)
 
 ### Key Design Decisions
 
@@ -291,10 +921,35 @@ response = await agent.run("Hello!", session=session)
 response = await agent.run("What did I say?", session=session)  # Remembers!
 ```
 
-Default `InMemoryStorageMiddleware` is added at runtime when:
+A default `InMemoryStorageMiddleware` is added at runtime **only when** all of the following hold:
 - No `service_session_id` (service not managing storage)
 - `options.store` is not `True` (user not expecting service storage)
-- Pipeline is empty or None
+- **No middleware pipeline configured at all** (pipeline is empty or None)
+
+**Important:** If the user configures *any* middleware (even non-storage middleware), the framework does **not** automatically add storage. This is intentional:
+- Once users start customizing the pipeline, they should explicitly configure storage
+- Automatic insertion would create ordering ambiguity (should storage be first? last?)
+- Explicit configuration is clearer than implicit behavior for non-trivial setups
+
+```python
+# This agent has NO automatic storage - user configured middleware but no storage
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[RAGContextMiddleware("rag")]  # No storage middleware!
+)
+session = agent.create_session()
+await agent.run("Hello!", session=session)
+await agent.run("What did I say?", session=session)  # Won't remember!
+
+# To get storage, explicitly add it:
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),  # Explicit storage
+        RAGContextMiddleware("rag"),
+    ]
+)
+```
 
 **Comparison to Current:**
 | Aspect | AgentThread (Current) | AgentSession (New) |
@@ -335,18 +990,18 @@ Sessions need to be serializable for persistence across process restarts. Serial
 ```python
 class ContextMiddleware(ABC):
     """Each middleware can optionally implement serialization."""
-    
+
     async def serialize(self) -> Any:
         """Serialize middleware state to a persistable object.
-        
+
         Returns any object that can be serialized (typically dict for JSON).
         Default returns None (no state to persist).
         """
         return None
-    
+
     async def restore(self, state: Any) -> None:
         """Restore middleware state from a previously serialized object.
-        
+
         Args:
             state: The object returned by serialize()
         """
@@ -355,29 +1010,29 @@ class ContextMiddleware(ABC):
 
 class InMemoryStorageMiddleware(StorageContextMiddleware):
     """Example: In-memory storage serializes its messages."""
-    
+
     async def serialize(self) -> dict[str, Any]:
         return {
             "source_id": self.source_id,
             "messages": [msg.to_dict() for msg in self._messages],
         }
-    
+
     async def restore(self, state: dict[str, Any]) -> None:
         self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
 
 
 class ChatAgent:
     """Agent handles all session serialization."""
-    
+
     async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
         """Serialize a session's state for persistence.
-        
+
         The agent handles serialization because it understands the middleware
         configuration and can coordinate state capture across all middleware.
-        
+
         Args:
             session: The session to serialize
-            
+
         Returns:
             Serialized state that can be persisted (JSON-compatible dict)
         """
@@ -387,41 +1042,41 @@ class ChatAgent:
                 state = await middleware.serialize()
                 if state is not None:
                     middleware_states[middleware.source_id] = state
-        
+
         return {
             "session_id": session.session_id,
             "service_session_id": session.service_session_id,
             "middleware_states": middleware_states,
         }
-    
+
     async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
         """Restore a session from serialized state.
-        
+
         The agent must restore the session because it holds the middleware
         configuration needed to reconstruct the pipeline.
-        
+
         Args:
             serialized: Previously serialized session state
-            
+
         Returns:
             Restored AgentSession with middleware state restored
         """
         session_id = serialized.get("session_id")
         service_session_id = serialized.get("service_session_id")
         middleware_states = serialized.get("middleware_states", {})
-        
+
         # Create fresh session with new pipeline
         session = self.create_session(
             session_id=session_id,
             service_session_id=service_session_id,
         )
-        
+
         # Restore middleware state by source_id
         if session.context_pipeline:
             for middleware in session.context_pipeline:
                 if middleware.source_id in middleware_states:
                     await middleware.restore(middleware_states[middleware.source_id])
-        
+
         return session
 ```
 
@@ -468,43 +1123,43 @@ class ChatAgent:
         service_session_id: str | None = None,
     ) -> AgentSession:
         """Create a new session with a fresh middleware pipeline.
-        
+
         This is the primary way to create sessions. Middleware factories
         are called with the session_id to create session-specific instances.
-        
+
         Args:
             session_id: Optional session ID (generated if not provided)
             service_session_id: Optional service-managed session ID
-            
+
         Returns:
             New AgentSession with resolved middleware pipeline
         """
         resolved_session_id = session_id or str(uuid.uuid4())
-        
+
         pipeline = None
         if self._context_middleware:
             pipeline = ContextMiddlewarePipeline.from_config(
                 self._context_middleware,
                 session_id=resolved_session_id,
             )
-        
+
         return AgentSession(
             session_id=resolved_session_id,
             service_session_id=service_session_id,
             context_pipeline=pipeline,
         )
-    
+
     def get_session_by_id(self, session_id: str) -> AgentSession:
         """Get a session by ID with a fresh middleware pipeline.
-        
+
         Use this when you have a session ID but no persisted state.
         The middleware pipeline is freshly created (no state restored).
-        
+
         For restoring a session with state, use restore_session() instead.
-        
+
         Args:
             session_id: The session ID to use
-            
+
         Returns:
             AgentSession with the specified ID and fresh middleware
         """
@@ -530,6 +1185,151 @@ state = load_from_database(session_id)
 session = await agent.restore_session(state)
 ```
 
+### Accessing Context from Other Middleware
+
+Non-storage middleware can read context added by other middleware via `context.context_messages`. However, middleware should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
+
+If a middleware needs historical context (e.g., a RAG middleware that wants to search based on the last few messages, not just the newest), it must **maintain its own message buffer** as part of its instance state. This makes the middleware self-contained and predictable, similar to how storage middleware manages its own persistence.
+
+**Key principles:**
+- `context.input_messages` contains only the new message(s) for this invocation
+- `context.context_messages` contains messages added by middleware that ran earlier in the pipeline
+- For history beyond current input, each middleware must track it itself
+- Use `serialize()`/`restore()` to persist the buffer when the session is serialized and restored
+
+**Example: RAG middleware with conversation history buffer**
+
+```python
+class RAGWithBufferMiddleware(ContextMiddleware):
+    """RAG middleware that uses recent conversation history for better retrieval."""
+
+    def __init__(
+        self,
+        source_id: str,
+        retriever: Retriever,
+        *,
+        buffer_window: int = 5,  # Number of recent exchanges to consider
+        session_id: str | None = None,
+    ):
+        super().__init__(source_id, session_id=session_id)
+        self._retriever = retriever
+        self._buffer_window = buffer_window
+        self._message_buffer: list[ChatMessage] = []  # Self-managed history
+
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        # Build search query from current input + recent history
+        recent_messages = self._message_buffer[-self._buffer_window * 2:]  # pairs of user/assistant
+        search_context = recent_messages + list(context.input_messages)
+
+        # Use conversation context for better retrieval
+        query = self._build_search_query(search_context)
+        docs = await self._retriever.search(query)
+
+        # Add retrieved context
+        context.add_messages(self.source_id, [
+            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
+        ])
+
+        # Call next middleware
+        await next(context)
+
+        # Update our own history buffer with this exchange
+        self._message_buffer.extend(context.input_messages)
+        if context.response_messages:
+            self._message_buffer.extend(context.response_messages)
+
+        # Trim to prevent unbounded growth
+        max_messages = self._buffer_window * 4  # Keep some buffer
+        if len(self._message_buffer) > max_messages:
+            self._message_buffer = self._message_buffer[-max_messages:]
+
+    async def serialize(self) -> dict[str, Any]:
+        """Persist the history buffer."""
+        return {
+            "source_id": self.source_id,
+            "message_buffer": [msg.to_dict() for msg in self._message_buffer],
+        }
+
+    async def restore(self, state: dict[str, Any]) -> None:
+        """Restore the history buffer."""
+        self._message_buffer = [
+            ChatMessage.from_dict(m) for m in state.get("message_buffer", [])
+        ]
+
+    def _build_search_query(self, messages: list[ChatMessage]) -> str:
+        # Combine recent messages into a search query
+        return " ".join(msg.text for msg in messages if msg.text)
+
+    def _format_docs(self, docs: list[Document]) -> str:
+        return "\n\n".join(doc.content for doc in docs)
+```
+
+**Usage:**
+```python
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),
+        RAGWithBufferMiddleware("rag", retriever=my_retriever, buffer_window=3),
+    ]
+)
+
+session = agent.create_session()
+
+# First message - RAG uses only this message
+await agent.run("What is Python?", session=session)
+
+# Second message - RAG now also uses the buffered first exchange for better retrieval
+await agent.run("How does it compare to JavaScript?", session=session)
+
+# The RAG middleware's internal buffer now contains the conversation,
+# enabling context-aware retrieval even though it's not a storage middleware
+```
+
+This pattern allows any middleware to behave like a "mini storage" for its own purposes while keeping the clear separation between storage middleware (which persist the canonical conversation) and context middleware (which enhance the invocation).
+
+**Example: Simple RAG using only current input (no history)**
+
+If you want RAG that only uses the current user input (ignoring conversation history), simply use `context.input_messages` directly:
+
+```python
+class SimpleRAGMiddleware(ContextMiddleware):
+    """RAG middleware that uses only the current input for retrieval."""
+
+    def __init__(self, source_id: str, retriever: Retriever):
+        super().__init__(source_id)
+        self._retriever = retriever
+
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        # Use ONLY the current input - no history needed
+        query = " ".join(msg.text for msg in context.input_messages if msg.text)
+        docs = await self._retriever.search(query)
+
+        context.add_messages(self.source_id, [
+            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
+        ])
+
+        await next(context)
+
+    def _format_docs(self, docs: list[Document]) -> str:
+        return "\n\n".join(doc.content for doc in docs)
+
+
+# Usage - storage middleware provides history to the model, RAG only uses current input
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),  # Loads full history for the model
+        SimpleRAGMiddleware("rag", retriever=my_retriever),  # Only uses current input
+    ]
+)
+```
+
+The key distinction:
+- `context.input_messages` - only the new message(s) passed to this `agent.run()` call
+- `context.context_messages` - messages added by other middleware (e.g., history loaded by storage)
+- `context.get_all_messages()` - combines everything for the model
+
 ### Migration Impact
 
 | Current | New | Notes |
@@ -624,12 +1424,14 @@ class SessionContext:
             to add messages with proper source attribution.
         instructions: Additional instructions - middleware can append here
         tools: Additional tools - middleware can append here
-        response_messages: After invocation, contains the agent's response (set by agent)
+        response_messages: After invocation, contains the agent's response (set by agent).
+            READ-ONLY - modifications are ignored. Use AgentMiddleware to modify responses.
         options: Options passed to agent.run() - READ-ONLY, for reflection only
         metadata: Shared metadata dictionary for cross-middleware communication
 
     Note:
         - `options` is read-only; changes will NOT be merged back into the agent run
+        - `response_messages` is read-only; use AgentMiddleware to modify responses
         - `instructions` and `tools` are merged by the agent into the run options
         - `context_messages` values are flattened in order when building the final input
     """

From 85ba6c5dcc36e269bc4ae968cb3cde17c51bd09b Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 14:25:04 +0100
Subject: [PATCH 06/19] ADR: Restructure and add .NET comparison

- Add class hierarchy clarification for both options
- Merge detailed design sections (side-by-side comparison)
- Move detailed design before decision outcome
- Move compaction discussion after decision
- Add .NET implementation comparison (feature equivalence)
- Update .NET method names to match actual implementation
- Rename hook methods to before_run/after_run
- Fix storage context table for injected context
---
 .../00XX-python-context-middleware.md         | 1272 +++++------------
 1 file changed, 364 insertions(+), 908 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 0e2ffbcceb..4c031dfd6a 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -35,78 +35,6 @@ This creates cognitive overhead for developers doing "Context Engineering" - the
 - **Attribution**: Enable tracking which middleware added which messages/tools
 - **Zero-config**: Simple use cases should work without configuration
 
-## Open Discussion: Context Compaction
-
-### Problem Statement
-
-A common need for long-running agents is **context compaction** - automatically summarizing or truncating conversation history when approaching token limits. This is particularly important for agents that make many tool calls in succession (10s or 100s), where the context can grow unboundedly.
-
-Currently, this is challenging because:
-- `ChatMessageStore.list_messages()` is only called once at the start of `agent.run()`, not during the tool loop
-- `ChatMiddleware` operates on a copy of messages, so modifications don't persist across tool loop iterations
-- The function calling loop happens deep within the `ChatClient`, which is below the agent level
-
-### Design Question
-
-Should `ContextMiddleware`/`ContextHooks` be invoked:
-1. **Only at agent invocation boundaries** (current proposal) - before/after each `agent.run()` call
-2. **During the tool loop** - before/after each model call within a single `agent.run()`
-
-### Boundary vs In-Run Compaction
-
-While boundary and in-run compaction could potentially use the same mechanism, they have **different goals and behaviors**:
-
-**Boundary compaction** (before/after `agent.run()`):
-- **Before run**: Keep context manageable - load a compacted view of history
-- **After run**: Keep storage compact - summarize/truncate before persisting
-- Useful for maintaining reasonable context sizes across conversation turns
-- One reason to have **multiple storage middleware**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
-
-**In-run compaction** (during function calling loops):
-- Relevant for **function calling scenarios** where many tool calls accumulate
-- Typically **in-memory only** - no need to persist intermediate compaction and only useful when the conversation/session is _not_ managed by the service
-- Different strategies apply:
-  - Remove old function call/result pairs entirely/Keep only the most recent N tool interactions
-  - Replace call/result pairs with a single summary message (with a different role)
-  - Summarize several function call/result pairs into one larger context message
-
-### Service-Managed vs Local Storage
-
-**Important:** In-run compaction is relevant only for **non-service-managed histories**. When using service-managed storage (`service_session_id` is set):
-- The service handles history management internally
-- Only the new calls and results are sent to/from the service each turn
-- The service is responsible for its own compaction strategy, but we do not control that
-
-For local storage, a full message list is sent to the model each time, making compaction the client's responsibility.
-
-### Options
-
-**Option A: Invocation-boundary only (current proposal)**
-- Simpler mental model
-- Consistent with `AgentMiddleware` pattern
-- In-run compaction would need to happen via a separate mechanism (e.g., `ChatMiddleware` at the client level)
-- Risk: Different compaction mechanisms at different layers could be confusing
-
-**Option B: Also during tool loops**
-- Single mechanism for all context manipulation
-- More powerful but more complex
-- Requires coordination with `ChatClient` internals
-- Risk: Performance overhead if middleware/hooks are expensive
-
-**Option C: Unified approach across layers**
-- Define a single context compaction abstraction that works at both agent and client levels
-- `ContextMiddleware`/`ContextHooks` could delegate to `ChatMiddleware` for mid-loop execution
-- Requires deeper architectural thought
-
-### Potential Extension Points (for any option)
-
-Regardless of the chosen approach, these extension points could support compaction:
-- A `CompactionStrategy` that can be shared between middleware/hooks and function calling configuration
-- Hooks for `ChatClient` to notify the agent layer when context limits are approaching
-- A unified `ContextManager` that coordinates compaction across layers
-
-**This section requires further discussion.**
-
 ## Related Issues
 
 This ADR addresses the following issues from the parent issue [#3575](https://github.com/microsoft/agent-framework/issues/3575):
@@ -228,7 +156,11 @@ Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concep
 
 ### Option 2: ContextMiddleware - Wrapper Pattern
 
-Create a unified `ContextMiddleware` that uses the onion/wrapper pattern (like existing `AgentMiddleware`, `ChatMiddleware`) to handle all context-related concerns.
+Create a unified `ContextMiddleware` base class that uses the onion/wrapper pattern (like existing `AgentMiddleware`, `ChatMiddleware`) to handle all context-related concerns. This includes a `StorageContextMiddleware` subclass specifically for history persistence.
+
+**Class hierarchy:**
+- `ContextMiddleware` (base) - for general context injection (RAG, instructions, tools)
+- `StorageContextMiddleware(ContextMiddleware)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
 
 ```python
 class ContextMiddleware(ABC):
@@ -264,7 +196,11 @@ class ContextMiddleware(ABC):
 
 ### Option 3: ContextHooks - Pre/Post Pattern
 
-Create a `ContextHooks` abstraction with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware.
+Create a `ContextHooks` base class with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware. This includes a `StorageContextHooks` subclass specifically for history persistence.
+
+**Class hierarchy:**
+- `ContextHooks` (base) - for general context injection (RAG, instructions, tools)
+- `StorageContextHooks(ContextHooks)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
 
 ```python
 class ContextHooks(ABC):
@@ -342,61 +278,75 @@ agent = ChatAgent(
 - Harder to share state between before/after (need instance variables)
 - Cannot control whether subsequent hooks run (no early termination)
 
-## Decision Outcome
-
-**TBD** - This ADR presents two viable approaches:
-
-- **Option 2: ContextMiddleware (Wrapper Pattern)** - Consistent with existing middleware patterns, more powerful control flow
-- **Option 3: ContextHooks (Pre/Post Pattern)** - Simpler mental model, easier migration from current `ContextProvider`
-
-Both options share the same:
-- Agent vs Session ownership model
-- `source_id` attribution
-- Serialization/deserialization via agent methods
-- Session management methods (`create_session`, `get_session_by_id`, `serialize_session`, `restore_session`)
-- Renaming `AgentThread` → `AgentSession`
-
-The key difference is the execution model: nested wrapper vs linear phases.
-
----
+## Detailed Design
 
-## Detailed Design: Option 3 (ContextHooks - Pre/Post Pattern)
+This section covers the design decisions that apply to both approaches. Where the approaches differ, both are shown.
 
-### Key Design Decisions
+### 1. Execution Pattern
 
-#### 1. Linear Pre/Post Pattern
+The core difference between the two options is the execution model:
 
-Unlike the wrapper pattern, hooks execute in a linear sequence with explicit phases:
+**Option 2 - Middleware (Wrapper/Onion):**
+```python
+class ContextMiddleware(ABC):
+    @abstractmethod
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        # Pre-processing
+        context.add_messages(self.source_id, [...])
+        await next(context)  # Call next middleware
+        # Post-processing
+        await self.store(context.response_messages)
+```
 
+**Option 3 - Hooks (Linear):**
 ```python
 class ContextHooks(ABC):
-    def __init__(self, source_id: str, *, session_id: str | None = None):
-        self.source_id = source_id
-        self.session_id = session_id
-
     async def before_run(self, context: SessionContext) -> None:
-        """Called before model invocation. Modify context here."""
-        pass
+        """Called before model invocation."""
+        context.add_messages(self.source_id, [...])
 
     async def after_run(self, context: SessionContext) -> None:
-        """Called after model invocation. React to response here."""
-        pass
+        """Called after model invocation."""
+        await self.store(context.response_messages)
 ```
 
-**Comparison to Current:**
-| Aspect | ContextProvider (Current) | ContextHooks (New) |
-|--------|--------------------------|------------------------|
-| Pre-processing | `invoking()` method | `before_run()` method |
-| Post-processing | `invoked()` method | `after_run()` method |
-| Composition | Single provider only | Pipeline of hooks |
-| Pattern | Callback hooks | Linear hooks (similar but composable) |
+**Execution flow comparison:**
 
-#### 2. Agent vs Session Ownership
+```
+Middleware (Wrapper/Onion):            Hooks (Linear):
+┌─────────────────────────┐            ┌─────────────────────────┐
+│ middleware1.process()   │            │ hook1.before_run()      │
+│  ┌───────────────────┐  │            │ hook2.before_run()      │
+│  │middleware2.process│  │            │ hook3.before_run()      │
+│  │  ┌─────────────┐  │  │            ├─────────────────────────┤
+│  │  │   invoke    │  │  │     vs     │      <invoke>           │
+│  │  └─────────────┘  │  │            ├─────────────────────────┤
+│  │ (post-processing) │  │            │ hook3.after_run()       │
+│  └───────────────────┘  │            │ hook2.after_run()       │
+│ (post-processing)       │            │ hook1.after_run()       │
+└─────────────────────────┘            └─────────────────────────┘
+```
+
+### 2. Agent vs Session Ownership
+
+Both approaches use the same ownership model:
+- **Agent** owns the configuration (instances or factories)
+- **AgentSession** owns the resolved pipeline (created at runtime)
 
-Same ownership model as Option 2 - **Agent** owns configuration, **AgentSession** owns resolved pipeline:
+**Middleware:**
+```python
+agent = ChatAgent(
+    chat_client=client,
+    context_middleware=[
+        InMemoryStorageMiddleware("memory"),
+        RAGContextMiddleware("rag"),
+    ]
+)
+session = agent.create_session()
+```
 
+**Hooks:**
 ```python
-# Agent holds hooks configuration
 agent = ChatAgent(
     chat_client=client,
     context_hooks=[
@@ -404,15 +354,36 @@ agent = ChatAgent(
         RAGContextHooks("rag"),
     ]
 )
-
-# Session holds the resolved pipeline
 session = agent.create_session()
 ```
 
-#### 3. Unified Storage Hooks
+**Comparison to Current:**
+| Aspect | AgentThread (Current) | AgentSession (New) |
+|--------|----------------------|-------------------|
+| Storage | `message_store` attribute | Via storage middleware/hooks in pipeline |
+| Context | `context_provider` attribute | Via any middleware/hooks in pipeline |
+| Composition | One of each | Unlimited middleware/hooks |
+
+### 3. Unified Storage
+
+Instead of a separate `ChatMessageStore`, storage is a subclass of the base context type:
 
-Storage is a type of `ContextHooks`:
+**Middleware:**
+```python
+class StorageContextMiddleware(ContextMiddleware):
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        load_messages: bool | None = None,  # None = smart mode
+        store_inputs: bool = True,
+        store_responses: bool = True,
+        store_context_messages: bool = False,
+        store_context_from: Sequence[str] | None = None,
+    ): ...
+```
 
+**Hooks:**
 ```python
 class StorageContextHooks(ContextHooks):
     def __init__(
@@ -425,21 +396,24 @@ class StorageContextHooks(ContextHooks):
         store_context_messages: bool = False,
         store_context_from: Sequence[str] | None = None,
     ): ...
+```
 
-    async def before_run(self, context: SessionContext) -> None:
-        # Load messages into context
-        if self._should_load(context):
-            messages = await self.get_messages(context.session_id)
-            context.add_messages(self.source_id, messages)
+**Smart Load Behavior (both approaches):**
+- `load_messages=None` (default): Automatically disable loading when:
+  - `context.options.get('store') == False`, OR
+  - `context.service_session_id is not None` (service handles storage)
 
-    async def after_run(self, context: SessionContext) -> None:
-        # Store messages after invocation
-        await self.save_messages(context)
-```
+**Comparison to Current:**
+| Aspect | ChatMessageStore (Current) | Storage Middleware/Hooks (New) |
+|--------|---------------------------|------------------------------|
+| Load messages | Always via `list_messages()` | Configurable `load_messages` flag |
+| Store messages | Always via `add_messages()` | Configurable `store_*` flags |
+| What to store | All messages | Selective: inputs, responses, context |
+| Injected context | Not supported | `store_context_messages=True/False` + `store_context_from=[source_ids]` for filtering |
 
-#### 4. Source Attribution via `source_id`
+### 4. Source Attribution via `source_id`
 
-Same as Option 2 - every hook has a required `source_id`:
+Both approaches require a `source_id` for attribution (identical implementation):
 
 ```python
 class SessionContext:
@@ -449,103 +423,118 @@ class SessionContext:
         if source_id not in self.context_messages:
             self.context_messages[source_id] = []
         self.context_messages[source_id].extend(messages)
+
+    def get_messages(
+        self,
+        sources: Sequence[str] | None = None,
+        exclude_sources: Sequence[str] | None = None,
+    ) -> list[ChatMessage]:
+        """Get messages, optionally filtered by source."""
+        ...
 ```
 
-#### 5. Default Storage Behavior
+**Benefits:**
+- Debug which middleware/hooks added which messages
+- Filter messages by source (e.g., exclude RAG from storage)
+- Multiple instances of the same type are distinguishable
+
+### 5. Default Storage Behavior
 
-Zero-config works out of the box:
+Zero-config works out of the box (both approaches):
 
 ```python
-# No hooks configured - still gets conversation history!
+# No middleware/hooks configured - still gets conversation history!
 agent = ChatAgent(chat_client=client, name="assistant")
 session = agent.create_session()
 response = await agent.run("Hello!", session=session)
 response = await agent.run("What did I say?", session=session)  # Remembers!
 ```
 
-Default `InMemoryStorageHooks` is added at runtime **only when**:
+Default in-memory storage is added at runtime **only when**:
 - No `service_session_id` (service not managing storage)
 - `options.store` is not `True` (user not expecting service storage)
-- **No hooks pipeline configured at all** (pipeline is empty or None)
+- **No pipeline configured at all** (pipeline is empty or None)
 
-**Important:** If the user configures *any* hooks (even non-storage hooks), the framework does **not** automatically add storage. This is intentional:
-- Once users start customizing the pipeline, they should be considered advanced, they should explicitly configure storage
-- Automatic insertion would create ordering ambiguity (should storage be first? last?)
-- Explicit configuration is clearer than implicit behavior for non-trivial setups
-- We could consider adding a warning when no storage is present, while store=False and not service_session_id is set
+**Important:** If the user configures *any* middleware/hooks (even non-storage ones), the framework does **not** automatically add storage. This is intentional:
+- Once users start customizing the pipeline, we consider them advanced users who know what they are doing; they should therefore configure storage explicitly
+- Automatic insertion would create ordering ambiguity
+- Explicit configuration is clearer than implicit behavior
 
-```python
-# This agent has NO automatic storage - user configured hooks but no storage
-agent = ChatAgent(
-    chat_client=client,
-    context_hooks=[RAGContextHooks("rag")]  # No storage hook!
-)
-session = agent.create_session()
-await agent.run("Hello!", session=session)
-await agent.run("What did I say?", session=session)  # Won't remember!
+### 6. Instance vs Factory
 
-# To get storage, explicitly add it, in the right order:
-agent = ChatAgent(
-    chat_client=client,
-    context_hooks=[
-        InMemoryStorageHooks("memory"),  # Explicit storage
-        RAGContextHooks("rag"),
-    ]
-)
-```
+Both approaches support shared instances and per-session factories:
+
+**Middleware:**
+```python
+# Instance (shared across sessions)
+agent = ChatAgent(context_middleware=[RAGContextMiddleware("rag")])
 
-#### 6. Hooks Instance vs Factory
+# Factory (new instance per session)
+def create_cache(session_id: str | None) -> ContextMiddleware:
+    return SessionCacheMiddleware("cache", session_id=session_id)
 
-Same pattern as Option 2 - support both shared instances and per-session factories:
+agent = ChatAgent(context_middleware=[create_cache])
+```
 
+**Hooks:**
 ```python
 # Instance (shared across sessions)
-agent = ChatAgent(
-    context_hooks=[RAGContextHooks("rag")]
-)
+agent = ChatAgent(context_hooks=[RAGContextHooks("rag")])
 
 # Factory (new instance per session)
-def create_session_cache(session_id: str | None) -> ContextHooks:
+def create_cache(session_id: str | None) -> ContextHooks:
     return SessionCacheHooks("cache", session_id=session_id)
 
-agent = ChatAgent(
-    context_hooks=[create_session_cache]
-)
+agent = ChatAgent(context_hooks=[create_cache])
 ```
 
-#### 7. Renaming: Thread → Session
+### 7. Renaming: Thread → Session
 
-Same as Option 2 - `AgentThread` becomes `AgentSession`.
+`AgentThread` becomes `AgentSession` to better reflect its purpose:
+- "Thread" implies a sequence of messages
+- "Session" better captures the broader scope (state, pipeline, lifecycle)
 
-#### 8. Session Serialization/Deserialization
+### 8. Session Serialization/Deserialization
 
-Same agent-owned serialization pattern as Option 2:
+Both approaches use the same agent-owned serialization pattern:
 
+**Base class (both approaches):**
 ```python
-class ContextHooks(ABC):
-    async def serialize(self) -> Any:
-        """Serialize hooks state. Default returns None (no state)."""
-        return None
-
-    async def restore(self, state: Any) -> None:
-        """Restore hooks state from serialized object."""
-        pass
-
+# ContextMiddleware or ContextHooks - same interface
+async def serialize(self) -> Any:
+    """Serialize state. Default returns None (no state)."""
+    return None
+
+async def restore(self, state: Any) -> None:
+    """Restore state from serialized object."""
+    pass
+```
 
-class InMemoryStorageHooks(StorageContextHooks):
-    async def serialize(self) -> dict[str, Any]:
+**Agent methods (identical for both):**
+```python
+class ChatAgent:
+    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
+        """Serialize a session's state for persistence."""
+        middleware_states: dict[str, Any] = {}
+        if session.context_pipeline:
+            for item in session.context_pipeline:
+                state = await item.serialize()
+                if state is not None:
+                    middleware_states[item.source_id] = state
         return {
-            "source_id": self.source_id,
-            "messages": [msg.to_dict() for msg in self._messages],
+            "session_id": session.session_id,
+            "service_session_id": session.service_session_id,
+            "middleware_states": middleware_states,
         }
 
-    async def restore(self, state: dict[str, Any]) -> None:
-        self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
+    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
+        """Restore a session from serialized state."""
+        ...
 ```
 
-#### 9. Session Management Methods
+### 9. Session Management Methods
 
-Same API as Option 2:
+Both approaches use identical agent methods:
 
 ```python
 class ChatAgent:
@@ -554,196 +543,106 @@ class ChatAgent:
         *,
         session_id: str | None = None,
         service_session_id: str | None = None,
-    ) -> AgentSession: ...
+    ) -> AgentSession:
+        """Create a new session with a fresh pipeline."""
+        ...
 
-    def get_session_by_id(self, session_id: str) -> AgentSession: ...
+    def get_session_by_id(self, session_id: str) -> AgentSession:
+        """Get a session by ID with a fresh pipeline."""
+        return self.create_session(session_id=session_id)
 
     async def serialize_session(self, session: AgentSession) -> dict[str, Any]: ...
-
     async def restore_session(self, serialized: dict[str, Any]) -> AgentSession: ...
 ```
 
-### Pipeline Execution Model
-
-The key difference from Option 2 is the execution model:
-
+**Usage (identical for both):**
 ```python
-class ContextHooksPipeline:
-    def __init__(self, hooks: Sequence[ContextHooks]):
-        self._hooks = list(hooks)
-
-    async def run(self, context: SessionContext, invoke: Callable) -> None:
-        # Phase 1: All before_run in order
-        for hook in self._hooks:
-            await hook.before_run(context)
-
-        # Phase 2: Model invocation
-        await invoke(context)
-
-        # Phase 3: All after_run in reverse order (symmetry)
-        for hook in reversed(self._hooks):
-            await hook.after_run(context)
+session = agent.create_session()
+session = agent.create_session(session_id="user-123-session-456")
+session = agent.create_session(service_session_id="thread_abc123")
+session = agent.get_session_by_id("existing-session-id")
+session = await agent.restore_session(state)
 ```
 
-**Execution flow comparison:**
+### 10. Accessing Context from Other Middleware/Hooks
 
-```
-Option 2 (Wrapper/Onion):          Option 3 (Hooks/Linear):
-┌─────────────────────────┐        ┌─────────────────────────┐
-│ middleware1.process()   │        │ hook1.before_run()   │
-│  ┌───────────────────┐  │        │ hook2.before_run()   │
-│  │ middleware2.process│  │        │ hook3.before_run()   │
-│  │  ┌─────────────┐  │  │        ├─────────────────────────┤
-│  │  │   invoke    │  │  │   vs   │      <invoke>           │
-│  │  └─────────────┘  │  │        ├─────────────────────────┤
-│  │ (post-processing) │  │        │ hook3.after_run()    │
-│  └───────────────────┘  │        │ hook2.after_run()    │
-│ (post-processing)       │        │ hook1.after_run()    │
-└─────────────────────────┘        └─────────────────────────┘
-```
+Non-storage middleware/hooks can read context added by others via `context.context_messages`. However, they should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
 
-### Accessing Context from Other Hooks
+If historical context is needed (e.g., RAG using the last few messages), maintain a **self-managed buffer**, which would look something like this:
 
-Non-storage hooks can read context added by other hooks via `context.context_messages`. However, hooks should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
+**Middleware:**
+```python
+class RAGWithBufferMiddleware(ContextMiddleware):
+    def __init__(self, source_id: str, retriever: Retriever, *, buffer_window: int = 5):
+        super().__init__(source_id)
+        self._retriever = retriever
+        self._buffer_window = buffer_window
+        self._message_buffer: list[ChatMessage] = []
 
-If a hook needs historical context (e.g., a RAG hook that wants to search based on the last few messages, not just the newest), it must **maintain its own message buffer** as part of its instance state. This makes the hook self-contained and predictable, similar to how storage hooks manage their own persistence.
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
+        # Use buffer + current input for retrieval
+        recent = self._message_buffer[-self._buffer_window * 2:]
+        query = self._build_query(recent + list(context.input_messages))
+        docs = await self._retriever.search(query)
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
 
-**Key principles:**
-- `context.input_messages` contains only the new message(s) for this invocation
-- `context.context_messages` contains messages added by hooks that ran earlier in the pipeline
-- For history beyond current input, hooks must track it themselves
-- Use `serialize()`/`restore()` to persist the buffer across sessions
+        await next(context)
 
-**Example: RAG hook with conversation history buffer**
+        # Update buffer
+        self._message_buffer.extend(context.input_messages)
+        if context.response_messages:
+            self._message_buffer.extend(context.response_messages)
+```
 
+**Hooks:**
 ```python
 class RAGWithBufferHooks(ContextHooks):
-    """RAG hook that uses recent conversation history for better retrieval."""
-
-    def __init__(
-        self,
-        source_id: str,
-        retriever: Retriever,
-        *,
-        buffer_window: int = 5,  # Number of recent exchanges to consider
-        session_id: str | None = None,
-    ):
-        super().__init__(source_id, session_id=session_id)
+    def __init__(self, source_id: str, retriever: Retriever, *, buffer_window: int = 5):
+        super().__init__(source_id)
         self._retriever = retriever
         self._buffer_window = buffer_window
-        self._message_buffer: list[ChatMessage] = []  # Self-managed history
+        self._message_buffer: list[ChatMessage] = []
 
     async def before_run(self, context: SessionContext) -> None:
-        # Build search query from current input + recent history
-        recent_messages = self._message_buffer[-self._buffer_window * 2:]  # pairs of user/assistant
-        search_context = recent_messages + list(context.input_messages)
-
-        # Use conversation context for better retrieval
-        query = self._build_search_query(search_context)
+        recent = self._message_buffer[-self._buffer_window * 2:]
+        query = self._build_query(recent + list(context.input_messages))
         docs = await self._retriever.search(query)
-
-        # Add retrieved context
-        context.add_messages(self.source_id, [
-            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
-        ])
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
 
     async def after_run(self, context: SessionContext) -> None:
-        # Update our own history buffer with this exchange
         self._message_buffer.extend(context.input_messages)
         if context.response_messages:
             self._message_buffer.extend(context.response_messages)
+```
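+
+The buffers above grow with every exchange. A minimal sketch of one way to bound them, assuming the buffer should hold at most `buffer_window` user/assistant exchanges. `BoundedMessageBuffer` is a hypothetical helper, not part of the framework, and plain strings stand in for `ChatMessage`:
+
+```python
+from collections import deque
+
+
+class BoundedMessageBuffer:
+    """Hypothetical helper: keeps only the most recent messages."""
+
+    def __init__(self, buffer_window: int = 5) -> None:
+        # Two messages (user + assistant) per exchange.
+        self._messages: deque[str] = deque(maxlen=buffer_window * 2)
+
+    def extend(self, messages: list[str]) -> None:
+        # A deque with maxlen drops the oldest entries once it is full.
+        self._messages.extend(messages)
+
+    def recent(self) -> list[str]:
+        return list(self._messages)
+
+
+buffer = BoundedMessageBuffer(buffer_window=1)
+buffer.extend(["What is Python?", "A programming language."])
+buffer.extend(["How does it compare to JavaScript?", "Both are dynamic."])
+print(buffer.recent())  # only the latest exchange remains
+```
+
+The same idea applies to either option: the middleware or hook would call `extend()` where it currently extends `self._message_buffer`.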
 
-        # Trim to prevent unbounded growth
-        max_messages = self._buffer_window * 4  # Keep some buffer
-        if len(self._message_buffer) > max_messages:
-            self._message_buffer = self._message_buffer[-max_messages:]
+**Simple RAG (input only, no buffer):**
 
-    async def serialize(self) -> dict[str, Any]:
-        """Persist the history buffer."""
-        return {
-            "source_id": self.source_id,
-            "message_buffer": [msg.to_dict() for msg in self._message_buffer],
-        }
+```python
+# Middleware (method on a ContextMiddleware subclass)
+async def process(self, context, next):
+    query = " ".join(msg.text for msg in context.input_messages if msg.text)
+    docs = await self._retriever.search(query)
+    context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+    await next(context)
+
+# Hooks (method on a ContextHooks subclass)
+async def before_run(self, context):
+    query = " ".join(msg.text for msg in context.input_messages if msg.text)
+    docs = await self._retriever.search(query)
+    context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+```
 
-    async def restore(self, state: dict[str, Any]) -> None:
-        """Restore the history buffer."""
-        self._message_buffer = [
-            ChatMessage.from_dict(m) for m in state.get("message_buffer", [])
-        ]
-
-    def _build_search_query(self, messages: list[ChatMessage]) -> str:
-        # Combine recent messages into a search query
-        return " ".join(msg.text for msg in messages if msg.text)
-
-    def _format_docs(self, docs: list[Document]) -> str:
-        return "\n\n".join(doc.content for doc in docs)
-```
-
-**Usage:**
-```python
-agent = ChatAgent(
-    chat_client=client,
-    context_hooks=[
-        InMemoryStorageHooks("memory"),
-        RAGWithBufferHooks("rag", retriever=my_retriever, buffer_window=3),
-    ]
-)
-
-session = agent.create_session()
-
-# First message - RAG uses only this message
-await agent.run("What is Python?", session=session)
-
-# Second message - RAG now uses both messages for better retrieval
-await agent.run("How does it compare to JavaScript?", session=session)
-
-# The RAG hook's internal buffer now contains the conversation,
-# enabling context-aware retrieval even though it's not a storage hook
-```
-
-This pattern allows any hook to behave like a "mini storage" for its own purposes while keeping the clear separation between storage hooks (which persist the canonical conversation) and context hooks (which enhance the invocation).
-
-**Example: Simple RAG using only current input (no history)**
-
-If you want RAG that only uses the current user input (ignoring conversation history), simply use `context.input_messages` directly:
-
-```python
-class SimpleRAGHooks(ContextHooks):
-    """RAG hook that uses only the current input for retrieval."""
-
-    def __init__(self, source_id: str, retriever: Retriever):
-        super().__init__(source_id)
-        self._retriever = retriever
-
-    async def before_run(self, context: SessionContext) -> None:
-        # Use ONLY the current input - no history needed
-        query = " ".join(msg.text for msg in context.input_messages if msg.text)
-        docs = await self._retriever.search(query)
-
-        context.add_messages(self.source_id, [
-            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
-        ])
-
-    def _format_docs(self, docs: list[Document]) -> str:
-        return "\n\n".join(doc.content for doc in docs)
-
-
-# Usage - storage hook provides history to the model, RAG only uses current input
-agent = ChatAgent(
-    chat_client=client,
-    context_hooks=[
-        InMemoryStorageHooks("memory"),  # Loads full history for the model
-        SimpleRAGHooks("rag", retriever=my_retriever),  # Only uses current input
-    ]
-)
-```
+### Migration Impact
 
-The key distinction:
-- `context.input_messages` - only the new message(s) passed to this `agent.run()` call
-- `context.context_messages` - messages added by other hooks (e.g., history loaded by storage)
-- `context.get_all_messages()` - combines everything for the model
+| Current | Middleware (Option 2) | Hooks (Option 3) |
+|---------|----------------------|------------------|
+| `ContextProvider` | `ContextMiddleware` | `ContextHooks` |
+| `invoking()` | Before `await next(context)` | `before_run()` |
+| `invoked()` | After `await next(context)` | `after_run()` |
+| `ChatMessageStore` | `StorageContextMiddleware` | `StorageContextHooks` |
+| `AgentThread` | `AgentSession` | `AgentSession` |
 
-### Example: Current vs New (Option 3)
+### Example: Current vs New
 
 **Current:**
 ```python
@@ -755,634 +654,191 @@ class MyContextProvider(ContextProvider):
     async def invoked(self, request, response, **kwargs) -> None:
         await self.store_interaction(request, response)
 
-async with MyContextProvider() as provider:
-    agent = ChatAgent(chat_client=client, name="assistant")
-    thread = await agent.get_new_thread(message_store=ChatMessageStore())
-    thread.context_provider = provider
-    response = await agent.run("Hello", thread=thread)
+thread = await agent.get_new_thread(message_store=ChatMessageStore())
+thread.context_provider = provider
+response = await agent.run("Hello", thread=thread)
 ```
 
-**New (Option 3 - Hooks):**
+**New (Middleware):**
 ```python
-class RAGHooks(ContextHooks):
-    async def before_run(self, context: SessionContext) -> None:
+class RAGMiddleware(ContextMiddleware):
+    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
         docs = await self.retrieve_documents(context.input_messages[-1].text)
         context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
-
-    async def after_run(self, context: SessionContext) -> None:
+        await next(context)
         await self.store_interaction(context.input_messages, context.response_messages)
 
 agent = ChatAgent(
     chat_client=client,
-    name="assistant",
-    context_hooks=[
-        InMemoryStorageHooks("memory"),
-        RAGHooks("rag"),
-    ]
+    context_middleware=[InMemoryStorageMiddleware("memory"), RAGMiddleware("rag")]
 )
 session = agent.create_session()
 response = await agent.run("Hello", session=session)
 ```
 
-### Migration Impact (Option 3)
-
-| Current | New (Option 3) | Notes |
-|---------|----------------|-------|
-| `ContextProvider` | `ContextHooks` | Rename `invoking()` → `before_run()`, `invoked()` → `after_run()` |
-| `ChatMessageStore` | `StorageContextHooks` | Extend and implement storage methods |
-| `AgentThread` | `AgentSession` | Clean break, no alias |
-| `thread.message_store` | Via hooks in pipeline | Configure at agent level |
-| `thread.context_provider` | Via hooks in pipeline | Multiple hooks supported |
-
----
-
-## Detailed Design: Option 2 (ContextMiddleware - Wrapper Pattern)
-
-### Key Design Decisions
-
-#### 1. Onion/Wrapper Pattern
-
-Like other middleware in the framework, `ContextMiddleware` uses `process(context, next)`:
-
-```python
-class ContextMiddleware(ABC):
-    def __init__(self, source_id: str, *, session_id: str | None = None):
-        self.source_id = source_id
-        self.session_id = session_id
-
-    @abstractmethod
-    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
-        """Wrap the context flow - modify before next(), process after."""
-        pass
-```
-
-**Comparison to Current:**
-| Aspect | ContextProvider (Current) | ContextMiddleware (New) |
-|--------|--------------------------|------------------------|
-| Pre-processing | `invoking()` method | Before `await next(context)` |
-| Post-processing | `invoked()` method | After `await next(context)` |
-| Composition | Single provider only | Pipeline of middleware |
-| Pattern | Callback hooks | Onion/wrapper |
-
-#### 2. Agent vs Session Ownership
-
-- **Agent** owns `Sequence[ContextMiddlewareConfig]` (instances or factories)
-- **AgentSession** owns `ContextMiddlewarePipeline` (resolved at runtime)
-
-```python
-# Agent holds middleware configuration
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),
-        RAGContextMiddleware("rag"),
-    ]
-)
-
-# Session holds the resolved pipeline
-session = agent.create_session()
-```
-
-**Comparison to Current:**
-| Aspect | AgentThread (Current) | AgentSession (New) |
-|--------|----------------------|-------------------|
-| Storage | `message_store` attribute | Via `StorageContextMiddleware` in pipeline |
-| Context | `context_provider` attribute | Via any `ContextMiddleware` in pipeline |
-| Composition | One of each | Unlimited middleware |
-
-#### 3. Unified Storage Middleware
-
-Instead of separate `ChatMessageStore`, storage is a type of `ContextMiddleware`:
-
+**New (Hooks):**
 ```python
-class StorageContextMiddleware(ContextMiddleware):
-    def __init__(
-        self,
-        source_id: str,
-        *,
-        load_messages: bool | None = None,  # None = smart mode
-        store_inputs: bool = True,
-        store_responses: bool = True,
-        store_context_messages: bool = False,
-        store_context_from: Sequence[str] | None = None,
-    ): ...
-```
-
-**Smart Load Behavior:**
-- `load_messages=None` (default): Automatically disable loading when:
-  - `context.options.get('store') == False`, OR
-  - `context.service_session_id is not None` (service handles storage)
-
-**Comparison to Current:**
-| Aspect | ChatMessageStore (Current) | StorageContextMiddleware (New) |
-|--------|---------------------------|------------------------------|
-| Load messages | Always via `list_messages()` | Configurable `load_messages` flag |
-| Store messages | Always via `add_messages()` | Configurable `store_*` flags |
-| What to store | All messages | Selective: inputs, responses, context |
-| RAG context | Not supported | `store_context_messages=True` |
-
-#### 4. Source Attribution via `source_id`
-
-Every middleware has a required `source_id` that attributes added messages:
-
-```python
-class SessionContext:
-    # Messages keyed by source_id
-    context_messages: dict[str, list[ChatMessage]]
-
-    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
-        if source_id not in self.context_messages:
-            self.context_messages[source_id] = []
-        self.context_messages[source_id].extend(messages)
-
-    def get_messages(
-        self,
-        sources: Sequence[str] | None = None,
-        exclude_sources: Sequence[str] | None = None,
-    ) -> list[ChatMessage]:
-        """Get messages, optionally filtered by source."""
-        ...
-```
-
-**Benefits over Current:**
-- Debug which middleware added which messages
-- Filter messages by source (e.g., exclude RAG from storage)
-- Multiple instances of same middleware type distinguishable
-
-#### 5. Default Storage Behavior
-
-Zero-config works out of the box:
-
-```python
-# No middleware configured - still gets conversation history!
-agent = ChatAgent(chat_client=client, name="assistant")
-session = agent.create_session()
-response = await agent.run("Hello!", session=session)
-response = await agent.run("What did I say?", session=session)  # Remembers!
-```
-
-Default `InMemoryStorageMiddleware` is added at runtime **only when**:
-- No `service_session_id` (service not managing storage)
-- `options.store` is not `True` (user not expecting service storage)
-- **No middleware pipeline configured at all** (pipeline is empty or None)
+class RAGHooks(ContextHooks):
+    async def before_run(self, context: SessionContext) -> None:
+        docs = await self.retrieve_documents(context.input_messages[-1].text)
+        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
 
-**Important:** If the user configures *any* middleware (even non-storage middleware), the framework does **not** automatically add storage. This is intentional:
-- Once users start customizing the pipeline, they should explicitly configure storage
-- Automatic insertion would create ordering ambiguity (should storage be first? last?)
-- Explicit configuration is clearer than implicit behavior for non-trivial setups
+    async def after_run(self, context: SessionContext) -> None:
+        await self.store_interaction(context.input_messages, context.response_messages)
 
-```python
-# This agent has NO automatic storage - user configured middleware but no storage
 agent = ChatAgent(
     chat_client=client,
-    context_middleware=[RAGContextMiddleware("rag")]  # No storage middleware!
+    context_hooks=[InMemoryStorageHooks("memory"), RAGHooks("rag")]
 )
 session = agent.create_session()
-await agent.run("Hello!", session=session)
-await agent.run("What did I say?", session=session)  # Won't remember!
-
-# To get storage, explicitly add it:
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),  # Explicit storage
-        RAGContextMiddleware("rag"),
-    ]
-)
-```
-
-**Comparison to Current:**
-| Aspect | AgentThread (Current) | AgentSession (New) |
-|--------|----------------------|-------------------|
-| Default storage | Creates `ChatMessageStore` lazily | Creates `InMemoryStorageMiddleware` at runtime |
-| When | In `on_new_messages()` | In `run_context_pipeline()` |
-| Customizable | After creation | Before first `run()` |
-
-#### 6. Middleware Instance vs Factory
-
-Support both shared instances and per-session factories:
-
-```python
-# Instance (shared across sessions)
-agent = ChatAgent(
-    context_middleware=[RAGContextMiddleware("rag")]
-)
-
-# Factory (new instance per session)
-def create_session_cache(session_id: str | None) -> ContextMiddleware:
-    return SessionCacheMiddleware("cache", session_id=session_id)
-
-agent = ChatAgent(
-    context_middleware=[create_session_cache]
-)
-```
-
-#### 7. Renaming: Thread → Session
-
-`AgentThread` becomes `AgentSession` to better reflect its purpose:
-- "Thread" implies a sequence of messages
-- "Session" better captures the broader scope (state, middleware, lifecycle)
-
-#### 8. Session Serialization/Deserialization
-
-Sessions need to be serializable for persistence across process restarts. Serialization happens through **agent methods** because the agent holds the middleware configuration needed to reconstruct the pipeline.
-
-```python
-class ContextMiddleware(ABC):
-    """Each middleware can optionally implement serialization."""
-
-    async def serialize(self) -> Any:
-        """Serialize middleware state to a persistable object.
-
-        Returns any object that can be serialized (typically dict for JSON).
-        Default returns None (no state to persist).
-        """
-        return None
-
-    async def restore(self, state: Any) -> None:
-        """Restore middleware state from a previously serialized object.
-
-        Args:
-            state: The object returned by serialize()
-        """
-        pass
-
-
-class InMemoryStorageMiddleware(StorageContextMiddleware):
-    """Example: In-memory storage serializes its messages."""
-
-    async def serialize(self) -> dict[str, Any]:
-        return {
-            "source_id": self.source_id,
-            "messages": [msg.to_dict() for msg in self._messages],
-        }
-
-    async def restore(self, state: dict[str, Any]) -> None:
-        self._messages = [ChatMessage.from_dict(m) for m in state.get("messages", [])]
-
-
-class ChatAgent:
-    """Agent handles all session serialization."""
-
-    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
-        """Serialize a session's state for persistence.
-
-        The agent handles serialization because it understands the middleware
-        configuration and can coordinate state capture across all middleware.
-
-        Args:
-            session: The session to serialize
-
-        Returns:
-            Serialized state that can be persisted (JSON-compatible dict)
-        """
-        middleware_states: dict[str, Any] = {}
-        if session.context_pipeline:
-            for middleware in session.context_pipeline:
-                state = await middleware.serialize()
-                if state is not None:
-                    middleware_states[middleware.source_id] = state
-
-        return {
-            "session_id": session.session_id,
-            "service_session_id": session.service_session_id,
-            "middleware_states": middleware_states,
-        }
-
-    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
-        """Restore a session from serialized state.
-
-        The agent must restore the session because it holds the middleware
-        configuration needed to reconstruct the pipeline.
-
-        Args:
-            serialized: Previously serialized session state
-
-        Returns:
-            Restored AgentSession with middleware state restored
-        """
-        session_id = serialized.get("session_id")
-        service_session_id = serialized.get("service_session_id")
-        middleware_states = serialized.get("middleware_states", {})
-
-        # Create fresh session with new pipeline
-        session = self.create_session(
-            session_id=session_id,
-            service_session_id=service_session_id,
-        )
-
-        # Restore middleware state by source_id
-        if session.context_pipeline:
-            for middleware in session.context_pipeline:
-                if middleware.source_id in middleware_states:
-                    await middleware.restore(middleware_states[middleware.source_id])
-
-        return session
-```
-
-**Usage:**
-```python
-# Save session
-state = await agent.serialize_session(session)
-json_str = json.dumps(state)  # Or store in database, Redis, etc.
-
-# Later: restore session
-state = json.loads(json_str)
-session = await agent.restore_session(state)
-
-# Continue conversation
-response = await agent.run("What did we talk about?", session=session)
-```
-
-**Key Points:**
-- `agent.serialize_session(session)` - agent handles serialization
-- `agent.restore_session(state)` - agent handles restoration
-- Each middleware implements optional `serialize()` and `restore()` methods
-- `serialize()` returns `Any` - typically dict for JSON, but could be bytes, protobuf, etc.
-- Stateless middleware returns `None` from `serialize()` (skipped in output)
-- `source_id` acts as the key to match serialized state to middleware instances
-
-**Comparison to Current:**
-| Aspect | AgentThread (Current) | AgentSession (New) |
-|--------|----------------------|-------------------|
-| Serialization | `thread.serialize()` | `agent.serialize_session(session)` |
-| Deserialization | `AgentThread.deserialize(state, message_store=...)` | `agent.restore_session(state)` |
-| What's saved | Just messages | Each middleware's custom state |
-| Owner | Thread class | Agent instance |
-
-#### 9. Session Management Methods
-
-The agent provides clear methods for session lifecycle management:
-
-```python
-class ChatAgent:
-    def create_session(
-        self,
-        *,
-        session_id: str | None = None,
-        service_session_id: str | None = None,
-    ) -> AgentSession:
-        """Create a new session with a fresh middleware pipeline.
-
-        This is the primary way to create sessions. Middleware factories
-        are called with the session_id to create session-specific instances.
-
-        Args:
-            session_id: Optional session ID (generated if not provided)
-            service_session_id: Optional service-managed session ID
-
-        Returns:
-            New AgentSession with resolved middleware pipeline
-        """
-        resolved_session_id = session_id or str(uuid.uuid4())
-
-        pipeline = None
-        if self._context_middleware:
-            pipeline = ContextMiddlewarePipeline.from_config(
-                self._context_middleware,
-                session_id=resolved_session_id,
-            )
-
-        return AgentSession(
-            session_id=resolved_session_id,
-            service_session_id=service_session_id,
-            context_pipeline=pipeline,
-        )
-
-    def get_session_by_id(self, session_id: str) -> AgentSession:
-        """Get a session by ID with a fresh middleware pipeline.
-
-        Use this when you have a session ID but no persisted state.
-        The middleware pipeline is freshly created (no state restored).
-
-        For restoring a session with state, use restore_session() instead.
-
-        Args:
-            session_id: The session ID to use
-
-        Returns:
-            AgentSession with the specified ID and fresh middleware
-        """
-        return self.create_session(session_id=session_id)
-```
-
-**Usage:**
-```python
-# Create a brand new session
-session = agent.create_session()
-
-# Create session with specific ID (e.g., from external system)
-session = agent.create_session(session_id="user-123-session-456")
-
-# Create session for service-managed storage
-session = agent.create_session(service_session_id="thread_abc123")
-
-# Get session by ID (fresh pipeline, no state)
-session = agent.get_session_by_id("existing-session-id")
-
-# Restore session with full state
-state = load_from_database(session_id)
-session = await agent.restore_session(state)
+response = await agent.run("Hello", session=session)
 ```
+
+## Decision Outcome
 
-### Accessing Context from Other Middleware
-
-Non-storage middleware can read context added by other middleware via `context.context_messages`. However, middleware should operate under the assumption that **only the current input messages are available** - there is no implicit conversation history.
-
-If a middleware needs historical context (e.g., a RAG middleware that wants to search based on the last few messages, not just the newest), it must **maintain its own message buffer** as part of its instance state. This makes the middleware self-contained and predictable, similar to how storage middleware manages its own persistence.
-
-**Key principles:**
-- `context.input_messages` contains only the new message(s) for this invocation
-- `context.context_messages` contains messages added by middleware that ran earlier in the pipeline
-- For history beyond current input, middleware must track it themselves
-- Use `serialize()`/`restore()` to persist the buffer across sessions
+**TBD** - This ADR presents two viable approaches:
 
-**Example: RAG middleware with conversation history buffer**
+- **Option 2: ContextMiddleware (Wrapper Pattern)** - Consistent with existing middleware patterns, more powerful control flow
+- **Option 3: ContextHooks (Pre/Post Pattern)** - Simpler mental model, easier migration from current `ContextProvider`
 
-```python
-class RAGWithBufferMiddleware(ContextMiddleware):
-    """RAG middleware that uses recent conversation history for better retrieval."""
+Both options share the same:
+- Agent vs Session ownership model
+- `source_id` attribution
+- Serialization/deserialization via agent methods
+- Session management methods (`create_session`, `get_session_by_id`, `serialize_session`, `restore_session`)
+- Renaming `AgentThread` → `AgentSession`
 
-    def __init__(
-        self,
-        source_id: str,
-        retriever: Retriever,
-        *,
-        buffer_window: int = 5,  # Number of recent exchanges to consider
-        session_id: str | None = None,
-    ):
-        super().__init__(source_id, session_id=session_id)
-        self._retriever = retriever
-        self._buffer_window = buffer_window
-        self._message_buffer: list[ChatMessage] = []  # Self-managed history
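+The key difference is the execution model: nested wrappers (each middleware wraps the rest of the pipeline through `next()`) versus linear phases (a `before_run()` phase, the invocation, then an `after_run()` phase).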
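+
+A minimal, framework-free sketch of the two execution models. The names below (`run_wrapped`, `run_phased`, `TraceHook`, the `trace` list) are illustrative only, and a dict stands in for `SessionContext`. In particular, the order in which `after_run()` hooks fire is an assumption made for this example:
+
+```python
+import asyncio
+
+trace: list[str] = []
+
+
+def make_middleware(name):
+    async def process(context, next):
+        trace.append(f"{name}:before")
+        await next(context)
+        trace.append(f"{name}:after")
+    return process
+
+
+class TraceHook:
+    def __init__(self, name):
+        self.name = name
+
+    async def before_run(self, context):
+        trace.append(f"{self.name}:before")
+
+    async def after_run(self, context):
+        trace.append(f"{self.name}:after")
+
+
+async def invoke(context):
+    trace.append("model")
+
+
+async def run_wrapped(middleware, context):
+    # Option 2: each middleware wraps everything that follows it.
+    async def call(index, ctx):
+        if index == len(middleware):
+            await invoke(ctx)
+        else:
+            await middleware[index](ctx, lambda c: call(index + 1, c))
+    await call(0, context)
+
+
+async def run_phased(hooks, context):
+    # Option 3: all before phases, the invocation, then all after phases.
+    for hook in hooks:
+        await hook.before_run(context)
+    await invoke(context)
+    for hook in hooks:  # assumed: registration order
+        await hook.after_run(context)
+
+
+asyncio.run(run_wrapped([make_middleware("a"), make_middleware("b")], {}))
+wrapped_trace = list(trace)
+trace.clear()
+asyncio.run(run_phased([TraceHook("a"), TraceHook("b")], {}))
+phased_trace = list(trace)
+print(wrapped_trace)
+print(phased_trace)
+```
+
+In the wrapped run the "after" steps unwind in reverse order, because each one resumes only when the inner call returns. In the phased run the runner decides the order of each phase.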
 
-    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
-        # Build search query from current input + recent history
-        recent_messages = self._message_buffer[-self._buffer_window * 2:]  # pairs of user/assistant
-        search_context = recent_messages + list(context.input_messages)
+---
 
-        # Use conversation context for better retrieval
-        query = self._build_search_query(search_context)
-        docs = await self._retriever.search(query)
+## Comparison to .NET Implementation
 
-        # Add retrieved context
-        context.add_messages(self.source_id, [
-            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
-        ])
+The .NET Agent Framework provides equivalent functionality through a different structure. Both implementations achieve the same goals using idioms natural to their respective languages.
 
-        # Call next middleware
-        await next(context)
+### Concept Mapping
 
-        # Update our own history buffer with this exchange
-        self._message_buffer.extend(context.input_messages)
-        if context.response_messages:
-            self._message_buffer.extend(context.response_messages)
+| .NET Concept | Python Middleware (Option 2) | Python Hooks (Option 3) |
+|--------------|------------------------------|-------------------------|
+| `AIContextProvider` | `ContextMiddleware` | `ContextHooks` |
+| `ChatHistoryProvider` | `StorageContextMiddleware` | `StorageContextHooks` |
+| `AgentSession` | `AgentSession` | `AgentSession` |
 
-        # Trim to prevent unbounded growth
-        max_messages = self._buffer_window * 4  # Keep some buffer
-        if len(self._message_buffer) > max_messages:
-            self._message_buffer = self._message_buffer[-max_messages:]
+### Feature Equivalence
 
-    async def serialize(self) -> dict[str, Any]:
-        """Persist the history buffer."""
-        return {
-            "source_id": self.source_id,
-            "message_buffer": [msg.to_dict() for msg in self._message_buffer],
-        }
+Both platforms provide the same core capabilities:
 
-    async def restore(self, state: dict[str, Any]) -> None:
-        """Restore the history buffer."""
-        self._message_buffer = [
-            ChatMessage.from_dict(m) for m in state.get("message_buffer", [])
-        ]
+| Capability | .NET | Python |
+|------------|------|--------|
+| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `process()` before `next()` / `before_run()` |
+| React after invocation | `AIContextProvider.InvokedAsync()` | `process()` after `next()` / `after_run()` |
+| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `StorageContextMiddleware/Hooks` with `load_messages=True` |
+| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `StorageContextMiddleware/Hooks` with `store_*` flags |
+| Session serialization | `Serialize()` on providers | `serialize()`/`restore()` on middleware/hooks |
+| Factory-based creation | `AIContextProviderFactory`, `ChatHistoryProviderFactory` | Factory functions in middleware/hooks list |
 
-    def _build_search_query(self, messages: list[ChatMessage]) -> str:
-        # Combine recent messages into a search query
-        return " ".join(msg.text for msg in messages if msg.text)
+### Implementation Differences
 
-    def _format_docs(self, docs: list[Document]) -> str:
-        return "\n\n".join(doc.content for doc in docs)
-```
+The implementations differ in ways idiomatic to each language:
 
-**Usage:**
-```python
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),
-        RAGWithBufferMiddleware("rag", retriever=my_retriever, buffer_window=3),
-    ]
-)
+| Aspect | .NET Approach | Python Approach |
+|--------|---------------|-----------------|
+| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of middleware/hooks (multiple) |
+| **Composition** | One of each provider type per session | Unlimited middleware/hooks in pipeline |
+| **Type system** | Strict interfaces, compile-time checks | Duck typing, protocols, runtime flexibility |
+| **Configuration** | DI container, factory delegates | Direct instantiation, list of instances/factories |
+| **Default storage** | Can auto-inject when `ChatHistoryProvider` is missing | Auto-injects only when no pipeline is configured |
+| **Source tracking** | Via separate provider types | Built-in `source_id` on each middleware/hook |
 
-session = agent.create_session()
+### Design Trade-offs
 
-# First message - RAG uses only this message
-await agent.run("What is Python?", session=session)
+Each approach has trade-offs that align with language conventions:
 
-# Second message - RAG now uses both messages for better retrieval
-await agent.run("How does it compare to JavaScript?", session=session)
+**.NET's separate provider types:**
+- Clearer separation between context injection and history storage
+- Easier to detect "missing storage" and auto-inject defaults
+- Type system enforces single provider of each type
 
-# The RAG middleware's internal buffer now contains the conversation,
-# enabling context-aware retrieval even though it's not a storage middleware
-```
+**Python's unified pipeline:**
+- Single abstraction for all context concerns
+- Multiple instances of same type (e.g., multiple storage backends)
+- More explicit - customization means owning full configuration
+- `source_id` enables filtering/debugging across all sources
 
-This pattern allows any middleware to behave like a "mini storage" for its own purposes while keeping the clear separation between storage middleware (which persist the canonical conversation) and context middleware (which enhance the invocation).
+Neither approach is inherently better - they reflect different language philosophies while achieving equivalent functionality. The Python design embraces the "we're all consenting adults" philosophy, while .NET provides more compile-time guardrails.
 
-**Example: Simple RAG using only current input (no history)**
+---
 
-If you want RAG that only uses the current user input (ignoring conversation history), simply use `context.input_messages` directly:
+## Open Discussion: Context Compaction
 
-```python
-class SimpleRAGMiddleware(ContextMiddleware):
-    """RAG middleware that uses only the current input for retrieval."""
+### Problem Statement
 
-    def __init__(self, source_id: str, retriever: Retriever):
-        super().__init__(source_id)
-        self._retriever = retriever
+A common need for long-running agents is **context compaction** - automatically summarizing or truncating conversation history when approaching token limits. This is particularly important for agents that make many tool calls in succession (tens or hundreds), where the context can grow without bound.
 
-    async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
-        # Use ONLY the current input - no history needed
-        query = " ".join(msg.text for msg in context.input_messages if msg.text)
-        docs = await self._retriever.search(query)
+Currently, this is challenging because:
+- `ChatMessageStore.list_messages()` is only called once at the start of `agent.run()`, not during the tool loop
+- `ChatMiddleware` operates on a copy of messages, so modifications don't persist across tool loop iterations
+- The function calling loop happens deep within the `ChatClient`, which is below the agent level
 
-        context.add_messages(self.source_id, [
-            ChatMessage.system(f"Relevant context:\n{self._format_docs(docs)}")
-        ])
+### Design Question
 
-        await next(context)
+Should `ContextMiddleware`/`ContextHooks` be invoked:
+1. **Only at agent invocation boundaries** (current proposal) - before/after each `agent.run()` call
+2. **During the tool loop** - before/after each model call within a single `agent.run()`
 
-    def _format_docs(self, docs: list[Document]) -> str:
-        return "\n\n".join(doc.content for doc in docs)
+### Boundary vs In-Run Compaction
 
+While boundary and in-run compaction could potentially use the same mechanism, they have **different goals and behaviors**:
 
-# Usage - storage middleware provides history to the model, RAG only uses current input
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),  # Loads full history for the model
-        SimpleRAGMiddleware("rag", retriever=my_retriever),  # Only uses current input
-    ]
-)
-```
+**Boundary compaction** (before/after `agent.run()`):
+- **Before run**: Keep context manageable - load a compacted view of history
+- **After run**: Keep storage compact - summarize/truncate before persisting
+- Useful for maintaining reasonable context sizes across conversation turns
+- One reason to have **multiple storage middleware**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
 
-The key distinction:
-- `context.input_messages` - only the new message(s) passed to this `agent.run()` call
-- `context.context_messages` - messages added by other middleware (e.g., history loaded by storage)
-- `context.get_all_messages()` - combines everything for the model
+**In-run compaction** (during function calling loops):
+- Relevant for **function calling scenarios** where many tool calls accumulate
+- Typically **in-memory only** - there is no need to persist intermediate compaction, and it is only useful when the conversation/session is _not_ managed by the service
+- Different strategies apply:
+  - Remove old function call/result pairs entirely, or keep only the most recent N tool interactions
+  - Replace call/result pairs with a single summary message (with a different role)
+  - Summarize several function call/result pairs into one larger context message
 
-### Migration Impact
+### Service-Managed vs Local Storage
 
-| Current | New | Notes |
-|---------|-----|-------|
-| `ContextProvider` | `ContextMiddleware` | Implement `process()` instead of `invoking()`/`invoked()` |
-| `ChatMessageStore` | `StorageContextMiddleware` | Extend and implement `get_messages()`/`save_messages()` |
-| `AgentThread` | `AgentSession` | Clean break, no alias |
-| `thread.message_store` | Via middleware in pipeline | Configure at agent level |
-| `thread.context_provider` | Via middleware in pipeline | Multiple providers supported |
+**Important:** In-run compaction is relevant only for **non-service-managed histories**. When using service-managed storage (`service_session_id` is set):
+- The service handles history management internally
+- Only the new calls and results are sent to/from the service each turn
+- The service applies its own compaction strategy, which the client cannot control
 
-### Example: Current vs New
+For local storage, a full message list is sent to the model each time, making compaction the client's responsibility.
 
-**Current:**
-```python
-class MyContextProvider(ContextProvider):
-    async def invoking(self, messages, **kwargs) -> Context:
-        docs = await self.retrieve_documents(messages[-1].text)
-        return Context(messages=[ChatMessage.system(f"Context: {docs}")])
+### Options
 
-    async def invoked(self, request, response, **kwargs) -> None:
-        await self.store_interaction(request, response)
+**Option A: Invocation-boundary only (current proposal)**
+- Simpler mental model
+- Consistent with `AgentMiddleware` pattern
+- In-run compaction would need to happen via a separate mechanism (e.g., `ChatMiddleware` at the client level)
+- Risk: Different compaction mechanisms at different layers could be confusing
 
-async with MyContextProvider() as provider:
-    agent = ChatAgent(chat_client=client, name="assistant")
-    thread = await agent.get_new_thread(message_store=ChatMessageStore())
-    thread.context_provider = provider
-    response = await agent.run("Hello", thread=thread)
-```
+**Option B: Also during tool loops**
+- Single mechanism for all context manipulation
+- More powerful but more complex
+- Requires coordination with `ChatClient` internals
+- Risk: Performance overhead if middleware/hooks are expensive
 
-**New:**
-```python
-class RAGMiddleware(ContextMiddleware):
-    async def process(self, context: SessionContext, next) -> None:
-        # Pre-processing
-        docs = await self.retrieve_documents(context.input_messages[-1].text)
-        context.add_messages(self.source_id, [ChatMessage.system(f"Context: {docs}")])
+**Option C: Unified approach across layers**
+- Define a single context compaction abstraction that works at both agent and client levels
+- `ContextMiddleware`/`ContextHooks` could delegate to `ChatMiddleware` for mid-loop execution
+- Requires deeper architectural thought
 
-        await next(context)
+### Potential Extension Points (for any option)
 
-        # Post-processing
-        await self.store_interaction(context.input_messages, context.response_messages)
+Regardless of the chosen approach, these extension points could support compaction:
+- A `CompactionStrategy` that can be shared between middleware/hooks and function calling configuration
+- Hooks for `ChatClient` to notify the agent layer when context limits are approaching
+- A unified `ContextManager` that coordinates compaction across layers
 
-agent = ChatAgent(
-    chat_client=client,
-    name="assistant",
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),
-        RAGMiddleware("rag"),
-    ]
-)
-session = agent.create_session()
-response = await agent.run("Hello", session=session)
-```
+**This section requires further discussion.**
 
 ## Implementation Plan
 

From 4fbbe71c859c6d4bd067085923393da8beea13e9 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 14:29:50 +0100
Subject: [PATCH 07/19] tweaks

---
 .../00XX-python-context-middleware.md         | 22 +++++++++----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 4c031dfd6a..062fd09c1d 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -125,7 +125,7 @@ The following key decisions shape the ContextMiddleware design:
 | 4 | **Multiple Storage Allowed** | Warn if multiple have `load_messages=True` (likely misconfiguration). |
 | 5 | **Single Storage Class** | One `StorageContextMiddleware` configured for memory/audit/evaluation - no separate classes. |
 | 6 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
-| 7 | **Smart Load Behavior** | `load_messages=None` (default) disables loading when `options.store=False` OR `service_session_id` present. |
+| 7 | **Smart Load Behavior** | `load_messages=None` (default) disables loading when `options.store=True` OR `service_session_id` present. And does load otherwise  |
 | 8 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. |
 | 9 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other middleware. |
 | 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
@@ -314,17 +314,17 @@ class ContextHooks(ABC):
 
 ```
 Middleware (Wrapper/Onion):            Hooks (Linear):
-┌─────────────────────────┐            ┌─────────────────────────┐
-│ middleware1.process()   │            │ hook1.before_run()      │
-│  ┌───────────────────┐  │            │ hook2.before_run()      │
+┌──────────────────────────┐            ┌─────────────────────────┐
+│ middleware1.process()    │            │ hook1.before_run()      │
+│  ┌───────────────────┐   │            │ hook2.before_run()      │
 │  │ middleware2.process│  │            │ hook3.before_run()      │
-│  │  ┌─────────────┐  │  │            ├─────────────────────────┤
-│  │  │   invoke    │  │  │     vs     │      <invoke>           │
-│  │  └─────────────┘  │  │            ├─────────────────────────┤
-│  │ (post-processing) │  │            │ hook3.after_run()       │
-│  └───────────────────┘  │            │ hook2.after_run()       │
-│ (post-processing)       │            │ hook1.after_run()       │
-└─────────────────────────┘            └─────────────────────────┘
+│  │  ┌─────────────┐  │   │            ├─────────────────────────┤
+│  │  │   invoke    │  │   │     vs     │      <invoke>           │
+│  │  └─────────────┘  │   │            ├─────────────────────────┤
+│  │ (post-processing) │   │            │ hook3.after_run()       │
+│  └───────────────────┘   │            │ hook2.after_run()       │
+│ (post-processing)        │            │ hook1.after_run()       │
+└──────────────────────────┘            └─────────────────────────┘
 ```
 
 ### 2. Agent vs Session Ownership

From 577258726cd40acefa11d1501c701928a4f14d34 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 14:31:36 +0100
Subject: [PATCH 08/19] fix smart load

---
 docs/decisions/00XX-python-context-middleware.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 062fd09c1d..af8e1f303f 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -400,7 +400,7 @@ class StorageContextHooks(ContextHooks):
 
 **Smart Load Behavior (both approaches):**
 - `load_messages=None` (default): Automatically disable loading when:
-  - `context.options.get('store') == False`, OR
+  - `context.options.get('store') == True`, OR
   - `context.service_session_id is not None` (service handles storage)
 
 **Comparison to Current:**

From 59d53ff48035dc5162a5c3ab594f40936e8daa64 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 14:39:35 +0100
Subject: [PATCH 09/19] ADR: Add naming discussion note for ContextHooks

- Note that class and method names are open for discussion
- Add alternative method naming options table
- Include invoking/invoked as option matching current Python and .NET
---
 .../00XX-python-context-middleware.md          | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index af8e1f303f..85303df043 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -217,7 +217,9 @@ class ContextHooks(ABC):
         pass
 ```
 
-**Alternative naming options:**
+> **Note on naming:** Both the class name (`ContextHooks`) and method names (`before_run`/`after_run`) are open for discussion. The names used throughout this ADR are placeholders pending a final decision. See alternative naming options below.
+
+**Alternative class naming options:**
 
 | Name | Rationale |
 |------|-----------|
@@ -229,6 +231,17 @@ class ContextHooks(ABC):
 | `SessionHooks` | Ties to `AgentSession`, emphasizes session lifecycle |
 | `InvokeHooks` | Directly describes what's being hooked (the invoke call) |
 
+**Alternative method naming options:**
+
+| before / after | Rationale |
+|----------------|-----------|
+| `before_run` / `after_run` | Matches `agent.run()` terminology |
+| `before_invoke` / `after_invoke` | Emphasizes invocation lifecycle |
+| `invoking` / `invoked` | Matches current Python `ContextProvider` and .NET naming |
+| `pre_invoke` / `post_invoke` | Common prefix convention |
+| `on_invoking` / `on_invoked` | Event-style naming |
+| `prepare` / `finalize` | Action-oriented naming |
+
 **Example usage:**
 
 ```python
@@ -456,7 +469,7 @@ Default in-memory storage is added at runtime **only when**:
 - **No pipeline configured at all** (pipeline is empty or None)
 
 **Important:** If the user configures *any* middleware/hooks (even non-storage ones), the framework does **not** automatically add storage. This is intentional:
-- Once users start customizing the pipeline, we consider them a advanced user and they should know what they are doing,. therefore they should explicitly configure storage
+- Once users start customizing the pipeline, we consider them advanced users who know what they are doing, so they should configure storage explicitly
 - Automatic insertion would create ordering ambiguity
 - Explicit configuration is clearer than implicit behavior
 
@@ -493,6 +506,7 @@ agent = ChatAgent(context_hooks=[create_cache])
 `AgentThread` becomes `AgentSession` to better reflect its purpose:
 - "Thread" implies a sequence of messages
 - "Session" better captures the broader scope (state, pipeline, lifecycle)
+- Aligns with the recent change in the .NET SDK
 
 ### 8. Session Serialization/Deserialization
 

From 232d71123670f903d082a1d292e186e4d2e01789 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Wed, 4 Feb 2026 17:04:03 +0100
Subject: [PATCH 10/19] Update context middleware design: remove smart mode,
 add attribution filtering

- Remove smart mode for load_messages (now explicit bool, default True)
- Add attribution marker in additional_properties for message filtering
- Update validation to warn on multiple or zero storage loaders
- Add note about ChatReducer naming from .NET
- Note that attribution should not be propagated to storage
---
 .../00XX-python-context-middleware.md         | 141 +++++++++++-------
 1 file changed, 86 insertions(+), 55 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 85303df043..cdb8aaaab1 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -44,7 +44,7 @@ This ADR addresses the following issues from the parent issue [#3575](https://gi
 | [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
 | [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` (no params) and `agent.get_session_by_id(id)`. See [§9 Session Management Methods](#9-session-management-methods). |
 | [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. See [§8 Serialization](#8-session-serializationdeserialization). |
-| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: `service_session_id` presence triggers smart behavior (don't load if service manages storage). Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
+| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: configure `load_messages=False` when service manages storage. Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
 | [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | 🔒 **Closed** - Superseded by this ADR. `ChatMessageStore` removed entirely, replaced by `StorageContextMiddleware`. |
 
 ## Current State Analysis
@@ -122,11 +122,11 @@ The following key decisions shape the ContextMiddleware design:
 | 1 | **Agent vs Session Ownership** | Agent owns middleware config; Session owns resolved pipeline. Enables per-session factories. |
 | 2 | **Instance or Factory** | Middleware can be shared instances or `(session_id) -> Middleware` factories for per-session state. |
 | 3 | **Default Storage at Runtime** | `InMemoryStorageMiddleware` auto-added when no service_session_id, store≠True, and no pipeline. Evaluated at runtime so users can modify pipeline first. |
-| 4 | **Multiple Storage Allowed** | Warn if multiple have `load_messages=True` (likely misconfiguration). |
+| 4 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero storage middleware/hooks have `load_messages=True` (likely misconfiguration). |
 | 5 | **Single Storage Class** | One `StorageContextMiddleware` configured for memory/audit/evaluation - no separate classes. |
 | 6 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
-| 7 | **Smart Load Behavior** | `load_messages=None` (default) disables loading when `options.store=True` OR `service_session_id` present. And does load otherwise  |
-| 8 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. |
+| 7 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `StorageContextHooks`, `before_run` is skipped entirely when `load_messages=False`. |
+| 8 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
 | 9 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other middleware. |
 | 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
 | 11 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
@@ -388,7 +388,7 @@ class StorageContextMiddleware(ContextMiddleware):
         self,
         source_id: str,
         *,
-        load_messages: bool | None = None,  # None = smart mode
+        load_messages: bool = True,
         store_inputs: bool = True,
         store_responses: bool = True,
         store_context_messages: bool = False,
@@ -403,7 +403,7 @@ class StorageContextHooks(ContextHooks):
         self,
         source_id: str,
         *,
-        load_messages: bool | None = None,  # None = smart mode
+        load_messages: bool = True,
         store_inputs: bool = True,
         store_responses: bool = True,
         store_context_messages: bool = False,
@@ -411,10 +411,9 @@ class StorageContextHooks(ContextHooks):
     ): ...
 ```
 
-**Smart Load Behavior (both approaches):**
-- `load_messages=None` (default): Automatically disable loading when:
-  - `context.options.get('store') == True`, OR
-  - `context.service_session_id is not None` (service handles storage)
+**Load Behavior:**
+- `load_messages=True` (default): Load messages from storage in `before_run`/pre-processing
+- `load_messages=False`: Skip loading; for `StorageContextHooks`, the `before_run` hook is not called at all
 
 **Comparison to Current:**
 | Aspect | ChatMessageStore (Current) | Storage Middleware/Hooks (New) |
@@ -451,6 +450,27 @@ class SessionContext:
 - Filter messages by source (e.g., exclude RAG from storage)
 - Multiple instances of same type distinguishable
 
+**Message-level Attribution:**
+
+In addition to source-based filtering, individual `ChatMessage` objects should have an `attribution` marker in their `additional_properties` dict. This enables external scenarios to filter messages after the full list has been composed from input and context messages:
+
+```python
+# Setting attribution on a message
+message = ChatMessage(
+    role="system",
+    text="Relevant context from knowledge base",
+    additional_properties={"attribution": "knowledge_base"}
+)
+
+# Filtering by attribution (external scenario)
+all_messages = context.get_all_messages(include_input=True)
+filtered = [m for m in all_messages if m.additional_properties.get("attribution") != "ephemeral"]
+```
+
+This is useful for scenarios where filtering by `source_id` is not sufficient, such as when messages from the same source need different treatment.
+
+> **Note:** The `attribution` marker is intended for runtime filtering only and should **not** be propagated to storage. Storage middleware should strip `attribution` from `additional_properties` before persisting messages.
+
 ### 5. Default Storage Behavior
 
 Zero-config works out of the box (both approaches):
@@ -851,6 +871,9 @@ Regardless of the chosen approach, these extension points could support compacti
 - A `CompactionStrategy` that can be shared between middleware/hooks and function calling configuration
 - Hooks for `ChatClient` to notify the agent layer when context limits are approaching
 - A unified `ContextManager` that coordinates compaction across layers
+- **Message-level attribution**: The `attribution` marker in `ChatMessage.additional_properties` can be used during compaction to identify messages that should be preserved (e.g., `attribution: "important"`) or that are safe to remove (e.g., `attribution: "ephemeral"`). This prevents accidental filtering of critical context during aggressive compaction.
+
+> **Note:** The .NET SDK currently has a `ChatReducer` interface for context reduction/compaction. We should consider adopting similar naming in Python (e.g., `ChatReducer` or `ContextReducer`) for cross-platform consistency.
 
 **This section requires further discussion.**
 
@@ -1162,19 +1185,20 @@ class StorageContextMiddleware(ContextMiddleware):
     - Evaluation storage (stores only for later analysis)
 
     Loading behavior (when to add messages to context_messages[source_id]):
-    - `load_messages=True`: Always load messages
+    - `load_messages=True` (default): Load messages from storage
     - `load_messages=False`: Never load (audit/logging mode)
-    - `load_messages=None` (default): Smart mode - load unless:
-      - `context.options.get('store', True)` is False, OR
-      - `context.service_session_id` is present (service manages storage)
 
     Storage behavior:
     - `store_inputs`: Store input messages (default True)
     - `store_responses`: Store response messages (default True)
     - Storage always happens unless explicitly disabled, regardless of load_messages
 
-    Warning: If multiple middleware have load_messages=True, a warning
-    is logged at pipeline creation time (likely misconfiguration).
+    Warning: At session creation time, a warning is logged if:
+    - Multiple storage middleware have `load_messages=True` (likely duplicate loading)
+    - Zero storage middleware have `load_messages=True` (likely missing primary storage)
+
+    These are warnings only (not errors) because valid use cases exist for both scenarios,
+    such as intentional multi-source loading or audit-only storage configurations.
 
     Examples:
         # Primary memory - loads and stores
@@ -1208,7 +1232,7 @@ class StorageContextMiddleware(ContextMiddleware):
         source_id: str,
         *,
         session_id: str | None = None,
-        load_messages: bool | None = None,  # None = smart mode
+        load_messages: bool = True,
         store_responses: bool = True,
         store_inputs: bool = True,
         store_context_messages: bool = False,  # Store context added by other middleware
@@ -1235,19 +1259,6 @@ class StorageContextMiddleware(ContextMiddleware):
         """Persist messages for this session."""
         pass
 
-    def _should_load_messages(self, context: SessionContext) -> bool:
-        """Determine if we should load messages based on config and context."""
-        # Explicit configuration takes precedence
-        if self.load_messages is not None:
-            return self.load_messages
-
-        # Smart mode: don't load if service manages storage
-        if context.service_session_id is not None:
-            return False
-
-        # Smart mode: respect options['store']
-        return context.options.get('store', True)
-
     def _get_context_messages_to_store(self, context: SessionContext) -> list[ChatMessage]:
         """Get context messages that should be stored based on configuration."""
         if not self.store_context_messages:
@@ -1266,7 +1277,7 @@ class StorageContextMiddleware(ContextMiddleware):
         next: ContextMiddlewareNext
     ) -> None:
         # PRE: Load history if configured, keyed by our source_id
-        if self._should_load_messages(context):
+        if self.load_messages:
             history = await self.get_messages(context.session_id)
             context.add_messages(self.source_id, history)
 
@@ -1363,18 +1374,36 @@ class ContextMiddlewarePipeline:
         return cls(middleware)
 
     def _validate_middleware(self) -> None:
-        """Warn if multiple middleware are configured to load messages."""
-        loaders = [
+        """Warn if storage middleware configuration looks like a mistake.
+
+        These are warnings only (not errors) because valid use cases exist
+        for both multiple loaders and zero loaders.
+        """
+        storage_middleware = [
             m for m in self._middleware
             if isinstance(m, StorageContextMiddleware)
-            and m.load_messages is True
         ]
+
+        if not storage_middleware:
+            # No storage middleware at all - that's fine, user may not need it
+            return
+
+        loaders = [m for m in storage_middleware if m.load_messages is True]
+
         if len(loaders) > 1:
             warnings.warn(
                 f"Multiple storage middleware configured to load messages: "
                 f"{[m.source_id for m in loaders]}. "
                 f"This may cause duplicate messages in context. "
-                f"Consider setting load_messages=False on all but one.",
+                f"If this is intentional, you can ignore this warning.",
+                UserWarning
+            )
+        elif len(loaders) == 0:
+            warnings.warn(
+                f"Storage middleware configured but none have load_messages=True: "
+                f"{[m.source_id for m in storage_middleware]}. "
+                f"No conversation history will be loaded. "
+                f"If this is intentional (e.g., audit-only), you can ignore this warning.",
                 UserWarning
             )
 
@@ -1723,7 +1752,7 @@ search_middleware = AzureAISearchContextMiddleware(
 )
 
 # Primary memory storage (loads + stores)
-# load_messages=None (default) = smart mode, respects options['store'] and service_session_id
+# load_messages=True (default) - loads and stores messages
 memory_middleware = RedisStorageMiddleware(
     source_id="memory",
     redis_url="redis://...",
@@ -1741,7 +1770,7 @@ agent = ChatAgent(
     chat_client=client,
     name="assistant",
     context_middleware=[
-        memory_middleware,   # First: loads history (smart mode)
+        memory_middleware,   # First: loads history
         search_middleware,   # Second: adds RAG context
         audit_middleware,    # Third: stores for audit (no load)
     ]
@@ -1874,14 +1903,12 @@ class RAGContextMiddleware(ContextMiddleware):
         await next(context)
 ```
 
-### Example 5: Smart Storage with options.store and service_session_id
+### Example 5: Explicit Storage Configuration for Service-Managed Sessions
 
 ```python
-# Default StorageContextMiddleware already has smart behavior!
-# load_messages=None (default) means:
-#   - Don't load if options['store'] is False
-#   - Don't load if service_session_id is present (service manages storage)
-#   - Otherwise, load messages
+# StorageContextMiddleware uses explicit configuration - no automatic detection.
+# load_messages=True (default): Load messages from storage
+# load_messages=False: Skip loading (useful for audit-only storage)
 
 agent = ChatAgent(
     chat_client=client,
@@ -1889,7 +1916,7 @@ agent = ChatAgent(
         RedisStorageMiddleware(
             source_id="memory",
             redis_url="redis://...",
-            # load_messages=None is the default - smart mode
+            # load_messages=True is the default
         )
     ]
 )
@@ -1899,17 +1926,21 @@ session = agent.create_session()
 # Normal run - loads and stores messages
 response = await agent.run("Hello!", session=session)
 
-# Run without loading history (but still stores for audit)
-response = await agent.run(
-    "What's 2+2?",
-    session=session,
-    options={"store": False}  # Don't load history for this call
+# For service-managed sessions, configure storage explicitly:
+# - Use load_messages=False when service handles history
+service_storage = RedisStorageMiddleware(
+    source_id="audit",
+    redis_url="redis://...",
+    load_messages=False,  # Don't load - service manages history
 )
 
-# With service-managed session - won't load (service handles it)
-service_session = agent.get_new_session(service_session_id="thread_abc123")
-response = await agent.run("Hello!", session=service_session)
-# Storage middleware sees service_session_id, skips loading
+agent_with_service = ChatAgent(
+    chat_client=client,
+    context_middleware=[service_storage]
+)
+service_session = agent_with_service.create_session(service_session_id="thread_abc123")
+response = await agent_with_service.run("Hello!", session=service_session)
+# Storage middleware stores for audit but doesn't load (service handles history)
 ```
 
 ### Example 6: Multiple Instances of Same Middleware Type
@@ -2145,7 +2176,7 @@ class StorageWithLogging(StorageContextMiddleware):
 - [ ] Create `ContextMiddlewarePipeline` with `from_config()` factory method
 - [ ] Create `ContextMiddlewareFactory` type alias and resolution logic
 - [ ] Create `StorageContextMiddleware` base class with load_messages/store flags
-- [ ] Implement pipeline validation (warn on multiple loaders with `load_messages=True`)
+- [ ] Implement pipeline validation (warn if multiple or zero storage middleware have `load_messages=True`)
 - [ ] Add `serialize()` and `restore()` methods to `ContextMiddleware` base class
 
 #### Phase 2: AgentSession Implementation
@@ -2179,7 +2210,7 @@ class StorageWithLogging(StorageContextMiddleware):
 - [ ] Unit tests for `ContextMiddleware` and pipeline execution order
 - [ ] Unit tests for middleware factory resolution
 - [ ] Unit tests for `StorageContextMiddleware` load/store behavior
-- [ ] Unit tests for `options.store` and `service_session_id` triggers
+- [ ] Unit tests for pipeline validation warnings (multiple/zero loaders)
 - [ ] Unit tests for source attribution (mandatory source_id)
 - [ ] Unit tests for `store_context_messages` and `store_context_from` options
 - [ ] Unit tests for session serialization/deserialization
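
The pipeline validation item in the checklist above (warn when multiple or zero storage middleware have `load_messages=True`) can be sketched as follows. This is a minimal illustration, not the framework's implementation: `StorageStub` and `check_loaders` are hypothetical names, and the stub models only the two fields the check needs.

```python
import warnings
from dataclasses import dataclass


@dataclass
class StorageStub:
    """Hypothetical stand-in for a storage middleware (assumption, not the real class)."""

    source_id: str
    load_messages: bool = True


def check_loaders(storage: list[StorageStub]) -> list[str]:
    """Return the source_ids that load history; warn on zero or multiple loaders."""
    loaders = [s.source_id for s in storage if s.load_messages]
    if not loaders:
        warnings.warn(
            "No storage middleware has load_messages=True; history will not be loaded.",
            UserWarning,
            stacklevel=2,
        )
    elif len(loaders) > 1:
        warnings.warn(
            f"Multiple storage middleware have load_messages=True: {loaders}",
            UserWarning,
            stacklevel=2,
        )
    return loaders


if __name__ == "__main__":
    # One loader plus one audit-only store: no warning expected.
    check_loaders([StorageStub("memory"), StorageStub("audit", load_messages=False)])
```

A warning rather than an exception matches the checklist wording, since an audit-only setup with a service-managed session legitimately has no loader.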

From 65005739a18921af3902f6c5c8657b332d0f3dbf Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Thu, 5 Feb 2026 11:15:30 +0100
Subject: [PATCH 11/19] Add Decision 2: Instance Ownership (instances in
 session vs agent)

- Option A: Instances in Session (current proposal)
- Option B: Instances in Agent, State in Session
  - B1: Simple dict state with optional return
  - B2: SessionState object with mutable wrapper
- Updated examples to use Hooks pattern (before_run/after_run)
- Added open discussion on hook factories in Option B model
---
 .../00XX-python-context-middleware.md         | 378 +++++++++++++++++-
 1 file changed, 377 insertions(+), 1 deletion(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index cdb8aaaab1..7edde44767 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -119,7 +119,7 @@ The following key decisions shape the ContextMiddleware design:
 
 | # | Decision | Rationale |
 |---|----------|-----------|
-| 1 | **Agent vs Session Ownership** | Agent owns middleware config; Session owns resolved pipeline. Enables per-session factories. |
+| 1 | **Agent vs Session Ownership** | Agent owns middleware config; Session owns resolved pipeline. Enables per-session factories. **(TBD - see Decision 2 in Outcome)** |
 | 2 | **Instance or Factory** | Middleware can be shared instances or `(session_id) -> Middleware` factories for per-session state. |
 | 3 | **Default Storage at Runtime** | `InMemoryStorageMiddleware` auto-added when no service_session_id, store≠True, and no pipeline. Evaluated at runtime so users can modify pipeline first. |
 | 4 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero storage middleware/hooks have `load_messages=True` (likely misconfiguration). |
@@ -729,6 +729,8 @@ response = await agent.run("Hello", session=session)
 ```
 ## Decision Outcome
 
+### Decision 1: Execution Pattern
+
 **TBD** - This ADR presents two viable approaches:
 
 - **Option 2: ContextMiddleware (Wrapper Pattern)** - Consistent with existing middleware patterns, more powerful control flow
@@ -743,6 +745,380 @@ Both options share the same:
 
 The key difference is the execution model: nested wrapper vs linear phases.
 
+### Decision 2: Instance Ownership (Orthogonal)
+
+**TBD** - Where should the actual middleware/hooks instances live?
+
+This decision is orthogonal to Decision 1 (execution pattern) and applies equally to both Middleware and Hooks approaches.
+
+#### Option A: Instances in Session (Current Proposal)
+
+The `AgentSession` owns the actual middleware/hooks instances. The pipeline is created when the session is created, and instances are stored in the session.
+
+```python
+class AgentSession:
+    """Session owns the middleware instances."""
+
+    def __init__(
+        self,
+        *,
+        session_id: str | None = None,
+        context_pipeline: ContextMiddlewarePipeline | None = None,  # Owns instances
+    ):
+        self._session_id = session_id or str(uuid.uuid4())
+        self._context_pipeline = context_pipeline  # Actual instances live here
+
+
+class ChatAgent:
+    def __init__(
+        self,
+        chat_client: ...,
+        *,
+        context_middleware: Sequence[ContextMiddlewareConfig] | None = None,
+    ):
+        self._context_middleware_config = list(context_middleware or [])
+
+    def create_session(self, *, session_id: str | None = None) -> AgentSession:
+        """Create session with resolved middleware instances."""
+        resolved_id = session_id or str(uuid.uuid4())
+
+        # Resolve factories and create actual instances
+        pipeline = None
+        if self._context_middleware_config:
+            pipeline = ContextMiddlewarePipeline.from_config(
+                self._context_middleware_config,
+                session_id=resolved_id,
+            )
+
+        return AgentSession(
+            session_id=resolved_id,
+            context_pipeline=pipeline,  # Session owns the instances
+        )
+
+    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
+        # Session's pipeline executes
+        context = await session.run_context_pipeline(input_messages)
+        # ... invoke model ...
+```
+
+**Pros:**
+- Self-contained session - all state and behavior together
+- Middleware can maintain per-session instance state naturally
+- Session given to another agent will work the same way
+
+**Cons:**
+- Session becomes heavier (instances + state)
+- Complicated serialization - the session holds live instances, which may contain non-serializable objects such as clients or connections
+- Harder to share stateless middleware across sessions efficiently
+- Factories must be re-resolved for each session
+
+#### Option B: Instances in Agent, State in Session
+
+The `ChatAgent` owns and manages the middleware/hooks instances. The `AgentSession` only stores state data that middleware reads/writes. The agent's runner executes the pipeline using the session's state.
+
+Two variants exist for how state is stored in the session:
+
+##### Option B1: Simple Dict State
+
+The session stores state as a simple `dict[str, Any]` keyed by `source_id`. Each hook receives its own state slice and can optionally return updated state.
+
+```python
+class AgentSession:
+    """Session only holds state as a simple dict."""
+
+    def __init__(self, *, session_id: str | None = None):
+        self._session_id = session_id or str(uuid.uuid4())
+        self.service_session_id: str | None = None
+        self.state: dict[str, Any] = {}  # source_id -> hook state
+
+
+class ContextHooksRunner:
+    """Agent-owned runner that executes hooks with session state."""
+
+    def __init__(self, hooks: Sequence[ContextHooks]):
+        self._hooks = list(hooks)
+
+    async def run_before(
+        self,
+        context: SessionContext,
+        session_state: dict[str, Any],
+    ) -> None:
+        """Run before_run for all hooks, passing each only its own state."""
+        for hook in self._hooks:
+            my_state = session_state.get(hook.source_id)
+            new_state = await hook.before_run(context, my_state)
+            if new_state is not None:
+                session_state[hook.source_id] = new_state
+
+    async def run_after(
+        self,
+        context: SessionContext,
+        session_state: dict[str, Any],
+    ) -> None:
+        """Run after_run for all hooks in reverse order."""
+        for hook in reversed(self._hooks):
+            my_state = session_state.get(hook.source_id)
+            new_state = await hook.after_run(context, my_state)
+            if new_state is not None:
+                session_state[hook.source_id] = new_state
+
+
+class ChatAgent:
+    def __init__(
+        self,
+        chat_client: ...,
+        *,
+        context_hooks: Sequence[ContextHooks] | None = None,
+    ):
+        # Agent owns the actual hook instances
+        self._hooks_runner = ContextHooksRunner(list(context_hooks or []))
+
+    def create_session(self, *, session_id: str | None = None) -> AgentSession:
+        """Create lightweight session with just state."""
+        return AgentSession(session_id=session_id)
+
+    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
+        context = SessionContext(
+            session_id=session.session_id,
+            input_messages=[...],
+        )
+
+        # Before hooks
+        await self._hooks_runner.run_before(context, session.state)
+
+        # ... invoke model ...
+
+        # After hooks
+        await self._hooks_runner.run_after(context, session.state)
+
+
+# Hook that maintains state - returns updated state
+class InMemoryStorageHooks(ContextHooks):
+    async def before_run(
+        self,
+        context: SessionContext,
+        state: dict[str, Any] | None,
+    ) -> dict[str, Any] | None:
+        # Read from own state (or empty if first invocation)
+        messages = (state or {}).get("messages", [])
+        context.add_messages(self.source_id, messages)
+        return None  # No state change in before_run
+
+    async def after_run(
+        self,
+        context: SessionContext,
+        state: dict[str, Any] | None,
+    ) -> dict[str, Any]:  # Returns updated state
+        messages = (state or {}).get("messages", [])
+        return {
+            "messages": [
+                *messages,
+                *context.input_messages,
+                *(context.response_messages or []),
+            ],
+        }
+
+
+# Stateless hook - returns None (no state to store)
+class TimeContextHooks(ContextHooks):
+    async def before_run(
+        self,
+        context: SessionContext,
+        state: dict[str, Any] | None,
+    ) -> None:
+        context.add_instructions(self.source_id, f"Current time: {datetime.now()}")
+
+    async def after_run(
+        self,
+        context: SessionContext,
+        state: dict[str, Any] | None,
+    ) -> None:
+        pass  # No state, nothing to do after
+```
+
+##### Option B2: SessionState Object
+
+The session stores state in a dedicated `SessionState` object. Each hook receives its own state slice through a mutable wrapper that writes back automatically.
+
+```python
+class HookState:
+    """Mutable wrapper for a single hook's state.
+
+    Changes are written back to the session state automatically.
+    """
+
+    def __init__(self, session_state: dict[str, dict[str, Any]], source_id: str):
+        self._session_state = session_state
+        self._source_id = source_id
+        if source_id not in session_state:
+            session_state[source_id] = {}
+
+    def get(self, key: str, default: Any = None) -> Any:
+        return self._session_state[self._source_id].get(key, default)
+
+    def set(self, key: str, value: Any) -> None:
+        self._session_state[self._source_id][key] = value
+
+    def update(self, values: dict[str, Any]) -> None:
+        self._session_state[self._source_id].update(values)
+
+
+class SessionState:
+    """Structured state container for a session."""
+
+    def __init__(self, session_id: str):
+        self.session_id = session_id
+        self.service_session_id: str | None = None
+        self._hook_state: dict[str, dict[str, Any]] = {}  # source_id -> state
+
+    def get_hook_state(self, source_id: str) -> HookState:
+        """Get mutable state wrapper for a specific hook."""
+        return HookState(self._hook_state, source_id)
+
+
+class AgentSession:
+    """Session holds a SessionState object."""
+
+    def __init__(self, *, session_id: str | None = None):
+        self._session_id = session_id or str(uuid.uuid4())
+        self._state = SessionState(self._session_id)
+
+    @property
+    def state(self) -> SessionState:
+        return self._state
+
+
+class ContextHooksRunner:
+    """Agent-owned runner that executes hooks with session state."""
+
+    def __init__(self, hooks: Sequence[ContextHooks]):
+        self._hooks = list(hooks)
+
+    async def run_before(
+        self,
+        context: SessionContext,
+        session_state: SessionState,
+    ) -> None:
+        """Run before_run for all hooks."""
+        for hook in self._hooks:
+            my_state = session_state.get_hook_state(hook.source_id)
+            await hook.before_run(context, my_state)
+
+    async def run_after(
+        self,
+        context: SessionContext,
+        session_state: SessionState,
+    ) -> None:
+        """Run after_run for all hooks in reverse order."""
+        for hook in reversed(self._hooks):
+            my_state = session_state.get_hook_state(hook.source_id)
+            await hook.after_run(context, my_state)
+
+
+# Hook uses HookState wrapper - no return needed
+class InMemoryStorageHooks(ContextHooks):
+    async def before_run(
+        self,
+        context: SessionContext,
+        state: HookState,  # Mutable wrapper
+    ) -> None:
+        messages = state.get("messages", [])
+        context.add_messages(self.source_id, messages)
+
+    async def after_run(
+        self,
+        context: SessionContext,
+        state: HookState,  # Mutable wrapper
+    ) -> None:
+        messages = state.get("messages", [])
+        state.set("messages", [
+            *messages,
+            *context.input_messages,
+            *(context.response_messages or []),
+        ])
+
+
+# Stateless hook - state wrapper provided but not used
+class TimeContextHooks(ContextHooks):
+    async def before_run(
+        self,
+        context: SessionContext,
+        state: HookState,
+    ) -> None:
+        context.add_instructions(self.source_id, f"Current time: {datetime.now()}")
+
+    async def after_run(
+        self,
+        context: SessionContext,
+        state: HookState,
+    ) -> None:
+        pass  # Nothing to do
+```
+
+**Option B Pros (both variants):**
+- Lightweight sessions - just data, easy to serialize/transfer
+- Hook instances shared across sessions (more memory efficient)
+- Clearer separation: agent = behavior, session = state
+
+**Option B Cons (both variants):**
+- More complex execution model (agent + session coordination)
+- Hooks must explicitly read/write state (no implicit instance variables)
+- Session given to another agent may not work (different hooks configuration)
+
+**B1 vs B2:**
+
+| Aspect | B1: Simple Dict | B2: SessionState Object |
+|--------|-----------------|-------------------------|
+| Simplicity | Simpler, less abstraction | More structure, helper methods |
+| State return | Optional return (`dict \| None`) | Mutable wrapper, no return needed |
+| Type safety | `dict[str, Any] \| None` - loose | Can add type hints on methods |
+| Extensibility | Add keys as needed | Can add methods/validation |
+| Serialization | Direct JSON serialization | Need custom serialization |
+
+#### Comparison
+
+| Aspect | Option A: Instances in Session | Option B: Instances in Agent |
+|--------|-------------------------------|------------------------------|
+| Session weight | Heavier (instances + state) | Lighter (state only) |
+| Hook sharing | Per-session instances | Shared across sessions |
+| Instance state | Natural (instance variables) | Explicit (state dict) |
+| Serialization | Serialize session + hooks | Serialize state only |
+| Factory handling | Resolved at session creation | See open discussion below |
+| Signature | `before_run(context)` | `before_run(context, state)` |
+| Session portability | Works with any agent | Tied to agent's hooks config |
+
+#### Open Discussion: Hook Factories in Option B
+
+With Option A (instances in session), hook factories naturally fit: the factory is called at session creation to produce a per-session instance that can hold instance-level state.
+
+With Option B (instances in agent, state in session), the hooks are shared across sessions, which raises questions about factories:
+
+**Question:** What is the purpose of hook factories when hooks are shared?
+
+**Possible approaches:**
+
+1. **No factories in Option B** - Since state is externalized, there's no need for per-session instances. All hooks are shared. If a hook needs per-session initialization, it can do so in `before_run` on first call (checking if state is empty).
+
+2. **Factories create agent-level instances** - Factories are called once when the agent is created, not per-session. Useful for dependency injection or configuration, but not per-session state.
+
+3. **Factories still create per-session instances** - Keep factory support, but now factories create instances that are stored... where? This reintroduces complexity:
+   - Store instances in session? (Back to Option A)
+   - Store instances in agent keyed by session_id? (Memory leak risk)
+   - Discard after use? (Defeats purpose of instance state)
+
+4. **Hybrid: allow both shared and per-session hooks** - Agent can have a mix:
+   ```python
+   agent = ChatAgent(
+       context_hooks=[
+           SharedRAGHooks("rag"),  # Shared instance
+           lambda session_id: PerSessionCache("cache", session_id),  # Factory
+       ]
+   )
+   ```
+   Per-session hooks would need to be stored somewhere (session or agent).
+
+**Recommendation:** If choosing Option B, consider dropping factory support entirely. The explicit state parameter handles per-session state needs. Factories in Option A exist primarily to enable per-session instance state, which Option B solves differently via the state parameter.
+
 ---
 
 ## Comparison to .NET Implementation
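
The Option B1 runner added in the hunk above (each hook receives only its own state slice, an optional return value is written back, and `after_run` runs in reverse order) can be exercised with a small self-contained script. This is a sketch under stated assumptions: `CountingHooks` and `SliceRunner` are hypothetical names, and a plain dict stands in for `SessionContext`.

```python
from __future__ import annotations

import asyncio
from typing import Any


class CountingHooks:
    """Hypothetical hook that counts completed runs in its own state slice."""

    def __init__(self, source_id: str, log: list[str]):
        self.source_id = source_id
        self._log = log  # shared call log, used only to show ordering

    async def before_run(self, context: dict[str, Any], state: dict[str, Any] | None) -> None:
        self._log.append(f"before:{self.source_id}")
        return None  # no state change before the model call

    async def after_run(self, context: dict[str, Any], state: dict[str, Any] | None) -> dict[str, Any]:
        self._log.append(f"after:{self.source_id}")
        return {"runs": (state or {}).get("runs", 0) + 1}


class SliceRunner:
    """Agent-owned runner: hook instances are shared, state lives in the session dict."""

    def __init__(self, hooks: list[CountingHooks]):
        self._hooks = list(hooks)

    async def run_before(self, context: dict[str, Any], session_state: dict[str, Any]) -> None:
        for hook in self._hooks:
            new_state = await hook.before_run(context, session_state.get(hook.source_id))
            if new_state is not None:
                session_state[hook.source_id] = new_state

    async def run_after(self, context: dict[str, Any], session_state: dict[str, Any]) -> None:
        for hook in reversed(self._hooks):
            new_state = await hook.after_run(context, session_state.get(hook.source_id))
            if new_state is not None:
                session_state[hook.source_id] = new_state


async def demo() -> tuple[list[str], dict[str, Any], dict[str, Any]]:
    log: list[str] = []
    runner = SliceRunner([CountingHooks("a", log), CountingHooks("b", log)])
    session_one: dict[str, Any] = {}
    session_two: dict[str, Any] = {}
    # Two runs on the first session, one on the second, all through the same hook instances.
    for state in (session_one, session_one, session_two):
        await runner.run_before({}, state)
        await runner.run_after({}, state)
    return log, session_one, session_two


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The two session dicts stay independent even though both runs go through the same hook instances, which is the property Option B relies on.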

From 50589b0f4e6700e3f9bdc7a93f84b094e860a6ff Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Thu, 5 Feb 2026 15:38:13 +0100
Subject: [PATCH 12/19] Update ADR: Choose ContextPlugin with
 before_run/after_run and Option B1

Decision outcomes:
- Option 3 (Hooks pattern) with ContextPlugin class name
- Methods: before_run/after_run
- Option B1: Instances in Agent, State in Session (simple dict)
- Whole state dict passed to plugins (mutable, no return needed)
- Added trust note: plugins reason over messages, so they're trusted by default

Status changed from proposed to accepted.
---
 .../00XX-python-context-middleware.md         | 273 +++++++++---------
 1 file changed, 132 insertions(+), 141 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 7edde44767..673a3f364b 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -1,13 +1,13 @@
 ---
 # These are optional elements. Feel free to remove any of them.
-status: proposed
+status: accepted
 contact: eavanvalkenburg
-date: 2026-02-02
-deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon
+date: 2026-02-05
+deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon, westey-m
 consulted: taochenosu, moonbox3, dmytrostruk, giles17
 ---
 
-# Unifying Context Management with ContextMiddleware
+# Unifying Context Management with ContextPlugin
 
 ## Context and Problem Statement
 
@@ -115,24 +115,25 @@ class AgentThread:
 
 ## Design Decisions Summary
 
-The following key decisions shape the ContextMiddleware design:
+The following key decisions shape the ContextPlugin design:
 
 | # | Decision | Rationale |
 |---|----------|-----------|
-| 1 | **Agent vs Session Ownership** | Agent owns middleware config; Session owns resolved pipeline. Enables per-session factories. **(TBD - see Decision 2 in Outcome)** |
-| 2 | **Instance or Factory** | Middleware can be shared instances or `(session_id) -> Middleware` factories for per-session state. |
-| 3 | **Default Storage at Runtime** | `InMemoryStorageMiddleware` auto-added when no service_session_id, store≠True, and no pipeline. Evaluated at runtime so users can modify pipeline first. |
-| 4 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero storage middleware/hooks have `load_messages=True` (likely misconfiguration). |
-| 5 | **Single Storage Class** | One `StorageContextMiddleware` configured for memory/audit/evaluation - no separate classes. |
-| 6 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
-| 7 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `StorageContextHooks`, `before_run` is skipped entirely when `load_messages=False`. |
-| 8 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
-| 9 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other middleware. |
-| 10 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
-| 11 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
-| 12 | **Middleware Ordering** | User-defined order; storage sees prior middleware (pre-processing) or all middleware (post-processing). |
-| 13 | **Agent-owned Serialization** | `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. |
-| 14 | **Session Management Methods** | `agent.create_session()` (no required params) and `agent.get_session_by_id(id)` for clear lifecycle management. |
+| 1 | **Agent vs Session Ownership** | Agent owns plugin instances; Session owns state as mutable dict. Plugins shared across sessions, state isolated per session. |
+| 2 | **Execution Pattern** | **ContextPlugin** with `before_run`/`after_run` methods (hooks pattern). Simpler mental model than wrapper/onion pattern. |
+| 3 | **State Management** | Whole state dict (`dict[str, Any]`) passed to each plugin. Dict is mutable, so no return value needed. |
+| 4 | **Default Storage at Runtime** | `InMemoryStoragePlugin` auto-added when no service_session_id, store≠True, and no plugins. Evaluated at runtime so users can modify pipeline first. |
+| 5 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero storage plugins have `load_messages=True` (likely misconfiguration). |
+| 6 | **Single Storage Class** | One `StorageContextPlugin` configured for memory/audit/evaluation - no separate classes. |
+| 7 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
+| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `StorageContextPlugin`, `before_run` is skipped entirely when `load_messages=False`. |
+| 9 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
+| 10 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other plugins. |
+| 11 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
+| 12 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
+| 13 | **Plugin Ordering** | User-defined order; storage sees prior plugins (pre-processing) or all plugins (post-processing). |
+| 14 | **Agent-owned Serialization** | `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. |
+| 15 | **Session Management Methods** | `agent.create_session()` (no required params) and `agent.get_session_by_id(id)` for clear lifecycle management. |
 
 ## Considered Options
 
@@ -731,10 +732,15 @@ response = await agent.run("Hello", session=session)
 
 ### Decision 1: Execution Pattern
 
-**TBD** - This ADR presents two viable approaches:
+**Chosen: Option 3 - Hooks (Pre/Post Pattern)** with the following naming:
+- **Class name:** `ContextPlugin` (emphasizes extensibility, familiar from build tools)
+- **Method names:** `before_run` / `after_run` (matches `agent.run()` terminology)
 
-- **Option 2: ContextMiddleware (Wrapper Pattern)** - Consistent with existing middleware patterns, more powerful control flow
-- **Option 3: ContextHooks (Pre/Post Pattern)** - Simpler mental model, easier migration from current `ContextProvider`
+Rationale:
+- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
+- Easier to implement plugins that only need one phase (just override one method)
+- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
+- Clearer separation between what context plugins do and what Agent Middleware can do
 
 Both options share the same:
 - Agent vs Session ownership model
@@ -743,15 +749,25 @@ Both options share the same:
 - Session management methods (`create_session`, `get_session_by_id`, `serialize_session`, `restore_session`)
 - Renaming `AgentThread` → `AgentSession`
 
-The key difference is the execution model: nested wrapper vs linear phases.
-
 ### Decision 2: Instance Ownership (Orthogonal)
 
-**TBD** - Where should the actual middleware/hooks instances live?
+**Chosen: Option B1 - Instances in Agent, State in Session (Simple Dict)**
+
+The `ChatAgent` owns and manages the `ContextPlugin` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each plugin receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - plugins modify the dict in place.
+
+> **Note on trust:** Since all `ContextPlugin` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all plugins to see all state - if a plugin is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because plugins that handle messages already have access to the full conversation context.
 
-This decision is orthogonal to Decision 1 (execution pattern) and applies equally to both Middleware and Hooks approaches.
+Rationale for B1 over B2: B1 needs no wrapper classes (B2 adds `HookState` and `SessionState`) and, per the B1 vs B2 comparison, serializes directly to JSON. The whole state dict is passed to each plugin, and since Python dicts are mutable, plugins modify state in place without returning anything.
 
-#### Option A: Instances in Session (Current Proposal)
+Rationale for B over A:
+- Lightweight sessions - just data, easy to serialize/transfer
+- Plugin instances shared across sessions (more memory efficient)
+- Clearer separation: agent = behavior, session = state
+- Factories not needed - state dict handles per-session needs
+
+### Instance Ownership Options (for reference)
+
+#### Option A: Instances in Session
 
 The `AgentSession` owns the actual middleware/hooks instances. The pipeline is created when the session is created, and instances are stored in the session.
 
@@ -812,15 +828,15 @@ class ChatAgent:
 - Harder to share stateless middleware across sessions efficiently
 - Factories must be re-resolved for each session
 
-#### Option B: Instances in Agent, State in Session
+#### Option B: Instances in Agent, State in Session (CHOSEN)
 
 The `ChatAgent` owns and manages the middleware/hooks instances. The `AgentSession` only stores state data that middleware reads/writes. The agent's runner executes the pipeline using the session's state.
 
 Two variants exist for how state is stored in the session:
 
-##### Option B1: Simple Dict State
+##### Option B1: Simple Dict State (CHOSEN)
 
-The session stores state as a simple `dict[str, Any]` keyed by `source_id`. Each hook receives its own state slice and can optionally return updated state.
+The session stores state as a simple `dict[str, Any]`. Each plugin receives the **whole state dict**, and since dicts are mutable in Python, plugins can modify it in place without needing to return a value.
 
 ```python
 class AgentSession:
@@ -829,38 +845,32 @@ class AgentSession:
     def __init__(self, *, session_id: str | None = None):
         self._session_id = session_id or str(uuid.uuid4())
         self.service_session_id: str | None = None
-        self.state: dict[str, Any] = {}  # source_id -> hook state
+        self.state: dict[str, Any] = {}  # Mutable state dict
 
 
-class ContextHooksRunner:
-    """Agent-owned runner that executes hooks with session state."""
+class ContextPluginRunner:
+    """Agent-owned runner that executes plugins with session state."""
 
-    def __init__(self, hooks: Sequence[ContextHooks]):
-        self._hooks = list(hooks)
+    def __init__(self, plugins: Sequence[ContextPlugin]):
+        self._plugins = list(plugins)
 
-    async def run_before(
+    async def before_run(
         self,
         context: SessionContext,
-        session_state: dict[str, Any],
+        state: dict[str, Any],
     ) -> None:
-        """Run before_run for all hooks, passing each only its own state."""
-        for hook in self._hooks:
-            my_state = session_state.get(hook.source_id)
-            new_state = await hook.before_run(context, my_state)
-            if new_state is not None:
-                session_state[hook.source_id] = new_state
+        """Run before_run for all plugins, passing the whole state dict."""
+        for plugin in self._plugins:
+            await plugin.before_run(context, state)  # Dict is mutable, no return needed
 
-    async def run_after(
+    async def after_run(
         self,
         context: SessionContext,
-        session_state: dict[str, Any],
+        state: dict[str, Any],
     ) -> None:
-        """Run after_run for all hooks in reverse order."""
-        for hook in reversed(self._hooks):
-            my_state = session_state.get(hook.source_id)
-            new_state = await hook.after_run(context, my_state)
-            if new_state is not None:
-                session_state[hook.source_id] = new_state
+        """Run after_run for all plugins in reverse order."""
+        for plugin in reversed(self._plugins):
+            await plugin.after_run(context, state)  # Dict is mutable, no return needed
 
 
 class ChatAgent:
@@ -868,10 +878,10 @@ class ChatAgent:
         self,
         chat_client: ...,
         *,
-        context_hooks: Sequence[ContextHooks] | None = None,
+        context_plugins: Sequence[ContextPlugin] | None = None,
     ):
-        # Agent owns the actual hook instances
-        self._hooks_runner = ContextHooksRunner(list(context_hooks or []))
+        # Agent owns the actual plugin instances
+        self._plugin_runner = ContextPluginRunner(list(context_plugins or []))
 
     def create_session(self, *, session_id: str | None = None) -> AgentSession:
         """Create lightweight session with just state."""
@@ -883,55 +893,57 @@ class ChatAgent:
             input_messages=[...],
         )
 
-        # Before hooks
-        await self._hooks_runner.run_before(context, session.state)
+        # Before-run plugins
+        await self._plugin_runner.before_run(context, session.state)
 
-        # ... invoke model ...
+        # assemble final input messages from context
 
-        # After hooks
-        await self._hooks_runner.run_after(context, session.state)
+        # ... actual running, i.e. `get_response` for ChatAgent ...
 
+        # After-run plugins
+        await self._plugin_runner.after_run(context, session.state)
 
-# Hook that maintains state - returns updated state
-class InMemoryStorageHooks(ContextHooks):
+
+# Plugin that maintains state - modifies dict in place
+class InMemoryStoragePlugin(ContextPlugin):
     async def before_run(
         self,
         context: SessionContext,
-        state: dict[str, Any] | None,
-    ) -> dict[str, Any] | None:
-        # Read from own state (or empty if first invocation)
-        messages = (state or {}).get("messages", [])
+        state: dict[str, Any],
+    ) -> None:
+        # Read from state (use source_id as key for namespace)
+        my_state = state.get(self.source_id, {})
+        messages = my_state.get("messages", [])
         context.add_messages(self.source_id, messages)
-        return None  # No state change in before_run
 
     async def after_run(
         self,
         context: SessionContext,
-        state: dict[str, Any] | None,
-    ) -> dict[str, Any]:  # Returns updated state
-        messages = (state or {}).get("messages", [])
-        return {
-            "messages": [
-                *messages,
-                *context.input_messages,
-                *(context.response_messages or []),
-            ],
-        }
+        state: dict[str, Any],
+    ) -> None:
+        # Modify state dict in place - no return needed
+        my_state = state.setdefault(self.source_id, {})
+        messages = my_state.get("messages", [])
+        my_state["messages"] = [
+            *messages,
+            *context.input_messages,
+            *(context.response_messages or []),
+        ]
 
 
-# Stateless hook - returns None (no state to store)
-class TimeContextHooks(ContextHooks):
+# Stateless plugin - ignores state
+class TimeContextPlugin(ContextPlugin):
     async def before_run(
         self,
         context: SessionContext,
-        state: dict[str, Any] | None,
+        state: dict[str, Any],
     ) -> None:
         context.add_instructions(self.source_id, f"Current time: {datetime.now()}")
 
     async def after_run(
         self,
         context: SessionContext,
-        state: dict[str, Any] | None,
+        state: dict[str, Any],
     ) -> None:
         pass  # No state, nothing to do after
 ```
@@ -1057,67 +1069,44 @@ class TimeContextHooks(ContextHooks):
 
 **Option B Pros (both variants):**
 - Lightweight sessions - just data, easy to serialize/transfer
-- Hook instances shared across sessions (more memory efficient)
+- Plugin instances shared across sessions (more memory efficient)
 - Clearer separation: agent = behavior, session = state
 
 **Option B Cons (both variants):**
 - More complex execution model (agent + session coordination)
-- Hooks must explicitly read/write state (no implicit instance variables)
-- Session given to another agent may not work (different hooks configuration)
+- Plugins must explicitly read/write state (no implicit instance variables)
+- A session handed to a different agent may not work (that agent may have a different plugin configuration)
 
 **B1 vs B2:**
 
-| Aspect | B1: Simple Dict | B2: SessionState Object |
+| Aspect | B1: Simple Dict (CHOSEN) | B2: SessionState Object |
 |--------|-----------------|-------------------------|
 | Simplicity | Simpler, less abstraction | More structure, helper methods |
-| State return | Optional return (`dict | None`) | Mutable wrapper, no return needed |
-| Type safety | `dict[str, Any] | None` - loose | Can add type hints on methods |
+| State passing | Whole dict passed, mutate in place | Mutable wrapper, no return needed |
+| Type safety | `dict[str, Any]` - loose | Can add type hints on methods |
 | Extensibility | Add keys as needed | Can add methods/validation |
 | Serialization | Direct JSON serialization | Need custom serialization |
 
 #### Comparison
 
-| Aspect | Option A: Instances in Session | Option B: Instances in Agent |
+| Aspect | Option A: Instances in Session | Option B: Instances in Agent (CHOSEN) |
 |--------|-------------------------------|------------------------------|
 | Session weight | Heavier (instances + state) | Lighter (state only) |
-| Hook sharing | Per-session instances | Shared across sessions |
+| Plugin sharing | Per-session instances | Shared across sessions |
 | Instance state | Natural (instance variables) | Explicit (state dict) |
-| Serialization | Serialize session + hooks | Serialize state only |
-| Factory handling | Resolved at session creation | See open discussion below |
+| Serialization | Serialize session + plugins | Serialize state only |
+| Factory handling | Resolved at session creation | Not needed (state dict handles per-session needs) |
 | Signature | `before_run(context)` | `before_run(context, state)` |
-| Session portability | Works with any agent | Tied to agent's hooks config |
-
-#### Open Discussion: Hook Factories in Option B
-
-With Option A (instances in session), hook factories naturally fit: the factory is called at session creation to produce a per-session instance that can hold instance-level state.
-
-With Option B (instances in agent, state in session), the hooks are shared across sessions, which raises questions about factories:
-
-**Question:** What is the purpose of hook factories when hooks are shared?
-
-**Possible approaches:**
-
-1. **No factories in Option B** - Since state is externalized, there's no need for per-session instances. All hooks are shared. If a hook needs per-session initialization, it can do so in `before_run` on first call (checking if state is empty).
-
-2. **Factories create agent-level instances** - Factories are called once when the agent is created, not per-session. Useful for dependency injection or configuration, but not per-session state.
+| Session portability | Works with any agent | Tied to agent's plugins config |
 
-3. **Factories still create per-session instances** - Keep factory support, but now factories create instances that are stored... where? This reintroduces complexity:
-   - Store instances in session? (Back to Option A)
-   - Store instances in agent keyed by session_id? (Memory leak risk)
-   - Discard after use? (Defeats purpose of instance state)
+#### Factories Not Needed with Option B
 
-4. **Hybrid: allow both shared and per-session hooks** - Agent can have a mix:
-   ```python
-   agent = ChatAgent(
-       context_hooks=[
-           SharedRAGHooks("rag"),  # Shared instance
-           lambda session_id: PerSessionCache("cache", session_id),  # Factory
-       ]
-   )
-   ```
-   Per-session hooks would need to be stored somewhere (session or agent).
+With Option B (instances in agent, state in session), the plugins are shared across sessions and the explicit state dict handles per-session needs. Therefore, **factory support is not needed**:
 
-**Recommendation:** If choosing Option B, consider dropping factory support entirely. The explicit state parameter handles per-session state needs. Factories in Option A exist primarily to enable per-session instance state, which Option B solves differently via the state parameter.
+- State is externalized to the session's `state: dict[str, Any]`
+- If a plugin needs per-session initialization, it can do so in `before_run` on the first call (by checking whether its own entry in the state dict is still missing or empty)
+- All plugins are shared across sessions (more memory efficient)
+- Plugins use `state.setdefault(self.source_id, {})` to namespace their state
 
 ---
 
@@ -1127,11 +1116,11 @@ The .NET Agent Framework provides equivalent functionality through a different s
 
 ### Concept Mapping
 
-| .NET Concept | Python Middleware (Option 2) | Python Hooks (Option 3) |
-|--------------|------------------------------|-------------------------|
-| `AIContextProvider` | `ContextMiddleware` | `ContextHooks` |
-| `ChatHistoryProvider` | `StorageContextMiddleware` | `StorageContextHooks` |
-| `AgentSession` | `AgentSession` | `AgentSession` |
+| .NET Concept | Python (Chosen) |
+|--------------|-----------------|
+| `AIContextProvider` | `ContextPlugin` |
+| `ChatHistoryProvider` | `StorageContextPlugin` |
+| `AgentSession` | `AgentSession` |
 
 ### Feature Equivalence
 
@@ -1139,12 +1128,12 @@ Both platforms provide the same core capabilities:
 
 | Capability | .NET | Python |
 |------------|------|--------|
-| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `process()` before `next()` / `before_run()` |
-| React after invocation | `AIContextProvider.InvokedAsync()` | `process()` after `next()` / `after_run()` |
-| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `StorageContextMiddleware/Hooks` with `load_messages=True` |
-| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `StorageContextMiddleware/Hooks` with `store_*` flags |
-| Session serialization | `Serialize()` on providers | `serialize()`/`restore()` on middleware/hooks |
-| Factory-based creation | `AIContextProviderFactory`, `ChatHistoryProviderFactory` | Factory functions in middleware/hooks list |
+| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `ContextPlugin.before_run()` |
+| React after invocation | `AIContextProvider.InvokedAsync()` | `ContextPlugin.after_run()` |
+| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `StorageContextPlugin` with `load_messages=True` |
+| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `StorageContextPlugin` with `store_*` flags |
+| Session serialization | `Serialize()` on providers | Session's `state` dict is directly serializable |
+| Factory-based creation | `AIContextProviderFactory`, `ChatHistoryProviderFactory` | Not needed - state dict handles per-session needs |
 
 ### Implementation Differences
 
@@ -1152,12 +1141,13 @@ The implementations differ in ways idiomatic to each language:
 
 | Aspect | .NET Approach | Python Approach |
 |--------|---------------|-----------------|
-| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of middleware/hooks (multiple) |
-| **Composition** | One of each provider type per session | Unlimited middleware/hooks in pipeline |
+| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of `ContextPlugin` instances (multiple) |
+| **Composition** | One of each provider type per session | Unlimited plugins in pipeline |
 | **Type system** | Strict interfaces, compile-time checks | Duck typing, protocols, runtime flexibility |
-| **Configuration** | DI container, factory delegates | Direct instantiation, list of instances/factories |
-| **Default storage** | Can auto-inject when `ChatHistoryProvider` missing | Only auto-injects when no pipeline configured |
-| **Source tracking** | Via separate provider types | Built-in `source_id` on each middleware/hook |
+| **Configuration** | DI container, factory delegates | Direct instantiation, list of instances |
+| **State management** | Instance state in providers | Explicit state dict in session |
+| **Default storage** | Can auto-inject when `ChatHistoryProvider` missing | Only auto-injects when no plugins are configured |
+| **Source tracking** | Via separate provider types | Built-in `source_id` on each plugin |
 
 ### Design Trade-offs
 
@@ -1173,6 +1163,7 @@ Each approach has trade-offs that align with language conventions:
 - Multiple instances of same type (e.g., multiple storage backends)
 - More explicit - customization means owning full configuration
 - `source_id` enables filtering/debugging across all sources
+- Explicit state dict makes serialization trivial
 
 Neither approach is inherently better - they reflect different language philosophies while achieving equivalent functionality. The Python design embraces the "we're all consenting adults" philosophy, while .NET provides more compile-time guardrails.
 
@@ -1191,7 +1182,7 @@ Currently, this is challenging because:
 
 ### Design Question
 
-Should `ContextMiddleware`/`ContextHooks` be invoked:
+Should `ContextPlugin` be invoked:
 1. **Only at agent invocation boundaries** (current proposal) - before/after each `agent.run()` call
 2. **During the tool loop** - before/after each model call within a single `agent.run()`
 
@@ -1203,7 +1194,7 @@ While boundary and in-run compaction could potentially use the same mechanism, t
 - **Before run**: Keep context manageable - load a compacted view of history
 - **After run**: Keep storage compact - summarize/truncate before persisting
 - Useful for maintaining reasonable context sizes across conversation turns
-- One reason to have **multiple storage middleware**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
+- One reason to have **multiple storage plugins**: persist compacted history for use during runs, while also storing the full uncompacted history for auditing and evaluations
 
 **In-run compaction** (during function calling loops):
 - Relevant for **function calling scenarios** where many tool calls accumulate
@@ -1234,17 +1225,17 @@ For local storage, a full message list is sent to the model each time, making co
 - Single mechanism for all context manipulation
 - More powerful but more complex
 - Requires coordination with `ChatClient` internals
-- Risk: Performance overhead if middleware/hooks are expensive
+- Risk: Performance overhead if plugins are expensive
 
 **Option C: Unified approach across layers**
 - Define a single context compaction abstraction that works at both agent and client levels
-- `ContextMiddleware`/`ContextHooks` could delegate to `ChatMiddleware` for mid-loop execution
+- `ContextPlugin` could delegate to `ChatMiddleware` for mid-loop execution
 - Requires deeper architectural thought
 
 ### Potential Extension Points (for any option)
 
 Regardless of the chosen approach, these extension points could support compaction:
-- A `CompactionStrategy` that can be shared between middleware/hooks and function calling configuration
+- A `CompactionStrategy` that can be shared between plugins and function calling configuration
 - Hooks for `ChatClient` to notify the agent layer when context limits are approaching
 - A unified `ContextManager` that coordinates compaction across layers
 - **Message-level attribution**: The `attribution` marker in `ChatMessage.additional_properties` can be used during compaction to identify messages that should be preserved (e.g., `attribution: "important"`) or that are safe to remove (e.g., `attribution: "ephemeral"`). This prevents accidental filtering of critical context during aggressive compaction.

From a75cd7e39c5ac17848c861b59ebea0453c94244c Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Thu, 5 Feb 2026 15:45:09 +0100
Subject: [PATCH 13/19] Add agent and session params to before_run/after_run
 methods

Signature now: before_run(agent, session, context, state)
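
A minimal sketch of a plugin written against the new four-argument
signature. `AgentSession` and `SessionContext` here are simplified
stand-ins, and `CounterPlugin` is hypothetical; the sketch only
illustrates the parameter order and the in-place, source_id-namespaced
state update, not the real classes from the ADR.

```python
from __future__ import annotations

import asyncio
from typing import Any


class AgentSession:
    """Stand-in (assumption): a session that only holds a mutable state dict."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}


class SessionContext:
    """Stand-in (assumption): collects instructions tagged with a source_id."""

    def __init__(self) -> None:
        self.instructions: list[tuple[str, str]] = []

    def add_instructions(self, source_id: str, text: str) -> None:
        self.instructions.append((source_id, text))


class CounterPlugin:
    """Hypothetical plugin that counts runs per session."""

    def __init__(self, source_id: str) -> None:
        self.source_id = source_id

    async def before_run(
        self,
        agent: Any,
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        # Namespace this plugin's data under its source_id and mutate in place.
        my_state = state.setdefault(self.source_id, {})
        my_state["runs"] = my_state.get("runs", 0) + 1
        context.add_instructions(self.source_id, f"Run number {my_state['runs']}")

    async def after_run(
        self,
        agent: Any,
        session: AgentSession,
        context: SessionContext,
        state: dict[str, Any],
    ) -> None:
        pass  # Nothing to persist after the run in this sketch.


async def demo() -> dict[str, Any]:
    plugin = CounterPlugin("counter")
    session = AgentSession()
    for _ in range(2):
        context = SessionContext()  # Fresh context per run, same session state.
        await plugin.before_run(None, session, context, session.state)
        await plugin.after_run(None, session, context, session.state)
    return session.state


final_state = asyncio.run(demo())
```

The plugin instance keeps no per-session data of its own; everything
that must survive between runs lives in `session.state`.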
---
 .../00XX-python-context-middleware.md         | 24 +++++++++++++------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 673a3f364b..a564e7aa8b 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -856,21 +856,23 @@ class ContextPluginRunner:
 
     async def before_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
-        state: dict[str, Any],
     ) -> None:
         """Run before_run for all plugins, passing the whole state dict."""
         for plugin in self._plugins:
-            await plugin.before_run(context, state)  # Dict is mutable, no return needed
+            await plugin.before_run(agent, session, context, session.state)  # Dict is mutable, no return needed
 
     async def after_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
-        state: dict[str, Any],
     ) -> None:
         """Run after_run for all plugins in reverse order."""
         for plugin in reversed(self._plugins):
-            await plugin.after_run(context, state)  # Dict is mutable, no return needed
+            await plugin.after_run(agent, session, context, session.state)  # Dict is mutable, no return needed
 
 
 class ChatAgent:
@@ -894,20 +896,22 @@ class ChatAgent:
         )
 
         # Before-run plugins
-        await self._plugin_runner.before_run(context, session.state)
+        await self._plugin_runner.before_run(self, session, context)
 
         # assemble final input messages from context
 
         # ... actual running, i.e. `get_response` for ChatAgent ...
 
         # After-run plugins
-        await self._plugin_runner.after_run(context, session.state)
+        await self._plugin_runner.after_run(self, session, context)
 
 
 # Plugin that maintains state - modifies dict in place
 class InMemoryStoragePlugin(ContextPlugin):
     async def before_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
@@ -918,6 +922,8 @@ class InMemoryStoragePlugin(ContextPlugin):
 
     async def after_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
@@ -935,6 +941,8 @@ class InMemoryStoragePlugin(ContextPlugin):
 class TimeContextPlugin(ContextPlugin):
     async def before_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
@@ -942,6 +950,8 @@ class TimeContextPlugin(ContextPlugin):
 
     async def after_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
@@ -1096,7 +1106,7 @@ class TimeContextHooks(ContextHooks):
 | Instance state | Natural (instance variables) | Explicit (state dict) |
 | Serialization | Serialize session + plugins | Serialize state only |
 | Factory handling | Resolved at session creation | Not needed (state dict handles per-session needs) |
-| Signature | `before_run(context)` | `before_run(context, state)` |
+| Signature | `before_run(context)` | `before_run(agent, session, context, state)` |
 | Session portability | Works with any agent | Tied to agent's plugins config |
 
 #### Factories Not Needed with Option B

From 81d6a1f63e54534f82070882f6c64c2ea48cb53b Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Thu, 5 Feb 2026 15:47:34 +0100
Subject: [PATCH 14/19] Remove ContextPluginRunner, store plugins directly on
 agent

Simpler design: agent stores Sequence[ContextPlugin] and calls
before_run/after_run directly in the run method.
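
A rough sketch of the resulting control flow, assuming hypothetical
`MiniAgent`, `MiniSession`, and `TracePlugin` stand-ins rather than the
real `ChatAgent` and plugin classes. It shows only the ordering:
before_run in list order, after_run in reverse.

```python
from __future__ import annotations

import asyncio
from typing import Any, Sequence


class TracePlugin:
    """Hypothetical plugin that records when its hooks are called."""

    def __init__(self, source_id: str, trace: list[str]) -> None:
        self.source_id = source_id
        self._trace = trace

    async def before_run(self, agent: Any, session: Any, context: Any, state: dict[str, Any]) -> None:
        self._trace.append(f"before:{self.source_id}")

    async def after_run(self, agent: Any, session: Any, context: Any, state: dict[str, Any]) -> None:
        self._trace.append(f"after:{self.source_id}")


class MiniSession:
    """Stand-in (assumption) for a session that only carries state."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}


class MiniAgent:
    """Stand-in (assumption) for an agent that stores plugins directly."""

    def __init__(self, context_plugins: Sequence[TracePlugin] | None = None) -> None:
        self._context_plugins = list(context_plugins or [])

    async def run(self, session: MiniSession, trace: list[str]) -> None:
        context: dict[str, Any] = {}  # Placeholder for the per-run context.
        for plugin in self._context_plugins:
            await plugin.before_run(self, session, context, session.state)
        trace.append("model")  # Placeholder for the actual model call.
        for plugin in reversed(self._context_plugins):
            await plugin.after_run(self, session, context, session.state)


trace: list[str] = []
agent = MiniAgent([TracePlugin("a", trace), TracePlugin("b", trace)])
asyncio.run(agent.run(MiniSession(), trace))
```

With two plugins `a` and `b`, the recorded order is before a, before b,
model call, after b, after a.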
---
 .../00XX-python-context-middleware.md         | 37 +++----------------
 1 file changed, 6 insertions(+), 31 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index a564e7aa8b..4b4af9b918 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -848,33 +848,6 @@ class AgentSession:
         self.state: dict[str, Any] = {}  # Mutable state dict
 
 
-class ContextPluginRunner:
-    """Agent-owned runner that executes plugins with session state."""
-
-    def __init__(self, plugins: Sequence[ContextPlugin]):
-        self._plugins = list(plugins)
-
-    async def before_run(
-        self,
-        agent: "ChatAgent",
-        session: AgentSession,
-        context: SessionContext,
-    ) -> None:
-        """Run before_run for all plugins, passing the whole state dict."""
-        for plugin in self._plugins:
-            await plugin.before_run(agent, session, context, session.state)  # Dict is mutable, no return needed
-
-    async def after_run(
-        self,
-        agent: "ChatAgent",
-        session: AgentSession,
-        context: SessionContext,
-    ) -> None:
-        """Run after_run for all plugins in reverse order."""
-        for plugin in reversed(self._plugins):
-            await plugin.after_run(agent, session, context, session.state)  # Dict is mutable, no return needed
-
-
 class ChatAgent:
     def __init__(
         self,
@@ -883,7 +856,7 @@ class ChatAgent:
         context_plugins: Sequence[ContextPlugin] | None = None,
     ):
         # Agent owns the actual plugin instances
-        self._plugin_runner = ContextPluginRunner(list(context_plugins or []))
+        self._context_plugins = list(context_plugins or [])
 
     def create_session(self, *, session_id: str | None = None) -> AgentSession:
         """Create lightweight session with just state."""
@@ -896,14 +869,16 @@ class ChatAgent:
         )
 
         # Before-run plugins
-        await self._plugin_runner.before_run(self, session, context)
+        for plugin in self._context_plugins:
+            await plugin.before_run(self, session, context, session.state)
 
         # assemble final input messages from context
 
         # ... actual running, i.e. `get_response` for ChatAgent ...
 
-        # After-run plugins
-        await self._plugin_runner.after_run(self, session, context)
+        # After-run plugins (reverse order)
+        for plugin in reversed(self._context_plugins):
+            await plugin.after_run(self, session, context, session.state)
 
 
 # Plugin that maintains state - modifies dict in place

From 5c26843d4a61f7500e6213f4eaf0484250555061 Mon Sep 17 00:00:00 2001
From: Eduard van Valkenburg <github@vanvalkenburg.eu>
Date: Fri, 6 Feb 2026 17:24:12 +0100
Subject: [PATCH 15/19] Update workplan to 2 PRs for simpler review

---
 .../00XX-python-context-middleware.md         | 121 ++++++++++++------
 1 file changed, 80 insertions(+), 41 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index 4b4af9b918..c0d459ea16 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -2522,48 +2522,87 @@ class StorageWithLogging(StorageContextMiddleware):
 
 ### Workplan
 
-#### Phase 1: Core Implementation
-- [ ] Create `ContextMiddleware` base class in `_context_middleware.py` (onion/wrapper pattern)
-- [ ] Create `SessionContext` class with explicit add/get methods
-- [ ] Create `ContextMiddlewarePipeline` with `from_config()` factory method
-- [ ] Create `ContextMiddlewareFactory` type alias and resolution logic
-- [ ] Create `StorageContextMiddleware` base class with load_messages/store flags
-- [ ] Implement pipeline validation (warn if multiple or zero storage middleware have `load_messages=True`)
-- [ ] Add `serialize()` and `restore()` methods to `ContextMiddleware` base class
-
-#### Phase 2: AgentSession Implementation
-- [ ] Create `AgentSession` class with `context_pipeline` attribute
-- [ ] Add `context_middleware: Sequence[ContextMiddlewareConfig]` parameter to `BaseAgent` and `ChatAgent`
-- [ ] Implement `create_session()` that resolves factories and creates pipeline
-- [ ] Wire up context pipeline execution in agent invocation flow
-- [ ] Implement `AgentSession.serialize()` to capture middleware states
-- [ ] Implement `Agent.restore_session()` to reconstruct session from serialized state
-- [ ] Remove `AgentThread` completely (no alias, clean break)
-
-#### Phase 3: Built-in Middleware
-- [ ] Create `InMemoryStorageMiddleware` (replaces `ChatMessageStore`)
-- [ ] Implement `serialize()`/`restore()` for `InMemoryStorageMiddleware`
-- [ ] Create `@context_middleware` decorator for function-based middleware
-
-#### Phase 4: Migrate Existing Implementations
-- [ ] Migrate `AzureAISearchContextProvider` → `AzureAISearchContextMiddleware`
-- [ ] Migrate `RedisProvider` → `RedisStorageMiddleware`
-- [ ] Migrate `Mem0Provider` → `Mem0ContextMiddleware`
-- [ ] Create optional `ContextProviderAdapter` for gradual migration (if needed)
-
-#### Phase 5: Cleanup & Documentation
-- [ ] Remove `ContextProvider` class
-- [ ] Remove `ChatMessageStore` / `ChatMessageStoreProtocol`
-- [ ] Update all samples to use new middleware pattern
+The implementation is split into 2 PRs to limit scope and simplify review.
+
+```
+PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
+```
+
+#### PR 1: New Types
+
+**Goal:** Create all new types. No changes to existing code yet.
+
+**Core Package - `packages/core/agent_framework/_sessions.py`:**
+- [ ] `SessionContext` class with explicit add/get methods
+- [ ] `ContextPlugin` base class with `before_run()`/`after_run()`
+- [ ] `StorageContextPlugin` derived class with load_messages/store flags
+- [ ] Add `serialize()` and `restore()` methods to `ContextPlugin` base class
+- [ ] `AgentSession` class with `state: dict[str, Any]`
+- [ ] `InMemoryStoragePlugin(StorageContextPlugin)`
+
+**External Packages:**
+- [ ] `packages/azure-ai-search/` - create `AzureAISearchContextPlugin`
+- [ ] `packages/redis/` - create `RedisStoragePlugin`
+- [ ] `packages/mem0/` - create `Mem0ContextPlugin`
+
+**Testing:**
+- [ ] Unit tests for `SessionContext` methods (add_messages, get_messages, add_instructions, add_tools)
+- [ ] Unit tests for `StorageContextPlugin` load/store flags
+- [ ] Unit tests for `InMemoryStoragePlugin` serialize/restore
+- [ ] Unit tests for source attribution (mandatory source_id)
+
+---
+
+#### PR 2: Agent Integration + Cleanup
+
+**Goal:** Wire up new types into `ChatAgent` and remove old types.
+
+**Changes to `ChatAgent`:**
+- [ ] Replace `thread` parameter with `session` in `agent.run()`
+- [ ] Add `context_plugins` parameter to `ChatAgent.__init__()`
+- [ ] Add `create_session()` method
+- [ ] Add `serialize_session()` / `restore_session()` methods
+- [ ] Wire up plugin iteration (before_run forward, after_run reverse)
+- [ ] Add validation warning if multiple/zero storage plugins have `load_messages=True`
+- [ ] Wire up default `InMemoryStoragePlugin` behavior (auto-add when no plugins are configured and no service_session_id is set)
+
+**Remove Legacy Types:**
+- [ ] `packages/core/agent_framework/_memory.py` - remove `ContextProvider` class
+- [ ] `packages/core/agent_framework/_threads.py` - remove `ChatMessageStore`, `ChatMessageStoreProtocol`, `AgentThread`
+- [ ] `packages/core/agent_framework/__init__.py` - remove old exports, add new exports from `_sessions.py`
+- [ ] Remove old provider classes from `azure-ai-search`, `redis`, `mem0`
+
+**Documentation & Samples:**
+- [ ] Update all samples in `samples/` to use new API
 - [ ] Write migration guide
 - [ ] Update API documentation
 
-#### Phase 6: Testing
-- [ ] Unit tests for `ContextMiddleware` and pipeline execution order
-- [ ] Unit tests for middleware factory resolution
-- [ ] Unit tests for `StorageContextMiddleware` load/store behavior
-- [ ] Unit tests for pipeline validation warnings (multiple/zero loaders)
-- [ ] Unit tests for source attribution (mandatory source_id)
-- [ ] Unit tests for `store_context_messages` and `store_context_from` options
+**Testing:**
+- [ ] Unit tests for plugin execution order (before_run forward, after_run reverse)
+- [ ] Unit tests for validation warnings (multiple/zero loaders)
 - [ ] Unit tests for session serialization/deserialization
-- [ ] Integration tests for full agent flow with middleware
+- [ ] Integration test: agent with `context_plugins` + `session` works
+- [ ] Integration test: full conversation with memory persistence
+- [ ] Ensure all existing tests still pass (with updated API)
+- [ ] Verify no references to removed types remain
+
+---
+
+#### CHANGELOG (single entry for release)
+
+- **[BREAKING]** Replaced `ContextProvider` with `ContextPlugin` (hooks pattern with `before_run`/`after_run`)
+- **[BREAKING]** Replaced `ChatMessageStore` with `StorageContextPlugin`
+- **[BREAKING]** Replaced `AgentThread` with `AgentSession`
+- **[BREAKING]** Replaced `thread` parameter with `session` in `agent.run()`
+- Added `SessionContext` for invocation state with source attribution
+- Added `InMemoryStoragePlugin` for conversation history
+- Added session serialization (`serialize_session`, `restore_session`)
+
+---
+
+#### Estimated Sizes
+
+| PR | New Lines | Modified Lines | Risk |
+|----|-----------|----------------|------|
+| PR1 | ~500 | ~0 | Low |
+| PR2 | ~150 | ~400 | Medium |

From 1372be3ce62feaab147d72df63dccd9ac60d4936 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Mon, 9 Feb 2026 11:26:34 +0100
Subject: [PATCH 16/19] updated doc

---
 .../00XX-python-context-middleware.md         | 2117 ++++++++---------
 1 file changed, 982 insertions(+), 1135 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index c0d459ea16..e1c24b4f7b 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -115,22 +115,22 @@ class AgentThread:
 
 ## Design Decisions Summary
 
-The following key decisions shape the ContextPlugin design:
+The following key decisions shape the ContextProvider design:
 
 | # | Decision | Rationale |
 |---|----------|-----------|
 | 1 | **Agent vs Session Ownership** | Agent owns plugin instances; Session owns state as mutable dict. Plugins shared across sessions, state isolated per session. |
-| 2 | **Execution Pattern** | **ContextPlugin** with `before_run`/`after_run` methods (hooks pattern). Simpler mental model than wrapper/onion pattern. |
+| 2 | **Execution Pattern** | **ContextProvider** with `before_run`/`after_run` methods (hooks pattern). Simpler mental model than wrapper/onion pattern. |
 | 3 | **State Management** | Whole state dict (`dict[str, Any]`) passed to each plugin. Dict is mutable, so no return value needed. |
-| 4 | **Default Storage at Runtime** | `InMemoryStoragePlugin` auto-added when no service_session_id, store≠True, and no plugins. Evaluated at runtime so users can modify pipeline first. |
-| 5 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero storage plugins have `load_messages=True` (likely misconfiguration). |
-| 6 | **Single Storage Class** | One `StorageContextPlugin` configured for memory/audit/evaluation - no separate classes. |
+| 4 | **Default Storage at Runtime** | `InMemoryHistoryProvider` auto-added when no service_session_id, store≠True, and no plugins. Evaluated at runtime so users can modify pipeline first. |
+| 5 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero history providers have `load_messages=True` (likely misconfiguration). |
+| 6 | **Single Storage Class** | One `HistoryProvider` configured for memory/audit/evaluation - no separate classes. |
 | 7 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
-| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `StorageContextPlugin`, `before_run` is skipped entirely when `load_messages=False`. |
+| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `HistoryProvider`, `before_run` is skipped entirely when `load_messages=False`. |
 | 9 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
 | 10 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other plugins. |
 | 11 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
-| 12 | **Clean Break** | Remove `AgentThread`, `ContextProvider`, `ChatMessageStore` completely (preview, no compatibility shims). |
+| 12 | **Clean Break** | Remove `AgentThread`, old `ContextProvider`, `ChatMessageStore` completely; replace with new `ContextProvider` (hooks pattern), `HistoryProvider`, `AgentSession`. No compatibility shims (preview). |
 | 13 | **Plugin Ordering** | User-defined order; storage sees prior plugins (pre-processing) or all plugins (post-processing). |
 | 14 | **Agent-owned Serialization** | `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. |
 | 15 | **Session Management Methods** | `agent.create_session()` (no required params) and `agent.get_session_by_id(id)` for clear lifecycle management. |
@@ -733,7 +733,7 @@ response = await agent.run("Hello", session=session)
 ### Decision 1: Execution Pattern
 
 **Chosen: Option 3 - Hooks (Pre/Post Pattern)** with the following naming:
-- **Class name:** `ContextPlugin` (emphasizes extensibility, familiar from build tools)
+- **Class name:** `ContextProvider` (aligns with the .NET `AIContextProvider` naming)
 - **Method names:** `before_run` / `after_run` (matches `agent.run()` terminology)
 
 Rationale:
@@ -753,9 +753,9 @@ Both options share the same:
 
 **Chosen: Option B1 - Instances in Agent, State in Session (Simple Dict)**
 
-The `ChatAgent` owns and manages the `ContextPlugin` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each plugin receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - plugins modify the dict in place.
+The `ChatAgent` owns and manages the `ContextProvider` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each provider receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - providers modify the dict in place.
 
-> **Note on trust:** Since all `ContextPlugin` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all plugins to see all state - if a plugin is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because plugins that handle messages already have access to the full conversation context.
+> **Note on trust:** Since all `ContextProvider` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all providers to see all state - if a provider is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because providers that handle messages already have access to the full conversation context.
 
 Rationale for B1 over B2: Simpler is better. The whole state dict is passed to each plugin, and since Python dicts are mutable, plugins can modify state in place without returning anything. This is the most Pythonic approach.
 
@@ -853,10 +853,10 @@ class ChatAgent:
         self,
         chat_client: ...,
         *,
-        context_plugins: Sequence[ContextPlugin] | None = None,
+        context_providers: Sequence[ContextProvider] | None = None,
     ):
         # Agent owns the actual plugin instances
-        self._context_plugins = list(context_plugins or [])
+        self._context_providers = list(context_providers or [])
 
     def create_session(self, *, session_id: str | None = None) -> AgentSession:
         """Create lightweight session with just state."""
@@ -869,7 +869,7 @@ class ChatAgent:
         )
 
         # Before-run plugins
-        for plugin in self._context_plugins:
+        for plugin in self._context_providers:
             await plugin.before_run(self, session, context, session.state)
 
         # assemble final input messages from context
@@ -877,12 +877,12 @@ class ChatAgent:
         # ... actual running, i.e. `get_response` for ChatAgent ...
 
         # After-run plugins (reverse order)
-        for plugin in reversed(self._context_plugins):
+        for plugin in reversed(self._context_providers):
             await plugin.after_run(self, session, context, session.state)
 
 
 # Plugin that maintains state - modifies dict in place
-class InMemoryStoragePlugin(ContextPlugin):
+class InMemoryHistoryProvider(ContextProvider):
     async def before_run(
         self,
         agent: "ChatAgent",
@@ -913,7 +913,7 @@ class InMemoryStoragePlugin(ContextPlugin):
 
 
 # Stateless plugin - ignores state
-class TimeContextPlugin(ContextPlugin):
+class TimeContextProvider(ContextProvider):
     async def before_run(
         self,
         agent: "ChatAgent",
@@ -1103,8 +1103,8 @@ The .NET Agent Framework provides equivalent functionality through a different s
 
 | .NET Concept | Python (Chosen) |
 |--------------|-----------------|
-| `AIContextProvider` | `ContextPlugin` |
-| `ChatHistoryProvider` | `StorageContextPlugin` |
+| `AIContextProvider` | `ContextProvider` |
+| `ChatHistoryProvider` | `HistoryProvider` |
 | `AgentSession` | `AgentSession` |
 
 ### Feature Equivalence
@@ -1113,10 +1113,10 @@ Both platforms provide the same core capabilities:
 
 | Capability | .NET | Python |
 |------------|------|--------|
-| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `ContextPlugin.before_run()` |
-| React after invocation | `AIContextProvider.InvokedAsync()` | `ContextPlugin.after_run()` |
-| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `StorageContextPlugin` with `load_messages=True` |
-| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `StorageContextPlugin` with `store_*` flags |
+| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `ContextProvider.before_run()` |
+| React after invocation | `AIContextProvider.InvokedAsync()` | `ContextProvider.after_run()` |
+| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `HistoryProvider` with `load_messages=True` |
+| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `HistoryProvider` with `store_*` flags |
 | Session serialization | `Serialize()` on providers | Session's `state` dict is directly serializable |
 | Factory-based creation | `AIContextProviderFactory`, `ChatHistoryProviderFactory` | Not needed - state dict handles per-session needs |
 
@@ -1126,7 +1126,7 @@ The implementations differ in ways idiomatic to each language:
 
 | Aspect | .NET Approach | Python Approach |
 |--------|---------------|-----------------|
-| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of `ContextPlugin` (multiple) |
+| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of `ContextProvider` (multiple) |
 | **Composition** | One of each provider type per session | Unlimited plugins in pipeline |
 | **Type system** | Strict interfaces, compile-time checks | Duck typing, protocols, runtime flexibility |
 | **Configuration** | DI container, factory delegates | Direct instantiation, list of instances |
@@ -1231,54 +1231,141 @@ Regardless of the chosen approach, these extension points could support compacti
 
 ## Implementation Plan
 
-See **Appendix A** for the detailed implementation plan including:
-- Complete class definitions
-- User experience examples
-- Phase-by-phase workplan
+See **Appendix A** for the class hierarchy, API signatures, and user experience examples.
+See the **Workplan** at the end for the PR breakdown and reference implementation.
 
 ---
 
-## Appendix A: Implementation Plan
+## Appendix A: API Overview
 
-### New Types
+### Class Hierarchy
+
+```
+ContextProvider (base - hooks pattern)
+├── HistoryProvider (storage subclass)
+│   ├── InMemoryHistoryProvider (built-in)
+│   ├── RedisHistoryProvider (packages/redis)
+│   └── CosmosHistoryProvider (packages/azure-ai)
+├── AzureAISearchContextProvider (packages/azure-ai-search)
+├── Mem0ContextProvider (packages/mem0)
+└── (custom user providers)
+
+AgentSession (lightweight state container)
+
+SessionContext (per-invocation state)
+```
+
+### ContextProvider
 
 ```python
-# Copyright (c) Microsoft. All rights reserved.
+class ContextProvider(ABC):
+    """Base class for context providers (hooks pattern).
 
-from abc import ABC, abstractmethod
-from collections.abc import Awaitable, Callable, Sequence
-from typing import Any
+    Context providers participate in the context engineering pipeline,
+    adding context before model invocation and processing responses after.
 
-from ._types import ChatMessage
-from ._tools import ToolProtocol
+    Attributes:
+        source_id: Unique identifier for this provider instance (required).
+            Used for message/tool attribution so other providers can filter.
+    """
+
+    def __init__(self, source_id: str):
+        self.source_id = source_id
+
+    async def before_run(
+        self,
+        agent: "ChatAgent",
+        session: AgentSession,
+        context: SessionContext,
+        state: dict[str, Any],
+    ) -> None:
+        """Called before model invocation. Override to add context."""
+        pass
+
+    async def after_run(
+        self,
+        agent: "ChatAgent",
+        session: AgentSession,
+        context: SessionContext,
+        state: dict[str, Any],
+    ) -> None:
+        """Called after model invocation. Override to process response."""
+        pass
+
+    async def serialize(self) -> Any:
+        """Serialize provider state. Default returns None (no state)."""
+        return None
+
+    async def restore(self, state: Any) -> None:
+        """Restore provider state from serialized object."""
+        pass
+```
+
+### HistoryProvider
+
+```python
+class HistoryProvider(ContextProvider):
+    """Base class for conversation history storage providers.
+
+    A single class configured for different use cases:
+    - Primary memory storage (loads + stores messages)
+    - Audit/logging storage (stores only, doesn't load)
+    - Evaluation storage (stores only for later analysis)
+
+    Loading behavior:
+    - `load_messages=True` (default): Load messages from storage in before_run
+    - `load_messages=False`: Skip loading (before_run is a no-op)
+
+    Storage behavior:
+    - `store_inputs`: Store input messages (default True)
+    - `store_responses`: Store response messages (default True)
+    - `store_context_messages`: Also store context from other providers (default False)
+    - `store_context_from`: Only store from specific source_ids (default None = all)
+    """
+
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        load_messages: bool = True,
+        store_inputs: bool = True,
+        store_responses: bool = True,
+        store_context_messages: bool = False,
+        store_context_from: Sequence[str] | None = None,
+    ): ...
+
+    @abstractmethod
+    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
+        """Retrieve stored messages for this session."""
+        ...
+
+    @abstractmethod
+    async def save_messages(self, session_id: str | None, messages: Sequence[ChatMessage]) -> None:
+        """Persist messages for this session."""
+        ...
+```
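The storage flags above can be read as a filter over what is persisted after a run. The following is a minimal sketch of that selection logic, not the proposed implementation: `select_messages_to_store` is a hypothetical helper, messages are plain strings instead of `ChatMessage`, and the ordering (context first, then inputs, then responses) is an assumption.

```python
from __future__ import annotations

from collections.abc import Sequence


def select_messages_to_store(
    *,
    own_source_id: str,
    context_messages: dict[str, list[str]],
    input_messages: Sequence[str],
    response_messages: Sequence[str],
    store_inputs: bool = True,
    store_responses: bool = True,
    store_context_messages: bool = False,
    store_context_from: Sequence[str] | None = None,
) -> list[str]:
    """Hypothetical helper: pick what a history provider would persist after a run."""
    to_store: list[str] = []
    if store_context_messages:
        for source_id, messages in context_messages.items():
            if store_context_from is not None:
                # Explicit allow-list of source_ids.
                if source_id not in store_context_from:
                    continue
            elif source_id == own_source_id:
                # No allow-list: skip our own loaded history to avoid storing it twice.
                continue
            to_store.extend(messages)
    if store_inputs:
        to_store.extend(input_messages)
    if store_responses:
        to_store.extend(response_messages)
    return to_store


context = {"memory": ["old turn"], "rag": ["doc snippet"]}

# Primary memory: inputs and responses only (the defaults).
primary = select_messages_to_store(
    own_source_id="memory", context_messages=context,
    input_messages=["hi"], response_messages=["hello"],
)

# Full audit: also persist context contributed by other providers.
full_audit = select_messages_to_store(
    own_source_id="audit", context_messages=context,
    input_messages=["hi"], response_messages=["hello"],
    store_context_messages=True,
)

# Audit restricted to RAG context, responses only.
rag_audit = select_messages_to_store(
    own_source_id="audit", context_messages=context,
    input_messages=["hi"], response_messages=["hello"],
    store_inputs=False, store_context_messages=True, store_context_from=["rag"],
)
```

Note that `load_messages` does not appear here: under this reading it only affects `before_run`, so an audit provider with `load_messages=False` still stores.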
 
+### SessionContext
 
+```python
 class SessionContext:
-    """State passed through the ContextMiddleware pipeline for a single invocation.
+    """Per-invocation state passed through the context provider pipeline.
 
-    This object is created fresh for each agent invocation and flows through the
-    middleware pipeline. Middleware can read from and write to the mutable fields
-    to add context before invocation and process responses after.
+    Created fresh for each agent.run() call. Providers read from and write to
+    the mutable fields to add context before invocation and process responses after.
 
     Attributes:
         session_id: The ID of the current session
-        service_session_id: Service-managed session ID (if present, service handles storage)
-        input_messages: The new messages being sent to the agent (read-only, set by caller)
-        context_messages: Dict mapping source_id -> messages added by that middleware.
-            Maintains insertion order (middleware execution order). Use add_context_messages()
-            to add messages with proper source attribution.
-        instructions: Additional instructions - middleware can append here
-        tools: Additional tools - middleware can append here
+        service_session_id: Service-managed session ID (if present)
+        input_messages: New messages being sent to the agent (set by caller)
+        context_messages: Dict mapping source_id -> messages added by that provider.
+            Maintains insertion order (provider execution order).
+        instructions: Additional instructions - providers can append here
+        tools: Additional tools - providers can append here
         response_messages: After invocation, contains the agent's response (set by agent).
-            READ-ONLY - modifications are ignored. Use AgentMiddleware to modify responses.
+            READ-ONLY - use AgentMiddleware to modify responses.
         options: Options passed to agent.run() - READ-ONLY, for reflection only
-        metadata: Shared metadata dictionary for cross-middleware communication
-
-    Note:
-        - `options` is read-only; changes will NOT be merged back into the agent run
-        - `response_messages` is read-only; use AgentMiddleware to modify responses
-        - `instructions` and `tools` are merged by the agent into the run options
-        - `context_messages` values are flattened in order when building the final input
+        metadata: Shared metadata dictionary for cross-provider communication
     """
 
     def __init__(
@@ -1293,88 +1380,27 @@ class SessionContext:
         response_messages: list[ChatMessage] | None = None,
         options: dict[str, Any] | None = None,
         metadata: dict[str, Any] | None = None,
-    ):
-        self.session_id = session_id
-        self.service_session_id = service_session_id
-        self.input_messages = input_messages
-        self.context_messages: dict[str, list[ChatMessage]] = context_messages or {}
-        self.instructions: list[str] = instructions or []
-        self.tools: list[ToolProtocol] = tools or []
-        self.response_messages = response_messages
-        self.options = options or {}  # READ-ONLY - for reflection only
-        self.metadata = metadata or {}
-
-    # --- Methods for adding context ---
+    ): ...
 
     def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
-        """Add context messages from a specific source.
-
-        Messages are stored keyed by source_id, maintaining insertion order
-        based on middleware execution order.
-
-        Args:
-            source_id: The middleware source_id adding these messages
-            messages: The messages to add
-        """
-        if source_id not in self.context_messages:
-            self.context_messages[source_id] = []
-        self.context_messages[source_id].extend(messages)
+        """Add context messages from a specific source."""
+        ...
 
     def add_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
-        """Add instructions to be prepended to the conversation.
-
-        Instructions are added to a flat list. The source_id is recorded
-        in metadata for debugging but instructions are not keyed by source.
-
-        Args:
-            source_id: The middleware source_id adding these instructions
-            instructions: A single instruction string or sequence of strings
-        """
-        if isinstance(instructions, str):
-            instructions = [instructions]
-        self.instructions.extend(instructions)
+        """Add instructions to be prepended to the conversation."""
+        ...
 
     def add_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
-        """Add tools to be available for this invocation.
-
-        Tools are added with source attribution in their metadata.
-
-        Args:
-            source_id: The middleware source_id adding these tools
-            tools: The tools to add
-        """
-        for tool in tools:
-            # Add source attribution to tool metadata
-            if hasattr(tool, 'metadata') and isinstance(tool.metadata, dict):
-                tool.metadata["context_source"] = source_id
-        self.tools.extend(tools)
-
-    # --- Methods for reading context ---
+        """Add tools with source attribution in tool.metadata."""
+        ...
 
     def get_messages(
         self,
         sources: Sequence[str] | None = None,
         exclude_sources: Sequence[str] | None = None,
     ) -> list[ChatMessage]:
-        """Get context messages, optionally filtered by source.
-
-        Returns messages in middleware execution order (dict insertion order).
-
-        Args:
-            sources: If provided, only include messages from these sources
-            exclude_sources: If provided, exclude messages from these sources
-
-        Returns:
-            Flattened list of messages in middleware execution order
-        """
-        result: list[ChatMessage] = []
-        for source_id, messages in self.context_messages.items():
-            if sources is not None and source_id not in sources:
-                continue
-            if exclude_sources is not None and source_id in exclude_sources:
-                continue
-            result.extend(messages)
-        return result
+        """Get context messages, optionally filtered by source."""
+        ...
 
     def get_all_messages(
         self,
@@ -1382,1227 +1408,1048 @@ class SessionContext:
         include_input: bool = False,
         include_response: bool = False,
     ) -> list[ChatMessage]:
-        """Get all messages, optionally including input and response.
-
-        Returns messages in the order they would appear in a full conversation:
-        1. Context messages (from middleware, in execution order)
-        2. Input messages (if include_input=True)
-        3. Response messages (if include_response=True)
+        """Get all messages (context + optionally input + response)."""
+        ...
+```
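Since the method bodies are elided above, here is a minimal sketch of how source-keyed context messages and filtering could behave. `SketchContext` is a hypothetical, trimmed-down stand-in for `SessionContext`, and messages are plain strings rather than `ChatMessage` objects.

```python
from __future__ import annotations

from collections.abc import Sequence


class SketchContext:
    """Hypothetical stand-in for SessionContext, reduced to context messages."""

    def __init__(self) -> None:
        # A dict preserves insertion order, i.e. provider execution order.
        self.context_messages: dict[str, list[str]] = {}

    def add_messages(self, source_id: str, messages: Sequence[str]) -> None:
        """Append messages under the source_id of the provider that added them."""
        self.context_messages.setdefault(source_id, []).extend(messages)

    def get_messages(
        self,
        sources: Sequence[str] | None = None,
        exclude_sources: Sequence[str] | None = None,
    ) -> list[str]:
        """Flatten messages in insertion order, optionally filtered by source."""
        result: list[str] = []
        for source_id, messages in self.context_messages.items():
            if sources is not None and source_id not in sources:
                continue
            if exclude_sources is not None and source_id in exclude_sources:
                continue
            result.extend(messages)
        return result


ctx = SketchContext()
ctx.add_messages("memory", ["earlier user turn", "earlier assistant turn"])
ctx.add_messages("rag", ["retrieved document snippet"])

memory_only = ctx.get_messages(sources=["memory"])
without_rag = ctx.get_messages(exclude_sources=["rag"])
everything = ctx.get_messages()
```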
 
-        Args:
-            include_input: If True, append input_messages after context
-            include_response: If True, append response_messages at the end
+### AgentSession (Decision B1)
 
-        Returns:
-            Flattened list of messages in conversation order
-        """
-        result: list[ChatMessage] = []
+```python
+import uuid
+from typing import Any
+
+
+class AgentSession:
+    """A conversation session with an agent.
 
-        # Context messages in middleware execution order
-        for messages in self.context_messages.values():
-            result.extend(messages)
+    Lightweight state container. Provider instances are owned by the agent,
+    not the session. The session only holds session IDs and a mutable state dict.
+    """
 
-        # Input messages (user's new messages for this invocation)
-        if include_input and self.input_messages:
-            result.extend(self.input_messages)
+    def __init__(self, *, session_id: str | None = None):
+        self._session_id = session_id or str(uuid.uuid4())
+        self.service_session_id: str | None = None
+        self.state: dict[str, Any] = {}
 
-        # Response messages (agent's response)
-        if include_response and self.response_messages:
-            result.extend(self.response_messages)
+    @property
+    def session_id(self) -> str:
+        return self._session_id
+```
 
-        return result
+### ChatAgent Integration
 
+```python
+class ChatAgent:
+    def __init__(
+        self,
+        chat_client: ...,
+        *,
+        context_providers: Sequence[ContextProvider] | None = None,
+    ):
+        self._context_providers = list(context_providers or [])
 
-# Type alias for the next middleware callable
-ContextMiddlewareNext = Callable[[SessionContext], Awaitable[None]]
+    def create_session(self, *, session_id: str | None = None, service_session_id: str | None = None) -> AgentSession:
+        """Create a new lightweight session."""
+        session = AgentSession(session_id=session_id)
+        session.service_session_id = service_session_id
+        return session
 
-# Type alias for middleware factory - takes session_id, returns middleware
-ContextMiddlewareFactory = Callable[[str | None], ContextMiddleware]
+    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
+        context = SessionContext(session_id=session.session_id, input_messages=[...])
 
-# Union type for middleware configuration - either instance or factory
-ContextMiddlewareConfig = ContextMiddleware | ContextMiddlewareFactory
+        # Before-run providers (forward order)
+        for provider in self._context_providers:
+            await provider.before_run(self, session, context, session.state)
 
+        # ... assemble messages, invoke model ...
 
-class ContextMiddleware(ABC):
-    """Base class for context middleware (onion/wrapper pattern).
+        # After-run providers (reverse order)
+        for provider in reversed(self._context_providers):
+            await provider.after_run(self, session, context, session.state)
+```
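The hook ordering in `run()` can be checked in isolation. This is a minimal sketch under stated assumptions: `RecordingProvider` and `run_pipeline` are hypothetical stand-ins rather than part of the proposed API, the hooks take only the state dict, and the model call is replaced by a log entry.

```python
import asyncio
from typing import Any


class RecordingProvider:
    """Hypothetical stand-in for a ContextProvider that logs its hook calls."""

    def __init__(self, source_id: str, log: list[str]) -> None:
        self.source_id = source_id
        self._log = log

    async def before_run(self, state: dict[str, Any]) -> None:
        self._log.append(f"before:{self.source_id}")

    async def after_run(self, state: dict[str, Any]) -> None:
        self._log.append(f"after:{self.source_id}")


async def run_pipeline(providers: list[RecordingProvider], log: list[str]) -> None:
    """Mimic the hook ordering of ChatAgent.run(); the model call is stubbed."""
    state: dict[str, Any] = {}
    for provider in providers:  # before_run: forward order
        await provider.before_run(state)
    log.append("model")  # placeholder for the actual model invocation
    for provider in reversed(providers):  # after_run: reverse order
        await provider.after_run(state)


def demo_order() -> list[str]:
    """Run three providers and return the recorded call order."""
    log: list[str] = []
    providers = [RecordingProvider(name, log) for name in ("memory", "rag", "audit")]
    asyncio.run(run_pipeline(providers, log))
    return log
```

With the providers listed as `memory`, `rag`, `audit`, the last provider in the list is the first to see the response in `after_run`.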
 
-    Context middleware wraps the context preparation and storage flow,
-    allowing modification of messages, tools, and instructions before
-    invocation and processing of responses after invocation.
+### Message/Tool Attribution
 
-    The process() method receives a context and a next() callable.
-    Before calling next(), you can modify the context (add messages, tools, etc.).
-    After calling next(), the response_messages will be populated and you can
-    process them (store, extract info, etc.).
+The `SessionContext` provides explicit methods for adding context:
 
-    Lifecycle:
-    - session_created(): Called once when a new session is created
-    - process(): Called for each invocation, wraps the context flow
+```python
+# Adding messages (keyed by source_id in context_messages dict)
+context.add_messages(self.source_id, messages)
 
-    Attributes:
-        source_id: Unique identifier for this middleware instance (required).
-            Used for message/tool attribution so other middleware can filter.
-        session_id: The session ID, automatically set when created via factory.
-            None if middleware is shared across sessions (instance mode).
+# Adding instructions (flat list, source_id for debugging)
+context.add_instructions(self.source_id, "Be concise and helpful.")
+context.add_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
 
-    Note:
-        Middleware can be provided to agents as either:
-        - An instantiated middleware object (shared across all sessions)
-        - A factory function `(session_id: str | None) -> ContextMiddleware`
-          that creates a new instance per session
+# Adding tools (source attribution added to tool.metadata automatically)
+context.add_tools(self.source_id, [my_tool, another_tool])
 
-    Examples:
-        # As instance (shared across sessions)
-        class MyContextMiddleware(ContextMiddleware):
-            def __init__(self, source_id: str):
-                super().__init__(source_id=source_id)
-
-            async def process(self, context, next):
-                context.add_instructions(self.source_id, "Be helpful!")
-                await next(context)
-
-        # As factory (new instance per session)
-        def create_session_middleware(session_id: str | None) -> ContextMiddleware:
-            return MySessionMiddleware(
-                source_id="session_specific",
-                session_id=session_id,
-            )
+# Getting all messages in provider execution order
+all_messages = context.get_all_messages()
 
-                # POST-PROCESSING: Handle response after invocation
-                for msg in context.response_messages or []:
-                    print(f"Response: {msg.text}")
-    """
+# Filtering by source
+memory_messages = context.get_messages(sources=["memory"])
+non_rag_messages = context.get_messages(exclude_sources=["rag"])
 
-    def __init__(self, source_id: str, *, session_id: str | None = None):
-        """Initialize the middleware.
+# Direct access to check specific sources
+if "memory" in context.context_messages:
+    history = context.context_messages["memory"]
+```
 
-        Args:
-            source_id: Unique identifier for this middleware instance.
-                Used for message/tool attribution.
-            session_id: Optional session ID. Automatically set when middleware
-                is created via a factory function.
-        """
-        self.source_id = source_id
-        self.session_id = session_id
+---
 
-    async def session_created(self, session_id: str | None) -> None:
-        """Called when a new session is created.
+## User Experience Examples
 
-        Override this to load any initial data from persistent storage
-        or perform session-level initialization.
+### Example 0: Zero-Config Default (Simplest Use Case)
 
-        Note: If you need the session_id, prefer using `self.session_id`
-        which is set automatically when using a factory.
+```python
+from agent_framework import ChatAgent
 
-        Args:
-            session_id: The ID of the newly created session
-        """
-        pass
+# No providers configured - but conversation history still works!
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    # No context_providers specified
+)
 
-    @abstractmethod
-    async def process(
-        self,
-        context: SessionContext,
-        next: ContextMiddlewareNext
-    ) -> None:
-        """Process the context, wrapping the call to next middleware.
+# Create session - automatically gets InMemoryHistoryProvider on first run
+session = agent.create_session()
+response = await agent.run("Hello, my name is Alice!", session=session)
 
-        Before calling next():
-        - Modify context.context_messages to add messages (RAG, memory, etc.)
-        - Modify context.instructions to add system instructions
-        - Modify context.tools to add tools for this invocation
-        - Access context.history_messages to see loaded history
-        - Access context.input_messages to see new user messages
+# Conversation history is preserved automatically
+response = await agent.run("What's my name?", session=session)
+# Agent remembers: "Your name is Alice!"
 
-        After calling next():
-        - context.response_messages contains the agent's response
-        - Store messages, extract information, perform cleanup
+# With service-managed session - no default storage added (service handles it)
+service_session = agent.create_session(service_session_id="thread_abc123")
 
-        Args:
-            context: The invocation context being processed
-            next: Callable to invoke the next middleware in the chain
-        """
-        pass
+# With store=True in options - user expects service storage, no default added
+response = await agent.run("Hello!", session=session, options={"store": True})
 ```
 
-### Storage Middleware Base
+### Example 1: Explicit Memory Storage
 
 ```python
-class StorageContextMiddleware(ContextMiddleware):
-    """Base class for storage-focused context middleware.
-
-    A single class that can be configured for different use cases:
-    - Primary memory storage (loads + stores messages)
-    - Audit/logging storage (stores only, doesn't load)
-    - Evaluation storage (stores only for later analysis)
+from agent_framework import ChatAgent
+from agent_framework.context import InMemoryHistoryProvider
 
-    Loading behavior (when to add messages to context_messages[source_id]):
-    - `load_messages=True` (default): Load messages from storage
-    - `load_messages=False`: Never load (audit/logging mode)
+# Explicit provider configuration (same behavior as default, but explicit)
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_providers=[
+        InMemoryHistoryProvider(source_id="memory")
+    ]
+)
 
-    Storage behavior:
-    - `store_inputs`: Store input messages (default True)
-    - `store_responses`: Store response messages (default True)
-    - Storage always happens unless explicitly disabled, regardless of load_messages
+# Create session and chat
+session = agent.create_session()
+response = await agent.run("Hello!", session=session)
 
-    Warning: At session creation time, a warning is logged if:
-    - Multiple storage middleware have `load_messages=True` (likely duplicate loading)
-    - Zero storage middleware have `load_messages=True` (likely missing primary storage)
+# Messages are automatically stored and loaded on next invocation
+response = await agent.run("What did I say before?", session=session)
+```
 
-    These are warnings only (not errors) because valid use cases exist for both scenarios,
-    such as intentional multi-source loading or audit-only storage configurations.
+### Example 2: RAG + Memory + Audit (Memory and Audit via HistoryProvider)
 
-    Examples:
-        # Primary memory - loads and stores
-        memory = InMemoryStorageMiddleware(source_id="memory")
+```python
+from agent_framework import ChatAgent
+from agent_framework.azure import CosmosHistoryProvider, AzureAISearchContextProvider
+from agent_framework.redis import RedisHistoryProvider
 
-        # Audit storage - stores only, doesn't add to context
-        audit = RedisStorageMiddleware(
-            source_id="audit",
-            load_messages=False,
-            redis_url="redis://...",
-        )
+# RAG provider that injects relevant documents
+search_provider = AzureAISearchContextProvider(
+    source_id="rag",
+    endpoint="https://...",
+    index_name="documents",
+)
 
-        # Evaluation storage - stores responses only
-        eval_storage = CosmosStorageMiddleware(
-            source_id="evaluation",
-            load_messages=False,
-            store_inputs=False,
-            store_responses=True,
-        )
+# Primary memory storage
+# load_messages=True (default) - loads history and stores new messages
+memory_provider = RedisHistoryProvider(
+    source_id="memory",
+    redis_url="redis://...",
+)
 
-        # Full audit - stores everything including RAG context
-        full_audit = CosmosStorageMiddleware(
-            source_id="full_audit",
-            load_messages=False,
-            store_context_messages=True,  # Also store context from other middleware
-        )
-    """
+# Audit storage - same HistoryProvider base class, different configuration
+# load_messages=False = never loads, just stores for audit
+audit_provider = CosmosHistoryProvider(
+    source_id="audit",
+    connection_string="...",
+    load_messages=False,  # Don't load - just store for audit
+)
 
-    def __init__(
-        self,
-        source_id: str,
-        *,
-        session_id: str | None = None,
-        load_messages: bool = True,
-        store_responses: bool = True,
-        store_inputs: bool = True,
-        store_context_messages: bool = False,  # Store context added by other middleware
-        store_context_from: Sequence[str] | None = None,  # Only store from these sources
-    ):
-        super().__init__(source_id, session_id=session_id)
-        self.load_messages = load_messages
-        self.store_responses = store_responses
-        self.store_inputs = store_inputs
-        self.store_context_messages = store_context_messages
-        self.store_context_from = list(store_context_from) if store_context_from else None
+agent = ChatAgent(
+    chat_client=client,
+    name="assistant",
+    context_providers=[
+        memory_provider,   # First: loads history
+        search_provider,   # Second: adds RAG context
+        audit_provider,    # Third: stores for audit (no load)
+    ]
+)
+```
 
-    @abstractmethod
-    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
-        """Retrieve stored messages for this session."""
-        pass
+### Example 3: Custom Context Providers
 
-    @abstractmethod
-    async def save_messages(
-        self,
-        session_id: str | None,
-        messages: Sequence[ChatMessage]
-    ) -> None:
-        """Persist messages for this session."""
-        pass
+```python
+import json
+
+from agent_framework.context import ContextProvider, SessionContext
 
-    def _get_context_messages_to_store(self, context: SessionContext) -> list[ChatMessage]:
-        """Get context messages that should be stored based on configuration."""
-        if not self.store_context_messages:
-            return []
+class TimeContextProvider(ContextProvider):
+    """Adds current time to the context."""
 
-        if self.store_context_from is not None:
-            # Only store from specific sources
-            return context.get_messages(sources=self.store_context_from)
-        else:
-            # Store all context messages (excluding our own to avoid duplication)
-            return context.get_messages(exclude_sources=[self.source_id])
+    async def before_run(self, agent, session, context, state) -> None:
+        from datetime import datetime
+        context.add_instructions(
+            self.source_id,
+            f"Current date and time: {datetime.now().isoformat()}"
+        )
 
-    async def process(
-        self,
-        context: SessionContext,
-        next: ContextMiddlewareNext
-    ) -> None:
-        # PRE: Load history if configured, keyed by our source_id
-        if self.load_messages:
-            history = await self.get_messages(context.session_id)
-            context.add_messages(self.source_id, history)
 
-        # Continue to next middleware
-        await next(context)
+class UserPreferencesProvider(ContextProvider):
+    """Tracks and applies user preferences from conversation."""
 
-        # POST: Store messages
-        messages_to_store: list[ChatMessage] = []
+    async def before_run(self, agent, session, context, state) -> None:
+        # Local import keeps the snippet self-contained (json is not imported above)
+        import json
+        prefs = state.get(self.source_id, {}).get("preferences", {})
+        if prefs:
+            text = f"User preferences: {json.dumps(prefs)}"
+            context.add_instructions(self.source_id, text)
 
-        # Optionally store context messages from other middleware
-        messages_to_store.extend(self._get_context_messages_to_store(context))
+    async def after_run(self, agent, session, context, state) -> None:
+        # Extract preferences from response and store in session state
+        for msg in context.response_messages or []:
+            if "preference:" in msg.text.lower():
+                my_state = state.setdefault(self.source_id, {})
+                my_state.setdefault("preferences", {})
+                # ... extract and store preference
 
-        if self.store_inputs:
-            messages_to_store.extend(context.input_messages)
-        if self.store_responses and context.response_messages:
-            messages_to_store.extend(context.response_messages)
-        if messages_to_store:
-            await self.save_messages(context.session_id, messages_to_store)
-```
 
-### Message/Tool Attribution
+# Compose providers - each with mandatory source_id
+agent = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        InMemoryHistoryProvider(source_id="memory"),
+        TimeContextProvider(source_id="time"),
+        UserPreferencesProvider(source_id="prefs"),
+    ]
+)
+```
 
-The `SessionContext` provides explicit methods for adding context:
+### Example 4: Filtering by Source (Using Dict-Based Context)
 
 ```python
-# Adding messages (keyed by source_id in context_messages dict)
-context.add_messages(self.source_id, messages)
-
-# Adding instructions (flat list, source_id for debugging)
-context.add_instructions(self.source_id, "Be concise and helpful.")
-context.add_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
-
-# Adding tools (source attribution added to tool.metadata automatically)
-context.add_tools(self.source_id, [my_tool, another_tool])
+class SelectiveContextProvider(ContextProvider):
+    """Provider that only processes messages from specific sources."""
 
-# Getting all messages in middleware execution order
-all_messages = context.get_all_messages()
+    async def before_run(self, agent, session, context, state) -> None:
+        # Check what sources have added messages so far
+        print(f"Sources so far: {list(context.context_messages.keys())}")
 
-# Filtering by source
-memory_messages = context.get_messages(sources=["memory"])
-non_rag_messages = context.get_messages(exclude_sources=["rag"])
+        # Get messages excluding RAG context
+        non_rag_messages = context.get_messages(exclude_sources=["rag"])
 
-# Direct access to check specific sources
-if "memory" in context.context_messages:
-    history = context.context_messages["memory"]
-```
+        # Or get only memory messages
+        if "memory" in context.context_messages:
+            memory_only = context.context_messages["memory"]
 
-### AgentSession Class (replaces AgentThread)
+        # Do something with filtered messages...
+        # e.g., sentiment analysis, topic extraction
 
-```python
-import uuid
-import warnings
-from collections.abc import Sequence
 
+class RAGContextProvider(ContextProvider):
+    """Provider that adds RAG context."""
 
-def _resolve_middleware(
-    config: ContextMiddlewareConfig,
-    session_id: str | None,
-) -> ContextMiddleware:
-    """Resolve a middleware config to an instance.
+    async def before_run(self, agent, session, context, state) -> None:
+        # Search for relevant documents based on input
+        relevant_docs = await self._search(context.input_messages)
 
-    If config is already a ContextMiddleware instance, return it.
-    If config is a factory callable, call it with session_id to create an instance.
-    """
-    if isinstance(config, ContextMiddleware):
-        return config
-    # It's a factory - call it with session_id
-    return config(session_id)
+        # Add RAG context using explicit method
+        rag_messages = [
+            ChatMessage(role="system", text=f"Relevant info: {doc}")
+            for doc in relevant_docs
+        ]
+        context.add_messages(self.source_id, rag_messages)
+```
 
+### Example 5: Explicit Storage Configuration for Service-Managed Sessions
 
-class ContextMiddlewarePipeline:
-    """Executes a chain of context middleware in onion/wrapper style."""
+```python
+# HistoryProvider uses explicit configuration - no automatic detection.
+# load_messages=True (default): Load messages from storage
+# load_messages=False: Skip loading (useful for audit-only storage)
 
-    def __init__(self, middleware: Sequence[ContextMiddleware]):
-        self._middleware = list(middleware)
-        self._validate_middleware()
+agent = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        RedisHistoryProvider(
+            source_id="memory",
+            redis_url="redis://...",
+            # load_messages=True is the default
+        )
+    ]
+)
 
-    @classmethod
-    def from_config(
-        cls,
-        configs: Sequence[ContextMiddlewareConfig],
-        session_id: str | None,
-    ) -> "ContextMiddlewarePipeline":
-        """Create a pipeline from middleware configs, resolving factories.
+session = agent.create_session()
 
-        Args:
-            configs: Sequence of middleware instances or factories
-            session_id: Session ID to pass to factories
+# Normal run - loads and stores messages
+response = await agent.run("Hello!", session=session)
 
-        Returns:
-            A new pipeline with resolved middleware instances
-        """
-        middleware = [_resolve_middleware(config, session_id) for config in configs]
-        return cls(middleware)
+# For service-managed sessions, configure storage explicitly:
+# - Use load_messages=False when the service handles history
+service_storage = RedisHistoryProvider(
+    source_id="audit",
+    redis_url="redis://...",
+    load_messages=False,  # Don't load - service manages history
+)
 
-    def _validate_middleware(self) -> None:
-        """Warn if storage middleware configuration looks like a mistake.
+agent_with_service = ChatAgent(
+    chat_client=client,
+    context_providers=[service_storage]
+)
+service_session = agent_with_service.create_session(service_session_id="thread_abc123")
+response = await agent_with_service.run("Hello!", session=service_session)
+# History provider stores for audit but doesn't load (service handles history)
+```
 
-        These are warnings only (not errors) because valid use cases exist
-        for both multiple loaders and zero loaders.
-        """
-        storage_middleware = [
-            m for m in self._middleware
-            if isinstance(m, StorageContextMiddleware)
-        ]
+### Example 6: Multiple Instances of Same Provider Type
 
-        if not storage_middleware:
-            # No storage middleware at all - that's fine, user may not need it
-            return
+```python
+# You can have multiple instances of the same provider class
+# by using different source_ids
 
-        loaders = [m for m in storage_middleware if m.load_messages is True]
+agent = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        # Primary storage for conversation history
+        RedisHistoryProvider(
+            source_id="conversation_memory",
+            redis_url="redis://primary...",
+            load_messages=True,  # This one loads
+        ),
+        # Secondary storage for audit (different Redis instance)
+        RedisHistoryProvider(
+            source_id="audit_log",
+            redis_url="redis://audit...",
+            load_messages=False,  # This one just stores
+        ),
+    ]
+)
+# Warning will NOT be logged because only one has load_messages=True
+```
 
-        if len(loaders) > 1:
-            warnings.warn(
-                f"Multiple storage middleware configured to load messages: "
-                f"{[m.source_id for m in loaders]}. "
-                f"This may cause duplicate messages in context. "
-                f"If this is intentional, you can ignore this warning.",
-                UserWarning
-            )
-        elif len(loaders) == 0:
-            warnings.warn(
-                f"Storage middleware configured but none have load_messages=True: "
-                f"{[m.source_id for m in storage_middleware]}. "
-                f"No conversation history will be loaded. "
-                f"If this is intentional (e.g., audit-only), you can ignore this warning.",
-                UserWarning
-            )
+### Example 7: Provider Ordering - RAG Before vs After Memory
 
-    async def session_created(self, session_id: str | None) -> None:
-        """Notify all middleware that a session was created."""
-        for middleware in self._middleware:
-            await middleware.session_created(session_id)
+The order of providers determines what context each one can see. This is especially important for RAG, which may benefit from seeing conversation history.
 
-    async def execute(self, context: SessionContext) -> None:
-        """Execute the middleware pipeline."""
+```python
+from agent_framework import ChatAgent, ChatMessage
+from agent_framework.context import InMemoryHistoryProvider, ContextProvider, SessionContext
 
-        async def terminal(s: SessionContext) -> None:
-            # Terminal handler - nothing more to do
-            pass
+class RAGContextProvider(ContextProvider):
+    """RAG provider that retrieves relevant documents based on available context."""
 
-        # Build the chain from last to first
-        next_handler = terminal
-        for middleware in reversed(self._middleware):
-            # Capture middleware in closure
-            current_middleware = middleware
-            current_next = next_handler
+    async def before_run(self, agent, session, context, state) -> None:
+        # Build query from what we can see
+        query_parts = []
 
-            async def handler(s: SessionContext, mw=current_middleware, nxt=current_next) -> None:
-                await mw.process(s, nxt)
+        # We can always see the current input
+        for msg in context.input_messages:
+            query_parts.append(msg.text)
 
-            next_handler = handler
+        # Can we see history? Depends on provider order!
+        history = context.get_all_messages()  # Gets context from providers that ran before us
+        if history:
+            # Include recent history for better RAG context
+            recent = history[-3:]  # Last 3 messages
+            for msg in recent:
+                query_parts.append(msg.text)
 
-        # Execute the chain
-        await next_handler(context)
+        query = " ".join(query_parts)
+        documents = await self._retrieve_documents(query)
 
+        # Add retrieved documents as context
+        rag_messages = [ChatMessage(role="system", text=f"Relevant context:\n{doc}") for doc in documents]
+        context.add_messages(self.source_id, rag_messages)
 
-class AgentSession:
-    """A conversation session with an agent.
+    async def _retrieve_documents(self, query: str) -> list[str]:
+        # ... vector search implementation
+        return ["doc1", "doc2"]
 
-    AgentSession manages the conversation state and owns a ContextMiddlewarePipeline
-    that processes context before each invocation and handles responses after.
 
-    Note: The session is created by calling agent.create_session(), which constructs
-    the pipeline from the agent's context_middleware sequence, resolving any factories.
+# =============================================================================
+# SCENARIO A: RAG runs BEFORE Memory
+# =============================================================================
+# RAG only sees the current input message - no conversation history
+# Use when: RAG should be based purely on the current query
 
-    Attributes:
-        session_id: Unique identifier for this session
-        service_session_id: Service-managed session ID (if using service-side storage)
-        context_pipeline: The middleware pipeline for this session
-    """
-
-    def __init__(
-        self,
-        *,
-        session_id: str | None = None,
-        service_session_id: str | None = None,
-        context_pipeline: ContextMiddlewarePipeline | None = None,
-    ):
-        """Initialize the session.
+agent_rag_first = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        RAGContextProvider("rag"),           # Runs first - only sees input_messages
+        InMemoryHistoryProvider("memory"),   # Runs second - loads/stores history
+    ]
+)
 
-        Note: Prefer using agent.create_session() instead of direct construction.
+# Flow:
+# 1. RAG.before_run():
+#    - context.input_messages = ["What's the weather?"]
+#    - context.get_all_messages() = []  (empty - memory hasn't run yet)
+#    - RAG query based on: "What's the weather?" only
+#    - Adds: context_messages["rag"] = [retrieved docs]
+#
+# 2. Memory.before_run():
+#    - Loads history: context_messages["memory"] = [previous conversation]
+#
+# 3. Agent invocation with: rag docs + history + input  (context_messages flattened in provider order)
+#
+# 4. Memory.after_run():
+#    - Stores: input + response (not RAG docs by default)
+#
+# 5. RAG.after_run():
+#    - (nothing to do)
 
-        Default storage behavior (applied at runtime, not init):
-        - If service_session_id is set: service handles storage, no default added
-        - If options.store=True: user expects service storage, no default added
-        - If no service_session_id AND store is not True AND no pipeline:
-          InMemoryStorageMiddleware is automatically added
 
-        Args:
-            session_id: Optional session ID (generated if not provided)
-            service_session_id: Optional service-managed session ID
-            context_pipeline: The middleware pipeline (created by agent)
-        """
-        self._session_id = session_id or str(uuid.uuid4())
-        self._service_session_id = service_session_id
-        self._context_pipeline = context_pipeline
-        self._initialized = False
-        self._default_storage_checked = False
+# =============================================================================
+# SCENARIO B: RAG runs AFTER Memory
+# =============================================================================
+# RAG sees conversation history - can use it for better retrieval
+# Use when: RAG should consider conversation context for better results
 
-    @property
-    def session_id(self) -> str:
-        """The unique identifier for this session."""
-        return self._session_id
+agent_memory_first = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        InMemoryHistoryProvider("memory"),   # Runs first - loads history
+        RAGContextProvider("rag"),           # Runs second - sees history + input
+    ]
+)
 
-    @property
-    def service_session_id(self) -> str | None:
-        """The service-managed session ID (if using service-side storage)."""
-        return self._service_session_id
+# Flow:
+# 1. Memory.before_run():
+#    - Loads history: context_messages["memory"] = [previous conversation]
+#
+# 2. RAG.before_run():
+#    - context.input_messages = ["What's the weather?"]
+#    - context.get_all_messages() = [previous conversation]  (sees history!)
+#    - RAG query based on: recent history + "What's the weather?"
+#    - Better retrieval because RAG understands conversation context
+#    - Adds: context_messages["rag"] = [more relevant docs]
+#
+# 3. Agent invocation with: history + rag docs + input
+#
+# 4. RAG.after_run():
+#    - (nothing to do)
+#
+# 5. Memory.after_run():
+#    - Stores: input + response
 
-    @service_session_id.setter
-    def service_session_id(self, value: str | None) -> None:
-        self._service_session_id = value
 
-    @property
-    def context_pipeline(self) -> ContextMiddlewarePipeline | None:
-        """The middleware pipeline for this session."""
-        return self._context_pipeline
+# =============================================================================
+# SCENARIO C: RAG after Memory, with selective storage
+# =============================================================================
+# Memory first for better RAG, plus separate audit that stores RAG context
 
-    @context_pipeline.setter
-    def context_pipeline(self, value: ContextMiddlewarePipeline | None) -> None:
-        """Set the middleware pipeline for this session."""
-        self._context_pipeline = value
+agent_full_context = ChatAgent(
+    chat_client=client,
+    context_providers=[
+        InMemoryHistoryProvider("memory"),   # Primary history storage
+        RAGContextProvider("rag"),           # Gets history context for better retrieval
+        PersonaContextProvider("persona"),   # Adds persona instructions
+        # Audit storage - stores everything including RAG results
+        CosmosHistoryProvider(
+            "audit",
+            load_messages=False,               # Don't load (memory handles that)
+            store_context_messages=True,       # Store RAG + persona context too
+        ),
+    ]
+)
+```
 
-    def _ensure_default_storage(self, options: dict[str, Any]) -> None:
-        """Add default InMemoryStorageMiddleware if needed.
+---
 
-        Called at runtime (first run) so users can modify the pipeline
-        after session creation but before first invocation.
+### Workplan
 
-        Default storage is added when ALL of these are true:
-        - No service_session_id (service not managing storage)
-        - options.store is not True (user not expecting service storage)
-        - Pipeline is empty or None (user hasn't configured middleware)
-        """
-        if self._default_storage_checked:
-            return
-        self._default_storage_checked = True
+The implementation is split into 2 PRs to limit scope and simplify review.
 
-        # User expects service-side storage
-        if options.get("store") is True:
-            return
+```
+PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
+```
 
-        # Service is managing storage
-        if self._service_session_id is not None:
-            return
+#### PR 1: New Types
 
-        # User has configured middleware
-        if self._context_pipeline is not None and len(self._context_pipeline) > 0:
-            return
+**Goal:** Create all new types. No changes to existing code yet.
 
-        # Add default in-memory storage
-        default_middleware = InMemoryStorageMiddleware("memory")
-        if self._context_pipeline is None:
-            self._context_pipeline = ContextMiddlewarePipeline([default_middleware])
-        else:
-            self._context_pipeline.prepend(default_middleware)
+**Core Package - `packages/core/agent_framework/_sessions.py`:**
+- [ ] `SessionContext` class with explicit add/get methods
+- [ ] `ContextProvider` base class with `before_run()`/`after_run()`
+- [ ] `HistoryProvider` derived class with load_messages/store flags
+- [ ] Add `serialize()` and `restore()` methods to `ContextProvider` base class
+- [ ] `AgentSession` class with `state: dict[str, Any]`
+- [ ] `InMemoryHistoryProvider(HistoryProvider)`
 
-    async def initialize(self) -> None:
-        """Initialize the session and notify middleware."""
-        if not self._initialized and self._context_pipeline is not None:
-            await self._context_pipeline.session_created(self._session_id)
-            self._initialized = True
+**External Packages:**
+- [ ] `packages/azure-ai-search/` - create `AzureAISearchContextProvider`
+- [ ] `packages/redis/` - create `RedisHistoryProvider`
+- [ ] `packages/mem0/` - create `Mem0ContextProvider`
 
-    async def run_context_pipeline(
-        self,
-        input_messages: list[ChatMessage],
-        *,
-        tools: list[ToolProtocol] | None = None,
-        options: dict[str, Any] | None = None,
-    ) -> SessionContext:
-        """Prepare context by running the middleware pipeline.
+**Testing:**
+- [ ] Unit tests for `SessionContext` methods (add_messages, get_messages, add_instructions, add_tools)
+- [ ] Unit tests for `HistoryProvider` load/store flags
+- [ ] Unit tests for `InMemoryHistoryProvider` serialize/restore
+- [ ] Unit tests for source attribution (mandatory source_id)
 
-        This runs the full middleware pipeline (pre-processing, then post-processing
-        after response_messages is set).
+---
 
-        Args:
-            input_messages: New messages to send to the agent
-            tools: Additional tools available for this invocation
-            options: Options including 'store' flag (READ-ONLY, for reflection)
+#### PR 2: Agent Integration + Cleanup
 
-        Returns:
-            The invocation context with history, context, instructions, and tools populated
-        """
-        options = options or {}
+**Goal:** Wire up new types into `ChatAgent` and remove old types.
 
-        # Check for default storage on first run (deferred from init)
-        self._ensure_default_storage(options)
+**Changes to `ChatAgent`:**
+- [ ] Replace `thread` parameter with `session` in `agent.run()`
+- [ ] Add `context_providers` parameter to `ChatAgent.__init__()`
+- [ ] Add `create_session()` method
+- [ ] Add `serialize_session()` / `restore_session()` methods
+- [ ] Wire up provider iteration (before_run forward, after_run reverse)
+- [ ] Add validation warning if multiple/zero history providers have `load_messages=True`
+- [ ] Wire up default `InMemoryHistoryProvider` behavior (auto-add when no providers and no service_session_id)
 
-        await self.initialize()
-        context = SessionContext(
-            session_id=self._session_id,
-            service_session_id=self._service_session_id,
-            input_messages=input_messages,
-            tools=tools or [],
-            options=options,
-        )
-        if self._context_pipeline is not None:
-            await self._context_pipeline.execute(context)
-        return context
+**Remove Legacy Types:**
+- [ ] `packages/core/agent_framework/_memory.py` - remove `ContextProvider` class
+- [ ] `packages/core/agent_framework/_threads.py` - remove `ChatMessageStore`, `ChatMessageStoreProtocol`, `AgentThread`
+- [ ] `packages/core/agent_framework/__init__.py` - remove old exports, add new exports from `_sessions.py`
+- [ ] Remove old provider classes from `azure-ai-search`, `redis`, `mem0`
 
+**Documentation & Samples:**
+- [ ] Update all samples in `samples/` to use new API
+- [ ] Write migration guide
+- [ ] Update API documentation
 
-# Example of how agent creates sessions:
-class ChatAgent:
-    def __init__(
-        self,
-        chat_client: ...,
-        *,
-        context_middleware: Sequence[ContextMiddleware] | None = None,
-        # ... other params
-    ):
-        self._context_middleware = list(context_middleware or [])
-        # ... other init
+**Testing:**
+- [ ] Unit tests for provider execution order (before_run forward, after_run reverse)
+- [ ] Unit tests for validation warnings (multiple/zero loaders)
+- [ ] Unit tests for session serialization/deserialization
+- [ ] Integration test: agent with `context_providers` + `session` works
+- [ ] Integration test: full conversation with memory persistence
+- [ ] Ensure all existing tests still pass (with updated API)
+- [ ] Verify no references to removed types remain
 
-    def create_session(
-        self,
-        *,
-        session_id: str | None = None,
-        service_session_id: str | None = None,
-    ) -> AgentSession:
-        """Create a new session with a fresh middleware pipeline.
+---
 
-        Middleware factories are called with the session_id to create
-        session-specific instances.
+#### CHANGELOG (single entry for release)
 
-        Args:
-            session_id: Optional session ID (generated if not provided)
-            service_session_id: Optional service-managed session ID
-        """
-        resolved_session_id = session_id or str(uuid.uuid4())
+- **[BREAKING]** Replaced the legacy `ContextProvider` (`invoking()`/`invoked()` hooks) with a new `ContextProvider` built on `before_run`/`after_run` hooks
+- **[BREAKING]** Replaced `ChatMessageStore` with `HistoryProvider`
+- **[BREAKING]** Replaced `AgentThread` with `AgentSession`
+- **[BREAKING]** Replaced `thread` parameter with `session` in `agent.run()`
+- Added `SessionContext` for invocation state with source attribution
+- Added `InMemoryHistoryProvider` for conversation history
+- Added session serialization (`serialize_session`, `restore_session`)
 
-        # Only create pipeline if we have middleware configured
-        pipeline = None
-        if self._context_middleware:
-            pipeline = ContextMiddlewarePipeline.from_config(
-                self._context_middleware,
-                session_id=resolved_session_id,
-            )
+---
 
-        return AgentSession(
-            session_id=resolved_session_id,
-            service_session_id=service_session_id,
-            context_pipeline=pipeline,
-        )
+#### Estimated Sizes
 
-    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> ...:
-        """Run the agent with the given input."""
-        # Default storage check happens inside session.run_context_pipeline()
-        # ... rest of run logic
-```
+| PR | New Lines | Modified Lines | Risk |
+|----|-----------|----------------|------|
+| PR1 | ~500 | ~0 | Low |
+| PR2 | ~150 | ~400 | Medium |
 
 ---
 
-## User Experience Examples
-
-### Example 0: Zero-Config Default (Simplest Use Case)
-
-```python
-from agent_framework import ChatAgent
+#### Reference Implementation
 
-# No middleware configured - but conversation history still works!
-agent = ChatAgent(
-    chat_client=client,
-    name="assistant",
-    # No context_middleware specified
-)
+Full implementation code for the chosen design (hooks pattern, Decision B1).
 
-# Create session - automatically gets InMemoryStorageMiddleware on first run
-session = agent.create_session()
-response = await agent.run("Hello, my name is Alice!", session=session)
+##### SessionContext
 
-# Conversation history is preserved automatically
-response = await agent.run("What's my name?", session=session)
-# Agent remembers: "Your name is Alice!"
+```python
+# Copyright (c) Microsoft. All rights reserved.
 
-# With service-managed session - no default storage added (service handles it)
-service_session = agent.create_session()
+from abc import ABC, abstractmethod
+from collections.abc import Awaitable, Callable, Sequence
+from typing import Any
 
-# With store=True in options - user expects service storage, no default added
-response = await agent.run("Hello!", session=session, options={"store": True})
+from ._types import ChatMessage
+from ._tools import ToolProtocol
 
-# User can manually add middleware to session before first run
-session = agent.create_session()
-session.context_pipeline = ContextMiddlewarePipeline([
-    MyCustomMiddleware(source_id="custom")
-])
-response = await agent.run("Hello!", session=session)  # No default added since pipeline exists
-```
 
-### Example 1: Explicit Memory Storage
+class SessionContext:
+    """Per-invocation state passed through the context provider pipeline.
 
-```python
-from agent_framework import ChatAgent
-from agent_framework.context import InMemoryStorageMiddleware
+    Created fresh for each agent.run() call. Providers read from and write to
+    the mutable fields to add context before invocation and process responses after.
 
-# Explicit middleware configuration (same behavior as default, but explicit)
-agent = ChatAgent(
-    chat_client=client,
-    name="assistant",
-    context_middleware=[
-        InMemoryStorageMiddleware(source_id="memory")
-    ]
-)
+    Attributes:
+        session_id: The ID of the current session
+        service_session_id: Service-managed session ID (if present, service handles storage)
+        input_messages: The new messages being sent to the agent (read-only, set by caller)
+        context_messages: Dict mapping source_id -> messages added by that provider.
+            Maintains insertion order (provider execution order). Use add_messages()
+            to add messages with proper source attribution.
+        instructions: Additional instructions - providers can append here
+        tools: Additional tools - providers can append here
+        response_messages: After invocation, contains the agent's response (set by agent).
+            READ-ONLY - modifications are ignored. Use AgentMiddleware to modify responses.
+        options: Options passed to agent.run() - READ-ONLY, for reflection only
+        metadata: Shared metadata dictionary for cross-provider communication
 
-# Create session and chat
-session = agent.create_session()
-response = await agent.run("Hello!", session=session)
+    Note:
+        - `options` is read-only; changes will NOT be merged back into the agent run
+        - `response_messages` is read-only; use AgentMiddleware to modify responses
+        - `instructions` and `tools` are merged by the agent into the run options
+        - `context_messages` values are flattened in order when building the final input
+    """
 
-# Messages are automatically stored and loaded on next invocation
-response = await agent.run("What did I say before?", session=session)
-```
+    def __init__(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+        input_messages: list[ChatMessage],
+        context_messages: dict[str, list[ChatMessage]] | None = None,
+        instructions: list[str] | None = None,
+        tools: list[ToolProtocol] | None = None,
+        response_messages: list[ChatMessage] | None = None,
+        options: dict[str, Any] | None = None,
+        metadata: dict[str, Any] | None = None,
+    ):
+        self.session_id = session_id
+        self.service_session_id = service_session_id
+        self.input_messages = input_messages
+        self.context_messages: dict[str, list[ChatMessage]] = context_messages or {}
+        self.instructions: list[str] = instructions or []
+        self.tools: list[ToolProtocol] = tools or []
+        self.response_messages = response_messages
+        self.options = options or {}  # READ-ONLY - for reflection only
+        self.metadata = metadata or {}
 
-### Example 1b: Using Middleware Factory for Per-Session State
+    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+        """Add context messages from a specific source.
 
-```python
-from agent_framework import ChatAgent
-from agent_framework.context import ContextMiddleware, SessionContext
+        Messages are stored keyed by source_id, maintaining insertion order
+        based on provider execution order.
 
-class SessionSpecificMiddleware(ContextMiddleware):
-    """Middleware that stores state per session."""
+        Args:
+            source_id: The provider source_id adding these messages
+            messages: The messages to add
+        """
+        if source_id not in self.context_messages:
+            self.context_messages[source_id] = []
+        self.context_messages[source_id].extend(messages)
 
-    def __init__(self, source_id: str, session_id: str | None):
-        super().__init__(source_id=source_id)
-        self.session_id = session_id
-        self.invocation_count = 0  # Per-session counter
+    def add_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
+        """Add instructions to be prepended to the conversation.
 
-    async def process(self, context: SessionContext, next) -> None:
-        self.invocation_count += 1
-        context.add_instructions(
-            self.source_id,
-            f"This is invocation #{self.invocation_count} in session {self.session_id}"
-        )
-        await next(context)
+        Instructions are added to a flat list and are not keyed by source.
+        The source_id is accepted for API consistency but is not recorded here.
 
+        Args:
+            source_id: The provider source_id adding these instructions
+            instructions: A single instruction string or sequence of strings
+        """
+        if isinstance(instructions, str):
+            instructions = [instructions]
+        self.instructions.extend(instructions)
 
-# Factory function - receives session_id when session is created
-def create_session_middleware(session_id: str | None) -> ContextMiddleware:
-    return SessionSpecificMiddleware(
-        source_id="session_tracker",
-        session_id=session_id,
-    )
+    def add_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
+        """Add tools to be available for this invocation.
 
+        Tools that expose a dict `metadata` attribute are tagged with the source_id.
 
-# Agent with factory - each session gets its own middleware instance
-agent = ChatAgent(
-    chat_client=client,
-    name="assistant",
-    context_middleware=[
-        InMemoryStorageMiddleware(source_id="memory"),  # Instance (shared)
-        create_session_middleware,  # Factory (per-session)
-    ]
-)
+        Args:
+            source_id: The provider source_id adding these tools
+            tools: The tools to add
+        """
+        for tool in tools:
+            if hasattr(tool, 'metadata') and isinstance(tool.metadata, dict):
+                tool.metadata["context_source"] = source_id
+        self.tools.extend(tools)
 
-# Each session gets a fresh SessionSpecificMiddleware instance
-session1 = agent.create_session()
-session2 = agent.create_session()
-# session1 and session2 have independent invocation_count
-```
+    def get_messages(
+        self,
+        sources: Sequence[str] | None = None,
+        exclude_sources: Sequence[str] | None = None,
+    ) -> list[ChatMessage]:
+        """Get context messages, optionally filtered by source.
 
-### Example 2: RAG + Memory + Audit (All StorageContextMiddleware)
+        Returns messages in provider execution order (dict insertion order).
 
-```python
-from agent_framework import ChatAgent
-from agent_framework.azure import CosmosStorageMiddleware, AzureAISearchContextMiddleware
-from agent_framework.redis import RedisStorageMiddleware
+        Args:
+            sources: If provided, only include messages from these sources
+            exclude_sources: If provided, exclude messages from these sources
 
-# RAG middleware that injects relevant documents
-search_middleware = AzureAISearchContextMiddleware(
-    source_id="rag",
-    endpoint="https://...",
-    index_name="documents",
-)
+        Returns:
+            Flattened list of messages in provider execution order
+        """
+        result: list[ChatMessage] = []
+        for source_id, messages in self.context_messages.items():
+            if sources is not None and source_id not in sources:
+                continue
+            if exclude_sources is not None and source_id in exclude_sources:
+                continue
+            result.extend(messages)
+        return result
 
-# Primary memory storage (loads + stores)
-# load_messages=True (default) - loads and stores messages
-memory_middleware = RedisStorageMiddleware(
-    source_id="memory",
-    redis_url="redis://...",
-)
+    def get_all_messages(
+        self,
+        *,
+        include_input: bool = False,
+        include_response: bool = False,
+    ) -> list[ChatMessage]:
+        """Get all messages, optionally including input and response.
 
-# Audit storage - SAME CLASS, different configuration
-# load_messages=False = never loads, just stores for audit
-audit_middleware = CosmosStorageMiddleware(
-    source_id="audit",
-    connection_string="...",
-    load_messages=False,  # Don't load - just store for audit
-)
+        Returns messages in the order they would appear in a full conversation:
+        1. Context messages (from providers, in execution order)
+        2. Input messages (if include_input=True)
+        3. Response messages (if include_response=True)
 
-agent = ChatAgent(
-    chat_client=client,
-    name="assistant",
-    context_middleware=[
-        memory_middleware,   # First: loads history
-        search_middleware,   # Second: adds RAG context
-        audit_middleware,    # Third: stores for audit (no load)
-    ]
-)
+        Args:
+            include_input: If True, append input_messages after context
+            include_response: If True, append response_messages at the end
+
+        Returns:
+            Flattened list of messages in conversation order
+        """
+        result: list[ChatMessage] = []
+        for messages in self.context_messages.values():
+            result.extend(messages)
+        if include_input and self.input_messages:
+            result.extend(self.input_messages)
+        if include_response and self.response_messages:
+            result.extend(self.response_messages)
+        return result
 ```
 
-### Example 3: Custom Context Middleware (Onion Pattern)
+##### ContextProvider
 
 ```python
-from agent_framework.context import ContextMiddleware, SessionContext
+class ContextProvider(ABC):
+    """Base class for context providers (hooks pattern).
 
-class TimeContextMiddleware(ContextMiddleware):
-    """Adds current time to the context."""
+    Context providers participate in the context engineering pipeline,
+    adding context before model invocation and processing responses after.
+
+    Attributes:
+        source_id: Unique identifier for this provider instance (required).
+            Used for message/tool attribution so other providers can filter.
+    """
 
     def __init__(self, source_id: str):
-        super().__init__(source_id=source_id)
+        """Initialize the provider.
+
+        Args:
+            source_id: Unique identifier for this provider instance.
+                Used for message/tool attribution.
+        """
+        self.source_id = source_id
 
-    async def process(
+    async def before_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
-        next
+        state: dict[str, Any],
     ) -> None:
-        from datetime import datetime
-
-        # PRE: Add time instruction using explicit method
-        context.add_instructions(
-            self.source_id,
-            f"Current date and time: {datetime.now().isoformat()}"
-        )
-
-        # Continue to next middleware
-        await next(context)
+        """Called before model invocation.
 
-        # POST: Nothing to do after invocation for this middleware
+        Override to add context (messages, instructions, tools) to the
+        SessionContext before the model is invoked.
 
+        Args:
+            agent: The agent running this invocation
+            session: The current session
+            context: The invocation context - add messages/instructions/tools here
+            state: The session's mutable state dict
+        """
+        pass
 
-class UserPreferencesMiddleware(ContextMiddleware):
-    """Tracks and applies user preferences from conversation."""
-
-    def __init__(self, source_id: str):
-        super().__init__(source_id=source_id)
-        self._preferences: dict[str, dict[str, Any]] = {}
-
-    async def process(
+    async def after_run(
         self,
+        agent: "ChatAgent",
+        session: AgentSession,
         context: SessionContext,
-        next
+        state: dict[str, Any],
     ) -> None:
-        # PRE: Add known preferences as instructions
-        prefs = self._preferences.get(context.session_id or "", {})
-        if prefs:
-            context.add_instructions(
-                self.source_id,
-                f"User preferences: {json.dumps(prefs)}"
-            )
+        """Called after model invocation.
 
-        # Continue to next middleware and model invocation
-        await next(context)
+        Override to process the response (store messages, extract info, etc.).
+        The context.response_messages will be populated at this point.
 
-        # POST: Extract preferences from response
-        for msg in context.response_messages or []:
-            if "preference:" in msg.text.lower():
-                # Store extracted preference for future sessions
-                pass
+        Args:
+            agent: The agent that ran this invocation
+            session: The current session
+            context: The invocation context with response_messages populated
+            state: The session's mutable state dict
+        """
+        pass
 
+    async def serialize(self) -> Any:
+        """Serialize provider state. Default returns None (no state)."""
+        return None
 
-# Compose middleware - each with mandatory source_id
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware(source_id="memory"),
-        TimeContextMiddleware(source_id="time"),
-        UserPreferencesMiddleware(source_id="prefs"),
-    ]
-)
+    async def restore(self, state: Any) -> None:
+        """Restore provider state from serialized object."""
+        pass
 ```
 
-### Example 4: Filtering by Source (Using Dict-Based Context)
+##### HistoryProvider
 
 ```python
-class SelectiveContextMiddleware(ContextMiddleware):
-    """Middleware that only processes messages from specific sources."""
+class HistoryProvider(ContextProvider):
+    """Base class for conversation history storage providers.
 
-    def __init__(self, source_id: str):
-        super().__init__(source_id=source_id)
+    A single class that can be configured for different use cases:
+    - Primary memory storage (loads + stores messages)
+    - Audit/logging storage (stores only, doesn't load)
+    - Evaluation storage (stores only for later analysis)
 
-    async def process(
-        self,
-        context: SessionContext,
-        next
-    ) -> None:
-        # Check what sources have added messages so far
-        print(f"Sources so far: {list(context.context_messages.keys())}")
+    Loading behavior (when to add messages to context_messages[source_id]):
+    - `load_messages=True` (default): Load messages from storage
+    - `load_messages=False`: Skip loading (before_run is a no-op)
 
-        # Get messages excluding RAG context
-        non_rag_messages = context.get_messages(exclude_sources=["rag"])
+    Storage behavior:
+    - `store_inputs`: Store input messages (default True)
+    - `store_responses`: Store response messages (default True)
+    - Storage always happens unless explicitly disabled, regardless of load_messages
 
-        # Or get only memory messages
-        if "memory" in context.context_messages:
-            memory_only = context.context_messages["memory"]
+    Warning: When the agent validates its providers (at run time), a warning is emitted if:
+    - Multiple history providers have `load_messages=True` (likely duplicate loading)
+    - Zero history providers have `load_messages=True` (likely missing primary storage)
 
-        # Do something with filtered messages...
-        # e.g., sentiment analysis, topic extraction
+    Examples:
+        # Primary memory - loads and stores
+        memory = InMemoryHistoryProvider(source_id="memory")
 
-        # Continue to next middleware
-        await next(context)
+        # Audit storage - stores only, doesn't add to context
+        audit = RedisHistoryProvider(
+            source_id="audit",
+            load_messages=False,
+            redis_url="redis://...",
+        )
 
+        # Full audit - stores everything including RAG context
+        full_audit = CosmosHistoryProvider(
+            source_id="full_audit",
+            load_messages=False,
+            store_context_messages=True,
+        )
+    """
 
-class RAGContextMiddleware(ContextMiddleware):
-    """Middleware that adds RAG context."""
+    def __init__(
+        self,
+        source_id: str,
+        *,
+        load_messages: bool = True,
+        store_responses: bool = True,
+        store_inputs: bool = True,
+        store_context_messages: bool = False,
+        store_context_from: Sequence[str] | None = None,
+    ):
+        super().__init__(source_id)
+        self.load_messages = load_messages
+        self.store_responses = store_responses
+        self.store_inputs = store_inputs
+        self.store_context_messages = store_context_messages
+        self.store_context_from = list(store_context_from) if store_context_from else None
 
-    def __init__(self, source_id: str):
-        super().__init__(source_id=source_id)
+    @abstractmethod
+    async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
+        """Retrieve stored messages for this session."""
+        pass
 
-    async def process(
+    @abstractmethod
+    async def save_messages(
         self,
-        context: SessionContext,
-        next
+        session_id: str | None,
+        messages: Sequence[ChatMessage]
     ) -> None:
-        # Search for relevant documents based on input
-        relevant_docs = await self._search(context.input_messages)
+        """Persist messages for this session."""
+        pass
 
-        # Add RAG context using explicit method
-        rag_messages = [
-            ChatMessage(role="system", text=f"Relevant info: {doc}")
-            for doc in relevant_docs
-        ]
-        context.add_messages(self.source_id, rag_messages)
+    def _get_context_messages_to_store(self, context: SessionContext) -> list[ChatMessage]:
+        """Get context messages that should be stored based on configuration."""
+        if not self.store_context_messages:
+            return []
+        if self.store_context_from is not None:
+            return context.get_messages(sources=self.store_context_from)
+        else:
+            return context.get_messages(exclude_sources=[self.source_id])
 
-        await next(context)
+    async def before_run(self, agent, session, context, state) -> None:
+        """Load history into context if configured."""
+        if self.load_messages:
+            history = await self.get_messages(context.session_id)
+            context.add_messages(self.source_id, history)
+
+    async def after_run(self, agent, session, context, state) -> None:
+        """Store messages based on configuration."""
+        messages_to_store: list[ChatMessage] = []
+        messages_to_store.extend(self._get_context_messages_to_store(context))
+        if self.store_inputs:
+            messages_to_store.extend(context.input_messages)
+        if self.store_responses and context.response_messages:
+            messages_to_store.extend(context.response_messages)
+        if messages_to_store:
+            await self.save_messages(context.session_id, messages_to_store)
 ```
 
-### Example 5: Explicit Storage Configuration for Service-Managed Sessions
+##### AgentSession
 
 ```python
-# StorageContextMiddleware uses explicit configuration - no automatic detection.
-# load_messages=True (default): Load messages from storage
-# load_messages=False: Skip loading (useful for audit-only storage)
-
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        RedisStorageMiddleware(
-            source_id="memory",
-            redis_url="redis://...",
-            # load_messages=True is the default
-        )
-    ]
-)
-
-session = agent.create_session()
-
-# Normal run - loads and stores messages
-response = await agent.run("Hello!", session=session)
-
-# For service-managed sessions, configure storage explicitly:
-# - Use load_messages=False when service handles history
-service_storage = RedisStorageMiddleware(
-    source_id="audit",
-    redis_url="redis://...",
-    load_messages=False,  # Don't load - service manages history
-)
-
-agent_with_service = ChatAgent(
-    chat_client=client,
-    context_middleware=[service_storage]
-)
-service_session = agent_with_service.create_session(service_session_id="thread_abc123")
-response = await agent_with_service.run("Hello!", session=service_session)
-# Storage middleware stores for audit but doesn't load (service handles history)
-```
-
-### Example 6: Multiple Instances of Same Middleware Type
-
-```python
-# You can have multiple instances of the same middleware class
-# by using different source_ids
-
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        # Primary storage for conversation history
-        RedisStorageMiddleware(
-            source_id="conversation_memory",
-            redis_url="redis://primary...",
-            load_messages=True,  # This one loads
-        ),
-        # Secondary storage for audit (different Redis instance)
-        RedisStorageMiddleware(
-            source_id="audit_log",
-            redis_url="redis://audit...",
-            load_messages=False,  # This one just stores
-        ),
-    ]
-)
-# Warning will NOT be logged because only one has load_messages=True
-```
-
-### Example 7: Middleware Ordering - RAG Before vs After Memory
-
-The order of middleware determines what context each middleware can see. This is especially important for RAG, which may benefit from seeing conversation history.
-
-```python
-from agent_framework import ChatAgent
-from agent_framework.context import InMemoryStorageMiddleware, ContextMiddleware, SessionContext
-
-class RAGContextMiddleware(ContextMiddleware):
-    """RAG middleware that retrieves relevant documents based on available context."""
-
-    async def process(self, context: SessionContext, next) -> None:
-        # Build query from what we can see
-        query_parts = []
-
-        # We can always see the current input
-        for msg in context.input_messages:
-            query_parts.append(msg.text)
-
-        # Can we see history? Depends on middleware order!
-        history = context.get_all_messages()  # Gets context from middleware that ran before us
-        if history:
-            # Include recent history for better RAG context
-            recent = history[-3:]  # Last 3 messages
-            for msg in recent:
-                query_parts.append(msg.text)
-
-        query = " ".join(query_parts)
-        documents = await self._retrieve_documents(query)
-
-        # Add retrieved documents as context
-        rag_messages = [ChatMessage.system(f"Relevant context:\n{doc}") for doc in documents]
-        context.add_messages(self.source_id, rag_messages)
-
-        await next(context)
-
-    async def _retrieve_documents(self, query: str) -> list[str]:
-        # ... vector search implementation
-        return ["doc1", "doc2"]
+import uuid
+import warnings
+from collections.abc import Sequence
 
 
-# =============================================================================
-# SCENARIO A: RAG runs BEFORE Memory
-# =============================================================================
-# RAG only sees the current input message - no conversation history
-# Use when: RAG should be based purely on the current query
+class AgentSession:
+    """A conversation session with an agent.
 
-agent_rag_first = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        RAGContextMiddleware("rag"),           # Runs first - only sees input_messages
-        InMemoryStorageMiddleware("memory"),   # Runs second - loads/stores history
-    ]
-)
+    Lightweight state container. Provider instances are owned by the agent,
+    not the session. The session only holds session IDs and a mutable state dict.
 
-# Flow:
-# 1. RAG.process() BEFORE next():
-#    - context.input_messages = ["What's the weather?"]
-#    - context.get_all_messages() = []  (empty - memory hasn't run yet)
-#    - RAG query based on: "What's the weather?" only
-#    - Adds: context_messages["rag"] = [retrieved docs]
-#
-# 2. Memory.process() BEFORE next():
-#    - context.get_all_messages() = [rag docs]  (sees RAG context)
-#    - Loads history: context_messages["memory"] = [previous conversation]
-#
-# 3. Agent invocation with: history + rag docs + input
-#
-# 4. Memory.process() AFTER next():
-#    - Stores: input + response (not RAG docs by default)
+    Attributes:
+        session_id: Unique identifier for this session
+        service_session_id: Service-managed session ID (if using service-side storage)
+        state: Mutable state dict shared with all providers
+    """
 
+    def __init__(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+    ):
+        """Initialize the session.
 
-# =============================================================================
-# SCENARIO B: RAG runs AFTER Memory
-# =============================================================================
-# RAG sees conversation history - can use it for better retrieval
-# Use when: RAG should consider conversation context for better results
+        Note: Prefer using agent.create_session() instead of direct construction.
 
-agent_memory_first = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),   # Runs first - loads history
-        RAGContextMiddleware("rag"),           # Runs second - sees history + input
-    ]
-)
+        Args:
+            session_id: Optional session ID (generated if not provided)
+            service_session_id: Optional service-managed session ID
+        """
+        self._session_id = session_id or str(uuid.uuid4())
+        self.service_session_id = service_session_id
+        self.state: dict[str, Any] = {}
 
-# Flow:
-# 1. Memory.process() BEFORE next():
-#    - Loads history: context_messages["memory"] = [previous conversation]
-#
-# 2. RAG.process() BEFORE next():
-#    - context.input_messages = ["What's the weather?"]
-#    - context.get_all_messages() = [previous conversation]  (sees history!)
-#    - RAG query based on: recent history + "What's the weather?"
-#    - Better retrieval because RAG understands conversation context
-#    - Adds: context_messages["rag"] = [more relevant docs]
-#
-# 3. Agent invocation with: history + rag docs + input
-#
-# 4. Memory.process() AFTER next():
-#    - Stores: input + response
+    @property
+    def session_id(self) -> str:
+        """The unique identifier for this session."""
+        return self._session_id
 
 
-# =============================================================================
-# SCENARIO C: RAG after Memory, with selective storage
-# =============================================================================
-# Memory first for better RAG, plus separate audit that stores RAG context
+# Example of how agent creates sessions and runs providers:
+class ChatAgent:
+    def __init__(
+        self,
+        chat_client: ...,
+        *,
+        context_providers: Sequence[ContextProvider] | None = None,
+    ):
+        self._context_providers = list(context_providers or [])
 
-agent_full_context = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),   # Primary history storage
-        RAGContextMiddleware("rag"),           # Gets history context for better retrieval
-        PersonaContextMiddleware("persona"),   # Adds persona instructions
-        # Audit storage - stores everything including RAG results
-        CosmosStorageMiddleware(
-            "audit",
-            load_messages=False,               # Don't load (memory handles that)
-            store_context_messages=True,       # Store RAG + persona context too
-        ),
-    ]
-)
-```
+    def create_session(
+        self,
+        *,
+        session_id: str | None = None,
+        service_session_id: str | None = None,
+    ) -> AgentSession:
+        """Create a new lightweight session.
 
-### Example 8: Understanding the Onion Pattern for Storage
+        Args:
+            session_id: Optional session ID (generated if not provided)
+            service_session_id: Optional service-managed session ID
+        """
+        return AgentSession(
+            session_id=session_id,
+            service_session_id=service_session_id,
+        )
 
-```python
-# Detailed breakdown of what storage middleware sees at each phase:
-#
-# Middleware order: [Storage, RAG, Persona]
-#
-# BEFORE next() - Storage pre-processing:
-#   context.context_messages = {}  (empty, no one has added yet)
-#   context.input_messages = [user's message]
-#   context.response_messages = None
-#
-# BEFORE next() - RAG pre-processing:
-#   context.context_messages = {"memory": [...]}  (storage added history)
-#
-# BEFORE next() - Persona pre-processing:
-#   context.context_messages = {"memory": [...], "rag": [...]}
-#
-# --- Agent invocation happens ---
-#
-# AFTER next() - Persona post-processing:
-#   context.response_messages = [assistant's response]
-#
-# AFTER next() - RAG post-processing:
-#   (same state)
-#
-# AFTER next() - Storage post-processing:
-#   context.context_messages = {"memory": [...], "rag": [...], "persona": [...]}
-#   context.response_messages = [assistant's response]
-#
-#   Storage NOW has access to ALL context if store_context_messages=True
+    def _ensure_default_storage(self, session: AgentSession, options: dict[str, Any]) -> None:
+        """Add default InMemoryHistoryProvider if needed.
 
-class StorageWithLogging(StorageContextMiddleware):
-    """Example showing what storage sees at each phase."""
+        Default storage is added when ALL of these are true:
+        - No service_session_id (service not managing storage)
+        - options.store is not True (user not expecting service storage)
+        - No context_providers configured at all
+        """
+        if options.get("store") is True:
+            return
+        if session.service_session_id is not None:
+            return
+        if self._context_providers:
+            return
+        # Add default in-memory storage
+        self._context_providers.append(InMemoryHistoryProvider("memory"))
 
-    async def process(self, context: SessionContext, next) -> None:
-        # PRE: Load history
-        print(f"PRE - context sources: {list(context.context_messages.keys())}")
-        # Output: PRE - context sources: []
+    def _validate_providers(self) -> None:
+        """Warn if history provider configuration looks like a mistake."""
+        storage_providers = [
+            p for p in self._context_providers
+            if isinstance(p, HistoryProvider)
+        ]
+        if not storage_providers:
+            return
+        loaders = [p for p in storage_providers if p.load_messages is True]
+        if len(loaders) > 1:
+            warnings.warn(
+                "Multiple history providers configured to load messages: "
+                f"{[p.source_id for p in loaders]}. "
+                "This may cause duplicate messages in context.",
+                UserWarning,
+            )
+        elif len(loaders) == 0:
+            warnings.warn(
+                "History providers configured but none have load_messages=True: "
+                f"{[p.source_id for p in storage_providers]}. "
+                "No conversation history will be loaded.",
+                UserWarning,
+            )
 
-        if self._should_load_messages(context):
-            history = await self.get_messages(context.session_id)
-            context.add_messages(self.source_id, history)
+    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> ...:
+        """Run the agent with the given input."""
+        options = options or {}
 
-        await next(context)
+        # Add default storage if needed, then validate the provider configuration
+        self._ensure_default_storage(session, options)
+        self._validate_providers()
 
-        # POST: Now we see everything
-        print(f"POST - context sources: {list(context.context_messages.keys())}")
-        # Output: POST - context sources: ['memory', 'rag', 'persona']
-
-        # Store based on configuration
-        # 1. Determine which context messages to include
-        if self.store_context_messages:
-            if self.store_context_from:
-                # Only from specific sources
-                context_msgs = context.get_messages(sources=self.store_context_from)
-            else:
-                # All context messages from all middleware
-                context_msgs = context.get_all_messages()
-        else:
-            # No context from other middleware - typically just our own loaded history
-            context_msgs = []
+        context = SessionContext(
+            session_id=session.session_id,
+            service_session_id=session.service_session_id,
+            input_messages=[...],
+            options=options,
+        )
 
-        # 2. Build final list: context + input + response
-        messages_to_store = list(context_msgs)
-        if self.store_inputs:
-            messages_to_store.extend(context.input_messages)
-        if self.store_responses:
-            messages_to_store.extend(context.response_messages or [])
+        # Before-run providers (forward order)
+        for provider in self._context_providers:
+            await provider.before_run(self, session, context, session.state)
 
-        await self.save_messages(context.session_id, messages_to_store)
-```
+        # ... assemble final messages from context, invoke model ...
 
----
-
-### Workplan
+        # After-run providers (reverse order)
+        for provider in reversed(self._context_providers):
+            await provider.after_run(self, session, context, session.state)
 
-The implementation is split into 2 PRs to limit scope and simplify review.
+    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
+        """Serialize a session's state for persistence."""
+        provider_states: dict[str, Any] = {}
+        for provider in self._context_providers:
+            state = await provider.serialize()
+            if state is not None:
+                provider_states[provider.source_id] = state
+        return {
+            "session_id": session.session_id,
+            "service_session_id": session.service_session_id,
+            "state": session.state,
+            "provider_states": provider_states,
+        }
 
+    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
+        """Restore a session from serialized state."""
+        session = AgentSession(
+            session_id=serialized["session_id"],
+            service_session_id=serialized.get("service_session_id"),
+        )
+        session.state = serialized.get("state", {})
+        provider_states = serialized.get("provider_states", {})
+        for provider in self._context_providers:
+            if provider.source_id in provider_states:
+                await provider.restore(provider_states[provider.source_id])
+        return session
 ```
-PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
-```
-
-#### PR 1: New Types
-
-**Goal:** Create all new types. No changes to existing code yet.
-
-**Core Package - `packages/core/agent_framework/_sessions.py`:**
-- [ ] `SessionContext` class with explicit add/get methods
-- [ ] `ContextPlugin` base class with `before_run()`/`after_run()`
-- [ ] `StorageContextPlugin` derived class with load_messages/store flags
-- [ ] Add `serialize()` and `restore()` methods to `ContextPlugin` base class
-- [ ] `AgentSession` class with `state: dict[str, Any]`
-- [ ] `InMemoryStoragePlugin(StorageContextPlugin)`
-
-**External Packages:**
-- [ ] `packages/azure-ai-search/` - create `AzureAISearchContextPlugin`
-- [ ] `packages/redis/` - create `RedisStoragePlugin`
-- [ ] `packages/mem0/` - create `Mem0ContextPlugin`
-
-**Testing:**
-- [ ] Unit tests for `SessionContext` methods (add_messages, get_messages, add_instructions, add_tools)
-- [ ] Unit tests for `StorageContextPlugin` load/store flags
-- [ ] Unit tests for `InMemoryStoragePlugin` serialize/restore
-- [ ] Unit tests for source attribution (mandatory source_id)
-
----
-
-#### PR 2: Agent Integration + Cleanup
-
-**Goal:** Wire up new types into `ChatAgent` and remove old types.
-
-**Changes to `ChatAgent`:**
-- [ ] Replace `thread` parameter with `session` in `agent.run()`
-- [ ] Add `context_plugins` parameter to `ChatAgent.__init__()`
-- [ ] Add `create_session()` method
-- [ ] Add `serialize_session()` / `restore_session()` methods
-- [ ] Wire up plugin iteration (before_run forward, after_run reverse)
-- [ ] Add validation warning if multiple/zero storage plugins have `load_messages=True`
-- [ ] Wire up default `InMemoryStoragePlugin` behavior (auto-add when no plugins and no service_session_id)
-
-**Remove Legacy Types:**
-- [ ] `packages/core/agent_framework/_memory.py` - remove `ContextProvider` class
-- [ ] `packages/core/agent_framework/_threads.py` - remove `ChatMessageStore`, `ChatMessageStoreProtocol`, `AgentThread`
-- [ ] `packages/core/agent_framework/__init__.py` - remove old exports, add new exports from `_sessions.py`
-- [ ] Remove old provider classes from `azure-ai-search`, `redis`, `mem0`
-
-**Documentation & Samples:**
-- [ ] Update all samples in `samples/` to use new API
-- [ ] Write migration guide
-- [ ] Update API documentation
-
-**Testing:**
-- [ ] Unit tests for plugin execution order (before_run forward, after_run reverse)
-- [ ] Unit tests for validation warnings (multiple/zero loaders)
-- [ ] Unit tests for session serialization/deserialization
-- [ ] Integration test: agent with `context_plugins` + `session` works
-- [ ] Integration test: full conversation with memory persistence
-- [ ] Ensure all existing tests still pass (with updated API)
-- [ ] Verify no references to removed types remain
-
----
-
-#### CHANGELOG (single entry for release)
-
-- **[BREAKING]** Replaced `ContextProvider` with `ContextPlugin` (hooks pattern with `before_run`/`after_run`)
-- **[BREAKING]** Replaced `ChatMessageStore` with `StorageContextPlugin`
-- **[BREAKING]** Replaced `AgentThread` with `AgentSession`
-- **[BREAKING]** Replaced `thread` parameter with `session` in `agent.run()`
-- Added `SessionContext` for invocation state with source attribution
-- Added `InMemoryStoragePlugin` for conversation history
-- Added session serialization (`serialize_session`, `restore_session`)
-
----
-
-#### Estimated Sizes
-
-| PR | New Lines | Modified Lines | Risk |
-|----|-----------|----------------|------|
-| PR1 | ~500 | ~0 | Low |
-| PR2 | ~150 | ~400 | Medium |

From 78272596dce95dd9dd263be6c724412fb1bfee52 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Mon, 9 Feb 2026 15:02:25 +0100
Subject: [PATCH 17/19] Refine ADR: serialization, ownership, decorators,
 session methods, exports

- Add to_dict()/from_dict() on AgentSession with 'type' discriminator
- Present serialization as Option A (direct) vs Option B (through agent)
- Rewrite ownership section as 2x2 matrix (orthogonal decision)
- Move Instance Ownership Options before Decision Outcome
- Fix get_session to use service_session_id, split from create_session
- Add decorator-based provider convenience API (@before_run/@after_run)
- Add _ prefix naming strategy for all PR1 types (core + external)
- Constructor compatibility table for existing providers
- Add load_messages=False skip logic to all agent run loops
- Clarify abstract vs non-abstract in execution pattern samples
- Update auto-provision: trigger on conversation_id or store=True
- Document root package exports (ContextProvider, HistoryProvider, etc.)
- Rename section heading to 'Key Design Considerations'
---
 .../00XX-python-context-middleware.md         | 806 ++++++++++--------
 1 file changed, 463 insertions(+), 343 deletions(-)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/00XX-python-context-middleware.md
index e1c24b4f7b..d324e6f79a 100644
--- a/docs/decisions/00XX-python-context-middleware.md
+++ b/docs/decisions/00XX-python-context-middleware.md
@@ -2,7 +2,7 @@
 # These are optional elements. Feel free to remove any of them.
 status: accepted
 contact: eavanvalkenburg
-date: 2026-02-05
+date: 2026-02-09
 deciders: eavanvalkenburg, markwallace-microsoft, sphenry, alliscode, johanst, brettcannon, westey-m
 consulted: taochenosu, moonbox3, dmytrostruk, giles17
 ---
@@ -16,7 +16,7 @@ The Agent Framework Python SDK currently has multiple abstractions for managing
 | Concept | Purpose | Location |
 |---------|---------|----------|
 | `ContextProvider` | Injects instructions, messages, and tools before/after invocations | `_memory.py` |
-| `ChatMessageStore` / `ChatMessageStoreProtocol` | Stores and retrieves conversation history | `_threads.py` |
+| `ChatMessageStore` | Stores and retrieves conversation history | `_threads.py` |
 | `AgentThread` | Manages conversation state and coordinates storage | `_threads.py` |
 
 This creates cognitive overhead for developers doing "Context Engineering" - the practice of dynamically managing what context (history, RAG results, instructions, tools) is sent to the model. Users must understand:
@@ -30,9 +30,9 @@ This creates cognitive overhead for developers doing "Context Engineering" - the
 
 - **Simplicity**: Reduce the number of concepts users must learn
 - **Composability**: Enable multiple context sources to be combined flexibly
-- **Consistency**: Follow existing patterns in the framework (middleware)
-- **Flexibility**: Support both stateless and session-specific middleware
-- **Attribution**: Enable tracking which middleware added which messages/tools
+- **Consistency**: Follow existing patterns in the framework
+- **Flexibility**: Support both stateless and session-specific context engineering
+- **Attribution**: Enable tracking which provider added which messages/tools
 - **Zero-config**: Simple use cases should work without configuration
 
 ## Related Issues
@@ -42,8 +42,8 @@ This ADR addresses the following issues from the parent issue [#3575](https://gi
 | Issue | Title | How Addressed |
 |-------|-------|---------------|
 | [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
-| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` (no params) and `agent.get_session_by_id(id)`. See [§9 Session Management Methods](#9-session-management-methods). |
-| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. See [§8 Serialization](#8-session-serializationdeserialization). |
+| [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` and `agent.get_session(service_session_id)`. See [§9 Session Management Methods](#9-session-management-methods). |
+| [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ No longer needed. `AgentSession` provides `to_dict()`/`from_dict()` for serialization. Providers write JSON-serializable values to `session.state`. See [§8 Serialization](#8-session-serializationdeserialization). |
 | [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: configure `load_messages=False` when service manages storage. Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
 | [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | 🔒 **Closed** - Superseded by this ADR. `ChatMessageStore` removed entirely, replaced by `StorageContextMiddleware`. |
 
@@ -74,8 +74,6 @@ class ContextProvider(ABC):
 ```
 
 **Limitations:**
-- Separate `invoking()` and `invoked()` methods make pre/post processing awkward
-- Returns a `Context` object that must be merged externally
 - No clear way to compose multiple providers
 - No source attribution for debugging
 
@@ -91,9 +89,10 @@ class ChatMessageStoreProtocol(Protocol):
 ```
 
 **Limitations:**
-- Only handles storage, no context injection
+- Only handles message storage, no context injection
 - Separate concept from `ContextProvider`
 - No control over what gets stored (RAG context vs user messages)
+- No control over which gets executed first, the `ContextProvider` or the `ChatMessageStore` (ordering ambiguity); the framework decides the order
 
 ### AgentThread (Current)
 
@@ -110,41 +109,40 @@ class AgentThread:
 
 **Limitations:**
 - Coordinates storage and context separately
-- Only one `context_provider` (no composition)
-- Naming confusion (`Thread` vs `Session`)
+- Only one `context_provider` and one `ChatMessageStore` (no composition)
 
-## Design Decisions Summary
+## Key Design Considerations
 
 The following key decisions shape the ContextProvider design:
 
 | # | Decision | Rationale |
 |---|----------|-----------|
-| 1 | **Agent vs Session Ownership** | Agent owns plugin instances; Session owns state as mutable dict. Plugins shared across sessions, state isolated per session. |
+| 1 | **Agent vs Session Ownership** | Agent owns provider instances; Session owns state as mutable dict. Providers shared across sessions, state isolated per session. |
 | 2 | **Execution Pattern** | **ContextProvider** with `before_run`/`after_run` methods (hooks pattern). Simpler mental model than wrapper/onion pattern. |
 | 3 | **State Management** | Whole state dict (`dict[str, Any]`) passed to each plugin. Dict is mutable, so no return value needed. |
-| 4 | **Default Storage at Runtime** | `InMemoryHistoryProvider` auto-added when no service_session_id, store≠True, and no plugins. Evaluated at runtime so users can modify pipeline first. |
+| 4 | **Default Storage at Runtime** | `InMemoryHistoryProvider` auto-added when no providers configured and `options.conversation_id` is set or `options.store` is True. Evaluated at runtime so users can modify pipeline first. |
 | 5 | **Multiple Storage Allowed** | Warn at session creation if multiple or zero history providers have `load_messages=True` (likely misconfiguration). |
 | 6 | **Single Storage Class** | One `HistoryProvider` configured for memory/audit/evaluation - no separate classes. |
 | 7 | **Mandatory source_id** | Required parameter forces explicit naming for attribution in `context_messages` dict. |
-| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For `HistoryProvider`, `before_run` is skipped entirely when `load_messages=False`. |
+| 8 | **Explicit Load Behavior** | `load_messages: bool = True` - explicit configuration with no automatic detection. For history, `before_run` is skipped entirely when `load_messages=False`. |
 | 9 | **Dict-based Context** | `context_messages: dict[str, list[ChatMessage]]` keyed by source_id maintains order and enables filtering. Messages can have an `attribution` marker in `additional_properties` for external filtering scenarios. |
 | 10 | **Selective Storage** | `store_context_messages` and `store_context_from` control what gets persisted from other plugins. |
-| 11 | **Tool Attribution** | `add_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
-| 12 | **Clean Break** | Remove `AgentThread`, old `ContextProvider`, `ChatMessageStore` completely; replace with new `ContextProvider` (hooks pattern), `HistoryProvider`, `AgentSession`. No compatibility shims (preview). |
+| 11 | **Tool Attribution** | `extend_tools()` automatically sets `tool.metadata["context_source"] = source_id`. |
+| 12 | **Clean Break** | Remove `AgentThread`, old `ContextProvider`, `ChatMessageStore` completely; replace with new `ContextProvider` (hooks pattern), `HistoryProvider`, `AgentSession`. PR1 uses temporary names (`_ContextProviderBase`, `_HistoryProviderBase`) to coexist with old types; PR2 renames to final names after old types are removed. No compatibility shims (preview). |
 | 13 | **Plugin Ordering** | User-defined order; storage sees prior plugins (pre-processing) or all plugins (post-processing). |
-| 14 | **Agent-owned Serialization** | `agent.serialize_session(session)` and `agent.restore_session(state)`. Agent handles all serialization. |
-| 15 | **Session Management Methods** | `agent.create_session()` (no required params) and `agent.get_session_by_id(id)` for clear lifecycle management. |
+| 14 | **Session Serialization via `to_dict`/`from_dict`** | `AgentSession` provides `to_dict()` and `from_dict()` for round-tripping. Providers must ensure values they write to `session.state` are JSON-serializable. No `serialize()`/`restore()` methods on providers. |
+| 15 | **Session Management Methods** | `agent.create_session()` and `agent.get_session(service_session_id)` for clear lifecycle management. |
 
 ## Considered Options
 
 ### Option 1: Status Quo - Keep Separate Abstractions
 
-Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concepts.
+Keep `ContextProvider`, `ChatMessageStore`, and `AgentThread` as separate concepts, with updated naming and minor improvements but no fundamental changes to the API or execution model.
 
 **Pros:**
 - No migration required
 - Familiar to existing users
-- Each concept has a clear, focused responsibility
+- Each concept has a focused responsibility
 - Existing documentation and examples remain valid
 
 **Cons:**
@@ -194,14 +192,15 @@ class ContextMiddleware(ABC):
 - Forgetting `await next(context)` silently breaks the chain
 - Stack depth increases with each middleware layer
 - Harder to implement middleware that only needs pre OR post processing
+- Streaming is more complicated
 
 ### Option 3: ContextHooks - Pre/Post Pattern
 
-Create a `ContextHooks` base class with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware. This includes a `StorageContextHooks` subclass specifically for history persistence.
+Create a `ContextHooks` base class with explicit `before_run()` and `after_run()` methods, diverging from the wrapper pattern used by middleware. This includes a `HistoryContextHooks` subclass specifically for history persistence.
 
 **Class hierarchy:**
 - `ContextHooks` (base) - for general context injection (RAG, instructions, tools)
-- `StorageContextHooks(ContextHooks)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
+- `HistoryContextHooks(ContextHooks)` - for conversation history storage (in-memory, Redis, Cosmos, etc.)
 
 ```python
 class ContextHooks(ABC):
@@ -285,11 +284,11 @@ agent = ChatAgent(
 
 **Cons:**
 - Diverges from the wrapper pattern used by `AgentMiddleware` and `ChatMiddleware`
-- Less powerful: cannot short-circuit the chain or implement retry logic
+- Less powerful: cannot short-circuit the chain or implement retry logic (mitigation: `AgentMiddleware` still exists and can be used for this scenario)
 - No "around" advice: cannot wrap invocation in try/catch or timing block
 - Exception in `before_run` may leave state inconsistent if no cleanup in `after_run`
 - Two methods to implement instead of one (though both are optional)
-- Harder to share state between before/after (need instance variables)
+- Harder to share state between before/after (requires instance variables; mitigated by using the session state dict)
 - Cannot control whether subsequent hooks run (no early termination)
 
 ## Detailed Design
@@ -305,22 +304,34 @@ The core difference between the two options is the execution model:
 class ContextMiddleware(ABC):
     @abstractmethod
     async def process(self, context: SessionContext, next: ContextMiddlewareNext) -> None:
-        # Pre-processing
-        context.add_messages(self.source_id, [...])
-        await next(context)  # Call next middleware
-        # Post-processing
-        await self.store(context.response_messages)
+        """Abstract — subclasses must implement the full pre/invoke/post flow."""
+        ...
+
+# Subclass must implement process():
+class RAGMiddleware(ContextMiddleware):
+    async def process(self, context, next):
+        context.add_messages(self.source_id, [...])  # Pre-processing
+        await next(context)                           # Call next middleware
+        await self.store(context.response_messages)   # Post-processing
 ```
 
 **Option 3 - Hooks (Linear):**
 ```python
-class ContextHooks(ABC):
+class ContextHooks:
     async def before_run(self, context: SessionContext) -> None:
-        """Called before model invocation."""
-        context.add_messages(self.source_id, [...])
+        """Default no-op. Override to add pre-invocation logic."""
+        pass
 
     async def after_run(self, context: SessionContext) -> None:
-        """Called after model invocation."""
+        """Default no-op. Override to add post-invocation logic."""
+        pass
+
+# Subclass overrides only the hooks it needs:
+class RAGHooks(ContextHooks):
+    async def before_run(self, context):
+        context.add_messages(self.source_id, [...])
+
+    async def after_run(self, context):
         await self.store(context.response_messages)
 ```
 
@@ -343,40 +354,19 @@ Middleware (Wrapper/Onion):            Hooks (Linear):
 
 ### 2. Agent vs Session Ownership
 
-Both approaches use the same ownership model:
-- **Agent** owns the configuration (instances or factories)
-- **AgentSession** owns the resolved pipeline (created at runtime)
+Where provider instances live (agent-level vs session-level) is an orthogonal decision that applies to both execution patterns. Each combination has different consequences:
 
-**Middleware:**
-```python
-agent = ChatAgent(
-    chat_client=client,
-    context_middleware=[
-        InMemoryStorageMiddleware("memory"),
-        RAGContextMiddleware("rag"),
-    ]
-)
-session = agent.create_session()
-```
+|  | **Agent owns instances** | **Session owns instances** |
+|--|--------------------------|---------------------------|
+| **Middleware (Option 2)** | Agent holds the middleware chain; all sessions share it. Per-session state must be externalized (e.g., passed via context). Pipeline ordering is fixed across sessions. | Each session gets its own middleware chain (via factories). Middleware can hold per-session state internally. Requires factory pattern to construct per-session instances. |
+| **Hooks (Option 3)** | Agent holds provider instances; all sessions share them. Per-session state lives in `session.state` dict. Simple flat iteration, no pipeline to construct. | Each session gets its own provider instances (via factories). Providers can hold per-session state internally. Adds factory complexity without the pipeline benefit. |
 
-**Hooks:**
-```python
-agent = ChatAgent(
-    chat_client=client,
-    context_hooks=[
-        InMemoryStorageHooks("memory"),
-        RAGContextHooks("rag"),
-    ]
-)
-session = agent.create_session()
-```
+**Key trade-offs:**
 
-**Comparison to Current:**
-| Aspect | AgentThread (Current) | AgentSession (New) |
-|--------|----------------------|-------------------|
-| Storage | `message_store` attribute | Via storage middleware/hooks in pipeline |
-| Context | `context_provider` attribute | Via any middleware/hooks in pipeline |
-| Composition | One of each | Unlimited middleware/hooks |
+- **Agent-owned + Middleware**: The nested call chain makes it awkward to share — each `process()` call captures `next` in its closure, which may carry session-specific assumptions. Externalizing state is harder when it's interleaved with the wrapping flow.
+- **Session-owned + Middleware**: Natural fit — each session gets its own chain with isolated state. But requires factories and heavier sessions.
+- **Agent-owned + Hooks**: Natural fit — `before_run`/`after_run` are stateless calls that receive everything they need as parameters (`session`, `context`, `state`). No pipeline to construct, lightweight sessions.
+- **Session-owned + Hooks**: Works but adds factory overhead without clear benefit — hooks don't need per-instance state since `session.state` handles isolation.
 
 ### 3. Unified Storage
 
@@ -531,72 +521,56 @@ agent = ChatAgent(context_hooks=[create_cache])
 
 ### 8. Session Serialization/Deserialization
 
-Both approaches use the same agent-owned serialization pattern:
+There are two approaches to session serialization:
 
-**Base class (both approaches):**
-```python
-# ContextMiddleware or ContextHooks - same interface
-async def serialize(self) -> Any:
-    """Serialize state. Default returns None (no state)."""
-    return None
-
-async def restore(self, state: Any) -> None:
-    """Restore state from serialized object."""
-    pass
-```
+**Option A: Direct serialization on `AgentSession`**
+
+The session itself provides `to_dict()` and `from_dict()`. The caller controls when and where to persist:
 
-**Agent methods (identical for both):**
 ```python
-class ChatAgent:
-    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
-        """Serialize a session's state for persistence."""
-        middleware_states: dict[str, Any] = {}
-        if session.context_pipeline:
-            for item in session.context_pipeline:
-                state = await item.serialize()
-                if state is not None:
-                    middleware_states[item.source_id] = state
-        return {
-            "session_id": session.session_id,
-            "service_session_id": session.service_session_id,
-            "middleware_states": middleware_states,
-        }
+# Serialize
+data = session.to_dict()          # → {"type": "session", "session_id": ..., "service_session_id": ..., "state": {...}}
+json_str = json.dumps(data)       # Store anywhere (database, file, cache, etc.)
 
-    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
-        """Restore a session from serialized state."""
-        ...
+# Deserialize
+data = json.loads(json_str)
+session = AgentSession.from_dict(data)  # Reconstructs session with all state intact
 ```
 
+**Option B: Serialization through the agent**
+
+The agent provides `save_session()`/`load_session()` methods that coordinate with providers (e.g., letting providers hook into the serialization process, or validating state before persisting). This adds flexibility but also complexity — providers would need lifecycle hooks for serialization, and the agent becomes responsible for persistence concerns.
+
+**Provider contract (both options):** Any values a provider writes to `session.state`, whether directly or through lifecycle hooks, **must be JSON-serializable** (dicts, lists, strings, numbers, booleans, None).
+
+**Comparison to Current:**
+| Aspect | Current (`AgentThread`) | New (`AgentSession`) |
+|--------|------------------------|---------------------|
+| Serialization | `ChatMessageStore.serialize()` + custom logic | `session.to_dict()` → plain dict |
+| Deserialization | `ChatMessageStore.deserialize()` + factory | `AgentSession.from_dict(data)` |
+| Provider state | Instance state, needs custom ser/deser | Plain dict values in `session.state` |
+
 ### 9. Session Management Methods
 
 Both approaches use identical agent methods:
 
 ```python
 class ChatAgent:
-    def create_session(
-        self,
-        *,
-        session_id: str | None = None,
-        service_session_id: str | None = None,
-    ) -> AgentSession:
-        """Create a new session with a fresh pipeline."""
+    def create_session(self, *, session_id: str | None = None) -> AgentSession:
+        """Create a new session."""
         ...
 
-    def get_session_by_id(self, session_id: str) -> AgentSession:
-        """Get a session by ID with a fresh pipeline."""
-        return self.create_session(session_id=session_id)
-
-    async def serialize_session(self, session: AgentSession) -> dict[str, Any]: ...
-    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession: ...
+    def get_session(self, service_session_id: str, *, session_id: str | None = None) -> AgentSession:
+        """Get a session for a service-managed session ID."""
+        ...
 ```
 
 **Usage (identical for both):**
 ```python
 session = agent.create_session()
-session = agent.create_session(session_id="user-123-session-456")
-session = agent.create_session(service_session_id="thread_abc123")
-session = agent.get_session_by_id("existing-session-id")
-session = await agent.restore_session(state)
+session = agent.create_session(session_id="custom-id")
+session = agent.get_session("existing-service-session-id")
+session = agent.get_session("existing-service-session-id", session_id="custom-id")
 ```
 
 ### 10. Accessing Context from Other Middleware/Hooks
@@ -728,43 +702,6 @@ agent = ChatAgent(
 session = agent.create_session()
 response = await agent.run("Hello", session=session)
 ```
-## Decision Outcome
-
-### Decision 1: Execution Pattern
-
-**Chosen: Option 3 - Hooks (Pre/Post Pattern)** with the following naming:
-- **Class name:** `ContextProvider` (emphasizes extensibility, familiar from build tools)
-- **Method names:** `before_run` / `after_run` (matches `agent.run()` terminology)
-
-Rationale:
-- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
-- Easier to implement plugins that only need one phase (just override one method)
-- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
-- Clearer separation between what this does vs what Agent Middleware can do
-
-Both options share the same:
-- Agent vs Session ownership model
-- `source_id` attribution
-- Serialization/deserialization via agent methods
-- Session management methods (`create_session`, `get_session_by_id`, `serialize_session`, `restore_session`)
-- Renaming `AgentThread` → `AgentSession`
-
-### Decision 2: Instance Ownership (Orthogonal)
-
-**Chosen: Option B1 - Instances in Agent, State in Session (Simple Dict)**
-
-The `ChatAgent` owns and manages the `ContextProvider` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each plugin receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - plugins modify the dict in place.
-
-> **Note on trust:** Since all `ContextProvider` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all plugins to see all state - if a plugin is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because plugins that handle messages already have access to the full conversation context.
-
-Rationale for B1 over B2: Simpler is better. The whole state dict is passed to each plugin, and since Python dicts are mutable, plugins can modify state in place without returning anything. This is the most Pythonic approach.
-
-Rationale for B over A:
-- Lightweight sessions - just data, easy to serialize/transfer
-- Plugin instances shared across sessions (more memory efficient)
-- Clearer separation: agent = behavior, session = state
-- Factories not needed - state dict handles per-session needs
-
 ### Instance Ownership Options (for reference)
 
 #### Option A: Instances in Session
@@ -830,7 +767,7 @@ class ChatAgent:
 
 #### Option B: Instances in Agent, State in Session (CHOSEN)
 
-The `ChatAgent` owns and manages the middleware/hooks instances. The `AgentSession` only stores state data that middleware reads/writes. The agent's runner executes the pipeline using the session's state.
+The agent owns and manages the middleware/hooks instances. The `AgentSession` only stores state data that middleware reads/writes. The agent's runner executes the pipeline using the session's state.
 
 Two variants exist for how state is stored in the session:
 
@@ -870,6 +807,9 @@ class ChatAgent:
 
         # Before-run plugins
         for plugin in self._context_providers:
+            # Skip before_run for HistoryProviders that don't load messages
+            if isinstance(plugin, HistoryProvider) and not plugin.load_messages:
+                continue
             await plugin.before_run(self, session, context, session.state)
 
         # assemble final input messages from context
@@ -885,7 +825,7 @@ class ChatAgent:
 class InMemoryHistoryProvider(ContextProvider):
     async def before_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -893,11 +833,11 @@ class InMemoryHistoryProvider(ContextProvider):
         # Read from state (use source_id as key for namespace)
         my_state = state.get(self.source_id, {})
         messages = my_state.get("messages", [])
-        context.add_messages(self.source_id, messages)
+        context.extend_messages(self.source_id, messages)
 
     async def after_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -908,7 +848,7 @@ class InMemoryHistoryProvider(ContextProvider):
         my_state["messages"] = [
             *messages,
             *context.input_messages,
-            *(context.response_messages or []),
+            *(context.response.messages or []),
         ]
 
 
@@ -916,16 +856,16 @@ class InMemoryHistoryProvider(ContextProvider):
 class TimeContextProvider(ContextProvider):
     async def before_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
-        context.add_instructions(self.source_id, f"Current time: {datetime.now()}")
+        context.extend_instructions(self.source_id, f"Current time: {datetime.now()}")
 
     async def after_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -1053,7 +993,7 @@ class TimeContextHooks(ContextHooks):
 ```
 
 **Option B Pros (both variants):**
-- Lightweight sessions - just data, easy to serialize/transfer
+- Lightweight sessions - just data, serializable via `to_dict()`/`from_dict()`
 - Plugin instances shared across sessions (more memory efficient)
 - Clearer separation: agent = behavior, session = state
 
@@ -1079,7 +1019,7 @@ class TimeContextHooks(ContextHooks):
 | Session weight | Heavier (instances + state) | Lighter (state only) |
 | Plugin sharing | Per-session instances | Shared across sessions |
 | Instance state | Natural (instance variables) | Explicit (state dict) |
-| Serialization | Serialize session + plugins | Serialize state only |
+| Serialization | Serialize session + plugins | `session.to_dict()`/`AgentSession.from_dict()` |
 | Factory handling | Resolved at session creation | Not needed (state dict handles per-session needs) |
 | Signature | `before_run(context)` | `before_run(agent, session, context, state)` |
 | Session portability | Works with any agent | Tied to agent's plugins config |
@@ -1094,6 +1034,43 @@ With Option B (instances in agent, state in session), the plugins are shared acr
 - Plugins use `state.setdefault(self.source_id, {})` to namespace their state
 
 ---
+## Decision Outcome
+
+### Decision 1: Execution Pattern
+
+**Chosen: Option 3 - Hooks (Pre/Post Pattern)** with the following naming:
+- **Class name:** `ContextProvider` (emphasizes extensibility, familiar from build tools, and does not favor reading or writing)
+- **Method names:** `before_run` / `after_run` (matches `agent.run()` terminology)
+
+Rationale:
+- Simpler mental model: "before" runs before, "after" runs after - no nesting to understand
+- Easier to implement plugins that only need one phase (just override one method)
+- More similar to the current `ContextProvider` API (`invoking`/`invoked`), easing migration
+- Clearer separation between what this does vs what Agent Middleware can do
+
+Both options share the same:
+- Agent vs Session ownership model
+- `source_id` attribution
+- Natively serializable sessions (state dict is JSON-serializable)
+- Session management methods (`create_session`, `get_session`)
+- Renaming `AgentThread` → `AgentSession`
+
+### Decision 2: Instance Ownership (Orthogonal)
+
+**Chosen: Option B1 - Instances in Agent, State in Session (Simple Dict)**
+
+The agent (any `SupportsAgentRun` implementation) owns and manages the `ContextProvider` instances. The `AgentSession` only stores state as a mutable `dict[str, Any]`. Each plugin receives the **whole state dict** (not just its own slice), and since a dict is mutable, no return value is needed - plugins modify the dict in place.
+
+Rationale for B over A:
+- Lightweight sessions - just data, serializable via `to_dict()`/`from_dict()`
+- Plugin instances shared across sessions (more memory efficient)
+- Clearer separation: agent = behavior, session = state
+- Factories not needed - state dict handles per-session needs
+
+Rationale for B1 over B2: Simpler is better. The whole state dict is passed to each plugin, and since Python dicts are mutable, plugins can modify state in place without returning anything. This is the most Pythonic approach.
+
+> **Note on trust:** Since all `ContextProvider` instances reason over conversation messages (which may contain sensitive user data), they should be **trusted by default**. This is also why we allow all plugins to see all state - if a plugin is untrusted, it shouldn't be in the pipeline at all. The whole state dict is passed rather than isolated slices because plugins that handle messages already have access to the full conversation context.
+
 
 ## Comparison to .NET Implementation
 
@@ -1103,9 +1080,12 @@ The .NET Agent Framework provides equivalent functionality through a different s
 
 | .NET Concept | Python (Chosen) |
 |--------------|-----------------|
-| `AIContextProvider` | `ContextProvider` |
-| `ChatHistoryProvider` | `HistoryProvider` |
-| `AgentSession` | `AgentSession` |
+| `AIContextProvider` (abstract base) | `ContextProvider` |
+| `ChatHistoryProvider` (abstract base) | `HistoryProvider` |
+| `AIContext` (return from `InvokingAsync`) | `SessionContext` (mutable, passed through) |
+| `AgentSession` / `ChatClientAgentSession` | `AgentSession` |
+| `InMemoryChatHistoryProvider` | `InMemoryHistoryProvider` |
+| `ChatClientAgentOptions` factory delegates | Not needed - state dict handles per-session needs |
 
 ### Feature Equivalence
 
@@ -1113,12 +1093,15 @@ Both platforms provide the same core capabilities:
 
 | Capability | .NET | Python |
 |------------|------|--------|
-| Inject context before invocation | `AIContextProvider.InvokingAsync()` | `ContextProvider.before_run()` |
+| Inject context before invocation | `AIContextProvider.InvokingAsync()` → returns `AIContext` with `Instructions`, `Messages`, `Tools` | `ContextProvider.before_run()` → mutates `SessionContext` in place |
 | React after invocation | `AIContextProvider.InvokedAsync()` | `ContextProvider.after_run()` |
-| Load conversation history | `ChatHistoryProvider.InvokingAsync()` | `HistoryProvider` with `load_messages=True` |
-| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `HistoryProvider` with `store_*` flags |
-| Session serialization | `Serialize()` on providers | Session's `state` dict is directly serializable |
-| Factory-based creation | `AIContextProviderFactory`, `ChatHistoryProviderFactory` | Not needed - state dict handles per-session needs |
+| Load conversation history | `ChatHistoryProvider.InvokingAsync()` → returns `IEnumerable<ChatMessage>` | `HistoryProvider.before_run()` → calls `context.extend_messages()` |
+| Store conversation history | `ChatHistoryProvider.InvokedAsync()` | `HistoryProvider.after_run()` → calls `save_messages()` |
+| Session serialization | `Serialize()` on providers → `JsonElement` | `session.to_dict()`/`AgentSession.from_dict()` — providers write JSON-serializable values to `session.state` |
+| Factory-based creation | `Func<FactoryContext, CancellationToken, ValueTask<Provider>>` delegates on `ChatClientAgentOptions` | Not needed - state dict handles per-session needs |
+| Default storage | Auto-injects `InMemoryChatHistoryProvider` when neither `ChatHistoryProvider` nor `ConversationId` is set | Auto-injects `InMemoryHistoryProvider` when no providers are configured and `conversation_id` or `store=True` is passed in options |
+| Service-managed history | `ConversationId` property (mutually exclusive with `ChatHistoryProvider`) | `service_session_id` on `AgentSession` |
+| Message reduction | `IChatReducer` on `InMemoryChatHistoryProvider` | Not yet designed (see Open Discussion: Context Compaction) |
 
 ### Implementation Differences
 
@@ -1126,13 +1109,16 @@ The implementations differ in ways idiomatic to each language:
 
 | Aspect | .NET Approach | Python Approach |
 |--------|---------------|-----------------|
-| **Context providers** | Separate `AIContextProvider` (single) and `ChatHistoryProvider` (single) | Unified list of `ContextProvider` (multiple) |
-| **Composition** | One of each provider type per session | Unlimited plugins in pipeline |
-| **Type system** | Strict interfaces, compile-time checks | Duck typing, protocols, runtime flexibility |
-| **Configuration** | DI container, factory delegates | Direct instantiation, list of instances |
-| **State management** | Instance state in providers | Explicit state dict in session |
-| **Default storage** | Can auto-inject when `ChatHistoryProvider` missing | Only auto-injects when no plugins configured |
-| **Source tracking** | Via separate provider types | Built-in `source_id` on each plugin |
+| **Context providers** | Separate `AIContextProvider` and `ChatHistoryProvider` (one of each per session) | Unified list of `ContextProvider` (multiple) |
+| **Composition** | One of each provider type per session | Unlimited providers in pipeline |
+| **Context passing** | `InvokingAsync()` returns `AIContext` (instructions + messages + tools) | `before_run()` mutates `SessionContext` in place |
+| **Response access** | `InvokedContext` carries response messages | `SessionContext.response` carries full `AgentResponse` (messages, response_id, usage_details, etc.) |
+| **Type system** | Strict abstract classes, compile-time checks | Duck typing, protocols, runtime flexibility |
+| **Configuration** | Factory delegates on `ChatClientAgentOptions` | Direct instantiation, list of instances |
+| **State management** | Instance state in providers, serialized via `JsonElement` | Explicit state dict in session, serialized via `session.to_dict()` |
+| **Default storage** | Auto-injects `InMemoryChatHistoryProvider` when neither `ChatHistoryProvider` nor `ConversationId` is set | Auto-injects `InMemoryHistoryProvider` when no providers are configured and `conversation_id` or `store=True` is passed in options |
+| **Source tracking** | Limited - `message.source_id` in observability/DevUI only | Built-in `source_id` on every provider, keyed in `context_messages` dict |
+| **Service discovery** | `GetService<T>()` on providers and sessions | Not applicable - Python uses direct references |
 
 ### Design Trade-offs
 
@@ -1140,15 +1126,18 @@ Each approach has trade-offs that align with language conventions:
 
 **.NET's separate provider types:**
 - Clearer separation between context injection and history storage
-- Easier to detect "missing storage" and auto-inject defaults
+- Easier to detect "missing storage" and auto-inject defaults (checks for `ChatHistoryProvider` or `ConversationId`)
 - Type system enforces single provider of each type
+- `AIContext` return type makes it clear what context is being added (instructions vs messages vs tools)
+- `GetService<T>()` pattern enables provider discovery without tight coupling
 
 **Python's unified pipeline:**
 - Single abstraction for all context concerns
-- Multiple instances of same type (e.g., multiple storage backends)
+- Multiple instances of same type (e.g., multiple storage backends with different `source_id`s)
 - More explicit - customization means owning full configuration
 - `source_id` enables filtering/debugging across all sources
-- Explicit state dict makes serialization trivial
+- Mutable `SessionContext` avoids allocating return objects
+- Explicit state dict makes serialization trivial (no `JsonElement` layer)
 
 Neither approach is inherently better - they reflect different language philosophies while achieving equivalent functionality. The Python design embraces the "we're all consenting adults" philosophy, while .NET provides more compile-time guardrails.
 
@@ -1274,7 +1263,7 @@ class ContextProvider(ABC):
 
     async def before_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -1284,22 +1273,18 @@ class ContextProvider(ABC):
 
     async def after_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
     ) -> None:
         """Called after model invocation. Override to process response."""
         pass
+```
 
-    async def serialize(self) -> Any:
-        """Serialize provider state. Default returns None (no state)."""
-        return None
+> **Serialization contract:** Any values a provider writes to `state` must be JSON-serializable. Sessions are serialized via `session.to_dict()` and restored via `AgentSession.from_dict()`.
 
-    async def restore(self, state: Any) -> None:
-        """Restore provider state from serialized object."""
-        pass
-```
+> **Agent-agnostic:** The `agent` parameter is typed as `SupportsAgentRun` (the base protocol), not `ChatAgent`. Context providers work with any agent implementation.
 
 ### HistoryProvider
 
@@ -1307,6 +1292,10 @@ class ContextProvider(ABC):
 class HistoryProvider(ContextProvider):
     """Base class for conversation history storage providers.
 
+    Subclasses only need to implement get_messages() and save_messages().
+    The default before_run/after_run handle loading and storing based on
+    configuration flags. Override them for custom behavior.
+
     A single class configured for different use cases:
     - Primary memory storage (loads + stores messages)
     - Audit/logging storage (stores only, doesn't load)
@@ -1314,7 +1303,7 @@ class HistoryProvider(ContextProvider):
 
     Loading behavior:
     - `load_messages=True` (default): Load messages from storage in before_run
-    - `load_messages=False`: Skip loading (before_run is a no-op)
+    - `load_messages=False`: Agent skips `before_run` entirely (audit/logging mode)
 
     Storage behavior:
     - `store_inputs`: Store input messages (default True)
@@ -1334,6 +1323,8 @@ class HistoryProvider(ContextProvider):
         store_context_from: Sequence[str] | None = None,
     ): ...
 
+    # --- Subclasses implement these ---
+
     @abstractmethod
     async def get_messages(self, session_id: str | None) -> list[ChatMessage]:
         """Retrieve stored messages for this session."""
@@ -1343,6 +1334,29 @@ class HistoryProvider(ContextProvider):
     async def save_messages(self, session_id: str | None, messages: Sequence[ChatMessage]) -> None:
         """Persist messages for this session."""
         ...
+
+    # --- Default implementations (override for custom behavior) ---
+
+    async def before_run(self, agent, session, context, state) -> None:
+        """Load history into context. Skipped by the agent when load_messages=False."""
+        history = await self.get_messages(context.session_id)
+        context.extend_messages(self.source_id, history)
+
+    async def after_run(self, agent, session, context, state) -> None:
+        """Store messages based on store_* configuration flags."""
+        messages_to_store: list[ChatMessage] = []
+        # Optionally include context from other providers
+        if self.store_context_messages:
+            if self.store_context_from:
+                messages_to_store.extend(context.get_messages(sources=self.store_context_from))
+            else:
+                messages_to_store.extend(context.get_messages(exclude_sources=[self.source_id]))
+        if self.store_inputs:
+            messages_to_store.extend(context.input_messages)
+        if self.store_responses and context.response and context.response.messages:
+            messages_to_store.extend(context.response.messages)
+        if messages_to_store:
+            await self.save_messages(context.session_id, messages_to_store)
 ```
 
 ### SessionContext
@@ -1362,8 +1376,9 @@ class SessionContext:
             Maintains insertion order (provider execution order).
         instructions: Additional instructions - providers can append here
         tools: Additional tools - providers can append here
-        response_messages: After invocation, contains the agent's response (set by agent).
-            READ-ONLY - use AgentMiddleware to modify responses.
+        response (property): After invocation, contains the full AgentResponse (set by agent).
+            Includes response.messages, response.response_id, response.agent_id,
+            response.usage_details, etc. Read-only property - use AgentMiddleware to modify.
         options: Options passed to agent.run() - READ-ONLY, for reflection only
         metadata: Shared metadata dictionary for cross-provider communication
     """
@@ -1377,38 +1392,41 @@ class SessionContext:
         context_messages: dict[str, list[ChatMessage]] | None = None,
         instructions: list[str] | None = None,
         tools: list[ToolProtocol] | None = None,
-        response_messages: list[ChatMessage] | None = None,
         options: dict[str, Any] | None = None,
         metadata: dict[str, Any] | None = None,
-    ): ...
+    ):
+        self._response: "AgentResponse | None" = None
 
-    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+    @property
+    def response(self) -> "AgentResponse | None":
+        """The agent's response. Set by the framework after invocation, read-only for providers."""
+        ...
+
+    def extend_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
         """Add context messages from a specific source."""
         ...
 
-    def add_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
+    def extend_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
         """Add instructions to be prepended to the conversation."""
         ...
 
-    def add_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
+    def extend_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
         """Add tools with source attribution in tool.metadata."""
         ...
 
     def get_messages(
         self,
+        *,
         sources: Sequence[str] | None = None,
         exclude_sources: Sequence[str] | None = None,
-    ) -> list[ChatMessage]:
-        """Get context messages, optionally filtered by source."""
-        ...
-
-    def get_all_messages(
-        self,
-        *,
         include_input: bool = False,
         include_response: bool = False,
     ) -> list[ChatMessage]:
-        """Get all messages (context + optionally input + response)."""
+        """Get context messages, optionally filtered and optionally including input/response.
+
+        Returns messages in provider execution order (dict insertion order),
+        with input and response appended if requested.
+        """
         ...
 ```
 
@@ -1430,6 +1448,23 @@ class AgentSession:
     @property
     def session_id(self) -> str:
         return self._session_id
+
+    def to_dict(self) -> dict[str, Any]:
+        """Serialize session to a plain dict."""
+        return {
+            "type": "session",
+            "session_id": self._session_id,
+            "service_session_id": self.service_session_id,
+            "state": self.state,
+        }
+
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> "AgentSession":
+        """Restore session from a dict."""
+        session = cls(session_id=data["session_id"])
+        session.service_session_id = data.get("service_session_id")
+        session.state = data.get("state", {})
+        return session
 ```
 
 ### ChatAgent Integration
@@ -1444,20 +1479,33 @@ class ChatAgent:
     ):
         self._context_providers = list(context_providers or [])
 
-    def create_session(self, *, session_id: str | None = None, service_session_id: str | None = None) -> AgentSession:
+    def create_session(self, *, session_id: str | None = None) -> AgentSession:
         """Create a new lightweight session."""
+        return AgentSession(session_id=session_id)
+
+    def get_session(self, service_session_id: str, *, session_id: str | None = None) -> AgentSession:
+        """Get or create a session for a service-managed session ID."""
         session = AgentSession(session_id=session_id)
         session.service_session_id = service_session_id
         return session
 
-    async def run(self, input: str, *, session: AgentSession) -> AgentResponse:
+    async def run(self, input: str, *, session: AgentSession, options: dict[str, Any] | None = None) -> AgentResponse:
+        options = options or {}
+
+        # Auto-add InMemoryHistoryProvider when no providers and conversation_id/store requested
+        if not self._context_providers and (options.get("conversation_id") or options.get("store") is True):
+            self._context_providers.append(InMemoryHistoryProvider("memory"))
+
         context = SessionContext(session_id=session.session_id, input_messages=[...])
 
-        # Before-run providers (forward order)
+        # Before-run providers (forward order, skip HistoryProviders with load_messages=False)
         for provider in self._context_providers:
+            if isinstance(provider, HistoryProvider) and not provider.load_messages:
+                continue
             await provider.before_run(self, session, context, session.state)
 
         # ... assemble messages, invoke model ...
+        context._response = response  # Set the full AgentResponse for after_run access
 
         # After-run providers (reverse order)
         for provider in reversed(self._context_providers):
@@ -1470,17 +1518,20 @@ The `SessionContext` provides explicit methods for adding context:
 
 ```python
 # Adding messages (keyed by source_id in context_messages dict)
-context.add_messages(self.source_id, messages)
+context.extend_messages(self.source_id, messages)
 
 # Adding instructions (flat list, source_id for debugging)
-context.add_instructions(self.source_id, "Be concise and helpful.")
-context.add_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
+context.extend_instructions(self.source_id, "Be concise and helpful.")
+context.extend_instructions(self.source_id, ["Instruction 1", "Instruction 2"])
 
 # Adding tools (source attribution added to tool.metadata automatically)
-context.add_tools(self.source_id, [my_tool, another_tool])
+context.extend_tools(self.source_id, [my_tool, another_tool])
 
-# Getting all messages in provider execution order
-all_messages = context.get_all_messages()
+# Getting all context messages in provider execution order
+all_context = context.get_messages()
+
+# Including input and response messages too
+full_conversation = context.get_messages(include_input=True, include_response=True)
 
 # Filtering by source
 memory_messages = context.get_messages(sources=["memory"])
@@ -1507,7 +1558,7 @@ agent = ChatAgent(
     # No context_providers specified
 )
 
-# Create session - automatically gets InMemoryHistoryProvider on first run
+# Create session - InMemoryHistoryProvider is auto-added at run time when conversation_id or store=True is passed in options
 session = agent.create_session()
 response = await agent.run("Hello, my name is Alice!", session=session)
 
@@ -1525,8 +1576,7 @@ response = await agent.run("Hello!", session=session, options={"store": True})
 ### Example 1: Explicit Memory Storage
 
 ```python
-from agent_framework import ChatAgent
-from agent_framework.context import InMemoryHistoryProvider
+from agent_framework import ChatAgent, InMemoryHistoryProvider
 
 # Explicit provider configuration (same behavior as default, but explicit)
 agent = ChatAgent(
@@ -1588,14 +1638,14 @@ agent = ChatAgent(
 ### Example 3: Custom Context Providers
 
 ```python
-from agent_framework.context import ContextProvider, SessionContext
+from agent_framework import ContextProvider, SessionContext
 
 class TimeContextProvider(ContextProvider):
     """Adds current time to the context."""
 
     async def before_run(self, agent, session, context, state) -> None:
         from datetime import datetime
-        context.add_instructions(
+        context.extend_instructions(
             self.source_id,
             f"Current date and time: {datetime.now().isoformat()}"
         )
@@ -1607,14 +1657,14 @@ class UserPreferencesProvider(ContextProvider):
     async def before_run(self, agent, session, context, state) -> None:
         prefs = state.get(self.source_id, {}).get("preferences", {})
         if prefs:
-            context.add_instructions(
+            context.extend_instructions(
                 self.source_id,
                 f"User preferences: {json.dumps(prefs)}"
             )
 
     async def after_run(self, agent, session, context, state) -> None:
         # Extract preferences from response and store in session state
-        for msg in context.response_messages or []:
+        for msg in context.response.messages or []:
             if "preference:" in msg.text.lower():
                 my_state = state.setdefault(self.source_id, {})
                 my_state.setdefault("preferences", {})
@@ -1665,7 +1715,7 @@ class RAGContextProvider(ContextProvider):
             ChatMessage(role="system", text=f"Relevant info: {doc}")
             for doc in relevant_docs
         ]
-        context.add_messages(self.source_id, rag_messages)
+        context.extend_messages(self.source_id, rag_messages)
 ```
 
 ### Example 5: Explicit Storage Configuration for Service-Managed Sessions
@@ -1754,7 +1804,7 @@ class RAGContextProvider(ContextProvider):
             query_parts.append(msg.text)
 
         # Can we see history? Depends on provider order!
-        history = context.get_all_messages()  # Gets context from providers that ran before us
+        history = context.get_messages()  # Gets context from providers that ran before us
         if history:
             # Include recent history for better RAG context
             recent = history[-3:]  # Last 3 messages
@@ -1766,7 +1816,7 @@ class RAGContextProvider(ContextProvider):
 
         # Add retrieved documents as context
         rag_messages = [ChatMessage.system(f"Relevant context:\n{doc}") for doc in documents]
-        context.add_messages(self.source_id, rag_messages)
+        context.extend_messages(self.source_id, rag_messages)
 
     async def _retrieve_documents(self, query: str) -> list[str]:
         # ... vector search implementation
@@ -1790,7 +1840,7 @@ agent_rag_first = ChatAgent(
 # Flow:
 # 1. RAG.before_run():
 #    - context.input_messages = ["What's the weather?"]
-#    - context.get_all_messages() = []  (empty - memory hasn't run yet)
+#    - context.get_messages() = []  (empty - memory hasn't run yet)
 #    - RAG query based on: "What's the weather?" only
 #    - Adds: context_messages["rag"] = [retrieved docs]
 #
@@ -1826,7 +1876,7 @@ agent_memory_first = ChatAgent(
 #
 # 2. RAG.before_run():
 #    - context.input_messages = ["What's the weather?"]
-#    - context.get_all_messages() = [previous conversation]  (sees history!)
+#    - context.get_messages() = [previous conversation]  (sees history!)
 #    - RAG query based on: recent history + "What's the weather?"
 #    - Better retrieval because RAG understands conversation context
 #    - Adds: context_messages["rag"] = [more relevant docs]
@@ -1873,25 +1923,36 @@ PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
 
 #### PR 1: New Types
 
-**Goal:** Create all new types. No changes to existing code yet.
+**Goal:** Create all new types. No changes to existing code yet. Because the old `ContextProvider` class (in `_memory.py`) still exists during this PR, the new base class uses the **temporary name `_ContextProviderBase`** to avoid import collisions. All new provider implementations reference `_ContextProviderBase` / `_HistoryProviderBase` in PR1.
 
 **Core Package - `packages/core/agent_framework/_sessions.py`:**
 - [ ] `SessionContext` class with explicit add/get methods
-- [ ] `ContextProvider` base class with `before_run()`/`after_run()`
-- [ ] `HistoryProvider` derived class with load_messages/store flags
-- [ ] Add `serialize()` and `restore()` methods to `ContextProvider` base class
-- [ ] `AgentSession` class with `state: dict[str, Any]`
-- [ ] `InMemoryHistoryProvider(HistoryProvider)`
+- [ ] `_ContextProviderBase` base class with `before_run()`/`after_run()` (temporary name; renamed to `ContextProvider` in PR2)
+- [ ] `_HistoryProviderBase(_ContextProviderBase)` derived class with load_messages/store flags (temporary; renamed to `HistoryProvider` in PR2)
+- [ ] `AgentSession` class with `state: dict[str, Any]`, `to_dict()`, `from_dict()`
+- [ ] `InMemoryHistoryProvider(_HistoryProviderBase)`
 
-**External Packages:**
-- [ ] `packages/azure-ai-search/` - create `AzureAISearchContextProvider`
-- [ ] `packages/redis/` - create `RedisHistoryProvider`
-- [ ] `packages/mem0/` - create `Mem0ContextProvider`
+**External Packages (new classes alongside existing ones, temporary `_` prefix):**
+- [ ] `packages/azure-ai-search/` - create `_AzureAISearchContextProvider(_ContextProviderBase)` — constructor keeps existing params, adds `source_id` (see compatibility notes below)
+- [ ] `packages/redis/` - create `_RedisHistoryProvider(_HistoryProviderBase)` — constructor keeps existing `RedisChatMessageStore` connection params, adds `source_id` + storage flags
+- [ ] `packages/redis/` - create `_RedisContextProvider(_ContextProviderBase)` — constructor keeps existing `RedisProvider` vector/search params, adds `source_id`
+- [ ] `packages/mem0/` - create `_Mem0ContextProvider(_ContextProviderBase)` — constructor keeps existing params, adds `source_id`
+
+**Constructor Compatibility Notes:**
+
+The existing provider constructors can be preserved with minimal additions:
+
+| Existing Class | New Class (PR1 temporary name) | Constructor Changes |
+|---|---|---|
+| `AzureAISearchContextProvider(ContextProvider)` | `_AzureAISearchContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). All existing params (`endpoint`, `index_name`, `api_key`, `mode`, `top_k`, etc.) stay the same. `invoking()` → `before_run()`, `invoked()` → `after_run()`. |
+| `Mem0Provider(ContextProvider)` | `_Mem0ContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). All existing params (`mem0_client`, `api_key`, `agent_id`, `user_id`, etc.) stay the same. `scope_to_per_operation_thread_id` → maps to session_id scoping via `before_run`. |
+| `RedisChatMessageStore` | `_RedisHistoryProvider(_HistoryProviderBase)` | Add `source_id: str` (required) + `load_messages`, `store_inputs`, `store_responses` flags. Keep connection params (`redis_url`, `credential_provider`, `host`, `port`, `ssl`). Drop `thread_id` (now from `context.session_id`), `messages` (state managed via `session.state`), `max_messages` (→ message reduction concern). |
+| `RedisProvider(ContextProvider)` | `_RedisContextProvider(_ContextProviderBase)` | Add `source_id: str` (required). Keep vector/search params (`redis_url`, `index_name`, `redis_vectorizer`, etc.). Drop `thread_id` scoping (now from `context.session_id`). |
 
 **Testing:**
-- [ ] Unit tests for `SessionContext` methods (add_messages, get_messages, add_instructions, add_tools)
-- [ ] Unit tests for `HistoryProvider` load/store flags
-- [ ] Unit tests for `InMemoryHistoryProvider` serialize/restore
+- [ ] Unit tests for `SessionContext` methods (extend_messages, get_messages, extend_instructions, extend_tools)
+- [ ] Unit tests for `_HistoryProviderBase` load/store flags
+- [ ] Unit tests for `InMemoryHistoryProvider` state persistence via session.state
 - [ ] Unit tests for source attribution (mandatory source_id)
 
 ---
@@ -1904,17 +1965,38 @@ PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
 - [ ] Replace `thread` parameter with `session` in `agent.run()`
 - [ ] Add `context_providers` parameter to `ChatAgent.__init__()`
 - [ ] Add `create_session()` method
-- [ ] Add `serialize_session()` / `restore_session()` methods
+- [ ] Verify `session.to_dict()`/`AgentSession.from_dict()` round-trip in integration tests
 - [ ] Wire up provider iteration (before_run forward, after_run reverse)
 - [ ] Add validation warning if multiple/zero history providers have `load_messages=True`
-- [ ] Wire up default `InMemoryHistoryProvider` behavior (auto-add when no providers and no service_session_id)
+- [ ] Wire up default `InMemoryHistoryProvider` behavior (auto-add when no providers are configured and `conversation_id` or `store=True` is passed in options)
 
 **Remove Legacy Types:**
-- [ ] `packages/core/agent_framework/_memory.py` - remove `ContextProvider` class
+- [ ] `packages/core/agent_framework/_memory.py` - remove old `ContextProvider` class
 - [ ] `packages/core/agent_framework/_threads.py` - remove `ChatMessageStore`, `ChatMessageStoreProtocol`, `AgentThread`
-- [ ] `packages/core/agent_framework/__init__.py` - remove old exports, add new exports from `_sessions.py`
 - [ ] Remove old provider classes from `azure-ai-search`, `redis`, `mem0`
 
+**Rename Temporary Types → Final Names:**
+- [ ] `_ContextProviderBase` → `ContextProvider` in `_sessions.py`
+- [ ] `_HistoryProviderBase` → `HistoryProvider` in `_sessions.py`
+- [ ] `_AzureAISearchContextProvider` → `AzureAISearchContextProvider` in `packages/azure-ai-search/`
+- [ ] `_Mem0ContextProvider` → `Mem0ContextProvider` in `packages/mem0/`
+- [ ] `_RedisHistoryProvider` → `RedisHistoryProvider` in `packages/redis/`
+- [ ] `_RedisContextProvider` → `RedisContextProvider` in `packages/redis/`
+- [ ] Update all imports across packages and `__init__.py` exports to use final names
+
+**Public API (root package exports):**
+
+The provider base classes, `InMemoryHistoryProvider`, and the session types are exported from the root package:
+```python
+from agent_framework import (
+    ContextProvider,
+    HistoryProvider,
+    InMemoryHistoryProvider,
+    SessionContext,
+    AgentSession,
+)
+```
+
 **Documentation & Samples:**
 - [ ] Update all samples in `samples/` to use new API
 - [ ] Write migration guide
@@ -1923,7 +2005,7 @@ PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
 **Testing:**
 - [ ] Unit tests for provider execution order (before_run forward, after_run reverse)
 - [ ] Unit tests for validation warnings (multiple/zero loaders)
-- [ ] Unit tests for session serialization/deserialization
+- [ ] Unit tests for session serialization (`session.to_dict()`/`AgentSession.from_dict()` round-trip)
 - [ ] Integration test: agent with `context_providers` + `session` works
 - [ ] Integration test: full conversation with memory persistence
 - [ ] Ensure all existing tests still pass (with updated API)
@@ -1939,7 +2021,7 @@ PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
 - **[BREAKING]** Replaced `thread` parameter with `session` in `agent.run()`
 - Added `SessionContext` for invocation state with source attribution
 - Added `InMemoryHistoryProvider` for conversation history
-- Added session serialization (`serialize_session`, `restore_session`)
+- `AgentSession` provides `to_dict()`/`from_dict()` for serialization (no special serialize/restore on providers)
 
 ---
 
@@ -1952,6 +2034,48 @@ PR1 (New Types) ──► PR2 (Agent Integration + Cleanup)
 
 ---
 
+#### Implementation Detail: Decorator-based Providers
+
+For simple use cases, a class-based provider can be verbose. A decorator API allows registering plain functions as `before_run` or `after_run` hooks for a more Pythonic setup:
+
+```python
+from agent_framework import ChatAgent, ChatMessage, before_run, after_run
+
+agent = ChatAgent(chat_client=client)
+
+@before_run(agent)
+async def add_system_prompt(agent, session, context, state):
+    """Inject a system prompt before every invocation."""
+    context.extend_messages("system", [ChatMessage(role="system", text="You are helpful.")])
+
+@after_run(agent)
+async def log_response(agent, session, context, state):
+    """Log the response after every invocation."""
+    print(f"Response: {context.response.text}")
+```
+
+Under the hood, each decorator wraps the function in a `_FunctionContextProvider` (a `ContextProvider` implementation) and appends it to `agent._context_providers`:
+
+```python
+def before_run(agent: ChatAgent, *, source_id: str = "decorated"):
+    def decorator(fn):
+        provider = _FunctionContextProvider(source_id=source_id, before_fn=fn)
+        agent._context_providers.append(provider)
+        return fn
+    return decorator
+
+def after_run(agent: ChatAgent, *, source_id: str = "decorated"):
+    def decorator(fn):
+        provider = _FunctionContextProvider(source_id=source_id, after_fn=fn)
+        agent._context_providers.append(provider)
+        return fn
+    return decorator
+```
+
+This is a convenience layer — the class-based API remains the primary interface for providers that need configuration, state, or both hooks.
+
+---
+
 #### Reference Implementation
 
 Full implementation code for the chosen design (hooks pattern, Decision B1).
@@ -1980,18 +2104,20 @@ class SessionContext:
         service_session_id: Service-managed session ID (if present, service handles storage)
         input_messages: The new messages being sent to the agent (read-only, set by caller)
         context_messages: Dict mapping source_id -> messages added by that provider.
-            Maintains insertion order (provider execution order). Use add_messages()
+            Maintains insertion order (provider execution order). Use extend_messages()
             to add messages with proper source attribution.
         instructions: Additional instructions - providers can append here
         tools: Additional tools - providers can append here
-        response_messages: After invocation, contains the agent's response (set by agent).
-            READ-ONLY - modifications are ignored. Use AgentMiddleware to modify responses.
+        response (property): After invocation, contains the full AgentResponse (set by agent).
+            Includes response.messages, response.response_id, response.agent_id,
+            response.usage_details, etc.
+            Read-only property - use AgentMiddleware to modify responses.
         options: Options passed to agent.run() - READ-ONLY, for reflection only
         metadata: Shared metadata dictionary for cross-provider communication
 
     Note:
         - `options` is read-only; changes will NOT be merged back into the agent run
-        - `response_messages` is read-only; use AgentMiddleware to modify responses
+        - `response` is a read-only property; use AgentMiddleware to modify responses
         - `instructions` and `tools` are merged by the agent into the run options
         - `context_messages` values are flattened in order when building the final input
     """
@@ -2005,7 +2131,6 @@ class SessionContext:
         context_messages: dict[str, list[ChatMessage]] | None = None,
         instructions: list[str] | None = None,
         tools: list[ToolProtocol] | None = None,
-        response_messages: list[ChatMessage] | None = None,
         options: dict[str, Any] | None = None,
         metadata: dict[str, Any] | None = None,
     ):
@@ -2015,11 +2140,16 @@ class SessionContext:
         self.context_messages: dict[str, list[ChatMessage]] = context_messages or {}
         self.instructions: list[str] = instructions or []
         self.tools: list[ToolProtocol] = tools or []
-        self.response_messages = response_messages
+        self._response: AgentResponse | None = None
         self.options = options or {}  # READ-ONLY - for reflection only
         self.metadata = metadata or {}
 
-    def add_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
+    @property
+    def response(self) -> AgentResponse | None:
+        """The agent's response. Set by the framework after invocation, read-only for providers."""
+        return self._response
+
+    def extend_messages(self, source_id: str, messages: Sequence[ChatMessage]) -> None:
         """Add context messages from a specific source.
 
         Messages are stored keyed by source_id, maintaining insertion order
@@ -2033,7 +2163,7 @@ class SessionContext:
             self.context_messages[source_id] = []
         self.context_messages[source_id].extend(messages)
 
-    def add_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
+    def extend_instructions(self, source_id: str, instructions: str | Sequence[str]) -> None:
         """Add instructions to be prepended to the conversation.
 
         Instructions are added to a flat list. The source_id is recorded
@@ -2047,7 +2177,7 @@ class SessionContext:
             instructions = [instructions]
         self.instructions.extend(instructions)
 
-    def add_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
+    def extend_tools(self, source_id: str, tools: Sequence[ToolProtocol]) -> None:
         """Add tools to be available for this invocation.
 
         Tools are added with source attribution in their metadata.
@@ -2063,19 +2193,25 @@ class SessionContext:
 
     def get_messages(
         self,
+        *,
         sources: Sequence[str] | None = None,
         exclude_sources: Sequence[str] | None = None,
+        include_input: bool = False,
+        include_response: bool = False,
     ) -> list[ChatMessage]:
-        """Get context messages, optionally filtered by source.
+        """Get context messages, optionally filtered and including input/response.
 
-        Returns messages in provider execution order (dict insertion order).
+        Returns messages in provider execution order (dict insertion order),
+        with input and response appended if requested.
 
         Args:
-            sources: If provided, only include messages from these sources
-            exclude_sources: If provided, exclude messages from these sources
+            sources: If provided, only include context messages from these sources
+            exclude_sources: If provided, exclude context messages from these sources
+            include_input: If True, append input_messages after context
+            include_response: If True, append response.messages at the end
 
         Returns:
-            Flattened list of messages in provider execution order
+            Flattened list of messages in conversation order
         """
         result: list[ChatMessage] = []
         for source_id, messages in self.context_messages.items():
@@ -2084,35 +2220,10 @@ class SessionContext:
             if exclude_sources is not None and source_id in exclude_sources:
                 continue
             result.extend(messages)
-        return result
-
-    def get_all_messages(
-        self,
-        *,
-        include_input: bool = False,
-        include_response: bool = False,
-    ) -> list[ChatMessage]:
-        """Get all messages, optionally including input and response.
-
-        Returns messages in the order they would appear in a full conversation:
-        1. Context messages (from providers, in execution order)
-        2. Input messages (if include_input=True)
-        3. Response messages (if include_response=True)
-
-        Args:
-            include_input: If True, append input_messages after context
-            include_response: If True, append response_messages at the end
-
-        Returns:
-            Flattened list of messages in conversation order
-        """
-        result: list[ChatMessage] = []
-        for messages in self.context_messages.values():
-            result.extend(messages)
         if include_input and self.input_messages:
             result.extend(self.input_messages)
-        if include_response and self.response_messages:
-            result.extend(self.response_messages)
+        if include_response and self.response:
+            result.extend(self.response.messages)
         return result
 ```
 
@@ -2141,7 +2252,7 @@ class ContextProvider(ABC):
 
     async def before_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -2161,7 +2272,7 @@ class ContextProvider(ABC):
 
     async def after_run(
         self,
-        agent: "ChatAgent",
+        agent: "SupportsAgentRun",
         session: AgentSession,
         context: SessionContext,
         state: dict[str, Any],
@@ -2169,23 +2280,19 @@ class ContextProvider(ABC):
         """Called after model invocation.
 
         Override to process the response (store messages, extract info, etc.).
-        The context.response_messages will be populated at this point.
+        The context.response.messages will be populated at this point.
 
         Args:
             agent: The agent that ran this invocation
             session: The current session
-            context: The invocation context with response_messages populated
+            context: The invocation context with response populated
             state: The session's mutable state dict
         """
         pass
 
-    async def serialize(self) -> Any:
-        """Serialize provider state. Default returns None (no state)."""
-        return None
-
-    async def restore(self, state: Any) -> None:
-        """Restore provider state from serialized object."""
-        pass
+    # Serialization contract: any value a provider writes to `state` must be
+    # JSON-serializable. Sessions are serialized via `session.to_dict()` and
+    # restored via `AgentSession.from_dict()`.
 ```
 
 ##### HistoryProvider
@@ -2201,7 +2308,7 @@ class HistoryProvider(ContextProvider):
 
     Loading behavior (when to add messages to context_messages[source_id]):
     - `load_messages=True` (default): Load messages from storage
-    - `load_messages=False`: Skip loading (before_run is a no-op)
+    - `load_messages=False`: Agent skips `before_run` entirely (audit/logging mode)
 
     Storage behavior:
     - `store_inputs`: Store input messages (default True)
@@ -2272,10 +2379,9 @@ class HistoryProvider(ContextProvider):
             return context.get_messages(exclude_sources=[self.source_id])
 
     async def before_run(self, agent, session, context, state) -> None:
-        """Load history into context if configured."""
-        if self.load_messages:
-            history = await self.get_messages(context.session_id)
-            context.add_messages(self.source_id, history)
+        """Load history into context. Skipped by the agent when load_messages=False."""
+        history = await self.get_messages(context.session_id)
+        context.extend_messages(self.source_id, history)
 
     async def after_run(self, agent, session, context, state) -> None:
         """Store messages based on configuration."""
@@ -2283,8 +2389,8 @@ class HistoryProvider(ContextProvider):
         messages_to_store.extend(self._get_context_messages_to_store(context))
         if self.store_inputs:
             messages_to_store.extend(context.input_messages)
-        if self.store_responses and context.response_messages:
-            messages_to_store.extend(context.response_messages)
+        if self.store_responses and context.response and context.response.messages:
+            messages_to_store.extend(context.response.messages)
         if messages_to_store:
             await self.save_messages(context.session_id, messages_to_store)
 ```
@@ -2332,8 +2438,24 @@ class AgentSession:
         """The unique identifier for this session."""
         return self._session_id
 
+    def to_dict(self) -> dict[str, Any]:
+        """Serialize session to a plain dict for storage/transfer."""
+        return {
+            "type": "session",
+            "session_id": self._session_id,
+            "service_session_id": self.service_session_id,
+            "state": self.state,
+        }
 
-# Example of how agent creates sessions and runs providers:
+    @classmethod
+    def from_dict(cls, data: dict[str, Any]) -> "AgentSession":
+        """Restore session from a previously serialized dict."""
+        session = cls(
+            session_id=data["session_id"],
+            service_session_id=data.get("service_session_id"),
+        )
+        session.state = data.get("state", {})
+        return session
 class ChatAgent:
     def __init__(
         self,
@@ -2347,35 +2469,42 @@ class ChatAgent:
         self,
         *,
         session_id: str | None = None,
-        service_session_id: str | None = None,
     ) -> AgentSession:
         """Create a new lightweight session.
 
         Args:
             session_id: Optional session ID (generated if not provided)
-            service_session_id: Optional service-managed session ID
         """
-        return AgentSession(
-            session_id=session_id,
-            service_session_id=service_session_id,
-        )
+        return AgentSession(session_id=session_id)
+
+    def get_session(
+        self,
+        service_session_id: str,
+        *,
+        session_id: str | None = None,
+    ) -> AgentSession:
+        """Create a session bound to an existing service-managed session ID.
+
+        Args:
+            service_session_id: Service-managed session ID
+            session_id: Optional session ID (generated if not provided)
+        """
+        session = AgentSession(session_id=session_id)
+        session.service_session_id = service_session_id
+        return session
 
     def _ensure_default_storage(self, session: AgentSession, options: dict[str, Any]) -> None:
         """Add default InMemoryHistoryProvider if needed.
 
         Default storage is added when ALL of these are true:
-        - No service_session_id (service not managing storage)
-        - options.store is not True (user not expecting service storage)
-        - No context_providers configured at all
+        - A session is provided (always the case here)
+        - No context_providers configured
+        - Either options.conversation_id is set or options.store is True
         """
-        if options.get("store") is True:
-            return
-        if session.service_session_id is not None:
-            return
         if self._context_providers:
             return
-        # Add default in-memory storage
-        self._context_providers.append(InMemoryHistoryProvider("memory"))
+        if options.get("conversation_id") or options.get("store") is True:
+            self._context_providers.append(InMemoryHistoryProvider("memory"))
 
     def _validate_providers(self) -> None:
         """Warn if history provider configuration looks like a mistake."""
@@ -2416,8 +2545,10 @@ class ChatAgent:
             options=options,
         )
 
-        # Before-run providers (forward order)
+        # Before-run providers (forward order, skip HistoryProviders with load_messages=False)
         for provider in self._context_providers:
+            if isinstance(provider, HistoryProvider) and not provider.load_messages:
+                continue
             await provider.before_run(self, session, context, session.state)
 
         # ... assemble final messages from context, invoke model ...
@@ -2426,30 +2557,19 @@ class ChatAgent:
         for provider in reversed(self._context_providers):
             await provider.after_run(self, session, context, session.state)
 
-    async def serialize_session(self, session: AgentSession) -> dict[str, Any]:
-        """Serialize a session's state for persistence."""
-        provider_states: dict[str, Any] = {}
-        for provider in self._context_providers:
-            state = await provider.serialize()
-            if state is not None:
-                provider_states[provider.source_id] = state
-        return {
-            "session_id": session.session_id,
-            "service_session_id": session.service_session_id,
-            "state": session.state,
-            "provider_states": provider_states,
-        }
 
-    async def restore_session(self, serialized: dict[str, Any]) -> AgentSession:
-        """Restore a session from serialized state."""
-        session = AgentSession(
-            session_id=serialized["session_id"],
-            service_session_id=serialized.get("service_session_id"),
-        )
-        session.state = serialized.get("state", {})
-        provider_states = serialized.get("provider_states", {})
-        for provider in self._context_providers:
-            if provider.source_id in provider_states:
-                await provider.restore(provider_states[provider.source_id])
-        return session
+# Session serialization is trivial — session.state is a plain dict:
+#
+#   # Serialize
+#   data = {
+#       "session_id": session.session_id,
+#       "service_session_id": session.service_session_id,
+#       "state": session.state,
+#   }
+#   json_str = json.dumps(data)
+#
+#   # Deserialize
+#   data = json.loads(json_str)
+#   session = AgentSession(session_id=data["session_id"], service_session_id=data.get("service_session_id"))
+#   session.state = data["state"]
 ```

From 9d59f0ce11dc4cdcbcc8c6f890cba41c2aa56835 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Mon, 9 Feb 2026 15:07:03 +0100
Subject: [PATCH 18/19] Rename ADR to 0016-python-context-middleware.md

---
 ...on-context-middleware.md => 0016-python-context-middleware.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename docs/decisions/{00XX-python-context-middleware.md => 0016-python-context-middleware.md} (100%)

diff --git a/docs/decisions/00XX-python-context-middleware.md b/docs/decisions/0016-python-context-middleware.md
similarity index 100%
rename from docs/decisions/00XX-python-context-middleware.md
rename to docs/decisions/0016-python-context-middleware.md

From 2c22929f5d5af5ace783e6010cbad7b10bf69de8 Mon Sep 17 00:00:00 2001
From: eavanvalkenburg <github@vanvalkenburg.eu>
Date: Mon, 9 Feb 2026 15:12:40 +0100
Subject: [PATCH 19/19] =?UTF-8?q?Fix=20broken=20link:=20#3-unified-storage?=
 =?UTF-8?q?-middleware=20=E2=86=92=20#3-unified-storage?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/decisions/0016-python-context-middleware.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/decisions/0016-python-context-middleware.md b/docs/decisions/0016-python-context-middleware.md
index d324e6f79a..63a4014e18 100644
--- a/docs/decisions/0016-python-context-middleware.md
+++ b/docs/decisions/0016-python-context-middleware.md
@@ -44,7 +44,7 @@ This ADR addresses the following issues from the parent issue [#3575](https://gi
 | [#3587](https://github.com/microsoft/agent-framework/issues/3587) | Rename AgentThread to AgentSession | ✅ `AgentThread` → `AgentSession` (clean break, no alias). See [§7 Renaming](#7-renaming-thread--session). |
 | [#3588](https://github.com/microsoft/agent-framework/issues/3588) | Add get_new_session, get_session_by_id methods | ✅ `agent.create_session()` and `agent.get_session(service_session_id)`. See [§9 Session Management Methods](#9-session-management-methods). |
 | [#3589](https://github.com/microsoft/agent-framework/issues/3589) | Move serialize method into the agent | ✅ No longer needed. `AgentSession` provides `to_dict()`/`from_dict()` for serialization. Providers write JSON-serializable values to `session.state`. See [§8 Serialization](#8-session-serializationdeserialization). |
-| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `StorageContextMiddleware` works orthogonally: configure `load_messages=False` when service manages storage. Multiple storage middleware allowed. See [§3 Unified Storage](#3-unified-storage-middleware). |
+| [#3590](https://github.com/microsoft/agent-framework/issues/3590) | Design orthogonal ChatMessageStore for service vs local | ✅ `HistoryProvider` works orthogonally: configure `load_messages=False` when service manages storage. Multiple history providers allowed. See [§3 Unified Storage](#3-unified-storage). |
 | [#3601](https://github.com/microsoft/agent-framework/issues/3601) | Rename ChatMessageStore to ChatHistoryProvider | 🔒 **Closed** - Superseded by this ADR. `ChatMessageStore` removed entirely, replaced by `StorageContextMiddleware`. |
 
 ## Current State Analysis