Python: Port to Pydantic BaseModel for Durable State#2564
Python: Port to Pydantic BaseModel for Durable State#2564larohra wants to merge 8 commits intomicrosoft:mainfrom
Conversation
Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors the durable agent state implementation by replacing custom serialization logic with Pydantic BaseModel, significantly simplifying the codebase and improving maintainability.
Key Changes:
- Introduced
DurableAgentStateModelbase class with automatic camelCase ↔ snake_case conversion - Replaced manual
to_dict/from_dictimplementations with Pydantic's built-in serialization - Implemented discriminated unions for polymorphic types using Pydantic's
AnnotatedandField(discriminator=...) - Added
ApiResponseFieldsconstants class to centralize API response field names
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
python/packages/azurefunctions/agent_framework_azurefunctions/_durable_agent_state.py |
Complete refactor to use Pydantic models; removed ~500 lines of manual serialization code while maintaining the same durable state schema |
python/packages/azurefunctions/agent_framework_azurefunctions/_constants.py |
Added ApiResponseFields class to centralize HTTP API response field names, improving code organization |
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
| def from_run_response(correlation_id: str, response: AgentRunResponse) -> DurableAgentStateResponse: | ||
| """Creates a DurableAgentStateResponse from an AgentRunResponse.""" | ||
| """Create a DurableAgentStateResponse from an AgentRunResponse.""" | ||
| logger.warning("Received Agent Run Response response: %s", json.dumps(response.to_dict(), indent=2)) |
There was a problem hiding this comment.
This logger.warning appears to be debug code that should be removed before merging. Debug logging at warning level in production code can cause confusion and performance issues.
| logger.warning("Received Agent Run Response response: %s", json.dumps(response.to_dict(), indent=2)) |
| ( | ||
| DurableAgentStateTextContent | ||
| | DurableAgentStateDataContent | ||
| | DurableAgentStateErrorContent | ||
| | DurableAgentStateFunctionCallContent | ||
| | DurableAgentStateFunctionResultContent | ||
| | DurableAgentStateHostedFileContent | ||
| | DurableAgentStateHostedVectorStoreContent | ||
| | DurableAgentStateTextReasoningContent | ||
| | DurableAgentStateUriContent | ||
| | DurableAgentStateUsageContent | ||
| | DurableAgentStateUnknownContent | ||
| ) |
There was a problem hiding this comment.
The contents field should use Pydantic's discriminated union pattern for proper deserialization, similar to how conversation_history is configured in DurableAgentStateData (lines 140-148).
The field should be defined as:
contents: list[
Annotated[
(
Annotated[DurableAgentStateTextContent, Tag("text")]
| Annotated[DurableAgentStateDataContent, Tag("data")]
| Annotated[DurableAgentStateErrorContent, Tag("error")]
| Annotated[DurableAgentStateFunctionCallContent, Tag("functionCall")]
| Annotated[DurableAgentStateFunctionResultContent, Tag("functionResult")]
| Annotated[DurableAgentStateHostedFileContent, Tag("hostedFile")]
| Annotated[DurableAgentStateHostedVectorStoreContent, Tag("hostedVectorStore")]
| Annotated[DurableAgentStateTextReasoningContent, Tag("reasoning")]
| Annotated[DurableAgentStateUriContent, Tag("uri")]
| Annotated[DurableAgentStateUsageContent, Tag("usage")]
| Annotated[DurableAgentStateUnknownContent, Tag("unknown")]
),
Field(discriminator="type"),
]
]This ensures Pydantic uses the type field (aliased as $type) to discriminate between content types during deserialization, preventing potential deserialization errors.
| ( | |
| DurableAgentStateTextContent | |
| | DurableAgentStateDataContent | |
| | DurableAgentStateErrorContent | |
| | DurableAgentStateFunctionCallContent | |
| | DurableAgentStateFunctionResultContent | |
| | DurableAgentStateHostedFileContent | |
| | DurableAgentStateHostedVectorStoreContent | |
| | DurableAgentStateTextReasoningContent | |
| | DurableAgentStateUriContent | |
| | DurableAgentStateUsageContent | |
| | DurableAgentStateUnknownContent | |
| ) | |
| Annotated[ | |
| ( | |
| Annotated[DurableAgentStateTextContent, Tag("text")] | |
| | Annotated[DurableAgentStateDataContent, Tag("data")] | |
| | Annotated[DurableAgentStateErrorContent, Tag("error")] | |
| | Annotated[DurableAgentStateFunctionCallContent, Tag("functionCall")] | |
| | Annotated[DurableAgentStateFunctionResultContent, Tag("functionResult")] | |
| | Annotated[DurableAgentStateHostedFileContent, Tag("hostedFile")] | |
| | Annotated[DurableAgentStateHostedVectorStoreContent, Tag("hostedVectorStore")] | |
| | Annotated[DurableAgentStateTextReasoningContent, Tag("reasoning")] | |
| | Annotated[DurableAgentStateUriContent, Tag("uri")] | |
| | Annotated[DurableAgentStateUsageContent, Tag("usage")] | |
| | Annotated[DurableAgentStateUnknownContent, Tag("unknown")] | |
| ), | |
| Field(discriminator="type"), | |
| ] |
| | DurableAgentStateUsageContent | ||
| | DurableAgentStateUnknownContent | ||
| ) | ||
| ] |
There was a problem hiding this comment.
The contents field should have a default value using Field(default_factory=list) to allow creating DurableAgentStateMessage instances without providing contents. This is consistent with other list fields in the codebase (e.g., conversation_history on line 148 and messages on line 202).
| ] | |
| ] = Field(default_factory=list) |
| @staticmethod | ||
| def from_usage_content(content: UsageContent) -> DurableAgentStateUsageContent: | ||
| return DurableAgentStateUsageContent(usage=DurableAgentStateUsage.from_usage(content.details)) # type: ignore | ||
| return DurableAgentStateUsageContent(usage=DurableAgentStateUsage.from_usage(content.details)) |
There was a problem hiding this comment.
Potential type mismatch: DurableAgentStateUsage.from_usage() returns DurableAgentStateUsage | None (line 493), but this field expects DurableAgentStateUsage. If content.details is None, this will attempt to assign None to a non-optional field, which could cause issues.
Consider either:
- Making the
usagefield optional:usage: DurableAgentStateUsage | None = None - Or adding a check:
@staticmethod
def from_usage_content(content: UsageContent) -> DurableAgentStateUsageContent:
usage = DurableAgentStateUsage.from_usage(content.details)
if usage is None:
usage = DurableAgentStateUsage()
return DurableAgentStateUsageContent(usage=usage)| return DurableAgentStateUsageContent(usage=DurableAgentStateUsage.from_usage(content.details)) | |
| usage = DurableAgentStateUsage.from_usage(content.details) | |
| if usage is None: | |
| usage = DurableAgentStateUsage() | |
| return DurableAgentStateUsageContent(usage=usage) |
| """Token usage as message content.""" | ||
|
|
||
| type: Literal["usage"] = Field(default="usage", alias="$type") | ||
| usage: DurableAgentStateUsage = DurableAgentStateUsage() |
There was a problem hiding this comment.
Using a mutable default value (DurableAgentStateUsage()) directly can cause issues. While Pydantic handles this correctly, best practice is to use Field(default_factory=DurableAgentStateUsage) to make the intent explicit and follow Python best practices.
| usage: DurableAgentStateUsage = DurableAgentStateUsage() | |
| usage: DurableAgentStateUsage = Field(default_factory=DurableAgentStateUsage) |
| get_logger, | ||
| ) | ||
| from dateutil import parser as date_parser | ||
| from pydantic import BaseModel, ConfigDict, Field, Tag, field_validator |
There was a problem hiding this comment.
Hi @larohra, we've deliberately moved away from Pydantic, for this type of behavior, in favor of a custom SerializationMixin (see code here). It may seem silly to do this, but we've determined that we don't want to support Pydantic, which can have breaking changes even between minor versions, as well as tricky migrations to future major versions.
There was a problem hiding this comment.
Hey @moonbox3, thanks for sharing your concerns with me. I did look at the SerializationMixin but decided against it primarily because it didnt offer all the features that we need (like snake_case to camelCase serializations). Is the guidance to add all missing features to the SerializationMixin for using it?
There was a problem hiding this comment.
I did a deeper dive into it, and here's the feature gap breakdown -
Feature Comparison
| Feature | Pydantic (DurableAgentStateModel) |
SerializationMixin |
Gap Severity |
|---|---|---|---|
| Serialization | Automatic to_dict() via model_dump. |
to_dict() with deep serialization support. |
Low |
| Deserialization | Automatic from_dict() via model_validate. |
from_dict() with dependency injection support. |
Low |
| Case Conversion | Automatic via alias_generator=to_camel. |
None. Keys are serialized as-is (snake_case). | High |
| Polymorphism | Declarative via Annotated[Union[...], Field(discriminator="type")]. |
None. Parent class must manually instantiate correct subclasses. | High |
| Type Discriminator | Configurable alias (e.g., $type). |
Defaults to type field. |
Medium |
| Data Validation | Automatic type coercion & custom validators (field_validator). |
None. Must be handled in __init__. |
Medium |
| Forward Compatibility | extra="ignore" handles unknown fields automatically. |
from_dict passes **kwargs. __init__ must explicitly handle/ignore **kwargs. |
Low |
Apart from the Low severity, others are important gaps that do help us keep the code more manageable and cleaner.
Given this gap list, is it a feasible stop-gap solution to keep the Pydantic BaseModel for now for our use case until the SerializationMixin is more updated with these additional features @eavanvalkenburg @moonbox3? Using SerializationMixin does not give us major benefit over what we currently have (i.e. customized solution with a lot of boilerplate code) so using Pydantic is a much more comprehensive option for us and reduces the scope for errors.
|
Closing the PR since we dont want to add dependency on Pydantic. Will open a separate one with the fix for the issue. |
Motivation and Context
This PR fixes the serialization issues in the DurableState by removing all custom serialization logic and relying solely on Pydantic BaseModel to provide the serialization/deserialization support. This drastically cleans up the existing code making it much more readable and easier to maintain.
Fixes #2483
Description
Contribution Checklist