Python: feat(python): Add embedding abstractions and OpenAI implementation (Phase 1) #4153
Open
eavanvalkenburg wants to merge 7 commits into microsoft:main from
…hase 1)

This PR contains two parts:

1. **Overall migration plan** for porting vector stores and embeddings from Semantic Kernel to Agent Framework (docs/features/vector-stores-and-embeddings/README.md), covering all 10 phases from core abstractions through connectors and TextSearch.
2. **Phase 1 implementation**: core embedding abstractions and OpenAI/Azure OpenAI embedding clients.

Core types (_types.py):
- EmbeddingGenerationOptions TypedDict (total=False)
- Embedding[EmbeddingT] generic class with model_id, dimensions, created_at
- GeneratedEmbeddings[EmbeddingT, EmbeddingOptionsT] list container with options, usage
- EmbeddingInputT (default str) and EmbeddingT (default list[float]) TypeVars

Protocol + base class (_clients.py):
- SupportsGetEmbeddings protocol: Generic[EmbeddingInputT, EmbeddingT, OptionsContraT]
- BaseEmbeddingClient ABC: Generic[EmbeddingInputT, EmbeddingT, OptionsCoT]

Telemetry (observability.py):
- EmbeddingTelemetryLayer with gen_ai.operation.name = "embeddings"

OpenAI implementation (openai/_embedding_client.py):
- RawOpenAIEmbeddingClient, OpenAIEmbeddingClient, OpenAIEmbeddingOptions
- Uses _ensure_client() factory pattern

Azure OpenAI implementation (azure/_embedding_client.py):
- AzureOpenAIEmbeddingClient following AzureOpenAIChatClient pattern
- Supports API key, Entra ID credentials, env var configuration

Tests:
- 47 unit tests for types, protocol, base class, OpenAI, and Azure clients
- 6 integration tests (gated behind RUN_INTEGRATION_TESTS + credentials)

Samples:
- samples/02-agents/embeddings/openai_embeddings.py
- samples/02-agents/embeddings/azure_openai_embeddings.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Map OPENAI_EMBEDDING_MODEL_ID and AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME from GitHub vars to the integration test environment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds the first phase of a broader “vector stores + embeddings” feature by introducing core embedding abstractions in agent_framework, plus OpenAI/Azure OpenAI embedding client implementations and supporting telemetry, tests, and samples.
Changes:
- Introduces core embedding types (`Embedding`, `GeneratedEmbeddings`, `EmbeddingGenerationOptions`) and embedding client abstractions (`SupportsGetEmbeddings`, `BaseEmbeddingClient`).
- Adds OpenAI + Azure OpenAI embedding clients and exports them via the provider namespaces.
- Adds embedding-specific OpenTelemetry instrumentation plus new unit/integration tests and runnable samples.
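To make the shape of the core types more concrete, here is a minimal sketch of what an `Embedding` value and a `GeneratedEmbeddings` list container could look like. Only the class names and the `model_id`, `dimensions`, `created_at`, `options`, and `usage` fields come from the PR; the `vector` attribute, the constructor signatures, and the fallback of `dimensions` to `len(vector)` are assumptions for illustration, and TypeVar defaults are omitted to keep the example portable.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Generic, TypeVar

EmbeddingT = TypeVar("EmbeddingT")
OptionsT = TypeVar("OptionsT")


class Embedding(Generic[EmbeddingT]):
    """One embedding plus metadata (illustrative stand-in, not the PR's class)."""

    def __init__(
        self,
        vector: EmbeddingT,
        *,
        model_id: str | None = None,
        dimensions: int | None = None,
        created_at: datetime | None = None,
    ) -> None:
        self.vector = vector  # attribute name is an assumption
        self.model_id = model_id
        # Assumption: infer dimensions from the vector length when not given.
        if dimensions is None and hasattr(vector, "__len__"):
            dimensions = len(vector)  # type: ignore[arg-type]
        self.dimensions = dimensions
        self.created_at = created_at


class GeneratedEmbeddings(list[Embedding[EmbeddingT]], Generic[EmbeddingT, OptionsT]):
    """A list of embeddings that also carries the request options and usage."""

    def __init__(
        self,
        embeddings: list[Embedding[EmbeddingT]] | None = None,
        *,
        options: OptionsT | None = None,
        usage: dict[str, Any] | None = None,
    ) -> None:
        super().__init__(embeddings or [])
        self.options = options
        self.usage = usage
```

Because the container subclasses `list`, callers can index, iterate, and take `len()` of the result while still reading `options` and `usage` from the same object.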
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| python/samples/02-agents/embeddings/openai_embeddings.py | New sample demonstrating OpenAI embeddings usage. |
| python/samples/02-agents/embeddings/azure_openai_embeddings.py | New sample demonstrating Azure OpenAI embeddings usage. |
| python/packages/core/tests/workflow/test_full_conversation.py | Minor formatting-only updates in workflow tests. |
| python/packages/core/tests/openai/test_openai_embedding_client.py | New OpenAI/Azure OpenAI embedding client unit + integration tests. |
| python/packages/core/tests/core/test_embedding_types.py | New tests for embedding container/types. |
| python/packages/core/tests/core/test_embedding_client.py | New tests for embedding protocol + base client behavior. |
| python/packages/core/agent_framework/openai/_shared.py | Extends OpenAI settings to include embedding model env-var/config support. |
| python/packages/core/agent_framework/openai/_embedding_client.py | New OpenAI embedding client implementation (raw + telemetry-wrapped). |
| python/packages/core/agent_framework/openai/__init__.py | Exports OpenAIEmbeddingClient and OpenAIEmbeddingOptions. |
| python/packages/core/agent_framework/observability.py | Adds EmbeddingTelemetryLayer and embedding operation name. |
| python/packages/core/agent_framework/azure/_shared.py | Extends Azure OpenAI settings with embedding deployment name support. |
| python/packages/core/agent_framework/azure/_embedding_client.py | New Azure OpenAI embedding client implementation. |
| python/packages/core/agent_framework/azure/__init__.py | Lazy-export wiring for AzureOpenAIEmbeddingClient. |
| python/packages/core/agent_framework/_types.py | Adds core embedding types + type variables. |
| python/packages/core/agent_framework/_clients.py | Adds embedding protocol + base class abstractions. |
| python/packages/core/agent_framework/__init__.py | Re-exports new embedding abstractions/types at top level. |
| docs/features/vector-stores-and-embeddings/README.md | Adds the multi-phase migration/implementation plan and design notes. |
When encoding_format='base64' is used, the OpenAI API returns base64-encoded floats instead of a JSON array. Decode these automatically to list[float] so the return type stays consistent regardless of encoding format. Also adds a unit test for base64 decoding and fixes minor docstring/import issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
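The decoding step can be sketched as follows, assuming the base64 payload is a packed array of little-endian 32-bit floats. The function name `decode_base64_embedding` is hypothetical and is not the helper used in the PR.

```python
import base64
import struct


def decode_base64_embedding(data: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(data)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" is a 4-byte float.
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip example with values that are exactly representable in float32.
encoded = base64.b64encode(struct.pack("<3f", 1.0, -2.5, 0.25)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```

Decoding at the client boundary means callers always see `list[float]`, whichever encoding was requested.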
Embeddings have no output/completion tokens. Remove OUTPUT_TOKENS recording which was double-counting prompt_tokens via the total_tokens fallback. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
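A simplified sketch of the corrected behavior: only input tokens are recorded for an embeddings call. The helper `embedding_usage_attributes` and the shape of the `usage` dict are hypothetical; the attribute keys follow the OpenTelemetry GenAI semantic conventions, and the real telemetry layer sets them on a span rather than returning a dict.

```python
from __future__ import annotations


def embedding_usage_attributes(usage: dict[str, int]) -> dict[str, object]:
    """Build span attributes for an embeddings call from a usage dict."""
    attributes: dict[str, object] = {"gen_ai.operation.name": "embeddings"}
    prompt_tokens = usage.get("prompt_tokens")
    if prompt_tokens is not None:
        attributes["gen_ai.usage.input_tokens"] = prompt_tokens
    # Deliberately no gen_ai.usage.output_tokens: embeddings produce no
    # completion tokens, and falling back to total_tokens would just repeat
    # the prompt token count.
    return attributes


attrs = embedding_usage_attributes({"prompt_tokens": 8, "total_tokens": 8})
```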
Use contravariant/covariant TypeVars for SupportsGetEmbeddings Protocol. Combine nested if into single statement in telemetry layer. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GeneratedEmbeddings is invariant in its type param, so the Protocol TypeVar cannot be covariant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
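The variance constraint can be illustrated with a small sketch. A mutable `list` subclass is invariant in its element type, so a TypeVar that appears inside it in a Protocol's return type has to be invariant as well, while input and options TypeVars can remain contravariant. The method name `get_embeddings`, its signature, and the `FakeClient` class are assumptions for this example; variance is only enforced by static type checkers, not at runtime.

```python
from __future__ import annotations

import asyncio
from collections.abc import Sequence
from typing import Generic, Protocol, TypeVar, runtime_checkable

InputT = TypeVar("InputT", contravariant=True)  # only consumed by the method
EmbT = TypeVar("EmbT")  # invariant: it appears inside an invariant container
OptsT = TypeVar("OptsT", contravariant=True)  # only consumed by the method


class GeneratedEmbeddings(list[EmbT], Generic[EmbT]):
    """Simplified stand-in: a mutable list subclass, hence invariant in EmbT."""


@runtime_checkable
class SupportsGetEmbeddings(Protocol[InputT, EmbT, OptsT]):
    async def get_embeddings(
        self, values: Sequence[InputT], *, options: OptsT | None = None
    ) -> GeneratedEmbeddings[EmbT]: ...


class FakeClient:
    """Toy client that 'embeds' each string as its length."""

    async def get_embeddings(self, values, *, options=None):
        return GeneratedEmbeddings([[float(len(v))] for v in values])


result = asyncio.run(FakeClient().get_embeddings(["ab", "c"]))
```

Declaring `EmbT` with `covariant=True` would typically make a checker such as pyright reject the Protocol, because the TypeVar is used in an invariant position in the return type.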
Overview
This PR contains two parts:
1. Migration Plan (docs/features/vector-stores-and-embeddings/README.md)
A comprehensive 10-phase plan for porting all vector store and embedding abstractions from Semantic Kernel into Agent Framework. The plan covers all phases, from core abstractions through connectors and TextSearch.
Key design decisions are documented including naming conventions (SK → AF), generic type structure, Protocol + Base class pattern, and package organization.
2. Phase 1 Implementation — Core Embedding Abstractions
Core types (`_types.py`):
- `EmbeddingGenerationOptions` TypedDict
- `Embedding[EmbeddingT]` generic class with `model_id`, `dimensions`, `created_at`
- `GeneratedEmbeddings[EmbeddingT, EmbeddingOptionsT]` list container with `options`, `usage`
- `EmbeddingInputT` (default `str`) and `EmbeddingT` (default `list[float]`) TypeVars for multimodal support

Protocol + base class (`_clients.py`):
- `SupportsGetEmbeddings`: `Protocol[EmbeddingInputT, EmbeddingT, OptionsContraT]`
- `BaseEmbeddingClient`: `ABC, Generic[EmbeddingInputT, EmbeddingT, OptionsCoT]`

Telemetry (`observability.py`):
- `EmbeddingTelemetryLayer` with `gen_ai.operation.name = "embeddings"`

OpenAI implementation (`openai/_embedding_client.py`):
- `RawOpenAIEmbeddingClient`, `OpenAIEmbeddingClient`, `OpenAIEmbeddingOptions`
- `_ensure_client()` factory pattern, MRO-based telemetry

Azure OpenAI implementation (`azure/_embedding_client.py`):
- `AzureOpenAIEmbeddingClient` following `AzureOpenAIChatClient` pattern

Tests: 47 unit tests + 6 integration tests (gated behind `RUN_INTEGRATION_TESTS`)

Samples: `samples/02-agents/embeddings/` with OpenAI and Azure OpenAI examples

Verification
- `poe check` (fmt, lint, pyright)
- `poe test` (1782 tests in core)
- Cannot find module for google) remains