Python: feat(python): Add embedding abstractions and OpenAI implementation (Phase 1) #4153

Open
eavanvalkenburg wants to merge 7 commits into microsoft:main from eavanvalkenburg:feature/embedding-abstractions-phase1

@eavanvalkenburg (Member)

Overview

This PR contains two parts:

1. Migration Plan (docs/features/vector-stores-and-embeddings/README.md)

A comprehensive 10-phase plan for porting all vector store and embedding abstractions from Semantic Kernel into Agent Framework. The plan covers:

  • Core embedding abstractions (Phase 1 — this PR)
  • Embedding generators for existing providers (Phase 2)
  • Core vector store abstractions (Phase 3)
  • In-memory vector store (Phase 4)
  • 13+ vector store connectors (Phases 5-7)
  • CRUD tools for agents (Phase 8)
  • Additional embedding providers (Phase 9)
  • TextSearch abstractions (Phase 10)

Key design decisions are documented, including naming conventions (SK → AF), generic type structure, the Protocol + Base class pattern, and package organization.

2. Phase 1 Implementation — Core Embedding Abstractions

Core types (_types.py):

  • EmbeddingGenerationOptions TypedDict
  • Embedding[EmbeddingT] generic class with model_id, dimensions, created_at
  • GeneratedEmbeddings[EmbeddingT, EmbeddingOptionsT] list container with options, usage
  • EmbeddingInputT (default str) and EmbeddingT (default list[float]) TypeVars for multimodal support
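
As a rough illustration of how the core types listed above might fit together, here is a minimal sketch. It is not the code in _types.py: the `vector` field name, the option keys, the shape of `usage`, and the fallback that derives `dimensions` from the vector length are all assumptions, and the TypeVar defaults (str and list[float]) are omitted for brevity.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Generic, TypedDict, TypeVar

# Simplified: the PR's TypeVars carry defaults; plain TypeVars are used here.
EmbeddingT = TypeVar("EmbeddingT")
EmbeddingOptionsT = TypeVar("EmbeddingOptionsT")


class EmbeddingGenerationOptions(TypedDict, total=False):
    """Every key is optional (total=False). The key names are hypothetical."""

    model_id: str
    dimensions: int


@dataclass
class Embedding(Generic[EmbeddingT]):
    """One embedding plus metadata. The `vector` field name is an assumption."""

    vector: EmbeddingT
    model_id: str | None = None
    dimensions: int | None = None
    created_at: datetime | None = None

    def __post_init__(self) -> None:
        # Assumed convenience: infer dimensions from sized vectors when not given.
        if self.dimensions is None and hasattr(self.vector, "__len__"):
            self.dimensions = len(self.vector)  # type: ignore[arg-type]


class GeneratedEmbeddings(list[Embedding[EmbeddingT]], Generic[EmbeddingT, EmbeddingOptionsT]):
    """A list of embeddings that also carries the options used and token usage."""

    def __init__(
        self,
        embeddings: Iterable[Embedding[EmbeddingT]] | None = None,
        *,
        options: EmbeddingOptionsT | None = None,
        usage: dict[str, Any] | None = None,
    ) -> None:
        super().__init__(embeddings or [])
        self.options = options
        self.usage = usage


# Usage: the container behaves like a normal list but keeps request metadata.
batch = GeneratedEmbeddings(
    [Embedding(vector=[0.1, 0.2, 0.3], model_id="example-model")],
    options={"dimensions": 3},
    usage={"prompt_tokens": 4},
)
```

Subclassing list keeps indexing, iteration, and len() working unchanged, while options and usage travel alongside the vectors.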

Protocol + base class (_clients.py):

  • SupportsGetEmbeddings: Protocol[EmbeddingInputT, EmbeddingT, OptionsContraT]
  • BaseEmbeddingClient: ABC, Generic[EmbeddingInputT, EmbeddingT, OptionsCoT]
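
The Protocol + base class pairing above can be sketched roughly as follows. This is an illustration of the pattern, not the code in _clients.py: the `get_embeddings` method name and signature are assumptions inferred from the protocol's name, the options type parameter is left out, plain lists stand in for GeneratedEmbeddings, and `LengthEmbeddingClient` is a made-up toy implementation.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import Generic, Protocol, TypeVar, runtime_checkable

EmbeddingInputT = TypeVar("EmbeddingInputT")
EmbeddingT = TypeVar("EmbeddingT")
# Inputs are only consumed, so the protocol's input TypeVar can be contravariant.
EmbeddingInputContraT = TypeVar("EmbeddingInputContraT", contravariant=True)


@runtime_checkable
class SupportsGetEmbeddings(Protocol[EmbeddingInputContraT, EmbeddingT]):
    """Structural type: any object with a get_embeddings method conforms."""

    async def get_embeddings(self, values: Sequence[EmbeddingInputContraT]) -> list[EmbeddingT]: ...


class BaseEmbeddingClient(ABC, Generic[EmbeddingInputT, EmbeddingT]):
    """Nominal base class that implementations can inherit shared behavior from."""

    @abstractmethod
    async def get_embeddings(self, values: Sequence[EmbeddingInputT]) -> list[EmbeddingT]:
        """Return one embedding per input value."""


class LengthEmbeddingClient(BaseEmbeddingClient[str, list[float]]):
    """Toy client for illustration only: embeds each string as its length."""

    async def get_embeddings(self, values: Sequence[str]) -> list[list[float]]:
        return [[float(len(value))] for value in values]


client = LengthEmbeddingClient()
vectors = asyncio.run(client.get_embeddings(["hello", "hi"]))
```

Callers can type against the protocol, so third-party clients conform without inheriting from the base class, while built-in clients reuse whatever the base class provides.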

Telemetry (observability.py):

  • EmbeddingTelemetryLayer with gen_ai.operation.name = "embeddings"

OpenAI implementation (openai/_embedding_client.py):

  • RawOpenAIEmbeddingClient, OpenAIEmbeddingClient, OpenAIEmbeddingOptions
  • Uses _ensure_client() factory pattern, MRO-based telemetry

Azure OpenAI implementation (azure/_embedding_client.py):

  • AzureOpenAIEmbeddingClient following AzureOpenAIChatClient pattern
  • Supports API key, Entra ID credentials, env var configuration

Tests: 47 unit tests + 6 integration tests (gated behind RUN_INTEGRATION_TESTS)
Samples: samples/02-agents/embeddings/ with OpenAI and Azure OpenAI examples

Verification

  • All 22 packages pass poe check (fmt, lint, pyright)
  • All 22 packages pass poe test (1782 tests in core)
  • The only remaining failure is a pre-existing mypy crash (Cannot find module for google)

…hase 1)

This PR contains two parts:

1. **Overall migration plan** for porting vector stores and embeddings from
   Semantic Kernel to Agent Framework (docs/features/vector-stores-and-embeddings/README.md)
   covering all 10 phases from core abstractions through connectors and TextSearch.

2. **Phase 1 implementation** — core embedding abstractions and OpenAI/Azure OpenAI
   embedding clients:

   Core types (_types.py):
   - EmbeddingGenerationOptions TypedDict (total=False)
   - Embedding[EmbeddingT] generic class with model_id, dimensions, created_at
   - GeneratedEmbeddings[EmbeddingT, EmbeddingOptionsT] list container with options, usage
   - EmbeddingInputT (default str) and EmbeddingT (default list[float]) TypeVars

   Protocol + base class (_clients.py):
   - SupportsGetEmbeddings protocol — Generic[EmbeddingInputT, EmbeddingT, OptionsContraT]
   - BaseEmbeddingClient ABC — Generic[EmbeddingInputT, EmbeddingT, OptionsCoT]

   Telemetry (observability.py):
   - EmbeddingTelemetryLayer with gen_ai.operation.name = "embeddings"

   OpenAI implementation (openai/_embedding_client.py):
   - RawOpenAIEmbeddingClient, OpenAIEmbeddingClient, OpenAIEmbeddingOptions
   - Uses _ensure_client() factory pattern

   Azure OpenAI implementation (azure/_embedding_client.py):
   - AzureOpenAIEmbeddingClient following AzureOpenAIChatClient pattern
   - Supports API key, Entra ID credentials, env var configuration

   Tests:
   - 47 unit tests for types, protocol, base class, OpenAI, and Azure clients
   - 6 integration tests (gated behind RUN_INTEGRATION_TESTS + credentials)

   Samples:
   - samples/02-agents/embeddings/openai_embeddings.py
   - samples/02-agents/embeddings/azure_openai_embeddings.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 22, 2026 14:01
@markwallace-microsoft added the documentation and python labels Feb 22, 2026
@github-actions bot changed the title from "feat(python): Add embedding abstractions and OpenAI implementation (Phase 1)" to "Python: feat(python): Add embedding abstractions and OpenAI implementation (Phase 1)" Feb 22, 2026
@markwallace-microsoft (Member) commented Feb 22, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| packages/core/agent_framework | | | | |
| _clients.py | 86 | 3 | 96% | 298, 491, 493 |
| _types.py | 1024 | 87 | 91% | 59, 68–69, 123, 128, 147, 149, 153, 157, 159, 161, 163, 181, 185, 211, 233, 238, 243, 247, 273, 277, 626–627, 998, 1060, 1077, 1095, 1100, 1118, 1128, 1145–1146, 1148, 1166–1167, 1169, 1176–1177, 1179, 1214, 1225–1226, 1228, 1266, 1493, 1545, 1636–1641, 1663, 1668, 1834, 1846, 2089, 2098, 2119, 2214, 2439, 2646, 2716, 2728, 2746, 2944–2946, 2949–2951, 2955, 2960, 2964, 3048–3050, 3079, 3133, 3152–3153, 3156–3160, 3166 |
| observability.py | 652 | 83 | 87% | 354, 356–358, 361–363, 368–369, 375–376, 382–383, 390, 392–394, 397–399, 404–405, 411–412, 418–419, 426, 464, 555, 697, 700, 708–709, 712–715, 717, 720–722, 725–726, 754, 756, 767–769, 771–773, 777, 785, 886, 888, 1037, 1039, 1043–1048, 1050, 1053–1057, 1059, 1168–1169, 1171, 1228–1229, 1305, 1428, 1598, 1601, 1660, 1830, 1984, 1986 |
| packages/core/agent_framework/azure | | | | |
| _embedding_client.py | 21 | 0 | 100% | |
| _shared.py | 72 | 3 | 95% | 177, 191, 201 |
| packages/core/agent_framework/openai | | | | |
| _embedding_client.py | 51 | 1 | 98% | 90 |
| _shared.py | 127 | 16 | 87% | 65, 71–74, 151, 153, 155, 162, 164, 177, 252, 276, 335–336, 338 |
| TOTAL | 21410 | 3316 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 4243 | 246 💤 | 0 ❌ | 0 🔥 | 1m 17s ⏱️ |

eavanvalkenburg and others added 2 commits February 22, 2026 15:04
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Map OPENAI_EMBEDDING_MODEL_ID and AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME
from GitHub vars to the integration test environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI (Contributor) left a comment

Pull request overview

Adds the first phase of a broader “vector stores + embeddings” feature by introducing core embedding abstractions in agent_framework, plus OpenAI/Azure OpenAI embedding client implementations and supporting telemetry, tests, and samples.

Changes:

  • Introduces core embedding types (Embedding, GeneratedEmbeddings, EmbeddingGenerationOptions) and embedding client abstractions (SupportsGetEmbeddings, BaseEmbeddingClient).
  • Adds OpenAI + Azure OpenAI embedding clients and exports them via the provider namespaces.
  • Adds embedding-specific OpenTelemetry instrumentation plus new unit/integration tests and runnable samples.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| python/samples/02-agents/embeddings/openai_embeddings.py | New sample demonstrating OpenAI embeddings usage. |
| python/samples/02-agents/embeddings/azure_openai_embeddings.py | New sample demonstrating Azure OpenAI embeddings usage. |
| python/packages/core/tests/workflow/test_full_conversation.py | Minor formatting-only updates in workflow tests. |
| python/packages/core/tests/openai/test_openai_embedding_client.py | New OpenAI/Azure OpenAI embedding client unit + integration tests. |
| python/packages/core/tests/core/test_embedding_types.py | New tests for embedding container/types. |
| python/packages/core/tests/core/test_embedding_client.py | New tests for embedding protocol + base client behavior. |
| python/packages/core/agent_framework/openai/_shared.py | Extends OpenAI settings to include embedding model env-var/config support. |
| python/packages/core/agent_framework/openai/_embedding_client.py | New OpenAI embedding client implementation (raw + telemetry-wrapped). |
| python/packages/core/agent_framework/openai/__init__.py | Exports OpenAIEmbeddingClient and OpenAIEmbeddingOptions. |
| python/packages/core/agent_framework/observability.py | Adds EmbeddingTelemetryLayer and embedding operation name. |
| python/packages/core/agent_framework/azure/_shared.py | Extends Azure OpenAI settings with embedding deployment name support. |
| python/packages/core/agent_framework/azure/_embedding_client.py | New Azure OpenAI embedding client implementation. |
| python/packages/core/agent_framework/azure/__init__.py | Lazy-export wiring for AzureOpenAIEmbeddingClient. |
| python/packages/core/agent_framework/_types.py | Adds core embedding types + type variables. |
| python/packages/core/agent_framework/_clients.py | Adds embedding protocol + base class abstractions. |
| python/packages/core/agent_framework/__init__.py | Re-exports new embedding abstractions/types at top level. |
| docs/features/vector-stores-and-embeddings/README.md | Adds the multi-phase migration/implementation plan and design notes. |

eavanvalkenburg and others added 4 commits February 22, 2026 15:13
When encoding_format='base64' is used, the OpenAI API returns base64-encoded
floats instead of a JSON array. Decode these automatically to list[float]
so the return type stays consistent regardless of encoding format.

Also adds a unit test for base64 decoding and fixes minor docstring/import issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
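
The decoding step described in the commit above could look roughly like the sketch below. It assumes the base64 payload is a packed array of little-endian 32-bit floats; `decode_base64_embedding` is a hypothetical helper name, not necessarily what the PR uses.

```python
import base64
import struct


def decode_base64_embedding(data: str) -> list[float]:
    """Decode a base64 string of packed float32 values into a list of floats."""
    raw = base64.b64decode(data)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" = little-endian, "f" = 4-byte IEEE 754 float (assumed wire format).
    return list(struct.unpack(f"<{count}f", raw))


# Round trip using values that float32 represents exactly.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
decoded = decode_base64_embedding(encoded)  # [0.5, -1.0, 2.0]
```

Decoding at the client boundary keeps the return type list[float] whichever encoding_format the caller requested.
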
Embeddings have no output/completion tokens. Remove OUTPUT_TOKENS recording
which was double-counting prompt_tokens via the total_tokens fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
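
A minimal sketch of the resulting telemetry shape, under stated assumptions: the usage payload is a dict with prompt_tokens and total_tokens, and the attribute keys follow the OpenTelemetry GenAI semantic conventions. `embedding_usage_attributes` is a hypothetical helper, not the PR's EmbeddingTelemetryLayer.

```python
def embedding_usage_attributes(usage: dict[str, int]) -> dict[str, object]:
    """Build span attributes for an embeddings call from a usage payload."""
    attributes: dict[str, object] = {"gen_ai.operation.name": "embeddings"}
    prompt_tokens = usage.get("prompt_tokens")
    if prompt_tokens is not None:
        attributes["gen_ai.usage.input_tokens"] = prompt_tokens
    # Deliberately no gen_ai.usage.output_tokens: embeddings produce no
    # completion tokens, and deriving a value from total_tokens would
    # report the prompt tokens a second time.
    return attributes


attrs = embedding_usage_attributes({"prompt_tokens": 8, "total_tokens": 8})
```

For an embeddings response, total_tokens equals prompt_tokens, which is why any fallback to the total double-counts.
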
Use contravariant/covariant TypeVars for SupportsGetEmbeddings Protocol.
Combine nested if into single statement in telemetry layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GeneratedEmbeddings is invariant in its type param, so the Protocol
TypeVar cannot be covariant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
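
The variance constraint in the last commit can be illustrated with a small sketch. It uses a simplified stand-in for GeneratedEmbeddings (one type parameter, no options or usage), and `append_text` is a made-up function. Because the container is a mutable list, treating it as covariant would let the call below type-check even though it corrupts the batch, so type checkers treat it as invariant. Any TypeVar that appears inside it, such as in a protocol method's return type, must then be invariant too.

```python
from typing import Generic, TypeVar

EmbeddingT = TypeVar("EmbeddingT")


class GeneratedEmbeddings(list[EmbeddingT], Generic[EmbeddingT]):
    """Simplified stand-in: a mutable list subclass, like the PR's container."""


def append_text(batch: "GeneratedEmbeddings[object]") -> None:
    # Perfectly valid for a batch of arbitrary objects.
    batch.append("not a vector")


float_batch: "GeneratedEmbeddings[list[float]]" = GeneratedEmbeddings([[0.1, 0.2]])

# Under covariance a type checker would accept this call, because list[float]
# is a subtype of object. Invariance makes it a type error instead. At runtime
# nothing stops it, and the batch no longer holds only list[float] items.
append_text(float_batch)
```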