Python: Fix OpenAI chat client compatibility with third-party endpoints and OTel 0.4.14#4161

Merged
eavanvalkenburg merged 5 commits into microsoft:main from eavanvalkenburg:fix/1407-system-message-content-list
Feb 23, 2026

Conversation

@eavanvalkenburg
Member

@eavanvalkenburg commented on Feb 23, 2026

Problems Fixed

  1. System message content as list (#1407) — Some OpenAI-compatible endpoints (e.g. NVIDIA NIM) reject system messages when content is a list of content parts. They only accept plain string content for system/developer roles.

  2. Text-only content as list for all roles (#4084) — Foundry Local's Neutron/.NET backend cannot deserialize the list format for content at all, failing with a 500 JSON serialization error for any message role.

  3. OTel attribute removal (#4160) — opentelemetry-semantic-conventions-ai v0.4.14 removed several SpanAttributes.LLM_* attributes (LLM_SYSTEM, LLM_REQUEST_MODEL, LLM_RESPONSE_MODEL, LLM_REQUEST_MAX_TOKENS, LLM_REQUEST_TEMPERATURE, LLM_REQUEST_TOP_P), breaking observability at import time.

  4. Streaming text lost with usage data (#3434) — Gemini and other providers include both usage data and text content in the same streaming chunk. The early return on chunk.usage caused text and tool call parsing to be skipped entirely.

Changes

Message content flattening (_chat_client.py)

  • System/developer messages: Early return that flattens all text content into a plain string.
  • All other roles: Post-processing step that flattens text-only content lists to plain strings. Multimodal content (text + images/audio) remains as a list for the API.
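The two flattening rules above can be sketched as a single helper. This is a simplified illustration, not the actual `_prepare_message_for_openai()` implementation: the function name and the plain-dict message shape are assumptions, and only the system/developer vs. text-only distinction from the description is modeled.

```python
def flatten_message_content(message: dict) -> dict:
    """Illustrative sketch: flatten list-form content to a plain string
    where safe, per the rules described in this PR.

    - system/developer: always join text parts into one string.
    - other roles: flatten only when every part is text, so multimodal
      lists (text + images/audio) pass through unchanged.
    """
    content = message.get("content")
    if not isinstance(content, list):
        return message
    if message.get("role") in ("system", "developer"):
        # Join all text parts with newlines; these roles only accept strings
        # on some OpenAI-compatible endpoints (e.g. NVIDIA NIM).
        text = "\n".join(p["text"] for p in content if p.get("type") == "text")
        return {**message, "content": text}
    if all(p.get("type") == "text" for p in content):
        # Text-only list for any other role: flatten to a plain string.
        return {**message, "content": "\n".join(p["text"] for p in content)}
    return message  # multimodal content stays a list for the API
```

Multiple text parts are joined with newlines, matching the behavior the tests below assert.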

Streaming fix (_chat_client.py)

  • Removed early return in _parse_response_update_from_openai() when chunk.usage is present. Usage is now collected alongside text and tool call content in a single ChatResponseUpdate.
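The streaming change can be illustrated with a minimal parser. The function name, the dict-shaped update, and the attribute-style chunk are simplifications (the real code builds a `ChatResponseUpdate`); what matters is that usage is collected rather than returned early.

```python
def parse_chunk(chunk) -> dict:
    """Sketch of the fixed parsing: usage no longer short-circuits.

    Before the fix, `if chunk.usage: return ...` skipped text and
    tool-call parsing whenever a provider (e.g. Gemini) packed usage
    and content into the same chunk. Now everything lands in one update.
    """
    update = {"contents": [], "usage": None}
    if chunk.usage is not None:
        update["usage"] = chunk.usage        # collected, not returned early
    for choice in chunk.choices:             # still parsed even with usage present
        delta = choice.delta
        if delta.content:
            update["contents"].append(("text", delta.content))
        for tc in delta.tool_calls or []:
            update["contents"].append(("tool_call", tc))
    return update
```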

OTel attributes (observability.py)

  • Moved removed SpanAttributes.LLM_* values into the OtelAttr enum with their well-known gen_ai.* string values. Updated all references in source and tests.
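A sketch of what that enum slice might look like. The member names are assumptions (the PR does not list them), but the `gen_ai.*` string values are the well-known OpenTelemetry GenAI semantic-convention names the description refers to.

```python
from enum import Enum


class OtelAttr(str, Enum):
    """Hypothetical slice of the OtelAttr enum: attributes dropped from
    SpanAttributes in semconv-ai 0.4.14, pinned to their gen_ai.* values."""

    SYSTEM = "gen_ai.system"
    REQUEST_MODEL = "gen_ai.request.model"
    RESPONSE_MODEL = "gen_ai.response.model"
    REQUEST_MAX_TOKENS = "gen_ai.request.max_tokens"
    REQUEST_TEMPERATURE = "gen_ai.request.temperature"
    REQUEST_TOP_P = "gen_ai.request.top_p"
```

Because the enum subclasses `str`, members compare equal to their string values, so existing call sites that pass the attribute name straight to `span.set_attribute()` keep working.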

Tests

  • test_prepare_system_message_content_is_string
  • test_prepare_developer_message_content_is_string
  • test_prepare_system_message_multiple_text_contents_joined
  • test_prepare_user_message_text_content_is_string
  • test_prepare_user_message_multimodal_content_remains_list
  • test_prepare_assistant_message_text_content_is_string
  • test_streaming_chunk_with_usage_and_text
  • All observability tests updated and passing with both semconv-ai 0.4.13 and 0.4.14

Fixes #1407
Fixes #4160
Fixes #3434
Partially fixes #4084

Some OpenAI-compatible endpoints (e.g. NVIDIA NIM) reject system messages
when content is a list of content parts. This change flattens system and
developer message content to a plain string in the Chat Completions client.

Fixes microsoft#1407

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 23, 2026 09:03
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a compatibility issue where some OpenAI-compatible endpoints (e.g., NVIDIA NIM) reject system and developer messages when the content field is formatted as a list of content parts instead of a plain string. The fix modifies the _prepare_message_for_openai() method in the Chat Completions client to flatten system/developer message content to plain strings while preserving list format for user messages (needed for multimodal support).

Changes:

  • Modified _prepare_message_for_openai() to convert system/developer message content from list to string format
  • Multiple text content items are joined with newlines
  • Added comprehensive test coverage for the new behavior
  • Updated existing test to reflect the new string format for system messages

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/openai/_chat_client.py | Added early-return logic in _prepare_message_for_openai() to flatten system/developer message content to plain strings; also reformatted import statements |
| python/packages/core/tests/openai/test_openai_chat_client.py | Added 4 new tests for system/developer message string formatting and updated an existing test assertion to expect string content |

Version 0.4.14 removed several LLM_* attributes from SpanAttributes
(LLM_SYSTEM, LLM_REQUEST_MODEL, LLM_RESPONSE_MODEL, LLM_REQUEST_MAX_TOKENS,
LLM_REQUEST_TEMPERATURE, LLM_REQUEST_TOP_P, LLM_TOKEN_TYPE).

Move these to the OtelAttr enum with their well-known gen_ai.* string values
and update all references in observability.py and tests.

Fixes microsoft#4160

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eavanvalkenburg force-pushed the fix/1407-system-message-content-list branch from 67956eb to 2706e1b on February 23, 2026 09:44
@markwallace-microsoft
Member

markwallace-microsoft commented Feb 23, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/observability.py | 625 | 82 | 86% | 354, 356–358, 361–363, 368–369, 375–376, 382–383, 390, 392–394, 397–399, 404–405, 411–412, 418–419, 426, 464, 555, 697, 700, 708–709, 712–715, 717, 720–722, 725–726, 754, 756, 767–769, 771–773, 777, 785, 886, 888, 1037, 1039, 1043–1048, 1050, 1053–1057, 1059, 1168–1169, 1171, 1228–1229, 1364, 1534, 1537, 1596, 1766, 1920, 1922 |
| packages/core/agent_framework/openai/_chat_client.py | 281 | 24 | 91% | 210, 240–241, 245, 363, 370, 448–455, 457–460, 470, 548, 550, 566, 610, 626 |
| TOTAL | 21280 | 3317 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4196 | 240 💤 | 0 ❌ | 0 🔥 | 1m 12s ⏱️ |

eavanvalkenburg and others added 3 commits February 23, 2026 10:48
Extend the system/developer fix to all message roles. Text-only content
lists are now post-processed into plain strings, while multimodal content
(text + images/audio) remains as a list. This fixes compatibility with
OpenAI-like endpoints that cannot deserialize list content (e.g. Foundry
Local's Neutron backend).

Partially fixes microsoft#4084

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Some providers (e.g. Gemini) include both usage data and text content
in the same streaming chunk. The early return on chunk.usage caused
text and tool call parsing to be skipped entirely. Remove the early
return and process usage alongside text/tool calls.

Fixes microsoft#3434

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rename shadowed variable 'args' in system/developer branch to 'sys_args'
and rename loop variable 'content' to 'msg_content' to avoid type conflict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eavanvalkenburg changed the title from "Python: Fix system message content sent as list instead of string" to "Python: Fix OpenAI chat client compatibility with third-party endpoints and OTel 0.4.14" on Feb 23, 2026
@eavanvalkenburg added this pull request to the merge queue on Feb 23, 2026
Merged via the queue into microsoft:main with commit b1c7c7c Feb 23, 2026
25 checks passed
eavanvalkenburg added a commit to eavanvalkenburg/agent-framework that referenced this pull request Feb 23, 2026
…ts and OTel 0.4.14 (microsoft#4161)

* Fix system message content sent as list instead of string

Some OpenAI-compatible endpoints (e.g. NVIDIA NIM) reject system messages
when content is a list of content parts. This change flattens system and
developer message content to a plain string in the Chat Completions client.

Fixes microsoft#1407

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix compatibility with opentelemetry-semantic-conventions-ai 0.4.14

Version 0.4.14 removed several LLM_* attributes from SpanAttributes
(LLM_SYSTEM, LLM_REQUEST_MODEL, LLM_RESPONSE_MODEL, LLM_REQUEST_MAX_TOKENS,
LLM_REQUEST_TEMPERATURE, LLM_REQUEST_TOP_P, LLM_TOKEN_TYPE).

Move these to the OtelAttr enum with their well-known gen_ai.* string values
and update all references in observability.py and tests.

Fixes microsoft#4160

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Flatten text-only message content to string for all roles

Extend the system/developer fix to all message roles. Text-only content
lists are now post-processed into plain strings, while multimodal content
(text + images/audio) remains as a list. This fixes compatibility with
OpenAI-like endpoints that cannot deserialize list content (e.g. Foundry
Local's Neutron backend).

Partially fixes microsoft#4084

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix streaming text lost when usage data in same chunk

Some providers (e.g. Gemini) include both usage data and text content
in the same streaming chunk. The early return on chunk.usage caused
text and tool call parsing to be skipped entirely. Remove the early
return and process usage alongside text/tool calls.

Fixes microsoft#3434

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy errors in _chat_client.py

Rename shadowed variable 'args' in system/developer branch to 'sys_args'
and rename loop variable 'content' to 'msg_content' to avoid type conflict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Feb 24, 2026
…ation (Phase 1) (#4153)

* feat(python): Add embedding abstractions and OpenAI implementation (Phase 1)

This PR contains two parts:

1. **Overall migration plan** for porting vector stores and embeddings from
   Semantic Kernel to Agent Framework (docs/features/vector-stores-and-embeddings/README.md)
   covering all 10 phases from core abstractions through connectors and TextSearch.

2. **Phase 1 implementation** — core embedding abstractions and OpenAI/Azure OpenAI
   embedding clients:

   Core types (_types.py):
   - EmbeddingGenerationOptions TypedDict (total=False)
   - Embedding[EmbeddingT] generic class with model_id, dimensions, created_at
   - GeneratedEmbeddings[EmbeddingT, EmbeddingOptionsT] list container with options, usage
   - EmbeddingInputT (default str) and EmbeddingT (default list[float]) TypeVars

   Protocol + base class (_clients.py):
   - SupportsGetEmbeddings protocol — Generic[EmbeddingInputT, EmbeddingT, OptionsContraT]
   - BaseEmbeddingClient ABC — Generic[EmbeddingInputT, EmbeddingT, OptionsCoT]

   Telemetry (observability.py):
   - EmbeddingTelemetryLayer with gen_ai.operation.name = "embeddings"

   OpenAI implementation (openai/_embedding_client.py):
   - RawOpenAIEmbeddingClient, OpenAIEmbeddingClient, OpenAIEmbeddingOptions
   - Uses _ensure_client() factory pattern

   Azure OpenAI implementation (azure/_embedding_client.py):
   - AzureOpenAIEmbeddingClient following AzureOpenAIChatClient pattern
   - Supports API key, Entra ID credentials, env var configuration

   Tests:
   - 47 unit tests for types, protocol, base class, OpenAI, and Azure clients
   - 6 integration tests (gated behind RUN_INTEGRATION_TESTS + credentials)

   Samples:
   - samples/02-agents/embeddings/openai_embeddings.py
   - samples/02-agents/embeddings/azure_openai_embeddings.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Add AzureOpenAIEmbeddingClient to azure __init__.pyi stub

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* ci: Add embedding env vars to Python integration tests

Map OPENAI_EMBEDDING_MODEL_ID and AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME
from GitHub vars to the integration test environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Handle base64 encoding_format in OpenAI embedding client

When encoding_format='base64' is used, the OpenAI API returns base64-encoded
floats instead of a JSON array. Decode these automatically to list[float]
so the return type stays consistent regardless of encoding format.

Also adds a unit test for base64 decoding and fixes minor docstring/import issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Only record INPUT_TOKENS for embedding telemetry

Embeddings have no output/completion tokens. Remove OUTPUT_TOKENS recording
which was double-counting prompt_tokens via the total_tokens fallback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Resolve mypy variance error and lint warning

Use contravariant/covariant TypeVars for SupportsGetEmbeddings Protocol.
Combine nested if into single statement in telemetry layer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Make EmbeddingCoT invariant for mypy compatibility

GeneratedEmbeddings is invariant in its type param, so the Protocol
TypeVar cannot be covariant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Address PR review - empty values guard, service_url for telemetry

- Add early return for empty values in get_embeddings to avoid unnecessary API calls
- Add service_url() method to RawOpenAIEmbeddingClient for proper telemetry endpoint reporting
- Add test for empty values behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Python: Fix OpenAI chat client compatibility with third-party endpoints and OTel 0.4.14 (#4161)

* Fix system message content sent as list instead of string

Some OpenAI-compatible endpoints (e.g. NVIDIA NIM) reject system messages
when content is a list of content parts. This change flattens system and
developer message content to a plain string in the Chat Completions client.

Fixes #1407

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix compatibility with opentelemetry-semantic-conventions-ai 0.4.14

Version 0.4.14 removed several LLM_* attributes from SpanAttributes
(LLM_SYSTEM, LLM_REQUEST_MODEL, LLM_RESPONSE_MODEL, LLM_REQUEST_MAX_TOKENS,
LLM_REQUEST_TEMPERATURE, LLM_REQUEST_TOP_P, LLM_TOKEN_TYPE).

Move these to the OtelAttr enum with their well-known gen_ai.* string values
and update all references in observability.py and tests.

Fixes #4160

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Flatten text-only message content to string for all roles

Extend the system/developer fix to all message roles. Text-only content
lists are now post-processed into plain strings, while multimodal content
(text + images/audio) remains as a list. This fixes compatibility with
OpenAI-like endpoints that cannot deserialize list content (e.g. Foundry
Local's Neutron backend).

Partially fixes #4084

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix streaming text lost when usage data in same chunk

Some providers (e.g. Gemini) include both usage data and text content
in the same streaming chunk. The early return on chunk.usage caused
text and tool call parsing to be skipped entirely. Remove the early
return and process usage alongside text/tool calls.

Fixes #3434

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix mypy errors in _chat_client.py

Rename shadowed variable 'args' in system/developer branch to 'sys_args'
and rename loop variable 'content' to 'msg_content' to avoid type conflict.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* reorder imports

* fix: Use OtelAttr.REQUEST_MODEL instead of removed SpanAttributes.LLM_REQUEST_MODEL

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Add score_threshold to vector store plan

Reference SK .NET PR #13501 for score threshold filtering semantics.
Include score_threshold in SearchOptions from Phase 3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: Add reference to roji's SK .NET MEVD work for SQL connectors

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: Clear env vars in construction tests to avoid CI leakage

Tests for missing API key / model ID now use monkeypatch.delenv to ensure
env vars from the integration test environment don't prevent the expected
ValueError from being raised.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>