75 commits
64f8472
feat: add agent-framework-azure-contentunderstanding package
yungshinlintw Mar 21, 2026
1058963
fix: update CU fixtures with real API data, fix test assertions
yungshinlintw Mar 21, 2026
ac552d2
chore: add connector .gitignore, update uv.lock
yungshinlintw Mar 22, 2026
e654aad
refactor: rename to azure-ai-contentunderstanding, fix CI issues
yungshinlintw Mar 23, 2026
eed2fce
feat: add samples (document_qa, invoice_processing, multimodal_chat)
yungshinlintw Mar 23, 2026
d94f08a
feat: add remaining samples (devui_multimodal_agent, large_doc_file_s…
yungshinlintw Mar 23, 2026
bb53a0f
feat: add file_search integration for large document RAG
yungshinlintw Mar 23, 2026
e5e4137
fix: add key-based auth support to all samples
yungshinlintw Mar 23, 2026
ee86b61
FEATURE(python): add analyzer auto-detection, file_search RAG, and la…
yungshinlin Mar 24, 2026
34c1939
feat(cu): MIME sniffing, media-aware formatting, unified timeout, vec…
yungshinlin Mar 24, 2026
ba4bfa3
fix: merge all CU content segments for video/audio analysis
yungshinlin Mar 25, 2026
e276f76
refactor: improve CU context provider docs and remove ContentLimits
yungshinlintw Mar 25, 2026
90ace98
feat: support user-provided vector store in FileSearchConfig
yungshinlintw Mar 25, 2026
38b3aba
fix: remove ContentLimits from README code block
yungshinlintw Mar 25, 2026
c2b9b32
refactor: create CU client in __init__ instead of __aenter__
yungshinlintw Mar 25, 2026
2e6f1b4
docs: add file_search param to class docstring
yungshinlintw Mar 25, 2026
1e4b889
feat: introduce FileSearchBackend abstraction for cross-client support
yungshinlintw Mar 25, 2026
7c39752
refactor: FileSearchBackend abstraction + caller-owned vector store
yungshinlintw Mar 26, 2026
7fce742
fix: file_search reliability and sample improvements
yungshinlintw Mar 26, 2026
8ea7e13
perf: set max_num_results=10 for file_search to reduce token usage
yungshinlintw Mar 26, 2026
41a070c
fix: move import to top of file (E402 lint)
yungshinlintw Mar 26, 2026
44cfcbc
chore: remove unused imports
yungshinlintw Mar 26, 2026
628ad1c
fix: align azure-ai-contentunderstanding with MAF coding conventions
yungshinlin Mar 26, 2026
6076cb2
refactor: improve CU context provider API surface and fix CI
yungshinlin Mar 26, 2026
5cff2f7
fix: improve file_search samples and move tool guidelines to context …
yungshinlin Mar 26, 2026
3607c85
feat: improve source_id, integration tests, and content assertions
yungshinlin Mar 26, 2026
aaf97c2
feat: reject duplicate filenames, add integration tests and sample co…
yungshinlin Mar 26, 2026
0437e43
chore: improve doc key derivation, comments, and README
yungshinlin Mar 26, 2026
2081ed5
test: strengthen _format_result assertions with exact expected strings
yungshinlin Mar 26, 2026
f377b2c
refactor: move invoice.pdf to shared sample_assets directory
yungshinlin Mar 26, 2026
5e1c2e9
refactor: reorganize samples into numbered dirs and simplify auth
yungshinlin Mar 26, 2026
2cde5fc
fix: resolve CI lint errors (D205, RUF001, E501)
yungshinlin Mar 26, 2026
4981b35
refactor: overhaul samples — FoundryChatClient, sessions, remove get_…
yungshinlin Mar 27, 2026
f2cbc45
feat: add 05_background_analysis sample and fix 04 session/max_wait
yungshinlin Mar 27, 2026
9eb35c2
docs: update README and fix sample 06
yungshinlin Mar 27, 2026
c1eb370
docs: rewrite README — concise format, prerequisites, CU link
yungshinlin Mar 27, 2026
7e8e62a
fix: resolve pyright errors in _format_result segment cast
yungshinlin Mar 27, 2026
9fc9e4e
docs: add numbered section comments and fresh sample output to all sa…
yungshinlin Mar 27, 2026
4097328
feat: add load_settings support for env var configuration
yungshinlin Mar 27, 2026
2682bfc
docs: polish README — fix duplicate env var, add Next steps, service …
yungshinlin Mar 27, 2026
39b79c3
chore: trim invoice fixture from 199K to 33 lines
yungshinlin Mar 27, 2026
bdb4617
feat: per-file analyzer_id override via additional_properties
yungshinlin Mar 27, 2026
4a9196d
Trim PDF test fixture and clarify unique filename requirement
yungshinlin Mar 27, 2026
beed5cc
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
759d29c
Update python/packages/azure-ai-contentunderstanding/agent_framework_…
yungshinlintw Mar 27, 2026
dc01991
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
7222b6c
Update python/packages/azure-ai-contentunderstanding/samples/02-devui…
yungshinlintw Mar 27, 2026
53ab967
Update python/packages/azure-ai-contentunderstanding/samples/01-get-s…
yungshinlintw Mar 27, 2026
d5bb27d
Fix AGENTS.md to match implementation; remove unused variable in test…
yungshinlin Mar 27, 2026
618e4fe
Fix premature file_search instruction for background-completed docs
yungshinlin Mar 27, 2026
ad08891
fix: wrap long line in devui agent instructions (E501)
yungshinlin Mar 27, 2026
51a3be5
Fix Copilot review: unused logger, stray code in README, await cancel…
yungshinlin Mar 27, 2026
8c9777c
Sanitize doc keys and fix duplicate filename re-injection
yungshinlin Mar 27, 2026
a169efd
fix: add type annotation to tasks_to_cancel for pyright
yungshinlin Mar 27, 2026
5c06dfa
Move per-session mutable state to state dict for session isolation
yungshinlin Mar 27, 2026
2895202
Remove unused AnalysisSection enum values
yungshinlin Mar 27, 2026
958568b
Recursively flatten object/array field values for cleaner LLM output
yungshinlin Mar 27, 2026
f483b81
Preserve sub-field confidence; compare full expected JSON in tests
yungshinlin Mar 27, 2026
2e1dd6a
Remove incorrect MIME aliases (audio/mp4, video/x-matroska)
yungshinlin Mar 27, 2026
3ae3e66
feat: add AnalysisInput, content_range, warnings, and category support
yungshinlin Mar 27, 2026
b09f39c
fix: falsy-0 bug in duration calc; improve test coverage
yungshinlin Mar 27, 2026
c2e9cb4
refactor: split _context_provider.py into focused modules
yungshinlin Mar 27, 2026
5698d66
docs: update AGENTS.md with DocumentStatus, FileSearchBackend, and _f…
yungshinlin Mar 27, 2026
0cc0cde
refactor: replace AnalysisSection enum with Literal type for simpler DX
yungshinlin Apr 1, 2026
e3f684c
refactor: replace asyncio.Task with continuation tokens for serializa…
yungshinlin Apr 1, 2026
06d46b4
fix: resolve CI lint (RUF052) and mypy (call-overload) errors
yungshinlin Apr 1, 2026
d1b858c
feat: add structured output (Pydantic model) to invoice processing sa…
yungshinlin Apr 1, 2026
a39a0e6
fix: use FOUNDRY_PROJECT_ENDPOINT and FOUNDRY_MODEL env vars in all s…
yungshinlin Apr 1, 2026
331b3c1
refactor: remove background_analysis sample, use FoundryChatClient in…
yungshinlin Apr 1, 2026
c57c814
fix: vector_stores API moved from beta namespace in OpenAI SDK
yungshinlin Apr 1, 2026
493583d
docs: add comments about multi-file support and CU service limits in …
yungshinlin Apr 1, 2026
0b830ba
fix: broken markdown links after sample removal and renumbering
yungshinlin Apr 1, 2026
2c79c7a
fix: migrate BaseContextProvider to ContextProvider (non-deprecated)
yungshinlintw Apr 2, 2026
0c3db4d
fix: Message(text=) -> Message(contents=[]) for API compatibility
yungshinlin Apr 3, 2026
2c22cae
Merge branch 'main' into yslin/contentunderstanding-context-provider
yungshinlintw Apr 3, 2026
1 change: 1 addition & 0 deletions python/AGENTS.md
@@ -69,6 +69,7 @@ python/

### Azure Integrations
- [foundry](packages/foundry/README.md) - Microsoft Foundry chat, agent, memory, and embedding integrations
- [azure-ai-contentunderstanding](packages/azure-ai-contentunderstanding/AGENTS.md) - Azure Content Understanding context provider
- [azure-ai-search](packages/azure-ai-search/AGENTS.md) - Azure AI Search RAG
- [azure-cosmos](packages/azure-cosmos/AGENTS.md) - Azure Cosmos DB-backed history provider
- [azurefunctions](packages/azurefunctions/AGENTS.md) - Azure Functions hosting
3 changes: 3 additions & 0 deletions python/packages/azure-ai-contentunderstanding/.gitignore
@@ -0,0 +1,3 @@
# Local-only files (not committed)
_local_only/
*_local_only*
71 changes: 71 additions & 0 deletions python/packages/azure-ai-contentunderstanding/AGENTS.md
@@ -0,0 +1,71 @@
# AGENTS.md — azure-ai-contentunderstanding

## Package Overview

`agent-framework-azure-ai-contentunderstanding` integrates Azure Content Understanding (CU)
into the Agent Framework as a context provider. It automatically analyzes file attachments
(documents, images, audio, video) and injects structured results into the LLM context.

## Public API

| Symbol | Type | Description |
|--------|------|-------------|
| `ContentUnderstandingContextProvider` | class | Main context provider — extends `ContextProvider` |
| `AnalysisSection` | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| `DocumentStatus` | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
| `FileSearchBackend` | ABC | Abstract vector store file operations interface |
| `FileSearchConfig` | dataclass | Configuration for CU + vector store RAG mode |

## Architecture

- **`_context_provider.py`** — Main provider implementation. Overrides `before_run()` to detect
file attachments, call the CU API, manage session state with multi-document tracking,
and auto-register retrieval tools for follow-up turns.
- **Analyzer auto-detection** — When `analyzer_id=None` (default), `_resolve_analyzer_id()`
selects the CU analyzer based on media type prefix: `audio/` → `prebuilt-audioSearch`,
`video/` → `prebuilt-videoSearch`, everything else → `prebuilt-documentSearch`.
- **Multi-segment output** — CU splits long video/audio into multiple scene segments
(each a separate `contents[]` entry with its own `startTimeMs`, `endTimeMs`, `markdown`,
and `fields`). `_extract_sections()` produces:
  - `segments`: list of per-segment dicts, each with `markdown`, `fields`, `start_time_s`, `end_time_s`
  - `markdown`: concatenated at top level with `---` separators (for file_search uploads)
  - `duration_seconds`: computed from global `min(startTimeMs)` → `max(endTimeMs)`
  - Metadata (`kind`, `resolution`): taken from the first segment
- **Speaker diarization (not identification)** — CU transcripts label speakers as
`<Speaker 1>`, `<Speaker 2>`, etc. CU does **not** identify speakers by name.
- **file_search RAG** — When `FileSearchConfig` is provided, CU-extracted markdown is
uploaded to an OpenAI vector store and a `file_search` tool is registered on the context
instead of injecting the full document content. This enables token-efficient retrieval
for large documents.
- **`_models.py`** — `AnalysisSection` enum, `DocumentStatus` enum, `DocumentEntry` TypedDict,
`FileSearchConfig` dataclass.
- **`_file_search.py`** — `FileSearchBackend` ABC, `OpenAIFileSearchBackend`,
`FoundryFileSearchBackend`.
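
The multi-segment merge described above can be sketched roughly as follows. This is an illustration only, not the package's `_extract_sections()`: the function name `merge_segments`, the plain-dict input shape, and the exact separator whitespace are assumptions. Timestamps are compared against `None` explicitly because a start time of 0 ms is valid but falsy.

```python
from typing import Any


def merge_segments(contents: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge CU-style `contents[]` entries into one result (hypothetical helper)."""
    segments: list[dict[str, Any]] = []
    starts: list[int] = []
    ends: list[int] = []
    for item in contents:
        start_ms = item.get("startTimeMs")
        end_ms = item.get("endTimeMs")
        # Explicit None checks: 0 ms is a legitimate start time.
        if start_ms is not None:
            starts.append(start_ms)
        if end_ms is not None:
            ends.append(end_ms)
        segments.append({
            "markdown": item.get("markdown", ""),
            "fields": item.get("fields", {}),
            "start_time_s": None if start_ms is None else start_ms / 1000,
            "end_time_s": None if end_ms is None else end_ms / 1000,
        })

    result: dict[str, Any] = {
        "segments": segments,
        # Top-level markdown joins the non-empty segments with `---` separators.
        "markdown": "\n\n---\n\n".join(s["markdown"] for s in segments if s["markdown"]),
    }
    if starts and ends:
        # Duration spans the earliest start to the latest end across all segments.
        result["duration_seconds"] = (max(ends) - min(starts)) / 1000
    if contents:
        # Metadata is taken from the first segment only.
        for key in ("kind", "resolution"):
            if key in contents[0]:
                result[key] = contents[0][key]
    return result
```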

## Key Patterns

- Follows the Azure AI Search context provider pattern (same lifecycle, config style).
- Uses provider-scoped `state` dict for multi-document tracking across turns.
- Auto-registers `list_documents()` tool via `context.extend_tools()`.
- Configurable timeout (`max_wait`) with `asyncio.create_task()` background fallback.
- Strips supported binary attachments from `input_messages` to prevent LLM API errors.
- Explicit `analyzer_id` always overrides auto-detection (user preference wins).
- Vector store resources are cleaned up in `close()` / `__aexit__`.
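
As a minimal sketch of the override rule above (an explicit `analyzer_id` wins, otherwise the media type prefix decides), assuming the prefix map and default shown in `_constants.py`. The standalone function `resolve_analyzer_id` is a stand-in for illustration; the provider's own `_resolve_analyzer_id()` may differ in signature and detail.

```python
from typing import Optional

# Values copied from the constants in this PR.
MEDIA_TYPE_ANALYZER_MAP = {
    "audio/": "prebuilt-audioSearch",
    "video/": "prebuilt-videoSearch",
}
DEFAULT_ANALYZER = "prebuilt-documentSearch"


def resolve_analyzer_id(media_type: str, analyzer_id: Optional[str] = None) -> str:
    """Pick an analyzer: explicit ID first, then prefix match, then the default."""
    if analyzer_id is not None:
        return analyzer_id  # user preference always wins
    for prefix, analyzer in MEDIA_TYPE_ANALYZER_MAP.items():
        if media_type.startswith(prefix):
            return analyzer
    return DEFAULT_ANALYZER
```

With this shape, `resolve_analyzer_id("application/pdf", "prebuilt-invoice")` keeps the caller's choice, while `resolve_analyzer_id("video/mp4")` falls through to the video analyzer.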

## Samples

| Sample | Description |
|--------|-------------|
| `01_document_qa.py` | Upload a PDF via URL, ask questions about it |
| `02_multi_turn_session.py` | AgentSession persistence across turns |
| `03_multimodal_chat.py` | PDF + audio + video parallel analysis |
| `04_invoice_processing.py` | Structured field extraction with `prebuilt-invoice` analyzer |
| `05_large_doc_file_search.py` | CU extraction + OpenAI vector store RAG |
| `02-devui/01-multimodal_agent/` | DevUI web UI for CU-powered chat |
| `02-devui/02-file_search_agent/` | DevUI web UI combining CU + file_search RAG |

## Running Tests

```bash
uv run poe test -P azure-ai-contentunderstanding
```
21 changes: 21 additions & 0 deletions python/packages/azure-ai-contentunderstanding/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
127 changes: 127 additions & 0 deletions python/packages/azure-ai-contentunderstanding/README.md
@@ -0,0 +1,127 @@
# Get Started with Azure Content Understanding in Microsoft Agent Framework

Please install this package via pip:

```bash
pip install agent-framework-azure-ai-contentunderstanding --pre
```

## Azure Content Understanding Integration

### Prerequisites

Before using this package, you need an Azure Content Understanding resource:

1. An active **Azure subscription** ([create one for free](https://azure.microsoft.com/pricing/purchase-options/azure-account))
2. A **Microsoft Foundry resource** created in a [supported region](https://learn.microsoft.com/azure/ai-services/content-understanding/language-region-support)
3. **Default model deployments** configured for your resource (GPT-4.1, GPT-4.1-mini, text-embedding-3-large)

Follow the [prerequisites section](https://learn.microsoft.com/azure/ai-services/content-understanding/quickstart/use-rest-api?tabs=portal%2Cdocument&pivots=programming-language-rest#prerequisites) in the Azure Content Understanding quickstart for setup instructions.

### Introduction

The Azure Content Understanding integration provides a context provider that automatically analyzes file attachments (documents, images, audio, video) using [Azure Content Understanding](https://learn.microsoft.com/azure/ai-services/content-understanding/) and injects structured results into the LLM context.

- **Document & image analysis**: State-of-the-art OCR with markdown extraction, table preservation, and structured field extraction — handles scanned PDFs, handwritten content, and complex layouts
- **Audio & video analysis**: Transcription, speaker diarization, and per-segment summaries
- **Background processing**: Configurable timeout with async background fallback for large files
- **file_search integration**: Optional vector store upload for token-efficient RAG on large documents

> Learn more about Azure Content Understanding capabilities at [https://learn.microsoft.com/azure/ai-services/content-understanding/](https://learn.microsoft.com/azure/ai-services/content-understanding/)
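
The "configurable timeout with background fallback" behaviour can be pictured with plain `asyncio`. The sketch below is an assumption-laden illustration, not the provider's code: `run_with_max_wait` is a hypothetical helper, and the package's own mechanism for tracking unfinished analyses may differ. It waits up to `max_wait` seconds and, if the work is not done, hands back the still-running task so a later turn can pick up the result.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


async def run_with_max_wait(
    work: Awaitable[Any], max_wait: Optional[float]
) -> Tuple[Optional[Any], Optional[asyncio.Future]]:
    """Return (result, None) if work finishes in time, else (None, pending_task)."""
    task = asyncio.ensure_future(work)
    if max_wait is None:
        # No timeout: block until the work completes.
        return await task, None
    done, _pending = await asyncio.wait({task}, timeout=max_wait)
    if task in done:
        return task.result(), None
    # Timed out: the task keeps running in the background.
    return None, task
```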

### Basic Usage Example

See the [samples directory](samples/), which demonstrates:

- Single PDF upload and Q&A ([01_document_qa](samples/01-get-started/01_document_qa.py))
- Multi-turn sessions with cached results ([02_multi_turn_session](samples/01-get-started/02_multi_turn_session.py))
- PDF + audio + video parallel analysis ([03_multimodal_chat](samples/01-get-started/03_multimodal_chat.py))
- Structured field extraction with prebuilt-invoice ([04_invoice_processing](samples/01-get-started/04_invoice_processing.py))
- CU extraction + OpenAI vector store RAG ([05_large_doc_file_search](samples/01-get-started/05_large_doc_file_search.py))
- Interactive web UI with DevUI ([02-devui](samples/02-devui/))

```python
import asyncio
from agent_framework import Agent, AgentSession, Message, Content
from agent_framework.foundry import FoundryChatClient
from agent_framework_azure_ai_contentunderstanding import ContentUnderstandingContextProvider
from azure.identity import AzureCliCredential

credential = AzureCliCredential()

cu = ContentUnderstandingContextProvider(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    credential=credential,
    max_wait=None,  # block until CU extraction completes before sending to LLM
)

client = FoundryChatClient(
    project_endpoint="https://your-project.services.ai.azure.com",
    model="gpt-4.1",
    credential=credential,
)

async def main():
    async with cu:
        agent = Agent(
            client=client,
            name="DocumentQA",
            instructions="You are a helpful document analyst.",
            context_providers=[cu],
        )
        session = AgentSession()

        response = await agent.run(
            Message(role="user", contents=[
                Content.from_text("What's on this invoice?"),
                Content.from_uri(
                    "https://raw.githubusercontent.com/Azure-Samples/"
                    "azure-ai-content-understanding-assets/main/document/invoice.pdf",
                    media_type="application/pdf",
                    additional_properties={"filename": "invoice.pdf"},
                ),
            ]),
            session=session,
        )
        print(response.text)

asyncio.run(main())
```

### Supported File Types

| Category | Types |
|----------|-------|
| Documents | PDF, DOCX, XLSX, PPTX, HTML, TXT, Markdown |
| Images | JPEG, PNG, TIFF, BMP |
| Audio | WAV, MP3, M4A, FLAC, OGG |
| Video | MP4, MOV, AVI, WebM |

For the complete list of supported file types and size limits, see [Azure Content Understanding service limits](https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits).
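
Before a file is routed to CU, its media type has to be matched against the supported set. A rough sketch of that check follows, assuming the alias table from `_constants.py` and only a small subset of the supported types; the helper name `normalize_media_type` and its parameter handling are hypothetical and not part of the package API.

```python
from typing import Optional

# Small illustrative subset of the supported types; the full set is larger.
SUPPORTED = frozenset({"application/pdf", "audio/wav", "audio/flac", "video/mp4"})

# Aliases copied from the constants in this PR.
MIME_ALIASES = {
    "audio/x-wav": "audio/wav",
    "audio/x-flac": "audio/flac",
    "video/x-m4v": "video/mp4",
}


def normalize_media_type(raw: Optional[str]) -> Optional[str]:
    """Return the canonical media type if supported, else None (file is skipped)."""
    if not raw:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = raw.split(";", 1)[0].strip().lower()
    media_type = MIME_ALIASES.get(media_type, media_type)
    return media_type if media_type in SUPPORTED else None
```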

### Environment Variables

The provider supports automatic endpoint resolution from environment variables.
When `endpoint` is not passed to the constructor, it is loaded from
`AZURE_CONTENTUNDERSTANDING_ENDPOINT`:

```python
# Endpoint auto-loaded from AZURE_CONTENTUNDERSTANDING_ENDPOINT env var
cu = ContentUnderstandingContextProvider(credential=credential)
```

Set these in your shell or in a `.env` file:

```bash
AZURE_CONTENTUNDERSTANDING_ENDPOINT=https://your-cu-resource.cognitiveservices.azure.com/
FOUNDRY_PROJECT_ENDPOINT=https://your-project.services.ai.azure.com
FOUNDRY_MODEL=gpt-4.1
```

You also need to be logged in with `az login` (for `AzureCliCredential`).
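
Conceptually, the endpoint fallback amounts to "explicit argument first, environment variable second". The sketch below shows that order with the standard library only; `resolve_endpoint` is a hypothetical stand-in and does not reproduce the package's own settings loading.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "AZURE_CONTENTUNDERSTANDING_ENDPOINT"


def resolve_endpoint(
    endpoint: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> str:
    """Prefer the explicit endpoint, then the environment variable."""
    source = os.environ if env is None else env
    value = endpoint or source.get(ENV_VAR)
    if not value:
        raise ValueError(f"Pass endpoint= or set {ENV_VAR}.")
    return value
```

Passing `env` explicitly, as in `resolve_endpoint(env={"AZURE_CONTENTUNDERSTANDING_ENDPOINT": "https://example.invalid/"})`, keeps the lookup easy to test without touching the real process environment.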

### Next steps

- Explore the [samples directory](samples/) for complete code examples
- Read the [Azure Content Understanding documentation](https://learn.microsoft.com/azure/ai-services/content-understanding/) for detailed service information
- Learn more about the [Microsoft Agent Framework](https://aka.ms/agent-framework)
@@ -0,0 +1,28 @@
# Copyright (c) Microsoft. All rights reserved.

"""Azure Content Understanding integration for Microsoft Agent Framework.

Provides a context provider that analyzes file attachments (documents, images,
audio, video) using Azure Content Understanding and injects structured results
into the LLM context.
"""

import importlib.metadata

from ._context_provider import ContentUnderstandingContextProvider
from ._file_search import FileSearchBackend
from ._models import AnalysisSection, DocumentStatus, FileSearchConfig

try:
    __version__ = importlib.metadata.version(__name__)
except importlib.metadata.PackageNotFoundError:
    __version__ = "0.0.0"

__all__ = [
    "AnalysisSection",
    "ContentUnderstandingContextProvider",
    "DocumentStatus",
    "FileSearchBackend",
    "FileSearchConfig",
    "__version__",
]
@@ -0,0 +1,78 @@
# Copyright (c) Microsoft. All rights reserved.

"""Constants for Azure Content Understanding context provider.

Supported media types, MIME aliases, and analyzer mappings used by
the file detection and analysis pipeline.
"""

from __future__ import annotations

# MIME types used to match against the resolved media type for routing files to CU analysis.
# The media type may be provided via Content.media_type or inferred (e.g., via sniffing or filename)
# when missing or generic (such as application/octet-stream). Only files whose resolved media type is
# in this set will be processed; others are skipped.
#
# Supported input file types:
# https://learn.microsoft.com/azure/ai-services/content-understanding/service-limits#input-file-limits
SUPPORTED_MEDIA_TYPES: frozenset[str] = frozenset({
    # Documents and images
    "application/pdf",
    "image/jpeg",
    "image/png",
    "image/tiff",
    "image/bmp",
    "image/heif",
    "image/heic",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    # Text
    "text/plain",
    "text/html",
    "text/markdown",
    "text/rtf",
    "text/xml",
    "application/xml",
    "message/rfc822",
    "application/vnd.ms-outlook",
    # Audio
    "audio/wav",
    "audio/mpeg",
    "audio/mp3",
    "audio/mp4",
    "audio/m4a",
    "audio/flac",
    "audio/ogg",
    "audio/opus",
    "audio/webm",
    "audio/x-ms-wma",
    "audio/aac",
    "audio/amr",
    "audio/3gpp",
    # Video
    "video/mp4",
    "video/quicktime",
    "video/x-msvideo",
    "video/webm",
    "video/x-flv",
    "video/x-ms-wmv",
    "video/x-ms-asf",
    "video/x-matroska",
})

# Mapping from filetype's MIME output to our canonical SUPPORTED_MEDIA_TYPES values.
# filetype uses some x-prefixed variants that differ from our set.
MIME_ALIASES: dict[str, str] = {
    "audio/x-wav": "audio/wav",
    "audio/x-flac": "audio/flac",
    "video/x-m4v": "video/mp4",
}

# Mapping from media type prefix to the appropriate prebuilt CU analyzer.
# Used when analyzer_id is None (auto-detect mode).
MEDIA_TYPE_ANALYZER_MAP: dict[str, str] = {
    "audio/": "prebuilt-audioSearch",
    "video/": "prebuilt-videoSearch",
}
DEFAULT_ANALYZER: str = "prebuilt-documentSearch"