feat: add Microsoft Agent Framework integration by SuhaniNagpal7 · Pull Request #180 · future-agi/traceAI

SuhaniNagpal7 · 2026-05-25T12:37:49Z

Summary

Adds a new traceai-agent-framework package that wires Microsoft Agent Framework (Python) into FutureAGI's observability.

Microsoft Agent Framework emits OpenTelemetry spans natively on the "agent_framework" instrumentation scope using GenAI semantic conventions (gen_ai.*). FI's SpanAttributes is built on the same conventions, so most attributes pass through unchanged — this package only adds the few FI-specific keys that the framework doesn't emit (gen_ai.span.kind, input.value/output.value, flattened per-message attrs, derived total_tokens).
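The derived total_tokens key mentioned above can be illustrated with a small sketch. It assumes the framework reports `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` (the standard GenAI convention keys); the helper name `derive_total_tokens` is hypothetical and is not taken from this package.

```python
from typing import Any, Mapping, Optional

INPUT_TOKENS = "gen_ai.usage.input_tokens"
OUTPUT_TOKENS = "gen_ai.usage.output_tokens"
TOTAL_TOKENS = "gen_ai.usage.total_tokens"


def derive_total_tokens(attrs: Mapping[str, Any]) -> Optional[int]:
    """Return input + output tokens, or None when neither count is present."""
    if TOTAL_TOKENS in attrs:
        # A total is already on the span; keep it rather than recomputing.
        return attrs[TOTAL_TOKENS]
    prompt = attrs.get(INPUT_TOKENS)
    completion = attrs.get(OUTPUT_TOKENS)
    if prompt is None and completion is None:
        return None
    # A missing side is treated as zero (an assumption of this sketch).
    return int(prompt or 0) + int(completion or 0)
```

Returning None when no usage is present keeps the sketch from writing a misleading zero onto spans that never carried token counts.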

Design

Pattern: SpanProcessor (not method-wrap, not exporter-swap).

Most traceAI adapters use wrapt to patch framework methods and create their own spans. That doesn't work here because Microsoft already creates the spans — wrapping would duplicate them with broken parent links.

Instead, we install an AgentFrameworkSpanProcessor on the user's tracer provider. As each span ends, we read its gen_ai.* attributes, classify the span kind (LLM / EMBEDDING / TOOL / AGENT / CHAIN), and add the FI-specific keys before any downstream exporter sees the span.

Notable quirk handled: FI's TracerProvider.add_span_processor() drops its default exporter on first call (it considers the default replaceable). The integration prepends to the underlying multi-processor's tuple directly to preserve FI's exporter alongside our processor.
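A rough sketch of the tuple-prepend idea, using stand-in classes so it runs without any SDK installed. The `_active_span_processor` and `_span_processors` attribute names come from the test descriptions in this PR; `FakeProvider`, `FakeMultiProcessor`, and `prepend_processor` are hypothetical names, and a real implementation would presumably also need to respect whatever locking the provider uses.

```python
class FakeMultiProcessor:
    """Stand-in for the provider's active multi-processor (holds a tuple)."""

    def __init__(self, *processors):
        self._span_processors = tuple(processors)


class FakeProvider:
    """Stand-in for a tracer provider that ships with a default exporter."""

    def __init__(self, default_exporter_processor):
        self._active_span_processor = FakeMultiProcessor(default_exporter_processor)


def prepend_processor(provider, processor) -> bool:
    """Put `processor` first without touching processors already installed.

    Returns False when the processor is already registered, so repeated
    calls do not double-register it.
    """
    multi = provider._active_span_processor
    if processor in multi._span_processors:
        return False
    # Build a new tuple: ours first, then everything that was already there.
    multi._span_processors = (processor,) + multi._span_processors
    return True
```

Putting the mapping processor first means a synchronous processor later in the tuple sees the span only after the FI keys have been added.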

Chain bubble-up: Microsoft doesn't put input/output on workflow-level spans (workflow.run, executor.process, edge_group.process). Our processor maintains per-trace descendant-IO state so that when a CHAIN span ends, we bubble in the earliest descendant's input.value and the latest descendant's output.value. This is unique to this adapter — none of the comparable vendor integrations do it.
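The bubble-up bookkeeping might look roughly like the sketch below. It is a simplification: it tracks one record per trace and treats every span that ended earlier in the trace as a candidate descendant, whereas the actual processor may track parent links. All names here (`SpanIO`, `BubbleUpState`, and the method names) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, MutableMapping, Optional


@dataclass
class SpanIO:
    first_input: Optional[str] = None
    first_start_ns: Optional[int] = None
    last_output: Optional[str] = None
    last_end_ns: Optional[int] = None


class BubbleUpState:
    """Remembers the earliest input and latest output seen per trace."""

    def __init__(self) -> None:
        self._by_trace: Dict[int, SpanIO] = {}

    def record(self, trace_id: int, start_ns: int, end_ns: int,
               input_value: Optional[str], output_value: Optional[str]) -> None:
        io = self._by_trace.setdefault(trace_id, SpanIO())
        if input_value is not None and (
            io.first_start_ns is None or start_ns < io.first_start_ns
        ):
            io.first_input, io.first_start_ns = input_value, start_ns
        if output_value is not None and (
            io.last_end_ns is None or end_ns > io.last_end_ns
        ):
            io.last_output, io.last_end_ns = output_value, end_ns

    def bubble_into(self, trace_id: int, attrs: MutableMapping[str, str]) -> None:
        """Fill input.value / output.value on a CHAIN span if they are absent."""
        io = self._by_trace.get(trace_id)
        if io is None:
            return
        if io.first_input is not None:
            attrs.setdefault("input.value", io.first_input)
        if io.last_output is not None:
            attrs.setdefault("output.value", io.last_output)

    def forget(self, trace_id: int) -> None:
        """Drop state for a finished trace so memory does not grow unbounded."""
        self._by_trace.pop(trace_id, None)
```

Because child spans end before their parents, the state is already populated by the time a workflow-level span reaches the processor.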

What's traced

| Framework span | FI `gen_ai.span.kind` | Added FI attributes |
| --- | --- | --- |
| `chat {model}` | `LLM` | `input.value`, `output.value`, mime types, flattened messages, `gen_ai.usage.total_tokens` |
| `embeddings {model}` | `EMBEDDING` | same as LLM |
| `execute_tool {tool}` | `TOOL` | `input.value` (args), `output.value` (result), mime types |
| `invoke_agent {agent}`, `create_agent {agent}` | `AGENT` | message-flattening + total_tokens |
| `workflow.run`, `workflow.build`, `executor.process`, `edge_group.process`, `message.send` | `CHAIN` | kind + bubbled input/output from descendants |

All native gen_ai.* attributes are preserved alongside the FI additions — no destructive renames.
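As an illustration of the span-to-kind mapping listed above, here is a minimal sketch that classifies by the first token of the span name. This is an assumption made for brevity: the actual processor reads the span's gen_ai.* attributes and may not look at the name at all. `classify_span_kind` is a hypothetical helper.

```python
from typing import Optional

# Operation (first token of the span name) -> FI span kind.
_KIND_BY_OPERATION = {
    "chat": "LLM",
    "embeddings": "EMBEDDING",
    "execute_tool": "TOOL",
    "invoke_agent": "AGENT",
    "create_agent": "AGENT",
    "workflow.run": "CHAIN",
    "workflow.build": "CHAIN",
    "executor.process": "CHAIN",
    "edge_group.process": "CHAIN",
    "message.send": "CHAIN",
}


def classify_span_kind(span_name: str) -> Optional[str]:
    """Return the FI kind for a framework span name, or None if unrecognized."""
    operation = span_name.split(" ", 1)[0]
    return _KIND_BY_OPERATION.get(operation)
```

Returning None for unrecognized names lets a caller leave spans from other instrumentation scopes untouched.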

Public API

from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from agent_framework.observability import enable_instrumentation
from traceai_agent_framework import enable_fi_attribute_mapping

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="agent_framework_app",
    set_global_tracer_provider=True,
)
enable_instrumentation(enable_sensitive_data=True)
enable_fi_attribute_mapping()

Also exports AgentFrameworkSpanProcessor for users wiring their own tracer provider.

What's in this PR

| File | Purpose |
| --- | --- |
| `traceai_agent_framework/processor.py` | `AgentFrameworkSpanProcessor` + all attribute mapping logic |
| `traceai_agent_framework/integration.py` | Public `enable_fi_attribute_mapping(tracer_provider=None)` entry point |
| `traceai_agent_framework/__init__.py`, `version.py` | Exports + version |
| `examples/basic_agent.py` | Single Agent + OpenAI chat client |
| `examples/workflow_multi_agent.py` | Two agents handing off through `WorkflowBuilder` |
| `examples/requirements.txt` | Example deps |
| `tests/test_processor.py` | 56 unit tests on the mapping + processor logic |
| `tests/test_integration.py` | 14 lifecycle tests for `enable_fi_attribute_mapping()` |
| `tests/test_e2e_agent.py` | 4 end-to-end tests against real `Agent.run()` + `WorkflowBuilder.run()` |
| `tests/_fixtures/sample_spans.json` | Real captured spans used as test fixtures |
| `README.md`, `CHANGELOG.md` | Docs, matching the openai/anthropic style |
| `pyproject.toml` | Poetry config, `python = ">=3.10,<3.15"`, `fi-instrumentation-otel >=0.1.14` |

Test plan

- `cd python/frameworks/agent-framework && pytest tests/ -q` reports 87 passed (56 processor + 14 integration + 4 e2e + 13 ancillary)
- `pip install -e python/frameworks/agent-framework` then `python -c "from traceai_agent_framework import enable_fi_attribute_mapping; print('ok')"` succeeds
- Run examples/basic_agent.py with OPENAI_API_KEY + FI_API_KEY + FI_SECRET_KEY set; verified in the FI dashboard:
  - One invoke_agent span (type: agent) carrying input/output, model, token counts
  - Nested chat gpt-4o-mini (type: llm) and execute_tool get_weather (type: tool) spans under it
  - Input/Output panels populated with the user prompt + assistant response
  - Agent Graph populated with per-kind node IDs (agent_*, llm_*, tool_*)
- Run examples/workflow_multi_agent.py; workflow.run (type: chain) shows bubbled input/output from the inner agent spans
- pyproject.toml declares `python = ">=3.10,<3.15"` (matches fi-instrumentation-otel constraint)

Test coverage added in `3b378fc`

Tests added on top of the original test file to cover the gap-closures and bug fixes from c9a072b:

test_processor.py (+30 tests, 56 total)
- Smart plain-text vs JSON formatting (single-text-message → plain text; multi-message → JSON)
- tool_call_response part extraction on AGENT spans (was missing in v1)
- Per-kind graph.node.id derivation (LLM uses response_id, AGENT uses agent.id, TOOL uses tool.name+call_id, CHAIN walks workflow.id/executor.id/edge_group.id)
- gen_ai.request.parameters JSON bundling (excludes gen_ai.request.model, dashboard's "Model Parameters" panel)
- metadata catch-all for 4 specific keys (choice.count, server.address, function.invocation.duration, function.name)
- Cross-batch bubble-up _SpanIO state cleanup
- _looks_like_pipecat_span heuristic guard
test_integration.py (+6 tests, 14 total)
- _active_span_processor._span_processors tuple-prepend preserves FI's default exporter (the bug behind the "no dashboard data" symptom)
- enable_instrumentation() is not re-called when already on, preserving caller's enable_sensitive_data=True
- Idempotent install (calling twice doesn't double-register)
- Custom tracer_provider argument path
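The gen_ai.request.parameters bundling covered by these tests could be sketched as follows. The exclusion of `gen_ai.request.model` and the skip-when-empty behavior follow the test descriptions; stripping the `gen_ai.request.` prefix from the bundled keys is an assumption of this sketch, and `bundle_request_parameters` is a hypothetical name.

```python
import json
from typing import Any, Mapping, Optional

_PREFIX = "gen_ai.request."
# The model has its own field, and the bundle key must not include itself.
_EXCLUDED = {"gen_ai.request.model", "gen_ai.request.parameters"}


def bundle_request_parameters(attrs: Mapping[str, Any]) -> Optional[str]:
    """Collect gen_ai.request.* attributes into one JSON string.

    Returns None when there is nothing to bundle, so the caller can skip
    setting the attribute entirely.
    """
    params = {
        key[len(_PREFIX):]: value
        for key, value in attrs.items()
        if key.startswith(_PREFIX) and key not in _EXCLUDED
    }
    return json.dumps(params, sort_keys=True) if params else None
```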

Dashboard verification

Verified end-to-end against the live FI dashboard with the basic_agent.py example:

1. Trace list — invoke_agent weather_agent row with input/output snippet

2. AGENT span detail — User Message / Input / Output panels populated, gen_ai.* + FI attrs side-by-side

3. LLM child span (chat gpt-4o-mini) — type: llm, model, tokens, usage

4. TOOL child span (execute_tool get_weather) — input args + real wttr.in output (paris: ☀️ +24°C)

5. Agent Graph — per-kind node IDs (agent_<id>, llm_<resp_id>, tool_<name>_<call_id>) rendering the start → agent → llm → tool → llm → end path

6. Attributes panel — full gen_ai.* keys retained alongside FI's input.value / output.value / graph.node.id / metadata

7. Parts output/input format
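The per-kind node IDs shown in the Agent Graph (agent_<id>, llm_<resp_id>, tool_<name>_<call_id>) could be derived along these lines. The `gen_ai.*` attribute keys are the standard GenAI convention names and are assumed to be what the framework emits; the `chain_` prefix and the function name `derive_graph_node_id` are hypothetical, since the PR does not show the CHAIN ID format.

```python
from typing import Any, Mapping, Optional


def derive_graph_node_id(kind: str, attrs: Mapping[str, Any]) -> Optional[str]:
    """Build a graph node ID for a span of the given FI kind."""
    if kind == "LLM":
        response_id = attrs.get("gen_ai.response.id")
        return f"llm_{response_id}" if response_id else None
    if kind == "AGENT":
        agent_id = attrs.get("gen_ai.agent.id")
        return f"agent_{agent_id}" if agent_id else None
    if kind == "TOOL":
        name = attrs.get("gen_ai.tool.name")
        call_id = attrs.get("gen_ai.tool.call.id")
        return f"tool_{name}_{call_id}" if name and call_id else None
    if kind == "CHAIN":
        # Walk the fallback chain: workflow, then executor, then edge group.
        for key in ("workflow.id", "executor.id", "edge_group.id"):
            value = attrs.get(key)
            if value:
                return f"chain_{value}"  # prefix is an assumption
    return None
```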

Empty file skeleton for the Microsoft Agent Framework integration under python/frameworks/agent-framework. Layout mirrors traceai_pipecat (exporter-swap pattern) with integration.py for the public swap API and exporters/ for the mapped HTTP/gRPC exporters.

Installs an AgentFrameworkSpanProcessor on the user's tracer provider to re-key the framework's native gen_ai.* spans into Future AGI conventions (gen_ai.span.kind, input.value/output.value with mime types, flattened per-message attrs, derived total_tokens). For chain spans the framework leaves without I/O (workflow.run, executor.process, etc.), bubbles input/output up from the earliest/latest descendant. Public API: enable_fi_attribute_mapping(tracer_provider=None) and the AgentFrameworkSpanProcessor class.

Mapping additions:
- Extract tool_call_response and reasoning parts in message flattening; tool result text now lands in messages.{i}.message.content + messages.{i}.message.tool_call_id, and reasoning parts join into message.content
- Surface graph.node.id / graph.node.name per kind (LLM/AGENT/TOOL/CHAIN), enabling the FI dashboard's agent-graph view to render workflow / executor / edge_group nodes
- Bundle gen_ai.request.parameters JSON so the dashboard's "Model Parameters" panel renders for LLM/AGENT/EMBEDDING spans
- Bundle a small metadata JSON (choice.count, server.address, agent_framework.function.*) for dashboard "Metadata" panel
- Smart plain-text formatting for input.value / output.value when there is a single text-only message; keep raw JSON when the message structure is complex

Integration / bug fixes:
- Preserve FI's default exporter processor: install our SpanProcessor by prepending to the active multi-processor's tuple instead of calling provider.add_span_processor, which FI's TracerProvider wipes on first call
- Don't clobber the user's enable_sensitive_data choice: only call enable_instrumentation() when it isn't already on; calling with no kwargs would otherwise re-read ENABLE_SENSITIVE_DATA from env and silently flip it off

Examples:
- basic_agent.py now uses a real get_weather tool that calls wttr.in via urllib (no API key needed); demonstrates a full LLM + tool + LLM trace end-to-end

Packaging:
- Move agent-framework from dev-only dep to runtime dep so pip install traceAI-agent-framework also installs the framework, matching anthropic/openai/crewai etc.

Existing test updates reflect the new behavior (plain text input/output for the single-message case).
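A sketch of the message flattening described in this commit, under stated assumptions: messages arrive as a JSON list of `{"role", "parts"}` objects, text and reasoning parts carry a `content` field, and tool_call_response parts carry `id` and `response` fields. The `messages.{i}.message.content` and `messages.{i}.message.tool_call_id` key shapes come from the commit message; the `.role` key, the newline join, and the name `flatten_messages` are hypothetical.

```python
import json
from typing import Any, Dict


def flatten_messages(messages_json: str, prefix: str = "messages") -> Dict[str, Any]:
    """Flatten a JSON message list into per-message span attributes."""
    flat: Dict[str, Any] = {}
    for i, message in enumerate(json.loads(messages_json)):
        base = f"{prefix}.{i}.message"
        flat[f"{base}.role"] = message.get("role")
        texts = []
        for part in message.get("parts", []):
            part_type = part.get("type")
            if part_type in ("text", "reasoning"):
                # Reasoning parts join into the same content string as text.
                texts.append(str(part.get("content", "")))
            elif part_type == "tool_call_response":
                response = part.get("response")
                texts.append(response if isinstance(response, str) else json.dumps(response))
                flat[f"{base}.tool_call_id"] = part.get("id")
        if texts:
            flat[f"{base}.content"] = "\n".join(texts)
    return flat
```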

+30 tests, total 87 passing in <1s. Each test directly protects against a regression in something the polish round added or a bug that was caught during real-dashboard verification.

test_processor.py (+23):
- Multi-part message flatten: tool_call, tool_call_response, reasoning, mixed text+tool_call
- graph.node.* per kind (LLM/AGENT/TOOL + workflow/executor/edge_group fallback chain for CHAIN)
- gen_ai.request.parameters bundle (collected, excludes model, skipped when empty)
- Cross-batch bubble-up + per-trace state isolation
- embeddings -> EMBEDDING kind, create_agent -> AGENT kind
- Smart formatting branches (plain text vs JSON for input.value/output.value)
- Status is never set by the processor (Phase 0 finding)
- Defensive: span with None attributes, span with MappingProxyType attrs

test_integration.py (+7):
- FI's default exporter processor survives our install (the silent-span-loss bug)
- Our processor is prepended so synchronous downstream processors see mutated attrs
- User's explicit enable_sensitive_data=True is not clobbered (the silent-message-loss bug)
- Native instrumentation skipped gracefully when agent_framework is not installed

…ut values

Drop the plain-text downgrade in _surface_messages_io. input.value and output.value now always mirror the raw gen_ai.input.messages / gen_ai.output.messages JSON blob with mime_type=application/json, preserving Microsoft's native parts shape (tool_call / tool_call_response / text). Matches the openai/anthropic/litellm sibling adapters' pattern.

Removes the _all_text_message_content helper that was only used by the dropped path. Updates 5 tests to assert the JSON-always behavior.
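The JSON-always behavior described in this commit might reduce to something like the sketch below. It is not the package's `_surface_messages_io`; the function name `surface_messages_io` and the `input.mime_type` / `output.mime_type` key names are assumptions, since the commit only says mime_type=application/json.

```python
import json
from typing import Any, Dict, Mapping


def surface_messages_io(attrs: Mapping[str, Any]) -> Dict[str, str]:
    """Mirror the raw gen_ai message blobs into input.value / output.value."""
    added: Dict[str, str] = {}
    for side in ("input", "output"):
        raw = attrs.get(f"gen_ai.{side}.messages")
        if raw is None:
            continue
        # Pass a JSON string through untouched; serialize anything else.
        added[f"{side}.value"] = raw if isinstance(raw, str) else json.dumps(raw)
        added[f"{side}.mime_type"] = "application/json"
    return added
```

Passing the blob through unchanged preserves the native parts shape, so tool_call and tool_call_response parts remain visible downstream.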

Suhani Nagpal added 2 commits May 22, 2026 15:44

SuhaniNagpal7 requested a review from nik13 May 25, 2026 12:39

SuhaniNagpal7 marked this pull request as draft May 25, 2026 12:55

Suhani Nagpal added 2 commits May 26, 2026 13:49

SuhaniNagpal7 marked this pull request as ready for review May 26, 2026 09:01

SuhaniNagpal7 wants to merge 5 commits into dev from feat/agent-framework
