feat: add Microsoft Agent Framework integration #180

Open
SuhaniNagpal7 wants to merge 5 commits into dev from feat/agent-framework

Conversation

SuhaniNagpal7 (Contributor) commented May 25, 2026

Summary

Adds a new traceai-agent-framework package that wires Microsoft Agent Framework (Python) into FutureAGI's observability.

Microsoft Agent Framework emits OpenTelemetry spans natively on the "agent_framework" instrumentation scope using GenAI semantic conventions (gen_ai.*). FI's SpanAttributes is built on the same conventions, so most attributes pass through unchanged — this package only adds the few FI-specific keys that the framework doesn't emit (gen_ai.span.kind, input.value/output.value, flattened per-message attrs, derived total_tokens).

Design

Pattern: SpanProcessor (not method-wrap, not exporter-swap).

Most traceAI adapters use wrapt to patch framework methods and create their own spans. That doesn't work here because Microsoft already creates the spans — wrapping would duplicate them with broken parent links.

Instead, we install an AgentFrameworkSpanProcessor on the user's tracer provider. As each span ends, we read its gen_ai.* attributes, classify the span kind (LLM / EMBEDDING / TOOL / AGENT / CHAIN), and add the FI-specific keys before any downstream exporter sees the span.
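The classification step can be sketched as a lookup on the span name. This is a minimal illustration, not the code in processor.py: the kinds and span names come from this PR's "What's traced" table, while the helper names (`classify_span_kind`, `enrich`) and the choice to match on the first token of the span name are assumptions.

```python
from typing import Optional

# Operation name -> FI span kind, as listed in the "What's traced" table.
_KIND_BY_OPERATION = {
    "chat": "LLM",
    "embeddings": "EMBEDDING",
    "execute_tool": "TOOL",
    "invoke_agent": "AGENT",
    "create_agent": "AGENT",
    "workflow.run": "CHAIN",
    "workflow.build": "CHAIN",
    "executor.process": "CHAIN",
    "edge_group.process": "CHAIN",
    "message.send": "CHAIN",
}


def classify_span_kind(span_name: str) -> Optional[str]:
    """Return the FI span kind for a framework span name, or None if unknown.

    Assumption: the operation is the first whitespace-separated token,
    e.g. "chat gpt-4o-mini" -> "chat".
    """
    parts = span_name.split(maxsplit=1)
    if not parts:
        return None
    return _KIND_BY_OPERATION.get(parts[0])


def enrich(span_name: str, attributes: dict) -> dict:
    """Return a copy of the attributes with gen_ai.span.kind added when known."""
    enriched = dict(attributes)
    kind = classify_span_kind(span_name)
    if kind is not None:
        enriched["gen_ai.span.kind"] = kind
    return enriched
```

Because the function only adds a key to a copy, the native gen_ai.* attributes stay untouched, which matches the "no destructive renames" goal of the adapter.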

Notable quirk handled: FI's TracerProvider.add_span_processor() drops its default exporter on first call (it considers the default replaceable). The integration prepends to the underlying multi-processor's tuple directly to preserve FI's exporter alongside our processor.
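The tuple-prepend can be illustrated with stand-in objects. The sketch below does not import FI or OpenTelemetry: `FakeMultiProcessor`, `DefaultExporterProcessor`, `MappingProcessor`, and `prepend_processor` are hypothetical names, and the `_lock` attribute is an assumption about the multi-processor. Only the `_active_span_processor._span_processors` attribute path is taken from this PR.

```python
import threading
from types import SimpleNamespace


class FakeMultiProcessor:
    """Stand-in for the provider's multi-processor: holds an immutable tuple."""

    def __init__(self, *processors):
        self._span_processors = tuple(processors)
        self._lock = threading.Lock()  # assumed to exist on the real object


class DefaultExporterProcessor:
    """Stand-in for the processor that wraps FI's default exporter."""


class MappingProcessor:
    """Stand-in for AgentFrameworkSpanProcessor."""


def prepend_processor(provider, processor) -> bool:
    """Prepend `processor` unless one of the same type is already installed.

    Returns True when the processor was installed, False when it was
    already present (so a second call does not double-register).
    """
    multi = provider._active_span_processor
    with multi._lock:
        if any(isinstance(p, type(processor)) for p in multi._span_processors):
            return False
        # Build a new tuple instead of calling add_span_processor(), so the
        # existing default-exporter processor is kept.
        multi._span_processors = (processor, *multi._span_processors)
        return True


provider = SimpleNamespace(
    _active_span_processor=FakeMultiProcessor(DefaultExporterProcessor())
)
first = prepend_processor(provider, MappingProcessor())
second = prepend_processor(provider, MappingProcessor())
installed = [type(p).__name__ for p in provider._active_span_processor._span_processors]
```

Putting the mapping processor first means any synchronous processor later in the tuple sees the already-enriched attributes.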

Chain bubble-up: Microsoft doesn't put input/output on workflow-level spans (workflow.run, executor.process, edge_group.process). Our processor maintains per-trace descendant-IO state so that when a CHAIN span ends, we bubble in the earliest descendant's input.value and the latest descendant's output.value. This is unique to this adapter — none of the comparable vendor integrations do it.
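A simplified sketch of the bubble-up bookkeeping follows. It is an illustration under stated assumptions rather than the processor's actual state machine: `TraceIO` and `BubbleUpState` are hypothetical names, and the state is keyed per trace only, without the ancestry check a real implementation would need to restrict candidates to true descendants.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TraceIO:
    """Earliest input and latest output seen among ended spans of one trace."""

    first_start: Optional[int] = None
    first_input: Optional[str] = None
    last_end: Optional[int] = None
    last_output: Optional[str] = None


class BubbleUpState:
    def __init__(self) -> None:
        self._by_trace: Dict[str, TraceIO] = {}

    def record(self, trace_id: str, start_ns: int, end_ns: int, attrs: dict) -> None:
        """Remember I/O from a span that just ended (children end before parents)."""
        io = self._by_trace.setdefault(trace_id, TraceIO())
        value = attrs.get("input.value")
        if value is not None and (io.first_start is None or start_ns < io.first_start):
            io.first_start, io.first_input = start_ns, value
        value = attrs.get("output.value")
        if value is not None and (io.last_end is None or end_ns > io.last_end):
            io.last_end, io.last_output = end_ns, value

    def bubble_into(self, trace_id: str, attrs: dict) -> dict:
        """Return a copy of a CHAIN span's attributes with missing I/O filled in."""
        out = dict(attrs)
        io = self._by_trace.get(trace_id)
        if io is None:
            return out
        if io.first_input is not None:
            out.setdefault("input.value", io.first_input)
        if io.last_output is not None:
            out.setdefault("output.value", io.last_output)
        return out

    def forget(self, trace_id: str) -> None:
        """Drop per-trace state so it does not leak across traces."""
        self._by_trace.pop(trace_id, None)


state = BubbleUpState()
state.record("t1", 10, 20, {"input.value": "Weather in Paris?", "output.value": "calling tool"})
state.record("t1", 30, 40, {"input.value": "tool result", "output.value": "Sunny in Paris"})
chain_attrs = state.bubble_into("t1", {"gen_ai.span.kind": "CHAIN"})
state.forget("t1")
after_cleanup = state.bubble_into("t1", {"gen_ai.span.kind": "CHAIN"})
```

Using `setdefault` keeps any input.value or output.value the chain span already carries, so bubbling only fills gaps.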

What's traced

| Framework span | FI gen_ai.span.kind | Added FI attributes |
| --- | --- | --- |
| chat {model} | LLM | input.value, output.value, mime types, flattened messages, gen_ai.usage.total_tokens |
| embeddings {model} | EMBEDDING | same as LLM |
| execute_tool {tool} | TOOL | input.value (args), output.value (result), mime types |
| invoke_agent {agent}, create_agent {agent} | AGENT | message-flattening + total_tokens |
| workflow.run, workflow.build, executor.process, edge_group.process, message.send | CHAIN | kind + bubbled input/output from descendants |

All native gen_ai.* attributes are preserved alongside the FI additions — no destructive renames.
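As one example of an additive attribute, the derived total can be sketched as below. The input keys `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` are the GenAI semantic-convention names; the helper name `with_total_tokens` and the rule of skipping the total when either count is missing are assumptions made for illustration.

```python
def with_total_tokens(attributes: dict) -> dict:
    """Return a copy with gen_ai.usage.total_tokens added when both counts exist.

    Native keys are never renamed or removed; the total is only added.
    """
    out = dict(attributes)
    input_tokens = out.get("gen_ai.usage.input_tokens")
    output_tokens = out.get("gen_ai.usage.output_tokens")
    if isinstance(input_tokens, int) and isinstance(output_tokens, int):
        # setdefault: do not overwrite a total the framework already emitted.
        out.setdefault("gen_ai.usage.total_tokens", input_tokens + output_tokens)
    return out


full = with_total_tokens(
    {"gen_ai.usage.input_tokens": 12, "gen_ai.usage.output_tokens": 30}
)
partial = with_total_tokens({"gen_ai.usage.input_tokens": 12})
```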

Public API

from fi_instrumentation import register
from fi_instrumentation.fi_types import ProjectType
from agent_framework.observability import enable_instrumentation
from traceai_agent_framework import enable_fi_attribute_mapping

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="agent_framework_app",
    set_global_tracer_provider=True,
)
enable_instrumentation(enable_sensitive_data=True)
enable_fi_attribute_mapping()

Also exports AgentFrameworkSpanProcessor for users wiring their own tracer provider.

What's in this PR

| File | Purpose |
| --- | --- |
| traceai_agent_framework/processor.py | AgentFrameworkSpanProcessor + all attribute mapping logic |
| traceai_agent_framework/integration.py | Public enable_fi_attribute_mapping(tracer_provider=None) entry point |
| traceai_agent_framework/__init__.py, version.py | Exports + version |
| examples/basic_agent.py | Single Agent + OpenAI chat client |
| examples/workflow_multi_agent.py | Two agents handing off through WorkflowBuilder |
| examples/requirements.txt | Example deps |
| tests/test_processor.py | 56 unit tests on the mapping + processor logic |
| tests/test_integration.py | 14 lifecycle tests for enable_fi_attribute_mapping() |
| tests/test_e2e_agent.py | 4 end-to-end tests against real Agent.run() + WorkflowBuilder.run() |
| tests/_fixtures/sample_spans.json | Real captured spans used as test fixtures |
| README.md, CHANGELOG.md | Docs, matching the openai/anthropic style |
| pyproject.toml | Poetry config, python = ">=3.10,<3.15", fi-instrumentation-otel >=0.1.14 |

Test plan

  • cd python/frameworks/agent-framework && pytest tests/ -q — reports 87 passed (56 processor + 14 integration + 4 e2e + 13 ancillary)
  • pip install -e python/frameworks/agent-framework then python -c "from traceai_agent_framework import enable_fi_attribute_mapping; print('ok')" succeeds
  • Run examples/basic_agent.py with OPENAI_API_KEY + FI_API_KEY + FI_SECRET_KEY set; verified in the FI dashboard:
    • One invoke_agent span (type: agent) carrying input/output, model, token counts
    • Nested chat gpt-4o-mini (type: llm) and execute_tool get_weather (type: tool) spans under it
    • Input/Output panels populated with the user prompt + assistant response
    • Agent Graph populated with per-kind node IDs (agent_*, llm_*, tool_*)
  • Run examples/workflow_multi_agent.py; workflow.run (type: chain) shows bubbled input/output from the inner agent spans
  • pyproject.toml declares python = ">=3.10,<3.15" (matches fi-instrumentation-otel constraint)

Test coverage added in 3b378fc

Tests added on top of the original test file to cover the gap-closures and bug fixes from c9a072b:

  • test_processor.py (+23 tests, 56 total)

    • Smart plain-text vs JSON formatting (single-text-message → plain text; multi-message → JSON)
    • tool_call_response part extraction on AGENT spans (was missing in v1)
    • Per-kind graph.node.id derivation (LLM uses response_id, AGENT uses agent.id, TOOL uses tool.name+call_id, CHAIN walks workflow.id/executor.id/edge_group.id)
    • gen_ai.request.parameters JSON bundling (excludes gen_ai.request.model; feeds the dashboard's "Model Parameters" panel)
    • metadata catch-all for 4 specific keys (choice.count, server.address, function.invocation.duration, function.name)
    • Cross-batch bubble-up _SpanIO state cleanup
    • _looks_like_pipecat_span heuristic guard
  • test_integration.py (+7 tests, 14 total)

    • _active_span_processor._span_processors tuple-prepend preserves FI's default exporter (the bug behind the "no dashboard data" symptom)
    • enable_instrumentation() is not re-called when already on, preserving caller's enable_sensitive_data=True
    • Idempotent install (calling twice doesn't double-register)
    • Custom tracer_provider argument path

Dashboard verification

Verified end-to-end against the live FI dashboard with the basic_agent.py example:

1. Trace list: invoke_agent weather_agent row with input/output snippet

2. AGENT span detail: User Message / Input / Output panels populated, gen_ai.* + FI attrs side by side

3. LLM child span (chat gpt-4o-mini): type: llm, model, tokens, usage

4. TOOL child span (execute_tool get_weather): input args + real wttr.in output (paris: ☀️ +24°C)

5. Agent Graph: per-kind node IDs (agent_<id>, llm_<resp_id>, tool_<name>_<call_id>) rendering the start → agent → llm → tool → llm → end path

6. Attributes panel: full gen_ai.* keys retained alongside FI's input.value / output.value / graph.node.id / metadata

7. Input/output shown in the parts format

Suhani Nagpal added 2 commits May 22, 2026 15:44
Empty file skeleton for the Microsoft Agent Framework integration
under python/frameworks/agent-framework. Layout mirrors traceai_pipecat
(exporter-swap pattern) with integration.py for the public swap API
and exporters/ for the mapped HTTP/gRPC exporters.
Installs an AgentFrameworkSpanProcessor on the user's tracer provider
to re-key the framework's native gen_ai.* spans into Future AGI
conventions (gen_ai.span.kind, input.value/output.value with mime
types, flattened per-message attrs, derived total_tokens). For chain
spans the framework leaves without I/O (workflow.run, executor.process,
etc.), bubbles input/output up from the earliest/latest descendant.

Public API: enable_fi_attribute_mapping(tracer_provider=None) and
the AgentFrameworkSpanProcessor class.
SuhaniNagpal7 requested a review from nik13 May 25, 2026 12:39
SuhaniNagpal7 marked this pull request as draft May 25, 2026 12:55
Suhani Nagpal added 2 commits May 26, 2026 13:49
Mapping additions:
- Extract tool_call_response and reasoning parts in message flattening; tool result text now lands in messages.{i}.message.content + messages.{i}.message.tool_call_id, and reasoning parts join into message.content
- Surface graph.node.id / graph.node.name per kind (LLM/AGENT/TOOL/CHAIN), enabling the FI dashboard's agent-graph view to render workflow / executor / edge_group nodes
- Bundle gen_ai.request.parameters JSON so the dashboard's "Model Parameters" panel renders for LLM/AGENT/EMBEDDING spans
- Bundle a small metadata JSON (choice.count, server.address, agent_framework.function.*) for dashboard "Metadata" panel
- Smart plain-text formatting for input.value / output.value when there is a single text-only message; keep raw JSON when the message structure is complex

Integration / bug fixes:
- Preserve FI's default exporter processor: install our SpanProcessor by prepending to the active multi-processor's tuple instead of calling provider.add_span_processor, which makes FI's TracerProvider drop its default exporter on first call
- Don't clobber the user's enable_sensitive_data choice: only call enable_instrumentation() when it isn't already on; calling with no kwargs would otherwise re-read ENABLE_SENSITIVE_DATA from env and silently flip it off

Examples:
- basic_agent.py now uses a real get_weather tool that calls wttr.in via urllib (no API key needed); demonstrates a full LLM + tool + LLM trace end-to-end

Packaging:
- Move agent-framework from dev-only dep to runtime dep so pip install traceAI-agent-framework also installs the framework, matching anthropic/openai/crewai etc.

Existing test updates reflect the new behavior (plain text input/output for the single-message case).
+30 tests, total 87 passing in <1s. Each test directly protects against
a regression in something the polish round added or a bug that was caught
during real-dashboard verification.

test_processor.py (+23):
- Multi-part message flatten: tool_call, tool_call_response, reasoning, mixed text+tool_call
- graph.node.* per kind (LLM/AGENT/TOOL + workflow/executor/edge_group fallback chain for CHAIN)
- gen_ai.request.parameters bundle (collected, excludes model, skipped when empty)
- Cross-batch bubble-up + per-trace state isolation
- embeddings -> EMBEDDING kind, create_agent -> AGENT kind
- Smart formatting branches (plain text vs JSON for input.value/output.value)
- Status is never set by the processor (Phase 0 finding)
- Defensive: span with None attributes, span with MappingProxyType attrs

test_integration.py (+7):
- FI's default exporter processor survives our install (the silent-span-loss bug)
- Our processor is prepended so synchronous downstream processors see mutated attrs
- User's explicit enable_sensitive_data=True is not clobbered (the silent-message-loss bug)
- Native instrumentation skipped gracefully when agent_framework is not installed
SuhaniNagpal7 marked this pull request as ready for review May 26, 2026 09:01
…ut values

Drop the plain-text downgrade in _surface_messages_io. input.value and
output.value now always mirror the raw gen_ai.input.messages /
gen_ai.output.messages JSON blob with mime_type=application/json,
preserving Microsoft's native parts shape (tool_call / tool_call_response
/ text). Matches the openai/anthropic/litellm sibling adapters' pattern.

Removes the _all_text_message_content helper that was only used by the
dropped path. Updates 5 tests to assert the JSON-always behavior.