Proposal: Governance Hooks for Claude Agent SDK
Summary
Add optional governance hooks to the Claude Agent SDK that enable policy enforcement, threat detection, and audit trails for tool calls — similar to what guardrails do for model outputs, but at the tool execution layer.
Problem
When building multi-agent systems with the Claude Agent SDK, there is no built-in mechanism to:
- Enforce tool-level policies (which tools can be called, with what arguments, how often)
- Detect threat patterns in tool arguments before execution (data exfiltration, privilege escalation)
- Score trust levels between agents for safe delegation
- Generate immutable audit trails of all tool calls and policy decisions
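To illustrate the kind of check that currently has to be hand-rolled, here is a minimal, hypothetical argument scanner that flags secret-like strings before a tool call. All names and patterns are illustrative, not part of the SDK:

```python
import re

# Patterns that suggest a secret is being passed to a tool (illustrative only).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),        # generic api-key assignment
]

def scan_tool_args(args: dict) -> list[str]:
    """Return a list of findings for secret-like values in tool arguments."""
    findings = []
    for name, value in args.items():
        text = str(value)
        for pattern in SECRET_PATTERNS:
            if pattern.search(text):
                findings.append(f"{name}: matches {pattern.pattern}")
    return findings
```

Today this logic has to live in every caller; a governance hook would give it one well-defined place to run.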
Proposed Design
```python
from claude_agent_sdk import Agent, GovernancePolicy

# Define governance policy
policy = GovernancePolicy(
    name="production-safe",
    allowed_tools=["search", "read_file", "write_file"],
    blocked_tools=["execute_shell", "delete_file"],
    max_tool_calls=50,
    content_filters=["no_pii", "no_secrets"],
    threat_detection=True,
    audit_trail=True,
)

# Apply to agent
agent = Agent(
    name="research-agent",
    tools=[search, read_file, write_file],
    governance=policy,  # Governance hooks intercept tool calls
)
```
Hook Points
- before_tool_call — Validate tool name and arguments against policy. Block if disallowed.
- after_tool_call — Audit the result, check for sensitive data leakage.
- on_delegation — When Agent A delegates to Agent B, verify trust score thresholds.
- on_policy_violation — Callback when a policy rule is triggered (for alerting/monitoring).
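A minimal sketch of how three of these hook points could compose around a single tool call, assuming a hypothetical AuditTrail helper and callback signature (none of these names are SDK API; the allow list is a hard-coded stand-in for a policy object):

```python
import json
import time

class AuditTrail:
    """Append-only log of tool calls and policy decisions (illustrative)."""
    def __init__(self):
        self._entries = []

    def record(self, event: str, **details):
        self._entries.append({"ts": time.time(), "event": event, **details})

    def export(self) -> str:
        return "\n".join(json.dumps(e) for e in self._entries)

def run_tool(tool, args, trail, on_policy_violation=None):
    # before_tool_call: validate against a toy allow list, record the decision.
    if tool.__name__ not in {"search"}:
        trail.record("policy_violation", tool=tool.__name__)
        if on_policy_violation:
            on_policy_violation(tool.__name__)
        return None
    trail.record("before_tool_call", tool=tool.__name__, args=args)
    result = tool(**args)
    # after_tool_call: audit the result before returning it to the agent.
    trail.record("after_tool_call", tool=tool.__name__, result=result)
    return result

def search(query):
    return f"results for {query}"

trail = AuditTrail()
run_tool(search, {"query": "hooks"}, trail)
print(trail.export())
```

The point of the sketch is ordering: governance runs before the tool executes and again on the result, so the trail captures both the attempt and the outcome, including rejected calls.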
Why Not External Middleware?
External governance works but loses context:
- SDK-level hooks have access to the full agent context (conversation history, tool results, delegation chain)
- Hook execution order can be guaranteed (governance before business logic)
- Audit trails can capture the complete agent decision trace, not just tool I/O
Prior Art
- OpenAI Agents SDK has guardrails (input/output guardrails on model responses)
- Google ADK has BasePlugin with before_tool_callback/after_tool_callback
- PydanticAI has middleware proposals (#2885)
- We've built governance integrations for all of these: agentmesh-integrations
Context
We maintain Agent-OS and have filed similar governance proposals across the agent ecosystem:
- anthropics/skills #412 — Governance skill
- google/adk-python #4543 — GovernancePlugin
- pydantic/pydantic-ai #4335 — Governance middleware