
docs(simulator): updated tool_simulator docs #752

Open
poshinchen wants to merge 1 commit into strands-agents:main from poshinchen:docs/simulators

Conversation

@poshinchen
Contributor

Description

Added toolSimulator related docs

Related Issues

N/A

Type of Change

  • New content
  • Content update/revision

Checklist

  • I have read the CONTRIBUTING document
  • My changes follow the project's documentation style
  • I have tested the documentation locally using npm run dev
  • Links in the documentation are valid and working

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions
Contributor

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-752/docs/user-guide/quickstart/overview/

Updated at: 2026-04-10T15:36:53.987Z

## Overview

Simulators enable dynamic, multi-turn evaluation of conversational agents by generating realistic interaction patterns. Unlike static evaluators that assess single outputs, simulators actively participate in conversations, adapting their behavior based on agent responses to create authentic evaluation scenarios.
Simulators enable dynamic evaluation of agents by generating realistic interaction patterns. Unlike static evaluators that assess single outputs, simulators actively participate in the evaluation loop — driving multi-turn conversations or generating realistic tool responses to create authentic evaluation scenarios.
Member


non-blocker: I would like to try and avoid em-dashes if possible? It's basically shouting that it's LLM generated (not a bad thing, but would just like to lean away from these obvious signals).

- You want to test agent tool-use patterns without side effects
- Tools are still under development or unavailable in the test environment

Member


In this example, it's not obvious how the simulated tool would know what the weather is in Seattle. Can you show how you pass in context to the simulated tool in this case?

Comment on lines +38 to +45
## Key Features

- **Decorator-Based Registration**: Register tools with `@tool_simulator.tool()` using familiar function signatures and docstrings
- **Schema-Validated Responses**: Pydantic output schemas ensure structured, consistent responses from the LLM
- **Shared State**: Related tools share call history and context via `share_state_id`
- **Stateful Context**: Initial state descriptions and call history are included in LLM prompts for consistent multi-call sequences
- **Drop-in Replacement**: Simulated tools plug directly into Strands `Agent` via `get_tool()`
- **Bounded Call Cache**: FIFO eviction keeps memory usage predictable for long-running evaluations
Member


These are all covered in the sections below, and the page already has a table of contents. I don't think we need this.

Comment on lines +56 to +78
### Registering a Tool

Define a function with type hints and a docstring, then decorate it with `@tool_simulator.tool()`. Provide an `output_schema` to control the response structure:

```python
from typing import Any
from pydantic import BaseModel, Field
from strands_evals.simulation.tool_simulator import ToolSimulator

tool_simulator = ToolSimulator()

class OrderStatus(BaseModel):
    order_id: str = Field(..., description="Order identifier")
    status: str = Field(..., description="Current order status")
    estimated_delivery: str = Field(..., description="Estimated delivery date")

@tool_simulator.tool(output_schema=OrderStatus)
def check_order(order_id: str) -> dict[str, Any]:
    """Check the current status of a customer order."""
    pass
```
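For readers curious how a decorator like this gathers enough information to drive an LLM, the core mechanics can be sketched with the standard library alone. This is a simplified illustration, not ToolSimulator's real registration logic; `register_tool` and the `tools` registry are made-up names:

```python
import inspect


def register_tool(registry: dict):
    """Illustrative decorator: records a function's name, signature, and docstring."""

    def decorator(func):
        registry[func.__name__] = {
            # The signature and docstring are what an LLM-backed simulator
            # could use to generate a plausible, well-typed response.
            "signature": str(inspect.signature(func)),
            "doc": inspect.getdoc(func),
        }
        return func

    return decorator


tools: dict = {}


@register_tool(tools)
def check_order(order_id: str) -> dict:
    """Check the current status of a customer order."""


print(tools["check_order"]["signature"])  # (order_id: str) -> dict
```

This is why the type hints and docstring matter in the real example above: they are the only description of the tool the simulator has to work from.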

### Attaching to an Agent
Member


Can we combine these two? I think "Attaching to an Agent" is self-explanatory. We don't need a whole section describing it.

```python
reports[0].run_display()
```

## Inspecting State
Member


This, and the below sections, might make more sense under a broader "Advanced Usage" or something.

```python
tool_simulator = ToolSimulator(state_registry=registry)
```

## API Reference
Member


Do we have auto-generated evals API docs? If not, we should.

```python
response: str  # Default response when no output_schema is provided
```

## Best Practices
Member


Do these actually help anyone? When I see "Best Practices" on docs pages, I basically assume they are AI generated and provide little value. If they aren't actually helpful, and just restate what is said above, we should remove them. Same goes for "Troubleshooting"; this should be for explicit cases where customers commonly trip up.

```python
memory_exporter = telemetry.in_memory_exporter
tool_simulator = ToolSimulator()

class HVACResponse(BaseModel):
```
Contributor


Issue: Missing imports for BaseModel and Field in the "Integration with Experiments" code example. The snippet uses BaseModel and Field (line 187-190) but only imports from strands, strands_evals, and strands_evals.simulation.tool_simulator.

Suggestion: Add from pydantic import BaseModel, Field to the imports block at the top of this code example to match the pattern used in other code blocks on this page (e.g., the "Registering a Tool" example at line 62).

This is useful when:

- Real tools require live infrastructure (APIs, databases, hardware)
- You need deterministic, controllable tool behavior for evaluation
Contributor


Issue: The term "deterministic" may be misleading here. Since the tool responses are LLM-generated, they are inherently non-deterministic — the same inputs can produce different outputs across runs. The value is "controllable" and "reproducible in character" (e.g., the LLM will return weather-like data), but not deterministic in the strict sense.

Suggestion: Consider rewording to: You need controllable tool behavior for evaluation (dropping "deterministic"), or clarifying what's meant, e.g., You need controllable, consistent tool behavior for evaluation (without external dependencies).

@github-actions
Contributor

Assessment: Comment

Good addition to the evals SDK documentation. The ToolSimulator docs are well-structured, follow the established patterns from the user_simulation.mdx page, and provide clear code examples with progressive complexity (basic → shared state → experiment integration → troubleshooting).

Review Details
  • Accuracy: The word "deterministic" appears in both files when describing LLM-powered simulation — since responses are LLM-generated, this is misleading and should be softened to "controllable" (see inline comments).
  • Code Examples: The "Integration with Experiments" example is missing from pydantic import BaseModel, Field imports (see inline comment on tool_simulation.mdx L187). Other code examples are complete and well-structured.
  • Structure & Consistency: The page follows the same section pattern as user_simulation.mdx (Overview → Key Features → Basic Usage → Integration → Best Practices → Troubleshooting → Related Docs). The comparison table in index.mdx is a nice touch.
  • Links: All internal links (index.md, user_simulation.md, quickstart.md, goal_success_rate_evaluator.md) reference files that exist in the repo.

Nice comprehensive documentation with the shared state and troubleshooting sections — those will save users a lot of debugging time.


The `ToolSimulator` enables LLM-powered simulation of tool behavior for controlled agent evaluation. Instead of calling real tools, registered tools are executed by an LLM that generates realistic, schema-validated responses while maintaining state across calls.

This is useful when real tools require live infrastructure, when you need deterministic behavior for evaluation, or when tools are still under development.
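As a rough mental model for "maintaining state across calls", a simulator can keep a per-state-ID call history that gets folded into each LLM prompt. The sketch below is conceptual only; `CallHistory`, `record`, and `context_for` are invented names, not the ToolSimulator API:

```python
from collections import defaultdict


class CallHistory:
    """Conceptual sketch: group recorded tool calls by a shared state ID."""

    def __init__(self):
        self._history: defaultdict[str, list[dict]] = defaultdict(list)

    def record(self, state_id: str, tool_name: str, args: dict) -> None:
        self._history[state_id].append({"tool": tool_name, "args": args})

    def context_for(self, state_id: str) -> list[dict]:
        # Prior calls sharing this state ID would be included in the LLM
        # prompt so that responses stay consistent across a call sequence.
        return list(self._history[state_id])


history = CallHistory()
history.record("session-1", "check_order", {"order_id": "A1"})
history.record("session-1", "check_order", {"order_id": "A1"})
print(len(history.context_for("session-1")))  # 2
```

Grouping by a shared ID is what lets two related simulated tools agree with each other: both see the same prior calls when their responses are generated.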
Contributor


Issue: Same "deterministic" wording concern as in tool_simulation.mdx. Since responses are LLM-generated, "deterministic" is misleading.

Suggestion: Consider: ...when you need controllable behavior for evaluation...

