docs(simulator): updated tool_simulator docs #752

poshinchen wants to merge 1 commit into strands-agents:main
Conversation
Documentation Preview Ready: Your documentation preview has been successfully deployed!
Preview URL: https://d3ehv1nix5p99z.cloudfront.net/pr-cms-752/docs/user-guide/quickstart/overview/
Updated at: 2026-04-10T15:36:53.987Z
> ## Overview
>
> - Simulators enable dynamic, multi-turn evaluation of conversational agents by generating realistic interaction patterns. Unlike static evaluators that assess single outputs, simulators actively participate in conversations, adapting their behavior based on agent responses to create authentic evaluation scenarios.
> + Simulators enable dynamic evaluation of agents by generating realistic interaction patterns. Unlike static evaluators that assess single outputs, simulators actively participate in the evaluation loop — driving multi-turn conversations or generating realistic tool responses — to create authentic evaluation scenarios.
non-blocker: I would like to try to avoid em-dashes if possible. It's basically shouting that it's LLM generated (not a bad thing, but I would just like to lean away from these obvious signals).
> - You want to test agent tool-use patterns without side effects
> - Tools are still under development or unavailable in the test environment
In this example, it's not obvious how the simulated tool would know what the weather is in Seattle. Can you show how you pass context to the simulated tool in this case?
> ## Key Features
>
> - **Decorator-Based Registration**: Register tools with `@tool_simulator.tool()` using familiar function signatures and docstrings
> - **Schema-Validated Responses**: Pydantic output schemas ensure structured, consistent responses from the LLM
> - **Shared State**: Related tools share call history and context via `share_state_id`
> - **Stateful Context**: Initial state descriptions and call history are included in LLM prompts for consistent multi-call sequences
> - **Drop-in Replacement**: Simulated tools plug directly into Strands `Agent` via `get_tool()`
> - **Bounded Call Cache**: FIFO eviction keeps memory usage predictable for long-running evaluations
These are all covered in the sections below, and the page already has a table of contents. I don't think we need this.
> ### Registering a Tool
>
> Define a function with type hints and a docstring, then decorate it with `@tool_simulator.tool()`. Provide an `output_schema` to control the response structure:
>
> ```python
> from typing import Any
> from pydantic import BaseModel, Field
> from strands_evals.simulation.tool_simulator import ToolSimulator
>
> tool_simulator = ToolSimulator()
>
> class OrderStatus(BaseModel):
>     order_id: str = Field(..., description="Order identifier")
>     status: str = Field(..., description="Current order status")
>     estimated_delivery: str = Field(..., description="Estimated delivery date")
>
> @tool_simulator.tool(output_schema=OrderStatus)
> def check_order(order_id: str) -> dict[str, Any]:
>     """Check the current status of a customer order."""
>     pass
> ```
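The `output_schema` idea in the quoted example — validating that a generated response carries exactly the declared fields — can be sketched without the library (a hypothetical stdlib sketch using `dataclasses` in place of Pydantic; `validate_response` is an invented helper, not part of strands_evals):

```python
from dataclasses import dataclass, fields

@dataclass
class OrderStatus:
    order_id: str
    status: str
    estimated_delivery: str

def validate_response(schema, payload: dict):
    """Reject payloads whose keys do not match the schema's fields exactly."""
    expected = {f.name for f in fields(schema)}
    if set(payload) != expected:
        raise ValueError(f"field mismatch: got {set(payload)}, expected {expected}")
    return schema(**payload)

resp = validate_response(
    OrderStatus,
    {"order_id": "A-1001", "status": "shipped", "estimated_delivery": "2026-04-12"},
)
print(resp.status)  # shipped
```

Pydantic does the real work here (type coercion, descriptions, error reporting); the sketch only shows why a schema makes LLM-generated responses safe to consume.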
> ### Attaching to an Agent
Can we combine these two? I think "Attaching to an Agent" is self-explanatory. We don't need a whole section describing it.
> ```python
> reports[0].run_display()
> ```
>
> ## Inspecting State
This, and the sections below, might make more sense under a broader "Advanced Usage" heading or similar.
> ```python
> tool_simulator = ToolSimulator(state_registry=registry)
> ```
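The shared-state idea behind `state_registry` and `share_state_id` can be illustrated in plain Python. This is a hypothetical sketch, not the library's implementation; `make_tool` and `histories` are invented names:

```python
from collections import defaultdict

# Call histories keyed by a shared state id; tools registered with the
# same id append to, and can observe, the same history.
histories: defaultdict = defaultdict(list)

def make_tool(name: str, share_state_id: str):
    def tool_call(**kwargs):
        history = histories[share_state_id]
        history.append((name, kwargs))
        # A real simulator would feed `history` into the LLM prompt so
        # later calls stay consistent with earlier ones.
        return {"calls_so_far": len(history)}
    return tool_call

book = make_tool("book_flight", share_state_id="trip-1")
cancel = make_tool("cancel_flight", share_state_id="trip-1")

book(flight="UA100")
result = cancel(flight="UA100")
print(result)  # {'calls_so_far': 2}
```

The point of the sketch: the second tool "knows" about the first tool's call only because they share a history object, which is the behavior `share_state_id` selects.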
> ## API Reference
Do we have auto-generated evals API docs? If not, we should.
> ```python
> response: str  # Default response when no output_schema is provided
> ```
>
> ## Best Practices
Do these actually help anyone? When I see "Best Practices" on docs pages, I basically assume they are AI generated and provide little value. If they aren't actually helpful and just restate what is said above, we should remove them. The same goes for "Troubleshooting"; that section should cover explicit cases where customers commonly trip up.
> ```python
> memory_exporter = telemetry.in_memory_exporter
> tool_simulator = ToolSimulator()
>
> class HVACResponse(BaseModel):
> ```
Issue: Missing imports for `BaseModel` and `Field` in the "Integration with Experiments" code example. The snippet uses `BaseModel` and `Field` (lines 187-190) but only imports from `strands`, `strands_evals`, and `strands_evals.simulation.tool_simulator`.
Suggestion: Add `from pydantic import BaseModel, Field` to the imports block at the top of this code example to match the pattern used in other code blocks on this page (e.g., the "Registering a Tool" example at line 62).
> This is useful when:
>
> - Real tools require live infrastructure (APIs, databases, hardware)
> - You need deterministic, controllable tool behavior for evaluation
Issue: The term "deterministic" may be misleading here. Since the tool responses are LLM-generated, they are inherently non-deterministic: the same inputs can produce different outputs across runs. The value is "controllable" and "reproducible in character" (e.g., the LLM will return weather-like data), but not deterministic in the strict sense.
Suggestion: Consider rewording to "You need controllable tool behavior for evaluation" (dropping "deterministic"), or clarifying what's meant, e.g., "You need controllable, consistent tool behavior for evaluation (without external dependencies)."
Assessment: Good addition to the evals SDK documentation. The ToolSimulator docs are well structured, follow the established patterns from the user_simulation.mdx page, and provide clear code examples with progressive complexity (basic → shared state → experiment integration → troubleshooting).

Nice comprehensive documentation with the shared state and troubleshooting sections; those will save users a lot of debugging time.
> The `ToolSimulator` enables LLM-powered simulation of tool behavior for controlled agent evaluation. Instead of calling real tools, registered tools are executed by an LLM that generates realistic, schema-validated responses while maintaining state across calls.
>
> This is useful when real tools require live infrastructure, when you need deterministic behavior for evaluation, or when tools are still under development.
Issue: Same "deterministic" wording concern as in tool_simulation.mdx. Since responses are LLM-generated, "deterministic" is misleading.
Suggestion: Consider "...when you need controllable behavior for evaluation..."
Description

Added ToolSimulator related docs.

Related Issues

N/A

Type of Change

Checklist

- [ ] `npm run dev`

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.