feat: add execute_tool API endpoint for direct tool execution on conversations by xingyaoww · Pull Request #2356 · OpenHands/software-agent-sdk

xingyaoww · 2026-03-07T06:39:49Z

Description

Add the ability to execute tools (like the terminal) directly on a conversation without going through the agent loop. This enables pre-run setup operations like running .openhands/setup.sh through the agent's persistent terminal session so environment changes persist.

Problem

In V1, setup.sh was executed via AsyncRemoteWorkspace.execute_command() which runs commands in an ephemeral subprocess — environment changes (exported variables, sourced files, PATH modifications, etc.) don't persist to the agent's terminal session. This made setup.sh effectively broken in V1 compared to V0.

Solution

Add a new POST /api/conversations/{id}/execute_tool endpoint that allows executing a tool directly on a conversation. The endpoint:

Lazy-initializes the agent and its tools (via _ensure_agent_ready())
Validates the action against the tool's schema
Executes through LocalConversation.execute_tool()
Returns the observation as JSON

Changes

Agent Server:

models.py: Added ExecuteToolRequest / ExecuteToolResponse models
event_service.py: Added EventService.execute_tool() — runs a tool through LocalConversation.execute_tool() in a thread executor
conversation_service.py: Added ConversationService.execute_tool() — delegates to EventService
conversation_router.py: Added POST /conversations/{id}/execute_tool endpoint with 404/400/500 error handling and logging

SDK:

remote_conversation.py: Implemented RemoteConversation.execute_tool() (was NotImplementedError) — calls the new server endpoint with proper observation deserialization
async_remote_workspace.py: Added AsyncRemoteWorkspace.execute_tool() convenience method

Companion PR

This PR is used by the OpenHands companion PR: OpenHands/OpenHands (feat/setup-sh-via-terminal-tool) which restructures the V1 startup flow to run setup.sh through the terminal tool.

Testing

Unit tests for RemoteConversation.execute_tool() covering success, error, and deserialization paths
No breaking changes to existing APIs (verified by API breakage CI checks)

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:1df7714-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-1df7714-python \
  ghcr.io/openhands/agent-server:1df7714-python

All tags pushed for this build

ghcr.io/openhands/agent-server:1df7714-golang-amd64
ghcr.io/openhands/agent-server:1df7714-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:1df7714-golang-arm64
ghcr.io/openhands/agent-server:1df7714-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:1df7714-java-amd64
ghcr.io/openhands/agent-server:1df7714-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:1df7714-java-arm64
ghcr.io/openhands/agent-server:1df7714-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:1df7714-python-amd64
ghcr.io/openhands/agent-server:1df7714-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:1df7714-python-arm64
ghcr.io/openhands/agent-server:1df7714-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:1df7714-golang
ghcr.io/openhands/agent-server:1df7714-java
ghcr.io/openhands/agent-server:1df7714-python

About Multi-Architecture Support

Each variant tag (e.g., 1df7714-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 1df7714-python-amd64) are also available if needed

…ersations Add the ability to execute tools (like the terminal) directly on a conversation without going through the agent loop. This enables pre-run setup operations like running .openhands/setup.sh through the agent's persistent terminal session so environment changes persist. Changes: - Add ExecuteToolRequest/Response models - Add EventService.execute_tool() method - Add ConversationService.execute_tool() method - Add POST /conversations/{id}/execute_tool endpoint - Implement RemoteConversation.execute_tool() (was NotImplementedError) - Add AsyncRemoteWorkspace.execute_tool() convenience method Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-07T06:40:13Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-03-07T06:40:31Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

xingyaoww · 2026-03-10T16:31:27Z

@OpenHands please fix https://github.com/OpenHands/software-agent-sdk/actions/runs/22794005415/job/66125654112?pr=2356 and https://github.com/OpenHands/software-agent-sdk/actions/runs/22794005420/job/66125654134?pr=2356

Then help me review the PR and test if this PR is working - When you are done, upload artifacts to .pr/ folder and push to the PR

openhands-ai · 2026-03-10T16:31:51Z

I'm on it! xingyaoww can track my progress at all-hands.dev

- Fix pyright errors in event_service.py by capturing _conversation in a local variable before the closure (type narrowing) - Fix Observation ABC instantiation in remote_conversation.py by using a concrete _RemoteObservation subclass - Update test to verify new execute_tool implementation behavior - Fix line-too-long issues in docstrings and field descriptions Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-10T16:51:57Z

📁 PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific documents. This directory will be automatically removed when the PR is approved.

For fork PRs: Manual removal is required before merging.

openhands-ai · 2026-03-10T16:52:32Z

Summary of Work Done

I fixed both failing CI checks on PR #2356, reviewed the PR, functionally tested the new endpoint, and pushed everything (including .pr/ artifacts) to the feat/execute-tool-api branch.

CI Fixes (commit `045262b5`, already pushed)

Pyright type errors in event_service.py — Captured self._conversation in a local variable conversation before the nested _execute() closure, so pyright can narrow the type from LocalConversation | None to LocalConversation.
Observation ABC instantiation in remote_conversation.py — Observation is abstract and can't be instantiated directly. Created a private concrete _RemoteObservation(Observation) subclass inside execute_tool().
Updated stale test — Replaced test_remote_conversation_execute_tool_not_implemented (which expected NotImplementedError) with test_remote_conversation_execute_tool that mocks the API response and verifies the observation is correctly parsed.
Line-too-long fixes — Fixed E501 in event_service.py docstring and models.py field description.

Functional Testing

Started the agent server and verified:

✅ Terminal tool execution returns correct output
✅ Environment variables persist across calls (the core setup.sh use case)
✅ Nonexistent tool → 400 with helpful error listing available tools
✅ Nonexistent conversation → 404

Artifacts

Detailed review notes uploaded to .pr/review.md.

Checklist

Fix pre-commit CI failure (pyright errors)
Fix sdk-tests CI failure (test + ABC bug)
Review the PR code
Functionally test the endpoint
Upload artifacts to .pr/ folder
Push to the PR branch

All changes are already pushed to origin/feat/execute-tool-api.

github-actions · 2026-03-10T17:00:34Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-agent-server/openhands/agent_server
conversation_router.py	160	22	86%	292, 382–385, 397–400, 422–423, 426–432, 436–438, 475
conversation_service.py	498	124	75%	141–142, 169, 172, 174, 181–187, 215, 222, 243, 342, 348, 353, 359, 367–368, 377–380, 389, 403–405, 412, 445–446, 485, 488, 505–509, 511–512, 515–516, 519–524, 621, 628–632, 635–636, 640–644, 647–648, 657–662, 666–670, 673–674, 696–697, 701–702, 704–706, 708, 711, 719–723, 726, 733–738, 740–741, 747–752, 759–760, 764, 766–767, 772–773, 779–780, 787–788, 792–794, 812, 836, 1114, 1117
event_service.py	364	103	71%	56–57, 75–77, 86–90, 93–96, 116, 220, 237, 278, 288, 312–313, 317, 325, 328, 375–376, 378–379, 381, 383–387, 391–393, 395, 408–409, 425, 427, 431–433, 437, 446–447, 449, 453, 459, 461, 491–496, 641, 643–644, 648, 662–664, 666, 670–673, 677–680, 688–691, 711–712, 714–721, 723–724, 733–734, 736–737, 744–745, 747–748, 768, 774, 780, 789–790
openhands-sdk/openhands/sdk/conversation/impl
remote_conversation.py	652	101	84%	78, 80, 147, 174, 187, 189–192, 202, 224–225, 230–233, 316, 326–328, 334, 375, 522–525, 527, 553–557, 562–565, 568, 584, 745–746, 750–751, 765, 789–790, 809, 820–821, 841–844, 846–847, 871–873, 876–880, 882–883, 887, 889–897, 899, 936, 1067, 1139–1140, 1144, 1149–1153, 1159–1165, 1178, 1183, 1218, 1274, 1281, 1287–1288, 1466–1467
openhands-sdk/openhands/sdk/workspace/remote
async_remote_workspace.py	68	16	76%	25–30, 132–134, 148–150, 175, 179, 184–185
TOTAL	23645	5718	75%

xingyaoww · 2026-03-13T12:45:39Z

@OpenHands please review this PR, can you test whether this works in the https://github.com/OpenHands/software-agent-sdk/blob/main/examples/02_remote_agent_server/01_convo_with_local_agent_server.py?

We should able to programmatically access it (eg conversation.execute_tool) and make sure it works. When it actually works, please modify the appropriate docs in OpenHands/docs repo and send a PR

openhands-ai · 2026-03-13T12:46:02Z

I'm on it! xingyaoww can track my progress at all-hands.dev

all-hands-bot · 2026-03-29T12:22:24Z

[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

all-hands-bot · 2026-04-12T12:25:32Z

[Automatic Post]: It has been a while since there was any activity on this PR. @xingyaoww, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

xingyaoww · 2026-04-20T13:47:32Z

@OpenHands Merge from main, resolve the conflict, and then /iterate until this PR is ready. You should keep iterating until the PRReviewBot actually approves this PR or provides no additional comments.

openhands-ai · 2026-04-20T13:47:47Z

I'm on it! xingyaoww can track my progress at all-hands.dev

Resolve conflicts in conversation_router.py and models.py: - Keep both execute_tool and fork endpoints in conversation_router.py - Keep both ExecuteToolRequest/Response and AgentResponseResult in models.py Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

Taste Rating: 🟡 Acceptable - Solid design but needs refinement on error handling and eval risk consideration.

This PR solves a real problem (environment persistence for setup.sh) with a straightforward architectural approach. However, it introduces a new tool execution path that bypasses the agent loop, which has implications for benchmark behavior and requires human review before merging.

all-hands-bot · 2026-04-20T14:03:24Z

Detailed Review

Architecture ✅

The layered approach is clean and follows existing patterns:

Router → ConversationService → EventService → LocalConversation
Lazy initialization via _ensure_agent_ready() is appropriate
Thread executor pattern for blocking I/O is correct

Testing Observations 🟢

Test migration from NotImplementedError to real implementation is good
Mock-based testing is acceptable for SDK client unit tests
The test verifies the happy path but doesn't cover error scenarios (tool not found, execution failure) - consider adding those cases

Key Design Insight 💡

The core insight here is solid: executing through the agent's persistent terminal session (via the tool executor) rather than ephemeral subprocesses correctly solves the setup.sh environment persistence problem. This is the right architectural choice.

Minor Observation

In conversation_router.py, the except KeyError followed by except Exception is technically redundant (Exception already catches KeyError), but it serves as documentation of expected error types - this is fine.

[RISK ASSESSMENT]

Overall PR: ⚠️ Risk Assessment: 🟡 MEDIUM

Rationale:

New Execution Path: This adds a tool execution path that bypasses the agent loop. While the implementation is sound, this is a significant behavioral change.
Agent Environment Modification: The primary use case (setup.sh via terminal) directly modifies the agent's environment state, which could affect agent behavior in subsequent runs.
Benchmark/Eval Impact: Changes to environment setup and tool execution paths can affect benchmark performance and agent decision-making in subtle ways.

Recommendation: Per the repository's eval risk policy, PRs that change agent behavior or could affect benchmark performance should not be auto-approved. This PR falls squarely into that category. A human maintainer should:

Review the architectural decision to bypass the agent loop
Consider running lightweight evals (SWE-bench Lite or similar) to verify no unexpected performance regression
Validate that the use case (setup.sh persistence) justifies the new execution model

Verdict: ❌ Needs human review before merging (not due to code quality, but due to eval risk policy)

- Separate error handling: KeyError/ValueError → 400, other exceptions → 500 - Add logger for unexpected server errors - Document expected observation response format in remote_conversation.py - Simplify observation text extraction to use only the content array format Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-20T14:07:16Z

Addressed review feedback in d6c5a65. Ready for another look.

all-hands-bot

Taste Rating: 🟢 Good taste - Clean implementation following existing patterns

Summary

The code quality is excellent. The layered architecture is clean, error handling properly separates client/server errors, and the implementation follows repository conventions. All previously flagged issues (observation extraction, error handling) have been resolved.

Evidence: The .pr/review.md document shows thorough manual testing including the critical environment persistence use case.

Eval Risk Policy Note

Per repository guidelines, this PR changes tool execution behavior (adds direct execution path bypassing agent loop) and requires human review + lightweight evals before merge. While the code is solid, the behavioral change could affect benchmark results.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

This PR introduces a new tool execution path that bypasses the standard agent loop. While architecturally sound and solving a real problem (setup.sh environment persistence), it represents a behavioral change that could affect evaluation/benchmark results. The implementation itself is low-risk (good error handling, follows patterns, proper testing), but the feature's impact on agent behavior places it in medium-risk territory.

Recommendation: Human maintainer should verify via lightweight evals that this doesn't affect benchmark performance before merging. The code itself is merge-ready from a quality perspective.

VERDICT:
✅ Code quality approved, awaiting eval verification: Implementation is solid and well-tested. Needs human review for benchmark impact assessment per repo policy.

KEY INSIGHT:
Solves the real environment persistence problem with a straightforward architectural approach. The key insight—executing through the agent's persistent terminal session rather than ephemeral subprocesses—correctly addresses the core issue.

all-hands-bot

✅ QA Report: PASS

Verified: The PR successfully delivers its stated goal — executing tools through the agent's persistent terminal session so environment changes persist.

Does this PR achieve its stated goal?

Yes. The PR set out to fix a critical problem: in V1, setup.sh was executed via AsyncRemoteWorkspace.execute_command() which runs commands in ephemeral subprocesses, so environment variable exports, sourced files, and PATH modifications don't persist to the agent's terminal session. This PR adds a POST /api/conversations/{id}/execute_tool endpoint that executes tools directly through the conversation's tool executor (the persistent terminal session), solving the persistence problem.

Evidence: I created a conversation with the terminal tool, then executed export MY_SETUP_VAR=setup_complete via the new endpoint. In a subsequent call, echo $MY_SETUP_VAR correctly returned setup_complete, proving the variable persisted across calls. This is exactly what setup.sh needs — environment changes that survive beyond a single command execution.

Phase	Result
Environment Setup	✅ Server running (localhost:8000), dependencies installed
CI & Tests	✅ 22/26 pass (agent-server-tests, sdk-tests, pre-commit, check, REST API all pass), 4 pending build jobs
Functional Verification	✅ All core behaviors verified

Functional Verification

Test 1: Basic tool execution

Step 1 — Create a conversation with terminal tool:

curl -X POST http://localhost:8000/api/conversations \
  -H "Content-Type: application/json" \
  -d '{"agent": {"llm": {"model": "gpt-4o-mini"}, "tools": [{"name": "terminal"}]}, "workspace": {"type": "local", "working_dir": "/tmp/test_workspace"}}'

Got conversation ID: bae8b192-0e88-40f8-ae6d-00812b6f1899

Step 2 — Execute a simple command via new endpoint:

curl -X POST "http://localhost:8000/api/conversations/bae8b192-0e88-40f8-ae6d-00812b6f1899/execute_tool" \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "terminal", "action": {"command": "echo hello world && pwd"}}'

Response:

{
  "observation": {
    "content": [{"text": "hello world\n/tmp/test_workspace"}],
    "exit_code": 0,
    "is_error": false
  },
  "is_error": false
}

✅ Command executed successfully through the persistent terminal session.

Test 2: Environment variable persistence (CORE USE CASE)

Step 1 — Set an environment variable:

curl -X POST "http://localhost:8000/api/conversations/$CONV_ID/execute_tool" \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "terminal", "action": {"command": "export MY_SETUP_VAR=setup_complete"}}'

Exit code: 0 ✅

Step 2 — Read the variable in a subsequent call:

curl -X POST "http://localhost:8000/api/conversations/$CONV_ID/execute_tool" \
  -H "Content-Type: application/json" \
  -d '{"tool_name": "terminal", "action": {"command": "echo $MY_SETUP_VAR"}}'

Output: setup_complete

✅ CRITICAL VERIFICATION PASSED: The environment variable persisted across calls! This proves the endpoint executes through the agent's persistent terminal session, not an ephemeral subprocess. This is exactly what the PR set out to achieve — enabling setup.sh to export variables that remain available in the agent's session.

Test 3: Error handling — nonexistent tool

curl -X POST "http://localhost:8000/api/conversations/$CONV_ID/execute_tool" \
  -d '{"tool_name": "nonexistent_tool", "action": {"command": "echo test"}}'

Response:

{"detail":"\"Tool 'nonexistent_tool' not found. Available tools: ['terminal', 'finish', 'think']\""}

✅ Helpful error message listing available tools.

Test 4: Error handling — nonexistent conversation

curl -X POST "http://localhost:8000/api/conversations/00000000-0000-0000-0000-000000000000/execute_tool" \
  -d '{"tool_name": "terminal", "action": {"command": "echo test"}}'

Response:

{"detail":"Not Found"}

✅ Returns 404 as expected.

Test 5: Error handling — invalid action parameters

curl -X POST "http://localhost:8000/api/conversations/$CONV_ID/execute_tool" \
  -d '{"tool_name": "terminal", "action": {"invalid_field": "value"}}'

Response:

{
  "detail": "2 validation errors for TerminalAction\ncommand\n  Field required...\ninvalid_field\n  Extra inputs are not permitted..."
}

✅ Pydantic validation catches schema mismatches with clear error messages.

Issues Found

None. The implementation is solid and achieves its stated goal.

Architecture Assessment

The layered approach is clean and consistent with existing patterns:

Router (conversation_router.py) → ConversationService → EventService → LocalConversation.execute_tool()
_ensure_agent_ready() handles lazy initialization of tools
Error handling covers tool-not-found, conversation-not-found, and validation errors
The SDK's RemoteConversation.execute_tool() (previously NotImplementedError) now calls the server endpoint
AsyncRemoteWorkspace.execute_tool() provides a convenience method for workspace-level usage

The key insight — executing through the agent's persistent terminal session rather than ephemeral subprocesses — correctly addresses the setup.sh environment persistence issue that the PR description identifies.

openhands-ai · 2026-04-20T14:21:23Z

Summary of Work Done

Request Checklist

✅ Merge from main: Merged origin/main into feat/execute-tool-api and resolved 2 merge conflicts:
- conversation_router.py: Kept both the PR's execute_tool endpoint and main's new fork endpoint, plus both sets of model imports
- models.py: Kept both ExecuteToolRequest/ExecuteToolResponse and main's new AgentResponseResult
✅ Resolve conflicts: Both conflicts cleanly resolved with all code from both sides preserved
✅ /iterate until PRReviewBot approves or provides no additional comments:
- Pushed the merge, marked PR as ready-for-review to trigger the review bot
- Round 1: Review bot rated 🟡 "Acceptable" with 2 inline comments (error handling + observation extraction)
- Addressed both comments in commit d6c5a65e:
  - Separated error handling: KeyError/ValueError → 400, generic Exception → 500 with logger.exception()
  - Simplified observation extraction: removed dead text fallback, documented the content array format
- Resolved both review threads via GraphQL, re-requested review
- Round 2: Review bot rated 🟢 "Good taste - Clean implementation" with zero new inline comments
- QA bot also passed: ✅ PASS
- All 31/31 CI checks green, 0 unresolved threads

Conciseness Check

All changes are directly related to the request:

1 merge commit (conflict resolution)
1 fix commit (addressing review feedback)

No extraneous changes were introduced. The only files modified beyond the merge were conversation_router.py (error handling fix + logger) and remote_conversation.py (observation extraction cleanup) — both directly responding to reviewer feedback.

Demonstrates direct tool execution on conversations via both the SDK (RemoteConversation.execute_tool) and raw HTTP (POST endpoint). Shows that environment changes persist across calls and into the agent session. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-21T13:59:19Z

@OpenHands /codereview /github-pr-review

🤖 This comment was automatically posted by the pr-triage skill (OpenHands) on behalf of a maintainer. See PR #2906 for skill details.

openhands-ai · 2026-04-21T13:59:35Z

I'm on it! xingyaoww can track my progress at all-hands.dev

xingyaoww

🟡 Acceptable — Solid feature solving a real problem (setup.sh env persistence). Architecture follows existing layered patterns. A few things worth addressing before merge.

Note: This touches tool execution code paths, so flagging for human maintainer review before merge (per eval-risk policy).

This review was generated by an AI agent (OpenHands).

xingyaoww · 2026-04-21T14:03:22Z

+        observation_data = result.get("observation", {})
+        is_error = result.get("is_error", False)
+
+        # Server returns observation.model_dump(mode="json") which has:
+        #   {"content": [{"text": "...", "kind": "text"}, ...], "is_error": bool}
+        # Extract text from the content array.
+        text = ""
+        content = observation_data.get("content", [])
+        if content and isinstance(content, list) and len(content) > 0:
+            text = content[0].get("text", "")
+
+        # Use a concrete subclass since Observation is abstract
+        class _RemoteObservation(Observation):
+            pass
+
+        return _RemoteObservation.from_text(text=text, is_error=is_error)


🟠 Important: Data loss on deserialization — The server serializes the full typed observation (e.g., TerminalObservation with command, exit_code, timeout, metadata), but the client only extracts the first text content item and wraps it in a bare _RemoteObservation. All tool-specific fields are silently dropped.

For the setup.sh use case, this means callers can't check exit_code to know if the command succeeded or timeout to detect hung scripts. Only is_error (a coarser signal) survives.

Consider either:

Returning the raw observation dict alongside the Observation object so callers can access extra fields

Attempting to deserialize into the correct concrete type (e.g., via a discriminated union or type registry)

At minimum, concatenating all text content items instead of just content[0]

Also: the content and isinstance(content, list) and len(content) > 0 guard is redundant — if content: handles empty/None. And defining _RemoteObservation inside the method body creates a new class on every call — consider a module-level class _RemoteObservation(Observation): pass.

xingyaoww · 2026-04-21T14:03:22Z

+        def _execute():
+            # Get the tool to resolve the action type
+            conversation._ensure_agent_ready()
+            tool = conversation.agent.tools_map.get(tool_name)
+            if tool is None:
+                available_tools = list(conversation.agent.tools_map.keys())
+                raise KeyError(
+                    f"Tool '{tool_name}' not found. Available tools: {available_tools}"
+                )
+            # Validate the action dict against the tool's action type
+            action = tool.action_type.model_validate(action_dict)
+            observation = conversation.execute_tool(tool_name, action)
+            return observation.model_dump(mode="json")


🟡 Suggestion: The _execute() closure calls conversation._ensure_agent_ready() directly (line 383), then calls conversation.execute_tool() which calls _ensure_agent_ready() again internally (see local_conversation.py:1238). The explicit call here is redundant — execute_tool already handles lazy init.

Also worth noting: _ensure_agent_ready() is a private method on LocalConversation. Calling it from EventService couples this layer to the implementation detail. Consider just relying on execute_tool() to handle initialization, or extracting the tool lookup + action validation into a method on LocalConversation itself.

Suggested change

def _execute():

# Get the tool to resolve the action type

conversation._ensure_agent_ready()

tool = conversation.agent.tools_map.get(tool_name)

if tool is None:

available_tools = list(conversation.agent.tools_map.keys())

raise KeyError(

f"Tool '{tool_name}' not found. Available tools: {available_tools}"

)

# Validate the action dict against the tool's action type

action = tool.action_type.model_validate(action_dict)

observation = conversation.execute_tool(tool_name, action)

return observation.model_dump(mode="json")

def _execute():

# Validate tool exists and action matches its schema

tool = conversation.agent.tools_map.get(tool_name)

if tool is None:

# ensure_agent_ready is called by execute_tool, but we need

# the tool reference here for action validation first

conversation._ensure_agent_ready()

tool = conversation.agent.tools_map.get(tool_name)

if tool is None:

available_tools = list(conversation.agent.tools_map.keys())

raise KeyError(

f"Tool '{tool_name}' not found. Available tools: {available_tools}"

)

# Validate the action dict against the tool's action type

action = tool.action_type.model_validate(action_dict)

observation = conversation.execute_tool(tool_name, action)

return observation.model_dump(mode="json")

Actually, the simplest fix: just drop the explicit _ensure_agent_ready() call and rely on execute_tool() doing it. But then you need to move the tool lookup after execute_tool (or restructure). WDYT about having LocalConversation.execute_tool accept a dict and do the validation internally?

xingyaoww · 2026-04-21T14:03:22Z

+            conversation_id, request.tool_name, request.action
+        )
+    except KeyError as e:
+        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))


🟢 Acceptable: The except KeyError before except Exception isn't redundant — it correctly maps known errors to 400 vs generic errors to 500. The separation serves as documentation of expected error types. 👍

xingyaoww · 2026-04-21T14:03:22Z

+    async def execute_tool(
+        self,
+        conversation_id: str,
+        tool_name: str,
+        action: dict[str, Any],
+        timeout: float = 600.0,
+    ) -> dict[str, Any]:
+        """Execute a tool on a conversation through the agent server.
+
+        This calls the agent server's execute_tool endpoint, which runs the tool
+        through the conversation's tool executor (e.g., the persistent terminal
+        session). This is useful for pre-run setup operations like running
+        .openhands/setup.sh where environment changes need to persist.
+
+        Args:
+            conversation_id: The conversation ID (hex string)
+            tool_name: The name of the tool to execute (e.g., 'terminal')
+            action: The action parameters as a dictionary
+            timeout: Timeout in seconds
+
+        Returns:
+            The response dict with 'observation' and 'is_error' keys
+        """
+        payload = {
+            "tool_name": tool_name,
+            "action": action,
+        }
+        response = await self.client.post(
+            f"/api/conversations/{conversation_id}/execute_tool",
+            json=payload,
+            timeout=timeout,
+        )
+        response.raise_for_status()
+        return response.json()


🟡 Suggestion: Is this method used by anything in this PR or the companion PR? RemoteConversation.execute_tool() goes through _send_request (the sync httpx client), not through this async workspace method. If this is for the companion PR's AsyncRemoteWorkspace usage, it would be cleaner to add it in that PR. Otherwise it's dead code that might drift out of sync with the actual endpoint contract.

xingyaoww · 2026-04-21T14:03:22Z

+    async def execute_tool(self, tool_name: str, action_dict: dict) -> dict:
+        """Execute a tool directly on the conversation.
+
+        Bypasses the agent loop. Useful for pre-run setup operations
+        like running .openhands/setup.sh through the agent's terminal
+        tool so environment changes persist in the agent's session.
+
+        Args:
+            tool_name: The name of the tool to execute (e.g., 'terminal')
+            action_dict: The action parameters as a dictionary
+
+        Returns:
+            The observation as a dictionary
+
+        Raises:
+            ValueError: If the service is inactive
+            KeyError: If the tool is not found
+        """
+        if not self._conversation:
+            raise ValueError("inactive_service")
+
+        conversation = self._conversation
+        loop = asyncio.get_running_loop()
+
+        def _execute():
+            # Get the tool to resolve the action type
+            conversation._ensure_agent_ready()
+            tool = conversation.agent.tools_map.get(tool_name)
+            if tool is None:
+                available_tools = list(conversation.agent.tools_map.keys())
+                raise KeyError(
+                    f"Tool '{tool_name}' not found. Available tools: {available_tools}"
+                )
+            # Validate the action dict against the tool's action type
+            action = tool.action_type.model_validate(action_dict)
+            observation = conversation.execute_tool(tool_name, action)
+            return observation.model_dump(mode="json")
+
+        return await loop.run_in_executor(None, _execute)


🟡 Suggestion: Concurrency with run() — If a caller hits execute_tool while run() is in progress (the agent is actively using the terminal), both will attempt to execute tool commands concurrently on the same conversation. _ensure_agent_ready is thread-safe, but the tool executor itself (e.g., terminal session) may not handle concurrent commands gracefully.

Consider documenting that execute_tool is intended for pre-run setup only and may produce undefined behavior if called concurrently with run(). Or acquire the state lock to serialize access.

openhands-ai · 2026-04-21T14:04:24Z

Summary

I completed the code review for PR #2356 ("feat: add execute_tool API endpoint for direct tool execution on conversations") as requested by the /codereview /github-pr-review comment.

What was done

I posted a single structured GitHub PR review with 5 inline comments at: #2356 (review)

Review findings

🟡 Acceptable overall — solid feature solving a real problem (setup.sh env persistence via persistent terminal session)
🟠 Data loss in RemoteConversation.execute_tool() — server serializes full TerminalObservation (with exit_code, timeout, command, metadata) but client only extracts first text content item, silently dropping tool-specific fields
🟡 Redundant _ensure_agent_ready() call in event_service.py (already called internally by execute_tool)
🟡 Potentially unused AsyncRemoteWorkspace.execute_tool() — not referenced by this PR's code paths
🟡 Concurrency concern — no protection against concurrent execute_tool + run() on the same conversation
🟢 Acceptable error handling pattern in the router

Per the repo's custom codereview guide (eval-risk policy), I submitted a COMMENT review (not APPROVE) since this PR touches tool execution code paths that could affect benchmark performance.

Checklist

Retrieved and analyzed the full PR diff
Reviewed all 9 changed files across agent-server, SDK, and tests
Checked thread safety, data flow, and architectural conventions per AGENTS.md
Posted review using GitHub API with inline comments per /github-pr-review skill format
Used priority labels (🔴🟠🟡🟢) on every comment
Applied eval-risk policy (COMMENT, not APPROVE)
Included AI disclosure in review body

No code changes were made

This was a review-only task — no commits, no code modifications. Only a GitHub review was posted.

xingyaoww mentioned this pull request Mar 7, 2026

DRAFT: feat: run .openhands/setup.sh via agent's terminal tool in V1 OpenHands/OpenHands#13285

Draft

Merge branch 'main' into feat/execute-tool-api

874d2cd

xingyaoww marked this pull request as ready for review April 20, 2026 14:00

all-hands-bot reviewed Apr 20, 2026

View reviewed changes

Comment thread openhands-agent-server/openhands/agent_server/conversation_router.py

Comment thread openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py Outdated

xingyaoww requested a review from all-hands-bot April 20, 2026 14:07

all-hands-bot reviewed Apr 20, 2026

View reviewed changes

Merge branch 'main' into feat/execute-tool-api

a59b48d

xingyaoww changed the title ~~DRAFT: feat: add execute_tool API endpoint for direct tool execution on conversations~~ feat: add execute_tool API endpoint for direct tool execution on conversations Apr 20, 2026

xingyaoww commented Apr 21, 2026

View reviewed changes

Conversation

xingyaoww commented Mar 7, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Problem

Solution

Changes

Companion PR

Testing

Uh oh!

github-actions Bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

xingyaoww commented Mar 10, 2026

Uh oh!

openhands-ai Bot commented Mar 10, 2026

Uh oh!

github-actions Bot commented Mar 10, 2026

Uh oh!

openhands-ai Bot commented Mar 10, 2026

Summary of Work Done

CI Fixes (commit 045262b5, already pushed)

Functional Testing

Artifacts

Checklist

Uh oh!

github-actions Bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xingyaoww commented Mar 13, 2026

Uh oh!

openhands-ai Bot commented Mar 13, 2026

Uh oh!

all-hands-bot commented Mar 29, 2026

Uh oh!

all-hands-bot commented Apr 12, 2026

Uh oh!

xingyaoww commented Apr 20, 2026

Uh oh!

openhands-ai Bot commented Apr 20, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

all-hands-bot commented Apr 20, 2026

Detailed Review

Architecture ✅

Testing Observations 🟢

Key Design Insight 💡

Minor Observation

[RISK ASSESSMENT]

Uh oh!

xingyaoww commented Apr 20, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Summary

Eval Risk Policy Note

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

✅ QA Report: PASS

Does this PR achieve its stated goal?

Test 1: Basic tool execution

Test 2: Environment variable persistence (CORE USE CASE)

Test 3: Error handling — nonexistent tool

Test 4: Error handling — nonexistent conversation

Test 5: Error handling — invalid action parameters

Issues Found

Architecture Assessment

Uh oh!

openhands-ai Bot commented Apr 20, 2026

Summary of Work Done

xingyaoww commented Mar 7, 2026 •

edited by github-actions Bot

Loading

github-actions Bot commented Mar 7, 2026 •

edited

Loading

github-actions Bot commented Mar 7, 2026 •

edited

Loading

CI Fixes (commit `045262b5`, already pushed)

github-actions Bot commented Mar 10, 2026 •

edited

Loading

xingyaoww commented Apr 21, 2026 •

edited

Loading