feat: add ask_user tool with host-side LLM interception#1
yash-scaleai wants to merge 4 commits into scaleapi:scale-customizations
Conversation
Adds an `ask_user` tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:
- Add `ask_user` tool in `tools/ask_user/`
- Add `TaskDefinitionInjectionHook` to inject task definitions into containers
- Auto-detect `task_definitions.json` in `run_batch.py`
- **Critical fix**: Intercept `ask_user` commands on HOST side in `agents.py`

The host-side interception is necessary because Modal containers cannot reach internal API endpoints (e.g., litellm.ml-serving-internal.scale.com). By handling the LLM call on the host (where the agent's own LLM calls are made), we avoid timeout issues while maintaining the same functionality.

Based on Bryan's PR #17144 with architectural modification for internal API compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Includes post-experiment working tree fixes (traj_path refactor, `_find_task_definitions_file` search) that ran during Phase B.

Safety fixes (no behavior change):
- Add threading.Lock around task definitions cache
- Tighten `_parse_ask_user_command` to exact word boundary
- Guard result against None before slicing for log
- Remove no-op `@Retry(stop_after_attempt(1))` in container llm.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Infrastructure fixes applied during experiment runs:
- pyproject.toml: use jeff-da/SWE-ReX sweap-support branch for Modal
- models.py: handle `top_p=None` in model ID string (was crashing)
- justfile: mount data/ volume for container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
try:
    logger.info(f"ask_user intercepted on host: question='{question[:100]}...'")
    # Drop unsupported params for models like GPT-5 that don't support temperature
    litellm.drop_params = True
```
Global side-effect on all litellm calls
litellm.drop_params is a module-level global. Setting it to True here permanently affects all subsequent litellm.completion calls in the process, including the main agent model calls in models.py:693. This means unsupported parameters will be silently dropped for the agent's own LLM calls too, which could mask configuration errors or change behavior unexpectedly.
This is especially problematic in multi-worker mode (ThreadPoolExecutor in run_batch.py:288) where a single ask_user call from one worker mutates global state for all workers.
Consider passing drop_params as a per-call parameter instead:
```diff
- litellm.drop_params = True
+ litellm.drop_params = True  # TODO: this mutates global state; move to per-call param if litellm supports it
```
Or scope the change more carefully by saving and restoring the original value.
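The save-and-restore option can be sketched with a small generic helper (the helper name is mine, not the PR's; with litellm it would be used as `scoped_attr(litellm, "drop_params", True)`). Note this still touches module-level state for the duration of the block, so it narrows but does not eliminate the cross-worker race:

```python
from contextlib import contextmanager

@contextmanager
def scoped_attr(obj, name, value):
    """Temporarily set an attribute on obj, restoring the prior value on exit.

    Caveat: the temporary value is still visible to concurrent workers while
    the block runs; only a per-call parameter fully avoids that.
    """
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, previous)
```

The flag is restored even if the wrapped `litellm.completion` call raises, which plain assignment does not guarantee.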
Path: `sweagent/agent/agents.py`, line 189
```python
# Pass through API credentials for the user simulator LLM
api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
base_url = os.environ.get("OPENAI_BASE_URL") or os.environ.get("LLM_BASE_URL")
simulator_model = os.environ.get("USER_SIMULATOR_MODEL")

if api_key:
    env_vars["OPENAI_API_KEY"] = api_key
    env_vars["LLM_API_KEY"] = api_key
if base_url:
    env_vars["OPENAI_BASE_URL"] = base_url
    env_vars["LLM_BASE_URL"] = base_url
if simulator_model:
    env_vars["USER_SIMULATOR_MODEL"] = simulator_model
```
Unnecessary credential injection into container
Since ask_user commands are now intercepted on the host side (in agents.py) before reaching the container, the container's ask_user binary never actually runs. Injecting API keys (OPENAI_API_KEY, LLM_API_KEY) into the container environment is unnecessary credential exposure. If the container is compromised or the agent runs arbitrary code, these credentials could be exfiltrated.
Consider removing the API credential injection since the host-side interception makes it redundant, or add a comment explaining why it's still needed (e.g., as a fallback).
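If a fallback is genuinely wanted, one option is to gate the injection behind an explicit opt-in so credentials are not forwarded by default. A minimal sketch (the `ASK_USER_CONTAINER_FALLBACK` flag name is hypothetical, not part of this PR):

```python
import os

def build_container_env(env_vars: dict) -> dict:
    """Forward simulator credentials only when the in-container fallback
    path is explicitly enabled; by default the host-side interception
    handles the LLM call and the container needs no keys."""
    if os.environ.get("ASK_USER_CONTAINER_FALLBACK") != "1":
        return env_vars
    api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
    if api_key:
        env_vars["OPENAI_API_KEY"] = api_key
        env_vars["LLM_API_KEY"] = api_key
    return env_vars
```

This keeps the fallback available for debugging while making the default posture no-credentials-in-container.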
Path: `sweagent/run/hooks/task_definition_injection.py`, lines 104-116
```python
self._chook.on_action_executed(step=step)
step.state = self.tools.get_state(env=self._env)
return self.handle_submission(step)
```
handle_submission called unconditionally on every ask_user
`handle_submission` inspects `step.observation` for a submission command and, if found, reads `/root/model.patch`, sets `step.done = True`, and overwrites `step.observation` with the patch content. While a real `ask_user` response is extremely unlikely to contain a submission trigger, calling this here means:
- Any LLM-simulated response that happens to contain the submission token will silently terminate the episode and replace the clarification answer with the patch file contents.
- The agent receives no response to its question and gets an empty or spurious observation.
The correct pattern is to simply return step directly so the observation flows back to the agent as normal tool output:
```diff
  self._chook.on_action_executed(step=step)
  step.state = self.tools.get_state(env=self._env)
- return self.handle_submission(step)
+ return step
```
Path: `sweagent/agent/agents.py`, lines 1144-1146
Summary
Adds an `ask_user` tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:
- Add `ask_user` tool in `tools/ask_user/`
- Add `TaskDefinitionInjectionHook` to inject task definitions into containers
- Auto-detect `task_definitions.json` in `run_batch.py`
- Intercept `ask_user` commands on HOST side in `agents.py`

Why host-side interception?
The original implementation (Bryan's PR #17144) runs the LLM call inside the Modal container. However, Modal containers cannot reach internal API endpoints like `litellm.ml-serving-internal.scale.com`.

This PR intercepts `ask_user` commands in `agents.py` before they are sent to the container, makes the LLM call on the host (where the agent's own LLM calls are made), and returns the response directly.
Based on Bryan's PR #17144 with architectural modification for internal API compatibility.
Test plan
- `scripts/test_ask_user_single.py`

🤖 Generated with Claude Code
Greptile Summary
This PR adds an `ask_user` tool that lets the agent request clarification on underspecified tasks. The key architectural decision is to intercept `ask_user` commands on the host (in `agents.py`) before they reach the Modal container, so the LLM call can reach internal API endpoints that containers cannot. The PR also includes a `TaskDefinitionInjectionHook` that injects the full task spec into containers, a `top_p=None` fix in `models.py`, and auto-detection of `task_definitions.json` in `run_batch.py`.

Issues found:
- `handle_submission` called on every `ask_user` response (`agents.py:1146`): If the LLM-simulated user response happens to contain the submission trigger token, `handle_submission` will terminate the episode, overwrite `step.observation` with the patch file, and the agent never receives its clarification. The intercepted step should be returned directly.
- `base_url` (`tools/ask_user/bin/ask_user:85`): The guard `if not api_key or not base_url` raises for users with a plain `OPENAI_API_KEY` and no proxy, even though `llm.py` explicitly supports `base_url=None`.
- (`task_definition_injection.py:85`): If a task description contains `TASK_DEFINITION_EOF` on its own line, the heredoc terminates early and subsequent text is executed as shell commands.
- (`pyproject.toml`): `swe-rex` is pinned to the `@sweap-support` branch rather than a commit SHA, making builds non-reproducible.

Confidence Score: 2/5
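The heredoc issue in the list above can be avoided by encoding the payload so the delimiter can never appear in the stream. A sketch of the alternative (function name assumed; the PR itself uses a heredoc):

```python
import base64
import shlex

def safe_write_file_command(path: str, content: str) -> str:
    """Build a shell command that writes arbitrary content to `path`.

    Base64-encoding the payload means a task description containing the
    heredoc delimiter (e.g. TASK_DEFINITION_EOF) cannot terminate the
    write early and have its remainder executed as shell commands.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return f"printf %s {encoded} | base64 -d > {shlex.quote(path)}"
```

The standard base64 alphabet contains no underscore or whitespace, so no content can smuggle a delimiter or break out of the command.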
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Agent as Agent (LLM)
    participant Host as Host handle_action()
    participant LiteLLM as litellm.completion (host)
    participant Container as Modal Container
    participant BinAskUser as ask_user binary (container)
    Agent->>Host: step.action = ask_user "question"
    Host->>Host: _parse_ask_user_command()
    alt ask_user intercepted
        Host->>Host: _load_task_definitions(traj_path)
        Host->>LiteLLM: litellm.completion(USER_SIMULATOR_MODEL)
        LiteLLM-->>Host: simulated user response
        Host->>Host: step.observation = response
        Host->>Host: handle_submission(step) ⚠️
        Host-->>Agent: step (with clarification)
    else normal command
        Host->>Container: env.communicate(run_action)
        Container-->>Host: observation
        Host-->>Agent: step
    end
    note over BinAskUser: Never reached when<br/>host intercept is active
```
Last reviewed commit: "Update sweagent/agen..."