
feat: add ask_user tool with host-side LLM interception#1

Open
yash-scaleai wants to merge 4 commits into scaleapi:scale-customizations from yash-scaleai:yash/ask-user-host-interception

Conversation


yash-scaleai commented Jan 28, 2026

Summary

Adds an ask_user tool that allows agents to request clarification on underspecified tasks. The tool simulates user responses using an LLM based on the complete task definition.

Key changes:

  • Add ask_user tool in tools/ask_user/
  • Add TaskDefinitionInjectionHook to inject task definitions into containers
  • Auto-detect task_definitions.json in run_batch.py
  • Critical fix: Intercept ask_user commands on HOST side in agents.py

Why host-side interception?

The original implementation (Bryan's PR #17144) runs the LLM call inside the Modal container. However, Modal containers cannot reach internal API endpoints like litellm.ml-serving-internal.scale.com.

This PR intercepts ask_user commands in agents.py before they are sent to the container, makes the LLM call on the host (where the agent's own LLM calls are made), and returns the response directly.

Flow:

1. Agent outputs: ask_user "what's the cert validation behavior?"
2. Host intercepts command in handle_action()
3. Host makes LLM call (can reach internal API)
4. Response returned directly to agent
5. Container's ask_user binary never runs

Based on Bryan's PR #17144 with architectural modification for internal API compatibility.

Test plan

  • Single task test with scripts/test_ask_user_single.py
  • Verified ask_user interception in logs
  • Verified response in trajectory file

🤖 Generated with Claude Code

Greptile Summary

This PR adds an ask_user tool that lets the agent request clarification on underspecified tasks. The key architectural decision is to intercept ask_user commands on the host (in agents.py) before they reach the Modal container, so the LLM call can reach internal API endpoints that containers cannot. The PR also includes a TaskDefinitionInjectionHook that injects the full task spec into containers, a top_p=None fix in models.py, and auto-detection of task_definitions.json in run_batch.py.

Issues found:

  • handle_submission called on every ask_user response (agents.py:1146): If the LLM-simulated user response happens to contain the submission trigger token, handle_submission will terminate the episode, overwrite step.observation with the patch file, and the agent never receives its clarification. The intercepted step should be returned directly.
  • Container binary fails without base_url (tools/ask_user/bin/ask_user:85): The guard if not api_key or not base_url raises for users with a plain OPENAI_API_KEY and no proxy, even though llm.py explicitly supports base_url=None.
  • Shell injection via heredoc delimiter (task_definition_injection.py:85): If a task description contains TASK_DEFINITION_EOF on its own line, the heredoc terminates early and subsequent text is executed as shell commands.
  • Unpinned git branch dependency (pyproject.toml): swe-rex is pinned to the @sweap-support branch rather than a commit SHA, making builds non-reproducible.

Confidence Score: 2/5

  • Not safe to merge without addressing the handle_submission logic issue and the heredoc injection risk.
  • The unconditional handle_submission call on ask_user responses can silently terminate episodes and corrupt agent observations; the heredoc construction in the injection hook is exploitable if task content contains the delimiter string; and the unpinned branch dependency breaks build reproducibility.
  • sweagent/agent/agents.py (handle_submission logic), sweagent/run/hooks/task_definition_injection.py (heredoc shell injection), pyproject.toml (unpinned branch dependency)

Important Files Changed

| Filename | Overview |
|---|---|
| `sweagent/agent/agents.py` | Adds host-side `ask_user` interception in `handle_action()`; contains a logic issue where `handle_submission` is called on the intercepted response (could prematurely end the episode if the LLM response contains a submission token) and sets `litellm.drop_params` globally. |
| `tools/ask_user/bin/ask_user` | Container-side `ask_user` binary; incorrectly requires `base_url` to be set even though `llm.py` supports `base_url=None`, causing failures for direct OpenAI API usage. |
| `sweagent/run/hooks/task_definition_injection.py` | New `RunHook` that injects task definitions into containers; contains a potential shell injection via an unescaped heredoc delimiter in the `cat` command, and injects API credentials that are technically redundant given host-side interception. |
| `pyproject.toml` | Changes the `swe-rex` dependency from a pinned version tag to an unpinned git branch reference, breaking reproducible builds. |
| `sweagent/agent/models.py` | Changes the `top_p` default from 1.0 to `None` and fixes the model ID string to handle null `top_p` gracefully. |
| `sweagent/run/run_batch.py` | Auto-registers `TaskDefinitionInjectionHook` when `task_definitions.json` exists in `output_dir`; straightforward and safe. |
| `tools/ask_user/bin/llm.py` | Thin OpenAI client wrapper with a lazy singleton; supports `base_url=None`, but the caller in `ask_user` incorrectly gates on `base_url`. |
| `config/tool_use_with_ask_user.yaml` | New agent config that bundles the `ask_user` tool alongside the standard tool set. |
| `tools/ask_user/config.yaml` | Tool definition YAML for `ask_user`, correctly specifying the signature and argument descriptions. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as Agent (LLM)
    participant Host as Host handle_action()
    participant LiteLLM as litellm.completion (host)
    participant Container as Modal Container
    participant BinAskUser as ask_user binary (container)

    Agent->>Host: step.action = ask_user "question"
    Host->>Host: _parse_ask_user_command()
    alt ask_user intercepted
        Host->>Host: _load_task_definitions(traj_path)
        Host->>LiteLLM: litellm.completion(USER_SIMULATOR_MODEL)
        LiteLLM-->>Host: simulated user response
        Host->>Host: step.observation = response
        Host->>Host: handle_submission(step) ⚠️
        Host-->>Agent: step (with clarification)
    else normal command
        Host->>Container: env.communicate(run_action)
        Container-->>Host: observation
        Host-->>Agent: step
    end

    note over BinAskUser: Never reached when<br/>host intercept is active
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tools/ask_user/bin/ask_user
Line: 85-86

Comment:
**`base_url` is required but `llm.py` supports `None`**

This guard rejects calls when `base_url` is not set, but `llm.py`'s `get_openai_client()` explicitly documents and supports `base_url=None` for direct OpenAI API usage (see the `# Note: base_url can be None` comment in `llm.py`). Any deployment that uses a plain `OPENAI_API_KEY` without a proxy will hit this error path even though the underlying client would work fine.

```suggestion
    if not api_key:
        raise ValueError("LLM_API_KEY or OPENAI_API_KEY not available. Cannot generate response.")
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pyproject.toml
Line: 60

Comment:
**Unpinned git branch breaks reproducible builds**

`@sweap-support` is a branch reference, not a commit hash. Every fresh install or CI run could pull a different commit from that branch, making builds non-reproducible and potentially silently breaking or changing behavior.

Pin to a specific commit SHA instead:

```suggestion
    "swe-rex[modal] @ git+https://github.com/jeff-da/SWE-ReX.git@<full-commit-sha>",
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: sweagent/agent/agents.py
Line: 1144-1146

Comment:
**`handle_submission` called unconditionally on every `ask_user`**

`handle_submission` inspects `step.observation` for a submission command and, if found, reads `/root/model.patch`, sets `step.done = True`, and overwrites `step.observation` with the patch content. While a real `ask_user` response is extremely unlikely to contain a submission trigger, calling this here means:

1. Any LLM-simulated response that happens to contain the submission token will **silently terminate the episode** and replace the clarification answer with the patch file contents.
2. The agent receives no response to its question and gets an empty or spurious `observation`.

The correct pattern is to simply return `step` directly so the observation flows back to the agent as normal tool output:

```suggestion
            return step
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: sweagent/run/hooks/task_definition_injection.py
Line: 85-87

Comment:
**Shell injection risk via unescaped `task_def_json`**

The task definition JSON is interpolated directly into a heredoc shell command:

```python
command = f"cat > {task_def_path} << 'TASK_DEFINITION_EOF'\n{task_def_json}\nTASK_DEFINITION_EOF"
```

If the JSON value contains the literal string `TASK_DEFINITION_EOF` on its own line (e.g., as part of a task description or code snippet), it will terminate the heredoc early and the remaining text will be interpreted as shell commands. Consider writing the file via a Python `base64`-encoded write or a temporary host-side file copy instead, or at minimum choose a highly unique delimiter.
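One way to avoid the delimiter problem entirely, as the comment suggests, is to ship the payload base64-encoded so that no literal string in the task definition can terminate the command early. A minimal sketch, assuming coreutils `base64` is available in the container (the function name and path are illustrative, not from the PR):

```python
import base64

def build_write_command(task_def_json: str, task_def_path: str) -> str:
    """Build a shell command that writes `task_def_json` to `task_def_path`.

    Base64 output contains only [A-Za-z0-9+/=], so no payload content --
    including a literal 'TASK_DEFINITION_EOF' line -- can escape the
    single-quoted argument or terminate the command early.
    """
    encoded = base64.b64encode(task_def_json.encode("utf-8")).decode("ascii")
    return f"printf '%s' '{encoded}' | base64 -d > {task_def_path}"

# Content that would break the original heredoc construction:
hostile = '{"description": "...\nTASK_DEFINITION_EOF\nrm -rf /"}'
cmd = build_write_command(hostile, "/root/task_definition.json")
assert "TASK_DEFINITION_EOF" not in cmd  # delimiter never appears literally
```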

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Update sweagent/agen..."

Greptile also left 1 inline comment on this PR.

Adds an ask_user tool that allows agents to request clarification on
underspecified tasks. The tool simulates user responses using an LLM
based on the complete task definition.

Key changes:
- Add ask_user tool in tools/ask_user/
- Add TaskDefinitionInjectionHook to inject task definitions into containers
- Auto-detect task_definitions.json in run_batch.py
- **Critical fix**: Intercept ask_user commands on HOST side in agents.py

The host-side interception is necessary because Modal containers cannot
reach internal API endpoints (e.g., litellm.ml-serving-internal.scale.com).
By handling the LLM call on the host (where the agent's own LLM calls are
made), we avoid timeout issues while maintaining the same functionality.

Based on Bryan's PR #17144 with architectural modification for internal
API compatibility.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

socket-security bot commented Jan 28, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff | Package | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
|---|---|---|---|---|---|---|
| Added | openai@2.29.0 | 95 | 100 | 100 | 100 | 100 |
| Updated | swe-rex@1.2.0 ⏵ 1.4.0 | 98 | 100 | 100 | 100 | 100 |

View full report

yash-scaleai and others added 2 commits February 20, 2026 00:50
Includes post-experiment working tree fixes (traj_path refactor,
_find_task_definitions_file search) that ran during Phase B.

Safety fixes (no behavior change):
- Add threading.Lock around task definitions cache
- Tighten _parse_ask_user_command to exact word boundary
- Guard result against None before slicing for log
- Remove no-op @retry(stop_after_attempt(1)) in container llm.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
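The "exact word boundary" tightening mentioned in the safety fixes above can be illustrated with a small regex comparison (hypothetical patterns; the actual implementation in agents.py may differ):

```python
import re

# A bare substring check would intercept commands that merely contain the
# token; anchoring at the start with a word boundary avoids false positives.
LOOSE = re.compile(r"ask_user")
STRICT = re.compile(r"^ask_user\b")

assert LOOSE.search("my_ask_user_helper --help")            # false positive
assert STRICT.match("my_ask_user_helper --help") is None    # rejected
assert STRICT.match("ask_username=alice") is None           # rejected
assert STRICT.match('ask_user "which Python version?"')     # intercepted
```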
Infrastructure fixes applied during experiment runs:
- pyproject.toml: use jeff-da/SWE-ReX sweap-support branch for Modal
- models.py: handle top_p=None in model ID string (was crashing)
- justfile: mount data/ volume for container access

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

greptile-apps bot left a comment


9 files reviewed, 3 comments


```python
try:
    logger.info(f"ask_user intercepted on host: question='{question[:100]}...'")
    # Drop unsupported params for models like GPT-5 that don't support temperature
    litellm.drop_params = True
```

**Global side-effect on all litellm calls**

`litellm.drop_params` is a module-level global. Setting it to `True` here permanently affects all subsequent `litellm.completion` calls in the process, including the main agent model calls in `models.py:693`. This means unsupported parameters will be silently dropped for the agent's own LLM calls too, which could mask configuration errors or change behavior unexpectedly.

This is especially problematic in multi-worker mode (`ThreadPoolExecutor` in `run_batch.py:288`) where a single `ask_user` call from one worker mutates global state for all workers.

Consider passing `drop_params` as a per-call parameter instead:

```diff
-litellm.drop_params = True
+litellm.drop_params = True  # TODO: this mutates global state; move to per-call param if litellm supports it
```

Or scope the change more carefully by saving and restoring the original value.
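If the installed litellm version offers no per-call equivalent, the save-and-restore pattern can be wrapped in a context manager. A sketch under assumptions (`scoped_drop_params` is a hypothetical helper, and the lock only narrows, not eliminates, the cross-thread race):

```python
import threading
from contextlib import contextmanager

_drop_params_lock = threading.Lock()

@contextmanager
def scoped_drop_params(litellm_module, value=True):
    """Temporarily set litellm.drop_params, restoring the prior value.

    The lock serializes mutators of the flag, but other threads calling
    litellm concurrently still observe the temporary value, so a per-call
    drop_params (if supported by the installed litellm) remains preferable.
    """
    with _drop_params_lock:
        previous = getattr(litellm_module, "drop_params", False)
        litellm_module.drop_params = value
        try:
            yield
        finally:
            litellm_module.drop_params = previous
```

Call sites would then read `with scoped_drop_params(litellm): litellm.completion(...)`.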


Comment on lines +104 to +116

```python
# Pass through API credentials for the user simulator LLM
api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("LLM_API_KEY")
base_url = os.environ.get("OPENAI_BASE_URL") or os.environ.get("LLM_BASE_URL")
simulator_model = os.environ.get("USER_SIMULATOR_MODEL")

if api_key:
    env_vars["OPENAI_API_KEY"] = api_key
    env_vars["LLM_API_KEY"] = api_key
if base_url:
    env_vars["OPENAI_BASE_URL"] = base_url
    env_vars["LLM_BASE_URL"] = base_url
if simulator_model:
    env_vars["USER_SIMULATOR_MODEL"] = simulator_model
```

**Unnecessary credential injection into container**

Since `ask_user` commands are now intercepted on the host side (in `agents.py`) before reaching the container, the container's `ask_user` binary never actually runs. Injecting API keys (`OPENAI_API_KEY`, `LLM_API_KEY`) into the container environment is unnecessary credential exposure. If the container is compromised or the agent runs arbitrary code, these credentials could be exfiltrated.

Consider removing the API credential injection since the host-side interception makes it redundant, or add a comment explaining why it's still needed (e.g., as a fallback).


Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Comment on lines +1144 to +1146

```python
self._chook.on_action_executed(step=step)
step.state = self.tools.get_state(env=self._env)
return self.handle_submission(step)
```


**P2: `handle_submission` called unconditionally on every `ask_user`**

`handle_submission` inspects `step.observation` for a submission command and, if found, reads `/root/model.patch`, sets `step.done = True`, and overwrites `step.observation` with the patch content. While a real `ask_user` response is extremely unlikely to contain a submission trigger, calling this here means:

1. Any LLM-simulated response that happens to contain the submission token will **silently terminate the episode** and replace the clarification answer with the patch file contents.
2. The agent receives no response to its question and gets an empty or spurious `observation`.

The correct pattern is to simply return `step` directly so the observation flows back to the agent as normal tool output:

```diff
-self._chook.on_action_executed(step=step)
-step.state = self.tools.get_state(env=self._env)
-return self.handle_submission(step)
+return step
```
