feat(agent-runner): add tool_choice parameter to fix empty tool calls response in "skills-like" tool call mode #7101
Conversation
… response in "skills-like" tool call mode fixes: #7049
Hey - I've found 2 issues, and left some high level feedback:
- The `_has_meaningful_assistant_reply` helper currently only checks `completion_text`; consider also treating `reasoning_content` or a non-empty `result_chain` as meaningful, to avoid unnecessary second-stage retries when the model actually returned a useful explanation.
- In `openai_source._query` and `_query_stream`, `payloads["tool_choice"] = payloads.get("tool_choice", "auto")` is effectively a no-op given how `payloads` is constructed; you can either drop this assignment or wire the `tool_choice` parameter through more explicitly for clarity.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_has_meaningful_assistant_reply` helper currently only checks `completion_text`; consider also treating `reasoning_content` or a non-empty `result_chain` as meaningful to avoid unnecessary second-stage retries when the model actually returned a useful explanation.
- In `openai_source._query` and `_query_stream`, `payloads["tool_choice"] = payloads.get("tool_choice", "auto")` is effectively a no-op given how `payloads` is constructed; you can either drop this assignment or wire the `tool_choice` parameter through more explicitly for clarity.
## Individual Comments
### Comment 1
<location path="astrbot/core/agent/runners/tool_loop_agent_runner.py" line_range="988-996" />
<code_context>
+ # If the re-query still returns no tool calls, and also does not have a meaningful assistant reply,
+ # we consider it as a failure of the LLM to follow the tool-use instruction,
+ # and we will retry once with a stronger instruction that explicitly requires the LLM to either call the tool or give an explanation.
+ if (
+ not llm_resp.tools_call_name
+ and not self._has_meaningful_assistant_reply(llm_resp)
+ ):
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Second-stage repair still enforces `tool_choice="required"`, which conflicts with allowing a no-tool explanation.
In the second-stage repair call you instruct the model that it may either call a tool or return an explanation, but you still pass `tool_choice="required"`:
```python
repair_resp = await self.provider.text_chat(
contexts=repair_contexts,
func_tool=param_subset,
model=self.req.model,
session_id=self.req.session_id,
extra_user_content_parts=self.req.extra_user_content_parts,
tool_choice="required",
abort_signal=self._abort_signal,
)
```
For Gemini/Anthropic this maps to `ANY`/`any` and for OpenAI to `required`, all of which bias strongly toward always calling a tool. This conflicts with the intent to allow a no-tool explanation and may weaken the fallback behavior. Consider using `tool_choice="auto"` (or omitting it) here so the API constraint matches the instruction that a pure explanation is allowed.
```suggestion
repair_resp = await self.provider.text_chat(
contexts=repair_contexts,
func_tool=param_subset,
model=self.req.model,
session_id=self.req.session_id,
extra_user_content_parts=self.req.extra_user_content_parts,
tool_choice="auto",
abort_signal=self._abort_signal,
)
```
</issue_to_address>
### Comment 2
<location path="astrbot/core/agent/runners/tool_loop_agent_runner.py" line_range="919-922" />
<code_context>
contexts.insert(0, {"role": "system", "content": instruction})
return contexts
+ @staticmethod
+ def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
+ text = (llm_resp.completion_text or "").strip()
+ return bool(text)
+
def _build_tool_subset(self, tool_set: ToolSet, tool_names: list[str]) -> ToolSet:
</code_context>
<issue_to_address>
**suggestion:** `_has_meaningful_assistant_reply` only looks at `completion_text`, which can misclassify valid replies.
The current implementation only inspects `completion_text`:
```python
@staticmethod
def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
text = (llm_resp.completion_text or "").strip()
return bool(text)
```
If a provider returns the assistant reply in another field (e.g. `result_chain` or message parts) and leaves `completion_text` empty, this will incorrectly treat a valid reply as missing and trigger an unnecessary second repair attempt. If `LLMResponse` guarantees that any assistant NL reply is always mirrored to `completion_text`, this is fine; otherwise, this helper should also consider other fields that can contain assistant text before deciding there was “no explanation.”
Suggested implementation:
```python
@staticmethod
def _has_meaningful_assistant_reply(llm_resp: LLMResponse) -> bool:
"""
Return True if the LLMResponse appears to contain a non‑empty natural‑language
assistant reply in any of the known fields, not just `completion_text`.
This is intentionally conservative: it only returns False when we are reasonably
sure there is no assistant text at all, to avoid triggering unnecessary repair
attempts when providers populate alternative fields.
"""
# Primary / normalized path: most providers should mirror assistant text here.
text = (getattr(llm_resp, "completion_text", None) or "").strip()
if text:
return True
# Fallback 1: some providers may populate `result_chain` instead of `completion_text`.
result_chain = getattr(llm_resp, "result_chain", None)
if result_chain:
# If it's already a string, check it directly.
if isinstance(result_chain, str):
if result_chain.strip():
return True
# If it's a list/sequence of "messages" or nodes, look for any non‑empty content.
elif isinstance(result_chain, (list, tuple)):
for node in result_chain:
content = ""
# Common patterns: dicts with "content" or "parts".
if isinstance(node, dict):
if "content" in node:
content = node.get("content") or ""
elif "parts" in node and isinstance(node["parts"], list):
# Join parts that are simple strings.
str_parts = [
p for p in node["parts"] if isinstance(p, str)
]
content = " ".join(str_parts)
# Or objects with a `content` attribute.
elif hasattr(node, "content"):
content = getattr(node, "content") or ""
if isinstance(content, str) and content.strip():
return True
# Fallback 2: generic `message` / `messages` fields if present.
message = getattr(llm_resp, "message", None)
if isinstance(message, str) and message.strip():
return True
messages = getattr(llm_resp, "messages", None)
if isinstance(messages, (list, tuple)):
for msg in messages:
content = ""
if isinstance(msg, dict):
content = msg.get("content") or ""
elif hasattr(msg, "content"):
content = getattr(msg, "content") or ""
if isinstance(content, str) and content.strip():
return True
# If none of the known fields contain text, treat it as no meaningful reply.
return False
```
This implementation assumes `LLMResponse` may expose assistant content via `result_chain`, `message`, or `messages` fields, and that these follow typical "message with content" conventions. If your actual `LLMResponse` type uses different field names or shapes (e.g., `output_messages`, `assistant_message`, etc.), you should:
1. Update `_has_meaningful_assistant_reply` to explicitly inspect those real fields.
2. Optionally, centralize the “extract assistant text” logic in a helper on `LLMResponse` (e.g., a `get_assistant_text()` method) and have `_has_meaningful_assistant_reply` call that instead, to keep this runner decoupled from provider‑specific details.
</issue_to_address>
Documentation Updates
1 document was updated by changes in this PR: "Changes in pr4697"

---

### 15. tool_choice parameter and retry mechanism in skills_like tool mode (PR #7101)

#### Overview
[PR #7101](https://github.com/AstrBotDevs/AstrBot/pull/7101) adds a `tool_choice` parameter for controlling the LLM tool-calling strategy and introduces an enhanced retry mechanism in skills_like tool mode, fixing the problem of the LLM returning an empty response during the parameter-completion stage ([Issue #7049](https://github.com/AstrBotDevs/AstrBot/issues/7049)).

#### The tool_choice parameter

##### Parameter description
The new `tool_choice` parameter lets callers specify the tool-calling strategy via the provider API's `text_chat()` and `text_chat_stream()` methods:

- **Type**: `Literal["auto", "required"]`
- **Default**: `"auto"`
- **Semantics**:
  - `"auto"`: the LLM decides on its own whether to call a tool or reply directly
  - `"required"`: the LLM is forced to call one of the available tools

##### Provider adaptation
The `tool_choice` parameter is implemented in all three provider sources and adapted to each provider's API conventions:

- **OpenAI**: the `tool_choice` value is passed through directly (`"auto"` or `"required"`)
- **Anthropic**: converted to a structured object (`{"type": "auto"}` or `{"type": "any"}`)
- **Gemini**: mapped to the `FunctionCallingConfigMode.AUTO` or `FunctionCallingConfigMode.ANY` enum values
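The per-provider mapping can be sketched as a single adapter function. This is an illustration of the table above, not the actual AstrBot source, and the Gemini enum values are represented as plain strings here:

```python
from typing import Literal


def adapt_tool_choice(
    provider: Literal["openai", "anthropic", "gemini"],
    tool_choice: Literal["auto", "required"],
):
    """Translate a generic tool_choice into each provider's expected shape."""
    if provider == "openai":
        # OpenAI accepts the string directly.
        return tool_choice
    if provider == "anthropic":
        # Anthropic expects a structured object; "required" maps to "any".
        return {"type": "auto" if tool_choice == "auto" else "any"}
    # Gemini uses FunctionCallingConfigMode.AUTO / .ANY; strings stand in here.
    return {"mode": "AUTO" if tool_choice == "auto" else "ANY"}
```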
#### Enhanced retry mechanism in skills_like mode

##### Background
In skills_like tool mode, the system interacts with the LLM in two rounds:
1. The first round uses lightweight tool descriptions to obtain tool names
2. The second round (the re-query) uses the full parameter schema to complete the tool arguments

Before the fix, if the LLM returned an empty response during the parameter-completion stage (neither a tool call nor a meaningful text reply), the system failed outright and the tool-execution flow was interrupted.

##### Two-stage retry mechanism
PR #7101 introduces a two-stage retry mechanism in the `_resolve_tool_exec()` method of `tool_loop_agent_runner.py`:

**Stage 1 - forced tool-call retry**:
- When the initial re-query returns an empty response, the system re-requests the LLM with `tool_choice="required"`
- This forces the LLM to either call a tool or provide an explanation

**Stage 2 - reinforced-instruction retry**:
- If the stage-1 retry still fails, the system injects a stronger instruction and retries once more with `tool_choice="required"`
- The injected instruction explicitly requires the LLM to do one of the following:
  1. Call a tool using the provided tool schema
  2. Explain to the user why a tool cannot be called

**Injected extra instruction**:
```
This is the second-stage tool execution step.
You must do exactly one of the following:
1. Call one of the selected tools using the provided tool schema.
2. If calling a tool is no longer possible or appropriate, reply to the user with a brief explanation of why.
Do not return an empty response.
Do not ignore the selected tools without explanation.
```

##### Graceful fallback
If both retry stages fail but a meaningful assistant reply was returned, the system will:
1. Treat the response as an ordinary assistant text reply
2. Finish execution via the `_complete_with_assistant_response()` method
3. Show the LLM's reply in the message stream
4. Avoid interrupting the conversation because of a failed tool call
#### Code refactoring

##### New helper methods

**`_complete_with_assistant_response()`**:
- Unifies the finalization logic for assistant responses
- Builds a message containing the thinking content (if any) and the text content
- Appends the assistant message to the conversation history
- Invokes the `on_agent_done` hook and cleans up unconsumed follow-up messages
- Reused in several code paths to keep behavior consistent

**`_has_meaningful_assistant_reply()` (static method)**:
- Checks whether the LLM response contains meaningful text content
- Strips whitespace to decide whether the reply is empty
- Used to decide whether a further retry is needed

**`_build_tool_requery_context()` enhancement**:
- Adds an optional `extra_instruction` parameter
- Supports injecting an extra instruction during the stage-2 retry
- The extra instruction is appended to the system message to steer the LLM more explicitly toward calling a tool
#### Behavior

##### Tool-call flow (skills_like mode)
1. **Initial re-query**: request parameter completion using the full tool schema
2. **Empty-response detection**: check whether a tool call or meaningful text was returned
3. **Stage-1 retry**: force a tool call with `tool_choice="required"`
4. **Stage-2 retry**: inject a stronger instruction and retry with `tool_choice="required"`
5. **Graceful fallback**: if a meaningful text reply exists, convert it into an ordinary assistant response
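The five steps can be condensed into a control-flow sketch. This is a simplified, synchronous illustration; the real runner is async, and the helper names (`requery`, `has_meaningful_reply`, `fallback`) are placeholders, not AstrBot APIs:

```python
def resolve_with_retries(requery, has_meaningful_reply, fallback):
    """Sketch of the re-query -> retry -> retry -> fallback flow."""
    resp = requery(tool_choice="auto", extra_instruction=False)  # step 1
    attempts = 0
    # Step 2: retry only while there is neither a tool call nor meaningful text.
    while attempts < 2 and not (resp.tool_calls or has_meaningful_reply(resp)):
        # Step 3 forces a tool call; step 4 additionally injects the
        # stronger instruction on the second attempt.
        resp = requery(tool_choice="required", extra_instruction=attempts == 1)
        attempts += 1
    if not resp.tool_calls and has_meaningful_reply(resp):
        return fallback(resp)  # step 5: graceful fallback to plain text
    return resp
```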
##### Observability
The system logs detailed warnings during the retry process to aid diagnosis:

- **Initial empty response**:
  ```
  skills_like tool re-query returned no tool calls; fallback to assistant response.
  ```
- **Stage-1 retry triggered**:
  ```
  skills_like tool re-query returned no tool calls and no explanation; retrying with stronger instruction.
  ```
#### User benefits
- **Reliability**: prevents tool-execution failures caused by empty LLM responses, improving system stability
- **User experience**: even when a tool call fails, the user receives an explanation or reply from the LLM instead of a silent failure
- **Observability**: tiered logging records the retry flow clearly, making it easier for developers to diagnose and optimize
- **Cross-provider consistency**: `tool_choice` is implemented uniformly across all providers (OpenAI, Anthropic, Gemini)
#### Implementation notes

##### Retry-logic integration
The retry logic is fully contained in `_resolve_tool_exec()` so that other tool modes are unaffected:
- Only active when `tool_schema_mode == "skills_like"`
- Retry condition: no tool call and no meaningful assistant reply
- Retry count: at most two (stage 1 + stage 2)

##### Context-building optimization
The enhanced `_build_tool_requery_context()` ensures the extra instruction is injected correctly:
- If the first message in the context is a system message, the instruction is appended to its existing content
- Otherwise a new system message is inserted at the beginning of the context
- The structure of the context stays intact and consistent
##### Fallback strategy
The graceful fallback keeps the user experience intact:
- The LLM's text reply is returned normally via `AgentResponse`
- The conversation history records the assistant message correctly
- The agent state transitions to DONE and execution ends normally

---

### 16. Other optimizations
- Enhanced JWT handling and error handling, improving security and stability
- UI polish for a better user experience
- Improved logging and exception handling for easier troubleshooting
Code Review
This pull request introduces a tool_choice parameter across the LLM provider interface and its implementations (OpenAI, Anthropic, Gemini), allowing for explicit control over whether tool calls are required or optional. Additionally, the tool_loop_agent_runner has been refactored to centralize assistant response finalization and now includes a retry mechanism with reinforced instructions when a tool re-query fails to yield a tool call or a meaningful explanation. I have no feedback to provide as there were no review comments to evaluate.
fixes: #7049
Modifications
Screenshots or Test Results
Checklist
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
😮 My changes do not introduce malicious code.
Summary by Sourcery
Ensure tools are invoked reliably in skills-like tool call mode by making tool usage configurable and improving fallback behavior when models return no tool calls.
New Features:
Bug Fixes:
Enhancements: