fix: support custom image_caption_prompt and prevent duplicate image_urls when default_image_caption_provider_id is configured#8308
Conversation
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The logic to derive
cfgand pullimage_caption_prompt/default_image_caption_provider_idis now duplicated in_process_quote_messageandbuild_main_agent; consider extracting a small helper (e.g.,get_provider_settings(event, config, plugin_context)) to centralize this and avoid subtle drift in future changes. - When
default_image_caption_provider_idis set, raw image URLs are never appended toreq.image_urls; you might want to consider a fallback path (e.g., on caption provider failure or missing provider) where images are still added so the main LLM can handle them instead of silently dropping image content.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The logic to derive `cfg` and pull `image_caption_prompt` / `default_image_caption_provider_id` is now duplicated in `_process_quote_message` and `build_main_agent`; consider extracting a small helper (e.g., `get_provider_settings(event, config, plugin_context)`) to centralize this and avoid subtle drift in future changes.
- When `default_image_caption_provider_id` is set, raw image URLs are never appended to `req.image_urls`; you might want to consider a fallback path (e.g., on caption provider failure or missing provider) where images are still added so the main LLM can handle them instead of silently dropping image content.
## Individual Comments
### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="804-805" />
<code_context>
event.track_temporary_local_file(compress_path)
+ cfg = (
+ config.provider_settings if config else None
+ ) or plugin_context.get_config(umo=event.unified_msg_origin).get(
+ "provider_settings", {}
+ )
+ img_cap_prompt = (
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against `plugin_context.get_config` returning `None` before calling `.get`.
`plugin_context.get_config(umo=event.unified_msg_origin)` may return `None`, which would cause an `AttributeError` when calling `.get`. Consider defaulting to an empty dict, e.g. `(plugin_context.get_config(umo=event.unified_msg_origin) or {}).get("provider_settings", {})`, to make this robust in partially configured environments.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| ) or plugin_context.get_config(umo=event.unified_msg_origin).get( | ||
| "provider_settings", {} |
There was a problem hiding this comment.
issue (bug_risk): Guard against plugin_context.get_config returning None before calling .get.
plugin_context.get_config(umo=event.unified_msg_origin) may return None, which would cause an AttributeError when calling .get. Consider defaulting to an empty dict, e.g. (plugin_context.get_config(umo=event.unified_msg_origin) or {}).get("provider_settings", {}), to make this robust in partially configured environments.
There was a problem hiding this comment.
Code Review
This pull request introduces configurable image caption prompts and conditional image URL handling within the main agent. Key changes include updating _process_quote_message to utilize a custom prompt from the configuration and modifying build_main_agent to skip appending image URLs when a specific image caption provider is active. The review feedback suggests refactoring duplicated configuration logic for better maintainability and ensuring that fallback settings are consistently applied to the message parser.
| cfg = ( | ||
| config.provider_settings if config else None | ||
| ) or plugin_context.get_config(umo=event.unified_msg_origin).get( | ||
| "provider_settings", {} | ||
| ) |
There was a problem hiding this comment.
The logic for retrieving cfg with fallback is duplicated here and in _decorate_llm_request (line 888). Since _process_quote_message is called from _decorate_llm_request, it would be more efficient and maintainable to pass cfg as an argument to this function instead of recalculating it, especially since plugin_context.get_config might involve overhead.
References
- When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.
| quoted_message_settings = _get_quoted_message_parser_settings( | ||
| config.provider_settings | ||
| ) | ||
| cfg = config.provider_settings or plugin_context.get_config( | ||
| umo=event.unified_msg_origin | ||
| ).get("provider_settings", {}) | ||
| img_cap_prov_id = cfg.get("default_image_caption_provider_id") or "" |
There was a problem hiding this comment.
The cfg variable (which includes fallback settings from the plugin configuration) should be used when initializing quoted_message_settings. Currently, quoted_message_settings only uses config.provider_settings, which may be empty even if the user has configured settings globally. Using cfg ensures that the parser settings correctly respect the fallback configuration, maintaining consistency with how img_cap_prov_id is handled. Additionally, new functionality such as handling attachments should be accompanied by corresponding unit tests.
cfg = config.provider_settings or plugin_context.get_config(
umo=event.unified_msg_origin
).get("provider_settings", {})
quoted_message_settings = _get_quoted_message_parser_settings(cfg)
img_cap_prov_id = cfg.get("default_image_caption_provider_id") or ""References
- New functionality, such as handling attachments, should be accompanied by corresponding unit tests.
This PR resolves two issues related to image captioning configuration and behavior when handling quoted/embedded images:
_process_quote_messagewas hardcoded to"Please describe the image content.". This change readsimage_caption_promptfrom provider settings (falling back to"Please describe the image."if empty) to honor user-defined configurations.default_image_caption_provider_idis set, the main agent should route the image to the captioning provider rather than sending the raw image directly to the main LLM request. Previously, the image path/URL was still being added toreq.image_urls. We now conditionalize this so raw image URLs are not appended when a dedicated captioning provider ID is configured.Modifications / 改动点
Modified astr_main_agent.py:
image_caption_promptfrom provider settings and apply it dynamically in_process_quote_message.default_image_caption_provider_idfrom provider settings inbuild_main_agent.req.image_urlsonimg_cap_prov_idnot being configured.This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Before Changes / 改动前:


After the change / 改动后:
provider_settingsare correctly passed totext_chatfor image descriptions.default_image_caption_provider_idis configured, image description is processed correctly without appending raw image URLs to the request payload.Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Honor provider-configured image captioning settings and avoid sending duplicate image URLs when a dedicated caption provider is configured.
Bug Fixes: