Skip to content

fix: support custom image_caption_prompt and prevent duplicate image_urls when default_image_caption_provider_id is configured#8308

Open
Yao-lin101 wants to merge 3 commits into
AstrBotDevs:masterfrom
Yao-lin101:fix/image-caption-config
Open

fix: support custom image_caption_prompt and prevent duplicate image_urls when default_image_caption_provider_id is configured#8308
Yao-lin101 wants to merge 3 commits into
AstrBotDevs:masterfrom
Yao-lin101:fix/image-caption-config

Conversation

@Yao-lin101
Copy link
Copy Markdown

@Yao-lin101 Yao-lin101 commented May 23, 2026

This PR resolves two issues related to image captioning configuration and behavior when handling quoted/embedded images:

  1. Support Custom Image Caption Prompt: Previously, the image caption prompt used in _process_quote_message was hardcoded to "Please describe the image content.". This change reads image_caption_prompt from provider settings (falling back to "Please describe the image." if empty) to honor user-defined configurations.
  2. Prevent Duplicate Image Sending: When default_image_caption_provider_id is set, the main agent should route the image to the captioning provider rather than sending the raw image directly to the main LLM request. Previously, the image path/URL was still being added to req.image_urls. We now conditionalize this so raw image URLs are not appended when a dedicated captioning provider ID is configured.

Modifications / 改动点

  • Modified astr_main_agent.py:

    • Read image_caption_prompt from provider settings and apply it dynamically in _process_quote_message.
    • Read default_image_caption_provider_id from provider settings in build_main_agent.
    • Conditioned adding images to req.image_urls on img_cap_prov_id not being configured.
  • This is NOT a breaking change. / 这不是一个破坏性变更。

Screenshots or Test Results / 运行截图或测试结果

Before Changes / 改动前:
image
After the change / 改动后:
image

  • Tested successfully in the container environment:
    • Verified that custom prompts from provider_settings are correctly passed to text_chat for image descriptions.
    • Verified that when default_image_caption_provider_id is configured, image description is processed correctly without appending raw image URLs to the request payload.

Checklist / 检查清单

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    / 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
    / 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
    / 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到 requirements.txtpyproject.toml 文件相应位置。

  • 😮 My changes do not introduce malicious code.
    / 我的更改没有引入恶意代码。

Summary by Sourcery

Honor provider-configured image captioning settings and avoid sending duplicate image URLs when a dedicated caption provider is configured.

Bug Fixes:

  • Use the image caption prompt from provider settings in quote message handling instead of a hardcoded string.
  • Skip adding quoted image URLs to main LLM requests when a default image caption provider is configured, preventing duplicate image handling.

@auto-assign auto-assign Bot requested review from LIghtJUNction and Raven95676 May 23, 2026 16:43
@dosubot dosubot Bot added size:S This PR changes 10-29 lines, ignoring generated files. area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 23, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The logic to derive cfg and pull image_caption_prompt / default_image_caption_provider_id is now duplicated in _process_quote_message and build_main_agent; consider extracting a small helper (e.g., get_provider_settings(event, config, plugin_context)) to centralize this and avoid subtle drift in future changes.
  • When default_image_caption_provider_id is set, raw image URLs are never appended to req.image_urls; you might want to consider a fallback path (e.g., on caption provider failure or missing provider) where images are still added so the main LLM can handle them instead of silently dropping image content.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The logic to derive `cfg` and pull `image_caption_prompt` / `default_image_caption_provider_id` is now duplicated in `_process_quote_message` and `build_main_agent`; consider extracting a small helper (e.g., `get_provider_settings(event, config, plugin_context)`) to centralize this and avoid subtle drift in future changes.
- When `default_image_caption_provider_id` is set, raw image URLs are never appended to `req.image_urls`; you might want to consider a fallback path (e.g., on caption provider failure or missing provider) where images are still added so the main LLM can handle them instead of silently dropping image content.

## Individual Comments

### Comment 1
<location path="astrbot/core/astr_main_agent.py" line_range="804-805" />
<code_context>
                     event.track_temporary_local_file(compress_path)
+                cfg = (
+                    config.provider_settings if config else None
+                ) or plugin_context.get_config(umo=event.unified_msg_origin).get(
+                    "provider_settings", {}
+                )
+                img_cap_prompt = (
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against `plugin_context.get_config` returning `None` before calling `.get`.

`plugin_context.get_config(umo=event.unified_msg_origin)` may return `None`, which would cause an `AttributeError` when calling `.get`. Consider defaulting to an empty dict, e.g. `(plugin_context.get_config(umo=event.unified_msg_origin) or {}).get("provider_settings", {})`, to make this robust in partially configured environments.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment on lines +804 to +805
) or plugin_context.get_config(umo=event.unified_msg_origin).get(
"provider_settings", {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Guard against plugin_context.get_config returning None before calling .get.

plugin_context.get_config(umo=event.unified_msg_origin) may return None, which would cause an AttributeError when calling .get. Consider defaulting to an empty dict, e.g. (plugin_context.get_config(umo=event.unified_msg_origin) or {}).get("provider_settings", {}), to make this robust in partially configured environments.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces configurable image caption prompts and conditional image URL handling within the main agent. Key changes include updating _process_quote_message to utilize a custom prompt from the configuration and modifying build_main_agent to skip appending image URLs when a specific image caption provider is active. The review feedback suggests refactoring duplicated configuration logic for better maintainability and ensuring that fallback settings are consistently applied to the message parser.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment on lines +802 to +806
cfg = (
config.provider_settings if config else None
) or plugin_context.get_config(umo=event.unified_msg_origin).get(
"provider_settings", {}
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for retrieving cfg with fallback is duplicated here and in _decorate_llm_request (line 888). Since _process_quote_message is called from _decorate_llm_request, it would be more efficient and maintainable to pass cfg as an argument to this function instead of recalculating it, especially since plugin_context.get_config might involve overhead.

References
  1. When implementing similar functionality for different cases (e.g., direct vs. quoted attachments), refactor the logic into a shared helper function to avoid code duplication.

Comment thread astrbot/core/astr_main_agent.py Outdated
Comment on lines +1280 to +1286
quoted_message_settings = _get_quoted_message_parser_settings(
config.provider_settings
)
cfg = config.provider_settings or plugin_context.get_config(
umo=event.unified_msg_origin
).get("provider_settings", {})
img_cap_prov_id = cfg.get("default_image_caption_provider_id") or ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cfg variable (which includes fallback settings from the plugin configuration) should be used when initializing quoted_message_settings. Currently, quoted_message_settings only uses config.provider_settings, which may be empty even if the user has configured settings globally. Using cfg ensures that the parser settings correctly respect the fallback configuration, maintaining consistency with how img_cap_prov_id is handled. Additionally, new functionality such as handling attachments should be accompanied by corresponding unit tests.

            cfg = config.provider_settings or plugin_context.get_config(
                umo=event.unified_msg_origin
            ).get("provider_settings", {})
            quoted_message_settings = _get_quoted_message_parser_settings(cfg)
            img_cap_prov_id = cfg.get("default_image_caption_provider_id") or ""
References
  1. New functionality, such as handling attachments, should be accompanied by corresponding unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:S This PR changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant