feat: auto-remind ReadMediaFile for video tags to avoid python frame extraction #590
bj456736 wants to merge 1 commit into
…extraction

When users paste a video file into the CLI, the TUI converts it into a <video path="..."></video> tag. Previously the model often tried to write Python scripts to extract frames instead of using the built-in ReadMediaFile tool.

Now, when the user message contains a <video> tag, a system reminder is automatically injected:

"The user provided a video file. Use the ReadMediaFile tool to read and analyze the video content directly. Do not write Python scripts or other code to extract frames from the video."

This fixes the P0 issue from the internal task list where video analysis would default to Python frame extraction instead of the native tool.
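
The detection-and-injection step described in this commit message could look roughly like the sketch below. It is only an illustration under stated assumptions: the body of hasVideoTag() is not shown in this thread, so the regular expression is a guess at "the message contains a <video> tag", and remindersFor and the Reminder type are hypothetical names used to keep the example self-contained. The reminder text and the origin object are copied from the PR.

```typescript
// Hypothetical sketch: the PR's actual hasVideoTag() body and context class are not shown here.
const VIDEO_REMINDER =
  'The user provided a video file. Use the ReadMediaFile tool to read and analyze the video content directly. Do not write Python scripts or other code to extract frames from the video.';

// Assumed detection rule: any opening <video ...> tag in the message text.
function hasVideoTag(content: string): boolean {
  return /<video\b[^>]*>/i.test(content);
}

// Hypothetical reminder shape; the origin fields mirror the diff in this PR.
interface Reminder {
  text: string;
  origin: { kind: 'injection'; variant: 'host' };
}

// Returns the reminders to append for one user message (empty when no video tag is present).
function remindersFor(content: string): Reminder[] {
  if (!hasVideoTag(content)) return [];
  return [{ text: VIDEO_REMINDER, origin: { kind: 'injection', variant: 'host' } }];
}
```

Returning a list rather than appending directly keeps the check easy to test for both the presence and absence cases without constructing a full context object.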
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b808331390
    if (hasVideoTag(content)) {
      this.appendSystemReminder(
        'The user provided a video file. Use the ReadMediaFile tool to read and analyze the video content directly. Do not write Python scripts or other code to extract frames from the video.',
        { kind: 'injection', variant: 'host' },
Remove video reminders when undoing their prompt
When a video prompt is later undone, this prompt-scoped reminder is left behind because undo() skips messages whose origin is kind: 'injection'. In the /undo scenario after a user submits <video ...></video>, the real user prompt is removed but this reminder remains in context, so the next model request can still be told that the user provided a video file and should use ReadMediaFile even though that prompt was undone.
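
One possible way to address this, sketched under assumptions: tag each prompt-scoped reminder with the id of the prompt that triggered it, and have undo drop reminders carrying that id along with the prompt. The Message and Origin shapes, the promptId field, and undoLastPrompt are all hypothetical; the real undo() and message types in agent-core are not shown in this thread and may differ.

```typescript
// Hypothetical message shape; the real agent-core types are not shown in this thread.
type Origin =
  | { kind: 'user' }
  | { kind: 'injection'; variant: 'host'; promptId?: number };

interface Message {
  id: number;
  text: string;
  origin: Origin;
}

// Removes the most recent user prompt and any injected reminder scoped to it.
// Injections without a promptId are treated as context-wide and are kept.
function undoLastPrompt(messages: Message[]): Message[] {
  const lastPrompt = [...messages].reverse().find((m) => m.origin.kind === 'user');
  if (!lastPrompt) return messages;
  return messages.filter((m) => {
    if (m.id === lastPrompt.id) return false;
    return !(m.origin.kind === 'injection' && m.origin.promptId === lastPrompt.id);
  });
}
```

With this shape, undoing a video prompt also removes its reminder, while unrelated injections stay in context.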
Problem
When users paste a video file into the CLI, the TUI converts it into a <video path="..."></video> tag. Previously the model often tried to write Python scripts to extract frames instead of using the built-in ReadMediaFile tool, which already supports video content natively.

Solution
Auto-inject a system reminder when the user message contains a <video> tag:
"The user provided a video file. Use the ReadMediaFile tool to read and analyze the video content directly. Do not write Python scripts or other code to extract frames from the video."

Changes
- packages/agent-core/src/agent/context/index.ts: Added hasVideoTag() helper and video-reminder injection in appendUserMessage()
- packages/agent-core/test/agent/context.test.ts: Added tests for both video-tag presence and absence

Test
Reference
Internal task: Kimi CLI video analysis should call ReadMediaFile by default instead of writing Python to extract frames (P0)