Merged
6 changes: 6 additions & 0 deletions docs/source/Instruction/Frequently-asked-questions.md
@@ -340,6 +340,12 @@ Swift is compatible with different versions of qwen-vl-utils; when using qwen2.5-vl and qwen3-vl
### Q20: Error: safetensors_rust.SafetensorError: Error while deserializing header:MetadataIncompleteBuffer
The model weights are corrupted.

### Q21: vLLM error: `ValueError: the decoder prompt contains a(n) video item with length 16758, which exceeds the pre-allocated encoder cache size 16384. please reduce the input size or increase the encoder cache size by setting --limit-mm-per-prompt at startup.`
This is usually caused by a multimodal input that is too long, exceeding vLLM's pre-allocated encoder cache size. You can adjust the encoder cache size via `--limit-mm-per-prompt`. Another viable fix is to increase `max_num_batched_tokens`, passed via the Swift CLI:
```shell
--vllm_engine_kwargs '{"max_num_batched_tokens": 20000}'
```

## 导出

### Q1: autoawq相关的报错
6 changes: 6 additions & 0 deletions docs/source_en/Instruction/Frequently-asked-questions.md
@@ -340,6 +340,12 @@ Swift is compatible with different versions of qwen-vl-utils, so you do not need
### Q20: I got an error: safetensors_rust.SafetensorError: Error while deserializing header:MetadataIncompleteBuffer
The model weights are corrupted.

### Q21: How can I handle this vLLM error: `ValueError: the decoder prompt contains a(n) video item with length 16758, which exceeds the pre-allocated encoder cache size 16384. please reduce the input size or increase the encoder cache size by setting --limit-mm-per-prompt at startup.`?
This usually means the multimodal input is too long and exceeds vLLM's pre-allocated encoder cache size. You can adjust the encoder cache size with `--limit-mm-per-prompt`. Another practical workaround is to increase `max_num_batched_tokens`; pass it via the Swift CLI:
```shell
--vllm_engine_kwargs '{"max_num_batched_tokens": 20000}'
```
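As a rough sanity check before serving, you can estimate how many encoder tokens a video will occupy and compare it against the cache size reported in the error. This is only a sketch, not vLLM's exact accounting; the patch size, 2x2 spatial merge, and temporal frame pairing are assumptions based on Qwen2-VL-style vision encoders, not values read from any config:

```python
# Rough estimate of vision-encoder tokens for a video input.
# Assumed defaults (Qwen2-VL-style, hypothetical):
#   - 14x14 pixel patches
#   - 2x2 spatial merge of patches into one token
#   - frames merged in pairs along the temporal axis
def estimate_video_tokens(num_frames: int, height: int, width: int,
                          patch_size: int = 14, spatial_merge: int = 2,
                          temporal_merge: int = 2) -> int:
    per_frame = (height // patch_size) * (width // patch_size) // (spatial_merge ** 2)
    return (num_frames // temporal_merge) * per_frame

ENCODER_CACHE = 16384  # value reported in the error message above
tokens = estimate_video_tokens(num_frames=32, height=448, width=784)
print(tokens, tokens > ENCODER_CACHE)  # 7168 False
```

If the estimate exceeds the cache, reduce the frame count or resolution, or raise the limits as described above.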

## Export

### Q1: Errors related to autoawq
4 changes: 4 additions & 0 deletions swift/template/templates/qwen.py
@@ -313,6 +313,7 @@ def replace_tag(self, media_type: Literal['image', 'video', 'audio'], index: int
        kwargs = {'image_patch_size': self.processor.image_processor.patch_size} if self.version == 'v3' else {}
        if self.mode == 'vllm':
            # resized in qwen_vl_utils, no need to resize again in vllm
            # ref: https://github.com/modelscope/ms-swift/issues/8445
            inputs.mm_processor_kwargs['do_resize'] = False
        if media_type == 'image':
            inputs.images[index] = fetch_image({'image': inputs.images[index]}, **kwargs)
@@ -642,6 +643,9 @@ def replace_tag(self, media_type: Literal['image', 'video', 'audio'], index: int
                    inputs: StdTemplateInputs) -> List[Context]:
        from qwen_omni_utils import fetch_image, fetch_video
        kwargs = {'image_patch_size': self.processor.image_processor.patch_size} if self.version == 'omni_v3' else {}
        if self.mode == 'vllm':
            # https://github.com/modelscope/ms-swift/issues/8445
            inputs.mm_processor_kwargs['do_resize'] = False
        if media_type == 'image':
            inputs.images[index] = fetch_image({'image': inputs.images[index]}, **kwargs)
            if self.version == 'omni_v2_5':
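Both hunks apply the same fix: images and videos are already resized by qwen_vl_utils, so the vLLM processor is told not to resize them a second time. A minimal, self-contained sketch of that pattern (the `Inputs` class and `disable_double_resize` function below are hypothetical stand-ins, not ms-swift's `StdTemplateInputs` or template API):

```python
class Inputs:
    """Hypothetical stand-in for the template inputs object."""
    def __init__(self):
        self.mm_processor_kwargs = {}

def disable_double_resize(inputs, mode):
    # Media is already resized by qwen_vl_utils, so when the backend is
    # vLLM we tell its processor not to resize a second time.
    # ref: https://github.com/modelscope/ms-swift/issues/8445
    if mode == 'vllm':
        inputs.mm_processor_kwargs['do_resize'] = False
    return inputs

print(disable_double_resize(Inputs(), 'vllm').mm_processor_kwargs)  # {'do_resize': False}
print(disable_double_resize(Inputs(), 'pt').mm_processor_kwargs)    # {}
```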