Merged
6 changes: 6 additions & 0 deletions docs/source/Instruction/Frequently-asked-questions.md
@@ -340,6 +340,12 @@ Swift is compatible with different versions of qwen-vl-utils; when using qwen2.5-vl and qwen3-vl
### Q20: Error: safetensors_rust.SafetensorError: Error while deserializing header:MetadataIncompleteBuffer
The model weights are corrupted.

### Q21: vLLM error: `ValueError: the decoder prompt contains a(n) video item with length 16758, which exceeds the pre-allocated encoder cache size 16384. please reduce the input size or increase the encoder cache size by setting --limit-mm-per-prompt at startup.`
This is usually caused by a multimodal input that is too long, exceeding vLLM's pre-allocated encoder cache size. You can adjust the encoder cache size via `--limit-mm-per-prompt`. Another viable fix is to increase `max_num_batched_tokens`, passed via the Swift CLI:
```shell
--vllm_engine_kwargs '{"max_num_batched_tokens": 20000}'
```

## 导出

### Q1: autoawq相关的报错
6 changes: 6 additions & 0 deletions docs/source_en/Instruction/Frequently-asked-questions.md
@@ -340,6 +340,12 @@ Swift is compatible with different versions of qwen-vl-utils, so you do not need
### Q20: I got an error: safetensors_rust.SafetensorError: Error while deserializing header:MetadataIncompleteBuffer
The model weights are corrupted.

### Q21: How can I handle this vLLM error: `ValueError: the decoder prompt contains a(n) video item with length 16758, which exceeds the pre-allocated encoder cache size 16384. please reduce the input size or increase the encoder cache size by setting --limit-mm-per-prompt at startup.`?
This usually means the multimodal input is too long and exceeds vLLM's pre-allocated encoder cache size. You can adjust the encoder cache size with `--limit-mm-per-prompt`. Another practical workaround is to increase `max_num_batched_tokens`; pass it via the Swift CLI:
```shell
--vllm_engine_kwargs '{"max_num_batched_tokens": 20000}'
```
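As a rough sanity check before serving, you can estimate how many encoder tokens a video will occupy and compare it against the cache size reported in the error. This is only a sketch, not vLLM's exact accounting; the patch size, 2x2 spatial merge, and temporal frame pairing are assumptions based on Qwen2-VL-style vision encoders, not values read from any config:

```python
# Rough estimate of vision-encoder tokens for a video input.
# Assumed defaults (Qwen2-VL-style, hypothetical):
#   - 14x14 pixel patches
#   - 2x2 spatial merge of patches into one token
#   - frames merged in pairs along the temporal axis
def estimate_video_tokens(num_frames: int, height: int, width: int,
                          patch_size: int = 14, spatial_merge: int = 2,
                          temporal_merge: int = 2) -> int:
    per_frame = (height // patch_size) * (width // patch_size) // (spatial_merge ** 2)
    return (num_frames // temporal_merge) * per_frame

ENCODER_CACHE = 16384  # value reported in the error message above
tokens = estimate_video_tokens(num_frames=32, height=448, width=784)
print(tokens, tokens > ENCODER_CACHE)  # 7168 False
```

If the estimate exceeds the cache, reduce the frame count or resolution, or raise the limits as described above.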

## Export

### Q1: Errors related to autoawq
4 changes: 4 additions & 0 deletions swift/template/templates/qwen.py
@@ -313,6 +313,7 @@ def replace_tag(self, media_type: Literal['image', 'video', 'audio'], index: int
        kwargs = {'image_patch_size': self.processor.image_processor.patch_size} if self.version == 'v3' else {}
        if self.mode == 'vllm':
            # resized in qwen_vl_utils, no need to resize again in vllm
            # ref: https://github.com/modelscope/ms-swift/issues/8445
            inputs.mm_processor_kwargs['do_resize'] = False
        if media_type == 'image':
            inputs.images[index] = fetch_image({'image': inputs.images[index]}, **kwargs)
@@ -642,6 +643,9 @@ def replace_tag(self, media_type: Literal['image', 'video', 'audio'], index: int
                    inputs: StdTemplateInputs) -> List[Context]:
        from qwen_omni_utils import fetch_image, fetch_video
        kwargs = {'image_patch_size': self.processor.image_processor.patch_size} if self.version == 'omni_v3' else {}
        if self.mode == 'vllm':
            # https://github.com/modelscope/ms-swift/issues/8445
            inputs.mm_processor_kwargs['do_resize'] = False
        if media_type == 'image':
            inputs.images[index] = fetch_image({'image': inputs.images[index]}, **kwargs)
            if self.version == 'omni_v2_5':
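Both hunks apply the same fix: images and videos are already resized by qwen_vl_utils, so the vLLM processor is told not to resize them a second time. A minimal, self-contained sketch of that pattern (the `Inputs` class and `disable_double_resize` function below are hypothetical stand-ins, not ms-swift's `StdTemplateInputs` or template API):

```python
class Inputs:
    """Hypothetical stand-in for the template inputs object."""
    def __init__(self):
        self.mm_processor_kwargs = {}

def disable_double_resize(inputs, mode):
    # Media is already resized by qwen_vl_utils, so when the backend is
    # vLLM we tell its processor not to resize a second time.
    # ref: https://github.com/modelscope/ms-swift/issues/8445
    if mode == 'vllm':
        inputs.mm_processor_kwargs['do_resize'] = False
    return inputs

print(disable_double_resize(Inputs(), 'vllm').mm_processor_kwargs)  # {'do_resize': False}
print(disable_double_resize(Inputs(), 'pt').mm_processor_kwargs)    # {}
```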