2 changes: 1 addition & 1 deletion .versions
@@ -1,5 +1,5 @@
 GO_VERSION=1.25
-VLLM_VERSION=0.17.0
+VLLM_VERSION=0.19.0
Contributor

critical
Upgrading to vLLM 0.19.0 is a breaking change for the current backend implementation. vLLM 0.19.0 removed the --use-v2-block-manager flag (as the V2 block manager is now the default and only option), but the code in pkg/inference/backends/vllm/vllm.go:172 still explicitly appends this flag when speculative decoding is enabled. This will cause the vLLM process to fail on startup with an 'unrecognized arguments' error on Linux.

You should update the Go backend to append this flag conditionally based on the detected vLLM version, or remove it entirely if the older version (0.17.1) used for macOS also defaults to the V2 block manager without the explicit flag. Merging this version bump without the corresponding code change will break speculative decoding functionality.
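A minimal sketch of the conditional approach. The helper names (`versionLess`, `specDecodeArgs`) are hypothetical and are not the actual functions in `vllm.go`; the point is only to gate the flag on the configured vLLM version rather than appending it unconditionally:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionLess reports whether version a is older than version b, comparing
// dot-separated numeric fields. A minimal sketch: no pre-release or build
// metadata handling, which plain X.Y.Z version strings do not need.
func versionLess(a, b string) bool {
	as := strings.Split(a, ".")
	bs := strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			return ai < bi
		}
	}
	return len(as) < len(bs)
}

// specDecodeArgs builds the extra vLLM arguments for speculative decoding.
// vLLM 0.19.0 removed --use-v2-block-manager (V2 is the default and only
// option), so the flag is only appended for versions that still accept it.
func specDecodeArgs(vllmVersion string) []string {
	var args []string
	if versionLess(vllmVersion, "0.19.0") {
		args = append(args, "--use-v2-block-manager")
	}
	return args
}

func main() {
	fmt.Println(specDecodeArgs("0.17.1")) // [--use-v2-block-manager]
	fmt.Println(specDecodeArgs("0.19.0")) // []
}
```

If 0.17.x already defaults to the V2 block manager, dropping the flag outright is the simpler fix and avoids carrying version-comparison logic in the backend.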

VLLM_UPSTREAM_VERSION=0.17.1
VLLM_METAL_RELEASE=v0.1.0-20260320-122309
DIFFUSERS_RELEASE=v0.1.0-20260216-000000
2 changes: 1 addition & 1 deletion Dockerfile
@@ -90,7 +90,7 @@ ENTRYPOINT ["/app/model-runner"]
# --- vLLM variant ---
FROM llamacpp AS vllm

-ARG VLLM_VERSION=0.17.0
+ARG VLLM_VERSION=0.19.0
ARG VLLM_CUDA_VERSION=cu130
ARG VLLM_PYTHON_TAG=cp38-abi3
ARG TARGETARCH