Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Code Review
This pull request updates the vLLM version from 0.17.0 to 0.19.0. A critical issue was identified where this upgrade breaks the backend because the --use-v2-block-manager flag, which is still explicitly appended in the Go implementation, has been removed in vLLM 0.19.0 and will cause process startup failures.
@@ -1,5 +1,5 @@
 GO_VERSION=1.25
-VLLM_VERSION=0.17.0
+VLLM_VERSION=0.19.0
Upgrading to vLLM 0.19.0 is a breaking change for the current backend implementation. vLLM 0.19.0 removed the --use-v2-block-manager flag (as the V2 block manager is now the default and only option), but the code in pkg/inference/backends/vllm/vllm.go:172 still explicitly appends this flag when speculative decoding is enabled. This will cause the vLLM process to fail on startup with an 'unrecognized arguments' error on Linux.
You should update the Go backend to handle this flag conditionally based on the version, or remove it if the older version (0.17.1) used for macOS also supports the V2 block manager without the explicit flag. Merging this version bump without the corresponding code change will break speculative decoding functionality.
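One way to gate the flag on the version is sketched below. This is only an illustration, not the project's actual code: `parseVersion`, `less`, `speculativeArgs`, and `flagRemovedIn` are hypothetical names, and the 0.19.0 cutoff is taken from the comment above rather than verified against vLLM. It also assumes the backend has the vLLM version available as a plain `major.minor.patch` string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// flagRemovedIn is the first vLLM release assumed to reject
// --use-v2-block-manager. ASSUMPTION: the cutoff comes from the review
// comment and has not been checked against vLLM itself.
var flagRemovedIn = [3]int{0, 19, 0}

// parseVersion turns "0.19.0" or "v0.19.0" into [major, minor, patch].
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a is strictly older than version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// speculativeArgs (hypothetical helper) returns the extra arguments to pass
// when speculative decoding is enabled: the flag is appended only for
// versions older than flagRemovedIn.
func speculativeArgs(vllmVersion string) ([]string, error) {
	v, err := parseVersion(vllmVersion)
	if err != nil {
		return nil, err
	}
	if less(v, flagRemovedIn) {
		return []string{"--use-v2-block-manager"}, nil
	}
	return nil, nil
}

func main() {
	for _, v := range []string{"0.17.0", "0.19.0"} {
		args, err := speculativeArgs(v)
		fmt.Println(v, args, err)
	}
}
```

If the older version turns out not to need the explicit flag either, dropping the append entirely is simpler than carrying a version check.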
This will help with #832 for vLLM on Linux and Windows, but not yet for vllm-metal on macOS.