Pull request overview
Updates the embeddings demo documentation to use the correct Qwen3 embeddings model variant for NPU usage.
Changes:
- Switches the Qwen3 NPU tab sync/model reference from fp16 to int8.
- Adds a `docker run` example for serving the Qwen3 int8 OpenVINO model via OpenVINO Model Server.
| :sync: Qwen3-Embedding-0.6B-fp16 | ||
| :sync: Qwen3-Embedding-0.6B-int8 | ||
| ```console | ||
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
docker run --pull requires an explicit value (e.g., --pull=always|missing|never) in supported Docker versions; using --pull without a value will fail. Update the command to provide a value (or remove the flag if not needed).
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings | |
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull=always --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
| :sync: Qwen3-Embedding-0.6B-fp16 | ||
| :sync: Qwen3-Embedding-0.6B-int8 | ||
| ```console | ||
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
Using the :latest tag makes the documentation non-reproducible (behavior can change over time). Prefer pinning to a specific, known-good image version/tag to keep the demo stable.
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings | |
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:2024.0 --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
| :sync: Qwen3-Embedding-0.6B-fp16 | ||
| :sync: Qwen3-Embedding-0.6B-int8 | ||
| ```console | ||
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
The bind mount uses -v $(pwd)/models:... without quoting; if the working directory path contains spaces, the command will break. Quote the host path (or use an absolute path variable) to make the example more robust.
| docker run --user $(id -u):$(id -g) --rm -v $(pwd)/models:/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings | |
| docker run --user $(id -u):$(id -g) --rm -v "$(pwd)/models":/models:rw openvino/model_server:latest --pull --model_repository_path /models --source_model OpenVINO/Qwen3-Embedding-0.6B-int8-ov --pooling LAST --task embeddings |
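The word-splitting problem described in this comment can be reproduced without Docker. The following is a minimal sketch, assuming a POSIX-style shell such as bash; `count_args` and the `my project` directory name are hypothetical and exist only for illustration. It counts how many arguments the shell passes for the unquoted and quoted forms of the `-v` value.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

# Hypothetical working directory whose name contains a space.
workdir="$(mktemp -d)/my project"
mkdir -p "$workdir"
cd "$workdir"

# Unquoted: the shell splits the result of $(pwd) at the space,
# so the bind-mount value arrives as two separate arguments.
unquoted=$(count_args -v $(pwd)/models:/models:rw)

# Quoted: the host path stays a single argument.
quoted=$(count_args -v "$(pwd)/models":/models:rw)

echo "unquoted form: $unquoted arguments"
echo "quoted form: $quoted arguments"
```

Assuming the temporary directory path itself contains no spaces, the unquoted form should report three arguments (`-v` plus two fragments of the path) and the quoted form two, which is why the quoted variant is the safer choice for documentation.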