2 changes: 1 addition & 1 deletion demos/llm_npu/README.md
@@ -26,7 +26,7 @@ Multiple [OpenVINO models optimized for NPU](https://huggingface.co/collections/
:sync: Linux
```bash
mkdir -p models
-docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir .ov_cache --enable_prefix_caching true --max_prompt_len 2000
+docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir /models/.ov_cache --enable_prefix_caching true --max_prompt_len 2000
docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --add_to_config --config_path /models/config.json --model_name OpenVINO/Qwen3-8B-int4-cw-ov --model_path /models/OpenVINO/Qwen3-8B-int4-cw-ov
```
:::
2 changes: 1 addition & 1 deletion docs/parameters.md
@@ -46,7 +46,7 @@ Configuration options for the server are defined only via command-line options a
| `cpu_extension` | `string` | Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html). |
| `log_level` | `"DEBUG"/"INFO"/"ERROR"` | Serving logging level |
| `log_path` | `string` | Optional path to the log file. |
-| `cache_dir` | `string` | Path to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
+| `cache_dir` | `string` | Path (absolute or relative to the current directory) to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
| `grpc_channel_arguments` | `string` | A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000) |
| `grpc_max_threads` | `string` | Maximum number of threads which can be used by the grpc server. Default value depends on number of CPUs. |
| `grpc_memory_quota` | `string` | GRPC server buffer memory quota. Default value set to 2147483648 (2GB). |
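The reworded `cache_dir` description can be illustrated with a small sketch. It assumes the usual convention that a relative path is joined to the working directory of the server process. `resolve_cache_dir` is a hypothetical helper written for this note, not code from the model server.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only (not part of OpenVINO Model Server).
# Shows which directory a cache_dir value would point to, assuming relative
# paths are joined to the working directory of the server process.
#   $1 = cache_dir value
#   $2 = working directory of the server process
resolve_cache_dir() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;              # absolute path: used as given
    *)  printf '%s/%s\n' "${2%/}" "$1" ;;  # relative path: joined to the working directory
  esac
}

# Absolute value, as in the updated README command.
resolve_cache_dir /models/.ov_cache "$PWD"
# Relative value, as in the previous README command.
resolve_cache_dir .ov_cache "$PWD"
```

Under this assumption, the relative value `.ov_cache` lands wherever the process was started, which inside a container may be outside the mounted `/models` volume. That is presumably why the demo command now passes `/models/.ov_cache`.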