From eb126b3cc0477220c59b55d243c56db983f4b70f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Rzepecki?=
Date: Tue, 23 Dec 2025 09:22:04 +0100
Subject: [PATCH] LLM NPU cache path fix (#3885)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### 🛠 Summary

[CVS-17867](https://jira.devtools.intel.com/secure/RapidBoard.jspa?rapidView=28191&view=detail&selectedIssue=CVS-178677&quickFilter=232315#)

Change the cache path in the LLM NPU demo to a permitted one (`/models/.ov_cache`) and document that `cache_dir` accepts an absolute path or a path relative to the current directory.

Demo test result: (image omitted)

### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.

---
 demos/llm_npu/README.md | 2 +-
 docs/parameters.md      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/demos/llm_npu/README.md b/demos/llm_npu/README.md
index b1107993b0..ef28c75214 100644
--- a/demos/llm_npu/README.md
+++ b/demos/llm_npu/README.md
@@ -26,7 +26,7 @@ Multiple [OpenVINO models optimized for NPU](https://huggingface.co/collections/
 :sync: Linux
 ```bash
 mkdir -p models
-docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir .ov_cache --enable_prefix_caching true --max_prompt_len 2000
+docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir /models/.ov_cache --enable_prefix_caching true --max_prompt_len 2000
 docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --add_to_config --config_path /models/config.json --model_name OpenVINO/Qwen3-8B-int4-cw-ov --model_path /models/OpenVINO/Qwen3-8B-int4-cw-ov
 ```
 :::
diff --git a/docs/parameters.md b/docs/parameters.md
index 6bbc380184..9aa06e9507 100644
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -46,7 +46,7 @@ Configuration options for the server are defined only via command-line options a
 | `cpu_extension` | `string` | Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html). |
 | `log_level` | `"DEBUG"/"INFO"/"ERROR"` | Serving logging level |
 | `log_path` | `string` | Optional path to the log file. |
-| `cache_dir` | `string` | Path to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
+| `cache_dir` | `string` | Path (absolute or relative to the current directory) to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
 | `grpc_channel_arguments` | `string` | A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000) |
 | `grpc_max_threads` | `string` | Maximum number of threads which can be used by the grpc server. Default value depends on number of CPUs. |
 | `grpc_memory_quota` | `string` | GRPC server buffer memory quota. Default value set to 2147483648 (2GB). |