From eb126b3cc0477220c59b55d243c56db983f4b70f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Pawe=C5=82=20Rzepecki?= <pawel.rzepecki@intel.com>
Date: Tue, 23 Dec 2025 09:22:04 +0100
Subject: [PATCH] LLM NPU cache path fix (#3885)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### 🛠 Summary


[CVS-178677](https://jira.devtools.intel.com/secure/RapidBoard.jspa?rapidView=28191&view=detail&selectedIssue=CVS-178677&quickFilter=232315#)
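Change `--cache_dir` in the LLM NPU demo from `.ov_cache` to a permitted path inside the mounted volume (`/models/.ov_cache`), and document in `docs/parameters.md` that `cache_dir` accepts a path that is absolute or relative to the current directory.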
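
For context, a relative `--cache_dir` value resolves against the server's current directory inside the container, which the `-u $(id -u):$(id -g)` user may not be able to write to, whereas `/models/.ov_cache` lands in the bind-mounted `models` directory. The sketch below only illustrates that resolution rule; `resolve_cache_dir` is a hypothetical helper and `/workdir` is an assumed working directory, not the actual one in the image.

```bash
# Hypothetical helper (not part of OVMS): shows where a cache_dir value
# would point, given the process working directory.
resolve_cache_dir() {
  cache_dir="$1"
  workdir="$2"
  case "$cache_dir" in
    /*) printf '%s\n' "$cache_dir" ;;                    # absolute: used as-is
    *)  printf '%s/%s\n' "${workdir%/}" "$cache_dir" ;;  # relative: joined to the working directory
  esac
}

# /workdir is an assumed working directory, used for illustration only.
resolve_cache_dir .ov_cache /workdir          # old value: depends on the working directory
resolve_cache_dir /models/.ov_cache /workdir  # new value: always inside the mounted volume
```

With the absolute path, the cache location no longer depends on where the server process is started.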

Demo test result:
<img width="1757" height="170" alt="image"
src="https://github.com/user-attachments/assets/c6023978-c65f-4ffa-9f12-83c950406faa"
/>


### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.
---
 demos/llm_npu/README.md | 2 +-
 docs/parameters.md      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/demos/llm_npu/README.md b/demos/llm_npu/README.md
index b1107993b0..ef28c75214 100644
--- a/demos/llm_npu/README.md
+++ b/demos/llm_npu/README.md
@@ -26,7 +26,7 @@ Multiple [OpenVINO models optimized for NPU](https://huggingface.co/collections/
 :sync: Linux
 ```bash
 mkdir -p models
-docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir .ov_cache --enable_prefix_caching true --max_prompt_len 2000
+docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --pull --source_model OpenVINO/Qwen3-8B-int4-cw-ov --model_repository_path /models --target_device NPU --task text_generation --tool_parser hermes3 --cache_dir /models/.ov_cache --enable_prefix_caching true --max_prompt_len 2000
 docker run -d --rm -u $(id -u):$(id -g) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --add_to_config --config_path /models/config.json --model_name OpenVINO/Qwen3-8B-int4-cw-ov --model_path /models/OpenVINO/Qwen3-8B-int4-cw-ov
 ```
 ::: 
diff --git a/docs/parameters.md b/docs/parameters.md
index 6bbc380184..9aa06e9507 100644
--- a/docs/parameters.md
+++ b/docs/parameters.md
@@ -46,7 +46,7 @@ Configuration options for the server are defined only via command-line options a
 | `cpu_extension` | `string` | Optional path to a library with [custom layers implementation](https://docs.openvino.ai/2025/documentation/openvino-extensibility.html). |
 | `log_level` | `"DEBUG"/"INFO"/"ERROR"` | Serving logging level |
 | `log_path` | `string` | Optional path to the log file. |
-| `cache_dir` | `string` | Path to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
+| `cache_dir` | `string` | Path (absolute or relative to the current directory) to the model cache storage. Caching will be enabled if this parameter is defined or the default path /opt/cache exists |
 | `grpc_channel_arguments` | `string` |   A comma separated list of arguments to be passed to the grpc server. (e.g. grpc.max_connection_age_ms=2000) |
 | `grpc_max_threads` | `string` |   Maximum number of threads which can be used by the grpc server. Default value depends on number of CPUs. |
 | `grpc_memory_quota` | `string` |   GRPC server buffer memory quota. Default value set to 2147483648 (2GB). |