Add kv_cache size to pipeline generation log #3989
Conversation
mzegla
left a comment
Reporting 100% of X in every log does not seem to make sense for a dynamic cache size.
It's minor, but could we drop the percentage for the dynamic cache size and only report the size in MB/GB etc.?
Pull request overview
Updates OVMS GenAI continuous batching logging to include kv-cache size (and whether the cache is treated as dynamic vs static), and bumps pinned OpenVINO/GenAI/tokenizers dependencies to versions that provide the required metrics field.
Changes:
- Extend continuous batching metrics log line to print cache usage along with kv-cache size (bytes formatted as B/KB/MB/GB/TB).
- Propagate a "dynamic kv-cache" flag into `LLMExecutorWrapper` based on `schedulerConfig.cache_size == 0`.
- Bump OpenVINO / OpenVINO GenAI / OpenVINO tokenizers pins (Makefile, Windows deps script, demo export requirements) to `dev20260305` / updated commit hashes.
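The first bullet mentions formatting the kv-cache size as B/KB/MB/GB/TB. The PR's actual helper is not fully visible in this diff, but the idea can be sketched as a small standalone function; the name `formatBytes` and the 1024-based unit ladder are assumptions, not the confirmed implementation:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Sketch (hypothetical name): render a byte count with one decimal place
// and the largest unit below 1024, e.g. "1022.4 MB".
std::string formatBytes(size_t bytes) {
    static const char* units[] = {"B", "KB", "MB", "GB", "TB"};
    double value = static_cast<double>(bytes);
    size_t unit = 0;
    while (value >= 1024.0 && unit < 4) {  // stop at TB
        value /= 1024.0;
        ++unit;
    }
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1) << value << " " << units[unit];
    return oss.str();
}
```

This keeps the log line compact regardless of whether the cache is a few MB (dynamic, freshly grown) or tens of GB (static, preallocated).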
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/llm/language_model/continuous_batching/llm_executor.hpp` | Adds formatting helpers and logs kv-cache size + dynamic/static label in metrics output. |
| `src/llm/language_model/continuous_batching/servable_initializer.cpp` | Passes dynamic/static cache indicator into `LLMExecutorWrapper`. |
| `Makefile` | Updates pinned OpenVINO/GenAI/tokenizers branches and nightly package URLs. |
| `windows_install_build_dependencies.bat` | Updates default GenAI nightly ZIP URL and pinned source branches on Windows. |
| `demos/common/export_models/requirements.txt` | Updates OpenVINO and tokenizers Python package versions to `dev20260305`. |
```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

#include <openvino/genai/continuous_batching_pipeline.hpp>

#include "../../../logging.hpp"
#include "../../../profiler.hpp"

namespace ovms {
struct LLMExecutor {
    bool isDynamicKVCache;
    // For logging purposes we could have more information about graph and node here
    std::mutex mutex;
    std::condition_variable cv;
    std::shared_ptr<ov::genai::ContinuousBatchingPipeline> pipe = nullptr;

    LLMExecutor(std::shared_ptr<ov::genai::ContinuousBatchingPipeline> pipe, bool isDynamicKVCacheSet = false) {
        this->pipe = std::move(pipe);
        this->isDynamicKVCache = isDynamicKVCacheSet;
    }

    bool hasRequests() {
        return (pipe->has_non_finished_requests());
    }

    void step() {
        OVMS_PROFILE_FUNCTION();
        pipe->step();
    }

    void waitForRequests(std::atomic<bool>* receivedEndSignal) {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this, receivedEndSignal] { return (pipe->has_non_finished_requests() || *receivedEndSignal); });
    }

    void notify() {
        std::unique_lock<std::mutex> lock(mutex);
        cv.notify_one();
    }

    std::string formatCacheInfo(float cacheUsage, size_t cacheBytes, bool isCacheDynamic) {
        std::ostringstream oss;
        oss << std::fixed << std::setprecision(1);
        // ...
```
This header now uses `std::ostringstream` and I/O manipulators (`std::fixed` / `std::setprecision`) but does not include the required standard headers (`<sstream>` and `<iomanip>`). This will fail to compile in TUs that don't already include them indirectly; add the missing includes here (and consider `<cstddef>` for `size_t` to keep the header self-contained).
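The fix the comment asks for amounts to adding the headers the new helper actually uses. A minimal self-contained sketch of the pattern (the helper name `onePlace` is illustrative, not from the PR):

```cpp
#include <cstddef>   // size_t
#include <iomanip>   // std::fixed, std::setprecision
#include <sstream>   // std::ostringstream
#include <string>

// With the includes above, manipulator-based formatting compiles in any
// translation unit that pulls in this header, regardless of what else
// that TU happens to include.
std::string onePlace(float value) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1) << value;
    return oss.str();
}
```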
### 🛠 Summary
JIRA: CVS-181044

Depends on https://github.com/openvinotoolkit/openvino.genai/pull/3352/changes and the OVMS GenAI update.

Example output:
- `Cache usage static 18.6% of 1022.4 MB;` (static cache of 1 GB)
- `Cache usage dynamic 98.6% of 189.9 MB;` (dynamic cache, `cache_size = 0`)

For usage `{"prompt_tokens":2299,"completion_tokens":30,"total_tokens":2329}`.
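The dynamic/static label in the output above follows from the rule stated in the review overview: `schedulerConfig.cache_size == 0` means the KV cache grows on demand. A minimal sketch of that wiring; the struct and helper names here are illustrative stand-ins for the GenAI `SchedulerConfig`, not its real definition:

```cpp
#include <cstddef>
#include <string>

// Illustrative stand-in for ov::genai::SchedulerConfig.
struct SchedulerConfigLite {
    size_t cache_size = 0;  // KV-cache size in GB; 0 requests a dynamic cache
};

// Rule from the PR description: cache_size == 0 => dynamic KV cache.
bool isDynamicKVCache(const SchedulerConfigLite& cfg) {
    return cfg.cache_size == 0;
}

// Label used in the metrics log line ("dynamic" vs "static").
std::string cacheLabel(const SchedulerConfigLite& cfg) {
    return isDynamicKVCache(cfg) ? "dynamic" : "static";
}
```

This matches the reviewer's point: with a dynamic cache the pipeline allocates roughly what it needs, so usage hovers near 100% and the percentage carries little information, while the absolute size (e.g. 189.9 MB) does.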
### 🧪 Checklist