
Add kv_cache size to pipeline generation log.#3989

Open
rasapala wants to merge 12 commits into main from kvcache_size_log

Conversation

@rasapala (Collaborator) commented Feb 18, 2026

🛠 Summary

JIRA CVS-181044
Depends on https://github.com/openvinotoolkit/openvino.genai/pull/3352/changes and an OVMS GenAI update.
Output:

Cache usage static 18.6% of 1022.4 MB;
(for a static 1 GB cache)

Cache usage dynamic 98.6% of 189.9 MB;
(for a dynamic cache, cache_size = 0)

For usage: {"prompt_tokens":2299,"completion_tokens":30,"total_tokens":2329}
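For illustration, a minimal sketch of how such a log line could be composed. The helper name and signature are assumptions for this sketch, not the PR's actual code; "dynamic" vs. "static" follows whether cache_size == 0.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not the exact OVMS implementation) composing the
// cache-usage line shown above with one decimal place of precision.
std::string formatCacheUsageLine(float usagePercent, double sizeMB, bool isDynamic) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1)
        << "Cache usage " << (isDynamic ? "dynamic" : "static") << " "
        << usagePercent << "% of " << sizeMB << " MB;";
    return oss.str();
}
```

For example, formatCacheUsageLine(18.6f, 1022.4, false) reproduces the static line from the PR description.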

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

@rasapala rasapala requested review from dtrawins and mzegla February 18, 2026 15:44

@popovaan popovaan left a comment


Looks good to me.


@mzegla mzegla left a comment


It does not seem meaningful to report ~100% of X in every log when the cache size is dynamic.
It's minor, but can we drop the percentage for dynamic cache sizes and report only the size in MB/GB, etc.?
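A sketch of what the suggested change could look like: skip the percentage when the cache is dynamic, since the total grows on demand and usage of the current allocation carries little signal. The function name and signature are assumptions, not the PR's code.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical variant of the cache-info formatter implementing the
// reviewer's suggestion: absolute size only for a dynamic cache.
std::string formatCacheInfoSketch(float usagePercent, double sizeMB, bool isDynamic) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1);
    if (isDynamic) {
        // Dynamic cache: the allocation tracks demand, so a percentage
        // of it is always near 100%; report only the absolute size.
        oss << "Cache usage dynamic " << sizeMB << " MB;";
    } else {
        oss << "Cache usage static " << usagePercent << "% of " << sizeMB << " MB;";
    }
    return oss.str();
}
```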

rasapala and others added 11 commits February 19, 2026 14:23
### 🛠 Summary

[CVS-180948](https://jira.devtools.intel.com/browse/CVS-180948)
Adding new parameter to export_model to enable export of model with
requirement for trusting remote code.

### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Miłosz Żeglarski <milosz.zeglarski@intel.com>
### 🛠 Summary

Adding environmental variable for Qwen3-Coder-30B to make it work
correctly

### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.
### 🛠 Summary

Updating export and NPU usage.

### 🧪 Checklist

- [ ] Unit tests added.
- [ ] The documentation updated.
- [ ] Change follows security best practices.

---------

Co-authored-by: Damian Kalinowski <damian.kalinowski@intel.com>
Co-authored-by: Trawinski, Dariusz <dariusz.trawinski@intel.com>
Copilot AI left a comment


Pull request overview

Updates OVMS GenAI continuous batching logging to include kv-cache size (and whether the cache is treated as dynamic vs static), and bumps pinned OpenVINO/GenAI/tokenizers dependencies to versions that provide the required metrics field.

Changes:

  • Extend continuous batching metrics log line to print cache usage along with kv-cache size (bytes formatted as B/KB/MB/GB/TB).
  • Propagate a “dynamic kv-cache” flag into LLMExecutorWrapper based on schedulerConfig.cache_size == 0.
  • Bump OpenVINO / OpenVINO GenAI / OpenVINO tokenizers pinning (Makefile, Windows deps script, demo export requirements) to dev20260305 / updated commit hashes.
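The B/KB/MB/GB/TB formatting mentioned above could be sketched as follows. The helper name and the base-1024 convention are assumptions for this sketch; the PR's actual helper may differ.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical byte formatter: scale the raw byte count down by 1024
// until it fits the next unit, then print with one decimal place.
std::string formatBytes(std::size_t bytes) {
    const char* units[] = {"B", "KB", "MB", "GB", "TB"};
    double value = static_cast<double>(bytes);
    int unit = 0;
    while (value >= 1024.0 && unit < 4) {
        value /= 1024.0;
        ++unit;
    }
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1) << value << " " << units[unit];
    return oss.str();
}
```

For example, formatBytes(1536) yields "1.5 KB".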

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/llm/language_model/continuous_batching/llm_executor.hpp — Adds formatting helpers and logs kv-cache size + dynamic/static label in metrics output.
  • src/llm/language_model/continuous_batching/servable_initializer.cpp — Passes dynamic/static cache indicator into LLMExecutorWrapper.
  • Makefile — Updates pinned OpenVINO/GenAI/tokenizers branches and nightly package URLs.
  • windows_install_build_dependencies.bat — Updates default GenAI nightly ZIP URL and pinned source branches on Windows.
  • demos/common/export_models/requirements.txt — Updates OpenVINO and tokenizers Python package versions to dev20260305.

Comment on lines 19 to +67
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

#include <openvino/genai/continuous_batching_pipeline.hpp>

#include "../../../logging.hpp"
#include "../../../profiler.hpp"

namespace ovms {
struct LLMExecutor {
    bool isDynamicKVCache;
    // For logging purposes we could have more information about graph and node here
    std::mutex mutex;
    std::condition_variable cv;
    std::shared_ptr<ov::genai::ContinuousBatchingPipeline> pipe = nullptr;

    LLMExecutor(std::shared_ptr<ov::genai::ContinuousBatchingPipeline> pipe, bool isDynamicKVCacheSet = false) {
        this->pipe = std::move(pipe);
        this->isDynamicKVCache = isDynamicKVCacheSet;
    }

    bool hasRequests() {
        return (pipe->has_non_finished_requests());
    }

    void step() {
        OVMS_PROFILE_FUNCTION();
        pipe->step();
    }

    void waitForRequests(std::atomic<bool>* receivedEndSignal) {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this, receivedEndSignal] { return (pipe->has_non_finished_requests() || *receivedEndSignal); });
    }

    void notify() {
        std::unique_lock<std::mutex> lock(mutex);
        cv.notify_one();
    }

    std::string formatCacheInfo(float cacheUsage, size_t cacheBytes, bool isCacheDynamic) {
        std::ostringstream oss;
        oss << std::fixed << std::setprecision(1);

Copilot AI Mar 6, 2026


This header now uses std::ostringstream and I/O manipulators (std::fixed / std::setprecision) but does not include the required standard headers (<sstream> and <iomanip>). This will fail to compile in TUs that don't already include them indirectly; add the missing includes here (and consider <cstddef> for size_t to keep the header self-contained).
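A minimal sketch of the fix this comment suggests, assuming the standard headers implied by the symbols in use (the demo function name is hypothetical):

```cpp
#include <cstddef>  // size_t
#include <iomanip>  // std::fixed, std::setprecision
#include <sstream>  // std::ostringstream
#include <string>

// With the includes above, the manipulator-based formatting used by
// formatCacheInfo compiles standalone in any translation unit.
std::string formatPercent(float cacheUsage) {
    std::ostringstream oss;
    oss << std::fixed << std::setprecision(1) << cacheUsage << "%";
    return oss.str();
}
```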
