Component: LLM continuous batching, LLMCalculatorOptions / mediapipe registration
OVMS version: 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)
Context
When several deployments share the same on-disk model directory but need different generation defaults (e.g. different num_assistant_tokens, temperature, or sampling settings per served endpoint), the only option today is to duplicate the model directory, weights included, because OVMS always reads generation_config.json under that fixed file name from models_path. For multi-gigabyte LLMs this is impractical.
The same problem exists for graph.pbtxt, but is already solved there: graph_path in the mediapipe config entry lets one model directory back several deployments with different graphs. There is no equivalent for generation_config.json.
Related to #4221
Question
Would it be feasible to add a per-LLM-node override for the generation-config file path — analogous to graph_path? A natural shape would be either:
- a generation_config_path field in LLMCalculatorOptions (next to models_path), either absolute or relative to models_path; or
- a sibling field at the mediapipe config-entry level (next to graph_path).
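To make the "absolute or relative to models_path" rule concrete, here is a minimal sketch of how the path could be resolved. The function name resolveGenerationConfigPath and the convention that an empty value means "keep today's behaviour" are my assumptions, not existing OVMS code.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not existing OVMS code): picks the generation-config
// file for one LLM node.
//   empty value    -> current behaviour, <models_path>/generation_config.json
//   absolute path  -> used as given (normalized)
//   relative path  -> interpreted relative to models_path (normalized)
fs::path resolveGenerationConfigPath(const fs::path& modelsPath,
                                     const std::string& generationConfigPath) {
    if (generationConfigPath.empty()) {
        return modelsPath / "generation_config.json";
    }
    const fs::path candidate(generationConfigPath);
    if (candidate.is_absolute()) {
        return candidate.lexically_normal();
    }
    return (modelsPath / candidate).lexically_normal();
}
```

Treating an empty value as the default keeps existing deployments unchanged, so the new field would be purely opt-in.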
From a quick read of openvino.genai, ContinuousBatchingPipeline accepts an optional GenerationConfig at construction and exposes set_config() post-construction, so the underlying mechanism appears to be already in place. The work seems contained within src/llm/language_model/continuous_batching/servable_initializer.cpp on the OVMS side.
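On the OVMS side, the initializer would presumably need to check an explicitly configured file before building the pipeline. The sketch below covers only that check; the function name validateGenerationConfigFile is hypothetical, and actual parsing is deliberately left out, on the assumption that it would be delegated to ov::genai::GenerationConfig (which, as far as I can tell, can be constructed from a JSON file path) rather than re-implemented.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical startup check for an explicitly configured generation-config file.
// Returns an error message, or std::nullopt if the path points at a regular file.
// Filesystem errors are reported through ec instead of exceptions, so an
// unreadable location is simply treated as "not found" / "not a file".
std::optional<std::string> validateGenerationConfigFile(const fs::path& path) {
    std::error_code ec;
    if (!fs::exists(path, ec)) {
        return "generation config file not found: " + path.string();
    }
    if (!fs::is_regular_file(path, ec)) {
        return "generation config path is not a regular file: " + path.string();
    }
    return std::nullopt;
}
```

Failing loudly when an explicitly requested file is missing, instead of silently falling back to defaults, seems the safer choice: a typo in the path would otherwise go unnoticed while the endpoint serves the wrong defaults.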
Use case
Multiple served names backed by the same model weights, each with its own generation defaults. Without per-entry generation-config selection, each variant requires a full copy of the model directory on disk.
Open questions
- Is there a reason this hasn't been exposed yet — for example, a planned different mechanism (per-deployment overrides through some other channel), or an interaction with model auto-detection/conversion that I'm missing?
- Is one of the placement options preferred from the architecture side?