generation_config.json path override per LLM node

**Component:** LLM continuous batching, `LLMCalculatorOptions` / mediapipe registration
**OVMS version:** 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)

**Context**

When several deployments share the same on-disk model directory but need different generation defaults (e.g. different `num_assistant_tokens`, `temperature`, or sampling settings per served endpoint), the only current option is to duplicate the model directory — including the weights — because OVMS reads `generation_config.json` from a fixed name inside `models_path`. For multi-gigabyte LLMs this is impractical.
The same problem exists for `graph.pbtxt`, but is already solved there: `graph_path` in the mediapipe config entry lets one model directory back several deployments with different graphs. There is no equivalent for `generation_config.json`.
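
For reference, the existing `graph_path` arrangement can be sketched as below. This is a minimal illustration that builds the server config in Python. The served names, the shared directory, and the graph file names are hypothetical, and the key names (`mediapipe_config_list`, `name`, `base_path`, `graph_path`) reflect my reading of the OVMS config format, not a verified schema.

```python
import json

# Hypothetical shared directory holding the weights and the graph files.
SHARED_DIR = "/models/shared-llm"


def make_entry(name: str, graph_file: str) -> dict:
    """Build one mediapipe config entry that reuses the shared directory."""
    return {"name": name, "base_path": SHARED_DIR, "graph_path": graph_file}


# Two served names, one directory on disk, two different graphs.
config = {
    "model_config_list": [],
    "mediapipe_config_list": [
        make_entry("llm-greedy", "graph_greedy.pbtxt"),
        make_entry("llm-sampling", "graph_sampling.pbtxt"),
    ],
}

print(json.dumps(config, indent=2))
```

Both entries point at the same directory and differ only in the graph they load, which is the property this request would like to have for `generation_config.json` as well.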

Related to #4221 

**Question**

Would it be feasible to add a per-LLM-node override for the generation-config file path — analogous to `graph_path`? A natural shape would be either:
- a `generation_config_path` field in `LLMCalculatorOptions` (next to `models_path`), absolute or relative to `models_path`; or
- a sibling field at the mediapipe config-entry level (next to `graph_path`).

From a quick read of openvino.genai, `ContinuousBatchingPipeline` accepts an optional `GenerationConfig` at construction and exposes `set_config()` post-construction, so the underlying mechanism appears to be already in place. The work seems contained within `src/llm/language_model/continuous_batching/servable_initializer.cpp` on the OVMS side.
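
The "absolute or relative to `models_path`" rule for the first option could look roughly like the sketch below. It is written in Python only to pin down the intended semantics; it is not OVMS code. The function name `resolve_generation_config` is hypothetical, and `generation_config_path` is the proposed field, which does not exist today. `PurePosixPath` is used so the behavior does not depend on the host OS.

```python
from pathlib import PurePosixPath
from typing import Optional

# File name OVMS currently looks for inside models_path (per the issue text).
DEFAULT_NAME = "generation_config.json"


def resolve_generation_config(
    models_path: str, generation_config_path: Optional[str] = None
) -> PurePosixPath:
    """Return the generation-config file to load for one LLM node.

    generation_config_path is the proposed (hypothetical) override:
    - unset or empty: fall back to <models_path>/generation_config.json
    - absolute:       use it as-is
    - relative:       interpret it relative to models_path
    """
    base = PurePosixPath(models_path)
    if not generation_config_path:
        return base / DEFAULT_NAME
    override = PurePosixPath(generation_config_path)
    return override if override.is_absolute() else base / override


print(resolve_generation_config("/models/llm"))
print(resolve_generation_config("/models/llm", "variants/creative.json"))
print(resolve_generation_config("/models/llm", "/etc/ovms/creative.json"))
```

Keeping the unset case identical to the current behavior would make the field purely additive for existing deployments.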

**Use case**

Multiple served names backed by the same model weights, each with its own generation defaults. Without per-entry generation-config selection, each variant requires a full copy of the model directory on disk.

**Open questions**

- Is there a reason this hasn't been exposed yet — for example, a planned different mechanism (per-deployment overrides through some other channel), or an interaction with model auto-detection/conversion that I'm missing?
- Is one of the placement options preferred from the architecture side?
