
generation_config.json path override per LLM node #4233

@korund

Description


Component: LLM continuous batching, LLMCalculatorOptions / mediapipe registration
OVMS version: 2026.1.0.72cc0624 (OpenVINO backend 2026.1.0, OpenVINO GenAI backend 2026.1.0.0)

Context

When several deployments share the same on-disk model directory but need different generation defaults (e.g. different num_assistant_tokens, temperature, or sampling settings per served endpoint), the only current option is to duplicate the model directory, weights included, because OVMS reads generation_config.json under a fixed filename inside models_path. For multi-gigabyte LLMs this is impractical.
The same problem exists for graph.pbtxt, but it is already solved there: graph_path in the mediapipe config entry lets one model directory back several deployments with different graphs. There is no equivalent for generation_config.json.

Related to #4221

Question

Would it be feasible to add a per-LLM-node override for the generation-config file path, analogous to graph_path? A natural shape would be either:

  • a generation_config_path field in LLMCalculatorOptions (next to models_path), absolute or relative to models_path; or
  • a sibling field at the mediapipe config-entry level (next to graph_path).

From a quick read of openvino.genai, ContinuousBatchingPipeline accepts an optional GenerationConfig at construction and exposes set_config() post-construction, so the underlying mechanism appears to already be in place. On the OVMS side, the work seems to be contained within src/llm/language_model/continuous_batching/servable_initializer.cpp.
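To make the first option concrete, here is a minimal C++17 sketch of how an LLM node could resolve the proposed generation_config_path. It assumes the semantics suggested in the first bullet: no value keeps today's behavior (<models_path>/generation_config.json), an absolute path is used as-is, and a relative path is resolved against models_path. The function name resolveGenerationConfigPath is hypothetical and is not part of OVMS; reading the file and handing the result to the GenAI pipeline is deliberately not shown.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not an OVMS API): picks the generation-config file
// for one LLM node, assuming the proposed generation_config_path semantics.
fs::path resolveGenerationConfigPath(const fs::path& modelsPath,
                                     const std::optional<std::string>& overridePath) {
    // No override (or an empty one): keep the current fixed-name lookup.
    if (!overridePath || overridePath->empty()) {
        return modelsPath / "generation_config.json";
    }
    fs::path candidate(*overridePath);
    // Absolute override: use it directly, only normalizing "." and "..".
    if (candidate.is_absolute()) {
        return candidate.lexically_normal();
    }
    // Relative override: interpret it relative to models_path, so several
    // nodes can share one weights directory with different config files.
    return (modelsPath / candidate).lexically_normal();
}
```

For the use case below, two served names could point at the same models_path, one leaving the field unset and one setting it to something like gen_fast.json; both resolve inside the shared directory, so only a small JSON file differs between them. lexically_normal() is purely textual (it neither checks existence nor follows symlinks), so a real implementation would still need to decide whether a relative path may escape models_path.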

Use case

Multiple served names backed by the same model weights, each with its own generation defaults. Without per-entry generation-config selection, each variant requires a full copy of the model directory on disk.

Open questions

  • Is there a reason this hasn't been exposed yet, for example a planned alternative mechanism (per-deployment overrides through some other channel) or an interaction with model auto-detection/conversion that I'm missing?
  • Is one of the placement options preferred from the architecture side?

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests