From a516e882a26049620d363078dad1b55717e36f92 Mon Sep 17 00:00:00 2001
From: Damian Kalinowski <damian.kalinowski@intel.com>
Date: Tue, 26 May 2026 09:14:20 +0200
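Subject: [PATCH] docs: pin XGrammar structural tags links to v0.1.26

Point the XGrammar structural tags links in the chat and responses REST
API docs at the v0.1.26 tag instead of the main branch.
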
---
 docs/model_server_rest_api_chat.md      | 2 +-
 docs/model_server_rest_api_responses.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/model_server_rest_api_chat.md b/docs/model_server_rest_api_chat.md
index 5a2718fc3f..6e1041980f 100644
--- a/docs/model_server_rest_api_chat.md
+++ b/docs/model_server_rest_api_chat.md
@@ -219,7 +219,7 @@ Some parameters, especially related to sampling (like `temperature`, `top_p` etc
 | logprobs | ⚠️ | ✅ | ✅ | bool (default: `false`) | Include the log probabilities on the logprob of the returned output token. **_ in stream mode logprobs are not returned. Only info about selected tokens is returned _** |
 | tools | ✅ | ✅ | ✅ | array | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. See [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) for more details. |
 | tool_choice | ✅ | ✅ | ✅ | string or object | Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means that model should call at least one tool. Specifying a particular tool via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. See [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice) for more details. |
-| response_format | ✅ | ✅ | ✅ | object | An object specifying the format that the model must output. Setting to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs which ensures the model will match your supplied JSON schema according to [OpenAI reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format). Learn more in the [Structured Outputs demo](../demos/continuous_batching/structured_output/README.md). Additionally, `response_format` can accept [XGrammar structural tags format](https://github.com/mlc-ai/xgrammar/blob/main/docs/tutorials/structural_tag.md#format-types) (not part of OpenAI API). For example: `{ "type": "const_string", "value": "Hello World!" }`. **Note** that if model server fails to process the format, the request will still be processed, but the format will not be imposed. |
+| response_format | ✅ | ✅ | ✅ | object | An object specifying the format that the model must output. Setting it to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs, which ensures the model output matches your supplied JSON schema, as described in the [OpenAI reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format). Learn more in the [Structured Outputs demo](../demos/continuous_batching/structured_output/README.md). Additionally, `response_format` accepts the [XGrammar structural tags format](https://github.com/mlc-ai/xgrammar/blob/v0.1.26/docs/tutorials/structural_tag.md#format-types) (not part of the OpenAI API), for example `{ "type": "const_string", "value": "Hello World!" }`. **Note:** if the model server fails to process the format, the request is still processed, but the format is not imposed. |
 | chat_template_kwargs | ✅ | ❌ | ✅ |  object | Enables passing additional parameters to chat template engine. Example `{"enable_thinking": false}`. Note that values like `messages`, `eos_token`, `bos_token` etc. are provided natively to the template engine, so including them in `chat_template_kwargs` will cause error. |
 | skip_special_tokens | ✅ | ❌ | ✅ | bool (default: `true`) | Whether to remove special tokens (e.g. `<\|endoftext\|>`, `<\|im_end\|>`) from the generated output. Set to `false` to include them, which is useful when the model uses special tokens to encode structured information (e.g. bounding boxes, reasoning markers). When `false`, any tool or reasoning parser configured on the endpoint is silently disabled for the request, so the raw token stream is returned. This option works with most detokenizers exported with OpenVINO Tokenizers 2024.5 or later, unless they are based on custom ops. |
 
diff --git a/docs/model_server_rest_api_responses.md b/docs/model_server_rest_api_responses.md
index 6698a47efe..b097ca69c7 100644
--- a/docs/model_server_rest_api_responses.md
+++ b/docs/model_server_rest_api_responses.md
@@ -100,7 +100,7 @@ curl http://localhost/v3/responses \
 | ignore_eos | ✅ | ❌ | bool (default: `false`) | Whether to ignore the `EOS` token and continue generating tokens after the `EOS` token is generated. |
 | include_stop_str_in_output | ✅ | ❌ | bool (default: `false` if `stream=false`, `true` if `stream=true`) | Whether to include matched stop string in output. Setting it to false when `stream=true` is invalid configuration and will result in error. |
 | logprobs | ⚠️ | ❌ | bool (default: `false`) | Include the log probabilities on the logprob of the returned output token. **_In stream mode logprobs are not supported._** |
-| response_format | ✅ | ❌ | object (optional) | An object specifying the format that the model must output. Setting to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs. Additionally accepts [XGrammar structural tags format](https://github.com/mlc-ai/xgrammar/blob/main/docs/tutorials/structural_tag.md#format-types). OpenAI Responses API uses `text.format` instead (not supported in OVMS). |
+| response_format | ✅ | ❌ | object (optional) | An object specifying the format that the model must output. Setting to `{ "type": "json_schema", "json_schema": {...} }` enables Structured Outputs. Additionally accepts [XGrammar structural tags format](https://github.com/mlc-ai/xgrammar/blob/v0.1.26/docs/tutorials/structural_tag.md#format-types). OpenAI Responses API uses `text.format` instead (not supported in OVMS). |
 | tools | ⚠️ | ✅ | array (optional) | A list of tools the model may call. Currently, only **function** tools are supported. OpenAI also supports built-in tools (web_search, file_search, code_interpreter, etc.) and MCP tools. OVMS additionally accepts a flat `{type, name, parameters}` format alongside the nested `{type, function: {name, parameters}}` format. See [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) for more details. |
 | tool_choice | ✅ | ✅ | string or object (optional) | Controls which (if any) tool is called by the model. `none` means the model will not call any tool and instead generates a message. `auto` means the model can pick between generating a message or calling one or more tools. `required` means that model should call at least one tool. Specifying a particular function via `{"type": "function", "function": {"name": "my_function"}}` forces the model to call that tool. |
 | reasoning | ⚠️ | ✅ | object (optional) | Configuration for reasoning/thinking mode. The `effort` field accepts `"low"`, `"medium"`, or `"high"` — any value enables thinking mode (`enable_thinking: true` is injected into chat template kwargs). The `summary` field is accepted but ignored. |