Commit e5d156f

Add Nvidia inference specification (#5794)
* Add Nvidia inference specification
* Enhance Nvidia integration by updating task descriptions and reordering task types
* Update @redocly/cli and related dependencies to version 1.34.5
* Clarify documentation for Nvidia input_type and similarity fields
* Update output
1 parent 5b295b2 commit e5d156f

28 files changed, +1815 -60 lines changed

output/openapi/elasticsearch-openapi.json

Lines changed: 330 additions & 3 deletions

output/openapi/elasticsearch-serverless-openapi.json

Lines changed: 330 additions & 3 deletions

output/schema/schema.json

Lines changed: 566 additions & 54 deletions

output/typescript/types.ts

Lines changed: 43 additions & 0 deletions

specification/_doc_ids/table.csv

Lines changed: 1 addition & 0 deletions
@@ -398,6 +398,7 @@ inference-api-put-huggingface,https://www.elastic.co/docs/api/doc/elasticsearch/
 inference-api-put-jinaai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-jinaai,,
 inference-api-put-llama,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-llama,,
 inference-api-put-mistral,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-mistral,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-mistral.html,
+inference-api-put-nvidia,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-nvidia,,
 inference-api-put-openai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-openai,https://www.elastic.co/guide/en/elasticsearch/reference/8.18/infer-service-openai.html,
 inference-api-put-openshift-ai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-openshift-ai,,
 inference-api-put-voyageai,https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-voyageai,,
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+{
+  "inference.put_nvidia": {
+    "documentation": {
+      "url": "https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put-nvidia",
+      "description": "Create an Nvidia inference endpoint"
+    },
+    "stability": "stable",
+    "visibility": "public",
+    "headers": {
+      "accept": ["application/json"],
+      "content_type": ["application/json"]
+    },
+    "url": {
+      "paths": [
+        {
+          "path": "/_inference/{task_type}/{nvidia_inference_id}",
+          "methods": ["PUT"],
+          "parts": {
+            "task_type": {
+              "type": "enum",
+              "description": "The task type",
+              "options": [
+                "chat_completion",
+                "completion",
+                "rerank",
+                "text_embedding"
+              ]
+            },
+            "nvidia_inference_id": {
+              "type": "string",
+              "description": "The inference ID"
+            }
+          }
+        }
+      ]
+    },
+    "body": {
+      "description": "The inference endpoint's task and service settings",
+      "required": true
+    },
+    "params": {
+      "timeout": {
+        "type": "time",
+        "description": "Specifies the amount of time to wait for the inference endpoint to be created.",
+        "default": "30s"
+      }
+    }
+  }
+}
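
The spec above fixes the HTTP method, the path template, and the `timeout` default. As a minimal sketch of how a client might assemble such a request, the TypeScript below builds the request pieces without sending anything. The helper name `buildNvidiaPutRequest` and the `NvidiaPutRequest` shape are hypothetical, and the body field names `service` and `service_settings` are an assumption based on the usual inference API layout; the request definition itself is not part of the hunks shown here.

```typescript
// Task types listed in the spec's `task_type` enum.
type NvidiaTaskType = 'chat_completion' | 'completion' | 'rerank' | 'text_embedding'

// Hypothetical container for the request pieces; not a type from the commit.
interface NvidiaPutRequest {
  method: 'PUT'
  path: string
  query: Record<string, string>
  body: {
    service: 'nvidia' // assumed body field, see lead-in
    service_settings: { api_key: string; model_id: string; url?: string }
  }
}

// Hypothetical helper: fills in the path template from the spec and applies
// the spec's `timeout` default of "30s" when the caller gives none.
function buildNvidiaPutRequest(
  taskType: NvidiaTaskType,
  inferenceId: string,
  serviceSettings: { api_key: string; model_id: string; url?: string },
  timeout: string = '30s'
): NvidiaPutRequest {
  return {
    method: 'PUT',
    path: `/_inference/${taskType}/${encodeURIComponent(inferenceId)}`,
    query: { timeout },
    body: { service: 'nvidia', service_settings: serviceSettings }
  }
}
```

Calling `buildNvidiaPutRequest('rerank', 'my-reranker', { api_key: '...', model_id: 'nv-rerank-qa-mistral-4b:1' })` would yield a `PUT` to `/_inference/rerank/my-reranker` with `timeout=30s`.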

specification/inference/_types/CommonTypes.ts

Lines changed: 85 additions & 0 deletions
@@ -1810,6 +1810,90 @@ export enum MistralServiceType {
   mistral
 }

+export class NvidiaServiceSettings {
+  /**
+   * A valid API key for your Nvidia endpoint.
+   * Can be found in `API Keys` section of Nvidia account settings.
+   */
+  api_key: string
+  /**
+   * The URL of the Nvidia model endpoint. If not provided, the default endpoint URL is used depending on the task type:
+   *
+   * * For `text_embedding` task - `https://integrate.api.nvidia.com/v1/embeddings`.
+   * * For `completion` and `chat_completion` tasks - `https://integrate.api.nvidia.com/v1/chat/completions`.
+   * * For `rerank` task - `https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking`.
+   */
+  url?: string
+  /**
+   * The name of the model to use for the inference task.
+   * Refer to the model's documentation for the name if needed.
+   * Service has been tested and confirmed to be working with the following models:
+   *
+   * * For `text_embedding` task - `nvidia/llama-3.2-nv-embedqa-1b-v2`.
+   * * For `completion` and `chat_completion` tasks - `microsoft/phi-3-mini-128k-instruct`.
+   * * For `rerank` task - `nv-rerank-qa-mistral-4b:1`.
+   * Service doesn't support `text_embedding` task `baai/bge-m3` and `nvidia/nvclip` models due to them not recognizing the `input_type` parameter.
+   */
+  model_id: string
+  /**
+   * For a `text_embedding` task, the maximum number of tokens per input. Inputs exceeding this value are truncated prior to sending to the Nvidia API.
+   */
+  max_input_tokens?: integer
+  /**
+   * For a `text_embedding` task, the similarity measure. One of cosine, dot_product, l2_norm.
+   */
+  similarity?: NvidiaSimilarityType
+  /**
+   * This setting helps to minimize the number of rate limit errors returned from the Nvidia API.
+   * By default, the `nvidia` service sets the number of requests allowed per minute to 3000.
+   */
+  rate_limit?: RateLimitSetting
+}
+
+export enum NvidiaTaskType {
+  chat_completion,
+  completion,
+  rerank,
+  text_embedding
+}
+
+export enum NvidiaServiceType {
+  nvidia
+}
+
+export enum NvidiaSimilarityType {
+  cosine,
+  dot_product,
+  l2_norm
+}
+
+export class NvidiaTaskSettings {
+  /**
+   * For a `text_embedding` task, type of input sent to the Nvidia endpoint.
+   * Valid values are:
+   *
+   * * `ingest`: Mapped to Nvidia's `passage` value in request. Used when generating embeddings during indexing.
+   * * `search`: Mapped to Nvidia's `query` value in request. Used when generating embeddings during querying.
+   *
+   * IMPORTANT: For Nvidia endpoints, if the `input_type` field is not specified, it defaults to `query`.
+   */
+  input_type?: NvidiaInputType
+  /**
+   * For a `text_embedding` task, the method used by the Nvidia model to handle inputs longer than the maximum token length.
+   * Valid values are:
+   *
+   * * `END`: When the input exceeds the maximum input token length, the end of the input is discarded.
+   * * `NONE`: When the input exceeds the maximum input token length, an error is returned.
+   * * `START`: When the input exceeds the maximum input token length, the start of the input is discarded.
+   */
+  truncate?: CohereTruncateType
+}
+
+export enum NvidiaInputType {
+  ingest,
+  search
+}
+
 export class OpenAIServiceSettings {
   /**
    * A valid API key of your OpenAI account.
@@ -1908,6 +1992,7 @@ export class OpenShiftAiServiceSettings {
   max_input_tokens?: integer
   /**
    * For a `text_embedding` task, the similarity measure. One of cosine, dot_product, l2_norm.
+   * If not specified, the default dot_product value is used.
    */
   similarity?: OpenShiftAiSimilarityType
   /**
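
The doc comments on `NvidiaServiceSettings.url` and `NvidiaTaskSettings.input_type` describe two defaulting rules: a per-task default endpoint URL, and a mapping from `ingest`/`search` to Nvidia's `passage`/`query` with `query` used when the field is absent. The TypeScript below is a minimal sketch of those two rules as stated in the comments. The names `DEFAULT_NVIDIA_URLS`, `resolveNvidiaUrl`, and `toNvidiaInputType` are hypothetical, and this is not the service's actual implementation, which is not part of this commit.

```typescript
type NvidiaTaskType = 'chat_completion' | 'completion' | 'rerank' | 'text_embedding'
type NvidiaInputType = 'ingest' | 'search'

// Default endpoint URLs per task type, copied from the `url` doc comment.
const DEFAULT_NVIDIA_URLS: Record<NvidiaTaskType, string> = {
  text_embedding: 'https://integrate.api.nvidia.com/v1/embeddings',
  completion: 'https://integrate.api.nvidia.com/v1/chat/completions',
  chat_completion: 'https://integrate.api.nvidia.com/v1/chat/completions',
  rerank: 'https://ai.api.nvidia.com/v1/retrieval/nvidia/reranking'
}

// Hypothetical helper: an explicit `url` wins, otherwise the task default applies.
function resolveNvidiaUrl(taskType: NvidiaTaskType, url?: string): string {
  return url ?? DEFAULT_NVIDIA_URLS[taskType]
}

// Hypothetical helper: `ingest` maps to `passage`, `search` maps to `query`,
// and a missing value falls back to `query`, as the doc comment states.
function toNvidiaInputType(inputType?: NvidiaInputType): 'passage' | 'query' {
  return inputType === 'ingest' ? 'passage' : 'query'
}
```

Keeping the defaults in a `Record<NvidiaTaskType, string>` means the compiler flags a missing entry if a task type is added later.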

specification/inference/_types/Services.ts

Lines changed: 12 additions & 0 deletions
@@ -41,6 +41,7 @@ import {
   TaskTypeJinaAi,
   TaskTypeLlama,
   TaskTypeMistral,
+  TaskTypeNvidia,
   TaskTypeOpenAI,
   TaskTypeOpenShiftAi,
   TaskTypeVoyageAI,
@@ -304,6 +305,17 @@ export class InferenceEndpointInfoMistral extends InferenceEndpoint {
   task_type: TaskTypeMistral
 }

+export class InferenceEndpointInfoNvidia extends InferenceEndpoint {
+  /**
+   * The inference ID
+   */
+  inference_id: string
+  /**
+   * The task type
+   */
+  task_type: TaskTypeNvidia
+}
+
 export class InferenceEndpointInfoOpenAI extends InferenceEndpoint {
   /**
    * The inference Id

specification/inference/_types/TaskType.ts

Lines changed: 7 additions & 0 deletions
@@ -141,6 +141,13 @@ export enum TaskTypeMistral {
   completion
 }

+export enum TaskTypeNvidia {
+  chat_completion,
+  completion,
+  rerank,
+  text_embedding
+}
+
 export enum TaskTypeOpenAI {
   text_embedding,
   chat_completion,

specification/inference/put/PutRequest.ts

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ import { TaskType } from '@inference/_types/TaskType'
  * * JinaAI (`rerank`, `text_embedding`)
  * * Llama (`chat_completion`, `completion`, `text_embedding`)
  * * Mistral (`chat_completion`, `completion`, `text_embedding`)
+ * * Nvidia (`chat_completion`, `completion`, `text_embedding`, `rerank`)
  * * OpenAI (`chat_completion`, `completion`, `text_embedding`)
  * * OpenShift AI (`chat_completion`, `completion`, `rerank`, `text_embedding`)
  * * VoyageAI (`rerank`, `text_embedding`)
