1 change: 1 addition & 0 deletions sdk/cosmos/azure-cosmos/CHANGELOG.md
@@ -3,6 +3,7 @@
### 4.16.0b3 (Unreleased)

#### Features Added
* Documented support for the optional `embeddingSource` field on entries in `vector_embedding_policy.vectorEmbeddings`, which lets the service generate vector embeddings from the specified item paths. See [PR 46870](https://github.com/Azure/azure-sdk-for-python/pull/46870)

#### Breaking Changes

31 changes: 31 additions & 0 deletions sdk/cosmos/azure-cosmos/README.md
@@ -733,6 +733,37 @@ vector_embedding_policy = {
}
```

A vector embedding may optionally include an `embeddingSource` that tells the service how to generate the
embedding. The source specifies the item paths whose values are embedded (`sourcePaths`), the embedding model
deployment (`deploymentName`, `modelName`), the embedding service `endpoint`, and the `authType` used to call
that endpoint (either `ApiKey` or `Entra`). A vector embedding policy that includes an embedding source looks like
this:
```python
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "dimensions": 1536,
            "distanceFunction": "cosine",
            "embeddingSource": {
                "sourcePaths": [
                    "/journal_title",
                    "/title",
                    "/toc_abstract",
                    "/abstract",
                    "/full_text"
                ],
                "deploymentName": "text-embedding-3-small",
                "modelName": "text-embedding-3-small",
                "endpoint": "<azure-ai-model-endpoint>",
                "authType": "ApiKey"
            }
        }
    ]
}
```
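
Because the policy is a plain dictionary, a typo in a field name or an unsupported `authType` would only surface once the request reaches the service. The following is a minimal client-side sanity check, not part of the SDK: `check_embedding_source` is a hypothetical helper, and treating every listed field as required and every source path as starting with `/` are assumptions of this sketch rather than documented service rules.

```python
from typing import Any

# Field names come from the description above. Requiring all of them is an
# assumption of this sketch, not documented service behavior.
EMBEDDING_SOURCE_FIELDS = ("sourcePaths", "deploymentName", "modelName", "endpoint", "authType")
AUTH_TYPES = ("ApiKey", "Entra")


def check_embedding_source(policy: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list problems found in any embeddingSource entries."""
    problems = []
    for index, embedding in enumerate(policy.get("vectorEmbeddings", [])):
        source = embedding.get("embeddingSource")
        if source is None:
            continue  # embeddingSource is optional
        where = f"vectorEmbeddings[{index}].embeddingSource"
        for field in EMBEDDING_SOURCE_FIELDS:
            if field not in source:
                problems.append(f"{where}: missing '{field}'")
        paths = source.get("sourcePaths", [])
        # Assumption: source paths follow the same "/name" form as the example.
        if not isinstance(paths, list) or not all(
            isinstance(p, str) and p.startswith("/") for p in paths
        ):
            problems.append(f"{where}: 'sourcePaths' must be a list of paths starting with '/'")
        if "authType" in source and source["authType"] not in AUTH_TYPES:
            problems.append(f"{where}: 'authType' must be one of {AUTH_TYPES}")
    return problems


policy = {
    "vectorEmbeddings": [
        {
            "path": "/embedding",
            "dataType": "float32",
            "dimensions": 1536,
            "distanceFunction": "cosine",
            "embeddingSource": {
                "sourcePaths": ["/title", "/abstract"],
                "deploymentName": "text-embedding-3-small",
                "modelName": "text-embedding-3-small",
                "endpoint": "<azure-ai-model-endpoint>",
                "authType": "ApiKey",
            },
        }
    ]
}

problems = check_embedding_source(policy)
if problems:
    raise ValueError("; ".join(problems))
# With no problems reported, the dictionary can be passed unchanged as the
# vector_embedding_policy keyword of create_container.
```

The check leaves the dictionary untouched, so the same object can be handed to `create_container` or `create_container_if_not_exists` afterwards.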

Separately, vector indexes have been added to the existing `indexing_policy`. Each index requires only two fields:
the path to the vector field, and the index type, which is one of `flat`, `quantizedFlat`, or `diskANN`.
A sample indexing policy with vector indexes would look like this:
30 changes: 24 additions & 6 deletions sdk/cosmos/azure-cosmos/azure/cosmos/aio/_database.py
@@ -206,7 +206,10 @@ async def create_container(
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -283,7 +286,10 @@ async def create_container( # pylint: disable=too-many-statements
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports ``sourcePaths``
(list of item paths whose values are embedded), ``deploymentName``, ``modelName``, ``endpoint``
(embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -347,7 +353,10 @@ async def create_container( # pylint:disable=docstring-should-be-keyword, too-ma
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports ``sourcePaths``
(list of item paths whose values are embedded), ``deploymentName``, ``modelName``, ``endpoint``
(embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -497,7 +506,10 @@ async def create_container_if_not_exists(
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -558,7 +570,10 @@ async def create_container_if_not_exists(
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -606,7 +621,10 @@ async def create_container_if_not_exists( # pylint:disable=docstring-should-be-k
note that analytical storage can only be enabled on Synapse Link enabled accounts.
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
30 changes: 24 additions & 6 deletions sdk/cosmos/azure-cosmos/azure/cosmos/database.py
@@ -201,7 +201,10 @@ def create_container( # pylint:disable=docstring-missing-param
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -274,7 +277,10 @@ def create_container( # pylint:disable=docstring-missing-param
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -333,7 +339,10 @@ def create_container( # pylint:disable=docstring-missing-param, too-many-statem
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container.
Each vector embedding possesses a predetermined number of dimensions, is associated with an underlying
data type, and is generated for a particular distance function.
data type, and is generated for a particular distance function. Each vector embedding may also include an
optional ``embeddingSource`` describing how the embedding is produced; the source object supports
``sourcePaths`` (list of item paths whose values are embedded), ``deploymentName``, ``modelName``,
``endpoint`` (embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -485,7 +494,10 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports ``sourcePaths``
(list of item paths whose values are embedded), ``deploymentName``, ``modelName``, ``endpoint``
(embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -544,7 +556,10 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports ``sourcePaths``
(list of item paths whose values are embedded), ``deploymentName``, ``modelName``, ``endpoint``
(embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.
@@ -589,7 +604,10 @@ def create_container_if_not_exists( # pylint:disable=docstring-missing-param, d
`here: https://learn.microsoft.com/azure/cosmos-db/nosql/query/computed-properties?tabs=dotnet`
:keyword dict[str, Any] vector_embedding_policy: The vector embedding policy for the container. Each vector
embedding possesses a predetermined number of dimensions, is associated with an underlying data type, and
is generated for a particular distance function.
is generated for a particular distance function. Each vector embedding may also include an optional
``embeddingSource`` describing how the embedding is produced; the source object supports ``sourcePaths``
(list of item paths whose values are embedded), ``deploymentName``, ``modelName``, ``endpoint``
(embedding service endpoint), and ``authType`` (one of ``ApiKey`` or ``Entra``).
:keyword dict[str, Any] change_feed_policy: The change feed policy to apply 'retentionDuration' to
the container.
:keyword dict[str, Any] full_text_policy: **provisional** The full text policy for the container.