- add_or_update - Index document
- index - Index documents
- bulk_index - Bulk index documents
- process_all - Schedules the processing of uploaded documents
- delete - Delete document
- debug - Beta: Get document information
- debug_many - Beta: Get information of a batch of documents
- check_access - Check document access
- status - Get document upload and indexing status ⚠️ Deprecated
- count - Get document count ⚠️ Deprecated

Adds a document to the index or updates an existing document.
from glean.api_client import Glean, models
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    glean.indexing.documents.add_or_update(document=models.DocumentDefinition(
        datasource="<value>",
    ))
    # Use the SDK ...

| Parameter | Type | Required | Description |
|---|---|---|---|
| document | models.DocumentDefinition | ✔️ | Indexable document structure |
| version | Optional[int] | ➖ | Version number for document for optimistic concurrency control. If absent or 0 then no version checks are done. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
Adds or updates multiple documents in the index. Please refer to the bulk indexing documentation for an explanation of when to use this endpoint.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    glean.indexing.documents.index(datasource="<value>", documents=[])
    # Use the SDK ...

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | Datasource of the documents |
| documents | List[models.DocumentDefinition] | ✔️ | Batch of documents being added/updated |
| upload_id | Optional[str] | ➖ | Optional id parameter to identify and track a batch of documents. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
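
When there are more documents than fit comfortably in one request, the list can be split into batches and each batch sent through `index`. The following is a minimal sketch, assuming `documents_api` is any object exposing the `index(datasource=..., documents=..., upload_id=...)` method documented above (for example `glean.indexing.documents`). The helper name `index_in_batches`, the default batch size, and the upload id scheme are illustrative choices, not part of the SDK.

```python
from typing import Any, List, Sequence


def index_in_batches(
    documents_api: Any,
    datasource: str,
    documents: Sequence[Any],
    batch_size: int = 100,
    upload_id_prefix: str = "batch",
) -> List[str]:
    """Send documents through index() in fixed-size batches.

    Returns the upload ids used, one per batch, so each batch can be tracked.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    upload_ids: List[str] = []
    for start in range(0, len(documents), batch_size):
        batch = list(documents[start:start + batch_size])
        # Hypothetical id scheme: prefix plus the batch's position.
        upload_id = f"{upload_id_prefix}-{start // batch_size}"
        documents_api.index(
            datasource=datasource,
            documents=batch,
            upload_id=upload_id,
        )
        upload_ids.append(upload_id)
    return upload_ids
```

Inside a `with Glean(...) as glean:` block this could be called as `index_in_batches(glean.indexing.documents, "<value>", docs)`.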
Replaces the documents in a datasource using paginated batch API calls. Please refer to the bulk indexing documentation for an explanation of how to use bulk endpoints.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    glean.indexing.documents.bulk_index(upload_id="<id>", datasource="<value>", documents=[])
    # Use the SDK ...

| Parameter | Type | Required | Description |
|---|---|---|---|
| upload_id | str | ✔️ | Unique id that must be used for this bulk upload instance |
| datasource | str | ✔️ | Datasource of the documents |
| documents | List[models.DocumentDefinition] | ✔️ | Batch of documents for the datasource |
| is_first_page | Optional[bool] | ➖ | True if this is the first page of the upload. Defaults to false. |
| is_last_page | Optional[bool] | ➖ | True if this is the last page of the upload. Defaults to false. |
| force_restart_upload | Optional[bool] | ➖ | Flag to discard previous upload attempts and start from scratch. Must be specified together with is_first_page=True. |
| disable_stale_document_deletion_check | Optional[bool] | ➖ | True if older documents must be force-deleted after the upload completes. By default, older documents are deleted asynchronously. Set this only when is_last_page=True. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
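
The page flags in the table above lend themselves to a small pagination loop: every page shares one `upload_id`, the first page sets `is_first_page`, and the final page sets `is_last_page`. The following is a sketch under stated assumptions rather than the SDK's own helper: `documents_api` is any object exposing the documented `bulk_index` method (for example `glean.indexing.documents`), and the name `bulk_upload`, the default page size, and the refusal to upload an empty list are choices made for this example.

```python
import uuid
from typing import Any, Optional, Sequence


def bulk_upload(
    documents_api: Any,
    datasource: str,
    documents: Sequence[Any],
    page_size: int = 100,
    upload_id: Optional[str] = None,
) -> str:
    """Replace a datasource's documents page by page via bulk_index()."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    if not documents:
        # Design choice of this sketch: never replace a datasource with nothing.
        raise ValueError("documents must not be empty")

    # Every page of one upload must carry the same id.
    upload_id = upload_id or uuid.uuid4().hex
    total = len(documents)
    for start in range(0, total, page_size):
        documents_api.bulk_index(
            upload_id=upload_id,
            datasource=datasource,
            documents=list(documents[start:start + page_size]),
            is_first_page=(start == 0),
            is_last_page=(start + page_size >= total),
        )
    return upload_id
```

When all documents fit in a single page, that page is sent with both flags set to `True`.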
Schedules the immediate processing of documents uploaded through the indexing API. By default the uploaded documents will be processed asynchronously but this API can be used to schedule processing of all documents on demand.
If a datasource parameter is specified, processing is limited to that custom datasource. Without it, processing applies to all documents across all custom datasources.
This endpoint is rate-limited to one usage every 3 hours. Exceeding this limit results in a 429 response code. Here's how the rate limit works:
- Calling `/processalldocuments` for datasource `foo` prevents another call for `foo` for 3 hours.
- Calling `/processalldocuments` for datasource `foo` doesn't affect immediate calls for `bar`.
- Calling `/processalldocuments` for all datasources prevents any datasource calls for 3 hours.
- Calling `/processalldocuments` for datasource `foo` doesn't affect immediate calls for all datasources.
For more frequent document processing, contact Glean support.
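
To avoid 429 responses, a client can track its own calls and apply the rules above before sending a request. The following is a minimal client-side sketch, not the server's implementation. The class name `ProcessAllLimiter` is hypothetical, `None` stands for a call without a datasource, and two points are assumptions: that an all-datasources call also blocks a second all-datasources call, and that a call exactly 3 hours later is allowed.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

WINDOW = timedelta(hours=3)


class ProcessAllLimiter:
    """Local model of the documented process_all rate limit."""

    def __init__(self) -> None:
        # Key None records calls made for all datasources.
        self._last_call: Dict[Optional[str], datetime] = {}

    def allowed(self, datasource: Optional[str], now: datetime) -> bool:
        # A recent all-datasources call blocks every call.
        last_all = self._last_call.get(None)
        if last_all is not None and now - last_all < WINDOW:
            return False
        if datasource is None:
            # Per-datasource calls do not block an all-datasources call.
            return True
        last = self._last_call.get(datasource)
        return last is None or now - last >= WINDOW

    def record(self, datasource: Optional[str], now: datetime) -> None:
        self._last_call[datasource] = now
```

A caller would check `allowed(...)` before invoking `process_all` and call `record(...)` after a successful request.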
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    glean.indexing.documents.process_all()
    # Use the SDK ...

| Parameter | Type | Required | Description |
|---|---|---|---|
| request | models.ProcessAllDocumentsRequest | ✔️ | The request object to use for the request. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
Deletes the specified document from the index. Succeeds even if the document is not present.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    glean.indexing.documents.delete(datasource="<value>", object_type="<value>", id="<id>")
    # Use the SDK ...

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | datasource of the document |
| object_type | str | ✔️ | object type of the document |
| id | str | ✔️ | The id of the document |
| version | Optional[int] | ➖ | Version number for document for optimistic concurrency control. If absent or 0 then no version checks are done. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
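
Because `delete` succeeds when the document is not present, a cleanup pass can be re-run after a partial failure without special handling for ids that were already removed. The following is a small sketch, assuming `documents_api` is any object exposing the documented `delete(datasource=..., object_type=..., id=...)` method. The helper name `delete_documents` is hypothetical.

```python
from typing import Any, Iterable


def delete_documents(
    documents_api: Any,
    datasource: str,
    object_type: str,
    ids: Iterable[str],
) -> int:
    """Delete each id once and return the number of delete requests sent."""
    sent = 0
    # dict.fromkeys drops duplicate ids while keeping their original order.
    for doc_id in dict.fromkeys(ids):
        documents_api.delete(
            datasource=datasource,
            object_type=object_type,
            id=doc_id,
        )
        sent += 1
    return sent
```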
Returns information that helps debug issues with a particular document. This endpoint is in beta and may undergo breaking changes without prior notice.
Tip: Refer to the Troubleshooting tutorial for more information.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    res = glean.indexing.documents.debug(datasource="<value>", object_type="Article", doc_id="art123")

    # Handle response
    print(res)

| Parameter | Type | Required | Description | Example |
|---|---|---|---|---|
| datasource | str | ✔️ | The datasource to which the document belongs | |
| object_type | str | ✔️ | Object type of the document to get the status for. | Article |
| doc_id | str | ✔️ | Glean Document ID within the datasource to get the status for. | art123 |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. | |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
Returns information that helps debug issues with a batch of documents. This endpoint is in beta and may undergo breaking changes without prior notice.
Tip: Refer to the Troubleshooting tutorial for more information.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    res = glean.indexing.documents.debug_many(datasource="<value>", debug_documents=[])

    # Handle response
    print(res)

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | The datasource to which the document belongs |
| debug_documents | List[models.DebugDocumentRequest] | ✔️ | Documents to fetch debug information for |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
Checks whether a given user has access to a document in a custom datasource.
Tip: Refer to the Troubleshooting tutorial for more information.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    res = glean.indexing.documents.check_access(datasource="<value>", object_type="<value>", doc_id="<id>", user_email="<value>")

    # Handle response
    print(res)

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | Datasource of document to check access for. |
| object_type | str | ✔️ | Object type of document to check access for. |
| doc_id | str | ✔️ | Glean Document ID to check access for. |
| user_email | str | ✔️ | Email of user to check access for. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

Response: models.CheckDocumentAccessResponse

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
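
When validating permissions it is often useful to check one document against several users. The following is a minimal sketch, assuming `documents_api` is any object exposing the documented `check_access` method. The helper name `check_access_for_users` is hypothetical, and the responses are returned untouched because the fields of `models.CheckDocumentAccessResponse` are not described here.

```python
from typing import Any, Dict, Iterable


def check_access_for_users(
    documents_api: Any,
    datasource: str,
    object_type: str,
    doc_id: str,
    user_emails: Iterable[str],
) -> Dict[str, Any]:
    """Call check_access() once per user and map each email to its response."""
    results: Dict[str, Any] = {}
    for email in user_emails:
        results[email] = documents_api.check_access(
            datasource=datasource,
            object_type=object_type,
            doc_id=doc_id,
            user_email=email,
        )
    return results
```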
Intended for debugging/validation. Fetches the current upload and indexing status of documents.
Tip: Use /debug/{datasource}/document for richer information.
⚠️ DEPRECATED: This will be removed in a future release. Please migrate away from it as soon as possible.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    res = glean.indexing.documents.status(datasource="<value>", object_type="<value>", doc_id="<id>")

    # Handle response
    print(res)

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | Datasource to fetch document status for |
| object_type | str | ✔️ | Object type of the document to get the status for |
| doc_id | str | ✔️ | Glean Document ID within the datasource to get the status for. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

Response: models.GetDocumentStatusResponse

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |
Fetches document count for the specified custom datasource.
Tip: Use /debug/{datasource}/status for richer information.
⚠️ DEPRECATED: This will be removed in a future release. Please migrate away from it as soon as possible.
from glean.api_client import Glean
import os

with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:
    res = glean.indexing.documents.count(datasource="<value>")

    # Handle response
    print(res)

| Parameter | Type | Required | Description |
|---|---|---|---|
| datasource | str | ✔️ | Datasource name for which document count is needed. |
| retries | Optional[utils.RetryConfig] | ➖ | Configuration to override the default retry behavior of the client. |

Response: models.GetDocumentCountResponse

| Error Type | Status Code | Content Type |
|---|---|---|
| errors.GleanError | 4XX, 5XX | */* |