Indexing.Documents

Overview

Available Operations

  • add_or_update - Index document

  • index - Index documents

  • bulk_index - Bulk index documents

  • process_all - Schedules the processing of uploaded documents

  • delete - Delete document

  • debug - Beta: Get document information

  • debug_many - Beta: Get information of a batch of documents

  • check_access - Check document access

  • status - Get document upload and indexing status ⚠️ Deprecated

  • count - Get document count ⚠️ Deprecated

add_or_update

Adds a document to the index or updates an existing document.

Example Usage

from glean.api_client import Glean, models
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.documents.add_or_update(document=models.DocumentDefinition(
        datasource="<value>",
    ))

    # Use the SDK ...

Parameters

Parameter Type Required Description
document models.DocumentDefinition ✔️ Indexable document structure
version Optional[int] Version number for document for optimistic concurrency control. If absent or 0 then no version checks are done.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
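
The version parameter above can be wrapped so that a version check is requested only when the caller has a version to assert. This is a minimal sketch, not part of the SDK: upsert_with_version is a hypothetical helper, and documents_api is assumed to be the glean.indexing.documents object from the example above.

```python
from typing import Any, Optional


def upsert_with_version(documents_api: Any, document: Any, version: Optional[int] = None) -> None:
    """Call add_or_update, passing `version` only when a check is wanted.

    `documents_api` is assumed to be `glean.indexing.documents`.
    """
    if version is not None and version < 0:
        raise ValueError("version must be non-negative")
    kwargs = {"document": document}
    if version:  # None or 0 means "no version check", so the argument is omitted
        kwargs["version"] = version
    documents_api.add_or_update(**kwargs)
```

For example, upsert_with_version(glean.indexing.documents, doc, version=3) requests a version check, while omitting version behaves like the plain call.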

index

Adds or updates multiple documents in the index. Please refer to the bulk indexing documentation for an explanation of when to use this endpoint.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.documents.index(datasource="<value>", documents=[])

    # Use the SDK ...

Parameters

Parameter Type Required Description
datasource str ✔️ Datasource of the documents
documents List[models.DocumentDefinition] ✔️ Batch of documents being added/updated
upload_id Optional[str] Optional id parameter to identify and track a batch of documents.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*

bulk_index

Replaces the documents in a datasource using paginated batch API calls. Please refer to the bulk indexing documentation for an explanation of how to use bulk endpoints.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.documents.bulk_index(upload_id="<id>", datasource="<value>", documents=[])

    # Use the SDK ...

Parameters

Parameter Type Required Description
upload_id str ✔️ Unique id that must be used for this bulk upload instance
datasource str ✔️ Datasource of the documents
documents List[models.DocumentDefinition] ✔️ Batch of documents for the datasource
is_first_page Optional[bool] True if this is the first page of the upload. Defaults to False.
is_last_page Optional[bool] True if this is the last page of the upload. Defaults to False.
force_restart_upload Optional[bool] Discards previous upload attempts and starts from scratch. Must be specified together with is_first_page=True.
disable_stale_document_deletion_check Optional[bool] Set to True to force deletion of older documents after the upload completes. By default, older documents are deleted asynchronously. Set this only when is_last_page=True.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
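
The paging parameters above fit together as sketched below: every page shares one upload_id, the first call sets is_first_page, and the final call sets is_last_page. This is an illustrative sketch, not the SDK's own helper. The names paginate and bulk_upload are hypothetical, the default page size of 100 is an arbitrary assumption, and documents_api is assumed to be glean.indexing.documents.

```python
import uuid
from typing import Any, Iterator, List, Sequence, Tuple


def paginate(items: Sequence[Any], page_size: int) -> Iterator[Tuple[List[Any], bool, bool]]:
    """Yield (page, is_first_page, is_last_page) for each page of `items`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = len(items)
    for start in range(0, total, page_size):
        end = start + page_size
        yield list(items[start:end]), start == 0, end >= total


def bulk_upload(documents_api: Any, datasource: str, documents: Sequence[Any], page_size: int = 100) -> str:
    """Send `documents` page by page and return the upload id that was used."""
    if not documents:
        # bulk_index replaces the datasource's documents, so this sketch
        # refuses to start an upload that contains nothing.
        raise ValueError("refusing to start an upload with no documents")
    upload_id = str(uuid.uuid4())  # one id shared by every page of this upload
    for page, first, last in paginate(documents, page_size):
        documents_api.bulk_index(
            upload_id=upload_id,
            datasource=datasource,
            documents=page,
            is_first_page=first,
            is_last_page=last,
        )
    return upload_id
```

A single-page upload produces one call with both is_first_page and is_last_page set to True.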

process_all

Schedules the immediate processing of documents uploaded through the indexing API. By default, uploaded documents are processed asynchronously, but this API can be used to schedule processing of all documents on demand.

If a datasource parameter is specified, processing is limited to that custom datasource. Without it, processing applies to all documents across all custom datasources.

Rate Limits

This endpoint is rate-limited to one call every 3 hours. Exceeding this limit results in a 429 response code. The rate limit works as follows:

  1. Calling /processalldocuments for datasource foo prevents another call for foo for 3 hours.
  2. Calling /processalldocuments for datasource foo doesn't affect immediate calls for bar.
  3. Calling /processalldocuments for all datasources prevents any datasource calls for 3 hours.
  4. Calling /processalldocuments for datasource foo doesn't affect immediate calls for all datasources.

For more frequent document processing, contact Glean support.
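
The four rules above can be mirrored on the client side to avoid sending a call that would return 429. The following is a local bookkeeping sketch only, not how the server enforces the limit. ProcessAllGuard is a hypothetical class, a datasource of None stands for a call covering all datasources, and the sketch assumes that such a call also blocks a second all-datasources call within the window.

```python
from typing import Dict, Optional

WINDOW_SECONDS = 3 * 60 * 60  # the documented 3-hour window


class ProcessAllGuard:
    """Local bookkeeping that mirrors the four documented rate-limit rules."""

    def __init__(self) -> None:
        self._last_all: Optional[float] = None
        self._last_by_datasource: Dict[str, float] = {}

    @staticmethod
    def _recent(when: Optional[float], now: float) -> bool:
        return when is not None and now - when < WINDOW_SECONDS

    def allowed(self, datasource: Optional[str], now: float) -> bool:
        # Rule 3: an all-datasources call blocks every call for the window
        # (assumed here to include another all-datasources call).
        if self._recent(self._last_all, now):
            return False
        if datasource is None:
            # Rule 4: per-datasource calls do not block an all-datasources call.
            return True
        # Rules 1 and 2: only an earlier call for the same datasource blocks.
        return not self._recent(self._last_by_datasource.get(datasource), now)

    def record(self, datasource: Optional[str], now: float) -> None:
        """Remember that a call was made at time `now` (in seconds)."""
        if datasource is None:
            self._last_all = now
        else:
            self._last_by_datasource[datasource] = now
```

A caller would check allowed(datasource, time.time()) before calling process_all and call record(...) after a successful request.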

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.documents.process_all()

    # Use the SDK ...

Parameters

Parameter Type Required Description
request models.ProcessAllDocumentsRequest ✔️ The request object to use for the request.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
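
Because a rate-limited call results in a 429 response, callers may want to treat that status as "try again later" rather than as a failure. The sketch below is one way to do that under a stated assumption: it assumes the raised error exposes the HTTP status as a status_code attribute, which the table above does not confirm. run_unless_rate_limited is a hypothetical helper.

```python
from typing import Any, Callable


def run_unless_rate_limited(call: Callable[[], Any]) -> bool:
    """Run `call`; return False instead of raising when the status is 429."""
    try:
        call()
    except Exception as exc:
        # Assumption: the error carries the HTTP status as `status_code`.
        if getattr(exc, "status_code", None) == 429:
            return False
        raise  # any other error is propagated unchanged
    return True
```

Usage would look like run_unless_rate_limited(lambda: glean.indexing.documents.process_all()), where a False result means the call was rate-limited.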

delete

Deletes the specified document from the index. The call also succeeds if the document is not present.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    glean.indexing.documents.delete(datasource="<value>", object_type="<value>", id="<id>")

    # Use the SDK ...

Parameters

Parameter Type Required Description
datasource str ✔️ Datasource of the document
object_type str ✔️ Object type of the document
id str ✔️ The id of the document
version Optional[int] Version number for document for optimistic concurrency control. If absent or 0 then no version checks are done.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
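
Since delete also succeeds when the document is absent, a cleanup loop can be rerun after a partial failure without special handling for documents that were already removed. A minimal sketch follows; delete_documents is a hypothetical helper and documents_api is assumed to be glean.indexing.documents.

```python
from typing import Any, Iterable


def delete_documents(documents_api: Any, datasource: str, object_type: str, ids: Iterable[str]) -> int:
    """Delete each id in turn and return how many delete calls were made."""
    count = 0
    for doc_id in ids:
        # Safe to repeat: deleting a document that is already gone succeeds.
        documents_api.delete(datasource=datasource, object_type=object_type, id=doc_id)
        count += 1
    return count
```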

debug

Returns information that helps debug a particular document. Currently in beta; it may undergo breaking changes without prior notice.

Tip: Refer to the Troubleshooting tutorial for more information.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.documents.debug(datasource="<value>", object_type="Article", doc_id="art123")

    # Handle response
    print(res)

Parameters

Parameter Type Required Description Example
datasource str ✔️ The datasource to which the document belongs
object_type str ✔️ Object type of the document to get the status for. Article
doc_id str ✔️ Glean Document ID within the datasource to get the status for. art123
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Response

models.DebugDocumentResponse

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*

debug_many

Returns information that helps debug a batch of documents. Currently in beta; it may undergo breaking changes without prior notice.

Tip: Refer to the Troubleshooting tutorial for more information.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.documents.debug_many(datasource="<value>", debug_documents=[])

    # Handle response
    print(res)

Parameters

Parameter Type Required Description
datasource str ✔️ The datasource to which the document belongs
debug_documents List[models.DebugDocumentRequest] ✔️ Documents to fetch debug information for
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Response

models.DebugDocumentsResponse

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
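
The example above passes an empty debug_documents list. The sketch below shows one way to build that list from (object_type, doc_id) pairs. It assumes that models.DebugDocumentRequest accepts object_type and doc_id keyword arguments, mirroring the parameters of debug; this page does not confirm those field names. build_debug_requests is a hypothetical helper, and the default factory of dict exists only so the sketch runs without the SDK installed.

```python
from typing import Any, Callable, Iterable, List, Tuple


def build_debug_requests(
    pairs: Iterable[Tuple[str, str]],
    request_factory: Callable[..., Any] = dict,
) -> List[Any]:
    """Turn (object_type, doc_id) pairs into debug request objects."""
    return [
        # Assumed field names: object_type and doc_id.
        request_factory(object_type=object_type, doc_id=doc_id)
        for object_type, doc_id in pairs
    ]
```

With the SDK installed, the call might look like build_debug_requests([("Article", "art123")], models.DebugDocumentRequest), with the result passed as debug_documents.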

check_access

Checks whether a given user has access to a document in a custom datasource.

Tip: Refer to the Troubleshooting tutorial for more information.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.documents.check_access(datasource="<value>", object_type="<value>", doc_id="<id>", user_email="<value>")

    # Handle response
    print(res)

Parameters

Parameter Type Required Description
datasource str ✔️ Datasource of document to check access for.
object_type str ✔️ Object type of document to check access for.
doc_id str ✔️ Glean Document ID to check access for.
user_email str ✔️ Email of user to check access for.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Response

models.CheckDocumentAccessResponse

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
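
When troubleshooting permissions, it is often useful to run the same check for several users. The sketch below simply repeats check_access per email and returns the raw responses keyed by email, without assuming any field of models.CheckDocumentAccessResponse. check_access_for_users is a hypothetical helper and documents_api is assumed to be glean.indexing.documents.

```python
from typing import Any, Dict, Iterable


def check_access_for_users(
    documents_api: Any,
    datasource: str,
    object_type: str,
    doc_id: str,
    user_emails: Iterable[str],
) -> Dict[str, Any]:
    """Return a mapping of user email to the raw check_access response."""
    return {
        email: documents_api.check_access(
            datasource=datasource,
            object_type=object_type,
            doc_id=doc_id,
            user_email=email,
        )
        for email in user_emails
    }
```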

status

Intended for debugging/validation. Fetches the current upload and indexing status of documents.

Tip: Use /debug/{datasource}/document for richer information.

⚠️ DEPRECATED: This will be removed in a future release; please migrate away from it as soon as possible.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.documents.status(datasource="<value>", object_type="<value>", doc_id="<id>")

    # Handle response
    print(res)

Parameters

Parameter Type Required Description
datasource str ✔️ Datasource to fetch the document status for
object_type str ✔️ Object type of the document to get the status for
doc_id str ✔️ Glean Document ID within the datasource to get the status for.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Response

models.GetDocumentStatusResponse

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*
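
Since status is deprecated and the tip above points to the debug endpoint, existing callers can be routed through debug, which takes the same three arguments. A minimal migration sketch follows; document_status is a hypothetical shim and documents_api is assumed to be glean.indexing.documents.

```python
from typing import Any


def document_status(documents_api: Any, datasource: str, object_type: str, doc_id: str) -> Any:
    """Route a former `status` call through the `debug` operation."""
    # Same arguments as status; only the operation (and response type) changes.
    return documents_api.debug(datasource=datasource, object_type=object_type, doc_id=doc_id)
```

Note that the return value is a models.DebugDocumentResponse rather than a models.GetDocumentStatusResponse, so code that reads response fields needs to be adapted as well.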

count

Fetches document count for the specified custom datasource.

Tip: Use /debug/{datasource}/status for richer information.

⚠️ DEPRECATED: This will be removed in a future release; please migrate away from it as soon as possible.

Example Usage

from glean.api_client import Glean
import os


with Glean(
    api_token=os.getenv("GLEAN_API_TOKEN", ""),
) as glean:

    res = glean.indexing.documents.count(datasource="<value>")

    # Handle response
    print(res)

Parameters

Parameter Type Required Description
datasource str ✔️ Datasource name for which document count is needed.
retries Optional[utils.RetryConfig] Configuration to override the default retry behavior of the client.

Response

models.GetDocumentCountResponse

Errors

Error Type Status Code Content Type
errors.GleanError 4XX, 5XX */*