diff --git a/.github/skills/find-package-skill/SKILL.md b/.github/skills/find-package-skill/SKILL.md
new file mode 100644
index 000000000000..2dfb300c0357
--- /dev/null
+++ b/.github/skills/find-package-skill/SKILL.md
@@ -0,0 +1,25 @@
+---
+name: find-package-skill
+description: '**UTILITY SKILL** — Must be consulted before making any changes to packages under sdk/. Provides package-specific domain knowledge that prevents common mistakes. WHEN: "add feature to package", "fix bug in package", "modify package code", "regenerate from typespec", "change client library".'
+---
+
+# Find Package Skill
+
+Before modifying any SDK package in this monorepo, check this registry to see
+if a package-specific skill exists. Package skills contain tribal knowledge
+(architecture, data flows, type mappings, pitfalls) that prevents common mistakes.
+
+Always check this registry before modifying any SDK package — even if you think
+you already know the package well.
+
+## How to Use
+
+1. Find the package you're modifying in the table below.
+2. Read the SKILL.md at the listed path using the Read tool. Then read all files under the `references/` directory next to it for additional context.
+3. If the package isn't listed, no package-specific skill exists yet — proceed normally.
+
+## Package Skills
+
+| Package | Path |
+| ------------------------- | --------------------------------------------------------------------------- |
+| `azure-search-documents` | `sdk/search/azure-search-documents/.github/skills/search-documents/SKILL.md` |
\ No newline at end of file
diff --git a/sdk/search/azure-search-documents/.github/skills/search-documents/SKILL.md b/sdk/search/azure-search-documents/.github/skills/search-documents/SKILL.md
new file mode 100644
index 000000000000..3b438d5940b1
--- /dev/null
+++ b/sdk/search/azure-search-documents/.github/skills/search-documents/SKILL.md
@@ -0,0 +1,224 @@
+---
+name: search-documents-python
+description: 'Post-regeneration customization guide for the azure-search-documents Python SDK. After running tsp-client update, consult this skill to re-apply all search-specific customizations and produce a production-ready SDK. USE WHEN: regenerating the search SDK from TypeSpec, running tsp-client update, fixing broken _patch.py imports after regeneration, adding a new operation or model to the search SDK, verifying the SDK still works after spec changes, or any task that mentions "regenerate", "tsp-client", "typespec", or "codegen" in the context of this package.'
+---
+
+# azure-search-documents — Post-Regeneration Customization Guide
+
+After running `tsp-client update`, the generated code is a raw skeleton — not a shippable SDK. This skill tells you exactly what customizations exist in `_patch.py` files and how to verify they still work after regeneration.
+
+The generator never touches `_patch.py` files, so your customizations survive regeneration. But they import from generated modules that **do** change. A renamed model, a new parameter, or a removed enum value will silently break a `_patch.py` file. The verification steps below catch these breaks.
+
+## Step 1: Run Regeneration
+
+```bash
+cd sdk/search/azure-search-documents
+
+# Update tsp-location.yaml with the new spec commit SHA, then:
+tsp-client update
+
+# If API version changed, update _metadata.json too
+```
+
+## Step 2: Verify Imports in All `_patch.py` Files
+
+Every `_patch.py` with customizations imports from generated modules. After regeneration, check that these imports still resolve. The fastest way:
+
+```bash
+python -c "from azure.search.documents import SearchClient"
+python -c "from azure.search.documents.aio import SearchClient"
+python -c "from azure.search.documents.indexes.models import SearchField, SearchFieldDataType"
+python -c "from azure.search.documents.models import IndexDocumentsBatch"
+```
+
+If any fail, a generated class/enum was renamed or removed. Read `references/customizations.md` for the exact import each `_patch.py` depends on.
+
+The `_patch.py` files that have customizations (all others are empty boilerplate):
+
+```
+azure/search/documents/
+├── _patch.py                          # SearchClient, SearchIndexingBufferedSender, ApiVersion
+├── _operations/_patch.py              # SearchItemPaged, search(), document CRUD
+├── models/_patch.py                   # IndexDocumentsBatch, RequestEntityTooLargeError
+├── aio/_patch.py                      # Async SearchClient, async SearchIndexingBufferedSender
+├── aio/_operations/_patch.py          # AsyncSearchItemPaged, async search()/CRUD
+├── indexes/_operations/_patch.py      # Polymorphic delete/update, list helpers
+├── indexes/models/_patch.py           # SearchField, field builders, enum aliases
+└── indexes/aio/_operations/_patch.py  # Async index/indexer operations
+```
+
+## Step 3: Check ApiVersion
+
+After regeneration, the generated code targets the API version in `_metadata.json`. The hand-maintained `ApiVersion` enum in `_patch.py` must include this version.
+
+```bash
+# 1. See what API version the generator used
+python -c "import json; print(json.load(open('_metadata.json'))['apiVersion'])"
+
+# 2. See what versions the SDK currently advertises
+grep -A 20 'class ApiVersion' azure/search/documents/_patch.py
+
+# 3. Check the current default
+grep 'DEFAULT_VERSION' azure/search/documents/_patch.py
+```
+
+If the generated API version is **not** in the `ApiVersion` enum:
+1. Add the new member to `ApiVersion` in `azure/search/documents/_patch.py` (e.g., `V2026_05_01_PREVIEW = "2026-05-01-preview"`)
+2. Update `DEFAULT_VERSION` to point to the new member
+3. Update the `ApiVersion` docstring in the class if one exists
+
+Verify the default round-trips correctly:
+
+```bash
+python -c "from azure.search.documents._patch import ApiVersion, DEFAULT_VERSION; print(DEFAULT_VERSION.value)"
+```
+
+## Step 4: Check for New Operations That Need Wrappers
+
+Look at what changed in the generated operations files:
+
+```bash
+git diff --name-only | grep "_operations\.py" | grep -v _patch
+```
+
+If a new operation was added to the generated `_operations.py`, decide whether it needs a convenience wrapper in `_patch.py`. Operations that need wrappers:
+- **Delete operations** — need the polymorphic str-or-model pattern (see "Polymorphic Delete Pattern" below)
+- **Create-or-update operations** — need `prefer="return=representation"`, `match_condition`, and `etag` forwarding
+- **List operations** — may need a `select` parameter or name-only projection
+
+Operations that pass through without a wrapper (if the generated signature is already user-friendly) can be left alone — they're inherited from the generated mixin.
+
+Remember: every sync wrapper in `_operations/_patch.py` needs a matching async wrapper in `aio/_operations/_patch.py`.
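
The sync/async parity rule above could be checked mechanically. The following is a minimal sketch, not part of the package: `public_methods`, `missing_async_wrappers`, and the two demo classes are hypothetical names introduced only for illustration. It checks for presence by name only, on the assumption that some async wrappers (for example, paged list operations) may be plain functions rather than coroutines.

```python
import inspect
from typing import List, Set


def public_methods(cls: type) -> Set[str]:
    """Collect the names of public callables defined on or inherited by cls."""
    return {
        name
        for name, _ in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    }


def missing_async_wrappers(sync_cls: type, async_cls: type) -> List[str]:
    """Return public sync method names that have no same-named attribute on async_cls."""
    return sorted(public_methods(sync_cls) - public_methods(async_cls))


# Hypothetical stand-ins for the sync and async operations mixins.
class SyncOps:
    def delete_index(self, index):
        return None

    def create_or_update_index(self, index):
        return None


class AsyncOps:
    async def delete_index(self, index):
        return None


print(missing_async_wrappers(SyncOps, AsyncOps))
```

Pointing the same helper at the real sync and async mixins after regeneration would list any wrapper that was added on one side only.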
+
+## Step 5: Check for Model/Enum Changes That Affect Customizations
+
+```bash
+git diff -- azure/search/documents/models/_enums.py azure/search/documents/models/_models.py azure/search/documents/indexes/models/_enums.py azure/search/documents/indexes/models/_models.py
+```
+
+Watch for:
+- **Renamed enum values** — backward-compat aliases in `_patch.py` may reference old names
+- **New enum values that collide with Python keywords** — the generator renames them with an `_ENUM` suffix (e.g., `IS` → `IS_ENUM`), and you must add a backward-compat alias
+- **Changed model constructors** — `IndexDocumentsBatch`, `SearchField`, `SearchIndexerDataSourceConnection`, `KnowledgeBase` are subclassed in `_patch.py` and may break if the base constructor changes
+- **New fields on `SearchResult`** — `_convert_search_result()` in `_operations/_patch.py` extracts `@search.*` metadata fields; new ones need to be added
+- **Changed `SearchRequest` model** — `_build_search_request()` constructs this model directly; new parameters need to be wired through
+- **Changed `SearchFieldDataType` enum** — `indexes/models/_patch.py` monkey-patches `SearchFieldDataType.Collection` as a `staticmethod` and adds camelCase backward-compat aliases (e.g., `Int32` → `INT32`). If values are added, removed, or renamed in the generated enum, the aliases and `Collection` helper must be updated to match
+
+## Step 6: Ensure mypy Passes
+
+```bash
+cd sdk/search/azure-search-documents
+
+# Preferred — uses azpysdk CLI (install via: pip install -e eng/tools/azure-sdk-tools)
+azpysdk mypy
+```
+
+The package-level `mypy.ini` already ignores `_generated` internals — you only need to fix errors in `_patch.py` files and `samples/`.
+
+## Step 7: Ensure pylint Passes
+
+```bash
+cd sdk/search/azure-search-documents
+
+# Preferred
+azpysdk pylint
+```
+
+The repo-level `pylintrc` already excludes `_generated/`, `_vendor/`, `tests/`, and `samples/` — only `_patch.py` customizations are linted.
+
+## Step 8: Update Documentation and Samples
+
+If the regeneration added new operations, models, or changed the API version, update the docs and samples:
+
+### Changelog
+
+```bash
+# Edit CHANGELOG.md — add an entry under the next unreleased version
+# Format:
+# ## 11.x.0bN (Unreleased)
+# ### Features Added
+# - Added `new_operation` to `SearchIndexClient`
+# ### Breaking Changes
+# - Renamed `OldModel` to `NewModel`
+code CHANGELOG.md
+```
+
+### README
+
+If new client classes or major features were added, update `README.md` with usage examples:
+
+```bash
+code README.md
+```
+
+# Customization Patterns Reference
+
+Each pattern below describes a customization that exists in the codebase today. After regeneration, verify each one still works. Read `references/customizations.md` for the exhaustive file-by-file inventory.
+
+## SearchClient Constructor Reordering
+
+`SearchClient` in `_patch.py` swaps parameter order to `(endpoint, index_name, credential)`. The generated base uses `(endpoint, credential, index_name)`. If regeneration changes the base constructor signature, update the subclass.
+
+## Custom Search Pagination
+
+`search()` does **not** use standard Azure SDK paging. It uses `SearchItemPaged` / `AsyncSearchItemPaged` with a custom `SearchPageIterator` because:
+- Search paginates via POST with `nextPageParameters` body, not GET with `nextLink`
+- First-page metadata (facets, count, answers) must be available before iteration
+- Results are converted to dicts with `@search.*` keys
+
+The continuation token is Base64-encoded JSON: `{"apiVersion": "...", "nextLink": "...", "nextPageParameters": {...}}`
+
+Shared helpers (`_build_search_request`, `_convert_search_result`, `_pack_continuation_token`, `_unpack_continuation_token`) live in sync `_operations/_patch.py` and are imported by async. Don't duplicate them.
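
As a minimal sketch of the continuation-token format described above (not the SDK's actual helpers): `pack_token` and `unpack_token` are hypothetical names, and their flat argument lists are an assumption made for illustration. Only the three JSON keys and the Base64 encoding come from the description above.

```python
import base64
import json
from typing import Any, Dict, Tuple


def pack_token(api_version: str, next_link: str,
               next_page_parameters: Dict[str, Any]) -> str:
    """Serialize paging state to JSON, then Base64-encode it into an opaque string."""
    payload = {
        "apiVersion": api_version,
        "nextLink": next_link,
        "nextPageParameters": next_page_parameters,
    }
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def unpack_token(token: str) -> Tuple[str, Dict[str, Any]]:
    """Decode a token back into (next_link, next_page_parameters)."""
    payload = json.loads(base64.b64decode(token).decode("utf-8"))
    return payload["nextLink"], payload["nextPageParameters"]


# Round trip with made-up values.
token = pack_token("2025-11-01-preview", "https://example.invalid/next", {"skip": 50})
next_link, next_params = unpack_token(token)
```

Because the next page is requested via POST, the token has to carry the request body (`nextPageParameters`) as well as the link, which is why a bare `nextLink` string would not be enough here.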
+
+## Pipe-Delimited Semantic Encoding
+
+`_build_search_request()` encodes semantic parameters into pipe-delimited wire format:
+```
+query_answer="extractive", count=3 → "extractive|count-3"
+query_caption="extractive", highlight=True → "extractive|highlight-true"
+```
+New semantic parameters should follow this pattern.
+
+## SearchIndexingBufferedSender
+
+Entirely hand-authored in `_patch.py` (sync) and `aio/_patch.py` (async). Not generated. Wraps `SearchClient` with auto-flush timer, 413 recursive batch splitting, retry per document key (409/422/503), and key field auto-detection. The async version supports both sync and async callbacks via `asyncio.iscoroutinefunction()`.
+
+## Field Builders
+
+`SimpleField()`, `SearchableField()`, `ComplexField()` in `indexes/models/_patch.py`. `SimpleField` explicitly sets `searchable=False`. `SearchableField` auto-sets type to `String`/`Collection(String)`.
+
+`SearchField` subclass adds `hidden` property (inverse of `retrievable`).
+
+`SearchFieldDataType.Collection` is a `staticmethod` monkey-patched onto the generated enum.
+
+## Backward-Compatible Enum Aliases
+
+Monkey-patched at module load in `_patch.py` files:
+```python
+SearchFieldDataType.Int32 = SearchFieldDataType.INT32  # camelCase → UPPER
+OcrSkillLanguage.IS = OcrSkillLanguage.IS_ENUM  # Python keyword collision
+ScoringStatistics.Global = ScoringStatistics.GLOBAL_ENUM
+```
+After regeneration, verify the right-hand-side names still exist in the generated enums. If a new enum collides with a Python keyword, add an alias.
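
The alias mechanism and the "right-hand side still exists" check can be illustrated with a small, self-contained sketch. `FieldType` is a hypothetical stand-in for a generated enum (it is not the real `SearchFieldDataType`), and `missing_members` is a hypothetical helper, not something the package provides.

```python
from enum import Enum
from typing import Iterable, List, Type


class FieldType(str, Enum):
    """Hypothetical stand-in for a generated enum."""

    INT32 = "Edm.Int32"
    STRING = "Edm.String"


# Backward-compat alias: the old-style name points at the same member object.
# Adding a new attribute to an Enum class is allowed at runtime; reassigning
# an existing member name would raise AttributeError instead.
FieldType.Int32 = FieldType.INT32


def missing_members(enum_cls: Type[Enum], names: Iterable[str]) -> List[str]:
    """Return the names that are not (or no longer) members of enum_cls."""
    return [name for name in names if name not in enum_cls.__members__]


# "INT64" stands in for a member that regeneration renamed or removed.
print(missing_members(FieldType, ["INT32", "INT64"]))
```

Note that an alias attached this way is only an attribute: it does not show up when iterating the enum, so code that iterates members sees only the generated names.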
+
+## Polymorphic Delete Pattern
+
+All delete/update operations in `indexes/_operations/_patch.py` accept str or model object:
+```python
+def delete_index(self, index, *, match_condition=MatchConditions.Unconditionally, **kwargs):
+    try:
+        name = index.name  # model object
+        return self._delete_index(name=name, etag=index.e_tag, match_condition=match_condition, **kwargs)
+    except AttributeError:
+        name = index  # string
+        return self._delete_index(name=name, **kwargs)
+```
+New resource types need this same wrapper in both sync and async.
+
+## Removed Model Tombstones
+
+`EntityRecognitionSkill`, `SentimentSkill` → `_RemovedModel` subclasses that `raise ValueError` on instantiation. Keep these for backward compat.
+
+## `_convert_index_response` Helper
+
+`list_indexes(select=...)` returns `SearchIndexResponse` (projection type). `_convert_index_response()` maps it to `SearchIndex`, notably `response.semantic` → `semantic_search`. Shared between sync and async via import.
diff --git a/sdk/search/azure-search-documents/.github/skills/search-documents/references/customizations.md b/sdk/search/azure-search-documents/.github/skills/search-documents/references/customizations.md
new file mode 100644
index 000000000000..ca2712246dd4
--- /dev/null
+++ b/sdk/search/azure-search-documents/.github/skills/search-documents/references/customizations.md
@@ -0,0 +1,232 @@
+# Post-Regeneration Customization Checklist
+
+Use this file after running `tsp-client update` to verify every customization. Each section is a `_patch.py` file with the exact classes, functions, and aliases it defines, plus what generated symbols it depends on.
+
+---
+
+## File: `azure/search/documents/_patch.py`
+
+### Depends On (from generated code)
+- `._client.SearchClient` as `_SearchClient` (base class)
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `SearchClient` | class | Subclass of `_SearchClient`; reorders constructor to `(endpoint, index_name, credential)` |
+| `SearchIndexingBufferedSender` | class | Hand-authored batching sender. Uses `threading.Timer` for auto-flush, recursive 413 splitting, retry per doc key (409/422/503), key field auto-detection |
+| `ApiVersion` | enum | Supported API versions. Default: `V2025_11_01_PREVIEW` |
+| `is_retryable_status_code()` | function | Returns True for 409, 422, 503 |
+
+### After Regeneration, Verify
+- [ ] `_SearchClient` base class constructor signature unchanged
+- [ ] If API version changed, add new `ApiVersion` member and update default
+
+---
+
+## File: `azure/search/documents/models/_patch.py`
+
+### Depends On (from generated code)
+- `.._generated.models.IndexDocumentsBatch` as `IndexDocumentsBatchGenerated` (base class)
+- `._enums.ScoringStatistics`
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `IndexDocumentsBatch` | class | Adds `add_upload_actions`, `add_delete_actions`, `add_merge_actions`, `add_merge_or_upload_actions`, `dequeue_actions`, `enqueue_actions`, `actions` property |
+| `RequestEntityTooLargeError` | class | `HttpResponseError` subclass for 413 |
+
+### Enum Aliases
+```python
+ScoringStatistics.Global = ScoringStatistics.GLOBAL_ENUM
+```
+
+### After Regeneration, Verify
+- [ ] `IndexDocumentsBatchGenerated` base class still exists with compatible constructor
+- [ ] `ScoringStatistics.GLOBAL_ENUM` still exists in generated enums
+
+---
+
+## File: `azure/search/documents/_operations/_patch.py`
+
+### Depends On (from generated code)
+- `..models._models.SearchRequest` (constructed directly in `_build_search_request`)
+- `..models._models.SearchResult` (field access in `_convert_search_result`)
+- `azure.core.paging.ItemPaged`, `azure.core.paging.PageIterator`
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `_convert_search_result(result)` | function | Extracts `@search.score`, `@search.reranker_score`, `@search.highlights`, `@search.captions`, `@search.document_debug_info`, `@search.reranker_boosted_score` |
+| `_pack_continuation_token(response, api_version)` | function | Base64 JSON: `{apiVersion, nextLink, nextPageParameters}` |
+| `_unpack_continuation_token(token)` | function | Decodes token to `(next_link, next_page_request)` |
+| `_build_search_request(search_text, **kwargs)` | function | Builds `SearchRequest`. Pipe-delimited encoding for answers/captions/rewrites |
+| `SearchPageIterator` | class | Custom page iterator with `get_facets()`, `get_count()`, `get_coverage()`, `get_answers()`, `get_debug_info()` |
+| `SearchItemPaged` | class | Extends `ItemPaged` with same metadata accessors |
+| `_SearchClientOperationsMixin` | class | Overrides: `search()`, `index_documents()` (413 splitting), `upload_documents()`, `delete_documents()`, `merge_documents()`, `merge_or_upload_documents()` |
+
+### After Regeneration, Verify
+- [ ] `SearchRequest` model fields still match what `_build_search_request` sets — new search parameters need to be wired through
+- [ ] `SearchResult` model fields still match what `_convert_search_result` extracts — new `@search.*` fields need to be added
+- [ ] Generated mixin methods that `_SearchClientOperationsMixin` calls (e.g., `_search_post`) still exist with same signatures
+
+---
+
+## File: `azure/search/documents/aio/_patch.py`
+
+### Depends On (from generated code)
+- `.._client.SearchClient` as async `_SearchClient` (base class)
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `SearchClient` | class | Async subclass, same reorder as sync |
+| `SearchIndexingBufferedSender` | class | Async version: `asyncio.Task` instead of `threading.Timer`, `iscoroutinefunction()` callback detection |
+
+### After Regeneration, Verify
+- [ ] Same checks as sync `_patch.py`
+
+---
+
+## File: `azure/search/documents/aio/_operations/_patch.py`
+
+### Depends On (from generated code)
+- Same models as sync `_operations/_patch.py`
+- `azure.core.async_paging.AsyncItemPaged`, `AsyncPageIterator`
+
+### Imports From Sync (NOT Duplicated)
+- `_build_search_request`, `_convert_search_result`, `_pack_continuation_token`, `_unpack_continuation_token`
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `AsyncSearchPageIterator` | class | Async version of `SearchPageIterator` |
+| `AsyncSearchItemPaged` | class | Async version of `SearchItemPaged` |
+| `_SearchClientOperationsMixin` | class | Async operations mixin, same methods as sync |
+
+### After Regeneration, Verify
+- [ ] Same checks as sync `_operations/_patch.py`
+- [ ] Imports from sync `_operations/_patch.py` still resolve
+
+---
+
+## File: `azure/search/documents/indexes/models/_patch.py`
+
+### Depends On (from generated code)
+- `._models.SearchField` as `_SearchField` (base class)
+- `._models.SearchIndexerDataSourceConnection` as `_SearchIndexerDataSourceConnection` (base class)
+- `._models.KnowledgeBase` as `_KnowledgeBase` (base class)
+- `._enums.SearchFieldDataType`, `OcrSkillLanguage`, `SplitSkillLanguage`, `TextTranslationSkillLanguage`
+- Various model imports for type annotations
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `SearchField` | class | Adds `hidden` property (inverse of `retrievable`), constructor accepts `hidden` kwarg |
+| `SearchIndexerDataSourceConnection` | class | Accepts `connection_string` str or `credentials` object |
+| `KnowledgeBase` | class | Deserializes `retrieval_reasoning_effort` from dict |
+| `SimpleField()` | function | Builder: sets `searchable=False` explicitly |
+| `SearchableField()` | function | Builder: auto-types to `String`/`Collection(String)` |
+| `ComplexField()` | function | Builder: sets type to `Complex`/`Collection(Complex)` |
+| `Collection()` | function | Wraps type in `"Collection(...)"` format |
+| `_RemovedModel` | class | Base for tombstones: raises `ValueError` on init, `TypeError` on subclass |
+| `EntityRecognitionSkill` | tombstone | "Use EntityRecognitionSkillV3 instead" |
+| `EntityRecognitionSkillLanguage` | tombstone | "Use EntityRecognitionSkillV3 instead" |
+| `SentimentSkill` | tombstone | "Use SentimentSkillV3 instead" |
+
+### Enum Aliases
+```python
+# SearchFieldDataType (old camelCase -> generated UPPER_CASE)
+SearchFieldDataType.String = SearchFieldDataType.STRING
+SearchFieldDataType.Int32 = SearchFieldDataType.INT32
+SearchFieldDataType.Int64 = SearchFieldDataType.INT64
+SearchFieldDataType.Single = SearchFieldDataType.SINGLE
+SearchFieldDataType.Double = SearchFieldDataType.DOUBLE
+SearchFieldDataType.Boolean = SearchFieldDataType.BOOLEAN
+SearchFieldDataType.DateTimeOffset = SearchFieldDataType.DATE_TIME_OFFSET
+SearchFieldDataType.GeographyPoint = SearchFieldDataType.GEOGRAPHY_POINT
+SearchFieldDataType.ComplexType = SearchFieldDataType.COMPLEX
+
+# Monkey-patched staticmethod
+SearchFieldDataType.Collection = staticmethod(Collection)
+
+# Python keyword collisions (generated IS_ENUM -> alias IS)
+OcrSkillLanguage.IS = OcrSkillLanguage.IS_ENUM
+SplitSkillLanguage.IS = SplitSkillLanguage.IS_ENUM
+TextTranslationSkillLanguage.IS = TextTranslationSkillLanguage.IS_ENUM
+```
+
+### After Regeneration, Verify
+- [ ] All right-hand-side enum members (`STRING`, `INT32`, `IS_ENUM`, etc.) still exist
+- [ ] `_SearchField`, `_SearchIndexerDataSourceConnection`, `_KnowledgeBase` base constructors unchanged
+- [ ] No new enum values that collide with Python keywords — if so, add aliases
+- [ ] `SearchField.retrievable` property still exists (used by `hidden` inversion)
+
+---
+
+## File: `azure/search/documents/indexes/_operations/_patch.py`
+
+### Depends On (from generated code)
+- Generated `_SearchIndexClientOperationsMixin` and `_SearchIndexerClientOperationsMixin` (base classes)
+- `azure.core.match_conditions.MatchConditions`
+
+### Defines
+| Symbol | Type | What It Does |
+|--------|------|-------------|
+| `_convert_index_response(response)` | function | Maps `SearchIndexResponse` → `SearchIndex` (semantic → semantic_search) |
+
+**`_SearchIndexClientOperationsMixin`** wraps:
+| Method | Customization |
+|--------|--------------|
+| `delete_index(index)` | Polymorphic: str or SearchIndex |
+| `create_or_update_index(index)` | Adds prefer, match_condition, allow_index_downtime |
+| `delete_synonym_map(synonym_map)` | Polymorphic: str or SynonymMap |
+| `create_or_update_synonym_map(synonym_map)` | Adds prefer, match_condition |
+| `delete_alias(alias)` | Polymorphic: str or SearchAlias |
+| `create_or_update_alias(alias)` | Adds prefer, match_condition |
+| `delete_knowledge_base(kb)` | Polymorphic: str or KnowledgeBase |
+| `create_or_update_knowledge_base(kb)` | Adds prefer, match_condition |
+| `delete_knowledge_source(ks)` | Polymorphic: str or KnowledgeSource |
+| `create_or_update_knowledge_source(ks)` | Adds prefer, match_condition |
+| `list_indexes(*, select)` | Uses `_convert_index_response` for projections |
+| `list_index_names()` | Name-only projection via `cls` callback |
+| `get_synonym_maps(*, select)` | Returns list |
+
+**`_SearchIndexerClientOperationsMixin`** wraps:
+| Method | Customization |
+|--------|--------------|
+| `delete_data_source_connection(dsc)` | Polymorphic: str or object |
+| `create_or_update_data_source_connection(dsc)` | Adds prefer, match_condition, skip_indexer_reset |
+| `delete_indexer(indexer)` | Polymorphic: str or SearchIndexer |
+| `create_or_update_indexer(indexer)` | Adds prefer, match_condition, skip/disable cache |
+| `delete_skillset(skillset)` | Polymorphic: str or SearchIndexerSkillset |
+| `create_or_update_skillset(skillset)` | Adds prefer, match_condition, skip/disable cache |
+
+### After Regeneration, Verify
+- [ ] All generated `_delete_*`, `_create_or_update_*`, `_list_*` base methods still exist with compatible signatures
+- [ ] `SearchIndexResponse` still has `.semantic` field (used by `_convert_index_response`)
+- [ ] New resource types added to spec → add polymorphic delete/update wrappers here
+- [ ] Verify prefer header value `"return=representation"` still applies
+
+---
+
+## File: `azure/search/documents/indexes/aio/_operations/_patch.py`
+
+Async mirror of `indexes/_operations/_patch.py`. Same methods, all `async`.
+
+Imports `_convert_index_response` from sync — NOT duplicated.
+
+### After Regeneration, Verify
+- [ ] Same checks as sync `indexes/_operations/_patch.py`
+- [ ] Import of `_convert_index_response` from sync still resolves
+
+---
+
+## Empty `_patch.py` Files (No Customizations)
+
+These contain only `__all__ = []` and empty `patch_sdk()`. No action needed after regeneration.
+
+- `azure/search/documents/indexes/_patch.py`
+- `azure/search/documents/indexes/aio/_patch.py`
+- `azure/search/documents/knowledgebases/_patch.py`
+- `azure/search/documents/knowledgebases/models/_patch.py`
+- `azure/search/documents/knowledgebases/aio/_patch.py`