diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/SKILL.md new file mode 100644 index 000000000000..a2701d10b068 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/SKILL.md @@ -0,0 +1,166 @@ +--- +name: sdkinternal-py-env-setup-venv +description: Set up Python virtual environment for azure-ai-contentunderstanding package development. Use this skill when setting up the development environment, creating venv, installing dependencies, or configuring environment variables. +--- + +# Python Virtual Environment Setup for azure-ai-contentunderstanding + +Set up a complete Python development environment for the azure-ai-contentunderstanding package, including virtual environment creation, dependency installation, and environment configuration. + +## Package Directory + +``` +sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +## Workflow + +### 1. Navigate to Package Directory + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +### 2. Check and Create Virtual Environment + +Check if the virtual environment already exists: + +```bash +if [ -d ".venv" ]; then + echo "Virtual environment already exists at .venv" +else + echo "Creating virtual environment..." + python -m venv .venv + echo "Virtual environment created at .venv" +fi +``` + +### 3. Activate Virtual Environment + +**On Linux/macOS:** +```bash +source .venv/bin/activate +``` + +**On Windows (PowerShell):** +```powershell +.venv\Scripts\Activate.ps1 +``` + +**On Windows (Command Prompt):** +```cmd +.venv\Scripts\activate.bat +``` + +Verify activation by checking Python location: +```bash +which python # Linux/macOS +# where python # Windows +``` + +### 4. Install Dependencies + +Install the SDK and all development dependencies: + +```bash +pip install -e . +pip install -r dev_requirements.txt +``` + +This installs: +- `aiohttp` - Required for async operations +- `python-dotenv` - For loading `.env` files +- `azure-identity` - For `DefaultAzureCredential` authentication +- `pytest-xdist` - For parallel test execution + +### 5. Configure Environment Variables + +Check if `.env` file exists in the package root directory: + +```bash +if [ -f ".env" ]; then + echo ".env file already exists" +else + echo "Copying env.sample to .env..." + cp env.sample .env + echo "Created .env - Please configure the required variables" +fi +``` + +### 6. 
Required Environment Variables + +After copying, edit `.env` and configure these **required** variables: + +| Variable | Description | Required For | +|----------|-------------|--------------| +| `CONTENTUNDERSTANDING_ENDPOINT` | Microsoft Foundry resource endpoint URL (e.g., `https://.services.ai.azure.com/`) | All samples | +| `CONTENTUNDERSTANDING_KEY` | API key (optional if using DefaultAzureCredential) | API key authentication | +| `GPT_4_1_DEPLOYMENT` | GPT-4.1 deployment name in Microsoft Foundry | sample_update_defaults.py | +| `GPT_4_1_MINI_DEPLOYMENT` | GPT-4.1-mini deployment name | sample_update_defaults.py | +| `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` | text-embedding-3-large deployment name | sample_update_defaults.py | + +**Example `.env` configuration:** +```bash +CONTENTUNDERSTANDING_ENDPOINT=https://.services.ai.azure.com/ +CONTENTUNDERSTANDING_KEY= # Optional if using DefaultAzureCredential +GPT_4_1_DEPLOYMENT=gpt-4.1 +GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini +TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large +``` + +### 7. Verify Setup + +Test the environment by running a sample: + +```bash +cd samples +python sample_update_defaults.py +``` + +## Complete Setup Script (Linux/macOS) + +Run the automated setup script: + +```bash +# From the package directory +.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh +``` + +Or run all steps manually: + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +# Create venv if not exists +[ ! -d ".venv" ] && python -m venv .venv + +# Activate +source .venv/bin/activate + +# Install dependencies +pip install -e . +pip install -r dev_requirements.txt + +# Copy env.sample if .env doesn't exist +[ ! -f ".env" ] && cp env.sample .env && echo "Created .env - configure required variables" + +echo "Setup complete! Edit .env with your configuration." +``` + +## Troubleshooting + +**Error: "python: command not found"** +- Ensure Python 3.9+ is installed +- Try using `python3` instead of `python` + +**Error: "pip: command not found" after activation** +- The venv may not have pip. Run: `python -m ensurepip --upgrade` + +**Error: "ModuleNotFoundError" when running samples** +- Ensure venv is activated: `source .venv/bin/activate` +- Reinstall dependencies: `pip install -r dev_requirements.txt` + +**Error: "Access denied" or authentication failures** +- Check `CONTENTUNDERSTANDING_ENDPOINT` is correct +- If using API key, verify `CONTENTUNDERSTANDING_KEY` is set +- If using DefaultAzureCredential, run `az login` first diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh new file mode 100755 index 000000000000..3c98dd0495d5 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh @@ -0,0 +1,110 @@ +#!/bin/bash +# Setup script for azure-ai-contentunderstanding Python development environment +# This script sets up venv, installs dependencies, and configures environment variables + +set -e + +# Determine script directory and package root +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." 
&& pwd)" + +echo "=== Azure AI Content Understanding - Python Environment Setup ===" +echo "Package root: $PACKAGE_ROOT" +echo "" + +cd "$PACKAGE_ROOT" + +# Step 1: Check and create virtual environment +echo "Step 1: Checking virtual environment..." +if [ -d ".venv" ]; then + echo " ✓ Virtual environment already exists at .venv" +else + echo " Creating virtual environment..." + python3 -m venv .venv || python -m venv .venv + echo " ✓ Virtual environment created at .venv" +fi +echo "" + +# Step 2: Activate virtual environment +echo "Step 2: Activating virtual environment..." +source .venv/bin/activate +echo " ✓ Virtual environment activated" +echo " Python: $(which python)" +echo "" + +# Step 3: Install dependencies +echo "Step 3: Installing dependencies..." +echo " Installing package in editable mode..." +pip install -e . --quiet +echo " Installing dev requirements..." +pip install -r dev_requirements.txt --quiet +echo " ✓ Dependencies installed" +echo "" + +# Step 4: Configure environment variables +echo "Step 4: Configuring environment variables..." +if [ -f ".env" ]; then + echo " ✓ .env file already exists" +else + if [ -f "env.sample" ]; then + cp env.sample .env + echo " ✓ Created .env from env.sample" + echo "" + echo " ⚠ Please configure the required variables in .env:" + echo "" + echo " Required variables:" + echo " CONTENTUNDERSTANDING_ENDPOINT - Your Microsoft Foundry endpoint URL" + echo " CONTENTUNDERSTANDING_KEY - API key (optional if using DefaultAzureCredential)" + echo "" + echo " For running sample_update_defaults.py:" + echo " GPT_4_1_DEPLOYMENT - Your GPT-4.1 deployment name" + echo " GPT_4_1_MINI_DEPLOYMENT - Your GPT-4.1-mini deployment name" + echo " TEXT_EMBEDDING_3_LARGE_DEPLOYMENT - Your text-embedding-3-large deployment name" + echo "" + + # Ask user if they want to configure now + read -p " Would you like to configure required variables now? 
(y/N): " configure_now + if [[ "$configure_now" =~ ^[Yy]$ ]]; then + echo "" + read -p " Enter CONTENTUNDERSTANDING_ENDPOINT: " endpoint + if [ -n "$endpoint" ]; then + sed -i "s|CONTENTUNDERSTANDING_ENDPOINT=.*|CONTENTUNDERSTANDING_ENDPOINT=$endpoint|" .env + fi + + read -p " Enter CONTENTUNDERSTANDING_KEY (press Enter to skip for DefaultAzureCredential): " api_key + if [ -n "$api_key" ]; then + sed -i "s|CONTENTUNDERSTANDING_KEY=.*|CONTENTUNDERSTANDING_KEY=$api_key|" .env + fi + + read -p " Enter GPT_4_1_DEPLOYMENT (default: gpt-4.1): " gpt41 + gpt41="${gpt41:-gpt-4.1}" + sed -i "s|GPT_4_1_DEPLOYMENT=.*|GPT_4_1_DEPLOYMENT=$gpt41|" .env + + read -p " Enter GPT_4_1_MINI_DEPLOYMENT (default: gpt-4.1-mini): " gpt41mini + gpt41mini="${gpt41mini:-gpt-4.1-mini}" + sed -i "s|GPT_4_1_MINI_DEPLOYMENT=.*|GPT_4_1_MINI_DEPLOYMENT=$gpt41mini|" .env + + read -p " Enter TEXT_EMBEDDING_3_LARGE_DEPLOYMENT (default: text-embedding-3-large): " embedding + embedding="${embedding:-text-embedding-3-large}" + sed -i "s|TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=.*|TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=$embedding|" .env + + echo "" + echo " ✓ Environment variables configured" + fi + else + echo " ⚠ env.sample not found, skipping .env creation" + fi +fi +echo "" + +# Summary +echo "=== Setup Complete ===" +echo "" +echo "To activate the virtual environment in a new terminal:" +echo " cd $PACKAGE_ROOT" +echo " source .venv/bin/activate" +echo "" +echo "To run samples:" +echo " cd samples" +echo " python sample_update_defaults.py" +echo "" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/SKILL.md new file mode 100644 index 000000000000..190db0588a3f --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/SKILL.md @@ -0,0 +1,192 @@ +--- +name: sdkinternal-py-sample-run-all-samples +description: "Run all samples (sync and async) for the azure-ai-contentunderstanding SDK. Use when you need to verify samples work correctly after SDK changes." +--- + +# Run All Samples + +This skill runs all Python samples for the `azure-ai-contentunderstanding` SDK to verify they execute correctly. + +## Prerequisites + +- Python >= 3.9 +- A Microsoft Foundry resource with required model deployments (gpt-4.1, gpt-4.1-mini, text-embedding-3-large) +- **Cognitive Services User** role assigned to your account +- Sample files available in `samples/sample_files/` + +## Setup + +### Step 1: Set Up Virtual Environment (Optional) + +The script automatically handles virtual environment setup. It will: +- Use your active virtual environment if one is detected +- Activate `.venv` if it exists in the package directory +- Create `.venv` and install dependencies if neither exists + +**To manually set up** (from the package directory `sdk/contentunderstanding/azure-ai-contentunderstanding`): + +```bash +# Create virtual environment (only needed once) +python -m venv .venv + +# Activate virtual environment +source .venv/bin/activate # On Linux/macOS +# .venv\Scripts\activate # On Windows + +# Install SDK (editable) and all dependencies +pip install -e . +pip install -r dev_requirements.txt # Includes aiohttp, pytest, python-dotenv, azure-identity +``` + +### Step 2: Configure Environment Variables (Optional) + +The script automatically copies `env.sample` to `.env` if no `.env` exists. 
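+For context, the samples consume the endpoint and credential settings from `.env` roughly as in the sketch below. This is illustrative only: it mirrors the key-or-`DefaultAzureCredential` behavior described in this document, but the exact wiring inside each shipped sample may differ.
+
+```python
+import os
+
+from dotenv import load_dotenv
+from azure.core.credentials import AzureKeyCredential
+from azure.identity import DefaultAzureCredential
+
+load_dotenv()  # picks up the .env file created from env.sample
+
+endpoint = os.environ["CONTENTUNDERSTANDING_ENDPOINT"]
+key = os.environ.get("CONTENTUNDERSTANDING_KEY")
+
+# Prefer API key auth when a key is provided; otherwise fall back to Entra ID
+credential = AzureKeyCredential(key) if key else DefaultAzureCredential()
+print(f"Using endpoint {endpoint} with {type(credential).__name__}")
+```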
+ +**To manually set up**, create a `.env` file in the package root directory: + +```bash +cp env.sample .env +``` + +Edit `.env` with your values: + +```bash +CONTENTUNDERSTANDING_ENDPOINT=https://.services.ai.azure.com/ +# Optional: If omitted, DefaultAzureCredential is used +CONTENTUNDERSTANDING_KEY= + +# Required for sample_update_defaults.py (one-time setup) +GPT_4_1_DEPLOYMENT=gpt-4.1 +GPT_4_1_MINI_DEPLOYMENT=gpt-4.1-mini +TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=text-embedding-3-large +``` + +**Note:** If `CONTENTUNDERSTANDING_KEY` is not set, the SDK uses `DefaultAzureCredential`. Ensure you have authenticated (e.g., `az login`). + +### Step 3: Configure Model Deployments (One-Time) + +Before running samples that use prebuilt analyzers, you must configure default model mappings: + +```bash +cd samples +python sample_update_defaults.py +``` + +This maps your deployed models to those required by prebuilt analyzers. + +## Quick Start + +The script automatically sets up the environment if needed. It's **safe to rerun** - existing `.venv` and `.env` files are never overwritten. + +From the `.github/skills/sdkinternal-py-sample-run-all-samples/scripts/` directory: + +```bash +./run_all_samples.sh +``` + +Or from the package root: + +```bash +.github/skills/sdkinternal-py-sample-run-all-samples/scripts/run_all_samples.sh +``` + +## What It Does + +1. **Automatic virtual environment handling** (safe - never overwrites existing): + - If already in an active venv → uses it + - If `.venv` exists in the package directory → activates it (does NOT reinstall) + - If no venv exists → creates `.venv`, installs SDK (`pip install -e .`) and dependencies +2. **Automatic .env configuration** (safe - never overwrites existing): + - If `.env` exists → uses it + - If `.env` doesn't exist but `env.sample` does → copies `env.sample` to `.env` +3. Runs all sync samples (`samples/*.py`) +4. Runs all async samples (`samples/async_samples/*.py`) +5. Logs all output to a timestamped log file in the current directory + +## Script Location + +The bash script is located at: +``` +.github/skills/sdkinternal-py-sample-run-all-samples/scripts/run_all_samples.sh +``` + +## Configuration + +The script uses the following defaults: +- **Python command**: `python` (can be changed via `VENV_PY` variable) +- **Log file**: `run_samples_YYYYMMDD_HHMMSS.log` in the current directory + +To use a specific Python interpreter: + +```bash +VENV_PY=python3.11 ./run_all_samples.sh +``` + +## Expected Output + +The script creates a log file with: +- Timestamp of the run +- Output from each sample (grouped by sync/async) +- Summary of completion + +Example log output: +``` +=== Azure SDK Samples Run - 2026-01-30 10:30:00 === +--------------------------------------------- +Running sample (sync): sample_analyze_document.py +--------------------------------------------- +[sample output here] + +--------------------------------------------- +Running sample (async): sample_analyze_document_async.py +--------------------------------------------- +[sample output here] + +============================================= +All samples finished. Log saved to run_samples_20260130_103000.log. +``` + +## Troubleshooting + +### Sample fails to find sample_files + +Make sure to run samples from the `samples/` directory (the script handles this automatically). 
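+If a sample has to run from a different working directory, resolving paths relative to the sample file itself sidesteps the problem. A minimal sketch (the file name `invoice.pdf` is a placeholder, not necessarily one of the shipped sample files):
+
+```python
+from pathlib import Path
+
+# Locate sample_files/ next to this script instead of relying on the current working directory
+sample_files_dir = Path(__file__).resolve().parent / "sample_files"
+pdf_path = sample_files_dir / "invoice.pdf"
+print(pdf_path, "exists:", pdf_path.exists())
+```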
+ +### Missing credentials / Access denied + +Ensure Azure credentials are configured: +- Set `CONTENTUNDERSTANDING_ENDPOINT` environment variable +- For API key auth: set `CONTENTUNDERSTANDING_KEY` +- For DefaultAzureCredential: run `az login` +- Ensure you have the **Cognitive Services User** role assigned + +### Model deployment not found + +- Ensure you have deployed the required models (gpt-4.1, gpt-4.1-mini, text-embedding-3-large) in Microsoft Foundry +- Run `sample_update_defaults.py` to configure default model mappings +- Check that deployment names in `.env` match what you created in Foundry + +### Import errors + +Ensure the SDK and dependencies are installed: +```bash +pip install -e . +pip install -r dev_requirements.txt +``` + +### aiohttp not installed (async samples) + +The async transport requires aiohttp: +```bash +pip install aiohttp +``` +Or install all dev dependencies: `pip install -r dev_requirements.txt` + +## Related Files + +- `samples/*.py` - Sync samples +- `samples/async_samples/*.py` - Async samples +- `samples/sample_files/` - Sample input files (PDFs, images, etc.) +- `.env` - Environment configuration (created from `env.sample` if not exists) +- `env.sample` - Template for environment variables +- `dev_requirements.txt` - Development dependencies diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/scripts/run_all_samples.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/scripts/run_all_samples.sh new file mode 100755 index 000000000000..f28ac280693b --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sample-run-all-samples/scripts/run_all_samples.sh @@ -0,0 +1,122 @@ +#!/bin/bash +# =============================== +# Run All Samples Script +# =============================== +# This script runs all sync and async samples for azure-ai-contentunderstanding +# and logs the output to a timestamped log file. + +set -e + +# =============================== +# Settings +# =============================== +VENV_PY="${VENV_PY:-python}" +CURRENT_DIR="$(pwd)" +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_DIR="$(cd "$SCRIPT_DIR/../../../.." 2>/dev/null && pwd)" || PACKAGE_DIR="" +SAMPLES_PARENT="$(cd "$SCRIPT_DIR/../../../../samples" 2>/dev/null && pwd)" || SAMPLES_PARENT="" +SAMPLES_SUB="async_samples" +LOG_FILE="$CURRENT_DIR/run_samples_$(date +%Y%m%d_%H%M%S).log" +VENV_DIR="$PACKAGE_DIR/.venv" + +# =============================== +# Virtual environment setup +# =============================== +if [[ -z "$VIRTUAL_ENV" ]]; then + # Not in a virtual environment - check if .venv exists + if [[ -d "$VENV_DIR" && -f "$VENV_DIR/bin/activate" ]]; then + echo "Virtual environment already exists at $VENV_DIR" + echo "Activating..." + source "$VENV_DIR/bin/activate" + VENV_PY="python" + else + echo "No virtual environment found at $VENV_DIR" + echo "Creating virtual environment..." + python -m venv "$VENV_DIR" + source "$VENV_DIR/bin/activate" + VENV_PY="python" + + echo "Installing dependencies..." 
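+        # Editable install so local SDK changes take effect without reinstalling; dev_requirements adds aiohttp, pytest, python-dotenv, and azure-identity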
+ pip install --upgrade pip + pip install -e "$PACKAGE_DIR" + if [[ -f "$PACKAGE_DIR/dev_requirements.txt" ]]; then + pip install -r "$PACKAGE_DIR/dev_requirements.txt" + fi + echo "" + fi +else + echo "Using active virtual environment: $VIRTUAL_ENV" +fi + +# =============================== +# Check .env configuration +# =============================== +ENV_FILE="$PACKAGE_DIR/.env" +ENV_SAMPLE="$PACKAGE_DIR/env.sample" + +if [[ ! -f "$ENV_FILE" ]]; then + if [[ -f "$ENV_SAMPLE" ]]; then + echo "No .env file found. Copying env.sample to .env..." + cp "$ENV_SAMPLE" "$ENV_FILE" + echo "Created .env - Please configure the required variables before running samples." + echo "Required: CONTENTUNDERSTANDING_ENDPOINT" + echo "" + else + echo "WARNING: No .env or env.sample found. Samples may fail without configuration." + echo "" + fi +else + echo "Using existing .env configuration" +fi + +echo "=== Azure SDK Samples Run - $(date '+%Y-%m-%d %H:%M:%S') ===" | tee "$LOG_FILE" +echo "Using Python: $VENV_PY ($(which $VENV_PY 2>/dev/null || echo 'not found'))" | tee -a "$LOG_FILE" + +# =============================== +# Validate samples directory +# =============================== +if [[ -z "$SAMPLES_PARENT" || ! -d "$SAMPLES_PARENT" ]]; then + echo "Sample directory not found: $SAMPLES_PARENT" | tee -a "$LOG_FILE" + exit 1 +fi + +# =============================== +# Run each sample +# =============================== +cd "$SAMPLES_PARENT" + +# Run sync samples located at the samples root (so `sample_files/...` resolves) +for sample in ./*.py; do + if [[ -f "$sample" ]]; then + sample_name="$(basename "$sample")" + echo "---------------------------------------------" | tee -a "$LOG_FILE" + echo "Running sample (sync): $sample_name" | tee -a "$LOG_FILE" + echo "---------------------------------------------" | tee -a "$LOG_FILE" + + $VENV_PY "./$sample_name" 2>&1 | tee -a "$LOG_FILE" || true + + echo "" | tee -a "$LOG_FILE" + fi +done + +# Run async samples in the `async_samples` subfolder +if [[ -d "$SAMPLES_SUB" ]]; then + for sample in "$SAMPLES_SUB"/*.py; do + if [[ -f "$sample" ]]; then + sample_name="$(basename "$sample")" + echo "---------------------------------------------" | tee -a "$LOG_FILE" + echo "Running sample (async): $sample_name" | tee -a "$LOG_FILE" + echo "---------------------------------------------" | tee -a "$LOG_FILE" + + # Run the script from the samples parent directory so relative sample_files paths resolve + $VENV_PY "./$SAMPLES_SUB/$sample_name" 2>&1 | tee -a "$LOG_FILE" || true + + echo "" | tee -a "$LOG_FILE" + fi + done +fi + +cd "$CURRENT_DIR" + +echo "=============================================" | tee -a "$LOG_FILE" +echo "All samples finished. Log saved to $LOG_FILE." | tee -a "$LOG_FILE" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/SKILL.md new file mode 100644 index 000000000000..2776b9570cf7 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/SKILL.md @@ -0,0 +1,463 @@ +--- +name: sdkinternal-py-sdk-pre-pr-check +description: "Run all required CI checks before creating a PR for the azure-ai-contentunderstanding SDK. Use this skill to validate code quality, type checking, documentation, and tests before submitting changes." 
+--- + +# Pre-PR Check Workflow for azure-ai-contentunderstanding + +This skill guides you through running all required CI checks locally before creating a pull request. Running these checks locally helps catch issues early and reduces PR review cycles. + +## Overview + +The Azure SDK for Python CI runs these **required checks**: +- **Pylint** - Code linting and Azure SDK guidelines +- **MyPy** - Static type checking +- **Pyright** - Static type checking (catches different issues than MyPy) +- **Black** - Code formatting +- **Bandit** - Security linting + +Additionally, there are **release-blocking checks**: +- **Sphinx** - Documentation generation/validation +- **Tests - CI** - Recorded tests in playback mode + +Optional checks: +- **Verifytypes** - Type completeness verification + +## Prerequisites + +- Python >= 3.9 with pip and venv +- Virtual environment activated with dev dependencies + +### Check and Create Virtual Environment + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +# Check if venv exists, create if not +if [ -d ".venv" ]; then + echo "Virtual environment already exists at .venv" +else + echo "Creating virtual environment..." + python -m venv .venv + echo "Virtual environment created at .venv" +fi +``` + +### Activate Virtual Environment + +**On Linux/macOS:** +```bash +source .venv/bin/activate +``` + +**On Windows (PowerShell):** +```powershell +.venv\Scripts\Activate.ps1 +``` + +**On Windows (Command Prompt):** +```cmd +.venv\Scripts\activate.bat +``` + +Verify activation: +```bash +which python # Should show .venv/bin/python +``` + +### Install Dependencies + +```bash +pip install -e . +pip install -r dev_requirements.txt +pip install "tox<5" black bandit +``` + +### Complete Setup Script + +Alternatively, run the automated setup script: + +```bash +.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh +``` + +## Package Directory + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +## Quick Reference + +All tox commands use this pattern from the package directory: + +```bash +tox -e -c ../../../eng/tox/tox.ini --root . +``` + +**Tip:** For faster package installation, prefix with `TOX_PIP_IMPL=uv`: + +```bash +TOX_PIP_IMPL=uv tox -e pylint -c ../../../eng/tox/tox.ini --root . +``` + +--- + +## Step 1: Run Pylint (Required - Release Blocking) + +Pylint checks code style and adherence to Python/Azure SDK guidelines. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +tox -e pylint -c ../../../eng/tox/tox.ini --root . +``` + +### Common Pylint Issues and Fixes + +| Error | Solution | +|-------|----------| +| `line-too-long` | Break long lines, max 120 characters | +| `missing-docstring` | Add docstrings to public methods/classes | +| `unused-import` | Remove unused imports | +| `protected-access` | Use `# pylint:disable=protected-access` if intentional | +| `client-method-missing-type-annotations` | Add type hints to public methods | + +### Reference Documentation +- [Pylint Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/pylint_checking.md) +- [Custom Azure Pylint Checkers](https://github.com/Azure/azure-sdk-tools/blob/main/tools/pylint-extensions/azure-pylint-guidelines-checker/README.md) + +--- + +## Step 2: Run MyPy (Required - Release Blocking) + +MyPy performs static type checking to catch type errors before runtime. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +tox -e mypy -c ../../../eng/tox/tox.ini --root . 
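+# A clean run ends with mypy reporting "Success: no issues found" for the checked sources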
+``` + +### Common MyPy Issues and Fixes + +| Error | Solution | +|-------|----------| +| `Missing type parameters` | Add generic type parameters, e.g., `List[str]` not `List` | +| `Incompatible types` | Fix type mismatches or use `Union` types | +| `Missing return statement` | Add return statement or annotate with `-> None` | +| `Argument has incompatible type` | Check parameter types match expected types | +| `Cannot find module` | Add stub package or use `# type: ignore` with issue link | + +### Ignoring Specific Errors + +```python +# Globally ignore (both mypy and pyright): +value: int = some_function() # type: ignore # https://github.com/issue-link + +# MyPy specific: +value: int = some_function() # type: ignore[misc] +``` + +### Reference Documentation +- [Static Type Checking Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/static_type_checking.md) +- [Static Type Checking Cheat Sheet](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/static_type_checking_cheat_sheet.md) + +--- + +## Step 3: Run Sphinx (Required - Release Blocking) + +Sphinx validates documentation and docstrings. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +tox -e sphinx -c ../../../eng/tox/tox.ini --root . +``` + +### Common Sphinx Issues and Fixes + +| Error | Solution | +|-------|----------| +| `undefined label` | Fix cross-reference targets | +| `duplicate label` | Remove duplicate labels/references | +| `unknown directive` | Check directive syntax (`:param:`, `:returns:`, etc.) | +| Missing docstring | Add docstrings with proper RST formatting | + +### Docstring Format + +```python +def my_method(self, param1: str, param2: int) -> bool: + """Short description of the method. + + :param str param1: Description of param1. + :param int param2: Description of param2. + :returns: Description of return value. + :rtype: bool + :raises ValueError: When validation fails. + """ +``` + +### Reference Documentation +- [Docstring Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/docstring.md) + +--- + +## Step 4: Run Tests in Playback Mode (Required - Release Blocking) + +Run recorded tests to verify functionality without making live API calls. 
+ +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +AZURE_TEST_RUN_LIVE=false pytest +``` + +### Run Specific Tests + +```bash +# Run a specific test file +AZURE_TEST_RUN_LIVE=false pytest tests/test_content_understanding_content_analyzers_operations.py + +# Run a specific test method +AZURE_TEST_RUN_LIVE=false pytest tests/test_content_understanding_content_analyzers_operations.py::TestContentUnderstandingContentAnalyzersOperations::test_content_analyzers_get + +# Run with verbose output +AZURE_TEST_RUN_LIVE=false pytest -v + +# Run with print output visible +AZURE_TEST_RUN_LIVE=false pytest -s +``` + +### Test Troubleshooting + +| Issue | Solution | +|-------|----------| +| `Connection refused` errors | Ensure test-proxy is running (automatic via conftest.py fixture) | +| Missing recordings | Recordings may need fetching: `python scripts/manage_recordings.py restore` | +| Test assertion failures | Check if API changes broke assertions; update tests or re-record | +| `ModuleNotFoundError` | Reinstall dependencies: `pip install -r dev_requirements.txt` | + +### Re-recording Tests + +If tests need new recordings due to API changes: + +```bash +AZURE_TEST_RUN_LIVE=true pytest +# Then push recordings: +python scripts/manage_recordings.py push -p sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json +``` + +### Reference Documentation +- [Testing Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/tests.md) +- [Test Proxy Troubleshooting](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/test_proxy_troubleshooting.md) + +--- + +## Step 5: Run Pyright (Required - CI Blocking) + +Pyright is a static type checker that catches different issues than MyPy. CI runs this check and will fail if there are errors. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +tox -e pyright -c ../../../eng/tox/tox.ini --root . +``` + +### Common Pyright Issues and Fixes + +| Error | Solution | +|-------|----------| +| `reportInvalidTypeArguments` | Fix generic type arguments or add `# pyright: ignore[reportInvalidTypeArguments]` | +| `reportPrivateUsage` | Use public API or add `# pyright: ignore[reportPrivateUsage]` | +| `reportReturnType` | Fix return type mismatch or add `# pyright: ignore[reportReturnType]` | + +### Ignoring Pyright Errors + +```python +value: int = some_function() # pyright: ignore[reportPrivateUsage] +``` + +**Important:** The `# pyright: ignore` comment must be on the same line as the error. + +--- + +## Step 6: Run Black (Required - CI Blocking) + +Black is an opinionated code formatter. CI runs this check and will fail if code is not formatted correctly. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +pip install black +black --check azure/ +``` + +### Auto-format Code + +To automatically format your code: + +```bash +black azure/ +``` + +--- + +## Step 7: Run Bandit (Required - CI Blocking) + +Bandit is a security linter that finds common security issues in Python code. 
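+To make the findings concrete, here is an illustrative (not project-specific) example of the `B101` pattern from the table below and its usual fix:
+
+```python
+def get_analyzer(analyzer_id: str) -> str:
+    # Flagged by Bandit (B101): asserts are stripped when Python runs with -O, so they must not guard runtime input
+    assert analyzer_id, "analyzer_id is required"
+    return analyzer_id
+
+
+def get_analyzer_checked(analyzer_id: str) -> str:
+    # Preferred: raise an explicit exception that survives optimized mode
+    if not analyzer_id:
+        raise ValueError("analyzer_id is required")
+    return analyzer_id
+```
+
+Run the check itself with: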
+ +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +pip install bandit +bandit -r azure/ -c ../../../eng/bandit.yml +``` + +### Common Bandit Issues + +| Issue | Solution | +|-------|----------| +| `B101: assert_used` | Use proper error handling instead of assert in production code | +| `B105: hardcoded_password` | Move credentials to environment variables | +| `B608: hardcoded_sql_expressions` | Use parameterized queries | + +--- + +## Step 8: Run Verifytypes (Optional) + +Verifytypes measures type completeness of the library. + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +tox -e verifytypes -c ../../../eng/tox/tox.ini --root . +``` + +--- + +## Complete Pre-PR Check Script + +Run all required checks in sequence: + +```bash +#!/bin/bash +set -e + +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +echo "=== Running Pylint ===" +tox -e pylint -c ../../../eng/tox/tox.ini --root . + +echo "=== Running MyPy ===" +tox -e mypy -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Pyright ===" +tox -e pyright -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Black ===" +black --check azure/ + +echo "=== Running Bandit ===" +bandit -r azure/ -x azure/**/tests/** || true # May have config file + +echo "=== Running Sphinx ===" +tox -e sphinx -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Tests (Playback) ===" +AZURE_TEST_RUN_LIVE=false pytest + +echo "=== All required checks passed! ===" +``` + +### Faster Execution with uv + +```bash +#!/bin/bash +set -e + +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +echo "=== Running Pylint ===" +TOX_PIP_IMPL=uv tox -e pylint -c ../../../eng/tox/tox.ini --root . + +echo "=== Running MyPy ===" +TOX_PIP_IMPL=uv tox -e mypy -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Pyright ===" +TOX_PIP_IMPL=uv tox -e pyright -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Black ===" +black --check azure/ + +echo "=== Running Bandit ===" +bandit -r azure/ -x azure/**/tests/** || true + +echo "=== Running Sphinx ===" +TOX_PIP_IMPL=uv tox -e sphinx -c ../../../eng/tox/tox.ini --root . + +echo "=== Running Tests (Playback) ===" +AZURE_TEST_RUN_LIVE=false pytest + +echo "=== All required checks passed! 
===" +``` + +--- + +## Additional Tox Environments + +These additional tox environments are available for specific needs: + +| Environment | Description | Command | +|-------------|-------------|---------| +| `next-pylint` | Test with upcoming Pylint version | `tox -e next-pylint -c ../../../eng/tox/tox.ini --root .` | +| `next-mypy` | Test with upcoming MyPy version | `tox -e next-mypy -c ../../../eng/tox/tox.ini --root .` | +| `next-pyright` | Test with upcoming Pyright version | `tox -e next-pyright -c ../../../eng/tox/tox.ini --root .` | +| `whl` | Build wheel package | `tox -e whl -c ../../../eng/tox/tox.ini --root .` | +| `sdist` | Build source distribution | `tox -e sdist -c ../../../eng/tox/tox.ini --root .` | +| `samples` | Run all samples | `tox -e samples -c ../../../eng/tox/tox.ini --root .` | +| `apistub` | Generate API stub for APIView | `tox -e apistub -c ../../../eng/tox/tox.ini --root .` | + +--- + +## Troubleshooting + +### Tox Issues + +| Issue | Solution | +|-------|----------| +| `tox: command not found` | Install tox: `pip install "tox<5"` | +| Tox version mismatch | Upgrade: `pip install --upgrade "tox<5"` | +| Slow package installation | Use uv: `TOX_PIP_IMPL=uv tox -e ...` | +| `.tox` cache issues | Delete `.tox` directory and retry | + +### General Issues + +| Issue | Solution | +|-------|----------| +| Import errors | Ensure package is installed: `pip install -e .` | +| Missing dependencies | Run: `pip install -r dev_requirements.txt` | +| Wrong Python version | Ensure Python >= 3.9 is active in venv | +| Check disabled in CI | Remove `check = false` from `pyproject.toml` | + +--- + +## Pre-PR Checklist + +Before creating a PR, verify: + +- [ ] **Pylint** passes with no errors +- [ ] **MyPy** passes with no errors +- [ ] **Pyright** passes with no errors +- [ ] **Black** formatting is correct +- [ ] **Bandit** security checks pass +- [ ] **Sphinx** passes with no errors +- [ ] **Tests** pass in playback mode +- [ ] **CHANGELOG.md** is updated with changes +- [ ] **Version** in `_version.py` is updated (if releasing) +- [ ] **Samples** work correctly (if modified) +- [ ] **Recordings** are updated (if API changes) + +--- + +## Reference Documentation + +- [Repo Health Status](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/repo_health_status.md) - Required checks explained +- [Engineering System Checks](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/eng_sys_checks.md) - All CI check details +- [Pylint Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/pylint_checking.md) +- [Static Type Checking Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/static_type_checking.md) +- [Testing Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/tests.md) +- [Docstring Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/docstring.md) +- [Azure SDK Python Design Guidelines](https://azure.github.io/azure-sdk/python_design.html) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/scripts/run_pre_pr_checks.sh b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/scripts/run_pre_pr_checks.sh new file mode 100644 index 000000000000..aff11ab623b7 --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-pre-pr-check/scripts/run_pre_pr_checks.sh @@ -0,0 +1,259 @@ +#!/bin/bash +# Pre-PR Check Script for 
azure-ai-contentunderstanding +# Runs all required CI checks before creating a pull request +# +# Usage: ./run_pre_pr_checks.sh [--fast] [--all] +# --fast: Use uv for faster package installation +# --all: Also run optional checks (verifytypes) + +set -e + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_DIR="$(cd "$SCRIPT_DIR/../../../.." && pwd)" +TOX_INI="$PACKAGE_DIR/../../../eng/tox/tox.ini" + +# Parse arguments +USE_UV=false +RUN_ALL=false + +for arg in "$@"; do + case $arg in + --fast) + USE_UV=true + shift + ;; + --all) + RUN_ALL=true + shift + ;; + *) + ;; + esac +done + +cd "$PACKAGE_DIR" +echo "Running pre-PR checks in: $PACKAGE_DIR" +echo "" + +# Check if virtual environment is activated +if [ -z "$VIRTUAL_ENV" ]; then + echo "==========================================" + echo "WARNING: No virtual environment detected!" + echo "==========================================" + echo "" + echo "Please activate a virtual environment before running this script:" + echo "" + echo " # Create venv if it doesn't exist" + echo " cd $PACKAGE_DIR" + echo " python -m venv .venv" + echo "" + echo " # Activate venv" + echo " source .venv/bin/activate # Linux/macOS" + echo " # .venv\\Scripts\\activate # Windows" + echo "" + echo " # Install dependencies" + echo " pip install -e ." + echo " pip install -r dev_requirements.txt" + echo " pip install \"tox<5\" black bandit" + echo "" + echo "Or run the setup script:" + echo " $PACKAGE_DIR/.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh" + echo "" + read -p "Continue without venv? (y/N) " -n 1 -r + echo "" + if [[ ! $REPLY =~ ^[Yy]$ ]]; then + echo "Exiting. Please activate venv and try again." + exit 1 + fi + echo "" +fi + +# Set tox pip implementation +if $USE_UV; then + export TOX_PIP_IMPL=uv + echo "Using uv for faster package installation" + echo "" +fi + +# Track results +PYLINT_RESULT=0 +MYPY_RESULT=0 +PYRIGHT_RESULT=0 +BLACK_RESULT=0 +BANDIT_RESULT=0 +SPHINX_RESULT=0 +TESTS_RESULT=0 +VERIFYTYPES_RESULT=0 + +# Required checks +echo "==========================================" +echo "=== Step 1/7: Running Pylint (Required) ===" +echo "==========================================" +if tox -e pylint -c "$TOX_INI" --root .; then + echo "✓ Pylint passed" +else + PYLINT_RESULT=1 + echo "✗ Pylint failed" +fi +echo "" + +echo "==========================================" +echo "=== Step 2/7: Running MyPy (Required) ===" +echo "==========================================" +if tox -e mypy -c "$TOX_INI" --root .; then + echo "✓ MyPy passed" +else + MYPY_RESULT=1 + echo "✗ MyPy failed" +fi +echo "" + +echo "==========================================" +echo "=== Step 3/7: Running Pyright (Required) ===" +echo "==========================================" +if tox -e pyright -c "$TOX_INI" --root .; then + echo "✓ Pyright passed" +else + PYRIGHT_RESULT=1 + echo "✗ Pyright failed" +fi +echo "" + +echo "==========================================" +echo "=== Step 4/7: Running Black (Required) ===" +echo "==========================================" +if python -m black --check azure/; then + echo "✓ Black passed" +else + BLACK_RESULT=1 + echo "✗ Black failed (run 'black azure/' to auto-format)" +fi +echo "" + +echo "==========================================" +echo "=== Step 5/7: Running Bandit (Required) ===" +echo "==========================================" +if python -m bandit -r azure/ -x "azure/**/tests/**" -q; then + echo "✓ Bandit passed" +else + BANDIT_RESULT=1 + echo "✗ Bandit failed" +fi +echo "" + +echo 
"==========================================" +echo "=== Step 6/7: Running Sphinx (Required) ===" +echo "==========================================" +if tox -e sphinx -c "$TOX_INI" --root .; then + echo "✓ Sphinx passed" +else + SPHINX_RESULT=1 + echo "✗ Sphinx failed" +fi +echo "" + +echo "==============================================" +echo "=== Step 7/7: Running Tests (Playback Mode) ===" +echo "==============================================" +if AZURE_TEST_RUN_LIVE=false pytest; then + echo "✓ Tests passed" +else + TESTS_RESULT=1 + echo "✗ Tests failed" +fi +echo "" + +# Optional checks (if --all specified) +if $RUN_ALL; then + echo "==========================================" + echo "=== Optional: Running Verifytypes ===" + echo "==========================================" + if tox -e verifytypes -c "$TOX_INI" --root .; then + echo "✓ Verifytypes passed" + else + VERIFYTYPES_RESULT=1 + echo "✗ Verifytypes failed (optional)" + fi + echo "" +fi + +# Summary +echo "==========================================" +echo "=== Summary ===" +echo "==========================================" +echo "" + +REQUIRED_FAILED=0 + +if [ $PYLINT_RESULT -eq 0 ]; then + echo "✓ Pylint: PASSED" +else + echo "✗ Pylint: FAILED (CI blocking)" + REQUIRED_FAILED=1 +fi + +if [ $MYPY_RESULT -eq 0 ]; then + echo "✓ MyPy: PASSED" +else + echo "✗ MyPy: FAILED (CI blocking)" + REQUIRED_FAILED=1 +fi + +if [ $PYRIGHT_RESULT -eq 0 ]; then + echo "✓ Pyright: PASSED" +else + echo "✗ Pyright: FAILED (CI blocking)" + REQUIRED_FAILED=1 +fi + +if [ $BLACK_RESULT -eq 0 ]; then + echo "✓ Black: PASSED" +else + echo "✗ Black: FAILED (CI blocking) - run 'black azure/' to fix" + REQUIRED_FAILED=1 +fi + +if [ $BANDIT_RESULT -eq 0 ]; then + echo "✓ Bandit: PASSED" +else + echo "✗ Bandit: FAILED (CI blocking)" + REQUIRED_FAILED=1 +fi + +if [ $SPHINX_RESULT -eq 0 ]; then + echo "✓ Sphinx: PASSED" +else + echo "✗ Sphinx: FAILED (release blocking)" + REQUIRED_FAILED=1 +fi + +if [ $TESTS_RESULT -eq 0 ]; then + echo "✓ Tests: PASSED" +else + echo "✗ Tests: FAILED (release blocking)" + REQUIRED_FAILED=1 +fi + +if $RUN_ALL; then + if [ $VERIFYTYPES_RESULT -eq 0 ]; then + echo "✓ Verifytypes: PASSED (optional)" + else + echo "✗ Verifytypes: FAILED (optional)" + fi +fi + +echo "" + +if [ $REQUIRED_FAILED -eq 0 ]; then + echo "==========================================" + echo "=== All required checks passed! ===" + echo "=== Ready to create PR ===" + echo "==========================================" + exit 0 +else + echo "==========================================" + echo "=== Some required checks failed! ===" + echo "=== Please fix issues before creating PR ===" + echo "==========================================" + exit 1 +fi diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-update-from-typespec/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-update-from-typespec/SKILL.md new file mode 100644 index 000000000000..fbeab2ebe55c --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-sdk-update-from-typespec/SKILL.md @@ -0,0 +1,538 @@ +--- +name: sdkinternal-py-sdk-update-from-typespec +description: "Update the azure-ai-contentunderstanding SDK from a new TypeSpec commit. Use when the TypeSpec spec has been updated and the SDK needs regeneration." +--- + +# Update SDK from TypeSpec Commit + +This skill guides updating the `azure-ai-contentunderstanding` SDK when a new TypeSpec commit is available. 
+ +## Prerequisites + +- Node.js LTS installed (>= 22.16.0) +- Python >= 3.9 with pip and venv +- tsp-client installed from `eng/common/tsp-client` +- Virtual environment activated with dev dependencies + +### Check and Create Virtual Environment + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +# Check if venv exists, create if not +if [ -d ".venv" ]; then + echo "Virtual environment already exists at .venv" +else + echo "Creating virtual environment..." + python -m venv .venv + echo "Virtual environment created at .venv" +fi +``` + +### Activate Virtual Environment + +**On Linux/macOS:** +```bash +source .venv/bin/activate +``` + +**On Windows (PowerShell):** +```powershell +.venv\Scripts\Activate.ps1 +``` + +**On Windows (Command Prompt):** +```cmd +.venv\Scripts\activate.bat +``` + +Verify activation: +```bash +which python # Should show .venv/bin/python +``` + +### Install Dependencies + +```bash +pip install -e . +pip install -r dev_requirements.txt +pip install "tox<5" black bandit +``` + +### Complete Setup Script + +Alternatively, run the automated setup script: + +```bash +.github/skills/sdkinternal-py-env-setup-venv/scripts/setup_venv.sh +``` + +## Workflow Steps + +### Step 1: Update TypeSpec Commit Reference + +Update the `commit` field in `tsp-location.yaml`: + +```yaml +# File: sdk/contentunderstanding/azure-ai-contentunderstanding/tsp-location.yaml +directory: specification/ai/ContentUnderstanding +commit: # Update this +repo: Azure/azure-rest-api-specs +``` + +### Step 2: Install tsp-client (if not already) + +```bash +cd eng/common/tsp-client +npm ci +``` + +### Step 3: Regenerate SDK + +Run from the tsp-client directory: + +```bash +cd eng/common/tsp-client +npx tsp-client update -o ../../../sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +This regenerates the SDK from the TypeSpec specification. + +### Step 4: Verify Customizations Are Preserved + +Python SDK customizations are applied via `_patch.py` files which are NOT overwritten during regeneration. Verify these files are intact: + +Key patch files with customizations: +- `azure/ai/contentunderstanding/_patch.py` - Sync client (string_encoding, content_type defaults) +- `azure/ai/contentunderstanding/models/_patch.py` - Models (AnalyzeLROPoller, .value property, KeyFrameTimesMs, RecordMergePatchUpdate) +- `azure/ai/contentunderstanding/aio/_patch.py` - Async client (string_encoding, content_type defaults) +- `azure/ai/contentunderstanding/aio/models/_patch.py` - Async models (AnalyzeAsyncLROPoller) + +Empty patch files (no current customizations): +- `azure/ai/contentunderstanding/_operations/_patch.py` +- `azure/ai/contentunderstanding/aio/_operations/_patch.py` +- `azure/ai/contentunderstanding/operations/_patch.py` +- `azure/ai/contentunderstanding/aio/operations/_patch.py` + +### Step 5: Check Fix Status + +Verify each fix listed in the "Current Known Fixes" section below is still applied in the `_patch.py` files. 
+ +| Fix # | Description | Check Location | Verification | +|-------|-------------|----------------|--------------| +| 1 | Default `string_encoding` to "codePoint" | `_patch.py` and `aio/_patch.py` in `begin_analyze` and `begin_analyze_binary` | Has `kwargs["string_encoding"] = "codePoint"` | +| 2 | `AnalyzeLROPoller` with `.operation_id` property (sync) | `models/_patch.py` | `AnalyzeLROPoller` class exists with `operation_id` property, `from_poller()`, `from_continuation_token()` | +| 3 | `AnalyzeAsyncLROPoller` with `.operation_id` property (async) | `aio/models/_patch.py` | `AnalyzeAsyncLROPoller` class exists with `operation_id` property, `from_poller()`, `from_continuation_token()` | +| 4 | `.value` property on ContentField types | `models/_patch.py` in `patch_sdk()` | `_add_value_property_to_field` calls for all 9 field types + `ContentField` base class | +| 5 | `KeyFrameTimesMs` casing normalization | `models/_patch.py` in `patch_sdk()` | `_patched_audio_visual_content_init` handles both casings | +| 6 | `RecordMergePatchUpdate` type alias | `models/_patch.py` | `RecordMergePatchUpdate = Dict[str, str]` | +| 7 | Default `content_type` for `begin_analyze_binary` | `_patch.py` and `aio/_patch.py` | Has `content_type: str = "application/octet-stream"` | +| 8 | Parameter order for analyze methods | `_patch.py` and `aio/_patch.py` | `begin_analyze` convenience: `inputs, model_deployments, processing_location` (no `content_type`); `begin_analyze_binary`: `input_range, content_type, processing_location` | + +**If a fix is now included in the generated code upstream, remove it from this skill document.** + +### Step 6: Run CI Checks + +All CI checks use tox with the configuration from `eng/tox/tox.ini`. Run from the package directory: + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +#### Required CI Checks + +These checks run in CI and must pass: + +**Pylint** (Linting) +```bash +tox -e pylint -c ../../../eng/tox/tox.ini --root . +``` + +**MyPy** (Type Checking) +```bash +tox -e mypy -c ../../../eng/tox/tox.ini --root . +``` + +**Pyright** (Type Checking) +```bash +tox -e pyright -c ../../../eng/tox/tox.ini --root . +``` + +**Black** (Code Formatting) +```bash +black --check azure/ +# To auto-format: black azure/ +``` + +**Bandit** (Security Linting) +```bash +bandit -r azure/ -x "azure/**/tests/**" +``` + +#### Release Blocking Checks + +**Sphinx** (Documentation) +```bash +tox -e sphinx -c ../../../eng/tox/tox.ini --root . +``` + +**Tip:** Use `TOX_PIP_IMPL=uv` for faster package installation: +```bash +TOX_PIP_IMPL=uv tox -e pylint -c ../../../eng/tox/tox.ini --root . +``` + +### Step 7: Run Tests + +Run tests in **playback mode** to verify the regenerated SDK works with existing recordings: + +```bash +cd sdk/contentunderstanding/azure-ai-contentunderstanding +AZURE_TEST_RUN_LIVE=false pytest +``` + +Run specific test: +```bash +AZURE_TEST_RUN_LIVE=false pytest tests/test_content_understanding_content_analyzers_operations.py::TestContentUnderstandingContentAnalyzersOperations::test_content_analyzers_get +``` + +If tests fail: +1. Check if model/API changes broke test assertions — update tests as needed +2. If API behavior changed, re-record with `AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest` +3. 
See `tests/README.md` for detailed testing guide + +### Step 8: Update This Skill Document + +**Important**: Update this SKILL.md file directly with any changes: + +- Add any new fixes discovered during this update +- Remove fixes that are now included in upstream generated code +- Update fix descriptions if implementation details changed + +--- + +## Current Known Fixes + +These fixes address issues in the generated code that are not yet resolved upstream in the TypeSpec emitter. Customizations are applied via `_patch.py` files, which are preserved during regeneration. + +### Fix 1: Default string_encoding to "codePoint" + +**Files**: +- `azure/ai/contentunderstanding/_patch.py` (sync) +- `azure/ai/contentunderstanding/aio/_patch.py` (async) + +**Problem**: The generated code does not set a default value for `string_encoding`, but Python strings use Unicode code points for `len()` and indexing. + +**Why this matters**: +- Python's `str[i]` and `len(str)` operate on Unicode code points +- Without this default, span offsets and lengths returned by the service may not align correctly with Python string operations +- Users would need to remember to always pass `string_encoding="codePoint"` manually + +**Fix**: Override `begin_analyze` and `begin_analyze_binary` to set the default: + +```python +# In begin_analyze and begin_analyze_binary: +kwargs["string_encoding"] = "codePoint" +``` + +--- + +### Fix 2: AnalyzeLROPoller with .operation_id Property (Sync) + +**File**: `azure/ai/contentunderstanding/models/_patch.py` + +**Problem**: The generated LROPoller does not expose the operation ID, which is needed to call `get_result_file()`. + +**Why this matters**: +- Users need the operation ID to retrieve intermediate or final result files from the service +- The operation ID is embedded in the `Operation-Location` header but not easily accessible + +**Fix**: Custom `AnalyzeLROPoller` class that extracts and exposes the operation ID: + +```python +class AnalyzeLROPoller(LROPoller[PollingReturnType_co]): + @property + def operation_id(self) -> str: + operation_location = self.polling_method()._initial_response.http_response.headers["Operation-Location"] + return _parse_operation_id(operation_location) + + @classmethod + def from_poller(cls, poller: LROPoller[PollingReturnType_co]) -> "AnalyzeLROPoller[PollingReturnType_co]": + """Wrap an existing LROPoller without re-initializing the polling method.""" + instance = object.__new__(cls) + instance.__dict__.update(poller.__dict__) + return instance + + @classmethod + def from_continuation_token(cls, polling_method, continuation_token, **kwargs) -> "AnalyzeLROPoller": + """Create a poller from a continuation token.""" + # ... implementation +``` + +--- + +### Fix 3: AnalyzeAsyncLROPoller with .operation_id Property (Async) + +**File**: `azure/ai/contentunderstanding/aio/models/_patch.py` + +**Problem**: Same as Fix 2, but for async operations. + +**Why this matters**: Async users also need access to the operation ID for `get_result_file()`. 
+ +**Fix**: Custom `AnalyzeAsyncLROPoller` class (mirrors sync version): + +```python +class AnalyzeAsyncLROPoller(AsyncLROPoller[PollingReturnType_co]): + @property + def operation_id(self) -> str: + operation_location = self.polling_method()._initial_response.http_response.headers["Operation-Location"] + return _parse_operation_id(operation_location) + + @classmethod + def from_poller(cls, poller: AsyncLROPoller[PollingReturnType_co]) -> "AnalyzeAsyncLROPoller[PollingReturnType_co]": + """Wrap an existing AsyncLROPoller without re-initializing the polling method.""" + instance = object.__new__(cls) + instance.__dict__.update(poller.__dict__) + return instance + + @classmethod + async def from_continuation_token(cls, polling_method, continuation_token, **kwargs): + """Create a poller from a continuation token.""" + # ... implementation +``` + +--- + +### Fix 4: Add .value Property to ContentField Types + +**File**: `azure/ai/contentunderstanding/models/_patch.py` + +**Problem**: The generated ContentField subtypes (StringField, NumberField, etc.) only expose typed properties like `value_string`, `value_number`, etc. This differs from other Azure SDKs which expose a convenient `value` property. + +**Why this matters**: +- Improves developer experience by providing a consistent, simpler way to access field values +- Reduces boilerplate in user code (no need to check field type before accessing value) +- Simplifies samples and documentation + +**Fix**: Add a `value` property to each ContentField subtype AND the base class at runtime in `patch_sdk()`: + +```python +# For each specific subtype: +_add_value_property_to_field(StringField, "value_string", Optional[str]) +_add_value_property_to_field(IntegerField, "value_integer", Optional[int]) +_add_value_property_to_field(NumberField, "value_number", Optional[float]) +_add_value_property_to_field(BooleanField, "value_boolean", Optional[bool]) +_add_value_property_to_field(DateField, "value_date", Optional[str]) +_add_value_property_to_field(TimeField, "value_time", Optional[str]) +_add_value_property_to_field(ArrayField, "value_array", Optional[List[Any]]) +_add_value_property_to_field(ObjectField, "value_object", Optional[Dict[str, Any]]) +_add_value_property_to_field(JsonField, "value_json", Optional[Any]) + +# For the base ContentField class (dynamic lookup): +def _content_field_value_getter(self: ContentField) -> Any: + for attr in ["value_string", "value_integer", "value_number", ...]: + if hasattr(self, attr): + return getattr(self, attr) + return None +``` + +--- + +### Fix 5: KeyFrameTimesMs Property Casing Inconsistency + +**File**: `azure/ai/contentunderstanding/models/_patch.py` + +**Problem**: The Content Understanding service has a known issue where it returns `KeyFrameTimesMs` (PascalCase) instead of the expected `keyFrameTimesMs` (camelCase) for the `AudioVisualContent` type. + +**Why this matters**: Without this fix, the SDK would fail to deserialize the `key_frame_times_ms` property when the service returns PascalCase, causing data loss for users processing video content. 
+ +**Fix**: Patch `AudioVisualContent.__init__` to normalize the casing (forward compatible): + +```python +_original_audio_visual_content_init = _models.AudioVisualContent.__init__ + +def _patched_audio_visual_content_init(self, *args: Any, **kwargs: Any) -> None: + if args and isinstance(args[0], dict): + mapping = dict(args[0]) + # Forward compatible: only normalizes if incorrect casing exists and correct casing doesn't + if "KeyFrameTimesMs" in mapping and "keyFrameTimesMs" not in mapping: + mapping["keyFrameTimesMs"] = mapping["KeyFrameTimesMs"] + args = (mapping,) + args[1:] + _original_audio_visual_content_init(self, *args, **kwargs) + +_models.AudioVisualContent.__init__ = _patched_audio_visual_content_init +``` + +--- + +### Fix 6: RecordMergePatchUpdate Type Alias + +**File**: `azure/ai/contentunderstanding/models/_patch.py` + +**Problem**: `RecordMergePatchUpdate` is a TypeSpec artifact used for model deployments that wasn't generated. + +**Why this matters**: The type is referenced but missing, causing import errors. + +**Fix**: Define as a type alias: + +```python +RecordMergePatchUpdate = Dict[str, str] +``` + +--- + +### Fix 7: Default content_type for begin_analyze_binary + +**Files**: +- `azure/ai/contentunderstanding/_patch.py` (sync) +- `azure/ai/contentunderstanding/aio/_patch.py` (async) + +**Problem**: The generated `begin_analyze_binary` method may not have the correct default `content_type` for binary uploads. + +**Why this matters**: Users uploading binary files (PDFs, images, etc.) should have `application/octet-stream` as the default, not `application/json`. + +**Fix**: Override `begin_analyze_binary` with correct default: + +```python +def begin_analyze_binary( + self, + analyzer_id: str, + binary_input: bytes, + *, + content_type: str = "application/octet-stream", # Correct default for binary + **kwargs: Any, +) -> "AnalyzeLROPoller[_models.AnalyzeResult]": + # ... +``` + +--- + +### Fix 8: Parameter Order for Analyze Methods + +**Files**: +- `azure/ai/contentunderstanding/_patch.py` (sync) +- `azure/ai/contentunderstanding/aio/_patch.py` (async) + +**Problem**: The generated code does not order parameters consistently with other Azure SDKs (e.g., Java). The expected order is to list the most commonly used parameters first. + +**Expected parameter order**: +- `begin_analyze` (convenience): `analyzer_id, *, inputs, model_deployments, processing_location` (no `content_type`) +- `begin_analyze` (body overloads): `analyzer_id, body, *, processing_location, content_type` +- `begin_analyze_binary`: `analyzer_id, binary_input, *, input_range, content_type, processing_location` + +**Why this matters**: Consistent parameter ordering across language SDKs improves developer experience and reduces confusion when porting code between languages. The convenience overload doesn't need `content_type` since it's always JSON. + +**Note**: The generated code includes `content_type` in all `begin_analyze` overloads. We explicitly remove it from the convenience overload (with `inputs`/`model_deployments`) to match the Java SDK API. The `body` overloads retain `content_type` for advanced users who pass raw JSON or binary streams. 
+ +**Fix**: Override methods with correct parameter order: + +```python +# begin_analyze (convenience overload - no content_type) +def begin_analyze( + self, + analyzer_id: str, + *, + inputs: Optional[list[_models.AnalyzeInput]] = None, + model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + **kwargs: Any, +) -> "AnalyzeLROPoller[_models.AnalyzeResult]": + # ... + +# begin_analyze (implementation - keeps content_type internally) +def begin_analyze( + self, + analyzer_id: str, + body: Union[JSON, IO[bytes]] = _Unset, + *, + inputs: Optional[list[_models.AnalyzeInput]] = None, + model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + content_type: Optional[str] = None, + **kwargs: Any, +) -> "AnalyzeLROPoller[_models.AnalyzeResult]": + # ... + +# begin_analyze_binary +def begin_analyze_binary( + self, + analyzer_id: str, + binary_input: bytes, + *, + input_range: Optional[str] = None, + content_type: str = "application/octet-stream", + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + **kwargs: Any, +) -> "AnalyzeLROPoller[_models.AnalyzeResult]": + # ... +``` + +--- + +## Troubleshooting + +### tsp-client: command not found + +Run `npx` from the `eng/common/tsp-client` directory where it's installed: + +```bash +cd eng/common/tsp-client +npx tsp-client update -o ../../../sdk/contentunderstanding/azure-ai-contentunderstanding +``` + +### Missing pip or venv + +Install Python dependencies (Ubuntu): + +```bash +sudo apt install python-is-python3 python3-pip python3.12-venv +``` + +### Pylint/MyPy/Sphinx Failures After Regeneration + +1. Check if generated code introduced new issues +2. Verify `_patch.py` files weren't accidentally modified +3. May need to update customizations for new generated code structure + +### Tox Issues + +1. **Missing tox**: `pip install tox` +2. **Tox version mismatch**: Requires tox >= 4.4.10. Upgrade: `pip install --upgrade tox` +3. **Slow installation**: Use uv: `TOX_PIP_IMPL=uv tox -e pylint ...` + +### Tests Fail + +1. Check if model/API changes broke test assertions — update tests as needed +2. If API behavior changed, re-record: `AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest` +3. See `tests/README.md` for test-proxy troubleshooting (connection refused errors, etc.) + +### Customizations Not Applied + +1. Ensure all 8 `_patch.py` files exist and weren't deleted during regeneration +2. Check that `__all__` exports include customized classes in files that have them +3. Verify `patch_sdk()` is called (happens automatically on import via `__init__.py`) +4. 
Compare current `_patch.py` content against this skill document to ensure all fixes are present + +--- + +## Related Files + +- `tsp-location.yaml` - TypeSpec commit reference +- `azure/ai/contentunderstanding/_patch.py` - Client customizations (sync) - **Fixes 1, 7, 8** +- `azure/ai/contentunderstanding/aio/_patch.py` - Client customizations (async) - **Fixes 1, 7, 8** +- `azure/ai/contentunderstanding/models/_patch.py` - Model customizations - **Fixes 2, 4, 5, 6** +- `azure/ai/contentunderstanding/aio/models/_patch.py` - Async poller customizations - **Fix 3** +- `azure/ai/contentunderstanding/_operations/_patch.py` - Currently empty (no customizations) +- `azure/ai/contentunderstanding/aio/_operations/_patch.py` - Currently empty (no customizations) +- `azure/ai/contentunderstanding/operations/_patch.py` - Currently empty (no customizations) +- `azure/ai/contentunderstanding/aio/operations/_patch.py` - Currently empty (no customizations) +- `tests/README.md` - Testing guide with test modes and test-proxy configuration + +## Reference Documentation + +- [Python SDK Customization Guide](https://aka.ms/azsdk/python/dpcodegen/python/customize) +- [Local SDK Workflow](https://github.com/Azure/azure-sdk-for-python/blob/main/eng/common/instructions/azsdk-tools/local-sdk-workflow.instructions.md) +- [Azure SDK Python Design Guidelines](https://azure.github.io/azure-sdk/python_design.html) +- [Azure SDK Python Testing Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/tests.md) +- [Repo Health Status](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/repo_health_status.md) - Required checks (MyPy, Pylint, Sphinx, Tests) +- [Pylint Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/pylint_checking.md) +- [Static Type Checking Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/static_type_checking.md) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/SKILL.md b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/SKILL.md new file mode 100644 index 000000000000..9cf9b1e00f9c --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/SKILL.md @@ -0,0 +1,300 @@ +--- +name: sdkinternal-py-test-push-recordings +description: "Push test recordings to the azure-sdk-assets repository after recording tests." +--- + +## Purpose + +This skill pushes test recordings for the Azure AI Content Understanding SDK (`azure-ai-contentunderstanding`) to the [azure-sdk-assets](https://github.com/Azure/azure-sdk-assets) repository. After running tests in record mode, recordings must be pushed to the assets repo so they can be used for playback tests in CI. + +## When to Use + +Use this skill when: + +- You've just recorded new tests with `AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true` +- Test recordings have been updated due to API changes +- You need to commit test changes for a PR + +## Prerequisites + +Before pushing recordings, ensure: + +1. You have **write access** to the [Azure/azure-sdk-assets](https://github.com/Azure/azure-sdk-assets) repository + - Requires membership in the `azure-sdk-write` GitHub group + - See [Permissions to azure-sdk-assets](https://dev.azure.com/azure-sdk/internal/_wiki/wikis/internal.wiki/785/Externalizing-Recordings-(Asset-Sync)?anchor=permissions-to-%60azure/azure-sdk-assets%60) +2. 
**Dev dependencies installed** (required for `manage_recordings.py`): + ```bash + # From package directory: sdk/contentunderstanding/azure-ai-contentunderstanding/ + pip install -r dev_requirements.txt + ``` +3. Tests have been run in record mode (`AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest`) +4. The package has a valid `assets.json` file (see "New Package Setup" below) +5. Git is configured with your name and email +6. Git version > 2.30.0 is installed + +## New Package Setup + +First check if `assets.json` already exists: + +```bash +# From repo root: azure-sdk-for-python/ +cd sdk/contentunderstanding/azure-ai-contentunderstanding +cat assets.json +``` + +If `assets.json` already exists (has a `Tag` value), skip to the "Instructions" section. + +If `assets.json` doesn't exist, you need to run the migration script. See the [Recording Migration Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/recording_migration_guide.md) for details. + +The `assets.json` file has this structure: + +```json +{ + "AssetsRepo": "Azure/azure-sdk-assets", + "AssetsRepoPrefixPath": "python", + "TagPrefix": "python/contentunderstanding/azure-ai-contentunderstanding", + "Tag": "python/contentunderstanding/azure-ai-contentunderstanding_e261fca8e6" +} +``` + +## Instructions + +1. **Record tests**: Run tests in live mode to capture recordings + + ```bash + # From repo root: azure-sdk-for-python/ + cd sdk/contentunderstanding/azure-ai-contentunderstanding + AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest + ``` + +2. **Review recording changes** (optional but recommended): Check what recordings changed + + ```bash + # From repo root: azure-sdk-for-python/ + # Find the local recordings directory + python scripts/manage_recordings.py locate -p sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + + # Navigate to that directory and use git to review changes + cd + git status + git diff + ``` + +3. **Push recordings to assets repo**: + + ```bash + # From repo root: azure-sdk-for-python/ + python scripts/manage_recordings.py push -p sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + ``` + +4. **Verify `assets.json` was updated**: The `Tag` field should have a new value + + ```bash + # From repo root: azure-sdk-for-python/ + cat sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + ``` + +5. **Commit the updated `assets.json`**: Include this file in your PR + + ```bash + # From repo root: azure-sdk-for-python/ + git add sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + git commit -m "Update test recording tag" + ``` + +## Example + +```bash +# From repo root: azure-sdk-for-python/ + +# Record tests first (if not already done) +cd sdk/contentunderstanding/azure-ai-contentunderstanding +AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest + +# Push recordings (from repo root) +cd ../../.. +python scripts/manage_recordings.py push -p sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + +# Verify the tag was updated +# From repo root: azure-sdk-for-python/ +cat sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json + +# Commit the change +# From repo root: azure-sdk-for-python/ +git add sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json +git commit -m "Update test recording tag" +``` + +## Using the Script + +This skill includes a script that handles pre-flight checks, git configuration verification, and the push process. 
+ +### Script Location + +```bash +sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/scripts/push_recordings.sh +``` + +### Script Usage + +```bash +# From script directory: +# sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/scripts/ + +# Push recordings +./push_recordings.sh + +# Dry run (see what would be executed) +./push_recordings.sh --dry-run + +# Save output to a custom log file +./push_recordings.sh --log my-push.log +``` + +### Script Features + +- **Virtual environment setup**: Automatically creates and activates `.venv` if not present +- **Dependency installation**: Installs `dev_requirements.txt` if `devtools_testutils` is missing +- **Pre-flight checks**: Verifies git is configured, assets.json exists, Git version is compatible +- **Logging**: Saves output to timestamped log files +- **Dry run**: See what would be executed without running +- **Post-run guidance**: Provides next steps after push completes + +## manage_recordings.py Commands + +The `manage_recordings.py` script supports several verbs: + +| Command | Description | +|---------|-------------| +| `locate` | Print the location of the library's locally cached recordings | +| `push` | Push recording updates to a new assets repo tag and update `assets.json` | +| `show` | Print the contents of the provided `assets.json` file | +| `restore` | Fetch recordings from the assets repo based on the tag in `assets.json` | +| `reset` | Discard any pending changes to recordings | + +### Usage + +```bash +# From repo root: azure-sdk-for-python/ +python scripts/manage_recordings.py -p +``` + +If you run from the package directory containing `assets.json`, the `-p` flag is optional: + +```bash +# From repo root: azure-sdk-for-python/ +cd sdk/contentunderstanding/azure-ai-contentunderstanding + +# From package directory: sdk/contentunderstanding/azure-ai-contentunderstanding/ +python ../../../scripts/manage_recordings.py push +``` + +## What Happens When You Push + +The `manage_recordings.py push` command: + +1. Scans your local recording files +2. Creates a new commit in the azure-sdk-assets repo with your recordings +3. Creates a new Git tag pointing to that commit +4. Updates `assets.json` in your package root with the new tag reference + +## Finding Recording Files + +### Local Recordings + +Recording files are stored locally at: + +``` +azure-sdk-for-python/.assets/ +``` + +To find your package's recordings: + +```bash +# From repo root: azure-sdk-for-python/ +python scripts/manage_recordings.py locate -p sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json +``` + +The output will include an absolute path to the recordings directory. + +### Remote Recordings (Assets Repo) + +The `Tag` in `assets.json` points to your recordings in the assets repo. 
View recordings at: + +``` +https://github.com/Azure/azure-sdk-assets/tree/ +``` + +For example: +``` +https://github.com/Azure/azure-sdk-assets/tree/python/contentunderstanding/azure-ai-contentunderstanding_e261fca8e6 +``` + +## Troubleshooting + +**"ModuleNotFoundError: No module named 'devtools_testutils'"** + +- The `push_recordings.sh` script automatically handles this by setting up venv and installing dependencies +- To fix manually, install the dev requirements from the package directory: + ```bash + # From package directory: sdk/contentunderstanding/azure-ai-contentunderstanding/ + source .venv/bin/activate # Activate venv first + pip install -r dev_requirements.txt + ``` + +**"Permission denied" or "Authentication failed"** + +- Ensure you have membership in the `azure-sdk-write` GitHub group +- Check your Git credentials are configured correctly +- May need to authenticate via `gh auth login` if using GitHub CLI + +**"No recordings to push"** + +- Ensure you've run tests in record mode first: `AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest` +- Check recordings changed: `python scripts/manage_recordings.py locate` then `git status` in that directory + +**"assets.json not found"** + +- See the [Recording Migration Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/recording_migration_guide.md) for initial setup + +**Push succeeds but `assets.json` unchanged** + +- The recordings may already be up to date +- Try re-recording tests with `AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true` first + +**Git user not configured** + +- Run: `git config --global user.name "Your Name"` +- Run: `git config --global user.email "your.email@example.com"` +- Or set environment variables: `GIT_COMMIT_OWNER` and `GIT_COMMIT_EMAIL` + +**Git version too old** + +- Git version > 2.30.0 is required +- Check version: `git --version` +- Update Git if needed + +**Test proxy not found** + +- The test proxy will be automatically downloaded to `.proxy/` when tests run +- If issues persist, try running tests first: `AZURE_TEST_RUN_LIVE=false pytest` + +## Important Notes + +- **Always commit `assets.json`**: The updated tag must be in your PR for CI to use the new recordings +- **Don't commit local recordings**: The `.assets/` directory is gitignored +- **Recordings are sanitized**: Sensitive data (keys, endpoints) is automatically removed by the test proxy +- **Review before pushing**: Check recordings don't contain any leaked sensitive data +- **CI uses the tag**: If `assets.json` isn't updated, CI will use old recordings and tests may fail + +## Related Skills + +- `sdkinternal-py-sdk-pre-pr-check` - Run all CI checks before creating a PR +- `sdkinternal-py-env-setup-venv` - Set up virtual environment for development + +## Documentation + +- [Testing Guide - Update test recordings](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/tests.md#update-test-recordings) +- [Recording Migration Guide](https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/recording_migration_guide.md) +- [Asset Sync (Externalizing Recordings)](https://dev.azure.com/azure-sdk/internal/_wiki/wikis/internal.wiki/785/Externalizing-Recordings-(Asset-Sync)) +- [manage_recordings.py](https://github.com/Azure/azure-sdk-for-python/blob/main/scripts/manage_recordings.py) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/scripts/push_recordings.sh 
b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/scripts/push_recordings.sh new file mode 100755 index 000000000000..71bb790d223f --- /dev/null +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.github/skills/sdkinternal-py-test-push-recordings/scripts/push_recordings.sh @@ -0,0 +1,347 @@ +#!/usr/bin/env bash +set -euo pipefail + +# push_recordings.sh - Push test recordings to azure-sdk-assets repository +# Usage: +# push_recordings.sh [--dry-run] [--log ] +# Examples: +# push_recordings.sh # push recordings +# push_recordings.sh --dry-run # show what would be done + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PACKAGE_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)" +REPO_ROOT="$(cd "$PACKAGE_ROOT/../../.." && pwd)" + +DRY_RUN=0 +DATE_STR="$(date '+%Y%m%d_%H%M%S')" +LOG_FILE="$SCRIPT_DIR/push_recordings_${DATE_STR}.log" + +# Relative path from repo root to assets.json +ASSETS_JSON_PATH="sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json" + +print_help() { + cat < Save output to (default: $LOG_FILE) + --help, -h Show this help message + +Prerequisites: + - Write access to https://github.com/Azure/azure-sdk-assets + (membership in azure-sdk-write GitHub group) + - Dev dependencies installed: pip install -r dev_requirements.txt + - Tests have been run in record mode (AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true) + - Git is configured with user.name and user.email + - Git version > 2.30.0 + +Workflow: + 1. Run tests in record mode: AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest + 2. Push recordings: $(basename "$0") + 3. Commit updated assets.json to your PR + +What This Script Does: + - Checks for assets.json + - Verifies git is configured and version is compatible + - Pushes recordings to azure-sdk-assets repo + - Updates assets.json with the new tag + - Displays the updated tag for verification + +Examples: + $(basename "$0") # Push recordings + $(basename "$0") --dry-run # Show what would be done +EOF +} + +log() { + echo "[$(date '+%H:%M:%S')] $*" +} + +error() { + echo "[$(date '+%H:%M:%S')] ERROR: $*" >&2 +} + +warn() { + echo "[$(date '+%H:%M:%S')] WARNING: $*" >&2 +} + +# Parse arguments +while [[ ${#@} -ge 1 ]]; do + case "$1" in + --help|-h) + print_help + exit 0 + ;; + --dry-run) + DRY_RUN=1 + shift + ;; + --log) + if [[ ${#@} -ge 2 ]]; then + LOG_FILE="$2" + shift 2 + else + error "--log requires a file argument" + exit 2 + fi + ;; + *) + error "Unknown option: $1" + print_help + exit 2 + ;; + esac +done + +check_git_version() { + log "Checking git version..." + + local git_version + git_version="$(git --version | sed 's/git version //')" + + # Extract major and minor version + local major minor + major="$(echo "$git_version" | cut -d. -f1)" + minor="$(echo "$git_version" | cut -d. -f2)" + + # Require git version > 2.30.0 + if [[ "$major" -lt 2 ]] || { [[ "$major" -eq 2 ]] && [[ "$minor" -lt 30 ]]; }; then + error "Git version $git_version is too old. Version > 2.30.0 is required." + error "Please update Git: https://git-scm.com/downloads" + exit 1 + fi + + log "Git version: $git_version (OK)" +} + +check_git_config() { + log "Checking git configuration..." 
+ + local git_name git_email + git_name="$(git config --get user.name || echo "${GIT_COMMIT_OWNER:-}")" + git_email="$(git config --get user.email || echo "${GIT_COMMIT_EMAIL:-}")" + + if [[ -z "$git_name" ]]; then + error "Git user.name is not configured" + error "Run: git config --global user.name \"Your Name\"" + error "Or set environment variable: GIT_COMMIT_OWNER" + exit 1 + fi + + if [[ -z "$git_email" ]]; then + error "Git user.email is not configured" + error "Run: git config --global user.email \"your.email@example.com\"" + error "Or set environment variable: GIT_COMMIT_EMAIL" + exit 1 + fi + + log "Git configured as: $git_name <$git_email>" +} + +check_and_setup_venv() { + log "Checking virtual environment..." + + local venv_dir="$PACKAGE_ROOT/.venv" + + # Check if venv exists + if [[ -d "$venv_dir" ]]; then + log "Virtual environment found at $venv_dir" + else + log "Virtual environment not found at $venv_dir" + if [[ $DRY_RUN -eq 1 ]]; then + echo "DRY RUN: Would create virtual environment at $venv_dir" + else + log "Creating virtual environment..." + (cd "$PACKAGE_ROOT" && python3 -m venv .venv) || (cd "$PACKAGE_ROOT" && python -m venv .venv) + log "Virtual environment created at $venv_dir" + fi + fi + + # Check if we're already in the venv + if [[ "${VIRTUAL_ENV:-}" == "$venv_dir" ]]; then + log "Virtual environment is already activated" + else + log "Activating virtual environment..." + if [[ $DRY_RUN -eq 1 ]]; then + echo "DRY RUN: Would activate virtual environment: source $venv_dir/bin/activate" + else + # shellcheck disable=SC1091 + source "$venv_dir/bin/activate" + log "Virtual environment activated" + log "Python: $(which python)" + fi + fi +} + +check_and_install_dependencies() { + log "Checking for devtools_testutils module..." + + if python -c "import devtools_testutils" 2>/dev/null; then + log "devtools_testutils module found (OK)" + return 0 + fi + + log "Module 'devtools_testutils' is not installed." + + if [[ $DRY_RUN -eq 1 ]]; then + echo "DRY RUN: Would install dependencies:" + echo " pip install -e $PACKAGE_ROOT" + echo " pip install -r $PACKAGE_ROOT/dev_requirements.txt" + return 0 + fi + + log "Installing dependencies..." + log " Installing package in editable mode..." + (cd "$PACKAGE_ROOT" && pip install -e . --quiet) + log " Installing dev requirements..." + (cd "$PACKAGE_ROOT" && pip install -r dev_requirements.txt --quiet) + log "Dependencies installed" + + # Verify installation + if ! python -c "import devtools_testutils" 2>/dev/null; then + error "Failed to install devtools_testutils. Please check the installation manually." + exit 1 + fi + + log "devtools_testutils module found (OK)" +} + +check_assets_json() { + local assets_file="$REPO_ROOT/$ASSETS_JSON_PATH" + + if [[ -f "$assets_file" ]]; then + log "Found existing assets.json" + log "Current contents:" + cat "$assets_file" + echo "" + + # Check if Tag is empty + local tag + tag="$(grep -o '"Tag":[[:space:]]*"[^"]*"' "$assets_file" | sed 's/"Tag":[[:space:]]*"//' | sed 's/"$//')" + if [[ -z "$tag" ]]; then + warn "assets.json has an empty Tag. You may need to record tests first." 
+ warn "Run: AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest" + fi + return 0 + else + error "assets.json not found at $assets_file" + error "See the Recording Migration Guide for initial setup:" + error "https://github.com/Azure/azure-sdk-for-python/blob/main/doc/dev/recording_migration_guide.md" + exit 1 + fi +} + +check_recordings_exist() { + local assets_dir="$REPO_ROOT/.assets" + + if [[ -d "$assets_dir" ]]; then + log "Found .assets directory at $assets_dir" + + # Try to locate the package's recordings + if [[ $DRY_RUN -eq 0 ]]; then + log "Locating package recordings..." + (cd "$REPO_ROOT" && python scripts/manage_recordings.py locate -p "$ASSETS_JSON_PATH") || true + fi + else + warn ".assets directory not found. You may need to run tests first." + warn "Run: AZURE_TEST_RUN_LIVE=true AZURE_TEST_RECORD_MODE=true pytest" + warn "Or restore existing recordings: python scripts/manage_recordings.py restore -p $ASSETS_JSON_PATH" + fi +} + +preflight_checks() { + log "Running pre-flight checks..." + + # Check if we're in the right directory structure + if [[ ! -f "$PACKAGE_ROOT/setup.py" ]] && [[ ! -f "$PACKAGE_ROOT/pyproject.toml" ]]; then + error "Cannot find setup.py or pyproject.toml at $PACKAGE_ROOT" + exit 1 + fi + + # Check if manage_recordings.py exists + if [[ ! -f "$REPO_ROOT/scripts/manage_recordings.py" ]]; then + error "Cannot find manage_recordings.py at $REPO_ROOT/scripts/" + exit 1 + fi + + # Check git version + check_git_version + + # Check git configuration + check_git_config + + # Check and setup virtual environment + check_and_setup_venv + + # Check and install dependencies (including devtools_testutils) + check_and_install_dependencies + + # Check assets.json exists + check_assets_json + + # Check if recordings might exist + check_recordings_exist + + log "Pre-flight checks passed" +} + +push_recordings() { + log "Pushing recordings to azure-sdk-assets..." + + if [[ $DRY_RUN -eq 1 ]]; then + echo "DRY RUN: (in $REPO_ROOT) python scripts/manage_recordings.py push -p $ASSETS_JSON_PATH" + else + (cd "$REPO_ROOT" && python scripts/manage_recordings.py push -p "$ASSETS_JSON_PATH") + fi +} + +show_result() { + local assets_file="$REPO_ROOT/$ASSETS_JSON_PATH" + + log "Push completed!" + log "" + log "Updated assets.json:" + if [[ $DRY_RUN -eq 0 ]]; then + cat "$assets_file" + else + echo "DRY RUN: would show updated assets.json" + fi + echo "" + + log "Next steps:" + log " 1. Verify the Tag was updated in assets.json" + log " 2. Run playback tests to verify: AZURE_TEST_RUN_LIVE=false pytest" + log " 3. Commit assets.json to your PR:" + log " git add $ASSETS_JSON_PATH" + log " git commit -m \"Update test recording tag\"" +} + +main() { + echo "========================================" + echo " Push Recordings to Assets Repo" + echo "========================================" + echo "Script directory: $SCRIPT_DIR" + echo "Package root: $PACKAGE_ROOT" + echo "Repository root: $REPO_ROOT" + echo "Assets JSON: $ASSETS_JSON_PATH" + echo "Log file: $LOG_FILE" + if [[ $DRY_RUN -eq 1 ]]; then + echo "Mode: Dry run" + else + echo "Mode: Push" + fi + echo "" + + { + preflight_checks + push_recordings + show_result + log "Done!" 
+ } 2>&1 | tee "$LOG_FILE" +} + +main diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/.gitignore b/sdk/contentunderstanding/azure-ai-contentunderstanding/.gitignore index 485d2e026cc3..469c84db854b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/.gitignore +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/.gitignore @@ -25,4 +25,9 @@ tests/recordings/ # Environment variables .env +# Logs +*.log +logs/ + +# Local files .local_only/ \ No newline at end of file diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json b/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json index 27253d354f2c..ded54d3b8797 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/assets.json @@ -2,5 +2,5 @@ "AssetsRepo": "Azure/azure-sdk-assets", "AssetsRepoPrefixPath": "python", "TagPrefix": "python/contentunderstanding/azure-ai-contentunderstanding", - "Tag": "python/contentunderstanding/azure-ai-contentunderstanding_e261fca8e6" + "Tag": "python/contentunderstanding/azure-ai-contentunderstanding_8ae82e61bc" } diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_operations/_operations.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_operations/_operations.py index 5ebd158784a4..c06a4b0aab99 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_operations/_operations.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_operations/_operations.py @@ -49,7 +49,7 @@ def build_content_understanding_analyze_request( # pylint: disable=name-too-long analyzer_id: str, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> HttpRequest: @@ -69,11 +69,10 @@ def build_content_understanding_analyze_request( # pylint: disable=name-too-lon _url: str = _url.format(**path_format_arguments) # type: ignore # Construct parameters - _params["api-version"] = _SERIALIZER.query("api_version", api_version, "str") - if string_encoding is not None: - _params["stringEncoding"] = _SERIALIZER.query("string_encoding", string_encoding, "str") + _params["stringEncoding"] = _SERIALIZER.query("string_encoding", string_encoding, "str") if processing_location is not None: _params["processingLocation"] = _SERIALIZER.query("processing_location", processing_location, "str") + _params["api-version"] = _SERIALIZER.query("api_version", api_version, "str") # Construct headers if content_type is not None: @@ -86,15 +85,15 @@ def build_content_understanding_analyze_request( # pylint: disable=name-too-lon def build_content_understanding_analyze_binary_request( # pylint: disable=name-too-long analyzer_id: str, *, - string_encoding: Optional[str] = None, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + string_encoding: str, input_range: Optional[str] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> HttpRequest: _headers = case_insensitive_dict(kwargs.pop("headers", {}) or {}) _params = case_insensitive_dict(kwargs.pop("params", {}) or {}) - content_type: str = kwargs.pop("content_type") + content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None)) api_version: str = kwargs.pop("api_version", 
_params.pop("api-version", "2025-11-01")) accept = _headers.pop("Accept", "application/json") @@ -107,16 +106,16 @@ def build_content_understanding_analyze_binary_request( # pylint: disable=name- _url: str = _url.format(**path_format_arguments) # type: ignore # Construct parameters - _params["api-version"] = _SERIALIZER.query("api_version", api_version, "str") - if string_encoding is not None: - _params["stringEncoding"] = _SERIALIZER.query("string_encoding", string_encoding, "str") - if processing_location is not None: - _params["processingLocation"] = _SERIALIZER.query("processing_location", processing_location, "str") + _params["stringEncoding"] = _SERIALIZER.query("string_encoding", string_encoding, "str") if input_range is not None: _params["range"] = _SERIALIZER.query("input_range", input_range, "str") + if processing_location is not None: + _params["processingLocation"] = _SERIALIZER.query("processing_location", processing_location, "str") + _params["api-version"] = _SERIALIZER.query("api_version", api_version, "str") # Construct headers - _headers["content-type"] = _SERIALIZER.header("content_type", content_type, "str") + if content_type is not None: + _headers["content-type"] = _SERIALIZER.header("content_type", content_type, "str") _headers["Accept"] = _SERIALIZER.header("accept", accept, "str") return HttpRequest(method="POST", url=_url, params=_params, headers=_headers, **kwargs) @@ -460,7 +459,7 @@ def _analyze_initial( analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, @@ -538,7 +537,7 @@ def begin_analyze( self, analyzer_id: str, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", inputs: Optional[list[_models.AnalyzeInput]] = None, @@ -551,7 +550,7 @@ def begin_analyze( :type analyzer_id: str :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -578,7 +577,7 @@ def begin_analyze( analyzer_id: str, body: JSON, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", **kwargs: Any @@ -591,7 +590,7 @@ def begin_analyze( :type body: JSON :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. 
@@ -611,7 +610,7 @@ def begin_analyze( analyzer_id: str, body: IO[bytes], *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", **kwargs: Any @@ -624,7 +623,7 @@ def begin_analyze( :type body: IO[bytes] :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -644,7 +643,7 @@ def begin_analyze( analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, @@ -658,7 +657,7 @@ def begin_analyze( :type body: JSON or IO[bytes] :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -743,9 +742,9 @@ def _analyze_binary_initial( analyzer_id: str, binary_input: bytes, *, - string_encoding: Optional[str] = None, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + string_encoding: str, input_range: Optional[str] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> Iterator[bytes]: error_map: MutableMapping = { @@ -756,10 +755,10 @@ def _analyze_binary_initial( } error_map.update(kwargs.pop("error_map", {}) or {}) - _headers = kwargs.pop("headers", {}) or {} + _headers = case_insensitive_dict(kwargs.pop("headers", {}) or {}) _params = kwargs.pop("params", {}) or {} - content_type: str = kwargs.pop("content_type") + content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None)) cls: ClsType[Iterator[bytes]] = kwargs.pop("cls", None) _content = binary_input @@ -767,8 +766,8 @@ def _analyze_binary_initial( _request = build_content_understanding_analyze_binary_request( analyzer_id=analyzer_id, string_encoding=string_encoding, - processing_location=processing_location, input_range=input_range, + processing_location=processing_location, content_type=content_type, api_version=self._config.api_version, content=_content, @@ -814,9 +813,9 @@ def begin_analyze_binary( analyzer_id: str, binary_input: bytes, *, - string_encoding: Optional[str] = None, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + string_encoding: str, input_range: Optional[str] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> LROPoller[_models.AnalyzeResult]: """Extract content and fields from input. @@ -827,24 +826,24 @@ def begin_analyze_binary( :type binary_input: bytes :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). 
- Default value is None. + Required. :paramtype string_encoding: str - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :keyword input_range: Range of the input to analyze (ex. ``1-3,5,9-``). Document content uses 1-based page numbers, while audio visual content uses integer milliseconds. Default value is None. :paramtype input_range: str + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of LROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping :rtype: ~azure.core.polling.LROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] :raises ~azure.core.exceptions.HttpResponseError: """ - _headers = kwargs.pop("headers", {}) or {} + _headers = case_insensitive_dict(kwargs.pop("headers", {}) or {}) _params = kwargs.pop("params", {}) or {} - content_type: str = kwargs.pop("content_type") + content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None)) cls: ClsType[_models.AnalyzeResult] = kwargs.pop("cls", None) polling: Union[bool, PollingMethod] = kwargs.pop("polling", True) lro_delay = kwargs.pop("polling_interval", self._config.polling_interval) @@ -854,8 +853,8 @@ def begin_analyze_binary( analyzer_id=analyzer_id, binary_input=binary_input, string_encoding=string_encoding, - processing_location=processing_location, input_range=input_range, + processing_location=processing_location, content_type=content_type, cls=lambda x, y, z: x, headers=_headers, diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_patch.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_patch.py index fa861178575f..e6eb2df173a5 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_patch.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_patch.py @@ -50,22 +50,15 @@ def begin_analyze( self, analyzer_id: str, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, - content_type: str = "application/json", inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any, ) -> "AnalyzeLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. :param analyzer_id: The unique identifier of the analyzer. Required. :type analyzer_id: str - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation - :keyword content_type: Body Parameter content-type. Content type parameter for JSON body. - Default value is "application/json". - :paramtype content_type: str :keyword inputs: Inputs to analyze. Currently, only pro mode supports multiple inputs. Default value is None. 
:paramtype inputs: list[~azure.ai.contentunderstanding.models.AnalyzeInput] @@ -73,6 +66,9 @@ def begin_analyze( Ex. { "gpt-4.1": "myGpt41Deployment", "text-embedding-3-large": "myTextEmbedding3LargeDeployment" }. Default value is None. :paramtype model_deployments: dict[str, str] + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of AnalyzeLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. :rtype: ~azure.ai.contentunderstanding.models.AnalyzeLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -156,10 +152,10 @@ def begin_analyze( # type: ignore[override] # pyright: ignore[reportIncompatib analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, - content_type: Optional[str] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + content_type: Optional[str] = None, **kwargs: Any, ) -> "AnalyzeLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. @@ -168,11 +164,6 @@ def begin_analyze( # type: ignore[override] # pyright: ignore[reportIncompatib :type analyzer_id: str :param body: Is either a JSON type or a IO[bytes] type. Default value is None. :type body: JSON or IO[bytes] - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation - :keyword content_type: Body Parameter content-type. Default value is "application/json". - :paramtype content_type: str :keyword inputs: Inputs to analyze. Currently, only pro mode supports multiple inputs. Default value is None. :paramtype inputs: list[~azure.ai.contentunderstanding.models.AnalyzeInput] @@ -180,6 +171,11 @@ def begin_analyze( # type: ignore[override] # pyright: ignore[reportIncompatib Ex. { "gpt-4.1": "myGpt41Deployment", "text-embedding-3-large": "myTextEmbedding3LargeDeployment" }. Default value is None. :paramtype model_deployments: dict[str, str] + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation + :keyword content_type: Body Parameter content-type. Default value is "application/json". + :paramtype content_type: str :return: An instance of AnalyzeLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. 
:rtype: ~azure.ai.contentunderstanding.models.AnalyzeLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -226,9 +222,9 @@ def begin_analyze_binary( analyzer_id: str, binary_input: bytes, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, input_range: Optional[str] = None, content_type: str = "application/octet-stream", + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any, ) -> "AnalyzeLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. @@ -237,15 +233,15 @@ def begin_analyze_binary( :type analyzer_id: str :param binary_input: The binary content of the document to analyze. Required. :type binary_input: bytes - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :keyword input_range: Range of the input to analyze (ex. ``1-3,5,9-``). Document content uses 1-based page numbers, while audio visual content uses integer milliseconds. Default value is None. :paramtype input_range: str :keyword content_type: Body Parameter content-type. Content type parameter for binary body. Default value is "application/octet-stream". :paramtype content_type: str + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of AnalyzeLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. :rtype: ~azure.ai.contentunderstanding.models.AnalyzeLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -256,16 +252,15 @@ def begin_analyze_binary( matches Python's native string indexing behavior (len() and str[i] use code points). This ensures ContentSpan offsets work correctly with Python string slicing. 
""" - # Set string_encoding to "codePoint" (matches Python's string indexing) - kwargs["string_encoding"] = "codePoint" - - # Call parent implementation + # Call parent implementation with string_encoding set to "codePoint" + # (matches Python's string indexing) poller = super().begin_analyze_binary( analyzer_id=analyzer_id, binary_input=binary_input, - processing_location=processing_location, + string_encoding="codePoint", input_range=input_range, content_type=content_type, + processing_location=processing_location, **kwargs, ) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/model_base.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/model_base.py index 12926fa98dcf..2e7977d8ab17 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/model_base.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/model_base.py @@ -37,6 +37,7 @@ TZ_UTC = timezone.utc _T = typing.TypeVar("_T") +_NONE_TYPE = type(None) def _timedelta_as_isostr(td: timedelta) -> str: @@ -171,6 +172,21 @@ def default(self, o): # pylint: disable=too-many-return-statements r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{4}\s\d{2}:\d{2}:\d{2}\sGMT" ) +_ARRAY_ENCODE_MAPPING = { + "pipeDelimited": "|", + "spaceDelimited": " ", + "commaDelimited": ",", + "newlineDelimited": "\n", +} + + +def _deserialize_array_encoded(delimit: str, attr): + if isinstance(attr, str): + if attr == "": + return [] + return attr.split(delimit) + return attr + def _deserialize_datetime(attr: typing.Union[str, datetime]) -> datetime: """Deserialize ISO-8601 formatted string into Datetime object. @@ -202,7 +218,7 @@ def _deserialize_datetime(attr: typing.Union[str, datetime]) -> datetime: test_utc = date_obj.utctimetuple() if test_utc.tm_year > 9999 or test_utc.tm_year < 1: raise OverflowError("Hit max or min date") - return date_obj + return date_obj # type: ignore[no-any-return] def _deserialize_datetime_rfc7231(attr: typing.Union[str, datetime]) -> datetime: @@ -256,7 +272,7 @@ def _deserialize_time(attr: typing.Union[str, time]) -> time: """ if isinstance(attr, time): return attr - return isodate.parse_time(attr) + return isodate.parse_time(attr) # type: ignore[no-any-return] def _deserialize_bytes(attr): @@ -315,6 +331,8 @@ def _deserialize_int_as_str(attr): def get_deserializer(annotation: typing.Any, rf: typing.Optional["_RestField"] = None): if annotation is int and rf and rf._format == "str": return _deserialize_int_as_str + if annotation is str and rf and rf._format in _ARRAY_ENCODE_MAPPING: + return functools.partial(_deserialize_array_encoded, _ARRAY_ENCODE_MAPPING[rf._format]) if rf and rf._format: return _DESERIALIZE_MAPPING_WITHFORMAT.get(rf._format) return _DESERIALIZE_MAPPING.get(annotation) # pyright: ignore @@ -353,9 +371,39 @@ def __contains__(self, key: typing.Any) -> bool: return key in self._data def __getitem__(self, key: str) -> typing.Any: + # If this key has been deserialized (for mutable types), we need to handle serialization + if hasattr(self, "_attr_to_rest_field"): + cache_attr = f"_deserialized_{key}" + if hasattr(self, cache_attr): + rf = _get_rest_field(getattr(self, "_attr_to_rest_field"), key) + if rf: + value = self._data.get(key) + if isinstance(value, (dict, list, set)): + # For mutable types, serialize and return + # But also update _data with serialized form and clear flag + # so mutations via this returned 
value affect _data + serialized = _serialize(value, rf._format) + # If serialized form is same type (no transformation needed), + # return _data directly so mutations work + if isinstance(serialized, type(value)) and serialized == value: + return self._data.get(key) + # Otherwise return serialized copy and clear flag + try: + object.__delattr__(self, cache_attr) + except AttributeError: + pass + # Store serialized form back + self._data[key] = serialized + return serialized return self._data.__getitem__(key) def __setitem__(self, key: str, value: typing.Any) -> None: + # Clear any cached deserialized value when setting through dictionary access + cache_attr = f"_deserialized_{key}" + try: + object.__delattr__(self, cache_attr) + except AttributeError: + pass self._data.__setitem__(key, value) def __delitem__(self, key: str) -> None: @@ -483,6 +531,8 @@ def _is_model(obj: typing.Any) -> bool: def _serialize(o, format: typing.Optional[str] = None): # pylint: disable=too-many-return-statements if isinstance(o, list): + if format in _ARRAY_ENCODE_MAPPING and all(isinstance(x, str) for x in o): + return _ARRAY_ENCODE_MAPPING[format].join(o) return [_serialize(x, format) for x in o] if isinstance(o, dict): return {k: _serialize(v, format) for k, v in o.items()} @@ -758,6 +808,14 @@ def _deserialize_multiple_sequence( return type(obj)(_deserialize(deserializer, entry, module) for entry, deserializer in zip(obj, entry_deserializers)) +def _is_array_encoded_deserializer(deserializer: functools.partial) -> bool: + return ( + isinstance(deserializer, functools.partial) + and isinstance(deserializer.args[0], functools.partial) + and deserializer.args[0].func == _deserialize_array_encoded # pylint: disable=comparison-with-callable + ) + + def _deserialize_sequence( deserializer: typing.Optional[typing.Callable], module: typing.Optional[str], @@ -767,6 +825,19 @@ def _deserialize_sequence( return obj if isinstance(obj, ET.Element): obj = list(obj) + + # encoded string may be deserialized to sequence + if isinstance(obj, str) and isinstance(deserializer, functools.partial): + # for list[str] + if _is_array_encoded_deserializer(deserializer): + return deserializer(obj) + + # for list[Union[...]] + if isinstance(deserializer.args[0], list): + for sub_deserializer in deserializer.args[0]: + if _is_array_encoded_deserializer(sub_deserializer): + return sub_deserializer(obj) + return type(obj)(_deserialize(deserializer, entry, module) for entry in obj) @@ -817,16 +888,16 @@ def _get_deserialize_callable_from_annotation( # pylint: disable=too-many-retur # is it optional? 
try: - if any(a for a in annotation.__args__ if a == type(None)): # pyright: ignore + if any(a is _NONE_TYPE for a in annotation.__args__): # pyright: ignore if len(annotation.__args__) <= 2: # pyright: ignore if_obj_deserializer = _get_deserialize_callable_from_annotation( - next(a for a in annotation.__args__ if a != type(None)), module, rf # pyright: ignore + next(a for a in annotation.__args__ if a is not _NONE_TYPE), module, rf # pyright: ignore ) return functools.partial(_deserialize_with_optional, if_obj_deserializer) # the type is Optional[Union[...]], we need to remove the None type from the Union annotation_copy = copy.copy(annotation) - annotation_copy.__args__ = [a for a in annotation_copy.__args__ if a != type(None)] # pyright: ignore + annotation_copy.__args__ = [a for a in annotation_copy.__args__ if a is not _NONE_TYPE] # pyright: ignore return _get_deserialize_callable_from_annotation(annotation_copy, module, rf) except AttributeError: pass @@ -998,7 +1069,11 @@ def __init__( @property def _class_type(self) -> typing.Any: - return getattr(self._type, "args", [None])[0] + result = getattr(self._type, "args", [None])[0] + # type may be wrapped by nested functools.partial so we need to check for that + if isinstance(result, functools.partial): + return getattr(result, "args", [None])[0] + return result @property def _rest_name(self) -> str: @@ -1009,14 +1084,37 @@ def _rest_name(self) -> str: def __get__(self, obj: Model, type=None): # pylint: disable=redefined-builtin # by this point, type and rest_name will have a value bc we default # them in __new__ of the Model class - item = obj.get(self._rest_name) + # Use _data.get() directly to avoid triggering __getitem__ which clears the cache + item = obj._data.get(self._rest_name) if item is None: return item if self._is_model: return item - return _deserialize(self._type, _serialize(item, self._format), rf=self) + + # For mutable types, we want mutations to directly affect _data + # Check if we've already deserialized this value + cache_attr = f"_deserialized_{self._rest_name}" + if hasattr(obj, cache_attr): + # Return the value from _data directly (it's been deserialized in place) + return obj._data.get(self._rest_name) + + deserialized = _deserialize(self._type, _serialize(item, self._format), rf=self) + + # For mutable types, store the deserialized value back in _data + # so mutations directly affect _data + if isinstance(deserialized, (dict, list, set)): + obj._data[self._rest_name] = deserialized + object.__setattr__(obj, cache_attr, True) # Mark as deserialized + return deserialized + + return deserialized def __set__(self, obj: Model, value) -> None: + # Clear the cached deserialized object when setting a new value + cache_attr = f"_deserialized_{self._rest_name}" + if hasattr(obj, cache_attr): + object.__delattr__(obj, cache_attr) + if value is None: # we want to wipe out entries if users set attr to None try: @@ -1184,7 +1282,7 @@ def _get_wrapped_element( _get_element(v, exclude_readonly, meta, wrapped_element) else: wrapped_element.text = _get_primitive_type_value(v) - return wrapped_element + return wrapped_element # type: ignore[no-any-return] def _get_primitive_type_value(v) -> str: @@ -1197,7 +1295,9 @@ def _get_primitive_type_value(v) -> str: return str(v) -def _create_xml_element(tag, prefix=None, ns=None): +def _create_xml_element( + tag: typing.Any, prefix: typing.Optional[str] = None, ns: typing.Optional[str] = None +) -> ET.Element: if prefix and ns: ET.register_namespace(prefix, ns) if ns: diff --git 
a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/serialization.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/serialization.py index 45a3e44e45cb..81ec1de5922b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/serialization.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/_utils/serialization.py @@ -821,13 +821,20 @@ def serialize_basic(cls, data, data_type, **kwargs): :param str data_type: Type of object in the iterable. :rtype: str, int, float, bool :return: serialized object + :raises TypeError: raise if data_type is not one of str, int, float, bool. """ custom_serializer = cls._get_custom_serializers(data_type, **kwargs) if custom_serializer: return custom_serializer(data) if data_type == "str": return cls.serialize_unicode(data) - return eval(data_type)(data) # nosec # pylint: disable=eval-used + if data_type == "int": + return int(data) + if data_type == "float": + return float(data) + if data_type == "bool": + return bool(data) + raise TypeError("Unknown basic data type: {}".format(data_type)) @classmethod def serialize_unicode(cls, data): @@ -1757,7 +1764,7 @@ def deserialize_basic(self, attr, data_type): # pylint: disable=too-many-return :param str data_type: deserialization data type. :return: Deserialized basic type. :rtype: str, int, float or bool - :raises TypeError: if string format is not valid. + :raises TypeError: if string format is not valid or data_type is not one of str, int, float, bool. """ # If we're here, data is supposed to be a basic type. # If it's still an XML node, take the text @@ -1783,7 +1790,11 @@ def deserialize_basic(self, attr, data_type): # pylint: disable=too-many-return if data_type == "str": return self.deserialize_unicode(attr) - return eval(data_type)(attr) # nosec # pylint: disable=eval-used + if data_type == "int": + return int(attr) + if data_type == "float": + return float(attr) + raise TypeError("Unknown basic data type: {}".format(data_type)) @staticmethod def deserialize_unicode(data): diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_operations/_operations.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_operations/_operations.py index 95b4657f176c..4f0a261c6d42 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_operations/_operations.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_operations/_operations.py @@ -69,7 +69,7 @@ async def _analyze_initial( analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, @@ -147,7 +147,7 @@ async def begin_analyze( self, analyzer_id: str, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", inputs: Optional[list[_models.AnalyzeInput]] = None, @@ -160,7 +160,7 @@ async def begin_analyze( :type analyzer_id: str :keyword string_encoding: The string encoding format for content spans in the response. 
Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -187,7 +187,7 @@ async def begin_analyze( analyzer_id: str, body: JSON, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", **kwargs: Any @@ -200,7 +200,7 @@ async def begin_analyze( :type body: JSON :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -220,7 +220,7 @@ async def begin_analyze( analyzer_id: str, body: IO[bytes], *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, content_type: str = "application/json", **kwargs: Any @@ -233,7 +233,7 @@ async def begin_analyze( :type body: IO[bytes] :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. @@ -253,7 +253,7 @@ async def begin_analyze( analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - string_encoding: Optional[str] = None, + string_encoding: str, processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, @@ -267,7 +267,7 @@ async def begin_analyze( :type body: JSON or IO[bytes] :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str :keyword processing_location: The location where the data may be processed. Defaults to global. Known values are: "geography", "dataZone", and "global". Default value is None. 
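The overloads in this hunk make `string_encoding` a required keyword, and the hand-written layer later in this patch pins it to `"codePoint"` (see the `_patch.py` hunks below), because Python's `len()` and indexing count Unicode code points, so code-point spans can be applied directly to Python strings. A standalone sketch of that point, using plain Python and a made-up string containing a non-BMP character (no SDK calls involved):

```python
# Illustration only: why "codePoint" offsets line up with Python slicing
# while UTF-16 offsets may not.
text = "Total: 💰100"              # U+1F4B0 lies outside the Basic Multilingual Plane

assert len(text) == 11              # Python counts code points
assert text[7] == "💰"              # code-point index 7 is the emoji itself

utf16_units = len(text.encode("utf-16-le")) // 2
assert utf16_units == 12            # UTF-16 needs a surrogate pair for the emoji

# A span of offset=8, length=3 expressed in code points slices cleanly:
assert text[8:8 + 3] == "100"
```

Under `"utf16"` the same span would first have to be adjusted for surrogate pairs before it could be used with `str` indexing.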
@@ -353,9 +353,9 @@ async def _analyze_binary_initial( analyzer_id: str, binary_input: bytes, *, - string_encoding: Optional[str] = None, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + string_encoding: str, input_range: Optional[str] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> AsyncIterator[bytes]: error_map: MutableMapping = { @@ -366,10 +366,10 @@ async def _analyze_binary_initial( } error_map.update(kwargs.pop("error_map", {}) or {}) - _headers = kwargs.pop("headers", {}) or {} + _headers = case_insensitive_dict(kwargs.pop("headers", {}) or {}) _params = kwargs.pop("params", {}) or {} - content_type: str = kwargs.pop("content_type") + content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None)) cls: ClsType[AsyncIterator[bytes]] = kwargs.pop("cls", None) _content = binary_input @@ -377,8 +377,8 @@ async def _analyze_binary_initial( _request = build_content_understanding_analyze_binary_request( analyzer_id=analyzer_id, string_encoding=string_encoding, - processing_location=processing_location, input_range=input_range, + processing_location=processing_location, content_type=content_type, api_version=self._config.api_version, content=_content, @@ -424,9 +424,9 @@ async def begin_analyze_binary( analyzer_id: str, binary_input: bytes, *, - string_encoding: Optional[str] = None, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + string_encoding: str, input_range: Optional[str] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any ) -> AsyncLROPoller[_models.AnalyzeResult]: """Extract content and fields from input. @@ -437,24 +437,24 @@ async def begin_analyze_binary( :type binary_input: bytes :keyword string_encoding: The string encoding format for content spans in the response. Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). - Default value is None. + Required. :paramtype string_encoding: str - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :keyword input_range: Range of the input to analyze (ex. ``1-3,5,9-``). Document content uses 1-based page numbers, while audio visual content uses integer milliseconds. Default value is None. :paramtype input_range: str + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of AsyncLROPoller that returns AnalyzeResult. 
The AnalyzeResult is compatible with MutableMapping :rtype: ~azure.core.polling.AsyncLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] :raises ~azure.core.exceptions.HttpResponseError: """ - _headers = kwargs.pop("headers", {}) or {} + _headers = case_insensitive_dict(kwargs.pop("headers", {}) or {}) _params = kwargs.pop("params", {}) or {} - content_type: str = kwargs.pop("content_type") + content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None)) cls: ClsType[_models.AnalyzeResult] = kwargs.pop("cls", None) polling: Union[bool, AsyncPollingMethod] = kwargs.pop("polling", True) lro_delay = kwargs.pop("polling_interval", self._config.polling_interval) @@ -464,8 +464,8 @@ async def begin_analyze_binary( analyzer_id=analyzer_id, binary_input=binary_input, string_encoding=string_encoding, - processing_location=processing_location, input_range=input_range, + processing_location=processing_location, content_type=content_type, cls=lambda x, y, z: x, headers=_headers, diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_patch.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_patch.py index de63032e9574..df261d856cb2 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_patch.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/_patch.py @@ -50,22 +50,15 @@ async def begin_analyze( self, analyzer_id: str, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, - content_type: str = "application/json", inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any, ) -> "AnalyzeAsyncLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. :param analyzer_id: The unique identifier of the analyzer. Required. :type analyzer_id: str - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation - :keyword content_type: Body Parameter content-type. Content type parameter for JSON body. - Default value is "application/json". - :paramtype content_type: str :keyword inputs: Inputs to analyze. Currently, only pro mode supports multiple inputs. Default value is None. :paramtype inputs: list[~azure.ai.contentunderstanding.models.AnalyzeInput] @@ -73,6 +66,9 @@ async def begin_analyze( Ex. { "gpt-4.1": "myGpt41Deployment", "text-embedding-3-large": "myTextEmbedding3LargeDeployment" }. Default value is None. :paramtype model_deployments: dict[str, str] + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of AnalyzeAsyncLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. 
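The `_headers`/`content_type` changes in this hunk wrap caller-supplied headers in `case_insensitive_dict` and fall back to a `content-type` header when the `content_type` keyword is not passed. A minimal sketch of that lookup, assuming only the `case_insensitive_dict` helper from `azure-core` (the header value is a placeholder):

```python
# Illustration of the kwargs handling added above, outside any client code.
from typing import Any, Dict, Optional

from azure.core.utils import case_insensitive_dict

kwargs: Dict[str, Any] = {"headers": {"Content-Type": "application/pdf"}}

_headers = case_insensitive_dict(kwargs.pop("headers", {}) or {})
content_type: Optional[str] = kwargs.pop("content_type", _headers.pop("content-type", None))

assert content_type == "application/pdf"   # found despite the different casing
assert "content-type" not in _headers      # popped, so only the keyword value flows on
```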
:rtype: ~azure.ai.contentunderstanding.aio.models.AnalyzeAsyncLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -156,10 +152,10 @@ async def begin_analyze( # type: ignore[override] # pyright: ignore[reportInco analyzer_id: str, body: Union[JSON, IO[bytes]] = _Unset, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, - content_type: Optional[str] = None, inputs: Optional[list[_models.AnalyzeInput]] = None, model_deployments: Optional[dict[str, str]] = None, + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, + content_type: Optional[str] = None, **kwargs: Any, ) -> "AnalyzeAsyncLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. @@ -168,11 +164,6 @@ async def begin_analyze( # type: ignore[override] # pyright: ignore[reportInco :type analyzer_id: str :param body: Is either a JSON type or a IO[bytes] type. Default value is None. :type body: JSON or IO[bytes] - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. - :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation - :keyword content_type: Body Parameter content-type. Default value is "application/json". - :paramtype content_type: str :keyword inputs: Inputs to analyze. Currently, only pro mode supports multiple inputs. Default value is None. :paramtype inputs: list[~azure.ai.contentunderstanding.models.AnalyzeInput] @@ -180,6 +171,11 @@ async def begin_analyze( # type: ignore[override] # pyright: ignore[reportInco Ex. { "gpt-4.1": "myGpt41Deployment", "text-embedding-3-large": "myTextEmbedding3LargeDeployment" }. Default value is None. :paramtype model_deployments: dict[str, str] + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation + :keyword content_type: Body Parameter content-type. Default value is "application/json". + :paramtype content_type: str :return: An instance of AnalyzeAsyncLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. :rtype: ~azure.ai.contentunderstanding.aio.models.AnalyzeAsyncLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -226,9 +222,9 @@ async def begin_analyze_binary( analyzer_id: str, binary_input: bytes, *, - processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, input_range: Optional[str] = None, content_type: str = "application/octet-stream", + processing_location: Optional[Union[str, _models.ProcessingLocation]] = None, **kwargs: Any, ) -> "AnalyzeAsyncLROPoller[_models.AnalyzeResult]": # pyright: ignore[reportInvalidTypeArguments] """Extract content and fields from input. @@ -237,15 +233,15 @@ async def begin_analyze_binary( :type analyzer_id: str :param binary_input: The binary content of the document to analyze. Required. :type binary_input: bytes - :keyword processing_location: The location where the data may be processed. Defaults to - global. Known values are: "geography", "dataZone", and "global". Default value is None. 
- :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :keyword input_range: Range of the input to analyze (ex. ``1-3,5,9-``). Document content uses 1-based page numbers, while audio visual content uses integer milliseconds. Default value is None. :paramtype input_range: str :keyword content_type: Body Parameter content-type. Content type parameter for binary body. Default value is "application/octet-stream". :paramtype content_type: str + :keyword processing_location: The location where the data may be processed. Defaults to + global. Known values are: "geography", "dataZone", and "global". Default value is None. + :paramtype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :return: An instance of AnalyzeAsyncLROPoller that returns AnalyzeResult. The AnalyzeResult is compatible with MutableMapping. The poller includes an .operation_id property. :rtype: ~azure.ai.contentunderstanding.aio.models.AnalyzeAsyncLROPoller[~azure.ai.contentunderstanding.models.AnalyzeResult] @@ -256,16 +252,15 @@ async def begin_analyze_binary( matches Python's native string indexing behavior (len() and str[i] use code points). This ensures ContentSpan offsets work correctly with Python string slicing. """ - # Set string_encoding to "codePoint" (matches Python's string indexing) - kwargs["string_encoding"] = "codePoint" - - # Call parent implementation + # Call parent implementation with string_encoding set to "codePoint" + # (matches Python's string indexing) poller = await super().begin_analyze_binary( analyzer_id=analyzer_id, binary_input=binary_input, - processing_location=processing_location, + string_encoding="codePoint", input_range=input_range, content_type=content_type, + processing_location=processing_location, **kwargs, ) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/models/_patch.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/models/_patch.py index 38fb894034d5..dbb0f6c2660d 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/models/_patch.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/aio/models/_patch.py @@ -60,7 +60,9 @@ def operation_id(self) -> str: raise ValueError(f"Could not extract operation ID: {str(e)}") from e @classmethod - def from_poller(cls, poller: AsyncLROPoller[PollingReturnType_co]) -> "AnalyzeAsyncLROPoller[PollingReturnType_co]": # pyright: ignore[reportInvalidTypeArguments] + def from_poller( + cls, poller: AsyncLROPoller[PollingReturnType_co] + ) -> "AnalyzeAsyncLROPoller[PollingReturnType_co]": # pyright: ignore[reportInvalidTypeArguments] """Wrap an existing AsyncLROPoller without re-initializing the polling method. 
This avoids duplicate HTTP requests that would occur if we created a new @@ -72,7 +74,7 @@ def from_poller(cls, poller: AsyncLROPoller[PollingReturnType_co]) -> "AnalyzeAs :rtype: AnalyzeAsyncLROPoller """ # Create instance without calling __init__ to avoid re-initialization - instance: AnalyzeAsyncLROPoller[PollingReturnType_co] = object.__new__(cls) # pyright: ignore[reportInvalidTypeArguments] + instance: "AnalyzeAsyncLROPoller[PollingReturnType_co]" = object.__new__(cls) # pyright: ignore[reportInvalidTypeArguments] # Copy all attributes from the original poller instance.__dict__.update(poller.__dict__) return instance diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/__init__.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/__init__.py index ba8bd7df36ff..b307e6b84cb7 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/__init__.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/__init__.py @@ -5,7 +5,7 @@ # Code generated by Microsoft (R) Python Code Generator. # Changes may cause incorrect behavior and will be lost if the code is regenerated. # -------------------------------------------------------------------------- -# pylint: disable=wrong-import-position,no-name-in-module +# pylint: disable=wrong-import-position from typing import TYPE_CHECKING diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_enums.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_enums.py index efbbf20ad2ee..ec5d5bcfd40b 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_enums.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_enums.py @@ -5,6 +5,7 @@ # Code generated by Microsoft (R) Python Code Generator. # Changes may cause incorrect behavior and will be lost if the code is regenerated. # -------------------------------------------------------------------------- +# pylint: disable=enum-must-be-uppercase from enum import Enum from azure.core import CaseInsensitiveEnumMeta @@ -215,7 +216,7 @@ class ProcessingLocation(str, Enum, metaclass=CaseInsensitiveEnumMeta): """Data may be processed in the same geography as the resource.""" DATA_ZONE = "dataZone" """Data may be processed in the same data zone as the resource.""" - GLOBAL = "global" + GLOBALEnum = "global" """Data may be processed in any Azure data center globally.""" diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_models.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_models.py index 1529695fd830..b4c7a18b6b55 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_models.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_models.py @@ -83,8 +83,8 @@ class AnalyzeResult(_Model): :vartype created_at: ~datetime.datetime :ivar warnings: Warnings encountered while analyzing the document. :vartype warnings: list[~azure.core.ODataV4Format] - :ivar string_encoding: The string encoding format for content spans in the response. - Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). 
+ :ivar string_encoding: The string encoding format for content spans in the response. Possible + values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``."). :vartype string_encoding: str :ivar contents: The extracted content. Required. :vartype contents: list[~azure.ai.contentunderstanding.models.MediaContent] @@ -107,8 +107,8 @@ class AnalyzeResult(_Model): string_encoding: Optional[str] = rest_field( name="stringEncoding", visibility=["read", "create", "update", "delete", "query"] ) - """ The string encoding format for content spans in the response. - Possible values are 'codePoint', 'utf16', and ``utf8``. Default is ``codePoint``.\").""" + """The string encoding format for content spans in the response. Possible values are 'codePoint', + 'utf16', and ``utf8``. Default is ``codePoint``.\").""" contents: list["_models.MediaContent"] = rest_field(visibility=["read", "create", "update", "delete", "query"]) """The extracted content. Required.""" @@ -538,8 +538,8 @@ class ContentAnalyzer(_Model): :vartype processing_location: str or ~azure.ai.contentunderstanding.models.ProcessingLocation :ivar knowledge_sources: Additional knowledge sources used to enhance the analyzer. :vartype knowledge_sources: list[~azure.ai.contentunderstanding.models.KnowledgeSource] - :ivar models: Mapping of model roles to specific model names. - Ex. { "completion": "gpt-4.1", "embedding": "text-embedding-3-large" }. + :ivar models: Mapping of model roles to specific model names. Ex. { "completion": "gpt-4.1", + "embedding": "text-embedding-3-large" }. :vartype models: dict[str, str] :ivar supported_models: Chat completion and embedding models supported by the analyzer. :vartype supported_models: ~azure.ai.contentunderstanding.models.SupportedModels @@ -578,8 +578,8 @@ class ContentAnalyzer(_Model): ) """Additional knowledge sources used to enhance the analyzer.""" models: Optional[dict[str, str]] = rest_field(visibility=["read", "create"]) - """Mapping of model roles to specific model names. - Ex. { \"completion\": \"gpt-4.1\", \"embedding\": \"text-embedding-3-large\" }.""" + """Mapping of model roles to specific model names. Ex. { \"completion\": \"gpt-4.1\", + \"embedding\": \"text-embedding-3-large\" }.""" supported_models: Optional["_models.SupportedModels"] = rest_field(name="supportedModels", visibility=["read"]) """Chat completion and embedding models supported by the analyzer.""" @@ -699,8 +699,8 @@ class ContentAnalyzerConfig(_Model): :vartype enable_segment: bool :ivar segment_per_page: Force segmentation of document content by page. :vartype segment_per_page: bool - :ivar omit_content: Omit the content for this analyzer from analyze result. - Only return content(s) from additional analyzers specified in contentCategories, if any. + :ivar omit_content: Omit the content for this analyzer from analyze result. Only return + content(s) from additional analyzers specified in contentCategories, if any. :vartype omit_content: bool """ @@ -768,8 +768,8 @@ class ContentAnalyzerConfig(_Model): omit_content: Optional[bool] = rest_field( name="omitContent", visibility=["read", "create", "update", "delete", "query"] ) - """Omit the content for this analyzer from analyze result. - Only return content(s) from additional analyzers specified in contentCategories, if any.""" + """Omit the content for this analyzer from analyze result. 
Only return content(s) from additional + analyzers specified in contentCategories, if any.""" @overload def __init__( @@ -1065,16 +1065,14 @@ def __init__(self, *args: Any, **kwargs: Any) -> None: class ContentUnderstandingDefaults(_Model): """default settings for this Content Understanding resource. - :ivar model_deployments: Mapping of model names to deployments. - Ex. { "gpt-4.1": "myGpt41Deployment", "text-embedding-3-large": - "myTextEmbedding3LargeDeployment" }. Required. + :ivar model_deployments: Mapping of model names to deployments. Ex. { "gpt-4.1": + "myGpt41Deployment", "text-embedding-3-large": "myTextEmbedding3LargeDeployment" }. Required. :vartype model_deployments: dict[str, str] """ model_deployments: dict[str, str] = rest_field(name="modelDeployments", visibility=["read", "create", "update"]) - """Mapping of model names to deployments. - Ex. { \"gpt-4.1\": \"myGpt41Deployment\", \"text-embedding-3-large\": - \"myTextEmbedding3LargeDeployment\" }. Required.""" + """Mapping of model names to deployments. Ex. { \"gpt-4.1\": \"myGpt41Deployment\", + \"text-embedding-3-large\": \"myTextEmbedding3LargeDeployment\" }. Required.""" @overload def __init__( @@ -1583,9 +1581,9 @@ class DocumentContent(MediaContent, discriminator="document"): :vartype start_page_number: int :ivar end_page_number: End page number (1-indexed) of the content. Required. :vartype end_page_number: int - :ivar unit: Length unit used by the width, height, and source properties. - For images/tiff, the default unit is pixel. For PDF, the default unit is inch. Known values - are: "pixel" and "inch". + :ivar unit: Length unit used by the width, height, and source properties. For images/tiff, the + default unit is pixel. For PDF, the default unit is inch. Known values are: "pixel" and + "inch". :vartype unit: str or ~azure.ai.contentunderstanding.models.LengthUnit :ivar pages: List of pages in the document. :vartype pages: list[~azure.ai.contentunderstanding.models.DocumentPage] @@ -1620,9 +1618,8 @@ class DocumentContent(MediaContent, discriminator="document"): unit: Optional[Union[str, "_models.LengthUnit"]] = rest_field( visibility=["read", "create", "update", "delete", "query"] ) - """Length unit used by the width, height, and source properties. - For images/tiff, the default unit is pixel. For PDF, the default unit is inch. Known values - are: \"pixel\" and \"inch\".""" + """Length unit used by the width, height, and source properties. For images/tiff, the default unit + is pixel. For PDF, the default unit is inch. Known values are: \"pixel\" and \"inch\".""" pages: Optional[list["_models.DocumentPage"]] = rest_field( visibility=["read", "create", "update", "delete", "query"] ) @@ -1986,9 +1983,8 @@ class DocumentPage(_Model): :vartype height: float :ivar spans: Span(s) associated with the page in the markdown content. :vartype spans: list[~azure.ai.contentunderstanding.models.ContentSpan] - :ivar angle: The general orientation of the content in clockwise direction, - measured in degrees between (-180, 180]. - Only if enableOcr is true. + :ivar angle: The general orientation of the content in clockwise direction, measured in degrees + between (-180, 180]. Only if enableOcr is true. :vartype angle: float :ivar words: List of words in the page. Only if enableOcr and returnDetails are true. 
:vartype words: list[~azure.ai.contentunderstanding.models.DocumentWord] @@ -2013,9 +2009,8 @@ class DocumentPage(_Model): ) """Span(s) associated with the page in the markdown content.""" angle: Optional[float] = rest_field(visibility=["read", "create", "update", "delete", "query"]) - """The general orientation of the content in clockwise direction, - measured in degrees between (-180, 180]. - Only if enableOcr is true.""" + """The general orientation of the content in clockwise direction, measured in degrees between + (-180, 180]. Only if enableOcr is true.""" words: Optional[list["_models.DocumentWord"]] = rest_field( visibility=["read", "create", "update", "delete", "query"] ) @@ -2060,8 +2055,8 @@ def __init__(self, *args: Any, **kwargs: Any) -> None: class DocumentParagraph(_Model): - """Paragraph in a document, generally consisting of an contiguous sequence of lines - with common alignment and spacing. + """Paragraph in a document, generally consisting of an contiguous sequence of lines with common + alignment and spacing. :ivar role: Semantic role of the paragraph. Known values are: "pageHeader", "pageFooter", "pageNumber", "title", "sectionHeading", "footnote", and "formulaBlock". @@ -2286,9 +2281,9 @@ def __init__(self, *args: Any, **kwargs: Any) -> None: class DocumentWord(_Model): - """Word in a document, consisting of a contiguous sequence of characters. - For non-space delimited languages, such as Chinese, Japanese, and Korean, - each character is represented as its own word. + """Word in a document, consisting of a contiguous sequence of characters. For non-space delimited + languages, such as Chinese, Japanese, and Korean, each character is represented as its own + word. :ivar content: Word text. Required. :vartype content: str @@ -2866,13 +2861,13 @@ def __init__(self, *args: Any, **kwargs: Any) -> None: class UsageDetails(_Model): """Usage details. - :ivar document_pages_minimal: The number of document pages processed at the minimal level. - For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted - as one page. + :ivar document_pages_minimal: The number of document pages processed at the minimal level. For + documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted as + one page. :vartype document_pages_minimal: int - :ivar document_pages_basic: The number of document pages processed at the basic level. - For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted - as one page. + :ivar document_pages_basic: The number of document pages processed at the basic level. For + documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted as + one page. :vartype document_pages_basic: int :ivar document_pages_standard: The number of document pages processed at the standard level. For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted @@ -2893,21 +2888,18 @@ class UsageDetails(_Model): document_pages_minimal: Optional[int] = rest_field( name="documentPagesMinimal", visibility=["read", "create", "update", "delete", "query"] ) - """The number of document pages processed at the minimal level. - For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted - as one page.""" + """The number of document pages processed at the minimal level. For documents without explicit + pages (ex. 
txt, html), every 3000 UTF-16 characters is counted as one page.""" document_pages_basic: Optional[int] = rest_field( name="documentPagesBasic", visibility=["read", "create", "update", "delete", "query"] ) - """The number of document pages processed at the basic level. - For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted - as one page.""" + """The number of document pages processed at the basic level. For documents without explicit pages + (ex. txt, html), every 3000 UTF-16 characters is counted as one page.""" document_pages_standard: Optional[int] = rest_field( name="documentPagesStandard", visibility=["read", "create", "update", "delete", "query"] ) - """The number of document pages processed at the standard level. - For documents without explicit pages (ex. txt, html), every 3000 UTF-16 characters is counted - as one page.""" + """The number of document pages processed at the standard level. For documents without explicit + pages (ex. txt, html), every 3000 UTF-16 characters is counted as one page.""" audio_hours: Optional[float] = rest_field( name="audioHours", visibility=["read", "create", "update", "delete", "query"] ) diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_patch.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_patch.py index 123070695083..67a49cc50ae4 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_patch.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/azure/ai/contentunderstanding/models/_patch.py @@ -91,7 +91,9 @@ def operation_id(self) -> str: raise ValueError(f"Could not extract operation ID: {str(e)}") from e @classmethod - def from_poller(cls, poller: LROPoller[PollingReturnType_co]) -> "AnalyzeLROPoller[PollingReturnType_co]": # pyright: ignore[reportInvalidTypeArguments] + def from_poller( + cls, poller: LROPoller[PollingReturnType_co] + ) -> "AnalyzeLROPoller[PollingReturnType_co]": # pyright: ignore[reportInvalidTypeArguments] """Wrap an existing LROPoller without re-initializing the polling method. 
This avoids duplicate HTTP requests that would occur if we created a new @@ -103,7 +105,7 @@ def from_poller(cls, poller: LROPoller[PollingReturnType_co]) -> "AnalyzeLROPoll :rtype: AnalyzeLROPoller """ # Create instance without calling __init__ to avoid re-initialization - instance: AnalyzeLROPoller[PollingReturnType_co] = object.__new__(cls) # pyright: ignore[reportInvalidTypeArguments] + instance: "AnalyzeLROPoller[PollingReturnType_co]" = object.__new__(cls) # pyright: ignore[reportInvalidTypeArguments] # Copy all attributes from the original poller instance.__dict__.update(poller.__dict__) return instance diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/cspell.json b/sdk/contentunderstanding/azure-ai-contentunderstanding/cspell.json index 03b7fba517f5..dd30a951ede2 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/cspell.json +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/cspell.json @@ -2,6 +2,7 @@ "ignoreWords": [ "Agentic", "chartjs", + "esac", "laren", "Milsa", "nlaren", diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/pyproject.toml b/sdk/contentunderstanding/azure-ai-contentunderstanding/pyproject.toml index 17c5938d2fad..323b0d19b2d6 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/pyproject.toml +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/pyproject.toml @@ -32,7 +32,7 @@ keywords = ["azure", "azure sdk"] dependencies = [ "isodate>=0.6.1", - "azure-core>=1.35.0", + "azure-core>=1.37.0", "typing-extensions>=4.6.0", ] dynamic = [ @@ -59,13 +59,3 @@ exclude = [ [tool.setuptools.package-data] pytyped = ["py.typed"] - -[tool.pytest.ini_options] -testpaths = ["tests"] -norecursedirs = [ - "TempTypeSpecFiles", - ".venv", - "node_modules", - ".git", - "__pycache__", -] diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_invoice_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_invoice_async.py index 1436cfa2fc4a..6e368279c53c 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_invoice_async.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_invoice_async.py @@ -121,7 +121,11 @@ async def main() -> None: customer_name_field = document_content.fields.get("CustomerName") print(f"Customer Name: {customer_name_field.value or '(None)' if customer_name_field else '(None)'}") if customer_name_field: - print(f" Confidence: {customer_name_field.confidence:.2f}" if customer_name_field.confidence else " Confidence: N/A") + print( + f" Confidence: {customer_name_field.confidence:.2f}" + if customer_name_field.confidence + else " Confidence: N/A" + ) print(f" Source: {customer_name_field.source or 'N/A'}") if customer_name_field.spans and len(customer_name_field.spans) > 0: span = customer_name_field.spans[0] @@ -131,7 +135,11 @@ async def main() -> None: invoice_date_field = document_content.fields.get("InvoiceDate") print(f"Invoice Date: {invoice_date_field.value or '(None)' if invoice_date_field else '(None)'}") if invoice_date_field: - print(f" Confidence: {invoice_date_field.confidence:.2f}" if invoice_date_field.confidence else " Confidence: N/A") + print( + f" Confidence: {invoice_date_field.confidence:.2f}" + if invoice_date_field.confidence + else " Confidence: N/A" + ) print(f" Source: {invoice_date_field.source or 'N/A'}") if invoice_date_field.spans and 
len(invoice_date_field.spans) > 0: span = invoice_date_field.spans[0] @@ -144,12 +152,16 @@ async def main() -> None: currency_field = total_amount_field.value.get("CurrencyCode") amount = amount_field.value if amount_field else None # Use currency value if present, otherwise default to "" - currency = (currency_field.value if currency_field and currency_field.value else "") + currency = currency_field.value if currency_field and currency_field.value else "" if isinstance(amount, (int, float)): print(f"\nTotal: {currency}{amount:.2f}") else: print(f"\nTotal: {currency}{amount or '(None)'}") - print(f" Amount Confidence: {amount_field.confidence:.2f}" if amount_field and amount_field.confidence else " Amount Confidence: N/A") + print( + f" Amount Confidence: {amount_field.confidence:.2f}" + if amount_field and amount_field.confidence + else " Amount Confidence: N/A" + ) print(f" Source for Amount: {amount_field.source or 'N/A'}" if amount_field else " Source: N/A") # Extract array fields (collections like line items) @@ -164,7 +176,11 @@ async def main() -> None: quantity = quantity_field.value if quantity_field and quantity_field.value else "N/A" print(f" Item {i}: {description}") print(f" Quantity: {quantity}") - print(f" Quantity Confidence: {quantity_field.confidence:.2f}" if quantity_field and quantity_field.confidence else " Quantity Confidence: N/A") + print( + f" Quantity Confidence: {quantity_field.confidence:.2f}" + if quantity_field and quantity_field.confidence + else " Quantity Confidence: N/A" + ) # [END extract_invoice_fields] if not isinstance(credential, AzureKeyCredential): diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_analyzer_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_analyzer_async.py index 5ea8cce64869..a22391ae5cad 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_analyzer_async.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_analyzer_async.py @@ -116,9 +116,7 @@ async def main() -> None: ) # Create analyzer configuration - config = ContentAnalyzerConfig( - return_details=True - ) + config = ContentAnalyzerConfig(return_details=True) # Create the custom analyzer custom_analyzer = ContentAnalyzer( diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_invoice.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_invoice.py index 753a6112c622..184bc39f324d 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_invoice.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_invoice.py @@ -121,7 +121,11 @@ def main() -> None: customer_name_field = document_content.fields.get("CustomerName") print(f"Customer Name: {customer_name_field.value or '(None)' if customer_name_field else '(None)'}") if customer_name_field: - print(f" Confidence: {customer_name_field.confidence:.2f}" if customer_name_field.confidence else " Confidence: N/A") + print( + f" Confidence: {customer_name_field.confidence:.2f}" + if customer_name_field.confidence + else " Confidence: N/A" + ) print(f" Source: {customer_name_field.source or 'N/A'}") if customer_name_field.spans and len(customer_name_field.spans) > 0: span = customer_name_field.spans[0] @@ -131,7 +135,11 @@ def main() -> None: invoice_date_field = document_content.fields.get("InvoiceDate") 
print(f"Invoice Date: {invoice_date_field.value or '(None)' if invoice_date_field else '(None)'}") if invoice_date_field: - print(f" Confidence: {invoice_date_field.confidence:.2f}" if invoice_date_field.confidence else " Confidence: N/A") + print( + f" Confidence: {invoice_date_field.confidence:.2f}" + if invoice_date_field.confidence + else " Confidence: N/A" + ) print(f" Source: {invoice_date_field.source or 'N/A'}") if invoice_date_field.spans and len(invoice_date_field.spans) > 0: span = invoice_date_field.spans[0] @@ -144,12 +152,16 @@ def main() -> None: currency_field = total_amount_field.value.get("CurrencyCode") amount = amount_field.value if amount_field else None # Use currency value if present, otherwise default to "" - currency = (currency_field.value if currency_field and currency_field.value else "") + currency = currency_field.value if currency_field and currency_field.value else "" if isinstance(amount, (int, float)): print(f"\nTotal: {currency}{amount:.2f}") else: print(f"\nTotal: {currency}{amount or '(None)'}") - print(f" Amount Confidence: {amount_field.confidence:.2f}" if amount_field and amount_field.confidence else " Amount Confidence: N/A") + print( + f" Amount Confidence: {amount_field.confidence:.2f}" + if amount_field and amount_field.confidence + else " Amount Confidence: N/A" + ) print(f" Source for Amount: {amount_field.source or 'N/A'}" if amount_field else " Source: N/A") # Extract array fields (collections like line items) @@ -164,7 +176,11 @@ def main() -> None: quantity = quantity_field.value if quantity_field and quantity_field.value else "N/A" print(f" Item {i}: {description}") print(f" Quantity: {quantity}") - print(f" Quantity Confidence: {quantity_field.confidence:.2f}" if quantity_field and quantity_field.confidence else " Quantity Confidence: N/A") + print( + f" Quantity Confidence: {quantity_field.confidence:.2f}" + if quantity_field and quantity_field.confidence + else " Quantity Confidence: N/A" + ) # [END extract_invoice_fields] diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_url.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_url.py index 90a136a93255..87c87dc77905 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_url.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_analyze_url.py @@ -173,7 +173,9 @@ def main() -> None: print("\n" + "=" * 60) print("IMAGE ANALYSIS FROM URL") print("=" * 60) - image_url = "https://raw.githubusercontent.com/Azure-Samples/azure-ai-content-understanding-assets/main/image/pieChart.jpg" + image_url = ( + "https://raw.githubusercontent.com/Azure-Samples/azure-ai-content-understanding-assets/main/image/pieChart.jpg" + ) print(f"Analyzing image from URL with prebuilt-imageSearch...") print(f" URL: {image_url}") diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_delete_result.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_delete_result.py index 0cbb9ae13940..4b27c59e7fbe 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_delete_result.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_delete_result.py @@ -60,9 +60,7 @@ def main() -> None: # [START analyze_and_delete_result] # You can replace this URL with your own invoice file URL - document_url = ( - 
"https://raw.githubusercontent.com/Azure-Samples/azure-ai-content-understanding-assets/main/document/invoice.pdf" - ) + document_url = "https://raw.githubusercontent.com/Azure-Samples/azure-ai-content-understanding-assets/main/document/invoice.pdf" # Step 1: Analyze and wait for completion analyze_operation = client.begin_analyze( diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_get_analyzer.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_get_analyzer.py index 03468a2b565d..93711d37097e 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_get_analyzer.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/samples/sample_get_analyzer.py @@ -116,9 +116,7 @@ def main() -> None: ) # Create analyzer configuration - config = ContentAnalyzerConfig( - return_details=True - ) + config = ContentAnalyzerConfig(return_details=True) # Create the custom analyzer custom_analyzer = ContentAnalyzer( diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/conftest.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/conftest.py index 5855efe4c263..50c941f85c6c 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/conftest.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/conftest.py @@ -1,3 +1,4 @@ +# pylint: disable=line-too-long,useless-suppression # coding=utf-8 # -------------------------------------------------------------------------- # Copyright (c) Microsoft Corporation. All rights reserved. @@ -30,10 +31,10 @@ def start_proxy(test_proxy): @pytest.fixture(scope="session", autouse=True) def configure_test_proxy_matcher(test_proxy): """Configure the test proxy to handle LRO polling request matching. - + LRO operations (like begin_analyze) make multiple identical GET requests to poll status. The test proxy must match these requests in the correct order. We configure: - + 1. compare_bodies=False: Don't match on body content (polling requests have no body) 2. excluded_headers: Completely exclude these headers from matching consideration. These headers vary between recording and playback environments: @@ -45,7 +46,7 @@ def configure_test_proxy_matcher(test_proxy): """ set_custom_default_matcher( compare_bodies=False, - excluded_headers="User-Agent,x-ms-client-request-id,x-ms-request-id,Authorization,Content-Length,Accept,Connection" + excluded_headers="User-Agent,x-ms-client-request-id,x-ms-request-id,Authorization,Content-Length,Accept,Connection", ) @@ -96,11 +97,8 @@ def add_sanitizers(test_proxy): # Sanitize endpoint URLs to match DocumentIntelligence SDK pattern # Normalize any endpoint hostname to "Sanitized" to ensure recordings match between recording and playback # This regex matches the hostname part (between // and .services.ai.azure.com) and replaces it with "Sanitized" - add_general_regex_sanitizer( - value="Sanitized", - regex="(?<=\\/\\/)[^/]+(?=\\.services\\.ai\\.azure\\.com)" - ) - + add_general_regex_sanitizer(value="Sanitized", regex="(?<=\\/\\/)[^/]+(?=\\.services\\.ai\\.azure\\.com)") + # Sanitize Operation-Location headers specifically (used by LRO polling) # This ensures the poller uses the correct endpoint URL during playback # IMPORTANT: Do NOT use lookahead (?=...) 
as it doesn't consume the match, @@ -108,7 +106,7 @@ def add_sanitizers(test_proxy): add_header_regex_sanitizer( key="Operation-Location", value="https://Sanitized.services.ai.azure.com", - regex=r"https://[a-zA-Z0-9\-]+\.services\.ai\.azure\.com" + regex=r"https://[a-zA-Z0-9\-]+\.services\.ai\.azure\.com", ) # Sanitize Ocp-Apim-Subscription-Key header (where the API key is sent) @@ -133,27 +131,19 @@ def add_sanitizers(test_proxy): # This ensures that real resource IDs and regions are sanitized before being stored in test proxy variables source_resource_id = os.environ.get("CONTENTUNDERSTANDING_SOURCE_RESOURCE_ID", "") if source_resource_id and source_resource_id != "placeholder-source-resource-id": - add_general_string_sanitizer( - target=source_resource_id, value="placeholder-source-resource-id" - ) - + add_general_string_sanitizer(target=source_resource_id, value="placeholder-source-resource-id") + source_region = os.environ.get("CONTENTUNDERSTANDING_SOURCE_REGION", "") if source_region and source_region != "placeholder-source-region": - add_general_string_sanitizer( - target=source_region, value="placeholder-source-region" - ) - + add_general_string_sanitizer(target=source_region, value="placeholder-source-region") + target_resource_id = os.environ.get("CONTENTUNDERSTANDING_TARGET_RESOURCE_ID", "") if target_resource_id and target_resource_id != "placeholder-target-resource-id": - add_general_string_sanitizer( - target=target_resource_id, value="placeholder-target-resource-id" - ) - + add_general_string_sanitizer(target=target_resource_id, value="placeholder-target-resource-id") + target_region = os.environ.get("CONTENTUNDERSTANDING_TARGET_REGION", "") if target_region and target_region != "placeholder-target-region": - add_general_string_sanitizer( - target=target_region, value="placeholder-target-region" - ) + add_general_string_sanitizer(target=target_region, value="placeholder-target-region") # Sanitize dynamic analyzer IDs in URLs only # Note: We don't sanitize analyzer IDs in response bodies because tests using variables diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary.py index a25730c6a07b..08b942a7db9c 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary.py @@ -74,9 +74,7 @@ def test_sample_analyze_binary(self, contentunderstanding_endpoint: str) -> None print("[PASS] Binary data created successfully") # Analyze the document - poller = client.begin_analyze_binary( - analyzer_id="prebuilt-documentSearch", binary_input=file_bytes - ) + poller = client.begin_analyze_binary(analyzer_id="prebuilt-documentSearch", binary_input=file_bytes) result = poller.result() diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary_async.py index 8b7ff47ada2b..9dcee0455ec8 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary_async.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_binary_async.py @@ -74,9 +74,7 @@ async def test_sample_analyze_binary_async(self, contentunderstanding_endpoint: print("[PASS] Binary data 
created successfully") # Analyze the document - poller = await client.begin_analyze_binary( - analyzer_id="prebuilt-documentSearch", binary_input=file_bytes - ) + poller = await client.begin_analyze_binary(analyzer_id="prebuilt-documentSearch", binary_input=file_bytes) result = await poller.result() diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs.py index 6708332ac186..8731c12ae32a 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs.py @@ -153,7 +153,7 @@ def _test_document_features(self, content): formulas = getattr(page, "formulas", None) if formulas: formulas_count += len(formulas) - + if formulas_count > 0: print(f"[PASS] Found {formulas_count} formula(s) in document pages") else: diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs_async.py index 1d7b5aade82d..e8df13cbde0e 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs_async.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_configs_async.py @@ -154,7 +154,7 @@ def _test_document_features(self, content): formulas = getattr(page, "formulas", None) if formulas: formulas_count += len(formulas) - + if formulas_count > 0: print(f"[PASS] Found {formulas_count} formula(s) in document pages") else: diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url.py index 0487d18900e6..170a69b70418 100644 --- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url.py +++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url.py @@ -96,7 +96,9 @@ def test_sample_analyze_video_from_url(self, contentunderstanding_endpoint: str) # Analyze the video # Use 10-second polling interval for video analysis (longer processing time) - poller = client.begin_analyze(analyzer_id="prebuilt-videoSearch", inputs=[AnalyzeInput(url=url)], polling_interval=10) + poller = client.begin_analyze( + analyzer_id="prebuilt-videoSearch", inputs=[AnalyzeInput(url=url)], polling_interval=10 + ) result = poller.result() @@ -138,8 +140,10 @@ def test_sample_analyze_audio_from_url(self, contentunderstanding_endpoint: str) # Analyze the audio # Use 10-second polling interval for audio analysis (longer processing time) - poller = client.begin_analyze(analyzer_id="prebuilt-audioSearch", inputs=[AnalyzeInput(url=url)], polling_interval=10) - + poller = client.begin_analyze( + analyzer_id="prebuilt-audioSearch", inputs=[AnalyzeInput(url=url)], polling_interval=10 + ) + result = poller.result() # Assertion: Verify analysis operation completed @@ -179,7 +183,7 @@ def test_sample_analyze_image_from_url(self, contentunderstanding_endpoint: str) # Analyze the image poller = client.begin_analyze(analyzer_id="prebuilt-imageSearch", inputs=[AnalyzeInput(url=url)]) - + result = poller.result() # Assertion: Verify analysis operation completed diff --git 
a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url_async.py
index 9d5118317624..b2ed05c1c843 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_analyze_url_async.py
@@ -51,9 +51,7 @@ async def test_sample_analyze_document_from_url_async(self, contentunderstanding
         print(f"[PASS] Analyzing document from URL: {url}")
 
         # Analyze the document
-        poller = await client.begin_analyze(
-            analyzer_id="prebuilt-documentSearch", inputs=[AnalyzeInput(url=url)]
-        )
+        poller = await client.begin_analyze(analyzer_id="prebuilt-documentSearch", inputs=[AnalyzeInput(url=url)])
 
         result = await poller.result()
 
@@ -153,7 +151,7 @@ async def test_sample_analyze_audio_from_url_async(self, contentunderstanding_en
         poller = await client.begin_analyze(
             analyzer_id="prebuilt-audioSearch", inputs=[AnalyzeInput(url=url)], polling_interval=10
         )
-
+
         result = await poller.result()
 
         # Assertion: Verify analysis operation completed
@@ -193,10 +191,8 @@ async def test_sample_analyze_image_from_url_async(self, contentunderstanding_en
         print(f"[PASS] Analyzing image from URL: {url}")
 
         # Analyze the image
-        poller = await client.begin_analyze(
-            analyzer_id="prebuilt-imageSearch", inputs=[AnalyzeInput(url=url)]
-        )
-
+        poller = await client.begin_analyze(analyzer_id="prebuilt-imageSearch", inputs=[AnalyzeInput(url=url)])
+
         result = await poller.result()
 
         # Assertion: Verify analysis operation completed
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer.py
index bfb39edb4681..aadcde13d34c 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer.py
@@ -54,7 +54,7 @@ def test_sample_copy_analyzer(self, contentunderstanding_endpoint: str, **kwargs
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         try:
             client = self.create_client(endpoint=contentunderstanding_endpoint)
 
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer_async.py
index c00445c2fe52..ea915781a64f 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_copy_analyzer_async.py
@@ -53,7 +53,7 @@ async def test_sample_copy_analyzer_async(self, contentunderstanding_endpoint: s
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         try:
             client = self.create_async_client(endpoint=contentunderstanding_endpoint)
 
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer.py
index 6e746e583d93..4a0dcc3b9bbe 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer.py
@@ -51,7 +51,7 @@ def test_sample_create_analyzer(self, contentunderstanding_endpoint: str, **kwar
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
@@ -175,7 +175,7 @@ def test_sample_create_analyzer(self, contentunderstanding_endpoint: str, **kwar
                 print(f"[PASS] Cleanup: Analyzer '{analyzer_id}' deleted")
             except Exception as e:
                 print(f"[WARN] Cleanup failed: {str(e)}")
-
+
         print("\n[SUCCESS] All test_sample_create_analyzer assertions passed")
 
         # Return variables to be recorded for playback mode
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer_async.py
index a9baf60d3109..f9fc9bbd09f9 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_analyzer_async.py
@@ -50,7 +50,7 @@ async def test_sample_create_analyzer_async(self, contentunderstanding_endpoint:
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_async_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier.py
index 18970c769173..6bdbe979db82 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier.py
@@ -51,7 +51,7 @@ def test_sample_create_classifier(self, contentunderstanding_endpoint: str, **kw
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
@@ -168,7 +168,7 @@ def test_sample_analyze_with_classifier(self, contentunderstanding_endpoint: str
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier_async.py
index a5a5c29807e9..97a2510cd86c 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_create_classifier_async.py
@@ -50,7 +50,7 @@ async def test_sample_create_classifier_async(self, contentunderstanding_endpoin
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_async_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
@@ -148,7 +148,9 @@ async def test_sample_create_classifier_async(self, contentunderstanding_endpoin
 
     @ContentUnderstandingPreparer()
     @recorded_by_proxy_async
-    async def test_sample_analyze_with_classifier_async(self, contentunderstanding_endpoint: str, **kwargs) -> Dict[str, str]:
+    async def test_sample_analyze_with_classifier_async(
+        self, contentunderstanding_endpoint: str, **kwargs
+    ) -> Dict[str, str]:
         """Test analyzing a document with a classifier to categorize content into segments (async version).
 
         This test validates:
@@ -160,7 +162,7 @@ async def test_sample_analyze_with_classifier_async(self, contentunderstanding_e
         """
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         client = self.create_async_client(endpoint=contentunderstanding_endpoint)
 
         # Generate a unique analyzer ID
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_delete_result.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_delete_result.py
index 03406673cb9a..f0df0bad611c 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_delete_result.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_delete_result.py
@@ -56,9 +56,7 @@ def test_sample_delete_result(self, contentunderstanding_endpoint: str) -> None:
         print(f"[PASS] File loaded: {len(file_bytes)} bytes")
 
         # Analyze to get an operation ID
-        analyze_operation = client.begin_analyze(
-            analyzer_id="prebuilt-invoice", inputs=[AnalyzeInput(data=file_bytes)]
-        )
+        analyze_operation = client.begin_analyze(analyzer_id="prebuilt-invoice", inputs=[AnalyzeInput(data=file_bytes)])
 
         result: AnalyzeResult = analyze_operation.result()
 
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth.py
index e51909d3a4bc..38e68a09da78 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth.py
@@ -59,7 +59,7 @@ def test_sample_grant_copy_auth(self, contentunderstanding_endpoint: str, **kwar
 
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         try:
             # Always use placeholder values in variables to avoid storing real resource IDs/regions
             # Real values are read from environment for API calls (they'll be sanitized in request bodies)
@@ -69,7 +69,7 @@ def test_sample_grant_copy_auth(self, contentunderstanding_endpoint: str, **kwar
             source_region = variables.setdefault("source_region", "placeholder-source-region")
             target_resource_id = variables.setdefault("target_resource_id", "placeholder-target-resource-id")
             target_region = variables.setdefault("target_region", "placeholder-target-region")
-
+
             # For actual API calls, use real values from environment if available (in live mode)
             # These will be sanitized in request/response bodies by conftest sanitizers
             if is_live():
@@ -88,7 +88,7 @@ def test_sample_grant_copy_auth(self, contentunderstanding_endpoint: str, **kwar
                 env_target_region = os.environ.get("CONTENTUNDERSTANDING_TARGET_REGION")
                 if env_target_region:
                     target_region = env_target_region
-
+
             target_key = os.environ.get("CONTENTUNDERSTANDING_TARGET_KEY")
 
             # Only require environment variables in live mode
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth_async.py
index f55e6dcde840..cfc77800ab5e 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_grant_copy_auth_async.py
@@ -60,7 +60,7 @@ async def test_sample_grant_copy_auth_async(self, contentunderstanding_endpoint:
 
         # Get variables from test proxy (recorded values in playback, empty dict in recording)
         variables = kwargs.pop("variables", {})
-
+
         try:
             # Always use placeholder values in variables to avoid storing real resource IDs/regions
             # Real values are read from environment for API calls (they'll be sanitized in request bodies)
@@ -70,7 +70,7 @@ async def test_sample_grant_copy_auth_async(self, contentunderstanding_endpoint:
             source_region = variables.setdefault("source_region", "placeholder-source-region")
             target_resource_id = variables.setdefault("target_resource_id", "placeholder-target-resource-id")
             target_region = variables.setdefault("target_region", "placeholder-target-region")
-
+
             # For actual API calls, use real values from environment if available (in live mode)
             # These will be sanitized in request/response bodies by conftest sanitizers
             if is_live():
@@ -89,7 +89,7 @@ async def test_sample_grant_copy_auth_async(self, contentunderstanding_endpoint:
                 env_target_region = os.environ.get("CONTENTUNDERSTANDING_TARGET_REGION")
                 if env_target_region:
                     target_region = env_target_region
-
+
             target_key = os.environ.get("CONTENTUNDERSTANDING_TARGET_KEY")
 
             # Only require environment variables in live mode
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults.py
index 45ffecd9cd0d..78c07108bb43 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults.py
@@ -1,3 +1,4 @@
+# pylint: disable=line-too-long,useless-suppression
 # coding: utf-8
 
 # -------------------------------------------------------------------------
@@ -51,17 +52,13 @@ def test_sample_update_defaults(self, contentunderstanding_endpoint: str, **kwar
 
         # Get deployment names from variables (playback) or environment (recording)
         # If not found, use defaults and record them
-        gpt_4_1_deployment = variables.setdefault(
-            "gpt_4_1_deployment",
-            os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1")
-        )
+        gpt_4_1_deployment = variables.setdefault("gpt_4_1_deployment", os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1"))
         gpt_4_1_mini_deployment = variables.setdefault(
-            "gpt_4_1_mini_deployment",
-            os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
+            "gpt_4_1_mini_deployment", os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
         )
         text_embedding_3_large_deployment = variables.setdefault(
             "text_embedding_3_large_deployment",
-            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large")
+            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large"),
         )
 
         client = self.create_client(endpoint=contentunderstanding_endpoint)
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults_async.py
index 540b73ea53f7..2521494c07bc 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/samples/test_sample_update_defaults_async.py
@@ -1,3 +1,4 @@
+# pylint: disable=line-too-long,useless-suppression
 # coding: utf-8
 
 # -------------------------------------------------------------------------
@@ -51,17 +52,13 @@ async def test_sample_update_defaults_async(self, contentunderstanding_endpoint:
 
         # Get deployment names from variables (playback) or environment (recording)
         # If not found, use defaults and record them
-        gpt_4_1_deployment = variables.setdefault(
-            "gpt_4_1_deployment",
-            os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1")
-        )
+        gpt_4_1_deployment = variables.setdefault("gpt_4_1_deployment", os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1"))
         gpt_4_1_mini_deployment = variables.setdefault(
-            "gpt_4_1_mini_deployment",
-            os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
+            "gpt_4_1_mini_deployment", os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
         )
         text_embedding_3_large_deployment = variables.setdefault(
             "text_embedding_3_large_deployment",
-            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large")
+            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large"),
         )
 
         client = self.create_async_client(endpoint=contentunderstanding_endpoint)
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations.py
index f6b809553f88..bf11afd665d7 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations.py
@@ -216,9 +216,7 @@ class TestContentUnderstandingContentAnalyzersOperations(ContentUnderstandingCli
 
     @ContentUnderstandingPreparer()
     @recorded_by_proxy
-    def test_content_analyzers_begin_create_with_content_analyzer(
-        self, contentunderstanding_endpoint: str
-    ) -> None:
+    def test_content_analyzers_begin_create_with_content_analyzer(self, contentunderstanding_endpoint: str) -> None:
         """
         Test Summary:
         - Create analyzer using ContentAnalyzer object
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations_async.py
index 897faa12e26e..54406553e467 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_content_understanding_content_analyzers_operations_async.py
@@ -225,17 +225,13 @@ async def test_update_defaults_async(self, contentunderstanding_endpoint: str, *
 
         # Get deployment names from variables (playback) or environment (recording)
        # If not found, use defaults and record them
-        gpt_4_1_deployment = variables.setdefault(
-            "gpt_4_1_deployment",
-            os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1")
-        )
+        gpt_4_1_deployment = variables.setdefault("gpt_4_1_deployment", os.getenv("GPT_4_1_DEPLOYMENT", "gpt-4.1"))
         gpt_4_1_mini_deployment = variables.setdefault(
-            "gpt_4_1_mini_deployment",
-            os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
+            "gpt_4_1_mini_deployment", os.getenv("GPT_4_1_MINI_DEPLOYMENT", "gpt-4.1-mini")
         )
         text_embedding_3_large_deployment = variables.setdefault(
             "text_embedding_3_large_deployment",
-            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large")
+            os.getenv("TEXT_EMBEDDING_3_LARGE_DEPLOYMENT", "text-embedding-3-large"),
         )
 
         client: ContentUnderstandingClient = self.create_async_client(endpoint=contentunderstanding_endpoint)
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_helpers.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_helpers.py
index 898ba4aa7958..0f11ab90e640 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_helpers.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/test_helpers.py
@@ -79,7 +79,7 @@ def new_simple_content_analyzer_object(
             description="schema description here",
             name="schema name here",
         ),
-        processing_location=ProcessingLocation.GLOBAL,
+        processing_location=ProcessingLocation.GLOBALEnum,
         models={"completion": "gpt-4.1"},  # Required when using field_schema
         tags=tags,
     )
@@ -109,7 +109,7 @@ def new_marketing_video_analyzer_object(
             return_details=True,
         ),
         description=description,
-        processing_location=ProcessingLocation.GLOBAL,
+        processing_location=ProcessingLocation.GLOBALEnum,
         models={"completion": "gpt-4.1"},  # Required when using field_schema
         tags=tags,
     )
@@ -460,7 +460,7 @@ def new_invoice_analyzer_object(
             description="Invoice field extraction schema",
             name="invoice_schema",
         ),
-        processing_location=ProcessingLocation.GLOBAL,
+        processing_location=ProcessingLocation.GLOBALEnum,
         models={"completion": "gpt-4.1"},  # Required when using field_schema
         tags=tags,
     )
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer.py
index 225df9600f34..af4e29821e84 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer.py
@@ -18,7 +18,7 @@ class ContentUnderstandingClientTestBase(AzureRecordedTestCase):
     def create_client(self, endpoint: str) -> ContentUnderstandingClient:
         # Normalize endpoint: remove trailing slashes to prevent double slashes in URLs
         endpoint = endpoint.rstrip("/")
-
+
         # Try API key first (for Content Understanding service)
         # Check CONTENTUNDERSTANDING_KEY
         key = os.getenv("CONTENTUNDERSTANDING_KEY")
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer_async.py b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer_async.py
index a62e6f610115..7b13ec166d01 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer_async.py
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tests/testpreparer_async.py
@@ -18,7 +18,7 @@ class ContentUnderstandingClientTestBaseAsync(AzureRecordedTestCase):
     def create_async_client(self, endpoint: str) -> ContentUnderstandingClient:
         # Normalize endpoint: remove trailing slashes to prevent double slashes in URLs
         endpoint = endpoint.rstrip("/")
-
+
         # Try API key first (for Content Understanding service)
         # Check CONTENTUNDERSTANDING_KEY
         key = os.getenv("CONTENTUNDERSTANDING_KEY")
diff --git a/sdk/contentunderstanding/azure-ai-contentunderstanding/tsp-location.yaml b/sdk/contentunderstanding/azure-ai-contentunderstanding/tsp-location.yaml
index 276e0386c962..a996fcaa3754 100644
--- a/sdk/contentunderstanding/azure-ai-contentunderstanding/tsp-location.yaml
+++ b/sdk/contentunderstanding/azure-ai-contentunderstanding/tsp-location.yaml
@@ -1,4 +1,4 @@
 directory: specification/ai/ContentUnderstanding
-commit: a3291026612253abe544704a27bfad1dbdd5dcc2
+commit: 5fdd87d51fd8d9f030d7d96ca678aa029877d843
 repo: Azure/azure-rest-api-specs
 additionalDirectories: