Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .github/workflows/pr.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]

steps:
- uses: actions/checkout@v4
Expand Down
87 changes: 87 additions & 0 deletions HISTORY.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,93 @@
Release History
===============

0.2.1b7
+++++++

Build stage re-entry fix
~~~~~~~~~~~~~~~~~~~~~~~~~~
* **QA remediation failure now retries on re-entry** — fixed a bug where
``mark_stage_generated()`` was called after each remediation attempt
inside ``_run_stage_qa()``, leaving the stage with status ``"generated"``
even when QA subsequently failed. On re-entry, the stage was skipped
instead of retried. Changed to ``mark_stage_validating()`` so failed
stages remain in the retry list.

QA checklist hardening
~~~~~~~~~~~~~~~~~~~~~~~~
* **Aligned response_export_values directive** — QA checklist now requires
``response_export_values = ["*"]`` on EVERY ``azapi_resource``, matching
the terraform agent's mandatory rule (was conditional on output usage).
* **Added deploy.sh -state= flag check** — QA checklist now flags use of
``terraform output -state=`` which was removed in Terraform 1.9.
* **Added UUID hex validation** — QA checklist now checks that UUID values
in role assignment names contain only valid hex characters ``[0-9a-f]``.

Full stage retry on QA exhaustion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Full stage retry when QA remediation fails** -- when QA remediation
exhausts all attempts for a stage, the build now retries the entire
stage from scratch (clean artifacts, regenerate, QA) instead of
stopping the build immediately. Previous QA findings are injected
into the new generation prompt — framed as guidance rather than
file-specific instructions — so the model avoids the same classes
of mistakes on the fresh attempt.

In practice, the same generation prompt produces passing code ~90%
of the time. The remaining ~10% failure rate is stochastic — not a
systematic prompt deficiency — meaning a fresh generation with
knowledge of what went wrong almost always succeeds. Without this
retry, that 10% forces the user to manually re-run the entire build,
losing the progress of all previously generated stages. The retry
doubles the token cost of one stage in the worst case, but saves
the full cost of restarting a 16-stage build from scratch.

Controlled by ``_MAX_FULL_STAGE_ATTEMPTS`` (default 2: 1 initial
+ 1 fresh retry).

Generation prompt improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Front-loaded remote state no-dead-code directive** — when upstream
stages exist, a ``CROSS-STAGE DEPENDENCIES — NO DEAD CODE`` section
now appears before the architecture context in the generation prompt,
reducing unused ``terraform_remote_state`` data sources.

Agent-level service filtering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Agent governance checks now filter by service namespace** — added
``stage_services`` field to ``AgentContext``, populated by
``_agent_build_context()``. ``_apply_governance_check()`` now passes
stage services to ``validate_response()``, reducing false positive
anti-pattern warnings for irrelevant service namespaces.

ReDoS fix in transform handlers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Replaced nested-quantifier regex with brace counting** — extracted
shared ``_find_azapi_blocks()`` helper and rewrote
``_add_response_export_values``, ``_add_resource_group_parent_id``,
and ``_remove_private_endpoint_resources`` to use it. Eliminates
potential exponential backtracking on pathological input.

Test suite consolidation
~~~~~~~~~~~~~~~~~~~~~~~~~~
* **Consolidated and enhanced unit test coverage** — migrated flat test
files to a mirrored directory structure (1:1 test-to-source mapping),
merged split test files, and removed ~114 duplicate tests across 10
files. Test suite reduced from 3,644 to 3,530 tests with zero loss
of unique coverage.

QA review continuation for large stages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* **QA review collects complete response before evaluating** — when the
QA review response is truncated (``finish_reason=length``), the build
session now continues requesting until the full review is received,
then evaluates the concatenated result. Uses the existing
``_execute_with_continuation()`` pattern with a review-specific
continuation prompt that prevents the QA agent from generating code
in the continuation. Conversation history is saved and restored
around QA calls to prevent review messages from contaminating
subsequent stage generation.

0.2.1b6
+++++++

Expand Down
3 changes: 2 additions & 1 deletion azext_prototype/agents/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ class AgentContext:
artifacts: dict[str, Any] = field(default_factory=dict)
shared_state: dict[str, Any] = field(default_factory=dict)
mcp_manager: Any = None # MCPManager | None — typed as Any to avoid circular import
stage_services: list[str] | None = None # ARM namespaces for service filtering

def add_artifact(self, key: str, value: Any):
"""Store an artifact for other agents to reference."""
Expand Down Expand Up @@ -299,7 +300,7 @@ def _apply_governance_check(self, response: AIResponse, context: AgentContext) -
avoid duplicating the governance warning block.
"""
iac_tool = context.project_config.get("project", {}).get("iac_tool") if context.project_config else None
warnings = self.validate_response(response.content, iac_tool=iac_tool, services=None)
warnings = self.validate_response(response.content, iac_tool=iac_tool, services=context.stage_services)
if warnings:
for w in warnings:
logger.warning("Governance: %s", w)
Expand Down
8 changes: 6 additions & 2 deletions azext_prototype/agents/builtin/qa_engineer.py
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,8 @@ def _encode_image(path: str) -> str:
- [ ] deploy.sh includes error handling (set -euo pipefail, trap)
- [ ] deploy.sh exports outputs to JSON file for downstream stages
- [ ] deploy.sh includes Azure login verification
- [ ] deploy.sh does NOT use `terraform output -state=` — this flag was removed
in Terraform 1.9. Use `jq` on the state file or `cd` into the stage directory

### 4. Output Completeness
- [ ] outputs.tf exports resource group name(s)
Expand All @@ -251,8 +253,8 @@ def _encode_image(path: str) -> str:
- [ ] All referenced variables are defined in variables.tf
- [ ] All referenced locals are defined in locals.tf
- [ ] Application code includes all referenced classes/models/DTOs
- [ ] Every azapi_resource whose `.output.properties` is referenced in
outputs.tf MUST have `response_export_values = ["*"]` declared
- [ ] EVERY `azapi_resource` block MUST have `response_export_values = ["*"]`
declared — no exceptions, even if outputs.tf does not reference its properties
- [ ] No .tf file is empty or contains only comments (dead files)

### 7. Terraform File Structure
Expand Down Expand Up @@ -314,6 +316,8 @@ def _encode_image(path: str) -> str:
**NOT** string interpolation on the storage account ID
- [ ] RBAC assignments for the worker identity (Stage 1) are **unconditional**
(no `count`). The worker identity exists before any service stage runs.
- [ ] UUID values in role assignment names contain only valid hex characters
`[0-9a-f]` — letters `g`-`z` are invalid and ARM rejects with `InvalidName`

### 13. Application Code (app stages only)
- [ ] Application source code is syntactically correct and complete
Expand Down
11 changes: 8 additions & 3 deletions azext_prototype/ai/token_tracker.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,30 +44,35 @@
# GitHub Copilot Premium Request Unit (PRU) multipliers.
# Each API call costs (1 × multiplier) PRUs. Only applies to the
# Copilot provider — models not in this table produce 0 PRUs.
# Source: https://docs.github.com/en/copilot/concepts/billing/copilot-requests
# Source: https://docs.github.com/en/copilot/managing-copilot/monitoring-usage-and-entitlements/about-premium-requests
# Last updated: 2026-04-08
_PRU_MULTIPLIERS: dict[str, float] = {
# Included with paid plans (0 PRUs)
"gpt-5-mini": 0,
"gpt-4.1": 0,
"gpt-4o": 0,
"raptor-mini": 0,
# Low-cost (0.25–0.33 PRUs per request)
"grok-code-fast-1": 0.25,
"claude-haiku-4.5": 0.33,
"gemini-3-flash": 0.33,
"gpt-5.1-codex-mini": 0.33,
"gpt-5.4-mini": 0.33,
# Standard (1 PRU per request)
"claude-sonnet-4": 1,
"claude-sonnet-4.5": 1,
"claude-sonnet-4.6": 1,
"gemini-2.5-pro": 1,
"gemini-3-pro": 1,
"gemini-3-pro-1.5": 1,
"gemini-3.1-pro": 1,
"gpt-5.1": 1,
"gpt-5.2": 1,
"gpt-5.2-codex": 1,
"gpt-5.3-codex": 1,
"gpt-5.4": 1,
# Premium (3+ PRUs per request)
"claude-opus-4.5": 3,
"claude-opus-4.6": 3,
"claude-opus-4.6-fast": 30,
}


Expand Down
2 changes: 1 addition & 1 deletion azext_prototype/azext_metadata.json
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
"azext.isPreview": true,
"azext.minCliCoreVersion": "2.50.0",
"name": "prototype",
"version": "0.2.1b6",
"version": "0.2.1b7",
"azext.summary": "Azure CLI extension for building rapid prototypes with GitHub Copilot.",
"license": "MIT",
"classifiers": [
Expand Down
Loading
Loading