diff --git a/docs/README.skills.md b/docs/README.skills.md
index 3a35dce3c..dd4b8bacd 100644
--- a/docs/README.skills.md
+++ b/docs/README.skills.md
@@ -47,6 +47,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
| [arduino-azure-iot-edge-integration](../skills/arduino-azure-iot-edge-integration/SKILL.md)
`gh skills install github/awesome-copilot arduino-azure-iot-edge-integration` | Design and implement Arduino integration with Azure IoT Hub and IoT Edge, including secure provisioning, resilient telemetry, command handling, and production guardrails. | `references/arduino-iot-checklist.md`
`references/arduino-official-best-practices.md` |
| [arize-ai-provider-integration](../skills/arize-ai-provider-integration/SKILL.md)
`gh skills install github/awesome-copilot arize-ai-provider-integration` | Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize. | `references/ax-profiles.md`
`references/ax-setup.md` |
| [arize-annotation](../skills/arize-annotation/SKILL.md)
`gh skills install github/awesome-copilot arize-annotation` | Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review. | `references/ax-profiles.md`
`references/ax-setup.md` |
+| [arize-compliance-audit](../skills/arize-compliance-audit/SKILL.md)
`gh skills install github/awesome-copilot arize-compliance-audit` | INVOKE THIS SKILL when auditing an AI agent or LLM app for regulatory compliance. Covers EU AI Act, GPAI Code of Practice, GDPR, NIST AI RMF, Colorado AI Act, HIPAA, and ISO 42001. Scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces an actionable remediation checklist tailored to the selected frameworks. | `references/compliance-checklist-template.md`
`references/eu-ai-act-gpai.md`
`references/iso-42001.md`
`references/us-ai-compliance.md` |
| [arize-dataset](../skills/arize-dataset/SKILL.md)
`gh skills install github/awesome-copilot arize-dataset` | Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set. | `references/ax-profiles.md`
`references/ax-setup.md` |
| [arize-evaluator](../skills/arize-evaluator/SKILL.md)
`gh skills install github/awesome-copilot arize-evaluator` | Handles LLM-as-judge evaluation workflows on Arize including creating/updating evaluators, running evaluations on spans or experiments, managing tasks, trigger-run operations, column mapping, and continuous monitoring. Use when the user mentions create evaluator, LLM judge, hallucination, faithfulness, correctness, relevance, run eval, score spans, score experiment, trigger-run, column mapping, continuous monitoring, or improve evaluator prompt. | `references/ax-profiles.md`
`references/ax-setup.md` |
| [arize-experiment](../skills/arize-experiment/SKILL.md)
`gh skills install github/awesome-copilot arize-experiment` | Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy. | `references/ax-profiles.md`
`references/ax-setup.md` |
diff --git a/skills/arize-compliance-audit/SKILL.md b/skills/arize-compliance-audit/SKILL.md
new file mode 100644
index 000000000..1290a947f
--- /dev/null
+++ b/skills/arize-compliance-audit/SKILL.md
@@ -0,0 +1,314 @@
+---
+name: arize-compliance-audit
+description: "INVOKE THIS SKILL when auditing an AI agent or LLM app for regulatory compliance. Covers EU AI Act, GPAI Code of Practice, GDPR, NIST AI RMF, Colorado AI Act, HIPAA, and ISO 42001. Scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces an actionable remediation checklist tailored to the selected frameworks."
+---
+
+# Arize Compliance Audit Skill
+
+Use this skill when the user wants to **audit their AI agent or LLM application for regulatory compliance**. The skill scans the codebase for compliance gaps, cross-references Arize instrumentation for audit trail coverage, and produces a tailored checklist with optional remediation.
+
+**Triggers:** "audit my app for compliance", "EU AI Act requirements", "NIST AI RMF checklist", "GDPR for AI", "is my AI app compliant", "compliance checklist", "regulatory audit", "ISO 42001", "AI management system", "AIMS certification".
+
+## Disclaimer
+
+**Before doing anything else, present this disclaimer verbatim to the user:**
+
+---
+
+> ⚠️ **Legal disclaimer**
+>
+> This audit is for **guidance only** and does **not** constitute legal advice or a complete compliance assessment. It identifies common technical patterns and gaps based on publicly available regulatory frameworks, but cannot assess your organisation's specific legal obligations, contractual commitments, data processing agreements, or operational processes.
+>
+> **Do not rely on this output as a substitute for qualified legal counsel.** Regulatory compliance is a complex, jurisdiction-specific, and fact-dependent determination. Always engage a qualified attorney or compliance specialist for binding assessments.
+
+---
+
+## Core principles
+
+- **Prefer inspection over mutation** — understand the codebase before suggesting changes.
+- **Be practical, not legal** — produce developer-actionable items, not legal opinions.
+- **Tailor to jurisdiction and use case** — a chatbot has different obligations than a hiring tool. Do not dump the entire regulatory framework.
+- **Cross-reference instrumentation** — compliance requires audit trails; check whether Arize tracing captures what regulators expect.
+- **Offer remediation, always confirm** — after presenting the checklist, offer to implement specific fixes, but never modify code without explicit user confirmation.
+- **Keep output concise and production-focused** — do not generate extra documentation or summary files unless requested.
+- **Never embed literal credential values** — always reference environment variables.
+
+## Phase 0: Framework selection and use case
+
+Before scanning code, determine which compliance frameworks apply.
+
+### Step 1 — Framework selection
+
+Use the `AskUserQuestion` tool to ask the user which frameworks apply. **Do not infer or auto-select** — always ask explicitly.
+
+Ask:
+
+```
+Which compliance frameworks should this audit cover?
+Select all that apply (reply with numbers, e.g. "1, 3"):
+
+1. EU frameworks — EU AI Act, GPAI Code of Practice, GDPR
+ (choose if end-users or data subjects are located in the EU)
+
+2. US frameworks — NIST AI RMF, state laws (Colorado AI Act, NYC LL144),
+ HIPAA (if processing health data)
+ (choose if operating in the United States)
+
+3. ISO 42001 — International AI Management System standard
+ (choose if pursuing ISO 42001 certification, operating globally,
+ or wanting an internationally recognised baseline)
+
+You can select any combination. If unsure, select all that seem relevant
+and we can narrow down during the audit.
+```
+
+Based on the selection:
+- **1 selected** — EU AI Act, GPAI Code of Practice, GDPR apply. See references/eu-ai-act-gpai.md.
+- **2 selected** — NIST AI RMF, Colorado AI Act, NYC LL144, HIPAA may apply. See references/us-ai-compliance.md.
+- **3 selected** — ISO 42001 AIMS controls apply. See references/iso-42001.md. Note: ISO 42001 is an organisational management system — the audit will cover technically-auditable controls only; purely organisational clauses (leadership review, internal audits) are flagged separately.
+- **Multiple selected** — all selected frameworks apply; the audit covers the union of requirements, with cross-references where frameworks overlap.
+
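The mapping from the user's reply to frameworks and reference files can be sketched in Python. This is a minimal sketch and not part of the skill itself: `FRAMEWORK_OPTIONS`, `parse_selection`, and `resolve_frameworks` are hypothetical names, and the option numbers simply mirror the Step 1 prompt above.

```python
import re

# Option numbers, framework lists, and reference paths mirror the Step 1 prompt.
# The structure and names here are illustrative assumptions.
FRAMEWORK_OPTIONS = {
    1: ("EU", ["EU AI Act", "GPAI Code of Practice", "GDPR"], "references/eu-ai-act-gpai.md"),
    2: ("US", ["NIST AI RMF", "Colorado AI Act", "NYC LL144", "HIPAA"], "references/us-ai-compliance.md"),
    3: ("ISO 42001", ["ISO 42001"], "references/iso-42001.md"),
}


def parse_selection(reply: str) -> list[int]:
    """Extract valid option numbers (1-3) from a free-text reply, sorted and deduplicated."""
    numbers = {int(token) for token in re.findall(r"\d+", reply)}
    return sorted(n for n in numbers if n in FRAMEWORK_OPTIONS)


def resolve_frameworks(reply: str) -> dict:
    """Return the union of selected frameworks plus the reference files to load."""
    selected = parse_selection(reply)
    if not selected:
        # Never infer a default: the skill requires an explicit user choice.
        raise ValueError("No valid framework selected; ask the user again.")
    return {
        "groups": [FRAMEWORK_OPTIONS[n][0] for n in selected],
        "regulations": [reg for n in selected for reg in FRAMEWORK_OPTIONS[n][1]],
        "references": [FRAMEWORK_OPTIONS[n][2] for n in selected],
    }
```

Raising on an empty selection, rather than falling back to a default, keeps the sketch in line with the "do not infer or auto-select" rule.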
+### Step 2 — Determine use case category
+
+Use the `AskUserQuestion` tool to ask: **What does your AI application do?**
+
+- **General chatbot / assistant** — Limited risk (EU), general obligations (US)
+- **Hiring / HR** — High risk (EU Art. 6, Annex III); Colorado AI Act applies; NYC LL144 applies if NYC
+- **Healthcare** — High risk (EU); HIPAA applies if processing PHI
+- **Credit / financial** — High risk (EU); Colorado AI Act applies
+- **Education** — High risk (EU)
+- **Content generation** — Limited risk (EU Art. 50 transparency); general obligations (US)
+- **GPAI model provider** — GPAI Code of Practice applies (EU)
+
+### Step 3 — Determine risk tier
+
+Based on the use case and selected frameworks:
+- **EU selected**: Classify as Unacceptable / High / Limited / Minimal per references/eu-ai-act-gpai.md
+- **US selected**: Classify as High-risk (consequential decisions per Colorado AI Act) or General
+- **ISO 42001 selected**: Risk tier is not a formal classification in ISO 42001, but note whether the system is high-stakes (which elevates the priority of impact assessment and bias controls)
+
+### Phase 0 output
+
+Present a brief summary:
+
+```
+Frameworks selected: {EU / US / ISO 42001 / combination}
+Use case: {category}
+Risk tier: {EU tier if applicable} / {US tier if applicable}
+Applicable: {list of specific regulations and standards}
+ISO 42001 note: {if selected} Audit covers technically-auditable controls only;
+ organisational clauses will be flagged but not code-audited.
+```
+
+Then proceed directly to Phase 1.
+
+## Phase 1: Codebase audit (read-only)
+
+**Do not write any code or create any files during this phase.**
+
+Systematically scan the codebase for evidence of compliance and gaps across seven domains. For each domain, run the listed searches and record findings.
+
+### A. Transparency and disclosure
+
+**What to look for:**
+- User-facing strings disclosing AI involvement: search for terms like `AI`, `artificial intelligence`, `automated`, `bot`, `machine learning`, `generated by`, `powered by` in UI templates, API responses, and user-facing code
+- Content labelling: markers on AI-generated output (text, images, audio)
+- Terms of service, privacy policy references in the codebase
+
+**Signals of concern:** Absence of any AI disclosure in user-facing code, especially if the application generates content or makes recommendations.
+
+### B. Data protection and privacy
+
+**What to look for:**
+- PII field names in code: `email`, `phone`, `ssn`, `social_security`, `date_of_birth`, `address`, `name` in prompts, context, or retrieved documents
+- PII in trace span attributes: check if `input.value` or `output.value` could contain personal data sent to Arize without redaction
+- Consent mechanisms: `consent`, `opt-in`, `opt-out`, `gdpr`, `ccpa` references
+- DPIA or privacy assessment references
+- Data retention and deletion handlers
+- Data subject rights: `right_to_access`, `right_to_erasure`, `data_subject_request`, `data_protection_officer`
+
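The PII field-name search above could be run as a read-only scan along these lines. This is a sketch under stated assumptions: `scan_text` and `scan_tree` are hypothetical helpers, the file suffixes are placeholders, and broad terms such as `name` and `address` will produce false positives, so hits are leads to review rather than findings.

```python
import re
from pathlib import Path

# Field names taken from the list above.
PII_TERMS = ["email", "phone", "ssn", "social_security", "date_of_birth", "address", "name"]
# Match a term as a whole identifier segment: `user_email` hits, `filename` does not.
PII_PATTERN = re.compile(
    r"(?<![A-Za-z0-9])(" + "|".join(PII_TERMS) + r")(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def scan_text(text: str, path: str = "<memory>") -> list[tuple[str, int, str, str]]:
    """Return (path, line_number, matched_term, stripped_line) for each hit."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PII_PATTERN.finditer(line):
            hits.append((path, line_number, match.group(1).lower(), line.strip()))
    return hits


def scan_tree(root: str, suffixes: tuple[str, ...] = (".py", ".ts", ".js")) -> list[tuple[str, int, str, str]]:
    """Walk source files under `root` without modifying anything."""
    hits = []
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and file.suffix in suffixes:
            content = file.read_text(encoding="utf-8", errors="ignore")
            hits.extend(scan_text(content, str(file)))
    return hits
```

Returning the path, line number, and quoted line for each hit gives the gap-detail report the exact code locations it asks for.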
+### C. Security
+
+**What to look for:**
+- Prompt injection defences: input validation, guardrail libraries (`guardrails-ai`, `nemo-guardrails`, `rebuff`, `lakera`), content filtering, system prompt protection
+- Data loss prevention: output scanning before returning to users, sensitive data detection
+- Tool/function calling controls: permission boundaries, allowlists, sandboxing for tool execution
+- Rate limiting and authentication on AI endpoints
+- Hardcoded secrets: `api_key`, `secret`, `password`, `token` literals in source files (not env var references)
+
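The distinction between hardcoded literals and environment variable references can be illustrated with a small regex check. This is a heuristic sketch, not a replacement for a dedicated secret scanner: `SECRET_LITERAL` and `find_hardcoded_secrets` are hypothetical names, and the 8-character minimum is an arbitrary assumption to skip empty or trivial strings.

```python
import re

# A quoted literal assigned to a secret-looking name, e.g. api_key = "sk-...".
# Lookups such as os.environ["API_KEY"] or os.getenv("TOKEN") have no quoted
# literal directly after the `=` or `:` separator, so they do not match.
SECRET_LITERAL = re.compile(
    r"""
    \b(?P<name>[A-Za-z0-9_]*(?:api_key|secret|password|token)[A-Za-z0-9_]*)  # secret-looking name
    ["']?\s*[:=]\s*                                    # assignment or key/value separator
    (?P<quote>["'])(?P<value>[^"'\s]{8,})(?P=quote)    # quoted literal of 8+ characters
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, name) pairs; the secret value itself is never returned."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SECRET_LITERAL.finditer(line):
            findings.append((line_number, match.group("name")))
    return findings
```

The function reports only the name and location so that the audit output does not echo credential values. Placeholder strings such as `"${API_KEY}"` would still match and need manual review.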
+### D. Testing and evaluation
+
+**What to look for:**
+- Bias and fairness testing: references to demographic parity, impact ratios, fairness metrics
+- Red teaming or adversarial test suites: prompt injection tests, jailbreak tests
+- Evaluation frameworks: Arize evaluators, custom eval scripts, `pytest`-based evals, experiment infrastructure
+- A/B testing or model comparison infrastructure
+
+### E. Documentation
+
+**What to look for:**
+- Model cards: `MODEL_CARD.md`, `model_card.json`, `model_card.yaml`, or similar
+- System architecture documentation
+- Change logs or version tracking for prompts and model updates
+- Incident response documentation
+
+### F. Monitoring and observability
+
+**What to look for:**
+- Arize tracing setup: `arize-otel`, `register()`, `TracerProvider`, `opentelemetry`, `openinference` imports
+- If tracing exists, check coverage:
+ - All LLM calls traced (not just some)
+ - Session IDs for conversation continuity
+ - User IDs for data subject request support
+ - Error tracking and exception spans
+- Alerting and drift detection configuration
+- Trace retention configuration
+
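If spans have already been exported, the session ID and user ID coverage checks above could be approximated as follows. This sketch assumes each span is a plain dict with an `attributes` mapping, and that the keys `openinference.span.kind`, `session.id`, and `user.id` follow OpenInference naming; adjust them to whatever the instrumentation actually emits. `attribute_coverage` is a hypothetical helper.

```python
from typing import Iterable

# Attribute keys are assumptions based on OpenInference naming conventions.
REQUIRED_ATTRIBUTES = ("session.id", "user.id")


def attribute_coverage(spans: Iterable[dict]) -> dict[str, float]:
    """Fraction of LLM spans carrying each required attribute (0.0 to 1.0)."""
    llm_spans = [
        span for span in spans
        if span.get("attributes", {}).get("openinference.span.kind") == "LLM"
    ]
    if not llm_spans:
        # No LLM spans at all is itself a coverage gap.
        return {key: 0.0 for key in REQUIRED_ATTRIBUTES}
    return {
        key: sum(1 for span in llm_spans if span["attributes"].get(key)) / len(llm_spans)
        for key in REQUIRED_ATTRIBUTES
    }
```

A coverage value below 1.0 for `user.id` suggests that some LLM calls cannot be tied back to a data subject, which is the gap the checklist is looking for.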
+### G. Vendor management
+
+**What to look for:**
+- Third-party AI API usage: OpenAI, Anthropic, Google, Azure, Bedrock, Cohere imports or client instantiation
+- Model versioning: are specific model versions pinned (e.g., `gpt-4-0613`) or using `latest` / unversioned identifiers
+- Fallback and failover logic between providers
+
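The model-pinning check can be sketched as a suffix heuristic. This is an assumption-laden illustration: `is_pinned` and `unpinned_models` are hypothetical names, and the rule (a trailing date or 4-digit snapshot counts as pinned) will not fit every provider's naming scheme, so anything it flags should be reviewed by hand.

```python
import re

# Heuristic only: a trailing date (2024-08-06 or 20240620) or a 4-digit
# snapshot suffix (0613) is treated as a pinned version.
PINNED_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8}|\d{4})$")


def is_pinned(model_id: str) -> bool:
    """True if the identifier looks like a specific, dated model snapshot."""
    lowered = model_id.lower()
    if "latest" in lowered:
        return False
    return bool(PINNED_SUFFIX.search(lowered))


def unpinned_models(model_ids: list[str]) -> list[str]:
    """Return the identifiers that should be flagged in the vendor management domain."""
    return [model_id for model_id in model_ids if not is_pinned(model_id)]
```

For example, `unpinned_models(["gpt-4-0613", "gpt-4"])` flags only the second identifier.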
+### Phase 1 output
+
+Present a two-part report:
+
+**Part 1 — Summary table**
+
+| Domain | Evidence found | Gaps identified | Rating |
+|---|---|---|---|
+| A. Transparency | {findings} | {gaps} | Compliant / Partial / Non-compliant / N/A |
+| B. Data protection | {findings} | {gaps} | ... |
+| C. Security | {findings} | {gaps} | ... |
+| D. Testing | {findings} | {gaps} | ... |
+| E. Documentation | {findings} | {gaps} | ... |
+| F. Monitoring | {findings} | {gaps} | ... |
+| G. Vendor management | {findings} | {gaps} | ... |
+
+**Part 2 — Gap detail (required for every Non-compliant or Partial rating)**
+
+For each domain rated Non-compliant or Partial, write a dedicated subsection that includes:
+
+1. **The exact code path** — file path(s), line number(s), and the relevant code snippet showing where the gap exists. Do not paraphrase; quote the actual code.
+2. **Why it matters in this specific app** — explain the concrete risk in the context of this codebase (e.g. which tools could be abused, which data flows are exposed, what an attacker or regulator would find).
+3. **What is missing** — a precise description of the control or code that should exist but does not (e.g. "a span attribute processor that hashes `user_email` before the OTLP exporter fires", not just "add PII redaction").
+
+Minimum one subsection per Non-compliant/Partial domain. Do not omit this section — it is the primary value of the audit for engineering teams.
+
+Then proceed directly to Phase 2.
+
+## Phase 2: Compliance checklist
+
+Using the Phase 1 findings and the template in references/compliance-checklist-template.md, generate a **tailored compliance checklist**.
+
+### Rules for checklist generation
+
+1. **Only include relevant sections.** If the user is US-only, skip GDPR-specific items. If not healthcare, skip HIPAA. If not hiring in NYC, skip LL144.
+2. **Mark items from Phase 1.** Items where evidence was found: mark as `Compliant`. Items with partial evidence: mark as `Partial`. Items with gaps: mark as `Non-compliant`. Give every `Partial` or `Non-compliant` item a concrete remediation suggestion.
+3. **Prioritise correctly.** Critical = enforcement risk or system prohibition. High = required by regulation. Medium = recommended by framework. Low = best practice.
+4. **Be specific in remediation.** Instead of "implement input validation", say "add a guardrail library like `guardrails-ai` to validate LLM inputs and outputs against your content policy".
+5. **Include the instrumentation cross-reference table** from the template. If Arize tracing is not set up, flag this as a Critical gap — audit trails are required by EU Art. 12 and NIST MAN-2.1.
+
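Rules 1 and 3 amount to a filter followed by a sort, which can be sketched as below. The `ChecklistItem` dataclass, the `tailor` function, and the short regulation tags are hypothetical; the priority names come from the template's priority legend.

```python
from dataclasses import dataclass

# Lower number means more urgent; names match the template's priority legend.
PRIORITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass
class ChecklistItem:
    item_id: str                 # e.g. "2.4"
    title: str
    regulations: frozenset[str]  # short tags such as {"GDPR", "HIPAA"}
    priority: str
    status: str = ""
    remediation: str = ""


def _numeric_id(item_id: str) -> tuple[int, ...]:
    """Sort "2.10" after "2.4" by comparing the parts as integers."""
    return tuple(int(part) for part in item_id.split("."))


def tailor(items: list[ChecklistItem], applicable: set[str]) -> list[ChecklistItem]:
    """Keep items citing at least one applicable regulation, most urgent first."""
    relevant = [item for item in items if item.regulations & applicable]
    return sorted(relevant, key=lambda item: (PRIORITY_ORDER[item.priority], _numeric_id(item.item_id)))
```

An item that cites several regulations is kept if any one of them applies, which matches the "union of requirements" behaviour described for multi-framework audits.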
+### Final report
+
+Present a single consolidated report with four sections:
+
+**Section 1 — Audit scope (Phase 0 summary)**
+- Frameworks selected, use case, risk tier, applicable regulations
+
+**Section 2 — Codebase findings (Phase 1 summary table)**
+- The domain table (A–G) with evidence, gaps, and ratings
+
+**Section 3 — Gap detail (Phase 1 expanded)**
+- One subsection per Non-compliant or Partial domain, each containing: exact file paths and line numbers, quoted code snippets, app-specific risk explanation, and a precise description of what is missing. This section is mandatory — never omit it.
+
+**Section 4 — Compliance checklist (Phase 2)**
+- The tailored checklist with status and remediation suggestions, instrumentation cross-reference table, priority summary, and recommended next steps
+
+When the user asks for a report file, write a single markdown file to `/tmp/-compliance-audit-.md` containing all four sections.
+
+After presenting the report, offer Phase 3 remediation.
+
+## Phase 3: Remediation (optional)
+
+After presenting the checklist, offer to implement specific fixes. **Always use the `AskUserQuestion` tool to confirm before making any changes.**
+
+### Remediation categories
+
+**Add dependencies** — offer to install:
+- Guardrail libraries for input/output validation (e.g., `guardrails-ai`, `nemo-guardrails`)
+- PII detection/redaction packages (e.g., `presidio-analyzer`, `scrubadub`)
+- Content safety classifiers
+
+**Insert code** — offer to add:
+- AI disclosure strings in user-facing output (templates, API responses)
+- PII redaction filters on span attributes before export to Arize
+- Input validation/sanitisation on AI endpoints
+- User ID attributes on trace spans for data subject request support
+
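A PII redaction filter for span attributes might look like the following pure function, which could be called from whatever hook runs before export. This is a sketch only: it does not use any Arize or OpenTelemetry API, the key names `user_email`, `input.value`, and `output.value` are taken from this document, and `pseudonymise` and `redact_attributes` are hypothetical helpers. Email is the only pattern masked here; a real filter would likely need more.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Keys whose whole value is replaced by a keyed hash (illustrative choice).
HASHED_KEYS = {"user_email"}
# Free-text keys where only matching substrings are masked.
FREE_TEXT_KEYS = {"input.value", "output.value"}


def pseudonymise(value: str, salt: bytes) -> str:
    """Keyed SHA-256, truncated: the same input maps to the same token."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def redact_attributes(attributes: dict, salt: bytes) -> dict:
    """Return a redacted copy; the input mapping is left untouched."""
    redacted = {}
    for key, value in attributes.items():
        if key in HASHED_KEYS and isinstance(value, str):
            redacted[key] = pseudonymise(value, salt)
        elif key in FREE_TEXT_KEYS and isinstance(value, str):
            redacted[key] = EMAIL.sub("[REDACTED_EMAIL]", value)
        else:
            redacted[key] = value
    return redacted
```

In line with the rule against embedding credentials, the salt would be read from an environment variable (for instance a hypothetical `PII_HASH_SALT`) rather than written into source.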
+**Create documentation templates** — offer to scaffold:
+- Model card template (markdown file with standard sections)
+- Incident response plan template
+- Data processing record template
+
+**Configure monitoring** — offer to set up via related skills:
+- Arize evaluators for bias detection and content safety (via `arize-evaluator` skill)
+- Tracing for audit trail coverage (via `arize-instrumentation` skill)
+
+### Remediation rules
+
+- Present each remediation as a **discrete, confirmable action**. Never batch-apply changes.
+- Show exactly what will change (the file and a proposed diff), then use the `AskUserQuestion` tool to get confirmation before applying.
+- Follow existing code style and project conventions.
+- Never embed credentials — always use environment variables.
+- Test that the application still builds after changes.
+
+## Skill orchestration
+
+When gaps identified in Phase 1 or 2 require capabilities from other Arize skills, offer to invoke them. **Always use the `AskUserQuestion` tool to ask before invoking another skill** and explain why it is relevant to the compliance gap.
+
+| Gap | Skill to invoke | Why |
+|---|---|---|
+| No tracing / incomplete audit trail | `arize-instrumentation` | EU Art. 12 and NIST MAN-2.1 require event logging; Arize tracing provides this |
+| No bias or safety evaluation | `arize-evaluator` | Create LLM-as-judge evaluators for fairness, content safety, or quality monitoring |
+| Need trace export for compliance evidence | `arize-trace` | Export spans for regulatory documentation or incident investigation |
+| Need human review for high-risk decisions | `arize-annotation` | Set up annotation queues for human oversight per EU Art. 14 |
+| Need deep link to share compliance evidence | `arize-link` | Generate URLs to specific traces, spans, or evaluations for stakeholder review |
+
+## Instrumentation cross-reference
+
+If Arize tracing is already set up, verify it meets compliance requirements:
+
+| Compliance need | Required trace data | What to check |
+|---|---|---|
+| Audit trail for AI decisions | All LLM spans with input/output | Verify all LLM client calls are instrumented, not just some |
+| Data subject access requests | User ID attribute on spans | Check for `user.id` or custom user identifier attribute |
+| PII in traces | Sensitive data in `input.value`/`output.value` | Check if PII passes through unredacted — flag if so |
+| Incident investigation | Error spans with full context | Check for exception tracking and error status on spans |
+| Retention requirements | Trace data retained for required period | EU: appropriate period (min 6 months for high-risk); HIPAA: 6 years |
+| Bias monitoring | Demographic or group attributes | Check for metadata attributes that enable fairness analysis |
+
+If Arize tracing is **not** set up, this is a significant compliance gap. Offer: "Shall I run the `arize-instrumentation` skill to set up audit-trail tracing? The EU AI Act (Art. 12) requires event logging for high-risk AI systems, and the voluntary NIST AI RMF (MAN-2.1) recommends it."
+
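The retention row of the cross-reference table can be turned into a simple comparison. This sketch assumes the configured retention is known in days; the day counts approximate the table's figures (6 months as 183 days, 6 years as 6 x 365 days), and `MIN_RETENTION_DAYS` and `retention_gaps` are hypothetical names.

```python
# Minimum retention per framework, from the cross-reference table.
# Day counts are approximations of "6 months" and "6 years".
MIN_RETENTION_DAYS = {
    "EU AI Act (high-risk)": 183,
    "HIPAA": 6 * 365,
}


def retention_gaps(configured_days: int, frameworks: list[str]) -> dict[str, int]:
    """Map each framework whose minimum is not met to the shortfall in days."""
    return {
        name: MIN_RETENTION_DAYS[name] - configured_days
        for name in frameworks
        if name in MIN_RETENTION_DAYS and configured_days < MIN_RETENTION_DAYS[name]
    }
```

An empty result means the configured period meets every listed minimum; it does not confirm that the period is legally sufficient.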
+## Reference links
+
+| Resource | URL |
+|---|---|
+| EU AI Act full text | https://eur-lex.europa.eu/eli/reg/2024/1689/oj |
+| GPAI Code of Practice | https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai |
+| Code of Practice portal | https://code-of-practice.ai/ |
+| NIST AI RMF | https://www.nist.gov/artificial-intelligence/ai-risk-management-framework |
+| Colorado AI Act (SB24-205) | https://leg.colorado.gov/bills/sb24-205 |
+| NYC Local Law 144 | https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page |
+| HIPAA | https://www.hhs.gov/hipaa/index.html |
+| ISO/IEC 42001:2023 | https://www.iso.org/standard/42001.html |
+| Arize AX Docs | https://arize.com/docs/ax |
+
+## Reference files
+
+- references/eu-ai-act-gpai.md — EU AI Act and GPAI Code of Practice developer guide
+- references/us-ai-compliance.md — US compliance landscape (NIST AI RMF, Colorado, NYC LL144, HIPAA)
+- references/iso-42001.md — ISO/IEC 42001:2023 AI Management Systems developer guide (technically-auditable controls only)
+- references/compliance-checklist-template.md — Reusable checklist template for Phase 2 output
diff --git a/skills/arize-compliance-audit/references/compliance-checklist-template.md b/skills/arize-compliance-audit/references/compliance-checklist-template.md
new file mode 100644
index 000000000..03f05f6aa
--- /dev/null
+++ b/skills/arize-compliance-audit/references/compliance-checklist-template.md
@@ -0,0 +1,131 @@
+# Compliance Checklist Template
+
+Use this template to generate the Phase 2 compliance checklist. Fill in the status and notes columns based on Phase 1 audit findings. Only include sections relevant to the user's jurisdiction and use case — do not dump the entire template.
+
+## Header
+
+| Field | Value |
+|---|---|
+| Application name | {app_name} |
+| Frameworks selected | {EU / US / ISO 42001 / combination} |
+| Risk classification | {Unacceptable / High / Limited / Minimal (EU) or High-risk / General (US)} |
+| Use case category | {General chatbot / Hiring / Healthcare / Credit / Education / Other} |
+| Applicable regulations | {EU AI Act, GPAI CoP, GDPR, NIST AI RMF, Colorado AI Act, NYC LL144, HIPAA, ISO 42001} |
+| Audit date | {date} |
+
+## Priority legend
+
+| Priority | Meaning |
+|---|---|
+| Critical | Blocking — non-compliance carries enforcement risk or system prohibition |
+| High | Required by applicable regulation; remediate before production |
+| Medium | Recommended by frameworks; strengthens compliance posture |
+| Low | Best practice; implement when resources allow |
+
+---
+
+## 1. Governance and documentation
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 1.1 | AI governance policy documented | NIST GOV-1, ISO 42001 cl.5.2 | Medium | | |
+| 1.2 | Accountable individuals assigned for AI risk | NIST GOV-1.1, Colorado, ISO 42001 A.6.1 | High | | |
+| 1.3 | Model card or technical documentation exists | EU Art. 11, Colorado, ISO 42001 A.6.2 | High | | |
+| 1.4 | Intended purpose and limitations documented | EU Art. 13, NIST MAP-1.1, ISO 42001 A.8 | High | | |
+| 1.5 | Change log for model/prompt updates maintained | EU Art. 11, NIST GOV-5, ISO 42001 cl.8.4 | Medium | | |
+| 1.6 | Documentation retained for required period (10yr EU, 6yr HIPAA) | EU Art. 18, HIPAA | High | | |
+| 1.7 | AI impact assessment conducted and documented | ISO 42001 cl.8.3, A.5.2 | High | | *ISO 42001 only.* Create an impact assessment document covering intended use, potential harms, and affected stakeholders. May be combined with a GDPR DPIA. |
+| 1.8 | AI risk assessment process documented | ISO 42001 cl.6.1.2 | High | | *ISO 42001 only.* Document the process for identifying and assessing AI-specific risks (bias, safety, privacy, security). |
+| 1.9 | Supplier AI policy documented (third-party AI providers) | ISO 42001 A.10 | Medium | | *ISO 42001 only.* Document all third-party AI APIs used and confirm vendor obligations (no training on your data, data residency, etc.). |
+| 1.10 | Organisational clauses (leadership, internal audit, management review) | ISO 42001 cl.5.1, 9.2, 9.3 | — | *Organisational — not code-auditable* | These clauses require evidence from outside the codebase. Flag for the compliance team. |
+
+## 2. Data protection and privacy
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 2.1 | Lawful basis for processing documented | GDPR Art. 6 | Critical | | |
+| 2.2 | Data Protection Impact Assessment conducted | GDPR Art. 35, ISO 42001 cl.8.3 | High | | |
+| 2.3 | Data Processing Agreements with all AI vendors | GDPR Art. 28 | Critical | | |
+| 2.4 | PII redacted from traces and span attributes | GDPR Art. 5(1)(c) | High | | |
+| 2.5 | Data subject rights handlers implemented (access, deletion, portability) | GDPR Art. 15-22 | High | | |
+| 2.6 | User ID on spans to support data subject lookups | GDPR Art. 15 | Medium | | |
+| 2.7 | Consent or opt-out mechanism for AI processing | State privacy laws | High | | |
+| 2.8 | International data transfer safeguards (SCCs/DPF) | GDPR Art. 44-49 | Critical | | |
+| 2.9 | Breach detection and 72-hour notification workflow | GDPR Art. 33 | High | | |
+| 2.10 | PHI handling compliant (BAAs, min. necessary, encryption) | HIPAA | Critical | | |
+
+## 3. Security
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 3.1 | Prompt injection defences implemented | EU Art. 15, NIST MEA-2.7, ISO 42001 A.4.2 | High | | |
+| 3.2 | Output filtering / guardrails in place | NIST MAN-1.2, ISO 42001 A.4.2 | High | | |
+| 3.3 | Data loss prevention for sensitive output | NIST MAN-1.2, HIPAA, ISO 42001 A.7.4 | High | | |
+| 3.4 | Tool/function access controls (least privilege) | NIST MAN-1.2 | High | | |
+| 3.5 | Rate limiting on AI endpoints | NIST MAN-1.2 | Medium | | |
+| 3.6 | Authentication on AI endpoints | NIST MAN-1.2 | High | | |
+| 3.7 | No hardcoded secrets in source | General security | Critical | | |
+| 3.8 | Incident response plan documented | NIST MAN-3.1 | Medium | | |
+
+## 4. Testing and evaluation
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 4.1 | Bias and fairness testing across demographic groups | NIST MEA-2.1, Colorado, NYC LL144, ISO 42001 A.4.2 | High | | |
+| 4.2 | Adversarial / red team testing conducted | NIST MEA-2.5, GPAI CoP | High | | |
+| 4.3 | Evaluation pipeline for accuracy and quality | NIST MEA-1.1, ISO 42001 cl.9.1 | Medium | | |
+| 4.4 | Regression test suite for AI behaviour | EU Art. 15 | Medium | | |
+| 4.5 | Impact assessment (annual for Colorado) | Colorado AI Act | High | | |
+| 4.6 | Independent bias audit (annual for NYC LL144) | NYC LL144 | Critical | | |
+
+## 5. Transparency and user communication
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 5.1 | Users informed they are interacting with AI | EU Art. 50, Colorado, ISO 42001 A.8 | Critical | | |
+| 5.2 | AI-generated content labelled | EU Art. 50(2) | High | | |
+| 5.3 | Explanation provided for AI-driven decisions | EU Art. 13-14, GDPR Art. 22 | High | | |
+| 5.4 | Opt-out mechanism for automated profiling | State privacy laws | High | | |
+| 5.5 | Capabilities and limitations communicated to users | EU Art. 13, Colorado, ISO 42001 A.9 | Medium | | |
+
+## 6. Monitoring and continuous compliance
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 6.1 | Arize tracing captures all LLM calls | EU Art. 12, NIST MAN-2.1, ISO 42001 cl.8.6 | High | | |
+| 6.2 | Error and exception tracking on spans | NIST MAN-2.1, ISO 42001 cl.9.1 | Medium | | |
+| 6.3 | Drift detection and alerting configured | NIST MEA-3.1, ISO 42001 cl.9.1 | Medium | | |
+| 6.4 | Trace retention meets regulatory requirements | EU Art. 12, HIPAA | High | | |
+| 6.5 | Periodic re-audit schedule defined | NIST GOV-5, Colorado, ISO 42001 cl.9.2 | Medium | | |
+
+## 7. Vendor management
+
+| # | Item | Regulation | Priority | Status | Remediation |
+|---|---|---|---|---|---|
+| 7.1 | All third-party AI APIs documented | NIST MAP-3.1, GOV-6, ISO 42001 A.10 | Medium | | |
+| 7.2 | Model versions pinned (not using "latest") | NIST MAP-3.2, ISO 42001 cl.8.4 | Medium | | |
+| 7.3 | Data Processing Agreements with all processors | GDPR Art. 28 | Critical | | |
+| 7.4 | Vendor does not train on your data (confirmed in writing) | HIPAA, GDPR, ISO 42001 A.10 | High | | |
+| 7.5 | Fallback and failover logic implemented | NIST MAN-1.1 | Low | | |
+
+---
+
+## Instrumentation cross-reference
+
+| Compliance need | Required trace data | Status |
+|---|---|---|
+| Audit trail for AI decisions | All LLM spans with input/output | |
+| Data subject access requests | User ID attribute on spans | |
+| PII exposure prevention | PII redaction on span attributes | |
+| Incident investigation | Error spans with full context | |
+| Retention requirements | Trace retention configured for required period | |
+| Bias monitoring | Demographic or group attributes on spans | |
+
+## Next steps
+
+1. Address all **Critical** items immediately
+2. Remediate **High** items before production deployment
+3. Schedule **Medium** items for the next development cycle
+4. Track **Low** items in backlog
+5. Schedule next compliance audit for: {date + 3 months or per regulatory requirement}
+6. Consult qualified legal counsel for binding compliance assessment
diff --git a/skills/arize-compliance-audit/references/eu-ai-act-gpai.md b/skills/arize-compliance-audit/references/eu-ai-act-gpai.md
new file mode 100644
index 000000000..65e8bf9b5
--- /dev/null
+++ b/skills/arize-compliance-audit/references/eu-ai-act-gpai.md
@@ -0,0 +1,176 @@
+# EU AI Act and GPAI Code of Practice — Developer Reference
+
+This reference covers the EU AI Act (Regulation 2024/1689) and the Code of Practice for General-Purpose AI (GPAI). It is written for developers building AI agents and LLM applications, not for legal professionals. Always consult qualified legal counsel for binding compliance assessments.
+
+Official sources:
+- EU AI Act full text: https://eur-lex.europa.eu/eli/reg/2024/1689/oj
+- GPAI Code of Practice: https://digital-strategy.ec.europa.eu/en/policies/contents-code-gpai
+- Code of Practice portal: https://code-of-practice.ai/
+
+## Risk classification
+
+The EU AI Act classifies AI systems into four risk tiers. Your obligations depend on which tier your application falls into.
+
+### Unacceptable risk (prohibited — Art. 5)
+
+These AI uses are banned outright:
+- Social scoring by public authorities
+- Real-time remote biometric identification in public spaces (with narrow exceptions)
+- Exploitation of vulnerabilities of specific groups (age, disability)
+- Subliminal manipulation causing harm
+- Emotion recognition in workplaces and educational institutions (with exceptions)
+- Untargeted scraping of facial images for facial recognition databases
+
+### High risk (Art. 6–7, Annex III)
+
+Subject to the strictest requirements. Includes AI used for:
+- **Employment**: recruitment, CV screening, interview evaluation, promotion decisions, task allocation
+- **Credit scoring and loan decisions**
+- **Education**: student assessment, admissions, exam proctoring
+- **Healthcare**: diagnostic assistance, treatment recommendations, triage
+- **Law enforcement**: risk assessment, evidence evaluation
+- **Migration and border control**: document authentication, risk assessment
+- **Critical infrastructure**: energy, water, transport management
+
+### Limited risk (Art. 50 — transparency obligations)
+
+AI systems that interact with people or generate content must:
+- Disclose to users that they are interacting with AI
+- Label AI-generated content (text, audio, image, video) as such
+- Label deepfakes clearly
+
+This tier covers most chatbots, virtual assistants, and content-generation tools.
+
+### Minimal risk
+
+No specific obligations. Includes spam filters, AI-enhanced video games, inventory management systems.
+
+## High-risk system requirements (Art. 8–15)
+
+If your application is classified as high-risk, you must implement:
+
+### Risk management system (Art. 9)
+- Continuous, iterative risk identification and mitigation throughout the AI system lifecycle
+- Testing with metrics appropriate to the intended purpose
+- Evaluation of risks from foreseeable misuse
+- Residual risks that cannot be fully eliminated must be communicated to deployers
+
+### Data governance (Art. 10)
+- Training, validation, and testing datasets must be relevant, sufficiently representative, and as error-free as possible
+- Statistical properties including biases must be examined
+- Data must be appropriate for the geographic, contextual, and functional setting
+
+### Technical documentation (Art. 11)
+- General description of the AI system and its intended purpose
+- Design specifications, development methodology, and development choices
+- Monitoring, functioning, and control of the system
+- Risk management documentation
+- Description of changes made throughout the lifecycle
+- Must be drawn up before the system is placed on the market and kept up to date
+
+### Record-keeping (Art. 12)
+- Automatic logging of events ("logs") throughout the system's lifecycle
+- Logs must enable tracing of the system's operation to identify risks
+- Logging must be appropriate to the intended purpose
+- **Retention**: Logs must be kept for a period appropriate to the intended purpose, at least 6 months unless provided otherwise by law (the retention period itself is set in Art. 19 for providers and Art. 26(6) for deployers)
+
+### Transparency and information to deployers (Art. 13)
+- Clear, adequate information to deployers about the system's capabilities and limitations
+- Intended purpose, level of accuracy, robustness, and cybersecurity
+- Known or foreseeable circumstances that may lead to risks
+- Performance metrics including for specific persons or groups
+- Technical capabilities and limitations (input data specifications, output interpretation guidance)
+
+### Human oversight (Art. 14)
+- Designed to allow effective oversight by natural persons during use
+- Deployers must assign human oversight to competent individuals
+- Individuals must be able to: understand the system, detect anomalies, decide not to use the system, intervene or stop the system
+
+### Accuracy, robustness, cybersecurity (Art. 15)
+- Appropriate level of accuracy declared in documentation
+- Resilience to errors, faults, and inconsistencies
+- Resilience against manipulation by unauthorised third parties
+- Technical redundancy solutions where appropriate
+
+## GPAI Code of Practice obligations
+
+The GPAI Code of Practice applies to providers of general-purpose AI models. If you are building on top of a GPAI model (e.g., using OpenAI, Anthropic, or open-source LLMs), these obligations primarily fall on the model provider, but you should understand them for your own documentation and risk management.
+
+### Transparency (Commitments 1–7)
+- Prepare and maintain a **Model Documentation Form** covering model capabilities, limitations, training data summaries, and contact information
+- Publish a sufficiently detailed summary of training data content
+- Provide documentation to downstream providers and deployers
+- **Retention**: Documentation must be maintained for a minimum of 10 years after the model is placed on the market
+
+### Copyright (Commitments 8–11)
+- Establish a copyright compliance policy aligned with EU copyright law (Directive 2019/790)
+- Implement measures to mitigate risk of copyright-infringing outputs
+- Implement a complaints-handling process for rights holders
+- Document how training data was sourced and validated for copyright compliance
+
+### Safety and security (Commitments 12–22, for systemic risk models)
+- Implement a Safety and Security Framework (SSF) covering governance, risk assessment, and technical mitigations
+- Identify and evaluate systemic risks before major deployment decisions
+- Conduct model evaluations (capability, safety, societal risk)
+- Implement safeguards for identified risks
+- Conduct adversarial testing (red teaming)
+- Report serious incidents to the AI Office within prescribed timelines
+- Track and document model capabilities over time
+
+## GDPR intersection
+
+If your AI application processes personal data of EU residents, GDPR applies regardless of where you are based.
+
+### When AI processing triggers GDPR
+- Prompts or context containing personal data (names, emails, health info)
+- User conversation logs stored for fine-tuning or evaluation
+- Retrieval-augmented generation (RAG) over documents containing personal data
+- Trace data sent to observability platforms containing PII in span attributes
+
+### Key GDPR requirements for AI developers
+- **Lawful basis** (Art. 6): Consent, contract, legitimate interest, or other lawful basis for processing
+- **Data Protection Impact Assessment** (Art. 35): Required for high-risk processing, including systematic profiling and large-scale processing of sensitive data
+- **Data Processing Agreements** (Art. 28): Required with every processor (LLM providers, cloud infrastructure, observability platforms)
+- **Data subject rights** (Art. 15–22): Right to access, rectification, erasure, restriction, portability, and to object. For automated decision-making: right to human intervention and explanation
+- **Data minimisation** (Art. 5(1)(c)): Only process data that is adequate, relevant, and limited to what is necessary
+- **Breach notification** (Art. 33–34): Notify supervisory authority within 72 hours; notify affected individuals without undue delay if high risk
+- **International transfers** (Art. 44–49): Standard Contractual Clauses or Data Privacy Framework for transfers outside EEA
+
+## Developer action mapping
+
+| Requirement | Article | Developer action |
+|---|---|---|
+| AI disclosure | Art. 50 | Add UI/API indicators that content is AI-generated or user is interacting with AI |
+| Content labelling | Art. 50(2) | Label AI-generated text, images, audio, video with machine-readable markers |
+| Technical documentation | Art. 11 | Maintain model card and system architecture docs |
+| Record-keeping | Art. 12 | Implement trace logging (Arize tracing covers this) with appropriate retention |
+| Transparency to deployers | Art. 13 | Document capabilities, limitations, accuracy metrics, known risks |
+| Human oversight | Art. 14 | Implement human-in-the-loop for high-risk decisions; build override/stop mechanisms |
+| Accuracy and robustness | Art. 15 | Implement evaluation pipelines, adversarial testing, error handling |
+| Risk management | Art. 9 | Continuous risk identification, testing, mitigation documentation |
+| Data governance | Art. 10 | Audit training/RAG data for bias, representativeness, errors |
+| GDPR — lawful basis | Art. 6 | Document and implement consent or legitimate interest for data processing |
+| GDPR — DPIA | Art. 35 | Conduct impact assessment for high-risk AI processing |
+| GDPR — DPA | Art. 28 | Execute data processing agreements with all AI vendors |
+| GDPR — data subject rights | Art. 15–22 | Implement access, deletion, portability handlers; user ID on traces for lookup |
+| GDPR — PII minimisation | Art. 5(1)(c) | Redact PII in prompts, traces, and logs where not strictly necessary |
+| GDPR — breach notification | Art. 33 | Implement breach detection and 72-hour notification workflow |
+
+## Penalties
+
+| Violation | Maximum fine (whichever amount is higher) |
+|---|---|
+| Prohibited AI practices (Art. 5) | EUR 35 million or 7% of global annual turnover |
+| High-risk non-compliance (Art. 8–15) | EUR 15 million or 3% of global annual turnover |
+| Incorrect information to authorities | EUR 7.5 million or 1% of global annual turnover |
+| GDPR violations | EUR 20 million or 4% of global annual turnover |
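+
+As a rough illustration of how the two caps in each row combine, the sketch below assumes the "whichever is higher" rule that applies to undertakings. The tier names and the `max_fine_eur` helper are hypothetical, and the result is an upper bound on the fine, not a prediction of what a regulator would impose.
+
+```python
+# Hypothetical tier names; values are (fixed cap in EUR, percent of global annual turnover).
+FINE_TIERS = {
+    "prohibited_practice": (35_000_000, 7),
+    "high_risk_non_compliance": (15_000_000, 3),
+    "gdpr": (20_000_000, 4),
+}
+
+
+def max_fine_eur(tier: str, annual_turnover_eur: int) -> float:
+    """Return the upper bound of the fine, assuming the higher of the two caps applies."""
+    fixed_cap, turnover_percent = FINE_TIERS[tier]
+    turnover_cap = annual_turnover_eur * turnover_percent / 100
+    return max(fixed_cap, turnover_cap)
+
+
+# A company with EUR 1 billion turnover: 7% (EUR 70 million) exceeds the fixed cap.
+large = max_fine_eur("prohibited_practice", 1_000_000_000)
+# A company with EUR 100 million turnover: the fixed EUR 35 million cap is higher.
+small = max_fine_eur("prohibited_practice", 100_000_000)
+```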
+
+## Timeline
+
+| Date | Milestone |
+|---|---|
+| 1 August 2024 | EU AI Act entered into force |
+| 2 February 2025 | Prohibited practices apply |
+| 2 August 2025 | GPAI obligations apply to new models |
+| 2 August 2026 | Full enforcement begins — high-risk requirements, Commission can request info, model access, and order recalls |
+| 2 August 2027 | Existing GPAI models (released before Aug 2025) must comply |
diff --git a/skills/arize-compliance-audit/references/iso-42001.md b/skills/arize-compliance-audit/references/iso-42001.md
new file mode 100644
index 000000000..aa4e159c6
--- /dev/null
+++ b/skills/arize-compliance-audit/references/iso-42001.md
@@ -0,0 +1,185 @@
+# ISO/IEC 42001:2023 — AI Management Systems Developer Reference
+
+This reference covers ISO/IEC 42001:2023, the international standard for Artificial Intelligence Management Systems (AIMS). It is written for developers building AI agents and LLM applications, not for legal or audit professionals.
+
+Official source: https://www.iso.org/standard/42001.html
+
+> **Scope note:** ISO 42001 is primarily an *organisational* management system standard. Many of its requirements (leadership commitment, internal audits, management reviews) cannot be verified from code alone. This reference focuses on clauses and annex controls that produce code or documentation artifacts — things a codebase audit can actually find evidence of or flag as missing.
+
+## What ISO 42001 is
+
+ISO 42001 is to AI what ISO 27001 is to information security: a framework for establishing, implementing, maintaining, and continually improving an AI management system. It applies to any organisation that develops, provides, or uses AI systems.
+
+Unlike the EU AI Act or NIST AI RMF, ISO 42001 is a certifiable standard — organisations can be independently audited and certified. It is internationally applicable (not jurisdiction-specific) and complements other frameworks rather than replacing them.
+
+## Relationship to other frameworks
+
+| Framework | Type | Scope |
+|---|---|---|
+| ISO 42001 | Management system standard (certifiable) | Organisational + technical |
+| EU AI Act | Regulation (mandatory, EU) | Technical + procedural |
+| NIST AI RMF | Voluntary framework (US) | Risk management, technical |
+| GDPR | Regulation (mandatory, EU) | Data protection |
+
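+ISO 42001 does not itself map its controls to the EU AI Act or NIST AI RMF (Annex C lists potential AI-related organisational objectives and risk sources). Its technical controls overlap heavily with EU AI Act Art. 8–15, so evidence gathered for those articles can usually be reused for ISO 42001, but that evidence does not by itself establish conformity with the standard.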
+
+## Clause structure overview
+
+| Clause | Topic | Code-auditable? |
+|---|---|---|
+| 4 — Context | Organisational context, interested parties | No |
+| 5 — Leadership | Management commitment, AI policy | Partially (policy doc) |
+| 6 — Planning | Risk assessment, objectives | Partially (impact assessment docs) |
+| 7 — Support | Resources, awareness, documentation | Partially (model cards, changelogs) |
+| 8 — Operation | AI system lifecycle, impact assessment | Yes (lifecycle docs, impact register) |
+| 9 — Performance evaluation | Monitoring, measurement, internal audit | Partially (monitoring config) |
+| 10 — Improvement | Nonconformity, continual improvement | No |
+| Annex A — Controls | 38 controls across 9 categories | Partially |
+
+## Technically auditable clauses and controls
+
+### Clause 5.2 — AI policy
+
+**Requirement:** The organisation shall establish an AI policy that includes commitments to satisfy applicable requirements and to continual improvement.
+
+**Code artifact to look for:** A policy document referenced in the codebase (`AI_POLICY.md`, `docs/ai-policy.md`, `GOVERNANCE.md`) or a link to a policy in `README.md` or `CONTRIBUTING.md`.
+
+**Gap signal:** No policy document or reference found anywhere in the repository.
+
+### Clause 6.1.2 — AI risk assessment
+
+**Requirement:** Documented process for identifying and assessing AI-specific risks (bias, safety, privacy, security).
+
+**Code artifact to look for:** Risk assessment or DPIA documents, risk register files, or references to a risk management process.
+
+**Gap signal:** No risk or impact assessment documentation.
+
+### Clause 8.3 — AI impact assessment
+
+**Requirement:** Conduct and document an impact assessment for AI systems, covering intended use, potential harms, and affected stakeholders.
+
+**Code artifact to look for:** `AI_IMPACT_ASSESSMENT.md`, `docs/impact-assessment.*`, or similar. May be combined with a DPIA for GDPR overlap.
+
+**Gap signal:** No impact assessment document found.
+
+### Clause 8.4 — AI system lifecycle
+
+**Requirement:** Manage the AI system through a documented lifecycle: design, development, testing, deployment, monitoring, decommission. Changes are documented and controlled.
+
+**Code artifact to look for:**
+- Changelog or version history for models and prompts
+- Model card (`MODEL_CARD.md`, `model_card.json`)
+- Pinned model versions (not `latest`) in code
+- Deployment notes or release documentation
+
+**Gap signal:** No model card, no changelog, or model versions unpinned.
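+
+One way to look for unpinned model versions is a simple text heuristic. The sketch below is an assumption about how such a check could work, not part of any Arize tooling: the `find_unpinned_models` helper and the model names are hypothetical, and it only catches quoted identifiers that end in `latest`.
+
+```python
+import re
+
+# Matches quoted identifiers such as "latest" or "example-model-latest".
+_UNPINNED = re.compile(r"""["']([\w./:-]*\blatest)["']""")
+
+
+def find_unpinned_models(source: str) -> list[str]:
+    """Return quoted identifiers in `source` that end in 'latest'."""
+    return _UNPINNED.findall(source)
+
+
+sample = (
+    'client.create(model="example-model-latest")\n'
+    'client.create(model="example-model-2024-08-06")\n'
+)
+unpinned = find_unpinned_models(sample)  # only the first call is flagged
+```
+
+A match is a prompt for manual review rather than proof of a gap, since the string may not be a model identifier at all.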
+
+### Clause 8.6 — AI system operation and monitoring
+
+**Requirement:** Monitor AI system performance, detect degradation, and act on findings.
+
+**Code artifact to look for:**
+- Arize tracing and evaluation setup
+- Drift detection configuration
+- Alerting rules
+- Periodic re-evaluation schedule
+
+**Gap signal:** No monitoring or evaluation infrastructure.
+
+### Clause 9.1 — Monitoring, measurement, analysis, evaluation
+
+**Requirement:** Determine what needs to be monitored and measured, which methods to use, when to monitor, and when to analyse and report the results.
+
+**Code artifact to look for:** Evaluation pipelines, Arize evaluators, scheduled jobs for performance review, dashboards.
+
+**Gap signal:** No evaluation or measurement infrastructure beyond basic logging.
+
+### Annex A controls (technically auditable subset)
+
+#### A.2.2 — Policies for AI use (organisational AI use policy)
+
+Policies governing acceptable AI use, including approved use cases, prohibited uses, and responsible use guidelines.
+
+**Artifact:** `docs/ai-use-policy.md`, `ACCEPTABLE_USE.md`, or policy links in README.
+
+#### A.3.2 — Internal audit of the AIMS
+
+Documented audit programme and results.
+
+**Artifact:** Audit log files, scheduled audit tooling config. Usually organisational — note as not code-auditable.
+
+#### A.4.2 — Addressing bias
+
+Documented processes for identifying, assessing, and mitigating bias in training data and model outputs.
+
+**Artifact:** Bias testing scripts, fairness metric definitions, Arize evaluators for fairness, demographic parity tests.
+
+#### A.5.2 — AI system impact assessment
+
+See clause 8.3 above.
+
+#### A.6.1 — Allocation of AI roles and responsibilities
+
+Accountable owners for AI systems identified and documented.
+
+**Artifact:** `CODEOWNERS`, `README.md` ownership sections, or `GOVERNANCE.md` with role assignments.
+
+#### A.6.2 — AI system lifecycle management
+
+Processes for managing the full lifecycle. See clause 8.4.
+
+#### A.7 — Data for AI systems
+
+**A.7.1 — Data governance:** Processes ensuring data quality, provenance, and fitness for purpose.
+
+**A.7.2 — Data acquisition and collection:** Documented data sources, collection methods, consent basis.
+
+**A.7.3 — Data preparation:** Data cleaning, preprocessing, and validation documented.
+
+**A.7.4 — Data quality:** Data quality checks implemented in pipeline.
+
+**Code artifact to look for:** Data validation scripts, schema definitions, data provenance documentation, data quality checks.
+
+#### A.8 — Information for interested parties
+
+Requirements for transparency to users and stakeholders — overlaps with EU AI Act Art. 13 and 50.
+
+**Artifact:** User-facing disclosures, model documentation, capability and limitation statements.
+
+#### A.9 — Use of AI systems
+
+Guidance for users of AI systems, including known limitations and appropriate use. Often in user documentation.
+
+**Artifact:** User guide, `USAGE.md`, or in-app help text describing limitations.
+
+#### A.10 — Third-party and supplier relationships
+
+**Requirement:** Supplier AI policy covering third-party AI providers and data processors.
+
+**Code artifact to look for:** Documented AI vendor list, vendor risk assessments, DPA references, supplier questionnaires.
+
+**Gap signal:** Third-party AI APIs used (OpenAI, Anthropic, etc.) with no vendor documentation.
+
+## Organisational-only requirements (not code-auditable)
+
+These clauses require organisational evidence that cannot be found in a codebase. Flag them for the user to address outside the technical audit:
+
+| Clause | Requirement |
+|---|---|
+| 5.1 | Leadership and commitment — management sign-off |
+| 5.3 | Roles and responsibilities assigned |
+| 6.2 | AI objectives and planning to achieve them |
+| 7.2 | Competence — staff training records |
+| 7.3 | Awareness — staff awareness of AI policy |
+| 9.2 | Internal audit — conducted by qualified auditors |
+| 9.3 | Management review — periodic leadership reviews |
+| 10.1 | Continual improvement programme |
+
+## Checklist priority guidance
+
+| Priority | Meaning in ISO 42001 context |
+|---|---|
+| Critical | Certification blocker or significant nonconformity |
+| High | Required control with no documented evidence |
+| Medium | Control partially evidenced; needs strengthening |
+| Low | Best practice; documentation gap only |
diff --git a/skills/arize-compliance-audit/references/us-ai-compliance.md b/skills/arize-compliance-audit/references/us-ai-compliance.md
new file mode 100644
index 000000000..86e256c6a
--- /dev/null
+++ b/skills/arize-compliance-audit/references/us-ai-compliance.md
@@ -0,0 +1,184 @@
+# US AI Compliance Landscape — Developer Reference
+
+This reference covers US regulatory frameworks applicable to AI agent and LLM application development. The US does not have a single comprehensive federal AI law; compliance is a patchwork of federal frameworks, state laws, and sector-specific regulations. Always consult qualified legal counsel for binding compliance assessments.
+
+Official sources:
+- NIST AI RMF: https://www.nist.gov/artificial-intelligence/ai-risk-management-framework
+- Colorado AI Act (SB24-205): https://leg.colorado.gov/bills/sb24-205
+- NYC Local Law 144: https://www.nyc.gov/site/dca/about/automated-employment-decision-tools.page
+- HIPAA: https://www.hhs.gov/hipaa/index.html
+
+## NIST AI Risk Management Framework (AI RMF 1.0)
+
+The NIST AI RMF is voluntary but increasingly the de facto standard for AI governance. Federal agencies require alignment for AI procurement (OMB M-24-10), and state laws like the Colorado AI Act reference NIST-aligned frameworks. Aligning with NIST AI RMF is therefore a strong foundation for US compliance work, although it does not replace the state and sector-specific requirements described below.
+
+### Govern function
+
+Establish organisational AI governance. Developer actions:
+
+| Sub-function | What to implement |
+|---|---|
+| GOV-1 | Define an AI governance policy covering accountability, roles, and escalation paths |
+| GOV-1.1 | Assign individuals or teams responsible for AI risk management |
+| GOV-1.2 | Document the organisation's risk tolerance for AI systems |
+| GOV-2 | Establish accountability structures — who is responsible when the AI produces harmful output |
+| GOV-3 | Define processes for workforce diversity and domain expertise in AI development |
+| GOV-4 | Implement organisational processes for feedback, appeals, and override of AI decisions |
+| GOV-5 | Establish processes for continuous improvement based on monitoring data |
+| GOV-6 | Define policies for third-party AI components (model providers, APIs, data sources) |
+
+### Map function
+
+Identify and contextualise AI risks. Developer actions:
+
+| Sub-function | What to implement |
+|---|---|
+| MAP-1.1 | Document the intended purpose and known limitations of the AI system |
+| MAP-1.2 | Identify the interdependencies and downstream impacts of the system |
+| MAP-1.5 | Document assumptions and constraints of the AI system |
+| MAP-2.1 | Identify who is affected by the system (users, subjects of decisions, bystanders) |
+| MAP-2.2 | Document potential benefits and harms for each stakeholder group |
+| MAP-2.3 | Assess risks of the AI system in its real-world deployment context |
+| MAP-3.1 | Identify risks from the AI supply chain (model provider, training data, third-party tools) |
+| MAP-3.2 | Assess risks of bias, inaccuracy, or unreliability in data and models |
+| MAP-5.1 | Document the system's fitness for its intended purpose through testing |
+
+### Measure function
+
+Quantify and test AI risks. Developer actions:
+
+| Sub-function | What to implement |
+|---|---|
+| MEA-1.1 | Define and track performance metrics appropriate to the system's purpose |
+| MEA-2.1 | Test for bias and fairness across demographic groups and use cases |
+| MEA-2.2 | Evaluate the system's robustness under adversarial conditions |
+| MEA-2.3 | Test for privacy risks — data leakage, PII exposure, membership inference |
+| MEA-2.5 | Conduct red teaming and adversarial testing for safety and security |
+| MEA-2.6 | Evaluate the AI system's environmental impact (compute, energy) |
+| MEA-2.7 | Evaluate the AI system's security posture (prompt injection, data exfiltration) |
+| MEA-3.1 | Track metrics over time to detect drift, degradation, and emerging risks |
+| MEA-4.1 | Document measurement methodology and results |
+
+### Manage function
+
+Implement controls and mitigation. Developer actions:
+
+| Sub-function | What to implement |
+|---|---|
+| MAN-1.1 | Implement risk treatment plans for identified risks |
+| MAN-1.2 | Deploy technical controls: input validation, output filtering, guardrails |
+| MAN-2.1 | Implement monitoring for deployed systems (drift, performance, incidents) |
+| MAN-2.2 | Establish mechanisms for user feedback and incident reporting |
+| MAN-3.1 | Define incident response procedures for AI-related incidents |
+| MAN-3.2 | Implement escalation and notification workflows |
+| MAN-4.1 | Implement decommissioning procedures for AI systems |
+| MAN-4.2 | Document and apply lessons learned from incidents |
+
+## Colorado AI Act (SB24-205)
+
+### Scope
+Applies to **developers and deployers** of "high-risk AI systems" — those making or substantially assisting **consequential decisions** in:
+- Employment, promotion, termination
+- Education admissions and opportunities
+- Financial services (lending, credit, insurance)
+- Healthcare services
+- Housing
+- Legal services
+- Government services
+
+### Developer obligations (effective 30 June 2026; postponed from 1 February 2026 by SB25B-004)
+
+| Requirement | What to implement |
+|---|---|
+| Reasonable care | Exercise reasonable care to protect consumers from algorithmic discrimination |
+| Impact assessment | Conduct and document annual impact assessments covering performance, bias, and discrimination risks |
+| Risk management | Implement a risk management programme aligned with a recognised framework (NIST AI RMF, ISO 42001) |
+| Documentation | Make available: a general description of the system, known limitations, the type of data used, how the system was evaluated, mitigation measures |
+| Disclosure to deployers | Provide deployers with documentation needed for their own compliance |
+| Discrimination prevention | Test for and mitigate algorithmic discrimination based on protected classes (race, colour, ethnicity, sex, religion, age, disability, sexual orientation, veteran status) |
+
+### Safe harbour
+Compliance with a recognised risk management framework (NIST AI RMF or ISO 42001) establishes a rebuttable presumption of reasonable care.
+
+## NYC Local Law 144 — Automated Employment Decision Tools (AEDT)
+
+### Scope
+Applies to employers and employment agencies in New York City using AI for hiring, promotion, or other employment decisions.
+
+### Requirements
+
+| Requirement | What to implement |
+|---|---|
+| Annual bias audit | Conduct an independent bias audit before use and annually thereafter |
+| Audit scope | Assess scoring rates and impact ratios for sex, race/ethnicity, and intersectional categories |
+| Transparency | Publish a summary of the most recent bias audit on the employer's website |
+| Notice to candidates | Notify candidates at least 10 business days before use; describe the job qualifications evaluated; allow alternatives |
+| Data disclosure | Disclose what data is collected and its retention period |
+
+### Audit methodology
+- Calculate selection/scoring rates for each demographic category
+- Calculate impact ratios (selection rate of each group vs the most-selected group)
+- Flag categories with impact ratios below 0.8 (four-fifths rule) for further analysis
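+
+The three steps above can be sketched as follows. This is a minimal illustration, assuming per-category counts are already available; the function names and the group labels are hypothetical, and a real audit under Local Law 144 must be carried out by an independent auditor.
+
+```python
+def impact_ratios(selected: dict[str, int], total: dict[str, int]) -> dict[str, float]:
+    """Selection rate of each category divided by the highest selection rate."""
+    rates = {group: selected[group] / total[group] for group in total if total[group] > 0}
+    highest = max(rates.values())
+    if highest == 0:
+        # Nobody was selected in any category, so no ratio is meaningful.
+        return {group: 0.0 for group in rates}
+    return {group: rate / highest for group, rate in rates.items()}
+
+
+def flag_for_review(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
+    """Categories whose impact ratio falls below the four-fifths threshold."""
+    return sorted(group for group, ratio in ratios.items() if ratio < threshold)
+
+
+# Hypothetical counts: 50 of 100 selected in group_a, 30 of 100 in group_b.
+ratios = impact_ratios({"group_a": 50, "group_b": 30}, {"group_a": 100, "group_b": 100})
+flagged = flag_for_review(ratios)  # group_b has ratio 0.3 / 0.5 = 0.6, below 0.8
+```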
+
+## HIPAA for AI applications
+
+Applies if your AI application processes Protected Health Information (PHI). This includes clinical data, diagnostic information, treatment records, or any data linkable to a specific patient.
+
+### When HIPAA applies to AI
+- AI system processes, stores, or transmits PHI
+- AI system is used by or on behalf of a covered entity (healthcare provider, health plan, healthcare clearinghouse)
+- AI system is provided by a business associate of a covered entity
+
+### Requirements
+
+| Requirement | What to implement |
+|---|---|
+| Business Associate Agreement (BAA) | Execute BAAs with every vendor that touches PHI (LLM provider, cloud, observability) |
+| No training on PHI | Verify LLM providers do not use your data for model training — get written confirmation |
+| Minimum necessary | Limit PHI in prompts and context to what is strictly necessary for the task |
+| Access controls | Implement role-based access control (RBAC) or attribute-based access control (ABAC) for PHI |
+| Audit logs | Maintain immutable audit logs of all PHI access: who, what, when, why |
+| Encryption | Encrypt PHI in transit (TLS 1.2+) and at rest (AES-256) |
+| De-identification | Use Safe Harbor or Expert Determination methods to de-identify data where possible |
+| Breach notification | Notify affected individuals within 60 days; notify HHS; notify media if 500+ individuals affected |
+
+### PHI in traces and spans
+If using Arize tracing in a healthcare context:
+- Ensure PHI is redacted from `input.value` and `output.value` span attributes before export
+- Verify Arize's BAA covers trace data storage
+- Implement PII/PHI detection filters on span data before it leaves the application
+- Configure trace retention to meet HIPAA's 6-year retention requirement for audit logs
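+
+A redaction step for the first and third points might look like the sketch below. It operates on a plain dictionary of span attributes and does not use any Arize or OpenTelemetry API; how it is wired into the export path is left open. The two regular expressions are illustrative assumptions only and fall well short of HIPAA de-identification, which needs a dedicated PHI detection approach.
+
+```python
+import re
+
+# Illustrative patterns only; real PHI detection needs far broader coverage.
+_PATTERNS = {
+    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
+    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
+}
+
+# Span attributes named in this reference as likely to carry PHI.
+REDACTED_KEYS = ("input.value", "output.value")
+
+
+def redact_text(text: str) -> str:
+    """Replace every pattern match with a labelled placeholder."""
+    for label, pattern in _PATTERNS.items():
+        text = pattern.sub(f"[REDACTED_{label}]", text)
+    return text
+
+
+def redact_span_attributes(attributes: dict) -> dict:
+    """Return a copy of the attributes with the sensitive keys redacted."""
+    cleaned = dict(attributes)
+    for key in REDACTED_KEYS:
+        if isinstance(cleaned.get(key), str):
+            cleaned[key] = redact_text(cleaned[key])
+    return cleaned
+
+
+raw = {"input.value": "Patient jane@example.com, SSN 123-45-6789", "llm.model_name": "example-model"}
+safe = redact_span_attributes(raw)
+```
+
+Returning a copy keeps the original attributes available to the application while only the redacted version is exported.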
+
+## State privacy laws intersection
+
+Several state privacy laws impose additional obligations when AI processes personal data:
+
+| Law | Scope | Key AI provisions |
+|---|---|---|
+| CCPA/CPRA (California) | Businesses serving CA residents | Right to opt out of automated decision-making; right to access information about the logic involved |
+| Virginia CDPA | Businesses serving VA residents | Right to opt out of profiling in furtherance of automated decisions; data protection assessments required |
+| Connecticut Data Privacy Act | Businesses serving CT residents | Right to opt out of profiling; data protection assessments for targeted advertising and profiling |
+| Texas Data Privacy Act | Businesses serving TX residents | Data protection assessments for processing that presents a heightened risk of harm |
+| Oregon Consumer Privacy Act | Businesses serving OR residents | Right to opt out of profiling; children's data protections |
+
+Common obligations across state privacy laws:
+- Data protection assessments for AI-driven profiling
+- Opt-out mechanisms for automated decision-making
+- Transparency about automated processing
+- Data minimisation in AI contexts
+
+## Developer action summary
+
+| Compliance area | Framework | Concrete developer action |
+|---|---|---|
+| Governance policy | NIST GOV-1 | Document AI governance policy, roles, accountability |
+| Risk identification | NIST MAP-1–3, Colorado | Document intended purpose, limitations, stakeholder impact, supply chain risks |
+| Bias testing | NIST MEA-2.1, Colorado, NYC LL144 | Test output fairness across demographic groups; compute impact ratios |
+| Adversarial testing | NIST MEA-2.5, MEA-2.7 | Red team for prompt injection, jailbreak, goal hijacking, data exfiltration |
+| Input/output validation | NIST MAN-1.2 | Implement guardrails, content filtering, DLP |
+| Monitoring | NIST MAN-2.1, MEA-3.1 | Deploy Arize tracing with drift detection and alerting |
+| Incident response | NIST MAN-3.1 | Define and document AI incident response procedures |
+| Documentation | Colorado, NIST MAP-1 | Maintain model card, system docs, impact assessments |
+| Transparency | Colorado, state privacy laws | Disclose AI use to users; provide opt-out for profiling |
+| PHI protection | HIPAA | BAAs, PII/PHI redaction, audit logs, encryption, minimum necessary |
+| Privacy assessments | State privacy laws | Conduct data protection assessments for AI processing involving personal data |
diff --git a/skills/arize-instrumentation/SKILL.md b/skills/arize-instrumentation/SKILL.md
index f1a16a54b..c51a40cde 100644
--- a/skills/arize-instrumentation/SKILL.md
+++ b/skills/arize-instrumentation/SKILL.md
@@ -110,12 +110,18 @@ Proceed **only after the user confirms** the Phase 1 analysis.
- Python: `pip install arize-otel` plus `openinference-instrumentation-{name}` (hyphens in package name; underscores in import, e.g. `openinference.instrumentation.llama_index`).
- TypeScript/JavaScript: `@opentelemetry/sdk-trace-node` plus the relevant `@arizeai/openinference-*` package.
- Java: OpenTelemetry SDK plus `openinference-instrumentation-*` in pom.xml or build.gradle.
- - Go: `go get go.opentelemetry.io/otel go.opentelemetry.io/otel/sdk go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` — no auto-instrumentors yet, so the agent sets OpenInference attributes manually on spans. **Wire the exporter** with `otlptracehttp.WithEndpoint("otlp.arize.com")` (US) or `otlptracehttp.WithEndpoint("otlp.eu-west-1a.arize.com")` (EU) — pass the bare hostname, no `https://` scheme — and `otlptracehttp.WithHeaders(map[string]string{"space_id": ..., "api_key": ...})`. Recent OTel Go modules require Go ≥ 1.23 — `go mod tidy` may bump the toolchain.
+ - Go: No auto-instrumentors yet — the agent sets OpenInference attributes manually on spans. Install the OTel SDK:
+ ```
+ go get go.opentelemetry.io/otel
+ go get go.opentelemetry.io/otel/sdk
+ go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
+ ```
+ **Wire the exporter** with `otlptracehttp.WithEndpoint("otlp.arize.com")` (US) or `otlptracehttp.WithEndpoint("otlp.eu-west-1a.arize.com")` (EU) — pass the bare hostname, no `https://` scheme — and `otlptracehttp.WithHeaders(map[string]string{"space_id": ..., "api_key": ...})`. Recent OTel Go modules require Go ≥ 1.23 — run `go mod tidy` after installing.
3. **Credentials** — User needs an **Arize API Key** and **Space ID**. Check existing `ax` profiles for `ARIZE_API_KEY` and `ARIZE_SPACE` — never read `.env` files:
- Run `ax profiles show` to check for an existing profile.
- If no profile exists, guide the user to run `ax profiles create` which provides an **interactive wizard** that walks through API key and space setup. See [CLI profiles docs](https://arize.com/docs/api-clients/cli/profiles) for details.
- If the user needs to find their API key manually, direct them to **https://app.arize.com** and to navigate to the settings page (do not use organization-specific URLs with placeholder IDs — they won't resolve for new users).
- - If credentials are not set, instruct the user to set them as environment variables — never embed raw values in generated code. All generated instrumentation code must reference `os.environ["ARIZE_API_KEY"]` (Python), `process.env.ARIZE_API_KEY` (TypeScript/JavaScript), or `os.Getenv("ARIZE_API_KEY")` (Go).
+ - If credentials are not set, instruct the user to set them as environment variables — never embed raw values in generated code. All generated instrumentation code must reference `os.environ["ARIZE_API_KEY"]` / `os.environ["ARIZE_SPACE"]` (Python), `process.env.ARIZE_API_KEY` / `process.env.ARIZE_SPACE` (TypeScript/JavaScript), or `os.Getenv("ARIZE_API_KEY")` / `os.Getenv("ARIZE_SPACE")` (Go).
- See references/ax-profiles.md for full profile setup and troubleshooting.
4. **Centralized instrumentation** — Create a single module (e.g. `instrumentation.py`, `instrumentation.ts`, `instrumentation.go`) and initialize tracing **before** any LLM client is created.
5. **Existing OTel** — If there is already a TracerProvider, add Arize as an **additional** exporter (e.g. BatchSpanProcessor with Arize OTLP). Do not replace existing setup unless the user asks.
@@ -213,6 +219,7 @@ import (
"encoding/json"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
+ "go.opentelemetry.io/otel/codes"
)
var tracer = otel.Tracer("my-app")
@@ -227,10 +234,11 @@ func runAgent(ctx context.Context, userMessage string) string {
// ... LLM call ...
for _, toolUse := range toolUses {
- ctx, toolSpan := tracer.Start(ctx, toolUse.Name)
+ _, toolSpan := tracer.Start(ctx, toolUse.Name)
argsJSON, err := json.Marshal(toolUse.Input)
if err != nil {
toolSpan.RecordError(err)
+ toolSpan.SetStatus(codes.Error, err.Error())
}
toolSpan.SetAttributes(
attribute.String("openinference.span.kind", "TOOL"),
diff --git a/skills/phoenix-tracing/references/production-python.md b/skills/phoenix-tracing/references/production-python.md
index 43124c5a4..94a325500 100644
--- a/skills/phoenix-tracing/references/production-python.md
+++ b/skills/phoenix-tracing/references/production-python.md
@@ -25,9 +25,12 @@ export OPENINFERENCE_HIDE_INPUT_MESSAGES=true # Hide LLM input messages
export OPENINFERENCE_HIDE_OUTPUT_MESSAGES=true # Hide LLM output messages
export OPENINFERENCE_HIDE_INPUT_IMAGES=true # Hide image content
export OPENINFERENCE_HIDE_INPUT_TEXT=true # Hide embedding text
+export OPENINFERENCE_HIDE_LLM_TOOLS=true # Hide tool definitions sent to the LLM (llm.tools.*)
export OPENINFERENCE_BASE64_IMAGE_MAX_LENGTH=10000 # Limit image size
```
+> `OPENINFERENCE_HIDE_LLM_TOOLS` is also applied when `OPENINFERENCE_HIDE_INPUTS` is enabled, consistent with `OPENINFERENCE_HIDE_INPUT_MESSAGES` and `OPENINFERENCE_HIDE_LLM_PROMPTS`.
+
**Python TraceConfig:**
```python
@@ -37,7 +40,8 @@ from openinference.instrumentation import TraceConfig
config = TraceConfig(
hide_inputs=True,
hide_outputs=True,
- hide_input_messages=True
+ hide_input_messages=True,
+ hide_llm_tools=True,
)
register(trace_config=config)
```
diff --git a/skills/phoenix-tracing/references/production-typescript.md b/skills/phoenix-tracing/references/production-typescript.md
index 41837c833..7fb9fa0b0 100644
--- a/skills/phoenix-tracing/references/production-typescript.md
+++ b/skills/phoenix-tracing/references/production-typescript.md
@@ -74,9 +74,12 @@ export OPENINFERENCE_HIDE_INPUT_MESSAGES=true # Hide LLM input messages
export OPENINFERENCE_HIDE_OUTPUT_MESSAGES=true # Hide LLM output messages
export OPENINFERENCE_HIDE_INPUT_IMAGES=true # Hide image content
export OPENINFERENCE_HIDE_INPUT_TEXT=true # Hide embedding text
+export OPENINFERENCE_HIDE_LLM_TOOLS=true # Hide tool definitions sent to the LLM (llm.tools.*)
export OPENINFERENCE_BASE64_IMAGE_MAX_LENGTH=10000 # Limit image size
```
+> `OPENINFERENCE_HIDE_LLM_TOOLS` is also applied when `OPENINFERENCE_HIDE_INPUTS` is enabled, consistent with `OPENINFERENCE_HIDE_INPUT_MESSAGES` and `OPENINFERENCE_HIDE_LLM_PROMPTS`.
+
**TypeScript TraceConfig:**
```typescript
@@ -86,7 +89,8 @@ import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-op
const traceConfig = {
hideInputs: true,
hideOutputs: true,
- hideInputMessages: true
+ hideInputMessages: true,
+ hideLLMTools: true,
};
const instrumentation = new OpenAIInstrumentation({ traceConfig });
diff --git a/skills/phoenix-tracing/references/projects-python.md b/skills/phoenix-tracing/references/projects-python.md
index d9681c126..5cd29b3cc 100644
--- a/skills/phoenix-tracing/references/projects-python.md
+++ b/skills/phoenix-tracing/references/projects-python.md
@@ -57,6 +57,37 @@ register(project_name="my-app-v1")
register(project_name="my-app-v2")
```
+## Via HTTP Header (OTEL Collector / config-based tools)
+
+If you cannot set resource attributes in code (e.g. when using an OTEL Collector or another configuration-driven pipeline), set the `x-project-name` HTTP header on OTLP HTTP exports. The header takes precedence over the `openinference.project.name` resource attribute; every span in the request is routed to that project.
+
+```bash
+# Via OTEL_EXPORTER_OTLP_HEADERS environment variable
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:6006"
+export OTEL_EXPORTER_OTLP_HEADERS="x-project-name=my-project"
+```
+
+```python
+# Using the raw OTLP HTTP exporter
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+
+exporter = OTLPSpanExporter(
+ endpoint="http://localhost:6006/v1/traces",
+ headers={"x-project-name": "my-project"},
+)
+```
+
+```yaml
+# OTEL Collector otlphttp exporter
+exporters:
+ otlphttp:
+ endpoint: "http://phoenix:6006"
+ headers:
+ x-project-name: "my-project"
+```
+
+> **Note:** `x-project-name` is only supported by the **HTTP** OTLP endpoint (`/v1/traces`). For gRPC, use the `openinference.project.name` resource attribute instead.
+
## Switching Projects (Python Notebooks Only)
```python
diff --git a/skills/phoenix-tracing/references/projects-typescript.md b/skills/phoenix-tracing/references/projects-typescript.md
index d1249debe..6080eb54f 100644
--- a/skills/phoenix-tracing/references/projects-typescript.md
+++ b/skills/phoenix-tracing/references/projects-typescript.md
@@ -52,3 +52,24 @@ register({ projectName: "chatbot-claude" });
register({ projectName: "my-app-v1" });
register({ projectName: "my-app-v2" });
```
+
+## Via HTTP Header (OTEL Collector / config-based tools)
+
+If you cannot set resource attributes in code (e.g. when using an OTEL Collector or another configuration-driven pipeline), set the `x-project-name` HTTP header on OTLP HTTP exports. The header takes precedence over the `openinference.project.name` resource attribute; every span in the request is routed to that project.
+
+```bash
+# Via OTEL_EXPORTER_OTLP_HEADERS environment variable
+export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:6006"
+export OTEL_EXPORTER_OTLP_HEADERS="x-project-name=my-project"
+```
+
+```yaml
+# OTEL Collector otlphttp exporter
+exporters:
+ otlphttp:
+ endpoint: "http://phoenix:6006"
+ headers:
+ x-project-name: "my-project"
+```
+
+> **Note:** `x-project-name` is only supported by the **HTTP** OTLP endpoint (`/v1/traces`). For gRPC, use the `openinference.project.name` resource attribute instead.