agent-testing

Agent Verifier is a coding agent skill that verifies code against organizational policies, code quality patterns, security requirements, and framework best practices — before code ships. Works with Claude Code, Cursor, Windsurf, and 30+ agents.

security skills devtools code-review code-quality cursor cline windsurf code-verification ai-agent langgraph agent-skills claude-code ai-coding-assistant coding-agent agent-testing agent-verification

Updated May 26, 2026

najeed / ai-agent-eval-harness

The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.

Updated May 28, 2026
Python

mnvsk97 / agentbreak

Infrastructure chaos to test the resilience of AI agents

python testing reliability stress-testing developer-tools resilience-testing fault-injection ai-safety chaos-testing ai-agents chaos-engineering llm agent-testing

Updated May 24, 2026
Python

iscale-llc / agentic-react-nextjs-shadcn

GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.

react open-source typescript accessibility nextjs a11y ai-agents saas-template shadcn-ui drizzle-orm neon-postgres agent-testing

Updated Mar 3, 2026
TypeScript

justindobbs / Tracecore

Deterministic runtime for agent evaluation

reliability-engineering specification ai-agents benchmarking-framework autogen fastapi langchain observability-platform ai-evaluation-framework agent-testing agent-benchmark deterministic-testing autoresearch

Updated Mar 25, 2026
Python

simukappu / agentverify

pytest plugin for deterministic testing of AI agents. Assert agent actions, not vibes.

testing pytest pytest-plugin ai-agents llm agent-testing

Updated May 21, 2026
Python

janaraj / volnix

A living world where agents exist as participants alongside NPCs, internal actors, real service APIs, budgets, policies, and consequences.

simulation mcp multi-agent governance swarm-intelligence ai-agents world-simulation llm-agents agent-evaluation agent-testing

Updated Apr 27, 2026
Python

Deep-CodeAI / Agents.KT

Typed Kotlin DSL framework for AI agent systems.

Updated May 29, 2026
Kotlin

qualixar / agentassay

Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601

pytest regression-testing ai-agents llm-testing agent-evaluation agent-testing token-efficient qualixar

Updated Apr 17, 2026
Python

converra / agent-triage

Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.

Updated Mar 10, 2026
TypeScript

vitron-ai / themis

Intent-first unit testing framework for AI agents in Node.js and TypeScript.

nodejs testing unit-testing typescript ai test-framework developer-tools ai-agents llm ai-testing agent-testing

Updated May 4, 2026
JavaScript

NeuZhou / agentprobe

Playwright for AI Agents. Test what your agent DOES, not what it SAYS. YAML-first behavioral testing. Catch PII leaks, tool abuse, step explosions. 3200+ tests.

yaml typescript ci-cd ai-safety playwright ai-agent llm-testing tool-calling agent-testing behavior-testing

Updated Apr 7, 2026
TypeScript

bireshpatel / agent-assert

Playwright-based reference implementation for testing AI agents that call tools. Five patterns: tool invocation, behavior contracts, multi-step trace verification, boundary enforcement, and failure observability. Rule-based contract matching - no second LLM required.

typescript ai ai-agents github-actions playwright llm ollama agent-testing

Updated Apr 20, 2026
TypeScript

gabriel-r-machado / AgentGuard

Open-source CLI framework to regression-test AI agents with deterministic rules, provider execution and CI-ready reports.

cli typescript ci-cd openai github-actions llm ai-testing agent-testing

Updated May 5, 2026
TypeScript

pyros-projects / agent-comparison

Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks

orchestration ai-agents ai-benchmarks qualitative-evaluation llm-agents coding-agents agentic-workflows agent-evaluation agent-testing ai-coding-assistants agent-comparison development-tasks

Updated Nov 25, 2025
Python

NYX-305Parad0xLabs / null-arena

Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.

python infrastructure benchmarking arena simulation evaluation scoring experimentation agent-testing local-f

Updated Mar 19, 2026
Python

Here are 64 public repositories matching this topic...

langwatch / better-agents

langwatch / scenario

LeoYeAI / myclaw-bench

dowhiledev / nomos

Aurite-ai / agent-verifier

