Standards for building agents, better
Updated Feb 22, 2026 · TypeScript
Agentic testing for agentic codebases
The definitive benchmark for AI agents on OpenClaw. 45 tasks across 4 tiers. Powered by MyClaw.ai
Ship agents you can audit.
The pre-flight check for AI agents
The open-source MultiAgentOps evaluation and verification harness for business workflows in any industry.
GitHub template for agent-testable SaaS apps. Next.js 16 + shadcn/ui + Neon Postgres + agent-browser e2e testing via accessibility tree.
Deterministic runtime for agent evaluation
Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.
A living world where agents exist as participants alongside NPCs, internal actors, real service APIs, budgets, policies, and consequences.
Intent-first unit testing framework for AI agents in Node.js and TypeScript.
Qualitative benchmark suite for evaluating AI coding agents and orchestration paradigms on realistic, complex development tasks
Playwright for AI Agents. Test what your agent DOES, not what it SAYS. YAML-first behavioral testing. Catch PII leaks, tool abuse, step explosions. 3200+ tests.
Typed Kotlin DSL framework for AI agent systems.
Agent testing automation 🤖 by simulating users 👥 and agents 🤝 with judge ⚖️(langwatch-scenario)
"The Operating System for AI Agents. Build, Test, Deploy, Monitor, Govern."
Simulation environment for testing and validating autonomous agents
Holdout scenario evaluation harness for AI agents. Doer/Judge/Adversary/Observer roles, probabilistic satisfaction scoring, append-only JSONL audit trails with integrity hashes. Created Dec 2025.
Evaluation and competition arena for testing agents, systems, or workflows in structured local-first scenarios.
Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601