diff --git a/docs.json b/docs.json
index 59df76aa..48c0cd60 100644
--- a/docs.json
+++ b/docs.json
@@ -223,6 +223,7 @@
           ]
         },
         "openhands/usage/agent-canvas/customize-and-settings",
+        "openhands/usage/agent-canvas/critic",
         "openhands/usage/agent-canvas/acp-agents",
         "openhands/usage/agent-canvas/development",
         "openhands/usage/agent-canvas/troubleshooting"
diff --git a/openhands/static/img/agent-canvas-critic-result.png b/openhands/static/img/agent-canvas-critic-result.png
new file mode 100644
index 00000000..da219f0f
Binary files /dev/null and b/openhands/static/img/agent-canvas-critic-result.png differ
diff --git a/openhands/static/img/agent-canvas-critic-settings.png b/openhands/static/img/agent-canvas-critic-settings.png
new file mode 100644
index 00000000..d8cfea7b
Binary files /dev/null and b/openhands/static/img/agent-canvas-critic-settings.png differ
diff --git a/openhands/usage/agent-canvas/critic.mdx b/openhands/usage/agent-canvas/critic.mdx
new file mode 100644
index 00000000..7bc7d811
--- /dev/null
+++ b/openhands/usage/agent-canvas/critic.mdx
@@ -0,0 +1,139 @@
+---
+title: Critic
+description: Configure critic evaluation and iterative refinement in Agent Canvas.
+---
+
+  The critic feature is experimental. Configuration options, scoring behavior,
+  and default models may change as the feature evolves.
+
+
+The critic is an additional evaluation pass that reviews the agent's work and
+predicts how likely the task is to succeed. In Agent Canvas, critic results can
+appear in the conversation timeline as a success-likelihood score with detected
+issue labels.
+
+Use the critic when you want extra QA feedback for an OpenHands agent
+conversation, or when you want the agent to automatically refine its work after
+a low critic score.
+
+  The critic applies to OpenHands agent conversations. Agent Canvas can also run
+  third-party ACP agents, but those agents manage their own execution loop and
+  may not expose the same critic evaluation path.
+
+## Prerequisites
+
+Before enabling the critic:
+
+1. Configure your active LLM in `Settings > LLM`.
+2. Prefer the `OpenHands` LLM provider if you want to use the default hosted
+   critic service.
+3. Start a new conversation after saving critic settings. Existing
+   conversations keep the settings they were created with.
+
+## Enable the Critic
+
+1. Open `Settings > Verification`.
+2. Toggle on `Enable Critic`.
+3. Configure the `Critic API Key` field:
+   - If `OpenHands` is selected as your active LLM provider, leave this field
+     empty. The critic reuses the active provider's OpenHands Provider LLM Key.
+   - If you are not using the `OpenHands` LLM provider, paste an OpenHands
+     Provider LLM Key into `Critic API Key`, or provide the API key required by
+     your custom critic service.
+4. Save the settings.
+5. Start a new conversation.
+
+![Agent Canvas Verification settings with Enable Critic, iterative refinement, critic threshold, and Critic API Key guidance](/openhands/static/img/agent-canvas-critic-settings.png)
+
+The Critic API Key and the OpenHands Provider LLM Key are the same credential
+when you use the default OpenHands-hosted critic service. You can find that key
+in the `API Keys` tab of [OpenHands Cloud](https://app.all-hands.dev/settings/api-keys).
+
+  A dedicated `Critic API Key` overrides the active LLM key for critic calls
+  only. Your main LLM configuration continues to use the key from
+  `Settings > LLM`.
+
+## Enable Iterative Refinement
+
+Iterative refinement lets the critic send the agent back to improve its work
+when the predicted success score is too low.
+
+1. Open `Settings > Verification`.
+2. Toggle on `Enable Critic`.
+3. Toggle on `Enable Iterative Refinement`.
+4. Optionally switch the settings detail view to `Advanced` or `All`.
+5. Adjust:
+   - `Critic Threshold` - the success score required to stop refining. The
+     default is `0.6`.
+   - `Max Refinement Iterations` - the maximum number of retry attempts. The
+     default is `3`.
+6. Save the settings and start a new conversation.
+
+When refinement is enabled and a critic score falls below the threshold, Agent
+Canvas sends the agent back to refine its work. This repeats until the score
+passes the threshold or the maximum iteration count is reached.
+
+## View Critic Results
+
+When the critic runs, Agent Canvas shows the result below the agent message or
+finish action that was evaluated. The compact view shows the predicted success
+likelihood score. You can expand the result to inspect detected issue
+categories and probabilities, such as incomplete changes, missing validation,
+infrastructure issues, or likely user follow-up patterns.
+
+![Agent Canvas conversation showing a critic success likelihood score below an agent message](/openhands/static/img/agent-canvas-critic-result.png)
+
+## Advanced Options
+
+Some critic settings are hidden until you switch the settings detail view to
+`Advanced` or `All`.
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `Critic Mode` | `finish_and_message` | Runs the critic on finish actions and agent messages. This is the recommended default. |
+| `Critic Mode` set to `all_actions` | Off | Runs the critic after every agent action. This gives more frequent feedback but can make conversations significantly slower. |
+| `Critic Server URL` | OpenHands hosted critic service | Override the critic service endpoint for custom deployments. |
+| `Critic Model Name` | `critic` | Override the model name used by the critic service. |
+
+Only change `Critic Server URL` or `Critic Model Name` if you are operating a
+custom critic service or have been given a deployment-specific endpoint.
+
+## Troubleshooting
+
+### Critic Results Do Not Appear
+
+- Confirm `Enable Critic` is on in `Settings > Verification`.
+- Start a new conversation after saving the setting.
+- Use the OpenHands agent path. Third-party ACP agents may not expose critic
+  results.
+- Wait until the agent sends a message or finishes a task. With the default
+  `finish_and_message` mode, the critic does not run after every tool call.
+
+### Authentication Errors
+
+If the critic request fails with an API key or authentication error:
+
+- If `OpenHands` is the active LLM provider, leave `Critic API Key` empty and
+  confirm the active LLM profile has a saved OpenHands Provider LLM Key.
+- If another LLM provider is active, enter an OpenHands Provider LLM Key in
+  `Critic API Key`.
+- If you use a custom critic server, enter the key expected by that service.
+
+### Conversations Become Slow
+
+- Keep `Critic Mode` set to `finish_and_message` unless you need per-action
+  feedback.
+- Disable `Enable Iterative Refinement` if you only want passive critic scores.
+- Lower `Max Refinement Iterations` if repeated refinement loops are too costly.
+
+## Related Guides
+
+- [Customize and Settings](/openhands/usage/agent-canvas/customize-and-settings)
+- [LLM Profiles and Model Configuration](/openhands/usage/agent-canvas/llm-profiles)
+- [OpenHands LLMs](/openhands/usage/llms/openhands-llms)
+- [SDK Critic Guide](/sdk/guides/critic)
diff --git a/openhands/usage/agent-canvas/customize-and-settings.mdx b/openhands/usage/agent-canvas/customize-and-settings.mdx
index b7bc37be..26808f42 100644
--- a/openhands/usage/agent-canvas/customize-and-settings.mdx
+++ b/openhands/usage/agent-canvas/customize-and-settings.mdx
@@ -30,7 +30,7 @@ The `Settings` area currently includes the following sections:
 | `Agent` | Agent behavior and agent-specific capabilities |
 | `LLM` | Provider, model, API key, and profile configuration |
 | `Condenser` | Context compression and summarization behavior |
-| `Verification` | Approval and verification-related behavior |
+| `Verification` | Approval, critic evaluation, and verification-related behavior |
 | `Application` | UI-level preferences and app behavior |
 | `Secrets` | Stored secrets used by the active backend |
 
@@ -56,4 +56,5 @@ You might use this split like this:
 ## Related Guides
 
 - [Connect and Manage Backends](/openhands/usage/agent-canvas/backends)
+- [Configure the Critic](/openhands/usage/agent-canvas/critic)
 - [Setup a Pre-built Automation](/openhands/usage/agent-canvas/prebuilt-automations)
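The interaction between `Critic Mode`, `Critic Threshold`, and `Max Refinement Iterations` that the new guide documents can be modeled as a small control loop. The Python below is a minimal sketch under stated assumptions, not the OpenHands implementation or SDK API: `CriticSettings`, `should_run_critic`, `run_with_refinement`, and the `"finish"`/`"message"` event kinds are hypothetical names, and the loop assumes a score equal to the threshold counts as passing. Only the defaults (`finish_and_message`, `0.6`, `3`) come from the guide.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical model of the Verification settings described in critic.mdx.
# Field names mirror the UI labels; they are NOT real OpenHands config keys.
@dataclass
class CriticSettings:
    enable_critic: bool = False
    enable_iterative_refinement: bool = False
    critic_mode: str = "finish_and_message"  # alternative: "all_actions"
    critic_threshold: float = 0.6
    max_refinement_iterations: int = 3


def should_run_critic(settings: CriticSettings, event_kind: str) -> bool:
    """Return True if the critic should score an event of this (hypothetical) kind."""
    if not settings.enable_critic:
        return False
    if settings.critic_mode == "all_actions":
        return True  # score every agent action
    # finish_and_message: only finish actions and agent messages are scored
    return event_kind in ("finish", "message")


def run_with_refinement(
    settings: CriticSettings,
    attempt: Callable[[int], str],
    score: Callable[[str], float],
) -> List[float]:
    """Run one agent pass, score it, and refine while the score is too low.

    attempt(i) stands in for agent pass i (0 is the first try) and
    score(result) stands in for the critic's success-likelihood prediction.
    Returns every critic score observed, in order.
    """
    result = attempt(0)
    if not settings.enable_critic:
        return []  # critic disabled: nothing is scored
    scores = [score(result)]
    if not settings.enable_iterative_refinement:
        return scores  # passive critic: one score, no retries

    retries = 0
    # Assumption: a score equal to the threshold counts as passing.
    while (
        scores[-1] < settings.critic_threshold
        and retries < settings.max_refinement_iterations
    ):
        retries += 1
        scores.append(score(attempt(retries)))
    return scores


if __name__ == "__main__":
    settings = CriticSettings(enable_critic=True, enable_iterative_refinement=True)
    fake_scores = iter([0.3, 0.5, 0.7])  # pretend critic outputs
    print(run_with_refinement(settings, lambda i: f"attempt {i}", lambda r: next(fake_scores)))
    # prints [0.3, 0.5, 0.7]
```

With the defaults, a task that never reaches `0.6` is scored at most four times: once for the initial pass and once per refinement. The guide describes `Max Refinement Iterations` as the number of retry attempts, so this sketch does not count the first pass against that limit; the real product may count differently.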