diff --git a/openhands/usage/llms/llms.mdx b/openhands/usage/llms/llms.mdx
index b09163f6..f9da5fd4 100644
--- a/openhands/usage/llms/llms.mdx
+++ b/openhands/usage/llms/llms.mdx
@@ -25,23 +25,23 @@ then switch back to a stronger model for planning, debugging, and review.
 
 ### Best Cloud Models by Family
 
-| Family | Recommended Model | Model String | OpenHands Index Average | Notes |
-|--------|-------------------|--------------|-------------------------|-------|
-| Claude | [Claude Opus 4.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/claude-opus-4-7) | `anthropic/claude-opus-4-7` | 68.2 | Best Claude-series result in the OpenHands Index. Use it for complex, long-running software work. Claude Opus 4.6 is close behind at 66.7. |
-| GPT | [GPT-5.5](https://github.com/OpenHands/openhands-index-results/tree/main/results/GPT-5.5) | `openai/gpt-5.5` | 65.9 | Best GPT-series result in the OpenHands Index. GPT-5.4 is close behind at 64.3. |
-| Gemini | [Gemini 3.1 Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/Gemini-3.1-Pro) | `gemini/gemini-3.1-pro-preview` | 57.0 | Best Gemini-series result in the OpenHands Index. Use Gemini 3 Flash when cost or latency is more important than top accuracy. |
+| Family | Recommended Model | Model String | OpenHands Index Average |
+|--------|-------------------|--------------|-------------------------|
+| Claude | [claude-opus-4-8](https://github.com/OpenHands/openhands-index-results/tree/main/results/claude-opus-4-8) | Not yet listed | 71.9 |
+| GPT | [GPT-5.5](https://github.com/OpenHands/openhands-index-results/tree/main/results/GPT-5.5) | `openai/gpt-5.5` | 65.9 |
+| Gemini | [Gemini-3.1-Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/Gemini-3.1-Pro) | `gemini/gemini-3.1-pro-preview` | 57.0 |
 
 ### Strong Open / Open-Weight Models
 
 These open or open-weight models have good OpenHands Index scores or are recommended for local OpenHands setups:
 
-| Model | Suggested Model String | OpenHands Index Average | Notes |
-|-------|------------------------|-------------------------|-------|
-| [GLM-5.1](https://github.com/OpenHands/openhands-index-results/tree/main/results/GLM-5.1) | `openrouter/z-ai/glm-5.1` | 58.2 | Strongest open-weight result currently listed in the OpenHands Index. |
-| [Kimi-K2.6](https://github.com/OpenHands/openhands-index-results/tree/main/results/Kimi-K2.6) | `openrouter/moonshotai/kimi-k2.6` | 57.1 | Strong open-weight option, especially for coding and information-gathering tasks. |
-| [DeepSeek-V4-Pro](https://github.com/OpenHands/openhands-index-results/tree/main/results/DeepSeek-V4-Pro) | `openrouter/deepseek/deepseek-v4-pro` | 51.3 | Strong coding and test-generation scores; current Index entry covers three benchmarks. |
-| [MiniMax-M2.7](https://github.com/OpenHands/openhands-index-results/tree/main/results/MiniMax-M2.7) | `openrouter/minimax/minimax-m2.7` | 43.4 | Recommended as a lower-cost open-weight option with strong SWE-bench and SWT-bench scores. Also available from MiniMax-compatible OpenAI endpoints as `openai/MiniMax-M2.7`. |
-| [Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) | `openai/Qwen3.6-35B-A3B` for local OpenAI-compatible servers, or `openrouter/qwen/qwen3.6-35b-a3b` through OpenRouter | Not yet listed | Recommended local / self-hosted model for OpenHands. It is open-weight, supports a large context window, and is featured in the [local LLM guide](/openhands/usage/llms/local-llms). |
+| Model | Suggested Model String | OpenHands Index Average |
+|-------|------------------------|-------------------------|
+| [GLM-5.1](https://github.com/OpenHands/openhands-index-results/tree/main/results/GLM-5.1) | `openrouter/z-ai/glm-5.1` | 58.2 |
+| [Kimi-K2.6](https://github.com/OpenHands/openhands-index-results/tree/main/results/Kimi-K2.6) | `openrouter/moonshotai/kimi-k2.6` | 57.1 |
+| [GLM-5](https://github.com/OpenHands/openhands-index-results/tree/main/results/GLM-5) | `openrouter/z-ai/glm-5` | 49.4 |
+| [Kimi-K2.5](https://github.com/OpenHands/openhands-index-results/tree/main/results/Kimi-K2.5) | `openrouter/moonshotai/kimi-k2.5` | 49.2 |
+| [DeepSeek-V3.2-Reasoner](https://github.com/OpenHands/openhands-index-results/tree/main/results/DeepSeek-V3.2-Reasoner) | `openrouter/deepseek/deepseek-v3.2-reasoner` | 45.7 |
 
 Hosted model strings can vary by provider and region. If a model string is not accepted, check the provider console and