[BOT ISSUE] Groq: stale token limits on `llama-3.1-8b-instant` (max_output 8K → 131K) and `openai/gpt-oss-20b` (max_output 32K → 65K) #751

## Gap

Two Groq models in `packages/proxy/schema/model_list.json` have stale token limits that understate their current capabilities per official Groq documentation: `max_output_tokens` on both entries, and `max_input_tokens` on `llama-3.1-8b-instant`.

## Current vs correct

| Entry | Field | Current | Correct | Source |
|---|---|---|---|---|
| `llama-3.1-8b-instant` (line 5336) | `max_output_tokens` | 8,192 | **131,072** | [Groq models page](https://console.groq.com/docs/models) |
| `llama-3.1-8b-instant` (line 5336) | `max_input_tokens` | 128,000 | **131,072** | [Groq models page](https://console.groq.com/docs/models) |
| `openai/gpt-oss-20b` (line ~3838) | `max_output_tokens` | 32,768 | **65,536** | [Groq models page](https://console.groq.com/docs/models) |

The `llama-3.1-8b-instant` gap is especially significant: the documented max output (131,072 tokens) is 16x the catalog value (8,192 tokens).
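
The figures in the table can be sanity-checked with a few lines of arithmetic. This is only a sketch that restates the numbers above; the constant names are illustrative and do not come from the repository.

```python
# Values copied from the "Current vs correct" table (names are illustrative).
OLD_LLAMA_MAX_OUTPUT = 8_192
NEW_LLAMA_MAX_OUTPUT = 131_072
OLD_LLAMA_MAX_INPUT = 128_000
NEW_LLAMA_MAX_INPUT = 131_072
OLD_GPT_OSS_MAX_OUTPUT = 32_768
NEW_GPT_OSS_MAX_OUTPUT = 65_536

# How much larger each documented output limit is than the catalog value.
llama_output_ratio = NEW_LLAMA_MAX_OUTPUT // OLD_LLAMA_MAX_OUTPUT
gpt_oss_output_ratio = NEW_GPT_OSS_MAX_OUTPUT // OLD_GPT_OSS_MAX_OUTPUT

# The stale input limit is a decimal "128K" (128 * 1000); the documented
# limit is the binary one (128 * 1024), so the catalog is short by 3,072 tokens.
llama_input_delta = NEW_LLAMA_MAX_INPUT - OLD_LLAMA_MAX_INPUT

print(f"llama-3.1-8b-instant max output ratio: {llama_output_ratio}x")
print(f"openai/gpt-oss-20b max output ratio: {gpt_oss_output_ratio}x")
print(f"llama-3.1-8b-instant max input delta: {llama_input_delta} tokens")
```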

## Suggested changes

Update `llama-3.1-8b-instant`:

```json
"llama-3.1-8b-instant": {
  "format": "openai",
  "flavor": "chat",
  "input_cost_per_mil_tokens": 0.05,
  "output_cost_per_mil_tokens": 0.08,
  "displayName": "Llama 3.1 8B Instant 128k",
  "max_input_tokens": 131072,
  "max_output_tokens": 131072,
  "available_providers": [
    "groq"
  ]
}
```

Update `openai/gpt-oss-20b`:

```json
"openai/gpt-oss-20b": {
  "format": "openai",
  "flavor": "chat",
  "input_cost_per_mil_tokens": 0.075,
  "output_cost_per_mil_tokens": 0.3,
  "displayName": "GPT-OSS 20B",
  "max_input_tokens": 131072,
  "max_output_tokens": 65536,
  "available_providers": [
    "groq"
  ]
}
```
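
For anyone who prefers scripting the change over hand-editing, the sketch below applies the same corrections programmatically. It assumes `model_list.json` is a single top-level JSON object keyed by model ID, as the snippets above suggest; `CORRECTIONS`, `apply_corrections`, and `patch_file` are hypothetical names, not part of the repository.

```python
import json
from pathlib import Path

# Corrected limits from this issue, keyed by model ID (hypothetical helper data).
CORRECTIONS = {
    "llama-3.1-8b-instant": {
        "max_input_tokens": 131072,
        "max_output_tokens": 131072,
    },
    "openai/gpt-oss-20b": {
        "max_output_tokens": 65536,
    },
}


def apply_corrections(catalog: dict, corrections: dict) -> list:
    """Update `catalog` in place and return a description of each changed field.

    Raises KeyError if a model ID is missing, so a renamed entry is not
    skipped silently.
    """
    changed = []
    for model_id, fields in corrections.items():
        entry = catalog[model_id]
        for field, new_value in fields.items():
            old_value = entry.get(field)
            if old_value != new_value:
                entry[field] = new_value
                changed.append(f"{model_id}.{field}: {old_value} -> {new_value}")
    return changed


def patch_file(path: Path) -> list:
    """Load the catalog, apply CORRECTIONS, and rewrite the file only if needed."""
    catalog = json.loads(path.read_text(encoding="utf-8"))
    changed = apply_corrections(catalog, CORRECTIONS)
    if changed:
        # Assumption: two-space indentation; the repo's formatter may differ.
        text = json.dumps(catalog, indent=2, ensure_ascii=False) + "\n"
        path.write_text(text, encoding="utf-8")
    return changed
```

Calling `patch_file(Path("packages/proxy/schema/model_list.json"))` from the repository root would return one line per changed field. Pricing and all other fields are left untouched, and re-serializing may alter whitespace elsewhere in the file, so the resulting diff is worth reviewing before committing.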

## Verification checklist

- [x] **Cross-source**: Token limits confirmed on the Groq documentation page, which serves as both model listing and pricing reference:
  1. [Groq models page](https://console.groq.com/docs/models): the production models table lists context window and max completion tokens for each model
  2. The same page includes pricing ($0.05/$0.08 for llama-3.1-8b-instant, $0.075/$0.30 for gpt-oss-20b); pricing matches the catalog, confirming correct model identification
- [x] **Recent commits**: No recent commit corrects these values
- [x] **ID format**: Existing entries, no ID change needed

## Verification notes

| Field | Source | Notes |
|---|---|---|
| `llama-3.1-8b-instant` context (131,072) | Groq models page | Listed under "CONTEXT WINDOW (TOKENS)" |
| `llama-3.1-8b-instant` max completion (131,072) | Groq models page | Listed under "MAX COMPLETION TOKENS" |
| `openai/gpt-oss-20b` context (131,072) | Groq models page | Already correct in catalog |
| `openai/gpt-oss-20b` max completion (65,536) | Groq models page | Listed under "MAX COMPLETION TOKENS" |
| Pricing (both models) | Groq models page | Already correct in catalog |

## Local files inspected

- `packages/proxy/schema/model_list.json`:
  - `llama-3.1-8b-instant` (line 5336): max_input_tokens: 128000, max_output_tokens: 8192 (both stale)
  - `openai/gpt-oss-20b` (line ~3838): max_output_tokens: 32768 (stale)
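
Because the line numbers above drift as the catalog changes (one of them is already approximate), it may help to locate entries by key instead. The sketch below is a minimal, hypothetical helper that assumes each model entry starts on its own line as `"<model id>": {`, as in the snippets in this issue; `find_key_line` is not an existing utility in the repository.

```python
from typing import Optional


def find_key_line(text: str, model_id: str) -> Optional[int]:
    """Return the 1-based line number where `"model_id":` starts a line, or None.

    Only the first match is reported, so a key that also appears nested
    elsewhere in the file would need a stricter check.
    """
    needle = f'"{model_id}":'
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith(needle):
            return lineno
    return None


# Tiny stand-in for the real file, used only to illustrate the call.
sample = """{
  "openai/gpt-oss-20b": {
    "max_output_tokens": 32768
  },
  "llama-3.1-8b-instant": {
    "max_output_tokens": 8192
  }
}"""

print(find_key_line(sample, "openai/gpt-oss-20b"))
print(find_key_line(sample, "llama-3.1-8b-instant"))
```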

## Source URLs

- https://console.groq.com/docs/models


```json
{
  "kind": "cost_update",
  "provider": "groq",
  "models": ["llama-3.1-8b-instant", "openai/gpt-oss-20b"],
  "status": "active",
  "model_specs": {
    "llama-3.1-8b-instant": {
      "format": "openai",
      "flavor": "chat",
      "input_cost_per_mil_tokens": 0.05,
      "output_cost_per_mil_tokens": 0.08,
      "displayName": "Llama 3.1 8B Instant 128k",
      "max_input_tokens": 131072,
      "max_output_tokens": 131072,
      "available_providers": ["groq"]
    },
    "openai/gpt-oss-20b": {
      "format": "openai",
      "flavor": "chat",
      "input_cost_per_mil_tokens": 0.075,
      "output_cost_per_mil_tokens": 0.3,
      "displayName": "GPT-OSS 20B",
      "max_input_tokens": 131072,
      "max_output_tokens": 65536,
      "available_providers": ["groq"]
    }
  },
  "source_urls": [
    "https://console.groq.com/docs/models"
  ]
}
```
