There's a gap between ephemeral prompt caching (5min/1h TTL) and fine-tuning. For apps with a large, stable system context (~50-100K tokens) and moderate but irregular traffic, neither option fits well:
- Prompt caching works during peak hours but the cache expires during off-hours, forcing repeated cold writes when traffic resumes
- Fine-tuning is overkill when the context is just static reference data, not behavioral changes
- Dedicated instances ($1K+/month) are way too expensive for small-to-medium apps
Example use case: A customer support agent for a SaaS product. The system context holds ~80K tokens of product docs, pricing rules, refund policies, and edge cases. Traffic is high during business hours (Europe), dead at night. Every morning the cache is cold again — full write cost, higher latency on first requests.
Proposal
A "persistent context slot" — a pre-loaded context that doesn't expire, for a flat monthly fee:
- Upload context once via API (or a dashboard)
- Choose which models to pre-load it on (Haiku, Sonnet, Opus — each separately)
- Send requests with just the user message plus a `context_id` — no need to re-send the context every time
- No TTL, no cache misses, no cold starts
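To make the proposed flow concrete, here is a minimal sketch of what the request shapes could look like. Everything in it is hypothetical: the endpoint semantics, field names (`context_id`, `context`), and the id format are invented for illustration and are not part of any real Anthropic API.

```python
import json

# Step 1 (one-time): upload the large static context and pre-load it
# on a chosen model. Hypothetical payload shape.
create_slot = {
    "model": "claude-sonnet",  # pre-loaded per model, chosen by the user
    "context": "<~80K tokens of product docs, pricing rules, policies>",
}

# The service would respond with a stable identifier (placeholder value):
context_id = "ctx_abc123"

# Step 2 (every request): send only the user message plus the id.
request_body = {
    "model": "claude-sonnet",
    "context_id": context_id,  # stands in for the full 80K-token context
    "messages": [
        {"role": "user", "content": "How do refunds work for annual plans?"}
    ],
}

# The per-request payload stays tiny regardless of context size.
print(len(json.dumps(request_body)), "bytes on the wire")
```

The point of the shape is that the expensive part (the context) crosses the wire exactly once, and every subsequent request is a few hundred bytes.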
Suggested pricing (as a user, this is what I'd be willing to pay)
| Context size | Haiku | Sonnet | Opus |
|---|---|---|---|
| 50K tokens | $5/mo | $13/mo | $20/mo |
| 100K tokens | $10/mo | $25/mo | $40/mo |
| 250K tokens | $25/mo | $60/mo | $95/mo |
Output tokens billed at normal per-request rates. The slot only covers the stored context.
Why this matters
Today, keeping 100K tokens warm on Opus with a 1-hour cache costs ~$36/month in refresh writes alone, plus ~$0.05 per request in cache-read charges. At 1,000 requests/day, that's ~$1,500/month in cache reads. Most small-to-medium apps can't justify that.
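The arithmetic behind the ~$1,500/month figure, using the post's own per-request numbers (the $0.05 cache-read cost and $40 slot fee are taken from above, not from any published price list):

```python
# Monthly cache-read spend under ephemeral caching
cache_read_per_request = 0.05   # $ per request (figure from the post)
requests_per_day = 1000
days_per_month = 30
monthly_reads = cache_read_per_request * requests_per_day * days_per_month
print(f"${monthly_reads:,.0f}/month in cache reads")  # prints $1,500/month

# Versus the proposed flat fee for a 100K-token slot on Opus
slot_fee = 40                   # $/month (from the pricing table above)
print(f"ratio: {monthly_reads / slot_fee:.1f}x")
```

Even ignoring the ~$36/month in cold refresh writes, the flat slot fee undercuts the per-request cache reads by well over an order of magnitude at this traffic level.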
A persistent slot at $10-40/month would unlock a whole category of apps that don't exist today because the economics don't work with ephemeral caching. It also creates predictable MRR (vs volatile pay-per-use) and natural lock-in (context becomes a hosted asset).
Happy to discuss further.