Skip to content

Feature Request: Persistent context slot (flat monthly fee) #1212

@will83

Description

@will83

There's a gap between ephemeral prompt caching (5min/1h TTL) and fine-tuning. For apps with a large, stable system context (~50-100K tokens) and moderate but irregular traffic, neither option fits well:

  • Prompt caching works during peak hours but the cache expires during off-hours, forcing repeated cold writes when traffic resumes
  • Fine-tuning is overkill when the context is just static reference data, not behavioral changes
  • Dedicated instances ($1K+/month) are way too expensive for small-to-medium apps

Example use case: A customer support agent for a SaaS product. The system context holds ~80K tokens of product docs, pricing rules, refund policies, and edge cases. Traffic is high during business hours (Europe), dead at night. Every morning the cache is cold again — full write cost, higher latency on first requests.

Proposal

A "persistent context slot" — a pre-loaded context that doesn't expire, for a flat monthly fee:

  • Upload context once via API (or a dashboard)
  • Choose which models to pre-load it on (Haiku, Sonnet, Opus — each separately)
  • Send requests with just the user message + a context_id — no need to re-send the context every time
  • No TTL, no cache misses, no cold starts

Suggested pricing (as a user, this is what I'd be willing to pay)

Context size Haiku Sonnet Opus
50K tokens $5/mo $13/mo $20/mo
100K tokens $10/mo $25/mo $40/mo
250K tokens $25/mo $60/mo $95/mo

Output tokens billed at normal per-request rates. The slot only covers the stored context.

Why this matters

Today, keeping 100K tokens warm on Opus with 1h cache costs ~$36/month in refreshes alone — plus $0.05 per request in cache hits. At 1000 req/day, that's ~$1,500/month just in cache hits. Most small-to-medium apps can't justify that.

A persistent slot at $10-40/month would unlock a whole category of apps that don't exist today because the economics don't work with ephemeral caching. It also creates predictable MRR (vs volatile pay-per-use) and natural lock-in (context becomes a hosted asset).

Happy to discuss further.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions