
docs: clarify BYOK + Custom Inference request path and data flow #138

Open
hongyi-chen wants to merge 10 commits into main from hongyichen/clarify-byok-data-flow


@hongyi-chen (Collaborator) commented May 26, 2026

Summary

Clarifies the request path and data-flow framing for both BYOK and Custom Inference endpoints in our docs, in response to warpdotdev/warp#11681 and the follow-up triage thread.

The previous wording on both pages combined two claims that aren't equivalent:

  1. Storage: API keys are stored locally (true and unchanged).
  2. Transit: requests are routed "directly" to the model provider. That part was misleading: the Warp Agent harness is server-hosted, so requests do transit Warp's backend, and the key is passed in flight with each request and used by warp-server, not the client, to authenticate the call.

Both the recent r/warpdotdev complaint and issue #11681 (Custom Inference) traced back to this framing.

Per Petra's review, the storage claim on each page now focuses on what the key is for instead of where it isn't: the key is stored only on your device and used to authenticate requests to the model provider / configured endpoint. The 3-step flow and "Why does the request route through Warp's backend?" callout further down each page explain the actual transit path.

Confirmed against warp-server:

  • logic/ai/llm/anthropic/util/util.go:1032-1034 — server overrides the Anthropic SDK API key with the user-provided one per request.
  • logic/ai/llm/user_api_keys/util.go:7 — keys arrive in the request payload as Request_Settings_ApiKeys.
  • logic/ai/llm/llm_role.go:723 — server-side model routing applies BYOK preferences via WithApiKeyConfigApplied.
  • logic/ai/llm/custom_endpoint/client.go:14-21 — the OpenAI-compatible client is constructed server-side with option.WithAPIKey(hostConfig.CustomEndpointAPIKey()) and option.WithBaseURL(...CustomEndpointBaseURL()); both come in on the request.
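The custom_endpoint reference amounts to building the outbound client from per-request values rather than from persistent server config. A rough stdlib-only sketch of that pattern follows, assuming an OpenAI-compatible `POST {base}/chat/completions` route and bearer-token auth. `RequestSettings` and `newProviderRequest` are hypothetical stand-ins for `hostConfig.CustomEndpointBaseURL()` / `CustomEndpointAPIKey()` and the SDK's `option.WithBaseURL` / `option.WithAPIKey`; this is not the actual client.go code.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// RequestSettings is a hypothetical stand-in for the endpoint settings that
// arrive with each incoming request. Field names are illustrative only.
type RequestSettings struct {
	CustomEndpointBaseURL string
	CustomEndpointAPIKey  string
}

// newProviderRequest builds the outbound call to the user's endpoint from
// per-request values only. No package-level cache or store is written, so the
// key exists only in this call's locals and the returned *http.Request.
func newProviderRequest(s RequestSettings, body string) (*http.Request, error) {
	// Assumption: OpenAI-compatible servers expose POST {base}/chat/completions.
	url := strings.TrimRight(s.CustomEndpointBaseURL, "/") + "/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Bearer auth, as OpenAI-compatible APIs conventionally expect.
	req.Header.Set("Authorization", "Bearer "+s.CustomEndpointAPIKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newProviderRequest(RequestSettings{
		CustomEndpointBaseURL: "https://llm.example.internal/v1/",
		CustomEndpointAPIKey:  "sk-example",
	}, `{"model":"my-model","messages":[]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// prints "POST https://llm.example.internal/v1/chat/completions"
}
```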

Changes

src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx

  • Intro paragraph: dropped "access models directly" wording.
  • How BYOK works section: replaced the single "directly route your agent requests" line with an explicit 3-step data flow (harness assembles request → key authenticates the call in-flight → response streams back), and clarified that keys are held in memory only for the duration of each request.
  • Headline storage claim now uses Petra's framing: keys are stored only on your device and used to authenticate requests to your chosen model provider — without an absolute "never leaves your machine" implication.
  • Added a "Why does the request route through Warp's backend?" note explaining the server-side harness (same runtime as Agent Mode with Warp-billed models).
  • ZDR section: added a sentence noting BYOK request bodies transit Warp's backend but are not retained, used for training, or logged for analytics, the same posture as Warp-billed traffic. Scoped the existing "data retention policies depend on..." bullet to make explicit that it's about the provider side.
  • Tightened the diagram alt text from "directly through your provider API key" → "authenticates BYOK agent requests with your provider API key".
  • Fixed a stale anchor: the ZDR section linked to #how-does-byok-work, but the heading on main is now "How BYOK works" (slug #how-byok-works).

src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx

  • Key features: rewrote the "Local configuration" bullet — endpoint API key is stored only on the device and used to authenticate requests to your endpoint.
  • How it works: replaced the blanket "never synced to Warp's servers" wording with an explicit 3-step request flow mirroring the BYOK rewrite.
  • Added a "Why does the request route through Warp's backend?" callout matching the BYOK page, with an explicit cross-link to BYOK so readers see the consistent posture.
  • ZDR section: added the same "request bodies transit Warp's backend but aren't used for training" framing as BYOK, and scoped the existing retention bullets to the provider side.

Additional context

  • See the implementation plan for the research summary and the breakdown that informed this update.
  • Conversation thread.
  • No structural changes (no new top-level sections, no terminology drift), no sidebar config edits, no redirects required.
  • Internal link checker: 0 broken links (python3 .agents/skills/check_for_broken_links/check_links.py --internal-only → 2702 internal links checked, 0 broken).

Conversation: https://staging.warp.dev/conversation/c3a085dc-4658-47c2-9908-e7f56672872f
Run: https://oz.staging.warp.dev/runs/019e665f-0f94-780b-9764-04bdbd28a24b
Plans:

This PR was generated with Oz.

hongyi-chen and others added 2 commits May 26, 2026 11:05
The BYOK doc previously said keys are "stored locally" (true) and that
Warp "directly routes" requests to the provider (misleading — the Warp
Agent harness is server-hosted, so traffic does transit Warp's backend
while the key is used in-flight per request).

This commit:
- Replaces "directly route" language with an explicit 3-step data flow.
- Adds a "Why does the request route through Warp's backend?" note
  explaining the server-side harness.
- Adds a sentence to the ZDR section noting BYOK request bodies are not
  retained, used for training, or logged for analytics.
- Tightens the diagram alt text and intro paragraph to remove the same
  "directly" ambiguity.

Co-Authored-By: Oz <oz-agent@warp.dev>
…ok-data-flow

# Conflicts:
#	src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@cla-bot cla-bot Bot added the cla-signed label May 26, 2026
vercel Bot commented May 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| :--- | :--- | :--- | :--- |
| docs | Ready | Preview, Comment | May 27, 2026 6:38pm |



oz-for-oss Bot commented May 26, 2026

@hongyi-chen

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@oz-for-oss oz-for-oss Bot left a comment


Overview

This PR clarifies the BYOK documentation by explaining that BYOK agent requests transit Warp's backend and that the user-provided provider key is used in flight. The added data-flow description is useful, but one new privacy/retention statement is broader than the existing privacy documentation supports.

Concerns

  • The new ZDR section says Warp does not retain the BYOK request body or log it for analytics, but the privacy documentation describes account-level telemetry and plan settings that can affect AI interaction collection. This should be scoped before merge.

Verdict

Found: 0 critical, 1 important, 0 suggestions

Request changes


Comment thread src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx Outdated
…ey.mdx

Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
Replace specific feature list (Codebase Context, Rules, Secret Redaction,
multi-step tool orchestration) with a more general 'Warp's agent harness'
reference. Keeps the explanation accurate without enumerating internals
that may evolve.

Co-Authored-By: Oz <oz-agent@warp.dev>
Issue #11681 reported that the privacy framing on the Custom Inference
endpoint page was misleading: the agent harness is hosted in warp-server,
so traffic does transit Warp's backend even though the endpoint URL and
API key are stored locally on the client.

This commit narrows and corrects the privacy claim on the Custom
Inference endpoint page, mirroring the BYOK rewrite already in this PR:

- Replace the blanket 'never synced to the cloud' wording for endpoint
  URLs with a narrower, accurate claim: API keys are never synced or
  stored on Warp's servers; endpoint URLs and model identifiers may
  appear in Warp's usage telemetry, but API keys never do.
- Add an explicit 3-step request flow (harness assembles -> in-flight
  key authenticates the call -> response streams back) so the
  server-side path is no longer surprising.
- Add a 'Why does the request route through Warp's backend?' callout
  matching the BYOK page.
- Tighten the ZDR section to note that prompts/responses transit Warp's
  backend without being used for training, and scope the existing
  retention bullets to the provider side.

Also align the BYOK headline claim with the same wording ('never synced
or stored on Warp's servers') so both pages converge on a single
phrasing.

Confirmed against warp-server:
- logic/ai/llm/custom_endpoint/client.go:14-21 - the OpenAI-compatible
  client is constructed server-side using
  hostConfig.CustomEndpointAPIKey() and hostConfig.CustomEndpointBaseURL()
  from the request, not from persistent server config.
- logic/ai/llm/user_api_keys/util.go:7 - keys arrive per-request via
  Request_Settings_ApiKeys.

Co-Authored-By: Oz <oz-agent@warp.dev>
@hongyi-chen hongyi-chen changed the title docs: clarify BYOK request path and data flow docs: clarify BYOK + Custom Inference request path and data flow May 26, 2026
Per review feedback, simplify the Custom Inference endpoint privacy
framing to a single durable claim — API keys are never synced or stored
on Warp's servers — without adding a separate caveat about endpoint
URL or model identifier telemetry.

Co-Authored-By: Oz <oz-agent@warp.dev>
## How BYOK works

When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
Collaborator

Suggested change
When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never retained on Warp's servers**.

I think people don't understand or care about the sync part. What we want to get across is that your key may cross paths with our servers but is not stored there.

Warp uses these API keys when routing your agent requests to the model provider you've configured.
When you send a prompt using a model with the **key icon**:

1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
@petradonka (Collaborator) commented May 27, 2026

This sounds like it happens on our servers, is that right? I'd assume it happens locally.

If this is true, I'd perhaps reshuffle this to:

  1. locally, we put the request together, including your API key
  2. send it to our servers, assemble the whole prompt, etc., and make the request to the model provider
  3. stream back from the provider through our servers to you

2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
3. The provider's response streams back through Warp's backend to your client.

Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
Collaborator

I'd rather say that your API key passes through our servers but is not stored there.

Per Petra's review feedback: the previous phrasing 'stored locally on
your device and never synced or stored on Warp's servers' technically
holds but implies too strongly that the API key never leaves the
user's machine. The key does transit Warp's backend in-flight per
request (see the 3-step flow further down each page).

Reframe the headline storage claim on both pages to focus on what the
key is for instead of where it isn't: it is stored only on the user's
device and used to authenticate requests to the model provider /
configured endpoint. The downstream 3-step flow and 'Why does the
request route through Warp's backend?' callout remain unchanged and
continue to explain the actual transit path.

Co-Authored-By: Oz <oz-agent@warp.dev>
Per the remaining review feedback on PR #138:

- Replace 'stored only on your device' headline claim with explicit
  language that the key passes through Warp's servers but is not
  stored there, mirroring Petra's preferred phrasing.
- Reshuffle the 3-step flow so step 1 is local (client pulls the
  key from secure storage and sends it up) and step 2 explicitly
  states that the agent harness runs on Warp's backend, answering
  Petra's question about where assembly happens.
- Reword the 'held in memory' sentence to use the same
  'passes through but is not stored' framing.

Same changes applied in parallel to the Custom Inference Endpoint page.

Co-Authored-By: Oz <oz-agent@warp.dev>

3 participants