docs: clarify BYOK + Custom Inference request path and data flow #138
hongyi-chen wants to merge 10 commits
The BYOK doc previously said keys are "stored locally" (true) and that Warp "directly routes" requests to the provider (misleading: the Warp Agent harness is server-hosted, so traffic does transit Warp's backend while the key is used in-flight per request).
This commit:
- Replaces "directly route" language with an explicit 3-step data flow.
- Adds a "Why does the request route through Warp's backend?" note explaining the server-side harness.
- Adds a sentence to the ZDR section noting BYOK request bodies are not retained, used for training, or logged for analytics.
- Tightens the diagram alt text and intro paragraph to remove the same "directly" ambiguity.
Co-Authored-By: Oz <oz-agent@warp.dev>
…ok-data-flow # Conflicts: # src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request.
Overview
This PR clarifies the BYOK documentation by explaining that BYOK agent requests transit Warp's backend and that the user-provided provider key is used in flight. The added data-flow description is useful, but one new privacy/retention statement is broader than the existing privacy documentation supports.
Concerns
- The new ZDR section says Warp does not retain the BYOK request body or log it for analytics, but the privacy documentation describes account-level telemetry and plan settings that can affect AI interaction collection. This should be scoped before merge.
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
…ey.mdx Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
Replace specific feature list (Codebase Context, Rules, Secret Redaction, multi-step tool orchestration) with a more general 'Warp's agent harness' reference. Keeps the explanation accurate without enumerating internals that may evolve. Co-Authored-By: Oz <oz-agent@warp.dev>
Issue #11681 reported that the privacy framing on the Custom Inference endpoint page was misleading: requests are server-hosted through warp-server, so traffic does transit Warp's backend even though the endpoint URL and API key are stored locally on the client.
This commit narrows and corrects the privacy claim on the Custom Inference endpoint page, mirroring the BYOK rewrite already in this PR:
- Replace the blanket 'never synced to the cloud' wording for endpoint URLs with a narrower, accurate claim: API keys are never synced or stored on Warp's servers; endpoint URLs and model identifiers may appear in Warp's usage telemetry, but API keys never do.
- Add an explicit 3-step request flow (harness assembles -> in-flight key authenticates the call -> response streams back) so the server-side path is no longer surprising.
- Add a 'Why does the request route through Warp's backend?' callout matching the BYOK page.
- Tighten the ZDR section to note that prompts/responses transit Warp's backend without being used for training, and scope the existing retention bullets to the provider side.
Also align the BYOK headline claim with the same wording ('never synced or stored on Warp's servers') so both pages converge on a single phrasing.
Confirmed against warp-server:
- logic/ai/llm/custom_endpoint/client.go:14-21 - the OpenAI-compatible client is constructed server-side using hostConfig.CustomEndpointAPIKey() and hostConfig.CustomEndpointBaseURL() from the request, not from persistent server config.
- logic/ai/llm/user_api_keys/util.go:7 - keys arrive per-request via Request_Settings_ApiKeys.
Co-Authored-By: Oz <oz-agent@warp.dev>
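The per-request behavior confirmed above can be pictured with a minimal Python sketch. It is illustrative only: warp-server is written in Go, and every name here (RequestSettings, ServerConfig, EndpointClient, build_endpoint_client) is hypothetical rather than taken from the codebase. The sketch assumes a Bearer-style Authorization header, as OpenAI-compatible endpoints commonly use. The point is that the endpoint URL and key come from each request's settings, while the persistent server configuration carries neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSettings:
    """Hypothetical per-request settings sent up by the client."""
    custom_endpoint_base_url: str
    custom_endpoint_api_key: str


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical persistent server config. Note: no key or URL field."""
    request_timeout_seconds: float = 30.0


@dataclass(frozen=True)
class EndpointClient:
    """Stand-in for an OpenAI-compatible client bound to a single request."""
    base_url: str
    headers: dict
    timeout_seconds: float


def build_endpoint_client(settings: RequestSettings, config: ServerConfig) -> EndpointClient:
    # The key and base URL are read from the request and used to build a
    # short-lived client. Nothing is copied into ServerConfig.
    return EndpointClient(
        base_url=settings.custom_endpoint_base_url.rstrip("/"),
        headers={"Authorization": f"Bearer {settings.custom_endpoint_api_key}"},
        timeout_seconds=config.request_timeout_seconds,
    )
```

A client built this way lives only as long as the request that created it, which is the property the commit message relies on when it says the key is not held in persistent server config.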
Per review feedback, simplify the Custom Inference endpoint privacy framing to a single durable claim — API keys are never synced or stored on Warp's servers — without adding a separate caveat about endpoint URL or model identifier telemetry. Co-Authored-By: Oz <oz-agent@warp.dev>
## How BYOK works
- When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
+ When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
Suggested change:
- When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
+ When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never retained on Warp's servers**.
I think people don't understand or care about the sync part. What we want to get across is that your key may pass through our servers but is not stored there.
Warp uses these API keys when routing your agent requests to the model provider you've configured.
When you send a prompt using a model with the **key icon**:
1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
This sounds like it happens on our servers, is that right? I'd assume it happens locally.
If this is true, I'd perhaps reshuffle this to:
- locally, we put the request together incl your API key
- send it to our servers and assemble the whole prompt/etc, make request to model provider
- stream back from provider through our servers to you
2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
3. The provider's response streams back through Warp's backend to your client.
Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
I'd perhaps rather say your API key passes through our servers but it is not stored there.
Per Petra's review feedback: the previous phrasing 'stored locally on your device and never synced or stored on Warp's servers' technically holds but implies too strongly that the API key never leaves the user's machine. The key does transit Warp's backend in-flight per request (see the 3-step flow further down each page). Reframe the headline storage claim on both pages to focus on what the key is for instead of where it isn't: it is stored only on the user's device and used to authenticate requests to the model provider / configured endpoint. The downstream 3-step flow and 'Why does the request route through Warp's backend?' callout remain unchanged and continue to explain the actual transit path. Co-Authored-By: Oz <oz-agent@warp.dev>
Per the remaining review feedback on PR #138:
- Replace 'stored only on your device' headline claim with explicit language that the key passes through Warp's servers but is not stored there, mirroring Petra's preferred phrasing.
- Reshuffle the 3-step flow so step 1 is local (client pulls the key from secure storage and sends it up) and step 2 explicitly states that the agent harness runs on Warp's backend, answering Petra's question about where assembly happens.
- Reword the 'held in memory' sentence to use the same 'passes through but is not stored' framing.
Same changes applied in parallel to the Custom Inference Endpoint page.
Co-Authored-By: Oz <oz-agent@warp.dev>
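The reshuffled 3-step flow described in this commit can be sketched as a toy Python model. This is a sketch under stated assumptions, not Warp's implementation: LocalKeyStore, Backend, send_prompt, and the persisted list are all hypothetical names, the provider call is injected as a plain function, and what the backend records (a message count) is invented purely so the "passes through but is not stored" property has something to be checked against.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical provider call: takes an API key and an assembled request and
# yields response chunks. A real provider SDK call would sit here.
ProviderCall = Callable[[str, dict], Iterator[str]]


class LocalKeyStore:
    """Stand-in for secure storage on the user's device (e.g. an OS keychain)."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}

    def save(self, provider: str, key: str) -> None:
        self._keys[provider] = key

    def load(self, provider: str) -> str:
        return self._keys[provider]


class Backend:
    """Stand-in for the server-side agent harness."""

    def __init__(self, call_provider: ProviderCall) -> None:
        self._call_provider = call_provider
        # Assumption for illustration: whatever the backend persists.
        self.persisted: List[dict] = []

    def handle(self, prompt: str, context: List[str], api_key: str) -> Iterator[str]:
        # Step 2: assemble the request server-side and authenticate the
        # provider call with the key that arrived on this request.
        assembled = {"messages": context + [prompt]}
        # The key stays a local variable and is never added to `persisted`.
        self.persisted.append({"message_count": len(assembled["messages"])})
        # Step 3: stream the provider's chunks back to the caller.
        yield from self._call_provider(api_key, assembled)


def send_prompt(store: LocalKeyStore, backend: Backend, provider: str,
                prompt: str, context: List[str]) -> str:
    # Step 1: the client reads the key from local storage and sends it up
    # together with the prompt.
    api_key = store.load(provider)
    return "".join(backend.handle(prompt, context, api_key))
```

In this model the key exists on the backend only as an argument to handle for the duration of one call, which matches the "passes through Warp's servers but is not stored there" wording the review asked for.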
Summary
Clarifies the request path and data-flow framing for both BYOK and Custom Inference endpoints in our docs, in response to warpdotdev/warp#11681 and the follow-up triage thread.
The previous wording on both pages combined two claims that aren't equivalent:
warp-server, not from the client.
Both the recent r/warpdotdev complaint and issue #11681 (Custom Inference) traced back to this framing.
Per Petra's review, the storage claim on each page now focuses on what the key is for instead of where it isn't: the key is stored only on your device and used to authenticate requests to the model provider / configured endpoint. The 3-step flow and "Why does the request route through Warp's backend?" callout further down each page explain the actual transit path.
Confirmed against warp-server:
- logic/ai/llm/anthropic/util/util.go:1032-1034 - server overrides the Anthropic SDK API key with the user-provided one per request.
- logic/ai/llm/user_api_keys/util.go:7 - keys arrive in the request payload as Request_Settings_ApiKeys.
- logic/ai/llm/llm_role.go:723 - server-side model routing applies BYOK preferences via WithApiKeyConfigApplied.
- logic/ai/llm/custom_endpoint/client.go:14-21 - the OpenAI-compatible client is constructed server-side with option.WithAPIKey(hostConfig.CustomEndpointAPIKey()) and option.WithBaseURL(...CustomEndpointBaseURL()); both come in on the request.
Changes
src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
- How BYOK works section: replaced the single "directly route your agent requests" line with an explicit 3-step data flow (harness assembles request → key authenticates the call in-flight → response streams back), and clarified that keys live in-memory only for the duration of each request.
- "Why does the request route through Warp's backend?" note explaining the server-side harness (same runtime as Agent Mode with Warp-billed models).
- #how-does-byok-work but the heading on main is now How BYOK works (slug #how-byok-works).
src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
- Key features: rewrote the "Local configuration" bullet: endpoint API key is stored only on the device and used to authenticate requests to your endpoint.
- How it works: replaced the blanket "never synced to Warp's servers" wording with an explicit 3-step request flow mirroring the BYOK rewrite.
- "Why does the request route through Warp's backend?" callout matching the BYOK page, explicitly cross-linking to BYOK so readers see the consistent posture.
Additional context
python3 .agents/skills/check_for_broken_links/check_links.py --internal-only → 2702 internal links checked, 0 broken).
Conversation: https://staging.warp.dev/conversation/c3a085dc-4658-47c2-9908-e7f56672872f
Run: https://oz.staging.warp.dev/runs/019e665f-0f94-780b-9764-04bdbd28a24b
Plans:
This PR was generated with Oz.