From ac6e6fc880d197c351a3f4e3b8556bf1839dd533 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 26 May 2026 11:05:21 -0700
Subject: [PATCH 1/8] docs: clarify BYOK request path and data flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The BYOK doc previously said keys are "stored locally" (true) and that
Warp uses them to "directly route" requests to the provider (misleading:
the Warp Agent harness is server-hosted, so traffic does transit Warp's
backend while the key is used in-flight per request).

This commit:
- Replaces "directly route" language with an explicit 3-step data flow.
- Adds a "Why does the request route through Warp's backend?" note
  explaining the server-side harness.
- Adds a sentence to the ZDR section noting BYOK request bodies are not
  retained, used for training, or logged for analytics.
- Tightens the diagram alt text and intro paragraph to remove the same
  "directly" ambiguity.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../bring-your-own-api-key.mdx                | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
index 69aa1dfe..b21f951b 100644
--- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
+++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
@@ -7,7 +7,7 @@ description: >-
 
 Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts.
 
-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
+This lets you use your own API keys for model access, giving you control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
 
 BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
 
@@ -21,9 +21,19 @@ BYOK and customer-supplied inference (BYOLLM via Amazon Bedrock or Google Vertex
 
 ## How does BYOK work?
 
-When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
+When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never persisted on Warp's servers**.
 
-Warp uses these API keys to directly route your agent requests to the model provider you've configured.
+When you send a prompt using a model with the **key icon**:
+
+1. The Warp Agent runtime on Warp's backend assembles the request, including your prompt, [Codebase Context](/agent-platform/capabilities/codebase-context/), [Rules](/agent-platform/capabilities/rules/), [Secret Redaction](/support-and-community/privacy-and-security/secret-redaction/), and tool definitions.
+2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
+3. The provider's response streams back through Warp's backend to your client.
+
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+
+:::note
+**Why does the request route through Warp's backend?** The Warp Agent runtime — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models — runs server-side so it can apply Codebase Context, Rules, Secret Redaction, and multi-step tool orchestration consistently. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+:::
 
 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -35,7 +45,7 @@ When a model is selected using your own key:
 * Costs are billed directly through your model provider account.
 * Warp does not retain or store your API key on any of its servers.
 
-![Diagram showing how Warp routes BYOK agent requests directly through your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
+![Diagram showing how Warp authenticates BYOK agent requests with your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
 
 ## Enabling BYOK
 
@@ -107,9 +117,11 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.
 
+BYOK prompts and responses transit Warp's backend (see [How does BYOK work?](#how-does-byok-work)), but Warp does not retain the request body, does not use it for training, and does not log it for analytics — the same posture that applies to Warp-billed traffic.
+
 However, when you use your own API key:
 
-* Data retention policies depend on your provider’s account settings.
+* Data retention policies on the **provider side** depend on your provider’s account settings.
 * Warp cannot enforce ZDR for requests sent through your API keys.
 * If your Anthropic, OpenAI, or Google account does not have ZDR enabled, your requests may be retained by the provider according to their terms.
 

From 7e785b2d2bd66d148fdf278fd1288aa334d0c2db Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyi@warp.dev>
Date: Tue, 26 May 2026 14:16:39 -0400
Subject: [PATCH 2/8] Update
 src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx

Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index a81be26d..c42b1e67 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -127,7 +127,7 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.
 
-BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)), but Warp does not retain the request body, does not use it for training, and does not log it for analytics — the same posture that applies to Warp-billed traffic.
+BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
 
 However, when you use your own API key:
 

From 27cbfe240c5bfbc15deb19f859f3fafb504ddade Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 26 May 2026 11:19:28 -0700
Subject: [PATCH 3/8] docs: generalize harness explanation in BYOK doc

Replace specific feature list (Codebase Context, Rules, Secret Redaction,
multi-step tool orchestration) with a more general 'Warp's agent harness'
reference. Keeps the explanation accurate without enumerating internals
that may evolve.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index c42b1e67..29e701ea 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -35,14 +35,14 @@ When you add your own model API keys in Warp, those keys are stored **locally on
 
 When you send a prompt using a model with the **key icon**:
 
-1. The Warp Agent runtime on Warp's backend assembles the request, including your prompt, [Codebase Context](/agent-platform/capabilities/codebase-context/), [Rules](/agent-platform/capabilities/rules/), [Secret Redaction](/support-and-community/privacy-and-security/secret-redaction/), and tool definitions.
+1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
 2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
 3. The provider's response streams back through Warp's backend to your client.
 
 Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
 
 :::note
-**Why does the request route through Warp's backend?** The Warp Agent runtime — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models — runs server-side so it can apply Codebase Context, Rules, Secret Redaction, and multi-step tool orchestration consistently. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
 :::
 
 :::caution

From 793d4f5e145a6227d39b60bbcaf5278535f83cb4 Mon Sep 17 00:00:00 2001
From: Oz <oz-agent@warp.dev>
Date: Tue, 26 May 2026 23:01:53 +0000
Subject: [PATCH 4/8] docs: clarify Custom Inference endpoint request path and
 data flow

Issue #11681 reported that the privacy framing on the Custom Inference
endpoint page was misleading: the agent harness is server-hosted in
warp-server, so traffic does transit Warp's backend even though the
endpoint URL and API key are stored locally on the client.

This commit narrows and corrects the privacy claim on the Custom
Inference endpoint page, mirroring the BYOK rewrite already in this PR:

- Replace the blanket 'never synced to the cloud' wording for endpoint
  URLs with a narrower, accurate claim: API keys are never synced or
  stored on Warp's servers; endpoint URLs and model identifiers may
  appear in Warp's usage telemetry, but API keys never do.
- Add an explicit 3-step request flow (harness assembles -> in-flight
  key authenticates the call -> response streams back) so the
  server-side path is no longer surprising.
- Add a 'Why does the request route through Warp's backend?' callout
  matching the BYOK page.
- Tighten the ZDR section to note that prompts/responses transit Warp's
  backend without being used for training, and scope the existing
  retention bullets to the provider side.

Also align the BYOK headline claim with the same wording ('never synced
or stored on Warp's servers') so both pages converge on a single
phrasing.

Confirmed against warp-server:
- logic/ai/llm/custom_endpoint/client.go:14-21 - the OpenAI-compatible
  client is constructed server-side using
  hostConfig.CustomEndpointAPIKey() and hostConfig.CustomEndpointBaseURL()
  from the request, not from persistent server config.
- logic/ai/llm/user_api_keys/util.go:7 - keys arrive per-request via
  Request_Settings_ApiKeys.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../inference/bring-your-own-api-key.mdx      |  2 +-
 .../inference/custom-inference-endpoint.mdx   | 24 +++++++++++++++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 29e701ea..68e679df 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never persisted on Warp's servers**.
+When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
 
 When you send a prompt using a model with the **key icon**:
 
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 604a004c..6eb5c9a8 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**. Endpoint URLs and model identifiers may appear in Warp's usage telemetry, but the API key itself never does.
 
 ## How it works
 
@@ -29,7 +29,19 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **locally on your device**. The API key itself is **never synced or stored on Warp's servers**.
+
+When you send a prompt using an endpoint-routed model:
+
+1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
+2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
+3. Your endpoint's response streams back through Warp's backend to your client.
+
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database. Endpoint URLs and model identifiers may be recorded in Warp's usage telemetry; API keys are never included.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
+:::
 
 :::caution
 Custom inference endpoints don't apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -39,7 +51,7 @@ When a model routed through your endpoint is selected:
 
 * Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
 * Costs are billed directly by your endpoint provider.
-* Warp doesn't retain or store your endpoint credentials on any of its servers.
+* Warp doesn't retain or store your API key on any of its servers.
 
 ## Enabling a custom inference endpoint
 
@@ -78,13 +90,15 @@ Some AI-powered features (Codebase Context, Active AI recommendations, cloud age
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
 
+Custom inference endpoint prompts and responses transit Warp's backend (see [How it works](#how-it-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 When you use a custom inference endpoint:
 
-* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Data retention on the **provider side** is determined by your endpoint provider and any upstream model providers they route to.
 * Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
 * If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
 
-Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
+Warp itself never stores your endpoint API key. Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
 
 ## Centrally managed configuration
 

From 86111bd48425077621bbeaad649ebf48da63473d Mon Sep 17 00:00:00 2001
From: Oz <oz-agent@warp.dev>
Date: Tue, 26 May 2026 23:05:46 +0000
Subject: [PATCH 5/8] docs: drop telemetry caveat from Custom Inference doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per review feedback, simplify the Custom Inference endpoint privacy
framing to a single durable claim — API keys are never synced or stored
on Warp's servers — without adding a separate caveat about endpoint
URL or model identifier telemetry.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../agent-platform/inference/custom-inference-endpoint.mdx    | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 6eb5c9a8..66c8ceed 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**. Endpoint URLs and model identifiers may appear in Warp's usage telemetry, but the API key itself never does.
+* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**.
 
 ## How it works
 
@@ -37,7 +37,7 @@ When you send a prompt using an endpoint-routed model:
 2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
 3. Your endpoint's response streams back through Warp's backend to your client.
 
-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database. Endpoint URLs and model identifiers may be recorded in Warp's usage telemetry; API keys are never included.
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
 
 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.

From 868637b77402ea047c47fbb7dbf46109ac49f679 Mon Sep 17 00:00:00 2001
From: Oz <oz-agent@warp.dev>
Date: Wed, 27 May 2026 16:00:43 +0000
Subject: [PATCH 6/8] docs: soften BYOK / Custom Inference key-storage claim
 per review

Per Petra's review feedback: the previous phrasing 'stored locally on
your device and never synced or stored on Warp's servers' technically
holds but implies too strongly that the API key never leaves the
user's machine. The key does transit Warp's backend in-flight per
request (see the 3-step flow further down each page).

Reframe the headline storage claim on both pages to focus on what the
key is for instead of where it isn't: it is stored only on the user's
device and used to authenticate requests to the model provider /
configured endpoint. The downstream 3-step flow and 'Why does the
request route through Warp's backend?' callout remain unchanged and
continue to explain the actual transit path.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx  | 2 +-
 .../agent-platform/inference/custom-inference-endpoint.mdx    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 68e679df..583f4e9b 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage) and are used to authenticate requests to your chosen model provider.
 
 When you send a prompt using a model with the **key icon**:
 
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 66c8ceed..7ea07563 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage) and is used to authenticate requests to your endpoint.
 
 ## How it works
 
@@ -29,7 +29,7 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **locally on your device**. The API key itself is **never synced or stored on Warp's servers**.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, and the API key is used to authenticate requests to your endpoint.
 
 When you send a prompt using an endpoint-routed model:
 

From 1aa9e115f17a1c529d1a6cc3b9c55def04fed653 Mon Sep 17 00:00:00 2001
From: Oz <oz-agent@warp.dev>
Date: Wed, 27 May 2026 18:35:27 +0000
Subject: [PATCH 7/8] docs: adopt Petra's plain-language framing for BYOK key
 handling

Per the remaining review feedback on PR #138:

- Replace 'stored only on your device' headline claim with explicit
  language that the key passes through Warp's servers but is not
  stored there, mirroring Petra's preferred phrasing.
- Reshuffle the 3-step flow so step 1 is local (client pulls the
  key from secure storage and sends it up) and step 2 explicitly
  states that the agent harness runs on Warp's backend, answering
  Petra's question about where assembly happens.
- Reword the 'held in memory' sentence to use the same
  'passes through but is not stored' framing.

Same changes applied in parallel to the Custom Inference Endpoint page.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../inference/bring-your-own-api-key.mdx               |  8 ++++----
 .../inference/custom-inference-endpoint.mdx            | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 583f4e9b..025c566c 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,15 +31,15 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage) and are used to authenticate requests to your chosen model provider.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage). Your key passes through Warp's servers in-flight to reach the model provider, but Warp never stores it there.
 
 When you send a prompt using a model with the **key icon**:
 
-1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
-2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
+1. Your local Warp client pulls your API key from your device's secure storage and sends it up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your chosen model provider (Anthropic, OpenAI, or Google).
 3. The provider's response streams back through Warp's backend to your client.
 
-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call the provider, then discarded.
 
 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 22265282..596a5659 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage) and is used to authenticate requests to your endpoint.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage). It passes through Warp's servers in-flight to reach your endpoint, but Warp never stores it there.
 
 ## How it works
 
@@ -29,15 +29,15 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, and the API key is used to authenticate requests to your endpoint.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**. Your API key passes through Warp's servers in-flight when you send a request, but Warp never stores it there.
 
 When you send a prompt using an endpoint-routed model:
 
-1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
-2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
+1. Your local Warp client pulls your endpoint URL and API key from your device's secure storage and sends them up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your configured endpoint.
 3. Your endpoint's response streams back through Warp's backend to your client.
 
-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call your endpoint, then discarded.
 
 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.

From ecc08ffce6324cb31d3390b402c1bfe3b39dd4c3 Mon Sep 17 00:00:00 2001
From: Oz <oz-agent@warp.dev>
Date: Wed, 27 May 2026 22:23:00 +0000
Subject: [PATCH 8/8] docs: align BYOK / Custom Inference headline copy with
 in-app wording

The in-app description for the Custom inference settings panel landed
on a simpler framing (warpdotdev/warp#11780):

    API keys are stored only on your device, never on Warp's servers.
    They're used to make requests to your chosen model provider.

Mirror that framing as the headline statement in the docs so the docs
and in-app copy use the same vocabulary. The 3-step request flow and
the 'passes through but isn't stored' elaboration further down each
page remain in place to answer the deeper 'how does the key get to the
provider?' question for users who want detail.

The same change is applied in parallel to the Custom Inference Endpoint
page (both the Key features bullet and the How it works intro).

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx  | 2 +-
 .../agent-platform/inference/custom-inference-endpoint.mdx    | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 025c566c..9457d07b 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage). Your key passes through Warp's servers in-flight to reach the model provider, but Warp never stores it there.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. They're used to make requests to your chosen model provider.
 
 When you send a prompt using a model with the **key icon**:
 
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 596a5659..ba2cfdec 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage). It passes through Warp's servers in-flight to reach your endpoint, but Warp never stores it there.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. It's used to make requests to your configured endpoint.
 
 ## How it works
 
@@ -29,7 +29,7 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**. Your API key passes through Warp's servers in-flight when you send a request, but Warp never stores it there.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, never on Warp's servers. Your API key is used to make requests to your configured endpoint.
 
 When you send a prompt using an endpoint-routed model: