From ac6e6fc880d197c351a3f4e3b8556bf1839dd533 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen
Date: Tue, 26 May 2026 11:05:21 -0700
Subject: [PATCH 1/8] docs: clarify BYOK request path and data flow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The BYOK doc previously said keys are "stored locally" (true) and that
Warp "directly routes" requests to the provider (misleading — the Warp
Agent harness is server-hosted, so traffic does transit Warp's backend
while the key is used in-flight per request).

This commit:
- Replaces "directly route" language with an explicit 3-step data flow.
- Adds a "Why does the request route through Warp's backend?" note
  explaining the server-side harness.
- Adds a sentence to the ZDR section noting BYOK request bodies are not
  retained, used for training, or logged for analytics.
- Tightens the diagram alt text and intro paragraph to remove the same
  "directly" ambiguity.

Co-Authored-By: Oz
---
 .../bring-your-own-api-key.mdx | 22 ++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
index 69aa1dfe..b21f951b 100644
--- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
+++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
@@ -7,7 +7,7 @@ description: >-

 Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts.

-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
+This lets you use your own API keys for model access, giving you control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.

 BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.

@@ -21,9 +21,19 @@ BYOK and customer-supplied inference (BYOLLM via Amazon Bedrock or Google Vertex

 ## How does BYOK work?

-When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
+When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never persisted on Warp's servers**.

-Warp uses these API keys to directly route your agent requests to the model provider you've configured.
+When you send a prompt using a model with the **key icon**:
+
+1. The Warp Agent runtime on Warp's backend assembles the request, including your prompt, [Codebase Context](/agent-platform/capabilities/codebase-context/), [Rules](/agent-platform/capabilities/rules/), [Secret Redaction](/support-and-community/privacy-and-security/secret-redaction/), and tool definitions.
+2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
+3. The provider's response streams back through Warp's backend to your client.
+
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+
+:::note
+**Why does the request route through Warp's backend?** The Warp Agent runtime — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models — runs server-side so it can apply Codebase Context, Rules, Secret Redaction, and multi-step tool orchestration consistently. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+:::

 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -35,7 +45,7 @@ When a model is selected using your own key:
 * Costs are billed directly through your model provider account.
 * Warp does not retain or store your API key on any of its servers.

-![Diagram showing how Warp routes BYOK agent requests directly through your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
+![Diagram showing how Warp authenticates BYOK agent requests with your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)

 ## Enabling BYOK

@@ -107,9 +117,11 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req

 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.

+BYOK prompts and responses transit Warp's backend (see [How does BYOK work?](#how-does-byok-work)), but Warp does not retain the request body, does not use it for training, and does not log it for analytics — the same posture that applies to Warp-billed traffic.
+
 However, when you use your own API key:

-* Data retention policies depend on your provider’s account settings.
+* Data retention policies on the **provider side** depend on your provider’s account settings.
 * Warp cannot enforce ZDR for requests sent through your API keys.
 * If your Anthropic, OpenAI, or Google account does not have ZDR enabled, your requests may be retained by the provider according to their terms.

From 7e785b2d2bd66d148fdf278fd1288aa334d0c2db Mon Sep 17 00:00:00 2001
From: Hong Yi Chen
Date: Tue, 26 May 2026 14:16:39 -0400
Subject: [PATCH 2/8] Update src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx

Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index a81be26d..c42b1e67 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -127,7 +127,7 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req

 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.

-BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)), but Warp does not retain the request body, does not use it for training, and does not log it for analytics — the same posture that applies to Warp-billed traffic.
+BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.

 However, when you use your own API key:

From 27cbfe240c5bfbc15deb19f859f3fafb504ddade Mon Sep 17 00:00:00 2001
From: Hong Yi Chen
Date: Tue, 26 May 2026 11:19:28 -0700
Subject: [PATCH 3/8] docs: generalize harness explanation in BYOK doc

Replace specific feature list (Codebase Context, Rules, Secret
Redaction, multi-step tool orchestration) with a more general 'Warp's
agent harness' reference. Keeps the explanation accurate without
enumerating internals that may evolve.

Co-Authored-By: Oz
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index c42b1e67..29e701ea 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -35,14 +35,14 @@ When you add your own model API keys in Warp, those keys are stored **locally on

 When you send a prompt using a model with the **key icon**:

-1. The Warp Agent runtime on Warp's backend assembles the request, including your prompt, [Codebase Context](/agent-platform/capabilities/codebase-context/), [Rules](/agent-platform/capabilities/rules/), [Secret Redaction](/support-and-community/privacy-and-security/secret-redaction/), and tool definitions.
+1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
 2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
 3. The provider's response streams back through Warp's backend to your client.

 Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.

 :::note
-**Why does the request route through Warp's backend?** The Warp Agent runtime — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models — runs server-side so it can apply Codebase Context, Rules, Secret Redaction, and multi-step tool orchestration consistently. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
 :::

 :::caution

From 793d4f5e145a6227d39b60bbcaf5278535f83cb4 Mon Sep 17 00:00:00 2001
From: Oz
Date: Tue, 26 May 2026 23:01:53 +0000
Subject: [PATCH 4/8] docs: clarify Custom Inference endpoint request path and data flow

Issue #11681 reported that the privacy framing on the Custom Inference
endpoint page was misleading: requests are server-hosted through
warp-server, so traffic does transit Warp's backend even though the
endpoint URL and API key are stored locally on the client.

This commit narrows and corrects the privacy claim on the Custom
Inference endpoint page, mirroring the BYOK rewrite already in this PR:

- Replace the blanket 'never synced to the cloud' wording for endpoint
  URLs with a narrower, accurate claim: API keys are never synced or
  stored on Warp's servers; endpoint URLs and model identifiers may
  appear in Warp's usage telemetry, but API keys never do.
- Add an explicit 3-step request flow (harness assembles -> in-flight
  key authenticates the call -> response streams back) so the
  server-side path is no longer surprising.
- Add a 'Why does the request route through Warp's backend?' callout
  matching the BYOK page.
- Tighten the ZDR section to note that prompts/responses transit Warp's
  backend without being used for training, and scope the existing
  retention bullets to the provider side.

Also align the BYOK headline claim with the same wording ('never synced
or stored on Warp's servers') so both pages converge on a single
phrasing.

Confirmed against warp-server:
- logic/ai/llm/custom_endpoint/client.go:14-21 - the OpenAI-compatible
  client is constructed server-side using
  hostConfig.CustomEndpointAPIKey() and
  hostConfig.CustomEndpointBaseURL() from the request, not from
  persistent server config.
- logic/ai/llm/user_api_keys/util.go:7 - keys arrive per-request via
  Request_Settings_ApiKeys.

Co-Authored-By: Oz
---
 .../inference/bring-your-own-api-key.mdx | 2 +-
 .../inference/custom-inference-endpoint.mdx | 24 +++++++++++++++----
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 29e701ea..68e679df 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent

 ## How BYOK works

-When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never persisted on Warp's servers**.
+When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.

 When you send a prompt using a model with the **key icon**:

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 604a004c..6eb5c9a8 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**. Endpoint URLs and model identifiers may appear in Warp's usage telemetry, but the API key itself never does.

 ## How it works

@@ -29,7 +29,19 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

-When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **locally on your device**. The API key itself is **never synced or stored on Warp's servers**.
+
+When you send a prompt using an endpoint-routed model:
+
+1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
+2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
+3. Your endpoint's response streams back through Warp's backend to your client.
+
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database. Endpoint URLs and model identifiers may be recorded in Warp's usage telemetry; API keys are never included.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
+:::

 :::caution
 Custom inference endpoints don't apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -39,7 +51,7 @@ When a model routed through your endpoint is selected:

 * Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
 * Costs are billed directly by your endpoint provider.
-* Warp doesn't retain or store your endpoint credentials on any of its servers.
+* Warp doesn't retain or store your API key on any of its servers.

 ## Enabling a custom inference endpoint

@@ -78,13 +90,15 @@ Some AI-powered features (Codebase Context, Active AI recommendations, cloud age

 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.

+Custom inference endpoint prompts and responses transit Warp's backend (see [How it works](#how-it-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 When you use a custom inference endpoint:

-* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Data retention on the **provider side** is determined by your endpoint provider and any upstream model providers they route to.
 * Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
 * If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.

-Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
+Warp itself never stores your endpoint API key. Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.

 ## Centrally managed configuration

From 86111bd48425077621bbeaad649ebf48da63473d Mon Sep 17 00:00:00 2001
From: Oz
Date: Tue, 26 May 2026 23:05:46 +0000
Subject: [PATCH 5/8] docs: drop telemetry caveat from Custom Inference doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per review feedback, simplify the Custom Inference endpoint privacy
framing to a single durable claim — API keys are never synced or stored
on Warp's servers — without adding a separate caveat about endpoint URL
or model identifier telemetry.

Co-Authored-By: Oz
---
 .../agent-platform/inference/custom-inference-endpoint.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 6eb5c9a8..66c8ceed 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**. Endpoint URLs and model identifiers may appear in Warp's usage telemetry, but the API key itself never does.
+* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**.

 ## How it works

@@ -37,7 +37,7 @@ When you send a prompt using an endpoint-routed model:
 2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
 3. Your endpoint's response streams back through Warp's backend to your client.

-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database. Endpoint URLs and model identifiers may be recorded in Warp's usage telemetry; API keys are never included.
+Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.

 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.

From 868637b77402ea047c47fbb7dbf46109ac49f679 Mon Sep 17 00:00:00 2001
From: Oz
Date: Wed, 27 May 2026 16:00:43 +0000
Subject: [PATCH 6/8] docs: soften BYOK / Custom Inference key-storage claim per review

Per Petra's review feedback: the previous phrasing 'stored locally on
your device and never synced or stored on Warp's servers' technically
holds but implies too strongly that the API key never leaves the user's
machine. The key does transit Warp's backend in-flight per request (see
the 3-step flow further down each page).

Reframe the headline storage claim on both pages to focus on what the
key is for instead of where it isn't: it is stored only on the user's
device and used to authenticate requests to the model provider /
configured endpoint. The downstream 3-step flow and 'Why does the
request route through Warp's backend?' callout remain unchanged and
continue to explain the actual transit path.

Co-Authored-By: Oz
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx | 2 +-
 .../agent-platform/inference/custom-inference-endpoint.mdx | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 68e679df..583f4e9b 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent

 ## How BYOK works

-When you add your own model API keys in Warp, those keys are stored **locally on your device** (in your OS keychain or equivalent secure storage) and are **never synced or stored on Warp's servers**.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage) and are used to authenticate requests to your chosen model provider.

 When you send a prompt using a model with the **key icon**:

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 66c8ceed..7ea07563 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored locally on your device (in your OS keychain or equivalent secure storage) and is **never synced or stored on Warp's servers**.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage) and is used to authenticate requests to your endpoint.

 ## How it works

@@ -29,7 +29,7 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **locally on your device**. The API key itself is **never synced or stored on Warp's servers**.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, and the API key is used to authenticate requests to your endpoint.

 When you send a prompt using an endpoint-routed model:

From 1aa9e115f17a1c529d1a6cc3b9c55def04fed653 Mon Sep 17 00:00:00 2001
From: Oz
Date: Wed, 27 May 2026 18:35:27 +0000
Subject: [PATCH 7/8] docs: adopt Petra's plain-language framing for BYOK key handling

Per the remaining review feedback on PR #138:

- Replace 'stored only on your device' headline claim with explicit
  language that the key passes through Warp's servers but is not stored
  there, mirroring Petra's preferred phrasing.
- Reshuffle the 3-step flow so step 1 is local (client pulls the key
  from secure storage and sends it up) and step 2 explicitly states
  that the agent harness runs on Warp's backend, answering Petra's
  question about where assembly happens.
- Reword the 'held in memory' sentence to use the same 'passes through
  but is not stored' framing.

Same changes applied in parallel to the Custom Inference Endpoint page.

Co-Authored-By: Oz
---
 .../inference/bring-your-own-api-key.mdx | 8 ++++----
 .../inference/custom-inference-endpoint.mdx | 10 +++++-----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 583f4e9b..025c566c 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,15 +31,15 @@ Platform credits apply to every cloud agent run on any plan, and to local agent

 ## How BYOK works

-When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage) and are used to authenticate requests to your chosen model provider.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage). Your key passes through Warp's servers in-flight to reach the model provider, but Warp never stores it there.

 When you send a prompt using a model with the **key icon**:

-1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
-2. Your API key is sent up alongside that request and used in-flight to authenticate the call to your chosen model provider (Anthropic, OpenAI, or Google).
+1. Your local Warp client pulls your API key from your device's secure storage and sends it up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your chosen model provider (Anthropic, OpenAI, or Google).
 3. The provider's response streams back through Warp's backend to your client.

-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call the provider, then discarded.

 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 22265282..596a5659 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage) and is used to authenticate requests to your endpoint.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage). It passes through Warp's servers in-flight to reach your endpoint, but Warp never stores it there.

 ## How it works

@@ -29,15 +29,15 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, and the API key is used to authenticate requests to your endpoint.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**. Your API key passes through Warp's servers in-flight when you send a request, but Warp never stores it there.

 When you send a prompt using an endpoint-routed model:

-1. Warp's agent harness on Warp's backend assembles the request from your prompt and conversation context.
-2. Your endpoint URL and API key are sent up alongside that request and used in-flight to call your configured endpoint.
+1. Your local Warp client pulls your endpoint URL and API key from your device's secure storage and sends them up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your configured endpoint.
 3. Your endpoint's response streams back through Warp's backend to your client.

-Your API key is held in memory only for the duration of each request — Warp never writes it to disk or to any database.
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call your endpoint, then discarded.

 :::note
 **Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
From ecc08ffce6324cb31d3390b402c1bfe3b39dd4c3 Mon Sep 17 00:00:00 2001
From: Oz
Date: Wed, 27 May 2026 22:23:00 +0000
Subject: [PATCH 8/8] docs: align BYOK / Custom Inference headline copy with in-app wording

The in-app description for the Custom inference settings panel landed on a simpler framing (warpdotdev/warp#11780):

API keys are stored only on your device, never on Warp's servers. They're used to make requests to your chosen model provider.

Mirror that framing as the headline statement in the docs so the docs and in-app copy use the same vocabulary. The 3-step request flow and the 'passes through but isn't stored' elaboration below remain in place to answer the deeper 'how does the key get to the provider?' question for users who want detail.

Same change applied in parallel to the Custom Inference Endpoint page (both the Key features bullet and the How it works intro).

Co-Authored-By: Oz
---
 .../docs/agent-platform/inference/bring-your-own-api-key.mdx | 2 +-
 .../agent-platform/inference/custom-inference-endpoint.mdx | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 025c566c..9457d07b 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -31,7 +31,7 @@ Platform credits apply to every cloud agent run on any plan, and to local agent

 ## How BYOK works

-When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage). Your key passes through Warp's servers in-flight to reach the model provider, but Warp never stores it there.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. They're used to make requests to your chosen model provider.

 When you send a prompt using a model with the **key icon**:

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 596a5659..ba2cfdec 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage). It passes through Warp's servers in-flight to reach your endpoint, but Warp never stores it there.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. It's used to make requests to your configured endpoint.

 ## How it works

@@ -29,7 +29,7 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

-When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**. Your API key passes through Warp's servers in-flight when you send a request, but Warp never stores it there.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, never on Warp's servers. Your API key is used to make requests to your configured endpoint.

 When you send a prompt using an endpoint-routed model: