diff --git a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
index 75e4034b..9457d07b 100644
--- a/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
+++ b/src/content/docs/agent-platform/inference/bring-your-own-api-key.mdx
@@ -7,7 +7,7 @@ description: >-
 
 Warp supports **Bring Your Own API Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.
 
-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.
+This lets you use your own API keys for model access, giving you control over model selection, billing, and data routing. See [Model Choice](/agent-platform/inference/model-choice/) for a list of supported models.
 
 BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [AI credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
 
@@ -31,9 +31,19 @@ Platform credits apply to every cloud agent run on any plan, and to local agent
 
 ## How BYOK works
 
-When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
+When you add your own model API keys in Warp, those keys are stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. They're used to make requests to your chosen model provider.
 
-Warp uses these API keys when routing your agent requests to the model provider you've configured.
+When you send a prompt using a model with the **key icon**:
+
+1. Your local Warp client pulls your API key from your device's secure storage and sends it up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your chosen model provider (Anthropic, OpenAI, or Google).
+3. The provider's response streams back through Warp's backend to your client.
+
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call the provider, then discarded.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models. BYOK swaps the credential used to call the provider; it does not change where the harness runs.
+:::
 
 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -45,7 +55,7 @@ When a model is selected using your own key:
 * Costs are billed directly through your model provider account.
 * Warp does not retain or store your API key on any of its servers.
 
-![Diagram showing how Warp routes BYOK agent requests directly through your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
+![Diagram showing how Warp authenticates BYOK agent requests with your provider API key, bypassing Warp credits.](../../../../assets/support-and-community/Pricing-Blog-BYOK.png)
 
 ## Enabling BYOK
 
@@ -117,9 +127,11 @@ You can choose to enable **Warp credit fallback**. When enabled, if an agent req
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** policies with all of its contracted LLM providers. No customer AI data is retained, stored, or used for training by the model providers.
 
+BYOK prompts and responses transit Warp's backend (see [How BYOK works](#how-byok-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 However, when you use your own API key:
 
-* Data retention policies depend on your provider’s account settings.
+* Data retention policies on the **provider side** depend on your provider’s account settings.
 * Warp cannot enforce ZDR for requests sent through your API keys.
 * If your Anthropic, OpenAI, or Google account does not have ZDR enabled, your requests may be retained by the provider according to their terms.
 
diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 23a4d4b9..ba2cfdec 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -18,7 +18,7 @@ Custom inference endpoints are available on Free and all eligible paid plans for
 * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
 * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
 * **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
-* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+* **Local API key storage** - Your endpoint API key is stored **only on your device** (in your OS keychain or equivalent secure storage), never on Warp's servers. It's used to make requests to your configured endpoint.
 
 ## How it works
 
@@ -29,7 +29,19 @@ A custom inference endpoint expects your endpoint to implement the **OpenAI Chat
 * **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
 * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
 
-When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+When you configure a custom inference endpoint, your endpoint URL, model identifiers, and API key are stored **only on your device**, never on Warp's servers. Your API key is used to make requests to your configured endpoint.
+
+When you send a prompt using an endpoint-routed model:
+
+1. Your local Warp client pulls your endpoint URL and API key from your device's secure storage and sends them up to Warp's backend along with your prompt.
+2. Warp's agent harness, which runs on Warp's backend, assembles the full request (system instructions, conversation context, tools) and uses your key in-flight to call your configured endpoint.
+3. Your endpoint's response streams back through Warp's backend to your client.
+
+Your API key passes through Warp's servers each time you send a request, but Warp never stores it there — it's used only in-flight to call your endpoint, then discarded.
+
+:::note
+**Why does the request route through Warp's backend?** Warp's agent harness runs server-side — the same runtime that powers [Agent Mode](/agent-platform/local-agents/interacting-with-agents/terminal-and-agent-modes/) with Warp-billed models and [BYOK](/agent-platform/inference/bring-your-own-api-key/). A custom inference endpoint swaps the upstream destination and credential; it does not change where the harness runs.
+:::
 
 :::caution
 Custom inference endpoints don't apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -39,7 +51,7 @@ When a model routed through your endpoint is selected:
 
 * Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
 * Costs are billed directly by your endpoint provider.
-* Warp doesn't retain or store your endpoint credentials on any of its servers.
+* Warp doesn't retain or store your API key on any of its servers.
 
 ## Enabling a custom inference endpoint
 
@@ -86,13 +98,15 @@ Some AI-powered features (Codebase Context, Active AI recommendations, cloud age
 
 Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
 
+Custom inference endpoint prompts and responses transit Warp's backend (see [How it works](#how-it-works)). Warp does not use this content for training; retention and analytics handling follow the same account-level privacy and telemetry settings that apply to Warp-billed traffic.
+
 When you use a custom inference endpoint:
 
-* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Data retention on the **provider side** is determined by your endpoint provider and any upstream model providers they route to.
 * Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
 * If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
 
-Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
+Warp itself never stores your endpoint API key. Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
 
 ## Centrally managed configuration
 