The Anthropic backend provides access to Claude models through Anthropic's Messages API. Claude models are known for their strong reasoning capabilities, long context windows, and excellent instruction following.
The Anthropic backend connects to Anthropic's official API using an API key. It supports both streaming and non-streaming responses, tool calling, and all standard Claude features.
- Full support for current Claude models (e.g., Claude Sonnet 4, Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku)
- Extended context windows (up to 200K tokens)
- Streaming and non-streaming responses
- Tool calling (function calling)
- Vision capabilities
- Strong reasoning and instruction following
The Anthropic backend connector provides comprehensive support for the official Anthropic Messages API. The following sections detail all supported request parameters, content types, and response formats.
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model identifier (e.g., `claude-3-5-sonnet-20241022`) |
| `messages` | array | Conversation messages with role and content |
| `max_tokens` | integer | Maximum tokens to generate (required) |
| `system` | string/array | System prompt (string or structured blocks with cache control) |
| `temperature` | float | Sampling temperature (0.0-1.0) |
| `top_p` | float | Nucleus sampling parameter |
| `top_k` | integer | Top-k sampling parameter |
| `stop_sequences` | array | Custom stop sequences |
| `stream` | boolean | Enable streaming responses |
| `metadata` | object | Request metadata including `user_id` |
| `tools` | array | Tool definitions for function calling |
| `tool_choice` | string/object | Tool selection strategy |
| `service_tier` | string | Priority tier (`auto`, `standard_only`) |
| `thinking` | object | Extended thinking configuration |
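As an illustration of the table above, a minimal request body can be assembled in a few lines of Python. This is a sketch only (the helper name is hypothetical); note that `model`, `max_tokens`, and `messages` are the only required fields:

```python
# Illustrative sketch: build a minimal Anthropic Messages API request body.
# Only model, max_tokens, and messages are required; all other parameters
# from the table above are optional and can be passed through as-is.

def build_messages_request(model: str, prompt: str, max_tokens: int = 1024, **optional):
    """Assemble a request body dict; `optional` may carry temperature, top_p, etc."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    body.update(optional)  # e.g. temperature, stop_sequences, tools
    return body

request = build_messages_request(
    "claude-3-5-sonnet-20241022", "Hello!", temperature=0.7, stop_sequences=["END"]
)
```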
Enable Claude's extended thinking capability to include the model's reasoning process in responses:

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [
    {"role": "user", "content": "Solve this complex problem..."}
  ]
}
```

When enabled, responses may include `thinking` content blocks containing the model's step-by-step reasoning. The `budget_tokens` parameter controls how many tokens can be used for thinking.
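Because the `content` array interleaves `thinking` and `text` blocks, a client typically filters them apart. A minimal sketch, using a fabricated response payload in the documented block shape:

```python
# Illustrative sketch: separate "thinking" blocks from visible text in a
# fabricated Messages API response content array.
response_content = [
    {"type": "thinking", "thinking": "First, restate the problem..."},
    {"type": "text", "text": "The answer is 42."},
]

reasoning = [b["thinking"] for b in response_content if b["type"] == "thinking"]
answer = "".join(b["text"] for b in response_content if b["type"] == "text")
```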
Control request prioritization for high-demand periods:

- `"auto"` - Automatic tier selection (default)
- `"standard_only"` - Force standard capacity tier (may queue during high demand)
```json
{
  "model": "claude-3-5-sonnet-20241022",
  "max_tokens": 1024,
  "service_tier": "auto",
  "messages": [...]
}
```

The backend supports multimodal inputs including images and documents.
Image Content (Base64):

```json
{
  "type": "image",
  "source": {
    "type": "base64",
    "media_type": "image/png",
    "data": "<base64-encoded-image>"
  }
}
```

Image Content (URL):
```json
{
  "type": "image",
  "source": {
    "type": "url",
    "url": "https://example.com/image.jpg"
  }
}
```

Document Content (PDF):
```json
{
  "type": "document",
  "source": {
    "type": "base64",
    "media_type": "application/pdf",
    "data": "<base64-encoded-pdf>"
  },
  "title": "document.pdf"
}
```

System prompts can include cache control hints for prompt caching optimization:
```json
{
  "system": [
    {
      "type": "text",
      "text": "You are a helpful assistant with extensive domain knowledge...",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [...]
}
```

Full support for Anthropic's tool use protocol:
Tool Definition:

```json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "input_schema": {
        "type": "object",
        "properties": {
          "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
      }
    }
  ]
}
```

Tool Choice Options:
| Value | Description |
|---|---|
| `"auto"` | Model decides whether to use tools |
| `"none"` | Disable tool use for this request |
| `{"type": "any"}` | Force the model to use a tool |
| `{"type": "tool", "name": "tool_name"}` | Force use of a specific tool |
| `{"type": "any", "disable_parallel_tool_use": true}` | Force tool use but disable parallel calls |
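When the model does call a tool, the client runs it and returns a `tool_result` block in the next user turn. A minimal sketch of that round trip, using a fabricated `tool_use` block and a stand-in dispatch function:

```python
# Illustrative sketch: turn a fabricated tool_use block from an assistant
# response into the user-turn tool_result message the API expects next.
assistant_content = [
    {"type": "tool_use", "id": "toolu_123", "name": "get_weather",
     "input": {"location": "London"}},
]

def run_tool(name, args):
    # Stand-in for real tool dispatch; returns a string result.
    if name == "get_weather":
        return f"15°C and cloudy in {args['location']}"
    raise ValueError(f"unknown tool: {name}")

tool_results = [
    {"type": "tool_result", "tool_use_id": b["id"],
     "content": run_tool(b["name"], b["input"])}
    for b in assistant_content if b["type"] == "tool_use"
]
follow_up = {"role": "user", "content": tool_results}
```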
When `stream: true` is set, the backend emits standard Anthropic SSE events:
| Event | Description |
|---|---|
| `message_start` | Initial message metadata (id, model, usage) |
| `content_block_start` | Start of a content block (text, tool_use, thinking) |
| `content_block_delta` | Incremental content updates |
| `content_block_stop` | End of a content block |
| `message_delta` | Message-level updates (stop_reason, usage) |
| `message_stop` | End of message stream |
| `ping` | Keep-alive event |
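A consumer accumulates text from `content_block_delta` events and reads the stop reason from `message_delta`. A sketch over a fabricated event sequence in the order shown in the table (real payloads carry more fields):

```python
# Illustrative sketch: accumulate streamed text from a fabricated sequence
# of parsed SSE event payloads, following the event order in the table above.
events = [
    ("message_start", {"message": {"id": "msg_01", "usage": {"input_tokens": 10}}}),
    ("content_block_start", {"index": 0, "content_block": {"type": "text", "text": ""}}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "Hel"}}),
    ("content_block_delta", {"index": 0, "delta": {"type": "text_delta", "text": "lo"}}),
    ("content_block_stop", {"index": 0}),
    ("message_delta", {"delta": {"stop_reason": "end_turn"}}),
    ("message_stop", {}),
]

text = ""
stop_reason = None
for event, data in events:
    if event == "content_block_delta" and data["delta"]["type"] == "text_delta":
        text += data["delta"]["text"]
    elif event == "message_delta":
        stop_reason = data["delta"].get("stop_reason")
```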
| Field | Description |
|---|---|
| `id` | Unique message identifier (e.g., `msg_01XFDUDYJgAACzvnptvVoYEL`) |
| `type` | Always `"message"` |
| `role` | Always `"assistant"` |
| `content` | Array of content blocks (text, tool_use, thinking) |
| `model` | Model used for generation |
| `stop_reason` | Why generation stopped: `end_turn`, `max_tokens`, `stop_sequence`, `tool_use` |
| `stop_sequence` | The matched stop sequence (if `stop_reason` is `stop_sequence`) |
| `usage` | Token usage: `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens` |
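For accounting, the cache fields in `usage` are reported separately from `input_tokens`, so totaling input-side tokens means summing all three. A sketch over a fabricated response in the shape above:

```python
# Illustrative sketch: read the fields above from a fabricated non-streaming
# response and total the input-side tokens, cache fields included.
response = {
    "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-3-5-sonnet-20241022",
    "stop_reason": "end_turn",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 5,
              "cache_creation_input_tokens": 0, "cache_read_input_tokens": 100},
}

usage = response["usage"]
total_input = (usage["input_tokens"]
               + usage["cache_creation_input_tokens"]
               + usage["cache_read_input_tokens"])
visible_text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
```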
When using the Anthropic backend with requests from other API formats (OpenAI, Gemini), the proxy automatically translates parameters:
| Anthropic Feature | OpenAI Equivalent | Gemini Equivalent |
|---|---|---|
| `thinking` | `extra_body.thinking` | `thinkingConfig` |
| `service_tier` | `extra_body.service_tier` | N/A |
| Image blocks | `image_url` content parts | `inlineData` parts |
| Document blocks | Passthrough | Passthrough |
| `tool_choice: any` | `tool_choice: required` | `toolConfig.mode: ANY` |
| `stop_sequences` | `stop` | `stopSequences` |
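The OpenAI-to-Anthropic direction of this mapping can be sketched as a small translation function. This is illustrative only (the proxy's actual translation layer may differ), covering the `stop` and `tool_choice` rows of the table:

```python
# Illustrative sketch of OpenAI -> Anthropic parameter translation for two
# rows of the table above; not the proxy's actual implementation.
def openai_to_anthropic(params: dict) -> dict:
    out = {}
    if "stop" in params:
        # OpenAI accepts a string or a list; Anthropic takes stop_sequences as a list.
        stops = params["stop"]
        out["stop_sequences"] = [stops] if isinstance(stops, str) else list(stops)
    if params.get("tool_choice") == "required":
        out["tool_choice"] = {"type": "any"}   # force some tool call
    elif params.get("tool_choice") == "auto":
        out["tool_choice"] = {"type": "auto"}
    return out

translated = openai_to_anthropic({"stop": "END", "tool_choice": "required"})
```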
```bash
export ANTHROPIC_API_KEY="sk-ant-..."

# Start proxy with Anthropic as default backend
python -m src.core.cli --default-backend anthropic

# With specific model
python -m src.core.cli --default-backend anthropic --force-model claude-3-5-sonnet-20241022
```

```yaml
# config.yaml
backends:
  anthropic:
    type: anthropic

default_backend: anthropic
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```

The proxy also exposes the native Anthropic Messages API:
```bash
curl -X POST http://localhost:8000/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_PROXY_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "Write a detailed explanation"}
    ],
    "stream": true
  }'
```

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "What is the weather in London?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            }
          }
        }
      }
    ]
  }'
```

Claude models support very long context windows:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "Analyze this entire codebase: [very long content]"}
    ],
    "max_tokens": 4096
  }'
```

The proxy is designed to work seamlessly with Claude Code (Anthropic's CLI tool).
Main proxy port (namespaced API; no dedicated listener required):

```bash
export ANTHROPIC_API_URL=http://localhost:8000/anthropic
export ANTHROPIC_API_KEY=YOUR_PROXY_KEY
claude
```

Dedicated Anthropic port (root `/v1/messages`; start the proxy with `--anthropic-port` or set `anthropic_port` / `ANTHROPIC_PORT`):

```bash
export ANTHROPIC_API_URL=http://localhost:8001
export ANTHROPIC_API_KEY=YOUR_PROXY_KEY
claude
```

Claude models excel at:
- Long-form content analysis
- Code review and refactoring
- Complex problem solving
- Detailed explanations
- Following multi-step instructions
Use Claude models for:
- Testing different reasoning approaches
- Comparing with other providers
- Validating instruction following
- Long context window testing
Use the `anthropic` backend with a normal Anthropic API key. Do not route production traffic through unofficial OAuth-token or Claude-Code credential paths; those conflict with Anthropic's terms and are not documented here.
The proxy can expose the Anthropic API on a dedicated port for better compatibility with Anthropic-specific clients:
```yaml
# config.yaml (root-level; see configuration.md)
host: "127.0.0.1"
port: 8000
anthropic_port: 8001  # Omit or null to disable the dedicated Anthropic listener
```

Or via environment variable:

```bash
export ANTHROPIC_PORT=8001
```

```bash
# Point Claude Code to the dedicated port
export ANTHROPIC_API_URL=http://localhost:8001
export ANTHROPIC_API_KEY=YOUR_PROXY_KEY
claude
```

You can specify model parameters using URI syntax:
```bash
# With temperature
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic:claude-3-5-sonnet-20241022?temperature=0.7",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
}'See URI Model Parameters for more details.
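To show how such a model string decomposes, here is a sketch of a parser for the `backend:model?key=value` shape (the proxy's actual parser may differ):

```python
from urllib.parse import parse_qsl

# Illustrative sketch: decompose a "backend:model?key=value" model string.
# Not the proxy's actual parser; shown only to clarify the URI syntax.
def parse_model_uri(model: str):
    backend, _, rest = model.partition(":")
    name, _, query = rest.partition("?")
    return backend, name, dict(parse_qsl(query))

backend, name, params = parse_model_uri(
    "anthropic:claude-3-5-sonnet-20241022?temperature=0.7"
)
```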
- Verify your `ANTHROPIC_API_KEY` is set correctly
- Check that the API key is valid and has not expired
- Ensure you're using the correct authentication header (`x-api-key` for native API, `Authorization` for OpenAI-compatible)
- Anthropic has rate limits based on your account tier
- Consider using API Key Rotation (via multiple backend instances) for load balancing
- Use failover to switch to alternative models
- Verify the model name is correct (e.g., `claude-3-5-sonnet-20241022`)
- Check that your API key has access to the requested model
- Some models may require special access
- Claude models have large context windows, but they're not unlimited
- Use the proxy's context window enforcement to catch issues early
- Consider summarizing or chunking very long inputs
- Model Name Rewrites - Route Claude models to other providers
- Hybrid Backend - Combine Claude with other models
- Quality Verifier System - Use Claude for response verification