chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 37 commits into main from pricing-update/vertex-ai
siddharthsambharia-portkey commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
| --- | --- |
| ➕ Models added | 2 |
| 🔄 Models updated (merged) | 23 |

➕ New Models

  • gemini-2.5-pro-tts
  • gemini-2.5-flash-tts

🔄 Updated Models

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash-image
  • gemini-2.0-flash-001
  • gemini-2.5-flash-preview-09-2025
  • gemini-2.5-flash-lite-preview-09-2025
  • gemini-3-pro-preview
  • gemini-3-flash-preview
  • gemini-3-pro-image-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3.1-flash-image-preview
  • veo-3.1-fast-generate-001
  • veo-3.0-fast-generate-preview
  • gemini-embedding-001
  • gemini-embedding-2-preview
  • text-embedding-005
  • text-multilingual-embedding-002
  • text-embedding-large-exp-03-07
  • multimodalembedding@001
  • claude-opus-4-1@20250805
  • claude-opus-4@20250514

Model → Pricing Page Mapping

Google – Gemini (token pricing, $/1M)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-2.5-pro | Google – Gemini 2.5 Pro | API | input $1.25, output $10, cache_read $0.31, batch $0.625/$5, web_search 3.5¢, enterprise_web_search 4.5¢ |
| gemini-2.5-flash | Google – Gemini 2.5 Flash | API | input $0.30, output $2.50, cache_read $0.075, batch $0.15/$1.25, web_search 3.5¢, enterprise_web_search 4.5¢ |
| gemini-2.5-flash-lite | Google – Gemini 2.5 Flash Lite | API | input $0.10, output $0.40, cache_read $0.025, batch $0.05/$0.20, web_search 3.5¢, enterprise_web_search 4.5¢ |
| gemini-2.5-flash-image | Google – Gemini 2.5 Flash (image variant) | API | Same token pricing as gemini-2.5-flash + image_token $30/1M |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 Flash | API | Preview alias; priced same as gemini-2.5-flash |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 Flash Lite | API | Preview alias; priced same as gemini-2.5-flash-lite |
| gemini-2.0-flash-001 | Google – Gemini 2.0 Flash | API | input $0.15, output $0.60, cache_read $0.0375, batch $0.075/$0.30, web_search 3.5¢ |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 Flash Lite | API | input $0.075, output $0.30, web_search 3.5¢ |
| gemini-3-pro-preview | Google – Gemini 3 Pro | API | input $2.00, output $12.00, batch $1/$6, web_search 1.4¢, enterprise_web_search 4.5¢ |
| gemini-3-flash-preview | Google – Gemini 3 Flash | API | input $0.50, output $3.00, batch $0.25/$1.50, web_search 1.4¢, enterprise_web_search 4.5¢ |
| gemini-3-pro-image-preview | Google – Gemini 3 Pro (image variant) | API | Same as gemini-3-pro-preview + image_token $120/1M |
| gemini-3.1-pro-preview | Google – Gemini 3.1 Pro | API | input $2.00, output $12.00, batch $1/$6, web_search 1.4¢, enterprise_web_search 4.5¢ |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 Flash Lite | API | input $0.25, output $1.50, batch $0.125/$0.75, web_search 1.4¢, enterprise_web_search 4.5¢ |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 Flash (image variant) | API | input $0.25, output $1.50 + image_token $60/1M, web_search 1.4¢ |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 (computer use) | API – price not found | No dedicated pricing row; added with price 0 |
| gemini-2.5-pro-tts | Google – Gemini 2.5 TTS | API – price not found | TTS model excluded from generative AI pricing page |
| gemini-2.5-flash-tts | Google – Gemini 2.5 TTS | API – price not found | TTS model excluded from generative AI pricing page |
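
As a rough illustration of how the $/1M token rates in this table turn into a per-request cost, here is a minimal sketch using the gemini-2.5-pro row. The dictionary layout, the key names, and the `request_cost` helper are hypothetical and not the repository's schema. The sketch also assumes that cached input tokens are billed at the cache_read rate instead of the input rate.

```python
from decimal import Decimal

# Rates in USD per 1M tokens, copied from the gemini-2.5-pro row above.
# The dictionary layout and key names are illustrative, not the repo's schema.
GEMINI_2_5_PRO = {
    "input": Decimal("1.25"),
    "output": Decimal("10"),
    "cache_read": Decimal("0.31"),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(rates, input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request given per-1M-token rates.

    Assumption: cached input tokens are billed at the cache_read rate
    and the remaining input tokens at the regular input rate.
    """
    uncached = input_tokens - cached_tokens
    if uncached < 0:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    total = (
        rates["input"] * uncached
        + rates["cache_read"] * cached_tokens
        + rates["output"] * output_tokens
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 1M input tokens and 100K output tokens, nothing cached.
    print(request_cost(GEMINI_2_5_PRO, 1_000_000, 100_000))
```

`Decimal` is used here so that small per-token rates do not pick up binary floating-point rounding noise.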

Google – Imagen (per-image pricing)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4.0 Ultra | API | $0.06/image |
| imagen-4.0-generate-001 | Google – Imagen 4.0 | API | $0.04/image |
| imagen-4.0-fast-generate-001 | Google – Imagen 4.0 Fast | API | $0.02/image |
| imagen-3.0-generate-002 | Google – Imagen 3.0 | API | $0.04/image |
| imagen-3.0-capability-001 | Google – Imagen 3.0 (capability) | API | Priced same as imagen-3.0-generate per schema rules ($0.04/image) |
| imagen-3.0-capability-002 | Google – Imagen 3.0 (capability) | API | Priced same as imagen-3.0-generate per schema rules ($0.04/image) |

Google – Veo (per-second video pricing)

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | $0.20/sec video, 8s default, 1 sample |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | $0.10/sec video, 8s default, 1 sample |
| veo-3.1-generate-preview | Google – Veo 3.1 | API | Preview alias; priced same as veo-3.1-generate-001 |
| veo-3.1-fast-generate-preview | Google – Veo 3.1 Fast | API | Preview alias; priced same as veo-3.1-fast-generate-001 |
| veo-3.0-generate-001 | Google – Veo 3.0 | API | $0.20/sec video, 8s default, 1 sample |
| veo-3.0-fast-generate-001 | Google – Veo 3.0 Fast | API | $0.10/sec video, 8s default, 1 sample |
| veo-3.0-generate-preview | Google – Veo 3.0 | API | Preview alias; priced same as veo-3.0-generate-001 |
| veo-3.0-fast-generate-preview | Google – Veo 3.0 Fast | API | Preview alias; priced same as veo-3.0-fast-generate-001 |
| veo-2.0-generate-001 | Google – Veo 2.0 | API | $0.50/sec video, 8s default, 1 sample |
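
The per-second rates and the "8s default, 1 sample" note in this table suggest a simple cost formula: rate × seconds × samples. The sketch below illustrates that arithmetic and how a preview alias could resolve to its pinned model's rate. The lookup tables and the `video_cost` function are hypothetical names for illustration; only the rates, defaults, and alias pairs come from the table.

```python
from decimal import Decimal

# USD per second of generated video, taken from the Veo table above.
VEO_RATE_PER_SECOND = {
    "veo-3.1-generate-001": Decimal("0.20"),
    "veo-3.1-fast-generate-001": Decimal("0.10"),
    "veo-2.0-generate-001": Decimal("0.50"),
}

# Preview aliases listed above as "priced same as" a pinned model.
PREVIEW_ALIASES = {
    "veo-3.1-generate-preview": "veo-3.1-generate-001",
    "veo-3.1-fast-generate-preview": "veo-3.1-fast-generate-001",
}


def video_cost(model_id, seconds=8, samples=1):
    """Return the USD cost of a generation request.

    Defaults mirror the "8s default, 1 sample" note in the table.
    """
    canonical = PREVIEW_ALIASES.get(model_id, model_id)
    return VEO_RATE_PER_SECOND[canonical] * seconds * samples


if __name__ == "__main__":
    print(video_cost("veo-3.1-generate-001"))
    print(video_cost("veo-3.1-fast-generate-preview", seconds=4, samples=2))
```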

Google – Embedding

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gemini-embedding-001 | Google – Gemini Embedding | API | $0.00015/1K tokens |
| gemini-embedding-2-preview | Google – Gemini Embedding | API | Preview; priced same as gemini-embedding-001 |
| text-embedding-005 | Google – Text Embedding | API | $0.000025/1K characters (per_million_characters unit) |
| text-multilingual-embedding-002 | Google – Text Multilingual Embedding | API | $0.000025/1K characters |
| text-embedding-large-exp-03-07 | Google – Text Embedding Large (experimental) | API | Priced same as gemini-embedding family ($0.00015/1K tokens) |
| textembedding-gecko@003 | Google – Legacy Embedding | API – price not found | Legacy model; no dedicated pricing row |
| textembedding-gecko-multilingual@001 | Google – Legacy Embedding | API – price not found | Legacy model; no dedicated pricing row |
| multimodalembedding@001 | Google – Multimodal Embedding | API | Per-image $0.002¢, per-video-standard $0.002¢ |
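
The embedding rows quote prices per 1K tokens or per 1K characters, while the text-embedding-005 note mentions a per_million_characters unit. A minimal sketch of that unit conversion follows; the helper name is hypothetical, and rates are passed as strings so `Decimal` parses them exactly.

```python
from decimal import Decimal


def per_thousand_to_per_million(rate_per_1k):
    """Convert a USD rate quoted per 1K units into a rate per 1M units."""
    return Decimal(rate_per_1k) * 1000


if __name__ == "__main__":
    # gemini-embedding-001: $0.00015 per 1K tokens
    print(per_thousand_to_per_million("0.00015"))
    # text-embedding-005: $0.000025 per 1K characters
    print(per_thousand_to_per_million("0.000025"))
```

Under this conversion, $0.00015/1K tokens corresponds to $0.15 per 1M tokens, and $0.000025/1K characters corresponds to $0.025 per 1M characters.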

Anthropic – Claude

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| claude-opus-4-6 | Anthropic – Claude Opus 4.6 | API | @default stripped; $5/$25, cache_write $6.25, cache_read $0.50 |
| claude-sonnet-4-6 | Anthropic – Claude Sonnet 4.6 | API | @default stripped; $3/$15, cache_write $3.75, cache_read $0.30 |
| claude-opus-4-5@20251101 | Anthropic – Claude Opus 4.5 | API | Pinned version; $5/$25, cache_write $6.25, cache_read $0.50 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude Sonnet 4.5 | API | Pinned version; $3/$15, cache_write $3.75, cache_read $0.30 |
| claude-haiku-4-5@20251001 | Anthropic – Claude Haiku 4.5 | API | Pinned version; $1/$5, cache_write $1.25, cache_read $0.10 |
| claude-opus-4-1@20250805 | Anthropic – Claude Opus 4.1 | API | Pinned version; $5/$25, cache_write $6.25, cache_read $0.50 |
| claude-opus-4@20250514 | Anthropic – Claude Opus 4 | API | Pinned version; $5/$25, cache_write $6.25, cache_read $0.50 |
| claude-sonnet-4@20250514 | Anthropic – Claude Sonnet 4 | API | Pinned version; $3/$15, cache_write $3.75, cache_read $0.30 |
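
The notes in this table distinguish IDs where "@default" was stripped from IDs that keep a pinned "@YYYYMMDD" version. One way that normalization could look is sketched below; `normalize_model_id` is a hypothetical helper, and the `claude-opus-4-6@default` input form is inferred from the "@default stripped" note, not confirmed by the PR.

```python
DEFAULT_SUFFIX = "@default"


def normalize_model_id(model_id):
    """Drop a trailing "@default" but leave pinned versions untouched."""
    if model_id.endswith(DEFAULT_SUFFIX):
        return model_id[: -len(DEFAULT_SUFFIX)]
    return model_id


if __name__ == "__main__":
    print(normalize_model_id("claude-opus-4-6@default"))
    print(normalize_model_id("claude-opus-4-5@20251101"))
```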

OpenAI – GPT

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| gpt-oss-120b-maas | OpenAI – GPT-OSS 120B | API | $0.09/$0.36 |

OpenAI models excluded (self-deploy / whisper): gpt-4o-self-deploy, gpt-4o-mini-self-deploy, o3-self-deploy, o4-mini-self-deploy, whisper-1

Meta – Llama

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| llama-3.3-70b-instruct-maas | Meta – Llama 3.3 70B | API | $0.72/$0.72 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 Maverick | API | $0.35/$1.15 |

Meta models excluded: llama-guard-*, prompt-guard-* (guard models); faster-rcnn-*, retinanet-*, mask-rcnn-*, segment-anything-*, sam3-* (non-generative CV); xlm-roberta-*, roberta-* (non-generative NLP); nllb-* (translation); imagebind-* (non-generative); all self-deploy without -maas suffix

Qwen

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3 235B | API | $0.22/$0.88 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3 Coder 480B | API | $0.22/$1.80 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3 Next 80B | API | $0.15/$1.20 |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3 Next 80B (thinking) | API | $0.15/$1.20 (same row as instruct) |

Qwen excluded: qwen-image (explicit policy); all self-deploy models without -maas suffix

Mistral

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| mistral-small-2503 | Mistral – Mistral Small | API | $0.10/$0.30 |
| mistral-medium-3 | Mistral – Mistral Medium | API | $0.40/$2.00 |
| codestral-2 | Mistral – Codestral 2 | API | $0.30/$0.90 |

Mistral excluded: mistral-ocr-2505 (OCR); codestral-2501-self-deploy, ministral-3, mistral-large-3 (self-deploy without -maas); mistral/mixtral from mistral-ai namespace (self-deploy)

DeepSeek

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| deepseek-r1-0528-maas | DeepSeek – DeepSeek R1 0528 | API | $1.35/$5.40 |
| deepseek-v3.1-maas | DeepSeek – DeepSeek V3.1 | API | $0.60/$1.70 |
| deepseek-v3.2-maas | DeepSeek – DeepSeek V3.2 | API | $0.56/$1.68 |

DeepSeek excluded: deepseek-ocr-maas (OCR by name); all self-deploy variants (deepseek-r1, deepseek-v3, etc. without -maas)

Kimi / Moonshot

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| kimi-k2-thinking-maas | Moonshot – Kimi K2 Thinking | API | $0.60/$2.50 |

Kimi excluded: kimi-k2, kimi-k2-5 (self-deploy without -maas)

MiniMax

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| minimax-m2-maas | MiniMax – MiniMax M2 | API | $0.30/$1.20 |

MiniMax excluded: minimax-m2 (self-deploy without -maas)

ZAI.org / GLM

| Model ID | Publisher / Section | Source | Notes |
| --- | --- | --- | --- |
| glm-4.7-maas | ZAI.org – GLM-4.7 | API | $0.60/$2.20 |
| glm-5-maas | ZAI.org – GLM-5 | API | $1.00/$3.20 |

ZAI excluded: glm-image (explicit policy); glm-ocr (OCR); glm-4.7, glm-5, glm-4.5 (self-deploy without -maas)

AI21

AI21 excluded: jamba-large-1.6 (self-deploy: has_deploy: true, no -maas suffix), per partner rules. No includable models from AI21.
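
The exclusion notes across the partner sections follow a recurring pattern: explicit policy exclusions, OCR and guard models excluded by name, and self-deploy models (has_deploy: true) excluded unless they carry the -maas suffix. The sketch below shows one way such a filter could be written. The function name, the exclusion set, and the `has_deploy` values used in the examples are assumptions for illustration, and the sketch does not cover every rule listed above (for example the non-generative CV and NLP families).

```python
# Models excluded by explicit policy in the sections above.
EXPLICIT_EXCLUSIONS = {"qwen-image", "glm-image"}

# Name segments that mark a model as excluded by type (OCR, guard models).
EXCLUDED_SEGMENTS = {"ocr", "guard"}


def is_includable(model_id, has_deploy):
    """Return True if a partner model passes the exclusion rules sketched here.

    Assumption: has_deploy is a boolean flag marking self-deployable models,
    mirroring the "has_deploy: true" wording in the AI21 note.
    """
    if model_id in EXPLICIT_EXCLUSIONS:
        return False
    if EXCLUDED_SEGMENTS & set(model_id.split("-")):
        return False
    if has_deploy and not model_id.endswith("-maas"):
        return False
    return True


if __name__ == "__main__":
    # The has_deploy values below are illustrative guesses.
    for model_id, has_deploy in [
        ("jamba-large-1.6", True),
        ("deepseek-v3.2-maas", True),
        ("deepseek-ocr-maas", False),
        ("mistral-small-2503", False),
    ]:
        print(model_id, is_includable(model_id, has_deploy))
```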


Generated by Pricing Agent on 2026-04-01