This repository contains recipes that provide instructions to reproduce specific workload performance measurements, which are part of a confidential benchmarking program. These recipes focus on helping you reliably achieve performance metrics, such as throughput, that demonstrate the performance of the combined hardware and software stack on GPUs.
Note: The recipes in this repository are not designed as general-purpose code samples or tutorials for using Compute Engine-based products.
This content is for you if you are a customer or partner who needs to:
- Validate hardware performance with your suppliers.
- Inform purchasing decisions using the benchmarking data.
- Reproduce optimal performance scenarios before you customize workflows for your own requirements.
To reproduce a benchmark, follow these steps:
- Identify your requirements: determine the model, GPU type, workload, framework, and orchestrator that you are interested in.
- Select a recipe: based on your requirements, use the Benchmark support matrix to find a recipe that meets your needs.
- Follow the recipe: each recipe provides procedures to complete the following tasks:
  - Prepare your environment.
  - Run the benchmark.
  - Analyze the benchmark results. The output includes not just the results but also detailed logs for further analysis.

You can automate your infrastructure setup using Cluster Toolkit. For more information, see Automated GPU environment deployment with Cluster Toolkit.
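The recipe selection step above can be sketched as a simple filter over the support matrix. The following Python snippet is only an illustration, not tooling shipped with this repository: the `Recipe` class, the `MATRIX` list, and the `find_recipes` helper are hypothetical names, and the rows are a small subset copied by hand from the tables below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """One row of the Benchmark support matrix (hypothetical helper type)."""
    model: str
    gpu: str
    framework: str
    workload: str
    orchestrator: str


# Illustrative subset of the support matrix, copied by hand from the tables.
MATRIX = [
    Recipe("Llama-3.1-70B", "A4 (NVIDIA B200)", "MaxText", "Pre-training", "GKE"),
    Recipe("Llama-3.1-70B", "A4 (NVIDIA B200)", "Megatron-Bridge (25.09)", "Pre-training", "Slurm"),
    Recipe("Llama-3.1-405B", "A4X (NVIDIA GB200)", "NeMo (26.02)", "Pre-training", "GKE"),
    Recipe("DeepSeek R1 671B", "A4 (NVIDIA B200)", "vLLM", "Inference", "GKE"),
    Recipe("Qwen3 8B", "G4 (NVIDIA RTX PRO 6000 Blackwell)", "vLLM", "Inference", "GCE"),
]


def find_recipes(matrix, **requirements):
    """Return the rows whose fields contain every requested substring.

    Matching is case-insensitive. Keyword names must be Recipe field names
    (model, gpu, framework, workload, orchestrator).
    """
    def matches(recipe):
        return all(
            wanted.lower() in getattr(recipe, name).lower()
            for name, wanted in requirements.items()
        )

    return [recipe for recipe in matrix if matches(recipe)]


if __name__ == "__main__":
    # Example requirements: Llama-3.1-70B on A4 machines, any orchestrator.
    for recipe in find_recipes(MATRIX, model="Llama-3.1-70B", gpu="A4 (NVIDIA B200)"):
        print(recipe)
```

Because matching is by substring, pass the full machine type (for example `A4 (NVIDIA B200)`) rather than `B200` alone, which would also match `GB200`.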
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| GPT3-175B | A3 Mega (NVIDIA H100) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3-70B | A3 Mega (NVIDIA H100) | NeMo (25.07) | Pre-training | GKE | Link |
| Mixtral-8x7B | A3 Mega (NVIDIA H100) | NeMo (25.07) | Pre-training | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-3.1-70B | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| Llama-3.1-70B | A3 Ultra (NVIDIA H200) | NeMo (24.07) | Pre-training | GKE | Link |
| Llama-3-70B | A3 Ultra (NVIDIA H200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Llama-3-70B | A3 Ultra (NVIDIA H200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Llama-3-8B | A3 Ultra (NVIDIA H200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Llama-3.1-405B | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
| Llama-3.1-405B | A3 Ultra (NVIDIA H200) | NeMo (24.12) | Pre-training | GKE | Link |
| Mixtral-8x7B | A3 Ultra (NVIDIA H200) | NeMo (24.07) | Pre-training | GKE | Link |
| DeepSeek-V3 | A3 Ultra (NVIDIA H200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| GPT OSS 120B | A3 Ultra (NVIDIA H200) | NeMo (26.02) | Pre-training | GKE | Link |
| Qwen-3-30B | A3 Ultra (NVIDIA H200) | NeMo (26.02) | Pre-training | GKE | Link |
| Wan-2.1 | A3 Ultra (NVIDIA H200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Models | GPU Machine Type | Framework / Library | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-3.1-70B | A4 (NVIDIA B200) | MaxText | Pre-training | GKE | Link |
| Llama-3.1-70B | A4 (NVIDIA B200) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3.1-70B | A4 (NVIDIA B200) | NeMo (26.02) | Pre-training | GKE | Link |
| Llama-3.1-70B | A4 (NVIDIA B200) | Megatron-Bridge (25.09) | Pre-training | Slurm | Link |
| Llama-3.1-405B | A4 (NVIDIA B200) | MaxText | Pre-training | GKE | Link |
| Llama-3.1-405B | A4 (NVIDIA B200) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4 (NVIDIA B200) | NeMo (26.02) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4 (NVIDIA B200) | Megatron-Bridge (25.09) | Pre-training | Slurm | Link |
| Mixtral-8x7B | A4 (NVIDIA B200) | NeMo (25.07) | Pre-training | GKE | Link |
| PaliGemma2 | A4 (NVIDIA B200) | Hugging Face Accelerate | Fine-tuning | GKE | Link |
| DeepSeek-V3 | A4 (NVIDIA B200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| DeepSeek-V3 | A4 (NVIDIA B200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| GPT OSS 120B | A4 (NVIDIA B200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Llama-3-8B | A4 (NVIDIA B200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Qwen-3-235B | A4 (NVIDIA B200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| Qwen-3-235B | A4 (NVIDIA B200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Qwen-3-235B | A4 (NVIDIA B200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Qwen-3-30B | A4 (NVIDIA B200) | NeMo (26.02) | Pre-training | GKE | Link |
| Wan-2.1-14B | A4 (NVIDIA B200) | NeMo (25.11) | Pre-training | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-3.1-8B | A4X (NVIDIA GB200) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3.1-8B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| Llama-3.1-8B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Llama-3.1-70B | A4X (NVIDIA GB200) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3.1-70B | A4X (NVIDIA GB200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4X (NVIDIA GB200) | NeMo (25.07) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4X (NVIDIA GB200) | NeMo (26.02) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4X (NVIDIA GB200) | Megatron-Bridge (26.02) | Pre-training | GKE | Link |
| Llama-3.1-405B | A4X (NVIDIA GB200) | Megatron-Bridge (25.09) | Pre-training | Slurm | Link |
| Nemotron-4-340B | A4X (NVIDIA GB200) | NeMo (25.09) | Pre-training | GKE | Link |
| Wan-2.1-14B | A4X (NVIDIA GB200) | NeMo (25.11) | Pre-training | GKE | Link |
| Wan-2.1-14B | A4X (NVIDIA GB200) | NeMo (26.02) | Pre-training | GKE | Link |
| Wan-2.1-14B | A4X (NVIDIA GB200) | NeMo (25.11) | Pre-training | Slurm | Link |
| DeepSeek-V3 | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| Qwen-3-235B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| Qwen-3-235B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Qwen-3-30B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | GKE | Link |
| Qwen-3-30B | A4X (NVIDIA GB200) | Megatron-Bridge (25.11) | Pre-training | Slurm | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-4 | A3 Mega (NVIDIA H100) | SGLang | Inference | GKE | Link |
| DeepSeek R1 671B | A3 Mega (NVIDIA H100) | SGLang | Inference | GKE | Link |
| DeepSeek R1 671B | A3 Mega (NVIDIA H100) | vLLM | Inference | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| GPT OSS 120B | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |
| Llama-4 | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |
| Llama-3.1-405B | A3 Ultra (NVIDIA H200) | TensorRT-LLM | Inference | GKE | Link |
| DeepSeek R1 671B | A3 Ultra (NVIDIA H200) | SGLang | Inference | GKE | Link |
| DeepSeek R1 671B | A3 Ultra (NVIDIA H200) | vLLM | Inference | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| DeepSeek R1 671B | A4 (NVIDIA B200) | vLLM | Inference | GKE | Link |
| DeepSeek R1 671B | A4 (NVIDIA B200) | SGLang | Inference | GKE | Link |
| DeepSeek R1 671B | A4 (NVIDIA B200) | TensorRT-LLM | Inference | GKE | Link |
| Llama 3.1 405B | A4 (NVIDIA B200) | TensorRT-LLM | Inference | GKE | Link |
| Qwen 2.5 VL 7B | A4 (NVIDIA B200) | TensorRT-LLM | Inference | GKE | Link |
| Qwen 3 235B A22B | A4 (NVIDIA B200) | TensorRT-LLM | Inference | GKE | Link |
| Qwen 3 32B | A4 (NVIDIA B200) | TensorRT-LLM | Inference | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| DeepSeek R1 671B | A4X (NVIDIA GB200) | vLLM (v0.14.0rc1) | Inference | GKE | Link |
| Wan2.2 T2V A14B Diffusers | A4X (NVIDIA GB200) | SGLang (latest) | Inference | GKE | Link |
| Wan2.2 I2V A14B Diffusers | A4X (NVIDIA GB200) | SGLang (latest) | Inference | GKE | Link |
| DeepSeek R1 671B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link; Link for using Google Cloud Storage (GCS) as storage option; Link for using Lustre as storage option |
| Llama 3.1 405B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Llama 3.1 70B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Llama 3.1 8B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Qwen 2.5 VL 7B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Qwen 3 235B A22B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Qwen 3 32B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Qwen 3 4B | A4X (NVIDIA GB200) | TensorRT-LLM (1.3.0rc5) | Inference | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Qwen3 8B | G4 (NVIDIA RTX PRO 6000 Blackwell) | vLLM | Inference | GCE | Link |
| Qwen3 30B A3B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Qwen3 4B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Qwen3 8B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Qwen3 32B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Qwen3 32B | G4 (NVIDIA RTX PRO 6000 Blackwell) | vLLM | Inference | GCE | Link |
| Llama3.1 70B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| DeepSeek R1 | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Qwen3 235B | G4 (NVIDIA RTX PRO 6000 Blackwell) | TensorRT-LLM | Inference | GCE | Link |
| Wan2.2 14B | G4 (NVIDIA RTX PRO 6000 Blackwell) | SGLang | Inference | GCE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-3.1-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training using Google Cloud Storage buckets for checkpoints | GKE | Link |
| Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
|---|---|---|---|---|---|
| Llama-3.1-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training using the Google Cloud Resiliency library | GKE | Link |
| Llama-3.1-405B | A3 Ultra (NVIDIA H200) | NeMo | Pre-training using the Google Cloud Resiliency library | GKE | Link |
| Mixtral-8x7B | A3 Ultra (NVIDIA H200) | NeMo | Pre-training using the Google Cloud Resiliency library | GKE | Link |
- ./training: this directory contains recipes with instructions to reproduce training benchmarks with GPUs.
- ./inference: this directory contains recipes with instructions to reproduce inference benchmarks with GPUs.
- ./src: this directory contains the shared dependencies required to run benchmarks, such as Docker images and Helm charts.
- ./docs: this directory contains supporting documentation that explains benchmark methodologies or configurations.
This repository provides the steps that you can use to reproduce a specific benchmark. The actual performance measurements and the complete, confidential benchmark report are not included.
Performance benchmarks measure the performance of various workloads on the platform. These benchmarks are primarily used to validate performance with hardware suppliers and to provide you with data for purchasing decisions.
Benchmark data is considered a point-in-time measurement, and completed benchmarks are not repeated. We maintain and update the recipes in this repository on a best-effort basis.
For general guidance on how to get started using Compute products, refer to the official documentation and tutorials:
- Compute Engine overview
- Compute Engine samples
- Cloud GPU documentation
- AI Hypercomputer documentation
- Automated GPU environment deployment with Cluster Toolkit
If you have questions or encounter problems with this repository, report them through GitHub Issues or reach out to your Google Cloud account team for assistance.
Note: This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.