Merged
@@ -0,0 +1,66 @@
---
title: Train and Benchmark AI Workloads with DeepSpeed on Google Cloud C4A Axion VM

draft: true
cascade:
    draft: true

description: Set up PyTorch and DeepSpeed on Google Cloud C4A Axion Arm VMs running SUSE Linux to train neural network models, benchmark AI workloads, and validate scalable CPU-based AI execution on Arm64 processors.

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for DevOps engineers, ML engineers, and software developers who want to run, validate, and benchmark CPU-based neural network training workloads with PyTorch and DeepSpeed on SUSE Linux Enterprise Server (SLES) for Arm64.

learning_objectives:
- Install and configure PyTorch and DeepSpeed on Google Cloud C4A Axion processors for Arm64
- Create and execute neural network training workloads using PyTorch
- Benchmark CPU-based AI workloads on Arm64 processors
- Validate scalable AI execution and workload performance on GCP Axion Arm VMs

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
- Basic familiarity with Python and machine learning concepts

author: Pareena Verma

##### Tags
skilllevels: Introductory
subjects: ML
cloud_service_providers:
- Google Cloud

armips:
- Neoverse

tools_software_languages:
- DeepSpeed
- PyTorch
- Python

operatingsystems:
- Linux

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================

further_reading:
    - resource:
        title: DeepSpeed official documentation
        link: https://www.deepspeed.ai/
        type: documentation

    - resource:
        title: DeepSpeed GitHub repository
        link: https://github.com/microsoft/DeepSpeed
        type: documentation

    - resource:
        title: PyTorch documentation
        link: https://pytorch.org/docs/stable/index.html
        type: documentation

weight: 1
layout: "learningpathall"
learning_path_main_page: yes
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
@@ -0,0 +1,43 @@
---
title: Learn about DeepSpeed and Google Axion C4A for AI training
weight: 2

layout: "learningpathall"
---

## Google Axion C4A Arm instances for AI and machine learning

Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for AI, machine learning, data analytics, and modern cloud-native workloads.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and efficiency advantages of the Arm architecture in Google Cloud.

For AI and machine learning workloads, Axion processors offer high multi-core CPU throughput, efficient tensor computation, and improved performance per watt, so CPU-based training and inference can scale across the available cores. These capabilities make Axion Arm-based systems suitable for neural network training, benchmarking, experiment validation, and scalable AI development pipelines.
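
Because CPU throughput depends on the architecture and the number of cores a VM exposes, it can be useful to confirm both before running any workload. The following is a minimal sketch that uses only the Python standard library; the helper names `describe_cpu` and `is_arm64` are illustrative and aren't part of PyTorch or DeepSpeed.

```python
import os
import platform


def describe_cpu() -> dict:
    """Return basic facts about the host CPU as seen by Python."""
    return {
        # Typically "aarch64" on Arm64 Linux and "x86_64" on x86 Linux.
        "machine": platform.machine(),
        # os.cpu_count() can return None, so fall back to 1.
        "logical_cpus": os.cpu_count() or 1,
    }


def is_arm64(machine: str) -> bool:
    """Return True for the common Arm64 machine strings."""
    return machine.lower() in {"aarch64", "arm64"}


if __name__ == "__main__":
    info = describe_cpu()
    print(f"machine={info['machine']} logical_cpus={info['logical_cpus']}")
    print("Arm64 host" if is_arm64(info["machine"]) else "Not an Arm64 host")
```

On a C4A VM you would expect the script to report an Arm64 host, with a logical CPU count that matches the vCPU count of the machine type you selected.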

To learn more, see the Google blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).

## DeepSpeed for scalable AI training on Arm

DeepSpeed is an open-source deep learning optimization framework developed by Microsoft to enable efficient and scalable training of large AI models. It is widely used for distributed deep learning, memory optimization, large language model (LLM) training, efficient inference execution, and high-performance AI workloads.

DeepSpeed provides a unified optimization platform with capabilities such as:

* ZeRO (Zero Redundancy Optimizer) memory optimization
* Distributed training acceleration
* Mixed precision training
* Pipeline and tensor parallelism
* Optimized inference execution
* Scalable AI workload management
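
To give a feel for what ZeRO memory optimization does, the sketch below estimates per-worker memory for model state under each ZeRO stage. It assumes the mixed-precision Adam accounting used in the ZeRO paper: about 2 bytes per parameter for weights, 2 for gradients, and 12 for optimizer state. Stage 1 partitions optimizer state across workers, stage 2 also partitions gradients, and stage 3 also partitions parameters. The function name is hypothetical, and the estimate ignores activations and temporary buffers, so treat it as a rough guide rather than a measurement.

```python
def zero_bytes_per_worker(num_params: int, num_workers: int, stage: int) -> float:
    """Estimate model-state bytes per worker for a given ZeRO stage (0 to 3).

    Assumes mixed-precision Adam: 2 bytes (weights) + 2 bytes (gradients)
    + 12 bytes (optimizer state) per parameter. Activations are not counted.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    params = 2.0 * num_params
    grads = 2.0 * num_params
    optim = 12.0 * num_params

    if stage >= 1:  # partition optimizer state
        optim /= num_workers
    if stage >= 2:  # also partition gradients
        grads /= num_workers
    if stage >= 3:  # also partition parameters
        params /= num_workers

    return params + grads + optim


if __name__ == "__main__":
    # Illustrative values only: a 100-million-parameter model on 4 workers.
    for stage in range(4):
        mib = zero_bytes_per_worker(100_000_000, 4, stage) / (1024 ** 2)
        print(f"ZeRO stage {stage}: about {mib:.0f} MiB of model state per worker")
```

With a single worker, every stage yields the same estimate, which is why ZeRO's memory savings only appear in distributed runs.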

Running DeepSpeed on Google Axion C4A Arm-based infrastructure lets you build CPU-based AI training and benchmarking workflows that take advantage of multi-core Arm processors and their memory performance. Depending on the workload, this can improve performance per watt, reduce infrastructure costs, and provide scalable execution for AI experimentation and model training.
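
DeepSpeed features are usually selected through a JSON configuration. The sketch below builds a small configuration as a Python dictionary and prints it as JSON. The keys shown (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `steps_per_print`, `zero_optimization`, and `bf16`) are standard DeepSpeed configuration keys, but the values and the `build_ds_config` helper are illustrative assumptions, not the settings this Learning Path uses.

```python
import json


def build_ds_config(micro_batch_size: int = 32, zero_stage: int = 1) -> dict:
    """Build an illustrative DeepSpeed configuration dictionary."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")
    return {
        # Despite the key name, this is the per-worker micro-batch size.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": 1,
        "steps_per_print": 10,
        "zero_optimization": {"stage": zero_stage},
        # Assumption: mixed precision is left off in this sketch.
        "bf16": {"enabled": False},
    }


if __name__ == "__main__":
    print(json.dumps(build_ds_config(), indent=2))
```

A dictionary like this is typically passed to `deepspeed.initialize` through its `config` argument, or saved to a file and referenced from the command line.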

On SUSE Linux Enterprise Server (SLES) for Arm64, compiling some of DeepSpeed's native CPU communication extensions requires a newer GCC toolchain. For this reason, this Learning Path installs DeepSpeed in compatibility mode and runs workloads with PyTorch CPU execution, which provides a stable way to validate and benchmark AI workloads on GCP Axion Arm64 processors.
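
In a setup like this, it helps to confirm that both packages are importable before running a workload. The sketch below checks only whether `torch` and `deepspeed` can be found in the current Python environment; it doesn't import them or build any native extensions. The helper names are hypothetical.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def report(packages=("torch", "deepspeed")) -> dict:
    """Map each package name to whether it is available."""
    return {name: package_available(name) for name in packages}


if __name__ == "__main__":
    for name, found in report().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either package is reported as missing, activate the Python environment where you installed it and run the check again.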

Common use cases include neural network training, AI benchmarking, scalable experimentation pipelines, distributed AI research environments, and CPU-based inference validation workflows.

To learn more, see the [DeepSpeed documentation](https://www.deepspeed.ai/) and the [DeepSpeed GitHub repository](https://github.com/microsoft/DeepSpeed).

## What you've learned and what's next

You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and machine learning workloads. You were also introduced to core DeepSpeed capabilities including distributed training optimization, ZeRO memory optimization, scalable AI execution, and CPU-based AI benchmarking workflows.

Next, you'll set up PyTorch and DeepSpeed on a GCP Axion Arm64 virtual machine, configure a Python AI/ML environment, and begin running AI training and benchmarking workloads on Arm processors.