ArmDeveloperEcosystem · pareenaverma · May 15, 2026 · May 11, 2026 · May 15, 2026
diff --git a/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_index.md b/content/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_index.md
@@ -0,0 +1,66 @@
+---
+title: Train and Benchmark AI Workloads with DeepSpeed on Google Cloud C4A Axion VM
+
+draft: true
+cascade:
+    draft: true
+
+description: Set up PyTorch and DeepSpeed on Google Cloud C4A Axion Arm VMs running SUSE Linux to train neural network models, benchmark AI workloads, and validate scalable CPU-based AI execution on Arm64 processors.
+
+minutes_to_complete: 30
+
+who_is_this_for: This is an introductory topic for DevOps engineers, ML engineers, and software developers who want to train and benchmark AI workloads with PyTorch and DeepSpeed on SUSE Linux Enterprise Server (SLES) for Arm64 and validate CPU-based neural network execution on Arm processors.
+
+learning_objectives:
+    - Install and configure PyTorch and DeepSpeed on a Google Cloud C4A Axion Arm64 VM
+    - Create and execute neural network training workloads using PyTorch
+    - Benchmark CPU-based AI workloads on Arm64 processors
+    - Validate scalable AI execution and workload performance on GCP Axion Arm VMs
+
+prerequisites:
+  - A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
+  - Basic familiarity with Python and machine learning concepts
+
+author: Pareena Verma
+
+##### Tags
+skilllevels: Introductory
+subjects: ML
+cloud_service_providers:
+  - Google Cloud
+
+armips:
+  - Neoverse
+
+tools_software_languages:
+  - DeepSpeed
+  - PyTorch
+  - Python
+
+operatingsystems:
+  - Linux
+
+# ================================================================================
+#       FIXED, DO NOT MODIFY
+# ================================================================================
+
+further_reading:
+  - resource:
+      title: DeepSpeed official documentation
+      link: https://www.deepspeed.ai/
+      type: documentation
+
+  - resource:
+      title: DeepSpeed GitHub repository
+      link: https://github.com/microsoft/DeepSpeed
+      type: documentation
+
+  - resource:
+      title: PyTorch documentation
+      link: https://pytorch.org/docs/stable/index.html
+      type: documentation
+
+weight: 1
+layout: "learningpathall"
+learning_path_main_page: yes
+---
diff --git a/...nt/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_next-steps.md b/...nt/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+#       FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21                  # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps"         # Always the same, html page title.
+layout: "learningpathall"   # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/...ent/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/background.md b/...ent/learning-paths/servers-and-cloud-computing/deepspeed-on-axion/background.md
@@ -0,0 +1,43 @@
+---
+title: Learn about DeepSpeed and Google Axion C4A for AI training
+weight: 2
+
+layout: "learningpathall"
+---
+
+## Google Axion C4A Arm instances for AI and machine learning
+
+Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for AI, machine learning, data analytics, and modern cloud-native workloads.
+
+The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and efficiency advantages of the Arm architecture in Google Cloud.
+
+For AI and machine learning workloads, Axion processors provide high multi-core CPU throughput, efficient tensor computation, strong performance per watt, and CPU execution that scales across cores for both training and inference. These capabilities make Axion-based VMs suitable for neural network training, benchmarking, experiment validation, and AI development pipelines.
+
+To learn more, see the Google blog [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu).
+
+## DeepSpeed for scalable AI training on Arm
+
+DeepSpeed is an open-source deep learning optimization framework developed by Microsoft to enable efficient and scalable training of large AI models. It is widely used for distributed deep learning, memory optimization, large language model (LLM) training, efficient inference execution, and high-performance AI workloads.
+
+DeepSpeed provides a unified optimization platform with capabilities such as:
+
+* ZeRO (Zero Redundancy Optimizer) memory optimization
+* Distributed training acceleration
+* Mixed precision training
+* Pipeline and tensor parallelism
+* Optimized inference execution
+* Scalable AI workload management
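To make the ZeRO item above concrete, here is a minimal sketch of how its three stages reduce per-worker memory for model states. It assumes the mixed-precision Adam accounting described in the ZeRO paper (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state), which is not taken from this Learning Path. The function name `zero_model_state_gb` is hypothetical, and activations, buffers, and fragmentation are ignored. A CPU run in full fp32 would use different byte counts.

```python
# Rough per-worker model-state memory under ZeRO.
# Assumption: mixed-precision Adam accounting from the ZeRO paper, not from
# this Learning Path. Activations and temporary buffers are not counted.

FP16_PARAM_BYTES = 2   # fp16 copy of the weights
FP16_GRAD_BYTES = 2    # fp16 gradients
OPTIMIZER_BYTES = 12   # fp32 weights copy + Adam momentum + Adam variance


def zero_model_state_gb(num_params: float, workers: int, stage: int) -> float:
    """Estimate per-worker model-state memory in GB (1 GB = 1e9 bytes).

    `workers` is the data-parallel degree. Stage 0 means no partitioning,
    stage 1 partitions optimizer state, stage 2 also partitions gradients,
    and stage 3 also partitions the parameters themselves.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")

    params = FP16_PARAM_BYTES
    grads = FP16_GRAD_BYTES
    optim = OPTIMIZER_BYTES
    if stage >= 1:
        optim /= workers
    if stage >= 2:
        grads /= workers
    if stage >= 3:
        params /= workers
    return num_params * (params + grads + optim) / 1e9


if __name__ == "__main__":
    # Illustrative values only: a 7.5B-parameter model on 64 workers.
    for stage in range(4):
        gb = zero_model_state_gb(7.5e9, workers=64, stage=stage)
        print(f"ZeRO stage {stage}: {gb:.1f} GB per worker")
```

Under these assumptions, a 7.5B-parameter model across 64 workers drops from about 120 GB per worker with no partitioning to about 31.4 GB, 16.6 GB, and 1.9 GB at stages 1, 2, and 3. Treat the numbers as an estimate for reasoning about stage selection, not as a measurement on Axion.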
+
+Running DeepSpeed on Google Axion C4A Arm-based infrastructure supports CPU-based AI training and benchmarking workflows that make use of the multi-core Arm processors and their memory performance. This can improve performance per watt and reduce infrastructure costs, while still allowing AI experimentation and model training workloads to scale.
+
+On SUSE Linux Enterprise Server (SLES) for Arm64, some of DeepSpeed's native CPU communication extensions require a newer GCC toolchain to compile. For this reason, this Learning Path uses a compatibility-mode DeepSpeed installation together with PyTorch CPU execution to provide stable AI workload validation and benchmarking on GCP Axion Arm64 processors.
+
+Common use cases include neural network training, AI benchmarking, scalable experimentation pipelines, distributed AI research environments, and CPU-based inference validation workflows.
+
+To learn more, see the [DeepSpeed documentation](https://www.deepspeed.ai/) and the [DeepSpeed GitHub repository](https://github.com/microsoft/DeepSpeed).
+
+## What you've learned and what's next
+
+You've now learned about Google Axion C4A Arm-based virtual machines and their performance advantages for AI and machine learning workloads. You were also introduced to core DeepSpeed capabilities including distributed training optimization, ZeRO memory optimization, scalable AI execution, and CPU-based AI benchmarking workflows.
+
+Next, you'll set up PyTorch and DeepSpeed on a GCP Axion Arm64 virtual machine, configure a Python AI/ML environment, and begin running AI training and benchmarking workloads on Arm processors.
diff --git a/...g-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-pubip-ssh.png b/...g-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-pubip-ssh.png
diff --git a/...rning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-shell.png b/...rning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-shell.png
diff --git a/...learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-vm.png b/...learning-paths/servers-and-cloud-computing/deepspeed-on-axion/images/gcp-vm.png