---
title: Build an offline voice assistant with whisper and vLLM
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Benefits of running a voice assistant offline

Voice-based AI assistants are becoming essential in customer support, productivity tools, and embedded interfaces. For example, a retail kiosk might need to answer product-related questions verbally without relying on internet access. However, many of these systems depend heavily on cloud services for speech recognition and language understanding, raising concerns around latency, cost, and data privacy.

You avoid unpredictable latency caused by network fluctuations, prevent sensitive voice data from leaving the device, and eliminate recurring cloud costs.

By combining local speech-to-text (STT) with a locally hosted large language model (LLM), you gain complete control over the pipeline and eliminate API dependencies. You can experiment, customize, and scale without relying on external services.

## Challenges of building a local voice assistant

While the benefits are clear, building a local voice assistant involves several engineering challenges.

Real-time audio segmentation requires reliably identifying when users start and stop speaking, accounting for natural pauses and background noise. Timing mismatches between STT and LLM components can cause delayed responses or repeated input, reducing conversational quality. You also need to balance CPU/GPU workloads to keep the pipeline responsive without overloading resources or blocking audio capture.
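
A minimal way to see the segmentation challenge is as an energy-gated grouping of audio frames. The sketch below is illustrative only: the RMS threshold and frame-grouping rules are hypothetical simplifications of what a real voice activity detector such as webrtcvad does.

```python
import math

def is_speech(frame, threshold=500.0):
    # Treat a frame of 16-bit PCM samples as speech if its RMS energy
    # exceeds a (hypothetical, tunable) threshold.
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold

def split_segments(frames, threshold=500.0, max_silence=3):
    # Group consecutive speech frames into segments; close a segment
    # after max_silence consecutive silent frames (the "pause" rule).
    segments, current, silence = [], [], 0
    for frame in frames:
        if is_speech(frame, threshold):
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence:
                segments.append(current)
                current, silence = [], 0
    if current:
        segments.append(current)
    return segments

# Synthetic demo: two bursts of "speech" separated by silence.
loud, quiet = [2000] * 160, [10] * 160
frames = [loud, loud, quiet, quiet, quiet, loud, quiet, quiet, quiet]
print(len(split_segments(frames)))  # 2
```

Tuning `threshold` and `max_silence` is exactly the trade-off explored later: too aggressive and natural pauses split sentences; too lenient and background noise leaks into segments.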

## Why run offline voice AI on Arm-based DGX Spark?

Arm-powered platforms like [DGX Spark](https://www.nvidia.com/en-gb/products/workstations/dgx-spark/) allow efficient parallelism: use CPU cores for audio preprocessing and whisper inference, while offloading LLM reasoning to powerful GPUs. This architecture balances throughput and energy efficiency, making it ideal for private, on-premises AI workloads. To understand the CPU and GPU architecture of DGX Spark, refer to [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/).

DGX Spark also supports standard USB interfaces, making it easy to connect consumer-grade microphones for development or deployment. This makes it viable for edge inference and desktop-style prototyping.

In this Learning Path, you'll build a complete, offline voice chatbot prototype using PyAudio, faster-whisper, and vLLM on an Arm-based system, resulting in a fully functional assistant that runs entirely on local hardware with no internet dependency.
---
weight: 3
layout: learningpathall
---

## Set up faster-whisper for offline speech recognition

[faster-whisper](https://github.com/SYSTRAN/faster-whisper) is a high-performance reimplementation of OpenAI Whisper, designed to significantly reduce transcription latency and memory usage. It's well suited for local and real-time speech-to-text (STT) pipelines, especially when running on CPU-only systems or hybrid CPU/GPU environments.

You'll use faster-whisper as the STT engine to convert raw microphone input into structured text. At this stage, the goal is to install faster-whisper correctly and verify that it can transcribe audio reliably. Detailed tuning and integration are covered in later sections.

### Install build dependencies

```bash
sudo apt install python3.12 python3.12-venv python3.12-dev -y
sudo apt install gcc portaudio19-dev ffmpeg -y
```

## Create and activate a Python environment

In particular, [pyaudio](https://pypi.org/project/PyAudio/) (used for real-time microphone capture) depends on the PortAudio library and the Python C API. These must match the version of Python you're using.

Set up an isolated Python environment for your voice assistant project to prevent dependency conflicts and make your installation reproducible.

```bash
python3.12 -m venv va_env
pip install requests webrtcvad sounddevice==0.5.3
```

{{% notice Note %}}
While sounddevice==0.5.4 is available, it introduces callback-related errors during audio stream cleanup that can confuse beginners.
Use sounddevice==0.5.3, which is stable and avoids these warnings.
{{% /notice %}}

```output
Recording for 10 seconds...
```

{{% notice Note %}}
To stop the script, press Ctrl+C during any transcription loop. The current 10-second recording completes and transcribes before the program exits cleanly.
Don't use Ctrl+Z, which suspends the process instead of terminating it.
{{% /notice %}}


```bash
pip install sounddevice==0.5.3
```

You can record audio without errors, but nothing is played back.

Ensure that your USB microphone or headset is selected as the default input/output device. Also check that the system volume isn't muted.

**Fix:** List all available audio devices:

---
weight: 4
layout: learningpathall
---

## Build a CPU-based speech-to-text engine

In this section, you'll build a real-time speech-to-text (STT) pipeline using only the CPU. Starting from a basic 10-second recorder, you'll incrementally add noise filtering, sentence segmentation, and parallel audio processing to achieve a transcription engine for Arm-based systems like DGX Spark.

You'll start from a minimal loop and iterate toward a multithreaded, VAD-enhanced STT engine.
When you speak to the device, the output is similar to:

{{% notice Note %}}
faster-whisper supports many models like tiny, base, small, medium and large-v1/2/3.
See the [GitHub repository](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) for more model details.
{{% /notice %}}


When you say a long sentence with multiple clauses, the output is similar to:

```output
Segment done.
```

The result is a smoother and more accurate voice UX, particularly important when integrating with downstream LLMs in later sections.

### Demo: Real-time speech transcription on Arm CPU with faster-whisper

This demo shows the real-time transcription pipeline in action, running on an Arm-based DGX Spark system. Using a USB microphone and the faster-whisper model (`medium.en`), the system records voice input, processes it on the CPU, and returns accurate transcriptions with timestamps, all without relying on cloud services.

Notice the clean terminal output and low latency, demonstrating how the pipeline is optimized for local, real-time voice recognition on resource-efficient hardware.

![Real-time speech transcription demo with volume visualization#center](fasterwhipser_demo1.gif "Real-time speech transcription with audio volume bar")

The device runs audio capture and transcription in parallel. Use `threading.Thread` to collect audio without blocking, store audio frames in a `queue.Queue`, and in the main thread, poll for new data and run STT.
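
That pattern can be sketched as a producer/consumer pair. This is a simplified, hypothetical illustration: the capture thread pushes dummy byte frames instead of real PyAudio buffers, and the consumer only counts frames where the real loop would run faster-whisper.

```python
import queue
import threading

audio_q = queue.Queue()

def capture_audio(n_frames):
    # Producer thread: in the real pipeline a PyAudio callback would push
    # microphone buffers here; we simulate 10 ms frames of silence.
    for _ in range(n_frames):
        audio_q.put(b"\x00" * 320)
    audio_q.put(None)  # sentinel: capture finished

def stt_loop():
    # Consumer (main thread): poll the queue for new audio; a real loop
    # would batch frames and call the whisper model here.
    received = 0
    while True:
        frame = audio_q.get()
        if frame is None:
            break
        received += 1
    return received

producer = threading.Thread(target=capture_audio, args=(50,))
producer.start()
count = stt_loop()
producer.join()
print(count)  # 50
```

The queue decouples the two rates: capture never blocks on transcription, and slow STT only grows the backlog rather than dropping audio.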

---
weight: 5
layout: learningpathall
---

## Optimize speech segmentation for your environment

After applying the previous steps (model upgrade, VAD, smart turn detection, and multi-threaded audio collection), you now have a high-quality, CPU-based local speech-to-text system.

At this stage, the core pipeline is complete. What remains is fine-tuning: adapting the system to your environment, microphone setup, and speaking style. This flexibility is one of the key advantages of a fully local STT pipeline.

Adjust this setting based on background noise and microphone quality.

### Tuning `MIN_SPEECH_SEC` and `SILENCE_LIMIT_SEC`

- `MIN_SPEECH_SEC`: This parameter defines the minimum duration of detected speech needed before a segment is considered valid. Use this to filter out very short utterances such as false starts or background chatter.
- Lower values: More responsive, but may capture incomplete phrases or noise
- Higher values: More stable sentences, but slower response

Based on practical experiments, the following presets provide a good starting point:

| Scenario | `MIN_SPEECH_SEC` | `SILENCE_LIMIT_SEC` | Notes |
|----------------------|----------------------|-------------------------|-------------------|
| Short command phrases | 0.8 | 0.6 | Optimized for quick voice commands such as "yes", "next", or "stop". Prioritizes responsiveness over sentence completeness. |
| Natural conversational speech | 1.0 | 1.0 | Balanced settings for everyday dialogue with natural pauses between phrases. |
| Long-form explanations such as tutorials | 2.0 | 2.0 | Designed for longer sentences and structured explanations, reducing the risk of premature segmentation. |
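
As a sketch of how presets like these could be applied in code (the preset names and the `is_valid_segment` helper are hypothetical, not part of the Learning Path's scripts):

```python
# Hypothetical preset table mirroring the values above.
PRESETS = {
    "commands": {"MIN_SPEECH_SEC": 0.8, "SILENCE_LIMIT_SEC": 0.6},
    "conversation": {"MIN_SPEECH_SEC": 1.0, "SILENCE_LIMIT_SEC": 1.0},
    "long_form": {"MIN_SPEECH_SEC": 2.0, "SILENCE_LIMIT_SEC": 2.0},
}

def is_valid_segment(speech_sec, trailing_silence_sec, preset="conversation"):
    # Accept a segment once enough speech has accumulated AND the speaker
    # has paused for at least the configured silence limit.
    cfg = PRESETS[preset]
    return (speech_sec >= cfg["MIN_SPEECH_SEC"]
            and trailing_silence_sec >= cfg["SILENCE_LIMIT_SEC"])

print(is_valid_segment(1.2, 1.1))               # True
print(is_valid_segment(0.5, 1.1, "commands"))   # False: utterance too short
```

Switching presets then becomes a one-word change rather than editing two constants scattered through the script.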

## Apply these settings

---
weight: 6
layout: learningpathall
---

## Deploy vLLM for local language generation

In the previous section, you built a complete Speech-to-Text (STT) engine using faster-whisper, running efficiently on Arm-based CPUs. Now it's time to add the next building block: a local large language model (LLM) that can generate intelligent responses from user input.

You'll integrate [vLLM](https://vllm.ai/), a high-performance LLM inference engine that runs on GPU and supports advanced features such as continuous batching, OpenAI-compatible APIs, and quantized models.
vLLM is especially effective in hybrid systems like the DGX Spark, where CPU cores handle audio preprocessing and speech-to-text while the GPU runs LLM inference.

### Install and launch vLLM with GPU acceleration

In this section, you'll install and launch vLLM, an optimized large language model (LLM) inference engine that runs efficiently on GPU. This component will complete your local speech-to-response pipeline by transforming transcribed text into intelligent replies.

#### Install Docker and pull vLLM image

```output
nvcr.io/nvidia/vllm   25.11-py3   d33d4cadbe0f   2 months ago
```

#### Download a quantized model (GPTQ)

Use Hugging Face CLI to download a pre-quantized LLM such as Mistral-7B-Instruct-GPTQ and Meta-Llama-3-70B-Instruct-GPTQ models for real-time AI conversations.

```bash
pip install huggingface_hub
```

```bash
docker run -it --gpus all -p 8000:8000 \
```

{{% notice Note %}}
The first launch compiles and caches the model. To reduce startup time in future runs, consider creating a Docker snapshot with docker commit.
{{% /notice %}}

You can also check your NVIDIA driver and CUDA compatibility during the vLLM launch by looking at the output.
---
weight: 7
layout: learningpathall
---

## Integrate STT with vLLM for voice interaction

Now that both faster-whisper and vLLM are working independently, it's time to connect them into a real-time speech-to-response pipeline. Your system will listen to live audio, transcribe it, and send the transcription to vLLM to generate an intelligent reply, all running locally without cloud services.

### Dual process architecture: vLLM and STT

This separation has several advantages:

Separating container startup from model launch provides greater control and improves development experience.

By launching the container first, you can troubleshoot errors like model path issues or GPU memory limits directly inside the environment, without the container shutting down immediately. It also speeds up iteration: you avoid reloading the entire image each time you tweak settings or restart the model.

This structure also improves visibility. You can inspect files, monitor GPU usage, or run diagnostics like `curl` and `nvidia-smi` inside the container. Breaking these steps apart makes the process easier to understand, debug, and extend.

```bash
vllm serve /models/mistral-7b \
--dtype float16
```

Look for "Application startup complete." in the output:

```output
(APIServer pid=1) INFO: Started server process [1]
```

```python
print(f" AI : {reply}\n")
```
This architecture mirrors the OpenAI Chat API design, enabling future enhancements like system-level prompts, multi-turn history, or role-specific behavior.
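
For example, multi-turn history could be layered on top of the same endpoint. The sketch below is an assumption, not the Learning Path's client: it targets vLLM's OpenAI-compatible `/v1/chat/completions` route, and the `chat` helper only works with the server from the previous step running at `localhost:8000`.

```python
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # local vLLM server
MODEL_NAME = "/models/mistral-7b"  # must match the path passed to vllm serve

def build_messages(history, user_text,
                   system_prompt="You are a helpful voice assistant."):
    # Assemble an OpenAI-style message list: system prompt, prior turns,
    # then the new user utterance.
    return ([{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_text}])

def chat(history, user_text):
    # Send the whole conversation so far, then record both turns in the
    # history so follow-up questions keep their context.
    payload = {"model": MODEL_NAME,
               "messages": build_messages(history, user_text)}
    resp = requests.post(API_URL, json=payload, timeout=120).json()
    reply = resp["choices"][0]["message"]["content"]
    history += [{"role": "user", "content": user_text},
                {"role": "assistant", "content": reply}]
    return reply
```

Because `history` grows with each turn, a production client would also trim or summarize old turns to stay within the model's context window.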

{{% notice tip %}}
If you encounter a "model doesn't exist" error, double-check the model path you used when launching vLLM. It must match MODEL_NAME exactly.
{{% /notice %}}

Switch to another terminal and save the following Python code in a file named `stt-client.py`:
If your input is too short, you'll see:

```output
Skipped short segment (1.32s < 2.0s)
```

{{% notice Tip %}}You can fine-tune these parameters in future sections to better fit your speaking style or environment.{{% /notice %}}

## What you've accomplished and what's next
