Merged
35 commits
9514223
Secure Multi-Architecture Containers with Trivy on Azure Cobalt 100
odidev Jan 28, 2026
cfcdc9d
new learning path: Unleashing leading On-Device AI performance with l…
Feb 5, 2026
9c461b5
First draft of PyTorch fine-tuning on DGX Spark
mhall119 Feb 9, 2026
42b7ba9
Complete tech review of DGX Spark voice assistant.
jasonrandrews Feb 9, 2026
3a63779
Merge pull request #2884 from jasonrandrews/review2
jasonrandrews Feb 9, 2026
6624721
Remove tune-pytorch-cpu-perf-with-threads Learning Path - no longer r…
pareenaverma Feb 9, 2026
62c88b1
added alt text to svgs
Feb 9, 2026
7311f18
Merge branch 'ArmDeveloperEcosystem:main' into main
zachlasiuk Feb 9, 2026
160f6d8
x
Feb 9, 2026
a8cd073
x
Feb 9, 2026
0cf1426
Merge pull request #2885 from pareenaverma/content_review
pareenaverma Feb 9, 2026
bea2d1a
Minor updates to NX related LPs
annietllnd Feb 9, 2026
ff6e34e
Merge pull request #2887 from annietllnd/fix
pareenaverma Feb 10, 2026
70fd2e2
Merge pull request #2886 from zachlasiuk/main
pareenaverma Feb 10, 2026
2ce0cd2
Add note about changing tokenizer_class to support transformers<5.0.0
mhall119 Feb 10, 2026
58d1050
Merge pull request #2850 from odidev/trivy_LP
jasonrandrews Feb 11, 2026
69970be
Trivy to draft mode for tech review
jasonrandrews Feb 11, 2026
545dcfc
Merge pull request #2890 from jasonrandrews/review2
jasonrandrews Feb 11, 2026
32360de
Merge pull request #2888 from mhall119/mhall/spark-finetuning
jasonrandrews Feb 11, 2026
d93d2b7
Finetuning with PyTorch to draft mode for tech review.
jasonrandrews Feb 11, 2026
6c4bf30
Merge pull request #2891 from jasonrandrews/review2
jasonrandrews Feb 11, 2026
a177dd3
Update _index.md
pareenaverma Feb 11, 2026
de80412
Merge pull request #2876 from zenonxiu81/performance_llama_cpp_sme2
pareenaverma Feb 11, 2026
a22ed15
Refine documentation for voice chatbot setup and integration, improvi…
madeline-underwood Feb 12, 2026
88c8511
Update the learning path for the DGX Spark Voice Chatbot to include t…
madeline-underwood Feb 12, 2026
25ee43f
Enhance documentation for offline voice chatbot setup and integration…
madeline-underwood Feb 12, 2026
913c500
Refine documentation for offline voice assistant, correcting phrasing…
madeline-underwood Feb 12, 2026
0d9fe2b
Enhance documentation on context-aware dialogue by extending short-te…
madeline-underwood Feb 12, 2026
bd36b2d
Updates
madeline-underwood Feb 12, 2026
029a395
tech review of fine tuning on DGX Spark Learning Path
jasonrandrews Feb 13, 2026
f213786
Merge pull request #2892 from madeline-underwood/offline_voice
jasonrandrews Feb 13, 2026
ba81967
Merge pull request #2893 from jasonrandrews/review2
jasonrandrews Feb 13, 2026
a98fa1f
spelling updates
jasonrandrews Feb 13, 2026
0c671ce
spelling updates
jasonrandrews Feb 13, 2026
0c14f81
Merge pull request #2894 from jasonrandrews/review2
jasonrandrews Feb 13, 2026
14 changes: 13 additions & 1 deletion .wordlist.txt
@@ -5644,4 +5644,16 @@ Numbat
SKUs
asct
geminicli
passwordless
passwordless
AWQ
Coqui
GPTQ
PortAudio
PyAudio
Riva
UX
actionability
customizations
pyaudio
sounddevice
webrtcvad
@@ -1,12 +1,12 @@
---
title: Learn about offline voice assistants
title: Build an offline voice assistant with whisper and vLLM
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Why build an offline voice assistant?
## Benefits of running a voice assistant offline

Voice-based AI assistants are becoming essential in customer support, productivity tools, and embedded interfaces. For example, a retail kiosk might need to answer product-related questions verbally without relying on internet access. However, many of these systems depend heavily on cloud services for speech recognition and language understanding, raising concerns around latency, cost, and data privacy.

@@ -16,16 +16,16 @@ You avoid unpredictable latency caused by network fluctuations, prevent sensitiv

By combining local speech-to-text (STT) with a locally hosted large language model (LLM), you gain complete control over the pipeline and eliminate API dependencies. You can experiment, customize, and scale without relying on external services.

## What are some common development challenges?
## Challenges of building a local voice assistant

While the benefits are clear, building a local voice assistant involves several engineering challenges.

Real-time audio segmentation requires reliably identifying when users start and stop speaking, accounting for natural pauses and background noise. Timing mismatches between STT and LLM components can cause delayed responses or repeated input, reducing conversational quality. You also need to balance CPU/GPU workloads to keep the pipeline responsive without overloading resources or blocking audio capture.

## Why use Arm and DGX Spark?
## Why run offline voice AI on Arm-based DGX Spark?

Arm-powered platforms like [DGX Spark](https://www.nvidia.com/en-gb/products/workstations/dgx-spark/) allow efficient parallelism: use CPU cores for audio preprocessing and whisper inference, while offloading LLM reasoning to powerful GPUs. This architecture balances throughput and energy efficiency—ideal for private, on-premises AI workloads. To understand the CPU and GPU architecture of DGX Spark, refer to [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/).
Arm-powered platforms like [DGX Spark](https://www.nvidia.com/en-gb/products/workstations/dgx-spark/) allow efficient parallelism: use CPU cores for audio preprocessing and whisper inference, while offloading LLM reasoning to powerful GPUs. This architecture balances throughput and energy efficiency-ideal for private, on-premises AI workloads. To understand the CPU and GPU architecture of DGX Spark, refer to [Unlock quantized LLM performance on Arm-based NVIDIA DGX Spark](/learning-paths/laptops-and-desktops/dgx_spark_llamacpp/).

DGX Spark also supports standard USB interfaces, making it easy to connect consumer-grade microphones for development or deployment. This makes it viable for edge inference and desktop-style prototyping.

In this Learning Path, you'll build a complete, offline voice chatbot prototype using PyAudio, faster-whisper, and vLLM on an Arm-based system—resulting in a fully functional assistant that runs entirely on local hardware with no internet dependency.
In this Learning Path, you'll build a complete, offline voice chatbot prototype using PyAudio, faster-whisper, and vLLM on an Arm-based system-resulting in a fully functional assistant that runs entirely on local hardware with no internet dependency.
@@ -6,9 +6,11 @@ weight: 3
layout: learningpathall
---

[Faster‑whisper](https://github.com/SYSTRAN/faster-whisper) is a high‑performance reimplementation of OpenAI Whisper, designed to significantly reduce transcription latency and memory usage. It is well suited for local and real‑time speech‑to‑text (STT) pipelines, especially when running on CPU‑only systems or hybrid CPU/GPU environments.
## Set up faster-whisper for offline speech recognition

You'll use faster‑whisper as the STT engine to convert raw microphone input into structured text. At this stage, the goal is to install faster‑whisper correctly and verify that it can transcribe audio reliably. Detailed tuning and integration are covered in later sections.
[faster-whisper](https://github.com/SYSTRAN/faster-whisper) is a high-performance re-implementation of OpenAI Whisper, designed to significantly reduce transcription latency and memory usage. It's well suited for local and real-time speech-to-text (STT) pipelines, especially when running on CPU-only systems or hybrid CPU/GPU environments.

You'll use faster-whisper as the STT engine to convert raw microphone input into structured text. At this stage, the goal is to install faster-whisper correctly and verify that it can transcribe audio reliably. Detailed tuning and integration are covered in later sections.
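
Once the installation steps below are complete, a quick check like the following confirms that faster-whisper can load a model and produce a transcription. This is a minimal sketch, not part of the Learning Path's scripts: the model size and the `sample.wav` path are placeholders you can swap for your own.

```python
from faster_whisper import WhisperModel

# Small English-only model on the CPU; int8 keeps memory usage low
model = WhisperModel("base.en", device="cpu", compute_type="int8")

# Transcribe any short WAV file on disk (the path is a placeholder)
segments, info = model.transcribe("sample.wav")

print(f"Detected language: {info.language} (probability {info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```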

### Install build dependencies

@@ -22,11 +24,11 @@ sudo apt install python3.12 python3.12-venv python3.12-dev -y
sudo apt install gcc portaudio19-dev ffmpeg -y
```

## Create and activate Python environment
## Create and activate a Python environment

In particular, [pyaudio](https://pypi.org/project/PyAudio/) (used for real-time microphone capture) depends on the PortAudio library and the Python C API. These must match the version of Python you're using.

Now that the system libraries are in place and audio input is verified, it's time to set up an isolated Python environment for your voice assistant project. This will prevent dependency conflicts and make your installation reproducible.
Set up an isolated Python environment for your voice assistant project to prevent dependency conflicts and make your installation reproducible.

```bash
python3.12 -m venv va_env
@@ -53,7 +55,7 @@ pip install requests webrtcvad sounddevice==0.5.3
```

{{% notice Note %}}
While sounddevice==0.5.4 is available, it introduces callback-related errors during audio stream cleanup that may confuse beginners.
While sounddevice==0.5.4 is available, it introduces callback-related errors during audio stream cleanup that can confuse beginners.
Use sounddevice==0.5.3, which is stable and avoids these warnings.
{{% /notice %}}

@@ -162,7 +164,7 @@ Recording for 10 seconds...

{{% notice Note %}}
To stop the script, press Ctrl+C during any transcription loop. The current 10-second recording completes and transcribes before the program exits cleanly.
Avoid using Ctrl+Z, which suspends the process instead of terminating it.
Don't use Ctrl+Z, which suspends the process instead of terminating it.
{{% /notice %}}
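
One common way to get the clean-exit behavior described in the note is to catch the interrupt signal and let the current record-and-transcribe cycle finish before leaving the loop. The sketch below shows the pattern only; it is not the Learning Path's exact code, and the `time.sleep` call stands in for the real record and transcribe steps.

```python
import signal
import time

stop_requested = False

def request_stop(signum, frame):
    # Defer shutdown so the cycle in progress can finish first
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, request_stop)  # Ctrl+C sends SIGINT

while not stop_requested:
    time.sleep(10)                 # stand-in for recording 10 seconds of audio
    print("Segment transcribed.")  # stand-in for the faster-whisper call

print("Stopping cleanly.")
```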


@@ -189,7 +191,7 @@ pip install sounddevice==0.5.3

You can record audio without errors, but nothing is played back.

Verify that your USB microphone or headset is selected as the default input/output device. Also ensure the system volume is not muted.
Ensure that your USB microphone or headset is selected as the default input/output device. Also check that the system volume isn't muted.

**Fix:** List all available audio devices:
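
With the `sounddevice` package installed, one way to do this is the short sketch below; the Learning Path's exact command may differ, and your device names and indices certainly will.

```python
import sounddevice as sd

# Prints an indexed table of every device PortAudio can see; the current
# default input and output devices are marked with '>' and '<'.
print(sd.query_devices())
```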

@@ -6,6 +6,8 @@ weight: 4
layout: learningpathall
---

## Build a CPU-based speech-to-text engine

In this section, you'll build a real-time speech-to-text (STT) pipeline using only the CPU. Starting from a basic 10-second recorder, you'll incrementally add noise filtering, sentence segmentation, and parallel audio processing to achieve a transcription engine for Arm-based systems like DGX Spark.

You'll start from a minimal loop and iterate toward a multithreaded, VAD-enhanced STT engine.
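
As a rough sketch of that starting point (the model size, sample rate, and variable names are illustrative and not the Learning Path's exact script), the minimal loop records a fixed 10-second window and hands the samples straight to faster-whisper:

```python
import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000    # Whisper models expect 16 kHz mono audio
RECORD_SECONDS = 10

model = WhisperModel("base.en", device="cpu", compute_type="int8")

while True:
    print("Recording for 10 seconds...")
    audio = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes

    # faster-whisper accepts a 1-D float32 NumPy array directly
    segments, _ = model.transcribe(audio.flatten(), language="en")
    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```
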
@@ -104,7 +106,7 @@ When you speak to the device, the output is similar to:

{{% notice Note %}}
faster-whisper supports many models like tiny, base, small, medium and large-v1/2/3.
Check the [GitHub repository](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) for more model details.
See the [GitHub repository](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages) for more model details.
{{% /notice %}}


@@ -238,15 +240,15 @@ When you say a long sentence with multiple clauses, the output is similar to:
Segment done.
```

The result is a smoother and more accurate voice UX—particularly important when integrating with downstream LLMs in later sections.
The result is a smoother and more accurate voice UX - particularly important when integrating with downstream LLMs in later sections.

### Demo: Real-time speech transcription on Arm CPU with faster-whisper

This demo shows the real-time transcription pipeline in action, running on an Arm-based DGX Spark system. Using a USB microphone and the faster-whisper model (`medium.en`), the system records voice input, processes it on the CPU, and returns accurate transcriptions with timestamps—all without relying on cloud services.
This demo shows the real-time transcription pipeline in action, running on an Arm-based DGX Spark system. Using a USB microphone and the faster-whisper model (`medium.en`), the system records voice input, processes it on the CPU, and returns accurate transcriptions with timestamps - all without relying on cloud services.

Notice the clean terminal output and low latency, demonstrating how the pipeline is optimized for local, real-time voice recognition on resource-efficient hardware.

![Real-time speech transcription demo with volume visualization#center](fasterwhipser_demo1.gif "Figure 1: Real-time speech transcription with audio volume bar")
![Real-time speech transcription demo with volume visualization alt-txt#center](fasterwhipser_demo1.gif "Real-time speech transcription with audio volume bar")

The device runs audio capture and transcription in parallel. Use `threading.Thread` to collect audio without blocking, store audio frames in a `queue.Queue`, and in the main thread, poll for new data and run STT.
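
A minimal sketch of that producer/consumer structure is shown below; the chunk size, model choice, and names are illustrative rather than the Learning Path's exact implementation.

```python
import queue
import threading

import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000
CHUNK_SECONDS = 5        # illustrative fixed chunk size

audio_q = queue.Queue()  # audio frames flow from the capture thread to the main thread
model = WhisperModel("base.en", device="cpu", compute_type="int8")

def capture_audio():
    """Producer: keep recording fixed-size chunks and enqueue them."""
    while True:
        chunk = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()
        audio_q.put(chunk.flatten())

threading.Thread(target=capture_audio, daemon=True).start()

# Consumer: the main thread polls the queue and runs STT on each chunk,
# so transcription never blocks audio capture.
while True:
    chunk = audio_q.get()
    segments, _ = model.transcribe(chunk, language="en")
    for segment in segments:
        print(segment.text.strip())
```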

@@ -6,7 +6,9 @@ weight: 5
layout: learningpathall
---

After applying the previous steps—model upgrade, VAD, smart turn detection, and multi-threaded audio collection—you now have a high-quality, CPU-based local speech-to-text system.
## Optimize speech segmentation for your environment

After applying the previous steps-model upgrade, VAD, smart turn detection, and multi-threaded audio collection - you now have a high-quality, CPU-based local speech-to-text system.

At this stage, the core pipeline is complete. What remains is fine-tuning: adapting the system to your environment, microphone setup, and speaking style. This flexibility is one of the key advantages of a fully local STT pipeline.

@@ -42,7 +44,7 @@ Adjust this setting based on background noise and microphone quality.

### Tuning `MIN_SPEECH_SEC` and `SILENCE_LIMIT_SEC`

- `MIN_SPEECH_SEC`: This parameter defines the minimum duration of detected speech required before a segment is considered valid. Use this to filter out very short utterances such as false starts or background chatter.
- `MIN_SPEECH_SEC`: This parameter defines the minimum duration of detected speech needed before a segment is considered valid. Use this to filter out very short utterances such as false starts or background chatter.
- Lower values: More responsive, but may capture incomplete phrases or noise
- Higher values: More stable sentences, but slower response
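
In the transcription script these typically appear as top-level constants that you edit to match your microphone and speaking style; a minimal illustrative sketch (values taken from the balanced preset below):

```python
# Segmentation tuning knobs, in seconds
MIN_SPEECH_SEC = 1.0      # ignore detected speech shorter than this
SILENCE_LIMIT_SEC = 1.0   # close the current segment after this much trailing silence
```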

@@ -58,7 +60,7 @@ Based on practical experiments, the following presets provide a good starting po
|----------------------|----------------------|-------------------------|-------------------|
| Short command phrases | 0.8 | 0.6 | Optimized for quick voice commands such as "yes", "next", or "stop". Prioritizes responsiveness over sentence completeness. |
| Natural conversational speech | 1.0 | 1.0 | Balanced settings for everyday dialogue with natural pauses between phrases. |
| Long-form explanations (for example, tutorials) | 2.0 | 2.0 | Designed for longer sentences and structured explanations, reducing the risk of premature segmentation. |
| Long-form explanations such as tutorials | 2.0 | 2.0 | Designed for longer sentences and structured explanations, reducing the risk of premature segmentation. |

## Apply these settings
