feat(windows): add known aks processes and files to defender exclusions to improve windows node performance#8245

Merged
timmy-wright merged 4 commits intomainfrom
timmy/update-defender-preferences
Apr 8, 2026

@timmy-wright timmy-wright commented Apr 7, 2026

So, I didn't implement the full recommendation below; instead, I looked at which files are actually on the node. This PR is meant to help Windows nodes perform a bit better. The full research report follows.

Windows Defender Exclusions for containerd on AKS Windows Nodes

Date: 2026-04-07
Author: AKSClaw (research for Tim Wright, AKS Windows Nodes EM)


Executive Summary

Excluding C:\ProgramData\containerd (or its rootfs subdirectory) from Windows Defender real-time scanning is a well-established pattern for container hosts. Microsoft's own documentation acknowledges the redundant scanning problem for container filesystems and has built kernel-level optimizations (via wcifs.sys) to address it — but these optimizations require AV vendor cooperation and may not fully eliminate overhead. Docker officially recommends excluding its data directory from AV scanning. The performance benefits are significant (reduced CPU, faster container startup, lower I/O latency), while the security trade-offs are manageable with compensating controls.

Recommendation: Exclude C:\ProgramData\containerd (the full root) with process exclusions for containerd.exe as a belt-and-suspenders approach. Combine with image scanning at registry level (Defender for Containers) and node-level runtime threat detection.


1. What Lives in C:\ProgramData\containerd

The containerd root directory stores all persistent data: snapshots, content, metadata, and plugin data. On Windows, the typical layout:

C:\ProgramData\containerd\
├── root\
│   ├── io.containerd.content.v1.content\      # Image layer blobs (tar, config JSON)
│   ├── io.containerd.metadata.v1.bolt\         # meta.db (BoltDB): image/container metadata
│   ├── io.containerd.snapshotter.v1.windows\   # Snapshot layers
│   │   └── snapshots\
│   │       ├── 1\   ← VHDX or layer data
│   │       ├── 2\
│   │       └── ...
│   └── io.containerd.runtime.v2.task\          # Running container state
├── state\                                       # Runtime state (pipes, shims)
│   └── io.containerd.runtime.v2.task\
└── config.toml                                  # (sometimes at this level)

Key distinction:

  • rootfs — On Windows containerd, the "rootfs" concept manifests differently than Linux. Windows uses the HCS (Host Compute Service) with layer folders and WCIFS overlay, not a simple rootfs/ mount. The snapshots\ directory under the Windows snapshotter is the functional equivalent.
  • Parent directory additionally contains: content store (downloaded image blobs), BoltDB metadata, runtime task state, and containerd configuration. These are ALL high-I/O paths during container lifecycle operations.

Parent vs. rootfs Only

| Scope | What's excluded | What's still scanned |
|---|---|---|
| C:\ProgramData\containerd | Everything: content store, metadata DB, snapshots, runtime state | Nothing in containerd's tree |
| C:\ProgramData\containerd\root\...\snapshots only | Container filesystem layers | Image blobs during pull, metadata DB writes, runtime state |

Excluding only rootfs/snapshots misses significant I/O hotspots. The content store sees heavy I/O during image pulls (every blob downloaded gets scanned). The metadata DB (BoltDB) is accessed on every container operation. Excluding only part of the tree provides incomplete performance relief.
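To make the coverage difference concrete, here is a minimal Python sketch that models a Defender path exclusion as a case-insensitive "this folder and everything beneath it" match on Windows path components. It is a simplification: real exclusions also support wildcards and environment variables, which are not modeled. The example file names are hypothetical and follow the layout in Section 1, which may differ on a given node.

```python
from pathlib import PureWindowsPath


def _parts(path: str) -> tuple:
    # Windows paths are case-insensitive, so compare lower-cased components.
    return tuple(part.lower() for part in PureWindowsPath(path).parts)


def is_path_excluded(path: str, exclusions: list) -> bool:
    """True if `path` is an excluded folder or lies anywhere beneath one."""
    target = _parts(path)
    for exclusion in exclusions:
        prefix = _parts(exclusion)
        if target[: len(prefix)] == prefix:
            return True
    return False


ROOT = r"C:\ProgramData\containerd"
FULL_ROOT = [ROOT]
SNAPSHOTS_ONLY = [ROOT + r"\root\io.containerd.snapshotter.v1.windows\snapshots"]

# Hypothetical example files, one per I/O hotspot described above.
HOTSPOTS = {
    "image blob": ROOT + r"\root\io.containerd.content.v1.content\blobs\sha256\example",
    "metadata DB": ROOT + r"\root\io.containerd.metadata.v1.bolt\meta.db",
    "snapshot layer": ROOT + r"\root\io.containerd.snapshotter.v1.windows\snapshots\1\example.vhdx",
    "runtime state": ROOT + r"\state\io.containerd.runtime.v2.task\example",
}

if __name__ == "__main__":
    for name, path in HOTSPOTS.items():
        print(f"{name}: full root={is_path_excluded(path, FULL_ROOT)}, "
              f"snapshots only={is_path_excluded(path, SNAPSHOTS_ONLY)}")
```

Only the snapshot layer is covered by the snapshots-only scope; the full-root scope covers all four hotspots. Comparing whole path components (rather than raw string prefixes) keeps a sibling folder such as C:\ProgramData\containerd-old from matching by accident.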


2. Performance Impact

2.1 CPU Overhead

Windows Defender's MsMpEng.exe (Antimalware Service Executable) intercepts every file open/read/write via its minifilter driver. For container workloads, this means:

  • Image pull: Every layer blob extracted triggers scanning. A Windows Server Core image has multi-GB layers.
  • Container start: The WCIFS overlay creates placeholder files that redirect to package layers. Without AV optimization, Defender scans the same base layer data redundantly across every container.
  • Container I/O: Any file write inside a container triggers copy-on-write + scan of the new file.

Reported impact from the community:

  • AKS GitHub issue #1462 (Azure/AKS): Users report Defender consuming "a lot of CPU" on Windows nodes, with screenshots showing MsMpEng.exe as the dominant CPU consumer.
  • AKS GitHub issue #3086 (Azure/AKS): Users report applications running inside containers becoming "very slow" due to Defender scanning, and resorting to Set-MpPreference -DisableRealtimeMonitoring $true via VMSS run-command.
  • Docker forums: Multiple reports of Defender "crippling" servers with Docker running, consuming "almost all available resources."
  • Reddit: Users spending hours debugging slow containers before identifying AV as the root cause.

Microsoft's own documentation (Anti-virus Optimization for Windows Containers) states:

"There will likely be many containers depending on the same package layers. The same data stream of a given package file will provide the data for placeholders on multiple container system volumes. As a result, there is potential for redundant AV scans of the same data in every container. This has an unnecessary negative impact on the performance of containers. This is a significant cost given that containers are expected to start quickly and may be short-lived."

2.2 I/O Latency

The minifilter interception adds latency to every file operation:

  • Synchronous scanning blocks the I/O call until the scan completes
  • Lock contention — AV can lock files in ways that cause Docker/containerd commands to hang (documented by Docker)
  • BoltDB contention: meta.db is a single-writer database (BoltDB allows only one read-write transaction at a time), so adding AV scan latency to every write amplifies lock contention
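The amplification on a single-writer store can be worked out with a simple queueing sketch. Assume k writers arrive together and each write holds the lock for w ms plus s ms of synchronous scan latency. The i-th writer then finishes at i·(w + s), so the mean completion time is (k + 1)/2 · (w + s), and the scan adds (k + 1)/2 · s to the mean rather than just s. This is a simplified model with hypothetical inputs, not a measurement of containerd.

```python
def mean_completion_ms(writers: int, write_ms: float, scan_ms: float) -> float:
    """Mean completion time when `writers` requests queue on one write lock.

    Simplified model (assumption): all writers arrive together, are served
    one at a time, and every write pays the same synchronous scan cost.
    """
    per_write = write_ms + scan_ms
    # Writer i (1-based) completes at i * per_write; average over i = 1..writers.
    return sum(i * per_write for i in range(1, writers + 1)) / writers


def added_latency_ms(writers: int, write_ms: float, scan_ms: float) -> float:
    """Extra mean latency caused by scanning, relative to no scanning."""
    return (mean_completion_ms(writers, write_ms, scan_ms)
            - mean_completion_ms(writers, write_ms, 0.0))


if __name__ == "__main__":
    # Hypothetical: 8 queued writes, 2 ms each, plus 1 ms of scan per write.
    print(added_latency_ms(8, 2.0, 1.0))  # 4.5
```

With eight queued writers, one extra millisecond of scanning per write raises the mean latency by 4.5 ms, which is why per-write scan overhead matters more on a contended single-writer database than the raw per-file cost suggests.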

2.3 Container Startup Time

Microsoft designed the WCIFS container isolation filter specifically to enable AV optimization that achieves:

"No impact to container start or execution time (even for the first container)"

But this requires the AV product to implement the ECP (Extra Create Parameters) protocol to detect placeholder files and skip redundant scans. If the AV doesn't implement this optimization, every container start pays the full scan cost for all base OS files.
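The effect of the ECP optimization on scan volume can be modeled by counting scans. In this illustrative sketch (not Defender's actual logic), each container is a list of placeholder files, each pointing at a backing file in a shared package layer. An ECP-aware scanner scans each backing file once; a scanner without the optimization scans every placeholder it encounters.

```python
from collections.abc import Iterable


def count_scans(containers: Iterable[Iterable[str]], ecp_aware: bool) -> int:
    """Count real-time scans for files opened across containers.

    Each container is an iterable of backing-file IDs, one per placeholder.
    With `ecp_aware`, a backing file that was already scanned is skipped.
    """
    scanned = set()
    scans = 0
    for placeholders in containers:
        for backing_file in placeholders:
            if ecp_aware and backing_file in scanned:
                continue  # Same data stream was already scanned via another placeholder.
            scanned.add(backing_file)
            scans += 1
    return scans


# Hypothetical workload: 10 containers sharing the same 1,000 base-layer files.
base_layer = [f"base/file{i}" for i in range(1000)]
workload = [base_layer for _ in range(10)]

if __name__ == "__main__":
    print(count_scans(workload, ecp_aware=False))  # 10000
    print(count_scans(workload, ecp_aware=True))   # 1000
```

Without the optimization, scan volume grows linearly with the number of containers sharing a base layer; with it, the shared layer is paid for once.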

2.4 Quantitative Estimates

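We found no published benchmarks with exact numbers. The figures below are rough estimates extrapolated from the community reports above and should be validated by benchmarking (see Section 8):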

  • CPU savings: 5-30% steady-state CPU reduction on active container hosts (varies with workload I/O profile)
  • Container startup: 2-10x improvement for first container start with large Windows images when scanning is bypassed
  • Image pull: Significant reduction in pull time for multi-GB Windows images

3. Security Trade-offs

3.1 What You Lose

Excluding the containerd directory from real-time scanning means:

  1. Malware in pulled images won't be detected at the filesystem level during extraction
  2. Malicious files written by containers (via compromised workloads) won't trigger real-time alerts
  3. Supply chain attacks via poisoned images land on disk without file-level scanning

3.2 Attack Vectors Opened

| Vector | Risk | Mitigation |
|---|---|---|
| Malicious container image pulled from compromised registry | Medium | Registry-level image scanning (Defender for Containers, Trivy, ACR scanning) |
| Container escape writing malware to host via rootfs | Low-Medium | Container isolation (HCS/Hyper-V), runtime behavior monitoring |
| Cryptominer/malware executing inside container | Low (contained) | Network policies, resource limits, runtime threat detection |
| Attacker with node access placing malware in containerd dirs | Low | Node access = game over regardless; Defender won't help if attacker has admin |
| Lateral movement via container filesystem manipulation | Low | WCIFS isolation prevents cross-container access |

3.3 Compensating Controls

These make the exclusion acceptable:

  1. Image scanning at registry (shift-left): Defender for Containers / ACR integrated scanning catches vulnerabilities before images reach the node
  2. Admission control: Azure Policy / Gatekeeper restricting image sources to trusted registries
  3. Runtime threat detection: Defender for Containers runtime protection (eBPF-based on Linux; behavioral detection on Windows)
  4. Network isolation: AKS Windows nodes in isolated VNets with NSGs
  5. Scheduled full scans: Run periodic full scans during maintenance windows (Docker recommends this pattern)
  6. Node image hardening: CIS benchmarks applied to AKS node images
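To check that every vector in 3.2 still has at least one compensating control once the exclusion is in place, the mapping can be written down and queried. This sketch is illustrative: the control names and the vector-to-control mapping are our reading of the table and list above, not an official taxonomy. The "attacker with node access" vector is omitted because none of these controls (including Defender itself) addresses it.

```python
# Illustrative mapping (assumption): attack vector -> controls from 3.2/3.3 that mitigate it.
MITIGATIONS = {
    "malicious image from compromised registry": {"registry scanning", "admission control"},
    "container escape writing to host": {"hyper-v isolation", "runtime detection"},
    "cryptominer inside container": {"network policies", "resource limits", "runtime detection"},
    "lateral movement via container filesystem": {"wcifs isolation"},
}


def uncovered_vectors(enabled_controls: set) -> list:
    """Return vectors with no enabled compensating control, sorted by name."""
    return sorted(vector for vector, controls in MITIGATIONS.items()
                  if not controls & enabled_controls)


if __name__ == "__main__":
    enabled = {"registry scanning", "admission control", "runtime detection"}
    print(uncovered_vectors(enabled))  # ['lateral movement via container filesystem']
```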

3.4 Key Insight: Linux Parity

AKS Linux nodes do not ship with any real-time AV scanner. There is no antimalware equivalent to Defender running on Linux node OS by default. This means Windows nodes with containerd directory exclusions would have equivalent file-level scanning posture to Linux nodes — a reasonable baseline given the compensating controls above.


4. Documented Best Practices

4.1 Microsoft Official

Anti-virus Optimization for Windows Containers (learn.microsoft.com):

  • Microsoft provides a kernel-level framework (WCIFS ECP protocol) specifically so AV products can skip redundant container file scanning
  • The documentation explicitly acknowledges the performance problem and recommends AV vendors implement the optimization
  • Microsoft states Defender "has been optimized to protect container hosts" but the extent of optimization in practice is unclear

Windows Containers Support Policy:

"Windows Defender has been optimized to protect container hosts and is fully supported."

This implies Microsoft has implemented some WCIFS-aware optimization in Defender, but community reports suggest it's insufficient for high-throughput container workloads.

4.2 Docker Official

Docker Docs — Antivirus software and Docker:

"One way to reduce these problems is to add the Docker data directory (%ProgramData%\docker on Windows Server) to the antivirus's exclusion list."

Docker explicitly recommends excluding the data directory. They suggest scheduling periodic offline scans as compensation.
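A periodic offline scan of the kind described above could be generated as a stop, scan, restart sequence of PowerShell commands. This is a sketch under stated assumptions: the runtime runs as a Windows service named containerd, the node has been drained of workloads before the task runs, and the function name build_offline_scan_plan is hypothetical. Stop-Service, Start-MpScan (-ScanType CustomScan, -ScanPath) and Start-Service are standard cmdlets.

```python
def build_offline_scan_plan(data_dir: str, service: str = "containerd") -> list:
    """Return PowerShell commands for a stop -> scan -> restart maintenance task.

    Assumptions: the runtime is a Windows service named `service`, and the
    node has already been drained of workloads before the plan runs.
    """
    def quote(value: str) -> str:
        # PowerShell single-quoted string: an embedded ' is escaped as ''.
        return "'" + value.replace("'", "''") + "'"

    return [
        f"Stop-Service -Name {quote(service)}",
        f"Start-MpScan -ScanType CustomScan -ScanPath {quote(data_dir)}",
        f"Start-Service -Name {quote(service)}",
    ]


if __name__ == "__main__":
    for command in build_offline_scan_plan(r"C:\ProgramData\containerd"):
        print(command)
```

Generating the commands rather than hard-coding them keeps quoting correct for paths with spaces or apostrophes.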

4.3 containerd

No official containerd documentation specifically addresses AV exclusions, but the Docker guidance applies directly, since containerd's data directory plays the same role as Docker's (image content, layers/snapshots, and metadata).

4.4 Kubernetes Community

No official Kubernetes documentation on AV exclusions, but the pattern is well-understood:

  • The kubelet data directory (C:\var\lib\kubelet) is another candidate for exclusion
  • etcd data directories are commonly excluded on control plane nodes

4.5 Other Container Runtimes

| Runtime | Data directory | Exclusion recommended? |
|---|---|---|
| Docker | %ProgramData%\docker | Yes, per official Docker documentation |
| containerd | %ProgramData%\containerd (default root) | Yes, by extension of Docker guidance + Microsoft container AV docs |
| CRI-O | Linux-only, N/A for Windows | N/A: Linux nodes don't run Defender by default |

5. CVEs and Real-World Incidents

5.1 Container Escape CVEs (Relevant Context)

While not specific to Windows Defender exclusions, these demonstrate why container rootfs directories are security-sensitive:

  • CVE-2025-9074 (CVSS 9.3): Docker Desktop for Windows — container escape allowing host filesystem mount and DLL overwrite for privilege escalation. Demonstrates that container filesystem access paths are attack surfaces.
  • CVE-2024-21626 (Leaky Vessels): runc container escape via working directory manipulation. Primarily Linux but illustrates the class of attack.
  • CVE-2025-31133, CVE-2025-52565, CVE-2025-52881: runc escape via procfs write redirects. Linux-specific but relevant pattern.

5.2 Windows-Specific Container Malware

No widely-reported incidents of malware specifically targeting C:\ProgramData\containerd\rootfs or equivalent Windows container filesystem directories in the wild. The attack surface exists theoretically but:

  • Windows containers are less prevalent than Linux containers
  • Windows process-isolated containers share the host kernel but have WCIFS isolation
  • Hyper-V isolated containers have hardware-level separation

5.3 Cryptomining in Containers

The most common real-world container malware is cryptominers injected via compromised images or exposed APIs. These execute within the container runtime, and AV scanning of the rootfs directory is not the primary detection vector — runtime behavioral detection and network monitoring are more effective.


6. Cloud Provider Defaults

| Provider | Windows nodes | Default AV config |
|---|---|---|
| Azure AKS | Defender enabled by default | No containerd exclusions configured out of the box. Users must manually add exclusions via VMSS extensions or custom script extensions. The community has been requesting this since 2020 (Azure/AKS issue #1462). |
| Amazon EKS | Windows AMIs ship with Defender | No documented default exclusions for container directories. AWS recommends customers configure based on workload. |
| Google GKE | Windows node images | No public documentation on Defender exclusion configuration for container paths. |

Notable: All three providers leave AV exclusion configuration as a customer responsibility on Windows nodes. None pre-configure containerd/Docker exclusions out of the box. This is an opportunity for AKS to differentiate.


7. Surgical Exclusion Approaches

7.1 Process Exclusions (Recommended as Complement)

Instead of or in addition to path exclusions, exclude the containerd process:

# Process exclusion — files opened BY containerd.exe are not scanned
Add-MpPreference -ExclusionProcess "C:\Program Files\containerd\containerd.exe"

# Also exclude the HCS shim processes
Add-MpPreference -ExclusionProcess "C:\Program Files\containerd\containerd-shim-runhcs-v1.exe"

Advantages:

  • More surgical: only skips scanning when containerd itself is doing I/O
  • Files accessed by other processes (e.g., an attacker) in the same directory still get scanned
  • Addresses the I/O latency problem for container operations specifically

Disadvantages:

  • Doesn't cover I/O from other system processes accessing the same paths (e.g., Windows Search indexer, backup agents)
  • Process exclusions apply to all files the process touches, not just containerd directories
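The trade-off between path and process exclusions can be expressed as a simplified real-time scan decision. This sketch models only path exclusions ("this folder and below") and full-path process exclusions; everything else Defender considers is ignored, and the helper names are hypothetical.

```python
from pathlib import PureWindowsPath


def _norm(path: str) -> tuple:
    # Case-insensitive comparison of Windows path components.
    return tuple(part.lower() for part in PureWindowsPath(path).parts)


def should_scan(file_path: str, process_image: str,
                excluded_paths: list, excluded_processes: list) -> bool:
    """Simplified model: skip if the file is under an excluded path,
    or if it was opened by an excluded process image."""
    target = _norm(file_path)
    for excluded in excluded_paths:
        prefix = _norm(excluded)
        if target[: len(prefix)] == prefix:
            return False  # Under an excluded folder, no matter who opens it.
    if _norm(process_image) in {_norm(p) for p in excluded_processes}:
        return False  # Opened by an excluded process, no matter where the file lives.
    return True


CONTAINERD = r"C:\Program Files\containerd\containerd.exe"
PROCESSES = [CONTAINERD]
BLOB = r"C:\ProgramData\containerd\root\io.containerd.content.v1.content\blobs\sha256\example"
```

With process exclusions only, a blob opened by containerd.exe is skipped, while the same blob opened by any other process is still scanned; conversely, a file outside containerd's tree is also skipped when containerd.exe opens it, which is the second disadvantage listed above.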

7.2 Extension Exclusions

# Exclude specific file types common in container layers
Add-MpPreference -ExclusionExtension ".vhdx"  # Hyper-V container disks
Add-MpPreference -ExclusionExtension ".db"     # BoltDB metadata

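Not recommended as a primary approach: extension exclusions apply to every file with that extension anywhere on the node (not just under containerd's directories), they don't cover all file types, and content-store blobs (named by digest) have no predictable extension.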
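The global scope of extension exclusions is easy to see in a small model. This sketch treats an extension exclusion as a suffix match that ignores the file's location, which is how the setting is scoped; the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def is_extension_excluded(file_path: str, excluded_extensions: list) -> bool:
    """Model of an extension exclusion: matches by suffix, wherever the file lives."""
    suffix = PureWindowsPath(file_path).suffix.lower()
    wanted = {("." + ext.lstrip(".")).lower() for ext in excluded_extensions}
    return suffix in wanted


EXTENSIONS = [".vhdx", ".db"]
```

A .db file dropped in a user's Downloads folder is excluded just like containerd's meta.db, while a digest-named content blob (no extension) is not excluded at all.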

7.3 Recommended Combined Approach

# Path exclusion (primary)
Add-MpPreference -ExclusionPath "C:\ProgramData\containerd"

# Process exclusions (belt-and-suspenders)
Add-MpPreference -ExclusionProcess "C:\Program Files\containerd\containerd.exe"
Add-MpPreference -ExclusionProcess "C:\Program Files\containerd\containerd-shim-runhcs-v1.exe"

# Also consider kubelet paths
Add-MpPreference -ExclusionPath "C:\var\lib\kubelet"
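To audit a node against the combined approach, the desired entries can be compared with what Defender reports, for example the ExclusionPath or ExclusionProcess lists from Get-MpPreference. The sketch below assumes those lists have already been collected (how they are gathered is out of scope), treats an empty or missing list as None, and ignores case and trailing backslashes; missing_exclusions is a hypothetical helper name.

```python
from typing import Optional


def missing_exclusions(desired: list, current: Optional[list]) -> list:
    """Return desired entries not present in `current`.

    `current` stands in for (Get-MpPreference).ExclusionPath or
    .ExclusionProcess; None means nothing is configured.
    """
    def norm(value: str) -> str:
        return value.rstrip("\\").lower()

    present = {norm(v) for v in (current or [])}
    return [d for d in desired if norm(d) not in present]


DESIRED_PATHS = [r"C:\ProgramData\containerd", r"C:\var\lib\kubelet"]
```

Reporting only the missing entries makes configuration drift visible without re-running Add-MpPreference blindly.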

7.4 Other Candidate Exclusions for AKS Windows Nodes

| Path | Rationale |
|---|---|
| C:\ProgramData\containerd | Container images, snapshots, metadata |
| C:\var\lib\kubelet | Pod volumes, secrets, config |
| C:\etc\kubernetes | Kubelet/kube-proxy config |
| C:\k | AKS Windows agent binaries and logs |
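Before adding any of these candidates, it helps to confirm which folders actually exist on the node. This minimal sketch filters the table above; the existence check is injectable so it can be tested off-node, and present_candidates is a hypothetical name.

```python
import os
from typing import Callable

CANDIDATES = {
    r"C:\ProgramData\containerd": "Container images, snapshots, metadata",
    r"C:\var\lib\kubelet": "Pod volumes, secrets, config",
    r"C:\etc\kubernetes": "Kubelet/kube-proxy config",
    r"C:\k": "AKS Windows agent binaries and logs",
}


def present_candidates(candidates: dict,
                       exists: Callable[[str], bool] = os.path.isdir) -> list:
    """Keep only candidate folders that exist, preserving table order."""
    return [path for path in candidates if exists(path)]


if __name__ == "__main__":
    for path in present_candidates(CANDIDATES):
        print(f"{path}: {CANDIDATES[path]}")
```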

8. Recommendations for AKS Windows Nodes Team

Immediate (Low Risk)

  1. Add C:\ProgramData\containerd path exclusion to the AKS Windows node VHD build or CSE (CustomScriptExtension). This matches Docker's official guidance and addresses a known performance issue reported since 2020.
  2. Add process exclusions for containerd.exe and containerd-shim-runhcs-v1.exe.
  3. Document the change in AKS release notes as a performance improvement.

Medium-Term

  1. Benchmark before/after: Measure container startup latency, image pull time, and steady-state CPU on nodes with and without exclusions. Publish results to close the loop on Azure/AKS issue #1462.
  2. Consider C:\var\lib\kubelet exclusion as an additional performance optimization.
  3. Evaluate Defender for Containers as the compensating control — ensure it's enabled by default or recommended for Windows node pools.
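For the before/after benchmark, a small summary helper keeps the comparison consistent across metrics. This sketch uses medians to damp outliers; the metric choice and the sample numbers in any usage are placeholders, not measurements.

```python
from statistics import median


def summarize(before: list, after: list) -> dict:
    """Compare medians of one metric (e.g. container start seconds) before/after exclusions."""
    b, a = median(before), median(after)
    return {
        "median_before": b,
        "median_after": a,
        "speedup": b / a,                      # > 1 means faster with exclusions
        "reduction_pct": (b - a) / b * 100.0,  # relative reduction with exclusions
    }
```

The same function applies to pull time or CPU samples; "speedup" is most meaningful for latencies, "reduction_pct" for CPU.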

Decision Framework

The risk profile is:

  • Performance benefit: High confidence, well-documented, community-validated
  • Security cost: Low, given compensating controls (registry scanning, runtime detection, network isolation)
  • Linux parity: The exclusion brings Windows nodes to an AV posture equivalent to Linux nodes (none)
  • Reversibility: Trivial to revert by removing the exclusion

Sources

  1. Microsoft — Anti-virus Optimization for Windows Containers
  2. Docker — Antivirus software and Docker
  3. Microsoft — Support policy for Windows containers and Docker
  4. Azure/AKS — Issue #1462: Disable Windows Defender on Windows nodes
  5. Azure/AKS — Issue #3086: How to whitelist folders for Windows Defender
  6. containerd — Operations documentation (directory layout)
  7. containerd — config.toml man page
  8. Tunbury.org — Containerd on Windows (directory structure analysis)
  9. CVE-2025-9074 — Docker Desktop container escape (Windows-specific)
  10. Microsoft — Defender for Containers overview

Copilot AI left a comment

Pull request overview

Updates Windows CSE Defender exclusions to reduce performance impact from Defender scanning containerd snapshot artifacts (e.g., VHDX files) during container operations.

Changes:

  • Added Defender exclusion paths under C:\ProgramData\containerd in Update-DefenderPreferences.
  • (Minor) formatting change (blank line) in the same function.

@timmy-wright timmy-wright changed the title bug: update defender preferences fix: update defender preferences Apr 7, 2026
@timmy-wright timmy-wright changed the title fix: update defender preferences feat: add known aks processes and files to defender exclusions to improve windows node performance Apr 7, 2026
@timmy-wright timmy-wright changed the title feat: add known aks processes and files to defender exclusions to improve windows node performance feat(windows): add known aks processes and files to defender exclusions to improve windows node performance Apr 7, 2026
@timmy-wright timmy-wright enabled auto-merge (squash) April 7, 2026 04:45
Copilot AI review requested due to automatic review settings April 7, 2026 23:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@timmy-wright timmy-wright merged commit 70d5229 into main Apr 8, 2026
29 checks passed
@timmy-wright timmy-wright deleted the timmy/update-defender-preferences branch April 8, 2026 01:30
timmy-wright added a commit that referenced this pull request Apr 13, 2026
3 participants