Summary
The Vulkan backend produces all-zero outputs on a PowerVR D-Series GPU (Google Pixel 10 Pro). The same model files work correctly on macOS via MoltenVK and on Android via XNNPACK.
Environment
- Device: Google Pixel 10 Pro
- GPU: PowerVR D-Series DXT-48-1536 MC1 (`maxImageDimension3D = 2048`)
- ExecuTorch: Built from source, branch `fix/vulkan-texture-ubo-budget` (my UBO budget fix from PR #17294, "Fix Vulkan texture tensor UBO budget overflow", rebased on upstream/main as of Feb 7 2026, commit `ba2516cefa`)
- NDK: 28.2.13676358 (Clang 19.0.1)
- Build: Release, arm64-v8a, XNNPACK + Vulkan backends
What I Observe
| Platform | Backend | YOLO (v8) | MobileNet |
|---|---|---|---|
| macOS | XNNPACK | Correct | Correct |
| macOS | Vulkan (MoltenVK) | Correct | Correct |
| Android | XNNPACK | Correct | Correct |
| Android | Vulkan (PowerVR) | All zeros | NaN values |
- YOLO: Output tensor shape `[1, 84, 8400]` is correct, but all confidence values are exactly `0.0`
- MobileNet: Output contains NaN values
- Both `texture_limits: (2048, 2048, 2048)` and `storage_type_override: BUFFER` produce the same zero results
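To tell the two failure modes in the table apart programmatically, a small helper that scans a read-back output buffer can help. This is a hypothetical sketch: `classify_readback` and `ReadbackStatus` are illustrative names, not ExecuTorch APIs, and it assumes outputs are read back as a flat `float` buffer.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical status codes for a read-back output buffer.
enum class ReadbackStatus { kEmpty, kAllZero, kHasNaN, kOk };

// Scans a float buffer copied out of a staging buffer and reports whether it
// is empty, contains any NaN, is entirely zero, or looks plausible.
ReadbackStatus classify_readback(const std::vector<float>& data) {
  if (data.empty()) return ReadbackStatus::kEmpty;
  bool any_nonzero = false;
  for (float v : data) {
    if (std::isnan(v)) return ReadbackStatus::kHasNaN;  // NaN takes priority
    if (v != 0.0f) any_nonzero = true;                   // -0.0f counts as zero
  }
  return any_nonzero ? ReadbackStatus::kOk : ReadbackStatus::kAllZero;
}
```

Logging this status per output tensor keeps YOLO-style (all zeros) and MobileNet-style (NaN) failures distinct in traces.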
What I Found by Adding Tracing
I added __android_log_print traces to VulkanBackend.cpp, ComputeGraph.cpp, and StagingBuffer.cpp to narrow things down. Key findings:
1. GPU is PowerVR: no vendor-specific support in ExecuTorch
GPU: name='powervr d-series dxt-48-1536 mc1', max3D=2048
ExecuTorch's Vulkan backend only has device-specific handling for Adreno, Mali, NVIDIA, and SwiftShader; there is no PowerVR-specific handling.
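The effect of missing vendor handling can be illustrated with a name-based classifier in which any unmatched device falls through to a generic bucket. This is a hypothetical sketch (`GpuVendor` and `classify_gpu` are made-up names). It assumes matching on the lowercased device-name substring and does not claim to reflect how ExecuTorch actually identifies devices.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical vendor buckets: the ones the report says are special-cased,
// plus a fallback for everything else.
enum class GpuVendor { kAdreno, kMali, kNvidia, kSwiftShader, kUnknown };

GpuVendor classify_gpu(std::string name) {
  // Lowercase the device name so matching is case-insensitive.
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  auto has = [&](const char* needle) {
    return name.find(needle) != std::string::npos;
  };
  if (has("adreno")) return GpuVendor::kAdreno;
  if (has("mali")) return GpuVendor::kMali;
  if (has("nvidia")) return GpuVendor::kNvidia;
  if (has("swiftshader")) return GpuVendor::kSwiftShader;
  return GpuVendor::kUnknown;  // e.g. "powervr d-series dxt-48-1536 mc1"
}
```

Under this model, the PowerVR device from the trace gets no vendor-specific workarounds at all.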
2. Output texture extents exceed maxImageDimension3D
output_tensor[3] extents=(2100,64,1) max3D=2048 EXCEEDS=1
output_tensor[4] extents=(2100,4,16) max3D=2048 EXCEEDS=1
2100 = 8400 anchors / 4 (texel packing). The detection head outputs also exceed the limit. This is only checked in #ifdef VULKAN_DEBUG builds (Tensor.cpp:632-660), so release builds silently hit undefined behavior.
I exported with texture_limits: (2048, 2048, 2048) in VulkanPartitioner, but this only controls which ops get delegated — it doesn't account for texel packing that turns dimension 8400 into texture extent 2100.
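The packing arithmetic can be sketched as follows. The layout is an assumption inferred from the traced extents: NCHW sizes map to x = W, y = H, z = C * N, and the packed dimension is divided by 4 and rounded up. ExecuTorch's real extent calculation may support other axis mappings. `packed_extents` and `fits_3d_limit` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Which tensor dimension is packed 4-wide into RGBA texels.
enum class PackedDim { kWidth, kHeight, kChannels };

// Assumed layout (inferred from the traces, not copied from ExecuTorch):
// x <- W, y <- H, z <- C * N, with the packed dimension divided by 4, rounded up.
std::array<uint32_t, 3> packed_extents(uint32_t n, uint32_t c, uint32_t h,
                                       uint32_t w, PackedDim packed) {
  auto div_up4 = [](uint32_t v) { return (v + 3) / 4; };
  switch (packed) {
    case PackedDim::kWidth:
      return {div_up4(w), h, c * n};
    case PackedDim::kHeight:
      return {w, div_up4(h), c * n};
    case PackedDim::kChannels:
    default:
      return {w, h, div_up4(c) * n};
  }
}

// True if every axis fits within the device's maxImageDimension3D.
bool fits_3d_limit(const std::array<uint32_t, 3>& extents, uint32_t max3d) {
  for (uint32_t e : extents) {
    if (e > max3d) return false;
  }
  return true;
}
```

Under this layout, a width-packed `[1, 1, 64, 8400]` tensor maps to `(2100, 64, 1)` and fails against 2048, while a channels-packed `[1, 64, 80, 80]` tensor maps to `(80, 80, 16)` and passes. A packing-aware partitioner check would compare extents like these, rather than logical sizes, against `maxImageDimension3D`.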
3. Even in-limit tensors are zero
Output tensors 0-2 have extents within the 2048 limit (e.g., (80,80,16), (40,40,32), (20,20,64)), but they are also all zeros. This suggests either intermediate tensors also exceed limits, or one bad texture corrupts the entire command buffer state on PowerVR.
4. Execution mechanics work fine
- All 332 nodes are encoded and submitted, and the fence wait succeeds
- Staging buffers have correct memory flags (`HOST_VISIBLE | HOST_COHERENT | DEVICE_LOCAL`)
- Input data is valid (verified non-zero values in the staging buffer)
- The GPU "completes" the work, but the staging buffers read back all zeros
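The memory flags matter because they determine what the host must do before reading GPU-written data through a mapped pointer. The sketch below is hypothetical (`readback_path` is an illustrative name). Its constants mirror the `VkMemoryPropertyFlagBits` values so it builds without the Vulkan headers. It shows why the reported flag combination needs no `vkInvalidateMappedMemoryRanges` call.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of VkMemoryPropertyFlagBits values (no Vulkan headers needed).
constexpr uint32_t kDeviceLocal = 0x1;   // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible = 0x2;   // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

enum class ReadbackPath { kNotMappable, kDirect, kNeedsInvalidate };

// Decides what the host must do before reading device-written data from a
// mapped allocation with the given memory property flags.
ReadbackPath readback_path(uint32_t flags) {
  if ((flags & kHostVisible) == 0) return ReadbackPath::kNotMappable;
  // Non-coherent memory needs vkInvalidateMappedMemoryRanges before host reads.
  if ((flags & kHostCoherent) == 0) return ReadbackPath::kNeedsInvalidate;
  return ReadbackPath::kDirect;
}
```

With `HOST_VISIBLE | HOST_COHERENT | DEVICE_LOCAL`, host-side cache maintenance is unlikely to explain the zero readback.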
5. Single command buffer didn't help
I tried forcing all dispatches into one command buffer (setting execute_threshold_node_count to UINT32_MAX). Same result — all zeros.
Related
- PR #17294 ("Fix Vulkan texture tensor UBO budget overflow"): my fix for a separate UBO budget crash (`uniform data allocation has exceeded tensor uniform buffer size`). That fix prevents the crash but does not affect the zero-output issue.
- `cases.py:1464-1470`: there's an existing TODO noting Android arm64 failures where "writes from the first or second shader dispatch being 'ignored'", which matches my symptoms exactly.
Questions
- Is PowerVR expected to work at all with the Vulkan backend? Or is it currently untested/unsupported?
- Could the `texture_limits` partitioner option be made aware of texel packing, so it avoids delegating ops whose packed extents exceed `maxImageDimension3D`?
- Should the texture extent check in `Tensor.cpp:632-660` be enabled in release builds (not just debug)?
- Any other suggestions for debugging this? Happy to add more traces or test patches.
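For the release-build question, here is a minimal sketch of an always-on check. It assumes a tensor's image extents and the device's `maxImageDimension3D` are both known when the texture is created. `check_texture_extents` is a hypothetical name. Throwing is only one option; failing delegate initialization with an error would serve the same purpose.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical validation that is not gated on VULKAN_DEBUG: reject a texture
// whose image extents exceed maxImageDimension3D instead of dispatching into
// undefined behavior.
void check_texture_extents(const std::array<uint32_t, 3>& extents,
                           uint32_t max_image_dimension_3d,
                           const std::string& tensor_name) {
  for (int axis = 0; axis < 3; ++axis) {
    if (extents[axis] > max_image_dimension_3d) {
      std::ostringstream msg;
      msg << tensor_name << ": image extent " << extents[axis] << " on axis "
          << axis << " exceeds maxImageDimension3D=" << max_image_dimension_3d;
      throw std::runtime_error(msg.str());
    }
  }
}
```

If a check like this runs once per texture at creation time rather than per dispatch, its cost is a few integer comparisons. That low cost is the main argument for not limiting it to debug builds.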