[ET-VK][ez] Add AOT support for PackedInt8_4C1W dtype #17389
SS-JIA wants to merge 3 commits into gh/SS-JIA/420/base
Conversation
This adds end-to-end support for the PackedInt8_4C1W memory layout throughout the serialization and AOT pipeline. The 4C1W layout packs 4 channels into a single texel with width-major ordering, which is the natural output layout for convolutions that produce channel-packed results (see the illustrative sketch below).

- Adds PACKED_INT8_4C1W = 8 to the FlatBuffers schema and the Python schema class
- Adds the corresponding deserialization mapping in VulkanBackend.cpp
- Updates the quantize/dequantize per-tensor op registrations to accept any PackedInt8 layout (not just 4W4C), enabling the layout propagation pass to choose the optimal layout
- Adds new TensorRepSet constants: PACKED_INT8_BUFFER (all quantized layouts), PACKED_INT8_4C1W_BUFFER, and PACKED_INT8_CHANNELS_PACKED_BUFFER (4W4C + 4C1W)

Differential Revision: [D93000167](https://our.internmc.facebook.com/intern/diff/D93000167/)

[ghstack-poisoned]
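To make the 4C1W packing concrete, here is a minimal Python sketch of how a logical NCHW index could map to a texel and a lane within it under this layout. Only the "four channels per texel, width-major ordering" property comes from the description above; the function name, argument order, and the exact interleaving of batch, channel-group, and spatial dimensions are assumptions made for illustration and are not taken from the ExecuTorch Vulkan sources.

```python
# Minimal, hypothetical sketch of a 4C1W-style packing -- NOT the ExecuTorch
# Vulkan implementation. It assumes each texel holds 4 consecutive int8
# channel values for one (n, h, w) position, and that "width-major" means
# w varies fastest when texels are laid out linearly.

def texel_index_4c1w(n: int, c: int, h: int, w: int, sizes: tuple) -> tuple:
    """Map a logical NCHW index to (texel_index, lane) under a 4C1W-style packing."""
    N, C, H, W = sizes
    c_group = c // 4            # group of 4 channels packed into one texel
    lane = c % 4                # int8 lane within that texel (0..3)
    num_groups = (C + 3) // 4   # channel groups, padded up to a multiple of 4
    # Width-major ordering: w fastest, then h, then channel group, then batch.
    texel = ((n * num_groups + c_group) * H + h) * W + w
    return texel, lane


if __name__ == "__main__":
    # For a 1x8x2x3 tensor, channels 0..3 of a given (h, w) share one texel
    # and channels 4..7 share another, with texels walking along w first.
    print(texel_index_4c1w(0, 5, 1, 2, (1, 8, 2, 3)))  # -> (11, 1)
```

Under this reading, a convolution that produces channel-packed output can store each group of four output channels as a single texel write, which matches the motivation stated above for treating 4C1W as the natural convolution output layout.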
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17389
Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 2 Unrelated Failures

As of commit 32015e1 with merge base 964c565:

NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.