[ET-VK][qconv][ez] Make q8ta_im2col shader support stride_w != 1 #17386

SS-JIA · 2026-02-11T20:15:34Z

Stack from ghstack (oldest at bottom):

The im2col path previously had two shaders: the generic q8ta_im2col
which loaded input elements one int8 at a time and supported all strides,
and q8ta_im2col_4w4c which loaded packed int32s (4 channels at once)
but was restricted to stride_w == 1 because it assumed consecutive
width positions in the input (i.e. input_x_base + i).

This diff replaces both shaders with a single q8ta_im2col that uses
the efficient packed loading approach from q8ta_im2col_4w4c while
generalizing the width offset to input_x_base + i * stride_x. This
removes the stride_w == 1 restriction and deletes the separate
q8ta_im2col_4w4c shader entirely.
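
To illustrate the width-offset generalization described above, here is a minimal CPU-side sketch in C++, not the actual GLSL shader. It assumes one input row stored as packed int32 words (each holding 4 int8 channels) and gathers the words for 4 adjacent output positions. The names `gather_row` and `pad_word`, and the out-of-bounds handling, are hypothetical and are not taken from this diff. Only the expression `input_x_base + i * stride_x` comes from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the diff): gathers one packed int32 word
// (4 int8 channels) for each of 4 adjacent output width positions.
// `row` is a single input row indexed by input x coordinate.
// `pad_word` is an assumed placeholder for out-of-bounds reads.
std::vector<int32_t> gather_row(const std::vector<int32_t>& row,
                                int input_x_base,
                                int stride_x,
                                int32_t pad_word) {
  assert(stride_x >= 1);
  std::vector<int32_t> out(4);
  for (int i = 0; i < 4; ++i) {
    // The old 4w4c shader effectively used input_x_base + i, which is
    // only correct when stride_x == 1.
    const int x = input_x_base + i * stride_x;
    const bool in_bounds = x >= 0 && x < static_cast<int>(row.size());
    out[i] = in_bounds ? row[x] : pad_word;
  }
  return out;
}
```

With `stride_x == 1` this reduces to the consecutive-position reads that `q8ta_im2col_4w4c` assumed. With a larger stride the same packed load is issued at columns spaced `stride_x` apart.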

The C++ dispatch is updated to reference the unified shader name, and
the test gate on stride.w == 1 for the im2col path is removed.

Authored with Claude.

Differential Revision: D93000161

pytorch-bot · 2026-02-11T20:15:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17386

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit e7abc43 with merge base 964c565:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-samsung-models-linux / linux-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-11T20:16:18Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-cla bot added the CLA Signed label Feb 11, 2026

meta-codesync bot added the fb-exported and meta-exported labels Feb 11, 2026

SS-JIA mentioned this pull request Feb 11, 2026

Back out "[Diff Train][pytorch/executorch] Apply fixup patch to fbsource" #17399

Merged
