Arm backend: Support for aten.slice_scatter and slice_copy with non-unit step #17413
base: main
- Decompose edge.aten.slice_scatter into fast and general paths
- Fast path (step==1): slice_copy + cat on the updated dimension
- General path (step>1): arange + index_put with permute
- Fix slice_copy lowering to accept the 5-arg form

Change-Id: Ic21e995bee8f722f6d03ec3cc3b04747186a8a9b
Co-authored-by: Rob Elliott <Robert.Elliott@arm.com>
Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Signed-off-by: Rob Elliott <Robert.Elliott@arm.com>
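To make the two paths concrete, here is a minimal pure-Python model of `slice_scatter` along dim 1 of a 2-D nested list. It sketches the semantics only, not the pass: the helper names (`transpose`, `slice_scatter_reference`, `fast_path`, `general_path`) are hypothetical, `transpose` stands in for `permute`, `range` stands in for `arange`, and whole-row assignment stands in for `index_put` on dim 0.

```python
from typing import List

Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    """Swap dims 0 and 1 (stands in for permute on a 2-D tensor)."""
    return [list(col) for col in zip(*m)]


def slice_scatter_reference(x: Matrix, src: Matrix, start: int, end: int, step: int) -> Matrix:
    """Copy of x with out[:, start:end:step] = src (reference semantics along dim 1)."""
    out = [row[:] for row in x]
    for out_row, src_row in zip(out, src):
        out_row[start:end:step] = src_row
    return out


def fast_path(x: Matrix, src: Matrix, start: int, end: int) -> Matrix:
    """step == 1: cat([x[:, :start], src, x[:, end:]], dim=1) from unit-step slices."""
    return [x_row[:start] + list(s_row) + x_row[end:] for x_row, s_row in zip(x, src)]


def general_path(x: Matrix, src: Matrix, start: int, end: int, step: int) -> Matrix:
    """step > 1: move dim 1 to the front, index_put whole rows, move it back."""
    xt = transpose(x)    # x is [N, D]; xt is [D, N], so the target dim leads
    st = transpose(src)  # src is [N, W]; st is [W, N]
    positions = list(range(start, end, step))  # plays the role of arange
    if len(positions) != len(st):
        raise ValueError(f"src has {len(st)} slices along dim, expected {len(positions)}")
    for pos, row in zip(positions, st):
        xt[pos] = row[:]  # index_put on dim 0
    return transpose(xt)  # restore the original [N, D] layout
```

The general path also produces the right answer for `step == 1`, but it needs an `arange` and an `index_put` that the fast path avoids, which is presumably why the decomposition keeps two paths.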
- Decompose strided slice_copy into unit-step slice_copy plus optional right padding and view_copy reshapes
- Update SliceCopySupported check for the supported pattern
- Add non-unit-step slice tests

Change-Id: Ida60ee2f42c283d50c9e3185dca1f9ea2238cf83
Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Signed-off-by: Rob Elliott <Robert.Elliott@arm.com>
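The strided `slice_copy` decomposition can be modeled on a 1-D list in the same spirit. This sketch assumes the sequence below, since the commit message does not spell out the exact ops: take the unit-step slice `x[start:end]`, right-pad it to a multiple of `step`, view it as `[W, step]`, keep column 0 with another unit-step slice, and view the result back to `[W]`. The function names and the `pad_value` parameter are hypothetical.

```python
from typing import List, Sequence


def reshape_rows(flat: Sequence[int], cols: int) -> List[List[int]]:
    """Model of view_copy [N] -> [N // cols, cols]; requires N % cols == 0."""
    if len(flat) % cols != 0:
        raise ValueError(f"length {len(flat)} is not divisible by {cols}")
    return [list(flat[i : i + cols]) for i in range(0, len(flat), cols)]


def strided_slice_via_unit_step(
    x: Sequence[int], start: int, end: int, step: int, pad_value: int = 0
) -> List[int]:
    """Compute x[start:end:step] using only unit-step slices, padding and reshapes.

    Assumes start and end are already normalized (0 <= start <= end <= len(x)).
    """
    if step <= 0:
        raise ValueError(f"step must be positive, got {step}")
    # 1. Unit-step slice_copy: [L]
    y = list(x[start:end])
    # 2. Right-pad to W * step elements, where W = ceil(L / step)
    w = -(-len(y) // step)
    y += [pad_value] * (w * step - len(y))
    # 3. view_copy [W * step] -> [W, step]
    grid = reshape_rows(y, step)
    # 4. Unit-step slice_copy of column 0: [W, 1]
    first_col = [row[0:1] for row in grid]
    # 5. view_copy [W, 1] -> [W]
    return [v for row in first_col for v in row]
```

The padding never reaches the output: with `W = ceil(L / step)`, the last kept index is `(W - 1) * step`, which is always less than `L`, so the pad value only needs to be a valid element of the dtype.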
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17413
Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 6 Cancelled Jobs
As of commit 58faa08 with merge base b871398:

NEW FAILURES - The following jobs have failed:
CANCELLED JOBS - The following jobs were cancelled. Please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
Adds Arm backend support for aten.slice_copy with non-unit step and aten.slice_scatter.default by introducing decomposition passes and expanding test coverage across TOSA/Ethos/VGF pipelines.
Changes:
- Add decomposition passes for strided `slice_copy` and for `slice_scatter` (fast path for `step==1`, general path for `step>1`).
- Update `slice_copy` lowering/support checks to handle the 5-arg form and validate dtype/profile constraints.
- Add new/expanded tests for non-unit-step slice and `slice_scatter` across backends.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| backends/arm/test/ops/test_slice_scatter.py | Adds slice_scatter tests for unit and non-unit step across TOSA/Ethos/VGF. |
| backends/arm/test/ops/test_slice.py | Adds non-unit-step slice tests across TOSA/Ethos/VGF. |
| backends/arm/test/models/stable_diffusion/test_CLIPTextModelWithProjection.py | Marks a known-failing TOSA FP test as xfail. |
| backends/arm/operators/op_slice.py | Updates slice_copy lowering to accept 5-arg form and enforce step==1 at lowering. |
| backends/arm/operator_support/tosa_profile_supported_op_lists.py | Adds aten.slice_scatter.default to supported ops list. |
| backends/arm/operator_support/slice_copy_support.py | Expands slice_copy support checks (arg count, positive step, dtype/profile validation). |
| backends/arm/_passes/decompose_strided_slice_copy_pass.py | Introduces decomposition for non-unit-step slice_copy into unit-step ops. |
| backends/arm/_passes/decompose_slice_scatter_pass.py | Introduces decomposition for slice_scatter into fast/general lowering forms. |
| backends/arm/_passes/arm_pass_manager.py | Wires new decomposition passes into TOSA and annotation pipelines. |
| `backends/arm/_passes/__init__.py` | Exports the newly added passes. |
```python
# index_positions: [W] where W = len(arange(start_i, end_i, step))
index_positions = super().call_operator(
    arange_op,
    (start_i, end_i, step),
    {"dtype": torch.int32, "device": input_device},
    meta,
    updated=True,
)

src_val = src.data
src_shape = src_val.shape
index_shape = index_positions.data.shape
# slice_shape is input_shape with dim_norm replaced by W
# input_shape: [d0, ..., D, ..., d{r-1}]
# -> slice_shape: [d0, ..., W, ..., d{r-1}]
slice_shape = list(input_shape)
slice_shape[dim_norm] = int(index_shape[0])
```
Copilot AI, Feb 12, 2026
`arange` is always emitted even when `step == 1` (fast path), and the resulting `index_positions` is unused in that branch. If dead-code elimination doesn’t reliably remove it before partitioning/lowering, this can introduce an unnecessary (and potentially unsupported) op in the `step == 1` path. Move `index_positions = arange(...)` (and any dependent shape checks) inside the `step > 1` branch; for `step == 1`, compute the expected slice length directly from `start_i`/`end_i` for validation.
Suggested change (replaces the hunk above):

```python
src_val = src.data
src_shape = src_val.shape
if step == 1:
    # Fast path: contiguous slice, avoid materializing index_positions via arange.
    # slice_shape is input_shape with dim_norm replaced by W = end_i - start_i.
    # input_shape: [d0, ..., D, ..., d{r-1}]
    # -> slice_shape: [d0, ..., W, ..., d{r-1}]
    slice_shape = list(input_shape)
    slice_shape[dim_norm] = int(end_i - start_i)
else:
    # index_positions: [W] where W = len(arange(start_i, end_i, step))
    index_positions = super().call_operator(
        arange_op,
        (start_i, end_i, step),
        {"dtype": torch.int32, "device": input_device},
        meta,
        updated=True,
    )
    index_shape = index_positions.data.shape
    # slice_shape is input_shape with dim_norm replaced by W
    # input_shape: [d0, ..., D, ..., d{r-1}]
    # -> slice_shape: [d0, ..., W, ..., d{r-1}]
    slice_shape = list(input_shape)
    slice_shape[dim_norm] = int(index_shape[0])
```
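For reference, the slice length `W` used in `slice_shape` has a closed form, so the shape itself does not require materializing an `arange`. A minimal sketch; the helper name `slice_len` is hypothetical:

```python
def slice_len(start: int, end: int, step: int) -> int:
    """Number of elements in range(start, end, step), i.e. W, for step > 0."""
    if step <= 0:
        raise ValueError(f"step must be positive, got {step}")
    # Ceiling division of (end - start) by step, floored at zero for empty slices.
    return max(0, (end - start + step - 1) // step)
```

For `step == 1` this reduces to `max(0, end - start)`, so `int(end_i - start_i)` in the suggestion equals `W` as long as `start_i` and `end_i` were already clamped so that `end_i >= start_i`; that normalization is assumed here, not shown in the hunk.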
```python
# ---- fast path: contiguous update (step == 1) ----
if step == 1:
```
Copilot AI, Feb 12, 2026
`arange` is always emitted even when `step == 1` (fast path), and the resulting `index_positions` is unused in that branch. If dead-code elimination doesn’t reliably remove it before partitioning/lowering, this can introduce an unnecessary (and potentially unsupported) op in the `step == 1` path. Move `index_positions = arange(...)` (and any dependent shape checks) inside the `step > 1` branch; for `step == 1`, compute the expected slice length directly from `start_i`/`end_i` for validation.
```python
return super().call_operator(op, args, kwargs, meta)

x, dim, start, end, step = args
assert step > 0, "slice_copy step must be positive"
```
Copilot AI, Feb 12, 2026
Avoid using `assert` for input/graph validation in production passes: assertions are stripped when Python runs with optimizations enabled (`-O`), and they produce less actionable failures. Prefer raising `NotImplementedError` or `RuntimeError` with a message consistent with other pass validations (e.g., the way `DecomposeSliceScatterPass` handles `step <= 0`).
Suggested change:

```python
if step <= 0:
    raise RuntimeError(f"slice_copy step must be positive, got {step}")
```
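Along the same lines, validation and normalization of the 5-arg form could live in one place that raises instead of asserting. The sketch below is hypothetical (`normalize_slice_args` is not an existing helper in the pass). It assumes `start`/`end` follow Python slicing rules (negative indices wrap, out-of-range values clamp, `None` means the full extent) and returns the `dim_norm`, `start_i`, `end_i` values that the hunks above refer to.

```python
from typing import Optional, Tuple


def normalize_slice_args(
    rank: int, size: int, dim: int, start: Optional[int], end: Optional[int], step: int
) -> Tuple[int, int, int]:
    """Return (dim_norm, start_i, end_i) with 0 <= start_i <= end_i <= size."""
    # Plain raises keep these checks active under `python -O`, unlike assert.
    if step <= 0:
        raise RuntimeError(f"slice_copy step must be positive, got {step}")
    if not -rank <= dim < rank:
        raise RuntimeError(f"dim {dim} is out of range for rank {rank}")
    dim_norm = dim % rank
    start_i = 0 if start is None else start
    end_i = size if end is None else end
    # Wrap negative indices once, then clamp into [0, size].
    if start_i < 0:
        start_i += size
    if end_i < 0:
        end_i += size
    start_i = min(max(start_i, 0), size)
    # Clamp end into [start_i, size] so that end_i - start_i is never negative.
    end_i = min(max(end_i, start_i), size)
    return dim_norm, start_i, end_i
```

Clamping `end_i` to at least `start_i` also makes `end_i - start_i` a valid slice length for the `step == 1` fast path.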
Commits:
- Arm backend: Add support for aten.slice_copy with non-unit step (Change-Id: Ida60ee2f42c283d50c9e3185dca1f9ea2238cf83)
- Arm backend: Add support for aten.slice_scatter.default (Change-Id: Ic21e995bee8f722f6d03ec3cc3b04747186a8a9b)

Co-authored-by: Rob Elliott <Robert.Elliott@arm.com>
Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Signed-off-by: Rob Elliott <Robert.Elliott@arm.com>
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai