[ET Device Support] DeviceAllocator interface and DeviceAllocatorRegistry by Gasoonjia · Pull Request #17535 · pytorch/executorch

Gasoonjia · 2026-02-18T19:25:54Z

Stack from ghstack (oldest at bottom):

* [ET Device Support] Add NonConstBufferDevice schema for per-buffer device mapping #18330
* __->__ [ET Device Support] DeviceAllocator interface and DeviceAllocatorRegistry #17535
* [ET Device Support] Annotate device attributes of CUDA backend IO tensors cuda device #18080

This diff introduces the `DeviceAllocator` abstract interface and the `DeviceAllocatorRegistry` for device-specific memory allocation. This foundational abstraction lets the runtime dispatch memory operations to the appropriate non-CPU device backend (CUDA, etc.).

**DeviceAllocator interface provides:**

- `init_buffer()` - Initialize memory buffer pools for memory-planned tensors
- `get_offset_address()` - Get pointer to offset within pre-allocated buffer
- `allocate()` / `deallocate()` - Dynamic device memory allocation
- `copy_host_to_device()` / `copy_device_to_host()` - Data transfer between host and device
- `device_type()` - Returns the device type this allocator handles
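
To make the shape of such an interface concrete, here is a minimal sketch. It is not the PR's code: the `sketch` namespace, the `DeviceType` and `Error` enums, every method signature, and `FakeCudaAllocator` are assumptions for illustration. It covers only the allocate, copy, and `device_type()` methods, and the toy backend uses host memory in place of real device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

namespace sketch {

// Hypothetical stand-ins; the real enum and error types may differ.
enum class DeviceType : std::int8_t { CPU = 0, CUDA = 1 };
enum class Error { Ok, InvalidArgument };

// Abstract interface: one implementation per device backend.
// Signatures here are assumed, not taken from the PR.
class DeviceAllocator {
 public:
  virtual ~DeviceAllocator() = default;
  virtual void* allocate(std::size_t nbytes) = 0;
  virtual void deallocate(void* ptr) = 0;
  virtual Error copy_host_to_device(
      void* dst, const void* src, std::size_t nbytes) = 0;
  virtual Error copy_device_to_host(
      void* dst, const void* src, std::size_t nbytes) = 0;
  virtual DeviceType device_type() const = 0;
};

// Toy backend: reports CUDA but stores data in host memory, so the
// sketch runs without a GPU.
class FakeCudaAllocator final : public DeviceAllocator {
 public:
  void* allocate(std::size_t nbytes) override { return std::malloc(nbytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
  Error copy_host_to_device(
      void* dst, const void* src, std::size_t nbytes) override {
    if (dst == nullptr || src == nullptr) return Error::InvalidArgument;
    std::memcpy(dst, src, nbytes);
    return Error::Ok;
  }
  Error copy_device_to_host(
      void* dst, const void* src, std::size_t nbytes) override {
    if (dst == nullptr || src == nullptr) return Error::InvalidArgument;
    std::memcpy(dst, src, nbytes);
    return Error::Ok;
  }
  DeviceType device_type() const override { return DeviceType::CUDA; }
};

// Copies a small buffer to the "device" and back through the interface
// only, and reports whether the data survived the round trip.
inline bool round_trip_ok(DeviceAllocator& alloc) {
  const std::int32_t host_in[4] = {1, 2, 3, 4};
  std::int32_t host_out[4] = {0, 0, 0, 0};
  void* dev = alloc.allocate(sizeof(host_in));
  if (dev == nullptr) return false;
  const bool ok =
      alloc.copy_host_to_device(dev, host_in, sizeof(host_in)) == Error::Ok &&
      alloc.copy_device_to_host(host_out, dev, sizeof(host_in)) == Error::Ok &&
      std::memcmp(host_in, host_out, sizeof(host_in)) == 0;
  alloc.deallocate(dev);
  return ok;
}

}  // namespace sketch
```

Because `round_trip_ok()` only sees the base class, the same caller code would work unchanged with a real device backend, which is the point of the abstraction.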

**DeviceAllocatorRegistry provides:**

- Singleton registry mapping DeviceType → DeviceAllocator
- `register_allocator()` / `get_allocator()` methods
- Fixed-size array indexed by device type (no dynamic allocation, embedded-friendly)
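
A registry along these lines could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `sketch` namespace, the `kNumDeviceTypes` constant, the `bool` return value, and the choice to reject a second registration for the same device type are not confirmed by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical device enum; the count below is assumed to match it.
enum class DeviceType : std::int8_t { CPU = 0, CUDA = 1 };
constexpr std::size_t kNumDeviceTypes = 2;

// Reduced interface: only what the registry needs for this sketch.
class DeviceAllocator {
 public:
  virtual ~DeviceAllocator() = default;
  virtual DeviceType device_type() const = 0;
};

class DeviceAllocatorRegistry {
 public:
  // Singleton access: one registry for the whole process.
  static DeviceAllocatorRegistry& instance() {
    static DeviceAllocatorRegistry registry;
    return registry;
  }

  // Stores a non-owning pointer in the slot for `type`. Returns false for
  // a null pointer, an out-of-range type, or an occupied slot (the last
  // rule is an assumption of this sketch).
  bool register_allocator(DeviceType type, DeviceAllocator* alloc) {
    const std::size_t idx = static_cast<std::size_t>(type);
    if (alloc == nullptr || idx >= kNumDeviceTypes || slots_[idx] != nullptr) {
      return false;
    }
    slots_[idx] = alloc;
    return true;
  }

  // Returns the registered allocator, or nullptr if none is registered.
  DeviceAllocator* get_allocator(DeviceType type) const {
    const std::size_t idx = static_cast<std::size_t>(type);
    return idx < kNumDeviceTypes ? slots_[idx] : nullptr;
  }

 private:
  DeviceAllocatorRegistry() = default;
  // Fixed-size table indexed by device type: no heap allocation.
  DeviceAllocator* slots_[kNumDeviceTypes] = {};
};

// Stub backend used only to exercise the registry.
class StubCudaAllocator final : public DeviceAllocator {
 public:
  DeviceType device_type() const override { return DeviceType::CUDA; }
};

}  // namespace sketch
```

Indexing a fixed array by the enum value keeps lookup constant-time and avoids any container that allocates, at the cost of one pointer-sized slot per possible device type.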

**Design notes:**

- Registry stores raw pointers (non-owning) - allocators are expected to be singletons with static lifetime
- Follows ExecuTorch's embedded-first philosophy (no std::unique_ptr, no heap allocation in registry)
- Convenience free functions `register_device_allocator()` and `get_device_allocator()` for ease of use
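
The lifetime rule in these notes can be illustrated with a short sketch: the allocator is a function-local static, so the raw pointer held by the table stays valid until program exit. The free-function names come from the description above, but their signatures, the `slot()` helper, `StubCudaAllocator`, and the `CudaRegistrar` static-initializer pattern are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical device enum; every value must stay below kNumDeviceTypes
// because this short sketch skips bounds checks.
enum class DeviceType { CPU = 0, CUDA = 1 };
constexpr std::size_t kNumDeviceTypes = 2;

class DeviceAllocator {
 public:
  virtual ~DeviceAllocator() = default;
  virtual DeviceType device_type() const = 0;
};

// Non-owning table of raw pointers with static storage duration.
// It is created on first use, so registration during static
// initialization is safe.
inline DeviceAllocator*& slot(DeviceType type) {
  static DeviceAllocator* table[kNumDeviceTypes] = {};
  return table[static_cast<std::size_t>(type)];
}

// Convenience free functions (signatures assumed).
inline void register_device_allocator(DeviceType type, DeviceAllocator* alloc) {
  slot(type) = alloc;
}
inline DeviceAllocator* get_device_allocator(DeviceType type) {
  return slot(type);
}

// Singleton backend with static lifetime: nothing ever deletes it, so a
// non-owning pointer to it cannot dangle while the program runs.
class StubCudaAllocator final : public DeviceAllocator {
 public:
  static StubCudaAllocator& instance() {
    static StubCudaAllocator alloc;
    return alloc;
  }
  DeviceType device_type() const override { return DeviceType::CUDA; }

 private:
  StubCudaAllocator() = default;
};

// Registers the backend before main() runs.
struct CudaRegistrar {
  CudaRegistrar() {
    register_device_allocator(DeviceType::CUDA, &StubCudaAllocator::instance());
  }
};
static CudaRegistrar cuda_registrar;

}  // namespace sketch
```

With an owning registry (for example one holding `std::unique_ptr`), the table would have to manage destruction order and would typically pull in heap allocation; handing out pointers to static singletons sidesteps both.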

Differential Revision: D93635656

pytorch-bot · 2026-02-18T19:25:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17535

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 71e7d02 with merge base 0cafcb2:

NEW FAILURE - The following job has failed:

Propose to merge ghstack orig PRs to main / Try to create a PR with ghstack /orig branch (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-18T19:27:27Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

…stry

Pull Request resolved: #17535

This diff introduces the `DeviceAllocator` abstract interface and `DeviceAllocatorRegistry` for device-specific memory allocation. This is a foundational abstraction that enables the runtime to dispatch memory operations to the appropriate device backend other than CPU (CUDA, etc.).

**DeviceAllocator interface provides:**
- `allocate()` / `deallocate()` - Dynamic device memory allocation
- `copy_host_to_device()` / `copy_device_to_host()` - Data transfer between host and device
- `device_type()` - Returns the device type this allocator handles

**DeviceAllocatorRegistry provides:**
- Singleton registry mapping DeviceType → DeviceAllocator
- `register_allocator()` / `get_allocator()` methods
- Fixed-size array indexed by device type (no dynamic allocation, embedded-friendly)

**Design notes:**
- Registry stores raw pointers (non-owning) - allocators are expected to be singletons with static lifetime
- Follows ExecuTorch's embedded-first philosophy (no std::unique_ptr, no heap allocation in registry)
- Convenience free functions `register_device_allocator()` and `get_device_allocator()` for ease of use

ghstack-source-id: 350691519
Differential Revision: [D93635656](https://our.internmc.facebook.com/intern/diff/D93635656/)

digantdesai

Review automatically exported from Phabricator review in Meta.

JacobSzwejbka · 2026-04-29T17:18:48Z

+ * A mock DeviceAllocator implementation for testing purposes.
+ * Tracks calls to verify the registry dispatches correctly.
+ */
+class MockDeviceAllocator : public DeviceAllocator {


Not super sure how valuable the mock tests are

…sors cuda device (#18080)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #18330
* #17535
* __->__ #18080

Update cuda backend partitioner to annotate its IO tensors as cuda device

Differential Revision: [D96010436](https://our.internmc.facebook.com/intern/diff/D96010436/)

…vice mapping (#18330)

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* __->__ #18330
* #17535
* #18080

Adds the NonConstBufferDevice table to the FlatBuffer schema (program.fbs) and the corresponding Python dataclass to schema.py. This enables mapping each non-constant planned memory buffer to a specific device type (CPU, CUDA, etc.). The field is optional and absent for CPU-only programs, ensuring zero binary size regression.

Differential Revision: [D97335597](https://our.internmc.facebook.com/intern/diff/D97335597/)

Gasoonjia requested review from JacobSzwejbka and lucylq as code owners February 18, 2026 19:25

Gasoonjia mentioned this pull request Feb 18, 2026

[ET Device Support] Schema changes: device info on Tensor and buffer-level device array #17533

Merged

Gasoonjia mentioned this pull request Feb 18, 2026

[ET Device Support] TensorImpl carries device info #17534

Merged

meta-cla Bot added the CLA Signed label Feb 18, 2026

meta-codesync Bot added the fb-exported and meta-exported labels Feb 18, 2026

Gasoonjia mentioned this pull request Mar 19, 2026

[ET Device Support] Parse device info from serialized tensor in tensor_parser #18328

Merged

Gasoonjia mentioned this pull request Mar 19, 2026

[ET Device Support] Add NonConstBufferDevice schema for per-buffer device mapping #18330

Merged

Gasoonjia added 2 commits March 19, 2026 11:44

Gasoonjia mentioned this pull request Mar 20, 2026

[ET Device Support] Device-aware memory planning: separate buffers per device type #18375

Merged

This was referenced Apr 6, 2026

[ET Device Support] Define AOT device copy ops registry #18728

Merged

[ET Device Support] Define et_copy runtime h2d and d2h copy ops #18729

Open

[ET Device Support] PropagateDevicePass inserts H2D/D2H copy ops at delegate boundaries #18730

Open

digantdesai approved these changes Apr 15, 2026

JacobSzwejbka reviewed Apr 29, 2026

JacobSzwejbka approved these changes Apr 29, 2026

Gasoonjia added 3 commits May 8, 2026 14:45

Gasoonjia merged commit fa44bce into gh/gasoonjia/124/base May 11, 2026
174 of 175 checks passed

Gasoonjia deleted the gh/gasoonjia/124/head branch May 11, 2026 20:21

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 20:21 — with GitHub Actions Failure

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 20:24 — with GitHub Actions Failure

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 20:31 — with GitHub Actions Failure

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 20:37 — with GitHub Actions Failure

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 20:43 — with GitHub Actions Failure

Gasoonjia had a problem deploying to cherry-pick-bot May 11, 2026 21:56 — with GitHub Actions Failure

Gasoonjia merged 12 commits into gh/gasoonjia/124/base from gh/gasoonjia/124/head


3 participants
