
Conversation

@dawidborycki
Contributor

Before submitting a pull request for a new Learning Path, please review Create a Learning Path

  • I have reviewed Create a Learning Path

Please do not include any confidential information in your contribution. This includes confidential microarchitecture details and unannounced product information.

  • I have checked my contribution for confidential information

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the Creative Commons Attribution 4.0 International License.

@GemmaParis left a comment


The first chapter is the "theory" chapter; perhaps it is too long, but I like your style of writing and the content it conveys. Let's see what the LP team thinks of this. The rest looks great! I have picked up on the need to upgrade from "Arm Compute Library" to "Arm Kleidi Kernels". See comments.

1. Cross-platform support. ORT runs on Windows, Linux, macOS, and mobile operating systems like Android and iOS. It has first-class support for both x86 and Arm64 architectures, making it ideal for deployment on devices ranging from cloud servers to Raspberry Pi boards and smartphones.

2. Hardware acceleration. ORT integrates with a wide range of execution providers (EPs) that tap into hardware capabilities:
* Arm NEON / Arm Compute Library for efficient CPU execution on Arm64.


I would say "Arm Kleidi kernels accelerated with Arm Neon, SVE2 and SME2, for efficient CPU execution on Arm64"
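For readers wondering where an execution provider enters the picture in practice, here is a minimal sketch using the public onnxruntime Python API. The model path is a placeholder, and the printed provider list depends on how ONNX Runtime was built:

```python
import onnxruntime as ort

# Show which execution providers this ONNX Runtime build supports.
# On a default Arm64 CPU-only build this typically prints
# ['CPUExecutionProvider'].
print(ort.get_available_providers())

# Request providers in priority order; ONNX Runtime falls back to the
# next entry for any operator the preferred provider cannot handle.
# "model.onnx" is a placeholder path, not a file from this Learning Path.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CPUExecutionProvider"],
)
```

If I understand the integration correctly, the Kleidi kernels are picked up inside the default CPU execution provider on Arm64, so the selection code above would not need to change after the suggested wording update.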

A typical ONNX workflow looks like this:
1. Train the model. You first use your preferred framework (e.g., PyTorch, TensorFlow, or scikit-learn) to design and train a model. At this stage, you benefit from the flexibility and ecosystem of the framework of your choice.
2. Export to ONNX. Once trained, the model is exported into the ONNX format using built-in converters (such as torch.onnx.export for PyTorch). This produces a portable .onnx file describing the network architecture, weights, and metadata.
3. Run inference with ONNX Runtime. The ONNX model can now be executed on different devices using ONNX Runtime. On Arm64 hardware, ONNX Runtime takes advantage of the Arm Compute Library and NEON instructions, while on Android devices it can leverage NNAPI for mobile accelerators.


"Arm Kleidi kernels accelerated with NEON, SVE2 and SME2 instructions"
