Speed up host-arm workflow via native execution on Arm64 runners #318
DrXiao wants to merge 2 commits into sysprog21:master
|
By the way, I have another branch, The commit and the test results are provided as follows: In summary, although using |
mk/arm.mk (outdated)
@@ -1,4 +1,4 @@
-ARCH_NAME = armv7l
+ARCH_NAME = armv7l aarch64
There was a problem hiding this comment.
This is confusing. Not all Arm64 machines support the A32 ISA; therefore, the supported instruction sets must be validated explicitly, without assuming that Arm64 implies Arm32 compatibility.
There was a problem hiding this comment.
After further investigation, it appears that there is no way to validate whether a machine supports the A32 ISA using shell commands.
Therefore, I would like to propose an alternative approach: introducing a new variable, FORCE_NATIVE, to control whether native execution should be enforced. For example:
$ make # Let the build system automatically determine the proper way to run
$ make FORCE_NATIVE=1 # Enforce native execution
By default, the build system performs native execution only when the machine architecture is armv7l; otherwise, QEMU is used to run the generated executables. However, if a user has confirmed that their machine can use native execution, they can manually set FORCE_NATIVE=1 to enforce it.
For Arm64 runners, I found a reference on the Arm Developer website. Although GitHub's official documentation does not describe the hardware details of Arm64 runners, the referenced page states:
Arm-hosted runners are powered by Cobalt 100 processors, based on the Arm Neoverse N2. The free runners have 4 vCPUs and Armv9-A features including Scalable Vector Extension 2 (SVE2).
From this information, we can conclude that GitHub's Arm64 runners use Arm Neoverse N2 processors to run workflows on VMs. According to the Arm Neoverse N2 specifications, the supported ISAs explicitly include A32.
Thus, if the proposed FORCE_NATIVE approach is acceptable, we can directly add a workflow step to run make FORCE_NATIVE=1, allowing native execution on Arm64 runners. As a result, the speedup can still be achieved.
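The decision rule proposed above can be sketched in shell as follows. This is only an illustration of the intended behavior, not the actual build-system change: the helper name `choose_runner` and the printed labels `native` and `qemu-arm` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: choose how to run the generated Arm32 executables.
#   $1: machine architecture, e.g. the output of `uname -m`
#   $2: value of FORCE_NATIVE ("1" enforces native execution; defaults to 0)
# Prints "native" or "qemu-arm".
choose_runner() {
    arch="$1"
    force_native="${2:-0}"
    if [ "$arch" = "armv7l" ] || [ "$force_native" = "1" ]; then
        echo "native"
    else
        echo "qemu-arm"
    fi
}

# Decide for the current machine, honoring FORCE_NATIVE from the environment.
choose_runner "$(uname -m)" "${FORCE_NATIVE:-0}"
```

Under this sketch, an aarch64 host falls back to QEMU unless FORCE_NATIVE=1 is set, which matches the default described in this comment.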
There was a problem hiding this comment.
The above description represents my initial view. I can research further and provide more information if necessary.
There was a problem hiding this comment.
We can introduce an allow list of machines with verified A32 support, eliminating the need for the manual FORCE_NATIVE option.
The original build system automatically uses qemu-arm to run the compiler when targeting Arm32 if the native environment is not armv7l. However, modern Linux distributions typically enable CONFIG_COMPAT, which allows a 64-bit system to execute 32-bit binaries directly without any emulator. Therefore, this commit updates the build system to detect the user's machine model and enable native execution when it is supported.
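A rough shell sketch of an allow-list check of this kind is shown below. The list entries are the cores named in ALLOW_MACHINES in this PR; everything else is an assumption made for illustration, including the helper names `is_allowed` and `detect_model` and the use of the "Model name:" field of `lscpu` as the source of the CPU model.

```shell
#!/bin/sh
# Cores listed in ALLOW_MACHINES in this PR.
ALLOW_MACHINES="Cortex-A8 Cortex-A53 Cortex-A72 Cortex-A76 Neoverse-N2"

# Hypothetical helper: succeed if the CPU model name ($1) is in the allow list.
is_allowed() {
    for m in $ALLOW_MACHINES; do
        if [ "$m" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Assumption: the model name is taken from the "Model name:" field of lscpu.
# Prints nothing when lscpu is unavailable. On systems with several core
# types, only the first reported model is used.
detect_model() {
    command -v lscpu >/dev/null 2>&1 || return 0
    lscpu | sed -n 's/^[[:space:]]*Model name:[[:space:]]*//p' | head -n 1
}

if is_allowed "$(detect_model)"; then
    echo "native"
else
    echo "qemu-arm"
fi
```

An unknown or empty model name falls through to QEMU, so the sketch only enables native execution for cores that have been verified.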
47f6100 to 8417f89
Since the build system has been updated to allow native execution on AArch64 when targeting Arm32, this commit improves the workflow definitions by running the "host-arm" job on Arm64 runners to perform validation. As a result of these changes, the "host-arm" job no longer uses "run-on-arch-action".
8417f89 to 30b1d7a
|
The build system has been updated to detect the user's machine using I have verified the allow list approach on the following hardware platforms:
After adopting the new approach, the above platforms are still able to perform native execution. |
# | Neoverse-N2 | ARMv9-A      |                 | GitHub's           |
# |             |              |                 | Arm-hosted runners |
# +-------------+--------------+-----------------+--------------------+
ALLOW_MACHINES = Cortex-A8 Cortex-A53 Cortex-A72 Cortex-A76 Neoverse-N2
There was a problem hiding this comment.
We usually refer to Raspberry Pi as a machine, while Arm Cortex-A72 is treated as an architecture. In this context, machine denotes a concrete hardware platform, distinguishing it from architectural descriptions or CPU cores (e.g. Cortex-A53) and their supported hardware features.
There was a problem hiding this comment.
If we hope that the allowlist is defined in terms of concrete machines (e.g.: ALLOW_MACHINES = "Raspberry Pi 5" "BeagleBone Black"), the only way to identify the running machine appears to be reading /proc/device-tree/model.
# BeagleBone Black
$ cat /proc/device-tree/model
TI AM335x BeagleBone Black
# Raspberry Pi 5
$ cat /proc/device-tree/model
Raspberry Pi 5 Model B Rev 1.0
# On desktop computers or Arm-hosted runners, /proc/device-tree/model does not exist
Then, we can use /proc/device-tree/model to determine whether the running machine is in the allowlist.
Although this approach may enable Raspberry Pi and BeagleBone boards to perform native execution, it does not work for Arm-hosted runners because a reliable machine name cannot be obtained on those runners.
(It seems difficult to find a consistent way to obtain the running machine name across different hardware platforms.)
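For illustration, a machine-based check along these lines might look like the sketch below. The helper names `read_model` and `is_allowed_machine` are hypothetical, and the two board names are only the examples mentioned in this comment, not a verified list.

```shell
#!/bin/sh
# Hypothetical helper: print the board model, or nothing if the file is absent.
#   $1: path to the model file (defaults to /proc/device-tree/model)
read_model() {
    model_file="${1:-/proc/device-tree/model}"
    if [ -r "$model_file" ]; then
        # The device-tree string is NUL-terminated; strip the terminator.
        tr -d '\0' < "$model_file"
    fi
}

# Hypothetical helper: succeed if the model ($1) contains an allow-listed name.
# Substring matching is used because the file carries extra text, e.g.
# "TI AM335x BeagleBone Black" or "Raspberry Pi 5 Model B Rev 1.0".
is_allowed_machine() {
    case "$1" in
        *"Raspberry Pi 5"*|*"BeagleBone Black"*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_allowed_machine "$(read_model)"; then
    echo "native"
else
    echo "qemu-arm"
fi
```

On hosts without /proc/device-tree/model, `read_model` prints nothing and the sketch falls back to QEMU, which is the limitation described above for Arm-hosted runners.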
There was a problem hiding this comment.
If we hope that the allowlist is defined in terms of concrete machines (e.g.: ALLOW_MACHINES = "Raspberry Pi 5" "BeagleBone Black"), the only way to identify the running machine appears to be reading /proc/device-tree/model.
...
On desktop computers or Arm-hosted runners, /proc/device-tree/model does not exist
Instead, you can simply install a prebuilt fastfetch executable to determine machine names/types.
The proposed changes modify the build system to allow native execution on AArch64 when targeting Arm32, and update the workflow definitions for host-arm to use Arm64 runners and perform native bootstrapping and other validations.
Advantages
The original workflow of host-arm uses run-on-arch-action to validate whether bootstrapping and test cases run successfully. However, this job often takes a long time (6 ~ 8 minutes) because it relies on QEMU(*) to perform the build and run the test cases.
(*) If I understand correctly, run-on-arch-action actually uses qemu-system to execute the specified commands, which makes the emulation slow.
After applying the proposed changes, the job can be sped up via native execution on Arm64 runners. According to the results, host-arm can complete bootstrapping and test case validations within 2 minutes.
As a result, the workflow is noticeably faster.
Summary by cubic
Run the host-arm workflow natively on Arm64 runners by enabling Arm32 execution on AArch64. This removes QEMU emulation and cuts the job from ~6–8 minutes to ~2 minutes.
Written for commit 30b1d7a. Summary will update on new commits.