🐛 Describe the bug
Wth this command:
./examples/arm/run.sh --model_name=llama
I get this error:
Running e2e flow for model 'llama' with flags '--delegate --quantize '
--------------------------------------------------------------------------------
CALL python3 -m backends.arm.scripts.aot_arm_compiler --model_name=llama --target=ethos-u55-128 --delegate --quantize --intermediate=/home/bpang/mywork/Ethos-U85/executorch/arm_test/llama
--output=/home/bpang/mywork/Ethos-U85/executorch/arm_test/llama/llama_arm_delegate_ethos-u55-128.pte --system_config=Ethos_U55_High_End_Embedded --memory_mode=Shared_Sram --config=Arm/v
ela.ini
[WARNING 2026-05-18 11:00:00,361 aot_arm_compiler.py:134] Using a model from examples/models. Not all of these are currently supported.
I tokenizers:regex.cpp:27] Registering override fallback regex
Checkpoint not provided, using default initialization.
[WARNING 2026-05-18 11:00:07,284 insert_int32_casts_after_int64_placeholders.py:113] Inserting a casting node _to_dim_order_copy_default after tokens to cast int64 placeholder to int32 for
tokens defined in [no stack trace found]
/usr/lib64/python3.11/copyreg.py:105: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
return cls.__new__(cls, *args)
/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py:1306: UserWarning: torch.inf detected in input tensor, ignoring input
warnings.warn("torch.inf detected in input tensor, ignoring input")
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 1127, in <module>
main()
File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 1028, in main
model_quant, edge = _to_edge_TOSA_delegate(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 829, in _to_edge_TOSA_delegate
model_quant, exported_program = quantize_model(
^^^^^^^^^^^^^^^
File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 799, in quantize_model
model_quant = quantize(
^^^^^^^^^
File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 375, in quantize
m(*sample)
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 949, in call_wrapped
return self._wrapped_call(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 461, in __call__
raise e
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 447, in __call__
return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<eval_with_key>.302", line 167, in forward
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py", line 1317, in forward
self.reset_histogram(x, x_min, x_max)
File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py", line 1293, in reset_histogram
new_histogram = torch.histc(x, self.bins, min=min_val, max=max_val) # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.histc: range of [-nan, -nan] is not finite
Versions
Collecting environment information...
PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux 9.5 (Plow) (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version: 19.1.7 (Red Hat, Inc. 19.1.7-2.el9)
CMake version: version 3.26.5
Libc version: glibc-2.34
Python version: 3.11.13 (main, Mar 26 2026, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-11)] (64-bit runtime)
Python platform: Linux-5.14.0-503.35.1.el9_5.x86_64-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8462Y+
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 32
Stepping: 7
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_pe
rfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb av
x512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg rdpid fsrm md_clear flush
_l1d arch_capabilities
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 1.5 MiB (32 instances)
L1d cache: 1.5 MiB (32 instances)
L1i cache: 1 MiB (32 instances)
L2 cache: 64 MiB (32 instances)
L3 cache: 1.9 GiB (32 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==1.2.0
[pip3] numpy==2.4.4
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] pytorch_tokenizers==1.2.0
[pip3] torch==2.11.0
[pip3] torchao==0.17.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani
🐛 Describe the bug
Wth this command:
./examples/arm/run.sh --model_name=llamaI get this error:
Versions
Collecting environment information...
PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux 9.5 (Plow) (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version: 19.1.7 (Red Hat, Inc. 19.1.7-2.el9)
CMake version: version 3.26.5
Libc version: glibc-2.34
Python version: 3.11.13 (main, Mar 26 2026, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-11)] (64-bit runtime)
Python platform: Linux-5.14.0-503.35.1.el9_5.x86_64-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 45 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8462Y+
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 32
Stepping: 7
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_pe
rfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb av
x512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg rdpid fsrm md_clear flush
_l1d arch_capabilities
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 1.5 MiB (32 instances)
L1d cache: 1.5 MiB (32 instances)
L1i cache: 1 MiB (32 instances)
L2 cache: 64 MiB (32 instances)
L3 cache: 1.9 GiB (32 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==1.2.0
[pip3] numpy==2.4.4
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] pytorch_tokenizers==1.2.0
[pip3] torch==2.11.0
[pip3] torchao==0.17.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani