Fail to build 'llama' using examples/arm/run.sh #19647

### 🐛 Describe the bug

With this command:

`./examples/arm/run.sh --model_name=llama`

I get this error:

```
Running e2e flow for model 'llama' with flags '--delegate --quantize '
--------------------------------------------------------------------------------
CALL python3 -m backends.arm.scripts.aot_arm_compiler --model_name=llama --target=ethos-u55-128 --delegate --quantize  --intermediate=/home/bpang/mywork/Ethos-U85/executorch/arm_test/llama --output=/home/bpang/mywork/Ethos-U85/executorch/arm_test/llama/llama_arm_delegate_ethos-u55-128.pte --system_config=Ethos_U55_High_End_Embedded --memory_mode=Shared_Sram   --config=Arm/vela.ini
[WARNING 2026-05-18 11:00:00,361 aot_arm_compiler.py:134] Using a model from examples/models. Not all of these are currently supported.
I tokenizers:regex.cpp:27] Registering override fallback regex
Checkpoint not provided, using default initialization.
[WARNING 2026-05-18 11:00:07,284 insert_int32_casts_after_int64_placeholders.py:113] Inserting a casting node _to_dim_order_copy_default after tokens to cast int64 placeholder to int32 for tokens defined in [no stack trace found]
/usr/lib64/python3.11/copyreg.py:105: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
  return cls.__new__(cls, *args)
/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py:1306: UserWarning: torch.inf detected in input tensor, ignoring input
  warnings.warn("torch.inf detected in input tensor, ignoring input")
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 1127, in <module>
    main()
  File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 1028, in main
    model_quant, edge = _to_edge_TOSA_delegate(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 829, in _to_edge_TOSA_delegate
    model_quant, exported_program = quantize_model(
                                    ^^^^^^^^^^^^^^^
  File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 799, in quantize_model
    model_quant = quantize(
                  ^^^^^^^^^
  File "/home/bpang/mywork/Ethos-U85/executorch/backends/arm/scripts/aot_arm_compiler.py", line 375, in quantize
    m(*sample)
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 949, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 461, in __call__
    raise e
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/fx/graph_module.py", line 447, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<eval_with_key>.302", line 167, in forward
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py", line 1317, in forward
    self.reset_histogram(x, x_min, x_max)
  File "/home/bpang/tosa-env/lib64/python3.11/site-packages/torchao/quantization/pt2e/observer.py", line 1293, in reset_histogram
    new_histogram = torch.histc(x, self.bins, min=min_val, max=max_val)  # type: ignore[arg-type]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torch.histc: range of [-nan, -nan] is not finite

```
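
For reference, the final `RuntimeError` can be reproduced in isolation. This is only a minimal sketch, assuming (not confirmed) that a NaN reaches the observer's input during calibration. The helper `try_histc` and the sample tensors are hypothetical and are not part of `aot_arm_compiler.py` or torchao; it only shows that a NaN in the input makes the min/max range NaN, which `torch.histc` then rejects.

```python
import torch


def try_histc(x: torch.Tensor, bins: int = 8) -> str:
    """Hypothetical helper: build a histogram over the tensor's own min/max range."""
    # torch.aminmax propagates NaN, so one NaN element makes both bounds NaN.
    lo, hi = torch.aminmax(x)
    try:
        torch.histc(x, bins, min=lo.item(), max=hi.item())
    except RuntimeError as err:
        return str(err)
    return "ok"


# Made-up sample data, not taken from the llama model.
finite = torch.tensor([0.0, 1.0, 2.0])
with_nan = torch.tensor([0.0, float("nan"), 2.0])

msg_finite = try_histc(finite)
msg_nan = try_histc(with_nan)
print(msg_finite)
print(msg_nan)
```

If this matches what happens in the real run, checking the calibration inputs and intermediate activations with `torch.isfinite` before the `m(*sample)` call might help locate where the NaN first appears.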

### Versions

Collecting environment information...
PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux 9.5 (Plow) (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version: 19.1.7 (Red Hat, Inc. 19.1.7-2.el9)
CMake version: version 3.26.5
Libc version: glibc-2.34

Python version: 3.11.13 (main, Mar 26 2026, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-11)] (64-bit runtime)
Python platform: Linux-5.14.0-503.35.1.el9_5.x86_64-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        45 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               32
On-line CPU(s) list:                  0-31
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Xeon(R) Platinum 8462Y+
CPU family:                           6
Model:                                85
Thread(s) per core:                   1
Core(s) per socket:                   1
Socket(s):                            32
Stepping:                             7
BogoMIPS:                             5599.99
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg rdpid fsrm md_clear flush_l1d arch_capabilities
Hypervisor vendor:                    VMware
Virtualization type:                  full
L1d cache:                            1.5 MiB (32 instances)
L1i cache:                            1 MiB (32 instances)
L2 cache:                             64 MiB (32 instances)
L3 cache:                             1.9 GiB (32 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-31
Vulnerability Gather data sampling:   Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:          KVM: Mitigation: VMX unsupported
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] executorch==1.2.0
[pip3] numpy==2.4.4
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] pytorch_tokenizers==1.2.0
[pip3] torch==2.11.0
[pip3] torchao==0.17.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0


cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani
