Skip to content

[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568

Open
swjng wants to merge 4 commits into
apache:mainfrom
swjng:fix/llvm-math-overflow
Open

[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568
swjng wants to merge 4 commits into
apache:mainfrom
swjng:fix/llvm-math-overflow

Conversation

@swjng
Copy link
Copy Markdown
Contributor

@swjng swjng commented May 15, 2026

Summary

Six LLVM legalize rules in src/target/llvm/intrin_rule_llvm.cc use inline mathematical identities that fail on representable inputs because the intermediate computation overflows or cancels, even though the true result is in float32 range:

Op Inline form Failure True result
sinh/cosh (#19559) (exp(x) ± exp(-x)) / 2 exp(89) > FLT_MAX, intermediate is inf sinh(89) ≈ 2.24e38
atan (#19560) asin(x / sqrt(x²+1)) overflows for ` x
asinh (#19561) log(x + sqrt(x²+1)) same overflow → log(inf)=inf asinh(3e22) ≈ 52.45
erf (#19562) A&S 1 − poly(t)·exp(−x²) poly·exp(−x²) ≈ 1 for tiny ` x
acosh (no issue) log(x + sqrt(x²−1)) same overflow → inf acosh(3e22) ≈ 52.45

acosh was not in the original issue cluster but shows the identical bug pattern to asinh; folding it in keeps this PR's scope consistent ("naive math identity → libm extern"). Happy to split it out if reviewers prefer.

Fix

Route all six through the existing DispatchPureExtern<FloatSuffix> helper — i.e. sinhf, coshf, atanf, asinhf, acoshf, erff — the same pattern asin/acos use after #19567. ULP-grade accuracy across the reported ranges.

sinh(89.0):    ORT=2.244806e+38  TVM=2.244806e+38  (was inf)
atan(3e22):    ORT=1.5707964     TVM=1.5707963     (was 0.0)
asinh(3e22):   ORT=52.44863      TVM=52.44863      (was inf)
acosh(3e22):   ORT=52.44863      TVM=52.44863      (was inf)
erf(3e-12):    ORT=3.385e-12     TVM=3.385e-12     (was 0.0)

Atan is re-enabled in test_unary; the overflow that previously broke it is fixed.

Notes for reviewers

Inline-vs-extern decision. If the inline identities were a deliberate fast-path (e.g. for autovectorization or to avoid extern-call overhead in tight loops), please flag it and I'll switch to stable inline forms instead — exp(x − ln 2) ± exp(−x − ln 2) for sinh/cosh, range-reduced asinh/acosh sign(x)·log(2|x|) for large |x|, small-|x| Taylor branch for erf, etc. I could not find evidence of such intent in the git history (sinh/cosh: original commit; atan/asinh/acosh: #17945 / #17969 follow-ups; erf: #18104 was framed as "more precise than tanh-approx", not "fast inline").

Fixes #19559.
Fixes #19560.
Fixes #19561.
Fixes #19562.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the LLVM legalization rules for several intrinsic operations, including cosh, sinh, atan, asinh, and erf, by replacing manual mathematical expansions with a unified DispatchPureExtern call. Additionally, the Atan operator has been enabled in the ONNX frontend tests. The reviewer pointed out that the using namespace intrin; statements, local call variable assignments, and manual null checks are redundant in the new implementation because DispatchPureExtern is fully qualified and performs its own internal validation.

Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
Comment thread src/target/llvm/intrin_rule_llvm.cc Outdated
PrimExpr exp_posx = exp(x);
PrimExpr ret = (exp_posx + exp_negx) / two;
return ret;
return ::tvm::codegen::intrin::DispatchPureExtern<::tvm::codegen::intrin::FloatSuffix>(e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question: default.FLowerIntrinsic already handles conversion for these operations, wouldn't simply deleting the llvm.FLegalize registrations achieve the same result as adding redundant DispatchPureExtern rules in intrin_rule_llvm.cc? Are there any other considerations?

TVM_REGISTER_OP("tirx.cosh")
.set_attr<FLowerIntrinsic>("default.FLowerIntrinsic", DispatchPureExtern<FloatSuffix>);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I didn't check the fallback.

swjng added 4 commits May 18, 2026 16:32
…tern

Five LLVM legalize rules used inline mathematical identities that fail
on representable inputs because the intermediate computation overflows
or cancels, even though the true result is in range:

- `sinh`/`cosh`: `(exp(x) ± exp(-x)) / 2` — `exp(89) > FLT_MAX` so
  `exp(x)` itself overflows. True `sinh(89) ≈ 2.24e38` fits in float32.
  (apache#19559)
- `atan`: `asin(x / sqrt(x*x + 1))` — `x*x` overflows for
  `|x| > sqrt(FLT_MAX) ≈ 1.84e19`, then `sqrt(inf) = inf`,
  `x / inf = 0`, `asin(0) = 0`. (apache#19560)
- `asinh`: `log(x + sqrt(x*x + 1))` — same `x*x` overflow. True
  `asinh(3e22) ≈ 52.45`. (apache#19561)
- `erf`: Abramowitz–Stegun `1 - poly(t) * exp(-x*x)` — for small `|x|`,
  `poly * exp(-x*x) ≈ 1` and the subtraction cancels to 0 in float32,
  flushing `erf(3e-12)` to 0 instead of the true `~3.4e-12`. (apache#19562)

All four route through the existing `DispatchPureExtern<FloatSuffix>`
helper (i.e. `sinhf`, `coshf`, `atanf`, `asinhf`, `erff`), the same
pattern already used by `asin`/`acos`. ULP-grade accuracy across
representative ranges; `Atan` is re-enabled in `test_unary` since the
overflow that previously broke it is fixed.

Note for reviewers: if the inline identities were a deliberate
fast-path (e.g. for autovectorization or to avoid extern call overhead
in tight loops), please flag it and I'll switch to stable inline forms
(`exp(x − ln 2) ± exp(−x − ln 2)` for sinh/cosh, range-reduced asinh,
small-x Taylor for erf, etc.). I could not find evidence of such intent
in the git history.

Acosh shows the same `sqrt(x*x − 1)` overflow pattern but is not
covered by any of the listed issues; happy to include it as a follow-up
if maintainers want.

Fixes apache#19559.
Fixes apache#19560.
Fixes apache#19561.
Fixes apache#19562.
`acosh` has the same `sqrt(x*x - 1)` overflow pattern as `asinh`:
intermediate `x*x` overflows float32 for `|x| > sqrt(FLT_MAX) ≈ 1.84e19`,
so `sqrt(inf) = inf`, `log(x + inf) = inf`, while the true result
`acosh(3e22) ≈ 52.45` is well within range. No issue was filed for this
op but the bug is identical to apache#19561 and the fix is the same.
…rappers

Per review feedback: with the inline math identities gone, the
`DispatchPureExtern<FloatSuffix>` call is fully qualified
(`::tvm::codegen::intrin::...`) and the `using namespace intrin;` line
inside each lambda no longer brings anything into scope. Drop it from
the six ops touched in this PR (sinh, cosh, atan, asinh, acosh, erf).
The `CallNode* call` ICHECK is kept for parity with the rest of the
file (every legalize lambda in this translation unit performs that
check).
Delete llvm.FLegalize registrations for sinh, cosh, atan, asinh, acosh,
and erf that simply forwarded to DispatchPureExtern. The default.FLowerIntrinsic
fallback in intrin_rule.cc already provides identical behavior, so these
entries were redundant.
@swjng swjng force-pushed the fix/llvm-math-overflow branch from c3bfcea to aba2270 Compare May 18, 2026 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants