[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568
[BugFix][Target][LLVM] Route sinh/cosh/atan/asinh/erf through libm extern#19568swjng wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the LLVM legalization rules for several intrinsic operations, including cosh, sinh, atan, asinh, and erf, by replacing manual mathematical expansions with a unified DispatchPureExtern call. Additionally, the Atan operator has been enabled in the ONNX frontend tests. The reviewer pointed out that the using namespace intrin; statements, local call variable assignments, and manual null checks are redundant in the new implementation because DispatchPureExtern is fully qualified and performs its own internal validation.
| PrimExpr exp_posx = exp(x); | ||
| PrimExpr ret = (exp_posx + exp_negx) / two; | ||
| return ret; | ||
| return ::tvm::codegen::intrin::DispatchPureExtern<::tvm::codegen::intrin::FloatSuffix>(e); |
There was a problem hiding this comment.
I have a question: default.FLowerIntrinsic already handles conversion for these operations, wouldn't simply deleting the llvm.FLegalize registrations achieve the same result as adding redundant DispatchPureExtern rules in intrin_rule_llvm.cc? Are there any other considerations?
Lines 78 to 79 in 4b93f20
There was a problem hiding this comment.
You're right, I didn't check the fallback.
…tern Five LLVM legalize rules used inline mathematical identities that fail on representable inputs because the intermediate computation overflows or cancels, even though the true result is in range: - `sinh`/`cosh`: `(exp(x) ± exp(-x)) / 2` — `exp(89) > FLT_MAX` so `exp(x)` itself overflows. True `sinh(89) ≈ 2.24e38` fits in float32. (apache#19559) - `atan`: `asin(x / sqrt(x*x + 1))` — `x*x` overflows for `|x| > sqrt(FLT_MAX) ≈ 1.84e19`, then `sqrt(inf) = inf`, `x / inf = 0`, `asin(0) = 0`. (apache#19560) - `asinh`: `log(x + sqrt(x*x + 1))` — same `x*x` overflow. True `asinh(3e22) ≈ 52.45`. (apache#19561) - `erf`: Abramowitz–Stegun `1 - poly(t) * exp(-x*x)` — for small `|x|`, `poly * exp(-x*x) ≈ 1` and the subtraction cancels to 0 in float32, flushing `erf(3e-12)` to 0 instead of the true `~3.4e-12`. (apache#19562) All four route through the existing `DispatchPureExtern<FloatSuffix>` helper (i.e. `sinhf`, `coshf`, `atanf`, `asinhf`, `erff`), the same pattern already used by `asin`/`acos`. ULP-grade accuracy across representative ranges; `Atan` is re-enabled in `test_unary` since the overflow that previously broke it is fixed. Note for reviewers: if the inline identities were a deliberate fast-path (e.g. for autovectorization or to avoid extern call overhead in tight loops), please flag it and I'll switch to stable inline forms (`exp(x − ln 2) ± exp(−x − ln 2)` for sinh/cosh, range-reduced asinh, small-x Taylor for erf, etc.). I could not find evidence of such intent in the git history. Acosh shows the same `sqrt(x*x − 1)` overflow pattern but is not covered by any of the listed issues; happy to include it as a follow-up if maintainers want. Fixes apache#19559. Fixes apache#19560. Fixes apache#19561. Fixes apache#19562.
`acosh` has the same `sqrt(x*x - 1)` overflow pattern as `asinh`: intermediate `x*x` overflows float32 for `|x| > sqrt(FLT_MAX) ≈ 1.84e19`, so `sqrt(inf) = inf`, `log(x + inf) = inf`, while the true result `acosh(3e22) ≈ 52.45` is well within range. No issue was filed for this op but the bug is identical to apache#19561 and the fix is the same.
…rappers Per review feedback: with the inline math identities gone, the `DispatchPureExtern<FloatSuffix>` call is fully qualified (`::tvm::codegen::intrin::...`) and the `using namespace intrin;` line inside each lambda no longer brings anything into scope. Drop it from the six ops touched in this PR (sinh, cosh, atan, asinh, acosh, erf). The `CallNode* call` ICHECK is kept for parity with the rest of the file (every legalize lambda in this translation unit performs that check).
Delete llvm.FLegalize registrations for sinh, cosh, atan, asinh, acosh, and erf that simply forwarded to DispatchPureExtern. The default.FLowerIntrinsic fallback in intrin_rule.cc already provides identical behavior, so these entries were redundant.
c3bfcea to
aba2270
Compare
Summary
Six LLVM legalize rules in
src/target/llvm/intrin_rule_llvm.ccuse inline mathematical identities that fail on representable inputs because the intermediate computation overflows or cancels, even though the true result is infloat32range:sinh/cosh(#19559)(exp(x) ± exp(-x)) / 2exp(89) > FLT_MAX, intermediate isinfsinh(89) ≈ 2.24e38atan(#19560)asin(x / sqrt(x²+1))x²overflows for `asinh(#19561)log(x + sqrt(x²+1))x²overflow →log(inf)=infasinh(3e22) ≈ 52.45erf(#19562)1 − poly(t)·exp(−x²)poly·exp(−x²) ≈ 1for tiny `acosh(no issue)log(x + sqrt(x²−1))x²overflow →infacosh(3e22) ≈ 52.45acoshwas not in the original issue cluster but shows the identical bug pattern toasinh; folding it in keeps this PR's scope consistent ("naive math identity → libm extern"). Happy to split it out if reviewers prefer.Fix
Route all six through the existing
DispatchPureExtern<FloatSuffix>helper — i.e.sinhf,coshf,atanf,asinhf,acoshf,erff— the same patternasin/acosuse after #19567. ULP-grade accuracy across the reported ranges.Atanis re-enabled intest_unary; the overflow that previously broke it is fixed.Notes for reviewers
Inline-vs-extern decision. If the inline identities were a deliberate fast-path (e.g. for autovectorization or to avoid extern-call overhead in tight loops), please flag it and I'll switch to stable inline forms instead —
exp(x − ln 2) ± exp(−x − ln 2)for sinh/cosh, range-reduced asinh/acoshsign(x)·log(2|x|)for large|x|, small-|x|Taylor branch for erf, etc. I could not find evidence of such intent in the git history (sinh/cosh: original commit; atan/asinh/acosh: #17945 / #17969 follow-ups; erf: #18104 was framed as "more precise than tanh-approx", not "fast inline").Fixes #19559.
Fixes #19560.
Fixes #19561.
Fixes #19562.