Add fast approximate reciprocal methods for float vectors by tomcur · Pull Request #204 · linebender/fearless_simd

tomcur · 2026-02-22T14:19:01Z

x86 and AArch64 have instructions to calculate fast approximate reciprocals, and these can speed up some algorithms quite nicely. For example, sprinkling this into Vello's flatten_simd.rs reduces flattening timings for GhostScript Tiger by 4%. (Actually landing that there requires a bit of thought about whether the lowered precision is acceptable, of course!)

There is some detail here that this PR as-is doesn't attempt to solve. x86's rcp has about 12 bits of precision, AArch64's vrecpe about 8 bits. However, AArch64 has an additional instruction, vrecps, to perform a Newton refinement step, which bumps the precision to about 16 bits. That would look something like the following.

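let x0 = vrecpeq_f32(a); // initial estimate, about 8 bits of precision
// vrecpsq_f32(a, x0) yields (2 - a * x0), so this calculates x0 * (2 - a * x0),
// roughly doubling the precision of the `x0` estimate
let x1 = vmulq_f32(x0, vrecpsq_f32(a, x0));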
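To see why one refinement step roughly doubles the number of correct bits: if `x0 = (1 - e) / a`, then `x0 * (2 - a * x0) = (1 - e^2) / a`, so the relative error is squared. Below is a minimal scalar sketch of that step. The helper names (`coarse_recip`, `newton_step`, `rel_error`) are hypothetical and not part of this PR, and `coarse_recip` only simulates a low-precision hardware estimate by truncating mantissa bits; it is not what `vrecpe` actually computes.

```rust
/// Simulates a coarse reciprocal estimate by computing the exact `f32`
/// reciprocal and keeping only the top `bits` mantissa bits.
/// (Illustrative stand-in for a hardware estimate; an assumption.)
fn coarse_recip(a: f32, bits: u32) -> f32 {
    assert!(bits <= 23);
    let exact = 1.0 / a;
    // Clear the low (23 - bits) mantissa bits; sign and exponent are kept.
    let mask = !((1u32 << (23 - bits)) - 1);
    f32::from_bits(exact.to_bits() & mask)
}

/// One Newton-Raphson refinement step for the reciprocal of `a`:
/// x1 = x0 * (2 - a * x0).
fn newton_step(a: f32, x0: f32) -> f32 {
    x0 * (2.0 - a * x0)
}

/// Relative error of `x` as an approximation of `1 / a`, evaluated in `f64`.
fn rel_error(a: f32, x: f32) -> f64 {
    ((a as f64) * (x as f64) - 1.0).abs()
}
```

Calling `newton_step(a, coarse_recip(a, 8))` and comparing `rel_error` before and after the step shows the error dropping from below 2^-8 to roughly its square, plus a small amount of `f32` rounding.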

Then, AVX512 introduces rcp14, which allows calculating to 14-bit precision with (I believe) the same performance as rcp, and extends support to f64.

In any case, this method does the simplest thing of just exposing the cheapest hardware estimate, similar to e.g. Highway's ApproximateReciprocal.
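As a rough illustration of what "exposing the cheapest hardware estimate" means on x86, here is a standalone sketch using the SSE `_mm_rcp_ps` intrinsic from `std::arch`. The function name `approx_recip4` and the array-based signature are assumptions made for this example; they are not fearless_simd's API, and the non-x86 branch simply falls back to exact division.

```rust
/// Approximate reciprocal of four lanes using the SSE `rcpps` estimate.
/// (Hypothetical helper for illustration only.)
#[cfg(target_arch = "x86_64")]
fn approx_recip4(a: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_loadu_ps, _mm_rcp_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    // SAFETY: SSE is part of the x86_64 baseline, and both pointers refer to
    // arrays of exactly four `f32`s; the unaligned load/store variants are used.
    unsafe {
        let v = _mm_loadu_ps(a.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_rcp_ps(v));
    }
    out
}

/// Fallback for other targets: exact division, lane by lane.
#[cfg(not(target_arch = "x86_64"))]
fn approx_recip4(a: [f32; 4]) -> [f32; 4] {
    a.map(|x| 1.0 / x)
}
```

Callers should treat the result as accurate to only about 12 bits on x86, and should not expect bit-identical results across targets.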


LaurenzV · 2026-02-23T13:45:58Z

fearless_simd/src/generated/fallback.rs

    }
    #[inline(always)]
+    fn approximate_recip_f32x4(self, a: f32x4<Self>) -> f32x4<Self> {
+        self.splat_f32x4(1.0) / a


I haven't tried it, but does division work without splatting? I think it works for multiplication at least.
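For context on the question above: in Rust, writing `1.0 / a` with a scalar on the left only compiles if there is an `impl Div<Vector> for f32`. The sketch below shows how such an impl can be written for a toy type. `V4` and `splat` are hypothetical names for this illustration; whether fearless_simd provides the equivalent impl for its own vector types is not established here.

```rust
use std::ops::Div;

/// Toy four-lane vector, used only for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
struct V4([f32; 4]);

impl V4 {
    /// Broadcasts a scalar to all four lanes.
    fn splat(x: f32) -> Self {
        V4([x; 4])
    }
}

/// Lane-wise vector / vector division.
impl Div for V4 {
    type Output = V4;
    fn div(self, rhs: V4) -> V4 {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] / rhs.0[i];
        }
        V4(out)
    }
}

/// Scalar / vector division: splats the scalar, then divides lane-wise.
/// This impl is what makes `1.0_f32 / v` compile without an explicit splat.
impl Div<V4> for f32 {
    type Output = V4;
    fn div(self, rhs: V4) -> V4 {
        V4::splat(self) / rhs
    }
}
```

With both impls in place, `1.0_f32 / v` and `V4::splat(1.0) / v` produce the same result.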

LaurenzV · 2026-02-23T13:47:17Z

fearless_simd/src/generated/wasm.rs

    }
    #[inline(always)]
+    fn approximate_recip_f32x4(self, a: f32x4<Self>) -> f32x4<Self> {
+        self.div_f32x4(self.splat_f32x4(1.0), a)


Same comment as in fallback

LaurenzV · 2026-02-23T13:49:13Z

fearless_simd/src/generated/avx2.rs

        unsafe { _mm_sqrt_ps(a.into()).simd_into(self) }
    }
    #[inline(always)]
+    fn approximate_recip_f32x4(self, a: f32x4<Self>) -> f32x4<Self> {


I'm wondering whether we should just spell reciprocal out. But should be fine this way!

Wondered the same thing, but decided to mirror e.g. f32::recip.

tomcur force-pushed the approximate-recip branch from 691c363 to 52520f7 on February 22, 2026 14:20

Make tests consistent

6bdfe8d

LaurenzV approved these changes Feb 23, 2026

tomcur wants to merge 2 commits into linebender:main from tomcur:approximate-recip
