@LaurenzV
Collaborator

I think now is a good time to get this out. I double-checked the changelog and only found one other PR worth mentioning.

@valadaptive
Contributor

Since #159 and #170 are both breaking changes that touch the core of the library, I would've liked to get them in before cutting a release. I suppose it doesn't matter on the technical side of things, and maybe it's better to just land those later and cut another release immediately afterwards. However, if there's a minimum wait time between releases, then I'd like to get both of those in first.

I think fearless_simd is fairly close to dropping the "experimental" warning as well; I believe implementations for all supported architectures are completed now. We should figure out which release we want to officially drop the warning for.

@LaurenzV
Collaborator Author

I wouldn’t mind cutting a v0.5 right after those two PRs are merged. I just think it would be good to have a “checkpoint” of everything up to now: as we’ve seen, #159 is a bigger change that seems to have performance impacts in certain cases, so it would be good to have this release to fall back on in case we notice other problems. (I’m sure it will be fine, but I don’t think it hurts either.)

@valadaptive
Contributor

> I just think it would be good to have a “checkpoint” up until now because as we’ve seen #159 is a bigger change that seems to have performance impacts in certain cases, so it would be good to have this release to fall back on in case we notice other problems (I’m sure it will be fine! but I don’t think it hurts either.)

The performance impacts of #159 are part of the reason I want to get it in before v0.4. Say the v0.4 release announcement gets posted to Reddit or Hacker News, people go "neat, looks like fearless_simd is actually usable now" and start using it in their libraries and optimizing around it, and then we release v0.5 and everything needs to be re-tuned.

IMO, vello_cpu is probably experiencing weird performance changes because it's already been heavily tuned on the autovectorization-based implementation. I would rather figure out the real causes of the performance regressions. The ones we've tracked down so far are things we probably shouldn't have done in the first place: forgetting to call vectorize, deinterleaving data then immediately re-interleaving it, converting an array to a vector then immediately converting it back to an array.

FWIW, I had to rearrange some code when porting my own project to fearless_simd as well, despite already using native vector types and performing the exact same operations that mapped to the exact same instructions. LLVM just decided to schedule the instructions differently and make things 10% slower. Unfortunately, I think the optimizer is just inherently fickle.

Maybe we could just re-add Level::fallback as a deprecated alias of Level::baseline, and madd/msub as deprecated aliases of mul_add/mul_sub, to make it easier to revert to v0.3 if need be?
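
For illustration, deprecated aliases along those lines could look roughly like the sketch below. This is not fearless_simd's actual code: the `Level` enum, the `FusedOps` trait, the `Lane` type, and the `since` version are hypothetical stand-ins, and only the names `fallback`/`baseline`, `madd`/`msub`, and `mul_add`/`mul_sub` come from the discussion above. The only real mechanism used is Rust's built-in `#[deprecated]` attribute.

```rust
// Hypothetical stand-in for the library's level type (not the real definition).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Level {
    Baseline,
}

impl Level {
    /// The new name.
    pub fn baseline() -> Self {
        Level::Baseline
    }

    /// The old name, kept as a thin forwarding alias that warns at call sites.
    #[deprecated(since = "0.4.0", note = "use `Level::baseline` instead")]
    pub fn fallback() -> Self {
        Self::baseline()
    }
}

// Hypothetical trait standing in for wherever mul_add/mul_sub actually live.
pub trait FusedOps: Sized {
    /// Computes `self * a + b`.
    fn mul_add(self, a: Self, b: Self) -> Self;
    /// Computes `self * a - b`.
    fn mul_sub(self, a: Self, b: Self) -> Self;

    /// Old name for `mul_add`; the default body forwards, so implementors
    /// do not need to write anything extra.
    #[deprecated(since = "0.4.0", note = "renamed to `mul_add`")]
    fn madd(self, a: Self, b: Self) -> Self {
        self.mul_add(a, b)
    }

    /// Old name for `mul_sub`.
    #[deprecated(since = "0.4.0", note = "renamed to `mul_sub`")]
    fn msub(self, a: Self, b: Self) -> Self {
        self.mul_sub(a, b)
    }
}

// Hypothetical one-lane "vector" so the sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Lane(pub f32);

impl FusedOps for Lane {
    fn mul_add(self, a: Self, b: Self) -> Self {
        Lane(self.0 * a.0 + b.0)
    }

    fn mul_sub(self, a: Self, b: Self) -> Self {
        Lane(self.0 * a.0 - b.0)
    }
}
```

With aliases like these, code written against the old names keeps compiling and only emits deprecation warnings, which would make bisecting between v0.3 and a newer release less painful.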

@LaurenzV marked this pull request as draft on December 16, 2025, 18:28