Introduce VCL, SIMD wrappers, and Vectorized RasterizeSDF(Spheres) #2190

Open
Idclip wants to merge 7 commits into AcademySoftwareFoundation:master from Idclip:vcl_simd

Idclip commented Apr 8, 2026

This PR proposes bringing in Agner Fog's vectorclass (VCL) library as an internal but optional dependency on x86/x86_64 architectures. It then introduces further infrastructure to improve and tidy our x86 intrinsic usage and ISA targeting, along with an additional wrapper header, openvdb/simd/Simd.h, which wraps the VCL Vec containers within an openvdb::simd namespace. This second level of wrapping exists to:

  • Abstract away the underlying SIMD container types should we want to use a different library in the future (e.g. std::simd)
  • Provide a namespace for transient selection of other architectures in the future (ARM/NEON, etc.)
  • Allow us to implement a generic API for non-vectorized or non-x86 builds that instead works on Tuples of values, so that algorithms can be written with a single implementation for both build types.
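As a rough illustration of the third point, the sketch below shows how a tuple-style fallback container and a templated algorithm can share one implementation. Everything here is hypothetical: the `sketch::simd` namespace, the `Pack` type and `lengthSqr` are invented for illustration and are not the API in openvdb/simd/Simd.h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {
namespace simd {

// Hypothetical fallback container: N lanes held in a std::array. In a
// vectorized build an alias could instead name a hardware-backed type
// (for example VCL's Vec4d); that switch is an assumption and not shown.
template <typename T, std::size_t N>
struct Pack
{
    std::array<T, N> lanes{};

    Pack() = default;
    explicit Pack(T v) { lanes.fill(v); } // broadcast one value to all lanes

    T& operator[](std::size_t i) { assert(i < N); return lanes[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return lanes[i]; }
};

// Lane-wise arithmetic written as plain fixed-length loops.
template <typename T, std::size_t N>
Pack<T, N> operator+(const Pack<T, N>& a, const Pack<T, N>& b)
{
    Pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

template <typename T, std::size_t N>
Pack<T, N> operator*(const Pack<T, N>& a, const Pack<T, N>& b)
{
    Pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
    return r;
}

} // namespace simd
} // namespace sketch

// One implementation: works for a scalar (double) and for any type that
// provides lane-wise operator+ and operator*.
template <typename V>
V lengthSqr(const V& x, const V& y, const V& z)
{
    return x * x + y * y + z * z;
}
```

Calling `lengthSqr(1.0, 2.0, 3.0)` uses the scalar path, while passing three `sketch::simd::Pack<double, 4>` values evaluates four lanes at once through the same function body.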

Note that the latter point is crucial: it encourages us to write code that is better suited to SIMD concepts, regardless of whether VCL/ISA targeting is in use. Many tools in VDB are inherently memory bound; that is, lots of data reading, little computation. Even when explicitly disabling compiler vectorization for specific x86 ISAs, there are notable performance improvements to be observed in many methods simply by restructuring inner loops to work on multiple components (i.e. many inner loops vs many outer loops).
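To make the loop restructuring concrete, here is a minimal sketch (not code from this PR) that contrasts a one-element-per-iteration loop with a blocked version whose body is a series of short fixed-width inner loops. The function names and the `Width` parameter are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline: one element per outer iteration.
void scaleAddScalar(const std::vector<double>& a, const std::vector<double>& b,
                    double s, std::vector<double>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * s + b[i];
}

// Restructured: each outer iteration handles Width elements, and every
// stage of the computation is its own small fixed-length inner loop.
template <std::size_t Width>
void scaleAddBlocked(const std::vector<double>& a, const std::vector<double>& b,
                     double s, std::vector<double>& out)
{
    static_assert(Width > 0, "Width must be positive");
    assert(a.size() == b.size() && a.size() == out.size());

    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + Width <= n; i += Width) {
        double tmp[Width];
        for (std::size_t k = 0; k < Width; ++k) tmp[k] = a[i + k] * s;
        for (std::size_t k = 0; k < Width; ++k) tmp[k] += b[i + k];
        for (std::size_t k = 0; k < Width; ++k) out[i + k] = tmp[k];
    }
    // Tail: leftover elements that do not fill a whole block.
    for (; i < n; ++i) out[i] = a[i] * s + b[i];
}
```

With `Width` set to 2 or 4 the blocked form mirrors the Array<2> and Array<4> idea of operating on small groups of doubles; whether it is actually faster depends on the compiler, the target ISA and the workload.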

Finally, to keep this PR small and primarily infrastructural, it contains one vectorized tool port, rasterizeSdf(SphereSettings), which demonstrates how to both migrate from AoS->SoA and port existing scalar code to a templated method for VCL, Tuple and scalar arithmetic types. This implementation works with and without VCL.
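The AoS->SoA migration and the templated kernel can be sketched as follows. This is an illustrative outline only: `SphereAoS`, `SpheresSoA`, `toSoA`, `sphereDistance` and `distancesTo` are hypothetical names, and the actual rasterizeSdf implementation is considerably more involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Array of structures: each sphere's fields are interleaved in memory.
struct SphereAoS { double x, y, z, radius; };

// Structure of arrays: each field is stored contiguously, so consecutive
// spheres can be loaded into consecutive lanes.
struct SpheresSoA
{
    std::vector<double> x, y, z, radius;
    std::size_t size() const { return x.size(); }
};

SpheresSoA toSoA(const std::vector<SphereAoS>& in)
{
    SpheresSoA out;
    out.x.reserve(in.size());
    out.y.reserve(in.size());
    out.z.reserve(in.size());
    out.radius.reserve(in.size());
    for (const SphereAoS& s : in) {
        out.x.push_back(s.x);
        out.y.push_back(s.y);
        out.z.push_back(s.z);
        out.radius.push_back(s.radius);
    }
    return out;
}

// Kernel templated on the arithmetic type. With T = double it is scalar;
// a lane-wise type would need its own operators and a sqrt overload that
// argument-dependent lookup can find (an assumption of this sketch).
template <typename T>
T sphereDistance(const T& dx, const T& dy, const T& dz, const T& radius)
{
    using std::sqrt;
    return sqrt(dx * dx + dy * dy + dz * dz) - radius;
}

// Signed distance from the point (px, py, pz) to every sphere surface.
std::vector<double> distancesTo(const SpheresSoA& s,
                                double px, double py, double pz)
{
    std::vector<double> d(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        d[i] = sphereDistance(px - s.x[i], py - s.y[i], pz - s.z[i], s.radius[i]);
    }
    return d;
}
```

Only the scalar instantiation is exercised here; the point of the template is that the same kernel body could later be instantiated with a tuple or SIMD type without rewriting the arithmetic.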

The following table demonstrates the observed speedups with all configurations:

  • Scalar - no AoS->SoA, no VCL and no Tuples; built with no ISA targeting, with SSE42 and with AVX
  • Array<2> - Tuples of 2 doubles with no ISA targeting, with SSE42 and with AVX
  • Array<4> - Tuples of 4 doubles with no ISA targeting, with SSE42 and with AVX
  • Intrinsics - Using VCL with SSE42 (2 x doubles) and with AVX (4 x doubles)

[Image: table of observed speedups for the configurations listed above]

Note that this particular tool requires no discussion of the determinism of horizontal reductions or of Intel vs AMD instruction specs; that is, there are no horizontal accumulations and no reciprocal emissions for this case. We can defer this discussion to a future PR.
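For context on why horizontal reductions raise a determinism question at all, the small sketch below (not part of this PR, with hypothetical function names) sums the same four doubles in two different lane orders. Because floating-point addition is not associative, the two orders can disagree.

```cpp
#include <cassert>
#include <vector>

// Left-to-right accumulation, as a scalar loop would do it.
double sumSequential(const std::vector<double>& v)
{
    assert(v.size() == 4);
    return ((v[0] + v[1]) + v[2]) + v[3];
}

// Typical SIMD-style reduction: add the upper half of the lanes onto the
// lower half, then add the two remaining partial sums.
double sumHalves(const std::vector<double>& v)
{
    assert(v.size() == 4);
    return (v[0] + v[2]) + (v[1] + v[3]);
}
```

For the input `{1e16, 1.0, -1e16, 1.0}`, and assuming IEEE 754 double arithmetic evaluated in double precision, `sumSequential` yields 1.0 and `sumHalves` yields 2.0, since the first `1.0` is lost to rounding when it is added to `1e16`.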

I have working implementations of the following which, should this PR be accepted, I can further contribute:

  • PointRasterizeSDF.h rasterizeSdf(SmoothSpheres)
  • PointRasterizeSDF.h rasterizeSdf(Ellipsoids)
  • PointRasterizeTrilinear.h rasterizeTrilinear
  • PrincipalComponentAnalysis.h pca

…7883e0ed9ce6

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
Idclip added 5 commits April 9, 2026 12:52
Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
…alar arithmetic

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
…. RasterizeSDF with spheres

Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>
Signed-off-by: Nick Avramoussis <4256455+Idclip@users.noreply.github.com>