Skip to content

Created concepts for samplers, added quotient_and_pdf variants to satisfy the concepts#1001

Open
karimsayedre wants to merge 61 commits intomasterfrom
sampler-concepts
Open

Created concepts for samplers, added quotient_and_pdf variants to satisfy the concepts#1001
karimsayedre wants to merge 61 commits intomasterfrom
sampler-concepts

Conversation

@karimsayedre
Copy link
Copy Markdown
Contributor

@karimsayedre karimsayedre commented Feb 18, 2026

Examples PR

Notes:

  • The quotient_and_pdf() methods in UniformHemisphere, UniformSphere, ProjectedHemisphere, and ProjectedSphere shadow the struct type sampling::quotient_and_pdf<Q, P> from quotient_and_pdf.hlsl. DXC can't resolve the return type because the method name takes precedence over the struct name during lookup. Fixed by fully qualifying with ::nbl::hlsl::sampling::quotient_and_pdf<U, T>.
  • Obv. there's some refactoring to be done to satisfy all the concepts, so for not Basic (Level1) samplers are concept tested

Comment thread include/nbl/builtin/hlsl/sampling/box_muller_transform.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/box_muller_transform.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/cos_weighted_spheres.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/quotient_and_pdf.hlsl
…concepts

- Move codomain_and_*Pdf and domain_and_*Pdf structs into their own warp_and_pdf.hlsl header
- Keeping quotient_and_pdf.hlsl focused on importance sampling quotients for BxDFs
- Add SampleWithPDF, SampleWithRcpPDF, and SampleWithDensity concepts to validate sample types
- Used concept composition (NBL_CONCEPT_REQ_TYPE_ALIAS_CONCEPT) to build ResamplableSampler on TractableSampler and BijectiveSampler on ResamplableSampler
Comment thread include/nbl/builtin/hlsl/sampling/cos_weighted_spheres.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/warp_and_pdf.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl Outdated
Comment thread include/nbl/builtin/hlsl/sampling/concepts.hlsl
# Conflicts:
#	examples_tests
#	include/nbl/builtin/hlsl/path_tracing/gaussian_filter.hlsl
#	include/nbl/builtin/hlsl/sampling/basic.hlsl
#	include/nbl/builtin/hlsl/sampling/box_muller_transform.hlsl
Comment on lines +99 to 103
s = sin<T>(theta);
c = cos<T>(theta);
s = sqrt<T>(T(NBL_FP64_LITERAL(1.0))-c*c);
s = ieee754::flipSign(s, theta < T(NBL_FP64_LITERAL(0.0)));
// s = sqrt<T>(T(NBL_FP64_LITERAL(1.0))-c*c);
// s = ieee754::flipSign(s, theta < T(NBL_FP64_LITERAL(0.0)));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write a comment why, justification is probably in #1048

typename Bilinear<scalar_type>::cache_type bilinearCache;
};

// NOTE: produces a degenerate (all-zero) bilinear patch when the receiver normal faces away
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't be true anymore thanks to us adding min to linear to prevent degeneracy

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved by me

# Conflicts:
#	examples_tests
Comment thread include/nbl/builtin/hlsl/sampling/concentric_mapping.hlsl
Comment thread include/nbl/builtin/hlsl/shapes/spherical_rectangle.hlsl
Comment on lines +54 to +61
// degenerate triangle: any side has near-zero sin, so csc blows up
if (hlsl::any<vector<bool, 3> >(retval.csc_sides >= hlsl::promote<vector3_type>(numeric_limits<scalar_type>::max)))
{
retval.cos_vertices = hlsl::promote<vector3_type>(0.0);
retval.sin_vertices = hlsl::promote<vector3_type>(0.0);
retval.solid_angle = 0;
return retval;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but when only one side has near zero sine, you end up with a thin line (really long thin triangle), do we need to handle that more gracefully?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can leave as an open TODO

vector3_type(hlsl::dot(retval.vertices[1], retval.vertices[2]), hlsl::dot(retval.vertices[2], retval.vertices[0]), hlsl::dot(retval.vertices[0], retval.vertices[1])),
hlsl::promote<vector3_type>(-1.0), hlsl::promote<vector3_type>(1.0));
const vector3_type sin_sides2 = hlsl::promote<vector3_type>(1.0) - retval.cos_sides * retval.cos_sides;
retval.csc_sides = hlsl::rsqrt<vector3_type>(sin_sides2);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might only need to clamp the sin_sides2 = 1.0 - min(retval.cos_sides * retval.cos_sides,1.0)

because when you use cos_sides for the cos_vertices you clamp those again

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it only makes sense clamp cos_sides[2] individually here, because the samplers don't use anything else

so that only the square in sin_sides2 computation would get min

{
if (pyramidAngles())
return 0.f;
if (solid_angle <= numeric_limits<scalar_type>::epsilon)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you want to multiply the epsilon by 2*PI which is the max solid angle of a spherical triangle (can't be bigger than a hemisphere)

// the accumulated cosine slightly outside [-1,1] on GPU, making acos
// return NaN. GPU max(NaN,0)=0 then silently zeroes the solid angle.
retval.solid_angle = hlsl::max(angle_adder.getClampedSumOfArccosMinusPi(), scalar_type(0.0));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accept this, but why are you maxing something thats already clamped ? Why not clamp with the proper thing? I'm pretty sure compiler will have hard time optimizing the max on top of clamp

Fruthermore if you are clamping the sum, is there a point to clamping cos_vertices ?

Comment on lines +63 to 65
// cos_a - cos_b * cos_c - (1/sin_b * 1/sin_c)
retval.cos_vertices = hlsl::clamp((retval.cos_sides - retval.cos_sides.yzx * retval.cos_sides.zxy) * retval.csc_sides.yzx * retval.csc_sides.zxy, hlsl::promote<vector3_type>(-1.0), hlsl::promote<vector3_type>(1.0)); // using Spherical Law of Cosines
retval.sin_vertices = hlsl::sqrt(hlsl::promote<vector3_type>(1.0) - retval.cos_vertices * retval.cos_vertices);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who will be hurt (who's not already clamping) if cos_vertices is unclamped?

its simpler to clamp a square when computing sin_vertices = 1.0-cos*cos because a square is always positive, so you only need a min not a clamp

Comment thread include/nbl/builtin/hlsl/sampling/spherical_triangle.hlsl
Comment on lines +79 to +102
retval.triCosC = tri.cos_sides[2];
// precompute great circle normal of arc AC: cross(A,C) has magnitude sin(b),
// so multiplying by csc(b) normalizes it; zero when side AC is degenerate
const scalar_type cscB = tri.csc_sides[1];
const vector3_type arcACPlaneNormal = hlsl::cross(tri.vertices[0], tri.vertices[2]) * hlsl::select(cscB < numeric_limits<scalar_type>::max, cscB, scalar_type(0));
retval.e_C = hlsl::cross(arcACPlaneNormal, tri.vertices[0]);
retval.cosA = tri.cos_vertices[0];
retval.sinA = tri.sin_vertices[0];
if (Algorithm == STA_ARVO)
{
retval.sinA_triCosC = retval.sinA * retval.triCosC;
retval.eCdotB = hlsl::dot(retval.e_C, tri.vertices[1]);
}
return retval;
}

codomain_type generate(const domain_type u, NBL_REF_ARG(cache_type) cache) NBL_CONST_MEMBER_FUNC
{
// Step 1: compute sin/cos of A_hat and the angle difference (A_hat - alpha)
const scalar_type A_hat = u.x / rcpSolidAngle;
scalar_type sinA_hat, cosA_hat;
math::sincos(A_hat, sinA_hat, cosA_hat);
const scalar_type s = sinA_hat * cosA - cosA_hat * sinA; // sin(A_hat - alpha)
const scalar_type t = cosA_hat * cosA + sinA_hat * sinA; // cos(A_hat - alpha)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the notation you're using , capital letters are for angles for vertices on the tangent plane, and small letters are for the angle at the pyramid apex/sides

Image

Comment on lines +137 to +138
cosBp = nbl::hlsl::clamp(cosBp, scalar_type(-1), scalar_type(1));
sinBp = sqrt<scalar_type>(max<scalar_type>(scalar_type(0), scalar_type(1) - cosBp * cosBp));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cosBp is already clamped, IEEE754 makes it impossible for 1-cosBp*cosBp to be outside of [-1,1]

Comment on lines -67 to -82
if (triCscB < numeric_limits<scalar_type>::max)
{
const scalar_type cosAngleAlongAC = ((v_ * q - u_ * p) * cosA - v_) / ((v_ * p + u_ * q) * sinA);
if (nbl::hlsl::abs(cosAngleAlongAC) < 1.f)
C_s += math::quaternion<scalar_type>::slerp_delta(tri_vertices[0], tri_vertices[2] * triCscB, cosAngleAlongAC);
}

vector3_type retval = tri_vertices[1];
const scalar_type cosBC_s = nbl::hlsl::dot(C_s, tri_vertices[1]);
const scalar_type csc_b_s = 1.0 / nbl::hlsl::sqrt(1.0 - cosBC_s * cosBC_s);
if (csc_b_s < numeric_limits<scalar_type>::max)
{
const scalar_type cosAngleAlongBC_s = scalar_type(1.0) + cosBC_s * u.y - u.y;
if (nbl::hlsl::abs(cosAngleAlongBC_s) < scalar_type(1.0))
retval += math::quaternion<scalar_type>::slerp_delta(tri_vertices[1], C_s * csc_b_s, cosAngleAlongBC_s);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how fast would this have been if you replaced the if statements with hlsl::select and have no branching plus inversesqrt and the fixed sincos ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esp given that bits of slerp_delta could be precomputed in create #1001 (comment)

Comment on lines +316 to +317
// mirrored from base for uniform access across both specializations
scalar_type rcpSolidAngle;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but base already keeps it ?

const scalar_type rcpR = scalar_type(1.0) / nbl::hlsl::sqrt(R_sq);
vector3_type cp = base.tri_vertices[0] * (tripleE * rcpR) + base.e_C * (-tripleA * rcpR);
// two intersections exist; pick the one on the minor arc A->C
if (nbl::hlsl::dot(cp, base.tri_vertices[0] + vertexC) < scalar_type(0.0))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store base.tri_vertices[0] + vertexC instead of vertexC

Also why the if-statement you could just do cp *= sign(dot()); or cp *= ieee754:::... if you don't want to mul by 0 when dot is 0

Comment on lines +229 to +294
domain_type generateInverse(const codomain_type L) NBL_CONST_MEMBER_FUNC
{
// Step 1: find C' = intersection of great circles (B,L) and (A,C)
const vector3_type BxL = nbl::hlsl::cross(base.tri_vertices[1], L);
const scalar_type sinBL_sq = nbl::hlsl::dot(BxL, BxL);
if (sinBL_sq < numeric_limits<scalar_type>::epsilon)
{
// L ~ B: u.y ~ 0, u.x is indeterminate (all u.x map to B when u.y=0).
// Recover u.y from |L-B|^2 / |A-B|^2 (using C'=A; the (1-cosCpB) ratio
// cancels so any C' gives the same result).
const vector3_type LminusB = L - base.tri_vertices[1];
const vector3_type AminusB = base.tri_vertices[0] - base.tri_vertices[1];
const scalar_type v_num = nbl::hlsl::dot(LminusB, LminusB);
const scalar_type v_denom = nbl::hlsl::dot(AminusB, AminusB);
const scalar_type v = hlsl::select(v_denom > numeric_limits<scalar_type>::epsilon,
nbl::hlsl::clamp(v_num / v_denom, scalar_type(0.0), scalar_type(1.0)),
scalar_type(0.0));
return vector2_type(scalar_type(0.0), v);
}

// C' lies on arc AC, so C' = A*cos(t) + e_C*sin(t).
// C' also lies on the B-L plane, so dot(BxL, C') = 0.
// Solving: (cos(t), sin(t)) = (tripleE, -tripleA) / R
const scalar_type tripleA = nbl::hlsl::dot(BxL, base.tri_vertices[0]);
const scalar_type tripleE = nbl::hlsl::dot(BxL, base.e_C);
const scalar_type R_sq = tripleA * tripleA + tripleE * tripleE;
if (R_sq < numeric_limits<scalar_type>::epsilon)
return vector2_type(scalar_type(0.0), scalar_type(0.0));

const scalar_type rcpR = scalar_type(1.0) / nbl::hlsl::sqrt(R_sq);
vector3_type cp = base.tri_vertices[0] * (tripleE * rcpR) + base.e_C * (-tripleA * rcpR);
// two intersections exist; pick the one on the minor arc A->C
if (nbl::hlsl::dot(cp, base.tri_vertices[0] + vertexC) < scalar_type(0.0))
cp = -cp;

// Step 2: u.x = solidAngle(A,B,C') / solidAngle(A,B,C)
// Van Oosterom-Strackee: tan(Omega/2) = |A.(BxC')| / (1 + A.B + B.C' + C'.A)
//
// Numerator stability: the naive triple product dot(A, cross(B, C')) suffers
// catastrophic cancellation when C' is near A (small u.x), because
// cross(B, C') ~ cross(B, A) and dot(A, cross(B, A)) = 0 exactly.
// Expanding C' = cosBp*A + sinBp*e_C into the triple product:
// A.(BxC') = cosBp * A.(BxA) + sinBp * A.(Bxe_C) = sinBp * A.(Bxe_C)
// since A.(BxA) = 0 identically. This avoids the cancellation.
const scalar_type cosBp_inv = nbl::hlsl::dot(cp, base.tri_vertices[0]);
const scalar_type sinBp_inv = nbl::hlsl::dot(cp, base.e_C);
const scalar_type AxBdotE = nbl::hlsl::dot(base.tri_vertices[0], nbl::hlsl::cross(base.tri_vertices[1], base.e_C));
const scalar_type num = sinBp_inv * AxBdotE;
const scalar_type cosCpB = nbl::hlsl::dot(base.tri_vertices[1], cp);
const scalar_type den = scalar_type(1.0) + base.triCosC + cosCpB + cosBp_inv;
const scalar_type subSolidAngle = scalar_type(2.0) * nbl::hlsl::atan2(nbl::hlsl::abs(num), den);
const scalar_type u = nbl::hlsl::clamp(subSolidAngle * rcpSolidAngle, scalar_type(0.0), scalar_type(1.0));

// Step 3: u.y = |L-B|^2 / |C'-B|^2
// Squared Euclidean distance avoids catastrophic cancellation vs (1-dot)/(1-dot)
const vector3_type LminusB = L - base.tri_vertices[1];
const vector3_type cpMinusB = cp - base.tri_vertices[1];
const scalar_type v_num = nbl::hlsl::dot(LminusB, LminusB);
const scalar_type v_denom = nbl::hlsl::dot(cpMinusB, cpMinusB);
const scalar_type v = hlsl::select(v_denom > numeric_limits<scalar_type>::epsilon,
nbl::hlsl::clamp(v_num / nbl::hlsl::max(v_denom, numeric_limits<scalar_type>::min),
scalar_type(0.0), scalar_type(1.0)),
scalar_type(0.0));

return vector2_type(u, v);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not look faster than my old impl

Comment on lines 98 to 106
@@ -87,7 +104,7 @@ struct SphericalTriangle
// See https://www.desmos.com/calculator/sdptomhbju
// Furthermore we could clip the polynomial calc to `Cu+D or `(Bu+C)u+D` for small arguments
const vector3_type pyramidAngles = hlsl::acos<vector3_type>(cos_sides);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw it would make more sense to keep acos_csc_approx instead of cos_sides as a member

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or to ALSO keep it

Comment on lines +52 to +53
// r_disk / r_xy = sqrt(1-z) / sqrt(1-z^2) = 1/sqrt(1+z)
const T scale = T(1.0) / hlsl::sqrt<T>(T(1.0) + v.z);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rsqrt

Comment on lines +40 to +41
const T z = T(1.0) - cmCache.r2;
const T xyScale = hlsl::sqrt<T>(hlsl::max<T>(T(0.0), T(2.0) - cmCache.r2));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert r2<=1.f then you don't need to clamp before the sqrt

does this pass any tests? why is it 2-r^2 !?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this just be a sqrt(r2) ?

Comment on lines +73 to +81
cache.abs_cos_theta = bilinear.forwardWeight(cache.warped, cache.bilinearCache);
cache.abs_cos_theta = bilinear.forwardWeight(u, cache.bilinearCache);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong thing, the MIS weight needs to be consistent, so this would need to be dot(L,receiverNormal)

do not store abs_cos_theta in the cache!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to pass the sampled L instead in the cache

Comment on lines +50 to +51
scalar_type abs_cos_theta;
vector2_type warped;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as the triangle

you don't need warped at all

abs_cos_theta wrong thing to save for UsePdfAsWeight=false, you should be saving L instead

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in this case sampleOffset

{
ProjectedSphericalRectangle<T,UsePdfAsWeight> retval;
const vector3_type n = hlsl::mul(shape.basis, _receiverNormal);
retval.localReceiverNormal = n;
Copy link
Copy Markdown
Member

@devshgraphicsprogramming devshgraphicsprogramming Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternative is to produce 4 normalized corners in worldspace from a compressed rectangle

origin
origin+right
origin+up
origin+right+up

then 4 normalizations

here you're doing 4 full 3d normalizations with your lenSq, you're only saving a few MULs

the dot products are less expensive because its not 2 FMA and 1 MUL done 4 times over.

but the matrix mul to transform the receiver normal into rectangle local space is equivalent to 3 dots.

Add a create that takes a CompressedSphericalRectangle it might end up being faster because you're not using results of operations, there's a possibility of deeper pipelining/less stalls or different register usage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants