Faster convolve numba by ricardoV94 · Pull Request #2175 · pymc-devs/pytensor

ricardoV94 · 2026-05-27T08:48:33Z

LLVM really wins when it knows the static shape of the kernel: it can then vectorize the inner loop.

| Benchmark | Main (μs) | Branch (μs) | Speedup | Specialization? |
|---|---|---|---|---|
| numba batch=False mode=valid | 2.50 | 1.47 | 1.70x | Yes: static shapes (183,), (6,) |
| numba batch=True mode=valid | 9.23 | 2.15 | 4.29x | Yes: static shapes (7,183), (7,6) |
| numba batch=False mode=full | 2.49 | 2.47 | 1.01x | No effect: `full` path unchanged; specialization only rewrites `valid_convolve1d` |
| numba batch=True mode=full | 9.26 | 8.76 | 1.06x | No effect: `full` path unchanged |
| numba grad full | 82.28 | 81.00 | ~1.0x | No: inputs have `shape=(8, None)`, `use_static=False` |
| numba grad valid | 80.72 | 80.27 | ~1.0x | No: inputs have `shape=(8, None)`, `use_static=False` |
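A minimal sketch of why static shapes help, assuming Numba's behavior of freezing closure variables into the compiled function as compile-time constants. `make_valid_convolve1d` and `kernel` are hypothetical names for illustration; only `valid_convolve1d`, `use_static`, and the idea of specializing on static lengths come from this PR, and the actual dispatch code may differ. When both lengths are known, the loop bounds are constants, giving LLVM a fixed trip count it can vectorize; otherwise they are read from the array shapes at runtime. The sketch falls back to plain Python if Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # JIT-compile if Numba is available
except ImportError:  # plain-Python fallback so the sketch still runs
    def njit(f):
        return f


def make_valid_convolve1d(a_static_len, b_static_len):
    """Hypothetical factory: specialize on static lengths when both are known."""
    use_static = a_static_len is not None and b_static_len is not None

    if use_static:
        n_out = a_static_len - b_static_len + 1
        m = b_static_len

        @njit
        def kernel(a, b, out):
            # n_out and m are closure constants, so the loop trip counts are
            # fixed at compile time. Assumes the inputs really have the static
            # lengths (Numba does not bounds-check by default).
            for i in range(n_out):
                acc = 0.0
                for k in range(m):
                    acc += a[i + k] * b[m - 1 - k]
                out[i] = acc
            return out
    else:
        @njit
        def kernel(a, b, out):
            # Generic path: lengths are only known at runtime.
            n_out_dyn = a.shape[0] - b.shape[0] + 1
            m_dyn = b.shape[0]
            for i in range(n_out_dyn):
                acc = 0.0
                for k in range(m_dyn):
                    acc += a[i + k] * b[m_dyn - 1 - k]
                out[i] = acc
            return out

    def valid_convolve1d(a, b, out=None):
        # Allocate only if the caller did not provide an output buffer.
        if out is None:
            out = np.empty(a.shape[0] - b.shape[0] + 1, dtype=np.float64)
        return kernel(a, b, out)

    return valid_convolve1d
```

Both paths compute `np.convolve(a, b, mode="valid")` for `len(a) >= len(b)`; the only difference is whether the loop bounds are baked in at compile time.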

On the default/demo MMM model in pymc-marketing, this translates to a ~1.2x speedup in the logp+dlogp function.

This PR also develops a mechanism to provide the `out` argument to Blockwise/non-scalar RV functions, which avoids a useless copy from the inner function's output buffer into the batched Blockwise output buffer. We should follow up and start using this in as many places as we can.

For now there's a hacky `.handles_out` attribute that signals this behavior. We can think of a better API, but I wouldn't get too hung up on it.

The speed benefits are smaller (about 1-2 μs in the batched cases where the argument plays a role). They are more dramatic for Blockwise over cheap inner graphs, and the change obviously reduces intermediate memory consumption.
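A minimal NumPy sketch of the idea behind `handles_out`, assuming a Blockwise-style loop over the leading batch axis. `blockwise_apply`, `cumsum_core`, and `cumsum_copy` are hypothetical names for illustration; only the `handles_out` attribute comes from this PR, and the real Numba vectorize code is structured differently.

```python
import numpy as np


def blockwise_apply(core_fn, x_batched, core_out_len):
    """Hypothetical batched loop over the leading axis of x_batched."""
    n_batch = x_batched.shape[0]
    out = np.empty((n_batch, core_out_len), dtype=x_batched.dtype)
    handles_out = getattr(core_fn, "handles_out", False)
    for i in range(n_batch):
        if handles_out:
            # The core function writes straight into a view of the batched buffer.
            core_fn(x_batched[i], out=out[i])
        else:
            # The core function allocates its own result, which is then copied.
            out[i] = core_fn(x_batched[i])
    return out


def cumsum_core(x, out=None):
    # np.cumsum accepts an `out` buffer; with out=None it allocates a new array.
    return np.cumsum(x, out=out)


cumsum_core.handles_out = True


def cumsum_copy(x):
    # Same computation, but without support for an output buffer.
    return np.cumsum(x)
```

Both variants produce the same batched result; the `handles_out` path skips one temporary allocation and one copy per batch entry, which is where the small per-call savings and the lower intermediate memory use come from.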

ricardoV94 · 2026-05-27T12:38:45Z

+@register_canonicalize
+@register_specialize
+@node_rewriter([SpecifyShape])
+def local_specify_shape_alloc(fgraph, node):


This was messing up some intermediate graphs I explored.

> Should this be `shape_unsafe` or something? I know it's not literally shape-unsafe, but it's weird that a ViewOp ends up mutating the inputs.

Hmm, so this could mask a `specify_shape(alloc(x, 3), 5)` where the 3 is only known at runtime. If it were static, we would catch the mismatch at graph definition time.

In practice, it's more like we have an `alloc(x, shape(y))` from an Elemwise broadcast, where `y` doesn't have a static shape, but through rewrites we find at some point that the alloc must have length 5. Then it's simpler not to rely on the shape of `y`.

Say `alloc(x, shape(y)) + zeros(5)` becomes `specify_shape(alloc(x, shape(y)), 5)`, which then becomes `alloc(x, 5)`.
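A toy model of the `local_specify_shape_alloc` rewrite discussed above, to make both the benefit and the masking concern concrete. `Alloc`, `SpecifyShape`, and `rewrite_specify_shape_alloc` here are hypothetical dataclasses and functions, not PyTensor's Ops or rewriter API. It assumes that dimensions where `SpecifyShape` gives `None` keep the Alloc's original length; symbolic lengths are represented as strings.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Alloc:
    value: float
    shape: Tuple[object, ...]  # ints (known) or strings (symbolic, e.g. "y.shape[0]")


@dataclass(frozen=True)
class SpecifyShape:
    x: object
    shape: Tuple[Optional[int], ...]  # None means "no constraint on this dim"


def rewrite_specify_shape_alloc(node):
    """Toy rewrite: SpecifyShape(Alloc(v, s), s') -> Alloc(v, s' where given, else s)."""
    if isinstance(node, SpecifyShape) and isinstance(node.x, Alloc):
        alloc = node.x
        new_shape = tuple(
            spec if spec is not None else old
            for spec, old in zip(node.shape, alloc.shape)
        )
        return Alloc(alloc.value, new_shape)
    return node  # Any other node is left untouched


# alloc(x, shape(y)) whose length a later rewrite proved to be 5
g = SpecifyShape(Alloc(1.0, ("y.shape[0]",)), (5,))
print(rewrite_specify_shape_alloc(g))  # Alloc(value=1.0, shape=(5,))
```

The same rewrite is what masks a mismatch: if the symbolic length `"n"` would evaluate to 3 at runtime, `SpecifyShape(Alloc(1.0, ("n",)), (5,))` becomes `Alloc(1.0, (5,))`, and the runtime check comparing 3 against 5 no longer exists in the graph.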

jessegrabowski · 2026-05-27T13:39:53Z

I remember we talked about batched 1D vs. 2D convolution. Does that story play in here anywhere?

ricardoV94 · 2026-05-27T13:55:31Z

> I remember we talked about batched 1D vs. 2D convolution. Does that story play in here anywhere?

Maybe for large kernels on GPU, but JAX will likely do that already. For CPU and small kernels, I don't think so: FFT-based convolution only starts winning at sizes in the thousands. But convolution is a rabbit hole; this is by no means the solution, just empirically better than what we had before.

Also, we don't yet have a native `Convolve2D`, so it's not even an option.
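For context on the FFT remark, a minimal NumPy sketch of FFT-based 1D convolution. The `fft_convolve_full` and `fft_convolve_valid` names are hypothetical and not part of PyTensor. It computes the same result as direct convolution; the crossover size at which it becomes faster depends on hardware and implementation and is not measured here.

```python
import numpy as np


def fft_convolve_full(a, b):
    """Linear ("full") convolution via the FFT."""
    n = a.shape[0] + b.shape[0] - 1
    # Zero-padding both inputs to length n makes the circular convolution
    # computed by the FFT equal to the linear convolution.
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    return np.fft.irfft(fa * fb, n)


def fft_convolve_valid(a, b):
    """'valid' convolution, assuming len(a) >= len(b)."""
    full = fft_convolve_full(a, b)
    m = b.shape[0]
    # Keep only outputs where b fully overlaps a: indices m-1 .. len(a)-1.
    return full[m - 1 : a.shape[0]]
```

Direct convolution costs O(len(a) * len(b)) while the FFT route costs O(n log n), which is why the FFT only pays off once the kernel gets large.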

jessegrabowski · 2026-05-27T13:58:27Z

What do you mean by native? Numba dispatch?

ricardoV94 · 2026-05-27T13:59:26Z

> What do you mean by native? Numba dispatch?

Yes

ricardoV94 · 2026-05-27T14:06:08Z

Regardless of conv2d tricks, this would still apply to the unbatched 1D case, of course.

jessegrabowski · 2026-05-27T14:44:12Z

Yeah, I get that. I'm asking a tangential question.

jessegrabowski

Looks good broadly. Some questions, in particular about overloading `SpecifyShape` so that it is actually enforced on the graph rather than serving only as shape information.

jessegrabowski · 2026-05-27T13:42:28Z

+    a_static_len = node.inputs[0].type.shape[-1]
+    b_static_len = node.inputs[1].type.shape[-1]


I guess this is rewrite-safe because by the time we get to dispatch we're always done rewriting?

What would be unsafe about it? The only difference is that Numba sees a constant in the inner loop. If we couldn't trust static shapes after compilation, we would need to change many other places.

Nothing would be unsafe about it. I'm just thinking out loud.

jessegrabowski · 2026-05-27T16:23:19Z

-            return valid_convolve1d(x, y)
+            return valid_convolve1d(x, y, out=out)
+
+    convolve_1d.handles_out = True


Do we need to go back and add this tag everywhere that allows inplace?

Nothing else supports it yet; it's a new argument/property.

jessegrabowski · 2026-05-27T16:25:27Z

+@register_canonicalize
+@register_specialize
+@node_rewriter([SpecifyShape])
+def local_specify_shape_alloc(fgraph, node):


should this be shape_unsafe or something? I know it's not literally shape unsafe, but it's weird that a ViewOp ends up mutating the inputs.

ricardoV94 added the numba, performance, vectorization, and memory opt (Memory optimization) labels May 27, 2026

ricardoV94 force-pushed the faster_convolve_numba branch from 613b985 to cb54a7e May 27, 2026 11:37

ricardoV94 marked this pull request as ready for review May 27, 2026 12:09

ricardoV94 mentioned this pull request May 27, 2026

Freeze MMM model before sampling (pymc-labs/pymc-marketing#2606, open)

ricardoV94 requested a review from jessegrabowski May 27, 2026 12:37

ricardoV94 commented May 27, 2026


ricardoV94 added 4 commits May 27, 2026 14:46

Rewrite SpecifyShape of Alloc (d3be4fb)

Numba convolve: special path for valid and small kernels (f294db5)

Numba vectorize: Allow out argument (fcc0571)

Numba vectorize: rewrite core_shape graph (e447d97)

ricardoV94 force-pushed the faster_convolve_numba branch from cb54a7e to e447d97 May 27, 2026 12:47

jessegrabowski approved these changes May 27, 2026
